[MAINT] Network Work on Code Sunday, Oct 1, 1-4pm EDT

Derek Atkins derek at ihtfp.com
Wed Oct 4 20:22:52 EDT 2017


Hi,

On Wed, October 4, 2017 1:14 am, GWB wrote:
> I agree that the problem developing at midnight is too much of a
> coincidence.  If there are logs, I would try to read them in
> GMT/UTC+/-.

I'm not 100% sure it was at midnight; I was basing that on my memory of an
MRTG weekly graph which is no longer available.  It could have been +/- an
hour from midnight.  But I don't have packet logs; I only have standard
syslog logs.  I did look through some of them but didn't see anything
suspicious.  Alas, I deleted my logwatch output, and my deleted mail gets
flushed after a week... and it had been more than a week.  :(
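Gordon's suggestion to read the logs in UTC can be applied to the syslog files that do survive. Here is a minimal sketch under several assumptions: the logs use the traditional `Mmm dd hh:mm:ss` stamp (no year, no zone, English month names), the host clock runs on EDT (UTC-4), and the suspect midnight fell at the start of 2017-10-01. That date is hypothetical, since the exact night is uncertain. The sketch keeps lines within an hour on either side of that time and reports each one in UTC.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the host logs in US Eastern Daylight Time (UTC-4).
# A fixed offset is used instead of a tz database to keep this self-contained.
EDT = timezone(timedelta(hours=-4), "EDT")


def parse_syslog_time(line, year, tz=EDT):
    """Parse the leading 'Mmm dd hh:mm:ss' stamp of a traditional syslog line.

    Traditional syslog omits the year and the zone, so both are supplied.
    Month names are matched in the C/English locale.
    """
    stamp = line[:15]  # e.g. "Oct  1 00:13:22"
    naive = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
    return naive.replace(tzinfo=tz)


def lines_near(lines, center, slack=timedelta(hours=1), year=2017):
    """Yield (utc_time, line) for lines stamped within +/- slack of center."""
    for line in lines:
        try:
            t = parse_syslog_time(line, year)
        except ValueError:
            continue  # skip continuation lines or lines in another format
        if abs(t - center) <= slack:
            yield t.astimezone(timezone.utc), line


# Hypothetical suspect time: local midnight at the start of 2017-10-01.
suspect = datetime(2017, 10, 1, 0, 0, tzinfo=EDT)
```

Local midnight EDT corresponds to 04:00 UTC, so an upstream device logging in UTC would record the same event four hours "later".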

>    A /29 octal would be 6 unique IPv4 static IP addresses,
> correct?  So is there a host with a firewall on one of them?  If that
> were a bsd system, something like:

The network looks like:

  Arris <-> Edgerouter Pro 8 <-> Switch <-> [Internal Network]

The Arris provides a /29; the Edgerouter is sitting on one of those IPs. 
Then I have a class-C and a /48 routed to the edgerouter via tunnels. 
Code (and a bunch of my servers) sit on those tunneled networks.
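On the quoted arithmetic: a /29 spans 2^(32-29) = 8 addresses. After the network and broadcast addresses are excluded, 6 usable host addresses remain, and one of them is the Edgerouter's. The sketch below uses Python's standard `ipaddress` module. The prefix 192.0.2.0/29 is a hypothetical stand-in from the RFC 5737 documentation range, because the real allocation is not given in the thread.

```python
import ipaddress

# Hypothetical prefix (RFC 5737 documentation range); the real /29 is not given.
block = ipaddress.ip_network("192.0.2.0/29")

# hosts() excludes the network and broadcast addresses for prefixes shorter than /31.
hosts = list(block.hosts())

print(block.num_addresses)                           # → 8
print(len(hosts))                                    # → 6
print(block.network_address, block.broadcast_address)  # → 192.0.2.0 192.0.2.7
print(hosts[0], hosts[-1])                           # → 192.0.2.1 192.0.2.6
```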

> tcpdump -n -e -ttt -r /var/log/pflog port 80 | grep 2017-09-31- | less
>
> (or whatever date time format the OS has for pflog/tcpdump)

There is no pflog, AFAIK.
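Without pflog, the closest equivalent on this setup is the Wireshark captures that are already being taken. Here is a minimal sketch, assuming the capture was saved in classic libpcap format. Newer Wireshark defaults to pcapng, and `editcap -F pcap` can convert. The sketch walks the per-packet record headers with only the standard library and keeps packets whose timestamps fall inside a chosen window. pcap stores those timestamps as UTC seconds since the epoch. The function name `packets_in_window` is illustrative, not part of any tool.

```python
import struct
from datetime import datetime, timezone


def packets_in_window(path, start, end):
    """Yield (utc_time, raw_bytes) for packets with start <= time < end.

    Assumes a classic libpcap file (not pcapng); start/end are aware datetimes.
    """
    with open(path, "rb") as f:
        header = f.read(24)  # fixed-size global header
        magic = header[:4]
        # The magic number encodes byte order and timestamp resolution.
        if magic == b"\xd4\xc3\xb2\xa1":
            endian, scale = "<", 1e-6  # little-endian, microseconds
        elif magic == b"\xa1\xb2\xc3\xd4":
            endian, scale = ">", 1e-6  # big-endian, microseconds
        elif magic == b"\x4d\x3c\xb2\xa1":
            endian, scale = "<", 1e-9  # little-endian, nanoseconds
        elif magic == b"\xa1\xb2\x3c\x4d":
            endian, scale = ">", 1e-9  # big-endian, nanoseconds
        else:
            raise ValueError("not a classic pcap file (pcapng?)")
        record = struct.Struct(endian + "IIII")  # ts_sec, ts_frac, incl_len, orig_len
        while True:
            rec = f.read(record.size)
            if len(rec) < record.size:
                break  # end of file
            ts_sec, ts_frac, incl_len, _orig_len = record.unpack(rec)
            data = f.read(incl_len)
            t = datetime.fromtimestamp(ts_sec + ts_frac * scale, tz=timezone.utc)
            if start <= t < end:
                yield t, data
```

For example, `packets_in_window("capture.pcap", start, end)` (hypothetical file name) with a window of 03:00 to 05:00 UTC would cover local midnight EDT with an hour of slack on each side.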

> would show what the firewall would have seen at that terrible midnight
> hour when things went bad.  But I am assuming that when time stamps
> get out of sync between routers packets will drop.  I don't know if
> that is necessarily the case.
>
> So yes, if AT&T has a router upstream that decided something strange
> about GMT/UTC (or LOCALE), then who knows, droppage could occur.

Huh?  I'm not sure I follow this logic.

> If this is a Debian system, then tcpdump probably has a nice front end
> and a gui, so even easier with whatever firewall it has (ipfilter?
> netfilter?).
>
> Or, perhaps put Kali Linux on one of the vm's, and point it toward
> whatever traceroute tells you is ATT's routers.  Then try the same
> with Kali Linux running as a vm on a laptop from outside your network.

I've been sitting here running Wireshark all day and testing my VMs.  I've
narrowed the issue down to two specific VMs.  When either (or both) of
these VMs are running, I see the problem.  If neither is running, then the
network seems to (mostly) behave itself.

Across the hours of packet dumps I've gone through, I've noticed a very
high correlation between the network wonkiness and duplicate/out-of-order
TCP packets.  Of course, determining the cause of the duplicate packets is
yet another issue.
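One way to put numbers on that correlation is to count anomalous segments per flow. This is a deliberately simplified heuristic, not Wireshark's actual TCP analysis. It takes the (sequence number, payload length) pairs for one direction of one connection, in arrival order, and labels each segment as one of:

- in-order
- a gap (data ahead of what has been seen)
- out-of-order (earlier data not seen before)
- a duplicate (an exact repeat)

It ignores sequence-number wraparound, SYN/FIN accounting, and timing, so it cannot tell a retransmission from a packet duplicated in the network. The helper name is hypothetical.

```python
from collections import Counter


def classify_segments(segments):
    """Label each (seq, length) segment of one TCP direction, in arrival order.

    Hypothetical, simplified heuristic: pass only segments carrying payload;
    sequence wraparound, SYN/FIN, and timing are ignored.
    """
    seen = set()
    next_expected = None  # highest sequence end observed so far
    labels = []
    for seq, length in segments:
        end = seq + length
        if (seq, length) in seen:
            label = "duplicate"  # exact repeat of a segment already seen
        elif next_expected is None or seq == next_expected:
            label = "in-order"
        elif seq > next_expected:
            label = "gap"  # earlier bytes have not arrived yet
        else:
            label = "out-of-order"  # fills in bytes before the highest end seen
        seen.add((seq, length))
        next_expected = end if next_expected is None else max(next_expected, end)
        labels.append(label)
    return labels


segments = [(1000, 100), (1100, 100), (1300, 100), (1200, 100), (1200, 100), (1400, 100)]
labels = classify_segments(segments)
print(labels)
# → ['in-order', 'in-order', 'gap', 'out-of-order', 'duplicate', 'in-order']
print(Counter(labels))
```

Comparing these counts between captures taken with and without the two suspect VMs running would show whether the anomalies track the VMs.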

I've got everything up right now, but expect more downtime as I debug. 
I'm even considering rebuilding the VMs and restoring from backup to see
if that will help!

-derek

> Gordon
>
> On Tue, Oct 3, 2017 at 12:31 PM, David Carlson
> <david.carlson.417 at gmail.com> wrote:
>> Derek,
>>
>> Is it possible to read event logs and determine that some pieces of
>> equipment are definitely not causing the problem?
>>
>> David C
>>
>> On Oct 3, 2017 10:04 AM, "Derek Atkins" <warlord at mit.edu> wrote:
>>>
>>> GWB <gwb at 2realms.com> writes:
>>>
>>> > So AT&T's network equipment works when nothing is connected to it?
>>> > Clearly this is a success story.  If the tech had been allowed to try
>>> > the new modem, that would have greatly sped up the troubleshooting
>>> > process.  If it had worked as before, then problem solved.  If not,
>>> > then you could debug your network topography on your side.
>>>
>>> Well, it works when I'm just connected with a laptop.  That proves (to
>>> them) that it's not their equipment.  It could be my equipment.  It
>>> could be a bad interaction between the two.  It's very hard to say.
>>>
>>> > That's actually more frustrating than I thought.  Maybe another phone
>>> > call to ask them to just please try a different modem before you
>>> > spend all that time chasing a problem that might not exist.
>>>
>>> The problem is my static IPs.  I can't just replace the modem.  And I
>>> can't just "take a random IP address" because it requires coordination
>>> to also migrate my tunnels, which is how code.gnucash.org gets its IP
>>> address.
>>>
>>> > I have managed to keep the static (fixed) IP I have, but not without
>>> > a struggle.  The solution is sometimes just having a "dumb" modem
>>> > connected to the LAN with the switching equipment at the telecomm set
>>> > to assign a fixed IP.  Sometimes I have to lease a fixed IP from a
>>> > different provider who then contracts with a local telecomm.  In any
>>> > event, it's worth it, and necessary.
>>>
>>> I have a /29 from them.
>>>
>>> > Check around.  Anywhere from Cambridge to Dedham should be any number
>>> > of telecomm trunk leasing outfits.
>>>
>>> Alas, Cambridge is about 1000 miles from here.  While I DID live in
>>> Somerville for ~14 years, I'm now much further south ;)
>>>
>>> > Gordon
>>>
>>> -derek
>>>
>>> PS: There is still a SMALL chance that it's not my VM system.  I'm
>>> considering taking it offline again this afternoon to test it again.
>>> --
>>>        Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory
>>>        Member, MIT Student Information Processing Board  (SIPB)
>>>        URL: http://web.mit.edu/warlord/    PP-ASEL-IA     N1NWH
>>>        warlord at MIT.EDU                        PGP key available
>


-- 
       Derek Atkins                 617-623-3745
       derek at ihtfp.com             www.ihtfp.com
       Computer and Internet Security Consultant


