[MAINT] Network Work on Code Sunday, Oct 1, 1-4pm EDT

GWB gwb at 2realms.com
Fri Oct 6 01:38:02 EDT 2017


Derek,

Yep, you covered it.  Wireshark does whatever tcpdump does, but
probably in a way that's more friendly to log analyzers and GUIs.  The
out of sync time stamps or time zones or locale error is a long shot,
and probably not worth pursuing.  TCP measures Round Trip Time and
re-transmits when packets are dropped.  A router also MAY drop packets
with an out of sequence timestamp.  Here's a better explanation than I
can give:

https://serverfault.com/questions/307376/many-different-tcp-timestamps-through-nat-device-cause-server-to-drop-packets-p

Time stamps that are out of sequence behind a NAT MIGHT be dropped
because the router assumes they already went through (yes, it's way
more complicated than that, but that's all I'm getting out of the
explanation).  Here's the command line used from that URL to fix the
problem from the client side:

sysctl -w net.ipv4.tcp_tw_recycle=0

But, of course, don't try it unless you know it won't do any damage to
the things that are working.  However, if you're getting duplicate and
out of order packets, maybe it would not hurt.
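
To make the NAT timestamp idea a bit more concrete, here is a toy
model in Python.  It is only a sketch under assumptions, not the
kernel's actual logic: it assumes the server remembers the last TCP
timestamp it saw per source IP and rejects anything older, and it
ignores time windows and timestamp wraparound.  The class name, the
function name, the IP address and the timestamp values are all made up
for illustration.

```python
class PerIpTimestampFilter:
    """Toy model: remember the newest TCP timestamp seen per source IP."""

    def __init__(self):
        self.last_seen = {}  # source IP -> newest timestamp value accepted

    def accept(self, src_ip, tsval):
        """Return False if tsval is older than the last one from src_ip."""
        last = self.last_seen.get(src_ip)
        if last is not None and tsval < last:
            return False  # looks like an old duplicate, so it is dropped
        self.last_seen[src_ip] = tsval
        return True


def simulate():
    """Two hosts behind one NAT address, each with its own timestamp clock."""
    flt = PerIpTimestampFilter()
    nat_ip = "203.0.113.7"  # hypothetical public address of the NAT
    # Host A has been up a long time (large timestamps); host B booted
    # recently (small timestamps).  The server only sees nat_ip.
    attempts = [("A", 900_000), ("B", 1_200), ("A", 900_050), ("B", 1_250)]
    return [(host, flt.accept(nat_ip, ts)) for host, ts in attempts]


if __name__ == "__main__":
    for host, ok in simulate():
        print(f"host {host}: {'accepted' if ok else 'dropped'}")
```

In this simplified model host B keeps getting dropped, because from
the server's point of view its timestamps look like they come from the
past of the same address.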

Actually, it makes me wonder if there is some way to measure this in
IPv6, and see if the same results occur.  Of course, disabling IPv4
probably also disables any useful service you might have running, so
maybe don't try that (but hey, if you do, take video!).

I also cannot remember if time stamp bugs were mostly theoretical or
real.  They can be intentionally invoked on some devices, but I don't
remember which kinds.

The Edgerouter Pro should have an iptables firewall, and my guess is
that you can look at it with a VM (Dimension VM, or something like
that).  I gave up on using the router as anything other than a minimal
firewall some time ago.  But most people can make that work, and
router processor capacity gets better and better over time.  /48 looks
familiar, but that must be an IPv6 prefix length rather than an IPv4
one (IPv4 prefixes stop at /32).  /32 gives you exactly one address
(2 to the 0=1, zero host bits).  /31 gives you two addresses but, by
the usual rule, zero usable hosts, because one is the network address
and the other is the broadcast address (point-to-point links can use
both, per RFC 3021).
You could try "tightening up" your subnet masks on each segment of the
LAN (e.g., a netmask of 255.255.255.192 would be a /26, or 6 host bits,
or 64 IP addresses; 255.255.255.224 is 5 host bits, or /27, 32 IP
addresses).
But that could break something else if the router thinks it has 256 IP
addresses.
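
If it helps to check the subnet arithmetic, here is a short Python
sketch using the standard ipaddress module.  The base address
192.0.2.0 is just a documentation range picked for illustration, and
the helper names are mine.  The "usable" count follows the usual
convention of subtracting the network and broadcast addresses, with
/31 and /32 treated as the special cases.

```python
import ipaddress


def usable_hosts(net):
    """Usable host count under the usual IPv4 conventions."""
    if net.prefixlen == 32:
        return 1  # a single host route
    if net.prefixlen == 31:
        return 2  # RFC 3021 point-to-point; the classic rule gives 0
    return net.num_addresses - 2  # minus network and broadcast addresses


def describe(prefix_len, base="192.0.2.0"):
    """Return (netmask, total addresses, usable hosts) for base/prefix_len."""
    net = ipaddress.ip_network(f"{base}/{prefix_len}", strict=False)
    return str(net.netmask), net.num_addresses, usable_hosts(net)


if __name__ == "__main__":
    for p in (24, 26, 27, 29, 31, 32):
        mask, total, usable = describe(p)
        print(f"/{p}  netmask {mask}  addresses {total}  usable {usable}")
```

A /29 comes out as 8 addresses with 6 usable, which matches the 6
static IPs mentioned earlier in the thread.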

Gordon


On Wed, Oct 4, 2017 at 7:22 PM, Derek Atkins <derek at ihtfp.com> wrote:
> Hi,
>
> On Wed, October 4, 2017 1:14 am, GWB wrote:
>> I agree that the problem developing at midnight is too much of a
>> coincidence.  If there are logs, I would try to read them in
>> GMT/UTC+/-.
>
> I'm not 100% sure it was at midnight; I was basing that on my memory of an
> MRTG weekly graph which is no longer available.  It could have been +/- an
> hour from midnight.  But I don't have packet logs, I only have standard
> syslog logs.  I did look through some of them but didn't see anything
> suspicious.  Alas, I deleted my logwatch output, and my deleted mail gets
> flushed after a week... and it had been more than a week.  :(
>
>>    A /29 octal would be 6 unique IPv4 static IP addresses,
>> correct?  So is there a host with a firewall on one of them?  If that
>> were a bsd system, something like:
>
> The network looks like:
>
>   Arris <-> Edgerouter Pro 8 <-> Switch <-> [Internal Network]
>
> The Arris provides a /29; the Edgerouter is sitting on one of those IPs.
> Then I have a class-C and a /48 routed to the edgerouter via tunnels.
> Code (and a bunch of my servers) sit on those tunneled networks.
>
>> tcpdump -n -e -ttt -r /var/log/pflog port 80 | grep 2017-09-31- | less
>>
>> (or whatever date time format the OS has for pflog/tcpdump)
>
> There is no pflog, AFAIK.
>
>> would show what the firewall would have seen at that terrible midnight
>> hour when things went bad.  But I am assuming that when time stamps
>> get out of sync between routers packets will drop.  I don't know if
>> that is necessarily the case.
>>
>> So yes, if AT&T has a router upstream that decided something strange
>> about GMT/UTC (or LOCALE), then who knows, droppage could occur.
>
> Huh?  I'm not sure I understand this logic?
>
>> If this is a Debian system, then tcpdump probably has a nice front end
>> and a gui, so even easier with whatever firewall it has (ipfilter?
>> netfilter?).
>>
>> Or, perhaps put Kali Linux on one of the vm's, and point it toward
>> whatever traceroute tells you is ATT's routers.  Then try the same
>> with Kali Linux running as a vm on a laptop from outside your network.
>
> I've been sitting running wireshark all day and testing my VMs.  I've
> narrowed the issue down to two specific VMs.  When either (or both) of
> these VMs are running, I see the problem.  If neither is running, then the
> network seems to (mostly) behave itself.
>
> Through the hours of packet dumps I've gone through, I've noticed a very
> high correlation between the network wonkiness and duplicate/out-of-order
> TCP packets.  Of course, determining the cause of the duplicate packets is
> yet another issue.
>
> I've got everything up right now, but expect more downtime as I debug.
> I'm even considering rebuilding the VMs and restoring from backup to see
> if that will help!
>
> -derek
>
>> Gordon
>>
>> On Tue, Oct 3, 2017 at 12:31 PM, David Carlson
>> <david.carlson.417 at gmail.com> wrote:
>>> Derek,
>>>
>>> Is it possible to read event logs and determine that some pieces of
>>> equipment are definitely not causing the problem?
>>>
>>> David C
>>>
>>> On Oct 3, 2017 10:04 AM, "Derek Atkins" <warlord at mit.edu> wrote:
>>>>
>>>> GWB <gwb at 2realms.com> writes:
>>>>
>>>> > So AT&T's network equipment works when nothing is connected to it?
>>>> > Clearly this is a success story.  If the tech had been allowed to try
>>>> > the new modem, that would have greatly sped up the troubleshooting
>>>> > process.  If it had worked as before, then problem solved.  If not,
>>>> > then you could debug your network topography on your side.
>>>>
>>>> Well, it works when I'm just connected with a laptop.  That proves (to
>>>> them) that it's not their equipment.  It could be my equipment.  It
>>>> could be a bad interaction between..  It's very hard to say.
>>>>
>>>> > That's actually more frustrating than I thought.  Maybe another phone
>>>> > call to ask them to just please try a different modem before you spend
>>>> > all that time chasing a problem that might not exist.
>>>>
>>>> The problem is my static IPs.  I can't just replace the modem.  And I
>>>> can't just "take a random IP address" because it requires coordination
>>>> to also migrate my tunnels, which is how code.gnucash.org gets its IP
>>>> address.
>>>>
>>>> > I have managed to keep the static (fixed) IP I have, but not without a
>>>> > struggle.  The solution is sometimes just having a "dumb" modem
>>>> > connected to the LAN with the switching equipment at the telecomm set
>>>> > to assign a fixed IP.  Sometimes I have to lease a fixed IP from a
>>>> > different provider who then contracts with a local telecomm.  In any
>>>> > event, it's worth it, and necessary.
>>>>
>>>> I have a /29 from them.
>>>>
>>>> > Check around.  Anywhere from Cambridge to Dedham should be any number
>>>> > of telecomm trunk leasing outfits.
>>>>
>>>> Alas, Cambridge is about 1000 miles from here.  While I DID live in
>>>> Somerville for ~14 years, I'm now much further south ;)
>>>>
>>>> > Gordon
>>>>
>>>> -derek
>>>>
>>>> PS: There is still a SMALL chance that it's not my VM system..  I'm
>>>> considering taking it offline again this afternoon to test it again.
>>>> --
>>>>        Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory
>>>>        Member, MIT Student Information Processing Board  (SIPB)
>>>>        URL: http://web.mit.edu/warlord/    PP-ASEL-IA     N1NWH
>>>>        warlord at MIT.EDU                        PGP key available
>>
>
>
> --
>        Derek Atkins                 617-623-3745
>        derek at ihtfp.com             www.ihtfp.com
>        Computer and Internet Security Consultant
>

