Unplanned network maintenance/outage
Ted Creedon
tcreedon at easystreet.net
Sun Mar 17 11:56:37 EDT 2013
Do you need a UPS?
Sounds like a power-related problem.
tedc
On Sun, Mar 17, 2013 at 5:18 AM, Derek Atkins <warlord at mit.edu> wrote:
> Good morning, GnuCashers,
>
> Some (many?) of you may have noticed the outage of 'code.gnucash.org'
> starting with a lot of packet loss on Thursday and escalating into a
> complete outage by Friday. This took out our Subversion, Wiki, Email
> List, everything server. Well, as of 2:15pm US/EDT on Saturday
> (yesterday) everything should be back to normal and operational. If you
> don't want to hear the gory details of what happened feel free to stop
> reading now.
>
> The issue was multiple simultaneous failures of multiple pieces of
> equipment. What I thought was a power outage turned out to be caused by a
> failure in my main network switch. It started dropping ports, or
> causing ports to fail partially (dropping packets). This was also the
> main cause of the packet loss. However, I didn't discover this
> until later.
>
> My main DHCP server was off the net; I swapped ethernet cables and it
> appeared to fix the problem.
>
> My main database server, however, lost its main network controller so I
> had to install a new one (I have a few on hand, so it was a relatively
> painless operation -- I just had to remember the magic voodoo to get the
> system to call the new card 'eth0', but that was also only a few
> minutes).
>
> It was only after I got this working that I realized that it was the
> switch that had failed -- many of the ports connected to actual hosts
> had a 'dead link'. I also noticed that my main DHCP server was
> bouncing. It would come on the net, stay for a bit, and then go dark.
> Luckily I also had a few extra (smaller) switches lying around so I
> linked a few of them together and moved all the non-working ports over.
> This also fixed the bouncing DHCP server.
>
> Last, but not least, the VM Server Host's network was wedged, requiring
> a complete reboot to reset. This also required resetting all the VMs,
> some of which required a bit of hand-holding to come back (and many of
> which required a virtual disk fsck as well, taking even more time). The
> last of the systems returned to service shortly after 2pm.
>
> I do plan to acquire a new switch to replace the failing one, but what I
> have now is working so I'll watch it closely for now.
>
> Thanks,
>
> -derek
>
> --
> Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory
> Member, MIT Student Information Processing Board (SIPB)
> URL: http://web.mit.edu/warlord/ PP-ASEL-IA N1NWH
> warlord at MIT.EDU PGP key available
> _______________________________________________
> gnucash-devel mailing list
> gnucash-devel at gnucash.org
> https://lists.gnucash.org/mailman/listinfo/gnucash-devel
>