Unplanned network maintenance/outage

Ted Creedon tcreedon at easystreet.net
Sun Mar 17 11:56:37 EDT 2013


Do you need a UPS?

Sounds like a power-related problem

tedc

On Sun, Mar 17, 2013 at 5:18 AM, Derek Atkins <warlord at mit.edu> wrote:

> Good morning, GnuCashers,
>
> Some (many?) of you may have noticed the outage of 'code.gnucash.org'
> starting with a lot of packet loss on Thursday and escalating into a
> complete outage by Friday.  This took out our Subversion, Wiki, Email
> List, everything server.  Well, as of 2:15pm US/EDT on Saturday
> (yesterday) everything should be back to normal and operational.  If you
> don't want to hear the gory details of what happened feel free to stop
> reading now.
>
> The issue was multiple simultaneous failures of multiple pieces of
> equipment.  What I thought was a power outage turned out to be caused by a
> failure in my main network switch.  It started dropping ports, or
> causing ports to fail partially (dropping packets).  This was also the
> main cause of the packet loss.  However, I didn't discover this
> until later.
>
> My main DHCP server was off the net; I swapped ethernet cables and it
> appeared to fix the problem.
>
> My main database server, however, lost its main network controller so I
> had to install a new one (I have a few on hand, so it was a relatively
> painless operation -- I just had to remember the magic voodoo to get the
> system to call the new card 'eth0', but that was also only a few
> minutes).
>
> It was only after I got this working that I realized that it was the
> switch that had failed -- many of the ports connected to actual hosts
> had a 'dead link'.  I also noticed that my main DHCP server was
> bouncing.  It would come on the net, stay for a bit, and then go dark.
> Luckily I also had a few extra (smaller) switches lying around so I
> linked a few of them together and moved all the non-working ports over.
> This also fixed the bouncing DHCP server.
>
> Last, but not least, the VM Server Host's network was wedged, requiring
> a complete reboot to reset.  This also required resetting all the VMs,
> some of which required a bit of hand-holding to come back (and many of
> which required a virtual disk fsck as well, taking even more time).  The
> last of the systems returned to service shortly after 2pm.
>
> I do plan to acquire a new switch to replace the failing one, but what I
> have now is working so I'll watch it closely for now.
>
> Thanks,
>
> -derek
>
> --
>        Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory
>        Member, MIT Student Information Processing Board  (SIPB)
>        URL: http://web.mit.edu/warlord/    PP-ASEL-IA     N1NWH
>        warlord at MIT.EDU                        PGP key available
> _______________________________________________
> gnucash-devel mailing list
> gnucash-devel at gnucash.org
> https://lists.gnucash.org/mailman/listinfo/gnucash-devel
>

