Re: Unplanning network maintenance/outage

Derek Atkins derek at ihtfp.com
Sun Mar 17 12:50:28 EDT 2013


I have an AC-DC-AC UPS. I am fairly sure it is not a power-related problem.  The switch is old and has already burned through one power supply. I think it just got too old and tired.  I think it burned out the network card, too, possibly in its flailing.  I think it all relates to the switch.

-derek

Sent from my HTC smartphone

----- Reply message -----
From: "Ted Creedon" <tcreedon at easystreet.net>
To: "Derek Atkins" <warlord at mit.edu>
Cc: <gnucash-announce at gnucash.org>, <gnucash-devel at gnucash.org>, <gnucash-user at gnucash.org>
Subject: Unplanning network maintenance/outage
Date: Sun, Mar 17, 2013 11:56 AM
Do you need a UPS?

Sounds like a power related problem

tedc

On Sun, Mar 17, 2013 at 5:18 AM, Derek Atkins <warlord at mit.edu> wrote:

Good morning, GnuCashers,

Some (many?) of you may have noticed the outage of 'code.gnucash.org',
starting with a lot of packet loss on Thursday and escalating into a
complete outage by Friday.  This took out our Subversion, Wiki, Email
List, everything server.  Well, as of 2:15pm US/EDT on Saturday
(yesterday) everything should be back to normal and operational.  If you
don't want to hear the gory details of what happened, feel free to stop
reading now.

The issue was simultaneous failures of multiple pieces of equipment.
What I thought was a power outage turned out to be caused by a failure
in my main network switch.  It started dropping ports, or causing ports
to fail partially (dropping packets).  This was also the main cause of
the packet loss.  However, I didn't discover this until later.

My main DHCP server was off the net; I swapped ethernet cables and it
appeared to fix the problem.

My main database server, however, lost its main network controller so I
had to install a new one (I have a few on hand, so it was a relatively
painless operation -- I just had to remember the magic voodoo to get the
system to call the new card 'eth0', but that was also only a few
minutes).

It was only after I got this working that I realized that it was the
switch that had failed -- many of the ports connected to actual hosts
had a 'dead link'.  I also noticed that my main DHCP server was
bouncing.  It would come on the net, stay for a bit, and then go dark.
Luckily I also had a few extra (smaller) switches lying around so I
linked a few of them together and moved all the non-working ports over.
This also fixed the bouncing DHCP server.

Last, but not least, the VM Server Host's network was wedged, requiring
a complete reboot to reset.  This also required resetting all the VMs,
some of which required a bit of hand-holding to come back (and many of
which required a virtual disk fsck as well, taking even more time).  The
last of the systems returned to service shortly after 2pm.

I do plan to acquire a new switch to replace the failing one, but what I
have now is working so I'll watch it closely for now.

Thanks,

-derek

--
       Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory
       Member, MIT Student Information Processing Board  (SIPB)
       URL: http://web.mit.edu/warlord/    PP-ASEL-IA     N1NWH
       warlord at MIT.EDU                        PGP key available
