2006-02-23

Connectivity Problem-solving

If you're driving down the road and all of a sudden the engine starts sputtering and you coast to a stop, it could be for any number of reasons. If you drive a wide variety of vehicles for ten or more hours a day every day for years, you will probably start to develop a keen intuition for guessing what the problem is.

Expertise in a subject isn't a piece of paper, or letters you put after your name. It's the frequency with which your guesses are found to be correct. You may, for example, find yourself saying "when I drive a Chevy and the engine starts sputtering like that, it's often because there's a problem with one of the spark plugs." You might also say "when I drive a Honda and the engine starts sputtering, it's usually because it's run out of gasoline."

Maybe you can even say "when I drive a Toyota and the engine starts sputtering, it's because I've gone and dropped a penny into the carburetor. Toyota engines never sputter like that on their own."

After many years of working with computers, I like to consider myself an expert on them. So when a machine drops off my network for no apparent reason, I have developed a sense of why it might have gone tits up. A particular server at work, for instance, will kernel panic if it runs out of memory, say, if a user (or fifty) decides to keep a several-hundred-megabyte inbox and crashes their IMAP connection. The IMAP server can recover from this by spawning a new instance, but the memory held by the killed instance is never reclaimed. Repeated abuse whittles away the available memory until the machine finally tries one time too many to allocate more space and boom: blue screen.

If a Windows system drops off the network, it is more often than not because the machine itself has kamikazed. My first reaction when I can't ping a Windows box is to physically travel to that machine and check it for problems. This usually results in a reboot. At worst, it's a reboot of the "use the power switch" variety. The worst of the worst is a Linux appliance of mine that also demands I unplug the power cord for ten seconds.

When my OpenBSD server at home goes off the network, I scratch my head, because my OpenBSD server should never crash. Short of a power outage or rats chewing through some cables, my OpenBSD system has easily passed 100 days of uptime and should be able to handle 100 more. (After that, things get murky. I am apartment hunting, after all.)

So tonight, I plug my iBook into the network and run an rsync command. It fails. Pinging is equally fruitless, and all signs point to "your OpenBSD server may as well be turned off and sitting in a closet somewhere." But that's not good enough for me. I have to wonder what would cause this system to not respond to anything.

Turns out my iBook's network cable wasn't completely in the socket. Problem solved! I love what you do for me, Toyota.
