2005-01-31

Today, the 'R' Stood for "Rescue"

The problem: user complains that PC won't boot. Has been crashing or intermittently failing to provide a usable desktop after boot up for about a week.

The cause: user's hard disk drive is chirping like a finch that knows Morse code. Disk is ever so gradually grinding itself into powder. At least this isn't as bad as the time I walked into somebody's office and their PC was making a sound not entirely dissimilar to a coffee grinder. They could not specify how long it had been making that noise. When I opened the drive, fine magnetic dust poured out like graphite from the last couple shakes of emptying a pencil sharpener.

The heads had scored deep grooves into the platters I could see. It looked as though someone had taken a car key to a vinyl record playing at 78 rpm. For half an hour.

Solution: a new drive, and pray there's enough ghost left in the drive to retrieve the urgent stuff. User tells me he keeps everything important on our network file server. Thanks to God immediately ensue. User is mysteriously nonplussed and amicable. He could not locate an important customized database application on the new drive (The new drive had an OS and nothing more. User left work while I was still troubleshooting, and I stayed late just getting it that far.), so he assumes it's toast. While most users would begin actually sobbing or, more likely, ranting and raving about how I didn't telepathically know that the disk was singing its swan song a week earlier and proactively secure a safe copy then, this user is rather peaceful. Turns out, the consultant who maintains the database application keeps a backup of it offsite. This user is Zen personified. More thanking of more deities follows.

But here's the sticky wicket: even if every important application can be reinstalled or retrieved from another storage medium, there remains something to be said for resuscitating an only-mostly-dead drive and getting the most recent copy of the goods from it if you can.

But how much effort can one put into that phrase "if you can"? The old ways are right out:

  1. Get tomsrtbt Linux.
  2. Boot it on a PC connected to both a good drive and the bad drive.
  3. # mkdir /mnt/hda
  4. # mkdir /mnt/hdb
  5. # mount -r -t auto /dev/hda1 /mnt/hda
  6. # mount -t auto /dev/hdb1 /mnt/hdb
  7. # mkdir /mnt/hdb/restored
  8. # cd /mnt/hda
  9. # cp -a -v * /mnt/hdb/restored

This only works on FAT and FAT32 partitions, and if the disk happened to run any flavor of NTFS, you're boned.

What about xcopy?

xcopy is just about worthless here, since even telling it to ignore errors won't convince it to keep trying after it gets a Really Big Error, something along the lines of "drive not found". That's pretty much a show-stopper as far as xcopy is concerned. xcopy doesn't even play it smart about remembering what files it's already got, so if you run the same xcopy command twice, it will either recopy all the files you already have (if you are using the /Y flag), or it will prompt you for each and every one of them. All that getting and regetting of the same files just can't be good for a busted drive's internals. Less redundant work for the bad drive equals more chances to get good stuff you skipped the first time.

So here was my next thought. rsync. rsync already does everything else, right? It copies, it compresses, it slices, it dices. Why not try to use it for disk rescuing, too?

First problem: rsync is designed to be a network tool, so it parses Windows paths improperly. rsync sees "c:\new" as a host called "c" and a directory called "\new". The "\new" could be translated as "literal n+ew", I-knew-what-you-meant-anyway "new", "newline+ew", or rsync might just throw a parsing error. It probably won't get that far because it's most likely that you don't have a machine called "c" on your network, and it will complain about that first and foremost.

Fortunately, there is a workaround. Every UNIX application ported to Windows has its own way of handling the incongruities between the two filesystems. If you use Microsoft Windows Services for Unix, you can use the old /dev/fs/C and /dev/fs/D semantics.

If you are using a version of rsync.exe made by the Cygwin project, you can rely on /cygdrive/C and /cygdrive/D. So now instead of having xcopy tank as soon as your bad drive starts misbehaving, you have a network tool designed to deal with shoddy connections pulling only the files it doesn't already have.
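If you'd rather not retype paths by hand, the translation is simple enough to script. What follows is only a sketch: win_to_cygdrive is a name I made up for illustration, not something that ships with rsync or Cygwin, and it assumes a plain drive-letter path with no UNC shares or other oddities. (Cygwin has its own cygpath utility for doing this properly.)

```shell
# Hypothetical helper (not part of rsync or Cygwin): rewrite a Windows path
# such as C:\new as /cygdrive/C/new so rsync does not read "C" as a host name.
win_to_cygdrive() {
    path=$1
    case $path in
        [A-Za-z]:*)
            drive=${path%%:*}    # drive letter before the first colon
            rest=${path#*:}      # everything after the first colon
            # Prefix with /cygdrive/ and flip backslashes to forward slashes.
            printf '/cygdrive/%s%s\n' "$drive" "$rest" | tr '\\' '/'
            ;;
        *)
            # No drive letter: only flip the slashes.
            printf '%s\n' "$path" | tr '\\' '/'
            ;;
    esac
}

win_to_cygdrive 'C:\new'    # prints /cygdrive/C/new
win_to_cygdrive 'D:'        # prints /cygdrive/D
```

The single quotes matter: they stop the shell from chewing on the backslashes before the function ever sees them.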

You want an example? Sure. Assume that you have a good drive with Windows installed as "C:" and the less-than-chipper chirping drive installed as "D:". The Cygwin executable rsync.exe requires a small complement of .DLL files, but that is beyond the scope of this document. Write me if you're really curious.

To rescue everything from "D:":

c:\> rsync -vPrt /cygdrive/D new

That's it. It's just that easy, folks.

Now, rsync can't make a corrupted filesystem whole again, so it's going to miss files. It's going to miss the same files that xcopy misses. But where xcopy throws a "drive not ready" error and calls it a day, rsync simply reports that "the file has vanished" and keeps right on going. If you're brave, you can rerun the above command and rsync will know enough to skip the files it already has.

Warning: when dealing with a trashed filesystem, you can't really trust rsync to know which files it already has: files can very quickly change size or modification time, or they can disappear altogether. rsync tries to be smart about this sort of thing, as per its network-copy background. By default, any file whose size or modification time differs from the copy you already have gets copied again, even if the file was OK on the first pass and garbage on the second. xcopy would not fare any better.
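If you do go back for seconds, here is a rough sketch of how the reruns might be automated. retry_passes is a hypothetical wrapper of my own invention, and the pass limit of 3 is arbitrary. The example invocation in the comment adds rsync's --ignore-existing option, which skips anything already present on the receiving side, so a good first-pass copy can't be replaced by a garbage second-pass one. Note that it swaps -P for plain --progress: -P also implies --partial, which keeps half-copied files in the destination, and --ignore-existing would then skip those truncated files forever. I haven't tried this combination on a dying drive, so treat it as an assumption rather than a recipe.

```shell
# Hypothetical wrapper: rerun a command until it exits 0 or the pass
# limit is reached.
# Usage: retry_passes MAX_PASSES COMMAND [ARGS...]
retry_passes() {
    max=$1
    shift
    pass=1
    while [ "$pass" -le "$max" ]; do
        echo "pass $pass of $max" >&2
        if "$@"; then
            return 0             # the command reported a clean run
        fi
        pass=$((pass + 1))
    done
    return 1                     # still failing after the last pass
}

# Example invocation (not run here); /cygdrive/D and "new" follow the
# article's setup:
# retry_passes 3 rsync -vrt --progress --ignore-existing /cygdrive/D new
```

rsync exits nonzero when some files could not be transferred, so the loop keeps going until a pass comes back clean or the limit runs out. Every extra pass is more wear on a drive that is already on its way out, so keep the limit small.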

rsync isn't a disk rescuing tool, but it at least has a "never give up, never surrender" attitude that works wonders for dire situations like the one I've described. xcopy may in fact be more than capable of getting good data off of a bad drive if you're patient enough to remember where it died and ask it to try again in another location. If I had that kind of time, I wouldn't even want to waste it babysitting an xcopy process, telling it which files to skip and which to nab.

Instead, I think I'll take the 'r' out of rsync and put it in my emergency rescue toolbox.
