2005-12-19

stereopsis : strcmp4humans

Found on digg, here's strcmp4humans, a method by which you can "correctly" order numerical filenames. You know. If you have files (by which I usually mean porn) named "1.jpg", "2.jpg", and so on up to "60.jpg", you'll get "10.jpg" through "19.jpg" listed between "1.jpg" and "2.jpg".

That's just wrong. The workaround, of course, is to pad with leading zeroes: "1.jpg" becomes "01.jpg". Every number gets padded out to Floor(log10(N)) + 1 digits, where N is the greatest integer in the set, so the single-digit names pick up Floor(log10(N)) extra zeroes. (I wondered why my pre-calc teacher spent so much time on logarithms. I guess they pop up in mathematics from time to time. This is the easiest, most definitive application for them I can think of.)
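For the record, here's a minimal sketch of that padding rule in C++. The names digit_count and padded_name are made up for illustration, and the digit count uses a loop rather than an actual log10 call so there's no floating point to worry about; it gives Floor(log10(N)) + 1 for any N of at least 1.

```cpp
#include <cstdio>
#include <string>

// Number of decimal digits in n; for n >= 1 this equals floor(log10(n)) + 1.
int digit_count(unsigned n)
{
    int digits = 1;
    while (n >= 10) {
        n /= 10;
        ++digits;
    }
    return digits;
}

// Hypothetical helper: build a zero-padded filename for file number n,
// wide enough that every number from 1 to max_n has the same digit count.
std::string padded_name(unsigned n, unsigned max_n, const std::string& ext)
{
    char buf[32];
    // "%0*u" pads with zeroes to the width passed as an int argument.
    std::snprintf(buf, sizeof buf, "%0*u", digit_count(max_n), n);
    return std::string(buf) + ext;
}
```

With 60 files, padded_name(1, 60, ".jpg") comes out as "01.jpg", and a plain byte-wise sort then puts everything in the order you expected.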

It would be great if you could sort them so that numbers were handled in correct ascending order, rather than by byte-wise comparison of 1, 2, and 3 as ASCII bytes 0x31, 0x32, and 0x33 and so on. Well, here's the code to do that.
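To make the idea concrete, here's a rough sketch of a "natural order" comparison. This is not the strcmp4humans source, just my own illustration of the technique: runs of ASCII digits are compared by numeric value, everything else byte by byte. The name natural_compare is hypothetical, and the sketch is case-sensitive and treats "01" and "1" as equal.

```cpp
#include <cstddef>
#include <string>

// ASCII digits only, on purpose.
static bool is_ascii_digit(char c)
{
    return c >= '0' && c <= '9';
}

// Returns <0, 0, or >0 like strcmp, but digit runs compare by numeric value.
int natural_compare(const std::string& a, const std::string& b)
{
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        if (is_ascii_digit(a[i]) && is_ascii_digit(b[j])) {
            // Leading zeroes do not change the value, so skip them.
            while (i < a.size() && a[i] == '0') ++i;
            while (j < b.size() && b[j] == '0') ++j;
            const std::size_t si = i, sj = j;
            while (i < a.size() && is_ascii_digit(a[i])) ++i;
            while (j < b.size() && is_ascii_digit(b[j])) ++j;
            const std::size_t la = i - si, lb = j - sj;
            // With zeroes stripped, the longer run is the larger number.
            if (la != lb) return la < lb ? -1 : 1;
            // Same length: the byte order of the digits is the numeric order.
            const int c = a.compare(si, la, b, sj, lb);
            if (c != 0) return c < 0 ? -1 : 1;
        } else {
            const unsigned char ca = static_cast<unsigned char>(a[i]);
            const unsigned char cb = static_cast<unsigned char>(b[j]);
            if (ca != cb) return ca < cb ? -1 : 1;
            ++i;
            ++j;
        }
    }
    // Whichever string still has characters left sorts last.
    if (i < a.size()) return 1;
    if (j < b.size()) return -1;
    return 0;
}
```

Here natural_compare("2.jpg", "10.jpg") is negative, so "2.jpg" sorts first, which is the opposite of what a plain string comparison gives you. That's only my illustration, though; the code linked above is the real strcmp4humans.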

Too bad it's ASCII only. Whenever I see code like:

inline char tlower(char b)
{
  if (b >= 'A' && b <= 'Z') return b - 'A' + 'a';
  return b;
}

I think "Great. Let's hope we never need to use this outside of North America." Because let's face it. If you're still using pure ASCII in your code, it had better be completely black box or completely tutorial. It's the 21st century, guys. We can at least doff our caps to UTF-8. I love the "treat 'A' as an integer" trick as much as the next guy, but that's not going to win you any points in the Unicode camp. Psst! Here's a secret: the Unicode guys are winning.
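If you want to see the problem for yourself, here's a small sketch. The tlower below repeats the logic of the snippet quoted above, and lower_bytes is a hypothetical helper of mine that applies it to every byte of a string. I'm assuming the input is UTF-8, where "É" is the two bytes 0xC3 0x89, neither of which falls in 'A' through 'Z'.

```cpp
#include <string>

// Same logic as the quoted tlower: only ASCII 'A'..'Z' are mapped.
inline char tlower(char b)
{
    if (b >= 'A' && b <= 'Z') return b - 'A' + 'a';
    return b;
}

// Hypothetical helper: run tlower over every byte of the string.
std::string lower_bytes(std::string s)
{
    for (char& c : s) c = tlower(c);
    return s;
}
```

Feed it "ÉCOLE" as UTF-8, written "\xC3\x89" "COLE" in source, and you get "\xC3\x89" "cole" back: the "COLE" part is lowercased, but the capital É comes through untouched, because its lowercase form "é" is the different byte pair 0xC3 0xA9.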

Update: I like how if you google for "site:msdn.com unicode" you get almost exclusively the blogs of Michael "Internationalization-fu" Kaplan and Raymond "*-fu" Chen. Chen himself states that Kaplan has probably forgotten more about Unicode than most people know. Kaplan lays down the smack regarding comparing Unicode strings. Read and learn.
