2005-03-20

Parsing Filesizes in C#

In a nutshell, my first attempt at parsing rudimentary filesize abbreviations (k, m, g, with an optional trailing b) looks OK. The code has naturally become more complex, but I think it's worth it. What used to be:

  Int64 count;
  count = Convert.ToInt64(args[1],10);

is now:

  string filesize = "";
  Int64 count;
  char ch = 'Z';

  filesize = args[1].ToLower();
  if (0 == filesize.Length) throw new ArgumentOutOfRangeException();

  count = 0;

  for (int i = 0; i < filesize.Length; ++i) {
    ch = filesize[i];

    /* the string must start with a digit */
    if ((!Char.IsDigit(ch)) && (0 == i)) { throw new ArgumentOutOfRangeException(); }

    if (Char.IsDigit(ch)) {
      count *= 10;
      count += (ch - 48);  /* 48 is the ASCII code for '0' */
      continue;
    }

    /* C# needs a more error-prone switch() statement */
    /* simulate fall-through: 'g' also applies the 'm' and 'k' multiplications */
    if ('g' == ch) { count *= 1024; ch = 'm'; }
    if ('m' == ch) { count *= 1024; ch = 'k'; }
    if ('k' == ch) { count *= 1024; ch = 'b'; }
    if ('b' == ch) { break; }
    if (',' == ch) { continue; }  /* allow thousands separators */

    throw new ArgumentOutOfRangeException();
  }

Easy ways to improve this code:

  • Move the validity check out of the for() loop, so that you check (Char.IsDigit(filesize[0])) just once, then either throw the exception or enter the loop.
  • Instead of modifying ch repeatedly, reduce each abbreviation to a single multiplication and then break. The catch is that I never trust a pulled-out-of-thin-air hard-coded value, and I think that three instances of "count *= 1024", while not necessarily efficient, at the very least make it clear what is going on with little chance for error. If one of those became a "count *= 1034" by mistake, it would be a lot easier to catch with the naked eye than a mistake between 1048576 and 1048676.
  • Speaking of hard-coded values, I could not find an efficient way of turning a digit char into its numeric value in the System.Text namespace. "(ch - 48)" is one of those hacks that seems like it will never break, and it cannot be beaten for speed, but I still don't like using it.
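
Here is a rough sketch of how those suggestions might fit together; it is not a drop-in replacement for the code above. The names FileSizeParser, Parse, Kilo, Mega and Giga are made up for illustration. The sketch checks the first character once before the loop, uses one multiplication per abbreviation with constants built from 1024 rather than typed-out values such as 1048576, and writes the digit conversion as (ch - '0') so the 48 no longer appears. Two things are additions that the original code does not have: the checked() arithmetic, which throws OverflowException instead of silently wrapping, and the explicit '0'..'9' range test, used because Char.IsDigit also accepts non-ASCII decimal digits for which the subtraction would give the wrong value.

```csharp
using System;

// Hypothetical helper; the class, method and constant names are illustrative only.
static class FileSizeParser
{
    // Each multiplier is derived from 1024, so no large literal is typed by hand.
    const long Kilo = 1024;
    const long Mega = Kilo * 1024;
    const long Giga = Mega * 1024;

    public static long Parse(string text)
    {
        if (text == null) throw new ArgumentNullException("text");
        string filesize = text.ToLower();

        // Validity check done once, before the loop: input must start with a digit.
        if (filesize.Length == 0 || filesize[0] < '0' || filesize[0] > '9')
            throw new ArgumentOutOfRangeException("text");

        long count = 0;
        foreach (char ch in filesize)
        {
            if (ch >= '0' && ch <= '9')
            {
                // Assumption: overflow should be an error (checked), unlike the original.
                count = checked(count * 10 + (ch - '0'));
                continue;
            }

            switch (ch)
            {
                case ',': continue;                      // thousands separator, skip it
                case 'b': return count;                  // plain bytes
                case 'k': return checked(count * Kilo);
                case 'm': return checked(count * Mega);
                case 'g': return checked(count * Giga);
                default: throw new ArgumentOutOfRangeException("text");
            }
        }
        return count; // digits only, no suffix
    }
}
```

With this sketch, FileSizeParser.Parse("10k") gives 10240 and FileSizeParser.Parse("1,024") gives 1024. As in the original loop, anything after the first suffix letter is ignored, so "5MB" and "5m" parse the same way. For the record, System.Char (not System.Text) does have Char.GetNumericValue, but it returns a double, so it still needs a cast and is unlikely to beat the subtraction for speed.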

Comments are welcome (they always are), and related horror stories, too.
