2020-05-02

Loopy (Part 3)

In the second half of October, my coworkers may have noticed strange behavior from me. I'd come in, close my door, and stay in there, leaving only for restroom breaks, coffee, food, and important meetings. I came in early (N.B.: "early" for me is before noon) and I stayed late (N.B.: "late" for me is "whenever it's fucking done or I am too tired or hangry to keep going"). I was a man on a mission. I was dedicated to the idea of not having to repeat all the hand-holding I'd just done when the patching for November rolled around, and I had 20 days to do it.

It was during this period of time that I wrote "loop-deploy3.ps1". It's also the period of time in which I wrote loop-deploy1 and loop-deploy2, but I don't care to mention them. They were important steps in the process, and I learned quite a bit about deployment automation in developing them. loop-deploy3, as its name suggests, was chock full of retry logic and was, really, just a super wrapper around Deploy-Machine for all eight permutations of VMs I needed to deploy. The elegant part of loop-deploy3 was that I took that five-fold decision matrix and boiled it down to one piece of needed information: ResourceGroupName.

I accomplished this by defining every resource group, and the different possible types a VM in the pool could be, as a hashtable:

$Pool = @{
  'Prod' = @{
    'RGNRED1' = @{
      'Location' = 'East US';
      'Color'    = 'Red';
    };
    'RGNRED2' = @{
      'Location' = 'West US 2';
      'Color'    = 'Red';
    };
    'RGNBLUE1' = @{
      'Location' = 'East US';
      'Color'    = 'Blue';
    };
  }
  'Test' = @{
    'RGN1REDTest' = @{
      'Location' = 'Central US';
      'Color'    = 'Red';
    };
  }
}

I generated this information partly by querying the VMs and partly by hand-editing the correct values where they belonged. I then exported this pool resource map to disk as a file cleverly called "PoolTable.xml".
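For the curious, here is a minimal sketch of what that export and reload might look like. It assumes the Export-Clixml and Import-Clixml cmdlets were the tools involved (the post only names the file), and the temp-folder path and the cut-down one-group table are stand-ins for illustration, not the real thing.

```powershell
# Hypothetical miniature pool map; the real one had more resource groups.
$Pool = @{
  'Prod' = @{
    'RGNRED1' = @{ 'Location' = 'East US'; 'Color' = 'Red' }
  }
}

# Assumed location; only the file name "PoolTable.xml" comes from the post.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'PoolTable.xml'

# Export-Clixml defaults to -Depth 2, so state the depth explicitly
# to avoid surprises if the nesting ever gets deeper.
Export-Clixml -InputObject $Pool -Path $path -Depth 5

# Import-Clixml rebuilds the nested hashtables, so lookups work as before.
$Loaded = Import-Clixml -Path $path
$Loaded['Prod']['RGNRED1']['Color']
```

The appeal of this format over something like CSV is that the nested hashtables come back as hashtables, so the script that reads the file can index into it exactly the way the script that wrote it did.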

From the Pool Table, I know the ResourceGroupName, the geographical location, and the type/color that the VMs in that group should be. All that's missing is the name of the VM image that those VMs should use, information which I put into "ImageTable.xml":

$Images = @{
  'Red'   = 'WinVM_Win10_RS1-YYYYMMDD.vhd';
  'Blue'  = 'WinVM_Win10_RS2-YYYYMMDD.vhd';
  'Green' = 'Win8_1_IE11-YYYYMMDD.vhd';
}

There's one saving grace I didn't mention. Not only did Barney design cool features into the system before the provider could, er, provide them, he also built a strict naming convention into the deployment system. At first this seems like a debilitating limitation in the system, but when you can't name things whatever you feel like, you can accurately predict things based on what they're called. You can even go one step further and call a name the same as a definition.

And when you can define something by name, you don't need 20 lines to say "deploy this in accordance with its definition":

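# $Pool is keyed by environment ('Prod', 'Test') first, so find the one that owns this resource group.
$environment = $Pool.Keys | Where-Object { $Pool[$_].ContainsKey($ResourceGroupName) }
$color = $Pool[$environment][$ResourceGroupName]['Color']
$image = $Images[$color]

# -match is case-insensitive by default, so no ToUpper() is needed.
if ($ResourceGroupName -match '^RGNRED\d+$') {
  Deploy-Red $ResourceGroupName $image
}
elseif ($ResourceGroupName -match '^RGNBLUE\d+$') {
  Deploy-Blue $ResourceGroupName $image
}
elseif ($ResourceGroupName -match '^RGNGREEN\d+$') {
  Deploy-Green $ResourceGroupName $image
}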

I had a "Deploy-Red" function that would call "Deploy -Red:$True" with the right arguments for a red VM, and there were "Deploy-Blue" and "Deploy-Green" functions and so on. All the heavy lifting was in the Deploy function, which would:

  • go through the arguments provided to figure out the settings to use in the deployment
  • load my PS module and call Disable-Machine
  • check to make sure the VM was idle
  • stop the VM when safe to do so
  • delete the VM snapshot and VM
  • copy over the correct image VHD if not already present
  • rebuild a new VM
  • check VM for health
  • put healthy VM back into rotation with Enable-Machine

At every point in this process something could fail, and I have personally seen each of these steps fail more times than I care to remember.
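Since any step could fail, the retry logic had to wrap each one. The sketch below shows one common shape for that kind of wrapper. It is not the code from loop-deploy3: the function name Invoke-WithRetry, its parameters, and the fake flaky step in the usage example are all made up for illustration.

```powershell
# Hypothetical helper: run a script block, retrying when it throws.
function Invoke-WithRetry {
  param(
    [Parameter(Mandatory)] [scriptblock] $Action,
    [int] $MaxAttempts = 3,
    [int] $DelaySeconds = 30
  )
  for ($attempt = 1; $attempt -le $MaxAttempts; $attempt++) {
    try {
      # Only terminating errors land in catch; cmdlets inside $Action
      # need -ErrorAction Stop for their failures to count.
      return & $Action
    }
    catch {
      # Out of attempts: rethrow the original error to the caller.
      if ($attempt -eq $MaxAttempts) { throw }
      Write-Warning "Attempt $attempt of $MaxAttempts failed: $($_.Exception.Message)"
      Start-Sleep -Seconds $DelaySeconds
    }
  }
}

# Usage: a stand-in step that fails twice, then succeeds.
$script:calls = 0
$result = Invoke-WithRetry -MaxAttempts 3 -DelaySeconds 0 -Action {
  $script:calls++
  if ($script:calls -lt 3) { throw 'transient failure' }
  'deployed'
}
```

Each bullet in the list above could be handed to a wrapper like this as its own script block, which keeps the retry policy in one place instead of copy-pasted around every cloud call.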

N.B.

After more than two years of bending over backwards to avoid having to perform any maintenance more destructive than necessary on the pool, I eventually added a "-SeriouslyNaughty" flag to loop-deploy3. "loop-deploy3.ps1 -ResourceGroupName RGNRED1 -SeriouslyNaughty" was a one-liner equivalent to saying "Stop everything but what you're already doing. Wait until you've finished whatever you're working on right this second, and when it's done, tear yourself down and nuke absolutely everything including the resource group. Don't stop until everything is gone. Then make a new, empty resource group and start over from scratch."

I didn't put SeriouslyNaughty into loop-deploy3 until I desperately needed to raze something down to the ground and till the bodies of the dead back into the soil to maybe try to help whatever would get planted next. It was a very late-stage addition reserved for when the cloud was being so obstinate or unreliable that Papa would put on his mad face and need to break out the big guns. I even added a "-Force" flag for Really Bad Situations that would skip the waiting part and just start issuing immediate delete orders without saying "please" or "thank you" first. I'd end up using the SeriouslyNaughty flag on a regular basis in pool management and it improved the overall health of the pool the same way cutting back parts of a plant helps the rest of it grow.

Next time: Epicycles
