2020-05-06

Loopy (Epilogue)

I would end up writing a loop-deploy4 and a loop-deploy5. Having finally written a reliable[1] deployment automation tool, it didn't seem that much more difficult to extend it to deploying multiple resource groups concurrently. I'd been using ConEmu to run multiple instances (ConEmu: the lazy man's fork()) but I felt that one instance of loop-deploy should be able to take all given target resource groups, divide them up into their respective colors, and do one of each in parallel until all groups were deployed.

I tried a couple of different parallelization approaches, but neither of them worked. One of them, either loop-deploy4 or loop-deploy5, I can't remember which, deployed correctly, but buffered all status logging until the end, so that was a showstopper. One of them logged things OK, but couldn't actually deploy things correctly because however you choose to do concurrency, each method moves your workflows and values around in ways that are at times too slippery to grasp. Scoping, man. It'll get ya every time.

Try though I might, I could never pin down parallel execution, whether it be through background jobs or RunspacePools. Multiple RunspacePools were an idea I hammered on for a week or more, one RunspacePool for each color, with a maximum concurrency limit of just 1 or 2 for safety's sake. I could never get that working in a way that (a) did the work and (b) logged things correctly. It was a mess.

Months went by and loop-deploy3, sequential as it ever was, remained my workhorse deployment utility. When the redeploy automation service that the other guy wrote started failing, I wrote "redeploy_wrapper", which was a wrapper around loop-deploy3 that just looked for machines flagged as "Redeploy" every 12 or 24 hours (I fiddled with the frequency as needed) and rebuilt them automatically.

Deployments typically took between 20 minutes and an hour. Things hummed along well enough that a parallelization feature didn't seem urgent. By the end of the following year the team would transition to a different service and we ended up shrinking the pool piece by piece, before shutting it down entirely. loop-deploy4 and 5 never worked well enough for me to trust them enough to run them in any production capacity, so loop-deploy3 loyally served me for only another year before becoming another artifact of a bygone system.

And still, it remains one of my proudest achievements for no reason better than that it gave me back my time.

Postscript

In reminiscing about loop-deploy3, I went back and re-read everything I'd read back then on leveraging concurrency in PowerShell. There's a nice turnkey module called PoshRSJob that will probably do what you need, but in my case it was (a) overkill, (b) too complex for me to understand, and (c) didn't solve the "multiple runspaces" problem in a way I could grok. A hashtable of RunspacePools is the "right" solution here (insofar as a solution can have a moral value like right or wrong) so I teased out that loop-deploy6 would've been a wrapper script around loop-deploy3 and would've looked a little something like this:

Param(
  [Parameter(Mandatory=$False)] [UInt16] $ThrottleLimit = 1
)
$ErrorActionPreference = 'Stop'
Set-StrictMode -Version 2

# testing only:
#   strict naming conventions in service resources
#   means that in practice this table can be 
#   derived spontaneously as needed
$resource_groups = @{
  'RGNRED1'   = 'RED';
  'RGNRED2'   = 'RED';
  'RGNRED3'   = 'RED';
  'RGNRED4'   = 'RED';
  'RGNBLUE1'  = 'BLUE';
  'RGNBLUE2'  = 'BLUE';
  'RGNGREEN1' = 'GREEN';
}

$q   = [System.Collections.Queue]::Synchronized( (New-Object System.Collections.Queue) )
$tid = [AppDomain]::GetCurrentThreadId()

Function Now {
  Param()

  $now_utc = (Get-Date).ToUniversalTime()
  Return $now_utc.ToString('yyyy-MM-dd HH:mm:ss.fffZ')
}

Function Log {
  Param([Parameter(Mandatory=$True)] [String] $String)

  $out_str  = "{0} " -f ((Now))
  $out_str += "pid={0} " -f ($PID)
  $out_str += "tid=0x{0} " -f ($tid.ToString('x4'))
  $out_str += "HEAD "
  $out_str += $String

  if (-not ([String]::IsNullOrEmpty( $q.Count ))) {
    $q.Enqueue($out_str)
  }

  Return
}

Function Dequeue-Messages {
  Param()

  while(0 -lt $q.Count) {
    $out_str = $q.Dequeue()
    Write-Host $out_str

    # not a bad idea to write these things to a log file, too
    # Out-File -Append:$True -FilePath $log_file -Encoding 'UTF8' -InputObject $out_str
  }

  Return
}

Function Main {
  Param()

  $script_path = [System.IO.Path]::Combine($PSScriptRoot, 'loop-deploy3.ps1')
  if (-not (Test-Path -Path $script_path)) {
    Write-Error 'script path not found'
  }

  Log 'start'

  $jobs   = @()
  $psh    = @{}
  $rsp    = @{}
  $run_id = 0

  # given that loop-deploy3 can accept an array of ResourceGroupNames
  # and run them in serial, and that we typically only want to deploy
  # one resource group per color at once, a smart way to do this might
  # be to group all ResourceGroupNames by color and process one
  # PowerShellAutomation $psh object for each and pass in an array of
  # ResourceGroupNames as a single parameter...
  #
  # $resource_groups.GetEnumerator() | Group-Object -Property Value | % {
  #   @{ 'Count' = $_.Count;
  #      'Color' = $_.Name;
  #      'RGN'   = $_.Group.Name;
  #   }
  # }
  #
  # ...but this works too:

  $resource_groups.Keys | Sort-Object | foreach {
    $rgn   = $_.ToUpper()
    $color = $resource_groups[$rgn].ToUpper()
    $run_id++

    if ([String]::IsNullOrEmpty( $rsp[$color] )) {
      $rsp[$color] = [RunspaceFactory]::CreateRunspacePool(1, $ThrottleLimit) # (min, max)
      $rsp[$color].Open()
    }

    $psh[$run_id]              = [System.Management.Automation.PowerShell]::Create()
    $psh[$run_id].RunspacePool = $rsp[$color]

    [void] $psh[$run_id].AddCommand($script_path)
    [void] $psh[$run_id].AddParameter('Object', @{
      'Id'                = $run_id;
      'ResourceGroupName' = $rgn;
      'Color'             = $color;
      'Queue'             = $q;
    })

    Log ( ("queued: run_id={0} RGN={1} " -f ($run_id, $rgn)) ) # DEBUG
    $jobs += $psh[$run_id].BeginInvoke() # $jobs[0] == run_id 1
  }

  Log 'Everything has been queued'

  do {
    Start-Sleep -Milliseconds 250
    Dequeue-Messages
  } while ((0 -lt $q.Count) -or ($jobs.IsCompleted -Contains $False))

  Write-Host 'Cleaning up'

  # $psh keys are run_id ints
  foreach ($id in $psh.Keys) {
    $psh[$id].EndInvoke($jobs[$id - 1])
    $psh[$id].Dispose()
  }; $psh.Clear()

  foreach ($color in $rsp.Keys) {
    $rsp[$color].Close()
  }; $rsp.Clear()

  Return
}
. Main
Exit

This approach may not be ideal for you; in that case I'd recommend you use PoshRSJob or draft something simpler than this. Most concurrent programming examples you can find are of the type "If you have X different identical jobs, how do you run them in parallel up to at most Y at the same time until all of them are finished?" My situation, with a thousand machines, was a little different and my wish was for a system that could identify unrelated groups and run them simultaneously. Hence this design, which is "for every color of VM that needs to be deployed, run $ThrottleLimit deployments of that color at once, regardless of the total number of colors". There were only 8 permutations in the pool. If there had been 8,000 I'd have sung a different tune.

My big sticking point in writing loop-deploy4 or 5 back in the day was trying to pair up RunspacePools and PowerShell instances 1-to-1: one each for Red, one each for Blue, and so on. That's no good. The a-ha moment here was that I needed an M-to-N mapping of M PowerShell instances and N RunspacePools, where M is the quantity of resource groups to deploy and N is the quantity of colors of those resource groups. Seems totally obvious in retrospect. Programming experts and logisticians can feel free to laugh at me now.

I also looked into log management and the best thing I could get working was a synchronized queue that was passed to each invoked PowerShell instance to be hooked up to a lightly modified loop-deploy3 logging function:

Param(
  [Parameter(Mandatory=$True)] [PSObject] $Object
)
$tid = [System.AppDomain]::GetCurrentThreadId()

Function Log {
  Param([Parameter(Mandatory=$True)] [String] $String)

  $now_utc  = (Get-Date).ToUniversalTime()
  $out_str  = "{0} " -f ($now_utc.ToString('yyyy-MM-dd HH:mm:ss.fffZ'))
  $out_str += "pid={0} " -f ($PID)
  $out_str += "tid=0x{0} " -f ($tid.ToString('x4'))
  $out_str += "{0} " -f ($Object['ResourceGroupName'].ToUpper())
  $out_str += $String

  if (-not ([String]::IsNullOrEmpty( $Object['Queue'].Count ))) {
    $Object['Queue'].Enqueue($out_str)
  }
  Return
}

Test output looks promising:

2020-05-03 01:39:45.194Z pid=6832 tid=0x05dc HEAD start
2020-05-03 01:39:45.270Z pid=6832 tid=0x05dc HEAD queued: run_id=1 RGN=RGNBLUE1
2020-05-03 01:39:45.285Z pid=6832 tid=0x05dc HEAD queued: run_id=2 RGN=RGNBLUE2
2020-05-03 01:39:45.288Z pid=6832 tid=0x4534 RGNBLUE1 start
2020-05-03 01:39:45.324Z pid=6832 tid=0x05dc HEAD queued: run_id=3 RGN=RGNGREEN1
2020-05-03 01:39:45.360Z pid=6832 tid=0x3d04 RGNGREEN1 start
2020-05-03 01:39:45.409Z pid=6832 tid=0x05dc HEAD queued: run_id=4 RGN=RGNRED1
2020-05-03 01:39:45.428Z pid=6832 tid=0x05dc HEAD queued: run_id=5 RGN=RGNRED2
2020-05-03 01:39:45.428Z pid=6832 tid=0x05dc HEAD queued: run_id=6 RGN=RGNRED3
2020-05-03 01:39:45.428Z pid=6832 tid=0x05dc HEAD queued: run_id=7 RGN=RGNRED4
2020-05-03 01:39:45.428Z pid=6832 tid=0x05dc HEAD Everything has been queued
2020-05-03 01:39:45.428Z pid=6832 tid=0x14c0 RGNRED1 start
2020-05-03 01:39:46.317Z pid=6832 tid=0x4534 RGNBLUE1 end
2020-05-03 01:39:46.341Z pid=6832 tid=0x4534 RGNBLUE2 start
2020-05-03 01:39:48.347Z pid=6832 tid=0x4534 RGNBLUE2 end
2020-05-03 01:39:48.405Z pid=6832 tid=0x3d04 RGNGREEN1 end
2020-05-03 01:39:48.450Z pid=6832 tid=0x14c0 RGNRED1 end
2020-05-03 01:39:48.459Z pid=6832 tid=0x14c0 RGNRED2 start
2020-05-03 01:39:52.466Z pid=6832 tid=0x14c0 RGNRED2 end
2020-05-03 01:39:52.468Z pid=6832 tid=0x14c0 RGNRED3 start
2020-05-03 01:39:56.473Z pid=6832 tid=0x14c0 RGNRED3 end
2020-05-03 01:39:56.480Z pid=6832 tid=0x14c0 RGNRED4 start
2020-05-03 01:39:57.481Z pid=6832 tid=0x14c0 RGNRED4 end
Cleaning up

The $Object object used to pass things from loop-deploy6 to loop-deploy3 is just a lazy hashtable that contains whatever I need without having to explicitly Define All The Things. The 1,220 VMs don't exist anymore, but preliminarily running this on my Windows box shows enough promise that I'd have eagerly begun to run this in Prod under strict observation and, finally, get better concurrent deployment control.

Alas, better luck next time.

[1] I really can't overstate how reliable loop-deploy3 was. It was the culmination of 12 months of real-world deployment experience boiled down into one script. (And a module. And a suite of other scripts that loop-deploy3 called. But it knew what scripts to call and how, dammit.) I've written a ton of scripts in my day, and honestly, only about six of them have been the kind of thing that I'd stake my life upon. loop-deploy3 is one of them. Of all the life-staking scripts I've ever put into production, personally and professionally, loop-deploy3 is at the top of the list in terms of its biggest undocumented feature: "lets me sleep at night". If I ran loop-deploy3, I knew that it was either going to succeed, or that if it kept failing I was asking it to do something impossible. (The smartest thing loop-deploy3 did was sleep for ten seconds right at the start of its run. This was just so that I, the one who called the script, had a moment to think about what I'd just done and could Ctrl-C the script to end it before it did something stupid on my behalf. This was not a "day one" feature and was added after I made it do something stupid on my behalf.) loop-deploy3 was run, and patched, and run again, and again, and then again, vetted over the course of 1,220 machines, dozens of image updates, and countless reboots, repairs, redeployments, scale-ups, scale-downs, and a bunch of other service somersaults that naturally occur in organic, growing system usage. Its lifespan was short, but its impact was immeasurably vast. I may never look upon its likes again.

External links

https://blog.netnerds.net/2016/12/runspaces-simplified/

https://imanage.com/blog/quickly-and-easily-threading-your-powershell-scripts/

https://devblogs.microsoft.com/powershell/scaling-and-queuing-powershell-background-jobs/

https://devblogs.microsoft.com/powershell/powershell-foreach-object-parallel-feature/

https://adamtheautomator.com/powershell-multithreading/

https://blogs.technet.microsoft.com/heyscriptingguy/2015/11/26/beginning-use-of-powershell-runspaces-part-1/

https://github.com/proxb/PoshRSJob

No comments: