2019-05-10

Terraform is Terrible: Part 5

I spend a lot of time here urging the benefits of lighting a candle instead of cursing the darkness. In that spirit, I wanted to end this week, which I've spent pointing out how bad and broken Terraform is on Azure, on a high note.

But I have to be honest with you: I'm not sure I know how to light this candle.

It's not that I didn't try. On a typical Tuesday I sat down to play with Terraform and lost four hours catching just a speck, a sliver, of its vast and dimensionless horrors.

The next day I still wasn't OK with what I'd experienced, so I wrote four days' worth of blog posts in a single sitting.

The words of Spider Jerusalem were echoing in my mind as I hammered out this week's Monday through Thursday entries: "Home entertainment system: give me fire." But I didn't have an ending. The day after my writing binge, I went back and relearned Terraform.

Seriously.

OpenBSD 6.5 came out just a week or two ago, and I was excited to use Terraform to manage my OpenBSD Azure deployments, so I built a new 6.5 image and used Terraform to deploy it to Azure anyway. I paved over every pothole, hard-coded every vnet, and wrote up an epic set of main.tf files to put my OpenBSD VM out there. And it worked. Kinda.

There were still more pitfalls waiting for me, and if, figuratively speaking, I'd been trying to run a marathon instead of a 100-meter dash, I imagine I'd have fallen into many more. But my scope was narrow, and by Saturday I'd built a library of main.tf files that accurately (and verbosely) describes my infrastructure, and I used it. I'm running OpenBSD on Azure right now thanks to Terraform. And it was a miserable process.

But that misery helped me to think about what I'd improve. I had to walk the mile, or at least 100 meters, in a Terraform user's shoes before I could finally put the last nail in this piece of shit software's coffin. And I couldn't do it.

Terraform is such a nice idea. It's multi-platform. It's got an installer for OpenBSD for Christ's sake. I really want to see this disaster of a project succeed.

So I went digging and found the AzureRM availability set file that has the garbage defaults. It'd be easy to file a GitHub issue to get those defaults changed, so I whipped up another main.tf to reproduce the error and copy the exact message into my issue, for both posterity and search engines to find later.

And it deployed just fine. >:(

This is when I learned that the availability set resource's defaults (unmanaged, with 3 fault domains) are just fine... unless you want to use the set to deploy a VM with Azure managed disks, which is now the recommended configuration. Huh? Yeah. The availability set defaults aren't actually wrong, they're just outdated. You have to try to deploy a VM as well to hit the error:

Error: Error applying plan:

1 error(s) occurred:

* azurerm_availability_set.myavailset: 1 error(s) occurred:

* azurerm_availability_set.myavailset: compute.AvailabilitySetsClient#CreateOrUpdate: Failure responding to request: StatusCode=400 -- Original Error: autorest/azure: Service returned an error. Status=400 Code="InvalidParameter" Message="The specified fault domain count 3 must fall in the range 1 to 2." Target="platformFaultDomainCount"

(Note that the error is raised by the azurerm_availability_set resource, but the actual problem is that it's incompatible with your azurerm_virtual_machine resource, which the error never mentions. This garbage is how software support contracts get sold.)

So what's the fix here? I'm still not sure. Terraform would need to check your settings against the cloud provider's API to see whether they're valid, but it doesn't actually do that until you run terraform apply, and by then it's too late to ask for a do-over. You'd think Terraform would catch this during its planning stage, but planning is just that: Terraform puts together an idea of what it needs to do to make your main.tf's wishes come true. It doesn't deal with the logistics until you pull the trigger and apply the plan, at which point Terraform is happy to let your plans get shredded to bits faster than an army ground charge in World War I.
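For illustration only, here is a minimal Go sketch of the kind of plan-time check that would have caught this before apply. It assumes the caller already knows the maximum fault domain count for the target region (the error above says 2 for the region in question); `AvailabilitySetSpec` and `validateFaultDomains` are hypothetical names, not part of Terraform or the AzureRM provider, and the sketch never talks to Azure.

```go
package main

import "fmt"

// AvailabilitySetSpec is a hypothetical, simplified stand-in for the
// settings in an azurerm_availability_set block. It is NOT the provider's
// real schema type.
type AvailabilitySetSpec struct {
	Name             string
	FaultDomainCount int
}

// validateFaultDomains mirrors the rule Azure enforces at apply time:
// the fault domain count must fall in the range 1..maxFaultDomains.
// maxFaultDomains is assumed to be supplied by the caller (for example,
// looked up per region); this sketch does not query any cloud API.
func validateFaultDomains(spec AvailabilitySetSpec, maxFaultDomains int) error {
	if spec.FaultDomainCount < 1 || spec.FaultDomainCount > maxFaultDomains {
		return fmt.Errorf("%s: fault domain count %d must fall in the range 1 to %d",
			spec.Name, spec.FaultDomainCount, maxFaultDomains)
	}
	return nil
}

func main() {
	// The default of 3 fault domains, checked against a region limit of 2.
	spec := AvailabilitySetSpec{Name: "myavailset", FaultDomainCount: 3}
	if err := validateFaultDomains(spec, 2); err != nil {
		fmt.Println("plan-time error:", err)
	}
}
```

The hard part isn't the comparison; it's getting the real per-region limit at plan time, which is exactly the information Terraform doesn't ask the provider for until apply.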

(Side note: I watched the movie Regeneration once, twenty years ago, and I really wish it would come out on DVD or Blu-ray, because it's an excellent film that should show up one day on the Friendly Fire podcast. I'd rate it five armbands.)

Then in the same repo I found the data source for AzureRM virtual networks. This is the place that needs a filter, so you can let Terraform fetch all your vnets and then return the one you need based on some user-defined criteria. Then I cross-referenced it against the AWS provider repo, whose data sources do have filters... and I couldn't make heads or tails of it.
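To make the idea concrete, here is a rough Go sketch of "fetch everything, then return the one vnet that matches user-defined criteria." It is loosely modeled on the `filter { name, values }` blocks that AWS provider data sources such as aws_vpc accept, but `VirtualNetwork`, `Filter`, and `selectVNet` are hypothetical names, not code from either provider, and the vnet list is hard-coded rather than fetched from Azure.

```go
package main

import (
	"fmt"
	"strings"
)

// VirtualNetwork is a hypothetical, pared-down record of what a data
// source might fetch for each vnet.
type VirtualNetwork struct {
	Name string
	Tags map[string]string
}

// Filter is one user-defined criterion: a tag key and the values it may take.
type Filter struct {
	TagKey string
	Values []string
}

// matches reports whether the vnet satisfies every filter.
func matches(v VirtualNetwork, filters []Filter) bool {
	for _, f := range filters {
		got, ok := v.Tags[f.TagKey]
		if !ok {
			return false
		}
		found := false
		for _, want := range f.Values {
			if got == want {
				found = true
				break
			}
		}
		if !found {
			return false
		}
	}
	return true
}

// selectVNet returns the single vnet matching all filters, or an error if
// zero or several match, so the caller never silently gets the wrong one.
func selectVNet(all []VirtualNetwork, filters []Filter) (VirtualNetwork, error) {
	var hits []VirtualNetwork
	for _, v := range all {
		if matches(v, filters) {
			hits = append(hits, v)
		}
	}
	switch len(hits) {
	case 1:
		return hits[0], nil
	case 0:
		return VirtualNetwork{}, fmt.Errorf("no vnet matched the filters")
	default:
		names := make([]string, len(hits))
		for i, h := range hits {
			names[i] = h.Name
		}
		return VirtualNetwork{}, fmt.Errorf("%d vnets matched: %s", len(hits), strings.Join(names, ", "))
	}
}

func main() {
	vnets := []VirtualNetwork{
		{Name: "prod-vnet", Tags: map[string]string{"env": "prod"}},
		{Name: "dev-vnet", Tags: map[string]string{"env": "dev"}},
	}
	v, err := selectVNet(vnets, []Filter{{TagKey: "env", Values: []string{"prod"}}})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("selected:", v.Name)
}
```

Erroring out on multiple matches, rather than returning the first one, is a deliberate choice here: it's the difference between a filter and a hard-coded vnet that quietly points at the wrong network.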

Because Terraform is written in Go, it runs on many different platforms. And because it's written in Go, I can't write patches for it: I can neither read nor write Go.

Yet.

So my journey begins. I'm putting Rust aside for now and I'm going to teach myself Go. At least enough to be able to read what this filtration code is doing and maybe, somehow, write something similar for Azure. It's not going to be quick, or easy, or fun.

But I'm sick of this darkness.

1 comment:

Wawrzek said...

For anyone getting here, please check the following bug report in the Terraform plugin: https://github.com/terraform-providers/terraform-provider-azurerm/issues/944

It seems the region needs to be changed.