Informed Terraform

At Foghorn, we love Terraform. And we write a lot of Terraform. We also code-review Terraform as a matter of course, and we work with a lot of customers who use and author their own Terraform.

Over the years, we’ve been excited to see the ecosystem grow and evolve, through new feature additions, an ever-expanding public module registry, and the open-source approach to solving hard problems.

As a result, we have some experience with what makes for elegant, easily reusable Terraform. In this post, I’d like to cover some specific approaches that can prove problematic, and then offer simple guidelines for writing Terraform well.

Avoid Deeply Nested Modules

Modules are a fundamental part of Terraform. They are the core construct that allows for code reuse and parity across development, staging, and production environments. But too much of a good thing can be bad, too!
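For illustration, here is a minimal sketch of that reuse. It assumes a hypothetical local module at ./modules/app that declares an environment input variable; the module body is written once and instantiated per environment.

```hcl
# main.tf (root): one module, instantiated once per environment.
# "./modules/app" and its "environment" input are hypothetical.
module "app_staging" {
  source      = "./modules/app"
  environment = "staging"
}

module "app_production" {
  source      = "./modules/app"
  environment = "production"
}
```

Many teams instead keep a separate root configuration (or workspace) per environment, each calling the same module; the parity argument is the same either way.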

Having modules call other modules, ad nauseam, tends to cause headaches. The maintenance overhead from nesting shows up in a few specific ways:

  • Input variables and output values have to be repeated at each layer. Adding var.environment or a bucket_name output means changing every module between the root and the resource that actually uses it.
  • Refactoring becomes more complicated. In particular, if your modules are sourced from other repositories, a refactor turns into a series of coordinated pull requests, with the top-level apply either gated on all of them or temporarily pinned to feature branches.
  • State manipulation (terraform state rm, terraform import) gets burdensome. Consider having to type an address such as module.app.module.database.module.sec_group. Granted, this is a minor nuisance, but reviewing, reasoning about, and editing such addresses gets tedious.
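To make the first point concrete, here is a sketch of three nested layers in one listing, with file boundaries marked by comments. All module paths, names, and the bucket naming scheme are hypothetical. Note that environment is declared in all three layers but used only in the innermost one, and bucket_name has to be re-exported at every layer on the way back up. The bucket's state address also becomes module.app.module.storage.aws_s3_bucket.this, which is what you would type for terraform state rm.

```hcl
# --- main.tf (root configuration) ---
variable "environment" {
  type = string
}

module "app" {
  source      = "./modules/app"
  environment = var.environment # passed down: layer 1
}

output "bucket_name" {
  value = module.app.bucket_name # re-exported: layer 1
}

# --- modules/app/main.tf ---
variable "environment" {
  type = string
}

module "storage" {
  source      = "./storage"     # local paths resolve relative to modules/app
  environment = var.environment # passed down: layer 2
}

output "bucket_name" {
  value = module.storage.bucket_name # re-exported: layer 2
}

# --- modules/app/storage/main.tf ---
variable "environment" {
  type = string
}

resource "aws_s3_bucket" "this" {
  bucket = "example-app-${var.environment}" # hypothetical naming scheme
}

output "bucket_name" {
  value = aws_s3_bucket.this.bucket
}
```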

That last point, tedium, is salient. Software is written to be used and read by humans, so eschew anything that adds cognitive burden. Terraform lets you deploy large, complicated infrastructure with a few keystrokes; let’s not add abstractions that disrupt that flow.

Use of files within Terraform configurations

A common pattern when writing Terraform configurations is to group objects by type (providers, variables, data sources, resources, outputs) into files, generally named after that type:

  • providers.tf
  • variables.tf
  • resources.tf
  • outputs.tf

There’s nothing wrong with this approach. In fact, Terraform doesn’t care how many files you use, nor which objects you place in each; it loads every .tf file in the directory. However, this layout can add maintenance overhead, for a few reasons:

  • Refactoring the configuration to move one part into a module becomes a guessing game. Which objects from variables.tf do I need to move? Answering this tends to be trial and error (iterating on plan failures).
  • A pattern develops where many files contain only a few lines. Only one output? You’re still going to have outputs.tf. A side effect is that almost any edit requires opening most of these files.

As an aside, providers.tf is an exception here. This file generally contains all provider definitions for the configuration, and those providers are used by resources in every other file, so its contents can’t be moved out along with any single component. The two variations I routinely see in the wild are a providers.tf file or a versions.tf file (1, 2).
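For reference, a minimal pair of these files might look like the following, with file boundaries marked by comments. This is a sketch, not a recommendation: the version constraints and region default are placeholders, and the object form of required_providers (with source) assumes Terraform 0.13 or later.

```hcl
# versions.tf: version pins shared by the whole configuration.
terraform {
  required_version = ">= 0.13"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 3.0" # placeholder constraint; pin to what you actually test
    }
  }
}

# providers.tf: provider settings used by every AWS resource, in every file.
variable "region" {
  type    = string
  default = "us-east-1" # placeholder default
}

provider "aws" {
  region = var.region
}
```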

There’s no hard-and-fast rule here. The above approach works well if you have a large configuration where, for instance, an “environment” variable is used by multiple resources. And smaller files may mean fewer merge conflicts when a larger team is working at a fast pace.

As an alternative, though, consider using Terraform files that contain all objects for a given purpose:

  • app.tf
  • database.tf
  • storage.tf

In each file, tell a story. Start with variables and locals, then data sources, then resources, and finally, what we built: outputs. This gives the code a narrative flow so that it reads well. Each file also tends to be a reasonable length and stands on its own, which makes refactoring (moving a component into a module) straightforward: just move one file.
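As a sketch of that ordering, a storage.tf might read as follows. The bucket naming scheme and tags are illustrative assumptions, and the file assumes the AWS provider is configured elsewhere (for example, in providers.tf).

```hcl
# storage.tf: everything the storage component needs, in reading order.

# 1. Inputs: variables and locals
variable "environment" {
  type    = string
  default = "dev"
}

locals {
  # Including the account ID helps avoid collisions in S3's global bucket namespace
  bucket_name = "example-logs-${var.environment}-${data.aws_caller_identity.current.account_id}"
}

# 2. Data sources: facts about the account we're deploying into
data "aws_caller_identity" "current" {}

# 3. Resources
resource "aws_s3_bucket" "logs" {
  bucket = local.bucket_name

  tags = {
    Environment = var.environment
  }
}

# 4. Outputs: what we built
output "logs_bucket_name" {
  value = aws_s3_bucket.logs.bucket
}
```

One caveat: Terraform rejects duplicate variable declarations, so if app.tf also needs var.environment, that variable has to live in exactly one file. Shared inputs like this are where the by-type layout described earlier holds up better.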

I keep a set of files in a “popup” directory that I use to try out different services, deployments, and ideas over time. The folder contains, for example, bastion.tf, jenkins.tf, and lambda.tf. My litmus test for this narrative approach is this: can I rename any one file to “backup” (so that Terraform, which only loads files ending in .tf or .tf.json, ignores it) and still run terraform plan without errors?

Debugging

There’s no debugger for Terraform: you can’t step through an apply and inspect derived state. You can, however, use terraform console, an interactive prompt that evaluates expressions (functions, variables, locals, and attributes of resources already in state) against your configuration. It’s awesome. Use console.

Another handy technique for isolating a problem within a large Terraform configuration bears mentioning. When some small part of the larger configuration isn’t working or doesn’t make sense, make a throwaway configuration (in a subfolder, say) with the bare minimum objects needed to reproduce the problem. This is essentially what you’d do to reproduce an issue for a bug report, but it’s just as handy for isolating an issue locally. In some cases, I’ll create what I call “synthetic” deployments, which don’t contain any Terraform resources at all! The following is a complete, working deployment, for example:

$ cat main.tf
variable "env" { default = "dev" }
locals {
  # A plain conditional expression; no "${...}" wrapper is needed in Terraform 0.12+
  app_name = var.env == "dev" ? var.env : "not-dev"
}
output "app-name" { value = local.app_name }

$ terraform init > /dev/null

$ terraform apply
Apply complete! Resources: 0 added, 0 changed, 0 destroyed.

Outputs:
app-name = dev
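The same trick works for collection expressions, which are a common source of confusion. The following throwaway configuration is a hypothetical example: the subnets variable and its CIDR ranges are made-up sample data. After terraform init, running terraform console in that directory lets you evaluate local.subnet_labels, variations on the for expression, or calls such as cidrhost(var.subnets["public"], 10) interactively, before copying the working version back into the real configuration.

```hcl
# main.tf: a throwaway, resource-free configuration for exploring one expression.
variable "subnets" {
  type = map(string)
  # Made-up sample data for the reproduction
  default = {
    public  = "10.0.1.0/24"
    private = "10.0.2.0/24"
  }
}

locals {
  # The expression under investigation: one label per subnet.
  # Map keys are iterated in lexical order, so "private" comes before "public".
  subnet_labels = [for name, cidr in var.subnets : "dev-${name} (${cidr})"]
}

output "subnet_labels" {
  value = local.subnet_labels
}
```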

Lack of Testing

Terraform isn’t a programming language. I mentioned the lack of a debugger earlier; there’s also no built-in unit-testing capability. There are, however, myriad ways to verify Terraform configurations. At Foghorn, we use GitHub Actions to provision AWS infrastructure, run InSpec integration tests against that infrastructure, and then destroy the environment, all within one pipeline. You can and should consider doing this!

However, it can be expensive; the Terraform AWS provider docs even address this directly. Integration testing is also slow; really, really slow, especially when you’re provisioning RDS. No, seriously, it’s awful…ly slow.

There isn’t a consensus right now on how to unit test Terraform code. The closest thing we’ve seen is policy enforcement, which HashiCorp provides via Sentinel. However, Sentinel is bundled with Terraform Enterprise, which doesn’t come cheap. Conftest, built on Open Policy Agent, looks promising, and I’m excited to dig into it, hopefully in the near future. Keep tabs on our blog for possible updates!

Closing

Let me wrap up by saying that, at Foghorn, we love Terraform. Seriously, it’s a great solution for Infrastructure as Code. It’s also still a young tool (the current release is 0.12!), and we’ve got a long way to go before v1.0. So it’s understandable that there are still some rough edges, and that the community as a whole is still learning how best to use the tool. What I’ve presented here are specific lessons we’ve learned from using Terraform day in and day out. Hopefully this post helps you avoid some common pitfalls, too!
