You’re Doing It Wrong: The Path to Building Better Terraform Modules


This is the first post in a series on Terraform modules.


You’re doing it wrong; I know because I’ve built a lot of crap Terraform. I’m a connoisseur of terrible Terraform. Since the 0.5 release, I’ve tried every pattern — and anti-pattern — in the book.

In the following, I'll walk through the Terraform lifecycle common to a lot of the companies we encounter. You are probably somewhere on this timeline; figure out where, and save yourself a ton of heartache by skipping ahead to the end.

The Codified Infrastructure Lifecycle

Step 0: “Oh look, a button!”

So at some point, someone on your team will get the bright idea to codify your infrastructure. This is a legitimately great thing to do. Pointing and clicking around the console is great for learning, but when it comes to tracking changes over time, redeploying your stack, or making upgrades, codified infrastructure is a requirement for doing any of that with your sanity intact.

However, the first button people find is rarely the correct one. Whether it’s an Elastic Beanstalk template found on GitHub or CloudFormation code linked from some blog somewhere, chances are the first code you use is a bunch of stuff you don’t really understand because someone else put it together.

Step 1: “Okay but don’t make me learn anything.”

Later, when you decide that you do need to understand what’s going on under the hood, you’ll reach for tools you already understand. For folks coming from data centers, this often takes the form of Ansible playbooks, Chef recipes, and the like.

However, these tools are from, and for, a different era. The problems configuration management solves are quite distinct from the problems of provisioning resources in the cloud. Remotely held state, compatibility with the latest cloud APIs, and a programmatic understanding of resource attributes and their lifecycle: these needs call for purpose-built tooling.

Enter Terraform.

Step 2: Building stuff in Terraform

In your first Terraform 101 work, the setup is often a single repo built around a single `terraform apply`, and it looks like this:

[Diagram: a single repo managed by one terraform apply]

Another route I often see is the “everything gets its own terraform apply” method, which looks like this:

[Diagram: each component managed by its own terraform apply]

The issue with the former single-apply method is that it doesn’t provide enough separation of state between your production and non-production resources. A test gone awry in your lower environments can seriously hamper or even corrupt your production deployment.

The issue with the latter is that it provides too much separation and doesn’t allow Terraform to do what it does best: manage dependencies between resources.
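To make the contrast concrete, the two layouts tend to end up looking roughly like this (paths and component names are illustrative, not taken from the diagrams above):

# Single apply: one repo, one state, every environment together
infrastructure/
  main.tf            # VPC, autoscaling, RDS, security groups for prod AND staging
  variables.tf
  terraform.tfstate

# Everything gets its own apply: one state per component, no shared graph
vpc/              <- terraform apply
autoscaling/      <- terraform apply
rds/              <- terraform apply
security-groups/  <- terraform apply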

Step 3: Enter Terraform Modules

Eventually, folks learn about modules and build something that looks like the following:

[Diagram: a single repo with deployments/prod/, deployments/staging/, and a shared modules/ directory]

This approach still holds everything in a single git repository, but it at least separates state (via the `deployments/prod/` and `deployments/staging/` folders) and introduces the idea of modules in those environments.
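Spelled out as a tree (the module names are just examples, drawn from the discussion below), that layout looks something like:

modules/
  autoscaling-group/
  security-group/
  security-group-rule/
deployments/
  prod/
    main.tf          # calls the modules with production settings
  staging/
    main.tf          # calls the same modules with staging settings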

Modules are great as they allow replication of resources in a systemized and repeatable way. By way of an example, imagine your application required four mostly similar RDS Postgres database clusters. Each of those would require duplicating a number of resources: the cluster, cluster members, scaling and retention policies, security groups, etc. Rather than duplicating work, effort, and maintenance across four unique copies of mostly similar resources, a single best-practices module can be written, with variable options to capture whatever differences exist between instantiations.
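As a sketch (the module path and variable names here are hypothetical), each cluster becomes one small module call, with only the differences expressed as variables:

module "orders_db" {
  source         = "./modules/rds-postgres"
  identifier     = "orders"
  instance_class = "db.r5.large"
  replica_count  = 2
}

module "reporting_db" {
  source         = "./modules/rds-postgres"
  identifier     = "reporting"
  instance_class = "db.r5.xlarge"
  replica_count  = 1
  retention_days = 30    # only this cluster needs longer backup retention
}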

The above layout starts to achieve those goals; however, there are three patterns in the above example that I would recommend getting beyond as quickly as possible.

To start, the resources provisioned by the above modules are likely far too small in scope. A module wrapping a single security group, for instance, is unlikely to do much more than the resource itself already does. If the number of resources in a module is small, or the ratio of module variables to resource attributes is even close to 25%, then you are likely building a module of marginal benefit over simply calling the resources directly. As an example, the very small security-group-rule module in the layout above would have to look something like the following to be usable across the multitude of cases where it would be needed:

resource "aws_security_group_rule" "main" {
   type		      = var.rule_type
   from_port	      = var.from_port
   to_port	      = var.to_port
   protocol	      = var.protocol
   cidr_blocks	      = var.cidr_blocks
   security_group_id  = var.security_group_id
}

When a module is scoped to a very limited number of resources, there are generally few best practices it can implement. Further, it creates ongoing technical debt: the module has to keep pace with feature releases and API changes for each resource it wraps. Scope your modules big enough that they incorporate a suite of commonly needed services with sensible — if alterable — defaults.
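In practice, that means exposing only the handful of inputs that genuinely vary between instantiations and defaulting the rest. A rough sketch (the variable names are illustrative):

variable "instance_class" {
  description = "Size of the database instances"
  type        = string
  default     = "db.r5.large"
}

variable "backup_retention_days" {
  description = "How long automated backups are kept"
  type        = number
  default     = 7
}

variable "allowed_cidr_blocks" {
  description = "CIDR ranges allowed to reach the cluster"
  type        = list(string)
  default     = []
}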

Hand in hand with module scope, the above layout has a series of module interdependencies that will turn into a mess. It is likely that the `autoscaling-group` module calls the `security-group` module, which itself calls the `security-group-rule` module. Things get gnarly when `autoscaling-group` gains a sister module (say, an `rds-postgres` module) that also needs to call `security-group`, but with a different configuration than `autoscaling-group` required.

In concert with keeping module scope reasonably large, avoid interdependencies between modules most of the time. I’ve seen and written a lot of Terraform, and I have rarely seen occasion to introduce more than two layers of modules; most of the time, a single layer will suffice. Any more than that and changes will introduce a level of complexity that will have you pulling out the mid-day scotch to sort it out.
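One way to keep the layering flat (the resource and variable names here are hypothetical) is for the larger module to own its security group resources directly instead of calling a nested module:

# Inside the autoscaling-group module: the security group and its rules
# are plain resources, not another layer of modules.
resource "aws_security_group" "asg" {
  name   = "${var.name}-asg"
  vpc_id = var.vpc_id
}

resource "aws_security_group_rule" "asg_ingress" {
  type              = "ingress"
  from_port         = var.service_port
  to_port           = var.service_port
  protocol          = "tcp"
  cidr_blocks       = var.allowed_cidr_blocks
  security_group_id = aws_security_group.asg.id
}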

Finally, much as you avoid interdependencies between modules, avoid interdependencies between environments. In the above example, the modules are held in the same commit as the environments they serve. When a change needs to be made to the `autoscaling-group` module and tested in staging, doing so alters the very files currently being called upon in production. Avoiding this likely means making a new copy of the files (perhaps `modules/autoscaling-group-staging/`) and copy-pasting between folders, both of which are error-prone. Alternatively, a system could be built to coordinate applies between environments, a further and unnecessary complexity when git modules exist.

Step 4: Git-integrated Terraform Modules

The ability to store and refer to Terraform modules directly as git repositories is the absolute bomb. Such a call commonly looks like this (to use our networking module as a reference):

  provider "aws" {}
  module "aws_vpc" {
      source = "git@github.com:FogSource/m-vpc?ref=v0.2.2"
      nat_gateways = 1

      subnet_map {
        public    = 1
        private   = 1
        isolated  = 0
      }
}

Breaking down the `source =` line, we can see a few things: we call the entire module by reference to its repository; we rely on the underlying SSH setup already common across companies and stacks; and, via that `?ref=` parameter, we can refer to a specific version of that repository.

More specifically, we can use commit hashes, branch names, and (my favorite, and what you see above) tag names as the reference. Doing so allows us to build, upgrade, and test modules completely independently of what is deployed in production. Production can point at `v0.1.0` indefinitely while work continues in lower environments on the `v0.2.x` line. In this way, a huge portion of the mess that comes with trying to maintain environment independence is handled gracefully. When one environment or application stack needs a new feature added to a module, the change doesn’t need to be tested against every other invocation, because those are pinned to a previous version. Code can be altered and features added, all safely within the confines of `ref`s that don’t affect existing invocations until those instances are independently upgraded.
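Concretely, the two environments simply pin different refs of the same module (the file paths and tags below are illustrative):

# deployments/prod/main.tf: stays on the stable tag
module "aws_vpc" {
  source       = "git@github.com:FogSource/m-vpc?ref=v0.1.0"
  nat_gateways = 1
}

# deployments/staging/main.tf: tests the new release line
module "aws_vpc" {
  source       = "git@github.com:FogSource/m-vpc?ref=v0.2.2"
  nat_gateways = 1
}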

Next Steps

In later posts in this series, we will dig into ideas around semantic versioning of modules, provider inheritance, how to build interoperable modules, and when to ignore some of the things I’ve suggested.

If you have any questions, please don’t hesitate to reach out. As the head of our Terraform Center of Excellence, I enjoy talking about this stuff all day long and would be very interested in hearing the experiences of others — especially if you disagree with the points above.
