Infrastructure as Code in Google Cloud


Why Foghorn Codes Infrastructure

At Foghorn, we manage lots of customer infrastructure via our FogOps offerings.  Code is a great way to help make sure that we deliver consistent infrastructure that is repeatable and reusable.  We can version our infrastructure. We can spin up new environments (staging, qa, dev) in minutes with the confidence that they are exact duplicates.  We can even ‘roll back’ to a degree, or enable blue/green deploys.

Our Favorite Tool

Each cloud provider has its own tool(s) for defining, provisioning, and managing infrastructure as code. For us, since we work in so many different environments, we’ve chosen to standardize on Terraform by HashiCorp. Terraform has great providers for all of the major clouds, and it has some really cool features that you won’t see across the board in the cloud-specific options. Although we are competent in all of the tools, Terraform gives us the unique opportunity to build multi-cloud infrastructure with a single deployment.

Example – Hello Google

Google Cloud Platform has come a long way in the last couple of years. My colleague Ryan Fackett recently put together a modified “hello world” example for Google Cloud, so I’ll use that to get us from the 10,000-foot view straight down to a peek at some code. In order to spin up a workload in Google Cloud, we first need a network and a few other infrastructure dependencies. These include:

  • Network
  • Subnetwork
  • Firewall Ruleset

With these basics in place, we can spin up a server, configure it as a web server, and write our “hello Google” app.  To get traffic to it, we’ll write an IP forwarding rule, and note the IP address.  If all goes well, the code we write will create the resources, configure the server, and we’ll be able to hit the IP address with a web browser and see our site.

A Look at the Code

Let’s take a look at the code that creates the network.  We need a network and at least one subnet.  Since a subnet lives in a single region, we’ll create a few subnets in different regions to allow us to spin up a server in various parts of the world.  The code has been snipped for brevity, but it should give you a good idea of the code you may write to form your infrastructure.

The Network

resource "google_compute_network" "vpmc" {
name                    = "vpmc"
description             = "VPMC Google Cloud"
auto_create_subnetworks = false
}

resource "google_compute_subnetwork" "vpmc-us-east1" {
name          = "vpmc-us-east1"
ip_cidr_range = "${var.cidr_block["net1"]}"
network       = "${google_compute_network.vpmc.self_link}"
region        = "${var.subnetworks["net1"]}"
}

resource "google_compute_subnetwork" "vpmc-us-central1" {
name          = "vpmc-us-central1"
ip_cidr_range = "${var.cidr_block["net2"]}"
network       = "${google_compute_network.vpmc.self_link}"
region        = "${var.subnetworks["net2"]}"
}

You might notice some variables in here instead of hard-coded values: ${var.foo["bar"]}. We are using variables for the subnet CIDR blocks as well as the regions. This allows us to leverage the same code across multiple workloads and set the variables accordingly, as sketched below.
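For reference, here is one way those variables might be defined. This is a minimal sketch: the net1 values match the plan output shown later in this post, while the net2 CIDR is an assumption.

variable "cidr_block" {
  type = "map"
  default = {
    net1 = "10.1.0.0/18"  # matches the plan output below
    net2 = "10.1.64.0/18" # assumed; any non-overlapping range works
  }
}

variable "subnetworks" {
  type = "map"
  default = {
    net1 = "us-east1"
    net2 = "us-central1"
  }
}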

You will also notice we use an interpolated reference, ${foo.bar}, to associate the subnets with the network. This is required because the network does not yet exist when we write the code, so there is no identifier to hard-code; the reference also tells Terraform to create the network before the subnets that depend on it.

Firewall Access

Next we will need a firewall rule to allow incoming connections to a server:

resource "google_compute_firewall" "http" {
name = "vpmc-http"
network = "${google_compute_network.vpmc.name}"

allow {
protocol = "tcp"
ports = ["80"]
}

source_ranges = ["0.0.0.0/0"]
target_tags = ["demo"]
}

By setting target_tags, any instance with a matching tag will inherit the firewall policy.
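The same pattern extends to other services. As a sketch, a similar rule could admit SSH from a trusted range only; the ssh-admin tag and the CIDR below are hypothetical, not part of the demo:

resource "google_compute_firewall" "ssh" {
  name    = "vpmc-ssh"
  network = "${google_compute_network.vpmc.name}"

  allow {
    protocol = "tcp"
    ports    = ["22"]
  }

  # Hypothetical trusted range; substitute your own
  source_ranges = ["203.0.113.0/24"]
  target_tags   = ["ssh-admin"]
}

Instances tagged ssh-admin would then accept SSH from that range.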

Forwarding and Load Balancing

Next we need a front door with a public IP so we can hit our web site. Most web sites will be load balanced, so this code puts us in a position to run HA by spinning up multiple servers in multiple regions. I won’t go through it in detail, but it’s here for your reference:


resource "google_compute_http_health_check" "vpmc-healthcheck" {
name = "vpmc-healthcheck"
request_path = "/"
check_interval_sec = 30
healthy_threshold = 2
unhealthy_threshold = 6
timeout_sec = 10
}

resource "google_compute_target_pool" "vpmc-pool-demo" {
name = "vpmc-pool-demo"
health_checks = ["${google_compute_http_health_check.vpmc-healthcheck.name}"]
}

resource "google_compute_forwarding_rule" "vpmc-http-lb" {
name = "vpmc-http-lb"
target = "${google_compute_target_pool.vpmc-pool-demo.self_link}"
port_range = "80"
}

resource "google_compute_instance_group_manager" "vpmc-instance-manager-demo" {
name = "vpmc-instance-manager-demo"
description = "Hello Google Group"
base_instance_name = "vpmc-demo-instance"
instance_template = "${google_compute_instance_template.vpmc-template-demo.self_link}"
base_instance_name = "vpmc-instance-manager-demo"
zone = "${var.subnetworks["net1"]}-d"
target_pools = ["${google_compute_target_pool.vpmc-pool-demo.self_link}"]
target_size = 1

named_port {
name = "http"
port = 80
}
}

You’ll notice there is no IP address in here. That’s because Google hasn’t assigned one yet; we’ll need to query it later to know where our site lives.
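One way to avoid hunting for the address later is to have Terraform report it. A minimal sketch, assuming the forwarding rule’s exported ip_address attribute:

output "lb_ip" {
  # Populated once Google assigns the address during apply
  value = "${google_compute_forwarding_rule.vpmc-http-lb.ip_address}"
}

After an apply, terraform output lb_ip would print the assigned address.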

Our Web Server

Finally, we spin up a server and some associated resources. You’ll see that we boot from a stock Ubuntu image and configure the instance with a bootstrap script:

resource "google_compute_instance_template" "vpmc-template-demo" {
name_prefix = "vpmc-template-demo-"
description = "hello google template"
instance_description = "hello google"
machine_type = "n1-standard-1"
can_ip_forward = false
tags = ["demo"]
disk { source_image = "ubuntu-1404-trusty-v20160406" auto_delete = true boot = true } network_interface { subnetwork = "${google_compute_subnetwork.vpmc-us-east1.name}" access_config { // Ephemeral IP } } metadata { name = "demo" startup-script = <<SCRIPT #! /bin/bash sudo apt-get update sudo apt-get install -y apache2 echo '<!doctype html>
<h1>Hello Google!</h1>
' | sudo tee /var/www/html/index.html SCRIPT }   scheduling { automatic_restart = true on_host_maintenance = "MIGRATE" preemptible = false } service_account { scopes = ["userinfo-email", "compute-ro", "storage-ro"] } lifecycle { create_before_destroy = true } }

By adding the “demo” tag to the instance, we automatically associate it with the firewall rule we created earlier.

Google Authentication

In order to actually spin up an environment, we’ll need a Google account, and we’ll need to give Terraform access to credentials. A simple test can be done with this code:


provider "google" {
credentials = "${file("/path/to/credentials.json")}"
project = "test-project-1303"
region = "us-east1"
}
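To keep the credentials path out of the code itself, it can be parameterized. A minimal sketch, assuming a hypothetical credentials_file variable supplied via -var or a .tfvars file:

variable "credentials_file" {}

provider "google" {
  # Path is passed in at plan/apply time rather than committed to the repo
  credentials = "${file(var.credentials_file)}"
  project     = "test-project-1303"
  region      = "us-east1"
}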

Terraform Plan

Running terraform plan preps us for running an apply. Terraform will look at our code and compare the requested resources to the existing state file. New resources will be created; absent resources will be destroyed. Plan tells us exactly which changes will be made when the next apply is executed, and that preview is what makes plan so valuable. Let’s take a look at a portion of the response:

+ google_compute_subnetwork.vpmc-us-east1
    gateway_address: ""
    ip_cidr_range:   "10.1.0.0/18"
    name:            "vpmc-us-east1"
    network:         "${google_compute_network.vpmc.self_link}"
    region:          "us-east1"
    self_link:       ""
 
+ google_compute_target_pool.vpmc-pool-demo
    health_checks.#: "1"
    health_checks.0: "vpmc-healthcheck"
    instances.#:     ""
    name:            "vpmc-pool-demo"
    project:         ""
    region:          ""
    self_link:       ""
 
Plan: 16 to add, 0 to change, 0 to destroy.

Terraform Apply

OK, plan told us all the changes Terraform will make when we run an apply. We approve of the list, so we run apply to make the changes. The response:

google_compute_address.vpmc-ip-demo: Creating...
  address:   "" => ""
  name:      "" => "vpmc-ip-demo"
  self_link: "" => ""
google_compute_network.vpmc: Creating...
 
...
 
google_compute_instance_group_manager.vpmc-instance-manager-demo: Still creating... (10s elapsed)
google_compute_instance_group_manager.vpmc-instance-manager-demo: Creation complete
 
Apply complete! Resources: 16 added, 0 changed, 0 destroyed.
 
The state of your infrastructure has been saved to the path
below. This state is required to modify and destroy your
infrastructure, so keep it safe. To inspect the complete state
use the `terraform show` command.
 
State path: terraform.tfstate

We can now go to our Google Console, find the IP address for the forwarding rule that we created, and hit it in a web browser to see our “Hello Google!” page.

Automatic CMDB

ITIL processes recommend tracking all of our IT configuration items in a Configuration Management Database. The usual method to ensure this happens is to put in place a change control process. All changes go through the change control process, which includes updating the CMDB. As time goes by, human error tends to create drift between what is in the CMDB and what actually exists. This can cause difficulties in troubleshooting, and can create time-bombs like hardware that falls out of support, or production changes that have not been replicated to the DR environment.

Consider the .tfstate file that Terraform creates/updates after running an apply. It includes all of the details of the infrastructure “as built” by Terraform. This is, in effect, your CMDB. If Terraform is the only deployment tool used (this can be enforced with cloud API permissions), the accuracy of your CMDB is effectively 100%. Add this to the fact that this code can, sometimes in minutes, spin up from scratch a complete DR site (less your stateful data), and you can see the benefits. You also protect against needing ‘tribal knowledge’ to maintain your site.

In a Nutshell

The whole reason we didn’t treat infrastructure as code for many years is that we simply couldn’t. Cloud infrastructure APIs have completely abstracted the operator from the hardware. Although this creates constraints for the operator, it also creates opportunities. Infrastructure as Code is one of the major potential benefits of cloud infrastructure. If you aren’t doing it, you should be.

Foghorn Consulting offers FogOps, an alternative to managed services for cloud infrastructure. We build and manage our customer environments strictly with code, and our customers see the benefits in the form of lower ongoing management costs, higher availability, and more confidence in the infrastructure that powers their mission critical workloads.

Next Up – Pipelining your Infrastructure

There are a ton of additional benefits to treating infrastructure as code. In addition to having self-documented, versioned, and reusable infrastructure modules, we can extend our toolset to use CI/CD tools to build a full infrastructure test and deployment pipeline. I’ll give an example in a future post. Stay tuned!