It’s Alive! AWS Client-Side Monitoring

Background

Foghorn clients often ask us for help in defining a least-privilege environment for their existing applications. A significant piece of that goal involves defining AWS IAM policies that allow only the API calls necessary. In this post, I present one approach that we’ve recently added to our toolkit: a nifty open-source software package.

When creating a least-privilege IAM policy, identifying the necessary API calls for the Action stanzas can feel like a guessing game, or a tedious effort of trial and error. This is where AWS Client Side Monitoring can come in handy. That linked blog post is a quick read and provides a decent summary, but to paraphrase one key point:

Using this, you could potentially have as part of your unit testing, a record of every AWS call made, which could let you ensure that the IAM role of the application was restricted to only those privileges

Small steps, iterated quickly

This looks like a great fit for a common use case we have! How do I get started? As with most software development and infrastructure as code, I like to start small, and iterate once I have something working. So let’s start by seeing whether we can observe the CSM events directly, on our local developer workstation.

local terminal to local terminal

Let’s start with no additional tooling, besides the AWS CLI. We can use netcat to listen for events on the CSM default port 31000:

$ nc -kluvw 0 localhost 31000

Then, in another terminal, enable CSM and run any AWS CLI command (here, my go-to check of sts get-caller-identity):

$ export AWS_CSM_ENABLED=true

$ aws sts get-caller-identity

An error occurred (InvalidClientTokenId) when calling the GetCallerIdentity operation: The security token included in the request is invalid.

The call failed because I don’t have an AWS profile set, but that’s fine. Netcat still receives the CSM event, so we now know CSM is working as intended!
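If you would rather not rely on netcat, the same check can be sketched in a few lines of Python. This is only an illustration, not part of any AWS tooling: it assumes each CSM event arrives as one UDP datagram containing a single JSON object, and that the object carries `Type`, `Service`, and `Api` fields. The function names are hypothetical.

```python
import json
import socket

CSM_PORT = 31000  # default CSM port, the same one netcat listened on above


def parse_csm_datagram(payload: bytes) -> dict:
    """Decode one datagram, assumed to hold a single JSON object."""
    return json.loads(payload.decode("utf-8"))


def describe(event: dict) -> str:
    """Summarize an event as 'Type Service.Api' (field names are assumed)."""
    return "{} {}.{}".format(
        event.get("Type", "?"), event.get("Service", "?"), event.get("Api", "?")
    )


def listen(host: str = "127.0.0.1", port: int = CSM_PORT) -> None:
    """Print a one-line summary per datagram received, until interrupted."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))
        while True:
            payload, _sender = sock.recvfrom(65535)
            print(describe(parse_csm_datagram(payload)))
```

Calling `listen()` in one terminal and re-running the AWS CLI command in the other should print one summary line per event, provided the assumptions about the payload hold.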

two machines, same network

What I find really enticing about CSM is the ability to use a remote host for collection, via the AWS_CSM_HOST environment variable. Once again, run netcat:

$ nc -kluvw 0 0.0.0.0 31000

Notice that instead of localhost, as we used in the local-to-local option, we’re now using 0.0.0.0 (listen on all interfaces). On another machine (laptop, PC) on the same local network, run the AWS CLI commands:

$ export AWS_CSM_HOST=172.16.11.123 # other machine's IP

$ export AWS_CSM_ENABLED=true

$ aws sts get-caller-identity

Once again, the call fails (or if you use a valid default profile, it will succeed). But either way, the important bit here is that Netcat received the event — in this case, over the local network from the reporting machine.
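To check that datagrams can actually travel between the two machines before involving the AWS CLI at all, a small probe can help. The sketch below is an assumption-laden illustration: `open_collector` and `send_probe` are hypothetical helper names, and the probe payload is a hand-made JSON object that only imitates the shape of a CSM event.

```python
import json
import socket


def open_collector(bind_host: str = "0.0.0.0", port: int = 31000) -> socket.socket:
    """Return a UDP socket bound to bind_host; 0.0.0.0 accepts remote senders."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((bind_host, port))
    return sock


def send_probe(collector_host: str, port: int = 31000) -> None:
    """Send one hand-made, CSM-shaped JSON datagram to the collector."""
    probe = {"Type": "ApiCall", "Service": "STS", "Api": "GetCallerIdentity"}
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(json.dumps(probe).encode("utf-8"), (collector_host, port))
```

Run `open_collector()` followed by `recvfrom(65535)` on the collecting machine, then `send_probe("172.16.11.123")` on the reporting machine. If nothing arrives, the usual suspects are the bind address (localhost instead of 0.0.0.0) or a firewall blocking UDP on port 31000.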

iamlive utility

All of the prior examples are basic demonstrations of AWS CSM. However, netcat only gets us so far, and then we need a few lines of Bash to do more interesting work on the netcat output. Instead, let’s use a tool built for the purpose of inspecting CSM logs, the nascent iamlive from Foghorn’s friend and co-podcaster Ian Mckay.

Instead of netcat, we download the release of iamlive matching our computer architecture, and simply run $ ./iamlive

If you re-run the first example above (local-to-local) in two terminals, you’ll get clean output structured as a valid IAM policy document.
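To get a feel for what turning events into a policy involves, here is a deliberately naive Python sketch. It is not iamlive's actual code. It assumes events are dicts with `Type`, `Service`, and `Api` keys, and it assumes the IAM action can be formed by lower-casing the service name, which happens to work for STS but does not hold for every service or API.

```python
def action_for(event: dict) -> str:
    """Naive mapping: lower-case the service name and join it with the API name."""
    return "{}:{}".format(event["Service"].lower(), event["Api"])


def policy_from_events(events: list) -> dict:
    """Collect the distinct actions from ApiCall events into one Allow statement."""
    actions = sorted({action_for(e) for e in events if e.get("Type") == "ApiCall"})
    return {
        "Version": "2012-10-17",
        "Statement": [{"Effect": "Allow", "Action": actions, "Resource": "*"}],
    }
```

Feeding it a single `GetCallerIdentity` event for the `STS` service yields a policy with the one action `sts:GetCallerIdentity`. A purpose-built tool earns its keep precisely where this shortcut fails, because API names and IAM action names do not always line up one-to-one.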

local terraform

Can we use this for more than CLI commands? Definitely! Let’s try observing CSM events from a local Terraform deployment. Use two terminals: in the first, run iamlive. In the other, enable CSM and then run terraform:

$ export AWS_CSM_ENABLED=true

$ terraform init

$ terraform plan

$ terraform apply

Using iamlive in this way will generate an IAM policy document which serves as a starting-point for a least-privilege policy for that Terraform deployment (it should still be reviewed for additionally limiting Resources and adding conditions).
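As a small aid for that review step, the following Python sketch flags the statements that most obviously need attention. It is a hypothetical helper, not a feature of iamlive, and it assumes the policy has been loaded into a dict whose `Statement` value is a list.

```python
def wildcard_statements(policy: dict) -> list:
    """Return Allow statements that use Resource "*" and carry no Condition."""
    flagged = []
    for stmt in policy.get("Statement", []):
        resources = stmt.get("Resource", [])
        if isinstance(resources, str):
            resources = [resources]  # normalize a single string to a list
        if (
            stmt.get("Effect") == "Allow"
            and "*" in resources
            and "Condition" not in stmt
        ):
            flagged.append(stmt)
    return flagged
```

Anything it returns is a candidate for a narrower Resource or an added Condition. Some actions only support a wildcard resource, so a flagged statement is a prompt for review rather than an error.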

Video demonstrating what this looks like

Lambda to collection instance (bastion)

This is great! We now have a method to identify what AWS API calls are needed for a Terraform deployment (or really any activity in AWS). What if we want to use this approach for an AWS Lambda function, for instance? Maybe the simplest approach would be to run iamlive on a “collector” EC2 instance, such as a bastion host in your VPC.

As an aside, this is why we love Terraform at Foghorn: it allows quickly prototyping infrastructure of all sorts. To deploy this demo, I used my existing “popup” code to deploy a VPC, a Lambda function, and a bastion EC2 instance (or “jump host”):

$ ls *.tf

bastion.tf lambda.tf main.tf versions.tf vpc.tf

With those pieces of infrastructure terraform apply’d, I’m ready to SSH to the bastion and run iamlive there, to listen for the Lambda invocation. We can even pass the bastion’s private IP to the Lambda function through its environment variables, within our Terraform deployment:

module "lambda" {
  source         = "../lambda-module"
  # other attributes...
  environment_variables = {
    AWS_CSM_ENABLED = true
    AWS_CSM_HOST    = aws_instance.bastion.private_ip
  }
}
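Inside the function, these settings arrive as plain strings in the process environment. If you want to confirm what the function actually sees, for example by logging it once at startup, a sketch along these lines could work. The helper name is hypothetical, and the fallback values (host 127.0.0.1, port 31000, and the AWS_CSM_PORT variable itself) reflect my understanding of the SDK defaults rather than anything configured in this deployment.

```python
import os


def csm_settings(env=os.environ) -> dict:
    """Read the CSM-related environment variables, with assumed defaults."""
    return {
        # Terraform's `true` reaches the function as the string "true".
        "enabled": env.get("AWS_CSM_ENABLED", "false").lower() == "true",
        "host": env.get("AWS_CSM_HOST", "127.0.0.1"),
        "port": int(env.get("AWS_CSM_PORT", "31000")),
    }
```

Printing `csm_settings()` from the handler is a quick way to rule out a misconfigured environment when no events show up on the collector.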

One additional value-add of Terraform is that we can use outputs in downstream shell commands:

ssh -l ubuntu $(terraform output -raw bastion_ip)

One small problem

However, for the collector EC2 instance to receive metrics from Lambda (or anywhere in my VPC really), we need one additional piece. Recall that in the local-network, remote machine example above, we had to change the netcat command from using localhost to using 0.0.0.0. The iamlive tool also needs to listen for remote hosts. At present, it doesn’t expose host address as an option.

And this is why I love open-source software: I created a small pull request to add that command-line flag! While waiting for the project maintainer(s) to review the PR, I recompiled the iamlive binary myself.

One wrinkle: the iamlive project states that it “requires Go 1.16 or above to be built correctly (due to embedding feature)”, so we have to be sure to use an (unstable) 1.16.x release. My existing Go executable:

$ go version
go version go1.15.7 darwin/amd64

Versus the unstable 1.16rc1 which I just installed:

$ /usr/local/go/bin/go version
go version go1.16rc1 darwin/amd64

Since I’m working on an Apple laptop, but targeting an Ubuntu EC2 instance, I need to make sure the binary is built for the correct target architecture. Fortunately, Go makes it really easy to create a cross-platform build:

$ export GOOS=linux
$ export GOARCH=386
$ /usr/local/go/bin/go build

$ file iamlive
iamlive: ELF 32-bit LSB executable

The main event

Now we’re ready to rumble! Let’s copy this newly compiled binary to the bastion:

scp iamlive ubuntu@$(terraform output -raw bastion_ip):

Recalling that we set the Lambda function’s environment variables AWS_CSM_ENABLED and AWS_CSM_HOST via Terraform above, all that’s left is to test the Lambda function (through the web console, for now). I made a quick video demonstration of what this looks like. The bastion host running iamlive is on the left, and the Lambda web console on the right.

For this demo, the Lambda function code is trivially simple: we just want to induce any AWS API call, so again we use get-caller-identity. The complete Python code for AWS Lambda is as follows:

import boto3

def lambda_handler(event, context):
    # Any AWS API call will do; it only needs to emit a CSM event.
    client = boto3.client("sts")
    response = client.get_caller_identity()

    return {
        "isBase64Encoded": False,
        "statusCode": 200,
        "headers": {},
        "body": f"OK: {response['UserId']}",
    }

And the IAM policy that iamlive generated from this example, copied directly out of my terminal, is as follows:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "sts:GetCallerIdentity"
            ],
            "Resource": "*"
        }
    ]
}

Final Thoughts

In my day-to-day work, I assist companies with large-scale design and build activities, and work with both Foghorn engineering teams and our peers on the customer’s side. These are large projects involving many resources, sometimes spanning months of effort. It’s really important to me to stay close to the work, and a simple exploration like the one above helps me understand the nuances of implementation, which in turn informs how I scope even the largest projects.
