Democratizing Security


I was quite happy at the AWS Summit Santa Clara when keynote speaker Dr. Werner Vogels, while wearing a t-shirt labeled “Encrypt Everything”, stood in front of a slide that read “Security is everyone’s job”. The slide is, after all, 100% right. The challenge, though, is not realizing that security is your job; it’s making it simpler to do your job securely.

We frequently see two common patterns related to security and public cloud.  

  • Established companies with existing security policies and procedures, where the security team usually acts as a checkpoint and blocker for deploying production workloads. The workflow tends to go like this: developers work tirelessly to get a workload ready for production, it goes into security review, and at that point (usually for the first time) many of its components are audited from a security perspective. The developers (and likely also operations or SRE) are then pulled in to work on remediation. This feedback loop, which can be quite long, is not the real problem, nor is the fact that security is protecting production deployments.
  • Customers looking for an outside security audit of a workload that is already deployed.

In both cases, the pain of security remediation comes from it happening so late in the lifecycle. That’s the problem.

When given the opportunity, we try to push security as early in the release pipeline as we can; this is referred to as shift-left security. Integrate it into build systems so it’s no longer a blocker, but simply a part of the normal build cycle. Test early, test often, release frequently. Not everyone can be an expert in security, though, so what specifically can we do to help democratize security?

Whether the workload is containerized or on VMs, we advocate that customers leverage an immutable build pipeline. While this is not the hammer to rule all nails, it has a lot of advantages. Further, these pipelines make it easier to architect security as build stages, enabling security teams and developers to work independently on their own priorities while still collaborating on security (note I’m avoiding the term DevSecOps). So what does this all look like? At a basic level, audit and remediation come down to four steps: scan, compare the results against a known database, report on the status, and then remediate.
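The scan, compare, report, remediate loop can be sketched in a few lines of Python. This is only an illustration, not a real scanner: the package inventory, the `KNOWN_VULNERABLE` table, and the advisory IDs are all invented for the example. A real pipeline would get them from a scanner and a vulnerability feed.

```python
from dataclasses import dataclass

# Hypothetical advisory data: package name -> (first fixed version, advisory id).
KNOWN_VULNERABLE = {
    "openssl": ((1, 1, 1), "EXAMPLE-0001"),
    "bash": ((4, 4, 0), "EXAMPLE-0002"),
}


@dataclass(frozen=True)
class Finding:
    package: str
    installed: tuple
    fixed_in: tuple
    advisory: str


def scan(inventory):
    """Compare installed versions against the known-vulnerable table."""
    findings = []
    for package, installed in sorted(inventory.items()):
        if package not in KNOWN_VULNERABLE:
            continue
        fixed_in, advisory = KNOWN_VULNERABLE[package]
        if installed < fixed_in:  # tuple comparison, e.g. (1, 0, 2) < (1, 1, 1)
            findings.append(Finding(package, installed, fixed_in, advisory))
    return findings


def report(findings):
    """Render one line per finding, including the remediation target."""
    def fmt(version):
        return ".".join(str(part) for part in version)

    return [
        f"{f.advisory}: {f.package} {fmt(f.installed)} -> upgrade to {fmt(f.fixed_in)}"
        for f in findings
    ]


if __name__ == "__main__":
    image_inventory = {"openssl": (1, 0, 2), "bash": (5, 0, 0), "curl": (7, 64, 0)}
    for line in report(scan(image_inventory)):
        print(line)
```

Each report line names the version to upgrade to, so the remediation step is already spelled out for whoever picks up the finding.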

We can take existing and open source tools and script pipeline stages that integrate security into our aforementioned immutable pipeline. We deploy a CI/CD platform (in this case let’s assume Jenkins), which creates our AMI and container image. Upon completion of those jobs, we spawn security pipelines to scan those artifacts (the AMI via Inspector, the container image via Klar). We then notify Slack when new scans are ready for review. In our next iteration, we will parse the results for critical vulnerabilities and include them in the Slack message.
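Parsing for critical vulnerabilities and folding them into the notification might look roughly like the sketch below. The finding shape (`id` and `severity` keys) is an assumption made for the example, not the output format of Inspector or Klar, and the webhook URL must be supplied by the caller. The only Slack-specific detail relied on is that an incoming webhook accepts a JSON body with a `text` field.

```python
import json
import urllib.request
from collections import Counter


def summarize(findings):
    """Count findings per severity (severity names are normalized to upper case)."""
    return Counter(f["severity"].upper() for f in findings)


def build_slack_payload(artifact, findings):
    """Build the JSON body for a Slack incoming webhook."""
    counts = summarize(findings)
    critical = [f["id"] for f in findings if f["severity"].upper() == "CRITICAL"]
    lines = [f"Scan ready for {artifact}: {sum(counts.values())} finding(s)"]
    if critical:
        lines.append(f"Critical ({len(critical)}): " + ", ".join(sorted(critical)))
    else:
        lines.append("No critical findings")
    return {"text": "\n".join(lines)}


def post_to_slack(webhook_url, payload):
    """POST the payload to a webhook URL supplied by the caller."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

Building the payload separately from posting it means the message format can be unit tested in the build without a live webhook.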

So let’s take a look at this in detail:

Take the following basic example, where our pipeline uses Packer to execute configuration management code and build public cloud VM images (in this case an AWS AMI). We add a downstream pipeline, triggered on a successful Packer build, that runs an AWS Inspector scan of the image. It does this by triggering Terraform to create the required Inspector resources; we use a userdata script to install the agent (if it’s not already part of the AMI build), run all supported tests, and post to Slack when the report is ready. This AMI can be picked up and used by other services of the pipeline either before or after the scan runs, depending on the criticality of the updated release.
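One small piece of glue this flow needs is handing the freshly built AMI ID to the downstream scan job. The article does not say how that is done. The sketch below assumes the Packer template uses the `manifest` post-processor, whose output file records each build's `artifact_id` in `region:ami-id` form. The function names are illustrative.

```python
import json


def latest_ami_from_manifest(manifest, region):
    """Return the AMI ID for `region` from the most recent Packer run, or None."""
    last_run = manifest.get("last_run_uuid")
    for build in manifest.get("builds", []):
        if build.get("packer_run_uuid") != last_run:
            continue
        # artifact_id may list several regions: "us-east-1:ami-aaa,us-west-2:ami-bbb"
        for entry in build.get("artifact_id", "").split(","):
            entry_region, _, ami_id = entry.partition(":")
            if entry_region == region and ami_id:
                return ami_id
    return None


def load_manifest(path="packer-manifest.json"):
    """Read the manifest file written by the manifest post-processor."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

The downstream job could then pass the returned ID to Terraform as a variable when it creates the Inspector resources.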

If you graduate this basic example to a more complex environment, where the security team is tasked with providing hardened and instrumented base images for development teams to build from, you can start to imagine a factory in which development team pipelines simply look up the latest security-approved image for the OS distribution (think Terraform data source) and are always developing against the approved and hardened base image. Teams can then elect to reuse the security team’s scan pipelines, passing in their completed images for scanning when they are ready to add remediation work to their next development planning session.
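The "look up the latest approved image" step amounts to a filter plus a sort. Here is a rough sketch over image records shaped like the EC2 DescribeImages response (`ImageId`, `CreationDate`, `Tags`). The tag names `SecurityApproved` and `Distribution` are hypothetical; the article does not say how approval is recorded.

```python
def latest_approved_image(images, distribution):
    """Pick the newest image tagged as approved for the given distribution."""
    def tags(image):
        return {t["Key"]: t["Value"] for t in image.get("Tags", [])}

    candidates = [
        image for image in images
        if tags(image).get("SecurityApproved") == "true"      # hypothetical tag
        and tags(image).get("Distribution") == distribution   # hypothetical tag
    ]
    if not candidates:
        return None
    # CreationDate is an ISO 8601 UTC string, so string order is time order.
    return max(candidates, key=lambda image: image["CreationDate"])["ImageId"]
```

Returning `None` when nothing matches lets the calling pipeline fail loudly instead of silently falling back to an unapproved image.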

This same workflow applies to containers as well. A pipeline stage for container scanning (e.g. Klar) can easily be added to scan for vulnerabilities with each build (or prior to deploying images to certain environments). All this CI/CD security work doesn’t negate the need for analyzing deployed environments, but it offers security auditing as a build service, which makes it available to more teams.

The number of security tools you can execute as build stages is limited only by how long a build job you are willing to tolerate. AWS Inspector is one example; you could also run linters, integration tests, infrastructure tests, compliance analysis, etc. Treating each component as a pipeline stage makes building custom security pipelines easy. Development teams can opt in to the stages that are most meaningful to them, again increasing the availability of security features to development teams.
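To make the opt-in idea concrete, here is a minimal sketch of a stage registry. It is not tied to Jenkins or any other CI/CD system; the stage names and their placeholder results are invented for illustration.

```python
from typing import Callable, Dict, Tuple

# A stage takes the artifact identifier and returns (passed, message).
Stage = Callable[[str], Tuple[bool, str]]

STAGES: Dict[str, Stage] = {}


def stage(name):
    """Register a function as a named, reusable pipeline stage."""
    def register(func):
        STAGES[name] = func
        return func
    return register


@stage("lint")
def lint(artifact):
    return True, f"lint ok for {artifact}"


@stage("image-scan")
def image_scan(artifact):
    # Placeholder result; a real stage would call out to a scanner.
    return False, f"2 findings in {artifact}"


def run_pipeline(artifact, opted_in):
    """Run only the stages a team opted in to, in the order they listed them."""
    results = []
    for name in opted_in:
        if name not in STAGES:
            raise KeyError(f"unknown stage: {name}")
        passed, message = STAGES[name](artifact)
        results.append((name, passed, message))
    return results
```

A team that only wants linting would call `run_pipeline("app:1.0", ["lint"])`, while another could list every registered stage and accept the longer build.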

Extending this modular pipeline approach to security services, we can start to create more complex systems that leverage the pipelines. Perhaps the choice to use security-hardened AMIs is not strictly enforced. In that case, you could have a serverless function (e.g. AWS Lambda) describe EC2 instances to compile a list of unique source AMI IDs. These IDs could be compared to a custom store of security-approved IDs (e.g. a DynamoDB table), and if an anomalous ID is discovered, the function could post to a security job queue (e.g. AWS SQS) to trigger a one-off security pipeline in which the unapproved AMI is scanned for out-of-band analysis. The purpose of a system such as this is, again, not to think of security as a barrier, but as a service. There could be a good reason an unapproved AMI was used (e.g. a developer testing a new service or tool), or it could simply have been a user mistake. Restricting what users can do creates friction, which slows creativity and velocity. Trust-but-verify allows both development creativity and security oversight.
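The core of such a function is small. The sketch below keeps the AWS calls at the edges so the logic runs without credentials. In a real Lambda, `reservations` would come from the EC2 DescribeInstances response (for example via boto3), `approved_ids` from the DynamoDB table, and `sqs_client` would be a boto3 SQS client. The message body format is an assumption made for the example.

```python
import json


def unique_ami_ids(reservations):
    """Collect distinct ImageId values from a DescribeInstances-style response."""
    return {
        instance["ImageId"]
        for reservation in reservations
        for instance in reservation.get("Instances", [])
    }


def unapproved_ami_ids(reservations, approved_ids):
    """Return AMI IDs in use that are missing from the approved set, sorted."""
    return sorted(unique_ami_ids(reservations) - set(approved_ids))


def enqueue_scans(sqs_client, queue_url, ami_ids):
    """Send one scan request per unapproved AMI to the security job queue."""
    for ami_id in ami_ids:
        sqs_client.send_message(
            QueueUrl=queue_url,
            # Hypothetical message shape for the one-off scan pipeline.
            MessageBody=json.dumps({"action": "scan", "ami_id": ami_id}),
        )
    return len(ami_ids)
```

Nothing here blocks the instance that used the unapproved AMI. The function only queues a scan, which is the trust-but-verify posture described above.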