All Aboard the Release Train


Continuous deployment is great, and in development environments it is arguably a requirement. However, it can be problematic when a large, coordinated effort is needed to document a software release for regulatory compliance. To solve this, we created a release workflow based on a GitOps model that leverages Argo Workflows. This solution had the following key benefits:
  • A single source of truth for the release per environment that is tracked and versioned in git
  • An approval process for a release through existing GitHub pull request and branch protection features
  • A single workflow that supported various code types and could be extended to support new code types
  • Automation that triggers on new commits and carries out the release

The Product & Teams

The product in this case is a mobile application, but it consists of hundreds of services that span numerous development teams, each with its own development schedule and velocity.

In this case, the release is a lot like a train map. You have various routes that converge at select locations. The convergence point is the release. Each route is a development team working on its features for this release. I’m not going to get into how the teams decided what to include in their respective plans; instead, I want to highlight what happens once they converge.

Back to our map: we have the team responsible for the API Swagger (we’ll put them on the L line). We have the backend microservices team, in this case Java Spring Boot (the M line). We have the iOS and Android mobile teams (the N line). Then we have the test automation teams authoring Postman collections and test frameworks (the K line). Last but not least, we have the infrastructure team, which also owns the deployment of the release to non-development environments (the J line). As these lines all hit the Powell station, the infrastructure team is now responsible for coordinating the 100 source repositories into a consolidated release.

Thankfully, the infrastructure team prefers a GitOps style for their work. They already have a Kubernetes cluster running. The microservices are continuously deployed via Argo CD in dev and managed manually via Argo CD in upstream environments. Each non-microservice repository does its own build and deploy to dev via repository-specific automation (GitHub Actions). This all works well to keep development teams working efficiently. However, at the Powell station, we are no longer continuously deploying new commits on the main branches. We now have to coordinate the release. Enter Argo Workflows. According to its documentation, Argo Workflows is an open-source, container-native workflow engine for orchestrating parallel jobs on Kubernetes.

The Release Workflow

At a high level, the release workflow has three components:

  1. A git-based definition of our release for Argo Workflows to use as the source of truth
  2. An Argo Workflows job that processes that definition and triggers parallel jobs, one per source repository, across varying code types (Lambda, API Gateway, S3 content, Web Portal code, Microservice Image Tags, etc.)
  3. The code-specific job templates that process each source repository of a given code type (every environment uses the same code-specific job template)

The Components – The Definition of the Release

Let’s dive into each component, starting with the definition of a release, which we store in the infrastructure git repository. Think of this as the passenger manifest for the J line. We decided to use YAML and keep the contents as straightforward as possible. We have a basic environment block (where the release is going) and then a list of product blocks (each combining various source repository types with their respective product team). Within each product block come the product release version and a release candidate, a simple number that increments each time the product release changes in that environment. Finally, we have the actual artifacts to release: in our case, a list of deployments followed by a list of images (containers):

    # Key names for the parent blocks follow the description above.
    environment:
      name: staging-us-east-2
      shortName: stage-us
      destinationTagPrefix: stage_us
    products:
      - name: product1
        releaseVersion: 1.9.0
        releaseCandidate: 2
        deployments:
          - gitRepoName: cloudfront-edge-s3-lambda
            gitRepoShortName: cf-edge-s3
            gitCommit: abc123…
            workflowBase: lambda-job
            deployTarget: lambda
          - gitRepoName: license-verification-lambda
            gitRepoShortName: lic-verf
            gitCommit: abc123…
            workflowBase: lambda-job
            deployTarget: lambda
          - gitRepoName: private-resources
            gitRepoShortName: resources
            gitCommit: abc123…
            workflowBase: s3-upload-resources
            deployTarget: s3
          - gitRepoName: public-assets
            gitRepoShortName: assets
            gitCommit: abc123…
            workflowBase: s3-upload-public-assets
            deployTarget: s3
          - gitRepoName: public-portal
            gitRepoShortName: portal
            gitCommit: abc123…
            workflowBase: s3-upload-portal
            deployTarget: s3
          - gitRepoName: main-aws-api-gateway-swagger
            gitRepoShortName: apigw-main
            gitCommit: 0930014f81f655d0c50f8172a575fc50128716bd
            workflowBase: api-gw-deploy
            deployTarget: apigw
        images:
          - repoName: product1/microservice-1
            sourceTag: dev_123
          - repoName: product1/microservice-2
            sourceTag: dev_456
          - repoName: product1/microservice-3
            sourceTag: dev_789
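
To make the shape of a product block concrete, here is a minimal Python sketch that turns one already-parsed product block into typed objects and fails loudly when a required field is missing. It is an illustration only, not the code used in this project: the parent key names `deployments` and `images` are assumptions inferred from the description above, and the class and function names (`Deployment`, `Image`, `Product`, `parse_product`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    git_repo_name: str
    git_commit: str
    workflow_base: str  # name of the code-specific job template
    deploy_target: str  # e.g. "lambda", "s3", "apigw"


@dataclass(frozen=True)
class Image:
    repo_name: str
    source_tag: str


@dataclass(frozen=True)
class Product:
    name: str
    release_version: str
    release_candidate: int
    deployments: tuple = ()
    images: tuple = ()


def parse_product(raw: dict) -> Product:
    """Build a Product from one parsed product block.

    Assumes the lists live under "deployments" and "images" (inferred names).
    A missing required field raises KeyError.
    """
    deployments = tuple(
        Deployment(d["gitRepoName"], d["gitCommit"], d["workflowBase"], d["deployTarget"])
        for d in raw.get("deployments", [])
    )
    images = tuple(Image(i["repoName"], i["sourceTag"]) for i in raw.get("images", []))
    candidate = int(raw["releaseCandidate"])
    if candidate < 1:
        raise ValueError("releaseCandidate must be a positive integer")
    return Product(raw["name"], str(raw["releaseVersion"]), candidate, deployments, images)


# Example input: a trimmed-down copy of the product block shown above.
raw_product = {
    "name": "product1",
    "releaseVersion": "1.9.0",
    "releaseCandidate": 2,
    "deployments": [
        {
            "gitRepoName": "main-aws-api-gateway-swagger",
            "gitCommit": "0930014f81f655d0c50f8172a575fc50128716bd",
            "workflowBase": "api-gw-deploy",
            "deployTarget": "apigw",
        },
    ],
    "images": [{"repoName": "product1/microservice-1", "sourceTag": "dev_123"}],
}
product = parse_product(raw_product)
print(product.name, product.release_version, product.deployments[0].workflow_base)
```

Validating the manifest before any job starts would let a malformed pull request fail at review time rather than halfway through a release.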

The Components – The Workflow of the Release

As a workflow, it looks something like this:
  1. A product team user creates a Pull Request with an updated release YAML file.
  2. The product lead validates the contents
  3. Infrastructure team (J Line) reviews and approves
  4. On merging to main branch, an Argo Trigger fires
  5. Argo Workflows launches the release deployer workflow, which takes the release YAML and spawns each of the workflows as its own job:
    1. lambda jobs (1 per source repo of type lambda)
    2. s3 jobs (1 per source repo of type s3-upload, broken out into various types like private vs. public, ReactJS portal in S3, etc.)
    3. api gateway jobs (1 per source repo of type swagger deploying to API Gateway)
    4. image tagger job
      1. This job is unique: its purpose is to tag all the dev image versions that are part of the release with a new tag for the environment, in this case “stage_us”. So our dev_123 image is now also tagged “stage_us_123”.
      2. This has a secondary benefit: we can clearly see which images were deployed to upstream environments simply by looking at the registry in dev.
      3. In this instance, we aren’t actually using the dev container registry for non-dev environments. We chose to use ECR replication to sync new images so that each environment can pull from its own local read-only copy of the registry. That said, in non-ECR environments, we could have extended the image tagger to push to upstream registries.
  6. Argo CD now has new images to sync in stage-us, which can be done manually or automatically depending on the team preference.

The release deployer is responsible for putting all the environment and release context into the parallel workflows, so that we have a single definition of the lambda-job that all environments can use. The release deployer then launches those workflows for that environment by applying them into the Argo cluster. Each workflow job uses the git commit ID and repository name to check out the source code version we want and then releases it into the environment we have specified, in our case stage-us.
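
The fan-out performed by the release deployer can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the actual deployer: it only builds a list of job descriptions and does not submit anything to Argo. The function names (`destination_tag`, `plan_jobs`), the `image-tagger` template name, the shape of the job dictionaries, and the `deployments` and `images` key names are all hypothetical. Only the tag mapping (dev_123 becoming stage_us_123) is taken directly from the description above.

```python
def destination_tag(source_tag: str, destination_prefix: str, source_prefix: str = "dev") -> str:
    """Map a dev tag such as dev_123 to its environment tag, e.g. stage_us_123."""
    prefix = source_prefix + "_"
    if not source_tag.startswith(prefix):
        raise ValueError(f"unexpected source tag: {source_tag!r}")
    return f"{destination_prefix}_{source_tag[len(prefix):]}"


def plan_jobs(environment: dict, product: dict) -> list:
    """Return one job description per deployment plus a single image tagger job."""
    # Context shared by every job, so one template can serve all environments.
    context = {
        "environment": environment["shortName"],
        "product": product["name"],
        "releaseVersion": product["releaseVersion"],
        "releaseCandidate": product["releaseCandidate"],
    }
    jobs = [
        {**context, "template": d["workflowBase"], "repo": d["gitRepoName"], "commit": d["gitCommit"]}
        for d in product.get("deployments", [])
    ]
    images = product.get("images", [])
    if images:
        jobs.append({
            **context,
            "template": "image-tagger",  # hypothetical template name
            "tags": {
                f'{i["repoName"]}:{i["sourceTag"]}': destination_tag(
                    i["sourceTag"], environment["destinationTagPrefix"]
                )
                for i in images
            },
        })
    return jobs


# Example input: a trimmed-down release with placeholder commit IDs.
environment = {"shortName": "stage-us", "destinationTagPrefix": "stage_us"}
product = {
    "name": "product1",
    "releaseVersion": "1.9.0",
    "releaseCandidate": 2,
    "deployments": [
        {"gitRepoName": "license-verification-lambda", "gitCommit": "placeholder-commit-1",
         "workflowBase": "lambda-job"},
        {"gitRepoName": "main-aws-api-gateway-swagger", "gitCommit": "placeholder-commit-2",
         "workflowBase": "api-gw-deploy"},
    ],
    "images": [{"repoName": "product1/microservice-1", "sourceTag": "dev_123"}],
}
jobs = plan_jobs(environment, product)
for job in jobs:
    print(job["template"], job["environment"])
```

Because the environment and release context are injected by the planner, the job templates themselves stay environment-agnostic, which is the property the release deployer relies on.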

The Components – Additional Workflow Benefits

The release workflow is idempotent. A user can re-run the same release definition over and over and achieve the same result. In fact, rather than adding logic to skip workflows that were already processed, we run all of the workflows every time, so that any time a release is executed, the state declared in git is what is running.
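
The idea of always re-applying instead of skipping can be illustrated with a toy model in Python. This is a sketch under simplifying assumptions, not the real workflow: an environment is modeled as a plain dictionary from repository name to deployed commit, and `apply_release`, the repository names, and the commit values are hypothetical.

```python
def apply_release(running: dict, declared: dict) -> dict:
    """Re-apply every declared artifact, whether or not it appears to have changed.

    `running` and `declared` map a repository name to a commit ID for one environment.
    """
    updated = dict(running)
    for repo, commit in declared.items():
        # No "already processed" short-circuit: the declared state always wins.
        updated[repo] = commit
    return updated


# Hypothetical state: someone changed public-portal by hand after the last release.
declared = {"public-portal": "commit-a", "license-verification-lambda": "commit-b"}
drifted = {"public-portal": "manual-hotfix", "license-verification-lambda": "commit-b"}

once = apply_release(drifted, declared)
twice = apply_release(once, declared)
print(once == twice)
```

Running the release a second time changes nothing, and running it against a drifted environment restores the declared state, which is why skip logic adds little value here.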

The Declarative Release

So this may all seem great, but you’re probably still wondering: why bother? Why not just have each source repository deploy the desired code? Why add this layer of complexity? With all this in place, the customer can declare what they want in a release (at the git commit, tag, or branch level) and then perform extensive quality control and documentation against it while the development teams move on to the next release. Further, the declared release can easily be duplicated in any existing or new environment. Since the source of truth is the release YAML, the manifest is itself a release package identifying all of the release artifacts needed.

In Conclusion

This Argo Workflows-based release system provides a declarative GitOps approach to coordinating the numerous (and growing) components of a complicated product. It solved the issue of release coordination and ensured that each release is defined and executed from a single source of truth. The customer can now execute releases more easily and have greater confidence in what each release was composed of. If you have difficulties with release coordination or are similarly constrained, give us a call to learn more.