- A single source of truth for the release per environment that is tracked and versioned in git
- An approval process for a release through existing GitHub pull request and branch protection features
- A single workflow that supports various code types and can be extended to support new code types
- Automation that triggers on new commits and carries out the release
The Product & Teams
The product in this case is a mobile application, but it consists of hundreds of services spanning numerous development teams, each with its own development schedule and velocity. In this case, the release is a lot like a train map: you have all these various routes which converge at select locations. The convergence point is the release, and each route is a development team working on its features for that release. I’m not going to get into how the teams decided what to include in their respective plans, but rather highlight what happens once they converge.
Back to our map: we have the team responsible for the API Swagger (we’ll put them on the L line). We have the backend microservices team, in this case Java Spring Boot (they are the M line). We have the iOS and Android mobile teams (they are the N line). And then we have the test automation teams authoring Postman collections and test frameworks (the K line). Last but not least, we have the infrastructure team, who also owns deployment of the release to non-development environments (the J line). As these lines all hit the Powell station, the infrastructure team becomes responsible for coordinating the 100 source repositories into a consolidated release.
Thankfully, the infrastructure team prefers a GitOps style for their work. They already have a Kubernetes cluster running. The microservices are continuously deployed via Argo CD in dev and managed manually via Argo CD in upstream environments. Each non-microservice repository does its own build and deploy to dev via repository-specific automation (GitHub Actions). This all works great for keeping development teams efficient. However, at the Powell station, we are no longer continuously deploying new commits from the main branches; we now have to coordinate the release. Enter Argo Workflows, which its documentation describes as an open-source container-native workflow engine for orchestrating parallel jobs on Kubernetes.
The Release Workflow
At a high level, we have 3 components to this release workflow:
- A git-based definition of our release for Argo Workflows to use as the source of truth
- An Argo Workflows job that processes that definition and triggers parallel jobs for each source repository of varying code types (Lambda, API Gateway, S3 content, web portal code, microservice image tags, etc.)
- The code-specific job templates that process each source repository of that code type (every environment uses the same code-specific job template)
The Components – The Definition of the Release
Let’s dive into each component, starting with the definition of a release, stored in the infrastructure git repository. Think of this as the passenger manifest for the J line. We decided to use YAML and keep the contents as straightforward as possible. There is a basic environment block (where this is releasing into), followed by a list of product blocks (combining the various source repository types with their respective product team). Each product carries a release version and a release candidate, a simple number that increments each time a product release changes in that environment. Finally, we have the actual artifacts to release, in our case a list of deployments followed by a list of images (containers):
environment:
  name: staging-us-east-2
  shortName: stage-us
  images:
    destinationTagPrefix: stage_us
products:
  - name: product1
    releaseVersion: 1.9.0
    releaseCandidate: 2
    releaseArtifacts:
      deployments:
        - gitRepoName: cloudfront-edge-s3-lambda
          gitRepoShortName: cf-edge-s3
          gitCommit: abc123…
          workflowBase: lambda-job
          deployTarget: lambda
        - gitRepoName: license-verification-lambda
          gitRepoShortName: lic-verf
          gitCommit: abc123…
          workflowBase: lambda-job
          deployTarget: lambda
        - gitRepoName: private-resources
          gitRepoShortName: resources
          gitCommit: abc123…
          workflowBase: s3-upload-resources
          deployTarget: s3
        - gitRepoName: public-assets
          gitRepoShortName: assets
          gitCommit: abc123…
          workflowBase: s3-upload-public-assets
          deployTarget: s3
        - gitRepoName: public-portal
          gitRepoShortName: portal
          gitCommit: abc123…
          workflowBase: s3-upload-portal
          deployTarget: s3
        - gitRepoName: main-aws-api-gateway-swagger
          gitRepoShortName: apigw-main
          gitCommit: 0930014f81f655d0c50f8172a575fc50128716bd
          workflowBase: api-gw-deploy
          deployTarget: apigw
      images:
        - repoName: product1/microservice-1
          sourceTag: dev_123
        - repoName: product1/microservice-2
          sourceTag: dev_456
        - repoName: product1/microservice-3
          sourceTag: dev_789
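To make the fan-out concrete, here is a minimal sketch of how a deployer could turn this definition into a job list: one job per deployment entry, keyed by its workflowBase, plus a single image-tagger job per product. The function name `plan_jobs` is illustrative (not the team’s actual code), and it assumes the YAML has already been parsed into a dict (e.g. with PyYAML’s `yaml.safe_load`):

```python
def plan_jobs(release):
    """Turn a parsed release definition into a flat list of (template, target) jobs.

    Emits one job per deployment entry, routed by its workflowBase, plus one
    image-tagger job per product covering all of that product's images.
    """
    jobs = []
    for product in release["products"]:
        artifacts = product["releaseArtifacts"]
        # Each deployment already names the template that knows how to deploy it.
        for dep in artifacts.get("deployments", []):
            jobs.append((dep["workflowBase"], dep["gitRepoName"]))
        # A single tagger job handles every image in the product's list.
        if artifacts.get("images"):
            jobs.append(("image-tagger", product["name"]))
    return jobs
```

In Argo Workflows terms, a list like this is what you would feed to a `withParam` loop so each entry becomes its own parallel step.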
The Components – The Workflow of the Release
As a workflow, it looks something like this:

- A product team user creates a pull request with an updated release YAML file.
- The product lead validates the contents.
- The infrastructure team (J line) reviews and approves.
- On merge to the main branch, an Argo trigger fires.
- Argo Workflows launches the release deployer workflow, which takes the release YAML and spawns all of the code-specific workflows as their own jobs:
  - lambda jobs (1 per source repo of type lambda)
  - s3 jobs (1 per source repo of type s3-upload, broken out into various types like private vs. public, the ReactJS portal in S3, etc.)
  - api gateway jobs (1 per source repo of type swagger, deploying to API Gateway)
  - image tagger job
    - This job is unique in that its purpose is to tag all the dev image versions that are part of the release with a new tag for the environment, in this case “stage_us”. So our dev_123 image is now also tagged “stage_us_123”.
    - A secondary benefit is that we can clearly see which images were deployed to upstream environments simply by looking at the registry in dev.
    - In this instance, we aren’t actually using the dev container registry for non-dev environments; we chose to use ECR replication to sync new images so that each environment can pull from its own local, read-only copy of the registry. That said, in non-ECR environments we could have extended the image tagger to push to upstream registries.
- Argo CD now has new images to sync in stage-us, which can be done manually or automatically depending on team preference.
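The image tagger’s renaming rule described above is simple: keep the build suffix and swap in the environment’s destinationTagPrefix from the release definition. A minimal sketch (the function name is illustrative, and it assumes tags follow the `<env>_<build>` shape shown in the example):

```python
def destination_tag(source_tag, dest_prefix, separator="_"):
    """Map a dev image tag to its release tag for the target environment.

    e.g. dev_123 with prefix stage_us becomes stage_us_123 -- the build
    suffix is preserved so image lineage stays visible in the dev registry.
    Assumes tags look like <env><separator><build>.
    """
    # Split off the source environment prefix, keeping everything after it.
    _, _, suffix = source_tag.partition(separator)
    return f"{dest_prefix}{separator}{suffix}"
```

Because the suffix is unchanged, glancing at the tags on a single image in the dev registry (dev_123, stage_us_123, …) tells you exactly which environments that build has reached.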