Architecting for Cost Optimization in the Cloud


Welcome Back!

This is the fourth post in our series on controlling and optimizing costs in the cloud. If this is your first time here, I invite you to take a look at the previous posts in this series: Cost Control Analysis Realizes Valuable ROI; Monitoring, Pruning, RightSizing and Reporting on Cloud Resources; and Optimizing Purchasing Options in the Cloud.

Architecting for Cost Optimization in the Cloud

Prospective and existing clients often approach us to discuss optimizing an existing cloud-based application. They are usually trying to implement performance and security improvements, and are looking for cost savings as well. Re-architecting on the fly can be a daunting task due to the high visibility and perceived risk that come with modifying a working application.

We typically start these engagements with a discovery and interview process. We meet with application architects, developers, dev-ops staff, and business sponsors. We ask them to explain the current application architecture and workflow and tell us their biggest pain points.  By interviewing stakeholders with a variety of roles and backgrounds, we are able to get a better understanding of both the application profile and the client’s business needs and wants. Collaborating with the business in this way results in a higher likelihood of success.

When we are wearing our architect and designer hats, we find we can get right to the heart of the matter by asking the following questions:

  1. Can you make use of more cloud managed services?
  2. What workloads can you automate and move from managed-pets to cattle?
  3. What workloads can utilize dynamic scaling, or can be transformed to tolerate lost resources?

Typical client responses:

  1. “We want to migrate from running actively managed databases to a cloud-managed database service, or we want to use a SaaS logging service for log aggregation, analysis, and alerting.”
  2. “We want to be able to deploy to our app several times a day without downtime.”
  3. “We were looking at using spot instances to cut down both our records processing time and cost.”

Helping our clients solve these types of problems produces tangible benefits, both in terms of capabilities and cost optimization. For example, if we help a client automate and streamline code deployments, they can ship code much faster and spend less time on the administrative overhead of managing multiple manual deployment processes.

Once discovery is complete, the real fun begins: implementation.

Let’s examine each of the example client responses above in more detail, and see how we help make improvements.

“We want to migrate from running actively managed databases to a cloud-managed database service, or we want to use a SaaS logging service for log aggregation, analysis, and alerting.”

I often see client deployments that are running in the cloud but aren’t taking full advantage of the cloud’s capabilities. For example, in AWS, I’ve seen multiple environments running PostgreSQL and/or MySQL databases on dedicated EC2 instances. When I ask the account owners why they aren’t running on RDS, their answers are typically something along the lines of “Well, the app was deployed some time ago,” or “We wanted to try to save costs by running multiple services on one server.” They are failing to take advantage of all of the value-adds that a service like RDS provides: automated snapshots and backups, automated failover when running multi-AZ, easy creation of read replicas, automated minor version upgrades, and more. When the costs and management overhead are weighed against these capabilities, it quickly becomes apparent that the managed service offering is the better deal.
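This trade-off is easy to put on paper. The sketch below compares the total monthly cost of a self-managed database against a managed service once engineer time is counted; all prices, hours, and the hourly rate are hypothetical placeholders to substitute with your own figures from the AWS pricing pages and your team's time tracking.

```python
ADMIN_HOURLY_RATE = 75.0  # assumed loaded cost of an engineer-hour (USD, hypothetical)

def monthly_tco(instance_cost, admin_hours, hourly_rate=ADMIN_HOURLY_RATE):
    """Instance spend plus the soft cost of hands-on administration."""
    return instance_cost + admin_hours * hourly_rate

# Self-managed on EC2: cheaper sticker price, but patching, backups,
# and failover drills consume engineer time every month.
self_managed = monthly_tco(instance_cost=150.0, admin_hours=12.0)

# RDS: higher instance price, but snapshots, minor-version upgrades,
# and multi-AZ failover are handled by the service.
rds = monthly_tco(instance_cost=220.0, admin_hours=1.0)

print(f"self-managed: ${self_managed:,.2f}/mo")  # $1,050.00/mo
print(f"rds:          ${rds:,.2f}/mo")           # $295.00/mo
```

With even a modest estimate for administration time, the "cheaper" self-managed option ends up several times more expensive per month.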

The same logic applies to building a log aggregation solution in-house versus using one of the several excellent cloud-native SaaS offerings in this domain, such as Elastic, Datadog, Splunk, or Loggly.

As AWS is fond of saying: let the cloud do the undifferentiated heavy lifting for you, so you can focus on what you do best.

By far the biggest cost savings this approach offers our clients is freeing up administration time for their developers and DevOps staff.

“We want to be able to deploy to our app several times a day without downtime.”

This is a pretty common request across several of our clients. They realize that without automated deployment capabilities, they spend far too much time managing, verifying, and fixing deployment jobs. This consumes a lot of valuable time for developers, QA, and DevOps staff.

We also often see clients running multiple versions of their app across different environments, such as development, QA, staging, and production. Over time, each environment usually drifts: they may use slightly different deployment processes for each environment, or still rely on some manual steps. They may not have standardized on a way to securely store and pass in environment parameters.

If they have the skill set and are open to a re-architecture, we may recommend building out a microservices version of their application using technologies like Kubernetes or Docker, together with managed services such as Amazon’s EKS or ECS. Microservices are a powerful solution because the underlying orchestration engine can manage capabilities such as auto-scaling, load balancing, and failure recovery. This also allows the client to run the new and old environments side by side, so they can focus on updating and migrating one service at a time.

We usually see clients realize both hard and soft cost savings when they transition to microservices. Hard-dollar savings come from running the same number of services, or more, on fewer compute instances. Soft-dollar savings come from spending less time on the care and feeding of “managed pets,” a.k.a. individual instances, and instead managing a fleet of “cattle”: largely undifferentiated instances that the orchestration engine creates and deletes to meet scaling thresholds.
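The hard-dollar side of that consolidation is essentially a bin-packing problem: an orchestrator packs many small service footprints onto a few nodes instead of giving each service its own instance. A minimal first-fit sketch, using hypothetical vCPU requests and node sizes:

```python
def instances_needed(service_cpus, instance_cpu):
    """First-fit-decreasing bin packing: instances needed to host all services."""
    bins = []  # remaining free vCPU on each instance
    for cpu in sorted(service_cpus, reverse=True):
        for i, free in enumerate(bins):
            if free >= cpu:
                bins[i] -= cpu  # service fits on an existing instance
                break
        else:
            bins.append(instance_cpu - cpu)  # start a new instance
    return len(bins)

# Six services, one per instance, would need 6 nodes. Packed by an
# orchestrator onto 4-vCPU nodes (requests are hypothetical):
services = [1.0, 0.5, 1.5, 0.5, 2.0, 1.0]
print(instances_needed(services, instance_cpu=4.0))  # 2
```

Real schedulers juggle memory, affinity, and headroom as well, but the basic savings mechanism is the same: fewer, better-utilized instances.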

But regardless of whether you run “managed pets” or “cattle,” automated deployments bring numerous benefits, including increased efficiency and collaboration and faster diagnosis and repair of issues.

We often incorporate infrastructure as code into CI/CD deployments by having the pipeline run infrastructure-as-code updates (for example, with a Terraform apply). The pipeline may also run an image bakery, which pulls the latest image along with the latest code build so that there is one artifact to deploy. By utilizing git repos, tags, and environment variables, we are often able to apply the same build process across multiple accounts and environments, which creates significant benefits and cost reductions for our clients. Combine all of this with a blue-green style deployment, and the application stays online even when a deployment fails.
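The blue-green idea can be sketched in a few lines, independent of any particular CI/CD product. The deploy and health-check callables below are hypothetical stand-ins for real pipeline steps (say, a Terraform apply followed by a load-balancer target swap); the point is that traffic flips only after the idle environment passes its checks.

```python
def blue_green_deploy(state, new_version, deploy, healthy):
    """Deploy to the idle color; flip traffic only if health checks pass."""
    idle = "green" if state["live"] == "blue" else "blue"
    deploy(idle, new_version)  # stand-in for the pipeline's deploy step
    if healthy(idle):
        return {"live": idle,
                "versions": {**state["versions"], idle: new_version}}
    return state  # failed health check: the old environment keeps serving

state = {"live": "blue", "versions": {"blue": "v1", "green": "v1"}}
ok = blue_green_deploy(state, "v2",
                       deploy=lambda env, version: None,  # no-op stand-in
                       healthy=lambda env: True)
print(ok["live"])  # green -- traffic flipped after passing checks
```

Because a failed deployment never touches the live environment, the application stays online while the team investigates.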

“We were looking at using spot instances to cut down both our records processing time and cost.”

This seems to be a pretty universal expectation of IT across all organizations: “do more with less.” When our clients approach us with a request like this, we usually focus on how the application itself can be re-architected. By understanding the current application and its workflow, we can look for areas to improve performance and cut costs. What parts of the application can move to serverless? Can nightly or batch processing be moved to spot instances (on AWS and Azure; preemptible VMs on GCP), which can offer 80 to 90% cost reductions compared to on-demand instances? The initial answer to this question is almost always “No,” but on further review, and by digging into business requirements, we find that many applications can in fact move data processing from a real-time to a near-time model. By introducing queues and data pipelines, ephemeral instances can replace more expensive on-demand or reserved instances.
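The queue-based pattern can be sketched with Python’s standard-library queue as a stand-in for a managed service like SQS. Because each worker pulls one record at a time, a reclaimed spot instance simply leaves the remaining records on the queue for its replacement; the record counts and the doubling step below are hypothetical placeholders for real work.

```python
import queue

def process_batch(work, interrupt_after=None):
    """Drain the queue; optionally simulate a spot interruption after N records."""
    done = []
    while not work.empty():
        if interrupt_after is not None and len(done) >= interrupt_after:
            break  # instance reclaimed -- unprocessed records stay queued
        done.append(work.get() * 2)  # stand-in for the real per-record work
    return done

q = queue.Queue()
for record in range(5):
    q.put(record)

first = process_batch(q, interrupt_after=3)  # spot instance reclaimed mid-run
rest = process_batch(q)                      # replacement worker drains the rest
print(first + rest)  # [0, 2, 4, 6, 8] -- every record processed across two workers
```

A production version would also need idempotent processing and visibility timeouts so that a record in flight at interruption time is retried, but the core idea holds: the queue, not any single instance, owns the work.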

I hope the above process and examples have illustrated how Foghorn approaches collaboration with our clients, working with them to architect and build solutions that add lasting value to their operations.

If you have upcoming cloud initiatives or existing applications and would like help with architecture updates and cost savings, please reach out to us. We’d love to hear from you.
