Capacity Management – Cloud Style

The Perfect Capacity Curve

We’ve all seen the picture which clearly depicts the easiest cloud computing benefit to visualize – capacity on demand. Here it is one more time:

At first this image evokes feelings of joy – nirvana! No longer do we need to worry about capacity management. No more forecasting demand months ahead of time. No more watching expensive hardware depreciate in the datacenter when demand is slower than expected. No more dire consequences if demand exceeds planned capacity. Infrastructure when we need it, adjusted hourly.

So, if we move to public cloud infrastructure, can we ditch the ITIL processes for capacity management?

Reasons for Capacity Management Reviewed

Why do we worry about capacity management?  The two main reasons to build capabilities around capacity management are:

  1. Ensure availability of our applications
  2. Minimize cost of infrastructure required to do so

No surprises here. So, let’s revisit the question: do we need to worry about capacity management if we are using public cloud as our infrastructure? At first glance, it appears that we are in pretty good shape. For simplicity, we’ll assume that our infrastructure deployment has been fully automated so that our use of on-demand IaaS can exactly match our capacity requirements. In this case, our applications will always be available, and we will be procuring exactly the amount of infrastructure needed.

But something is missing.  We succeeded in purchasing the minimum volume of infrastructure required to ensure application availability, but the goal was to minimize cost, not volume.  So, have we minimized cost?  Well, that depends on the pricing options that our cloud infrastructure provider offers.

Cloud Pricing Models

Just as conventional enterprise data centers and managed hosting providers tightly manage their capacity, so do public cloud providers. The more stable and forecastable the demand, the higher the utilization, and the less the provider spends on unused capacity. Cloud providers have quickly learned that they can offer lower prices if their customers are willing to guarantee usage. The larger the volume guarantee and the longer the term, the better the price. As an example, I’ve listed hourly pricing for an AWS EC2 M1 medium Linux instance at various volume and term commitment levels (us-east):

Volume    Term      Hourly Price
None      None      $0.12
Light     1 year    $0.068
Light     3 year    $0.054
Medium    1 year    $0.042
Medium    3 year    $0.033
Heavy     1 year    $0.028
Heavy     3 year    $0.023

Recall that in our example we committed to pay only for what we use. With no term or volume commitment comes the highest hourly price. So, by minimizing the volume of cloud resources purchased, have we actually minimized our cost? As the table shows, if we have stable, forecastable workloads, we can save up to 80% over on-demand pricing. But the cost of these savings is a lack of flexibility. You are locked into a 3 year commitment, very similar to purchasing your own hardware (although probably at a much lower cost). Enter capacity management for the public cloud.
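To make the arithmetic concrete, here is a small sketch that computes the discount relative to on-demand for each row of the price table above. The prices are the ones quoted in this article; the function and variable names are just illustrative.

```python
# Hourly prices from the table above (AWS EC2 M1 medium Linux, us-east).
ON_DEMAND = 0.12

COMMITTED = {
    ("Light", "1 year"): 0.068,
    ("Light", "3 year"): 0.054,
    ("Medium", "1 year"): 0.042,
    ("Medium", "3 year"): 0.033,
    ("Heavy", "1 year"): 0.028,
    ("Heavy", "3 year"): 0.023,
}


def savings_pct(price, baseline=ON_DEMAND):
    """Percentage saved per hour versus the on-demand baseline."""
    return round((1 - price / baseline) * 100, 1)


for (volume, term), price in COMMITTED.items():
    print(f"{volume:<7}{term:<8}${price:<7}{savings_pct(price)}% cheaper")
```

The best case, Heavy with a 3 year term, works out to roughly 81% below on-demand, which is where the "up to 80%" figure comes from.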

Public Cloud Capacity Management Concepts

By being smart about how you purchase public cloud infrastructure, you can greatly accelerate your savings, and free up huge chunks of infrastructure budget to use for more innovative IT initiatives.  I’ve given a few tips below:

1. Standardize compute resources used

There are dozens of different EC2 configurations to choose from, and each of your workloads might run best on a different one. But volume and term pricing usually applies to a specific configuration. By standardizing on a few configurations, your committed instances become reusable across workloads, and you will be able to commit to higher volumes with less risk of unused capacity.

2. Track your usage trends

By understanding your past usage, you will become comfortable committing to term and volume in exchange for lower pricing. The obvious first step is to commit to your annual minimum usage, but you can do better! A committed instance costs so much less per hour that it can sit idle part of the time and still be cheaper than buying the same hours on demand. As a result, in most cases the least expensive combination of on-demand and committed reservations includes some overcapacity. By trusting your spreadsheets, you’ll become comfortable over-committing to resources, confident that you are actually minimizing your cost by doing so.
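Here is a minimal sketch of that spreadsheet exercise. It rests on some loud assumptions: the hourly demand profile is made up, a committed instance is billed at its rate for every hour whether it is used or not (a simplification of real reserved pricing), and the two prices are the on-demand and Heavy 1 year rates from the table above.

```python
ON_DEMAND_PRICE = 0.12   # $/hour, from the price table
COMMITTED_PRICE = 0.028  # $/hour, Heavy 1 year; assumed billed every hour


def total_cost(demand, committed):
    """Cost of serving hourly demand with `committed` reserved instances.

    Committed instances are paid for every hour (assumption); any demand
    above the commitment is bought on demand.
    """
    cost = 0.0
    for needed in demand:
        cost += committed * COMMITTED_PRICE
        cost += max(needed - committed, 0) * ON_DEMAND_PRICE
    return cost


def cheapest_commitment(demand):
    """Try every commitment level from 0 to peak demand; return the cheapest."""
    return min(range(max(demand) + 1), key=lambda c: total_cost(demand, c))


# Hypothetical day: 12 quiet hours, 8 busy hours, 4 peak hours.
demand = [2] * 12 + [6] * 8 + [10] * 4

best = cheapest_commitment(demand)
print("minimum usage:", min(demand))
print("cheapest commitment:", best)
print("cost at minimum usage:", round(total_cost(demand, min(demand)), 3))
print("cost at cheapest commitment:", round(total_cost(demand, best), 3))
```

With these numbers, the cheapest commitment is 6 instances, not the minimum usage of 2, even though 4 of them sit idle for half the day. Under the stated assumptions, committing pays off for any instance that would be used more than about 23% of the time (0.028 / 0.12).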

3. Plan your batch workloads

If you’ve done a good job of minimizing your cost as described above, you may well have unused capacity at certain times of the day, week, or month. You can view this capacity as “free compute”, since it costs more to not buy it than it does to buy it. If you have batch workloads, you can schedule them into this “free compute” to further reduce your infrastructure requirements and optimize your cost.
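As a rough illustration, the sketch below counts how many idle committed instance-hours are available for batch work. The demand profile, the commitment level, and the size of the batch job are all hypothetical.

```python
def idle_committed_hours(demand, committed):
    """For each hour, how many committed instances are paid for but unused."""
    return [max(committed - needed, 0) for needed in demand]


def batch_fits(demand, committed, batch_instance_hours):
    """True if the batch job fits entirely into idle committed capacity."""
    return sum(idle_committed_hours(demand, committed)) >= batch_instance_hours


# Hypothetical day: 12 quiet hours, 8 busy hours, 4 peak hours.
demand = [2] * 12 + [6] * 8 + [10] * 4
committed = 6  # hypothetical number of reserved instances

idle = idle_committed_hours(demand, committed)
print("idle instance-hours per day:", sum(idle))
print("40 instance-hour batch job fits:", batch_fits(demand, committed, 40))
```

In this example there are 48 idle instance-hours per day, all during the quiet period, so a nightly job needing 40 instance-hours adds nothing to the bill.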

4. Play the market

All of the options we’ve discussed so far involve the buyer committing to pay for resources, which requires the supplier to commit to provide those resources. The more we commit to pay for, the better the value. But can we do better? In order to guarantee availability of resources, our IaaS vendor has to over-provision to some degree. At some point it becomes obvious that some of these resources will not be purchased, and the provider may want to sell them at fire-sale prices at the last minute. Consider it similar to working for an airline and flying standby. If there is a seat available, you pay almost nothing, but if the flight sells out, you don’t get on the plane.

AWS offers ‘spot instances’, a type of cloud compute resource whose price is determined by supply and demand, and can change on an hour by hour basis, depending on utilization in the region. At the time of this article, a medium spot instance in the US East region was $0.013 / hour, more than 40% below the price of committing to full utilization for a 3 year term, all without any volume or term commitments. The catch? It might not be there when you need it, or the price may change. Prices have been known to spike to several dollars per hour for spot instances, much higher than the on-demand price. If your bid is below the current market price, you lose your instance! Probably not a smart idea to use a spot instance as your SMTP gateway, but by getting creative you can use spot instances to further lower your infrastructure costs.
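To get a feel for the trade-off, here is a toy simulation of a single spot instance. It is a simplification under stated assumptions, not a model of the actual AWS mechanism: the hourly price series is invented (only the $0.013 figure comes from above), you pay the market price for each hour it is at or below your bid, and the instance is lost whenever the price rises above the bid and re-requested once it falls back.

```python
def simulate_spot(prices, bid):
    """Return (hours_run, total_cost, interruptions) for one spot instance.

    Assumptions: one price per hour; you pay the market price, not your bid;
    the instance runs only while the market price is at or below the bid.
    """
    hours_run = 0
    total_cost = 0.0
    interruptions = 0
    running = False
    for price in prices:
        if price <= bid:
            hours_run += 1
            total_cost += price
            running = True
        else:
            if running:
                interruptions += 1  # we were outbid and lost the instance
            running = False
    return hours_run, total_cost, interruptions


# Hypothetical price history in $/hour, including one spike to $3.00.
prices = [0.013, 0.013, 0.02, 3.00, 0.013, 0.013]

hours, cost, lost = simulate_spot(prices, bid=0.05)
print(f"ran {hours} of {len(prices)} hours for ${cost:.3f}, interrupted {lost} time(s)")
```

A workload that can tolerate that one lost hour pays about $0.07 for five hours of compute. Raising the bid above the spike would keep the instance running, at the price of one very expensive hour.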

5. Sell your over-subscriptions

Uh oh, bought too many reserved instances, and now you are stuck paying for resources you aren’t using? All is not lost. Often (as is the case with AWS) you can sell your reserved instances to others and recoup most of your investment. This reduces the risk of over-committing and allows you to be more aggressive in your procurement strategy.

Conclusion

As we’ve discovered, although capacity management in the cloud is not required to ensure availability of service, it is key to optimizing your cost. The process and capabilities are different from conventional capacity management, but the rewards can be great, offering you savings of up to 80% of your infrastructure cost. If you are going to move to the cloud, a solid capacity management strategy and process are well worth the effort to develop.