Get the most out of your VPC


A brief history of VPC

Amazon Web Services introduced the Virtual Private Cloud for general availability back in 2009, and VPC has undergone a major transformation since.  VPC was originally designed to meet the requirements of enterprise customers with legacy applications and hybrid operating environments.  Although it supported only a subset of AWS features, it opened AWS as an option for many workloads that previously could not run on cloud infrastructure.  Fast forward to January of 2014: AWS no longer offers “EC2-Classic”, or EC2 outside of a VPC, to new accounts.  Not only does VPC now support all the features of EC2-Classic, it offers many features not available otherwise.  Clearly VPC is the future of EC2.  It’s not hard to use; in fact, all new accounts come with a default VPC.  I’m not going to make this post a beginner’s guide to VPC.  Instead, I’d like to share a few ways you can get the most out of your VPC.

Plan before you build, document what you plan

Although the default VPC is a good start, if you stop there you are severely limiting the benefits you can realize from the available tools.  As you begin to build, keep in mind that there is no tool in the AWS portal that lets you visualize what you have designed.  If you are ‘building as you go’, it will become increasingly difficult to picture your environment.  When designing VPCs for our customers, we always start with network diagrams.  These diagrams quickly illustrate the purpose of each component and ensure that what we are building will meet our customers’ needs.  We diagram at both the infrastructure and the data flow level.  The diagrams come in handy long after we build the environment, serving as references that allow any competent admin to quickly get up to speed, manage systems, and troubleshoot issues.

Security vs Functionality

Many of the VPC tools can be used to increase security, but make sure you are using each tool as it is intended.  With network ACLs and security groups available to you, it makes little sense to try to ‘lock down’ your internal subnets by crippling the route tables between them.  Belt and suspenders are great, and security should be implemented in layers, but avoid sacrificing the flexibility of your environment in exchange for little (if any) additional security.
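To illustrate layering security groups on top of network ACLs rather than mangling route tables, here is a sketch of a CloudFormation fragment.  The resource names, CIDR ranges, and rule numbers are hypothetical; the point is that the stateful security group and the stateless ACL entry each enforce a layer without touching routing:

```json
{
  "WebSecurityGroup": {
    "Type": "AWS::EC2::SecurityGroup",
    "Properties": {
      "GroupDescription": "Layer 1: stateful, instance-level; HTTPS only",
      "VpcId": { "Ref": "MyVPC" },
      "SecurityGroupIngress": [
        { "IpProtocol": "tcp", "FromPort": "443", "ToPort": "443", "CidrIp": "0.0.0.0/0" }
      ]
    }
  },
  "PrivateSubnetInboundAclEntry": {
    "Type": "AWS::EC2::NetworkAclEntry",
    "Properties": {
      "NetworkAclId": { "Ref": "PrivateNetworkAcl" },
      "RuleNumber": "100",
      "Protocol": "6",
      "RuleAction": "allow",
      "Egress": "false",
      "CidrBlock": "10.0.0.0/16",
      "PortRange": { "From": "0", "To": "65535" }
    }
  }
}
```

Note that routing stays untouched: both layers can be tightened or relaxed independently without breaking connectivity between subnets.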

Minimize instances in public subnets

One of the best ways to minimize exposure is to minimize the number of instances running in public subnets.  Dropping the public IP drastically reduces the threat vectors your servers are exposed to.  There are good reasons to run instances in public subnets, but you should always ask, “Does this box really need a public IP?”  We’ve built many environments where the only resources running in public subnets are elastic load balancers, NAT instances, and security devices.  Do you need more than that?  Challenge yourself and your team to decide.
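In CloudFormation terms, a private subnet is simply a subnet associated with a route table that has no route to an Internet gateway.  A minimal sketch, with hypothetical names and CIDRs:

```json
{
  "PrivateSubnet": {
    "Type": "AWS::EC2::Subnet",
    "Properties": { "VpcId": { "Ref": "MyVPC" }, "CidrBlock": "10.0.2.0/24" }
  },
  "PrivateRouteTable": {
    "Type": "AWS::EC2::RouteTable",
    "Properties": { "VpcId": { "Ref": "MyVPC" } }
  },
  "PrivateSubnetRouteTableAssociation": {
    "Type": "AWS::EC2::SubnetRouteTableAssociation",
    "Properties": {
      "SubnetId": { "Ref": "PrivateSubnet" },
      "RouteTableId": { "Ref": "PrivateRouteTable" }
    }
  }
}
```

Instances launched here receive no public IP and are unreachable from the Internet until you deliberately add a route and a gateway.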

NAT as a bottleneck

OK, you have a great design, with most of your instances in private subnets.  Traffic begins picking up, and your workloads start to slow down.  The confusing part is that the boxes responsible for the slow tasks seem to be running just fine.  Memory, CPU, network, I/O: every monitor is showing green.  Time to follow the data trail.  If your servers use services like S3 or other region-based services, your instances communicate with the public endpoints of those services, which means the traffic must flow through your NAT.  Time to either ditch that t1.micro NAT instance, or consider isolating the service and moving it to a public subnet.  Also consider that your NAT is a single point of failure if your production servers need to initiate connections to public endpoints.
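The reason all of that traffic funnels through one box is that the private subnets' default route points at the NAT instance.  A sketch of that route in CloudFormation (NatInstance and PrivateRouteTable are hypothetical resource names; the NAT instance also needs source/destination checking disabled):

```json
{
  "PrivateDefaultRoute": {
    "Type": "AWS::EC2::Route",
    "Properties": {
      "RouteTableId": { "Ref": "PrivateRouteTable" },
      "DestinationCidrBlock": "0.0.0.0/0",
      "InstanceId": { "Ref": "NatInstance" }
    }
  }
}
```

Every private subnet associated with that route table shares the same NAT path, so the instance type you pick here caps the outbound bandwidth of everything behind it.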

Protecting you from … yourself

We all think of security groups as a layer of protection from hackers.  Same goes for network ACLs.  But many times the biggest threat to our production environment is accidental access.  Accidentally leaving a production hostname in a staging deployment, or vice versa, can cause downtime or irreversible damage to production data.  Consider using these tools to isolate production from staging, as well as from evildoers.  To make access easy to track, we often reference security groups within security group policies (e.g. only servers in the Production-Web security group can access the Production-DB security group on port 3306).
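The Production-Web to Production-DB rule described above can be expressed directly in a security group policy.  A CloudFormation sketch with hypothetical resource names:

```json
{
  "ProductionDBSecurityGroup": {
    "Type": "AWS::EC2::SecurityGroup",
    "Properties": {
      "GroupDescription": "MySQL access from Production-Web servers only",
      "VpcId": { "Ref": "ProductionVPC" },
      "SecurityGroupIngress": [
        {
          "IpProtocol": "tcp",
          "FromPort": "3306",
          "ToPort": "3306",
          "SourceSecurityGroupId": { "Ref": "ProductionWebSecurityGroup" }
        }
      ]
    }
  }
}
```

Because the rule references a group rather than a CIDR block, a staging box can never reach the production database simply by virtue of its IP address; it would have to be deliberately placed in the Production-Web group.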

When one VPC just isn’t good enough

AWS offers many fine-grained controls that let you limit access to specific resources, but coverage is not universal.  Because of this, you may find yourself struggling to create seemingly simple permissions policies that allow users to freely experiment and test while still protecting production workloads from accidental portal clicks.  In many cases, the solution is multiple VPCs.  With the new VPC peering features, you can construct elegant multi-VPC infrastructures and more easily offer a flexible environment where developers are free to experiment without worrying about accidentally making the front page of Slashdot.
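A peering connection plus a route on each side is all it takes to connect two VPCs in the same region.  A one-directional sketch with hypothetical names and CIDRs (the production side needs a mirror-image route back):

```json
{
  "DevToProdPeering": {
    "Type": "AWS::EC2::VPCPeeringConnection",
    "Properties": {
      "VpcId": { "Ref": "DevVPC" },
      "PeerVpcId": { "Ref": "ProdVPC" }
    }
  },
  "DevToProdRoute": {
    "Type": "AWS::EC2::Route",
    "Properties": {
      "RouteTableId": { "Ref": "DevRouteTable" },
      "DestinationCidrBlock": "10.1.0.0/16",
      "VpcPeeringConnectionId": { "Ref": "DevToProdPeering" }
    }
  }
}
```

Because the route is explicit, you choose exactly which subnets in the dev VPC can reach the production CIDR, which keeps the blast radius of experimentation small.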

Configuration Management via CloudFormation

We’ve found CloudFormation a great tool for delivering custom VPCs to our customers’ accounts.  We can build and test without interfering with their account, and once complete, we can deliver in a format that allows our customers to easily rebuild their environment from scratch without relying on us.  The speed at which a CloudFormation template builds allows these templates to double as a great DR tool, letting our customers quickly rebuild their environment in a different region should a regional outage occur.
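A minimal template that stands up a VPC skeleton might look like the following.  This is a sketch to show the shape of such a template, not our delivery template; names and CIDRs are hypothetical:

```json
{
  "AWSTemplateFormatVersion": "2010-09-09",
  "Description": "Minimal VPC skeleton: one VPC, one public subnet, Internet access",
  "Resources": {
    "MyVPC": {
      "Type": "AWS::EC2::VPC",
      "Properties": { "CidrBlock": "10.0.0.0/16" }
    },
    "PublicSubnet": {
      "Type": "AWS::EC2::Subnet",
      "Properties": { "VpcId": { "Ref": "MyVPC" }, "CidrBlock": "10.0.0.0/24" }
    },
    "InternetGateway": {
      "Type": "AWS::EC2::InternetGateway"
    },
    "GatewayAttachment": {
      "Type": "AWS::EC2::VPCGatewayAttachment",
      "Properties": {
        "VpcId": { "Ref": "MyVPC" },
        "InternetGatewayId": { "Ref": "InternetGateway" }
      }
    },
    "PublicRouteTable": {
      "Type": "AWS::EC2::RouteTable",
      "Properties": { "VpcId": { "Ref": "MyVPC" } }
    },
    "PublicDefaultRoute": {
      "Type": "AWS::EC2::Route",
      "DependsOn": "GatewayAttachment",
      "Properties": {
        "RouteTableId": { "Ref": "PublicRouteTable" },
        "DestinationCidrBlock": "0.0.0.0/0",
        "GatewayId": { "Ref": "InternetGateway" }
      }
    },
    "PublicSubnetAssociation": {
      "Type": "AWS::EC2::SubnetRouteTableAssociation",
      "Properties": {
        "SubnetId": { "Ref": "PublicSubnet" },
        "RouteTableId": { "Ref": "PublicRouteTable" }
      }
    }
  }
}
```

Because every resource is declared in one file, tearing the stack down and rebuilding it in another region is a single create-stack call, which is what makes templates like this double as a DR tool.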

How can you benefit?

Although most VPCs are about 80% alike, it’s the 20% that is unique to your environment that can really unlock the potential of AWS for your specific workloads.  Want to learn more about how Foghorn can help you with your cloud initiative?  Give us a call!