A while back, AWS EBS encryption moved to using KMS (Key Management Service). This was a welcome change, as KMS is a great service that enables some interesting security models, such as different AWS customers sharing KMS keys and allowing each other to encrypt items they hold on one another's behalf. That said, we did find something interesting that alludes to what is going on behind the scenes.
tl;dr, if you are using KMS, you can’t whitelist by SourceIP in IAM policies (unless you whitelist all of AWS’ public IP space, including other AWS customers).
One of our customers was using custom AMIs with encrypted EBS volumes that were originally created prior to the release of KMS. These AMIs were launched via knife-ec2, as this customer uses Chef. When we provisioned new instances, they came up in the Terminated state. The State Transition Reason was Internal Server Error.
This led us through a series of follow-up tasks to see what had changed and why our AMI, which was widely used in dev/test/prod, was failing. Those tasks (in order) were:
- Launch the AMI manually in the AWS console: this worked
- Launch a community AMI via knife-ec2 (no EBS encryption): this worked
- Create a new custom AMI using the same EBS encryption (built off the latest Ubuntu 14.04 AMI): this failed the same way in knife-ec2, but worked in the AWS console
- Create a new custom AMI not using EBS encryption and launch it via knife-ec2: this worked
At this point we started working with Chef support and the knife-ec2 team to see if this was an issue with the tool or with us. Those teams were not able to reproduce our issues so we decided to start looking in less obvious places.
We started looking at security in more detail. It was odd that the AMI launched from the AWS console but not from the knife-ec2 command, which uses AWS access keys instead of an AWS console login. Furthermore, we had not made any changes to security settings in IAM for a long time, so it was not an obvious place to look. For this customer, AWS console logins use on-premises Active Directory federation, and user logins assume roles. The IAM users (console login not enabled) who have AWS access keys (for the sole purpose of knife-ec2 provisioning) were using IAM groups. Those IAM groups were more restricted than the IAM roles (for federated console logins), but in both cases, access to KMS was allowed in IAM.
So we launched the encrypted EBS AMI using the AWS CLI run-instances command instead of knife-ec2 to see if the problem followed the use of the AWS access keys or if it stayed with knife-ec2. The AWS CLI produced the same end state as knife-ec2 (I should have tried the CLI before reaching out to the Chef / knife-ec2 team!).
That finally led us to the one IAM policy that was unique to IAM users with AWS access keys. This policy was a blanket deny if certain conditions were not met. Those conditions were:
- The AWS access keys were being used from an instance inside our VPC
- The AWS access keys were being used from a Source IP that is owned by the customer
- The AWS access keys were being used from a Source IP that is one of our AWS EIPs used for NAT
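The original policy is not reproduced here, so the following is only a minimal sketch of what a deny of this shape might look like and how it behaves. It assumes that meeting any one of the conditions above was enough to avoid the deny, and that the checks were expressed with the aws:SourceIp and aws:SourceVpc condition keys. The CIDR blocks, the VPC ID, and the helper names build_deny_policy and is_denied are hypothetical and exist only for illustration.

```python
import ipaddress
import json

# Hypothetical values: the real customer ranges, NAT EIPs, and VPC ID are not in the post.
ALLOWED_CIDRS = ["203.0.113.0/24", "198.51.100.10/32"]  # customer-owned range, NAT EIP
ALLOWED_VPC = "vpc-0example"


def build_deny_policy(allowed_cidrs, allowed_vpc):
    """Build a blanket-deny policy document as a plain dict.

    Condition operators inside one statement are ANDed, so the deny applies
    only when the request is outside the IP whitelist AND outside the VPC.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyOutsideTrustedNetworks",
                "Effect": "Deny",
                "Action": "*",
                "Resource": "*",
                "Condition": {
                    "NotIpAddress": {"aws:SourceIp": list(allowed_cidrs)},
                    "StringNotEquals": {"aws:SourceVpc": allowed_vpc},
                },
            }
        ],
    }


def is_denied(source_ip, source_vpc,
              allowed_cidrs=ALLOWED_CIDRS, allowed_vpc=ALLOWED_VPC):
    """Rough local approximation of the deny above, not real IAM evaluation."""
    addr = ipaddress.ip_address(source_ip)
    ip_trusted = any(addr in ipaddress.ip_network(cidr) for cidr in allowed_cidrs)
    # A missing VPC key (None) never equals the allowed VPC, so it is untrusted.
    vpc_trusted = source_vpc == allowed_vpc
    return not ip_trusted and not vpc_trusted


# Print the policy document so it can be inspected or pasted elsewhere.
print(json.dumps(build_deny_policy(ALLOWED_CIDRS, ALLOWED_VPC), indent=2))
```

Under these assumptions, a request from 203.0.113.25 passes, while a request arriving from any address that is not on the whitelist (192.0.2.50 is used as a stand-in in the checks for this sketch) is denied, even when it was made with the same access keys. That is the situation a call made by an AWS service on our behalf ends up in.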
I removed this deny condition and tested with the AWS CLI using the encrypted EBS AMI. This worked. I then tested the same AMI using knife-ec2, and this worked too.
So back to what is going on behind the scenes at KMS…
We opened a ticket with AWS, and they confirmed that the KMS service is acting on our behalf. Because those calls originate from AWS rather than from our own addresses, they will fail unless we open our whitelist policy to the entire AWS global IP range. We also asked if we could get the public IP range for KMS to add to our whitelist policy, which AWS was not able to provide. Opening the whitelist policy in order to leverage KMS is a compromise in our security posture, but it is required to use the service.
Denying actions when conditions such as Source IP or VPC ID aren’t met is very powerful. But it also brings you back to the reality of the public service that is AWS. While it would be great if AWS could group their public IP usage by service (like they do in some cases), it’s obvious why that would be difficult for them to do for all services (especially those that leverage EC2). An alternative solution would be for AWS to offer a VPC endpoint for KMS, but this has not happened yet. Keep AWS in mind when you start thinking about how to impose global security controls.
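To see which services AWS does publish ranges for, one option is the ip-ranges.json file that AWS publishes, where each prefix carries a service tag. The sketch below groups prefixes by that tag. The sample data is made up (documentation-only addresses), the function names are our own, and nothing here implies that calls made on your behalf by KMS come from any particular tagged range; AWS support could not give us one.

```python
import json
import urllib.request
from collections import defaultdict

IP_RANGES_URL = "https://ip-ranges.amazonaws.com/ip-ranges.json"


def fetch_ip_ranges(url=IP_RANGES_URL):
    """Download and parse the published ranges (needs network access)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


def prefixes_by_service(ip_ranges):
    """Group IPv4 prefixes by the 'service' tag on each entry."""
    grouped = defaultdict(set)
    for entry in ip_ranges.get("prefixes", []):
        grouped[entry["service"]].add(entry["ip_prefix"])
    return {service: sorted(prefixes) for service, prefixes in grouped.items()}


# Made-up sample in the same shape as the published file, using
# documentation-only address blocks rather than real AWS ranges.
SAMPLE = {
    "syncToken": "0",
    "createDate": "1970-01-01-00-00-00",
    "prefixes": [
        {"ip_prefix": "192.0.2.0/24", "region": "us-east-1", "service": "AMAZON"},
        {"ip_prefix": "192.0.2.0/24", "region": "us-east-1", "service": "EC2"},
        {"ip_prefix": "198.51.100.0/24", "region": "us-west-2", "service": "AMAZON"},
    ],
}

grouped = prefixes_by_service(SAMPLE)
for service, prefixes in sorted(grouped.items()):
    print(service, len(prefixes))
```

Calling prefixes_by_service(fetch_ip_ranges()) against the live file gives a quick view of how large the whitelist becomes once you fall back to a broad tag, which is the trade-off described above.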
Need help with your AWS security?
Although this example is very specific, additional security layers can be a powerful tool when implemented correctly. Foghorn is here to help you with your cloud security. Don’t hesitate to reach out!