A few years ago, I was working with a customer whose application team relied solely on CloudWatch Logs. Because the company was heavily regulated, its security team required a SIEM (Security Information and Event Management) system to correlate, aggregate, monitor, and audit its AWS servers as well as its on-prem network. As a result, the security team implemented ElasticSearch (ES) and Kibana and mandated that all future application logs be delivered to the ES cluster. To make the transition easier, the security team provided a Lambda that CloudWatch Log Groups could be subscribed to and that would stream their logs to the ES cluster.
In this blog post, I’ll share the solution we built to stream CloudWatch Logs into ElasticSearch in a cost-effective way.
Due to the transition to a new SIEM, the application team needed a solution to stream future logs. A few key elements were taken into account:
- Because the customer was subject to strict regulatory requirements, administrative policies prohibited manual changes to the production environment. An exception would be granted in a break-fix situation; any other change had to be automated and implemented via their IaC (Infrastructure as Code) tool of choice, Terraform.
- Within the customer’s AWS account, the application team used many AWS services, including API Gateway, AWS WAF, and EC2 with the CloudWatch agent installed. At the time, many of these services had limited native integrations, so the only option was to stream their logs to CloudWatch Log Groups, which in turn required the use of a Lambda.
- As an added challenge, the CloudWatch Log Groups for many of these services were not generated by the application team’s IaC but were created dynamically by AWS. As a result, Terraform could not add a subscription to those Log Groups, which called for a creative solution.
Since the customer used Terraform, we built a Terraform module that created the following AWS resources:
- CloudWatch Event running every x minutes
- Environment Variables
- ElasticSearch Cluster ARN (Amazon Resource Name)
- Desired filter on Logs being sent
- Length of retention of logs
- Lambda Code
- Environment Variables
- ElasticSearch Cluster
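As a rough illustration of how a Lambda might consume the environment variables listed above, here is a minimal Python sketch. The variable names `DESTINATION_ARN`, `FILTER_PATTERN`, and `RETENTION_DAYS`, the `Settings` class, and the 30-day default are all hypothetical; the original module's names and implementation language are not given here.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    destination_arn: str  # ARN that subscribed log events are delivered to
    filter_pattern: str   # CloudWatch Logs filter pattern; "" matches every event
    retention_days: int   # retention period to apply to each Log Group


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the three settings from environment variables (names are hypothetical)."""
    try:
        destination_arn = env["DESTINATION_ARN"]
    except KeyError:
        raise ValueError("DESTINATION_ARN must be set") from None

    # The 30-day default is an assumption made for this sketch.
    retention_days = int(env.get("RETENTION_DAYS", "30"))
    if retention_days <= 0:
        raise ValueError("RETENTION_DAYS must be a positive number of days")

    return Settings(destination_arn, env.get("FILTER_PATTERN", ""), retention_days)
```

Keeping these values in environment variables means Terraform can set them per environment without touching the Lambda code. Note that CloudWatch Logs accepts only a fixed set of retention values (for example 30, 90, or 365 days), so a real module would likely validate the input against that set.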
The Lambda code was the heart of the solution. The code did the following:
- Gets all CloudWatch Log Groups
- Checks each Log Group to see whether it has a subscription
- If it does, checks whether it is the filter we want (hashed_name == filter_names); if not, removes it
- If there are no subscriptions, or a subscription was removed, attaches the subscription
- Puts the retention policy on the CloudWatch Log Group
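The steps above can be sketched in Python with boto3's CloudWatch Logs client. This is an illustration under stated assumptions, not the original Lambda: the function names, the environment variable names, and the way the filter name is derived from a hash of the destination and pattern are all hypothetical. The boto3 calls used (`describe_log_groups`, `describe_subscription_filters`, `delete_subscription_filter`, `put_subscription_filter`, `put_retention_policy`) are real client operations.

```python
import hashlib


def expected_filter_name(destination_arn: str, filter_pattern: str) -> str:
    """Derive a stable filter name from the desired configuration (assumed scheme)."""
    digest = hashlib.sha256(f"{destination_arn}|{filter_pattern}".encode()).hexdigest()
    return f"es-stream-{digest[:16]}"


def reconcile_log_groups(logs, destination_arn, filter_pattern, retention_days):
    """Make every Log Group carry the wanted subscription filter and retention.

    `logs` is a boto3 CloudWatch Logs client (or any object with the same methods).
    Returns the names of the Log Groups that received a new subscription.
    """
    wanted = expected_filter_name(destination_arn, filter_pattern)
    newly_subscribed = []

    paginator = logs.get_paginator("describe_log_groups")
    for page in paginator.paginate():
        for group in page["logGroups"]:
            name = group["logGroupName"]
            existing = logs.describe_subscription_filters(logGroupName=name)[
                "subscriptionFilters"
            ]

            has_wanted = False
            for sub in existing:
                if sub["filterName"] == wanted:
                    has_wanted = True
                else:
                    # Any filter other than the wanted one is removed.
                    logs.delete_subscription_filter(
                        logGroupName=name, filterName=sub["filterName"]
                    )

            if not has_wanted:
                logs.put_subscription_filter(
                    logGroupName=name,
                    filterName=wanted,
                    filterPattern=filter_pattern,
                    destinationArn=destination_arn,
                )
                newly_subscribed.append(name)

            logs.put_retention_policy(logGroupName=name, retentionInDays=retention_days)

    return newly_subscribed


def handler(event, context):
    """Lambda entry point; the environment variable names are hypothetical."""
    import os
    import boto3  # imported here so the logic above can be exercised without AWS

    return reconcile_log_groups(
        boto3.client("logs"),
        os.environ["DESTINATION_ARN"],
        os.environ.get("FILTER_PATTERN", ""),
        int(os.environ["RETENTION_DAYS"]),
    )
```

Because the function only converges each Log Group toward the desired state, running it repeatedly on a schedule is safe: Log Groups that already have the wanted filter are left subscribed, and only new or mismatched ones are changed.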
By building this out as a Terraform module, we not only met the requirement of attaching the subscription to the ElasticSearch cluster in an automated fashion but also made it possible for other application teams to use the module. With this solution, all existing Log Groups were subscribed, and any Log Groups created later were picked up and subscribed the next time the Lambda ran. As an added bonus, using a serverless Lambda solution kept the costs extremely low. I hope this architecture helps those of you looking to build similar solutions. Please feel free to reach out for additional information or guidance.