As alluded to in a previous blog post, *We Saw. We Hacked. We Conquered.*, this is the follow-up with the architectural details.
Thorn builds technology to defend children from sexual abuse, with the goal of bringing this tech to every company that needs it. Their work focuses on helping companies detect and remove child abuse content from their platforms. For the Non-Profit Hackathon, we focused on making it easy for anyone using AWS and S3 to remove known child sexual abuse content.
Our solution had to account for these requirements:
- Easy to deploy
- Easy to use
Additionally, we wanted our solution to include the following features:
- Support large video and image files
- Support multiple notification subscribers
- Fast malicious object detection
- Automatically delete objects if enabled
- Support scanning all new objects
- Support configuring all new buckets
- Support scanning any existing bucket objects
- Support retroactive scanning when hashes change
So what did we build?
And how do you deploy it?
An AWS user launches a CloudFormation stack and provides some initial information as parameters:
- Bucket regex (what bucket filter criteria you want to use, defaults to *)
- File regex (what object filter criteria you want to use, defaults to *)
- Scan buckets? (does the user want to retroactively scan all existing buckets or not, defaults to no)
- Delete items? (does the user want to enable object deletion in addition to object notification, defaults to no)
- Email address (the email distribution group to subscribe the notifications to)
There is also a Slack webhook that can be updated to push notifications to a Slack channel for this AWS account; enabling it requires editing a few variables in that Lambda's Python script. If the user wishes to use SMS subscribers, the stack must be launched in the us-east-1 region.
And what does it do?
I will describe the full feature set, though some of these workflows will not exist if the user elects not to scan buckets. Once deployed, the retroactive bucket scan starts via a CloudWatch Event Rule with a cron schedule that kicks off a Lambda function. Once the bucket scan is complete, the Lambda function disables the CloudWatch Event Rule. This initial Lambda function finds all the existing buckets and then triggers two additional Lambda functions: the first is the configure bucket Lambda, the second is the scan bucket Lambda.
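A minimal sketch of what that bootstrap Lambda might look like. The function and rule names here are hypothetical (the real identifiers are set by the CloudFormation stack), and the bucket filter parameter is treated as a glob for illustration:

```python
import fnmatch
import json

# Hypothetical identifiers; the stack's real names come from CloudFormation.
CONFIGURE_FUNCTION = "configure-bucket"
SCAN_FUNCTION = "scan-bucket"
SCAN_RULE = "retroactive-scan-schedule"

def matching_buckets(bucket_names, pattern="*"):
    """Apply the stack's bucket filter parameter (treated as a glob here)."""
    return [name for name in bucket_names if fnmatch.fnmatch(name, pattern)]

def handler(event, context):
    import boto3  # provided by the Lambda runtime
    s3 = boto3.client("s3")
    lam = boto3.client("lambda")
    events = boto3.client("events")

    buckets = [b["Name"] for b in s3.list_buckets()["Buckets"]]
    for name in matching_buckets(buckets, event.get("bucket_filter", "*")):
        for function in (CONFIGURE_FUNCTION, SCAN_FUNCTION):
            lam.invoke(
                FunctionName=function,
                InvocationType="Event",  # asynchronous fan-out
                Payload=json.dumps({"bucket": name}),
            )

    # The retroactive scan only needs to run once, so the schedule disables itself.
    events.disable_rule(Name=SCAN_RULE)
```

Invoking the two downstream Lambdas asynchronously keeps the bootstrap function well under its timeout even on accounts with many buckets.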
The configure bucket Lambda has a very simple job: it takes a bucket via its invocation input and enables an S3 notification event for object creation. That event invokes the hash Lambda function.
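A sketch of that configuration step, assuming it uses `put_bucket_notification_configuration` and that the hash Lambda's ARN is supplied via a hypothetical environment variable:

```python
import os

def notification_config(hash_lambda_arn):
    """Notification configuration firing the hash Lambda on every object creation."""
    return {
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": hash_lambda_arn,
                "Events": ["s3:ObjectCreated:*"],
            }
        ]
    }

def handler(event, context):
    import boto3  # provided by the Lambda runtime
    boto3.client("s3").put_bucket_notification_configuration(
        Bucket=event["bucket"],
        NotificationConfiguration=notification_config(os.environ["HASH_LAMBDA_ARN"]),
    )
```

The `s3:ObjectCreated:*` wildcard covers puts, multipart upload completions, and copies, so large video uploads are caught regardless of how they arrive.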
The scan bucket Lambda processes all the objects in the bucket and invokes the hash Lambda function for each one.
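Roughly, that enumeration could look like the following; the hash Lambda's name is an assumption, and the file filter parameter is again treated as a glob:

```python
import fnmatch
import json

HASH_FUNCTION = "hash-object"  # hypothetical name for the hash Lambda

def matching_keys(keys, pattern="*"):
    """Apply the stack's file filter parameter (treated as a glob here)."""
    return [key for key in keys if fnmatch.fnmatch(key, pattern)]

def handler(event, context):
    import boto3  # provided by the Lambda runtime
    s3 = boto3.client("s3")
    lam = boto3.client("lambda")

    # Paginate so buckets with more than 1,000 objects are fully covered.
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=event["bucket"]):
        keys = [obj["Key"] for obj in page.get("Contents", [])]
        for key in matching_keys(keys, event.get("file_filter", "*")):
            lam.invoke(
                FunctionName=HASH_FUNCTION,
                InvocationType="Event",
                Payload=json.dumps({"bucket": event["bucket"], "key": key}),
            )
```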
The hash Lambda function is the first of the two main workflow stages: it computes MD5 hashes for new objects and stores them in DynamoDB. Finally, the hash Lambda function invokes the validate hash Lambda function.
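Hashing in a streaming fashion is what lets this stage handle large video files within Lambda's memory limits. A sketch, with hypothetical table and function names:

```python
import hashlib
import json

HASH_TABLE = "object-hashes"         # hypothetical DynamoDB table name
VALIDATE_FUNCTION = "validate-hash"  # hypothetical validate Lambda name

def md5_of_stream(chunks):
    """Hash the body incrementally so large files never sit in memory at once."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()

def handler(event, context):
    import boto3  # provided by the Lambda runtime
    body = boto3.client("s3").get_object(Bucket=event["bucket"], Key=event["key"])["Body"]
    digest = md5_of_stream(body.iter_chunks())

    boto3.resource("dynamodb").Table(HASH_TABLE).put_item(
        Item={"bucket": event["bucket"], "key": event["key"], "md5": digest}
    )
    boto3.client("lambda").invoke(
        FunctionName=VALIDATE_FUNCTION,
        InvocationType="Event",
        Payload=json.dumps({"bucket": event["bucket"], "key": event["key"], "md5": digest}),
    )
```

Persisting every hash in DynamoDB is also what makes retroactive re-checks cheap when the known bad hash set changes: the objects never have to be downloaded and hashed a second time.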
The validate hash Lambda function is the core engine of this workflow. It queries DynamoDB, validates the hash against the known bad hashes, and then notifies SNS (and deletes the object if deletion is enabled).
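That validation step might be sketched as follows; the table name, topic environment variable, and payload field names are all assumptions for illustration:

```python
import json
import os

BAD_HASH_TABLE = "known-bad-hashes"  # hypothetical DynamoDB table name

def build_alert(bucket, key, digest, url):
    """JSON payload published to SNS; field names are illustrative."""
    return json.dumps({"bucket": bucket, "key": key, "md5": digest, "presigned_url": url})

def handler(event, context):
    import boto3  # provided by the Lambda runtime
    table = boto3.resource("dynamodb").Table(BAD_HASH_TABLE)
    if "Item" not in table.get_item(Key={"md5": event["md5"]}):
        return  # hash is not in the known bad set

    s3 = boto3.client("s3")
    url = s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": event["bucket"], "Key": event["key"]},
        ExpiresIn=3600,  # pre-signed URL expires after an hour (assumed value)
    )
    boto3.client("sns").publish(
        TopicArn=os.environ["ALERT_TOPIC_ARN"],  # assumed environment variable
        Message=build_alert(event["bucket"], event["key"], event["md5"], url),
    )
    if os.environ.get("DELETE_ITEMS") == "yes":  # the "Delete items?" parameter
        s3.delete_object(Bucket=event["bucket"], Key=event["key"])
```

Publishing to a single SNS topic is what makes multiple notification subscribers (email, SMS, Slack, or a custom endpoint) a matter of adding subscriptions rather than changing code.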
In addition to the above retroactive workflow, our CloudFormation template also creates a CloudWatch Event Rule for new bucket creation that invokes the configure bucket Lambda function. This way, any buckets created after the initial CloudFormation deployment are picked up into the scanning workflow.
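One way such a rule can work is by matching CloudTrail-backed `CreateBucket` API calls; a sketch of the event pattern and the extraction the configure Lambda would need (this assumes CloudTrail is recording S3 management events):

```python
# CloudWatch Events pattern matching CloudTrail CreateBucket calls (a sketch).
CREATE_BUCKET_PATTERN = {
    "source": ["aws.s3"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventSource": ["s3.amazonaws.com"],
        "eventName": ["CreateBucket"],
    },
}

def bucket_from_create_event(event):
    """Pull the new bucket's name out of the CloudTrail-backed event."""
    return event["detail"]["requestParameters"]["bucketName"]
```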
And what do the notifications look like?
At its core, we wanted the notification payload to be easily consumed by any upstream service. While we only integrated and demonstrated email, SMS, and Slack, the JSON payload could just as easily be consumed by a web service, so custom applications could ingest and act on this information.
The basic email notification generates a message like the following:
The message includes a pre-signed URL to the S3 object for verification; this pre-signed URL expires for obvious reasons.
If Slack is configured, the Slack message looks like the following (with the same pre-signed URL structure):
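The Slack side is a plain incoming-webhook POST; a sketch with a placeholder URL (the real webhook URL is the variable edited in the Lambda script) and illustrative message wording:

```python
import json
import urllib.request

# Placeholder; the real webhook URL is the variable edited in the Lambda script.
SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/..."

def slack_payload(bucket, key, url):
    """Minimal Slack incoming-webhook message; wording is illustrative."""
    return {"text": f"Flagged object s3://{bucket}/{key} for review: {url}"}

def post_to_slack(payload):
    request = urllib.request.Request(
        SLACK_WEBHOOK_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(request)
```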
I feel compelled to call out what a great experience the hackathon was. The day was long, and the team was working pretty much nonstop until 30 minutes before judging. But to think that something you contributed to could positively influence change for a cause this critical is truly rewarding. Lastly, here I am working on this post at re:Invent.