Reducing Attack Surface with AWS Allowlisting

Mar 07, 23

AWS currently has somewhere between two and three hundred unique services, thirty-going-on-thirty-five regions, and almost thirteen thousand unique APIs.

Not to mention “The 17 Ways to Run Containers on AWS.”

This variety offers companies building on the platform enormous flexibility in architecture and deployment. The catalog also includes numerous point solutions, like Ground Station (satellite management as a service) and Braket (quantum computing).

Risks Abound

This breadth is a double-edged sword. AWS’s services operate on an opt-out model and are enabled by default. Regions followed the same model prior to March 20, 2019, at which point all new regions were made opt-in.

These defaults expose customers to a set of risks across unused services and regions.

By way of comparison, GCP services are disabled by default and require explicit adoption.

Attackers take advantage of this fact:

  1. They often leverage unused regions (T1535): These regions may lack the monitoring applied to adopted regions, and they can show gaps in the enablement and coverage of security services like GuardDuty. In the worst case, some services are not supported at all in a new region; for example, eu-north-1 was GA in December 2018 but didn’t have GuardDuty available until May 2019.
  2. Their goals may include services that are otherwise unneeded: Attackers, especially opportunistic (and potentially automated) ones, have a variety of exploitation, escalation, and impact vectors in the wild. Notably, this has recently included targeting of SES. Another example is Denonia, which specifically targeted Lambda.

Reducing Attack Surface

AWS gives customers the ability to guardrail their environments: Service Control Policies (SCPs) can be used to allowlist both regions and services, as sketched below.
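
To make this concrete, here is a minimal sketch of what a combined region and service allowlist SCP can look like, built as a Python dict for illustration. The regions and service prefixes are hypothetical placeholders, and a real policy needs exemptions for global services (see Prior Work below).

```python
import json

# Hypothetical allowlists; derive your own from real usage data.
ALLOWED_REGIONS = ["us-east-1", "us-west-2"]
ALLOWED_SERVICES = ["ec2", "s3", "lambda", "iam", "cloudtrail"]

scp = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # Deny requests outside the allowlisted regions. A production
            # policy needs NotAction exemptions for global services (IAM,
            # CloudFront, etc.), as in AWS's "deny region" example SCP.
            "Sid": "RegionAllowlist",
            "Effect": "Deny",
            "Action": "*",
            "Resource": "*",
            "Condition": {
                "StringNotEquals": {"aws:RequestedRegion": ALLOWED_REGIONS}
            },
        },
        {
            # Deny any action whose service prefix isn't allowlisted.
            "Sid": "ServiceAllowlist",
            "Effect": "Deny",
            "NotAction": [f"{service}:*" for service in ALLOWED_SERVICES],
            "Resource": "*",
        },
    ],
}

print(json.dumps(scp, indent=2))
```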

These controls offer multiple benefits:

  1. They reduce the attack surface, allowing you to focus on sophisticated, layered controls against a small subset of regions and services.
  2. They serve as a service inventory, which is especially useful for reference when developing detections against uncommon services. (h/t Rodrigo Montoro (Sp0oKeR) in “AWS Threat Detection for NOT SO COMMON AWS Services” on the Cloud Security Podcast)
  3. They lend themselves to a clear production promotion process. Managing these allowlists offers a clean touch point for engineering to discuss and implement the necessary controls before enabling a new service or region in production.
  4. They decrease the burden of compliance analysis, communication, and enforcement. Completely eliminating unused services and regions provides a blanket resolution to compliance conversations and configuration concerns. This also ensures that there is no organic adoption of services or regions that may not be in scope for compliance. This is a more rigorous form of controls like aws-allowlister, which only disables services that are not in scope for AWS’s various compliance reports.

At Figma, we work to provide safe infrastructure primitives and guardrails within our cloud-native architecture to help maintain the pace of innovation while expanding our scale. Rolling out these allowlists is an example of a project we’ve tackled to continue to improve our cloud security posture beyond the AWS Security Maturity Roadmap.

p.s. we’re hiring!

Prior Work

There are a number of existing resources that we leaned on for this project. AWS has a “deny region” example SCP, which provides partial data on global services, an important edge case. Single-region services were also enumerated as part of the recent Fault Isolation Boundaries whitepaper. Other SCP examples are available for regions from asecure.cloud and for both services and regions from cloudsecdocs.com.

Scott Piper has a great resource on SCP Best Practices. It includes discussion of region and service allowlists.

Service and Region Allowlists in Practice

Despite the available templates, each organization must consider their specific usage patterns. We found it straightforward to identify used and unused regions, but remarkably challenging to generate an allowlist for services.

If you can manage to accomplish this early in your AWS environment’s lifecycle, it will be much easier than retrofitting!

Enumerating the services for all the resources used

The best guide to enumerating all AWS resources is Michael Kirchner’s. He “compared the number of different resource types that various methods are capable of finding,” including Resource Explorer, AWS Tag Editor, AWS Config, aws-nuke, CloudQuery, and the AWS Cloud Control API. His tool aws-list-resources leverages the Cloud Control API, which provides uniform API actions to list 522 different AWS resources.
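
As a rough sketch of that approach, the following walks the CloudFormation registry’s resource types and records which services have live resources in the current region via the Cloud Control API. Error handling is deliberately coarse, and the registry’s type names don’t always match IAM service prefixes, so treat the output as a starting point.

```python
import boto3
from botocore.exceptions import ClientError

cfn = boto3.client("cloudformation")
cc = boto3.client("cloudcontrol")

used_services = set()
paginator = cfn.get_paginator("list_types")
for page in paginator.paginate(Visibility="PUBLIC", Type="RESOURCE",
                               ProvisioningType="FULLY_MUTABLE"):
    for summary in page["TypeSummaries"]:
        type_name = summary["TypeName"]  # e.g. "AWS::S3::Bucket"
        try:
            listing = cc.list_resources(TypeName=type_name)
        except ClientError:
            continue  # many types need extra inputs or aren't listable
        if listing.get("ResourceDescriptions"):
            # Map "AWS::S3::Bucket" -> "s3" as a rough service prefix;
            # this does not always match the true IAM prefix.
            used_services.add(type_name.split("::")[1].lower())

print(sorted(used_services))
```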

We blended this service data with AWS Organizations’ last accessed data, Billing, and Tag Editor to provide maximal coverage. We then attempted to minimize the resulting list by removing undesired services with minimal or accidental usage. For example, several recently released services showed minor activity from users who had simply explored their console UI.
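
For the Organizations last accessed data, a sketch along these lines can work, assuming it runs from the management account; the entity path below is a placeholder for your own organization and root IDs.

```python
import time

import boto3

iam = boto3.client("iam")

# Kick off an organization-wide "last accessed" report for the root.
job_id = iam.generate_organizations_access_report(
    EntityPath="o-exampleorgid/r-exampleroot"  # hypothetical org/root path
)["JobId"]

# Poll until the report finishes.
while True:
    report = iam.get_organizations_access_report(JobId=job_id)
    if report["JobStatus"] != "IN_PROGRESS":
        break
    time.sleep(2)

# Services with a LastAuthenticatedTime have seen real use.
for detail in report.get("AccessDetails", []):
    if detail.get("LastAuthenticatedTime"):
        print(detail["ServiceNamespace"], detail["LastAuthenticatedTime"])
```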

Enumerating the event sources for all APIs used

This was necessary as not all “services” are associated with resources.

We were able to write a simple query using the Data Explorer feature of our SIEM to compile the unique event sources used in the preceding month. permissions.cloud was an exceptional resource for validating the right AWS permissions for each service.
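
For environments without a SIEM, a rough equivalent can be approximated directly against CloudTrail. Note that lookup_events is slow and heavily rate-limited, so a CloudTrail Lake or Athena query scales far better; this is only a sketch.

```python
from datetime import datetime, timedelta, timezone

import boto3

cloudtrail = boto3.client("cloudtrail")
start_time = datetime.now(timezone.utc) - timedelta(days=30)

event_sources = set()
for page in cloudtrail.get_paginator("lookup_events").paginate(StartTime=start_time):
    for event in page["Events"]:
        # "ses.amazonaws.com" -> "ses", a rough proxy for the IAM prefix
        event_sources.add(event["EventSource"].split(".")[0])

print(sorted(event_sources))
```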

Rolling out SCPs

Figma leverages a multi-account architecture based around AWS Organizations. Per good architectural practice, we segment production and non-production use cases across dedicated AWS accounts. This structure allowed us to use our Staging environment as a test bed for our SCPs. Before applying the Staging Service Allowlist, we communicated broadly to ensure users were aware of the upcoming change and had the necessary context in case of any issues.
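
A staging-first rollout can be sketched with the Organizations API; the OU ID and the (deliberately tiny) policy body below are placeholders.

```python
import json

import boto3

orgs = boto3.client("organizations")

# Hypothetical minimal allowlist document for the staging trial.
allowlist_document = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "ServiceAllowlist",
        "Effect": "Deny",
        "NotAction": ["ec2:*", "s3:*", "lambda:*"],  # placeholder allowlist
        "Resource": "*",
    }],
}

# Create the SCP, then attach it only to the Staging OU.
policy = orgs.create_policy(
    Name="staging-service-allowlist",
    Description="Deny services outside the approved allowlist (staging trial)",
    Type="SERVICE_CONTROL_POLICY",
    Content=json.dumps(allowlist_document),
)
orgs.attach_policy(
    PolicyId=policy["Policy"]["PolicySummary"]["Id"],
    TargetId="ou-xxxx-staging0",  # hypothetical Staging OU ID
)
```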

This is critical, as AWS does not provide an “audit” mode for SCPs or any other means of determining whether an SCP will introduce breaking changes. One improvement in recent years has been transparency about the type of policy responsible for access denied errors, which allows end users to identify an SCP as the source of access issues.

The Pain Points

Despite a careful approach to SCP development, we encountered oversights during testing in staging. Some significant takeaways:

  1. Allowlists do not play well with scanning and observability tooling. These tools are generally not SCP-aware and often demonstrate naive retry behavior upon encountering Access Denied errors. Consider the tradeoffs of exempting these tools’ roles in your SCPs.
  2. The inconsistency across IAM permissions and API methods requires community tools like permissions.cloud to disambiguate.
  3. As forewarned, AWS’s sample SCP does not capture all necessary global services.
  4. Despite our multiple techniques for permission enumeration, we ended up identifying permissions during staging testing that were missed in the draft policy. Some, like Service Quotas and the Resource Group Tagging API, were simply infrequently used. For others, we missed the connection between permitted services and their secondary IAM permissions, like ssmmessages for SSM and execute-api for API Gateway (see the sketch after this list). Finally, we missed certain features of core services, notably some Auto Scaling permissions. We offer these examples both as specific potential gaps and as evidence of the importance of a careful, collaborative rollout plan.
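
Here is a sketch of how the secondary-permission gap can be handled when expanding an allowlist into NotAction entries. The mapping covers only the pairs named above; permissions.cloud helps find others.

```python
# Hypothetical mapping of allowlisted services to the dependent IAM
# prefixes they rely on (only the pairs we hit in staging are shown).
SECONDARY_PREFIXES = {
    "ssm": ["ssmmessages"],         # SSM Session Manager traffic
    "apigateway": ["execute-api"],  # invoking deployed APIs
}

ALLOWED_SERVICES = ["ssm", "apigateway", "ec2", "s3"]

# Expand the allowlist with each service's secondary prefixes.
expanded = set(ALLOWED_SERVICES)
for service in ALLOWED_SERVICES:
    expanded.update(SECONDARY_PREFIXES.get(service, []))

not_action = sorted(f"{prefix}:*" for prefix in expanded)
print(not_action)
```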

Risks Reduced

Overall, this project substantially reduced our AWS attack surface, removing over 80% of AWS services without any production or productivity disruption. Now we can focus our energies on providing a paved, well-protected road across the services we actually use, without leaving attackers with so many options.