Attribute-based access control (ABAC) bases fine-grained authorization on user and resource attributes. In AWS, these attributes are carried by tags, which can be attached to both principals and resources.
ABAC can be handy for shrinking complex IAM policies, simplifying their conditionals, and dynamically keeping permissions accurate as resources change.
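To make the pattern concrete, here's a minimal sketch (the role name and the team tag key are hypothetical placeholders, not from the original post): a principal may only launch instances whose team tag matches its own.

# Hedged sketch: allow ec2:RunInstances only when the new instance's "team"
# tag matches the caller's own "team" principal tag. A complete policy would
# also need ec2:CreateTags (gated on ec2:CreateAction) and the other
# resources RunInstances touches (image, subnet, etc.).
cat > abac-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": "ec2:RunInstances",
    "Resource": "arn:aws:ec2:*:*:instance/*",
    "Condition": {
      "StringEquals": { "aws:RequestTag/team": "${aws:PrincipalTag/team}" }
    }
  }]
}
EOF
aws iam put-role-policy \
  --role-name example-team-role \
  --policy-name abac-run-instances \
  --policy-document file://abac-policy.json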
Let’s take a look at the specific limitations Scott found.
2020 Problem | 2024 Status |
---|---|
Lack of privilege support: only ~43% of privileges for creating resources on AWS allow you to both tag the new resources and to restrict what tags are used for that. Notable exclusions: lambda:CreateFunction , dynamodb:CreateTable , kms:CreateKey , logs:CreateLogGroup , s3:CreateBucket , sqs:CreateQueue , and iam:CreateRole . | 🚧 Some improvement! We now get ~58% using the same methodology, despite the total number of create permissions almost doubling. Of the explicitly listed actions, only dynamodb:CreateTable and s3:CreateBucket still lack support for RequestTag . |
Lack of tooling: tools for analyzing IAM are not ABAC aware | 🚧 PMapper has added some minor support for tags. Other popular IAM tools like Cloudsplaining still completely lack support. Modern vendors tend to have strong support for global condition keys (aws:PrincipalTag , aws:ResourceTag and the like). Vendors may not support all resource-specific keys. |
SimulatePrincipalPolicy requires you to explicitly specify tags within the “context keys,” reducing its value | ❌ Unchanged. |
Zelkova | ❌ Unchanged. |
Limited capabilities of Tag Policies: you cannot enforce that a resource is tagged | ❌ Unchanged. |
Lack of support for working with multiple tag values | ❌ Unchanged. |
Lack of transitive tags for the creation of resources | ❌ Unchanged. |
AWS offers a useful overview of Services that work with IAM. This may be a more actionable lens on the limitation Scott noted. Be particularly wary of “Partial” support.
Andreas Wittig adds some color to this problem over on LinkedIn:
Only a few services -EC2 for example- allow you to restrict adding tags for new resources only. For example, it is not possible to restrict the kms:TagResource action in a way, so that tags can only be added when creating a new key. So controlling access to KMS keys by using tags is only possible in very static scenarios with a lot of manual effort.
In addition to the lack of pervasive support, AWS's two-pizza teams have shipped inconsistent interfaces for tag management.
Here is just a sample of the CRUD permissions across various services:
TagResource / UntagResource / ListTagsForResource
sns:ListTagsForResource, lambda:ListTags, dynamodb:ListTagsOfResource, apigateway:GetTags, kms:ListResourceTags
AddTagsToResource / RemoveTagsFromResource / ListTagsForResource
AddTags / RemoveTags / ListTags, used by CloudTrail
CreateTags / DeleteTags / DescribeTags, used by EC2 and Workspaces
iam:TagUser, sqs:UntagQueue, iam:ListRoleTags
acm:AddTagsToCertificate / acm:RemoveTagsFromCertificate / acm:ListTagsForCertificate
The CLI offers a similarly disjointed syntax for tag management.
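To make that concrete, adding one tag looks different from service to service (resource identifiers here are placeholders):

# Four services, four shapes for the same operation:
aws ec2 create-tags --resources i-0123456789abcdef0 --tags Key=team,Value=data
aws sns tag-resource --resource-arn arn:aws:sns:us-east-1:111111111111:my-topic --tags Key=team,Value=data
aws acm add-tags-to-certificate --certificate-arn arn:aws:acm:us-east-1:111111111111:certificate/example --tags Key=team,Value=data
aws sqs tag-queue --queue-url https://sqs.us-east-1.amazonaws.com/111111111111/my-queue --tags team=data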
As NCC Group's Rennie deGraaf diagnoses it:
Tagging support was added to many AWS services well after they were first released and ABAC support was grafted on later still
AWS has tried to address this via AWS Resource Groups Tagging. This has its own (relatively expansive) set of supported services.
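For services it covers, that API gives you one consistent shape (ARNs are placeholders):

aws resourcegroupstaggingapi tag-resources --resource-arn-list arn:aws:sqs:us-east-1:111111111111:my-queue --tags team=data
aws resourcegroupstaggingapi get-resources --tag-filters Key=team,Values=data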
Some of the oldest services have even more complicated limitations. For example, S3's PutObjectTagging and PutBucketTagging work on tag-sets: each call replaces all existing tags rather than mutating a single key-value pair.
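So safely adding a single bucket tag requires a read-modify-write; a sketch with jq (the bucket name is a placeholder, and a real version should also dedupe keys):

# PutBucketTagging replaces the whole tag-set, so merge before writing.
existing=$(aws s3api get-bucket-tagging --bucket my-example-bucket --query 'TagSet' --output json 2>/dev/null || echo '[]')
merged=$(echo "$existing" | jq '. + [{"Key":"team","Value":"data"}]')
aws s3api put-bucket-tagging --bucket my-example-bucket --tagging "{\"TagSet\": $merged}"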
Service Control Policies (SCPs) are a critical element of an AWS ABAC implementation. This reliance exposes AWS ABAC to the common pain points of AWS’s SCP limits (discussed previously: SCP Quotas).
There is no generic set of actions and conditions to "protect sensitive tags," making SCPs verbose. This is exacerbated by the inconsistent interfaces mentioned above, which introduce substantial per-service configuration within SCP guardrails.
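For illustration, here's a sketch of the shape such a guardrail takes: a deny statement that has to enumerate each service's tag-mutation actions (the action list is heavily abbreviated, and the tag key and exempted role are hypothetical):

cat > protect-abac-tags-scp.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "ProtectTeamTag",
    "Effect": "Deny",
    "Action": [
      "ec2:CreateTags",
      "ec2:DeleteTags",
      "iam:TagRole",
      "iam:UntagRole",
      "sns:TagResource",
      "sns:UntagResource"
    ],
    "Resource": "*",
    "Condition": {
      "ForAnyValue:StringEquals": { "aws:TagKeys": ["team"] },
      "StringNotLike": { "aws:PrincipalArn": "arn:aws:iam::*:role/tag-admin" }
    }
  }]
}
EOF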
One appealing application of ABAC would be for dynamic or temporary permissions. However, in practice I’ve found AWS ABAC ill suited to this purpose.
For dynamic, just-in-time grants, AWS ABAC is difficult to operationalize when you’re also using federated access via IAM Identity Center.
It's uncommon for IdPs to support programmatically modifying the relevant user attributes, especially at scale.
Another possible case for ABAC is two-party access (2PA), also known as the “two-party rule”.
Picture a flow where a second party's approval applies a tag granting time-limited access. This would be magical! Unfortunately, AWS ABAC doesn't offer any way to store TTLs on tags.
The existence of the aws:CurrentTime global condition key, paired with the Date condition operators, almost gets you there. But I haven't found a way to force creation of a tag with either a set time in the future (e.g., "tag with currenttime + 10m") or to do the comparison against a set window (e.g., "tag value is within the last 10m"). The Date condition operators also don't support policy variables (h/t Scott Pack!).
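For reference, here's the non-working shape I kept circling (the expiry tag key is hypothetical). Even setting aside that nothing forces the tag to be set to a future timestamp, the Date operators reject the policy-variable comparison:

# Does NOT work: DateLessThan expects a literal ISO 8601 or epoch value,
# not a policy variable, and AWS offers no way to require the tag be set
# to "now + 10m" at creation time.
cat > ttl-fragment.json <<'EOF'
{
  "Effect": "Allow",
  "Action": "ssm:StartSession",
  "Resource": "*",
  "Condition": {
    "DateLessThan": { "aws:CurrentTime": "${aws:ResourceTag/expiry}" }
  }
}
EOF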
Let me know if you figure it out!
Both The True Power of AWS Tags: How to Use ABAC at Scale and AWS themselves seem to recommend you implement the TTLs out of band of ABAC. It’s a cool solution, but adds complexity that would be great to avoid!
Looking at the progress since 2024, where do you think we’ll be in 2028?
My realistic hope:
AWS Resource Groups Tagging seems a safer bet for comprehensive coverage than a mass refactor of each service's APIs, so that's where I'd place my bets. Ideally, AWS will also launch a way to natively and generically enforce tagging on resource creation, and will offer a standardized pattern for protecting ABAC-related tags without having to account for all of those disjointed old API actions. Paired with some clean mechanism for temporary access or TTLs, ABAC would start to look a lot more viable.
Until then, it’s a useful tool to be aware of, but the constraints make it hard to recommend broadly or unreservedly.
I figured I'd take it a step further and offer you a set of references to bootstrap your work on any of these ideas! Take inspiration, steal liberally, and share back.
From 2019, Rhino Security’s AWS CloudFormation and Resource-Injection Walkthrough launched a Pacu module cfn__resource_injection.
Scott Piper has also written up Stack Set phishing over at tldrsec.
Check out TrustOnCloud’s Threat Models, such as “The last S3 security document that we’ll ever need, and how to use it”, as a maximalist take on this idea. I have my own minimalist version for Lambda over on my “Lambda risks” Wiki
Cloud Conformity is a solid knowledge hub, as is the Datadog Cloud Security Atlas
This is all about recon, so see:
Check out AWS Reference Notes!
Here's someone setting up masscan 10 years ago
Here’s 2020 research hunting specific targets based on TLS certificates
Here's someone doing this in 2022, focused on open Elasticsearch and Kibana.
Check out Chris Farris' "Public Access Keys - 2023" project, and the follow-up The Consistently Inconsistent Response to Access Key Leaks. Also, Orca's 2023 Honeypotting in the Cloud Report.
Ian McKay is a master of this art form, with his List of expensive / long-term effect AWS IAM actions. Corey Quinn also frequently explores this topic in Last Week in AWS, such as in The Cloud Genie
Fewer pointers here, but the topic implies Nick Frichette's work. Research like Enumerate AWS API Permissions Without Logging to CloudTrail is often based on error messages and codes. I'd especially recommend you brush up on A Look at AWS API Protocols.
Ah, Nick again for this one! He’s found a few bugs in undocumented AWS APIs: Amplify example, Bypassing CloudTrail in AWS Service Catalog, and Other Logging Research. Same thing for Gafnit Amiga’s AWS ECR Public Vulnerability
You could use aws-api-models, a repository of documented and undocumented AWS API models extracted from the AWS console, originally compiled by Nick, as a starting point. Keep an eye out for more work from him in this area!
Auto-tagging? I always think of bridgecrew’s open source yor
Who should fix it? Take some inspiration from Matt Fuller’s OG fwd:cloudsec talk It’s Time to Rethink the Shared Security Responsibility Model
Not to talk my own book, but I cover this at a very high level in my Beyond the AWS Security Maturity Roadmap - Asset Inventory & CSPM Section slides.
On the technical side, I'd probably start with steampipe (or a competitor), dump the data into Snowflake (or similar), then add a couple of IAM tools like Cloudsplaining and PMapper to get better coverage on Identity. I'd focus on the three "perimeters" to start: Identity, Network, and Data - these are where issues are most likely to cause a breach.
I’ve mentioned Gafnit, Ian, and Ben already!
Aidan is hard to compete with, but one of his magic tricks is beating AWS to delivering clients and open source implementations of new features. He did this with openrolesanywhere, for example. So the recipe would be: 1) pick a new feature that lacks an open source client, architecture, or implementation; 2) implement it; 3) share both the implementation and what it taught you about the service, its internals, etc.
Scott is maybe my most direct inspiration in cloud security research. I’d think about:
This one feels risky! It would be easy to punch down here at folks who are genuinely trying to help. This is also complicated by the fact that AWS controls have improved over the years - any recommendation from a couple of years ago may be wrong simply due to subsequent improvements.
A few tips though:
I’m not aware of much on the backdoor side, but for Terraform would point to some great existing work on attacks:
Access Advisor is a tool within AWS IAM that surfaces the last-used time of permissions held by an IAM identity (user or role). This is helpful for reconciling access granted with access needed. Over time, permissions inherently accumulate, and it is risky to remove them from production identities due to the potential disruption. Access Advisor derisks reducing privilege by highlighting unnecessary grants.
Access Advisor launched in 2015, initially supporting last-used data only at the service level. In December 2018, AWS added an API. Action-level data has since been added, starting with S3 in June 2020; EC2, AWS IAM, and AWS Lambda followed in April 2021.
In the last year, action-level data has expanded rapidly, with support added for 140 new services in September and 60 more in November.
At re:Invent 2023, AWS launched unused service and action findings for Access Analyzer (further guaranteeing my confusion of the two services) - at $0.20 per IAM role or user analyzed per month.
While Access Advisor can provide action level data for hundreds of AWS services, it has some limitations:
- Not all services are supported: for example, Comprehend is not (though comprehendmedical is), SageMaker is not (sagemaker-geospatial is), and Bedrock is not.
- Action data is not resource-aware: findings effectively treat every grant as resource: "*".
- Tooling is limited: Steampipe's aws_iam_access_advisor table (used below) is one of the few integrations.
While Access Advisor offers APIs, their ergonomics aren't immediately fit for purpose at scale: GenerateServiceLastAccessedDetails must be run per identity, and then GetServiceLastAccessedDetails must be called as a follow-up.
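In CLI terms, that per-identity dance looks something like this (the role ARN is a placeholder):

# Kick off an async report for one principal...
job_id=$(aws iam generate-service-last-accessed-details \
  --arn arn:aws:iam::111111111111:role/example-role \
  --granularity ACTION_LEVEL \
  --query 'JobId' --output text)
# ...then fetch it; in practice you'd poll JobStatus until COMPLETED.
aws iam get-service-last-accessed-details --job-id "$job_id"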
I wanted to use Access Advisor to answer account-wide questions, like which principals carry the most unused actions.
Steampipe, which provides a SQL interface over APIs such as AWS’s, proved a perfect fit for this case.
Note: These queries can be pretty slow in sizable accounts, let me know if you find optimizations!
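If you haven't used Steampipe before, getting to the point where these queries run is roughly (assuming AWS credentials are already configured):

steampipe plugin install aws
# Sanity check the connection, then paste in the queries below:
steampipe query "select arn from aws_iam_role limit 1"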
This query will produce a sorted list of all principals in the account paired with their number of unused actions:
SELECT principal_arn, COUNT(elem->>'ActionName') AS unused_actions
FROM aws_iam_access_advisor
CROSS JOIN jsonb_array_elements(tracked_actions_last_accessed) AS elem
WHERE elem->>'LastAccessedTime' IS NULL
AND principal_arn IN (
SELECT arn
FROM aws_iam_role
)
GROUP BY principal_arn
ORDER BY unused_actions DESC;
With a tweak, the query will only include roles that are recently used:
SELECT principal_arn, COUNT(elem->>'ActionName') AS unused_actions
FROM aws_iam_access_advisor
CROSS JOIN jsonb_array_elements(tracked_actions_last_accessed) AS elem
WHERE elem->>'LastAccessedTime' IS NULL
AND principal_arn IN (
SELECT arn
FROM aws_iam_role
WHERE DATE_TRUNC('day', role_last_used_date) > (CURRENT_DATE - INTERVAL '90 days')::timestamp
)
GROUP BY principal_arn
ORDER BY unused_actions DESC;
To get the list of actions to remove from a single principal, you can run:
SELECT service_name, array_agg(elem->>'ActionName') AS unused_actions
FROM aws_iam_access_advisor
CROSS JOIN jsonb_array_elements(tracked_actions_last_accessed) AS elem
WHERE elem->>'LastAccessedTime' IS NULL
AND principal_arn = 'FILL_IN_PRINCIPAL_ARN'
GROUP BY service_name
ORDER BY service_name DESC;
To tie things together, here is a case from applying this approach to a production environment.
We wanted to focus on least privileging task roles. To do so, first we found the roles with the highest density of unused actions.
SELECT principal_arn, array_agg(elem->>'ActionName') AS unused_actions
FROM aws_iam_access_advisor
CROSS JOIN jsonb_array_elements(tracked_actions_last_accessed) AS elem
WHERE elem->>'LastAccessedTime' IS NULL
AND principal_arn IN (
SELECT arn
FROM aws_iam_role
WHERE name ILIKE '%task-role%'
AND DATE_TRUNC('day', role_last_used_date) > (CURRENT_DATE - INTERVAL '90 days')::timestamp
)
GROUP BY principal_arn;
One thing that immediately jumped out was a high rate of overprivileged Amazon EC2 and Elastic Load Balancer permissions. Reviewing the implicated identities, it was quickly clear that they all had AmazonEC2ContainerServiceRole attached.
It turns out that this managed IAM policy has been phased out, replaced by the Amazon ECS service-linked role. However, it was still baked into a core shared ECS Terraform module. The implicit perpetuation of this policy was a downside of Infrastructure as Code.
Removing this policy alone reduced thousands of unused permissions. This one policy was responsible for >50% of unused actions in the environment.
In AWS, Service Control Policies (SCPs) are a powerful mechanism for creating centralized guardrails.
For example, at Figma we’ve rolled out both Service and Region allowlisting using SCPs, dramatically reducing our attack surface.
For general guidance on SCPs, I highly recommend Scott Piper’s guides: AWS SCP Best Practices (Summit Route, 2020) and Using Service Control Policies to protect security baselines (Wiz, 2023)
When making extensive use of SCPs, you’re bound to start bumping up against AWS’ relevant quotas.
Value | Quota |
---|---|
OU maximum nesting in a root | Five levels of OUs deep under a root. |
Maximum SCPs attached to root, per OU, per Account | 5 |
Maximum size of a policy document | Service control policies: 5120 characters |
There is an interesting quirk to that maximum policy size: unlike in IAM, with SCPs whitespace counts towards this character limit!
All characters in your SCP count against its maximum size. The examples in this guide show the SCPs formatted with extra white space to improve their readability. However, to save space if your policy size approaches the maximum size, you can delete any white space, such as space characters and line breaks that are outside quotation marks.
– AWS Organizations: Maximum Size of SCPs
Policies have a maximum size between 2048 characters and 10,240 characters, depending on what entity the policy is attached to. For more information, see IAM and AWS STS quotas. Policy size calculations do not include white space characters.
– AWS Identity and Access Management: Policy Grammar Notes
This limit is particularly disruptive when managing SCPs programmatically, including using Terraform. This is because, while the AWS Management Console minimizes whitespace automatically, the SDKs and CLI do not.
When we first hit this limit, I decided to offer an improved paved road for our SCPs to avoid this issue.
It took a little research, but the end result was straightforward. Now, we use a minimal-scp module with the following definition:
variable "name" {
type = string
}
variable "description" {
type = string
}
variable "content" {
type = string
description = "The aws_iam_policy_document .json content to use for the policy"
}
resource "aws_organizations_policy" "policy" {
name = var.name
description = var.description
content = jsonencode(jsondecode(var.content))
}
output "id" {
value = aws_organizations_policy.policy.id
}
Usage is simple, and matches the aws_organizations_policy syntax:
module "example_scp" {
source = "./modules/minimal-scp"
name = "example_scp"
description = "This is an example SCP"
content = data.aws_iam_policy_document.example_scp.json
}
The simple trick?
The jsonencode command outputs a minified representation of the input.
– jsonencode Function
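If you're managing SCPs with the CLI rather than Terraform, jq can perform the same minification (the policy ID and file name here are placeholders):

# Minify before shipping, since the CLI won't do it for you:
aws organizations update-policy \
  --policy-id p-examplepolicyid \
  --content "$(jq -c . scp.json)"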
This complemented a recent conversation a coworker started around Private Access Tokens as a potential replacement for existing human interaction proofs.
Together, these led me into a refresher on the history of CAPTCHAs and related solutions.
In 1996, Moni Naor wrote an (unpublished) manuscript entitled "Verification of a human in the loop, or Identification via the Turing Test" [download link].
It proposes ‘using a “Turing Test” to verify that a human is the one making a query to a service over the web.’ Example tests, some problematic, are drawn from the disciplines of Vision and Natural Language Processing:
The following year, two teams concurrently patented similar “turing test” concepts:
These "Turing test" bot deterrence solutions are predicated on a very basic concept: there are certain tasks that are easy for humans yet still hard for computers. The classic form relied on the known difficulty of OCR ("Optical Character Recognition"), presenting distorted text for the user to identify.
Carnegie Mellon University researchers Luis von Ahn, Manuel Blum, Nicholas Hopper and John Langford started The CAPTCHA Project in 2000. This coined CAPTCHA, standing for Completely Automated Public Turing Test To Tell Computers and Humans Apart.
David Gausebeck and Max Levchin, while trying to combat fraud at Paypal, deployed one of the first practical CAPTCHAs in 2001. They named it the Gausebeck-Levchin test, which displayed distorted text.
Yahoo worked with the CAPTCHA team to create and adopt Gimpy, which displayed distorted words from an 850-word dictionary. In 2002, a team from UC Berkeley came up with the first automated attack against Gimpy. They showed an 83% success rate against an easy variant of Gimpy, and a slower, less successful (30%) rate against the harder variant. This started a long pattern of researchers both deriving harder CAPTCHAs and breaking them.
In 2003, the original CMU team behind CAPTCHA published CAPTCHA: Using Hard AI Problems for Security. The paper genericizes a theoretical side benefit of using hard AI problems as the basis for CAPTCHAs, as was done with OCR: doing so spurs research and investment into solving those problems, advancing the field of AI.
A 2005 paper introduces IMAGINATION, which moves beyond distorted text to CAPTCHAs based on distorted images.
Microsoft came up with a CAPTCHA system in 2007 that relies on the ability to distinguish between cats and dogs. Asirra leveraged a partnership with Petfinder.com to source the images. Asirra closed permanently in 2014.
The CMU CAPTCHA project came up with ReCAPTCHA in 2007. This was an extension of OCR CAPTCHAs that additionally served to digitize text OCR could not handle. It started with a focus on digitizing old editions of the New York Times.
Luis von Ahn led a 2007 spin-out of ReCAPTCHA into a dedicated company of the same name. Google acquired ReCAPTCHA in 2009, and went on to apply it to text from Google Books.
Google first expanded ReCAPTCHA in 2012, adding in images of street names and addresses from Google Maps.
In 2014, they released No CAPTCHA reCAPTCHA. This "v2" of ReCAPTCHA moved to performing behavior analysis for risk assessment, at times allowing a "single checkbox" verification for low-risk users. This also brought a more rapid iteration cycle for new challenges, starting with image labeling.
In 2017, this was further enhanced into an "invisible" reCAPTCHA that leveraged similar behavior analysis, but ran fully in the background, with a chance of no challenge at all in low-risk cases.
ReCAPTCHA v3 then launched in 2018, making the scores for various requests transparent to the site embedding the CAPTCHA and allowing more granular control over when and how CAPTCHAs were served.
The behavior analysis approach to human interaction proofs has raised privacy concerns. To perform this analysis, device profiling and cookie-based data collection are made even more prevalent. Coverage of privacy concerns has especially targeted Google's ReCAPTCHA.
hCaptcha launched in 2018, positioning itself as a privacy-focused ReCAPTCHA alternative. Initially, hCaptcha showed images from datasets companies were paying to label, allowing websites that host hCaptcha to be compensated. Starting around 2022, labeling was spun off into Human Protocol.
Privacy Pass is one protocol that emerged around 2018 to try to provide a privacy-oriented CAPTCHA alternative. The basic premise allows for each challenge solution or proof of human interaction to grant many anonymous tokens that can be exchanged in the future in lieu of a new challenge. These “passes” are blindly signed and anonymously redeemed. To work cross-site, the protocol uses a browser extension.
Privacy Pass never got substantial adoption - Cloudflare (the main industry proponent of the protocol) and hCaptcha are the only two providers who supported passes.
Recent history has shown increasing pressure on the security of CAPTCHA challenges, and a continued focus on privacy-preserving solutions to human interaction proofs.
Cloudflare continues to be a vocal participant in this discourse. In 2021, they prototyped a “Cryptographic Attestation of Personhood” approach. This solution relies on a WebAuthn Attestation to “prove you are in control of a public key signed by a trusted manufacturer.”
Another alternative to traditional CAPTCHAs that has been investigated are Proof-of-Work Challenges. Several implementations have emerged in the past few years, such as mCAPTCHA. However, PoW Challenges are not actually a human interaction proof, and so may not serve as a direct replacement.
Cloudflare took another swing in 2022, partnering with Apple to expand on Privacy Pass with Private Access Tokens. Only Fastly and Cloudflare are Private Access Token Issuers, while Apple attests to the user’s identity.
Cloudflare’s late 2022 move in CAPTCHAs was the launch of “Turnstile.” This rolls up other technologies, including behavior analysis of telemetry and client behavior. It also builds in Private Access Tokens. It additionally, as was done with ReCAPTCHA, offers a platform for them to test additional challenge components. Up front they’ve suggested “proof-of-work, proof-of-space, probing for web APIs, and various other challenges for detecting browser-quirks and human behavior”.
For almost thirty years, CAPTCHAs have been a fact of the web, a small necessary evil. To reduce the cost on service providers, accessibility and user experience have been sacrificed. Where has this left us?
Just a few years after their invention, creator Luis von Ahn realized:
he had unwittingly created a system that was frittering away, in ten-second increments, millions of hours of a most precious resource: human brain cycles
In 2022, Demystifying the Underground Ecosystem of Account Registration Bots looked at top sites and broke down the human verification methods in use.
2023's An Empirical Study & Evaluation of Modern CAPTCHAs provided concrete evidence that bots can now outperform humans across common forms of CAPTCHA.
The findings of How Secure is Your Website? A Comprehensive Investigation on CAPTCHA Providers and Solving Services complement the research, concluding “CAPTCHA providers are failing to stop automated solvers. All selected popular third-party CAPTCHAs except FunCaptcha can be solved by CapSolver with a high success rate at a low price.”
Together, recent research shows that traditional CAPTCHAs cannot sufficiently distinguish humans from bots, and initiatives like Private Access Tokens do not seem to have the broad adoption necessary to offer a replacement.
I expect we'll continue to see bimodal efforts: increasing behavior analysis and device profiling, as done by Cloudflare Turnstile and ReCAPTCHA, but also more adoption of privacy-preserving protocols. So far, however, the latter have relied on centralized providers and often on expensive ecosystems. While vendors like Arkose Labs (behind FunCaptcha) show that we can continue to use hard AI problems as CAPTCHAs, the chasm of "easy for humans, hard for bots" seems to have mostly closed.
Today, I'd like to discuss a fifth AWS-specific phishing attack: Simple Email Service (SES) Verification Phishing.
Remember, if you get an AWS-themed phishing email you can report them to AWS online or via email at stop-spoofing@amazon.com. Phishing is a general risk that primarily resides on the customer side of the shared responsibility model. This post does not describe any vulnerabilities in AWS itself.
AWS SES requires you to verify any domain or email address that you use to send or receive email.
The stated intended outcome is that “Amazon SES confirms that you own it and helps prevent unauthorized use.”
This verification email can be requested for any domain or email address. The verification process is insecure, relying solely on a GET request to the link contained in the verification email.
Because it only requires a GET request, it is susceptible to both human error and to automated systems that may retrieve the URL. Phishing prevention should rely on technical controls that make it safe for users to click links; AWS SES verification's use of a GET request precludes this.
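For context, an attacker can trigger the verification email for an arbitrary address with a single call from their own AWS account (the target address is a placeholder):

# Sends the verification email, with its GET-triggered link, to the target:
aws ses verify-email-identity --email-address security@victim.example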
Not every email service conducts even this fallible verification. Generic protections against email spoofing will not prevent SES sending emails if verification is phished. This includes email authentication methods like DKIM, SPF, and DMARC.
In general, AWS SES’ email verification will not inherently impact these controls. However, if you have authenticated AWS SES broadly, an attacker who has successfully verified one of your email addresses might be able to bypass some of these authentication methods.
The link sent by AWS SES Email Verification should return a page that requires user confirmation before completing verification.
This attack was previously discussed by Ophion Security in Phishing the anti-phishers: Exploiting anti-phishing tools for internal access. Another risk in AWS SES is described in badshah’s The Risk You Can’t Afford to Ignore: AWS SES and Email Spoofing.
tl;dr: make sure you're familiar with ssm:SessionDocumentAccessCheck if you're using SSM for Port Forwarding.
Here’s the minimal policy for an IAM principal to connect to a given EC2 instance.
Policy A:
statement {
actions = [
"ssm:TerminateSession",
"ssm:StartSession",
]
resources = [
"arn:aws:ec2:*:*:instance/i-0000000000000",
"arn:aws:ssm:us-east-1:*:session/*"
]
}
A more realistic (explicit, least privileged, and usable) policy might look like this:
Policy B:
statement {
actions = [
"ssm:StartSession",
]
resources = [
"arn:aws:ec2:us-east-1:*:instance/i-0000000000000",
"arn:aws:ssm:us-east-1:*:document/SSM-SessionManagerRunShell",
]
}
statement {
actions = [
"ssm:DescribeSessions",
"ssm:GetConnectionStatus",
"ssm:DescribeInstanceInformation",
"ssm:DescribeInstanceProperties",
"ec2:DescribeInstances",
]
resources = ["*"]
}
statement {
actions = [
"ssm:TerminateSession",
"ssm:ResumeSession",
]
resources = ["*"]
condition {
test = "StringLike"
variable = "ssm:resourceTag/aws:ssmmessages:session-id"
values = ["${aws:userid}"]
}
}
Compared to Policy A, note that Policy B explicitly scopes ssm:StartSession to the SSM-SessionManagerRunShell document, which is otherwise the default.
Let's test getting a shell using SSM. Assuming everything else is set up, it'll look something like:
$ aws ssm start-session --target i-0000000000000
Starting session with SessionId: EX-SESS-07a16060613c408b5
We can also make the default document explicit:
$ aws ssm start-session --target i-0000000000000 --document-name SSM-SessionManagerRunShell
Starting session with SessionId: EX-SESS-08b26060613c408b5
Now, let's say I want to port forward instead - as in Shipping RDS IAM Authentication (with a bastion host & SSM). We can check out the documentation, and see "Starting a session (port forwarding to remote host)" uses the document AWS-StartPortForwardingSessionToRemoteHost.
Let’s start by trying with Policy B:
$ aws ssm start-session \
--target instance-id \
--document-name AWS-StartPortForwardingSessionToRemoteHost \
--parameters '{"host":["mydb.example.us-east-2.rds.amazonaws.com"],"portNumber":["3306"], "localPortNumber":["3306"]}'
An error occurred (AccessDeniedException) when calling the StartSession operation: User: arn:aws:sts::{accountId}:assumed-role/{role_name}/{session_name} is not authorized to perform: ssm:StartSession on resource: arn:aws:ec2:us-west-2:{accountId}:instance/i-0000000000000 because no identity-based policy allows the ssm:StartSession action
Okay - so we get an AccessDenied error, which makes sense - Policy B doesn't grant access to the document we're attempting to use. Let's make a small tweak, swapping in the AWS-StartPortForwardingSessionToRemoteHost document:
Policy C:
statement {
actions = [
"ssm:StartSession",
]
resources = [
"arn:aws:ec2:us-east-1:*:instance/i-0000000000000",
"arn:aws:ssm:us-east-1:*:document/AWS-StartPortForwardingSessionToRemoteHost",
]
}
statement {
actions = [
"ssm:DescribeSessions",
"ssm:GetConnectionStatus",
"ssm:DescribeInstanceInformation",
"ssm:DescribeInstanceProperties",
"ec2:DescribeInstances",
]
resources = ["*"]
}
statement {
actions = [
"ssm:TerminateSession",
"ssm:ResumeSession",
]
resources = ["*"]
condition {
test = "StringLike"
variable = "ssm:resourceTag/aws:ssmmessages:session-id"
values = ["${aws:userid}"]
}
}
We can see the port forwarding is successful after this change:
$ aws ssm start-session \
    --target instance-id \
    --document-name AWS-StartPortForwardingSessionToRemoteHost \
    --parameters '{"host":["mydb.example.us-east-2.rds.amazonaws.com"],"portNumber":["3306"], "localPortNumber":["3306"]}'
Starting session with SessionId: EX-SESS-08b26060613c408b5
Port 8443 opened for sessionId EX-SESS-08b26060613c408b5. Waiting for connections...
Now, let's try the inverse, and call the original SSM-SessionManagerRunShell document with our new Policy C:
$ aws ssm start-session --target i-0000000000000 --document-name SSM-SessionManagerRunShell
Starting session with SessionId: EX-SESS-08b26060613c408b5
Wait, why did that work?
The implicitly allowed SSM-SessionManagerRunShell document
The basic premise of AWS IAM is that actions are implicitly denied unless a policy explicitly allows them.
But, if you go back to Policy A you can see that despite never granting access to SSM-SessionManagerRunShell, start-session still worked.
By default, when a user in your account has been granted permission to start sessions by their AWS Identity and Access Management (IAM) policy, they are also granted access to the SSM-SessionManagerRunShell SSM document
This makes sense in some ways: without this default, start-session wouldn't be a functional permission in isolation. But this special case, where access is not implicitly denied, is easily missed when using SSM for more narrow purposes with scoped documents - prominently for Port Forwarding.
This consideration is documented by AWS under “Enforce a session document permission check for the AWS CLI”.
In short, if you want to use SSM for Port Forwarding without implicitly granting shell access, you need to add the following condition:
"Condition": {
"BoolIfExists": {
"ssm:SessionDocumentAccessCheck": "true"
}
}
If the SessionDocumentAccessCheck condition element is set to true, and you specify a document name in the Resource, you must provide the specified document name when starting a session. If you provide a different document name when starting a session, the request fails.
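With the condition attached alongside a document-scoped Resource (as in Policy C), a quick spot-check of the new behavior might look like this (the instance ID is a placeholder; the first call should now fail):

# Falling back to the implicit default document is now denied:
aws ssm start-session --target i-0000000000000
# Expected: An error occurred (AccessDeniedException) when calling the StartSession operation ...

# Explicitly naming the allowed document still works:
aws ssm start-session --target i-0000000000000 \
  --document-name AWS-StartPortForwardingSessionToRemoteHost \
  --parameters '{"host":["mydb.example.us-east-2.rds.amazonaws.com"],"portNumber":["3306"],"localPortNumber":["3306"]}'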
It's hard to know whether customers are consistently using this (documented!) condition when setting up Port Forwarding with SSM - a quick GitHub search finds a few instances of the "vulnerable" pattern. In the end, this led to an advisory: GHSA-q4pp-j36h-3gqg.
Third party guides are hit-or-miss, with many eliding the implicit access, and others (like Sym) calling out:
Note that by default, users can always use the SSM-SessionManagerRunShell document even if you don't give that permission explicitly. You can turn this behavior off if you want to manage all document access explicitly.
It was frustratingly hard to locate a straightforward guide to this situation, and there were a few “gotchas” along the way.
I leaned on prior art, like existing write-ups on connecting to RDS with psql through a bastion.
So, I’m documenting my process under the continued theme of “practical guidance for your AWS security program” (prev: S3 Logging, AWS Service & Region Allowlisting, Lambda Risks, AWS Phishing)
At a high level, this ends up requiring the following steps:
1. Enable IAM authentication on the RDS cluster
2. Grant database access (rds-db:connect) and Port Forwarding permissions to the role(s)
3. Create a database user that uses IAM authentication
4. Connect, port forwarding through the bastion with SSM
Each part of this process has one or more considerations I had to dig up - let me save you the time!
In my case, this was done in Terraform, using the iam_database_authentication_enabled argument.
Working at scale and in production, there were a few important questions to validate:
Is enabling RDS IAM Authentication a zero-downtime action? Yes, as per a random note in this random re:Post article:
Note: When you choose Apply Immediately when updating your cluster configuration settings, all pending modifications are applied immediately. This action doesn’t result in downtime.
Does enabling RDS IAM Authentication impact cluster performance? Yes, but not significantly (in our testing). CPU jumped from ~10% to ~20%, for a period well under 5 minutes.
What are these errors? Is everything okay? If you’re seeing “The parameter max_wal_senders was set to a value incompatible with replication. It has been adjusted”, it “is a known message when an RDS instance is restarted”. If you’re seeing “The parameter rds.logical_replication was set to a value incompatible with replication”, it generally is expected on a read-only instance.
Grant database access (rds-db:connect) and add the necessary Port Forwarding permissions to the role(s)
The easiest way to use SSM to set up a Port Forwarding session over a bastion host is via the AWS-StartPortForwardingSessionToRemoteHost document.
Here’s a sample policy document to get you started.
data "aws_iam_policy_document" "rds_iam_authentication" {
statement {
actions = [
"rds-db:connect",
]
resources = [
"arn:aws:rds-db:${module.constants.region}:${module.constants.account_ids["production"]}:dbuser:*/db_user_name"
]
}
statement {
actions = [
"rds:DescribeDBClusters",
]
resources = [
"arn:aws:rds:*:${module.constants.account_ids["production"]}:cluster:*"
]
}
statement {
actions = [
"rds:DescribeDBInstances",
]
resources = [
"arn:aws:rds:*:${module.constants.account_ids["production"]}:db:*"
]
}
statement {
actions = [
"ssm:StartSession",
]
resources = [
"arn:aws:ec2:us-west-2:${module.constants.account_ids["production"]}:instance/i-0000<bastion>000",
"arn:aws:ssm:${module.constants.region}:*:document/AWS-StartPortForwardingSessionToRemoteHost",
]
condition {
test = "BoolIfExists"
variable = "ssm:SessionDocumentAccessCheck"
values = ["true"]
}
}
statement {
actions = [
"ssm:DescribeSessions",
"ssm:GetConnectionStatus",
"ssm:DescribeInstanceInformation",
"ssm:DescribeInstanceProperties",
"ec2:DescribeInstances",
]
resources = ["*"]
}
statement {
actions = [
"ssm:TerminateSession",
"ssm:ResumeSession",
]
resources = ["*"]
condition {
test = "StringLike"
variable = "ssm:resourceTag/aws:ssmmessages:session-id"
values = ["$${aws:userid}"]
}
}
}
The documented command actually “just worked”!
CREATE USER db_userx;
GRANT rds_iam TO db_userx;
The following commands:
export RDSHOST="$(aws-vault exec profile-name -- aws rds describe-db-instances --db-instance-identifier rds-1 --query 'DBInstances[0].Endpoint.Address' --output text)"
export PGPASSWORD="$(aws-vault exec profile-name -- aws rds generate-db-auth-token --hostname $RDSHOST --port 5432 --region us-west-2 --username db_user_name)"
aws-vault exec profile-name -- aws ssm start-session --target i-0000<bastion>000 --document-name AWS-StartPortForwardingSessionToRemoteHost --parameters '{"portNumber":["5432"], "localPortNumber":["5432"], "host":["rds-1.randomdigits.us-west-2.rds.amazonaws.com"]}'
aws-vault exec profile-name -- psql -h rds-1.randomdigits.us-west-2.rds.amazonaws.com -p 5432 "hostaddr=127.0.0.1 sslmode=prefer sslrootcert=rds-ca-2019-root.pem dbname=your_db_name user=db_user_name"
The hostaddr and SSL configuration are worth noting; they were unobvious (ref). Passing the real RDS hostname via -h while setting hostaddr=127.0.0.1 routes the connection through the local tunnel, while the hostname remains available for the auth token and TLS validation.
In practice, I added a wrapper script that chains these steps together; a sketch follows.
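Here's a minimal sketch of such a wrapper, stitched together from the commands above (the profile, instance, database, and user names are the same placeholders used in the examples):

#!/usr/bin/env bash
# Sketch of a wrapper chaining the steps above; names are placeholders.
set -euo pipefail

PROFILE="profile-name"
BASTION_ID="i-0000<bastion>000"
DB_ID="rds-1"
DB_USER="db_user_name"
REGION="us-west-2"

# Resolve the RDS endpoint and mint a short-lived IAM auth token.
RDSHOST="$(aws-vault exec "$PROFILE" -- aws rds describe-db-instances \
  --db-instance-identifier "$DB_ID" \
  --query 'DBInstances[0].Endpoint.Address' --output text)"
PGPASSWORD="$(aws-vault exec "$PROFILE" -- aws rds generate-db-auth-token \
  --hostname "$RDSHOST" --port 5432 --region "$REGION" --username "$DB_USER")"
export PGPASSWORD

# Open the port forwarding session in the background; clean it up on exit.
aws-vault exec "$PROFILE" -- aws ssm start-session \
  --target "$BASTION_ID" \
  --document-name AWS-StartPortForwardingSessionToRemoteHost \
  --parameters "{\"portNumber\":[\"5432\"],\"localPortNumber\":[\"5432\"],\"host\":[\"$RDSHOST\"]}" &
SSM_PID=$!
trap 'kill "$SSM_PID"' EXIT
sleep 5  # crude; a real version would poll until the local port is listening

# Connect through the tunnel, keeping the hostname for the token and TLS.
psql -h "$RDSHOST" -p 5432 \
  "hostaddr=127.0.0.1 sslmode=prefer sslrootcert=rds-ca-2019-root.pem dbname=your_db_name user=$DB_USER"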
Overall, I’m glad AWS offers RDS IAM Authentication. It fit a pretty niche need, and now that the parts are together it’s zero maintenance, zero cost and zero overhead. However, I think AWS could and should do more to focus their documentation on Assumed Secure deployment models, and not rely so much on the assumption you’re sticking your databases on the Internet.