Shipping RDS IAM Authentication (with a bastion host & SSM)

Jul 25, 23
I tried out RDS IAM Authentication. The environment follows common patterns: a private RDS instance, reached through a bastion host using SSM.

Isn’t this easy? Why are you writing this?

It was frustratingly hard to locate a straightforward guide to this situation, and there were a few “gotchas” along the way.

I leaned on prior art like:

So, I’m documenting my process under the continued theme of “practical guidance for your AWS security program” (previously: S3 Logging, AWS Service & Region Allowlisting, Lambda Risks, AWS Phishing).

The process

At a high level, this ends up requiring the following steps:

  1. Enable IAM database authentication
  2. Create IAM role(s) to use for DB access (rds-db:connect)
  3. Add necessary permissions for Port Forwarding to the role(s)
  4. Add database account(s) for IAM authentication
  5. Document the commands necessary to start a PortForwarding session and use it to authenticate to a private RDS instance

Each part of this process has one or more considerations I had to dig up - let me save you the time!

Enable IAM database authentication

In my case, this was done in Terraform, using the iam_database_authentication_enabled argument.
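If you want to confirm the setting outside of Terraform, the AWS CLI exposes the same switch. Here is a minimal sketch, assuming the database is a cluster (for a standalone instance, the describe-db-instances and modify-db-instance commands take the equivalent flag). The function names and the cluster identifier are placeholders of mine, not part of the original setup.

```shell
#!/usr/bin/env bash
# Sketch only: function names and identifiers are hypothetical.

# Print whether IAM database authentication is enabled on a cluster.
iam_auth_status() {
  local cluster_id="$1"
  aws rds describe-db-clusters \
    --db-cluster-identifier "$cluster_id" \
    --query 'DBClusters[0].IAMDatabaseAuthenticationEnabled' \
    --output text
}

# CLI equivalent of the Terraform argument, applied immediately.
enable_iam_auth() {
  local cluster_id="$1"
  aws rds modify-db-cluster \
    --db-cluster-identifier "$cluster_id" \
    --enable-iam-database-authentication \
    --apply-immediately
}
```

Running iam_auth_status my-cluster before and after the Terraform apply is a quick way to check that the change landed.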

Working at scale and in production, there were a few important questions to validate:

Is enabling RDS IAM Authentication a zero-downtime action? Yes, as per a random note in this random re:Post article:

Note: When you choose Apply Immediately when updating your cluster configuration settings, all pending modifications are applied immediately. This action doesn’t result in downtime.

Does enabling RDS IAM Authentication impact cluster performance? Yes, but not significantly (in our testing). CPU jumped from ~10% to ~20%, for a period well under 5 minutes.

What are these errors? Is everything okay? If you’re seeing “The parameter max_wal_senders was set to a value incompatible with replication. It has been adjusted”, it “is a known message when an RDS instance is restarted”. If you’re seeing “The parameter rds.logical_replication was set to a value incompatible with replication”, it generally is expected on a read-only instance.

Create IAM role(s) to use for DB access (rds-db:connect) + Add necessary permissions for Port Forwarding to the role(s)

The easiest way to use SSM to set up a Port Forwarding session over a bastion host is via the AWS-StartPortForwardingSessionToRemoteHost document.

Here’s a sample policy document to get you started.

data "aws_iam_policy_document" "rds_iam_authentication" {
  statement {
    actions = [
      "rds-db:connect",
    ]
    resources = [
      "arn:aws:rds-db:${module.constants.region}:${module.constants.account_ids["production"]}:dbuser:*/db_user_name"
    ]
  }
  statement {
    actions = [
      "rds:DescribeDBClusters",
    ]
    resources = [
      "arn:aws:rds:*:${module.constants.account_ids["production"]}:cluster:*"
    ]
  }
  statement {
    actions = [
      "rds:DescribeDBInstances",
    ]
    resources = [
      "arn:aws:rds:*:${module.constants.account_ids["production"]}:db:*"
    ]
  }
  statement {
    actions = [
      "ssm:StartSession",
    ]
    resources = [
      "arn:aws:ec2:us-west-2:${module.constants.account_ids["production"]}:instance/i-0000<bastion>000",
      "arn:aws:ssm:${module.constants.region}:*:document/AWS-StartPortForwardingSessionToRemoteHost",
    ]
    condition {
      test     = "BoolIfExists"
      variable = "ssm:SessionDocumentAccessCheck"
      values   = ["true"]
    }
  }
  statement {
    actions = [
      "ssm:DescribeSessions",
      "ssm:GetConnectionStatus",
      "ssm:DescribeInstanceInformation",
      "ssm:DescribeInstanceProperties",
      "ec2:DescribeInstances",
    ]
    resources = ["*"]
  }
  statement {
    actions = [
      "ssm:TerminateSession",
      "ssm:ResumeSession",
    ]
    resources = ["*"]
    condition {
      test     = "StringLike"
      variable = "ssm:resourceTag/aws:ssmmessages:session-id"
      values   = ["$${aws:userid}"]
    }
  }
}
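The rds-db:connect statement in the sample uses * in place of the database resource ID. If you would rather scope it to a single database, the ARN format is arn:aws:rds-db:region:account-id:dbuser:resource-id/db-user-name. Below is a small sketch for assembling one. The function names and example values are hypothetical, and it assumes a standalone instance; for an Aurora cluster the corresponding field is DbClusterResourceId from describe-db-clusters.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical.

# Build an rds-db ARN for one database user.
# Arguments: region, account ID, database resource ID, database user name.
rds_db_user_arn() {
  printf 'arn:aws:rds-db:%s:%s:dbuser:%s/%s\n' "$1" "$2" "$3" "$4"
}

# Look up the resource ID of an instance (a "db-..." value).
instance_resource_id() {
  aws rds describe-db-instances \
    --db-instance-identifier "$1" \
    --query 'DBInstances[0].DbiResourceId' \
    --output text
}
```

For example, rds_db_user_arn us-west-2 123456789012 "$(instance_resource_id rds-1)" db_user_name prints an ARN you could paste into the resources list in place of the wildcard version.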

Add database account(s) for IAM authentication

The documented command actually “just worked”!

CREATE USER db_user_name;
GRANT rds_iam TO db_user_name;

Document the commands necessary to start a PortForwarding session and use it to authenticate to a private RDS instance

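The following commands look up the endpoint, generate an auth token, open the tunnel, and connect. The start-session command stays in the foreground, so run psql from a second terminal (with PGPASSWORD exported there):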

export RDSHOST="$(aws-vault exec profile-name -- aws rds describe-db-instances --db-instance-identifier rds-1 --query 'DBInstances[0].Endpoint.Address' --output text)"
export PGPASSWORD="$(aws-vault exec profile-name -- aws rds generate-db-auth-token --hostname $RDSHOST --port 5432 --region us-west-2 --username db_user_name)"
aws-vault exec profile-name -- aws ssm start-session --target i-0000<bastion>000 --document-name AWS-StartPortForwardingSessionToRemoteHost --parameters '{"portNumber":["5432"], "localPortNumber":["5432"], "host":["rds-1.randomdigits.us-west-2.rds.amazonaws.com"]}'
aws-vault exec profile-name -- psql -h rds-1.randomdigits.us-west-2.rds.amazonaws.com -p 5432 "hostaddr=127.0.0.1 sslmode=prefer sslrootcert=rds-ca-2019-root.pem dbname=your_db_name user=db_user_name"

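The hostaddr and SSL configuration are worth noting, as neither was obvious (ref). hostaddr=127.0.0.1 sends the connection through the local end of the tunnel, while -h still carries the RDS endpoint name, which is the name the server certificate is issued for.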

In practice, I added a wrapper script that:

  1. Sets up the tunnel
  2. Generates the exact remaining commands to run and provides them to the user
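A rough sketch of what such a wrapper could look like follows. This is not the script I actually run: the function names, argument order, and example values are all placeholders, and the psql line simply mirrors the command shown earlier.

```shell
#!/usr/bin/env bash
# Sketch of a wrapper; all names and values below are placeholders.

# Build the JSON --parameters value for the port-forwarding document.
tunnel_parameters() {
  local host="$1" port="$2"
  printf '{"portNumber":["%s"],"localPortNumber":["%s"],"host":["%s"]}' \
    "$port" "$port" "$host"
}

# Start the tunnel. This runs in the foreground until the session ends.
start_tunnel() {
  local profile="$1" bastion="$2" host="$3" port="$4"
  aws-vault exec "$profile" -- aws ssm start-session \
    --target "$bastion" \
    --document-name AWS-StartPortForwardingSessionToRemoteHost \
    --parameters "$(tunnel_parameters "$host" "$port")"
}

# Print the remaining commands for the user to run in a second terminal.
print_next_steps() {
  local profile="$1" host="$2" port="$3" user="$4" db="$5" region="$6"
  cat <<EOF
Run these in a second terminal once the tunnel is up:

export PGPASSWORD="\$(aws-vault exec $profile -- aws rds generate-db-auth-token --hostname $host --port $port --region $region --username $user)"
psql -h $host -p $port "hostaddr=127.0.0.1 sslmode=prefer sslrootcert=rds-ca-2019-root.pem dbname=$db user=$user"
EOF
}

# Example invocation (uncomment and adjust):
# print_next_steps profile-name "$RDSHOST" 5432 db_user_name your_db_name us-west-2
# start_tunnel profile-name "i-0000<bastion>000" "$RDSHOST" 5432
```

In this sketch the instructions are printed before the tunnel starts, because start-session holds the terminal for as long as the session is open.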

Conclusion

Overall, I’m glad AWS offers RDS IAM Authentication. It fit a pretty niche need, and now that the parts are together it’s zero maintenance, zero cost and zero overhead.

Note: this post doesn’t dive into the when and why for adopting RDS IAM authentication. As an aside: it can randomly have high latency and has some concrete limitations around number of sessions that make it only suitable for human access with non-interactive patterns.

However, I think AWS could and should do more to focus their documentation on Assumed Secure deployment models, and not rely so much on the assumption you’re sticking your databases on the Internet.