Thwacking DDOS with AWS WAF

Jul 12, 24

There’s no such thing as bad weather, only unsuitable clothing
- Alfred Wainwright

Distributed denial-of-service (DDOS) attempts are internet weather. Thankfully, for cloud-hosted companies, the majority of attacks are in layers handled by your cloud service provider. Take AWS, who commit to provide “native protections against infrastructure DDoS attacks (at layer 3 and 4) at no additional charge.” This protection lets you safely ignore the periodic “largest DDOS attack ever” articles. ¹

However, Layer 7 attacks (targeting your application) are prevalent and commoditized. As you scale, you’ll eventually get on the radar of low-effort, low-sophistication attackers.

These attacks tend to:

Involve only GET requests, or a small set of POST endpoints
Target obvious surface area: generally the root (/) route, login pages and forms, and the few major endpoints that are visible when proxying traffic
Rely on generic volumetric traffic rather than targeted exploitation of endpoints that generate asymmetric load

DDOS resilience is a large field, with sophisticated products and architectural investments available. But the first time you get DDOS’d in AWS, you’re probably just going to hack mitigations together in AWS WAF.

Here’s a braindump on the topic.

DDOS Services

What makes DDOS attacks a commodity good?

There are three categories of DDOS tools² that are offered at a low price as a subscription service.

Tool	Est. Cost	Description
Booter	“$30/month”	aka “Stresser”, directly sell DDOS attack traffic
Botnet	“$300/month”, or $67 for 24h, and $9/h	Large collection of compromised devices that can be used to generate traffic
Open Proxy	???	Allow for obfuscation of attack infrastructure by masking IPs

Attackers organize in Telegram groups, which offer a window into their inner workings.

DDOS Drivers

There are a variety of motivations for DDOS attacks. Fundamentally, you’re unlikely to change your response based on perceived motivation, but it’s worth understanding the attack landscape.

Extortion: Groups like DD4BC will launch a DDOS attack and then try to convince victims to pay to stop it. Cloudflare offers a great breakdown of these “ransom DDoS attacks,” which have hit victims like Garmin.
Direct Financial Gain: In rare cases, attackers can directly make money off of the impact of DDOS. Most notably, this was the case with the MtGox DDOS.
Competitive Advantage²: The ecommerce and gaming industries are most susceptible to bad actors seeking advantage via DDOS.
Reputation (“Street Cred”)³: Unsophisticated actors (e.g., skiddies) will occasionally attempt to DDOS prominent targets to relieve boredom or gain reputation.
Geopolitics²: DDOS attacks have a role in political conflict, involving both state and non-state actors. For example, we’ve seen Russia-affiliated groups target Davos and Iranian groups target American financial institutions.
Censorship: Adjacent to Geopolitics, some attacks attempt to enforce censorship, for example the Chinese DDOS attack against GitHub that targeted anti-censorship projects.
Retaliation²: Attacks may be a response to (real or perceived) injustice or persecution, including in response to attempts to disrupt DDOS infrastructure.

AWS WAF for DDOS

AWS WAF is the primary service AWS recommends to address the customer responsibilities around DDOS prevention.

AWS Shield

Before discussing WAF approaches, I want to cover the “Shield” family of services: AWS Shield (Standard) and AWS Shield Advanced.

AWS Shield Standard shouldn’t be thought of as a traditional AWS Service. Instead, it is a fundamental component of AWS’ Infrastructure as a Service platform.

AWS Shield is simply the name for AWS’ automated, default protections against “frequently occurring network and transport layer (Layer 3 and 4) DDoS attacks.” It is on by default, and can’t be disabled or configured. It generally seems to be effective at preventing customer impact from Layer 3 and 4 attacks.

What about Shield Advanced?

Shield Advanced, unlike Standard, is a normal AWS Service. It is directly targeted at DDOS prevention.

The core features of Shield Advanced are:

Access to AWS’ Shield Response Team, who can help manage AWS WAF, craft custom rules, and otherwise provide support during DDOS events
“Automatic Application Layer” DDOS mitigation, wherein AWS Shield Advanced attempts to baseline traffic, detect deviations, and create rules to address detected DDOS attacks
DDoS cost protection to safeguard against scaling charges resulting from DDoS-related usage spikes on protected resources
No additional charge for AWS WAF and AWS Firewall Manager

These features cost a flat $3,000/month per AWS Organization, plus data transfer fees. With the minimum one-year commitment, you’re looking at $36k.

The value of the monthly fee depends on the customer. Sprawling organizations might see ROI just from capping high WAF web ACL and rule costs at $3,000. In other organizations, having AWS SRT support during a major incident is valuable enough on its own (check out the Fathom case, outlined below).

However, Shield Advanced’s data transfer fees generally make it uneconomical to apply Shield Advanced protection to any endpoint that isn’t both high-risk and low-traffic.
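To make the ROI question concrete, here is a back-of-the-envelope comparison. It is a sketch under stated assumptions: the WAF prices (per web ACL, per rule, per million requests) are list prices I believe were current at the time of writing and should be checked against AWS pricing, Bot Control, CAPTCHA, and logging charges are ignored, and the data transfer fees just mentioned are deliberately not modeled.

```python
# Back-of-the-envelope: AWS WAF fees vs. a flat Shield Advanced subscription.
# ASSUMPTION: illustrative WAF list prices; verify against current AWS pricing.
WEB_ACL_USD = 5.00                # per web ACL per month (assumed)
RULE_USD = 1.00                   # per rule per month (assumed)
PER_MILLION_REQUESTS_USD = 0.60   # per million requests inspected (assumed)
SHIELD_ADVANCED_MONTHLY_USD = 3_000.00
MIN_COMMITMENT_MONTHS = 12


def monthly_waf_cost(web_acls: int, rules: int, requests_millions: float) -> float:
    """Approximate monthly WAF spend (no Bot Control, CAPTCHA, or logging)."""
    return (web_acls * WEB_ACL_USD
            + rules * RULE_USD
            + requests_millions * PER_MILLION_REQUESTS_USD)


def shield_pays_for_itself(web_acls: int, rules: int, requests_millions: float) -> bool:
    # Shield Advanced includes WAF at no additional charge, so on fees alone it
    # pays for itself once WAF spend exceeds the flat monthly fee.
    # Shield Advanced data transfer fees are NOT modeled here.
    return monthly_waf_cost(web_acls, rules, requests_millions) > SHIELD_ADVANCED_MONTHLY_USD


minimum_annual_spend = SHIELD_ADVANCED_MONTHLY_USD * MIN_COMMITMENT_MONTHS

# Hypothetical organization: 20 web ACLs, 300 rules, 4-5 billion requests/month.
print(monthly_waf_cost(20, 300, 4_000))
print(shield_pays_for_itself(20, 300, 5_000))
```

Under these assumed prices, request fees alone only reach $3,000/month at roughly five billion requests a month, which is why the SRT access often ends up carrying the decision.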

Using WAF in Anger

So you’re using AWS WAF for DDOS prevention. What does this look like in practice?

Before you get to this point, make sure your traffic flows through CloudFront and a load balancer, and restrict access to your load balancers to CloudFront. This ensures you get the resiliency gains of a CDN and load balancing. It also ensures that once you configure AWS WAF on CloudFront, attackers can’t route around it and target the underlying infrastructure directly.
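One way to do the “restrict to CloudFront” step is to let the load balancer’s security group accept traffic only from the AWS-managed prefix list for CloudFront’s origin-facing servers (com.amazonaws.global.cloudfront.origin-facing). The sketch below only builds the IpPermissions payload for EC2’s authorize_security_group_ingress call; the prefix list and security group IDs are hypothetical placeholders, and it assumes HTTPS on port 443 with no broader 0.0.0.0/0 rule left on the group.

```python
# Sketch: ingress permissions that admit only CloudFront origin-facing servers.
# ASSUMPTIONS: placeholder IDs; look up the real prefix list ID for the name below
# in your region (e.g. via EC2 describe_managed_prefix_lists) before applying.
CLOUDFRONT_PREFIX_LIST_NAME = "com.amazonaws.global.cloudfront.origin-facing"


def cloudfront_only_ingress(prefix_list_id: str, port: int = 443) -> list:
    return [{
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "PrefixListIds": [{
            "PrefixListId": prefix_list_id,
            "Description": "CloudFront origin-facing only",
        }],
    }]


permissions = cloudfront_only_ingress("pl-EXAMPLE")  # hypothetical prefix list ID
print(permissions)
# With boto3 (not executed here):
# boto3.client("ec2").authorize_security_group_ingress(
#     GroupId="sg-EXAMPLE", IpPermissions=permissions)
```

A prefix list admits any CloudFront distribution, including someone else’s, so it is commonly paired with a secret custom header that CloudFront adds and an ALB listener rule checks.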

The basic process of fighting a DDOS with AWS WAF involves:

Inspecting traffic to identify patterns specific to the attack
Creating rules to address that traffic
Monitoring and tuning the rules as attacks develop and attacker behavior changes

😮‍💨 There are two additional WAF features AWS positions as meaningful here: Bot Control and Managed Rules. I won’t dive deep, but suffice it to say my personal opinion is that Bot Control is unreliable and unaffordable, while Managed Rules (especially IP reputation rule groups) aren’t high-signal enough to use in the direct way AWS pitches them.

Types of rules

There are some key AWS WAF rule features you should be aware of when analyzing traffic. An overarching goal is to find low-cardinality attributes of attack traffic: it is much better to ban a single shared JA3 fingerprint than to maintain a blocklist of thousands of IP addresses.
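A quick way to find those low-cardinality attributes is to profile a sample of suspected attack requests: for each attribute, count the distinct values and the share of requests the top few values cover. The sketch assumes requests have already been flattened into dicts (the field names clientIp and ja3 are my own, not AWS log fields) and uses synthetic data; in practice you would also compare against a baseline of normal traffic so you don’t pick a value legitimate clients share.

```python
from collections import Counter


def coverage_profile(requests: list, attributes: list, top_n: int = 3) -> dict:
    """Rank attributes by how well a handful of values covers the given requests.

    An attribute whose top_n values cover most suspected-attack traffic (e.g. one
    shared JA3) makes a better match rule than one with thousands of values
    (e.g. client IPs).
    """
    total = len(requests)
    profile = {}
    for attr in attributes:
        counts = Counter(r.get(attr) for r in requests)
        top = counts.most_common(top_n)
        profile[attr] = {
            "distinct": len(counts),
            "top_share": sum(c for _, c in top) / total if total else 0.0,
            "top_values": [v for v, _ in top],
        }
    # Highest coverage first; ties broken by fewer distinct values.
    return dict(sorted(profile.items(),
                       key=lambda kv: (-kv[1]["top_share"], kv[1]["distinct"])))


# Synthetic attack sample: 1,000 distinct IPs, one shared (made-up) JA3 value.
attack = [{"clientIp": f"10.0.{i // 256}.{i % 256}", "ja3": "ja3-A"} for i in range(1000)]
for attr, summary in coverage_profile(attack, ["clientIp", "ja3"], top_n=1).items():
    print(attr, summary["distinct"], summary["top_share"])
```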

JA3 fingerprints: The single most powerful attribute I’ve seen for DDOS-prevention WAF rules is JA3, a form of SSL/TLS client fingerprint. AWS WAF started supporting JA3 fingerprint matching in 2023. DDOS attacks often share a small set of JA3s, which can be distinct from any customer traffic. This is especially useful when an attack uses a large pool of IPs but a single orchestrating tool with a consistent fingerprint. While it’s possible to randomize SSL/TLS signatures, that level of sophistication is still uncommon in untargeted attacks.
Geography: If you have a narrow legitimate user base, limiting the geography of traffic can be a quick way to block swaths of malicious traffic. For example, in the Catch Group case below they limited traffic to Australia and New Zealand as a temporary mitigation. Conversely, you can also write rules banning traffic from specific countries if that traffic is exclusively or overwhelmingly malicious.
Request Header Order: Fingerprinting based on request header order has been a tactic used since the mid-2000s. As with JA3 fingerprints, you’ll often see a globally unique request header order specific to DDOS traffic. You can also get more clever by trying to detect unusual pairings of user-agent and Request Header Ordering.
User-Agent: Speaking of user-agent, I would say it is fairly rare that attackers use a single, predictable, distinctly malicious user-agent in DDOS attacks. However, you will see attackers rely on very dated lists of user-agents pulled from the internet or their crappy tools. Often, a large subset of these user-agents will significantly diverge from normal traffic.
Cookies: Using cookies for WAF rules is generally very application specific. However, I will say I’ve often seen attacks fail to include cookies that are mandatory for application functionality, making those requests easy to filter. Similarly, I’ve seen attacks use a static cookie value or set of cookie values that can be used as a high-signal detection.
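Here is a sketch of what a few of these match rules can look like, written as Python dicts in the JSON shape AWS WAF uses for rules in a web ACL. The rule names, priorities, JA3 hash, country list, and cookie name are hypothetical, and every rule defaults to Count since new rules should start there. If you submit these through an SDK rather than the console’s JSON rule editor, check how it expects SearchString (boto3, for instance, takes bytes).

```python
NO_TRANSFORM = [{"Priority": 0, "Type": "NONE"}]


def _rule(name: str, priority: int, statement: dict, action: str) -> dict:
    """Wrap a statement in the fields every web ACL rule needs."""
    return {
        "Name": name,
        "Priority": priority,
        "Statement": statement,
        "Action": {action: {}},  # e.g. {"Count": {}} or {"Block": {}}
        "VisibilityConfig": {
            "SampledRequestsEnabled": True,
            "CloudWatchMetricsEnabled": True,
            "MetricName": name,
        },
    }


def ja3_rule(name: str, priority: int, ja3_hash: str, action: str = "Count") -> dict:
    # FallbackBehavior decides the outcome when WAF can't compute a JA3.
    return _rule(name, priority, {"ByteMatchStatement": {
        "FieldToMatch": {"JA3Fingerprint": {"FallbackBehavior": "NO_MATCH"}},
        "PositionalConstraint": "EXACTLY",
        "SearchString": ja3_hash,
        "TextTransformations": NO_TRANSFORM,
    }}, action)


def geo_allowlist_rule(name: str, priority: int, countries: list, action: str = "Count") -> dict:
    # Matches requests from anywhere OUTSIDE the allowed country codes.
    return _rule(name, priority, {"NotStatement": {"Statement": {
        "GeoMatchStatement": {"CountryCodes": list(countries)},
    }}}, action)


def missing_cookie_rule(name: str, priority: int, cookie_name: str, action: str = "Count") -> dict:
    # Matches requests whose Cookie header doesn't contain "<cookie_name>=".
    # Crude substring check; scope it to routes where the cookie is truly mandatory.
    return _rule(name, priority, {"NotStatement": {"Statement": {"ByteMatchStatement": {
        "FieldToMatch": {"SingleHeader": {"Name": "cookie"}},
        "PositionalConstraint": "CONTAINS",
        "SearchString": f"{cookie_name}=",
        "TextTransformations": NO_TRANSFORM,
    }}}}, action)


rules = [
    ja3_rule("ddos-ja3", 10, "0123456789abcdef0123456789abcdef"),  # hypothetical hash
    geo_allowlist_rule("break-glass-geo", 20, ["AU", "NZ"]),
    missing_cookie_rule("missing-app-cookie", 30, "app_session"),  # hypothetical cookie
]
```

Keeping the Count default in the builders means a rule can never go straight to Block by accident; promoting it is an explicit, separate step.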

All of these are examples of Match rules, which compare a specific component of the request to content and return a boolean match. There are also rate-based rules.

Generally, match rules that can exactly identify malicious traffic are the first line of defense. Rate-limiting rules are a second line, because they inherently allow a subset of malicious traffic below the limit. There are a few easy rate-based rules you can apply for DDOS prevention:

Per-IP rate limits: it should be easy to calculate a reasonable maximum on requests-per-user
Global rate limits on unauthenticated traffic: during a DDOS attack, you can make a quality-of-service tradeoff by rate limiting all unauthenticated traffic (or vice versa), provided there is an obvious cookie or regex match for “authenticated traffic”
Rate limits for targeted routes: while per-route rate limits do not naively scale in AWS WAF, they can be a useful tool in the face of basic DDOS attacks that may focus on specific routes, as discussed up top. Limiting per-route often lets you be far more restrictive than you can be on only IP.
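The three rate-based patterns above might look like this, again as rule dicts in AWS WAF’s JSON shape. The limits, the /login path, and the session cookie convention are hypothetical. CONSTANT aggregation counts every request matching the scope-down statement as a single bucket, which is what turns it into a global limit on that slice of traffic.

```python
from typing import Optional

VALID_WINDOWS_SEC = (60, 120, 300, 600)  # evaluation windows AWS WAF accepts


def rate_rule(name: str, priority: int, limit: int, aggregate: str = "IP",
              window_sec: int = 60, scope_down: Optional[dict] = None,
              action: str = "Count") -> dict:
    if window_sec not in VALID_WINDOWS_SEC:
        raise ValueError(f"unsupported window: {window_sec}s")
    if aggregate == "CONSTANT" and scope_down is None:
        # This sketch insists on a scope-down: without one, CONSTANT would
        # lump all traffic into one bucket.
        raise ValueError("CONSTANT aggregation needs a scope-down statement")
    statement = {"Limit": limit, "EvaluationWindowSec": window_sec,
                 "AggregateKeyType": aggregate}
    if scope_down is not None:
        statement["ScopeDownStatement"] = scope_down
    return {
        "Name": name,
        "Priority": priority,
        "Statement": {"RateBasedStatement": statement},
        "Action": {action: {}},
        "VisibilityConfig": {"SampledRequestsEnabled": True,
                             "CloudWatchMetricsEnabled": True,
                             "MetricName": name},
    }


def path_starts_with(prefix: str) -> dict:
    return {"ByteMatchStatement": {
        "FieldToMatch": {"UriPath": {}},
        "PositionalConstraint": "STARTS_WITH",
        "SearchString": prefix,
        "TextTransformations": [{"Priority": 0, "Type": "LOWERCASE"}],
    }}


def unauthenticated() -> dict:
    # Hypothetical convention: authenticated requests carry "session=" in the Cookie header.
    return {"NotStatement": {"Statement": {"ByteMatchStatement": {
        "FieldToMatch": {"SingleHeader": {"Name": "cookie"}},
        "PositionalConstraint": "CONTAINS",
        "SearchString": "session=",
        "TextTransformations": [{"Priority": 0, "Type": "NONE"}],
    }}}}


rate_rules = [
    rate_rule("per-ip", 40, limit=600),  # ~10 RPS per IP over a 60 s window
    rate_rule("per-ip-login", 41, limit=100, scope_down=path_starts_with("/login")),
    rate_rule("global-unauth", 42, limit=50_000, aggregate="CONSTANT",
              scope_down=unauthenticated()),
]
```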

AWSManagedIPDDoSList is the one exception to my critique of Managed Rules: in my experience, the false-positive rate was acceptable, given these are IPs found actively engaging in DDoS activity.
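AWSManagedIPDDoSList lives inside the AWSManagedRulesAmazonIpReputationList rule group, so one way to use it on its own terms is to reference the group and override just that rule’s action. A sketch, with a hypothetical rule name and priority; note that rule-group references take OverrideAction instead of Action, and that the group’s other rules keep whatever actions they have unless you override them too.

```python
import json

ip_reputation_rule = {
    "Name": "aws-ip-reputation",  # hypothetical name
    "Priority": 5,
    "Statement": {"ManagedRuleGroupStatement": {
        "VendorName": "AWS",
        "Name": "AWSManagedRulesAmazonIpReputationList",
        # Override only the DDoS IP list rule to Block
        # (swap in {"Count": {}} while you evaluate it).
        "RuleActionOverrides": [
            {"Name": "AWSManagedIPDDoSList", "ActionToUse": {"Block": {}}},
        ],
    }},
    # Rule-group references use OverrideAction; "None" keeps the group's own actions.
    "OverrideAction": {"None": {}},
    "VisibilityConfig": {
        "SampledRequestsEnabled": True,
        "CloudWatchMetricsEnabled": True,
        "MetricName": "aws-ip-reputation",
    },
}

print(json.dumps(ip_reputation_rule, indent=2))
```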

ASNs

ASNs are numbers assigned to each “autonomous system,” which you can think of as a large network with unified routing (Cloudflare uses a “post office” metaphor). ASNs are often a very useful signal for categorizing DDOS traffic, because attackers may use a large set of IPs that nonetheless share a very small set of ASNs.

Unlike JA3, ASN isn’t yet available as a match component in AWS WAF. This makes it challenging to use ASNs in WAF rules when the WAF is associated with CloudFront.

🥺 Go ask your TAM for ASN as a match component!

Available workarounds include using a WAF associated with the ALB, or using ALB listener rules directly to address malicious traffic based on ASN. In both cases, you’ll need to configure CloudFront to forward the “CloudFront-Viewer-ASN” header to your origin.
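For the listener-rule workaround, here is a sketch of the Conditions and Actions you might pass to Elastic Load Balancing’s create_rule, returning a 403 for requests whose CloudFront-Viewer-ASN header matches a list of ASNs. It assumes CloudFront already forwards that header (via an origin request policy) and that the ALB only accepts traffic from CloudFront. The ASNs shown are from the range reserved for documentation, and the listener ARN and priority are placeholders. ALB limits how many condition values a rule can hold, so a long ASN list may need several rules.

```python
def asn_block_rule(asns: list) -> dict:
    """Conditions/Actions for an ALB listener rule that rejects listed ASNs."""
    return {
        "Conditions": [{
            "Field": "http-header",
            "HttpHeaderConfig": {
                "HttpHeaderName": "CloudFront-Viewer-ASN",
                "Values": [str(asn) for asn in asns],  # header values are strings
            },
        }],
        "Actions": [{
            "Type": "fixed-response",
            "FixedResponseConfig": {
                "StatusCode": "403",
                "ContentType": "text/plain",
                "MessageBody": "Forbidden",
            },
        }],
    }


rule = asn_block_rule([64496, 64497])  # documentation-range ASNs (RFC 5398)
print(rule)
# With boto3 (not executed here; LISTENER_ARN is a placeholder):
# boto3.client("elbv2").create_rule(ListenerArn=LISTENER_ARN, Priority=10, **rule)
```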

Count or Block, CAPTCHA or Challenge

There are multiple available “actions” you can have AWS WAF take on rule matches.

Count allows you to track match rates, without any user impact. Always start by putting rules in count mode to avoid major issues with over-matching. You can also use Count as a trigger for AWS WAF logging that allows you to investigate the traffic to determine more granular matching criteria.
Block terminates a request. Use this with very high signal rules, in cases where the tradeoff of false-positives being a hard block is acceptable.
CAPTCHA has a long history, but basically serves as a “soft” block that allows humans to solve the puzzle and unblock themselves.
Challenge is a variation on CAPTCHA that tries to trigger clients to process a silent challenge that verifies they are a browser and not automation.

Generally, when they work well, Challenges are a great way to avoid the user pain of CAPTCHA while adding additional verification. However, Challenge actions can be especially hard to debug and to integrate with non-web clients. Even with that caveat, a Challenge is just a Block in the worst case, and in the best case it is much more ergonomic for users.
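Since rules should start in Count, it helps to treat the action as the one field you change once the Count data looks clean. A small sketch of that promotion step over rule dicts in AWS WAF’s JSON shape; the helper and the example rule (including its user-agent token) are hypothetical.

```python
import copy

ACTIONS = {"Allow", "Block", "Count", "Captcha", "Challenge"}


def with_action(rule: dict, action: str) -> dict:
    """Return a copy of a WAF rule with its action swapped (e.g. Count -> Challenge)."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    if "OverrideAction" in rule:
        # Rule-group references are tuned via OverrideAction / RuleActionOverrides.
        raise ValueError("rule-group references don't take an Action")
    promoted = copy.deepcopy(rule)
    promoted["Action"] = {action: {}}
    return promoted


counting = {
    "Name": "suspicious-ua",  # hypothetical rule
    "Priority": 15,
    "Statement": {"ByteMatchStatement": {
        "FieldToMatch": {"SingleHeader": {"Name": "user-agent"}},
        "PositionalConstraint": "CONTAINS",
        "SearchString": "ExampleBot/0.1",  # hypothetical UA token
        "TextTransformations": [{"Priority": 0, "Type": "NONE"}],
    }},
    "Action": {"Count": {}},
    "VisibilityConfig": {"SampledRequestsEnabled": True,
                         "CloudWatchMetricsEnabled": True,
                         "MetricName": "suspicious-ua"},
}

challenged = with_action(counting, "Challenge")
print(challenged["Action"], counting["Action"])
```

Copying rather than mutating keeps the Count version around, so rolling back during an incident is just re-deploying the original dict.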

Working with AWS WAF logs

AWS WAF web ACLs offer detailed traffic logging. Logs can be sent to CloudWatch Logs, S3, or an Amazon Data Firehose stream.

Sampled requests offer a minimal version of this, by showing a table of up to 100 matching requests for each rule, and 100 requests that didn’t match any rules.

🍪 AWS WAF logs are detailed request logs. This means they are overwhelmingly likely to contain authentication material in the request cookies or headers.
As a result, they should be carefully handled and tightly controlled. For logging, you can configure field-level redaction, which trades off visibility against derisking the logs.
The only way to avoid leaking this data in sampled requests is to disable sampling.

AWS WAF logs are verbose and voluminous. I’d recommend filtering to only log based on specific rules - starting with Count matches as you tune rules to gradually narrow in on malicious traffic.
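A sketch of that setup, as the LoggingConfiguration you might pass to the WAFv2 put_logging_configuration call: keep only requests where a rule’s action was Count, drop everything else, and redact the Cookie and Authorization headers. The ARNs are placeholders, and a CloudWatch Logs destination needs a log group whose name starts with aws-waf-logs-. Redacting cookies also means the cookie-based query below will no longer see real cookie values, so pick that tradeoff deliberately.

```python
def waf_logging_config(web_acl_arn: str, destination_arn: str) -> dict:
    return {
        "ResourceArn": web_acl_arn,
        "LogDestinationConfigs": [destination_arn],
        # Field-level redaction: trades visibility for less sensitive logs.
        "RedactedFields": [
            {"SingleHeader": {"Name": "cookie"}},
            {"SingleHeader": {"Name": "authorization"}},
        ],
        # Keep only requests where a rule's action was Count; drop the rest.
        "LoggingFilter": {
            "DefaultBehavior": "DROP",
            "Filters": [{
                "Behavior": "KEEP",
                "Requirement": "MEETS_ANY",
                "Conditions": [{"ActionCondition": {"Action": "COUNT"}}],
            }],
        },
    }


config = waf_logging_config(
    "arn:aws:wafv2:us-east-1:111122223333:global/webacl/example/EXAMPLE-ID",  # placeholder
    "arn:aws:logs:us-east-1:111122223333:log-group:aws-waf-logs-example",     # placeholder
)
print(config["LoggingFilter"])
# With boto3 (not executed here):
# boto3.client("wafv2", region_name="us-east-1").put_logging_configuration(
#     LoggingConfiguration=config)
```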

In an emergency, if you haven’t already set up a log pipeline it’s likely easiest to get going with CloudWatch. AWS has outlined a variety of helpful sample queries.

I’d recommend setting up a dashboard and/or saving queries that cover important context for writing rules. Here are a few example queries to get you started:

Distinct IPs: a big spike in distinct IPs is an easy sign of DDOS

fields httpRequest.clientIp
| stats count_distinct(httpRequest.clientIp) as distinct_ips by bin(1m)

Top user-agents

parse @message /"name":"[Uu]ser-[Aa]gent","value":"(?<userAgent>[^"]*)"/
| stats count(*) as ua_count by userAgent
| sort ua_count desc

Top cookie strings

parse @message /"name":"[Cc]ookie","value":"(?<cookies>[^"]*)"/
| stats count(*) as c_count by cookies
| sort c_count desc

Most common JA3s

stats count(*) as ja3_count by ja3Fingerprint
| sort ja3_count desc

Most common JA3s in CAPTCHA’d traffic: useful if you have non-JA3 rules you want to distill down to be JA3 based

filter action = 'CAPTCHA'
| stats count(*) as request_count by ja3Fingerprint
| sort request_count desc

Group rule matches by IP:URI

filter terminatingRuleId = '<TKTK>'
| stats count(*) as num by httpRequest.clientIp, httpRequest.uri
| sort num desc
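If your logs land in S3 or Firehose instead of CloudWatch, the same top-N views are easy to reproduce offline. This sketch reads WAF log records as JSON lines (one record per line, using the httpRequest.headers and ja3Fingerprint fields the queries above rely on), matches header names case-insensitively, and adds a header-order signature. It assumes the headers array preserves the order the client sent, which I haven’t verified; the sample records are synthetic.

```python
import json
from collections import Counter
from typing import Iterable, Optional


def headers(record: dict) -> list:
    return record.get("httpRequest", {}).get("headers", [])


def header_value(record: dict, name: str) -> Optional[str]:
    """Case-insensitive lookup: HTTP/1.1 tools often send 'User-Agent', HTTP/2 'user-agent'."""
    for h in headers(record):
        if h.get("name", "").lower() == name.lower():
            return h.get("value")
    return None


def header_order(record: dict) -> str:
    # Assumes the log preserves the client's header order (unverified).
    return ":".join(h.get("name", "").lower() for h in headers(record))


def summarize(lines: Iterable[str], top_n: int = 5) -> dict:
    user_agents, ja3s, orders = Counter(), Counter(), Counter()
    for line in lines:
        if not line.strip():
            continue
        record = json.loads(line)
        user_agents[header_value(record, "user-agent")] += 1
        ja3s[record.get("ja3Fingerprint")] += 1
        orders[header_order(record)] += 1
    return {
        "user_agents": user_agents.most_common(top_n),
        "ja3": ja3s.most_common(top_n),
        "header_orders": orders.most_common(top_n),
    }


sample = [
    json.dumps({"httpRequest": {"headers": [{"name": "Host", "value": "example.com"},
                                            {"name": "User-Agent", "value": "ExampleBot/0.1"}]},
                "ja3Fingerprint": "ja3-A"}),
    json.dumps({"httpRequest": {"headers": [{"name": "user-agent", "value": "ExampleBot/0.1"},
                                            {"name": "host", "value": "example.com"}]},
                "ja3Fingerprint": "ja3-A"}),
]
print(summarize(sample))
```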

Limitations

AWS WAF isn’t a comprehensive solution for DDOS mitigation. It’s just likely to be the one you have handy.

While you may have no choice in the moment, it is worth being aware of AWS WAF’s limitations. Most notably, AWS WAF will always be a bit slow to kick in when an attack starts - you should understand that you either need to be resilient enough to last until the WAF kicks in, or accept there will be 30-60 seconds of downtime when an attack starts.

Chandrapal Badshah covers these well in Beyond the Basics: AWS WAF’s Lesser-Known Limitations, published a few months ago:

Rate-based rules have, at best, a one-minute time window. The longer the time window, the higher you’ll need to set the limit (e.g., if legitimate clients peak at 10 RPS, a one-minute window means a limit of 600 requests, which increases the load per IP/client an attacker can generate)
“It’s possible for requests to be coming in at too high a rate for up to several minutes before AWS WAF detects and rate limits them. Similarly, the request rate can be below the limit for a period of time before AWS WAF detects the decrease and discontinues the rate limiting action. Usually, this delay is below 30 seconds.”
“AWS WAF supports inspecting up to 64KB of the body” - meaning large requests can bypass the AWS WAF inspection
For Challenges specifically, AWS charges per challenge served. Because Challenges aren’t charged per attempt (as CAPTCHA is), you’re more likely to see billing impact during a DDOS when using Challenges.
Some AWS WAF service quotas can be limiting. I’ve personally hit issues with the limit of 10 rate-based rules per web ACL and with the maximum of 10,000 IP addresses that a rate-based rule will track and limit.
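The time-window point in the first item is easy to put numbers on. This is a simplified model of my own, not AWS’s: the per-IP limit must be the legitimate peak rate times the window, a client can spend that whole budget in a burst, and during the detection delay quoted above it can keep sending over the limit. The 10 RPS figure comes from the example above; the attacker rate and the 30-second delay are assumptions.

```python
import math

VALID_WINDOWS_SEC = (60, 120, 300, 600)


def per_ip_limit(peak_legit_rps: float, window_sec: int = 60) -> int:
    """Smallest limit that doesn't throttle a legitimate client at its peak rate."""
    if window_sec not in VALID_WINDOWS_SEC:
        raise ValueError(f"unsupported window: {window_sec}s")
    return math.ceil(peak_legit_rps * window_sec)


def requests_before_limiting(limit: int, attacker_rps: float, detection_delay_sec: float) -> float:
    """Simplified model: the full per-window budget, plus whatever one client
    sends while WAF is still detecting that it went over."""
    return limit + attacker_rps * detection_delay_sec


limit = per_ip_limit(10)  # 10 RPS * 60 s
print(limit)
print(requests_before_limiting(limit, attacker_rps=100, detection_delay_sec=30))
```

Stretching the window to five minutes to smooth out legitimate bursts makes the per-IP budget 3,000 requests, which is exactly the tradeoff the first item describes.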

There is also a variety of research on bypasses for AWS WAF, which I don’t think is super impactful in the naive DDOS case, but can be interesting.

I recommend Shubham Shah’s recent “Modern WAF Bypass Techniques” talk, Sysdig’s Fuzzing and Bypassing the AWS WAF, and Fraktal’s Cloud WAF comparison.

Recent improvements

I do want to give credit to some recent improvements that AWS has shipped, including:

I hope they continue to improve the service, but please - I don’t need Amazon Q for WAF.

Appendices

Relevant DDOS Retros & Case Studies

12/2020 - Fathom

Waves of attacks over the course of two weeks
“3,000 - 10,000 concurrent connections at any one time”
“We had decided that we were going to simply increase our lambda concurrency limit to 8,000,000 requests per second (800,000 concurrents) and handle the spam attacks using DynamoDB for various blocking techniques”
Bought Shield Advanced:
- “We now have 24x7 access to the AWS DDoS Response Team …”
- “This same attacker has been after us for 3 weeks. But this time, we had 6 people from AWS fighting for us. It was exhilarating.”
- “John then identified a pattern in all the IP addresses. I’m sure I could share more details here but I’m reluctant to, so I’ll leave it to your imagination.”

12/2021 - Flexbooker

“We are continuing to work with AWS on resolving this issue. We have been informed that this should not have been possible, but before they were able to assist technically, they had to ensure that all our security practices were correct.”
“have been on the phone with AWS support for 7 hours now, trying to push them through. A brute force attack such as this should not have been possible”

05/2022 - StackOverflow

“since we don’t use trailing slash on that URL, we blocked all traffic on that specific path.”
“We created a script that would check our traffic logs for IPs behaving a specific way and automatically add them to the ban list”

06/2023 - Deno Deploy

Increased traffic caused downstream impact due to resource contention against a metadata store

10/2023 - Catch Group (re:Inforce)

Started with generous rate limits
Added an IP blocklist rule
Created a “break-glass” Geo-Block rule to restrict traffic down to target market
60d AWS Shield Advanced trial offered
Built custom AWS WAF & Shield Datadog Dashboard

11/2023 - Blender

“executed by a botnet with hundreds of IP addresses sending over 1.5 billion malicious request, at a peak rate of 100 thousand rps”
“The issue was resolved by moving behind a dedicated DDoS mitigation service”

01/2024 - Railway

“L7 (HTTP) DDoS attack affecting the Dashboard … Peak RPS: ~250k/sec”

01/2024 - Hacker News

“The targeted customer has implemented CloudFlare, and we have taken steps to mitigate this event.”

01/2024 - Basecamp

Blocked all Indonesian traffic
Moved behind Cloudflare
80,000 req/sec at the peak

Simulating DDOS - Tools

Simulating DDOS - Vendors

AWS also has a set of DDoS Simulation Testing Partners you can work with.

References

This is a generalization. You’ll occasionally learn of amplification vectors (like HTTP/2) where you could implement Layer 7 mitigations. Eventually, though, these tactics end up defanged by default (e.g., Cloudflare, AWS). ↩
Dismantling DDoS - Lessons in Scaling ↩ ↩² ↩³ ↩⁴
TISN - Managing DoS Attacks ↩