Skip to main content

Command Palette

Search for a command to run...

๐Ÿ˜ The Thundering Herd Problem โ€” How Distributed Systems Fall Apart

Updated
โ€ข7 min read
๐Ÿ˜ The Thundering Herd Problem โ€” How Distributed Systems Fall Apart

๐Ÿ›๏ธ Think of It Like Black Friday

Imagine a popular electronics store on Black Friday.

Hundreds of people are waiting outside before the doors open. The moment the store opens, everyone rushes in at the same time trying to grab the limited deals.

The store staff, billing counters, and aisles cannot handle so many people at once, leading to chaos and long delays.

That's exactly what happens to servers when a huge amount of requests โ€” greater than their compute power โ€” comes in all at once. This is commonly known as the "Thundering Herd" problem. โšก

Online Ticket Booking System

A ticket website has 3 servers behind a load balancer. Each server can handle 1000 booking requests per second. A popular concert opens bookings and a massive traffic spike hits.

Here's how the failure unfolds step by step:

๐Ÿ”ด Server 1

  • Starts processing 1000 requests

  • Crashes after completing 600 requests due to overload

  • Remaining 400 requests are redirected by the load balancer

๐Ÿ”ด Server 2

  • Already processing 900 requests

  • Gets the extra 400 redirected requests

  • Total load becomes 1300 requests, exceeding its capacity

  • Server 2 crashes after processing some of them

๐Ÿ”ด Server 3

  • Now receives the remaining requests from Server 1

  • Receives the remaining requests from Server 2

  • Plus new incoming user requests

  • Becomes overwhelmed and eventually runs out of resources

๐Ÿ’ก Result: All servers fail sequentially โ†’ Cascading failure across the entire system

โšฝ Live World Cup Final Streaming

Consider millions watching the FIFA World Cup Final online.

The critical moment:

  • It's the 90th minute โฑ๏ธ

  • A penalty is awarded ๐Ÿ˜ฑ

  • Millions of viewers open apps or refresh streams simultaneously

Platforms like Disney+ Hotstar, Netflix Platform massive spike of requests instantly.

๐Ÿ”ด Server 1

  • Handling 200k viewers

  • Sudden spike to 350k

  • CPU maxes out โ†’ server crashes

๐Ÿ”ด Server 2

  • Receives redirected traffic

  • Also processes live encoding + streaming

  • Memory spikes โ†’ fails

๐Ÿ”ด Server 3

  • Now receives redirected viewers

  • Plus new viewers joining

  • Plus bitrate adaptation requests

  • Eventually buffering or full service outage occurs ๐Ÿ“บ

๐Ÿ’ก Result: Millions reacting to the same event at the same time causes a thundering herd traffic spike.

๐Ÿ”— Why Distributed Systems Make This Worse

1. ๐Ÿ’ฅ Cascading Failures

In distributed systems, multiple servers and services are connected. A typical flow looks like:

API Server โ†’ Auth Service โ†’ Database โ†’ Cache

If the API server gets overloaded, it sends too many requests downstream. If the database slows down, API servers start waiting for responses. Eventually:

  • Request queues fill up

  • Threads get blocked

  • Services crash one by one

This chain reaction creates cascading failures, often triggered by the Thundering Herd Problem.

2. ๐Ÿพ Shared Resource Bottlenecks

Distributed systems often share resources such as databases, message queues, and caches โ€” like Redis or MySQL.

During a traffic spike:

  • Thousands of requests hit the same resource

  • That resource becomes the single bottleneck

  • If it fails, many services fail simultaneously ๐Ÿ˜ต

3. ๐Ÿ” Retry Storms

Most distributed systems implement automatic retries. Here's how that backfires:

  1. Service A calls Service B

  2. Service B becomes slow

  3. Service A retries requests

  4. Now imagine 100,000 clients retrying simultaneously

Instead of reducing load, retries multiply the traffic, making the system collapse even faster. ๐Ÿ“ˆ๐Ÿ’€

4. ๐Ÿ“ฆ Queue Build-Up

When systems cannot process requests fast enough, requests start piling up in queues and memory usage grows rapidly.

For systems running on the Java Virtual Machine, this can lead to:

  • Long garbage collection pauses

  • OutOfMemoryError ๐Ÿšจ

  • Application crashes

๐Ÿค” Normal Traffic Spike vs. Thundering Herd โ€” What's the Difference?

๐Ÿ“ˆ Normal Traffic Spike

A normal traffic spike is a sudden increase in traffic, usually due to predictable events like a sale, a popular content release, or a live event.

Example: On Netflix releasing a new episode, millions start streaming โ€” but each request is handled independently. Load balancer + caching + auto-scaling can handle it smoothly. โœ…

โšก Thundering Herd

A thundering herd occurs when many threads, processes, or clients simultaneously wake up and try to access the same resource, often triggered by a cache expiration, lock release, or service recovery.

Example:

  • A cache entry expires on Redis

  • 10,000 clients all query the database at the exact same time

  • Database can't handle it โ†’ crash โ†’ remaining clients redirected โ†’ cascade โ†’ system outage ๐Ÿ”ฅ

Impact on resources

  • ๐Ÿ–ฅ๏ธ CPU โ€” maxes out, context switching explodes, server crashes.

  • ๐Ÿ—„๏ธ Database โ€” connection pools exhausted, query queues pile up, can go fully down.

  • โšก Cache โ€” the origin point of the herd, gets flooded with write operations on repopulation.

  • โฑ๏ธ Latency โ€” response times balloon from milliseconds to seconds, "slow is the new down".

๐Ÿ›ก๏ธ How to Prevent the Thundering Herd Problem

1. ๐Ÿšฆ Rate Limiting

Limit how many requests a user or client can send within a time window.

Example: Allow only 100 requests per minute per user. If requests exceed the limit, the system returns 429 Too Many Requests.

Tools often used: NGINX, Envoy

โœ… Prevents sudden traffic spikes from overwhelming servers.

2. ๐Ÿ“ฌ Request Queueing

Instead of processing everything instantly, requests are placed in a queue and workers process them gradually.

Example systems: Apache Kafka, RabbitMQ

โœ… Prevents sudden overload by smoothing out the traffic flow.

3. ๐Ÿ” Mutex Locking โ€” One Rebuilds, Rest Wait

Only one request regenerates the expired cache. All others wait for the fresh value.

Without mutex โ†’ 5,000 DB queries fire at once ๐Ÿ’ฃ

With mutex:

1๏ธโƒฃ Cache miss detected

2๏ธโƒฃ Request #1 acquires the lock ๐Ÿ”’

3๏ธโƒฃ Request #1 fetches from DB and updates cache

4๏ธโƒฃ Lock released ๐Ÿ”“

5๏ธโƒฃ Requests #2โ€“5000 read the fresh cached value โœ…

Only 1 DB hit instead of 5,000. ๐Ÿ™Œ

Request 1 โ†’ lock mutex โ†’ fetch data from DB โ†’ update cache โ†’ unlock Request 2 โ†’ waits for mutex โ†’ reads from cache Request 3 โ†’ waits โ†’ reads from cache Request 4 โ†’ waits โ†’ reads from cache

4. ๐Ÿค Request Coalescing

Request Coalescing is a technique used in distributed systems to combine multiple identical requests into a single request, so the system performs the expensive operation only once and shares the result with all waiting requests.

โœ… Instead of 5,000 separate DB calls, the system makes just one and fans the result out to everyone waiting.

๐Ÿ Final Thoughts

The Thundering Herd Problem is one of those silent killers in distributed systems. Everything seems fine โ€” until one cache expires, one server goes down, or one big event happens. Then the domino effect begins. ๐Ÿฃ

The key takeaway: design your system to expect the herd. Rate limit aggressively, queue your requests, use mutexes on cache rebuilds, and coalesce duplicate requests. A little prevention goes a long way before your servers end up like those Black Friday shoppers โ€” trampling each other on the way in. ๐Ÿ˜„

#systemdesign #distributedsystems #thunderingherdproblem #chaiaurcode #chaicode