🐘 The Thundering Herd Problem: How Distributed Systems Fall Apart

🛍️ Think of It Like Black Friday

Imagine a popular electronics store on Black Friday.

Hundreds of people are waiting outside before the doors open. The moment the store opens, everyone rushes in at the same time trying to grab the limited deals.

The store staff, billing counters, and aisles cannot handle so many people at once, leading to chaos and long delays.

That's exactly what happens to servers when far more requests arrive at once than they have the capacity to handle. This is commonly known as the "Thundering Herd" problem. ⚡

Online Ticket Booking System

A ticket website has 3 servers behind a load balancer. Each server can handle 1000 booking requests per second. A popular concert opens bookings and a massive traffic spike hits.

Here's how the failure unfolds step by step:

🔴 Server 1

Starts processing 1000 requests
Crashes after completing 600 requests due to overload
Remaining 400 requests are redirected by the load balancer

🔴 Server 2

Already processing 900 requests
Gets the extra 400 redirected requests
Total load becomes 1300 requests, exceeding its capacity
Server 2 crashes after processing some of them

🔴 Server 3

Now receives the unfinished requests that Server 2 inherited from Server 1
Receives Server 2's own unfinished requests
Plus new incoming user requests
Becomes overwhelmed and eventually runs out of resources

💡 Result: All servers fail sequentially → Cascading failure across the entire system
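
To make the chain reaction concrete, here is a minimal Python sketch of the scenario above. It is a simplification, not a model of any real load balancer: the function name `simulate_cascade` and the starting loads are made up for illustration, and it assumes a crashed server's entire load is split evenly across the survivors.

```python
def simulate_cascade(loads, capacity):
    """Return the indexes of servers in the order they fail.

    loads:    requests currently assigned to each server (illustrative numbers)
    capacity: requests a single server can handle before it crashes
    """
    loads = list(loads)
    alive = set(range(len(loads)))
    failed = []
    while alive:
        # Look for a live server that is over capacity.
        overloaded = [i for i in sorted(alive) if loads[i] > capacity]
        if not overloaded:
            break  # everyone left has headroom, the cascade stops
        victim = overloaded[0]
        alive.remove(victim)
        failed.append(victim)
        if not alive:
            break  # nothing left to redirect to: full outage
        # Assumption: the load balancer spreads the victim's load evenly.
        share = loads[victim] / len(alive)
        for i in alive:
            loads[i] += share
        loads[victim] = 0
    return failed


print(simulate_cascade([1200, 900, 900], 1000))  # [0, 1, 2]
print(simulate_cascade([1100, 400, 400], 1000))  # [0]
```

In the first call the survivors are already near their limit, so the redirected load takes them down one after another. In the second call the other two servers have enough headroom to absorb the overflow, and the cascade stops after one failure.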

⚽ Live World Cup Final Streaming

Consider millions watching the FIFA World Cup Final online.

The critical moment:

It's the 90th minute ⏱️
A penalty is awarded 😱
Millions of viewers open apps or refresh streams simultaneously

Platforms like Disney+ Hotstar and Netflix see a massive spike of requests instantly.

🔴 Server 1

Handling 200k viewers
Sudden spike to 350k
CPU maxes out → server crashes

🔴 Server 2

Receives redirected traffic
Also processes live encoding + streaming
Memory spikes → fails

🔴 Server 3

Now receives redirected viewers
Plus new viewers joining
Plus bitrate adaptation requests
Eventually buffering or full service outage occurs 📺

💡 Result: Millions reacting to the same event at the same time causes a thundering herd traffic spike.

🔗 Why Distributed Systems Make This Worse

1. 💥 Cascading Failures

In distributed systems, multiple servers and services are connected. A typical flow looks like:

API Server → Auth Service → Database → Cache

If the API server gets overloaded, it sends too many requests downstream. If the database slows down, API servers start waiting for responses. Eventually:

Request queues fill up
Threads get blocked
Services crash one by one

This chain reaction creates cascading failures, often triggered by the Thundering Herd Problem.

2. 🍾 Shared Resource Bottlenecks

Distributed systems often share resources such as databases (for example MySQL), caches (for example Redis), and message queues.

During a traffic spike:

Thousands of requests hit the same resource
That resource becomes the single bottleneck
If it fails, many services fail simultaneously 😵

3. 🔁 Retry Storms

Most distributed systems implement automatic retries. Here's how that backfires:

Service A calls Service B
Service B becomes slow
Service A retries requests
Now imagine 100,000 clients retrying simultaneously

Instead of reducing load, retries multiply the traffic, making the system collapse even faster. 📈💀
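
A quick back-of-the-envelope calculation shows how fast this multiplies. The sketch below is worst-case arithmetic only: the function names and the retry count of 3 are assumptions chosen for illustration, and it assumes every attempt fails and every client uses all of its retries.

```python
def worst_case_requests(clients, max_retries):
    """Requests sent if every attempt fails and every client retries fully."""
    return clients * (1 + max_retries)


def layered_amplification(max_retries, layers):
    """Worst-case multiplier when each of `layers` services retries on its own."""
    return (1 + max_retries) ** layers


print(worst_case_requests(100_000, 3))  # 400000
print(layered_amplification(3, 3))      # 64
```

So 100,000 clients with 3 retries each can turn into 400,000 requests against a service that was already struggling. If three services in a call chain each retry 3 times, a single user request can fan out to as many as 64 calls at the bottom of the chain.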

4. 📦 Queue Build-Up

When systems cannot process requests fast enough, requests start piling up in queues and memory usage grows rapidly.

For systems running on the Java Virtual Machine, this can lead to:

Long garbage collection pauses
OutOfMemoryError 🚨
Application crashes
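
The growth is easy to see with a little arithmetic. This is a minimal sketch, assuming a constant arrival rate and a constant processing rate (real traffic is burstier); the function name `backlog_over_time` and the numbers are illustrative.

```python
def backlog_over_time(arrival_rate, service_rate, seconds):
    """Queue length at the end of each second, starting from an empty queue."""
    backlog = 0
    history = []
    for _ in range(seconds):
        # Each second: new requests arrive, the server drains what it can.
        backlog = max(0, backlog + arrival_rate - service_rate)
        history.append(backlog)
    return history


print(backlog_over_time(1300, 1000, 5))  # [300, 600, 900, 1200, 1500]
print(backlog_over_time(800, 1000, 3))   # [0, 0, 0]
```

As long as arrivals exceed what the server can process, the backlog never shrinks. Every queued request holds memory, which is where the garbage collection pauses and out-of-memory crashes come from.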

🤔 Normal Traffic Spike vs. Thundering Herd — What's the Difference?

📈 Normal Traffic Spike

A normal traffic spike is a sudden increase in traffic, usually due to predictable events like a sale, a popular content release, or a live event.

Example: When Netflix releases a new episode, millions start streaming, but each request is handled independently. A load balancer, caching, and auto-scaling can absorb it smoothly. ✅

⚡ Thundering Herd

A thundering herd occurs when many threads, processes, or clients simultaneously wake up and try to access the same resource, often triggered by a cache expiration, lock release, or service recovery.

Example:

A cache entry expires on Redis
10,000 clients all query the database at the exact same time
Database can't handle it → crash → remaining clients redirected → cascade → system outage 🔥

Impact on resources

🖥️ CPU — maxes out, context switching explodes, server crashes.
🗄️ Database — connection pools exhausted, query queues pile up, can go fully down.
⚡ Cache — the origin point of the herd, gets flooded with write operations on repopulation.
⏱️ Latency — response times balloon from milliseconds to seconds, "slow is the new down".

🛡️ How to Prevent the Thundering Herd Problem

1. 🚦 Rate Limiting

Limit how many requests a user or client can send within a time window.

Example: Allow only 100 requests per minute per user. If requests exceed the limit, the system returns 429 Too Many Requests.

Tools often used: NGINX, Envoy

✅ Prevents sudden traffic spikes from overwhelming servers.
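
Here is a minimal sketch of the "100 requests per minute per user" rule as a fixed-window counter. It is one of several possible algorithms, not how NGINX or Envoy implement it; the class name `FixedWindowRateLimiter`, the `handle_request` helper, and the injectable `clock` parameter are all assumptions made for this example.

```python
import time
from collections import defaultdict


class FixedWindowRateLimiter:
    def __init__(self, limit=100, window_seconds=60, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested
        # user_id -> [window index, requests seen in that window]
        self.counters = defaultdict(lambda: [None, 0])

    def allow(self, user_id):
        window = int(self.clock() // self.window_seconds)
        entry = self.counters[user_id]
        if entry[0] != window:
            # A new time window has started: reset this user's count.
            entry[0], entry[1] = window, 0
        if entry[1] >= self.limit:
            return False
        entry[1] += 1
        return True


def handle_request(limiter, user_id):
    """Return an HTTP-style status code: 200 if allowed, 429 if limited."""
    return 200 if limiter.allow(user_id) else 429
```

With the defaults, the 101st call to `handle_request` for the same user inside one minute returns 429, while other users are unaffected. In production this check usually lives in a proxy or gateway in front of the application rather than in the application itself.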

2. 📬 Request Queueing

Instead of processing everything instantly, requests are placed in a queue and workers process them gradually.

Example systems: Apache Kafka, RabbitMQ

✅ Prevents sudden overload by smoothing out the traffic flow.
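
The idea can be sketched in a single process with Python's standard `queue` module. This stands in for a real broker such as Kafka or RabbitMQ and does not use their APIs; the function name `process_with_workers` and its parameters are assumptions for illustration, and `handler` is assumed not to raise.

```python
import queue
import threading


def process_with_workers(requests, handler, num_workers=4, max_queue_size=1000):
    """Push requests through a bounded queue drained by a fixed worker pool."""
    pending = queue.Queue(maxsize=max_queue_size)
    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            item = pending.get()
            if item is None:  # sentinel: no more work for this worker
                break
            outcome = handler(item)
            with results_lock:
                results.append(outcome)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for request in requests:
        # put() blocks while the queue is full, which slows the producer
        # down instead of letting the backlog grow without limit.
        pending.put(request)
    for _ in threads:
        pending.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return results


doubled = process_with_workers(range(10), lambda x: x * 2, num_workers=3)
print(sorted(doubled))  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```

No matter how many requests arrive at once, only `num_workers` of them are being processed at any moment. The bounded queue is a deliberate choice here: an unbounded one would just move the memory problem from the server into the queue.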

3. 🔐 Mutex Locking — One Rebuilds, Rest Wait

Only one request regenerates the expired cache. All others wait for the fresh value.

Without mutex → 5,000 DB queries fire at once 💣

With mutex:

1️⃣ Cache miss detected

2️⃣ Request #1 acquires the lock 🔒

3️⃣ Request #1 fetches from DB and updates cache

4️⃣ Lock released 🔓

5️⃣ Requests #2–5000 read the fresh cached value ✅

Only 1 DB hit instead of 5,000. 🙌

Request 1 → lock mutex → fetch data from DB → update cache → unlock
Request 2 → waits for mutex → reads from cache
Request 3 → waits → reads from cache
Request 4 → waits → reads from cache
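
The mutex flow above can be sketched with a lock and a second cache check inside it. This is a single-process sketch: the class name `MutexCache` and the `loader` callback (standing in for the DB query) are assumptions for illustration, and across multiple servers the lock would have to be shared between them.

```python
import threading


class MutexCache:
    def __init__(self, loader):
        self._loader = loader  # stands in for the expensive DB query
        self._data = {}
        self._lock = threading.Lock()

    def get(self, key):
        # Fast path: the value is cached, no lock needed.
        if key in self._data:
            return self._data[key]
        with self._lock:
            # Re-check inside the lock: another request may have rebuilt
            # the entry while this one was waiting.
            if key in self._data:
                return self._data[key]
            value = self._loader(key)
            self._data[key] = value
            return value
```

The second check inside the lock is what makes this work. Without it, every waiting request would still call the loader as soon as it got the lock, just one at a time instead of all at once. One lock for the whole cache keeps the sketch short, but it also makes misses on unrelated keys wait for each other; a lock per key avoids that.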

4. 🤝 Request Coalescing

Request Coalescing is a technique used in distributed systems to combine multiple identical requests into a single request, so the system performs the expensive operation only once and shares the result with all waiting requests.

✅ Instead of 5,000 separate DB calls, the system makes just one and fans the result out to everyone waiting.
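
A minimal sketch of coalescing, assuming a single process with threads: the first caller for a key becomes the leader and runs the expensive function, and everyone who arrives while it is running waits for the leader's result. The names `Coalescer`, `_Call`, and `do` are made up for this example.

```python
import threading


class _Call:
    """Book-keeping for one in-flight operation."""
    def __init__(self):
        self.done = threading.Event()
        self.result = None
        self.error = None


class Coalescer:
    def __init__(self):
        self._lock = threading.Lock()
        self._in_flight = {}  # key -> _Call currently running

    def do(self, key, fn):
        with self._lock:
            call = self._in_flight.get(key)
            leader = call is None
            if leader:
                call = _Call()
                self._in_flight[key] = call
        if leader:
            try:
                call.result = fn()  # the one expensive operation
            except Exception as exc:
                call.error = exc    # followers see the same failure
            finally:
                with self._lock:
                    del self._in_flight[key]
                call.done.set()     # wake everyone who was waiting
        else:
            call.done.wait()
        if call.error is not None:
            raise call.error
        return call.result
```

Unlike the mutex approach, nothing is stored after the call finishes: a request that arrives later starts a fresh call. That makes coalescing a good fit in front of a cache, where it merges the duplicate misses and the cache serves everyone afterwards.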

🏁 Final Thoughts

The Thundering Herd Problem is one of those silent killers in distributed systems. Everything seems fine — until one cache expires, one server goes down, or one big event happens. Then the domino effect begins. 🁣

The key takeaway: design your system to expect the herd. Rate limit aggressively, queue your requests, use mutexes on cache rebuilds, and coalesce duplicate requests. A little prevention goes a long way before your servers end up like those Black Friday shoppers — trampling each other on the way in. 😄

#systemdesign #distributedsystems #thunderingherdproblem #chaiaurcode #chaicode
