๐ The Thundering Herd Problem โ How Distributed Systems Fall Apart

๐๏ธ Think of It Like Black Friday
Imagine a popular electronics store on Black Friday.
Hundreds of people are waiting outside before the doors open. The moment the store opens, everyone rushes in at the same time trying to grab the limited deals.
The store staff, billing counters, and aisles cannot handle so many people at once, leading to chaos and long delays.
That's exactly what happens to servers when a huge amount of requests โ greater than their compute power โ comes in all at once. This is commonly known as the "Thundering Herd" problem. โก
Online Ticket Booking System
A ticket website has 3 servers behind a load balancer. Each server can handle 1000 booking requests per second. A popular concert opens bookings and a massive traffic spike hits.
Here's how the failure unfolds step by step:
๐ด Server 1
Starts processing 1000 requests
Crashes after completing 600 requests due to overload
Remaining 400 requests are redirected by the load balancer
๐ด Server 2
Already processing 900 requests
Gets the extra 400 redirected requests
Total load becomes 1300 requests, exceeding its capacity
Server 2 crashes after processing some of them
๐ด Server 3
Now receives the remaining requests from Server 1
Receives the remaining requests from Server 2
Plus new incoming user requests
Becomes overwhelmed and eventually runs out of resources
๐ก Result: All servers fail sequentially โ Cascading failure across the entire system
โฝ Live World Cup Final Streaming
Consider millions watching the FIFA World Cup Final online.
The critical moment:
It's the 90th minute โฑ๏ธ
A penalty is awarded ๐ฑ
Millions of viewers open apps or refresh streams simultaneously
Platforms like Disney+ Hotstar, Netflix Platform massive spike of requests instantly.
๐ด Server 1
Handling 200k viewers
Sudden spike to 350k
CPU maxes out โ server crashes
๐ด Server 2
Receives redirected traffic
Also processes live encoding + streaming
Memory spikes โ fails
๐ด Server 3
Now receives redirected viewers
Plus new viewers joining
Plus bitrate adaptation requests
Eventually buffering or full service outage occurs ๐บ
๐ก Result: Millions reacting to the same event at the same time causes a thundering herd traffic spike.
๐ Why Distributed Systems Make This Worse
1. ๐ฅ Cascading Failures
In distributed systems, multiple servers and services are connected. A typical flow looks like:
API Server โ Auth Service โ Database โ Cache
If the API server gets overloaded, it sends too many requests downstream. If the database slows down, API servers start waiting for responses. Eventually:
Request queues fill up
Threads get blocked
Services crash one by one
This chain reaction creates cascading failures, often triggered by the Thundering Herd Problem.
2. ๐พ Shared Resource Bottlenecks
Distributed systems often share resources such as databases, message queues, and caches โ like Redis or MySQL.
During a traffic spike:
Thousands of requests hit the same resource
That resource becomes the single bottleneck
If it fails, many services fail simultaneously ๐ต
3. ๐ Retry Storms
Most distributed systems implement automatic retries. Here's how that backfires:
Service A calls Service B
Service B becomes slow
Service A retries requests
Now imagine 100,000 clients retrying simultaneously
Instead of reducing load, retries multiply the traffic, making the system collapse even faster. ๐๐
4. ๐ฆ Queue Build-Up
When systems cannot process requests fast enough, requests start piling up in queues and memory usage grows rapidly.
For systems running on the Java Virtual Machine, this can lead to:
Long garbage collection pauses
OutOfMemoryError ๐จ
Application crashes
๐ค Normal Traffic Spike vs. Thundering Herd โ What's the Difference?
๐ Normal Traffic Spike
A normal traffic spike is a sudden increase in traffic, usually due to predictable events like a sale, a popular content release, or a live event.
Example: On Netflix releasing a new episode, millions start streaming โ but each request is handled independently. Load balancer + caching + auto-scaling can handle it smoothly. โ
โก Thundering Herd
A thundering herd occurs when many threads, processes, or clients simultaneously wake up and try to access the same resource, often triggered by a cache expiration, lock release, or service recovery.
Example:
A cache entry expires on Redis
10,000 clients all query the database at the exact same time
Database can't handle it โ crash โ remaining clients redirected โ cascade โ system outage ๐ฅ
Impact on resources
๐ฅ๏ธ CPU โ maxes out, context switching explodes, server crashes.
๐๏ธ Database โ connection pools exhausted, query queues pile up, can go fully down.
โก Cache โ the origin point of the herd, gets flooded with write operations on repopulation.
โฑ๏ธ Latency โ response times balloon from milliseconds to seconds, "slow is the new down".
๐ก๏ธ How to Prevent the Thundering Herd Problem
1. ๐ฆ Rate Limiting
Limit how many requests a user or client can send within a time window.
Example: Allow only 100 requests per minute per user. If requests exceed the limit, the system returns 429 Too Many Requests.
Tools often used: NGINX, Envoy
โ Prevents sudden traffic spikes from overwhelming servers.
2. ๐ฌ Request Queueing
Instead of processing everything instantly, requests are placed in a queue and workers process them gradually.
Example systems: Apache Kafka, RabbitMQ
โ Prevents sudden overload by smoothing out the traffic flow.
3. ๐ Mutex Locking โ One Rebuilds, Rest Wait
Only one request regenerates the expired cache. All others wait for the fresh value.
Without mutex โ 5,000 DB queries fire at once ๐ฃ
With mutex:
1๏ธโฃ Cache miss detected
2๏ธโฃ Request #1 acquires the lock ๐
3๏ธโฃ Request #1 fetches from DB and updates cache
4๏ธโฃ Lock released ๐
5๏ธโฃ Requests #2โ5000 read the fresh cached value โ
Only 1 DB hit instead of 5,000. ๐
Request 1 โ lock mutex โ fetch data from DB โ update cache โ unlock Request 2 โ waits for mutex โ reads from cache Request 3 โ waits โ reads from cache Request 4 โ waits โ reads from cache
4. ๐ค Request Coalescing
Request Coalescing is a technique used in distributed systems to combine multiple identical requests into a single request, so the system performs the expensive operation only once and shares the result with all waiting requests.
โ Instead of 5,000 separate DB calls, the system makes just one and fans the result out to everyone waiting.
๐ Final Thoughts
The Thundering Herd Problem is one of those silent killers in distributed systems. Everything seems fine โ until one cache expires, one server goes down, or one big event happens. Then the domino effect begins. ๐ฃ
The key takeaway: design your system to expect the herd. Rate limit aggressively, queue your requests, use mutexes on cache rebuilds, and coalesce duplicate requests. A little prevention goes a long way before your servers end up like those Black Friday shoppers โ trampling each other on the way in. ๐
#systemdesign #distributedsystems #thunderingherdproblem #chaiaurcode #chaicode


