Cache Stampede and Thundering Herd: Prevention Guide

Webalert Team
June 23, 2026
6 min read

Your cache is doing its job: a popular query that would hammer the database is served from Redis in a millisecond, thousands of times a second. Then that one cache entry expires. In the same instant, every one of those thousands of requests misses the cache, and every single one turns around and hits the database to recompute the same value. The database, which was comfortably idle a moment ago, is suddenly buried under a coordinated avalanche of identical queries. This is a cache stampede — and it's one of the most common ways a perfectly healthy system falls over.

This guide explains what cache stampedes and thundering herds are, why they're so destructive, and how to prevent them.


What a Cache Stampede Is

A cache stampede (also called a dog-pile or, more broadly, a thundering herd) happens when a popular cached item expires and many concurrent requests all miss the cache at once. Because the value is gone, each request independently tries to regenerate it — running the same expensive query, API call, or computation — and they all pile onto the backend simultaneously.
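To make the failure mode concrete, here is a minimal Python sketch of the naive cache-aside read path. It is an illustration under stated assumptions, not production code: a plain dict stands in for Redis, the hypothetical `slow_query()` stands in for the expensive database query, and `simulate()` releases a batch of concurrent requests the moment the hot key has expired and counts how many of them reach the backend.

```python
import threading
import time
from typing import Dict

# Hypothetical stand-ins: a dict plays the role of Redis, and slow_query()
# plays the role of an expensive database query.
cache: Dict[str, str] = {}
backend_calls = 0
calls_lock = threading.Lock()

def slow_query(key: str) -> str:
    """Simulated expensive backend work that counts how often it runs."""
    global backend_calls
    with calls_lock:
        backend_calls += 1
    time.sleep(0.1)  # the "expensive" part
    return f"value-for-{key}"

def naive_get(key: str) -> str:
    """Plain cache-aside read: on a miss, every caller recomputes."""
    value = cache.get(key)
    if value is None:
        value = slow_query(key)
        cache[key] = value
    return value

def simulate(n_requests: int = 50) -> int:
    """Release n concurrent requests just after the hot key has expired."""
    global backend_calls
    cache.clear()  # the hot key has just expired
    backend_calls = 0
    barrier = threading.Barrier(n_requests)

    def request() -> None:
        barrier.wait()  # all requests arrive at the same instant
        naive_get("popular-query")

    threads = [threading.Thread(target=request) for _ in range(n_requests)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return backend_calls

print(simulate())
```

On a typical run almost every request misses and calls `slow_query()`, since none of them can see the value until the first computation finishes about 100 ms later.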

The "thundering herd" name is the general pattern: a large number of waiting processes all wake up and contend for the same resource at the same time. A cache stampede is the caching-specific version, where the shared resource is the work needed to refill one hot key.

The cruel irony is that the cache was protecting the backend. As long as the value was cached, the database saw almost no load. The moment it expires, the database sees the full, un-cached load — often far more than it can handle — precisely because the cache had been hiding just how popular that item was.


Why It's So Destructive

A cache stampede is dangerous because it's sudden, coordinated, and self-amplifying:

  • The load arrives all at once. It isn't a gradual ramp the database can autoscale into — it's thousands of identical queries in the same few milliseconds.
  • The work is redundant. Every request computes the same value. You only need that value computed once; instead it's computed thousands of times in parallel, and all but one of those computations are wasted.
  • It cascades. The overwhelmed database slows down, so each regeneration takes longer, so more requests pile up waiting, so the cache stays empty longer — a feedback loop. Slow regeneration can trigger connection pool exhaustion and retry storms on top.
  • It's triggered by success. The more popular an item, the worse its stampede — so your best-performing, highest-traffic pages cause the biggest outages.

A related trigger is the cold cache: after a cache restart, a flush, or a deploy that invalidates everything, every key is missing at once and the entire request load slams the backend simultaneously.


How to Prevent Cache Stampedes

The fix is to make sure that when a hot key expires, the work to refill it happens once, not thousands of times. The main techniques:

  1. Lock / single-flight regeneration. When a request finds the key missing, it acquires a lock and becomes the one responsible for recomputing the value. Other requests wait briefly for the result (or serve stale data) instead of all hitting the backend. This collapses thousands of redundant computations into one.

  2. Stale-while-revalidate. Serve the slightly expired cached value while one background task refreshes it. As long as a stale copy exists, users never see a miss, and the backend handles a single refresh instead of a flood. This is one of the most effective and widely used patterns.

  3. Add jitter to expiration times. If many keys are written around the same time with the same TTL (for example by a bulk load or a deploy), they all expire together and stampede together. Randomizing each TTL by a few percent spreads expirations out over time so misses don't synchronize; it's the same trick jitter performs for retries, where it keeps clients from retrying in lockstep.

  4. Probabilistic early expiration. Have each request decide at random whether to refresh a popular key before it expires, with a probability that rises as expiry approaches. The value is then typically regenerated by one unlucky request while the old value is still serving everyone else, avoiding the hard cliff entirely.

  5. Pre-warm the cache. After a deploy, flush, or restart, proactively populate hot keys before traffic hits them so you never face a fully cold cache under load.

  6. Treat the cache as an optimization, not a crutch. If a backend literally cannot survive its own traffic without the cache, a single eviction becomes an outage. A circuit breaker and graceful degradation give you a fallback when regeneration can't keep up.
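Technique 1 (lock / single-flight regeneration) can be sketched in-process with Python's `threading` module. This is an illustration under simplifying assumptions: `SingleFlightCache` and its `get(key, compute)` method are hypothetical names, values never expire, and the coordination only covers threads inside one process. Across several app servers the lock would have to live in shared storage, for example a Redis key written with `SET` using `NX` plus an expiry, so a crashed leader can't hold the lock forever.

```python
import threading
import time
from typing import Callable, Dict

class SingleFlightCache:
    """Hypothetical in-process cache with single-flight refills.

    On a miss, the first caller for a key (the "leader") computes the value;
    concurrent callers for the same key wait for the leader instead of
    repeating the work.
    """

    def __init__(self) -> None:
        self._values: Dict[str, object] = {}
        self._inflight: Dict[str, threading.Event] = {}
        self._lock = threading.Lock()

    def get(self, key: str, compute: Callable[[], object]) -> object:
        while True:
            with self._lock:
                if key in self._values:
                    return self._values[key]  # cache hit
                event = self._inflight.get(key)
                is_leader = event is None
                if is_leader:
                    event = threading.Event()
                    self._inflight[key] = event  # this caller owns the refill
            if is_leader:
                try:
                    value = compute()
                    with self._lock:
                        self._values[key] = value
                    return value
                finally:
                    with self._lock:
                        del self._inflight[key]
                    event.set()  # wake followers, whether compute succeeded or failed
            # Follower: wait for the leader, then loop to re-read the cache.
            # If the leader failed, one follower becomes the next leader.
            event.wait()

# Usage: 50 concurrent requests for a missing key trigger one computation.
calls = 0

def expensive_query() -> str:
    global calls
    calls += 1  # only the leader ever runs this
    time.sleep(0.1)
    return "result"

cache = SingleFlightCache()
threads = [threading.Thread(target=cache.get, args=("hot-key", expensive_query))
           for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(calls)  # 1
```

The followers block on an `Event` rather than polling, so they wake as soon as the leader finishes. A variant could hand followers a stale value instead of making them wait.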
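Technique 2 (stale-while-revalidate) can be sketched the same way. Assumptions: `SWRCache` is a hypothetical in-process class, expiry uses `time.monotonic()`, and stale entries are served indefinitely. A production version would usually also cap how stale a value may get; HTTP's `stale-while-revalidate` Cache-Control extension (RFC 5861) does this with an explicit window in seconds.

```python
import threading
import time
from typing import Callable, Dict, Set, Tuple

class SWRCache:
    """Hypothetical stale-while-revalidate cache (in-process sketch).

    Expired entries are still served; the first request to notice the
    expiry starts one background refresh, and everyone else keeps getting
    the stale value until the new one lands.
    """

    def __init__(self, ttl: float) -> None:
        self.ttl = ttl
        self._entries: Dict[str, Tuple[object, float]] = {}  # key -> (value, expires_at)
        self._refreshing: Set[str] = set()
        self._lock = threading.Lock()

    def get(self, key: str, compute: Callable[[], object]) -> object:
        start_refresh = False
        with self._lock:
            entry = self._entries.get(key)
            if entry is not None:
                value, expires_at = entry
                if time.monotonic() >= expires_at and key not in self._refreshing:
                    self._refreshing.add(key)  # at most one refresher per key
                    start_refresh = True
        if entry is not None:
            if start_refresh:
                threading.Thread(target=self._refresh, args=(key, compute),
                                 daemon=True).start()
            return value  # fresh or stale, the caller is answered immediately
        # Nothing cached at all (cold key): compute inline. Pairing this path
        # with single-flight would protect cold keys too.
        value = compute()
        self._store(key, value)
        return value

    def _refresh(self, key: str, compute: Callable[[], object]) -> None:
        try:
            self._store(key, compute())
        finally:
            with self._lock:
                self._refreshing.discard(key)

    def _store(self, key: str, value: object) -> None:
        with self._lock:
            self._entries[key] = (value, time.monotonic() + self.ttl)

# Usage: the stale value is served while the refresh runs in the background.
version = 0

def load_report() -> int:
    global version
    version += 1
    return version

swr = SWRCache(ttl=0.2)
first = swr.get("report", load_report)  # 1: cold key, computed inline
time.sleep(0.3)                          # the entry is now stale
stale = swr.get("report", load_report)  # 1: stale value served, refresh started
time.sleep(0.1)                          # give the background refresh time to finish
fresh = swr.get("report", load_report)  # the refreshed value
print(first, stale, fresh)
```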
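Technique 3 (TTL jitter) is usually a one-line change wherever the TTL is chosen. A minimal sketch, assuming a hypothetical helper `jittered_ttl()` with a default spread of plus or minus 5% (one reading of "a few percent"; tune it to your traffic):

```python
import random
from typing import Optional

def jittered_ttl(base_ttl: float, jitter_fraction: float = 0.05,
                 rng: Optional[random.Random] = None) -> float:
    """Return base_ttl spread uniformly by plus or minus jitter_fraction.

    With base_ttl=300 and jitter_fraction=0.05 the result lies in [285, 315],
    so keys written together expire across a 30-second window instead of
    all at the same instant.
    """
    if not 0.0 <= jitter_fraction < 1.0:
        raise ValueError("jitter_fraction must be in [0, 1)")
    uniform = rng.uniform if rng is not None else random.uniform
    spread = base_ttl * jitter_fraction
    return base_ttl + uniform(-spread, spread)

# Usage: 1,000 keys written in the same deploy no longer share one expiry.
rng = random.Random(7)
ttls = [jittered_ttl(300, 0.05, rng) for _ in range(1000)]
print(min(ttls), max(ttls))
```

When writing to Redis, the jittered value would become the key's expiry, rounded to whole seconds if you use the `EX` option.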
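Technique 5 (pre-warming) can be a small step in the deploy or restart procedure. A minimal sketch, assuming you already know which keys are hot (for example from access logs) and have a hypothetical `load(key)` function that computes a value from the backend. The worker cap is a design choice of this sketch: it keeps the warm-up itself from becoming a stampede.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable

def prewarm(cache: Dict[str, object], hot_keys: Iterable[str],
            load: Callable[[str], object], max_workers: int = 4) -> int:
    """Populate hot keys before traffic arrives; returns how many were loaded.

    max_workers caps concurrent backend calls so the warm-up itself
    doesn't overload the backend.
    """
    keys = list(hot_keys)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so values line up with keys.
        for key, value in zip(keys, pool.map(load, keys)):
            cache[key] = value
    return len(keys)

# Usage (hypothetical): warm the top product pages before sending traffic.
cache: Dict[str, object] = {}
loaded = prewarm(cache, ["product:1", "product:2", "product:3"],
                 load=lambda key: f"rendered {key}")
print(loaded, cache["product:2"])  # 3 rendered product:2
```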
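For technique 6, a circuit breaker in front of the regeneration path is one common way to get that fallback. The sketch below is a hypothetical minimal `CircuitBreaker` class, not any particular library's API, and it is not thread-safe; the fallback might be a stale copy, a default value, or a degraded page.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

class CircuitBreaker:
    """Minimal circuit breaker sketch (not thread-safe).

    After max_failures consecutive failures the breaker "opens": for
    reset_after seconds it skips the backend and returns the fallback.
    """

    def __init__(self, max_failures: int = 5, reset_after: float = 30.0) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                return fallback()  # open: leave the struggling backend alone
            self.opened_at = None  # cool-down over: allow a trial call
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                # (Re)open; after a cool-down, one failed trial re-opens at once.
                self.opened_at = time.monotonic()
            return fallback()
        self.failures = 0  # success closes the breaker fully
        return result

# Usage: a refill that keeps timing out stops being attempted after 2 failures.
attempts = 0

def regenerate() -> str:
    global attempts
    attempts += 1
    raise TimeoutError("database overloaded")  # simulated failing refill

breaker = CircuitBreaker(max_failures=2, reset_after=60.0)
results = [breaker.call(regenerate, fallback=lambda: "cached default")
           for _ in range(3)]
print(results, attempts)  # ['cached default', 'cached default', 'cached default'] 2
```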


How Webalert Helps

A cache stampede shows up from the outside as a sudden latency spike and a burst of errors that arrives the instant a hot key (or the whole cache) expires — often with no warning from your normal capacity dashboards, because the cache had been masking the real load. Webalert helps you catch it:

  • Outside-in latency monitoring that catches the sudden response-time spike a stampede causes, on the real endpoints users hit.
  • Error and downtime alerts for the 5xx errors and timeouts that follow when the backend gets buried — so you find out in seconds, not from customers.
  • Latency-percentile tracking (p95/p99) that surfaces the tail-latency blowups stampedes produce even when averages look fine.
  • Confirmation of recovery once you've added locking, jitter, or stale-while-revalidate, verifying the spikes are actually gone under real traffic.

Webalert won't refill your cache, but it tells you the moment an expiring key turns into a user-facing outage — and confirms when your prevention measures have tamed it.


Summary

A cache stampede (or thundering herd) happens when a popular cached item expires and many concurrent requests all miss at once, each independently regenerating the same value and burying the backend in redundant work. It's destructive because the load is sudden, coordinated, redundant, and self-amplifying — and it's triggered by exactly the popular items and cold-cache events you most rely on.

Prevent it by making regeneration happen once instead of thousands of times: lock or single-flight the refill, serve stale-while-revalidate, add jitter to TTLs so expirations don't synchronize, use probabilistic early expiration, pre-warm hot keys after restarts, and never let your backend depend on the cache for basic survival. Pair those defenses with outside-in latency and error monitoring so you catch the spike the instant a hot key expires.


Catch the latency spikes a stampede causes

Start monitoring with Webalert ->

See features and pricing. No credit card required.
