Caching Strategies

Caching is not a speed knob — it is a second source of truth you must keep honest. Patterns, eviction math, the invalidation trap, and the stampede that pages you.

22 min read, updated 2026-06-28

A cache gets pitched as a speed knob: bolt it on, reads get faster, done. That framing survives exactly until the first time the cache and the database disagree and a customer sees a number that is no longer true. The instant you add a cache you have created a second source of truth, and from then on you own the gap between the two.

Caching does not make a system simpler. It trades a latency problem you understand for a consistency-and-invalidation problem you usually don’t. The happy path — a warm hit returning in microseconds — is the easy 90%. The 10% that pages you is what happens when a key expires under load, when an invalidation message gets lost, when a deploy leaves stale serialized blobs in memory, or when a restart drops your hit ratio to zero with full traffic landing on a database that has never felt it.

This is the long-form context article on caching as an architectural decision, not a library call. It covers the four read/write patterns and how they differ in failure behavior rather than happy-path speed, the eviction-versus-expiry distinction that quietly collapses hit ratios, the invalidation traps, and the stampede that is the single most common way a cache amplifies an outage instead of preventing one. It leans heavily on Redis, the store most application caches actually run on, and connects to database replication for the read-scaling alternative, consistent hashing and sharding & partitioning for distributing the keyspace, and CAP theorem for the consistency tradeoff underneath all of it. The dedicated stampede deep-dive is on the roadmap.

The one belief to walk in with: a cache is a bet, and every caching decision is just you setting the terms of that bet. How stale you’ll tolerate, for how long, and who pays when the bet loses. Most caching incidents are someone discovering they placed a bet they never meant to make.

A motivating failure

An e-commerce team runs a price cache in front of Postgres. Product prices are read millions of times an hour and changed maybe a few thousand times a day, so caching them is obviously correct. Cache-aside, 300s TTL, and on every price write the service does an explicit DEL price:{sku} to invalidate. It runs clean for a year.

Then the company goes multi-region. The catalog service now runs in us-east and eu-west, each with its own Redis, and price writes happen in us-east against the primary database, which replicates to a eu-west read replica. The invalidation was wired as a single DEL against the local Redis of whichever node handled the write. Nobody updated it to fan out across regions.

A merchandising manager drops the price on a flagship product for a flash sale, then raises it back when the sale ends four hours later. The us-east cache invalidates correctly. The eu-west cache never gets the DEL. Its price:{sku} key was written during the sale and carries the sale price, and because the SKU is hot, the steady stream of reads keeps the key alive without ever re-fetching it: a background refresh job keeps extending the key's life, so the sale price is served from cache well past its 300s TTL.

For ninety minutes, European customers check out at a flash-sale price that no longer exists. What surfaces the problem is finance noticing a margin anomaly, not an alert firing. By the time someone runs redis-cli -h eu-west GET price:{sku} and sees the wrong number, the company has sold several thousand units below cost.

Nothing threw an error. DEL did exactly what it was told on the node it was told to run on. The bug lived in an assumption nobody wrote down: that invalidation is local and reliable. It is neither. A cache is only as consistent as its weakest invalidation path, and a missed DEL with a TTL that keeps getting refreshed is a stale value that never self-heals. That is the failure this article exists to prevent.

The one-sentence mental model

A cache is a bet that the cost of a stale or missing answer is lower than the cost of computing the right one every time — and every knob you touch (TTL, eviction policy, invalidation strategy) is just you setting the terms and the payout of that bet.

Each clause is an operational constraint, not a slogan:

Stale or missing answer → you are explicitly accepting wrong-but-fast over right-but-slow for some window. If that window is unbounded — no TTL, no reliable invalidation — you don’t have a cache, you have a data-corruption bug with good latency.
Lower than computing it every time → caching only pays when reads dominate writes and the computation is expensive. A cache in front of a cheap, write-heavy table adds a coherence problem and buys you nothing.
Who pays when the bet loses → a miss is not free. Under load, a flood of simultaneous misses is exactly how a cache turns a traffic spike into a database outage.

flowchart LR
  Cl[Client] --> Edge[CDN edge\ncache]
  Edge -->|miss| App[App cache\nRedis / local]
  App -->|miss| DB[(Database\nbuffer pool)]
  DB --> App
  App --> Edge
  Edge --> Cl
  App -. metrics .-> M[hit ratio\nTTL + evictions]

Read the diagram as a stack of independent bets. The CDN bets that static-ish content tolerates minutes of staleness. The application cache (Redis or in-process memory) bets that hot rows tolerate seconds. The database’s own buffer pool is a cache too — it bets the working set fits in RAM. An invalidation has to win at every layer or the stalest layer wins. That is why the motivating failure happened: the eu-west layer never got the message, and the stalest layer is the one the customer sees.

How it actually works

The strategy is defined by two questions: who populates the cache, and when the database gets written relative to the cache. There are four patterns worth knowing cold. The happy path of all four is nearly identical; they diverge entirely in how they fail.

Cache-aside (lazy loading)

The application owns the logic. On read: check the cache; on a miss, read the database, write the result back, return it. On write: update the database, then invalidate (delete) the key.

This is the default for good reasons. It only caches what is actually requested, and a cache outage degrades to slow, not broken: every miss just falls through to the database. The classic trap is the read-modify-write race: two requests miss simultaneously, a write lands between their database reads, and the slower request, still holding the older value, writes it back over the fresh one. That stale value then lives until the TTL expires.

sequenceDiagram
  participant App
  participant Cache
  participant DB
  App->>Cache: GET price:sku
  Cache-->>App: nil (miss)
  App->>DB: SELECT price
  DB-->>App: 4999
  App->>Cache: SET key 4999 EX 300
  App-->>App: return 4999
  Note over App,DB: on write update DB\nthen DEL the key

The rule that prevents most cache-aside pain: delete on write, do not update on write. Writing the new value into the cache from the writer races with concurrent readers and with replication lag. Deleting the key is idempotent and lets the next reader repopulate from the authoritative source. The motivating failure violated this implicitly — a long-lived refreshed key behaved like an update-on-write that never got deleted.
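A minimal sketch of the cache-aside read and write paths described above. The TTLCache class is an in-memory stand-in for Redis GET / SET EX / DEL so the example runs on its own; the names read_price, write_price, db, and PRICE_TTL_S are illustrative assumptions, not part of any library.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Tiny in-memory stand-in for Redis GET / SET EX / DEL (illustrative only)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[int, float]] = {}  # key -> (value, expires_at)

    def get(self, key: str) -> Optional[int]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key: str, value: int, ttl_s: float) -> None:
        self._data[key] = (value, self._clock() + ttl_s)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)  # idempotent: deleting a missing key is fine


PRICE_TTL_S = 300
db: Dict[str, int] = {"sku-1": 4999}  # hypothetical stand-in for the database
cache = TTLCache()


def read_price(sku: str) -> int:
    key = f"price:{sku}"
    cached = cache.get(key)
    if cached is not None:
        return cached                   # hit
    value = db[sku]                     # miss: read the authoritative source
    cache.set(key, value, PRICE_TTL_S)  # repopulate with a bounded lifetime
    return value


def write_price(sku: str, new_price: int) -> None:
    db[sku] = new_price                 # 1. update the database first
    cache.delete(f"price:{sku}")        # 2. then delete the key, never SET it
```

The writer never places a value in the cache; the next reader repopulates from the database, which is what keeps the write path out of the race with concurrent readers.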

Read-through and write-through

Read-through moves the miss logic into the cache layer or a library in front of it, so the application only ever talks to the cache and the cache loads from the database on a miss. It centralizes the logic but couples your availability to the cache: if the cache layer is down, reads fail instead of falling through.

Write-through writes the cache and the database synchronously on every write, keeping the cache always populated and coherent. The cost is write latency (you pay both writes) and cache pollution — you cache data nobody reads. Pair it with read-through and you get a cache that is always warm and always correct, at the price of slower writes and wasted memory on cold keys.

Write-back (write-behind)

The write hits the cache and returns immediately; the cache flushes to the database asynchronously, usually batched. This is the fastest write path and the most dangerous: a cache node that dies with unflushed writes takes acknowledged data with it. It is the same durability gap that Redis has between “OK” and “persisted,” moved up a layer. Use it only where loss is survivable (view counters, metrics, telemetry) or where a durable buffer sits underneath.
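To make the durability gap concrete, here is a small simulation of a write-behind buffer. WriteBackCache, flush, and crash are hypothetical names for this sketch; a real implementation would flush on a timer or a batch-size threshold, and the database here is just a dict.

```python
from typing import Dict, List, Tuple


class WriteBackCache:
    """Write-behind sketch: acknowledge from memory, flush to the DB later."""

    def __init__(self, db: Dict[str, int]) -> None:
        self.db = db
        self.values: Dict[str, int] = {}
        self.dirty: List[Tuple[str, int]] = []  # acknowledged but not yet durable

    def write(self, key: str, value: int) -> str:
        self.values[key] = value
        self.dirty.append((key, value))
        return "OK"  # acked before the database has seen the write

    def flush(self) -> int:
        """Apply the pending batch to the database; return how many were flushed."""
        flushed = len(self.dirty)
        for key, value in self.dirty:
            self.db[key] = value
        self.dirty.clear()
        return flushed

    def crash(self) -> int:
        """Simulate node death: memory is gone; return how many acked writes were lost."""
        lost = len(self.dirty)
        self.values.clear()
        self.dirty.clear()
        return lost
```

Any write that was acknowledged after the last flush exists only in the dirty list, so the size of that list at the moment of a crash is exactly the amount of acknowledged data that disappears.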

flowchart TD
  W[write request] --> P{which pattern}
  P -->|cache-aside| CA[update DB\nthen DEL key]
  P -->|write-through| WT[write cache\nplus DB sync]
  P -->|write-back| WB[write cache\nack now]
  WB --> Q[flush queue\nasync batch]
  Q --> DBx[(DB)]
  WB --> R[node dies\nbefore flush]
  R --> L[acked writes\nlost]
  style L fill:#e11d48,color:#fff
  style R fill:#171717,color:#fff

Here is the four-way summary, organized by the thing that actually matters — how each one breaks:

Pattern	Read path	Write path	Worst failure
Cache-aside	app handles miss	DB then invalidate	stale on lost `DEL`, read-modify-write race
Read-through	cache handles miss	paired write policy	cache outage blocks all reads
Write-through	always hot	sync cache + DB	write latency, cold-data pollution
Write-back	always hot	async flush	data loss on node death

The tradeoffs that bite

These look free at design time and bill you later.

TTL is a guess, and it is your entire consistency story. With cache-aside and TTL-only invalidation, worst-case staleness equals the TTL. A 300s TTL means a changed value can be wrong for five minutes. Shorten it and hit ratio drops while database load climbs; lengthen it and staleness grows. There is no free TTL — derive it from the data’s tolerance for being wrong, not from a framework default. A user’s session can tolerate seconds; a price at the point of charge can tolerate none and should not be cached there at all.

Eviction is not expiry, and conflating them collapses hit ratios silently. TTL removes a key when it ages out. Eviction removes a key when you run out of memory, regardless of TTL. With Redis, maxmemory-policy allkeys-lru starts evicting (approximately) least-recently-used keys the moment maxmemory is hit, even if a victim had four more minutes to live. If your working set exceeds memory, LRU thrash quietly turns “cached” reads into miss-plus-database-hit, while dashboards that only count keys still look healthy. Watch evicted_keys and keyspace_misses, not just hit rate; a rising evicted_keys is the early warning that the working set no longer fits.
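The difference between the two removal paths can be shown with a toy cache. This sketch uses an exact LRU bounded by key count; Redis itself bounds memory in bytes and samples candidates, so treat LRUWithTTL and its evicted_keys counter as an illustration of the idea rather than a model of Redis internals. Time is passed in explicitly to keep the example deterministic.

```python
from collections import OrderedDict
from typing import Optional


class LRUWithTTL:
    """Toy cache with both expiry (age-based) and eviction (capacity-based)."""

    def __init__(self, max_keys: int) -> None:
        self.max_keys = max_keys
        self.evicted_keys = 0
        self._data = OrderedDict()  # key -> (value, expires_at), oldest use first

    def set(self, key: str, value: str, ttl_s: float, now: float) -> None:
        self._data[key] = (value, now + ttl_s)
        self._data.move_to_end(key)  # mark as most recently used
        while len(self._data) > self.max_keys:
            self._data.popitem(last=False)  # eviction: ignores remaining TTL
            self.evicted_keys += 1

    def get(self, key: str, now: float) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if now >= expires_at:
            del self._data[key]  # expiry: the key aged out
            return None
        self._data.move_to_end(key)
        return value
```

With max_keys=2, writing a third key removes the least recently used one even though its 300-second TTL has barely started, and the only trace is the evicted_keys counter.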

Invalidation is the genuinely hard part of computer science, and the network makes it harder. A DEL on write is a network call that can fail, arrive late, or never arrive (the motivating failure). Belt-and-suspenders: always set a TTL even when you invalidate explicitly, so a lost invalidation self-heals within the TTL instead of poisoning the key forever. And make the TTL refresh logic re-read from source — never extend a key’s life without re-fetching, or you build the exact stale-forever trap from the opening story.

Negative caching cuts both ways. Caching “this key does not exist” protects the database from repeated misses and from penetration attacks that probe random keys. But too long a negative TTL means a newly created row stays invisible to readers who cached its absence. Keep negative TTLs short — 10–30s — and you get most of the protection with little of the lag.
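A sketch of negative caching with separate TTLs for "found" and "not found". The class name, the sentinel, and the 15-second negative TTL are assumptions chosen to match the 10–30s guidance above; load is whatever function reads the source of truth and returns None for a missing row.

```python
from typing import Callable, Dict, Optional, Tuple

_MISSING = object()   # sentinel: the source said this row does not exist

POSITIVE_TTL_S = 300
NEGATIVE_TTL_S = 15   # short, so newly created rows become visible quickly


class NegativeCachingReader:
    def __init__(self, load: Callable[[str], Optional[str]]) -> None:
        self._load = load
        self._entries: Dict[str, Tuple[object, float]] = {}  # key -> (value, expires_at)

    def get(self, key: str, now: float) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is not None and now < entry[1]:
            cached = entry[0]
            return None if cached is _MISSING else cached  # hit, possibly a cached absence
        value = self._load(key)  # miss or expired: ask the source of truth
        if value is None:
            self._entries[key] = (_MISSING, now + NEGATIVE_TTL_S)
        else:
            self._entries[key] = (value, now + POSITIVE_TTL_S)
        return value
```

The cost described above is visible in the behavior: a row created while its absence is cached stays invisible until the negative entry expires, which is why the negative TTL is kept much shorter than the positive one.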

Decision	The free-looking choice	What it actually costs
Staleness	A comfy default TTL	Worst-case staleness equals the TTL, everywhere
Memory	Sizing RAM to the dataset	Eviction thrash when the working set won’t fit
Invalidation	Single local `DEL` on write	Stale-forever on a lost message (multi-region)
Write-back speed	Acking before the DB write	Lost writes when a node dies pre-flush
Negative cache	Long “not found” TTL	New rows invisible for the TTL window

Read and write performance

Caching is a performance tool, so the numbers are the point. The win is measured in two places: latency at the edge of the request, and load removed from the source of truth.

What a cache makes fast: repeated reads of expensive answers. A join-heavy aggregate that takes 80ms in Postgres becomes a 0.3ms Redis GET. A rendered page fragment that costs a template render plus three service calls becomes one round trip. The leverage scales with two factors — how expensive the computation is, and how lopsided the read/write ratio is. At 1000:1 reads-to-writes on an expensive query, a cache removes nearly all the load; at 2:1, it barely helps and you pay coherence cost for the privilege.

What it does not make fast, and can make slower: writes (you now do two writes, or an extra DEL), and reads of data that changes faster than you read it (you cache, invalidate, re-cache, and never serve a hit). A cache in front of a primary-key lookup on a well-indexed table often adds latency variance and a consistency bug for no real gain — the database was already serving that from its buffer pool in well under a millisecond.

The levers that actually move the needle, in rough order of impact:

Hit ratio is the whole game. A cache at 99% hit ratio sends 1% of traffic to the database; at 90% it sends ten times as much. The difference between those two numbers is often the difference between a calm database and an overloaded one. Tune TTLs and working-set sizing to protect the ratio before anything else.
Right-size the working set to memory. If the set of keys actively read in a window fits in maxmemory with headroom, eviction stays near zero and the ratio holds. If it doesn’t, no amount of TTL tuning saves you — you are thrashing.
Coalesce misses. One database hit per expired key instead of a thousand (see failure modes). This is the lever that decides whether a miss is cheap or catastrophic.
Push the right data to the right layer. Static and public content to the CDN edge (highest hit ratio, cheapest, but slowest to invalidate); user-specific and fast-moving data to the application cache where you control eviction and purge instantly.
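The hit-ratio lever above is simple arithmetic, sketched here so the numbers are easy to reproduce. The function names are hypothetical, and the default latencies reuse the 0.3ms hit and 80ms miss figures from earlier in this section, simplifying a miss to the database time alone.

```python
def db_qps(total_qps: float, hit_ratio: float) -> float:
    """Read traffic that falls through the cache to the database."""
    return total_qps * (1.0 - hit_ratio)


def mean_read_latency_ms(hit_ratio: float, hit_ms: float = 0.3, miss_ms: float = 80.0) -> float:
    """Average read latency as a weighted mix of hits and misses."""
    return hit_ratio * hit_ms + (1.0 - hit_ratio) * miss_ms
```

At 10,000 reads per second, a 99% hit ratio leaves about 100 queries per second for the database and a 90% hit ratio leaves about 1,000, which is the tenfold difference described above. Under the same assumptions the mean read latency moves from roughly 1.1ms to roughly 8.3ms.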

Measure the right things. Hit ratio tells you about effectiveness; it tells you nothing about whether you are about to fall over. Track keyspace_misses, evicted_keys, the p99 of cache-miss latency (a miss storm shows up here first), and database load attributable to cache misses. A slowly declining hit ratio is the leading indicator of a database overload that hasn’t happened yet.

Failure modes

A cache fails under load — precisely when you can least afford it — because the failure path is correlated across requests. Symptom → root cause → prevention.

Cache stampede / thundering herd. Symptom: a hot key expires and within milliseconds thousands of concurrent requests all miss, all run the same expensive query, and the database — sized for the cached load — falls over. The recompute is now slower because the database is saturated, so the cache stays empty, so more requests pile in. Root cause: synchronized misses on a single hot key with no coordination between the requests that miss. Prevention, in order of how often you’ll need it:

Request coalescing / single-flight — the first miss takes a per-key lock and recomputes; everyone else waits for that one result. One database hit instead of thousands. This is your primary stampede insurance.
Probabilistic early expiration — recompute a key slightly before its TTL, with the probability rising as expiry approaches (the XFetch technique), so the herd never forms because one lucky early request refreshes for everyone.
Stale-while-revalidate — serve the stale value and refresh in the background. You trade a sliver of staleness for never having a synchronous miss storm. The HTTP Cache-Control: max-age=60, stale-while-revalidate=600 header is the same idea at the edge.
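Probabilistic early expiration from the list above can be sketched as a single decision function, following the commonly cited XFetch formulation: refresh when now minus recompute_time times beta times ln(rand) reaches the expiry time. The function name and parameters are assumptions for illustration; recompute_s is the measured cost of rebuilding the value and beta (default 1.0) biases toward earlier refreshes when raised.

```python
import math
import random
from typing import Callable


def should_refresh_early(
    now: float,
    expires_at: float,
    recompute_s: float,
    beta: float = 1.0,
    rng: Callable[[], float] = random.random,
) -> bool:
    """Return True if this request should recompute the value before it expires."""
    # 1.0 - rng() lies in (0, 1], so log() is always defined and <= 0.
    jitter = -recompute_s * beta * math.log(1.0 - rng())
    # The closer we are to expiry, the more likely now + jitter crosses it.
    return now + jitter >= expires_at
```

A reader calls this on every hit; when it returns True, that one request recomputes and rewrites the key while everyone else keeps being served the still-valid value. Expensive values get a larger jitter and are therefore refreshed earlier on average.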

If a single hot key expiring can take down your database, you do not have a cache — you have a load-bearing countdown timer. Coalesce misses before you tune anything else; probabilistic expiry and bigger instances are refinements, single-flight is the fix.
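A minimal in-process single-flight sketch using threads. SingleFlight and do are hypothetical names for this example; it coalesces concurrent misses inside one process only, and coordinating across several application servers would need a shared lock, which is not shown here.

```python
import threading
from typing import Any, Callable, Dict, Optional


class _Call:
    """State shared between the leader and the waiters for one key."""

    def __init__(self) -> None:
        self.done = threading.Event()
        self.result: Any = None
        self.error: Optional[BaseException] = None


class SingleFlight:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._in_flight: Dict[str, _Call] = {}

    def do(self, key: str, fn: Callable[[], Any]) -> Any:
        with self._lock:
            call = self._in_flight.get(key)
            leader = call is None
            if leader:
                call = _Call()
                self._in_flight[key] = call
        if not leader:
            call.done.wait()             # someone else is already recomputing
            if call.error is not None:
                raise call.error
            return call.result
        try:
            call.result = fn()           # the single expensive recompute
            return call.result
        except Exception as exc:
            call.error = exc             # waiters see the same failure
            raise
        finally:
            with self._lock:
                del self._in_flight[key]
            call.done.set()              # wake every waiter
```

On a miss, the read path would wrap its database query as sf.do(key, query_fn): the first caller runs the query, and callers that arrive while it is running block and receive the same result instead of issuing their own.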

The stale-forever invalidation gap. Symptom: a value stays wrong long past its TTL, with no error (the motivating failure). Root cause: a lost or mis-targeted invalidation, often combined with a refresh path that extends TTL without re-fetching, or a multi-layer/multi-region topology where one layer never gets the DEL. Prevention: TTL ceiling on every key so loss self-heals; fan invalidation out to every cache that could hold the key (a pub/sub invalidation channel, not a local delete); make any refresh re-read from source.
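A sketch of fanning an invalidation out to every region instead of deleting locally. RegionCache and invalidate_everywhere are hypothetical; a production version would more likely publish on a pub/sub channel as suggested above, but the direct loop shows the two properties that matter: every holder of the key is targeted, and failures are surfaced rather than swallowed.

```python
from typing import Dict, Iterable, List


class RegionCache:
    """Hypothetical per-region cache client; reachable=False simulates a partition."""

    def __init__(self, name: str, reachable: bool = True) -> None:
        self.name = name
        self.reachable = reachable
        self.data: Dict[str, int] = {}

    def delete(self, key: str) -> None:
        if not self.reachable:
            raise ConnectionError(f"{self.name} unreachable")
        self.data.pop(key, None)


def invalidate_everywhere(key: str, regions: Iterable[RegionCache]) -> List[str]:
    """DEL the key in every region; return the regions where the DEL did not land."""
    failed: List[str] = []
    for region in regions:
        try:
            region.delete(key)
        except ConnectionError:
            # Report for retry or alerting. The TTL ceiling on the key bounds
            # how long the stale value can survive in the meantime.
            failed.append(region.name)
    return failed
```

A non-empty return value is the signal the motivating failure never had: the write succeeded, but at least one cache may still hold the old value until its TTL expires.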

Eviction thrash / the memory cliff. Symptom: latency spikes, hit ratio craters, evicted_keys climbs. Root cause: the working set exceeds maxmemory, so eviction runs on the hot path, evicting keys you are about to need, forcing recomputation and more writes and more eviction — the cache fights itself. Prevention: alarm on used_memory / maxmemory > 0.8, size for the working set plus headroom, set an explicit maxmemory-policy. The detailed version of this lives in the Redis memory-cliff incident.

Cache penetration. Symptom: steady database load from requests for keys that never exist, often malicious probing of random IDs. Root cause: misses for nonexistent keys bypass the cache entirely by definition. Prevention: negative caching with a short TTL, or a Bloom filter in front of the lookup that answers “definitely not present” without touching the database.
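A small Bloom filter sketch for the penetration case. The sizes and the SHA-256-based hashing are arbitrary choices for illustration; real deployments size the bit array from the expected key count and the target false-positive rate. The filter can say "definitely not present" with certainty, and "possibly present" with a small error rate.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    def __init__(self, size_bits: int = 8192, num_hashes: int = 4) -> None:
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self._bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item: str) -> Iterator[int]:
        # Derive several bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self._bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means definitely absent; True means possibly present.
        return all(self._bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))
```

The lookup path checks might_contain before touching the cache or the database: a False answer returns "not found" immediately, so probes for random IDs never reach the source of truth. Every key that does exist must be added to the filter when it is created.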

Stale-after-deploy. Symptom: reads deserialize garbage or throw after a release. Root cause: a serialization or schema change shipped, but the cache is full of the old format. Prevention: version your cache keys (v3:user:123) so a deploy effectively starts with a cold, correct cache instead of a warm, wrong one.
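Key versioning is a one-line helper. CACHE_SCHEMA_VERSION and cache_key are hypothetical names; the only requirement is that every read and write builds its key through the same function, so bumping the constant in the release that changes serialization makes old entries unreachable.

```python
CACHE_SCHEMA_VERSION = 3  # bump in the same release that changes the serialized format


def cache_key(kind: str, ident: object, version: int = CACHE_SCHEMA_VERSION) -> str:
    """Build a versioned key such as v3:user:123."""
    return f"v{version}:{kind}:{ident}"
```

Old-format entries are never read again and simply age out through their TTL, which is one more reason every key should carry one.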

The cold-start cliff. Symptom: a cache flush, restart, or failover leaves a 0% hit ratio with full production traffic landing on a database that has never seen it, and the database tips over. Root cause: an empty cache plus no traffic ramp. Prevention: warm critical keys on startup, ramp traffic gradually after a flush, or fail over to a still-warm replica rather than a cold node.

flowchart TD
  T[hot key\nexpires] --> M[1000 requests\nmiss at once]
  M --> DB[all hit DB\nsame query]
  DB --> S[DB saturates\nrecompute slows]
  S --> E[cache stays\nempty]
  E --> M
  M -. single-flight .-> F[1 recompute\nothers wait]
  F --> OK[1 DB hit\nherd avoided]
  style S fill:#e11d48,color:#fff
  style E fill:#171717,color:#fff

Scaling it

At low scale, a single cache node or even in-process memory is enough. The interesting decisions appear as you add layers and nodes.

More layers, more coherence problems. A CDN edge cache, an application cache, and the database buffer pool are three caches, and an invalidation must propagate through all of them. Edge caches are the hardest to invalidate — you are at the mercy of the CDN’s purge latency, which can be seconds to minutes — so push only data that tolerates that staleness to the edge (static assets, public pages) and keep user-specific or fast-moving data in the application layer where a purge is instant.

Distributing the cache. One Redis node eventually hits a memory or single-core CPU wall, and you shard the keyspace across nodes. Key placement now matters enormously: a naive hash(key) % N reshuffles almost every key when N changes, dumping the entire working set onto the database at once during a scale event — a self-inflicted cold-start cliff. Consistent hashing moves only about 1/N of keys when the topology changes, which is the whole reason it exists; the mechanics are in consistent hashing and the broader story in sharding & partitioning.
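The difference between the two placement schemes can be measured directly. This sketch compares hash(key) % N against a simple hash ring with virtual nodes when a fifth node is added; HashRing, mod_n_owner, and moved_fraction are illustrative names, and the 64 virtual nodes per physical node is an arbitrary choice.

```python
import bisect
import hashlib
from typing import Callable, List


def _h(s: str) -> int:
    """Stable 64-bit hash (Python's built-in hash() is randomized per process)."""
    return int.from_bytes(hashlib.sha256(s.encode()).digest()[:8], "big")


def mod_n_owner(key: str, nodes: List[str]) -> str:
    return nodes[_h(key) % len(nodes)]


class HashRing:
    def __init__(self, nodes: List[str], vnodes: int = 64) -> None:
        # Each node owns many points on the ring to even out arc sizes.
        self._points = sorted((_h(f"{node}#{i}"), node) for node in nodes for i in range(vnodes))
        self._hashes = [point for point, _ in self._points]

    def owner(self, key: str) -> str:
        # First point clockwise from the key's position, wrapping at the end.
        idx = bisect.bisect_right(self._hashes, _h(key)) % len(self._points)
        return self._points[idx][1]


def moved_fraction(keys: List[str], before: Callable[[str], str], after: Callable[[str], str]) -> float:
    return sum(1 for k in keys if before(k) != after(k)) / len(keys)
```

Going from four nodes to five, modulo placement should relocate roughly four out of five keys, while the ring should relocate roughly one in five, all of them onto the new node. The exact fractions depend on the keys and the number of virtual nodes.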

flowchart TD
  K[key name] --> H[hash ring\nposition]
  H --> A[Node A\narc 1]
  H --> B[Node B\narc 2]
  H --> C[Node C\narc 3]
  C -. add node .-> D[Node D\nsplits arc 3]
  D --> Mv[only ~1/N keys\nmove]
  style Mv fill:#171717,color:#fff

The hot-key wall. Sharding spreads keys evenly but not traffic. One celebrity key — a viral product, a global feature flag, a homepage config — lands on a single shard and saturates it while the others idle. No amount of sharding moves load within a key. The fixes are application-level: replicate the hot key to several nodes and pick one at random per read, put a small in-process cache in front of the distributed one to absorb the reads locally, or split the value across key:{0..N} and combine on read. This is the same wall described in the Redis hot-key story.
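A sketch of the first fix, replicating a hot key under several suffixed names and reading one at random. The helper names are hypothetical and a plain dict stands in for the sharded store; the assumption is that the suffixed keys hash to different shards, which spreads reads but is not guaranteed for any particular key.

```python
import random
from typing import Dict, List, Optional


def replica_keys(key: str, replicas: int) -> List[str]:
    """Names of the copies, e.g. flag:homepage:0 .. flag:homepage:3."""
    return [f"{key}:{i}" for i in range(replicas)]


def write_hot(store: Dict[str, str], key: str, value: str, replicas: int) -> None:
    # Every copy must be written (and invalidated) together.
    for rk in replica_keys(key, replicas):
        store[rk] = value


def read_hot(store: Dict[str, str], key: str, replicas: int,
             rng: Optional[random.Random] = None) -> Optional[str]:
    # Each read picks one copy at random, spreading load across the copies.
    chooser = rng if rng is not None else random
    return store.get(f"{key}:{chooser.randrange(replicas)}")
```

The tradeoff is on the write side: an update or invalidation now touches every copy, and a missed copy is a smaller version of the invalidation gap described earlier.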

Read replicas of the cache. Replicas multiply read throughput, but cache replication is asynchronous, so a replica can briefly serve a value the primary has already invalidated — the same stale-read window as database replication, just at cache speed. Route any read-your-writes path to the primary.

When to reach for it (and when not to)

Reach for a cache when reads massively outnumber writes, the cached computation is genuinely expensive (a join-heavy query, a rendered fragment, a remote API call, an ML inference), and the data tolerates a bounded staleness window. Session storage, rendered pages, feed fan-out, rate-limit counters, config and feature flags, and expensive aggregates are textbook fits. The pattern works because the data is cheap to rebuild from the source of truth and survivable to lose.

Don’t reach for it when writes dominate reads (you pay coherence cost for a hit ratio that never materializes), when the data must be strictly correct on every read (an account balance at the point of charge — read the authoritative row), or when the underlying query is already cheap. A cache in front of an indexed primary-key lookup usually adds latency variance and an invalidation bug in exchange for nothing the buffer pool wasn’t already giving you.

The honest alternative is frequently a better index, a read replica, or a materialized view — each keeps a single source of truth and sidesteps invalidation entirely. Reach for those first when correctness matters more than shaving the last few milliseconds. A cache is the right answer for expensive and read-heavy and staleness-tolerant; miss any of those three and reconsider.

When to consider alternatives

Strictly correct reads, one source of truth → a database index or read replica instead of a cache.
Choosing the store itself → Redis for in-memory speed, or DynamoDB when you need a durable low-latency key-value store rather than a rebuildable cache.
Distributing the keyspace correctly → consistent hashing and sharding & partitioning.
Understanding the staleness you’re accepting → CAP theorem and consistency & consensus.
Offloading expensive recomputation instead of caching it → push the work to a message queue or Celery and cache the precomputed result.

Operational checklist

Set an explicit maxmemory and maxmemory-policy (e.g. allkeys-lru); never run on a default that errors writes or evicts unpredictably when full.
Always pair explicit invalidation with a TTL ceiling, so a lost DEL self-heals within the window instead of poisoning the key forever.
Fan invalidation out to every layer and region that can hold the key — a pub/sub purge channel, not a single local delete.
Implement miss coalescing (per-key lock or single-flight) for any key expensive enough to recompute; this is your stampede insurance.
Alert on hit ratio, evicted_keys, and keyspace_misses — a quietly collapsing hit ratio precedes a database overload.
Version cache keys (v3:) so deploys that change serialization start cold-correct, not warm-wrong.
Keep negative-cache TTLs short (10–30s); add a Bloom filter if penetration is a real threat.
Have a cold-start plan: warm critical keys or ramp traffic after any flush, restart, or failover.
When sharding, use consistent hashing and have a hot-key mitigation plan before launching anything that can go viral.
Decide and document, per dataset, the maximum acceptable staleness before you pick a TTL.

Summary

A cache is the best latency tool you have and the easiest one to misuse, because it quietly makes you the owner of a second source of truth. Almost every caching incident traces back to one of four facts: staleness equals your TTL (so a lost invalidation lives until then), eviction ignores TTL (so a too-small instance thrashes), a hot key expiring under load synchronizes misses into a stampede, and an empty cache after a restart drops full traffic onto a cold database. Choose the pattern by how it fails, not how it succeeds; delete on write rather than update; always set a TTL even when you invalidate; coalesce misses before you tune anything else; and reach for an index, a replica, or a materialized view when correctness matters more than the last few milliseconds. Do that and a cache is the cheapest performance win in your stack. Forget one of those four facts and it is the dependency that pages you at 3am with a margin report instead of an alert.

Appendix: caching fundamentals refresher

If the body assumed terms you’d like restated:

Hit ratio — the fraction of reads served from cache. The single most important caching metric; small drops mean large multiples of load shifted to the source of truth.
TTL (time to live) — how long a key is allowed to live before it expires. With TTL-only invalidation, it is also your worst-case staleness.
Eviction — removing keys to free memory when full, governed by a policy (allkeys-lru, allkeys-lfu, volatile-ttl, noeviction). Distinct from expiry, which is age-based.
Invalidation — actively removing or refreshing a cached value when the underlying data changes. The hard part, because it is a distributed-systems message that can be lost.
Stampede / thundering herd — many requests missing the same key at once and all hitting the source of truth together.
Cache-aside, read-through, write-through, write-back — the four patterns, differing in who populates the cache and when the database is written. The body covers each.

The unifying idea: a cache is a bet that recomputing or re-fetching is more expensive than storing a copy, and that the copy going stale or vanishing is survivable. The engine is usually Redis; the terms of the bet are yours to set.
