Hot Key Problem
Before adding infrastructure to solve a problem, verify the problem actually exists at your scale. Numbers first, solution second.
The scenario¶
Pastebin is a link-sharing service. Someone with 100k Twitter followers pastes an incident log or a config snippet and tweets the link. Within minutes, tens of thousands of people click the same URL. One specific short code is now receiving a disproportionate share of all read traffic — this is called the hot key problem.
The question is: does this actually break anything?
Step 1 — Understand what "hot key" means for Redis¶
Redis is a single-threaded in-memory store. All reads and writes go through one event loop. Its throughput limit for simple GET operations is approximately 100,000 operations per second on a single instance.
That means if one key is being read 100,000 times per second, Redis is at capacity — serving nothing else.
The hot key problem becomes real when a single key's traffic approaches or exceeds the single-instance throughput limit.
Step 2 — What is our total peak read QPS?¶
From estimation:
DAU: 2M
Reads per DAU: 0.5 pastes/day × 100 (read:write ratio) = 50 reads/DAU/day
Total reads/day: 2M × 50 = 100M reads/day
Avg read QPS: 100M / 100,000 = 1,000 reads/sec
Peak read QPS: 1,000 × 3 = 3,000 reads/sec
Our entire system — all short codes combined — peaks at 3,000 reads/sec.
Step 3 — What fraction could a single hot key receive?¶
Assume worst case: a viral paste captures 10% of all peak traffic. That is an extreme assumption — 10% of all Pastebin reads going to one link simultaneously.
Hot key traffic = 10% × 3,000 = 300 reads/sec
Even at this extreme, one hot short code is seeing 300 reads/sec.
Redis capacity is 100,000 ops/sec.
300 / 100,000 = 0.3% of Redis capacity
A hot key consuming 0.3% of Redis capacity is not a problem. Redis has 333× headroom above this load.
Step 4 — The verdict¶
Redis capacity: 100,000 ops/sec
Total system peak QPS: 3,000 reads/sec
Headroom factor: 33×
Even a hot key at 10% of peak:
300 reads/sec vs 100,000 ops/sec = 0.3% utilisation
The hot key problem does not exist at this scale. Our peak read traffic is too low to stress a single Redis instance, let alone a single key within it.
Recognising when a problem doesn't exist is as important as knowing how to solve it. Jumping to "add Redis replicas for hot keys" without checking the numbers is over-engineering — it adds complexity without solving a real bottleneck.
When would it become a problem?¶
If Pastebin scaled to 100× current traffic:
Peak read QPS: 300,000 reads/sec
Hot key (10%): 30,000 reads/sec
Redis capacity: 100,000 ops/sec
Now a hot key at 10% of peak is consuming 30% of Redis capacity. Still not broken, but worth watching. At 1,000× current traffic (30M reads/sec), hot keys become a genuine problem.
At that scale, the solutions are:
1. Redis read replicas — distribute reads across multiple Redis nodes
2. Local in-process cache on app servers — serve the hottest keys from
app server memory, zero network hop, unlimited throughput per server
But those are Google-scale problems. At Pastebin's current scale, the right answer is: check the numbers, conclude no problem exists, move on.
Interview framing
"I considered the hot key problem — a viral paste could concentrate reads on one short code. Our peak read QPS is 3k. Even if 10% hits one key, that's 300 reads/sec against Redis's 100k ops/sec capacity — 0.3% utilisation. Hot keys are not a concern at this scale. If traffic grew 100×, I'd add Redis read replicas or local in-process caching on app servers. For now, the numbers don't justify the complexity."