Estimation
Estimation anchors every architecture decision that follows — wrong numbers here mean wrong design choices later.
The goal is not a precise number. The goal is a justified number you can defend and use to make design decisions.
Assumptions
MAU: 10M
DAU: 20% × 10M = 2M
Writes per DAU: 0.5 pastes/day (occasional tool, not daily habit)
Read:write ratio: 100:1 (pastes are read many more times than created)
Peak multiplier: 3× (Pastebin doesn't go viral — 3× is realistic)
Paste size: ~10KB (100 lines × 100 chars = 10,000 bytes)
Retention: 10 years
Why 0.5 writes per DAU? Pastebin is a developer tool used occasionally — during debugging, sharing a config, sending a log snippet. The average active user creates a paste roughly once every two days, not multiple times daily. Contrast this with Twitter (multiple posts/day) or chat (dozens of messages/day). 0.5 is a conservative, defensible number.
Why 100:1 read:write ratio? A paste gets shared via a link. One person creates it, many people read it — teammates, colleagues, people on a forum. The read traffic far outweighs creation traffic, similar to URL Shortener but less extreme (1000:1 there because URLs get tweeted and go viral; pastes are shared in smaller circles).
QPS
Writes/day: 2M × 0.5 = 1M writes/day
Avg write QPS: 1M / ~100,000 sec/day (86,400 rounded for mental math) = 10 writes/sec
Peak write QPS: 10 × 3 = ~30 writes/sec
Avg read QPS: 10 × 100 = 1,000 reads/sec
Peak read QPS: 1,000 × 3 = ~3,000 reads/sec
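The QPS arithmetic above can be sanity-checked in a few lines (a sketch; constant names are illustrative, and the 100,000 sec/day shortcut matches the estimate, not the exact 86,400):

```python
# Back-of-envelope QPS check, using the assumptions stated above.
DAU = 2_000_000
WRITES_PER_DAU = 0.5
SECONDS_PER_DAY = 100_000   # ~86,400, rounded for mental math
READ_WRITE_RATIO = 100
PEAK_MULTIPLIER = 3

writes_per_day = DAU * WRITES_PER_DAU             # 1,000,000 writes/day
avg_write_qps = writes_per_day / SECONDS_PER_DAY  # 10 writes/sec
peak_write_qps = avg_write_qps * PEAK_MULTIPLIER  # 30 writes/sec
avg_read_qps = avg_write_qps * READ_WRITE_RATIO   # 1,000 reads/sec
peak_read_qps = avg_read_qps * PEAK_MULTIPLIER    # 3,000 reads/sec
```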
At 3,000 peak read QPS, a single Postgres instance (10k–50k reads/sec) can handle this comfortably. But 10KB per paste means each read is heavier than a URL redirect — caching becomes important not just for throughput but for reducing DB I/O on large payloads.
Storage
Writes/day: 1M
10-year total: 1M × 3,650 = 3.65B pastes
Per paste: ~10KB (content) + ~100 bytes (metadata) ≈ 10KB
Raw storage: 3.65B × 10KB = 36.5TB
Replication (3×): 36.5 × 3 = ~110TB
Index overhead (1.3×): ~143TB
→ ~150TB for 10 years
150TB crosses the sharding threshold. A single Postgres machine handles ~10TB practically before ops becomes painful. At 150TB we need sharding — same conclusion as URL Shortener.
Note: metadata (user_id, created_at, expiry, short_code) is negligible at ~100 bytes per row vs 10KB of paste content. Storage is dominated entirely by the paste text — not indexes, not metadata.
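The storage chain can be verified the same way (a sketch; the 1.3× index factor and 3× replication factor are the assumptions stated above):

```python
# Storage estimate: raw -> replicated -> with index overhead.
writes_per_day = 1_000_000
days = 3_650                  # 10 years
paste_bytes = 10_000          # ~10KB content; ~100B metadata is negligible

total_pastes = writes_per_day * days            # 3.65B pastes
raw_tb = total_pastes * paste_bytes / 1e12      # 36.5 TB raw
replicated_tb = raw_tb * 3                      # ~110 TB with 3x replication
with_index_tb = replicated_tb * 1.3             # ~143 TB -> budget ~150 TB
```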
Bandwidth
Convert rule: bytes/sec × 8 = bits/sec (NICs are rated in bits)
Outgoing (paste reads):
Avg: 1,000/sec × 10KB = 10 MB/s × 8 = 80 Mbps
Peak: 3,000/sec × 10KB = 30 MB/s × 8 = 240 Mbps
Incoming (paste writes):
Avg: 10/sec × 10KB = 100 KB/s × 8 = 0.8 Mbps
Peak: 30/sec × 10KB = 300 KB/s × 8 = 2.4 Mbps
240 Mbps peak outgoing is well under a standard 10Gbps NIC (10,000 Mbps). No bandwidth constraint — CDN is not required for bandwidth reasons. (We may still want one for latency if users are globally distributed.)
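The bytes-to-bits conversion is the step most often fumbled, so here it is as a helper (a sketch; the function name is illustrative):

```python
def mbps(qps: float, size_bytes: float) -> float:
    """QPS x payload size -> megabits per second (NICs are rated in bits)."""
    return qps * size_bytes * 8 / 1e6

paste_bytes = 10_000                      # ~10KB per paste
avg_read_mbps = mbps(1_000, paste_bytes)  # 80 Mbps outgoing, average
peak_read_mbps = mbps(3_000, paste_bytes) # 240 Mbps outgoing, peak
peak_write_mbps = mbps(30, paste_bytes)   # 2.4 Mbps incoming, peak
```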
Summary
Avg write QPS: 10/sec | Peak: 30/sec
Avg read QPS: 1,000/sec | Peak: 3,000/sec
Read:write ratio: 100:1
Storage (10 yrs): ~150TB → sharding required
Peak bandwidth: 240 Mbps → fits single NIC, no CDN needed for BW
Architecture decisions this forces
Read QPS 3,000 → single DB handles this, but caching helps with 10KB payloads
Storage 150TB → sharding required (>10TB per machine limit)
Bandwidth 240Mbps → no CDN needed for bandwidth alone
Paste size 10KB → content should be stored separately from metadata (content in blob store or text column, metadata in main DB)
The last point is new compared to URL Shortener. A URL row is 500 bytes — everything fits in one DB row. A paste is 10KB of content plus metadata. At scale it's worth separating the content blob from the metadata to allow independent scaling — small metadata reads are fast, large content reads go to a separate store.
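A minimal sketch of that split write path, with in-memory dicts standing in for the blob store and the metadata DB (all names here are illustrative, not from any specific library):

```python
# Split storage: ~100-byte metadata row in the main DB, 10KB content
# blob in a separate object store, both keyed by short_code.
import time
from dataclasses import dataclass
from typing import Optional

@dataclass
class PasteMeta:
    short_code: str
    user_id: int
    created_at: float
    expiry: Optional[float]  # epoch seconds; None = never expires

blob_store: dict[str, bytes] = {}   # stands in for S3 / object storage
meta_db: dict[str, PasteMeta] = {}  # stands in for the sharded metadata DB

def create_paste(short_code: str, user_id: int, content: bytes,
                 ttl: Optional[float] = None) -> None:
    now = time.time()
    blob_store[short_code] = content          # large payload, separate store
    meta_db[short_code] = PasteMeta(          # small row, fast to read/index
        short_code, user_id, now, now + ttl if ttl else None)
```

Metadata-only operations (expiry sweeps, listing a user's pastes) never touch the large blobs, which is the independent-scaling benefit the text describes.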
Interview framing
"10M MAU, 2M DAU, 0.5 writes per DAU = 1M writes/day = 10 writes/sec avg, 30 peak. 100:1 read:write = 1k reads/sec avg, 3k peak. Storage: 3.65B pastes × 10KB = 36.5TB raw, ~150TB with replication and indexes — sharding required. Bandwidth: 3k reads × 10KB = 30 MB/s = 240 Mbps peak — fits on one NIC, no CDN needed for bandwidth."