Updated Architecture
The full scaled architecture
With caching, DB sharding, and peak traffic handling in place, this is what the complete system looks like. Every layer is redundant, and every bottleneck identified in the earlier deep dives has been addressed.
Full system diagram
graph TD
C[Client / Browser] -->|HTTPS| CLB[Cloud Load Balancer managed by provider]
CLB --> GW1[API Gateway 1 Zone A]
CLB --> GW2[API Gateway 2 Zone B]
CLB --> GW3[API Gateway 3 Zone C]
GW1 & GW2 & GW3 --> AS[App Server Fleet auto-scaled]
AS -->|check local cache first| LC[Local In-Process Cache hot keys only]
AS -->|cache lookup| RC[Redis Cluster ~27GB steady state]
AS -->|cache miss| DB[(DB Shards 8 shards x 3 replicas)]
RC -->|miss| DB
Request flow at peak — redirect
1. User clicks bit.ly/x7k2p9
Browser → HTTPS → Cloud LB
2. Cloud LB routes to one of 3 API Gateway instances
3. API Gateway:
- Rate limit check
- TLS already terminated
- Load balance to one app server
4. App server:
a. Check local in-process cache → HIT (hot key) → return 301 immediately
b. Miss → check Redis Cluster → HIT → return 301, async populate local cache
c. Miss → query DB shard → return 301, async populate Redis + local cache
5. Browser follows 301 → Location: https://long-url.com
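The lookup order in step 4 can be sketched as follows. This is a minimal illustration, not the actual service code: `TieredResolver`, the plain `dict` standing in for Redis Cluster, and the `db_lookup` callable are all hypothetical names, and the cache population that the flow performs asynchronously is done inline here for brevity. A real local cache would also admit only hot keys and evict entries.

```python
from typing import Callable, Dict, Optional, Tuple


class TieredResolver:
    """Resolve a short code via local cache, then Redis, then the DB shard."""

    def __init__(self, redis: Dict[str, str],
                 db_lookup: Callable[[str], Optional[str]]):
        self.local: Dict[str, str] = {}  # stand-in for the in-process cache
        self.redis = redis               # stand-in for a Redis Cluster client
        self.db_lookup = db_lookup       # stand-in for a DB shard query

    def resolve(self, code: str) -> Tuple[Optional[str], str]:
        """Return (long_url, name of the tier that served it)."""
        # 4a. Local in-process cache.
        url = self.local.get(code)
        if url is not None:
            return url, "local"
        # 4b. Redis; on a hit, populate the local cache.
        url = self.redis.get(code)
        if url is not None:
            self.local[code] = url
            return url, "redis"
        # 4c. DB shard; on a hit, populate Redis and the local cache.
        url = self.db_lookup(code)
        if url is None:
            return None, "not_found"
        self.redis[code] = url
        self.local[code] = url
        return url, "db"


db = {"x7k2p9": "https://long-url.com"}
resolver = TieredResolver(redis={}, db_lookup=db.get)
print(resolver.resolve("x7k2p9"))  # first request falls through to the DB
print(resolver.resolve("x7k2p9"))  # repeat request is served by the local cache
```

The handler would turn a resolved URL into a 301 with a `Location` header and an unresolved code into a 404; that HTTP layer is left out of the sketch.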
Request flow at peak — creation
1. Client → POST /api/v1/urls { long_url }
→ Cloud LB → API Gateway → App server
2. App server:
- Generate random 6-char base62 short code
- Check DB shard for collision (unique index lookup)
- If collision → regenerate
- INSERT into DB shard primary
- Set cookie: ryow_until = now + 30s
- Return 200 { short_url: bit.ly/x7k2p9 }
3. Cache NOT populated on creation (write-around)
First click populates cache
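Step 2 above (generate, check for collision, regenerate, insert) can be sketched like this. It is an illustration under stated assumptions: `create_short_url`, `random_code`, and `max_attempts` are hypothetical names, and a plain `dict` stands in for the shard primary and its unique index.

```python
import secrets
import string

BASE62 = string.digits + string.ascii_letters  # 10 + 52 = 62 symbols
CODE_LEN = 6


def random_code() -> str:
    """Return a random 6-character base62 short code."""
    return "".join(secrets.choice(BASE62) for _ in range(CODE_LEN))


def create_short_url(long_url, shard, max_attempts=5, gen=random_code):
    """Store long_url under a fresh code, regenerating on collision.

    `shard` is a dict standing in for the shard primary's unique index.
    """
    for _ in range(max_attempts):
        code = gen()
        if code in shard:
            continue  # collision: regenerate and try again
        shard[code] = long_url  # the INSERT on the shard primary
        # Write-around: no cache is touched here; the first click fills it.
        return code
    raise RuntimeError("no free short code found within max_attempts")


shard = {}
code = create_short_url("https://long-url.com", shard)
print(len(code), shard[code])  # a 6-character code mapped to the long URL
```

Against a real database, the check-then-insert pair is racy under concurrent writers, so one common approach is to attempt the INSERT directly and treat a unique-constraint violation as the collision signal. Six base62 characters give 62^6 ≈ 56.8 billion possible codes.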
The numbers after all deep dives
Total reads at peak → 1M/sec
Local cache absorbs → hot keys only (share varies with traffic skew)
Redis serves → 80%+ of remaining reads
DB sees at peak → ~200k/sec → spread across 8 shards × 2 secondaries = 16 read nodes
→ ~12.5k reads/sec per node ← within Postgres capacity ✓
Total writes → 1k/sec → all to shard primaries → ~125 writes/sec per shard ← trivial
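The arithmetic behind these figures, as a quick check. The constant names are illustrative. The local cache is counted as absorbing nothing, since its share is unspecified, and Redis is taken at its 80% lower bound, so the DB figures are worst-case.

```python
PEAK_READS = 1_000_000       # reads/sec at peak
REDIS_HIT_PERCENT = 80       # lower bound quoted above ("80%+")
SHARDS = 8
SECONDARIES_PER_SHARD = 2    # read nodes per shard
PEAK_WRITES = 1_000          # writes/sec at peak

# Integer math avoids floating-point noise in the printed figures.
db_reads = PEAK_READS * (100 - REDIS_HIT_PERCENT) // 100
read_nodes = SHARDS * SECONDARIES_PER_SHARD
reads_per_node = db_reads // read_nodes
writes_per_shard = PEAK_WRITES // SHARDS

print(db_reads)          # 200000
print(read_nodes)        # 16
print(reads_per_node)    # 12500
print(writes_per_shard)  # 125
```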
Every SPOF eliminated
| Component | Redundancy |
|---|---|
| Cloud LB | Managed by provider, multi-zone |
| API Gateway | 3 instances across 3 availability zones |
| App servers | Auto-scaled fleet, stateless |
| Redis | Cluster mode, multiple nodes |
| DB | 8 shards × 3 replicas (1 primary + 2 secondaries) = 24 machines |
No single machine failure takes down the system at any layer.
Known remaining limitations
Short-code collisions → retry cost grows as the DB fills up; a pre-generated key DB is covered in the next deep dive
Fault isolation → creation and redirect share app servers → next deep dive
Analytics → out of scope, but browsers cache 301 responses, so repeat clicks bypass the service and cannot be counted
Interview framing
"Full stack at peak: Cloud LB → 3 API Gateway instances → auto-scaled app server fleet → local cache for hot keys → Redis Cluster → 8 DB shards with 3 replicas each. 1M reads/sec: local cache absorbs hot keys, Redis absorbs 80%+, DB sees ~200k/sec spread across 16 read nodes — ~12.5k each, within capacity. No single point of failure at any layer."