

The hot partition registry — a routing table, not a cache

The registry maps conversation_id to max_N. Every write and every read consults it. Losing it doesn't mean a cache miss — it means wrong routing, which means missing messages. This distinction changes everything about how you store it.


Cache vs routing table

There is a critical distinction between a cache and a routing table:

Cache:
  → Stores a copy of data that also exists elsewhere
  → Losing it = cache miss = slower, but correct
  → TTL is fine — expired entry just means a DB fetch

Routing table:
  → Stores the ONLY record of where data lives
  → Losing it = wrong routing = incorrect results, missing data
  → TTL is NOT fine — expired entry means you query the wrong partitions

The hot partition registry is a routing table. If conv_abc123 → max_N=4 expires from Redis, the app server treats the conversation as N=1, queries only conv_abc123, and misses all messages in #1, #2, #3. From the user's perspective, recent messages disappear. This is silent data loss — no error, just wrong results.
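The failure is easier to see in code. Below is a minimal Python sketch of the read-side fan-out, with a plain dict standing in for Redis. The key format (the bare conversation_id for the first partition, then #1, #2, … for the rest) is an assumption based on the #1, #2, #3 naming above, and partitions_for is a hypothetical helper.

```python
from __future__ import annotations


# Hypothetical key scheme: the first partition is the bare conversation_id,
# and partitions 1..max_N-1 carry a "#<i>" suffix.
def partitions_for(conversation_id: str, registry: dict[str, int]) -> list[str]:
    """Return every partition key a read must query for this conversation."""
    # A missing entry looks exactly like "never hot", so it falls back to N=1.
    # Unlike a cache, there is no second source to re-fetch the true value from.
    max_n = registry.get(conversation_id, 1)
    return [conversation_id] + [f"{conversation_id}#{i}" for i in range(1, max_n)]


registry = {"conv_abc123": 4}
print(partitions_for("conv_abc123", registry))
# ['conv_abc123', 'conv_abc123#1', 'conv_abc123#2', 'conv_abc123#3']

registry.clear()  # the entry expired or was evicted
print(partitions_for("conv_abc123", registry))
# ['conv_abc123']  (#1, #2, #3 are skipped without any error)
```

Nothing in the second call fails: the function returns a valid, shorter list, which is exactly why this kind of loss is silent.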


Requirements for the registry

1. Fast reads     → checked on every write and read (~1ms budget)
2. No TTL         → entries must never expire
3. Durable        → survives Redis restart, server crash, power loss
4. N only goes up → writes are always max(current_N, new_N)

Redis with AOF persistence

Redis with AOF (Append Only File) persistence satisfies all four requirements:

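Fast reads:    O(1) hash lookup → ~1ms round trip ✓
No TTL:        HSET without EXPIRE → entries never expire ✓
               (requires maxmemory-policy noeviction or volatile-*; the allkeys-*
               policies can evict keys even when they have no TTL)
Durable:       AOF logs every write to disk → survives restart ✓
N only up:     HSET only if new_N > current_N (Lua script for atomicity) ✓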

How AOF works:

Every Redis write → appended to /var/redis/appendonly.aof on disk
Redis restart     → replays AOF file → full recovery of all keys
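The log-and-replay idea can be modeled in a few lines of Python. This is a toy illustration, not how Redis is implemented: real AOF files hold RESP-encoded commands, while this sketch uses JSON lines, supports only HSET, and the class name ToyAOFRegistry and the key registry:conv_abc123 are hypothetical.

```python
from __future__ import annotations

import json


class ToyAOFRegistry:
    """Toy model of AOF: every write is logged, and a restart replays the log.

    Real Redis stores RESP-encoded commands; JSON lines are a simplification.
    """

    def __init__(self, aof_lines: list[str] | None = None):
        self.aof: list[str] = []                   # stands in for the AOF file on disk
        self.data: dict[str, dict[str, str]] = {}  # in-memory state, lost on restart
        for line in aof_lines or []:               # "restart": replay the surviving log
            self._apply(json.loads(line))
            self.aof.append(line)

    def _apply(self, cmd: list[str]) -> None:
        op, key, field, value = cmd
        if op != "HSET":
            raise ValueError(f"toy model only supports HSET, got {op}")
        self.data.setdefault(key, {})[field] = value

    def hset(self, key: str, field: str, value: str) -> None:
        cmd = ["HSET", key, field, value]
        self._apply(cmd)                  # update memory
        self.aof.append(json.dumps(cmd))  # and record the command in the log

    def hget(self, key: str, field: str) -> str | None:
        return self.data.get(key, {}).get(field)


before = ToyAOFRegistry()
before.hset("registry:conv_abc123", "max_N", "4")  # hypothetical key name
after = ToyAOFRegistry(before.aof)  # memory is gone; only the log survives
print(after.hget("registry:conv_abc123", "max_N"))  # 4
```

Because replay re-executes writes in order, the last HSET for a field wins, which is why the rebuilt state matches the pre-crash state (minus any writes that never reached disk).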

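AOF itself is off by default (appendonly no), so it has to be enabled explicitly with appendonly yes. With appendfsync everysec (the default fsync policy), a crash loses at most about 1 second of writes, which is acceptable for a routing table that updates infrequently.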

For stricter durability, appendfsync always fsyncs every write to disk before acknowledging it: no acknowledged write is lost, but every write now waits on the disk, so writes become much slower. For this registry (updated only when a conversation becomes hot, not on every message), everysec is more than adequate.
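A registry instance is only as safe as its configuration, so a startup check can be worth having. The sketch below is an assumed policy, not an official checklist: it flags AOF being off, an fsync policy other than everysec or always, and any maxmemory-policy that could evict keys without a TTL (noeviction never evicts, and the volatile-* policies only evict keys that have a TTL). The function name registry_config_problems is hypothetical.

```python
from __future__ import annotations

# Assumed requirements for a Redis instance that holds the registry.
REQUIRED = {
    "appendonly": {"yes"},                 # AOF is off by default
    "appendfsync": {"everysec", "always"},
}
# noeviction never evicts; volatile-* only evict keys that have a TTL.
# The allkeys-* policies may evict registry keys, which have no TTL.
SAFE_EVICTION = {"noeviction", "volatile-lru", "volatile-lfu",
                 "volatile-random", "volatile-ttl"}


def registry_config_problems(cfg: dict[str, str]) -> list[str]:
    """Return human-readable problems; an empty list means the config looks safe."""
    problems = []
    for name, allowed in REQUIRED.items():
        if cfg.get(name) not in allowed:
            problems.append(f"{name}={cfg.get(name)!r}, expected one of {sorted(allowed)}")
    policy = cfg.get("maxmemory-policy")
    if policy not in SAFE_EVICTION:
        problems.append(f"maxmemory-policy={policy!r} may evict registry keys")
    return problems


print(registry_config_problems(
    {"appendonly": "no", "appendfsync": "everysec", "maxmemory-policy": "allkeys-lru"}))
```

Against a live server, the mapping returned by redis-py's config_get("*") could be passed in, assuming its keys and values come back as strings.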


The atomic N update

The registry must only ever increase N. A race condition where two hot partition service instances both try to update N simultaneously could result in N being set to a lower value:

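Current max_N = 4
Instance A reads max_N=4, computes new_N=3 (traffic dropped)
Instance B reads max_N=4, computes new_N=5, writes 5
Instance A writes 3  ← overwrites B's 5; max_N is now 3

With max_N=3, reads query only the base partition, #1 and #2, so any messages already written to #3 and #4 vanish from the conversation.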
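Whether the race loses data depends only on the order in which the two writes land. This small Python simulation (all names hypothetical) applies each write against the currently stored value, once as a blind last-writer-wins write and once as a max-only write, and tries both arrival orders.

```python
def blind_write(current: int, new_n: int) -> int:
    return new_n                # last writer wins, regardless of what is stored


def monotonic_write(current: int, new_n: int) -> int:
    return max(current, new_n)  # N can only go up


def final_max_n(write, proposals: list, start: int = 4) -> int:
    """Apply each proposed new_N in arrival order, starting from max_N=start."""
    value = start
    for new_n in proposals:
        value = write(value, new_n)
    return value


# Instance A proposes 3, instance B proposes 5; try both arrival orders.
for order in ([3, 5], [5, 3]):
    print(order, final_max_n(blind_write, order), final_max_n(monotonic_write, order))
# [3, 5] 5 5
# [5, 3] 3 5
```

The max-only write gives the same result in either order, but only if each read-compare-write is applied as one indivisible step; otherwise two instances can still act on the same stale read.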

Fix: use a Lua script in Redis to make the read-compare-write atomic:

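-- KEYS[1]: the conversation's registry hash; ARGV[1]: proposed max_N
local current = redis.call('HGET', KEYS[1], 'max_N')  -- false if the field is missing
local new_n = tonumber(ARGV[1])
if current == false or tonumber(current) < new_n then
  redis.call('HSET', KEYS[1], 'max_N', new_n)  -- only ever raise N
  return new_n
end
return tonumber(current)  -- unchanged: stored value is already >= new_n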

This runs atomically — no other Redis command can execute between the read and the write.
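On the application side, one way to call the script is through redis-py's register_script, which returns a callable that sends the script by SHA (EVALSHA) and loads it if the server does not have it yet. The sketch below assumes a redis-py client, uses a variant of the script that also returns the stored max_N, and treats make_bump and the registry:&lt;conversation_id&gt; key naming as hypothetical.

```python
# Variant of the registry script that returns the stored max_N after the update.
BUMP_MAX_N = """
local current = redis.call('HGET', KEYS[1], 'max_N')
local new_n = tonumber(ARGV[1])
if current == false or tonumber(current) < new_n then
  redis.call('HSET', KEYS[1], 'max_N', new_n)
  return new_n
end
return tonumber(current)
"""


def make_bump(client):
    """Register the script once and return a function that raises max_N.

    client is assumed to be a redis-py Redis instance, or anything with a
    compatible register_script method.
    """
    script = client.register_script(BUMP_MAX_N)

    def bump(conversation_id: str, new_n: int) -> int:
        key = f"registry:{conversation_id}"  # hypothetical key naming
        # The script runs atomically on the server; the reply is the stored max_N.
        return int(script(keys=[key], args=[new_n]))

    return bump
```

Usage against a running server would look like bump = make_bump(redis.Redis()) followed by bump("conv_abc123", 5). Passing the conversation's key through keys rather than args keeps the script compatible with Redis Cluster key routing.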


Why not the database alone

Storing the registry in DynamoDB instead of Redis seems safe — durable by default, no TTL risk. But:

DynamoDB read latency → ~5-10ms
Redis read latency    → ~1ms

Registry is checked on EVERY message write and read.
At 10k WPS that is at least 10k registry lookups per second. It is tempting to multiply:
  DynamoDB registry: 10k × 10ms = 100 seconds of latency per second
  Redis registry:    10k × 1ms  = 10 seconds of latency per second

But those lookups run concurrently, one per request, so their costs are not summed onto any single message. What matters is the latency each lookup adds to its own request.

Per-request cost:
  Redis registry lookup → 1ms added to each message write
  DynamoDB registry     → 5-10ms added to each message write

1ms (0.5% of the 200ms latency SLO) is acceptable. 5-10ms (2.5-5% of the SLO) on every write and every read is a meaningful cost for a single routing lookup. Redis wins.
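The 10k WPS products above do measure something real, just not per-request latency. By Little's law (average concurrency L = arrival rate λ × time in system W), throughput × latency gives the average number of lookups in flight at once. This sketch computes that alongside each option's share of the 200ms SLO, taking 10ms as the upper end of the DynamoDB range; the helper names are hypothetical.

```python
def in_flight(requests_per_s: float, latency_ms: float) -> float:
    """Little's law: average concurrency = arrival rate x time in system."""
    return requests_per_s * latency_ms / 1000.0


def slo_share(added_ms: float, slo_ms: float = 200.0) -> float:
    """Fraction of a per-request latency budget consumed by one extra lookup."""
    return added_ms / slo_ms


for name, ms in [("Redis", 1), ("DynamoDB", 10)]:
    print(f"{name}: ~{in_flight(10_000, ms):.0f} lookups in flight, "
          f"{slo_share(ms):.1%} of the 200ms SLO per write")
# Redis: ~10 lookups in flight, 0.5% of the 200ms SLO per write
# DynamoDB: ~100 lookups in flight, 5.0% of the 200ms SLO per write
```

Ten or a hundred concurrent lookups is a capacity-planning number for the registry store; the per-write cost a user feels is the second column.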


Interview framing

"The registry is a routing table, not a cache — losing it causes silent data loss, not just a cache miss. So no TTL. Redis with AOF persistence gives us fast reads (~1ms), durability across restarts, and no expiry. The Lua script ensures N only ever increases. DynamoDB alone would add 5-10ms per write — too expensive at this scale."