Skip to content

Read Query

Reads must account for all partitions a conversation has ever used

Once a conversation is salted, its messages are physically distributed across N DynamoDB partitions. A read that only queries one partition returns incomplete history. The app server must scatter the query across all N partitions and merge the results.


The naive read — broken for salted conversations

Without accounting for salting, a chat history read looks like:

Query: PK = conv_abc123, SK < cursor, LIMIT 20, ORDER DESC

This works for N=1 conversations. For a salted conversation with max_N=4, this query only hits the unsuffixed partition key — which may contain very few or no messages (all messages went to #0, #1, #2, #3). The result is empty or incomplete.


The correct read — scatter-gather

Step 1 — Check the hot partition registry:

GET registry[conv_abc123]
→ max_N = 4

Step 2 — Fire N queries in parallel, one per salted partition:

Query 1: PK = conv_abc123#0, SK < cursor, LIMIT 20, ORDER DESC
Query 2: PK = conv_abc123#1, SK < cursor, LIMIT 20, ORDER DESC
Query 3: PK = conv_abc123#2, SK < cursor, LIMIT 20, ORDER DESC
Query 4: PK = conv_abc123#3, SK < cursor, LIMIT 20, ORDER DESC

All 4 queries run simultaneously — total latency is bounded by the slowest query, not the sum.

Step 3 — Merge results by timestamp:

Results from all 4 partitions (up to 80 messages total):
  t=1005 "good thanks"   (from #1)
  t=1004 "how are you?"  (from #2)
  t=1003 "hi!"           (from #3)
  t=1002 "hey"           (from #0)
  ...

Sort by timestamp DESC → take top 20

Step 4 — Return the 20 most recent messages to the client.


The fast path for normal conversations

For the 99.9% of conversations that were never hot (max_N=1 or not in registry):

GET registry[conv_xyz999]
→ null (not in registry) → treat as N=1

Query: PK = conv_xyz999, SK < cursor, LIMIT 20, ORDER DESC
→ single partition read, no scatter-gather

Zero overhead. The scatter-gather only activates for conversations that were ever hot.


Latency impact

Normal conversation (N=1):
  1 DynamoDB read → ~5ms

Hot conversation (N=4):
  4 parallel DynamoDB reads → ~5ms (parallel, bounded by slowest)
  + merge 80 messages in memory → ~1ms
  Total → ~6ms

Hot conversation (N=10):
  10 parallel reads → ~5ms
  + merge 200 messages → ~2ms
  Total → ~7ms

The scatter-gather adds minimal latency because all queries run in parallel. The merge is in-memory and fast. Even at N=10, the read is only ~2ms slower than a single partition read.


Cursor pagination with scatter-gather

Pagination works the same way — the cursor (timestamp#message_id) is applied to all N partition queries:

First page (no cursor):
  All N partitions: LIMIT 20, ORDER DESC → take top 20 overall

Next page (cursor = "1713087600000#msg_xyz789"):
  All N partitions: SK < "1713087600000#msg_xyz789", LIMIT 20, ORDER DESC
  → merge → top 20 messages before the cursor

The cursor is consistent across all partitions because it's a sort key value — the same sort key range applies regardless of which salted partition holds the message.


Interview framing

"For salted conversations, reads scatter across all N partitions in parallel and merge by timestamp. Latency is bounded by the slowest partition — not the sum — so N=4 adds only ~1ms over a single read. Normal conversations (N=1) are unaffected. The registry lookup tells the app server whether to scatter or not."