Sharding

Do we need manual sharding?

Sharding means splitting data across multiple nodes so no single node holds everything or handles all traffic. In Postgres you manage this yourself. In DynamoDB, it's handled internally — what DynamoDB calls "partitions" is what Postgres engineers call "shards." The question is whether we need to intervene manually or let DynamoDB handle it.


Sharding vs Partitioning — same concept, different ownership

Postgres sharding:
  → You decide how many shards
  → You build the routing layer (which shard gets this conversation_id?)
  → You manage rebalancing when nodes are added or removed
  → You handle cross-shard queries
  → Fully manual, fully your problem

DynamoDB partitioning:
  → DynamoDB decides how many partitions
  → DynamoDB handles routing internally via consistent hashing
  → DynamoDB rebalances automatically as data grows
  → You never think about it — unless a partition gets too hot (covered next)

This is one of the core reasons DynamoDB was chosen over Postgres — not just write throughput, but eliminating the entire operational burden of manual sharding.
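To make "you build the routing layer" concrete, here is a minimal sketch of the kind of router a manually sharded Postgres setup would need. The shard count and function name are illustrative, and it uses naive hash-modulo routing, which is exactly why rebalancing is painful:

```python
import hashlib

NUM_SHARDS = 16  # chosen up front; changing it later means moving data


def shard_for(conversation_id: str) -> int:
    """Route a conversation to a shard by hashing its id.

    Naive modulo routing: adding or removing a shard reassigns
    almost every key, which is why real deployments reach for
    consistent hashing and why DynamoDB doing this internally
    is such a large operational win.
    """
    digest = hashlib.sha256(conversation_id.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS


print(shard_for("conv-42"))  # deterministic shard in [0, NUM_SHARDS)
```

Every query path in the application would have to go through this function, and cross-shard queries (e.g. "all conversations for a user") would need scatter-gather on top of it.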


Do the numbers require manual intervention?

Let's check both throughput and storage against DynamoDB's per-partition limits:

Throughput:

Peak WPS            → 20k writes/sec
Peak RPS            → 20k reads/sec
Total ops/sec       → 40k ops/sec

DynamoDB per-partition limits:
  Write  → 1,000 WPS
  Read   → 3,000 RPS
  (read and write capacity are separate budgets, not one pooled 4,000 ops/sec)

Partitions needed for writes = 20,000 / 1,000 = 20
Partitions needed for reads  = 20,000 / 3,000 ≈ 7

Writes dominate: ~20 partitions minimum for throughput. Trivial.

Storage:

Hot data (90 days)           → 22.5 TB = 22,500 GB
DynamoDB per-partition limit → 10 GB

Partitions needed for storage = 22,500 / 10 = 2,250 partitions

Storage dominates. DynamoDB automatically creates ~2,250 partitions to hold 22.5 TB.

Throughput capacity unlocked by those 2,250 partitions:

2,250 partitions × 1,000 WPS = 2,250,000 writes/sec available
2,250 partitions × 3,000 RPS = 6,750,000 reads/sec available
Actual need                  = 20,000 WPS + 20,000 RPS

Headroom: ~112× on writes, ~337× on reads

The storage requirement forces DynamoDB to create far more partitions than throughput alone would need. Those extra partitions bring massive throughput headroom as a side effect.
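The whole sizing argument fits in a few lines of arithmetic. A sketch, using the published per-partition caps and counting read and write capacity as separate budgets (the workload numbers are this doc's estimates):

```python
import math

# DynamoDB per-partition limits (separate budgets, not interchangeable)
WRITE_LIMIT = 1_000      # writes/sec per partition
READ_LIMIT = 3_000       # reads/sec per partition
STORAGE_LIMIT_GB = 10    # GB per partition

# Workload estimates from this doc
need_wps, need_rps = 20_000, 20_000
hot_data_gb = 22_500     # 22.5 TB of 90-day hot data

# Partitions forced by each dimension; the largest one wins
by_writes = math.ceil(need_wps / WRITE_LIMIT)            # 20
by_reads = math.ceil(need_rps / READ_LIMIT)              # 7
by_storage = math.ceil(hot_data_gb / STORAGE_LIMIT_GB)   # 2,250

partitions = max(by_writes, by_reads, by_storage)
print(partitions)                                # 2250 -> storage dominates

# Headroom that comes along for free with 2,250 partitions
print(partitions * WRITE_LIMIT / need_wps)       # 112.5x write headroom
print(partitions * READ_LIMIT / need_rps)        # 337.5x read headroom
```

The `max()` is the crux: DynamoDB must satisfy every per-partition limit at once, so whichever dimension demands the most partitions sets the floor, and here that is storage by two orders of magnitude.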


Conclusion — no manual sharding needed

DynamoDB auto-partitions based on storage and throughput. With 22.5 TB of hot data, it creates ~2,250 internal partitions automatically. No routing layer to build, no rebalancing to manage, no cross-shard queries to handle.

The only manual intervention needed is hot partition salting — when a specific conversation_id generates too many writes to a single partition. That's a throughput concentration problem, not a sharding problem. It's covered in the next section.

Manual sharding (Postgres style) → not needed
DynamoDB auto-partitioning       → handles it
Hot partition salting            → only for conversation-level throughput spikes
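For orientation before the next section: salting amounts to appending a suffix to a hot conversation's partition key so its writes spread across several partitions. A hypothetical key scheme, with the bucket count as an illustrative assumption:

```python
import random

SALT_BUCKETS = 10  # hypothetical fan-out for one hot conversation


def salted_partition_key(conversation_id: str) -> str:
    """Spread one conversation's writes across up to SALT_BUCKETS
    partition keys. The trade-off: reads must fan out and query
    every suffix, then merge the results."""
    return f"{conversation_id}#{random.randrange(SALT_BUCKETS)}"
```

The mechanics, trade-offs, and when to apply this are covered in the next section.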

Interview framing

DynamoDB handles partitioning automatically. With 22.5 TB of hot data it creates ~2,250 partitions, giving roughly 2.25M writes/sec and 6.75M reads/sec of capacity against our 20k/20k need, over 100× headroom on both sides. No manual sharding needed. The only manual intervention is hot partition salting for individual conversations that push more than 1,000 writes/sec at a single partition key, which is a different problem.