Ref Counting
Reference counting solves the shared ownership problem — who is responsible for deleting something that multiple people are using?
The answer: nobody deletes it until everyone is done with it.
The Delete Problem¶
Content-addressable storage means multiple users can share the same content row. User A and User B both paste "hello world" — one row in content, two rows in pastes, both pointing at the same content_hash.
Now User A deletes their paste. What happens to the content?
Naive answer: delete the content row.
This is wrong. User B's paste still points to that content_hash. Their link is now broken — they try to read their paste and get nothing. User A's delete broke User B's paste without User B doing anything.
The Fix — Reference Counting¶
The content table has a ref_count column. It tracks how many pastes rows are currently pointing at this content.
User A creates paste → content row inserted with ref_count = 1
User B creates same paste → content already exists → ref_count = 2
User A deletes → ref_count = 2 → 1 (content stays, User B's link works ✓)
User B deletes → ref_count = 1 → 0 (nobody pointing at it → physically delete ✓)
Content is only physically deleted when ref_count reaches zero. Until then, it stays. This is the same pattern used in filesystems (inode link count), Python's garbage collector, and S3 object lifecycle management.
The Double-Delete Problem¶
Reference counting introduces a new problem. What if User A deletes twice?
Initial state: ref_count = 2
User A deletes: ref_count = 2 → 1 ✓
User A deletes again: ref_count = 1 → 0 → content deleted ✗
User B's link just broke because User A called delete twice — maybe due to a retry after a network timeout, maybe a bug in the client. This is the same idempotent delete problem from the API design, now showing up at the data layer.
The Fix — Pastes Row as Ownership Record¶
The solution is to treat the pastes table row as the ownership record. A user owns content as long as their row in pastes exists. When they delete, you DELETE their pastes row and decrement ref_count. If they try to delete again, the pastes row is already gone — there's nothing to delete, so ref_count is not touched.
First delete:
pastes row EXISTS for (userA, shortCode)
→ DELETE pastes row
→ decrement ref_count: 2 → 1 ✓
Second delete (retry):
pastes row DOES NOT EXIST for (userA, shortCode)
→ nothing to do
→ ref_count unchanged: stays at 1 ✓
User B's link is safe. The double-delete is absorbed harmlessly.
This works because the pastes row deletion is idempotent — deleting a row that doesn't exist is a no-op. The ref_count decrement is gated behind the row deletion: no row, no decrement.
Atomicity — Wrapping in a Transaction¶
The DELETE from pastes and the decrement of ref_count in content must happen atomically. If the decrement succeeds but the row deletion fails (or vice versa), you end up with an inconsistent ref_count.
BEGIN;
DELETE FROM pastes WHERE short_code = ? AND user_id = ?;
-- only if a row was actually deleted (affected_rows = 1):
UPDATE content SET ref_count = ref_count - 1 WHERE content_hash = ?;
-- if ref_count reaches 0, schedule physical deletion
COMMIT;
PostgreSQL's ACID guarantees make this safe. Both operations succeed or neither does.
When ref_count Reaches Zero¶
When ref_count hits zero, the content has no owners. Two things need to happen:
- DELETE the row from the
contenttable - Delete the blob from S3
The S3 deletion can be async — it doesn't need to happen in the same transaction. A background job can pick up orphaned S3 objects and clean them up. The content row deletion, however, should happen in the same transaction as the final ref_count decrement to keep the DB consistent.
Interview framing
"ref_count on the content table tracks how many pastes point at it. Create increments, delete decrements. Physical deletion only when ref_count hits zero. Double-delete problem solved by using the pastes row as the ownership record — no row, no decrement. The DELETE from pastes and the ref_count decrement are wrapped in a single transaction for atomicity."