UUID Base64 Trim
Building on the previous approach: random generation degrades as the DB fills up — too many collision retries at high fill rates. The root cause is that randomness has no coordination — two servers can independently generate the same code. What if we used a UUID, a standard designed to provide global uniqueness without any coordination?
What is a UUID?¶
UUID (Universally Unique Identifier) is a 128-bit number generated, depending on the version, from a combination of timestamp, machine identifier, and randomness. The standard is designed so that any machine anywhere can generate a UUID and the chance of it colliding with a UUID from any other machine is so small it is treated as zero in practice.
Example UUID → 550e8400-e29b-41d4-a716-446655440000
No DB check needed. No coordination between servers. Just generate and use.
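Generating one is a single call; a minimal sketch using Python's standard `uuid` module (version 4, the purely random variant):

```python
import uuid

# Generate a random (version 4) UUID -- no DB lookup, no coordination.
code_id = uuid.uuid4()

print(code_id)                  # e.g. 550e8400-e29b-41d4-a716-446655440000
print(code_id.version)          # 4
print(len(code_id.bytes) * 8)   # 128 bits
```

Every call returns a fresh 128-bit value; two servers running this independently will, for all practical purposes, never produce the same one.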
The length problem — first, why not base16?¶
A UUID is 128 bits. The simplest encoding is base16 (hex) — the same format UUIDs are usually displayed in.
Step 1 — how many bits per character in base16?
Base16 has 16 possible characters per position (0-9, a-f). To represent 16 different values, you need exactly enough bits so that 2^bits = 16:
2^1 = 2 → not enough
2^2 = 4 → not enough
2^3 = 8 → not enough
2^4 = 16 ✓
So 1 base16 character = 4 bits
Step 2 — how many characters to encode a 128-bit UUID?
UUID size = 128 bits
Bits per character = 4 (base16)
Characters needed = 128 / 4 = 32 characters
That's the raw hex UUID you already know — 550e8400-e29b-41d4-a716-446655440000 — 32 characters plus dashes. Completely unusable as a short code.
The rule: higher base = more bits packed per character = fewer characters needed. Base16 is too low. We need to go higher.
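The arithmetic above generalizes to any power-of-two base; a quick sketch that derives both the bits-per-character and the resulting code length:

```python
import math

UUID_BITS = 128

for base in (16, 64):
    bits_per_char = int(math.log2(base))           # 2^bits = base
    chars_needed = math.ceil(UUID_BITS / bits_per_char)
    print(f"base{base}: {bits_per_char} bits/char -> {chars_needed} chars")
# base16: 4 bits/char -> 32 chars
# base64: 6 bits/char -> 22 chars
```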
Moving to base64¶
Step 1 — how many bits per character in base64?
Base64 has 64 possible characters per position. Same logic:
2^1 = 2 → not enough
2^2 = 4 → not enough
2^3 = 8 → not enough
2^4 = 16 → not enough
2^5 = 32 → not enough
2^6 = 64 ✓
So 1 base64 character = 6 bits
Step 2 — how many characters to encode a 128-bit UUID?
UUID size = 128 bits
Bits per character = 6 (base64)
Characters needed = 128 / 6 ≈ 21.3 → round up to 22 characters
So the full base64 encoding of a UUID is 22 characters long. That is not a short URL — that's longer than most actual paths.
bit.ly/550e8400e29b41d4a716446655440000 ← 32 chars hex, horrible UX
bit.ly/VQ6EAOKbQdSnFkRmVUQAAA ← 22 chars base64, still too long
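Both lengths are easy to verify; a small sketch using Python's `base64` module (URL-safe alphabet, with the trailing `==` padding stripped since a short code must stay URL-clean):

```python
import base64
import uuid

u = uuid.UUID("550e8400-e29b-41d4-a716-446655440000")

hex_code = u.hex                                                     # base16
b64_code = base64.urlsafe_b64encode(u.bytes).rstrip(b"=").decode()   # base64

print(hex_code, len(hex_code))   # 550e8400e29b41d4a716446655440000 32
print(b64_code, len(b64_code))   # VQ6EAOKbQdSnFkRmVUQAAA 22
```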
The temptation — trim to 6 characters¶
You already have the uniqueness from the UUID. Can you just take the first 6 characters of the base64 encoding?
UUID base64 → VQ6EAOKbQdSnFkRmVUQAAA
Trimmed → VQ6EAO
Short URL → bit.ly/VQ6EAO
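The tempting shortcut, sketched as a deliberately flawed helper (the `short_code` name is ours, not a standard API):

```python
import base64
import uuid

def short_code(u: uuid.UUID, length: int = 6) -> str:
    """Naive approach: base64-encode the UUID, keep only the first `length` chars."""
    full = base64.urlsafe_b64encode(u.bytes).rstrip(b"=").decode()
    return full[:length]

u = uuid.UUID("550e8400-e29b-41d4-a716-446655440000")
print(short_code(u))   # VQ6EAO
```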
No. This is exactly the same mistake as trimming a hash.
The UUID's uniqueness comes from all 128 bits working together. The first 6 characters of the base64 encoding represent only 36 bits (6 chars × 6 bits). Two different UUIDs can easily share the same first 36 bits:
UUID 1 → VQ6EAOKbQdSnFkRmVUQAAA → trimmed → VQ6EAO
UUID 2 → VQ6EAO1x9pTrWzHqMnLBxw → trimmed → VQ6EAO ← collision ✗
You have thrown away the bits that made them different. Trimming a UUID breaks uniqueness just as surely as trimming a hash.
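How quickly do 36 bits run out? The standard birthday approximation puts a number on it:

```python
import math

def collision_probability(n: int, bits: int) -> float:
    """Birthday approximation: P(collision) ~= 1 - exp(-n^2 / (2 * 2^bits))."""
    return 1 - math.exp(-n * n / 2 ** (bits + 1))

# 6 base64 chars keep only 36 of the UUID's 128 bits.
for n in (1_000, 100_000, 1_000_000):
    print(f"{n:>9,} codes -> {collision_probability(n, 36):.2%} chance of a clash")
```

Around 100,000 codes the clash probability is already about 7%, and at a million codes a collision is near-certain — the same birthday-problem curve that sank the hash-and-trim approach.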
The fundamental tension¶
UUID gives you uniqueness across distributed systems — but at 22 characters, it's too long. Trimming gets you to 6 characters — but destroys the uniqueness guarantee.
Full UUID encoding → unique ✓ / too long ✗
Trimmed UUID → short ✓ / not unique ✗
You cannot have both by trimming. You need a different kind of ID — one that is both short enough and uniquely structured.
Why this fails
Trimming a UUID discards bits and breaks the uniqueness guarantee. The same collision problem from approach 2 (hashing + trim) reappears here. Trimming any large unique identifier down to a short code always causes this — you cannot trim your way to uniqueness.
Next: UUID is 128 bits. That's more than we need. From the estimation, 36 bits is enough to cover 50 billion URLs. What if we used a 64-bit ID instead — one designed specifically for distributed systems? That's Snowflake.