KSUID in Postgresql: A Custom Function for Sortable, Prefixed IdsKSUID in Postgresql: A Custom Function for Sortable, Prefixed Ids
Why KSUID-style IDs over UUIDs?
Using a KSUID-style scheme means each ID is globally unique (128 bits of entropy in the canonical KSUID design, 80 bits of entropy by default in this Postgres function), roughly time-ordered when sorted lexicographically, and human-meaningful via prefixes like app_* or topic_* so you can tell at a glance what it refers to. Unlike UUIDv4, which is random but not sortable by creation time, and UUIDv1, which is time-ordered but leaks MAC address and timestamp details, this format keeps IDs opaque and sortable while being shorter and more ergonomic for your product.
The ordering comes from the layout: timestamp*base62 || random_base62. Because the timestamp chunk is at the beginning and encoded in a monotonic base‑62 representation, sorting IDs as text naturally puts earlier timestamps first. Within the same millisecond, the random suffix breaks ties, so relative order there is effectively random—which is usually fine.
Prefixes add a nice UX layer: app_*, topic_*, and similar markers let you visually parse logs, metrics, and URLs without extra context, and make it trivial to filter or search by ID type just using simple prefix filters.
Implementing a KSUID generator in PostgreSQL with pgCrypto
To make this work inside the database, I use a small PL/pgSQL helper that returns a KSUID-style string and optionally prepends a prefix like app_* or topic_*. PL/pgSQL gives me loops, conditionals, and local variables, which makes it much easier to express the base‑62 conversion and random‑byte handling than raw SQL. I also expose one extra parameter to control how long the random suffix should be.
Here is the full function:
sqlCREATE OR REPLACE FUNCTION ksuid(prefix TEXT DEFAULT NULL, random_length INT DEFAULT 10) RETURNS TEXT AS $$ DECLARE chars TEXT[] := ARRAY['0','1','2','3','4','5','6','7','8','9', 'A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z', 'a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z']; timestamp_ms BIGINT; timestamp_b62 TEXT := ''; random_bytes BYTEA; random_b62 TEXT := ''; temp_val BIGINT; char_index INT; i INT; base_id TEXT; BEGIN timestamp_ms := FLOOR(EXTRACT(EPOCH FROM clock_timestamp()) * 1000); -- Generate 7-character timestamp part temp_val := timestamp_ms; WHILE temp_val > 0 LOOP char_index := (temp_val % 62) + 1; timestamp_b62 := chars[char_index] || timestamp_b62; temp_val := temp_val / 62; END LOOP; -- Generate random_length-character random part random_bytes := gen_random_bytes(random_length); FOR i IN 0..(random_length - 1) LOOP char_index := (get_byte(random_bytes, i) % 62) + 1; random_b62 := random_b62 || chars[char_index]; END LOOP; base_id := timestamp_b62 || random_b62; IF prefix IS NOT NULL AND prefix != '' THEN RETURN prefix || '_' || base_id; ELSE RETURN base_id; END IF; END; $$ LANGUAGE plpgsql;
The timestamp component (millisecond precision) and base62 encoding
The timestamp_ms variable turns the current wall‑clock time into an integer number of milliseconds:
- Time source:
clock_timestamp()instead ofnow().now()is fixed for the duration of a transaction, which is great for consistency but terrible for ID generation, because every call in the same transaction would see the exact same timestamp.clock_timestamp()gives you the actual wall‑clock time for each call. - Millisecond precision:
EXTRACT(EPOCH FROM clock_timestamp())returns seconds as adouble precision. Multiplying by1000and wrapping it inFLOOR(...)gives a stableBIGINTmillisecond value: one tick per millisecond.
Once I have timestamp_ms, I convert it to base‑62 using the chars array:
charsholds0–9,A–Z,a–z, which is a 62‑character alphabet.- In the
WHILEloop,temp_val % 62picks the least‑significant base‑62 digit, which becomes an index intochars. - I prepend each character (
chars[char_index] || timestamp_b62) so the final string is in the correct order. - Dividing
temp_val := temp_val / 62moves to the next digit, and the loop ends when there is nothing left.
At current Unix timestamps, this produces a 7‑character base‑62 string that grows monotonically over time. Because this timestamp chunk comes first in the final ID, lexicographic sorting on the text ID mirrors creation order.
The random component, random length, and total length
The random part relies on pgcrypto via extensions.gen_random_bytes(random_length):
- By default it asks for 10 random bytes, giving 80 bits of entropy, but you can raise or lower that by passing a different
random_length. - The
FOR i IN 0..(random_length - 1)loop walks each byte and maps it into the same 62‑character alphabet usingget_byte(random_bytes, i) % 62. - Concatenating these characters builds a
random_length‑characterrandom_b62suffix.
That gives a base KSUID of timestamp_b62 (7 chars) + random_b62 (random_length chars, 10 by default) for a 7 + random_length character body:
- Without prefix and default settings: something like
3Fj9aZQmP7s2bLdXy(17 chars total). - With prefix: you can call
SELECT ksuid('app');and get IDs likeapp_3Fj9aZQmP7s2bLdXy, orksuid('topic')fortopic_*IDs. - With a custom random length:
SELECT ksuid('app', 6);shrinks the random suffix for shorter IDs in a dev environment, whileSELECT ksuid('app', 16);gives you more entropy in production.
Because the timestamp is encoded first, these IDs are still sortable by creation time, even though the random portion is completely independent per ID.
Security and correctness concerns
Two lines at the top of the function are there purely for safety:
SECURITY DEFINERlets the function run with the privileges of its owner, so application roles that cannot touchpgcryptodirectly can still callksuid(...).SET search_path TO ''prevents an attacker from swapping in a different function with the same name on a later schema in the search path; insideksuid, every unqualified name must resolve explicitly.
Usage example
In tables, you can now use this function as a default:
sqlCREATE TABLE topics ( id text PRIMARY KEY DEFAULT ksuid('topic'), name text NOT NULL, created_at timestamptz NOT NULL DEFAULT now() );
Every insert that omits id will get a fresh, prefix‑aware KSUID that is unique, time‑sortable, and reasonably compact.
Caveats
Segment’s original KSUID format is a fixed, canonical spec: a 20‑byte binary representation (4‑byte timestamp + 16‑byte random payload) encoded as a 27‑character base‑62 string with second‑level timestamp precision. The function in this post keeps the same high‑level shape—timestamp chunk first, then randomness, plus an optional prefix—but it uses a variable‑length base‑62 Unix timestamp in milliseconds, followed by random_length base‑62 characters of randomness derived from gen_random_bytes(random_length) (10 by default). You get almost all of the nice properties (lexicographic ordering, high entropy, readable prefixes) without strictly following Segment’s byte layout, which is fine as long as you don’t need wire‑level compatibility with canonical KSUID.
Two small gotchas to keep in mind:
- The comment “7‑character timestamp part” in the function is descriptive of today’s timestamps, not a hard guarantee. As Unix time grows, the base‑62 representation will eventually need 8+ characters. That doesn’t break ordering or prefixing—the timestamp is still the first chunk—but you shouldn’t rely on it staying at exactly 7 characters forever.
- If you’re parsing these IDs in application code, avoid assuming a fixed overall length. Instead, treat the ID as
<optional_prefix>_<timestamp_chunk><random_chunk>: split on the underscore to peel off the prefix (if present), then treat the entire remaining leading run of timestamp characters as one chunk and the tail as randomness. That way your code keeps working even if the timestamp grows in length or you changerandom_lengthlater.
- PostgreSQL
- pgcrypto
- KSUID