All posts

KSUID in Postgresql: A Custom Function for Sortable, Prefixed Ids

6 min read

Why KSUID-style IDs over UUIDs?

Using a KSUID-style scheme means each ID is globally unique (128 bits of entropy in the canonical KSUID design, 80 bits of entropy by default in this Postgres function), roughly time-ordered when sorted lexicographically, and human-meaningful via prefixes like app_* or topic_* so you can tell at a glance what it refers to. Unlike UUIDv4, which is random but not sortable by creation time, and UUIDv1, which is time-ordered but leaks MAC address and timestamp details, this format keeps IDs opaque and sortable while being shorter and more ergonomic for your product.

The ordering comes from the layout: timestamp*base62 || random_base62. Because the timestamp chunk is at the beginning and encoded in a monotonic base‑62 representation, sorting IDs as text naturally puts earlier timestamps first. Within the same millisecond, the random suffix breaks ties, so relative order there is effectively random—which is usually fine.

Prefixes add a nice UX layer: app_*, topic_*, and similar markers let you visually parse logs, metrics, and URLs without extra context, and make it trivial to filter or search by ID type just using simple prefix filters.

Implementing a KSUID generator in PostgreSQL with pgCrypto

To make this work inside the database, I use a small PL/pgSQL helper that returns a KSUID-style string and optionally prepends a prefix like app_* or topic_*. PL/pgSQL gives me loops, conditionals, and local variables, which makes it much easier to express the base‑62 conversion and random‑byte handling than raw SQL. I also expose one extra parameter to control how long the random suffix should be.

Here is the full function:

sql
CREATE OR REPLACE FUNCTION ksuid(prefix TEXT DEFAULT NULL, random_length INT DEFAULT 10) RETURNS TEXT AS $$ DECLARE chars TEXT[] := ARRAY['0','1','2','3','4','5','6','7','8','9', 'A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z', 'a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z']; timestamp_ms BIGINT; timestamp_b62 TEXT := ''; random_bytes BYTEA; random_b62 TEXT := ''; temp_val BIGINT; char_index INT; i INT; base_id TEXT; BEGIN timestamp_ms := FLOOR(EXTRACT(EPOCH FROM clock_timestamp()) * 1000); -- Generate 7-character timestamp part temp_val := timestamp_ms; WHILE temp_val > 0 LOOP char_index := (temp_val % 62) + 1; timestamp_b62 := chars[char_index] || timestamp_b62; temp_val := temp_val / 62; END LOOP; -- Generate random_length-character random part random_bytes := gen_random_bytes(random_length); FOR i IN 0..(random_length - 1) LOOP char_index := (get_byte(random_bytes, i) % 62) + 1; random_b62 := random_b62 || chars[char_index]; END LOOP; base_id := timestamp_b62 || random_b62; IF prefix IS NOT NULL AND prefix != '' THEN RETURN prefix || '_' || base_id; ELSE RETURN base_id; END IF; END; $$ LANGUAGE plpgsql;

The timestamp component (millisecond precision) and base62 encoding

The timestamp_ms variable turns the current wall‑clock time into an integer number of milliseconds:

  • Time source: clock_timestamp() instead of now(). now() is fixed for the duration of a transaction, which is great for consistency but terrible for ID generation, because every call in the same transaction would see the exact same timestamp. clock_timestamp() gives you the actual wall‑clock time for each call.
  • Millisecond precision: EXTRACT(EPOCH FROM clock_timestamp()) returns seconds as a double precision. Multiplying by 1000 and wrapping it in FLOOR(...) gives a stable BIGINT millisecond value: one tick per millisecond.

Once I have timestamp_ms, I convert it to base‑62 using the chars array:

  • chars holds 0–9, A–Z, a–z, which is a 62‑character alphabet.
  • In the WHILE loop, temp_val % 62 picks the least‑significant base‑62 digit, which becomes an index into chars.
  • I prepend each character (chars[char_index] || timestamp_b62) so the final string is in the correct order.
  • Dividing temp_val := temp_val / 62 moves to the next digit, and the loop ends when there is nothing left.

At current Unix timestamps, this produces a 7‑character base‑62 string that grows monotonically over time. Because this timestamp chunk comes first in the final ID, lexicographic sorting on the text ID mirrors creation order.

The random component, random length, and total length

The random part relies on pgcrypto via extensions.gen_random_bytes(random_length):

  • By default it asks for 10 random bytes, giving 80 bits of entropy, but you can raise or lower that by passing a different random_length.
  • The FOR i IN 0..(random_length - 1) loop walks each byte and maps it into the same 62‑character alphabet using get_byte(random_bytes, i) % 62.
  • Concatenating these characters builds a random_length‑character random_b62 suffix.

That gives a base KSUID of timestamp_b62 (7 chars) + random_b62 (random_length chars, 10 by default) for a 7 + random_length character body:

  • Without prefix and default settings: something like 3Fj9aZQmP7s2bLdXy (17 chars total).
  • With prefix: you can call SELECT ksuid('app'); and get IDs like app_3Fj9aZQmP7s2bLdXy, or ksuid('topic') for topic_* IDs.
  • With a custom random length: SELECT ksuid('app', 6); shrinks the random suffix for shorter IDs in a dev environment, while SELECT ksuid('app', 16); gives you more entropy in production.

Because the timestamp is encoded first, these IDs are still sortable by creation time, even though the random portion is completely independent per ID.

Security and correctness concerns

Two lines at the top of the function are there purely for safety:

  • SECURITY DEFINER lets the function run with the privileges of its owner, so application roles that cannot touch pgcrypto directly can still call ksuid(...).
  • SET search_path TO '' prevents an attacker from swapping in a different function with the same name on a later schema in the search path; inside ksuid, every unqualified name must resolve explicitly.

Usage example

In tables, you can now use this function as a default:

sql
CREATE TABLE topics ( id text PRIMARY KEY DEFAULT ksuid('topic'), name text NOT NULL, created_at timestamptz NOT NULL DEFAULT now() );

Every insert that omits id will get a fresh, prefix‑aware KSUID that is unique, time‑sortable, and reasonably compact.

Caveats

Segment’s original KSUID format is a fixed, canonical spec: a 20‑byte binary representation (4‑byte timestamp + 16‑byte random payload) encoded as a 27‑character base‑62 string with second‑level timestamp precision. The function in this post keeps the same high‑level shape—timestamp chunk first, then randomness, plus an optional prefix—but it uses a variable‑length base‑62 Unix timestamp in milliseconds, followed by random_length base‑62 characters of randomness derived from gen_random_bytes(random_length) (10 by default). You get almost all of the nice properties (lexicographic ordering, high entropy, readable prefixes) without strictly following Segment’s byte layout, which is fine as long as you don’t need wire‑level compatibility with canonical KSUID.

Two small gotchas to keep in mind:

  • The comment “7‑character timestamp part” in the function is descriptive of today’s timestamps, not a hard guarantee. As Unix time grows, the base‑62 representation will eventually need 8+ characters. That doesn’t break ordering or prefixing—the timestamp is still the first chunk—but you shouldn’t rely on it staying at exactly 7 characters forever.
  • If you’re parsing these IDs in application code, avoid assuming a fixed overall length. Instead, treat the ID as <optional_prefix>_<timestamp_chunk><random_chunk>: split on the underscore to peel off the prefix (if present), then treat the entire remaining leading run of timestamp characters as one chunk and the tail as randomness. That way your code keeps working even if the timestamp grows in length or you change random_length later.

  • PostgreSQL
  • pgcrypto
  • KSUID
Foxy seeing you here!
Let's chat!
Logo