Interview Prep · Deep Dive

Fractional Indexing with Strings

Interactive tutorial. Learn the algorithm, play with it live, and understand the edge cases that trip everyone up — especially a / aa and a / ab.

1The problem we're solving

You have ordered items. Users drag-reorder them. If positions are integers 1, 2, 3, 4, 5 and a user moves item #5 between #1 and #2, you'd have to renumber 4 items. With concurrent users on a board: write storm, conflicts, broken sort during in-flight updates.

Fractional indexing assigns each item a position that is densely orderable — you can always generate a new value between any two existing values, without modifying neighbors.

Strings vs decimals — why strings win

ApproachCapacityWhat breaks
JS Number (IEEE 754 double)~52 bits mantissa → ~16 decimal digitsAfter ~50 inserts at the same gap, you run out of precision and have to renumber
DECIMAL(precision, scale) in DBBounded by declared precision (e.g., 65 digits in MySQL)Hard ceiling. Must rebalance.
Strings (lex-ordered)Effectively unbounded — just add another charLength grows unboundedly under pathological patterns; rebalance triggered by length threshold, not arithmetic precision

You correctly remembered that with decimals, you hit a hard limit (~52 mantissa bits ≈ 16 decimal digits, give or take). Strings dodge that — but trade arithmetic precision for variable length. The new question becomes: "how long is too long?" (answered in §6).

2The midpoint algorithm

Given two strings a < b (lex order), produce a string x with a < x < b. We'll use lowercase a-z (base-26) for clarity. Real libraries use base-62 or base-94 for better density.

Mental model

Think of each string as a path through a tree. Each character drills one level deeper. Inserting between two strings means finding a path that branches off between them.

Three cases the algorithm handles

  1. Gap > 1 character at the first differing position → pick the midpoint char. Done in O(1). Example: "a" and "c""b".
  2. Gap = 1 character at the first differing position → no single char fits, so keep a's char and append something that grows past a's rest. Example: "a" and "b""an".
  3. One string is a prefix of the other → the shorter one is a parent; descend and find room beneath. Example: "a" and "ab""aa".

The forbidden case

Trailing first-letter trap. If the upper bound ends in 'a' (the smallest char), there's no room below it in a-z. Example: between "a" and "aa", no valid string exists in pure a-z. Real libraries enforce an invariant: "no key may end in the alphabet's first character" — so this state is never reached if the library generated all keys.

3Try it live

Enter any two lowercase strings (a-z only). The midpoint algorithm runs step-by-step.

<
<
Enter values…

Quick examples — click to load

a < ? < c Simple midpoint
Gap of 2 chars. The algorithm picks 'b' in O(1). Classic case.
a < ? < b No room — must extend
Adjacent chars. The algorithm keeps 'a' and appends a midpoint char from a-z (roughly the middle: 'n'). Result: "an". Length grew by one.
a < ? < ab Prefix case
a is a prefix of ab. Keep 'a', then we need something < 'b' at position 1. 'a' fits. Result: "aa". But notice: "aa" ends in 'a' — the forbidden state. The next insert between "a" and "aa" will fail!
a < ? < aa Impossible in a-z
No valid string exists in lowercase a-z. Any candidate must start with 'a' and then have a char < 'a' — there is none. Real libraries forbid keys ending in 'a' to prevent ever reaching this state.
an < ? < b Cross-prefix midpoint
First chars differ by 1. Keep 'a', append something > 'n'. Result: "at"-ish (midpoint of 'n' and end-of-alphabet).
abc < ? < abe Common prefix
Common prefix "ab". Recurse on "c" vs "e""d". Result: "abd".

4Visual intuition

Map strings onto a [0, 1) number line. Each char is a base-26 digit. "a" = 0.0, "n" ≈ 0.5, "z" ≈ 0.96. "aa" ≈ 0.0 + 0.0/26 = 0.0…01. The midpoint algorithm is finding a value between two real numbers in this base-26 representation.

Each marker shows where the current example's strings land on [0, 1). Notice how "a" and "aa" are visually almost on top of each other — there's vanishingly little room between them.

5Stress test — watch the string grow

This is what happens when a user keeps inserting at the same spot (e.g., always at the top of a board, or repeatedly between two specific items). Each insert needs to fit between the same lower bound and the previous insert — forcing the string to grow.

Inserts
0
Max length
0
Avg length
0
Status
Idle

6When is a string "too long"?

Unlike floating-point, strings don't have a hard precision wall — they just keep growing. But you do need a soft cap for practical reasons:

What costs grow with length

  • DB index size. B-tree pages fit fewer entries as keys grow → more I/O. MySQL VARCHAR indexes are typically capped at 767–3072 bytes.
  • Comparison cost. String comparison is O(length). With a 100-char position, every sort comparison touches 100 bytes.
  • Network & storage. Every API response includes positions. 1M items × 80 bytes overhead per row adds up.
  • Memory. Client sort of 10k items × 100-char strings = noticeable.

Typical thresholds

  • Soft warning: any single key > 32 chars.
  • Rebalance trigger: any key > 50 chars, OR average key length > 16, OR ratio of (max len / log26(N)) > 4.
  • Hard cap: 256 chars at the DB column level. Reject inserts that would exceed.
  • Trello's hybrid: 53-bit doubles until precision runs out, then switch to string suffix. Hits string territory after ~50 inserts at the same gap.

Growth rate math

With base-26 and a balanced midpoint algorithm: each insert at the same spot halves the remaining gap, requiring roughly log26(2) ≈ 0.21 extra characters per insert on average. So:

Inserts at same gapExpected length
10~3 chars
100~5 chars
1,000~7 chars
10,000~10 chars
1,000,000~15 chars

Base-62 (digits + upper + lower) does even better: log62(2) ≈ 0.17 chars per insert. In practice, you'll almost never hit pathological lengths organically — they happen via bots, scripts, or adversarial input.

The "recycling" idea you mentioned (decimals)

With decimal-based fractional indexing (Trello's original approach): after precision is exhausted within JS's 53-bit double, they "recycle" by appending a string suffix. So 1.5 becomes 1.5|abc-style — the prefix is the numeric part, the suffix breaks ties when precision is gone. Pure string-based systems skip this entirely — there's no precision to exhaust, only length to manage.

7Rebalancing — the escape hatch

When keys get too long (or as a periodic maintenance task), you rebalance: rewrite all positions evenly spaced across the available space.

Strategy

  1. Lock or snapshot the board's items.
  2. Sort by current position.
  3. Generate new positions evenly spaced: "d", "h", "l", "p", "t", "x" (or similar — leave headroom at both ends).
  4. Write all new positions in one transaction.
  5. If real-time clients are watching: broadcast a "full resort" event so they refetch.

When to trigger

Concurrent inserts during rebalance

Hard problem. If you rebalance while a user is dragging an item, their pending position becomes stale. Solutions: (a) use an optimistic lock with version number per board, retry on conflict; (b) rebalance via a CRDT-aware position generator that's safe to interleave; (c) only rebalance during low-traffic windows.

8The reference implementation

The full algorithm in JavaScript. This is essentially what the calculator above runs.

const FIRST = 'a';
const LAST  = 'z';

// Returns a string x with a < x < b (lex order).
// Throws if a >= b or if the bounds cannot fit a string in a-z.
function midpoint(a, b) {
  if (a >= b) throw new Error(`a (${a}) must be < b (${b})`);

  // 1. Strip common prefix
  let i = 0;
  while (i < a.length && i < b.length && a[i] === b[i]) i++;
  const prefix = b.slice(0, i);

  // 2. a is a prefix of b: descend into b's territory
  if (i === a.length) {
    const cb = b.charCodeAt(i);
    if (cb > FIRST.charCodeAt(0)) {
      // pick any char in [FIRST, cb)
      return prefix + String.fromCharCode(
        Math.floor((FIRST.charCodeAt(0) + cb) / 2)
      );
    }
    // cb === 'a': must recurse deeper - the trailing 'a' trap
    return prefix + FIRST + midpoint('', b.slice(i + 1));
  }

  // 3. Both have chars at i. Look at the gap.
  const ca = a.charCodeAt(i);
  const cb = b.charCodeAt(i);
  if (cb - ca > 1) {
    // room for a midpoint char
    return prefix + String.fromCharCode(Math.floor((ca + cb) / 2));
  }

  // 4. Adjacent chars (cb = ca+1). Keep a's branch, extend upward.
  return prefix + a[i] + extendUp(a.slice(i + 1));
}

// Shortest string strictly > suffix.
function extendUp(suffix) {
  if (suffix === '') {
    // any char in (FIRST, LAST]; pick midpoint for nice spacing
    return String.fromCharCode(
      Math.floor((FIRST.charCodeAt(0) + LAST.charCodeAt(0) + 1) / 2)
    );
  }
  const c = suffix.charCodeAt(0);
  if (c < LAST.charCodeAt(0)) {
    return String.fromCharCode(Math.floor((c + LAST.charCodeAt(0) + 1) / 2));
  }
  // suffix[0] === 'z': keep and recurse
  return suffix[0] + extendUp(suffix.slice(1));
}

Production libraries

9What to say in the interview

The 60-second pitch. "I'd use fractional indexing — store positions as strings, generate new positions by finding a midpoint between neighbors. Insert is O(1) writes since we only touch the new item. Cost: keys grow in length under repeated inserts at the same spot, so we'd add a rebalancing job triggered when any key exceeds ~50 characters."

Follow-ups they'll ask

  1. "What if two users insert at the same gap simultaneously?" → Both compute the same midpoint, collide. Resolve via (a) server-side jitter (append random char), (b) tie-break on user_id or timestamp, (c) CRDT-style generator that includes site_id in the key.
  2. "What's the worst case for string length?" → User repeatedly inserts at the same spot. Each insert adds ~log26(2) chars. After 1M inserts: ~15 chars. After 1B: ~21 chars. Pathological but bounded.
  3. "How do you rebalance without downtime?" → Optimistic locking with board version. Rebalance writes new positions + bumps version. In-flight inserts retry against new version.
  4. "Why not just use floats?" → Precision ceiling at 52 bits. After ~50 inserts at the same gap, you can't fit a new value. Then you'd have to renumber anyway. Strings push the problem off indefinitely.
  5. "What alphabet would you use?" → Base-62 (0-9, A-Z, a-z) for density and URL-safety. Avoid characters that need escaping in JSON / URLs.