Every problem from the prep guide with detailed solutions, code, follow-ups, and live interactive demos where applicable. P = previous, L = high likely.
Previous Round Questions 3 problems
The questions you got last time. Have these cold. They may repeat verbatim or as a close variant.
1
Notification system with users, groups, sets
PreviousHardSets / Hashmaps
Problem
Users belong to groups. You can subscribe a user or a group to a notification channel. When an event fires, deliver to every subscriber exactly once — even if a user is subscribed both directly and via multiple groups.
Why sets are the answer
Dedup: Same user via direct + group = one notification.
O(1) membership: "Is user X subscribed?"
Set union: Merge recipients from N groups in one operation.
Mute / block per user. Per-channel block set, subtract during notify.
Hierarchical groups. BFS from subscribed groups; use visited set for cycles.
Permission check. Intersect recipients with the resource ACL set.
Storm prevention. LRU + TTL of (user, hash(message)) before delivery.
Scale beyond memory. Move sets to Redis: SADD / SUNION. For huge groups, fan out async.
2
Fractional indexing for sortable items
PreviousHardString / Algorithms
Quick summary
Store positions as lex-orderable strings. Insert between A and B → generate a midpoint string. No neighbor renumbering. Insert is O(1) writes.
Full interactive tutorial:fractional-indexing-tutorial.html — with the live midpoint calculator, edge cases (a/aa, a/ab), stress test, and string-growth analysis.
The 60-second pitch
"Store positions as strings. To insert between A and B, compute the lex midpoint. Insert is O(1). Cost: keys grow under repeated inserts at the same spot — log26(2) chars per insert. After 1M inserts at the same gap: ~15 chars. Add a rebalancing job triggered when any key exceeds ~50 chars."
Three cases of the midpoint algorithm
Gap > 1 char at first diff: pick midpoint char. "a"/"c" → "b".
Forbidden:"a" < ? < "aa" has no answer in a-z. Real libs enforce "no key ends in alphabet's first char" to prevent ever reaching this state.
3
Worker pool for scheduled automations
PreviousHardDistributed Systems
Problem
Users set up automations: "When status changes, in 2 hours send a Slack message." Tens of millions of pending tasks. Each has a fire-at timestamp. Execute reliably, near exactly-once, with bounded latency.
Architecture
┌─────────────┐ ┌──────────────┐ ┌─────────────────┐
│ Automation │────▶│ Scheduler │────▶│ Delayed Queue │
│ Trigger API │ │ (writer) │ │ (Redis ZSET │
└─────────────┘ └──────────────┘ │ scored by ts) │
└────────┬────────┘
│ poll due
▼
┌─────────────────┐
│ Dispatcher │ leases tasks
│ (1 per shard) │ publishes work
└────────┬────────┘
▼
┌─────────────────┐
│ Work Queue │ Kafka topic
│ per action │ per integration
└────────┬────────┘
▼
┌─────────────────┐
│ Worker Pool │ autoscaled
│ acks/retries │ → DLQ on fail
└─────────────────┘
Live simulation
Worker Pool Simulator
⏰ Delayed queue (waiting for fire-at)
⚙ Workers (executing)
Scheduled
0
Completed
0
Retries
0
DLQ
0
Critical design questions to nail
At-least-once + idempotent handlers — true exactly-once is a myth.
Leases for dispatch: atomic SET NX EX. Crashed workers' leases expire and tasks become claimable again.
Partitioning: hash by account_id. Stops one bad tenant from starving others.
Backpressure / retry: exp backoff (1m, 5m, 30m, 2h, 12h). Move to DLQ after 5 fails.
Cancellations: check task.status === 'pending' before execute.
-- KEYS[1]=user_key, ARGV[1]=now, ARGV[2]=window, ARGV[3]=limitlocal key = KEYS[1]
local now = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local limit = tonumber(ARGV[3])
redis.call('ZREMRANGEBYSCORE', key, 0, now - window)
local count = redis.call('ZCARD', key)
if count >= limit thenreturn0end
redis.call('ZADD', key, now, now)
redis.call('EXPIRE', key, window)
return1
Follow-ups
Token bucket vs sliding window. Token bucket = bursty traffic allowed. Sliding = strict over window.
Multi-tier limits. Per-user AND per-IP AND per-endpoint. Layer them.
What if Redis is down? Fail open (allow) or fail closed (deny). Discuss tradeoff.
3
Topological sort + cycle detection
MediumDAG / DFS
Problem (monday.com framing)
Formula columns can reference other columns: Total = Quantity × Price. When user adds a new formula, detect cycles and return evaluation order.
Live demo
Dependency Graph
depends on
Code (DFS with 3 colors)
deftopo_sort(graph): # node -> [deps]
WHITE, GRAY, BLACK = 0, 1, 2
color = {n: WHITE for n in graph}
order = []
defdfs(n):
if color[n] == GRAY:
raiseValueError(f"Cycle at {n}")
if color[n] == BLACK: return
color[n] = GRAY
for d in graph.get(n, []):
dfs(d)
color[n] = BLACK
order.append(n)
for n in graph:
dfs(n)
return order
Kahn's algorithm (alternative)
BFS-based. Track in-degrees, repeatedly take zero-in-degree nodes. If you can't process all nodes → cycle exists. O(V+E) same as DFS.
4
Activity feed — collapse duplicates in window
High likelyMediumHashmap + Window
Problem
Stream of events (user, action, item). Group consecutive same-user-same-item events within 5 minutes into one feed entry: "Vikas updated Status 3 times."
Live demo
Event Stream → Collapsed Feed
Window: 5 minutes. Same (user, item) within window → grouped.
Memory cleanup: expire active groups whose last_ts < now - WINDOW.
Different action types in same group: usually flatten ("updated 3 times" hides which fields). Keep field names if needed.
Cross-item collapse: "Vikas made 5 changes" across many items. Different grouping key.
5
Permission tree walker
High likelyHardTree / ACL
Problem
Workspaces contain boards. Boards contain items. Permissions can be set at any level. Implement can_view(user, item).
Live demo
Visualize who can see what
User:
Code
defcan_view(user, item):
for node in [item, item.board, item.board.workspace]:
decision = node.acl.decide(user)
if decision in (ALLOW, DENY):
return decision == ALLOW
ifnot node.inherits:
returnFalsereturnFalse# default deny
Follow-ups
Bulk list: "all items user can see on a board." Precompute user's effective ACL once, intersect with item set.
Groups: user inherits permissions from groups. Walk the user's group set.
Cache: key on (user, item, version). Invalidate on ACL change.
6
Parse mentions in a comment
EasyStrings / Sets
Problem
Given comment text and a user/group directory, return distinct user_ids to notify. Watch for @group expanding to multiple users, and dedup.
Live demo
Mention Resolver
Known groups: @team=[alice,bob], @devs=[bob,carol]. Known users: @alice, @bob, @carol, @dave.
Code
import re
defresolve_mentions(text, users, groups):
# users: set[str], groups: dict[str, set[str]]
mentions = re.findall(r"@(\w+)", text)
recipients = set()
for name in mentions:
if name in users:
recipients.add(name)
elif name in groups:
recipients |= groups[name]
return recipients # dedup via set
Same set-union pattern as Q1. If they ask this, default to set for resolution + dedup.
7
Top K active items in last hour
MediumHeap / Streaming
Problem
Stream of activity events. Periodically return top K items by event count in the last hour.
Approach
Counts per item using a hashmap.
Sliding window: deque of (timestamp, item) → on read, evict old, decrement count.
If item space is huge (millions) and you need approximate counts, use Count-Min Sketch — bounded memory, bounded error, ideal for streaming top-K at scale.
Product Engineering 6 problems
Design + code a user-facing feature. They'll judge your UX intuition along with your code.
1
Real-time collaborative cursors
High likelyHardWebSockets / Pub-Sub
Problem
Show other users' cursors and selections live on the same board. Sub-100ms latency, handle 50+ users.
Architecture
Client ─[WebSocket]─▶ Edge LB ─▶ WS Server ─[pub]─▶ Redis Pub/Sub
▲ │
└──[sub]─────────────┘
│
broadcast to all
other clients on board
Implementation details
Client opens WS, subscribes to board:{id}.
Throttle cursor emit to ~50ms (20Hz). Don't flood.
Server broadcasts via Redis pub/sub so it works across N WS servers.
Don't persist cursor data — ephemeral. Only presence (who's online) goes to a Redis SET.
Presence heartbeat every 30s. On disconnect/timeout: SREM from presence.
Gotchas to mention
Privacy: verify user has board access before subscribing. Re-check on every event.
Reconnection storms: add jitter on reconnect (1–5s random delay).
Mobile / background tabs: pause emit when not visible.
Dropped events: cursor position is idempotent — losing one is fine.
2
Filter + group items on a board
High likelyMediumTradeoffs
Decision framework
Item count
Strategy
Why
< 500
Client-side
Already loaded. Instant. Zero server cost.
500 – 10k
Hybrid: server-side filter, client paginate
Index on filterable cols. Stream pages.
> 10k
Server-side, virtualize UI
Never load all client-side. Window of ~50 rows visible.
Collaborative undo: Alice's undo shouldn't trample Bob's later edit. Common policy: each user undoes only their own actions, rebased against current state.
Server authority: client undo sends command to server. Server validates, applies, broadcasts.
Limits: cap undo stack at ~50 entries. Older actions become non-undoable.
4
Search across all boards
MediumElasticsearch / Indexing
Stack
Elasticsearch/OpenSearch index per workspace (or sharded).
Document includes permissions: { users: [...], groups: [...] } for ACL filter.
Event Bus (Kafka): all item changes published. Partition by item_id for ordering.
Subscription Registry: board_id → list of (url, secret, event_types).
Dispatcher: consumes events, looks up subs, enqueues delivery jobs.
Delivery Workers: HMAC-sign payload, POST to URL, parse response.
Retry/DLQ: exp backoff (1m, 5m, 30m, 2h, 12h). After 5 failures → DLQ + user dashboard alert.
Critical concerns to verbalize
Ordering: Webhooks for same item must arrive in order. Kafka partition by item_id guarantees this.
Slow consumers: One slow URL can't block others. Per-subscription delivery queues OR worker pools partitioned by URL.
Security: HMAC-SHA256 sign payload. User verifies signature. Reject SSRF (no 127.0.0.1, no metadata endpoints).
Replay: User dashboard shows failed deliveries with "Replay" button. Re-enqueue from event log.
Idempotency: Send X-Event-Id header; instruct subscribers to dedup.
2
Real-time board updates (multi-user editing)
High likelyHardWS / Pub-Sub
Architecture
Client ─WS─▶ Edge LB ─▶ WS Server ─writes─▶ DB ─CDC─▶ Change Events
│ │
▼ ▼
Redis Pub/Sub ◀─────────────────── publishes
│
Other WS servers subscribed to same channel
push to their connected clients
Flow
Client opens WS, subscribes to board:N.
Edit happens → API writes to DB → publishes event to Redis pub/sub on board:N.
All WS servers subscribed to that channel push to their connected clients on that board.
Concerns
Conflict resolution: last-write-wins per cell is usually fine. For free-text columns: CRDT or operational transform.
Catch-up after disconnect: Client reconnects with last_event_id. Server replays missed events from Redis stream or DB.
Permissions: verify on subscribe. Re-verify on each broadcast (or cache effective ACL).
Connection scaling: 100k concurrent → sticky-less WS servers + shared Redis. Use a connection-broker library (Centrifugo, Socket.IO adapter).
3
Notification system at scale
HardFan-out / Queue
Pipeline
Event ──▶ Notification Service ──▶ Preference Filter ──▶ Channel Routers
├─▶ Email queue → SES/Sendgrid
├─▶ Push queue → FCM / APNs
└─▶ In-app DB store
+ WS push to online users