Interview Prep · Full Coverage

All Problems, One Page

Every problem from the prep guide with detailed solutions, code, follow-ups, and live interactive demos where applicable. P = previous, L = high likely.

Previous Round Questions 3 problems

The questions you got last time. Have these cold. They may repeat verbatim or as a close variant.

1
Notification system with users, groups, sets
Previous Hard Sets / Hashmaps

Problem

Users belong to groups. You can subscribe a user or a group to a notification channel. When an event fires, deliver to every subscriber exactly once — even if a user is subscribed both directly and via multiple groups.

Why sets are the answer

  • Dedup: Same user via direct + group = one notification.
  • O(1) membership: "Is user X subscribed?"
  • Set union: Merge recipients from N groups in one operation.

Live demo

Notification System Playground

Reference implementation

class NotificationSystem:
    def __init__(self):
        self.group_members = {}      # group_id -> set[user_id]
        self.channel_user_subs = {}  # channel -> set[user_id]
        self.channel_group_subs = {} # channel -> set[group_id]

    def add_user_to_group(self, user, group):
        self.group_members.setdefault(group, set()).add(user)

    def subscribe_user(self, user, channel):
        self.channel_user_subs.setdefault(channel, set()).add(user)

    def subscribe_group(self, group, channel):
        self.channel_group_subs.setdefault(channel, set()).add(group)

    def notify(self, channel):
        recipients = set(self.channel_user_subs.get(channel, set()))
        for g in self.channel_group_subs.get(channel, set()):
            recipients |= self.group_members.get(g, set())
        return recipients   # dedup happens via set union

Follow-ups likely to come up

  1. Mute / block per user. Per-channel block set, subtract during notify.
  2. Hierarchical groups. BFS from subscribed groups; use visited set for cycles.
  3. Permission check. Intersect recipients with the resource ACL set.
  4. Storm prevention. LRU + TTL of (user, hash(message)) before delivery.
  5. Scale beyond memory. Move sets to Redis: SADD / SUNION. For huge groups, fan out async.
2
Fractional indexing for sortable items
Previous Hard String / Algorithms

Quick summary

Store positions as lex-orderable strings. Insert between A and B → generate a midpoint string. No neighbor renumbering. Insert is O(1) writes.

Full interactive tutorial: fractional-indexing-tutorial.html — with the live midpoint calculator, edge cases (a/aa, a/ab), stress test, and string-growth analysis.

The 60-second pitch

"Store positions as strings. To insert between A and B, compute the lex midpoint. Insert is O(1). Cost: keys grow under repeated inserts at the same spot — log26(2) chars per insert. After 1M inserts at the same gap: ~15 chars. Add a rebalancing job triggered when any key exceeds ~50 chars."

Three cases of the midpoint algorithm

  1. Gap > 1 char at first diff: pick midpoint char. "a"/"c""b".
  2. Gap = 1 char: keep a's char, extend. "a"/"b""an".
  3. a is prefix of b: descend. "a"/"ab""aa".
Forbidden: "a" < ? < "aa" has no answer in a-z. Real libs enforce "no key ends in alphabet's first char" to prevent ever reaching this state.
3
Worker pool for scheduled automations
Previous Hard Distributed Systems

Problem

Users set up automations: "When status changes, in 2 hours send a Slack message." Tens of millions of pending tasks. Each has a fire-at timestamp. Execute reliably, near exactly-once, with bounded latency.

Architecture

┌─────────────┐ ┌──────────────┐ ┌─────────────────┐ │ Automation │────▶│ Scheduler │────▶│ Delayed Queue │ │ Trigger API │ │ (writer) │ │ (Redis ZSET │ └─────────────┘ └──────────────┘ │ scored by ts) │ └────────┬────────┘ │ poll due ▼ ┌─────────────────┐ │ Dispatcher │ leases tasks │ (1 per shard) │ publishes work └────────┬────────┘ ▼ ┌─────────────────┐ │ Work Queue │ Kafka topic │ per action │ per integration └────────┬────────┘ ▼ ┌─────────────────┐ │ Worker Pool │ autoscaled │ acks/retries │ → DLQ on fail └─────────────────┘

Live simulation

Worker Pool Simulator
⏰ Delayed queue (waiting for fire-at)
⚙ Workers (executing)
Scheduled
0
Completed
0
Retries
0
DLQ
0

Critical design questions to nail

  1. At-least-once + idempotent handlers — true exactly-once is a myth.
  2. Leases for dispatch: atomic SET NX EX. Crashed workers' leases expire and tasks become claimable again.
  3. Partitioning: hash by account_id. Stops one bad tenant from starving others.
  4. Backpressure / retry: exp backoff (1m, 5m, 30m, 2h, 12h). Move to DLQ after 5 fails.
  5. Cancellations: check task.status === 'pending' before execute.

Scaling math

  • 10M pending × 1h avg delay → steady ~2.7k fires/sec.
  • Monday 9am peak: ~27k fires/sec.
  • Worker doing external HTTP: ~50 RPS. Need ~600 workers at peak.
  • Redis ZSET: ~50k ops/sec per shard. Shard 5–10× for safety.

Coding Round 7 problems

monday.com flavored coding: real product framing, data-structure choice signals seniority.

1
LRU Cache
Medium Doubly Linked List + Hashmap

Problem

O(1) get(key) and put(key, value). Evict least-recently-used on overflow.

Live demo

LRU Cache Visualization

← Most recently used   |   Least recently used →

Code

class Node:
    def __init__(self, k, v):
        self.k, self.v = k, v
        self.prev = self.next = None

class LRU:
    def __init__(self, cap):
        self.cap = cap
        self.map = {}
        self.head = Node(0, 0)  # dummy
        self.tail = Node(0, 0)  # dummy
        self.head.next = self.tail
        self.tail.prev = self.head

    def _remove(self, n):
        n.prev.next, n.next.prev = n.next, n.prev

    def _add_front(self, n):
        n.next = self.head.next
        n.prev = self.head
        self.head.next.prev = n
        self.head.next = n

    def get(self, k):
        if k not in self.map: return -1
        n = self.map[k]
        self._remove(n); self._add_front(n)
        return n.v

    def put(self, k, v):
        if k in self.map:
            self._remove(self.map[k])
        n = Node(k, v)
        self.map[k] = n
        self._add_front(n)
        if len(self.map) > self.cap:
            lru = self.tail.prev
            self._remove(lru)
            del self.map[lru.k]

Common follow-ups

  • TTL eviction: add expiry. Either lazy (check on access) or active (sweeper thread).
  • LFU vs LRU: LFU tracks access counts. More complex with min-frequency tracking.
  • Concurrent LRU: guard with lock, or use lock-free design (e.g., Caffeine's W-TinyLFU).
2
Sliding-window rate limiter
High likely Medium Deque / Time Windows

Problem

Allow at most N requests per user per W seconds.

Live demo

Sliding Window Rate Limiter
Allowed
0
Denied
0
In window
0
Limit
3

Code (in-memory)

from collections import deque

class SlidingWindow:
    def __init__(self, limit, window_s):
        self.limit = limit
        self.window = window_s
        self.hits = {}  # user_id -> deque[timestamp]

    def allow(self, user, now):
        q = self.hits.setdefault(user, deque())
        while q and q[0] <= now - self.window:
            q.popleft()
        if len(q) >= self.limit:
            return False
        q.append(now)
        return True

Distributed version (Redis Lua)

-- KEYS[1]=user_key, ARGV[1]=now, ARGV[2]=window, ARGV[3]=limit
local key = KEYS[1]
local now = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local limit = tonumber(ARGV[3])

redis.call('ZREMRANGEBYSCORE', key, 0, now - window)
local count = redis.call('ZCARD', key)
if count >= limit then return 0 end
redis.call('ZADD', key, now, now)
redis.call('EXPIRE', key, window)
return 1

Follow-ups

  • Token bucket vs sliding window. Token bucket = bursty traffic allowed. Sliding = strict over window.
  • Multi-tier limits. Per-user AND per-IP AND per-endpoint. Layer them.
  • What if Redis is down? Fail open (allow) or fail closed (deny). Discuss tradeoff.
3
Topological sort + cycle detection
Medium DAG / DFS

Problem (monday.com framing)

Formula columns can reference other columns: Total = Quantity × Price. When user adds a new formula, detect cycles and return evaluation order.

Live demo

Dependency Graph
depends on

Code (DFS with 3 colors)

def topo_sort(graph):  # node -> [deps]
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {n: WHITE for n in graph}
    order = []

    def dfs(n):
        if color[n] == GRAY:
            raise ValueError(f"Cycle at {n}")
        if color[n] == BLACK: return
        color[n] = GRAY
        for d in graph.get(n, []):
            dfs(d)
        color[n] = BLACK
        order.append(n)

    for n in graph:
        dfs(n)
    return order

Kahn's algorithm (alternative)

BFS-based. Track in-degrees, repeatedly take zero-in-degree nodes. If you can't process all nodes → cycle exists. O(V+E) same as DFS.

4
Activity feed — collapse duplicates in window
High likely Medium Hashmap + Window

Problem

Stream of events (user, action, item). Group consecutive same-user-same-item events within 5 minutes into one feed entry: "Vikas updated Status 3 times."

Live demo

Event Stream → Collapsed Feed

Window: 5 minutes. Same (user, item) within window → grouped.

Code

class FeedCollapser:
    WINDOW = 300  # 5 min

    def __init__(self):
        self.active = {}   # (user, item) -> {actions, last_ts, first_ts}
        self.feed = []

    def emit(self, user, item, action, ts):
        key = (user, item)
        g = self.active.get(key)
        if g and ts - g['last_ts'] <= self.WINDOW:
            g['actions'].append(action)
            g['last_ts'] = ts
        else:
            if g:
                self.feed.append(g)
            self.active[key] = {
                'user': user, 'item': item,
                'actions': [action],
                'first_ts': ts, 'last_ts': ts,
            }

Follow-ups

  • Memory cleanup: expire active groups whose last_ts < now - WINDOW.
  • Different action types in same group: usually flatten ("updated 3 times" hides which fields). Keep field names if needed.
  • Cross-item collapse: "Vikas made 5 changes" across many items. Different grouping key.
5
Permission tree walker
High likely Hard Tree / ACL

Problem

Workspaces contain boards. Boards contain items. Permissions can be set at any level. Implement can_view(user, item).

Live demo

Visualize who can see what
User:

Code

def can_view(user, item):
    for node in [item, item.board, item.board.workspace]:
        decision = node.acl.decide(user)
        if decision in (ALLOW, DENY):
            return decision == ALLOW
        if not node.inherits:
            return False
    return False  # default deny

Follow-ups

  • Bulk list: "all items user can see on a board." Precompute user's effective ACL once, intersect with item set.
  • Groups: user inherits permissions from groups. Walk the user's group set.
  • Cache: key on (user, item, version). Invalidate on ACL change.
6
Parse mentions in a comment
Easy Strings / Sets

Problem

Given comment text and a user/group directory, return distinct user_ids to notify. Watch for @group expanding to multiple users, and dedup.

Live demo

Mention Resolver

Known groups: @team=[alice,bob], @devs=[bob,carol]. Known users: @alice, @bob, @carol, @dave.

Code

import re

def resolve_mentions(text, users, groups):
    # users: set[str], groups: dict[str, set[str]]
    mentions = re.findall(r"@(\w+)", text)
    recipients = set()
    for name in mentions:
        if name in users:
            recipients.add(name)
        elif name in groups:
            recipients |= groups[name]
    return recipients   # dedup via set

Same set-union pattern as Q1. If they ask this, default to set for resolution + dedup.

7
Top K active items in last hour
Medium Heap / Streaming

Problem

Stream of activity events. Periodically return top K items by event count in the last hour.

Approach

  • Counts per item using a hashmap.
  • Sliding window: deque of (timestamp, item) → on read, evict old, decrement count.
  • Top K: min-heap of size K, ordered by count.

Code

import heapq
from collections import deque

class TopK:
    def __init__(self, k, window):
        self.k, self.window = k, window
        self.events = deque()
        self.counts = {}

    def record(self, item, ts):
        self.events.append((ts, item))
        self.counts[item] = self.counts.get(item, 0) + 1

    def top(self, now):
        while self.events and self.events[0][0] < now - self.window:
            _, item = self.events.popleft()
            self.counts[item] -= 1
            if self.counts[item] == 0:
                del self.counts[item]
        return heapq.nlargest(self.k, self.counts.items(), key=lambda x: x[1])

Scale follow-up

If item space is huge (millions) and you need approximate counts, use Count-Min Sketch — bounded memory, bounded error, ideal for streaming top-K at scale.

Product Engineering 6 problems

Design + code a user-facing feature. They'll judge your UX intuition along with your code.

1
Real-time collaborative cursors
High likely Hard WebSockets / Pub-Sub

Problem

Show other users' cursors and selections live on the same board. Sub-100ms latency, handle 50+ users.

Architecture

Client ─[WebSocket]─▶ Edge LB ─▶ WS Server ─[pub]─▶ Redis Pub/Sub ▲ │ └──[sub]─────────────┘ │ broadcast to all other clients on board

Implementation details

  1. Client opens WS, subscribes to board:{id}.
  2. Throttle cursor emit to ~50ms (20Hz). Don't flood.
  3. Server broadcasts via Redis pub/sub so it works across N WS servers.
  4. Don't persist cursor data — ephemeral. Only presence (who's online) goes to a Redis SET.
  5. Presence heartbeat every 30s. On disconnect/timeout: SREM from presence.

Gotchas to mention

  • Privacy: verify user has board access before subscribing. Re-check on every event.
  • Reconnection storms: add jitter on reconnect (1–5s random delay).
  • Mobile / background tabs: pause emit when not visible.
  • Dropped events: cursor position is idempotent — losing one is fine.
2
Filter + group items on a board
High likely Medium Tradeoffs

Decision framework

Item countStrategyWhy
< 500Client-sideAlready loaded. Instant. Zero server cost.
500 – 10kHybrid: server-side filter, client paginateIndex on filterable cols. Stream pages.
> 10kServer-side, virtualize UINever load all client-side. Window of ~50 rows visible.

Group view payload

{
  groups: [
    { key: "Done", count: 142, items: [...first 50...] },
    { key: "In Progress", count: 37, items: [...] },
    { key: "Stuck", count: 8, items: [...] }
  ],
  total: 187,
  has_more: { "Done": true }
}

Server returns counts + first N per group. Expanding a group fetches more pages.

Index strategy

For server-side filter: composite index (board_id, status, owner_id) covers common filters. For free-text search later: separate ES index.

3
Undo / Redo for board edits
Medium Command Pattern

Approach

Each user action → Command object with do() and undo(). Two stacks: undo, redo.

class UpdateCell(Command):
    def __init__(self, item, col, new_val):
        self.item, self.col = item, col
        self.old_val = item[col]
        self.new_val = new_val
    def do(self): self.item[self.col] = self.new_val
    def undo(self): self.item[self.col] = self.old_val

Hard parts

  • Collaborative undo: Alice's undo shouldn't trample Bob's later edit. Common policy: each user undoes only their own actions, rebased against current state.
  • Server authority: client undo sends command to server. Server validates, applies, broadcasts.
  • Limits: cap undo stack at ~50 entries. Older actions become non-undoable.
6
Bulk operations on selected items
Medium UX / Transactions

API

POST /items/bulk_update
{
  ids: [1, 2, 3, ..., 200],
  changes: { status: "Done" }
}

// Response - per-item status
{
  results: [
    { id: 1, ok: true },
    { id: 2, ok: false, error: "permission denied" },
    ...
  ]
}

Client flow (optimistic)

  1. Apply changes to all selected items immediately in UI.
  2. Send batch request.
  3. On response: revert any items where ok: false. Toast errors.

Large batches (10k+ items)

Switch to async job: POST /items/bulk_update_async returns job_id. Client polls or subscribes via WebSocket for progress. Show progress bar.

System Design 6 problems

monday.com cares about event-driven patterns, failure tolerance, and tradeoff reasoning. They want to hear "what if X breaks" answers.

1
Webhook system for integrations
Very likely Hard Event-driven
This is monday.com's canonical system-design question — explicitly mentioned in their engineering blog. Have it cold.

Architecture

┌──────────┐ ┌──────────┐ ┌──────────────┐ ┌─────────────┐ │ Item │────▶│ Event │────▶│ Subscription │────▶│ Delivery │ │ Change │ │ Bus │ │ Lookup │ │ Queue │ └──────────┘ │ (Kafka) │ │ (Redis/DB) │ │ (per-sub) │ └──────────┘ └──────────────┘ └──────┬──────┘ ▼ ┌─────────────┐ │ Delivery │ │ Workers │──▶ Customer URL │ HMAC sign │ (HTTPS POST) └──────┬──────┘ │ on fail ▼ ┌─────────────┐ │ Retry + │ │ DLQ + alert │ └─────────────┘

Components

  • Event Bus (Kafka): all item changes published. Partition by item_id for ordering.
  • Subscription Registry: board_id → list of (url, secret, event_types).
  • Dispatcher: consumes events, looks up subs, enqueues delivery jobs.
  • Delivery Workers: HMAC-sign payload, POST to URL, parse response.
  • Retry/DLQ: exp backoff (1m, 5m, 30m, 2h, 12h). After 5 failures → DLQ + user dashboard alert.

Critical concerns to verbalize

  1. Ordering: Webhooks for same item must arrive in order. Kafka partition by item_id guarantees this.
  2. Slow consumers: One slow URL can't block others. Per-subscription delivery queues OR worker pools partitioned by URL.
  3. Security: HMAC-SHA256 sign payload. User verifies signature. Reject SSRF (no 127.0.0.1, no metadata endpoints).
  4. Replay: User dashboard shows failed deliveries with "Replay" button. Re-enqueue from event log.
  5. Idempotency: Send X-Event-Id header; instruct subscribers to dedup.
2
Real-time board updates (multi-user editing)
High likely Hard WS / Pub-Sub

Architecture

Client ─WS─▶ Edge LB ─▶ WS Server ─writes─▶ DB ─CDC─▶ Change Events │ │ ▼ ▼ Redis Pub/Sub ◀─────────────────── publishes │ Other WS servers subscribed to same channel push to their connected clients

Flow

  1. Client opens WS, subscribes to board:N.
  2. Edit happens → API writes to DB → publishes event to Redis pub/sub on board:N.
  3. All WS servers subscribed to that channel push to their connected clients on that board.

Concerns

  • Conflict resolution: last-write-wins per cell is usually fine. For free-text columns: CRDT or operational transform.
  • Catch-up after disconnect: Client reconnects with last_event_id. Server replays missed events from Redis stream or DB.
  • Permissions: verify on subscribe. Re-verify on each broadcast (or cache effective ACL).
  • Connection scaling: 100k concurrent → sticky-less WS servers + shared Redis. Use a connection-broker library (Centrifugo, Socket.IO adapter).
3
Notification system at scale
Hard Fan-out / Queue

Pipeline

Event ──▶ Notification Service ──▶ Preference Filter ──▶ Channel Routers ├─▶ Email queue → SES/Sendgrid ├─▶ Push queue → FCM / APNs └─▶ In-app DB store + WS push to online users

Features to bring up

  • Per-user preferences: mute channels, quiet hours, digest mode.
  • Digest mode: instead of immediate, batch per user. Cron rolls up daily/hourly digest.
  • Throttling: max 1 email per user per hour. Redis token bucket.
  • Dedup: hash of (user, event_id) checked before send. (The Q1 pattern at scale.)
  • Templating: rendered per recipient (locale, name, item title). Edge cache.
4
Multi-tenant database design
Hard Sharding / Isolation

Three patterns

PatternIsolationCostUse for
Shared DB + shared schema, account_id colLow (row-level)LowMass-market SaaS, freemium
Shared DB, schema per tenantMediumMediumMid-market, simpler migrations risky
DB per tenantHighHighEnterprise tier, regulated, large accounts

monday.com is shared with row-level isolation. account_id on every table, indexed first in composites.

Sharding strategy

  • Shard by account_id — all an account's data lives together. Joins stay cheap.
  • Lookup service maps account_id → shard_id (cached in app).
  • Big accounts get dedicated shards (move via online migration).
  • Cross-shard analytics: ETL to data warehouse.
5
Activity log / audit trail
Medium Append-only / Column Store

Write path

Async via Kafka. Never block user writes on logging.

Storage tiers

  • Hot: last 90 days in Postgres or ClickHouse. Partitioned by (account_id, month).
  • Cold: > 90 days as Parquet in S3. Queried via Athena/Trino.
  • Analytics: stream into BigQuery / Snowflake for product analytics.

Privacy / GDPR

  • User deletion request → tombstone records (don't hard-delete due to legal hold).
  • Mask PII fields on tombstoning. Keep IDs for referential integrity.
  • Retention policy per tier of account (Enterprise vs Free).
6
File uploads & attachments
Medium Object Storage / CDN

Flow

  1. Client requests upload URL: POST /uploads → {url, fields}.
  2. Server returns S3 presigned POST. Bytes never touch app servers.
  3. Client uploads directly to S3.
  4. S3 ObjectCreated event → Lambda → virus scan → quarantine vs clean bucket.
  5. Async: generate thumbnails (image), extract preview (PDF).
  6. Client polls or subscribes to attachment-ready event.

Serving

  • Public attachments: CDN (CloudFront) with origin S3.
  • Private: signed URLs expiring in 1h. Sign via Lambda@Edge if you need permission check at edge.
  • Thumbnails: separate cache key. Different size variants.