Interview Prep · Full Coverage

All Problems, One Page

Every problem from the prep guide with detailed solutions, code, follow-ups, and live interactive demos where applicable. P = previous, L = high likely.

Previous Round Questions 3 problems

The questions you got last time. Have these cold. They may repeat verbatim or as a close variant.

Notification system with users, groups, sets

Previous Hard Sets / Hashmaps

Problem

Users belong to groups. You can subscribe a user or a group to a notification channel. When an event fires, deliver to every subscriber exactly once — even if a user is subscribed both directly and via multiple groups.

Why sets are the answer

Dedup: Same user via direct + group = one notification.
O(1) membership: "Is user X subscribed?"
Set union: Merge recipients from N groups in one operation.

Live demo

Notification System Playground

Reference implementation

class NotificationSystem:
    def __init__(self):
        self.group_members = {}      # group_id -> set[user_id]
        self.channel_user_subs = {}  # channel -> set[user_id]
        self.channel_group_subs = {} # channel -> set[group_id]

    def add_user_to_group(self, user, group):
        self.group_members.setdefault(group, set()).add(user)

    def subscribe_user(self, user, channel):
        self.channel_user_subs.setdefault(channel, set()).add(user)

    def subscribe_group(self, group, channel):
        self.channel_group_subs.setdefault(channel, set()).add(group)

    def notify(self, channel):
        recipients = set(self.channel_user_subs.get(channel, set()))
        for g in self.channel_group_subs.get(channel, set()):
            recipients |= self.group_members.get(g, set())
        return recipients   # dedup happens via set union

Follow-ups likely to come up

Mute / block per user. Per-channel block set, subtract during notify.
Hierarchical groups. BFS from subscribed groups; use visited set for cycles.
Permission check. Intersect recipients with the resource ACL set.
Storm prevention. LRU + TTL of (user, hash(message)) before delivery.
Scale beyond memory. Move sets to Redis: SADD / SUNION. For huge groups, fan out async.

Fractional indexing for sortable items

Previous Hard String / Algorithms

Quick summary

Store positions as lex-orderable strings. Insert between A and B → generate a midpoint string. No neighbor renumbering. Insert is O(1) writes.

Full interactive tutorial: fractional-indexing-tutorial.html — with the live midpoint calculator, edge cases (a/aa, a/ab), stress test, and string-growth analysis.

The 60-second pitch

"Store positions as strings. To insert between A and B, compute the lex midpoint. Insert is O(1). Cost: keys grow under repeated inserts at the same spot — log₂₆(2) chars per insert. After 1M inserts at the same gap: ~15 chars. Add a rebalancing job triggered when any key exceeds ~50 chars."

Three cases of the midpoint algorithm

Gap > 1 char at first diff: pick midpoint char. "a"/"c" → "b".
Gap = 1 char: keep a's char, extend. "a"/"b" → "an".
a is prefix of b: descend. "a"/"ab" → "aa".

Forbidden: "a" < ? < "aa" has no answer in a-z. Real libs enforce "no key ends in alphabet's first char" to prevent ever reaching this state.

Worker pool for scheduled automations

Previous Hard Distributed Systems

Problem

Users set up automations: "When status changes, in 2 hours send a Slack message." Tens of millions of pending tasks. Each has a fire-at timestamp. Execute reliably, near exactly-once, with bounded latency.

Architecture

┌─────────────┐ ┌──────────────┐ ┌─────────────────┐ │ Automation │────▶│ Scheduler │────▶│ Delayed Queue │ │ Trigger API │ │ (writer) │ │ (Redis ZSET │ └─────────────┘ └──────────────┘ │ scored by ts) │ └────────┬────────┘ │ poll due ▼ ┌─────────────────┐ │ Dispatcher │ leases tasks │ (1 per shard) │ publishes work └────────┬────────┘ ▼ ┌─────────────────┐ │ Work Queue │ Kafka topic │ per action │ per integration └────────┬────────┘ ▼ ┌─────────────────┐ │ Worker Pool │ autoscaled │ acks/retries │ → DLQ on fail └─────────────────┘

Live simulation

Worker Pool Simulator

⏰ Delayed queue (waiting for fire-at)

⚙ Workers (executing)

Scheduled

Completed

Retries

DLQ

Critical design questions to nail

At-least-once + idempotent handlers — true exactly-once is a myth.
Leases for dispatch: atomic SET NX EX. Crashed workers' leases expire and tasks become claimable again.
Partitioning: hash by account_id. Stops one bad tenant from starving others.
Backpressure / retry: exp backoff (1m, 5m, 30m, 2h, 12h). Move to DLQ after 5 fails.
Cancellations: check task.status === 'pending' before execute.

Scaling math

10M pending × 1h avg delay → steady ~2.7k fires/sec.
Monday 9am peak: ~27k fires/sec.
Worker doing external HTTP: ~50 RPS. Need ~600 workers at peak.
Redis ZSET: ~50k ops/sec per shard. Shard 5–10× for safety.

Coding Round 7 problems

monday.com flavored coding: real product framing, data-structure choice signals seniority.

LRU Cache

Medium Doubly Linked List + Hashmap

Problem

O(1) get(key) and put(key, value). Evict least-recently-used on overflow.

Live demo

LRU Cache Visualization

← Most recently used | Least recently used →

Code

class Node:
    def __init__(self, k, v):
        self.k, self.v = k, v
        self.prev = self.next = None

class LRU:
    def __init__(self, cap):
        self.cap = cap
        self.map = {}
        self.head = Node(0, 0)  # dummy
        self.tail = Node(0, 0)  # dummy
        self.head.next = self.tail
        self.tail.prev = self.head

    def _remove(self, n):
        n.prev.next, n.next.prev = n.next, n.prev

    def _add_front(self, n):
        n.next = self.head.next
        n.prev = self.head
        self.head.next.prev = n
        self.head.next = n

    def get(self, k):
        if k not in self.map: return -1
        n = self.map[k]
        self._remove(n); self._add_front(n)
        return n.v

    def put(self, k, v):
        if k in self.map:
            self._remove(self.map[k])
        n = Node(k, v)
        self.map[k] = n
        self._add_front(n)
        if len(self.map) > self.cap:
            lru = self.tail.prev
            self._remove(lru)
            del self.map[lru.k]

Common follow-ups

TTL eviction: add expiry. Either lazy (check on access) or active (sweeper thread).
LFU vs LRU: LFU tracks access counts. More complex with min-frequency tracking.
Concurrent LRU: guard with lock, or use lock-free design (e.g., Caffeine's W-TinyLFU).

Sliding-window rate limiter

High likely Medium Deque / Time Windows

Problem

Allow at most N requests per user per W seconds.

Live demo

Sliding Window Rate Limiter

Allowed

Denied

In window

Limit

Code (in-memory)

from collections import deque

class SlidingWindow:
    def __init__(self, limit, window_s):
        self.limit = limit
        self.window = window_s
        self.hits = {}  # user_id -> deque[timestamp]

    def allow(self, user, now):
        q = self.hits.setdefault(user, deque())
        while q and q[0] <= now - self.window:
            q.popleft()
        if len(q) >= self.limit:
            return False
        q.append(now)
        return True

Distributed version (Redis Lua)

-- KEYS[1]=user_key, ARGV[1]=now, ARGV[2]=window, ARGV[3]=limit
local key = KEYS[1]
local now = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local limit = tonumber(ARGV[3])

redis.call('ZREMRANGEBYSCORE', key, 0, now - window)
local count = redis.call('ZCARD', key)
if count >= limit then return 0 end
redis.call('ZADD', key, now, now)
redis.call('EXPIRE', key, window)
return 1

Follow-ups

Token bucket vs sliding window. Token bucket = bursty traffic allowed. Sliding = strict over window.
Multi-tier limits. Per-user AND per-IP AND per-endpoint. Layer them.
What if Redis is down? Fail open (allow) or fail closed (deny). Discuss tradeoff.

Topological sort + cycle detection

Medium DAG / DFS

Problem (monday.com framing)

Formula columns can reference other columns: Total = Quantity × Price. When user adds a new formula, detect cycles and return evaluation order.

Live demo

Dependency Graph

depends on

Code (DFS with 3 colors)

def topo_sort(graph):  # node -> [deps]
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {n: WHITE for n in graph}
    order = []

    def dfs(n):
        if color[n] == GRAY:
            raise ValueError(f"Cycle at {n}")
        if color[n] == BLACK: return
        color[n] = GRAY
        for d in graph.get(n, []):
            dfs(d)
        color[n] = BLACK
        order.append(n)

    for n in graph:
        dfs(n)
    return order

Kahn's algorithm (alternative)

BFS-based. Track in-degrees, repeatedly take zero-in-degree nodes. If you can't process all nodes → cycle exists. O(V+E) same as DFS.

Activity feed — collapse duplicates in window

High likely Medium Hashmap + Window

Problem

Stream of events (user, action, item). Group consecutive same-user-same-item events within 5 minutes into one feed entry: "Vikas updated Status 3 times."

Live demo

Event Stream → Collapsed Feed

Window: 5 minutes. Same (user, item) within window → grouped.

Code

class FeedCollapser:
    WINDOW = 300  # 5 min

    def __init__(self):
        self.active = {}   # (user, item) -> {actions, last_ts, first_ts}
        self.feed = []

    def emit(self, user, item, action, ts):
        key = (user, item)
        g = self.active.get(key)
        if g and ts - g['last_ts'] <= self.WINDOW:
            g['actions'].append(action)
            g['last_ts'] = ts
        else:
            if g:
                self.feed.append(g)
            self.active[key] = {
                'user': user, 'item': item,
                'actions': [action],
                'first_ts': ts, 'last_ts': ts,
            }

Follow-ups

Memory cleanup: expire active groups whose last_ts < now - WINDOW.
Different action types in same group: usually flatten ("updated 3 times" hides which fields). Keep field names if needed.
Cross-item collapse: "Vikas made 5 changes" across many items. Different grouping key.

Permission tree walker

High likely Hard Tree / ACL

Problem

Workspaces contain boards. Boards contain items. Permissions can be set at any level. Implement can_view(user, item).

Live demo

Visualize who can see what

User:

Code

def can_view(user, item):
    for node in [item, item.board, item.board.workspace]:
        decision = node.acl.decide(user)
        if decision in (ALLOW, DENY):
            return decision == ALLOW
        if not node.inherits:
            return False
    return False  # default deny

Follow-ups

Bulk list: "all items user can see on a board." Precompute user's effective ACL once, intersect with item set.
Groups: user inherits permissions from groups. Walk the user's group set.
Cache: key on (user, item, version). Invalidate on ACL change.

Parse mentions in a comment

Easy Strings / Sets

Problem

Given comment text and a user/group directory, return distinct user_ids to notify. Watch for @group expanding to multiple users, and dedup.

Live demo

Mention Resolver

Known groups: @team=[alice,bob], @devs=[bob,carol]. Known users: @alice, @bob, @carol, @dave.

Code

import re

def resolve_mentions(text, users, groups):
    # users: set[str], groups: dict[str, set[str]]
    mentions = re.findall(r"@(\w+)", text)
    recipients = set()
    for name in mentions:
        if name in users:
            recipients.add(name)
        elif name in groups:
            recipients |= groups[name]
    return recipients   # dedup via set

Same set-union pattern as Q1. If they ask this, default to set for resolution + dedup.

Top K active items in last hour

Medium Heap / Streaming

Problem

Stream of activity events. Periodically return top K items by event count in the last hour.

Approach

Counts per item using a hashmap.
Sliding window: deque of (timestamp, item) → on read, evict old, decrement count.
Top K: min-heap of size K, ordered by count.

Code

import heapq
from collections import deque

class TopK:
    def __init__(self, k, window):
        self.k, self.window = k, window
        self.events = deque()
        self.counts = {}

    def record(self, item, ts):
        self.events.append((ts, item))
        self.counts[item] = self.counts.get(item, 0) + 1

    def top(self, now):
        while self.events and self.events[0][0] < now - self.window:
            _, item = self.events.popleft()
            self.counts[item] -= 1
            if self.counts[item] == 0:
                del self.counts[item]
        return heapq.nlargest(self.k, self.counts.items(), key=lambda x: x[1])

Scale follow-up

If item space is huge (millions) and you need approximate counts, use Count-Min Sketch — bounded memory, bounded error, ideal for streaming top-K at scale.

Product Engineering 6 problems

Design + code a user-facing feature. They'll judge your UX intuition along with your code.

Real-time collaborative cursors

High likely Hard WebSockets / Pub-Sub

Problem

Show other users' cursors and selections live on the same board. Sub-100ms latency, handle 50+ users.

Architecture

Client ─[WebSocket]─▶ Edge LB ─▶ WS Server ─[pub]─▶ Redis Pub/Sub ▲ │ └──[sub]─────────────┘ │ broadcast to all other clients on board

Implementation details

Client opens WS, subscribes to board:{id}.
Throttle cursor emit to ~50ms (20Hz). Don't flood.
Server broadcasts via Redis pub/sub so it works across N WS servers.
Don't persist cursor data — ephemeral. Only presence (who's online) goes to a Redis SET.
Presence heartbeat every 30s. On disconnect/timeout: SREM from presence.

Gotchas to mention

Privacy: verify user has board access before subscribing. Re-check on every event.
Reconnection storms: add jitter on reconnect (1–5s random delay).
Mobile / background tabs: pause emit when not visible.
Dropped events: cursor position is idempotent — losing one is fine.

Filter + group items on a board

High likely Medium Tradeoffs

Decision framework

Item count	Strategy	Why
< 500	Client-side	Already loaded. Instant. Zero server cost.
500 – 10k	Hybrid: server-side filter, client paginate	Index on filterable cols. Stream pages.
> 10k	Server-side, virtualize UI	Never load all client-side. Window of ~50 rows visible.

Group view payload

{
  groups: [
    { key: "Done", count: 142, items: [...first 50...] },
    { key: "In Progress", count: 37, items: [...] },
    { key: "Stuck", count: 8, items: [...] }
  ],
  total: 187,
  has_more: { "Done": true }
}

Server returns counts + first N per group. Expanding a group fetches more pages.

Index strategy

For server-side filter: composite index (board_id, status, owner_id) covers common filters. For free-text search later: separate ES index.

Undo / Redo for board edits

Medium Command Pattern

Approach

Each user action → Command object with do() and undo(). Two stacks: undo, redo.

class UpdateCell(Command):
    def __init__(self, item, col, new_val):
        self.item, self.col = item, col
        self.old_val = item[col]
        self.new_val = new_val
    def do(self): self.item[self.col] = self.new_val
    def undo(self): self.item[self.col] = self.old_val

Hard parts

Collaborative undo: Alice's undo shouldn't trample Bob's later edit. Common policy: each user undoes only their own actions, rebased against current state.
Server authority: client undo sends command to server. Server validates, applies, broadcasts.
Limits: cap undo stack at ~50 entries. Older actions become non-undoable.

Search across all boards

Medium Elasticsearch / Indexing

Stack

Elasticsearch/OpenSearch index per workspace (or sharded).
Document includes permissions: { users: [...], groups: [...] } for ACL filter.
Async indexing: item update → Kafka → indexer worker → ES.

Query-time permission filter

{
  "query": {
    "bool": {
      "must": [
        { "match": { "text": "deploy" } }
      ],
      "filter": [
        { "terms": { "permissions.users": ["vikas"] } },
        { "terms": { "permissions.groups": ["engineering", "leads"] } }
      ]
    }
  }
}

Client UX

Debounce keystrokes ~150ms.
Cancel in-flight request when user types again (AbortController).
Cache recent queries (~5 last). Show instantly on re-type.

Pinned + recent boards in sidebar

Easy API design

Schema

user_pins(user_id, board_id, position) — position is a fractional index for drag-reorder.
user_visits(user_id, board_id, last_visited_at) — upsert on each board open.

Single endpoint

GET /sidebar
{
  pinned: [board, board, ...],  # ordered by position
  recent: [board, board, ...]   # top 5 by last_visited_at, excluding pinned
}

One round-trip. Don't make the client make two requests for adjacent UI.

Bulk operations on selected items

Medium UX / Transactions

API

POST /items/bulk_update
{
  ids: [1, 2, 3, ..., 200],
  changes: { status: "Done" }
}

// Response - per-item status
{
  results: [
    { id: 1, ok: true },
    { id: 2, ok: false, error: "permission denied" },
    ...
  ]
}

Client flow (optimistic)

Apply changes to all selected items immediately in UI.
Send batch request.
On response: revert any items where ok: false. Toast errors.

Large batches (10k+ items)

Switch to async job: POST /items/bulk_update_async returns job_id. Client polls or subscribes via WebSocket for progress. Show progress bar.

System Design 6 problems

monday.com cares about event-driven patterns, failure tolerance, and tradeoff reasoning. They want to hear "what if X breaks" answers.

Webhook system for integrations

Very likely Hard Event-driven

This is monday.com's canonical system-design question — explicitly mentioned in their engineering blog. Have it cold.

Architecture

┌──────────┐ ┌──────────┐ ┌──────────────┐ ┌─────────────┐ │ Item │────▶│ Event │────▶│ Subscription │────▶│ Delivery │ │ Change │ │ Bus │ │ Lookup │ │ Queue │ └──────────┘ │ (Kafka) │ │ (Redis/DB) │ │ (per-sub) │ └──────────┘ └──────────────┘ └──────┬──────┘ ▼ ┌─────────────┐ │ Delivery │ │ Workers │──▶ Customer URL │ HMAC sign │ (HTTPS POST) └──────┬──────┘ │ on fail ▼ ┌─────────────┐ │ Retry + │ │ DLQ + alert │ └─────────────┘

Components

Event Bus (Kafka): all item changes published. Partition by item_id for ordering.
Subscription Registry: board_id → list of (url, secret, event_types).
Dispatcher: consumes events, looks up subs, enqueues delivery jobs.
Delivery Workers: HMAC-sign payload, POST to URL, parse response.
Retry/DLQ: exp backoff (1m, 5m, 30m, 2h, 12h). After 5 failures → DLQ + user dashboard alert.

Critical concerns to verbalize

Ordering: Webhooks for same item must arrive in order. Kafka partition by item_id guarantees this.
Slow consumers: One slow URL can't block others. Per-subscription delivery queues OR worker pools partitioned by URL.
Security: HMAC-SHA256 sign payload. User verifies signature. Reject SSRF (no 127.0.0.1, no metadata endpoints).
Replay: User dashboard shows failed deliveries with "Replay" button. Re-enqueue from event log.
Idempotency: Send X-Event-Id header; instruct subscribers to dedup.

Real-time board updates (multi-user editing)

High likely Hard WS / Pub-Sub

Architecture

Client ─WS─▶ Edge LB ─▶ WS Server ─writes─▶ DB ─CDC─▶ Change Events │ │ ▼ ▼ Redis Pub/Sub ◀─────────────────── publishes │ Other WS servers subscribed to same channel push to their connected clients

Flow

Client opens WS, subscribes to board:N.
Edit happens → API writes to DB → publishes event to Redis pub/sub on board:N.
All WS servers subscribed to that channel push to their connected clients on that board.

Concerns

Conflict resolution: last-write-wins per cell is usually fine. For free-text columns: CRDT or operational transform.
Catch-up after disconnect: Client reconnects with last_event_id. Server replays missed events from Redis stream or DB.
Permissions: verify on subscribe. Re-verify on each broadcast (or cache effective ACL).
Connection scaling: 100k concurrent → sticky-less WS servers + shared Redis. Use a connection-broker library (Centrifugo, Socket.IO adapter).

Notification system at scale

Hard Fan-out / Queue

Pipeline

Event ──▶ Notification Service ──▶ Preference Filter ──▶ Channel Routers ├─▶ Email queue → SES/Sendgrid ├─▶ Push queue → FCM / APNs └─▶ In-app DB store + WS push to online users

Features to bring up

Per-user preferences: mute channels, quiet hours, digest mode.
Digest mode: instead of immediate, batch per user. Cron rolls up daily/hourly digest.
Throttling: max 1 email per user per hour. Redis token bucket.
Dedup: hash of (user, event_id) checked before send. (The Q1 pattern at scale.)
Templating: rendered per recipient (locale, name, item title). Edge cache.

Multi-tenant database design

Hard Sharding / Isolation

Three patterns

Pattern	Isolation	Cost	Use for
Shared DB + shared schema, `account_id` col	Low (row-level)	Low	Mass-market SaaS, freemium
Shared DB, schema per tenant	Medium	Medium	Mid-market, simpler migrations risky
DB per tenant	High	High	Enterprise tier, regulated, large accounts

monday.com is shared with row-level isolation. account_id on every table, indexed first in composites.

Sharding strategy

Shard by account_id — all an account's data lives together. Joins stay cheap.
Lookup service maps account_id → shard_id (cached in app).
Big accounts get dedicated shards (move via online migration).
Cross-shard analytics: ETL to data warehouse.

Activity log / audit trail

Medium Append-only / Column Store

Write path

Async via Kafka. Never block user writes on logging.

Storage tiers

Hot: last 90 days in Postgres or ClickHouse. Partitioned by (account_id, month).
Cold: > 90 days as Parquet in S3. Queried via Athena/Trino.
Analytics: stream into BigQuery / Snowflake for product analytics.

Privacy / GDPR

User deletion request → tombstone records (don't hard-delete due to legal hold).
Mask PII fields on tombstoning. Keep IDs for referential integrity.
Retention policy per tier of account (Enterprise vs Free).

File uploads & attachments

Medium Object Storage / CDN

Flow

Client requests upload URL: POST /uploads → {url, fields}.
Server returns S3 presigned POST. Bytes never touch app servers.
Client uploads directly to S3.
S3 ObjectCreated event → Lambda → virus scan → quarantine vs clean bucket.
Async: generate thumbnails (image), extract preview (PDF).
Client polls or subscribes to attachment-ready event.

Serving

Public attachments: CDN (CloudFront) with origin S3.
Private: signed URLs expiring in 1h. Sign via Lambda@Edge if you need permission check at edge.
Thumbnails: separate cache key. Different size variants.