coding
Walmart Labs
Walmart

Top-K in Time Window - Coding Question, Walmart Labs

Topics:
Sliding Window
Priority Queue
Heap
Roles:
Software Engineer
Backend Engineer
SDE
Experience:
Entry Level
Mid Level
Senior

Question Description

Problem summary

You must build a component that records events (item_id, timestamp) and answers frequency-based top‑k queries over a closed interval [t_start, t_end]. You will implement record and query APIs, then extend the design to support explicit eviction of old events, out‑of‑order (late) arrivals, and an optimized version when timestamps come from a bounded discrete domain.

What you'll be asked to do

You should show an implementation that maintains per‑timestamp or per‑bucket counts and can return up to k item IDs with the highest access counts in [t_start, t_end]. Expect to discuss data structures such as hash maps for counts, time buckets (sliding-window aggregates), and a min‑heap (priority queue) to compute top‑k efficiently. You should explain complexity tradeoffs for update and query paths and how tie‑breaking is handled.

Flow & extension stages

  1. Basic API: implement record(item_id, timestamp) and query(t_start, t_end, k) with correct closed‑interval semantics.
  2. Eviction: add evict_older_than(cutoff) to drop data < cutoff and document correctness guarantees for queries >= cutoff.
  3. Out‑of‑order events: introduce an allowed lateness parameter or accept full reaggregation; state whether late arrivals can retroactively change previous query answers.
  4. Bounded timestamps: if timestamps lie in a fixed range, switch to array/circular buffer buckets for O(1) eviction and lower memory overhead.

Skill signals interviewers look for

You should demonstrate: time/space complexity analysis, selection of appropriate aggregates vs raw logs, correctness under inclusive intervals, handling late data semantics (bounded lateness vs eventual consistency), and tradeoffs between exact and approximate top‑k (e.g., Space‑Saving / Count‑Min Sketch for high throughput).

Common Follow-up Questions

  • How would you change the design to support continuous streaming queries (always-on sliding window top-k) with low latency?
  • Describe an approximate variant using Space‑Saving or Count‑Min Sketch: how do you bound error and memory, and when is it acceptable?
  • How would you shard or partition the system to handle very high write throughput and still compute global top-k over a time window?
  • If events carry weights or durations (not just counts), how would you adapt recording, aggregation, and top-k computation?

Related Questions

1Design a rate limiter using sliding windows and fixed buckets
2Find top-k frequent elements in an array (streaming version)
3Sliding window maximum / windowed aggregation problems
4Implement a time-series aggregator with eviction and late-arrival handling

Explore More Questions

Practice This Question with AI

Get real-time hints, detailed requirements, and insightful analysis of the question.

Top-K in Time Window - Coding Question, Walmart Labs | Voker