Roblox ML System Design: Real-time Game Recommendations
Question Description
Design a real-time game recommendation system for a large gaming platform that matches users to games using lifetime average playtime, computed per user and per game. The core task is to ingest session events (user, game, start/end, duration), maintain lifetime averages (cumulative playtime divided by session count) for both users and games, and use those aggregates in a low-latency pipeline that serves personalized recommendations within 100–200 ms.
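The lifetime average described above can be maintained incrementally by keeping only a running sum and count per key. The following is a minimal in-memory sketch, not a production design: the class names, the `session_id` field, and the in-process `set` used for deduplication are all assumptions made for illustration. Because a sum and a count are order-independent, out-of-order events do not change the result; only duplicates need explicit handling.

```python
from dataclasses import dataclass, field


@dataclass
class RunningAverage:
    """Running sum and count; the mean is derived on read."""
    total_seconds: float = 0.0
    sessions: int = 0

    def add(self, duration_seconds: float) -> None:
        self.total_seconds += duration_seconds
        self.sessions += 1

    @property
    def mean(self) -> float:
        return self.total_seconds / self.sessions if self.sessions else 0.0


@dataclass
class PlaytimeAggregator:
    """Hypothetical aggregator keyed by user and by game."""
    users: dict = field(default_factory=dict)
    games: dict = field(default_factory=dict)
    # Assumption: each session event carries a unique session_id.
    seen_sessions: set = field(default_factory=set)

    def ingest(self, session_id, user_id, game_id, duration_seconds) -> bool:
        """Apply one session event; return False if it was a duplicate."""
        if session_id in self.seen_sessions:
            return False  # idempotent: redelivery does not double count
        self.seen_sessions.add(session_id)
        self.users.setdefault(user_id, RunningAverage()).add(duration_seconds)
        self.games.setdefault(game_id, RunningAverage()).add(duration_seconds)
        return True


agg = PlaytimeAggregator()
agg.ingest("s1", "u1", "g1", 600)
agg.ingest("s2", "u1", "g2", 1200)
agg.ingest("s1", "u1", "g1", 600)  # duplicate delivery, ignored
```

In a real deployment the dedup set would need to be bounded or replaced by the streaming framework's own guarantees; the sketch only shows the update logic.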
Start by defining the high-level flow: event ingestion → streaming aggregator (incremental updates to per-user and per-game sums and counts) → feature store / materialized view → candidate generation (popularity, playtime similarity, content-based or approximate k-NN) → learning-to-rank (LTR) reranker that uses lifetime average playtime plus contextual features → online cache and recommendation API. Include a daily batch job that recomputes long-running aggregates and applies long-tail corrections while the stream keeps recent totals fresh.
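The candidate generation and reranking stages of this flow can be sketched in a few lines. Everything here is an illustrative assumption: the function names, the log-scaled playtime distance, and the hand-set weights `w_sim` and `w_pop`, which stand in for a trained learning-to-rank model.

```python
import math


def playtime_distance(user_avg: float, game_avg: float) -> float:
    """Distance between average playtimes on a log scale (assumed metric)."""
    return abs(math.log1p(game_avg) - math.log1p(user_avg))


def generate_candidates(user_avg, game_avgs, popularity, k=2):
    """Union of the k most popular games and the k closest by playtime."""
    by_pop = sorted(
        (g for g in popularity if g in game_avgs),
        key=popularity.get,
        reverse=True,
    )[:k]
    by_sim = sorted(
        game_avgs, key=lambda g: playtime_distance(user_avg, game_avgs[g])
    )[:k]
    seen, candidates = set(), []
    for game in by_pop + by_sim:
        if game not in seen:  # keep first occurrence, drop repeats
            seen.add(game)
            candidates.append(game)
    return candidates


def rerank(user_avg, candidates, game_avgs, popularity, w_sim=1.0, w_pop=0.1):
    """Linear scorer used as a placeholder for a learned reranker."""
    def score(game):
        similarity = -playtime_distance(user_avg, game_avgs[game])
        pop = math.log1p(popularity.get(game, 0))
        return w_sim * similarity + w_pop * pop
    return sorted(candidates, key=score, reverse=True)


# Toy data: average playtime in seconds and session counts per game.
game_avgs = {"a": 300, "b": 1800, "c": 3600, "d": 900}
popularity = {"a": 1000, "b": 50, "c": 10, "d": 200}

candidates = generate_candidates(1700, game_avgs, popularity, k=2)
ranked = rerank(1700, candidates, game_avgs, popularity)
```

With a catalog of millions of games the sorted scans would be replaced by precomputed popularity lists and an approximate nearest-neighbor index; the two-stage shape stays the same.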
You should demonstrate knowledge of streaming frameworks (Kafka/Kinesis + Flink/Spark Streaming), stateful processing with exactly-once or idempotent updates, in-memory stores for low-latency reads (Redis, Memcached), wide-column stores for scale (Cassandra/Bigtable), feature store design, and LTR model integration. Discuss the trade-offs: eventual consistency vs. strict accuracy, incremental vs. batch recompute, cold-start handling for new games and users, skew and hot keys for popular titles, freshness windows, and metrics (playtime uplift, CTR, latency).
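One common way to reason about the hot-key trade-off is key salting: writes for a popular game are spread across several sub-keys and merged on read. The sketch below is a single-process illustration only; `ShardedCounter`, the shard count of 8, and the random shard choice are assumptions, and in practice each `(game_id, shard)` pair would map to a separate partition or store key.

```python
import random
from collections import defaultdict


class ShardedCounter:
    """Per-game sum/count split across salted sub-keys (hypothetical)."""

    def __init__(self, num_shards: int = 8, seed=None):
        self.num_shards = num_shards
        self._rng = random.Random(seed)
        self._totals = defaultdict(float)
        self._counts = defaultdict(int)

    def add(self, game_id: str, duration_seconds: float) -> None:
        # Pick a random shard so no single key absorbs every write.
        shard = self._rng.randrange(self.num_shards)
        key = (game_id, shard)
        self._totals[key] += duration_seconds
        self._counts[key] += 1

    def average(self, game_id: str) -> float:
        # Reads fan out over all shards and merge the partial aggregates.
        shards = range(self.num_shards)
        total = sum(self._totals.get((game_id, s), 0.0) for s in shards)
        count = sum(self._counts.get((game_id, s), 0) for s in shards)
        return total / count if count else 0.0


counter = ShardedCounter(num_shards=8, seed=0)
for duration in (100, 200, 300):
    counter.add("popular-game", duration)
```

The cost is a fan-out on every read, so salting is usually reserved for keys known to be hot, with the merged value cached in the serving layer.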
Common Follow-up Questions
- How would you compute and maintain an exact lifetime average playtime from streaming data while tolerating out-of-order and duplicate events (which semantics and frameworks would you choose)?
- How do you handle cold-start users and games so that lifetime averages don't bias recommendations toward established titles only?
- Design the candidate generation step for a catalog of millions of games: which offline indexes or approximate nearest-neighbor strategies would you use to meet the 100–200 ms latency budget?
- How would you mitigate hotspotting for extremely popular games (skew) in both the aggregator and the serving layer?
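For the cold-start question, one possible direction is to shrink each raw average toward a global prior so that a game with a single long session is not ranked as if its average were well established. This is a sketch of that idea, not the expected answer: the function name, `prior_mean`, and `prior_weight` are assumptions chosen for illustration.

```python
def smoothed_average(total_seconds: float, sessions: int,
                     prior_mean: float, prior_weight: float) -> float:
    """Blend the observed average with a prior.

    prior_weight acts like a number of pseudo-sessions at prior_mean:
    with zero sessions the result equals the prior, and as sessions grow
    the result approaches the raw average total_seconds / sessions.
    """
    return (total_seconds + prior_weight * prior_mean) / (sessions + prior_weight)


# Assumed prior: a platform-wide average of 600 s, worth 10 pseudo-sessions.
new_game = smoothed_average(0.0, 0, prior_mean=600.0, prior_weight=10.0)
one_session = smoothed_average(3600.0, 1, prior_mean=600.0, prior_weight=10.0)
established = smoothed_average(3_600_000.0, 1000, prior_mean=600.0, prior_weight=10.0)
```

Here a brand-new game scores exactly the prior, a game with one hour-long session moves only part of the way toward 3600 s, and a game with a thousand sessions stays close to its raw average.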