Pinterest ML System Design: Real-Time Personalized Feed

Question Description

Design a real-time, personalized feed ranking system for a social media app (Pinterest-style) that returns a ranked list of posts when a user opens the app and adapts within seconds to user interactions.

You’ll need to cover the full ML-enabled feed pipeline: event ingestion, offline model training, candidate generation, low-latency online inference, and a streaming feedback loop that updates user embeddings and ranking signals in real time. The system should use precomputed post embeddings and compute/upsert user embeddings on the fly from recent interactions (e.g., weighted averages, session encoders), then score candidates with a fast ranker (e.g., a lightweight neural model or gradient boosted trees) before applying business constraints for freshness and diversity.
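The recency-weighted user embedding mentioned above can be sketched in a few lines. This is a minimal illustration, not a production implementation: the function name, the decay constant, and the "list of recent interaction embeddings, oldest first" input shape are all assumptions for the example.

```python
def update_user_embedding(interaction_embeddings, decay=0.8):
    """Recency-weighted average of post embeddings from a user's recent
    interactions, oldest first. Newer interactions get weight 1, and each
    step back in time multiplies the weight by `decay`."""
    dim = len(interaction_embeddings[0])
    weighted = [0.0] * dim
    total = 0.0
    weight = 1.0
    for emb in reversed(interaction_embeddings):  # iterate newest-first
        for i in range(dim):
            weighted[i] += weight * emb[i]
        total += weight
        weight *= decay
    return [v / total for v in weighted]
```

In an interview you might note that this formulation is incremental-friendly: the weighted sum and total weight can be stored per user and updated on each event, so the online feature store never needs the full interaction history.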

Suggested flow in an interview: ingest events (Kafka/PubSub) → update online feature store / user embedding service → trigger candidate generation (fan-out + ANN search) → apply ranker and re-ranker → cache and serve via API gateway → stream engagement back for model training and metrics. Discuss how you’d use approximate nearest neighbor (HNSW/FAISS), feature stores, caching (Redis), and streaming processors (Flink/Beam) to meet millisecond-to-second latency.
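The candidate-generation step in that flow reduces to "find the posts whose embeddings are nearest to the user embedding." A brute-force stand-in makes the interface concrete; in production this exact scan would be replaced by an ANN index (HNSW/FAISS) sharded across machines. The function name and the (post_id, vector) input format are illustrative assumptions.

```python
import heapq
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def candidate_generation(user_emb, post_embs, k=100):
    """Exact top-k posts by cosine similarity to the user embedding.
    `post_embs` is an iterable of (post_id, vector) pairs. A real system
    swaps this O(N) scan for an approximate index with the same contract:
    user vector in, k candidate ids out."""
    return heapq.nlargest(k, post_embs,
                          key=lambda item: cosine(user_emb, item[1]))
```

Keeping the contract identical between the brute-force version and the ANN-backed version is also useful for offline evaluation, since exact search gives you the recall ceiling the approximate index is measured against.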

Skill signals interviewers look for: embedding design and dimensionality tradeoffs, ANN indexing and sharding, online vs offline feature computation, latency and throughput engineering, handling cold-start and stale content, A/B testing and observability, and strategies to ensure diversity and avoid feedback loops. You should be able to propose concrete latency targets, cost/scale trade-offs, and deployment/versioning approaches for continuous improvement.
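For the diversity signal above, one concrete strategy worth naming is Maximal Marginal Relevance (MMR) applied at the re-rank stage: each slot trades off the ranker's relevance score against similarity to posts already placed in the feed. The sketch below assumes a `sim(a, b)` similarity function and a per-candidate `relevance` mapping; both are placeholders for whatever the real ranker and embedding space provide.

```python
def mmr_rerank(candidates, sim, relevance, lam=0.7, k=10):
    """Greedy MMR re-ranking: repeatedly pick the candidate maximizing
    lam * relevance - (1 - lam) * (max similarity to already-selected).
    lam=1.0 is pure relevance; lower values push diversity harder."""
    selected = []
    pool = list(candidates)
    while pool and len(selected) < k:
        best = max(
            pool,
            key=lambda c: lam * relevance[c]
            - (1 - lam) * max((sim(c, s) for s in selected), default=0.0),
        )
        selected.append(best)
        pool.remove(best)
    return selected
```

This greedy loop is O(k·N·k) in the worst case, which is fine at re-rank scale (hundreds of candidates) but not at retrieval scale, a latency trade-off worth stating explicitly in the interview.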

Common Follow-up Questions

  • How would you handle cold-start users and new posts in a feed ranking system that relies on embeddings and historical signals?
  • Describe your approach to building and sharding an ANN index (e.g., HNSW/FAISS) to serve billions of post embeddings with low latency and high throughput.
  • If a trending event causes a traffic spike, what autoscaling, caching, and degradation strategies would you apply to preserve SLAs while keeping recommendations relevant?
  • How do you prevent feedback loops and filter bubbles when using online engagement to update user embeddings and retrain ranking models in near real time?
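A simple answer to the cold-start and feedback-loop questions above is explicit exploration at serve time: reserve a small fraction of feed slots for fresh or under-exposed posts rather than always serving the ranker's top picks. The epsilon-greedy slotting below is a minimal sketch; the function name, the injectable `rng` hook, and the "replace slot in place" policy are assumptions for illustration.

```python
import random

def pick_feed(ranked_posts, exploration_pool, epsilon=0.1, rng=random.random):
    """With probability `epsilon` per slot, swap the ranked post for the
    next item from an exploration pool (e.g., fresh or cold-start posts).
    This guarantees new content gets impressions, which in turn gives the
    trainer unbiased engagement signal and dampens feedback loops."""
    feed = list(ranked_posts)
    pool = list(exploration_pool)
    for i in range(len(feed)):
        if pool and rng() < epsilon:
            feed[i] = pool.pop(0)
    return feed
```

In practice teams often refine this with position-aware exploration (never touch slot 1) or contextual bandits, but stating even the epsilon-greedy baseline shows you know the serving path, not just the training loop, must fight the feedback loop.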

Related Questions

1. DoorDash ML System Design: Multi-Channel Restaurant Recs
2. eBay ML System Design: Post-Checkout Recommendations
3. LinkedIn ML System Design: Real-Time Nearby Recommendations
4. Microsoft ML System Design: Local Sports Team Recommender
5. Palantir ML System Design: Scalable Music Recommender
6. Design a candidate generation pipeline for a large-scale recommender using collaborative and content-based signals
7. How to build an online feature store for low-latency ML inference and streaming updates
8. Architect a scalable ANN serving layer for embedding-based similarity search
9. Design an A/B testing and metrics pipeline to evaluate ranking model changes in a live feed
