Pinterest ML System Design: Real-Time Personalized Feed
Question Description
Design a real-time, personalized feed ranking system for a social media app (Pinterest-style) that returns a ranked list of posts when a user opens the app and adapts within seconds to user interactions.
You’ll need to cover the full ML-enabled feed pipeline: event ingestion, offline model training, candidate generation, low-latency online inference, and a streaming feedback loop that updates user embeddings and ranking signals in real time. The system should use precomputed post embeddings and compute/upsert user embeddings on the fly from recent interactions (e.g., weighted averages, session encoders), then score candidates with a fast ranker (e.g., a lightweight neural model or gradient-boosted trees) before applying business constraints for freshness and diversity.
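As a concrete illustration of the "weighted average" approach to on-the-fly user embeddings, here is a minimal sketch (the function name, half-life value, and toy embeddings are illustrative assumptions, not a prescribed implementation): each recent interaction contributes its post embedding, weighted by an exponential time decay so fresh activity dominates.

```python
def update_user_embedding(events, dim=4, half_life_s=1800.0, now_s=7200.0):
    """Compute a user embedding as a time-decayed weighted average of the
    embeddings of recently interacted-with posts.

    events: list of (timestamp_s, post_embedding) pairs.
    The weight halves every `half_life_s` seconds, so newer interactions
    dominate and the embedding adapts within a session.
    """
    acc = [0.0] * dim
    total_w = 0.0
    for ts, emb in events:
        w = 0.5 ** ((now_s - ts) / half_life_s)
        total_w += w
        for i in range(dim):
            acc[i] += w * emb[i]
    if total_w == 0.0:
        return acc  # no signal yet: fall back to a cold-start strategy
    return [x / total_w for x in acc]

# Example: two clicks; the newer one is weighted 4x the hour-old one
# (half-life 30 min, so a 1-hour-old event has weight 0.25 vs. 1.0).
events = [
    (3600.0, [1.0, 0.0, 0.0, 0.0]),  # one hour old
    (7200.0, [0.0, 1.0, 0.0, 0.0]),  # just now
]
user_emb = update_user_embedding(events)  # -> [0.2, 0.8, 0.0, 0.0]
```

In production this incremental update would be triggered by the streaming processor on each engagement event and upserted into the online feature store, rather than recomputed from scratch per request.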
Suggested flow in an interview: ingest events (Kafka/PubSub) → update online feature store / user embedding service → trigger candidate generation (fan-out + ANN search) → apply ranker and re-ranker → cache and serve via API gateway → stream engagement back for model training and metrics. Discuss how you’d use approximate nearest neighbor (HNSW/FAISS), feature stores, caching (Redis), and streaming processors (Flink/Beam) to meet millisecond-to-second latency.
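The candidate-generation step above can be sketched with an exhaustive cosine-similarity scan over an in-memory post index; this toy loop (names and data are illustrative assumptions) is exactly what an ANN index like HNSW or FAISS replaces at billion-item scale, trading a small recall loss for sub-10ms lookups:

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_candidates(user_emb, post_index, k=2):
    """Brute-force nearest-neighbor retrieval: score every post against the
    user embedding and keep the top k. An ANN index (HNSW/FAISS) stands in
    for this loop in production to avoid the O(N) scan."""
    scored = [(cosine(user_emb, emb), post_id)
              for post_id, emb in post_index.items()]
    scored.sort(reverse=True)
    return [pid for _, pid in scored[:k]]

# Toy index of precomputed post embeddings.
post_index = {
    "p1": [1.0, 0.0],
    "p2": [0.0, 1.0],
    "p3": [0.7, 0.7],
}
top = retrieve_candidates([0.9, 0.1], post_index, k=2)  # -> ["p1", "p3"]
```

The retrieved candidates would then flow into the ranker/re-ranker stage, which applies the heavier model plus freshness and diversity constraints before the result is cached and served.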
Skill signals interviewers look for: embedding design and dimensionality tradeoffs, ANN indexing and sharding, online vs offline feature computation, latency and throughput engineering, handling cold-start and stale content, A/B testing and observability, and strategies to ensure diversity and avoid feedback loops. You should be able to propose concrete latency targets, cost/scale trade-offs, and deployment/versioning approaches for continuous improvement.
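One common technique for the "ensure diversity" signal is Maximal Marginal Relevance (MMR) at the re-rank stage. A minimal sketch, assuming a category-based similarity function and toy scores of my own invention:

```python
def mmr_rerank(candidates, sim, lam=0.7, k=3):
    """Maximal Marginal Relevance: greedily pick items that balance
    relevance against similarity to already-selected items, so the feed
    is not a wall of near-duplicates.

    candidates: {post_id: relevance_score}
    sim: function(post_a, post_b) -> similarity in [0, 1]
    lam: tradeoff; 1.0 = pure relevance, 0.0 = pure diversity.
    """
    selected = []
    pool = dict(candidates)
    while pool and len(selected) < k:
        def mmr_score(pid):
            redundancy = max((sim(pid, s) for s in selected), default=0.0)
            return lam * pool[pid] - (1 - lam) * redundancy
        best = max(pool, key=mmr_score)
        selected.append(best)
        del pool[best]
    return selected

# Toy example: two posts from category "a", one from "b".
relevance = {"a1": 0.9, "a2": 0.85, "b1": 0.8}
category = {"a1": "a", "a2": "a", "b1": "b"}
same_cat = lambda x, y: 1.0 if category[x] == category[y] else 0.0

# "b1" jumps ahead of the slightly-more-relevant "a2" because "a1"
# already covers that category.
order = mmr_rerank(relevance, same_cat)  # -> ["a1", "b1", "a2"]
```

The same greedy structure also mitigates feedback loops somewhat, since it stops the top of the feed from being dominated by whatever narrow cluster the user most recently engaged with.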
Common Follow-up Questions
- How would you handle cold-start users and new posts in a feed ranking system that relies on embeddings and historical signals?
- Describe your approach to building and sharding an ANN index (e.g., HNSW/FAISS) to serve billions of post embeddings with low latency and high throughput.
- If a trending event causes a traffic spike, what autoscaling, caching, and degradation strategies would you apply to preserve SLAs while keeping recommendations relevant?
- How do you prevent feedback loops and filter bubbles when using online engagement to update user embeddings and retrain ranking models in near real time?
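For the cold-start follow-up, one standard answer is to shrink toward a popularity prior until the user has enough history to trust personalization. A minimal sketch, assuming invented scores and a hypothetical `prior_strength` pseudo-count parameter:

```python
def blended_score(personal_score, popularity_score,
                  n_interactions, prior_strength=20):
    """Bayesian-style blend: the weight on the personalized score grows
    with the amount of user history, so brand-new users see mostly
    popular content and heavy users see mostly personalized ranking."""
    alpha = n_interactions / (n_interactions + prior_strength)
    return alpha * personal_score + (1 - alpha) * popularity_score

# A brand-new user (0 interactions) falls back entirely to popularity...
new_user = blended_score(0.9, 0.5, n_interactions=0)      # -> 0.5
# ...while an active user (180 interactions) is 90% personalized.
active_user = blended_score(0.9, 0.5, n_interactions=180)  # -> 0.86
```

New posts get the symmetric treatment on the item side: content-based embeddings (image/text encoders) plus an exploration bonus stand in until enough engagement accumulates to estimate their quality.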