ml system design
Pinterest
Instagram
Twitter

Pinterest ML System Design: Real-Time Personalized Feed

Topics: Recommender Systems, Real-Time Inference, Embeddings
Roles: Software Engineer, ML Engineer, Recommender Engineer
Experience: Mid-Level, Senior, Staff

Question Description

Design a real-time, personalized feed ranking system for a social media app (Pinterest-style) that returns a ranked list of posts when a user opens the app and adapts within seconds to user interactions.

You’ll need to cover the full ML-enabled feed pipeline: event ingestion, offline model training, candidate generation, low-latency online inference, and a streaming feedback loop that updates user embeddings and ranking signals in real time. The system should use precomputed post embeddings, compute and upsert user embeddings on the fly from recent interactions (e.g., weighted averages or session encoders), and score candidates with a fast ranker (e.g., a lightweight neural model or gradient-boosted trees) before applying business constraints for freshness and diversity.
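As a minimal sketch of the "weighted average" option for user embeddings, the Python below combines the precomputed embeddings of posts a user recently interacted with, weighting each by action type and by an exponential time decay, then L2-normalizes the result. The action weights, the 30-minute half-life, and the names `Interaction`, `ACTION_WEIGHTS`, and `user_embedding` are hypothetical choices for illustration; a session encoder would replace this function rather than extend it.

```python
from __future__ import annotations

import math
from dataclasses import dataclass

# Hypothetical per-action weights; real values would be tuned offline.
ACTION_WEIGHTS = {"impression": 0.1, "click": 1.0, "save": 3.0, "hide": -2.0}


@dataclass
class Interaction:
    post_embedding: list[float]  # precomputed post embedding from the offline model
    action: str                  # e.g. "click", "save", "hide"
    timestamp: float             # seconds since epoch


def user_embedding(interactions: list[Interaction], now: float,
                   half_life_s: float = 1800.0) -> list[float] | None:
    """Action-weighted, exponentially time-decayed average of recent post embeddings."""
    if not interactions:
        return None  # cold start: the caller falls back to a default/prior embedding
    acc = [0.0] * len(interactions[0].post_embedding)
    for it in interactions:
        age = max(0.0, now - it.timestamp)
        # Each interaction loses half its influence every half_life_s seconds.
        weight = ACTION_WEIGHTS.get(it.action, 0.0) * 0.5 ** (age / half_life_s)
        for i, x in enumerate(it.post_embedding):
            acc[i] += weight * x
    norm = math.sqrt(sum(x * x for x in acc))
    # L2-normalize so the vector can be used directly for inner-product ANN search.
    return [x / norm for x in acc] if norm > 0 else acc
```

Because the decay factors as 0.5^(Δt/h), the same quantity can also be maintained incrementally in a streaming job: store the unnormalized sum, multiply it by the decay accumulated since the last update, then add the new weighted term. Each event then costs O(d) instead of a rescan of the user's history.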

Suggested flow in an interview: ingest events (Kafka or Pub/Sub) → update the online feature store / user embedding service → trigger candidate generation (fan-out + ANN search) → apply the ranker and re-ranker → cache and serve via the API gateway → stream engagement back for model training and metrics. Discuss how you’d use approximate nearest neighbor (ANN) search (HNSW/FAISS), feature stores, caching (Redis), and streaming processors (Flink/Beam) to keep serving latency in the milliseconds while reflecting new interactions in recommendations within seconds.
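The retrieve → rank → re-rank portion of that flow can be sketched in a few functions. This is a toy under stated assumptions: `retrieve` does an exact top-k inner-product scan where production would query an HNSW/FAISS index, `rank_score` is a hypothetical stand-in for the lightweight ranker (similarity plus a decaying freshness bonus), and the diversity step uses greedy Maximal Marginal Relevance (MMR), one common choice rather than a method the question prescribes. The names `Post`, `build_feed`, and the parameters `lam`, `freshness_weight`, and `half_life_h` are illustrative.

```python
from __future__ import annotations

import heapq
from dataclasses import dataclass


@dataclass
class Post:
    post_id: str
    embedding: list[float]  # precomputed offline; assumed L2-normalized
    age_hours: float


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def retrieve(user_emb: list[float], posts: list[Post], k: int) -> list[Post]:
    """Exact top-k by inner product; stands in for an HNSW/FAISS ANN query."""
    return heapq.nlargest(k, posts, key=lambda p: dot(user_emb, p.embedding))


def rank_score(user_emb: list[float], post: Post,
               freshness_weight: float = 0.1, half_life_h: float = 24.0) -> float:
    """Hypothetical lightweight ranker: similarity plus a decaying freshness bonus."""
    return dot(user_emb, post.embedding) + freshness_weight * 0.5 ** (post.age_hours / half_life_h)


def mmr_rerank(user_emb: list[float], candidates: list[Post], n: int,
               lam: float = 0.7) -> list[Post]:
    """Greedy MMR: trade ranker score against similarity to posts already selected."""
    scores = {p.post_id: rank_score(user_emb, p) for p in candidates}
    selected: list[Post] = []
    remaining = list(candidates)

    def mmr(p: Post) -> float:
        redundancy = max((dot(p.embedding, s.embedding) for s in selected), default=0.0)
        return lam * scores[p.post_id] - (1 - lam) * redundancy

    while remaining and len(selected) < n:
        best = max(remaining, key=mmr)
        selected.append(best)
        remaining.remove(best)
    return selected


def build_feed(user_emb: list[float], posts: list[Post], k: int = 100, n: int = 20) -> list[str]:
    candidates = retrieve(user_emb, posts, k)  # candidate generation
    return [p.post_id for p in mmr_rerank(user_emb, candidates, n)]  # rank + re-rank


if __name__ == "__main__":
    posts = [
        Post("a", [1.0, 0.0], age_hours=0.0),
        Post("b", [1.0, 0.0], age_hours=0.0),  # duplicate embedding of "a"
        Post("c", [0.0, 1.0], age_hours=0.0),
        Post("d", [-1.0, 0.0], age_hours=0.0),
    ]
    print(build_feed([0.8, 0.6], posts, k=3, n=2))  # ['a', 'c']
```

In the demo, `a` and `b` share an embedding, so ordering purely by `rank_score` would show both; MMR penalizes `b` for its redundancy with `a` and promotes `c` instead. Raising `lam` toward 1 recovers plain score order.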

Skill signals interviewers look for: embedding design and dimensionality trade-offs, ANN indexing and sharding, online vs. offline feature computation, latency and throughput engineering, handling cold-start and stale content, A/B testing and observability, and strategies to ensure diversity and avoid feedback loops. You should be able to propose concrete latency targets, cost/scale trade-offs, and deployment/versioning approaches for continuous improvement.
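Dimensionality and sharding trade-offs are easiest to discuss with back-of-envelope arithmetic. The sketch below estimates index memory, shard count, and replica count. Every input is an assumption for illustration: 1B posts, 64 GB of usable memory per shard, 256 bytes of per-item graph overhead (roughly 2·M int32 neighbor ids at M = 32 for an HNSW-style index), 200k peak QPS, and 5k QPS per replica. It also assumes posts are spread across shards so that every query fans out to all of them, which means each shard needs enough replicas for the full peak QPS.

```python
def index_bytes(num_items: int, dim: int, bytes_per_value: int,
                overhead_bytes_per_item: int = 0) -> int:
    """Vector storage plus per-item index overhead (e.g. graph neighbor lists)."""
    return num_items * (dim * bytes_per_value + overhead_bytes_per_item)


def ceil_div(a: int, b: int) -> int:
    return -(-a // b)


def plan(num_items: int, dim: int, bytes_per_value: int, overhead: int,
         shard_mem_bytes: int, peak_qps: int, qps_per_replica: int) -> dict:
    total = index_bytes(num_items, dim, bytes_per_value, overhead)
    shards = ceil_div(total, shard_mem_bytes)
    # Assumed fan-out to every shard: each shard must absorb the full peak QPS.
    replicas = ceil_div(peak_qps, qps_per_replica)
    return {"total_bytes": total, "shards": shards,
            "replicas_per_shard": replicas, "machines": shards * replicas}


if __name__ == "__main__":
    GB = 10**9
    # float32, 256-d: 1B * (1024 + 256) bytes = 1.28 TB -> 20 shards, 800 machines
    print(plan(10**9, 256, 4, 256, 64 * GB, 200_000, 5_000))
    # 1-byte quantized, 128-d: 1B * (128 + 256) bytes = 384 GB -> 6 shards, 240 machines
    print(plan(10**9, 128, 1, 256, 64 * GB, 200_000, 5_000))
```

Under these assumptions, halving the dimension and quantizing to one byte per value cuts memory by more than 3x, and the graph overhead, not the vectors, becomes the dominant cost.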

Common Follow-up Questions

  • How would you handle cold-start users and new posts in a feed ranking system that relies on embeddings and historical signals?
  • Describe your approach to building and sharding an ANN index (e.g., HNSW/FAISS) to serve billions of post embeddings with low latency and high throughput.
  • If a trending event causes a traffic spike, what autoscaling, caching, and degradation strategies would you apply to preserve SLAs while keeping recommendations relevant?
  • How do you prevent feedback loops and filter bubbles when using online engagement to update user embeddings and retrain ranking models in near real time?
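For the cold-start and feedback-loop questions, two common mitigations can be sketched briefly. `blend_with_prior` shrinks a sparse user's embedding toward a prior (for example a segment or trending centroid) with weight α = n / (n + k), so personalization grows with the number of interactions n; `interleave_exploration` reserves every n-th feed slot for exploration candidates such as new posts with few impressions. Both function names, the constant k = 20, and the fixed-slot scheme (rather than, say, epsilon-greedy or bandit-based exploration) are assumptions for illustration.

```python
from __future__ import annotations


def blend_with_prior(personal: list[float] | None, prior: list[float],
                     n_interactions: int, k: float = 20.0) -> list[float]:
    """Shrink a sparse user's embedding toward a prior; alpha -> 1 as history grows."""
    if personal is None or n_interactions <= 0:
        return list(prior)  # brand-new user: serve the prior embedding as-is
    alpha = n_interactions / (n_interactions + k)
    return [alpha * p + (1 - alpha) * q for p, q in zip(personal, prior)]


def interleave_exploration(ranked_ids: list[str], explore_ids: list[str],
                           every_n: int = 5) -> list[str]:
    """Fixed-slot exploration: every n-th slot goes to an exploration candidate."""
    seen = set(ranked_ids)
    pool = [e for e in explore_ids if e not in seen]  # skip items already ranked
    ranked = iter(ranked_ids)
    out: list[str] = []
    # Keep the feed length unchanged; exploration items displace the ranked tail.
    while len(out) < len(ranked_ids):
        if (len(out) + 1) % every_n == 0 and pool:
            out.append(pool.pop(0))
        else:
            out.append(next(ranked))
    return out


if __name__ == "__main__":
    print(blend_with_prior([1.0, 0.0], [0.0, 1.0], n_interactions=5))
    print(interleave_exploration(["r1", "r2", "r3", "r4", "r5", "r6"], ["n1"], every_n=3))
    # ['r1', 'r2', 'n1', 'r3', 'r4', 'r5']
```

Logging which impressions came from exploration slots also gives the training pipeline less biased examples, which is one lever against the engagement feedback loop.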

Related Questions

1. Design a candidate generation pipeline for a large-scale recommender using collaborative and content-based signals
2. How to build an online feature store for low-latency ML inference and streaming updates
3. Architect a scalable ANN serving layer for embedding-based similarity search
4. Design an A/B testing and metrics pipeline to evaluate ranking model changes in a live feed

ML System Design: Real-Time Feed Ranking - Pinterest | Voker