LinkedIn ML System Design: Real-Time Nearby Recommendations

Topics:
Recommender Systems
Real-time Systems
Geospatial Indexing
Roles:
Machine Learning Engineer
Data Scientist
Software Engineer
Experience:
Mid Level
Senior
Staff

Question Description

Design a backend recommender that returns personalized nearby locations (restaurants, cafes, attractions) to mobile users in real time. You must accept streaming GPS coordinates and optional preferences via a REST API, apply efficient geospatial filtering, generate a candidate set, rank results by personalized relevance, and return a top-N list (name, distance, rating) within a tight latency SLA.

Start by defining the high-level flow: ingestion (mobile client -> API gateway -> recommendation service), geospatial filtering (H3 cells, an R-tree, or another geo-index query), candidate generation (popularity plus collaborative or content-based candidates), online ranking (a learned model or lightweight scoring), caching and edge serving, and feedback logging (clicks, visits, ratings) back into your training and feature pipelines. Consider how streaming frameworks (Kafka, Flink) and a feature store enable real-time features for moving users.
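The filtering and lightweight-scoring steps can be sketched in a few lines. This is a minimal illustration, not a production design: a fixed latitude/longitude grid stands in for H3 cells or an R-tree, and `CELL_DEG`, the place dictionary fields, and the `rating / (1 + distance)` score are all assumptions made for the example.

```python
import math
from collections import defaultdict

CELL_DEG = 0.01  # hypothetical cell size in degrees; a stand-in for H3 cells


def cell_of(lat, lon):
    """Map a coordinate to its integer grid cell."""
    return (math.floor(lat / CELL_DEG), math.floor(lon / CELL_DEG))


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def build_index(places):
    """Bucket places by grid cell so a query only scans nearby cells."""
    index = defaultdict(list)
    for p in places:
        index[cell_of(p["lat"], p["lon"])].append(p)
    return index


def nearby(index, lat, lon, radius_km, top_n=10):
    """Scan the cells covering the radius, filter by exact distance, then rank."""
    ci, cj = cell_of(lat, lon)
    # Conservative cell spans (+1 cell of margin); the antimeridian is not handled.
    lat_span = math.ceil(radius_km / 111.0 / CELL_DEG) + 1
    cos_lat = max(math.cos(math.radians(lat)), 0.01)
    lon_span = math.ceil(radius_km / (111.0 * cos_lat) / CELL_DEG) + 1
    scored = []
    for di in range(-lat_span, lat_span + 1):
        for dj in range(-lon_span, lon_span + 1):
            for p in index.get((ci + di, cj + dj), []):
                d = haversine_km(lat, lon, p["lat"], p["lon"])
                if d <= radius_km:
                    # Assumed lightweight score: higher rating and shorter distance win.
                    score = p["rating"] / (1.0 + d)
                    scored.append((score, p["name"], round(d, 2), p["rating"]))
    scored.sort(reverse=True)
    return [{"name": n, "distance_km": d, "rating": r} for _, n, d, r in scored[:top_n]]
```

In a real system the scoring line is where a learned ranker would plug in, and the grid would typically be replaced by a proper geo-index so that cell sizes stay roughly uniform at all latitudes.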

Key trade-offs and constraints you should discuss: meeting 100–200 ms latency (caching, pre-computed candidates, model size), scaling to millions of DAU (horizontal sharding, regional replicas, CDNs), accuracy vs. freshness (how often to refresh candidate pools), and privacy/GDPR compliance for location data (minimization, TTLs, consent).
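One way to make the latency, freshness, and data-minimization trade-offs concrete is a short-lived cache keyed by coarsened coordinates. The sketch below is an assumption-laden illustration: `CellCache`, its `ttl_s` and `precision` parameters, and the `compute` callback are hypothetical names, and rounding to three decimal places (roughly 100 m of latitude) is just one possible coarsening.

```python
import time


class CellCache:
    """TTL cache of candidate lists keyed by coarsened (lat, lon)."""

    def __init__(self, ttl_s=60.0, precision=3, clock=time.monotonic):
        self.ttl_s = ttl_s          # freshness bound: how long a candidate pool is reused
        self.precision = precision  # decimal places kept; fewer means coarser and more private
        self.clock = clock          # injectable clock so expiry can be tested
        self._store = {}

    def key(self, lat, lon):
        # Coarsening means nearby users share entries and raw GPS is never stored.
        return (round(lat, self.precision), round(lon, self.precision))

    def get_or_compute(self, lat, lon, compute):
        """Return (value, was_hit). `compute` receives only the coarsened coordinates."""
        k = self.key(lat, lon)
        now = self.clock()
        entry = self._store.get(k)
        if entry is not None and now - entry[0] < self.ttl_s:
            return entry[1], True
        value = compute(*k)
        self._store[k] = (now, value)
        return value, False
```

A shorter `ttl_s` favors freshness at the cost of more recomputation, while a lower `precision` raises the hit rate and reduces location granularity at the cost of less exact distances. This sketch never evicts expired keys, so a production cache would also need a size bound.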

Skills you’ll show: geospatial indexing, low-latency serving, candidate generation strategies, real-time feature engineering, ranking/model deployment, A/B testing, and secure data handling. Be prepared to justify architecture choices and describe monitoring, fallbacks, and cold-start solutions.

Common Follow-up Questions

  • How would you design the candidate generation stage to guarantee sub-200 ms end-to-end latency for a moving user?
  • How do you handle cold-start users and new locations so personalization remains useful and accurate?
  • Describe how you would incorporate streaming contextual features (time of day, weather) into online ranking without violating latency constraints.
  • What privacy-preserving techniques would you apply to location logs to satisfy GDPR while maintaining personalization quality?
  • How would you monitor and evaluate relevance in production? Propose online metrics, offline experiments, and an A/B testing strategy.
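For the cold-start question above, one common starting point is to blend a personalized score with a popularity prior, trusting the personalized score more as a user accumulates interactions. The function below is a hedged sketch of that idea; `blended_score` and the smoothing constant `k` are hypothetical, and the scores are assumed to be on a comparable scale.

```python
def blended_score(personal_score, popularity_score, n_interactions, k=20.0):
    """Shrink toward popularity when the user (or place) has little history.

    The weight w = n / (n + k) is 0 for a brand-new user and approaches 1
    as interactions accumulate; k is an assumed tuning constant.
    """
    w = n_interactions / (n_interactions + k)
    return w * personal_score + (1.0 - w) * popularity_score
```

With `k=20`, a user with no history gets the pure popularity score and a user with 20 interactions gets an even mix. The same shrinkage can be applied on the item side for newly added locations.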

Related Questions

1. Design a scalable feature store and serving layer for real-time ML features used by recommendations
2. Architect a location-based search and POI query service with geospatial indexing and proximity ranking
3. Design an online ranking system for personalized feeds with streaming user interactions
4. How to build a real-time candidate generation pipeline for millions of users using approximate nearest neighbors (ANN)
5. Design an end-to-end offline + online training pipeline to update recommender models with feedback loops
