Airbnb ML System Design: Customer LTV Prediction

Topics: Churn Prediction, Demand Forecasting, Time Series Forecasting
Roles: Software Engineer, ML Engineer, Data Scientist
Experience: Mid Level, Senior, Staff

Question Description

You are asked to design a scalable machine learning system that predicts Customer Lifetime Value (LTV) for a consumer platform such as Airbnb. The goal is to estimate the total revenue a user will generate over their entire relationship with the product so the business can prioritize marketing spend, retention strategies, and customer segmentation.

Start by outlining the high-level components: data ingestion, raw event and transaction stores, feature engineering pipelines (batch and streaming), model training and evaluation, prediction serving (batch and online), and monitoring/alerting. Explain how you will transform raw behavioral logs, transactions, and demographics into features such as RFM (recency, frequency, monetary), churn indicators, purchase cadence, average order value, and engagement signals. Consider data quality, joins across user identifiers, and the handling of censored users (for example, new users whose histories are too short to cover the prediction horizon).
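As a rough illustration of the RFM-style features mentioned above, here is a minimal batch sketch. It assumes transactions arrive as (user_id, booking_date, amount) tuples; the function name `rfm_features`, the field names, and the sample data are hypothetical and not taken from any real Airbnb pipeline.

```python
from collections import defaultdict
from datetime import date


def rfm_features(transactions, as_of):
    """Compute per-user RFM-style features from (user_id, date, amount) rows.

    Only transactions dated on or before `as_of` are used, so the features
    cannot leak information from after the prediction date.
    """
    by_user = defaultdict(list)
    for user_id, booked_on, amount in transactions:
        if booked_on <= as_of:
            by_user[user_id].append((booked_on, amount))

    features = {}
    for user_id, rows in by_user.items():
        dates = [d for d, _ in rows]
        total = sum(a for _, a in rows)
        features[user_id] = {
            "recency_days": (as_of - max(dates)).days,   # days since last booking
            "frequency": len(rows),                      # number of bookings
            "monetary": total,                           # total spend
            "avg_order_value": total / len(rows),
            "tenure_days": (as_of - min(dates)).days,    # days since first booking
        }
    return features


# Hypothetical sample data for illustration only.
transactions = [
    ("u1", date(2024, 1, 10), 200.0),
    ("u1", date(2024, 3, 5), 300.0),
    ("u2", date(2024, 2, 20), 150.0),
    ("u1", date(2024, 6, 1), 999.0),  # after as_of, so it is ignored
]
features = rfm_features(transactions, as_of=date(2024, 3, 31))
```

Passing an explicit `as_of` date keeps the features point-in-time correct, which matters when the same logic is reused to build training rows for past dates.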

In the interview flow you should: (1) clarify LTV definition and business horizon (30/90/365 days or lifetime), (2) choose modeling families (survival analysis, time-series, parametric regression, or deep learning) and justify trade-offs, (3) sketch data pipelines and storage (cold vs hot storage), (4) show how predictions are consumed (API vs batch for marketing), and (5) describe monitoring, retraining cadence, and A/B evaluation.
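Step (1) has a direct effect on how training labels are built. The sketch below shows one possible way to construct fixed-horizon LTV labels, under the assumption that users whose horizon window runs past the end of the available data are excluded rather than given a truncated label. The function `horizon_labels` and all sample values are hypothetical.

```python
from datetime import date, timedelta


def horizon_labels(signup_dates, transactions, horizon_days, data_end):
    """Return {user_id: revenue in the first `horizon_days` after signup}.

    Users whose horizon window extends past `data_end` are right-censored
    (their full-horizon revenue is not yet observable) and are left out.
    """
    window = timedelta(days=horizon_days)
    labels = {}
    for user_id, signup in signup_dates.items():
        if signup + window <= data_end:
            labels[user_id] = 0.0  # fully observed; start accumulating

    for user_id, booked_on, amount in transactions:
        if user_id in labels:
            signup = signup_dates[user_id]
            if signup <= booked_on < signup + window:
                labels[user_id] += amount
    return labels


# Hypothetical sample data for illustration only.
signup_dates = {"u1": date(2024, 1, 1), "u2": date(2024, 3, 1)}
transactions = [
    ("u1", date(2024, 1, 15), 100.0),
    ("u1", date(2024, 3, 20), 50.0),
    ("u1", date(2024, 4, 10), 70.0),  # outside u1's 90-day window
    ("u2", date(2024, 3, 5), 80.0),   # u2 is censored at this data_end
]
labels = horizon_labels(signup_dates, transactions, horizon_days=90,
                        data_end=date(2024, 4, 30))
```

Dropping censored users is the simplest choice, but it biases the training set toward older cohorts; survival-style models are one alternative that can use the partially observed users as well.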

Skill signals the interviewer looks for: distributed data architecture, feature engineering for temporal data, handling censoring and covariate shift, evaluating calibration and business impact, latency vs cost trade-offs, and a plan for ongoing model governance and reliability.
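One common way to make the covariate-shift signal concrete is the Population Stability Index (PSI), computed per feature between a training sample and a recent serving sample. The following is a minimal sketch, assuming numeric features and quantile bins taken from the training sample; the function name, the bin count, and the 0.25 alert threshold (a widely quoted rule of thumb, not a standard) are assumptions.

```python
import bisect
import math


def population_stability_index(expected, actual, n_bins=10, eps=1e-6):
    """PSI of one numeric feature between a training and a live sample.

    Bin edges are quantiles of `expected`; empty bins are floored at `eps`
    so the logarithm stays defined.
    """
    ordered = sorted(expected)
    # n_bins - 1 interior edges taken at evenly spaced quantiles.
    edges = [ordered[len(ordered) * i // n_bins] for i in range(1, n_bins)]

    def bin_fractions(values):
        counts = [0] * n_bins
        for v in values:
            counts[bisect.bisect_right(edges, v)] += 1
        return [max(c / len(values), eps) for c in counts]

    p = bin_fractions(expected)
    q = bin_fractions(actual)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


# Hypothetical samples: the live traffic is shifted upward by 50 units.
train_sample = [float(x) for x in range(100)]
live_sample = [float(x + 50) for x in range(100)]

psi_same = population_stability_index(train_sample, train_sample)
psi_shifted = population_stability_index(train_sample, live_sample)
needs_retraining = psi_shifted > 0.25  # assumed rule-of-thumb threshold
```

In practice a check like this would run on a schedule for each important feature, with the threshold tuned against historical false alarms before it is allowed to trigger retraining.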

Common Follow-up Questions

  • How would you model censored user histories and use survival analysis to estimate LTV for new users?
  • Describe how you would detect and handle covariate shift between training data and current traffic (data drift) and trigger retraining.
  • How do you design feature pipelines to support both low-latency online features and cost-efficient batch features?
  • How would you quantify uncertainty in LTV predictions and use that uncertainty in decisions like bid-level marketing or personalization?
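For the first follow-up, a simple starting point is the Kaplan-Meier estimator, which estimates the probability that a user is still active after t days while accounting for users who have not yet churned (right-censored). This is a minimal sketch; the function name, the definition of "churn", and the sample data are assumptions for illustration.

```python
def kaplan_meier(durations, churned):
    """Return [(t, S(t))] at each distinct observed churn time.

    durations: days each user was observed.
    churned:   True if churn was observed at that time, False if the user
               was still active when observation ended (censored).
    """
    event_counts = {}
    for t, event in zip(durations, churned):
        if event:
            event_counts[t] = event_counts.get(t, 0) + 1

    survival = 1.0
    curve = []
    for t in sorted(event_counts):
        # Users still under observation just before time t.
        at_risk = sum(1 for d in durations if d >= t)
        survival *= 1.0 - event_counts[t] / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical sample: five users, two of them censored.
durations = [30, 60, 60, 90, 120]
churned = [True, False, True, True, False]
curve = kaplan_meier(durations, churned)
```

A survival curve like this can be combined with an estimate of revenue per active period to approximate expected LTV, and it can be fitted per segment (for example, by acquisition channel) so that new users inherit the curve of similar earlier cohorts.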

Related Questions

1. Design a churn prediction system for a marketplace platform
2. How to build a cohort-based LTV evaluation and offline validation pipeline
3. Design a scalable feature store for time-series user features
4. How to forecast user lifetime purchases using survival analysis and time-series models
