Airbnb ML System Design: Customer LTV Prediction
Question Description
You are asked to design a scalable machine learning system that predicts Customer Lifetime Value (LTV) for a consumer platform such as Airbnb. The goal is to estimate the total revenue a user will generate over their entire relationship with the product so the business can prioritize marketing spend, retention strategies, and customer segmentation.
Start by outlining high-level components: data ingestion, raw event and transaction stores, feature engineering pipelines (batch and streaming), model training and evaluation, prediction serving (batch and online), and monitoring/alerting. Explain how you will transform raw behavioral logs, transactions, and demographics into features like RFM (recency, frequency, monetary), churn indicators, purchase cadence, average order value, and engagement signals. Consider data quality, joins across user identifiers, and handling right-censored users (for example, new users whose histories are too short to cover the prediction horizon).
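As a rough illustration of the feature step, the sketch below derives RFM, average order value, and purchase cadence from a list of transactions. The `Transaction` record, the `rfm_features` helper, and the field names are hypothetical and chosen only for this example; a production pipeline would compute the same quantities in a batch or streaming engine. The `as_of` cutoff is there so that features never use data from after the prediction date.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Transaction:
    # Hypothetical minimal transaction record for illustration.
    user_id: str
    day: date
    amount: float


def rfm_features(transactions, as_of):
    """Return {user_id: features} using only transactions on or before as_of."""
    by_user = defaultdict(list)
    for t in transactions:
        if t.day <= as_of:  # drop future rows to avoid label leakage
            by_user[t.user_id].append(t)

    features = {}
    for user_id, txs in by_user.items():
        days = sorted(t.day for t in txs)
        total = sum(t.amount for t in txs)
        # Gaps between consecutive purchases approximate purchase cadence.
        gaps = [(b - a).days for a, b in zip(days, days[1:])]
        features[user_id] = {
            "recency_days": (as_of - days[-1]).days,
            "frequency": len(txs),
            "monetary": total,
            "avg_order_value": total / len(txs),
            "mean_gap_days": sum(gaps) / len(gaps) if gaps else None,
        }
    return features


txs = [
    Transaction("u1", date(2024, 1, 1), 100.0),
    Transaction("u1", date(2024, 1, 11), 200.0),
    Transaction("u1", date(2024, 3, 1), 50.0),  # after the cutoff, ignored
    Transaction("u2", date(2024, 1, 20), 80.0),
]
feats = rfm_features(txs, as_of=date(2024, 1, 31))
```

Users with a single purchase get `mean_gap_days = None` rather than a made-up value, which keeps the missing cadence explicit for the downstream model.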
In the interview flow you should: (1) clarify LTV definition and business horizon (30/90/365 days or lifetime), (2) choose modeling families (survival analysis, time-series, parametric regression, or deep learning) and justify trade-offs, (3) sketch data pipelines and storage (cold vs hot storage), (4) show how predictions are consumed (API vs batch for marketing), and (5) describe monitoring, retraining cadence, and A/B evaluation.
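To make the survival-analysis option concrete, here is a minimal sketch of a Kaplan-Meier estimator, one standard way to estimate retention when some users are censored (still active when observation ends). The function name and the toy data are assumptions for illustration; in practice a survival library or a parametric model with covariates would likely be used instead.

```python
def kaplan_meier(durations, observed):
    """Estimate the survival curve from possibly censored durations.

    durations: days until churn, or until observation ended.
    observed:  True if churn was seen, False if the user is censored.
    Returns a list of (time, survival_probability) at each churn time.
    """
    events = sorted(zip(durations, observed))
    at_risk = len(events)
    survival = 1.0
    curve = []
    i = 0
    while i < len(events):
        t = events[i][0]
        churned = 0
        leaving = 0
        # Group all users whose duration equals t.
        while i < len(events) and events[i][0] == t:
            churned += 1 if events[i][1] else 0
            leaving += 1
            i += 1
        if churned:
            survival *= 1 - churned / at_risk
            curve.append((t, survival))
        # Censored users leave the risk set without counting as churn.
        at_risk -= leaving
    return curve


curve = kaplan_meier(
    durations=[30, 60, 60, 90, 120],
    observed=[True, False, True, True, False],
)
```

One simple way to turn such a curve into an LTV estimate is to sum the expected revenue per period weighted by the probability of still being active in that period, truncated at the chosen business horizon.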
Skill signals the interviewer looks for: distributed data architecture, feature engineering for temporal data, handling censoring and covariate shift, evaluating calibration and business impact, latency vs cost trade-offs, and a plan for ongoing model governance and reliability.
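Calibration for LTV is often checked by bucketing users on predicted value and comparing mean predicted to mean realized revenue per bucket. The sketch below shows one such check; `calibration_table` and the toy numbers are hypothetical, and the bucketing scheme (equal-size bins by predicted rank) is just one reasonable choice.

```python
def calibration_table(predicted, actual, n_bins=10):
    """Compare mean predicted vs. mean actual LTV within rank-based bins."""
    pairs = sorted(zip(predicted, actual))  # sort by predicted value
    n = len(pairs)
    rows = []
    for b in range(n_bins):
        lo, hi = b * n // n_bins, (b + 1) * n // n_bins
        chunk = pairs[lo:hi]
        if not chunk:
            continue  # fewer users than bins
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        mean_act = sum(a for _, a in chunk) / len(chunk)
        rows.append({
            "bin": b,
            "n": len(chunk),
            "mean_predicted": mean_pred,
            "mean_actual": mean_act,
            # A ratio near 1.0 suggests the bin is well calibrated.
            "ratio": mean_act / mean_pred if mean_pred else None,
        })
    return rows


table = calibration_table(
    predicted=[10.0, 20.0, 30.0, 40.0],
    actual=[12.0, 18.0, 33.0, 37.0],
    n_bins=2,
)
```

A ratio that drifts away from 1.0 in the top bins matters most for marketing spend, since those are the users the business would bid highest for.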
Common Follow-up Questions
- How would you model censored user histories and use survival analysis to estimate LTV for new users?
- Describe how you would detect and handle covariate shift between training data and current traffic (data drift) and trigger retraining.
- How do you design feature pipelines to support both low-latency online features and cost-efficient batch features?
- How would you quantify uncertainty in LTV predictions and use that uncertainty in decisions like bid-level marketing or personalization?
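For the covariate-shift question, one commonly used drift signal is the Population Stability Index (PSI), computed per feature between the training sample and recent traffic. The sketch below is a minimal version under stated assumptions: equal-width bins taken from the training range, and a retraining threshold of 0.25, which is a widely quoted rule of thumb rather than a fixed standard. The names `psi` and `RETRAIN_THRESHOLD` are hypothetical.

```python
import math

RETRAIN_THRESHOLD = 0.25  # assumed rule-of-thumb cutoff, tune per feature


def psi(expected, actual, n_bins=10, eps=1e-6):
    """Population Stability Index of `actual` against `expected`."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant feature

    def fractions(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            # Clamp values outside the training range into the edge bins.
            idx = min(max(idx, 0), n_bins - 1)
            counts[idx] += 1
        # Floor at eps so empty bins do not cause log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


train = [float(v) for v in range(100)]
live_same = list(train)
live_shifted = [v + 50.0 for v in train]

psi_same = psi(train, live_same)
psi_shifted = psi(train, live_shifted)
needs_retraining = psi_shifted > RETRAIN_THRESHOLD
```

In a monitoring job this check would run on a schedule for each important feature, with an alert or retraining trigger when the index stays above the threshold for several consecutive windows.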