Airbnb System Design: Scalable Multi-Channel Notifications
Question Description
What you'll be asked
You are asked to design a backend notification system for an Airbnb-scale platform that supports both event-triggered (real-time) and batch (scheduled/promotional) notifications. Notifications must be routed to multiple channels (email, SMS, push, and social), must respect user preferences and rate limits, and must be retried on failure and tracked through delivery. You should cover generation, storage, delivery, monitoring, and extensibility.
High-level flow and stages
- Event ingestion: user activity or batch job produces events (via Pub/Sub or Kafka).
- Notification generation: a rules/templating service evaluates triggers, personalization, and channel selection.
- Fan-out & batching: worker pools and stream processors group and aggregate notifications per user to reduce noise and cost.
- Delivery adapters: pluggable channel adapters (SMTP, SMS gateway, push providers) with retries and DLQs.
- Persistence & history: store notifications in a NoSQL timeline/inbox store for offline access and auditing.
- Monitoring & throttling: metrics, SLA alerts, rate-limiters and backpressure to protect external providers.
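The delivery-adapter stage above can be sketched roughly as follows. This is a minimal illustration, not a reference design: the names `Notification`, `Dispatcher`, and `TransientDeliveryError` are hypothetical, the adapters are plain callables standing in for real SMTP/SMS/push clients, and the dead-letter queue is an in-memory list where a real system would likely use a durable topic or queue.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Notification:
    notification_id: str
    user_id: str
    channel: str  # e.g. "email", "sms", "push"
    body: str


class TransientDeliveryError(Exception):
    """Raised by an adapter when a retry might succeed (timeout, HTTP 5xx)."""


@dataclass
class Dispatcher:
    # Hypothetical registry: channel name -> send function for that channel.
    adapters: Dict[str, Callable[[Notification], None]]
    max_attempts: int = 3
    base_delay_s: float = 0.5
    sleep: Callable[[float], None] = time.sleep  # injectable so tests need not wait
    dead_letters: List[Notification] = field(default_factory=list)

    def deliver(self, n: Notification) -> bool:
        """Try to send n; return True on success, False if it was dead-lettered."""
        send = self.adapters.get(n.channel)
        if send is None:
            # No adapter registered for this channel: park it for inspection.
            self.dead_letters.append(n)
            return False
        for attempt in range(self.max_attempts):
            try:
                send(n)
                return True
            except TransientDeliveryError:
                # Exponential backoff between attempts: 0.5s, 1s, 2s, ...
                if attempt < self.max_attempts - 1:
                    self.sleep(self.base_delay_s * (2 ** attempt))
        # Retries exhausted: move to the DLQ instead of blocking the pipeline.
        self.dead_letters.append(n)
        return False
```

Keeping adapters behind a single callable signature is what makes the channels pluggable: adding a new channel means registering one more function. In an interview you would also want to mention jitter on the backoff and distinguishing permanent failures (invalid address) from transient ones.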
Skills & signals to demonstrate
You should show knowledge of event streaming (Kafka, Pub/Sub), message ordering and partitioning, idempotency under at-least-once delivery, retry/DLQ patterns, horizontal scaling of worker pools, cost-aware batching, schema and template versioning, and observability (metrics, logs, tracing). Discuss trade-offs (latency vs. cost, ordering vs. throughput) and how to design for extensibility and high availability.
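Two of the ideas above, idempotency under at-least-once delivery and partitioning for per-user ordering, can be illustrated with a short sketch. The function and class names here are hypothetical, and the in-memory set is an assumption made for brevity; a production deduplicator would more plausibly be a shared store with a TTL.

```python
import hashlib
from typing import Set


def idempotency_key(event_id: str, user_id: str, channel: str) -> str:
    """Derive a deterministic key so a redelivered event maps to the same key."""
    raw = f"{event_id}:{user_id}:{channel}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


def partition_for(user_id: str, num_partitions: int) -> int:
    """Route all of one user's events to the same partition to preserve their order."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


class Deduplicator:
    """In-memory stand-in for a shared dedup store (assumption for this sketch)."""

    def __init__(self) -> None:
        self._seen: Set[str] = set()

    def first_time(self, key: str) -> bool:
        """Return True only the first time a key is observed."""
        if key in self._seen:
            return False
        self._seen.add(key)
        return True


dedup = Deduplicator()
key = idempotency_key("evt-123", "user-42", "push")
if dedup.first_time(key):
    pass  # safe to send: this (event, user, channel) has not been handled yet
```

A stable hash is used instead of Python's built-in `hash()` because the latter is randomized per process for strings, which would break routing across workers. Note the trade-off: partitioning by user gives per-user ordering but can create hot partitions for very active users.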
Common Follow-up Questions
- How would you implement idempotency and deduplication to achieve exactly-once semantics (or mitigate duplicates) across retries and restarts?
- Design the delivery adapters and third-party integration strategy: how do you handle provider rate limits, backoff, and per-channel throttling?
- How would you optimize batch notifications for cost and throughput (segment precomputation, chunking, parallelism) while preserving user preferences and throttling?
- How do you guarantee ordering for chat-related notifications? Describe partitioning, per-user queues, and trade-offs between strict ordering and latency.
- Which observability and SLA metrics would you track (e.g., latency, success rate, DLQ size) and how would you design alerts and incident response?
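For the follow-up on provider rate limits and per-channel throttling, one common technique is a token bucket per provider or channel. The sketch below is illustrative only: `TokenBucket` and its parameters are hypothetical names, the rates are made up, and a multi-worker deployment would need the bucket state in a shared store rather than in process memory.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate_per_s` tokens per second."""

    def __init__(
        self,
        rate_per_s: float,
        capacity: float,
        clock: Callable[[], float] = time.monotonic,  # injectable for tests
    ) -> None:
        self.rate_per_s = rate_per_s
        self.capacity = capacity
        self._clock = clock
        self._tokens = capacity  # start full so an initial burst is allowed
        self._last = clock()

    def try_acquire(self, cost: float = 1.0) -> bool:
        """Consume `cost` tokens if available; return False to signal throttling."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill proportionally to elapsed time, never exceeding capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate_per_s)
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


# One bucket per channel; the limits here are placeholder values.
limits = {
    "sms": TokenBucket(rate_per_s=10, capacity=20),
    "email": TokenBucket(rate_per_s=100, capacity=200),
}
if limits["sms"].try_acquire():
    pass  # within the provider's limit: hand off to the SMS adapter
```

When `try_acquire` returns False, the worker can requeue the message with a delay rather than dropping it, which is one way to apply backpressure without losing notifications.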