Uber ML System Design: Scalable AI Chatbot with History
Question Description
You are asked to design the backend for a scalable AI-powered chatbot that serves millions of daily users, supports multiple sessions per user, persists full chat histories, and returns model-generated replies with low latency.
Core content:
- Build a real-time messaging stack (API Gateway → Auth → Message Ingest) that accepts user messages over HTTP/WebSocket and enqueues them for processing. Use a durable message queue (Kafka, Pulsar) to decouple ingestion from inference and to persist events for reliability.
- Design an inference layer for online AI responses: a model-serving cluster (KFServing, Triton, or managed model endpoints) with autoscaling, request batching, and GPU/CPU tiering. Implement context retrieval (the last N messages, or embeddings retrieved from a vector DB) to supply the model with session history while keeping token usage and latency in check.
- Persist chat histories in a partitioned, highly available store (e.g., DynamoDB/Cassandra, with S3 for long-term archives) keyed per session with timestamps and metadata. Use strong or configurable consistency for reads/writes, idempotent writes, and optimistic concurrency for session updates.
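The idempotent-write and optimistic-concurrency points above can be sketched as a version-checked conditional append, analogous to a DynamoDB conditional put. The in-memory `SessionStore` here is a hypothetical stand-in for the real partitioned store:

```python
import time
from dataclasses import dataclass, field

@dataclass
class SessionRecord:
    version: int = 0
    messages: list = field(default_factory=list)

class SessionStore:
    """In-memory stand-in for a partitioned store (e.g. DynamoDB/Cassandra)."""

    def __init__(self):
        self._sessions = {}

    def append_message(self, session_id, message_id, text, expected_version):
        """Conditional append: succeeds only if the caller saw the latest
        version (optimistic concurrency) and the message_id is new
        (idempotency under producer retries)."""
        rec = self._sessions.setdefault(session_id, SessionRecord())
        if any(m["message_id"] == message_id for m in rec.messages):
            return True   # duplicate retry: already applied, report success
        if rec.version != expected_version:
            return False  # concurrent writer won; caller must re-read and retry
        rec.messages.append({"message_id": message_id, "text": text,
                             "ts": time.time()})
        rec.version += 1
        return True

store = SessionStore()
assert store.append_message("s1", "m1", "hi", expected_version=0)
assert store.append_message("s1", "m1", "hi", expected_version=0)       # retried write is a no-op
assert not store.append_message("s1", "m2", "bye", expected_version=0)  # stale version is rejected
assert store.append_message("s1", "m2", "bye", expected_version=1)
```

The same version check is what makes retries from the queue safe: a redelivered message either matches an existing `message_id` (absorbed) or fails the version condition (forcing a fresh read).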
Flow & stages:
- Client sends message → API gateway/auth → enqueue message.
- Worker retrieves message, reads session context, fetches embeddings if needed.
- Dispatch to model-serving endpoint (streaming responses supported) → write AI reply back to history store and publish to user via WebSocket/SSE.
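The worker stage in the flow above can be sketched end to end. The whitespace token counter, the 512-token budget, and `model_fn` are illustrative stubs, not the real tokenizer, limit, or serving endpoint:

```python
from collections import deque

MAX_CONTEXT_TOKENS = 512  # illustrative budget; real limits are model-specific

def count_tokens(text):
    # Crude stand-in for a real tokenizer: whitespace-delimited words.
    return len(text.split())

def build_context(history, budget=MAX_CONTEXT_TOKENS):
    """Walk the session history newest-first, keeping messages until the
    token budget is exhausted, then restore chronological order."""
    picked, used = deque(), 0
    for msg in reversed(history):
        cost = count_tokens(msg)
        if used + cost > budget:
            break
        picked.appendleft(msg)
        used += cost
    return list(picked)

def handle_message(history, user_msg, model_fn):
    """One worker iteration: read session context, call the model, append
    the reply. In the real system the append is a durable write to the
    history store and the reply is also published over WebSocket/SSE."""
    context = build_context(history + [user_msg])
    reply = model_fn(context)           # model_fn stands in for the serving endpoint
    history.extend([user_msg, reply])
    return reply
```

Trimming newest-first keeps the most recent turns inside the budget, which is the simple "last N messages" strategy; swapping `build_context` for a vector-DB retrieval is the embedding variant.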
Skill signals:
You should demonstrate distributed-systems design (sharding, partitioning, replication), low-latency model deployment (batching vs. streaming), persistence strategies for chat history (consistency, compaction, archival), session management, monitoring/SLAs, and security (encryption, access control, privacy). Be prepared to justify trade-offs (cost, latency, consistency) and propose metrics, autoscaling knobs, and failure recovery strategies.
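As one concrete instance of the metrics mentioned above, a minimal sketch of a nearest-rank p99 latency check against a 2 s SLO, assuming per-request latency samples are collected in seconds:

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile over a list of latency samples (seconds)."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

def check_slo(latencies_s, p99_target_s=2.0):
    """Return (p99, breached) so an alerting pipeline can page on breach
    and an autoscaler can use the same signal as a scale-up trigger."""
    p99 = percentile(latencies_s, 99)
    return p99, p99 > p99_target_s
```

In practice this computation lives in the metrics backend over a sliding window; the point is that tail latency, not the mean, is the number to alert on.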
Common Follow-up Questions
- How would you design context retrieval for long conversations (10k+ tokens) while keeping inference latency under 2s?
- Describe caching and batching strategies that increase model-serving throughput without violating per-user session isolation.
- How would you guarantee exactly-once delivery and a consistent chat history under partial failures and retries?
- What monitoring, SLA, and alerting metrics would you instrument to ensure 99.9% availability and detect inference regressions?
- How would you modify the design to support multimodal inputs (images/audio) and multimodal model inference?
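For the exactly-once question, a standard answer is at-least-once delivery from the queue plus consumer-side deduplication ("effectively once"). A minimal sketch, with an in-memory seen-set standing in for dedup state that would really live in the durable history store:

```python
class DedupingConsumer:
    """Effectively-once processing: the queue redelivers on failure
    (at-least-once), and a dedup set keyed by message_id suppresses
    reprocessing. In production the seen-set is committed in the same
    durable write as the chat-history append, so dedup survives
    worker restarts."""

    def __init__(self, handler):
        self.handler = handler
        self._seen = set()

    def consume(self, message_id, payload):
        if message_id in self._seen:
            return False  # redelivery after a retry: skip side effects
        self.handler(payload)
        self._seen.add(message_id)  # atomic with the history write in practice
        return True
```

The key design choice is that deduplication and the side effect must commit together; a separate cache can lose the dedup record and reintroduce duplicates.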