Adobe ML System Design: Personalized Q&A Assistant
Question Description
Overview
You are asked to design a natural-language Q&A assistant for a marketing platform (think Adobe Experience Platform). The assistant must answer two classes of questions: general platform questions (answered from public documentation) and personalized questions about a user's account data (answered via secure APIs). It must also enforce scope limits, replying "I am unable to answer this type of question." to unrelated queries.
High-level flow / stages
- Ingest and index public documentation (crawl, normalize, create embeddings) and maintain a separate secure index for per-customer metadata or precomputed summaries.
- Front-end receives a query and performs authentication/authorization checks.
- Perform query classification: general vs personalized vs out-of-scope.
- For general queries, run retrieval over public docs and synthesize a concise answer. For personalized queries, call authenticated APIs, fetch/aggregate relevant account data, then summarize.
- Apply safety filters, provenance/attribution, caching, and return the answer within latency targets.
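The routing step in the flow above can be sketched as follows. This is a minimal, rule-based stand-in: the marker-word sets, the `answer_from_docs`/`answer_from_account_apis` helpers, and the refusal string handling are all illustrative assumptions; a production system would likely use a trained intent classifier instead of keyword rules.

```python
# Sketch of the classify-then-route stage. Keyword rules are a
# placeholder for a learned intent classifier; helper functions are
# hypothetical stubs standing in for the RAG and API-fetch paths.
import re
from enum import Enum

class QueryType(Enum):
    GENERAL = "general"
    PERSONALIZED = "personalized"
    OUT_OF_SCOPE = "out_of_scope"

# Illustrative marker vocabularies (assumption, not a tuned list).
PERSONAL_MARKERS = {"my", "our", "account", "campaign", "segment"}
PLATFORM_MARKERS = {"platform", "api", "schema", "dataset", "audience"}

REFUSAL = "I am unable to answer this type of question."

def classify(query: str) -> QueryType:
    """Crude rule-based classification: general vs personalized vs out-of-scope."""
    tokens = set(re.findall(r"[a-z']+", query.lower()))
    if tokens & PERSONAL_MARKERS:
        return QueryType.PERSONALIZED
    if tokens & PLATFORM_MARKERS:
        return QueryType.GENERAL
    return QueryType.OUT_OF_SCOPE

def answer_from_docs(query: str) -> str:
    return "[docs-RAG answer]"          # stub for public-docs retrieval + synthesis

def answer_from_account_apis(query: str, user_id: str) -> str:
    return f"[account answer for {user_id}]"  # stub for authenticated API path

def route(query: str, user_id: str) -> str:
    kind = classify(query)
    if kind is QueryType.OUT_OF_SCOPE:
        return REFUSAL                  # fixed refusal keeps the assistant in scope
    if kind is QueryType.PERSONALIZED:
        return answer_from_account_apis(query, user_id)
    return answer_from_docs(query)
```

The key design point is that classification happens before any retrieval or API call, so out-of-scope queries never touch customer data or the generator.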
Skill signals you should demonstrate
- Retrieval-augmented generation (RAG) and vector search strategies
- Embeddings, semantic search, and metadata filtering for tenant isolation
- Authentication/authorization patterns (OAuth, role-based ACLs) and data governance
- Caching, batching, and async pipelines to meet 2–3s latency and scale to thousands of users
- Instrumentation, monitoring, and techniques to reduce hallucinations (source attribution, confidence thresholds)
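One of the signals above, metadata filtering for tenant isolation, can be sketched as a server-side filter applied before similarity scoring, so another customer's chunks can never enter the candidate set. The index layout (a list of dicts with a `tenant_id` field) and the brute-force cosine scan are illustrative assumptions; a real deployment would use an ANN index that supports pre-filtering.

```python
# Sketch of tenant-isolated vector search: every chunk carries a
# tenant_id (None for public docs), and the filter runs before
# scoring, so cross-tenant leakage is structurally impossible.
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def search(index, query_vec, tenant_id, k=3):
    """Return top-k chunks visible to tenant_id (public chunks have tenant_id=None)."""
    visible = [c for c in index if c["tenant_id"] in (None, tenant_id)]
    visible.sort(key=lambda c: cosine(query_vec, c["vec"]), reverse=True)
    return visible[:k]
```

Filtering first (rather than filtering the top-k results afterward) both guarantees isolation and avoids returning fewer than k usable hits.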
Design choices should balance accuracy, security, latency, scalability and operational cost while providing clear provenance and scope limitation for liability management.
Common Follow-up Questions
- How would you design tenant-isolated vector stores and metadata filtering so personalized retrieval never leaks other customers' data?
- What caching and precomputation strategies would you use to keep personalized query latency under 2–3 seconds at thousands of concurrent users?
- How do you handle hallucinations and ensure factual accuracy when synthesizing answers from documentation and live account data (provenance, confidence scores)?
- Which embedding models and vector index (e.g., approximate nearest neighbor) architectures would you pick and why? Discuss cost and accuracy trade-offs.
- Describe authentication and authorization flows (OAuth, RBAC) you'd implement to securely fetch customer data and audit access.
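For the caching follow-up, one common answer is a per-tenant TTL cache over precomputed summaries, so hot personalized queries skip the API-fetch-and-summarize path entirely. The class below is a minimal sketch; the key scheme, the 300-second default TTL, and lazy expiry are assumptions, not a stated design.

```python
# Illustrative per-tenant TTL cache for precomputed personalized
# answers. Keys include the tenant_id so entries are never shared
# across customers; expired entries are evicted lazily on read.
import time

class TTLCache:
    def __init__(self, ttl_seconds=300):
        self.ttl = ttl_seconds
        self._store = {}  # (tenant_id, query_key) -> (expires_at, value)

    def get(self, tenant_id, query_key, now=None):
        now = time.time() if now is None else now
        entry = self._store.get((tenant_id, query_key))
        if entry and entry[0] > now:
            return entry[1]
        self._store.pop((tenant_id, query_key), None)  # lazy eviction
        return None

    def put(self, tenant_id, query_key, value, now=None):
        now = time.time() if now is None else now
        self._store[(tenant_id, query_key)] = (now + self.ttl, value)
```

In an interview, pair this with precomputation (e.g., nightly account summaries) and note that the TTL bounds staleness of live account data.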