
OpenAI ML System Design: Scalable Enterprise RAG

Topics:
Conversational AI
Retrieval-Augmented Generation
Online Inference
Roles:
Machine Learning Engineer
ML Systems Engineer
ML Research Engineer
Experience:
Mid Level
Senior
Staff

Question Description

You are asked to design a Retrieval-Augmented Generation (RAG) system for a large enterprise to support internal document Q&A and a customer support chatbot. The system should ingest millions of confidential documents, perform semantic search over embeddings, and synthesize accurate, context-aware answers in real time while enforcing privacy and access controls.

Core content — what you'll be asked

  • Document ingestion & indexing: automated parsing for PDFs, DOCX, TXT; text cleaning; chunking strategies; metadata extraction and embedding generation; storing vectors and metadata in a scalable vector database.
  • Query processing & retrieval: natural-language preprocessing, intent parsing, hybrid retrieval (semantic embeddings + keyword filters), ANN search (HNSW/IVF), re-ranking, and top-k passage selection targeting >90% retrieval precision.
  • Context & generation: maintaining conversational state for multi-turn queries, prompt construction with retrieved passages, and LLM answer synthesis with source citations and hallucination mitigation.
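The ingestion stage typically splits each document into overlapping windows before embedding, so that passage boundaries don't cut off relevant context. A minimal sketch of one common approach, fixed-size character chunking with overlap (the function name and default sizes here are illustrative assumptions, not a prescribed configuration):

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into overlapping character windows for embedding.

    Each chunk records its start offset so the original passage can be
    cited later. Real systems often chunk on sentence or token
    boundaries instead of raw characters.
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        if piece:
            chunks.append({"text": piece, "start": start})
        if start + chunk_size >= len(text):
            break
    return chunks
```

Each chunk would then be passed to an embedding model and written to the vector store along with its document ID, offset, and access-control metadata.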

High-level flow/stages

  1. Ingest -> normalize -> chunk -> embed -> index.
  2. Query -> preprocess -> retrieve top-k -> re-rank -> assemble context.
  3. LLM synthesizes answer -> cite sources & redact -> return.
  4. Fall back / escalate to human agents if confidence is low.
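The retrieval step in stage 2 can be sketched as a brute-force cosine-similarity scan over the index; a production system would replace the linear scan with an ANN structure (HNSW or IVF, as noted above), but the scoring and top-k logic are the same. All names here are illustrative:

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_top_k(query_vec, index, k=3):
    """Score every indexed chunk against the query and keep the top k.

    `index` maps chunk IDs to embedding vectors. Brute force is O(N);
    ANN search trades a little recall for sublinear query time.
    """
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in index.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]
```

The top-k candidates would then go to a cross-encoder re-ranker before the surviving passages are assembled into the LLM prompt.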

Skill signals you should demonstrate

You should show knowledge of vector search engines and ANN algorithms, embedding/model selection, latency optimization (caching, sharding, batching), security (encryption, RBAC, audit logging), training pipelines for retrievers/generators (contrastive losses, supervised reranking, offline & online eval), and metrics (precision@k, MRR, latency percentiles, user feedback loops). Also discuss deployment concerns: autoscaling, cost trade-offs, model update strategies, and monitoring for drift and hallucinations.
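Two of the offline retrieval metrics mentioned above, precision@k and MRR, are simple to compute once you have ranked results and ground-truth relevance labels. A minimal sketch (function names are illustrative):

```python
def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved IDs that are truly relevant."""
    top = retrieved[:k]
    return sum(1 for doc in top if doc in relevant) / k

def mean_reciprocal_rank(ranked_lists, relevant_sets):
    """Mean Reciprocal Rank over a batch of queries.

    For each query, take 1/rank of the first relevant hit (0 if none),
    then average across queries.
    """
    total = 0.0
    for ranked, relevant in zip(ranked_lists, relevant_sets):
        for rank, doc in enumerate(ranked, start=1):
            if doc in relevant:
                total += 1.0 / rank
                break
    return total / len(ranked_lists)
```

Tracking these on a held-out query set, alongside latency percentiles and user-feedback signals, is how you would verify the >90% precision target over time.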

Use concrete trade-offs and design choices; explain how they meet the non-functional requirements (scalability, low latency, accuracy, security, reliability, maintainability, and cost efficiency).

Common Follow-up Questions

  • How would you design the vector index and sharding strategy to support low-latency search across terabytes of embeddings and thousands of queries per second?
  • Describe your approach to minimizing hallucinations: what retrieval- and generation-level techniques and evaluation metrics would you apply?
  • How do you enforce fine-grained access control and data privacy when the vector DB contains confidential passages tied to user roles?
  • What monitoring, A/B testing, and continuous training pipelines would you build to detect retrieval drift and maintain >90% precision over time?
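For the first follow-up, a common answer shards the index by a stable hash of the document ID and fans each query out to all shards, merging per-shard top-k lists into a global top-k. A hedged sketch of that routing and scatter-gather logic (the `search` method and `score` attribute on shard objects are assumptions about a hypothetical shard client, not a real API):

```python
import hashlib

def shard_for(doc_id, num_shards):
    """Stable hash-based routing of a document to a vector-index shard."""
    digest = int(hashlib.sha256(doc_id.encode()).hexdigest(), 16)
    return digest % num_shards

def fan_out_query(query, shards, k):
    """Scatter the query to every shard, then merge per-shard top-k hits.

    In a real deployment the shard searches run in parallel; here they
    run sequentially for clarity. Each hit is assumed to expose a
    `score` attribute for the global merge.
    """
    partial = [hit for shard in shards for hit in shard.search(query, k)]
    partial.sort(key=lambda hit: hit.score, reverse=True)
    return partial[:k]
```

Hash routing keeps writes balanced without a central directory; the trade-off is that every query touches every shard, so per-shard latency (and tail latency in particular) bounds end-to-end search time.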

Related Questions

1. Design a scalable semantic search service for enterprise documents using embeddings and hybrid retrieval
2. Build a customer support chatbot that integrates real-time product data and escalates to agents
3. Design a training and evaluation pipeline for a dense retriever and a reranker at scale
4. How to optimize LLM inference latency and cost for real-time multi-turn conversational agents


RAG System Design Interview - OpenAI ML Engineer Guide | Voker