Anthropic System Design: Stateless Prompt Playground
Question Description
You are asked to design a stateless Prompt Engineering Playground backend (similar to an LLM playground) where each request to the model is independent unless a user explicitly saves context. The system must accept very large prompts (10MB+), support multiple browser windows/tabs per user, and let users share or export full conversations via unique links or JSON.
Core content
- Ingest: accept large prompt uploads via presigned object-storage URLs or chunked streaming to avoid overloading the app server. Use content-addressable storage (hash + dedupe) for large files.
- Preprocessing: chunk and optionally compress or summarize very large prompts before sending to the LLM. Consider client-side or edge pre-processing to reduce bandwidth and cost.
- LLM access: treat the LLM as an external API. Use a queuing layer (message queue or stream processor) to batch, rate-limit, and retry calls. Stream tokens back to the user (WebSocket/SSE) for low-latency UX.
- Persistence & sharing: store minimal metadata in a primary DB (User, Conversation, PromptRef, ResponseRef) and keep large payloads in object storage. Generate shareable links with randomized IDs or signed short-lived tokens; allow JSON export.
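The content-addressable storage idea above can be sketched in a few lines. This is a minimal in-memory model, assuming a dict standing in for the object store (a real system would back `put`/`get` with S3/GCS); the `PromptStore` name and API are illustrative, not a prescribed design.

```python
import hashlib

class PromptStore:
    """In-memory stand-in for a content-addressable object store."""

    def __init__(self):
        self._blobs = {}  # sha256 hex digest -> payload bytes

    def put(self, payload: bytes) -> str:
        """Store a prompt blob; identical payloads dedupe to one object."""
        digest = hashlib.sha256(payload).hexdigest()
        self._blobs.setdefault(digest, payload)
        return digest  # this digest becomes the PromptRef in the primary DB

    def get(self, digest: str) -> bytes:
        return self._blobs[digest]

store = PromptStore()
ref_a = store.put(b"Summarize this 10MB document ...")
ref_b = store.put(b"Summarize this 10MB document ...")  # duplicate upload
assert ref_a == ref_b            # same content -> same reference
assert len(store._blobs) == 1    # only one copy actually stored
```

Because the reference *is* the hash of the content, repeated uploads of the same large prompt cost one object-store write instead of many, which is the cost lever the follow-up question about similar prompts is probing for.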
Flow / stages you should explain in an interview
1. Client upload (presigned PUT / multipart / chunked).
2. Server validates, stores metadata, and returns a prompt reference.
3. Preprocessor (sync/async) chunks or summarizes large prompts.
4. Queue/worker sends the request to the LLM API and streams the response back via WebSocket or SSE.
5. Persist response references and make the conversation shareable.
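The streaming stage of the flow above can be sketched as an SSE framer. The token source is faked here, and `sse_stream` is an illustrative name, not a real library API; the key detail is that each frame carries a monotonically increasing `id:` so a reconnecting tab can resume via the `Last-Event-ID` header.

```python
def sse_stream(tokens, start_id=0):
    """Yield SSE-framed messages, one per token, with increasing ids
    so clients can resume after a dropped connection."""
    for i, tok in enumerate(tokens, start=start_id):
        yield f"id: {i}\ndata: {tok}\n\n"

frames = list(sse_stream(["Hel", "lo", "!"]))
assert frames[0] == "id: 0\ndata: Hel\n\n"
assert frames[2].startswith("id: 2")
```

On reconnect, the client sends the last id it saw and the server restarts the generator with `start_id` set past it, replaying only the missing suffix from the persisted response buffer.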
Skill signals to demonstrate
- Distributed systems: queues, workers, backpressure, and horizontal scaling.
- Streaming & performance: SSE/WebSockets, partial response delivery, chunking strategies.
- Cost optimization: deduplication, summarization, batching, and caching of frequent prompts.
- Data modeling & indexing: how you model User, Conversation, Prompt, Response, and your indexing strategy for fast retrieval.
- Security & reliability: signed URLs, access control on shared links, retries, idempotency, and monitoring.
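The retry and idempotency points above can be made concrete with a short sketch. `TransientError`, `handle`, and the in-memory `completed` map are placeholders for illustration; a real worker would map transient errors to the LLM client's timeout/429 exceptions and persist the idempotency record durably.

```python
import random
import time

class TransientError(Exception):
    """Stand-in for a retryable failure (timeout, 429, 5xx)."""

def call_with_retries(fn, *, attempts=4, base_delay=0.5, sleep=time.sleep):
    """Retry a flaky call with exponential backoff and full jitter."""
    for attempt in range(attempts):
        try:
            return fn()
        except TransientError:
            if attempt == attempts - 1:
                raise
            # Full jitter spreads retries to avoid thundering herds.
            sleep(random.uniform(0, base_delay * 2 ** attempt))

# Idempotency: record each request key so a redelivered queue message
# does not trigger a second (billable) LLM call.
completed = {}

def handle(request_id, fn):
    if request_id in completed:
        return completed[request_id]
    completed[request_id] = call_with_retries(fn)
    return completed[request_id]
```

Retries on the queue side plus an idempotency key on the worker side is what lets the system deliver at-least-once messages while charging for (and persisting) each LLM call exactly once.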
When answering, walk through the trade-offs (latency vs. cost, client-side vs. server-side preprocessing, strict statelessness vs. saved context) and be explicit about failure modes and their mitigations (partial uploads, LLM timeouts, replay protection).
Common Follow-up Questions
- How would you optimize for cost when multiple users submit similar large prompts repeatedly? (hint: caching, content-addressable storage, deduplication)
- Design the sharing and access-control model: how do you implement expiring share links, permission scopes, and anonymous access safely?
- Explain how you'd stream partial LLM responses to the client while ensuring resume/reconnect semantics across multiple browser tabs.
- How do you guarantee true statelessness server-side while letting users 'save' context optionally? Describe idempotency, session tokens, and persisted context references.
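One way to answer the expiring-share-link follow-up is an HMAC-signed token, sketched below. The signing key, the `viewer` scope, and the token layout are all assumptions for illustration; in production the key would live in a secret manager and the token would typically ride in the share URL's path or query string.

```python
import hashlib
import hmac
import time

SECRET = b"server-side-signing-key"  # illustrative; load from a secret manager

def make_share_token(conversation_id: str, scope: str, ttl_s: int,
                     now=time.time) -> str:
    """Issue a self-describing token: payload plus HMAC over the payload."""
    expires = int(now()) + ttl_s
    payload = f"{conversation_id}:{scope}:{expires}"
    sig = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}:{sig}"

def verify_share_token(token: str, now=time.time):
    """Return (conversation_id, scope) if valid and unexpired, else None."""
    conversation_id, scope, expires, sig = token.rsplit(":", 3)
    payload = f"{conversation_id}:{scope}:{expires}"
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return None  # tampered
    if int(expires) < now():
        return None  # expired
    return conversation_id, scope

token = make_share_token("conv-42", "viewer", ttl_s=3600)
assert verify_share_token(token) == ("conv-42", "viewer")
assert verify_share_token(token.replace("viewer", "owner")) is None  # forged scope
```

Because the token carries its own scope and expiry under a signature, the server stays stateless: no share-link table is needed for anonymous viewers, and revocation-sensitive links can fall back to randomized IDs checked against the primary DB.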