Coupang System Design: Large Video Upload (Backend)
Question Description
You are asked to design the backend for uploading very large videos (multiple GB) to a platform similar to YouTube. The focus is on the core upload flow, from the client selecting a file through to the file being durably stored and initial validation and processing being triggered.
Core content
You should describe how the system generates a unique upload ID, accepts chunked uploads, tracks progress, and supports resumable transfers after network interruptions. Explain short-term chunk buffering, how/when chunks are assembled or referenced in object storage, and how the system triggers initial processing tasks such as format validation and metadata extraction once all chunks arrive.
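As a rough illustration of the initiation step, the sketch below generates an upload ID and works out how many chunks a file needs. The 8 MiB chunk size and the names `UploadPlan`, `plan_upload`, and `chunk_range` are assumptions made for this example, not part of the question.

```python
import uuid
from dataclasses import dataclass

# Assumed default: 8 MiB chunks. A real system would tune this value.
CHUNK_SIZE = 8 * 1024 * 1024


@dataclass(frozen=True)
class UploadPlan:
    upload_id: str
    chunk_size: int
    total_chunks: int


def plan_upload(file_size: int, chunk_size: int = CHUNK_SIZE) -> UploadPlan:
    """Create a hypothetical upload plan for a file of file_size bytes."""
    if file_size <= 0 or chunk_size <= 0:
        raise ValueError("file_size and chunk_size must be positive")
    # Integer ceiling division avoids float rounding on very large sizes.
    total_chunks = (file_size + chunk_size - 1) // chunk_size
    return UploadPlan(uuid.uuid4().hex, chunk_size, total_chunks)


def chunk_range(plan: UploadPlan, index: int, file_size: int) -> tuple:
    """Return (start, end_exclusive) byte offsets for chunk number index."""
    if not 0 <= index < plan.total_chunks:
        raise IndexError("chunk index out of range")
    start = index * plan.chunk_size
    return start, min(start + plan.chunk_size, file_size)
```

With these assumptions a 3 GiB file splits into 384 chunks, and one extra byte adds a 385th chunk that is a single byte long.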
High-level flow/stages
- Upload initiation: authenticate the user, create an upload session ID, and return the chunk size and upload endpoints to the client.
- Chunked upload: the client uploads chunks, possibly in parallel; the server acknowledges each chunk and updates progress.
- Resumption and recovery: record which chunks have been received so that, after a failure, the client can query the session and resend only the missing chunks (with parallel uploads these are not necessarily contiguous).
- Finalization: verify integrity, move data from temporary to persistent object storage, and enqueue the initial processing tasks.
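The stages above can be sketched as a small in-memory session tracker. This is a minimal illustration that assumes per-chunk SHA-256 checksums supplied by the client; the `UploadSession` class and its method names are hypothetical, and a real service would keep this state in a durable store rather than in process memory.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class UploadSession:
    upload_id: str
    total_chunks: int
    # chunk index -> SHA-256 hex digest of the bytes the server accepted
    received: Dict[int, str] = field(default_factory=dict)

    def accept_chunk(self, index: int, data: bytes, client_sha256: str) -> bool:
        """Record a chunk. Returns True if new, False if it is a repeated retry."""
        if not 0 <= index < self.total_chunks:
            raise IndexError("chunk index out of range")
        digest = hashlib.sha256(data).hexdigest()
        if digest != client_sha256:
            raise ValueError("checksum mismatch; client should resend the chunk")
        if index in self.received:
            if self.received[index] != digest:
                raise ValueError("chunk already stored with different content")
            return False  # idempotent: the same chunk sent twice is harmless
        self.received[index] = digest
        return True

    def missing_chunks(self) -> List[int]:
        """Chunk indexes the client still has to send when it resumes."""
        return [i for i in range(self.total_chunks) if i not in self.received]

    def progress(self) -> float:
        return len(self.received) / self.total_chunks

    def can_finalize(self) -> bool:
        return len(self.received) == self.total_chunks
```

On resume, the client would ask for `missing_chunks()` and resend only those indexes; finalization proceeds once `can_finalize()` is true.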
Skill signals
You should demonstrate understanding of object storage semantics, data replication and consistency, resumable-upload protocols (e.g., byte-range or session-based), progress reporting, cost trade-offs (temporary vs. long-term storage), scalability patterns (sharding, stateless upload proxies, CDNs), and fault-tolerant designs (checksums, retry/backoff, idempotency).
When preparing, be ready to discuss trade-offs (latency vs cost, single-pass assembly vs reference-joining) and how you'd instrument and test recovery paths.
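One way to make the retry/backoff signal concrete is a client-side retry helper using capped exponential backoff with full jitter. The function names, the default delays, and the choice of `ConnectionError` as the retryable failure are assumptions for this sketch; the `sleep` and `rng` parameters are injectable so the recovery path can be tested without real waiting.

```python
import random
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def backoff_delays(attempts: int, base: float = 0.5, cap: float = 30.0) -> List[float]:
    """Upper bound of each wait in seconds: base * 2**n, capped at cap."""
    return [min(cap, base * 2 ** n) for n in range(attempts)]


def retry_with_backoff(
    op: Callable[[], T],
    max_attempts: int = 5,
    base: float = 0.5,
    cap: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Callable[[], float] = random.random,
) -> T:
    """Call op until it succeeds, retrying only on ConnectionError."""
    for attempt in range(max_attempts):
        try:
            return op()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # retries exhausted; surface the failure to the caller
            # Full jitter: wait a random fraction of the capped exponential bound.
            sleep(rng() * min(cap, base * 2 ** attempt))
    raise ValueError("max_attempts must be at least 1")
```

Retrying a chunk upload this way is only safe if the server treats a repeated chunk as idempotent, which is why the two ideas are usually discussed together.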
Common Follow-up Questions
- How would you integrate a CDN and edge caching to reduce latency for large uploads and first-play viewing?
- Design the authorization and rate-limiting strategy for uploads: how do you prevent abuse while preserving resume functionality?
- How would you implement cross-region replication and disaster recovery for uploaded videos, and what consistency model would you choose?
- Describe how you would support server-side deduplication and content-addressable storage for uploaded chunks to save bandwidth and storage costs.
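For the deduplication follow-up, a toy content-addressable store can help frame the discussion. The sketch assumes chunks are keyed by their SHA-256 digest and held in memory; `ChunkStore`, `store_file`, and `read_file` are hypothetical names, and a real design would sit on top of object storage and also handle reference counting and garbage collection.

```python
import hashlib
from typing import Dict, Iterable, List


class ChunkStore:
    """Toy content-addressable store: the key is the SHA-256 of the chunk bytes."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}
        self.bytes_written = 0  # counts only bytes that were actually stored

    def has(self, digest: str) -> bool:
        # A client could call this first and skip sending a known chunk.
        return digest in self._blobs

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self._blobs:
            self._blobs[digest] = data
            self.bytes_written += len(data)
        return digest

    def get(self, digest: str) -> bytes:
        return self._blobs[digest]


def store_file(store: ChunkStore, chunks: Iterable[bytes]) -> List[str]:
    """Store chunks and return the manifest: the ordered list of digests."""
    return [store.put(chunk) for chunk in chunks]


def read_file(store: ChunkStore, manifest: List[str]) -> bytes:
    """Rebuild a file by joining the chunks its manifest references."""
    return b"".join(store.get(digest) for digest in manifest)
```

A file is then represented by its manifest, so two uploads that share chunks store the shared bytes only once.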