
Coupang System Design: Large Video Upload (Backend)

Topics: File Storage, Object Storage, Data Replication
Roles: Software Engineer, Backend Engineer, Site Reliability Engineer
Experience: Mid Level, Senior, Staff

Question Description

You are asked to design the backend for uploading very large videos (several GB each) to a YouTube-like platform. The focus is the core upload flow: from the moment the client selects a file until the file is durably stored and initial validation and processing have been triggered.

Core content

You should describe how the system generates a unique upload ID, accepts chunked uploads, tracks progress, and supports resuming transfers after network interruptions. Explain how chunks are buffered short-term, how and when they are assembled into (or referenced from) a single object in object storage, and how the system triggers initial processing tasks, such as format validation and metadata extraction, once all chunks have arrived.
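A minimal in-memory sketch of the session bookkeeping described above: upload ID generation, chunk-size selection, per-chunk tracking, progress, and the list of missing chunks a resuming client needs. It assumes Python 3.9+, a random UUID4 as the upload ID, and a hypothetical chunk-size policy loosely modeled on S3 multipart limits (parts of at least 5 MiB, at most 10,000 parts). `UploadSession`, `initiate_upload`, and `pick_chunk_size` are illustrative names; a real service would keep this state in a shared database rather than process memory so that upload servers stay stateless.

```python
import uuid
from dataclasses import dataclass, field

MIB = 1024 * 1024
# Hypothetical policy: 8 MiB default chunks (above S3's 5 MiB minimum part size)
# and at most 10,000 chunks per upload (S3's multipart part-count limit).
DEFAULT_CHUNK = 8 * MIB
MAX_CHUNKS = 10_000


def ceil_div(a: int, b: int) -> int:
    # Integer ceiling division; avoids float rounding on multi-GB sizes.
    return -(-a // b)


def pick_chunk_size(file_size: int) -> int:
    """Default chunk size, grown in whole MiB if needed to stay within MAX_CHUNKS."""
    needed = ceil_div(file_size, MAX_CHUNKS)
    return ceil_div(max(DEFAULT_CHUNK, needed), MIB) * MIB


@dataclass
class UploadSession:
    upload_id: str
    user_id: str
    file_size: int
    chunk_size: int
    received: set = field(default_factory=set)  # indices of acknowledged chunks

    @property
    def total_chunks(self) -> int:
        return ceil_div(self.file_size, self.chunk_size)

    def expected_length(self, index: int) -> int:
        # Every chunk is full-size except possibly the last one.
        return min(self.chunk_size, self.file_size - index * self.chunk_size)

    def record_chunk(self, index: int, length: int) -> None:
        if not 0 <= index < self.total_chunks:
            raise ValueError(f"chunk index {index} out of range")
        if length != self.expected_length(index):
            raise ValueError(
                f"chunk {index}: got {length} bytes, expected {self.expected_length(index)}")
        self.received.add(index)  # a set makes re-recording a retried chunk harmless

    def missing_chunks(self) -> list:
        """What a resuming client still needs to send."""
        return [i for i in range(self.total_chunks) if i not in self.received]

    def progress(self) -> float:
        return len(self.received) / self.total_chunks


def initiate_upload(user_id: str, file_size: int) -> UploadSession:
    if file_size <= 0:
        raise ValueError("file_size must be positive")
    # A random UUID4 needs no central coordination and is practically collision-free.
    return UploadSession(str(uuid.uuid4()), user_id, file_size, pick_chunk_size(file_size))
```

For a 20 MiB file this yields three chunks (8, 8 and 4 MiB); after chunks 0 and 2 are recorded, `missing_chunks()` returns `[1]`, so the client re-sends only that chunk instead of restarting the whole upload.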

High-level flow/stages

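  • Upload initiation: authenticate the caller, create an upload session with a unique ID, and return the chunk size and upload endpoints to the client.
  • Chunked upload: the client uploads chunks, possibly in parallel; the server acknowledges each chunk and updates the session's progress.
  • Resumption and recovery: record which chunks have been received so that, after a failure, the client re-sends only the missing chunks (for strictly sequential uploads, this means resuming after the last acknowledged chunk).
  • Finalization: verify integrity (e.g., checksums), move or commit the data from temporary to persistent object storage, and enqueue the initial processing jobs.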
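The finalization stage can be sketched as a small, idempotent state transition: refuse to finalize while chunks are missing, compare a whole-file checksum, then enqueue processing jobs exactly once. Everything here is hypothetical: the composite checksum scheme (SHA-256 over the ordered per-chunk SHA-256 digests), the `Finalizer` class, and the task names; an in-process `queue.Queue` stands in for a real message broker.

```python
import hashlib
import queue
from enum import Enum


class State(Enum):
    UPLOADING = "uploading"
    FINALIZING = "finalizing"  # marks an in-progress finalize so a second call backs off
    COMPLETED = "completed"


def composite_checksum(ordered_digests: list) -> str:
    """Hypothetical scheme: SHA-256 over the per-chunk SHA-256 digests, in chunk order."""
    h = hashlib.sha256()
    for digest in ordered_digests:
        h.update(digest)
    return h.hexdigest()


class Finalizer:
    def __init__(self, jobs: queue.Queue):
        self.jobs = jobs   # stand-in for a real message queue
        self.states = {}   # upload_id -> State

    def finalize(self, upload_id: str, total_chunks: int,
                 chunk_digests: dict, client_checksum: str) -> State:
        state = self.states.get(upload_id, State.UPLOADING)
        if state is not State.UPLOADING:
            return state   # retried or concurrent call: nothing more to do
        missing = [i for i in range(total_chunks) if i not in chunk_digests]
        if missing:
            raise ValueError(f"cannot finalize; missing chunks {missing}")
        self.states[upload_id] = State.FINALIZING
        ordered = [chunk_digests[i] for i in range(total_chunks)]
        if composite_checksum(ordered) != client_checksum:
            # Client and server disagree about the content: reopen the session.
            self.states[upload_id] = State.UPLOADING
            raise ValueError("checksum mismatch")
        # A real system would commit/compose the durable object at this point.
        self.states[upload_id] = State.COMPLETED
        for task in ("validate_format", "extract_metadata"):
            self.jobs.put({"upload_id": upload_id, "task": task})
        return State.COMPLETED
```

In a distributed deployment, the state check and transition would need to be an atomic conditional write, and the job enqueue should survive a crash between the two steps (for example via an outbox table); both are out of scope for this sketch.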

Skill signals

You should demonstrate understanding of object storage semantics, data replication and consistency, resumable-upload protocols (e.g., byte-range or session-based), progress reporting, cost trade-offs (temporary vs. long-term storage), scalability patterns (sharding, stateless upload proxies, CDNs), and fault-tolerant designs (checksums, retry/backoff, idempotency).
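Retries and idempotency work together: the client retries transient failures with capped exponential backoff and full jitter, and the server treats a repeated chunk with the same checksum as already stored, so a retry after a lost acknowledgement is harmless. `TransientError`, `ChunkStore`, and `upload_chunk_with_retry` are hypothetical names; `sleep` and `rng` are injectable so the logic can be tested without real delays.

```python
import hashlib
import random
import time


class TransientError(Exception):
    """Hypothetical retryable failure: timeout, dropped connection, HTTP 503."""


class ChunkStore:
    """Server side: idempotent chunk writes keyed by chunk index."""

    def __init__(self):
        self.chunks = {}  # index -> (sha256 hex, bytes)

    def put(self, index: int, data: bytes, checksum: str) -> None:
        if hashlib.sha256(data).hexdigest() != checksum:
            raise TransientError("corrupted in transit; please resend")
        existing = self.chunks.get(index)
        if existing is not None:
            if existing[0] != checksum:
                raise ValueError(f"chunk {index} already stored with different content")
            return  # duplicate delivery of the same chunk: acknowledge again
        self.chunks[index] = (checksum, data)


def upload_chunk_with_retry(send, index: int, data: bytes, max_attempts: int = 5,
                            base: float = 0.5, cap: float = 30.0,
                            sleep=time.sleep, rng=random.random) -> int:
    """Client side: returns the number of attempts used; re-raises after max_attempts."""
    checksum = hashlib.sha256(data).hexdigest()
    for attempt in range(max_attempts):
        try:
            send(index, data, checksum)
            return attempt + 1
        except TransientError:
            if attempt == max_attempts - 1:
                raise
            # Full jitter: wait a random time in [0, min(cap, base * 2**attempt)).
            sleep(rng() * min(cap, base * 2 ** attempt))
```

Jitter spreads out retries from many clients that failed at the same moment (for example during a brief upload-proxy outage), so they do not all hit the recovering service in lockstep.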

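When preparing, be ready to discuss trade-offs (latency vs. cost; single-pass assembly, which copies all chunks into one object, vs. reference-joining, which stitches the stored chunks together by reference) and how you would instrument and test recovery paths.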
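Reference-joining can be illustrated with a manifest: finalization writes only an ordered list of chunk references, and every read translates a byte range into per-chunk reads. `ChunkRef` and `Manifest` are hypothetical names for this sketch, not an object-store API.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkRef:
    key: str     # hypothetical object key of a stored chunk
    length: int  # bytes in that chunk


class Manifest:
    """A 'virtual' finalized file: an ordered list of chunk references, no byte copying."""

    def __init__(self, chunks: list):
        self.chunks = chunks
        self.starts = []  # byte offset at which each chunk begins
        offset = 0
        for chunk in chunks:
            self.starts.append(offset)
            offset += chunk.length
        self.size = offset

    def resolve(self, start: int, end: int) -> list:
        """Translate the byte range [start, end) into (key, offset_in_chunk, length) reads."""
        if not 0 <= start < end <= self.size:
            raise ValueError("range out of bounds")
        i = bisect.bisect_right(self.starts, start) - 1  # chunk containing `start`
        reads, pos = [], start
        while pos < end:
            chunk, chunk_start = self.chunks[i], self.starts[i]
            take = min(end, chunk_start + chunk.length) - pos
            if take > 0:  # skips empty chunks
                reads.append((chunk.key, pos - chunk_start, take))
            pos += take
            i += 1
        return reads
```

Finalizing this way costs only metadata proportional to the number of chunks, but leaves many small objects behind and turns some reads into multi-chunk fan-outs. Single-pass assembly instead copies every byte once more at finalization (extra I/O proportional to file size) in exchange for one contiguous object. Managed stores offer server-side middle grounds, such as S3's CompleteMultipartUpload or Google Cloud Storage's compose operation (up to 32 source objects per request).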

Common Follow-up Questions

  • How would you integrate a CDN and edge caching to reduce latency for large uploads and first-play viewing?
  • Design the authorization and rate-limiting strategy for uploads—how do you prevent abuse while preserving resume functionality?
  • How would you implement cross-region replication and disaster recovery for uploaded videos, and what consistency model would you choose?
  • Describe how you would support server-side deduplication and content-addressable storage for uploaded chunks to save bandwidth and storage costs.
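For the deduplication follow-up, a minimal content-addressable sketch: each chunk is keyed by its SHA-256 digest, the client asks whether a digest already exists before sending bytes, and reference counts decide when a chunk can be deleted. `ContentAddressedStore` and `upload_file` are hypothetical, in-memory stand-ins for a chunk index and an object store.

```python
import hashlib


class ContentAddressedStore:
    """Hypothetical chunk store keyed by SHA-256 digest; identical chunks are stored once."""

    def __init__(self):
        self.blobs = {}     # digest -> bytes
        self.refcount = {}  # digest -> number of references from uploaded files

    def has(self, digest: str) -> bool:
        return digest in self.blobs

    def add_ref(self, digest: str) -> None:
        self.refcount[digest] += 1

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        if digest in self.blobs:
            self.refcount[digest] += 1
        else:
            self.blobs[digest] = data
            self.refcount[digest] = 1
        return digest

    def release(self, digest: str) -> None:
        """Drop one reference; delete the chunk once nothing points at it."""
        self.refcount[digest] -= 1
        if self.refcount[digest] == 0:
            del self.refcount[digest]
            del self.blobs[digest]


def upload_file(store: ContentAddressedStore, chunks: list) -> tuple:
    """Returns (manifest of chunk digests, bytes actually transferred)."""
    manifest, sent = [], 0
    for data in chunks:
        digest = hashlib.sha256(data).hexdigest()  # computed on the client
        if store.has(digest):
            store.add_ref(digest)                  # skip the transfer entirely
        else:
            store.put(data)
            sent += len(data)
        manifest.append(digest)
    return manifest, sent
```

One caveat worth raising: if the server trusts a client-supplied digest, a client can probe whether someone else already uploaded a given chunk, or reference content it never possessed, so a production design might scope deduplication per tenant or require proof of possession.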

Related Questions

1. Design a resumable file upload API for large binaries (client + server protocol)
2. Design a scalable video ingestion pipeline with transcoding and metadata extraction
3. Design a content delivery system for large videos including CDN and regional replication

Large Video Upload System Design - Coupang Backend | Voker