ml system design
Google
YouTube
TikTok

Google ML System Design: Fuzzy Video Deduplication

Topics:
Embedding Models
Approximate Nearest Neighbor
Vector Search
Roles:
Machine Learning Engineer
ML Engineer
Software Engineer (ML)
Experience:
Mid Level
Senior
Staff

Question Description

Design a fuzzy deduplication system that detects near-duplicate short videos in real time at Google scale. The system should ingest millions of uploads per day, reach a deduplication decision within seconds of upload, and let creators appeal false positives.

Core requirements: use learned embeddings (fusing video-frame, audio, and text signals) to detect fuzzy matches rather than relying on exact hashes; support very high concurrency (thousands of requests per second); be fault-tolerant and cost-aware; and include a human-in-the-loop appeal and calibration workflow.
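One way to picture the embedding fusion requirement is a simple late-fusion scheme, sketched below. This is only an illustration under stated assumptions: the function names, the modality weights (0.6, 0.25, 0.15), and the choice of weighted concatenation are hypothetical, and a production system might well use a learned fusion layer instead. Scaling each normalized modality by the square root of its weight makes the dot product of two fused vectors equal to the weighted sum of the per-modality cosine similarities.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return [0.0] * len(vec)
    return [x / norm for x in vec]


def fuse_embeddings(frame_vec, audio_vec, text_vec, weights=(0.6, 0.25, 0.15)):
    """Late fusion by weighted concatenation (weights are illustrative assumptions).

    Each modality is normalized, scaled by sqrt(weight), and concatenated,
    then the result is renormalized to guard against missing modalities.
    """
    fused = []
    for vec, weight in zip((frame_vec, audio_vec, text_vec), weights):
        fused.extend(math.sqrt(weight) * x for x in l2_normalize(vec))
    return l2_normalize(fused)


def dot(a, b):
    """Dot product; equals cosine similarity for unit-length inputs."""
    return sum(x * y for x, y in zip(a, b))


# Two uploads with identical frames and audio but unrelated text.
original = fuse_embeddings([1.0, 0.0], [0.0, 2.0], [3.0, 0.0])
reupload = fuse_embeddings([1.0, 0.0], [0.0, 2.0], [0.0, 3.0])
similarity = dot(original, reupload)  # 0.6 + 0.25 + 0.15 * 0 = 0.85
```

The design choice worth discussing in an interview is that fixed weights keep the similarity interpretable per modality, at the cost of not adapting to cases where one signal (for example, a re-captioned reupload) is deliberately altered.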

High-level flow: (1) a lightweight pre-filter (perceptual hashing, uploader metadata, and text/audio heuristics) to rule out obviously unique uploads early; (2) frame sampling and feature extraction with a compact embedding model; (3) two-stage retrieval: a low-dimensional ANN search for recall (HNSW or IVF+PQ via FAISS, ScaNN, or Milvus) followed by a high-dimensional rerank for precision; (4) decision logic (block, soft-flag, or warn) and immediate notification with appeal links; (5) a human review queue that feeds labeled cases back into offline retraining and threshold calibration.
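Steps (3) and (4) of the flow can be sketched as follows. This is a minimal illustration, not a reference design: an exact top-k scan stands in for the ANN index (a real system would query HNSW or IVF+PQ through a library such as FAISS), and the `IndexedVideo` type, function names, and the thresholds 0.95 / 0.90 / 0.80 are all hypothetical.

```python
import heapq
import math
from dataclasses import dataclass


@dataclass
class IndexedVideo:
    video_id: str
    coarse: list  # low-dimensional embedding, used for recall
    fine: list    # high-dimensional embedding, used for rerank


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    num = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return num / (na * nb) if na and nb else 0.0


def recall_stage(query_coarse, corpus, k):
    """Stand-in for ANN search: exact top-k by coarse similarity."""
    return heapq.nlargest(k, corpus, key=lambda v: cosine(query_coarse, v.coarse))


def rerank_stage(query_fine, candidates):
    """Rescore candidates with the fine embedding; return (best score, video id)."""
    scored = [(cosine(query_fine, v.fine), v.video_id) for v in candidates]
    return max(scored) if scored else (0.0, None)


def decide(score, block_at=0.95, flag_at=0.90, warn_at=0.80):
    """Cascade of thresholds (values are illustrative assumptions)."""
    if score >= block_at:
        return "block"
    if score >= flag_at:
        return "soft_flag"
    if score >= warn_at:
        return "warn"
    return "allow"


def dedup(query_coarse, query_fine, corpus, k=2):
    score, match_id = rerank_stage(query_fine, recall_stage(query_coarse, corpus, k))
    return decide(score), match_id, score


corpus = [
    IndexedVideo("a", [1.0, 0.0], [1.0, 0.0, 0.0, 0.0]),
    IndexedVideo("b", [0.0, 1.0], [0.0, 1.0, 0.0, 0.0]),
    IndexedVideo("c", [1.0, 0.1], [0.9, 0.0, 0.5, 0.0]),
]
decision, match_id, score = dedup([1.0, 0.0], [1.0, 0.0, 0.0, 0.0], corpus)
```

Here the coarse stage keeps videos "a" and "c", and the fine rerank separates them, so the upload is blocked as a duplicate of "a". Keeping the decision function separate from retrieval makes it straightforward to recalibrate thresholds from appeal outcomes without rebuilding the index.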

Skill signals you should show: designing scalable ANN indices and sharding, embedding model tradeoffs (128 vs 512 dims), latency vs accuracy modeling (memory, network, search cost, P95 latency), metrics (precision/recall/F1, A/B and offline evaluation), operational concerns (monitoring, rollback, cost, consistency), and designing a robust human-in-the-loop feedback loop to reduce false positives over time.
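For the metrics and feedback-loop signals, one concrete artifact to discuss is threshold calibration from human-review labels. The sketch below assumes reviewers produce (similarity score, is-true-duplicate) pairs and that the product sets a minimum precision target; the function names and the sample data are made up for illustration.

```python
def precision_recall_f1(scores, labels, threshold):
    """Treat score >= threshold as a predicted duplicate and compare with labels."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def calibrate_threshold(scores, labels, min_precision):
    """Return (threshold, precision, recall, f1) for the lowest observed score
    whose precision meets the target, or None if no threshold qualifies.

    Recall never increases as the threshold rises, so the lowest qualifying
    threshold is also the one with the highest recall.
    """
    for t in sorted(set(scores)):
        p, r, f1 = precision_recall_f1(scores, labels, t)
        if p >= min_precision:
            return t, p, r, f1
    return None


# Hypothetical review-queue outcomes: 1 = confirmed duplicate, 0 = false match.
review_scores = [0.99, 0.95, 0.90, 0.85, 0.80, 0.70]
review_labels = [1, 1, 0, 1, 0, 0]

relaxed = calibrate_threshold(review_scores, review_labels, min_precision=0.75)
strict = calibrate_threshold(review_scores, review_labels, min_precision=0.90)
```

On this toy data the relaxed target picks 0.85 (precision 0.75, recall 1.0), while the strict target picks 0.95 (precision 1.0, recall 2/3). Precision is not monotonic in the threshold (0.90 scores worse than 0.85 here), which is why the sweep checks every candidate rather than binary-searching.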

Common Follow-up Questions

  • How would you quantify and model the tradeoff between embedding dimension, memory footprint, and P95 latency (show calculations for 128 vs 512 dims)?
  • Design the ANN sharding and replication strategy to support thousands of concurrent nearest-neighbor queries with low tail latency. How do you handle hot shards?
  • What techniques would you use to reduce false positives (creator-appealed false matches) while preserving recall? Discuss cascade thresholds, multimodal signals, and reranking.
  • How would you instrument and A/B test the deduplication pipeline in production to validate accuracy and user impact before full rollout?
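As a starting point for the first follow-up (128 vs 512 dimensions), the memory side of the tradeoff is plain arithmetic. The sketch below assumes a corpus of one billion vectors, float32 components, 64 GiB of usable RAM per index host, and a 64-byte product-quantization code as the compressed alternative; all of these figures are illustrative assumptions, and graph or inverted-list overhead is ignored. Latency is deliberately not modeled, since P95 depends on the index type and hardware and has to be measured.

```python
import math

GIB = 2 ** 30


def index_memory_gib(num_vectors, bytes_per_vector):
    """Raw vector storage in GiB, ignoring index structure overhead."""
    return num_vectors * bytes_per_vector / GIB


def shards_needed(total_gib, usable_gib_per_host):
    """Minimum number of shards so that each shard fits in one host's RAM."""
    return math.ceil(total_gib / usable_gib_per_host)


NUM_VECTORS = 10 ** 9        # assumed corpus size
HOST_RAM_GIB = 64            # assumed usable RAM per host

flat_128 = index_memory_gib(NUM_VECTORS, 128 * 4)  # float32, 128 dims
flat_512 = index_memory_gib(NUM_VECTORS, 512 * 4)  # float32, 512 dims
pq_64 = index_memory_gib(NUM_VECTORS, 64)          # 64-byte PQ codes

for name, gib in [("flat-128", flat_128), ("flat-512", flat_512), ("pq-64B", pq_64)]:
    print(f"{name}: {gib:.1f} GiB, {shards_needed(gib, HOST_RAM_GIB)} shard(s)")
```

Under these assumptions the 128-dimensional index needs roughly 477 GiB (8 shards), the 512-dimensional index roughly 1907 GiB (30 shards), and 64-byte PQ codes roughly 60 GiB (1 shard). The 4x gap between the flat indices is one argument for searching a compact representation first and reserving full-precision vectors for the rerank stage.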

Related Questions

1. Design a scalable video fingerprinting and content ID system for copyright enforcement
2. Build a low-latency approximate nearest neighbor service for billion-vector search
3. Design a human-in-the-loop ML workflow for content moderation with retraining pipelines
