Snapchat ML System Design: Real-Time Multimodal Moderation
Question Description
You're asked to design a real-time multimodal harmful content detection system for a large social platform (text, images, video). The goal is to flag hate speech, harassment, graphic violence, explicit material, and spam in incoming posts, integrate with recommendations to down-rank or filter content, and surface suspicious items for human review while keeping latency, scale, and accuracy constraints in mind.
High-level flow: ingest posts via API/streaming, pre-process text (tokenization, profanity masks) and media (thumbnailing, keyframe extraction, audio-to-text), run fast heuristics and lightweight classifiers for immediate filtering, and then run a multimodal fusion model (text + image + audio/video features) to produce a harmfulness score and category. Use thresholding and an ensemble of models to decide actions: auto-block, down-rank, flag for review, or shadow-mode logging for offline evaluation.
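The decision step at the end of this flow can be sketched as a small thresholding policy. The thresholds, blocklist, and action names below are illustrative assumptions, not a specification; in practice they would be tuned offline against policy targets.

```python
# Hypothetical two-stage decision: cheap heuristics first, then the fused
# harmfulness score mapped to an action via thresholds (values illustrative).
from dataclasses import dataclass

BLOCK_THRESHOLD = 0.95      # assumed policy thresholds, tuned offline
REVIEW_THRESHOLD = 0.70
DOWNRANK_THRESHOLD = 0.40

BLOCKLIST = {"<banned-term-1>", "<banned-term-2>"}  # placeholder heuristic list

@dataclass
class Decision:
    action: str   # "auto_block" | "flag_for_review" | "down_rank" | "log_only"
    score: float

def heuristic_block(text: str) -> bool:
    """Fast pre-filter: an exact blocklist hit short-circuits the model."""
    tokens = set(text.lower().split())
    return bool(tokens & BLOCKLIST)

def decide(text: str, fused_score: float) -> Decision:
    """Map the fusion model's harmfulness score to a moderation action."""
    if heuristic_block(text):
        return Decision("auto_block", 1.0)
    if fused_score >= BLOCK_THRESHOLD:
        return Decision("auto_block", fused_score)
    if fused_score >= REVIEW_THRESHOLD:
        return Decision("flag_for_review", fused_score)
    if fused_score >= DOWNRANK_THRESHOLD:
        return Decision("down_rank", fused_score)
    return Decision("log_only", fused_score)  # shadow-mode logging
```

Keeping the heuristic pass ahead of the model call is what makes "immediate filtering" cheap: obvious violations never consume model capacity.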
What you'll be evaluated on: system architecture for horizontal scaling (sharding, batching, model serving), low-latency inference strategies (quantization, pruning, model distillation, GPU/accelerator utilization), data engineering for continuous retraining (handling class imbalance, label quality), and product/policy trade-offs (precision vs recall, explainability to moderators, privacy constraints). Be ready to discuss metrics (precision, recall, F1, latency percentiles), monitoring and auditing, feedback loops from human review, and how you would evolve the pipeline as new harmful content types emerge.
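The metrics named above are easy to state precisely. A minimal sketch, using plain Python on synthetic labels and latencies (a nearest-rank percentile is used for the latency percentiles):

```python
# Precision / recall / F1 from binary labels, and a latency percentile,
# as referenced in the evaluation criteria. All inputs are synthetic.
def precision_recall_f1(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

def latency_percentile(latencies_ms, p):
    """Nearest-rank percentile, e.g. p=0.99 for p99 latency."""
    ordered = sorted(latencies_ms)
    idx = min(len(ordered) - 1, int(p * len(ordered)))
    return ordered[idx]
```

In production these would come from a metrics library, but the definitions matter for the precision-vs-recall trade-off discussion: moving a threshold shifts fp into fn and vice versa, never both down at once.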
Common Follow-up Questions
- How would you design the model serving layer to meet <100ms latency at 10k req/s — discuss batching, caching, GPU vs CPU, and autoscaling strategies?
- What evaluation and monitoring pipeline would you implement to detect model drift, measure precision/recall in production, and trigger retraining?
- How would you handle video moderation specifically (frame sampling, temporal models, audio transcription) while minimizing compute cost?
- Discuss strategies to make detections explainable for moderators and to reduce false positives without sacrificing recall.
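For the batching question, the core mechanism is server-side micro-batching: queue requests and flush either when the batch fills or when a small time budget expires, trading a few milliseconds of latency for GPU throughput. A minimal sketch (the knob values, request shape, and `run_model` callable are assumptions for illustration):

```python
# Micro-batching worker: collect up to MAX_BATCH requests or wait at most
# MAX_WAIT_MS, then run one batched forward pass and fan results back out.
import queue
import time

MAX_BATCH = 32      # assumed tuning knobs; real values come from load tests
MAX_WAIT_MS = 5

def batch_worker(req_queue, run_model):
    """Each request is a dict: {"input": ..., "future": Future}."""
    while True:
        batch = [req_queue.get()]  # block until the first request arrives
        deadline = time.monotonic() + MAX_WAIT_MS / 1000
        while len(batch) < MAX_BATCH:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            try:
                batch.append(req_queue.get(timeout=remaining))
            except queue.Empty:
                break  # time budget expired; flush what we have
        inputs = [r["input"] for r in batch]
        scores = run_model(inputs)  # one batched inference call
        for req, score in zip(batch, scores):
            req["future"].set_result(score)
```

The `MAX_WAIT_MS` budget bounds the latency cost of batching, so the p99 target (<100ms) constrains how long you can afford to wait for a fuller batch; autoscaling and caching then address the throughput target on top of this.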