Bloomberg System Design: Global News Aggregation & Trending
Question Description
You are asked to design the backend for a global news platform that ingests articles from publishers and the Newser API (which only returns one region per call), processes real-time user interactions, computes trending scores, and serves a low-latency, personalized news feed and search to users worldwide.
Key challenges you must address include aggregating across regions even though Newser returns only one region per call, respecting API rate limits and quotas, supporting publisher push ingestion, and calculating real-time trending with time-decay while keeping results consistent and fast.
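To make the one-region-per-call constraint concrete, here is a minimal sketch of fanning out over regions and merging duplicates. The `fetch_region` callable, the article fields (`title`, `publisher`), and the fingerprint rule are all assumptions for illustration; the real Newser response shape is not specified in the question.

```python
import hashlib
import re
from typing import Callable, Dict, Iterable, List


def fingerprint(article: dict) -> str:
    """Stable duplicate key: publisher plus a normalized title (assumed rule)."""
    title = re.sub(r"[^a-z0-9]+", " ", article["title"].lower()).strip()
    raw = f"{article['publisher'].lower()}|{title}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def aggregate(regions: Iterable[str],
              fetch_region: Callable[[str], List[dict]]) -> List[dict]:
    """Call the per-region fetcher once per region and merge duplicates."""
    merged: Dict[str, dict] = {}
    for region in regions:
        for article in fetch_region(region):
            key = fingerprint(article)
            if key in merged:
                # Same story seen in another region: record the extra region.
                merged[key]["regions"].add(region)
            else:
                merged[key] = {**article, "regions": {region}}
    return list(merged.values())
```

In a real system the loop over regions would run in parallel workers and the fingerprint would likely use a canonical URL or near-duplicate hashing; the exact-match key here is only the simplest workable choice.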
High-level flow/stages you should cover:
- Regional ingestion: parallel region workers, regional fetch scheduler, and a deduplication layer to merge duplicate articles.
- Stream processing: durable message bus (e.g., Kafka) and stream processors (Flink/Beam) to compute time-decayed trending scores and materialize views.
- Storage & indexing: a write-optimized store for raw articles, a search index (Elasticsearch/OpenSearch) for keyword queries, and a fast cache (Redis + CDN) for feeds.
- Serving layer: feed and search APIs, personalization filters (language, interest, region), and push updates (WebSockets/SSE) for real-time UI.
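One common way to realize the time-decayed trending score from the stream-processing stage is exponential decay applied lazily, so each article keeps only a value and a timestamp as per-key state. The sketch below is an assumption about how it could look, not a prescribed algorithm; the one-hour half-life and the `DecayedScore` name are illustrative.

```python
import math
from dataclasses import dataclass

HALF_LIFE_SECONDS = 3600.0  # assumed: an interaction loses half its weight per hour
DECAY_RATE = math.log(2) / HALF_LIFE_SECONDS


@dataclass
class DecayedScore:
    value: float = 0.0       # score as of `updated_at`
    updated_at: float = 0.0  # event time (seconds) of the latest update

    def add(self, weight: float, event_time: float) -> None:
        """Fold one interaction (view, click, share) into the score."""
        if event_time >= self.updated_at:
            # Decay the stored score forward to the event, then add the event.
            elapsed = event_time - self.updated_at
            self.value = self.value * math.exp(-DECAY_RATE * elapsed) + weight
            self.updated_at = event_time
        else:
            # Late event: decay its weight forward to the stored timestamp.
            lag = self.updated_at - event_time
            self.value += weight * math.exp(-DECAY_RATE * lag)

    def at(self, now: float) -> float:
        """Read the score as of `now` without mutating state."""
        return self.value * math.exp(-DECAY_RATE * max(0.0, now - self.updated_at))
```

Because the state per article is two floats, this shape fits keyed state in a stream processor and stays cheap at high cardinality; a windowed count is an alternative when exact per-window totals are required.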
Skill signals interviewers look for: distributed systems and scaling patterns, stream processing and windowing, search indexing and real-time index updates, API design and rate-limit strategies, data modeling for time-decay metrics, caching and cache invalidation, and operational concerns (monitoring, retries, fault tolerance).
In your design, justify the trade-offs between consistency and latency, describe strategies to reduce Newser API cost (batching, caching, region prioritization), and explain how you would test correctness and recover from failures.
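Region prioritization under a fixed call budget can be sketched as a simple ranking step. Everything here is hypothetical: the `RegionState` fields, the urgency formula (priority times staleness), and the idea that the budget is a number of calls per scheduling tick.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RegionState:
    name: str
    priority: float    # assumed weight, e.g. share of active users in the region
    last_fetch: float  # epoch seconds of the last successful fetch


def pick_regions(states: List[RegionState], now: float, budget: int) -> List[str]:
    """Return up to `budget` region names, most urgent first.

    Urgency = priority * seconds since last fetch, so busy regions refresh
    more often while quiet regions still rise to the top as they go stale.
    """
    ranked = sorted(
        states,
        key=lambda s: s.priority * (now - s.last_fetch),
        reverse=True,
    )
    return [s.name for s in ranked[:budget]]
```

For example, with a budget of two calls per tick, a low-priority region that has not been fetched for a long time can outrank a high-priority region that was fetched seconds ago, which is the freshness-versus-cost trade-off the scheduler has to expose.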
Common Follow-up Questions
- How would you design the ingestion scheduler to minimize Newser API calls and handle regional rate limits while ensuring freshness?
- Describe the time-decay trending algorithm and how you would implement it in a stream processor to support sliding windows and high cardinality.
- How would you ensure search indices reflect near-real-time updates (new articles and interaction-based boosts) without overloading the indexer?
- Explain your approach to personalization: how do you combine global trending scores with user preferences (interests, language, region) at low latency?
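For the personalization follow-up, one possible answer is to precompute global trending scores and apply a cheap per-user re-rank at request time. The sketch below assumes hypothetical article and user fields (`language`, `topics`, `region`, `interests`) and arbitrary boost weights; it treats language as a hard filter and interest and region as multiplicative boosts.

```python
from typing import List, Optional, Tuple


def personalized_score(trending: float, article: dict, user: dict,
                       w_interest: float = 0.5,
                       w_region: float = 0.2) -> Optional[float]:
    """Blend a global trending score with user preferences (assumed weights)."""
    if article["language"] not in user["languages"]:
        return None  # hard filter: never show unreadable articles
    boost = 1.0
    if user["interests"] & set(article["topics"]):
        boost += w_interest
    if article["region"] == user["region"]:
        boost += w_region
    return trending * boost


def rank_feed(candidates: List[Tuple[dict, float]], user: dict,
              limit: int = 20) -> List[str]:
    """Re-rank (article, trending_score) pairs and return the top article ids."""
    scored = []
    for article, trending in candidates:
        score = personalized_score(trending, article, user)
        if score is not None:
            scored.append((score, article["id"]))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [article_id for _, article_id in scored[:limit]]
```

Keeping the blend to a few arithmetic operations per candidate means the expensive part (trending computation) stays shared across all users, while the per-request work is bounded by the candidate list size.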