Google System Design: Twitter Hashtag Aggregator Guide
Question Description
Designing a Twitter-like hashtag aggregator tests your ability to build high-throughput, low-latency analytics pipelines that support real-time and historical queries.
You are asked to ingest live tweet events (hashtags, timestamps, engagement metrics), compute aggregated metrics over configurable windows (minute/hour/day), and serve queries such as top trending hashtags, time-series counts, and engagement summaries. In the interview, walk through an end-to-end flow: ingest → stream processing & windowing → aggregation/storage → query API and dashboard.
Start by explaining ingestion (Pub/Sub/Kafka), partitioning by hashtag or user, and how you’ll shard for scale. Then describe stateful stream processing: tumbling/sliding windows, watermarking to handle late events, and fault-tolerant state backends (e.g., Kafka Streams/Flink with durable checkpoints). Discuss aggregation storage: a real-time store (Druid/ClickHouse/Redis timeseries) for fast top-K queries and a cold OLAP store (BigQuery/S3 + batch jobs) for long-range analytics.
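The windowing and watermarking ideas above can be sketched in a few lines. This is a minimal, illustrative Python model (not Flink or Kafka Streams code): the `TumblingWindowAggregator` class, its parameters, and the one-minute window size are all hypothetical choices for demonstration. The watermark trails the maximum observed event time by an allowed-lateness bound; windows whose end falls behind the watermark are closed and emitted, and events arriving for already-closed windows are counted as late.

```python
from collections import defaultdict

class TumblingWindowAggregator:
    """Hypothetical sketch: per-hashtag counts in event-time tumbling windows.

    The watermark is (max event time seen) - (allowed lateness). A window
    [start, start + window_ms) is closed once its end <= watermark.
    """

    def __init__(self, window_ms=60_000, allowed_lateness_ms=30_000):
        self.window_ms = window_ms
        self.allowed_lateness_ms = allowed_lateness_ms
        self.max_event_time = 0
        # window_start -> hashtag -> count
        self.open_windows = defaultdict(lambda: defaultdict(int))
        self.late_events = 0  # a real system might route these to a side output

    def watermark(self):
        return self.max_event_time - self.allowed_lateness_ms

    def ingest(self, hashtag, event_time_ms):
        """Add one event; return any windows closed by the advancing watermark."""
        window_start = event_time_ms - (event_time_ms % self.window_ms)
        if window_start + self.window_ms <= self.watermark():
            # The event's window was already closed: too late to include.
            self.late_events += 1
            return []
        self.open_windows[window_start][hashtag] += 1
        self.max_event_time = max(self.max_event_time, event_time_ms)
        return self._flush_closed()

    def _flush_closed(self):
        closed = []
        for ws in sorted(self.open_windows):
            if ws + self.window_ms <= self.watermark():
                closed.append((ws, dict(self.open_windows.pop(ws))))
        return closed
```

In a real deployment this state would live in a fault-tolerant backend (e.g. RocksDB behind Flink) with durable checkpoints, so the aggregator can be restored and the stream replayed after a failure.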
Cover read patterns and database trade-offs: range queries and heavy aggregations favor columnar, time-series, or OLAP systems, while write-heavy counters fit wide-column stores with materialized aggregates. Finally, highlight non-functional considerations: partitioning, replication, backpressure, monitoring, and SLOs such as a 100 ms read-latency target. Demonstrating these skills shows you can reason about throughput, correctness, durability, and operational complexity.
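To make the materialized-aggregate idea concrete, here is a small Python sketch of the read path, assuming a hypothetical store that keeps pre-aggregated per-minute hashtag counts (the kind of view a stream job would maintain). A top-K query then merges a handful of minute buckets instead of scanning raw events, which is how a 100 ms read SLO stays feasible.

```python
from collections import Counter
import heapq

def top_k(buckets, start_minute, end_minute, k):
    """Return the k most frequent hashtags over [start_minute, end_minute].

    `buckets` is a hypothetical materialized view: minute -> {hashtag: count}.
    Merging pre-aggregated buckets is O(minutes * hashtags), independent of
    the raw tweet volume that produced them.
    """
    merged = Counter()
    for minute in range(start_minute, end_minute + 1):
        merged.update(buckets.get(minute, {}))
    return heapq.nlargest(k, merged.items(), key=lambda kv: kv[1])

# Example: two minute-buckets of pre-aggregated counts.
buckets = {0: {"#ai": 5, "#go": 2}, 1: {"#ai": 1, "#rust": 4}}
print(top_k(buckets, 0, 1, 2))
```

In a production system the same pattern is served by a real-time OLAP store (Druid/ClickHouse) or a Redis sorted set per window, with a cache in front for hot queries.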
Common Follow-up Questions
- How would you handle out-of-order and late-arriving tweets when computing time-windowed aggregates (explain watermarks and event-time windowing)?
- Design a near real-time top-K hashtags service: how do you compute heavy hitters at scale (discuss Count-Min Sketch, Top-K, and exact vs approximate trade-offs)?
- Compare storage choices for fast reads and large-scale aggregation: why choose Druid/ClickHouse vs Cassandra vs a TSDB for this use case?
- How would you ensure durability and no data loss in the face of broker or processor failures (discuss replication, checkpointing, and replay strategies)?
- What strategies would you use to meet a 100ms read-latency SLO for common queries while keeping costs reasonable (caching, pre-aggregation, and materialized views)?
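For the heavy-hitters follow-up, a Count-Min Sketch answers frequency queries in sublinear space at the cost of a bounded overestimate. The sketch below is a minimal illustrative Python implementation (the `width`/`depth` defaults and the use of salted BLAKE2 digests as the hash family are assumptions, not a reference design); estimates never undercount, because each counter can only be inflated by colliding items.

```python
import hashlib

class CountMinSketch:
    """Approximate frequency counter: depth rows of width counters.

    estimate() takes the minimum across rows, so it never underestimates;
    the overestimate shrinks as width grows relative to total insertions.
    """

    def __init__(self, width=2048, depth=4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _indexes(self, item):
        # One independent hash per row, derived by salting BLAKE2b with the row id.
        for row in range(self.depth):
            digest = hashlib.blake2b(
                item.encode(), salt=row.to_bytes(8, "little")
            ).digest()
            yield row, int.from_bytes(digest[:8], "little") % self.width

    def add(self, item, count=1):
        for row, idx in self._indexes(item):
            self.table[row][idx] += count

    def estimate(self, item):
        return min(self.table[row][idx] for row, idx in self._indexes(item))
```

To serve top-K, such a sketch is typically paired with a small heap of candidate hashtags: on each insert, re-estimate the item's count and keep only the K largest, trading exactness for constant memory per window.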