NVIDIA System Design Interview: Distributed Rate Limiter
Question Description
You are asked to design a distributed rate limiter that integrates with an API gateway serving millions of users. The goal is to protect backend services from excessive traffic while keeping per-request latency low and allowing flexible, real-time configuration.
Core problem: For each incoming request your gateway must evaluate identifiers (user ID, API key, IP) against configurable rules and decide whether to allow, delay, or reject the request (HTTP 429). You must support at least two algorithms (e.g., token bucket and sliding window log), provide metrics and logs, and enable dynamic updates for different classes of users (free vs premium).
How an interview typically flows:
- Clarify scope: single region vs multi-region, per-user vs per-endpoint limits, soft vs hard limits.
- Propose high-level architecture: components, data flow, and where the limiter plugs into the API gateway.
- Drill into design choices: algorithm selection, distributed counters, state sharding, consistency/accuracy trade-offs, failure modes and fallbacks.
- Walk through scaling and HA: horizontal scaling, replication, and latency budget.
- Add observability and operational controls: metrics, dashboards, and runtime config rollout.
Skill signals you should show: distributed-systems reasoning (consensus/consistency trade-offs), data structures for counters, caching and persistence choices (e.g., Redis, consistent hashing), algorithm pros/cons (token bucket vs sliding window), and operational thinking for monitoring, throttling strategies, and dynamic config rollout.
Common Follow-up Questions
- •How would you implement global (cross-region) rate limits while keeping per-request latency under 10ms?
- •Compare token bucket and sliding window log in terms of memory use, accuracy, and burst handling—when would you pick each?
- •If your primary counter store (e.g., Redis cluster) becomes unavailable, what fallback strategies ensure continued enforcement without allowing limit bypass?
- •How would you support complex, hierarchical rules (global, per-tenant, per-endpoint) and evaluate them efficiently at high QPS?
- •Describe how you would roll out dynamic limit changes safely (canary, feature flags) and ensure consistency during transitions.
Related Questions
Explore More Questions
Practice This Question with AI
Get real-time hints, detailed requirements, and insightful analysis of the question.