NVIDIA System Design Interview: Distributed Rate Limiter

Topics: Distributed Systems, API Gateways, Rate Limiting
Roles: Software Engineer, Backend Engineer, Site Reliability Engineer
Experience: Mid Level, Senior, Staff

Question Description

You are asked to design a distributed rate limiter that integrates with an API gateway serving millions of users. The goal is to protect backend services from excessive traffic while keeping per-request latency low and allowing flexible, real-time configuration.

Core problem: for each incoming request, the gateway must evaluate identifiers (user ID, API key, IP address) against configurable rules and decide whether to allow, delay, or reject the request (HTTP 429 Too Many Requests). You must support at least two algorithms (e.g., token bucket and sliding window log), expose metrics and logs, and allow dynamic rule updates for different classes of users (free vs. premium).
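As a starting point, the token bucket algorithm is often the first one sketched in this interview. Below is a minimal single-node version in Python; the class and parameter names are illustrative, and a distributed deployment would keep the token state in a shared store rather than in process memory.

```python
import time

class TokenBucket:
    """Single-node token bucket sketch.

    capacity: maximum burst size (tokens the bucket can hold).
    refill_rate: tokens added per second (the sustained rate limit).
    """

    def __init__(self, capacity: float, refill_rate: float):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

With capacity 2 and a refill rate of 1 token/second, two back-to-back requests are admitted as a burst and a third is rejected until tokens refill.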

How an interview typically flows:

  • Clarify scope: single region vs multi-region, per-user vs per-endpoint limits, soft vs hard limits.
  • Propose high-level architecture: components, data flow, and where the limiter plugs into the API gateway.
  • Drill into design choices: algorithm selection, distributed counters, state sharding, consistency/accuracy trade-offs, failure modes and fallbacks.
  • Walk through scaling and HA: horizontal scaling, replication, and latency budget.
  • Add observability and operational controls: metrics, dashboards, and runtime config rollout.
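To make the "where it plugs in" step concrete, the gateway integration is usually a thin middleware hook that consults the limiter before routing. A sketch, assuming a hypothetical `limiter` object exposing `allow(key) -> bool` (the interface and function names here are not from the original question):

```python
from http import HTTPStatus

def route_to_backend(client_id: str) -> str:
    # Stub standing in for the real upstream call.
    return f"ok:{client_id}"

def handle_request(limiter, client_id: str):
    """Gateway middleware hook: check the limiter before routing.

    `limiter` is any object with allow(key) -> bool.
    Returns (status, headers, body).
    """
    if not limiter.allow(client_id):
        # 429 plus Retry-After gives well-behaved clients a backoff hint.
        return (HTTPStatus.TOO_MANY_REQUESTS,
                {"Retry-After": "1"},
                "rate limit exceeded")
    return HTTPStatus.OK, {}, route_to_backend(client_id)
```

Keeping the limiter behind this one-method interface lets you swap algorithms (or fall back to a local limiter when the shared store is down) without touching routing code.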

Skill signals to demonstrate: distributed-systems reasoning (consensus and consistency trade-offs), data structures for counters, caching and persistence choices (e.g., Redis, consistent hashing), algorithm trade-offs (token bucket vs. sliding window log), and operational thinking around monitoring, throttling strategies, and dynamic config rollout.
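Of these signals, consistent hashing is the one most often asked to be sketched on the spot, since it decides which counter node owns a given key. A minimal ring with virtual nodes, in Python (node names and the 64-vnode default are illustrative assumptions):

```python
import bisect
import hashlib

class HashRing:
    """Consistent-hash ring for sharding limiter state across counter nodes.

    Virtual nodes smooth the key distribution so adding or removing a
    node only remaps a small fraction of keys.
    """

    def __init__(self, nodes, vnodes: int = 64):
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(s: str) -> int:
        # Stable 64-bit hash derived from MD5 (speed and uniformity,
        # not cryptographic strength, are what matter here).
        return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big")

    def node_for(self, key: str) -> str:
        # Walk clockwise to the first vnode at or after the key's hash.
        idx = bisect.bisect(self._keys, self._hash(key)) % len(self._keys)
        return self._ring[idx][1]
```

The same key always maps to the same node, so all increments for `user:42` land on one shard and its counter stays accurate without cross-node coordination.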

Common Follow-up Questions

  • How would you implement global (cross-region) rate limits while keeping per-request latency under 10ms?
  • Compare token bucket and sliding window log in terms of memory use, accuracy, and burst handling—when would you pick each?
  • If your primary counter store (e.g., Redis cluster) becomes unavailable, what fallback strategies ensure continued enforcement without allowing limit bypass?
  • How would you support complex, hierarchical rules (global, per-tenant, per-endpoint) and evaluate them efficiently at high QPS?
  • Describe how you would roll out dynamic limit changes safely (canary, feature flags) and ensure consistency during transitions.
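For the token-bucket vs. sliding-window-log comparison above, it helps to have the second algorithm in hand too. A single-node sketch (names are illustrative): the log is exact within its window but costs O(limit) memory per key, versus O(1) for a token bucket.

```python
import time
from collections import deque

class SlidingWindowLog:
    """Single-node sliding-window-log limiter.

    Keeps a timestamp per admitted request; exact within the window,
    at the cost of O(limit) memory per tracked key.
    """

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.events = deque()

    def allow(self) -> bool:
        now = time.monotonic()
        # Evict timestamps that have aged out of the window.
        while self.events and now - self.events[0] >= self.window:
            self.events.popleft()
        if len(self.events) < self.limit:
            self.events.append(now)
            return True
        return False
```

This makes the trade-off concrete: the log never over- or under-counts inside the window, while the token bucket uses constant memory and tolerates bursts up to its capacity.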

Related Questions

1. Design an API Gateway that supports authentication, routing, and throttling at scale
2. Implement a distributed counter for high-throughput event tracking (Redis vs. CRDT approaches)
3. Design a quota management and billing system for tiered API access
4. Design load-shedding and graceful degradation mechanisms for overloaded services
5. Design a global rate-limiting service with eventual consistency and bounded error guarantees
