ByteDance System Design: Reward Backend for High QPS
Question Description
Overview
You are asked to design a scalable, resilient backend for a mobile app rewards mechanism in which users earn and redeem points. The system must sustain high QPS (e.g., 10k req/s) during promotions while preventing double-spend, preserving auditability, and keeping latency low.
Core problem & constraints
You should cover accurate crediting and deduction of points, near-real-time balance updates, audit logging, user notifications, and graceful behavior under load spikes. Non-functional constraints include 99.9% availability, sub-200ms latency for most requests, fault tolerance, and cost-efficiency. Expect to trade strong consistency against eventual consistency in exchange for throughput.
High-level flow / interview stages
- Ingest: API gateway + rate limiter + authentication
- Validation: verify action (ad view, purchase, task) possibly via event callbacks
- Accounting: update user balance (fast path via cache/sharded counter, durable write to ledger)
- Redemption: reserve points, atomically complete deduction, trigger fulfillment
- Audit & monitoring: append immutable transaction logs (event stream) and write to OLTP for reconciliation
In an interview you'll be expected to sketch components, explain data flow, and justify consistency and failure-handling choices.
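The accounting and redemption steps above can be sketched as a small in-memory ledger. This is only an illustration under stated assumptions: the class name `RewardLedger`, its method names, and the use of a single process-wide lock are all hypothetical, and a real system would back this with a durable store rather than Python dictionaries. It shows idempotent crediting (a retried request with the same key is applied once) and a reserve/commit/cancel flow for redemption.

```python
import threading


class RewardLedger:
    """Hypothetical in-memory stand-in for the accounting service."""

    def __init__(self):
        self._lock = threading.Lock()
        self._balances = {}   # user_id -> spendable points
        self._credits = {}    # idempotency key -> balance returned by first attempt
        self._holds = {}      # reservation key -> [user_id, points, state]
        self.audit_log = []   # append-only list of applied operations

    def balance(self, user_id):
        with self._lock:
            return self._balances.get(user_id, 0)

    def credit(self, user_id, points, key):
        """Add points once per idempotency key; retries return the first result."""
        with self._lock:
            if key in self._credits:
                return self._credits[key]
            new_balance = self._balances.get(user_id, 0) + points
            self._balances[user_id] = new_balance
            self._credits[key] = new_balance
            self.audit_log.append(("credit", user_id, points, key))
            return new_balance

    def reserve(self, user_id, points, key):
        """Move points out of the spendable balance into a hold."""
        with self._lock:
            if key in self._holds:
                # Retry of a known reservation: do not deduct again.
                return self._holds[key][2] != "cancelled"
            available = self._balances.get(user_id, 0)
            if points > available:
                return False
            self._balances[user_id] = available - points
            self._holds[key] = [user_id, points, "held"]
            self.audit_log.append(("reserve", user_id, points, key))
            return True

    def commit(self, key):
        """Finalize a hold after fulfillment succeeds; safe to retry."""
        with self._lock:
            hold = self._holds.get(key)
            if hold is None or hold[2] == "cancelled":
                return False
            if hold[2] == "held":
                hold[2] = "committed"
                self.audit_log.append(("commit", hold[0], hold[1], key))
            return True

    def cancel(self, key):
        """Release a hold back to the balance if fulfillment fails; safe to retry."""
        with self._lock:
            hold = self._holds.get(key)
            if hold is None or hold[2] == "committed":
                return False
            if hold[2] == "held":
                self._balances[hold[0]] += hold[1]
                hold[2] = "cancelled"
                self.audit_log.append(("cancel", hold[0], hold[1], key))
            return True
```

Because points leave the spendable balance at reservation time, a second redemption that arrives while the first is still in flight sees the reduced balance and is rejected, which is one way to rule out double-spend. In an interview you would likely map the lock to a per-user database transaction or a conditional update rather than a global mutex.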
Skill signals interviewers look for
- Designing stateless services, load balancing, and autoscaling
- Caching strategies (read-through/write-through, sharded counters) and cache invalidation
- Rate limiting (token bucket, per-user/global tiers) and backpressure
- Concurrency control and idempotency to avoid double-spend (optimistic locks, distributed locks, compare-and-swap)
- Durable event logging (Kafka/event sourcing) and reconcilers for eventual consistency
- Monitoring, SLA-aware retries, circuit breakers, and cost/performance trade-offs
Prepare to discuss concrete technologies (Redis, Kafka, RDBMS), sharding, and fallback behavior during outages.
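As one possible illustration of the rate-limiting signal above, here is a minimal token-bucket sketch with a per-user tier and a global tier. The names `TokenBucket` and `TieredLimiter`, the constructor parameters, and the injectable `clock` are assumptions made for this example. The sketch is single-process and not thread-safe; a production limiter would more likely keep its counters in a shared store such as Redis.

```python
import time


class TokenBucket:
    """Refills at `rate` tokens per second, up to `capacity` tokens."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = float(capacity)  # start full so bursts are allowed
        self._last = clock()

    def allow(self, cost=1.0):
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill lazily based on time since the last call.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


class TieredLimiter:
    """A request passes only if both the user's bucket and the global bucket allow it."""

    def __init__(self, global_rate, global_burst, user_rate, user_burst,
                 clock=time.monotonic):
        self._clock = clock
        self._user_rate = user_rate
        self._user_burst = user_burst
        self._global = TokenBucket(global_rate, global_burst, clock)
        self._users = {}  # user_id -> TokenBucket (unbounded here; evict in practice)

    def allow(self, user_id):
        bucket = self._users.get(user_id)
        if bucket is None:
            bucket = TokenBucket(self._user_rate, self._user_burst, self._clock)
            self._users[user_id] = bucket
        # Check the user tier first so a throttled user does not burn global tokens.
        # Simplification: a user token is still spent if the global tier then denies.
        if not bucket.allow():
            return False
        return self._global.allow()
```

Checking the per-user bucket before the global one keeps a single noisy client from draining the shared budget, which is the fairness property the per-user/global split is meant to provide. Rejected requests would typically receive an HTTP 429 so that clients back off.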
Common Follow-up Questions
- How would you design the system to guarantee idempotent reward operations and prevent double-spending under retries and network partitions?
- Describe a sharding strategy for user balances and how you'd rebalance shards without downtime during growth or promotions.
- How would you implement rate limiting and backpressure for global promotions while preserving a fair per-user experience?
- If Redis (cache) fails during a peak event, what fallback path ensures correctness and acceptable latency for earning and redeeming points?
- Explain how you'd design reconciliation and audit pipelines (event sourcing vs ledger writes) to detect and fix inconsistencies in balances.
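For the reconciliation question, one simple approach is to replay the immutable event log and compare the result with the balances held in the fast path. The sketch below assumes a hypothetical event shape (`event_id`, `user_id`, `type` of `"credit"` or `"debit"`, and `points`) and hypothetical function names; it is a starting point for discussion, not a complete pipeline.

```python
from collections import defaultdict


def replay(events):
    """Rebuild balances from the event log, skipping duplicate deliveries."""
    balances = defaultdict(int)
    seen = set()
    for event in events:
        if event["event_id"] in seen:
            continue  # an at-least-once stream may deliver the same event twice
        seen.add(event["event_id"])
        if event["type"] == "credit":
            balances[event["user_id"]] += event["points"]
        elif event["type"] == "debit":
            balances[event["user_id"]] -= event["points"]
        else:
            raise ValueError(f"unknown event type: {event['type']!r}")
    return dict(balances)


def reconcile(events, cached_balances):
    """Return {user_id: (expected, cached)} for every user whose balances disagree."""
    expected = replay(events)
    drift = {}
    for user_id in expected.keys() | cached_balances.keys():
        want = expected.get(user_id, 0)
        have = cached_balances.get(user_id, 0)
        if want != have:
            drift[user_id] = (want, have)
    return drift
```

A reconciler job could run this over a bounded window of events, raise an alert for each drifted user, and repair the cache from the replayed value, treating the log as the source of truth.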