Snowflake System Design: Versioned Key-Value Store
Question Description
You are asked to design a globally distributed, versioned key-value store that supports high read/write throughput across multiple regions, global versioning for every write, and time-travel queries to fetch historical values.
The core problem: store key-value pairs with a globally monotonic version (a timestamp or logical clock value) assigned to each write; support get and put; allow queries by timestamp or version, returning the closest prior version when no exact match exists; list the version history for a key; and provide configurable replication and consistency across regions.
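To make the required operations concrete, here is a minimal single-node sketch in Python. It is only an illustration under simplifying assumptions: the class name `VersionedStore`, its method names, and the use of a plain in-process counter as the "global" version are all hypothetical, and nothing here addresses replication or multiple regions. It shows the "closest prior version" lookup as a binary search over each key's ascending version list.

```python
import bisect
from collections import defaultdict
from typing import Any, List, Optional, Tuple


class VersionedStore:
    """In-memory sketch; a local counter stands in for the global version."""

    def __init__(self) -> None:
        self._versions = defaultdict(list)  # key -> ascending list of versions
        self._values = defaultdict(list)    # key -> values, parallel to _versions
        self._next_version = 1              # assumption: one process assigns versions

    def put(self, key: str, value: Any) -> int:
        """Append a new version of `key` and return the version assigned."""
        version = self._next_version
        self._next_version += 1
        self._versions[key].append(version)
        self._values[key].append(value)
        return version

    def get(self, key: str, at_version: Optional[int] = None) -> Optional[Any]:
        """Return the latest value, or the value as of `at_version`.

        If `at_version` has no exact match, the closest prior version is used.
        Returns None when the key did not exist at that point.
        """
        versions = self._versions.get(key)
        if not versions:
            return None
        if at_version is None:
            return self._values[key][-1]
        # Index of the last version <= at_version, or -1 if there is none.
        idx = bisect.bisect_right(versions, at_version) - 1
        return self._values[key][idx] if idx >= 0 else None

    def history(self, key: str) -> List[Tuple[int, Any]]:
        """List (version, value) pairs for `key`, oldest first."""
        return list(zip(self._versions.get(key, []), self._values.get(key, [])))
```

For example, after `put("a", "x")`, `put("b", "y")`, and `put("a", "z")` the versions are 1, 2, and 3, so `get("a", at_version=2)` falls back to version 1 and returns `"x"`. Because versions are appended in increasing order, each per-key list stays sorted and the lookup costs O(log n) in the number of versions of that key.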
Interview flow / stages you should prepare for:
- High-level architecture: partitioning (consistent hashing / range shards), replication topology, and how you assign global versions.
- Data-path details: write and read workflows, local vs cross-region reads, and how time-travel queries locate historical values.
- Failure and recovery: node failure detection, automatic rebalancing, and repair (anti-entropy) between replicas.
- Storage and operations: version compaction, tiered storage for old versions, soft vs hard deletes, and efficient index structures for timestamped versions.
Skill signals examiners expect: distributed systems fundamentals (consensus or logical clocks), replication and consistency trade-offs, storage optimization for historical versions, efficient partitioning and rebalancing, conflict resolution strategies (version vectors, LWW), and practical latency/availability engineering for multi-region services.
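The partitioning step mentioned above can be sketched with a consistent-hash ring. This is one possible approach, not a prescribed answer: `HashRing`, the `vnodes` parameter, the replica count, and the node names in the usage note are all assumptions chosen for illustration. The sketch places several virtual points per node on a ring and picks replicas by walking clockwise from the key's hash.

```python
import bisect
import hashlib
from typing import Iterable, List


def _hash(value: str) -> int:
    """Map a string to a 64-bit ring position using SHA-256."""
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring with virtual nodes (illustrative sketch)."""

    def __init__(self, nodes: Iterable[str], vnodes: int = 64) -> None:
        # Each physical node owns `vnodes` points, which evens out the load.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def replicas(self, key: str, n: int = 3) -> List[str]:
        """Walk clockwise from the key's position, collecting n distinct nodes."""
        if not self._ring:
            return []
        start = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        owners: List[str] = []
        for offset in range(len(self._ring)):
            node = self._ring[(start + offset) % len(self._ring)][1]
            if node not in owners:
                owners.append(node)
                if len(owners) == n:
                    break
        return owners
```

With four hypothetical nodes such as `"us-east-a"`, `"us-west-a"`, `"eu-west-a"`, and `"ap-south-a"`, `HashRing(nodes).replicas("user:42")` returns three distinct owners. Removing a node that is not among a key's owners leaves that key's placement unchanged, which is the property that keeps rebalancing traffic proportional to the capacity that was lost. A real design would also need region-aware placement so that replicas land in different failure domains.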
Common Follow-up Questions
- How would you assign and coordinate a global monotonic version across regions (physical timestamps, Lamport clocks, or hybrid logical clocks)? Explain trade-offs for latency and correctness.
- Describe how you'd implement efficient time-travel reads: indexing, storage layout (append-only logs vs versioned SSTables), and query path for retrieving the nearest prior version.
- If you choose eventual consistency across regions, how do you detect and resolve conflicting concurrent writes (version vectors, CRDTs, last-write-wins)?
- Explain storage compaction, retention policies, and tiered storage for historical versions to control costs while maintaining time-travel guarantees.
- How would you design automated rebalancing and failure recovery to meet 99.99% availability and p99 latency SLAs during node or zone outages?
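For the versioning question above, one of the listed options is a hybrid logical clock (HLC). The following is a minimal sketch of the usual HLC update rules, assuming millisecond wall-clock time and a `(wall, counter)` tuple as the timestamp; the class name, method names, and the injectable `now_ms` parameter are hypothetical and exist only to make the example testable.

```python
import time
from typing import Callable, Tuple

Timestamp = Tuple[int, int]  # (wall-clock milliseconds, logical counter)


def _system_ms() -> int:
    return time.time_ns() // 1_000_000


class HybridLogicalClock:
    """Per-node hybrid logical clock (illustrative sketch)."""

    def __init__(self, now_ms: Callable[[], int] = _system_ms) -> None:
        self._now_ms = now_ms  # injectable so tests can control physical time
        self._wall = 0
        self._counter = 0

    def tick(self) -> Timestamp:
        """Timestamp a local write or an outgoing message."""
        physical = self._now_ms()
        if physical > self._wall:
            self._wall, self._counter = physical, 0
        else:
            # Physical clock stalled or went backwards: advance the counter.
            self._counter += 1
        return (self._wall, self._counter)

    def observe(self, remote: Timestamp) -> Timestamp:
        """Merge a timestamp received from another node, then advance."""
        physical = self._now_ms()
        remote_wall, remote_counter = remote
        new_wall = max(self._wall, remote_wall, physical)
        if new_wall == self._wall and new_wall == remote_wall:
            self._counter = max(self._counter, remote_counter) + 1
        elif new_wall == self._wall:
            self._counter += 1
        elif new_wall == remote_wall:
            self._counter = remote_counter + 1
        else:
            self._counter = 0
        self._wall = new_wall
        return (self._wall, self._counter)
```

Timestamps from one node compare as ordinary tuples and never decrease, even if the physical clock steps backwards. Note that an HLC on its own gives causally consistent ordering rather than a single global sequence: two regions can issue equal timestamps for unrelated writes, so a design that relies on it typically appends a node or region identifier as a tie-breaker, or accepts the cross-region latency of a central sequencer or consensus group when strict global monotonicity is required.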