Databricks System Design: Multi-threaded Event Logger
Question Description
You are asked to design a high-performance, multi-threaded event logger that a thousand concurrent threads will share. The logger must capture structured events (timestamp, thread id, level, message, metadata) without blocking application threads and without losing data under high load.
A practical design you can present uses asynchronous, non-blocking ingestion: application threads serialize minimal event payloads and push them onto a concurrent in-memory structure (an MPSC queue or a lock-free ring buffer). A background worker (or small pool) handles batching, serialization (structured JSON or Avro), and writes to outputs: local files with rotation, the console, or remote sinks via a message queue such as Kafka. Bounding the queue, and optionally adding a small per-thread buffer, limits contention and enables batching, which improves throughput and tail latency.
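The ingestion design above can be sketched with the Python standard library alone: application threads enqueue small event dicts without blocking, and one background worker drains a bounded queue in batches and hands JSON lines to a sink. The class and parameter names (`AsyncEventLogger`, `batch_size`, `flush_interval`) are illustrative, not from any particular library.

```python
import json
import queue
import threading
import time

class AsyncEventLogger:
    def __init__(self, sink, max_queue=10_000, batch_size=256, flush_interval=0.05):
        self._q = queue.Queue(maxsize=max_queue)   # bounded: applies backpressure
        self._sink = sink                          # callable taking a list of JSON lines
        self._batch_size = batch_size
        self._flush_interval = flush_interval
        self._stop = threading.Event()
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def log(self, level, message, **metadata):
        event = {
            "ts": time.time(),
            "thread": threading.get_ident(),
            "level": level,
            "message": message,
            "meta": metadata,
        }
        try:
            self._q.put_nowait(event)              # never block the caller
            return True
        except queue.Full:
            return False                           # caller decides: drop or retry

    def _drain(self):
        batch = []
        while not (self._stop.is_set() and self._q.empty()):
            try:
                batch.append(self._q.get(timeout=self._flush_interval))
            except queue.Empty:
                pass
            # flush when the batch is full or there is a lull in traffic
            if batch and (len(batch) >= self._batch_size or self._q.empty()):
                self._sink([json.dumps(e) for e in batch])
                batch.clear()

    def close(self):
        self._stop.set()
        self._worker.join()
```

A production version would add rotation, retries, and metrics, but this is enough to demonstrate the non-blocking hot path and size-or-time flush policy in an interview.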
A typical interview flow: clarify requirements, pick concurrency primitives, sketch the components (ingest queue, batcher, output writer, retry/backpressure), discuss durability and failure modes, and walk through the trade-offs (latency vs. durability). Be explicit about configuration: dynamic log levels, pluggable outputs, and runtime toggles.
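The dynamic-log-level toggle mentioned above can be sketched as a small shared gate that every thread consults on the hot path; `LevelGate` and its method names are hypothetical. The lock keeps level changes well-defined across runtimes, while the read in `enabled` stays cheap.

```python
import threading

# numeric severities, lowest to highest
LEVELS = {"DEBUG": 10, "INFO": 20, "WARN": 30, "ERROR": 40}

class LevelGate:
    """Runtime-adjustable minimum log level shared by all threads."""

    def __init__(self, level="INFO"):
        self._level = LEVELS[level]
        self._lock = threading.Lock()

    def set_level(self, level):
        # called rarely, e.g. from a config-reload endpoint
        with self._lock:
            self._level = LEVELS[level]

    def enabled(self, level):
        # cheap check performed on every log call
        return LEVELS[level] >= self._level
```

Application threads call `gate.enabled("DEBUG")` before building an event, so raising the level at runtime immediately suppresses cheap-to-skip events with zero downtime.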
Skill signals to show: understanding of lock-free vs mutex designs, batching and flush strategies, backpressure and retry policies, crash recovery (WAL or fsync frequency), throughput/latency trade-offs, and observability (metrics, error handling). Mention common optimizations (Disruptor-style ring buffer, memory pooling, and efficient serializers) and when you’d choose them.
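To make the Disruptor-style ring buffer concrete, here is a hedged, single-producer/single-consumer sketch: a preallocated fixed-size list with monotonically increasing head and tail counters, indexed modulo a power-of-two capacity so no allocation happens per event. The real LMAX Disruptor adds sequence barriers, cache-line padding, and multi-producer coordination; this only shows the indexing scheme.

```python
class RingBuffer:
    """SPSC ring buffer over preallocated slots (indexing sketch only)."""

    def __init__(self, capacity=8):
        assert capacity & (capacity - 1) == 0, "capacity must be a power of two"
        self._slots = [None] * capacity          # preallocated, reused slots
        self._mask = capacity - 1                # cheap modulo via bitwise AND
        self._head = 0                           # next write sequence
        self._tail = 0                           # next read sequence

    def offer(self, event):
        if self._head - self._tail == len(self._slots):
            return False                         # full: caller applies backpressure
        self._slots[self._head & self._mask] = event
        self._head += 1
        return True

    def poll(self):
        if self._tail == self._head:
            return None                          # empty
        event = self._slots[self._tail & self._mask]
        self._tail += 1
        return event
```

The design point worth calling out: because slots are preallocated and sequences only increase, the structure avoids per-event allocation and lets producer and consumer run without a shared lock in languages with the appropriate memory-ordering primitives.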
Common Follow-up Questions
- How would you guarantee no log events are lost on a process crash? Explain fsync, write-ahead logs, and replication trade-offs.
- How do you ensure ordering guarantees (per-thread vs. global)? Describe designs that preserve order while maintaining throughput.
- If a remote sink (e.g., Kafka or an HTTP collector) becomes slow, how would you apply backpressure or shed load without dropping critical logs?
- How would you support dynamic runtime configuration (log levels, new outputs) with zero downtime and consistent behavior across threads?
- How would you scale this logger across processes or machines (centralized collector, aggregated streams, retention and partitioning)?
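One possible answer to the slow-sink follow-up can be sketched as a priority-aware bounded queue: critical events block briefly (backpressure), lower levels are dropped immediately when the queue is full, and every drop is counted so the loss is observable. `ShedLoadQueue` and its policy are an illustrative assumption, not a standard API; a real system might also spill critical events to a local disk buffer rather than ever failing them.

```python
import queue
import threading

class ShedLoadQueue:
    """Bounded queue that sheds non-critical events under sink slowness."""

    CRITICAL = {"ERROR", "FATAL"}

    def __init__(self, maxsize=1000, critical_timeout=0.5):
        self._q = queue.Queue(maxsize=maxsize)
        self._timeout = critical_timeout
        self.dropped = 0                 # exposed as a metric in a real system
        self._lock = threading.Lock()

    def offer(self, level, event):
        try:
            if level in self.CRITICAL:
                # brief backpressure: wait for space, then give up visibly
                self._q.put(event, timeout=self._timeout)
            else:
                self._q.put_nowait(event)        # shed immediately if full
            return True
        except queue.Full:
            with self._lock:
                self.dropped += 1                # make the loss observable
            return False
```

The key interview point: shedding must be deliberate and measured, never a silent side effect of an unbounded buffer exhausting memory.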