Distributed Task Scheduler Design - Apple Cloud

You are asked to design a distributed task scheduler that runs cloud background jobs such as data processing, reporting, and batch operations. The core pieces you must cover are a durable task queue, a scheduler that assigns work, and a worker pool that executes tasks reliably and at scale.

Start by describing the end-to-end flow: how tasks are submitted via an API (with metadata such as priority, timeout, and dependencies), how they are enqueued into a distributed queue, how the scheduler selects tasks and leases them to workers, and how workers execute, acknowledge, retry, or dead-letter them. Explain how you’ll support delayed tasks, task dependencies, and priority ordering.
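The selection step in that flow can be sketched in a few lines of Python. This is a minimal in-memory illustration, not a durable or distributed implementation: the class name MiniScheduler, its method names, and the convention that a lower priority number means more urgent are all assumptions made for the example.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass
class MiniScheduler:
    """In-memory stand-in for the queue plus scheduler (hypothetical, not durable)."""
    _heap: list = field(default_factory=list)   # entries: (priority, run_at, seq, task_id)
    _deps: dict = field(default_factory=dict)   # task_id -> set of unfinished dependencies
    _done: set = field(default_factory=set)
    _seq: itertools.count = field(default_factory=itertools.count)

    def submit(self, task_id, priority=5, run_at=0.0, depends_on=()):
        # Assumption of this sketch: lower priority number = more urgent.
        self._deps[task_id] = set(depends_on) - self._done
        heapq.heappush(self._heap, (priority, run_at, next(self._seq), task_id))

    def lease_next(self, now):
        """Return the most urgent task that is due and unblocked, else None."""
        skipped, leased = [], None
        while self._heap:
            entry = heapq.heappop(self._heap)
            _, run_at, _, task_id = entry
            if run_at <= now and not self._deps[task_id]:
                leased = task_id
                break
            skipped.append(entry)  # delayed or blocked: put back afterwards
        for entry in skipped:
            heapq.heappush(self._heap, entry)
        return leased

    def ack(self, task_id):
        """Mark a task complete and unblock anything that depended on it."""
        self._done.add(task_id)
        for deps in self._deps.values():
            deps.discard(task_id)
```

For example, if "report" (priority 1) depends on "extract" (priority 5), lease_next hands out "extract" first even though "report" is more urgent, and "report" becomes leasable only after ack("extract"). A real design would back the heap with the durable queue and attach a lease to each handed-out task.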

For the task queue and task structure, specify a durable storage choice (e.g., a partitioned persistent log such as Kafka, SQS backed by a durable database, or Redis Streams) and patterns to avoid duplicates (idempotency keys, a dedup store). Define a task schema including: id, type, payload, status, priority, retry_count, max_retries, timestamps (created_at, scheduled_at, started_at, completed_at), visibility_timeout/lease_id, timeout_ms, dependencies, resource_requirements, and metadata.
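One way to write that schema down is as a Python dataclass. This is a sketch only: the field names follow the list above, while the Status values, the default numbers, the lease_expires_at field, and the DedupStore class are assumptions added for illustration.

```python
import time
import uuid
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Optional


class Status(str, Enum):
    # Hypothetical status values; the schema only requires a "status" field.
    PENDING = "pending"
    LEASED = "leased"
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    DEAD_LETTERED = "dead_lettered"


@dataclass
class Task:
    type: str
    payload: dict[str, Any]
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    status: Status = Status.PENDING
    priority: int = 5
    retry_count: int = 0
    max_retries: int = 3
    created_at: float = field(default_factory=time.time)
    scheduled_at: Optional[float] = None   # set for delayed tasks
    started_at: Optional[float] = None
    completed_at: Optional[float] = None
    lease_id: Optional[str] = None          # identifies the current lease holder
    lease_expires_at: Optional[float] = None
    timeout_ms: int = 30_000
    dependencies: list[str] = field(default_factory=list)
    resource_requirements: dict[str, Any] = field(default_factory=dict)
    metadata: dict[str, Any] = field(default_factory=dict)


class DedupStore:
    """In-memory stand-in for a dedup store keyed by idempotency key."""

    def __init__(self):
        self._by_key: dict[str, Task] = {}

    def submit_once(self, idempotency_key: str, task: Task) -> Task:
        # setdefault returns the already-stored task if the key was seen before.
        return self._by_key.setdefault(idempotency_key, task)
```

Submitting twice with the same idempotency key returns the first task rather than creating a second one. In production the dedup store would need to be durable and shared (for instance a database table with a unique constraint on the key), which this sketch does not attempt.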

Discuss fault tolerance and consistency: use leases/visibility timeouts so that only one worker holds a task at a time and a crashed worker's task becomes available again once the lease expires, implement exponential backoff for retries and a dead-letter queue for tasks that exhaust them, and prefer at-least-once semantics with idempotent handlers (or an effectively-once layer built on deduplication). Cover scaling: partition queues, autoscale worker groups, use health checks and leader election for scheduler components, and monitor queue length, processing latency, retries, and error rates.
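The retry and dead-letter decision can be sketched as follows. The function names, the base and cap values, and the choice of full jitter (a random delay between zero and the exponential ceiling) are assumptions for illustration; the question itself only asks for exponential backoff and a dead-letter queue.

```python
import random


def backoff_delay(retry_count, base_s=1.0, cap_s=300.0, rng=random.random):
    """Exponential backoff with full jitter.

    The ceiling doubles with each retry up to cap_s; the actual delay is a
    random fraction of that ceiling so retries from many tasks spread out.
    """
    ceiling = min(cap_s, base_s * (2 ** retry_count))
    return rng() * ceiling


def on_failure(retry_count, max_retries, now, rng=random.random):
    """Decide what happens to a failed task.

    Returns ("retry", next_run_at) or ("dead_letter", None).
    """
    if retry_count >= max_retries:
        return ("dead_letter", None)
    return ("retry", now + backoff_delay(retry_count, rng=rng))


def lease_is_expired(lease_expires_at, now):
    """An expired lease means the task may be handed to another worker."""
    return now >= lease_expires_at
```

Because an expired lease can be reassigned while the original worker is merely slow rather than dead, the same task may still run twice; that is why the handlers need to be idempotent even with leases in place.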

Skills you should demonstrate: distributed systems design, queueing and partitioning strategies, failure modes and retry semantics, data durability, scaling and autoscaling patterns, and practical choices of technologies for low-latency, reliable background job processing.
