Debug and Extend GPT-style Transformer — OpenAI ML Engineer
Question Description
You are given a compact PyTorch GPT-style causal transformer and a small training harness. Your tasks are to diagnose and fix bugs so the model and training loop produce bitwise-identical outputs to a provided reference, add a KV cache for incremental autoregressive decoding, and attach a token-level classifier head (e.g., odd/even).
Core content
- The exercise focuses on causal self-attention correctness, positional embedding combination, model I/O shapes, and training updates. There are exactly four intentional bugs: three in the model (attention masking, positional addition, and output projection shape) and one in the training loop (a missing optimizer step). You must preserve the external forward signature and return LM logits (plus optional classifier logits) so the existing harness code still works.
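The exercise's actual model code is not reproduced here, but the shape of the three model fixes is standard. A minimal sketch of correct single-head causal attention with an additive learned positional embedding (the function name and tensor layout are assumptions, not the harness's API):

```python
import torch
import torch.nn.functional as F

def causal_self_attention(q, k, v):
    """Single-head causal attention: position i may only attend to j <= i.

    q, k, v: (batch, seq_len, d) tensors.
    """
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5           # (B, T, T)
    T = scores.size(-1)
    # Upper-triangular boolean mask blocks attention to future positions;
    # masked scores become -inf so softmax assigns them exactly zero weight.
    mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

# Positional embeddings are *added* to token embeddings, not concatenated:
#   x = tok_emb(idx) + pos_emb(torch.arange(T))
# and the output projection must map d_model -> vocab_size so LM logits
# come out as (B, T, vocab_size).
```

A quick way to confirm the mask is right: perturb the last token's key/value and check that outputs at earlier positions are unchanged.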
Flow / stages you'll work through
- Inspect shapes and numerics on small synthetic batches to locate the three model bugs and correct causal masking, positional embedding usage, and output projection dimensions.
- Fix the training-step bug so parameters actually update (maintain determinism with identical seeds and hyperparameters) and verify you can reproduce the reference outputs for a few deterministic steps.
- Implement a simple per-layer KV cache API, add an incremental forward that reuses cached keys/values, and assert cached decoding matches full-sequence decoding.
- Add a token-level classifier head (small linear layer producing two logits per token), integrate its loss with next-token cross-entropy, and report classifier accuracy on held-out data.
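The training-loop bug the prompt describes is a missing optimizer step: gradients are computed but parameters never move. A hedged sketch of a deterministic training step (the model, data iterable, and hyperparameters here are placeholders, not the harness's actual code):

```python
import torch

def train_steps(model, batches, lr=1e-3, seed=0):
    """Run deterministic training steps over (input, target) batches.

    The bug class described in the prompt is a loop that calls
    loss.backward() but never optimizer.step(), so the loss stays flat.
    """
    torch.manual_seed(seed)                  # identical seeds => identical runs
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    last_loss = None
    for x, y in batches:
        opt.zero_grad()
        logits = model(x)                    # (B, T, vocab)
        loss = torch.nn.functional.cross_entropy(
            logits.view(-1, logits.size(-1)), y.view(-1))
        loss.backward()
        opt.step()                           # the fix: actually update params
        last_loss = loss.item()
    return last_loss
</imports-placeholder>
```

With the step restored, two runs with the same seed and hyperparameters should reproduce the reference outputs exactly for the first few steps.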
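A per-layer KV cache can be as simple as appending each step's new keys and values; the sketch below assumes the single-head layout from the prompt, and the dict-based cache API is an assumption rather than the harness's interface:

```python
import torch
import torch.nn.functional as F

def attend_with_cache(q_new, k_new, v_new, cache):
    """Incremental causal attention for one new token per step.

    q_new, k_new, v_new: (B, 1, d). cache: dict with "k"/"v" entries
    (None before the first step). Because only the newest query is
    computed, no mask is needed: it legitimately attends to every cached
    (i.e. past) position plus itself.
    """
    if cache["k"] is None:
        cache["k"], cache["v"] = k_new, v_new
    else:
        cache["k"] = torch.cat([cache["k"], k_new], dim=1)
        cache["v"] = torch.cat([cache["v"], v_new], dim=1)
    d = q_new.size(-1)
    scores = q_new @ cache["k"].transpose(-2, -1) / d ** 0.5  # (B, 1, T_so_far)
    return F.softmax(scores, dim=-1) @ cache["v"]
```

The correctness assertion the prompt asks for is then: feeding tokens one at a time through the cache and concatenating the outputs must match a single full-sequence causal forward to within floating-point tolerance.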
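The classifier head is a small linear layer over the final hidden states, producing two logits per token, with its cross-entropy added to the LM loss. A sketch under stated assumptions (the wrapper class, the `alpha` weighting, and the backbone interface are illustrative, not the exercise's names):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WithClassifier(nn.Module):
    """Wraps a backbone that returns (B, T, d_model) hidden states and adds
    a 2-way token classifier (e.g. odd/even) alongside the LM head."""

    def __init__(self, backbone, d_model, vocab_size, num_classes=2):
        super().__init__()
        self.backbone = backbone
        self.lm_head = nn.Linear(d_model, vocab_size)
        self.cls_head = nn.Linear(d_model, num_classes)  # two logits per token

    def forward(self, x, use_classifier=True):
        h = self.backbone(x)                             # (B, T, d_model)
        lm_logits = self.lm_head(h)
        cls_logits = self.cls_head(h) if use_classifier else None
        return lm_logits, cls_logits

def combined_loss(lm_logits, cls_logits, lm_targets, cls_targets, alpha=0.5):
    """Next-token cross-entropy plus a weighted token-classification term."""
    lm = F.cross_entropy(lm_logits.view(-1, lm_logits.size(-1)), lm_targets.view(-1))
    cls = F.cross_entropy(cls_logits.view(-1, cls_logits.size(-1)), cls_targets.view(-1))
    return lm + alpha * cls
```

Gating the head behind `use_classifier` is what keeps the LM path bitwise reproducible when the classifier is disabled: the extra parameters exist but never touch the LM logits.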
Skill signals
You should demonstrate fluency with PyTorch primitives, deterministic RNG handling, correct attention masking with numerical stability (softmax over scores masked to -inf), careful tensor-shape management, and minimal, testable changes that keep the LM outputs reproducible when the classifier is disabled. Include end-to-end verification tests for cache correctness and training determinism.
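The determinism check called for above can be a plain assertion helper: run the same seeded computation twice and require bitwise-equal outputs. A minimal sketch (the callable interface is an assumption):

```python
import torch

def assert_deterministic(make_and_run, seed=0):
    """Run a seeded build-and-train callable twice; require bitwise equality.

    make_and_run should build its model and produce an output tensor,
    relying on the global RNG seeded here for all randomness.
    """
    torch.manual_seed(seed)
    out1 = make_and_run()
    torch.manual_seed(seed)
    out2 = make_and_run()
    # torch.equal is exact (bitwise), unlike allclose
    assert torch.equal(out1, out2), "run is not bitwise reproducible"
```

The same pattern extends to the cache test: compare full-sequence logits against incrementally decoded logits with `torch.allclose`, since those two code paths may differ by floating-point rounding even when both are correct.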
Common Follow-up Questions
- How would you extend the single-head SimpleSelfAttention into a multi-head attention block while preserving deterministic bitwise reproducibility?
- Explain numerical stability measures for attention softmax; how would you modify the implementation to avoid NaNs with long contexts?
- Describe how you would adapt the KV cache to support batched, variable-length prefixes and efficient decoding across multiple sequences in parallel.
- If classifier accuracy lags while LM loss improves, what debugging steps and loss-weighting strategies would you try to balance the objectives?
- How would you test that positional embeddings (absolute vs. rotary vs. learned relative) preserve causal behavior and do not leak future tokens?