Implement k-Fold Cross-Validation From Scratch — Uber
Question Description
What you'll be asked to do
You will implement reusable cross-validation utilities from scratch: a standard k-fold splitter, a stratified k-fold variant for classification, and a time-series (forward-chaining) variant. Each function should (1) produce train/validation index splits, (2) call a provided train_and_predict callable to get per-fold predictions, and (3) return per-fold metrics and an aggregate metric.
Core requirements and flow
- Input validation: confirm X and y lengths match and that k (or n_splits) is an integer with 2 ≤ k ≤ n_samples. For stratified folds, verify every class has at least k examples.
- Split generation: produce a list of k tuples (train_idx, val_idx). When shuffle=True, use seed to make splits reproducible.
- Execution: for each fold call train_and_predict(X_train, y_train, X_val) → y_pred, compute metric(y_val, y_pred), collect per_fold_metrics.
- Aggregate metric: return the mean of the per-fold metrics as the overall score.
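The flow above can be sketched as follows. This is a minimal illustration, not a reference solution; the names `k_fold_split` and `cross_validate` are choices made here, and the `train_and_predict` and `metric` callables match the signatures the task describes.

```python
import numpy as np

def k_fold_split(n_samples, k, shuffle=False, seed=None):
    """Return a list of k (train_idx, val_idx) pairs for standard k-fold CV."""
    if not isinstance(k, int) or not (2 <= k <= n_samples):
        raise ValueError("k must be an integer with 2 <= k <= n_samples")
    indices = np.arange(n_samples)
    if shuffle:
        rng = np.random.default_rng(seed)  # seed makes shuffled splits reproducible
        rng.shuffle(indices)
    # Distribute the remainder so fold sizes differ by at most one.
    fold_sizes = np.full(k, n_samples // k)
    fold_sizes[: n_samples % k] += 1
    splits, start = [], 0
    for size in fold_sizes:
        val_idx = indices[start : start + size]
        train_idx = np.concatenate([indices[:start], indices[start + size :]])
        splits.append((train_idx, val_idx))
        start += size
    return splits

def cross_validate(X, y, k, train_and_predict, metric, shuffle=False, seed=None):
    """Run k-fold CV and return (per_fold_metrics, aggregate_mean)."""
    X, y = np.asarray(X), np.asarray(y)
    if len(X) != len(y):
        raise ValueError("X and y must have the same length")
    per_fold_metrics = []
    for train_idx, val_idx in k_fold_split(len(X), k, shuffle, seed):
        # Fit on the training portion, predict on the held-out fold.
        y_pred = train_and_predict(X[train_idx], y[train_idx], X[val_idx])
        per_fold_metrics.append(metric(y[val_idx], y_pred))
    return per_fold_metrics, float(np.mean(per_fold_metrics))
```

A trivial usage example: a "predict the training mean" model with mean absolute error as the metric exercises the whole pipeline without any real learner.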
Variant specifics
- Stratified: preserve class proportions in each validation fold; allow optional shuffling within class buckets with seed-controlled randomness.
- Time-series: enforce temporal order so training indices are strictly earlier than validation indices; support expanding-window and fixed-length rolling-window modes.
Skills you must demonstrate
You should show solid understanding of data shuffling and reproducibility (seed handling), index-based splitting, class-preserving sampling, and time-series leakage avoidance. Be prepared to discuss when each variant is appropriate and trade-offs between k choices (bias–variance, compute cost).
Common Follow-up Questions
- How would you modify the stratified k-fold function if some classes have fewer than k examples (rare classes)? Discuss trade-offs of oversampling, undersampling, or falling back to non-stratified splits.
- Design nested cross-validation using your k-fold implementation to perform hyperparameter selection and unbiased model evaluation. How do you combine inner- and outer-loop metrics?
- For time-series CV, explain how and why you would introduce a purging/gap period between training and validation sets to avoid leakage. Provide an index-based strategy.
- How can you parallelize training across folds while preserving reproducibility (identical splits and deterministic model behavior)? Discuss random seeds and shared resources.
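For the purging question, one index-based strategy is to drop the last `gap` samples before each validation block from the training set, so features computed over trailing windows (lagged averages, rolling statistics) cannot overlap the validation period. A hedged sketch under that assumption:

```python
import numpy as np

def purged_time_series_split(n_samples, n_splits, gap=0):
    """Forward-chaining splits with a purge gap: training ends `gap` samples
    before validation begins, preventing trailing-window feature leakage."""
    fold_size = n_samples // (n_splits + 1)
    splits = []
    for i in range(1, n_splits + 1):
        val_start = i * fold_size
        train_end = max(0, val_start - gap)  # purge the gap region from training
        train_idx = np.arange(0, train_end)
        val_idx = np.arange(val_start, min(val_start + fold_size, n_samples))
        splits.append((train_idx, val_idx))
    return splits
```

The cost of purging is a smaller effective training set per fold; `gap` should be at least the longest lookback window used in feature construction.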