ByteDance ML: Binary Logistic Regression (NumPy)
Question Description
You are asked to implement a binary logistic regression classifier from scratch using NumPy only. The model should accept X (n, d) and binary labels y (n,) and expose learned parameters as public attributes (for example, coef_ and intercept_). The task covers building a class-based API with a training loop that performs a forward pass, computes binary cross-entropy (BCE) loss, backpropagates to compute gradients, and updates parameters via gradient descent across epochs.
The interview flow typically starts with the model signature and data shapes, then moves to implementing fit() with the BCE objective, deriving and coding gradients for weights and bias, and finally writing predict_proba() and predict() that use the learned coef_ and intercept_. You should demonstrate correct vectorized NumPy operations (no external ML libraries), numerical stability in the sigmoid/log loss (e.g., clipping or log-sum-exp style care), and clear public attributes (coef_ shape (d,), intercept_ scalar).
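The flow above can be sketched end to end. This is a minimal illustrative implementation under assumed hyperparameter names (`lr`, `epochs`) and a made-up class name `LogisticRegressionGD`; it uses full-batch gradient descent and clips the sigmoid input for numerical safety:

```python
import numpy as np

class LogisticRegressionGD:
    """Binary logistic regression trained with full-batch gradient descent."""

    def __init__(self, lr=0.1, epochs=1000):
        self.lr = lr
        self.epochs = epochs
        self.coef_ = None       # learned weights, shape (d,)
        self.intercept_ = 0.0   # learned bias, scalar

    @staticmethod
    def _sigmoid(z):
        # Clip z so exp never overflows for large-magnitude logits.
        return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))

    def fit(self, X, y):
        n, d = X.shape
        self.coef_ = np.zeros(d)
        self.intercept_ = 0.0
        for _ in range(self.epochs):
            # Forward pass: p = sigmoid(Xw + b).
            p = self._sigmoid(X @ self.coef_ + self.intercept_)
            # Gradient of mean BCE w.r.t. the logits is (p - y) / n.
            err = (p - y) / n
            grad_w = X.T @ err          # shape (d,)
            grad_b = err.sum()          # scalar
            self.coef_ -= self.lr * grad_w
            self.intercept_ -= self.lr * grad_b
        return self

    def predict_proba(self, X):
        return self._sigmoid(X @ self.coef_ + self.intercept_)

    def predict(self, X, threshold=0.5):
        return (self.predict_proba(X) >= threshold).astype(int)
```

The key identity that makes the gradient code short is that for BCE with a sigmoid output, the derivative of the loss with respect to the pre-activation logit simplifies to `p - y`, so no explicit sigmoid derivative is needed.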
Skill signals interviewers look for: solid understanding of logistic regression and binary cross-entropy, ability to derive and implement gradients, NumPy vectorization and numerical stability, training-loop correctness (learning rate, epochs, convergence checks), and clear API design (fit, predict_proba, predict). Be ready to discuss extensions: regularization, mini-batch or stochastic gradient descent, handling class imbalance, and multiclass generalization via softmax.
Common Follow-up Questions
- How would you add L2 (ridge) regularization to the loss and update the gradient calculations? Show the modified loss and gradient expressions.
- Implement mini-batch or stochastic gradient descent versions of fit. How does batch size affect convergence and runtime?
- How would you make the implementation numerically stable (avoid log(0) or overflow in sigmoid)? What specific code changes would you make?
- How would you extend this implementation to multiclass classification (softmax) and train with cross-entropy?
- If labels are imbalanced, how would you modify the loss or training to improve minority-class performance (class weights, focal loss, threshold tuning)?
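For the L2 regularization follow-up: adding a penalty of λ/(2n)·‖w‖² to the mean BCE adds (λ/n)·w to the weight gradient, while the bias is conventionally left unregularized. A hedged sketch (the function name and `lam` parameter are illustrative):

```python
import numpy as np

def bce_l2_gradients(X, y, w, b, lam):
    """Gradients of mean BCE plus (lam / (2n)) * ||w||^2; bias unregularized."""
    n = X.shape[0]
    p = 1.0 / (1.0 + np.exp(-np.clip(X @ w + b, -500, 500)))
    err = (p - y) / n
    grad_w = X.T @ err + (lam / n) * w  # L2 penalty contributes only to weights
    grad_b = err.sum()
    return grad_w, grad_b
```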
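For the mini-batch follow-up: the same gradient expressions apply, computed over a shuffled slice of the data per step. Smaller batches give noisier but cheaper updates; larger batches approach full-batch behavior. A sketch under assumed parameter names (`batch_size`, `seed`):

```python
import numpy as np

def fit_minibatch(X, y, lr=0.1, epochs=100, batch_size=32, seed=0):
    """Mini-batch gradient descent for logistic regression; reshuffles each epoch."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        idx = rng.permutation(n)  # new random order every epoch
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            Xb, yb = X[batch], y[batch]
            p = 1.0 / (1.0 + np.exp(-np.clip(Xb @ w + b, -500, 500)))
            err = (p - yb) / len(batch)  # mean-BCE gradient over the batch
            w -= lr * (Xb.T @ err)
            b -= lr * err.sum()
    return w, b
```

Setting `batch_size=1` turns this into plain stochastic gradient descent; `batch_size=n` recovers full-batch training.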
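For the numerical-stability follow-up, two standard fixes are: evaluate the sigmoid piecewise so `exp` is only ever called on non-positive arguments, and clip predicted probabilities away from 0 and 1 before taking logs. A sketch:

```python
import numpy as np

def stable_sigmoid(z):
    """Overflow-safe sigmoid: exp is only applied to non-positive values."""
    out = np.empty_like(z, dtype=float)
    pos = z >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))
    ez = np.exp(z[~pos])
    out[~pos] = ez / (1.0 + ez)
    return out

def bce_loss(y, p, eps=1e-12):
    """Mean BCE with probabilities clipped to [eps, 1 - eps] to avoid log(0)."""
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
```

A naive `1 / (1 + np.exp(-z))` overflows for large negative `z`; the piecewise form avoids that without changing the result.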
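For the multiclass follow-up: replacing the sigmoid with a row-wise softmax and BCE with categorical cross-entropy keeps the same gradient structure, since the gradient with respect to the logits is again `probabilities - targets`. A sketch with assumed shapes (`W` of shape `(d, k)`, integer labels in `y`):

```python
import numpy as np

def softmax(Z):
    """Row-wise softmax with max-subtraction for numerical stability."""
    Z = Z - Z.max(axis=1, keepdims=True)
    e = np.exp(Z)
    return e / e.sum(axis=1, keepdims=True)

def softmax_regression_step(X, y, W, b, lr=0.1):
    """One full-batch gradient step for multiclass cross-entropy.

    X: (n, d); y: integer labels (n,); W: (d, k); b: (k,).
    """
    n, k = X.shape[0], W.shape[1]
    P = softmax(X @ W + b)       # (n, k) class probabilities
    Y = np.eye(k)[y]             # one-hot targets, (n, k)
    G = (P - Y) / n              # gradient w.r.t. the logits
    W -= lr * (X.T @ G)
    b -= lr * G.sum(axis=0)
    return W, b
```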
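For the class-imbalance follow-up, the simplest change is per-sample weights in the BCE: up-weighting positives by some factor scales their gradient contribution by the same factor. A sketch (the `pos_weight` name mirrors common convention but is an assumption here):

```python
import numpy as np

def weighted_bce_gradients(X, y, w, b, pos_weight=1.0):
    """Gradients of mean BCE where positive examples are scaled by pos_weight."""
    n = X.shape[0]
    p = 1.0 / (1.0 + np.exp(-np.clip(X @ w + b, -500, 500)))
    # Per-sample weight: pos_weight for y=1, 1.0 for y=0.
    sw = np.where(y == 1, pos_weight, 1.0)
    err = sw * (p - y) / n
    return X.T @ err, err.sum()
```

A common starting point is `pos_weight = n_negative / n_positive`, which balances the total gradient mass of the two classes; threshold tuning on `predict_proba` is a complementary, training-free alternative.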