Visa ML Coding Question: Linear Regression Implementation
Question Description
Train linear regression from scratch using gradient descent and return learned parameters and per-epoch loss history.
You are asked to implement a standard linear model where predictions follow:
y_hat = X @ w + b
and the objective is to minimize the mean squared error (MSE):
L(w, b) = (1/n) * sum_i (y_i - (x_i^T w + b))^2
Your implementation should accept a feature matrix X of shape (n_samples, n_features), a target vector y of shape (n_samples,), a learning_rate, and a number of epochs. Use batch gradient descent (vectorized operations are recommended) to update the parameters. Return a 1-D params array of length n_features + 1 that includes the intercept (bias); state in your function docstring whether the bias is the first or last element (placing it last is recommended). Also return a loss_history list containing the training MSE after each epoch.
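For reference, differentiating the MSE above with respect to w and b gives the batch gradients (with y_hat = X w + b):

```latex
\frac{\partial L}{\partial w} = -\frac{2}{n} X^{\top} (y - \hat{y}), \qquad
\frac{\partial L}{\partial b} = -\frac{2}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)
```

Each gradient-descent step then moves the parameters against these gradients: w ← w − learning_rate · dL/dw and b ← b − learning_rate · dL/db.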
Flow you can expect in an interview:
- Clarify shapes, numeric types, and whether to include an intercept term.
- Derive gradients for w and b and propose a vectorized update rule.
- Implement and run a training loop, computing MSE each epoch and returning params + loss history.
Skill signals to demonstrate: linear algebra/vectorization, gradient derivation, numerical stability (feature scaling), convergence behavior (learning rate choice), and clear function documentation. You may mention the analytical normal-equation solution as an alternative, but implement gradient descent itself without ML libraries.
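One possible vectorized implementation in NumPy is sketched below. The function name, zero initialization, and bias-last convention are choices, not requirements of the prompt (beyond the recommendation to place the bias last):

```python
import numpy as np

def train_linear_regression(X, y, learning_rate=0.01, epochs=1000):
    """Fit y_hat = X @ w + b by batch gradient descent on the MSE.

    Returns:
        params: 1-D array of length n_features + 1; the bias is the LAST element.
        loss_history: list of training MSE values, one per epoch.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n_samples, n_features = X.shape

    w = np.zeros(n_features)
    b = 0.0
    loss_history = []

    for _ in range(epochs):
        y_hat = X @ w + b            # predictions, shape (n_samples,)
        error = y_hat - y            # residuals
        # Gradients of MSE: dL/dw = (2/n) X^T (y_hat - y), dL/db = (2/n) sum(y_hat - y)
        grad_w = (2.0 / n_samples) * (X.T @ error)
        grad_b = (2.0 / n_samples) * error.sum()
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
        # Record the training MSE after this epoch's update
        loss_history.append(float(np.mean((X @ w + b - y) ** 2)))

    return np.append(w, b), loss_history
```

On well-scaled data with a suitable learning rate, loss_history should decrease monotonically; a diverging loss is the usual signal that the learning rate is too large.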
Common Follow-up Questions
- How would you add L2 regularization (Ridge) to your gradient updates, and what are the modified gradient expressions?
- Compare batch, stochastic, and mini-batch gradient descent for this problem. How does batch size affect convergence and runtime?
- How does feature scaling (standardization) change convergence speed, and why should you apply it here?
- Derive and implement the analytical closed-form solution (normal equation). Compare its output and runtime to gradient descent on small and large feature sets.
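For the L2-regularization follow-up: adding the penalty lam * ||w||^2 to the loss changes only the weight gradient (the bias is conventionally left unpenalized). A minimal sketch, where `ridge_gradients` and the strength parameter `lam` are illustrative names:

```python
import numpy as np

def ridge_gradients(X, y, w, b, lam):
    """Gradients of MSE + lam * ||w||^2; the bias is not penalized."""
    n = X.shape[0]
    error = X @ w + b - y
    grad_w = (2.0 / n) * (X.T @ error) + 2.0 * lam * w  # extra 2*lam*w term from the penalty
    grad_b = (2.0 / n) * error.sum()                     # unchanged by the penalty
    return grad_w, grad_b
```

Setting lam = 0 recovers the plain MSE gradients, which is a convenient sanity check in an interview.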
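For the closed-form follow-up, one way to sketch the normal equation is to augment X with a column of ones (so the bias becomes the last coefficient, matching the recommended layout) and solve the resulting linear system rather than forming an explicit inverse:

```python
import numpy as np

def normal_equation(X, y):
    """Closed-form least squares: solve (X_b^T X_b) params = X_b^T y,
    where X_b is X with a trailing column of ones for the bias."""
    X_b = np.hstack([X, np.ones((X.shape[0], 1))])  # append bias column
    # np.linalg.solve is more stable and cheaper than explicitly inverting X_b^T X_b
    return np.linalg.solve(X_b.T @ X_b, X_b.T @ y)
```

The solve costs roughly O(n_features^3), which is negligible for small feature counts but motivates gradient descent when n_features is large or X does not fit in memory.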