Understanding Bias-Variance Trade-Off in Machine Learning

CS 2750: Machine Learning
Line Fitting + Bias-Variance Trade-off
Prof. Adriana Kovashka, University of Pittsburgh
January 26, 2017

Generalization
• How well does a learned model generalize from the data it was trained on to a new test set?
[Figure: training set (labels known) and test set (labels unknown)]
Slide credit: L. Lazebnik

Generalization
• Components of expected loss
  • Noise in our observations: unavoidable
  • Bias: how much the average model over all training sets differs from the true model
    • Error due to inaccurate assumptions/simplifications made by the model
  • Variance: how much models estimated from different training sets differ from each other
• Underfitting: model is too “simple” to represent all the relevant class characteristics
  • High bias and low variance
  • High training error and high test error
• Overfitting: model is too “complex” and fits irrelevant characteristics (noise) in the data
  • Low bias and high variance
  • Low training error and high test error
Adapted from L. Lazebnik
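The bias and variance definitions above can be estimated by simulation. The following is a minimal sketch, not part of the original slides: it assumes a sinusoidal true model, Gaussian observation noise, and polynomial fits, and the names true_fn and bias_variance are hypothetical. It fits one model per simulated training set, then measures how far the average model is from the true model (squared bias) and how much the individual models scatter around that average (variance).

```python
import numpy as np
from numpy.polynomial import Polynomial


def true_fn(x):
    # Assumed ground-truth function, chosen only for illustration.
    return np.sin(2 * np.pi * x)


def bias_variance(degree, n_sets=200, n_train=15, noise_sd=0.3, seed=0):
    """Estimate squared bias and variance of a polynomial fit of the given degree."""
    rng = np.random.default_rng(seed)
    x_test = np.linspace(0.05, 0.95, 50)
    preds = np.empty((n_sets, x_test.size))
    for i in range(n_sets):
        # Draw a fresh noisy training set and fit one model to it.
        x = rng.uniform(0, 1, n_train)
        t = true_fn(x) + rng.normal(0, noise_sd, n_train)
        preds[i] = Polynomial.fit(x, t, degree)(x_test)
    mean_pred = preds.mean(axis=0)  # the "average model" over training sets
    bias_sq = float(np.mean((mean_pred - true_fn(x_test)) ** 2))
    variance = float(np.mean(preds.var(axis=0)))
    return bias_sq, variance


if __name__ == "__main__":
    for degree in (0, 3, 9):
        b, v = bias_variance(degree)
        print(f"degree {degree}: bias^2 = {b:.4f}, variance = {v:.4f}")
```

Under these assumptions, the constant model should show a large squared bias and a small variance, and the 9th-order model the reverse, which matches the underfitting and overfitting descriptions above.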

Bias-Variance Trade-off
• Models with too few parameters are inaccurate because of a large bias (not enough flexibility).
• Models with too many parameters are inaccurate because of a large variance (too much sensitivity to the sample).
[Figure legend: purple dots = possible test points; red dots = training data (all that we see before we ship off our model!); green curve = true underlying model; blue curve = our predicted model/fit]
Adapted from D. Hoiem

Polynomial Curve Fitting Slide credit: Chris Bishop

Sum-of-Squares Error Function Slide credit: Chris Bishop
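The formula on this slide did not survive extraction. Assuming it is the standard sum-of-squares error from Bishop's polynomial curve fitting example, E(w) = 1/2 * sum_n (y(x_n, w) - t_n)^2 with y(x, w) = w_0 + w_1 x + ... + w_M x^M, a minimal sketch looks like this. The function names are hypothetical.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def poly_predict(w, x):
    # y(x, w) = w[0] + w[1] * x + ... + w[M] * x**M (coefficients low to high)
    return P.polyval(np.asarray(x, dtype=float), np.asarray(w, dtype=float))


def sum_of_squares_error(w, x, t):
    # E(w) = 1/2 * sum over training points of the squared residual
    residual = poly_predict(w, x) - np.asarray(t, dtype=float)
    return 0.5 * float(np.sum(residual ** 2))
```

For example, with w = [1, 2], inputs [0, 1, 2] and targets [1, 3, 4], the predictions are 1, 3, 5, only the last residual is nonzero, and the error is 0.5.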

0th Order Polynomial Slide credit: Chris Bishop

1st Order Polynomial Slide credit: Chris Bishop

3rd Order Polynomial Slide credit: Chris Bishop

9th Order Polynomial Slide credit: Chris Bishop

Over-fitting Root-Mean-Square (RMS) Error: Slide credit: Chris Bishop
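The RMS formula itself is missing from the extracted slide. Assuming it is Bishop's definition, E_RMS = sqrt(2 E(w*) / N), which equals the square root of the mean squared residual, the sketch below compares training and test RMS error across polynomial orders. The helper names and the use of numpy's Polynomial.fit are choices made here, not taken from the slides.

```python
import numpy as np
from numpy.polynomial import Polynomial


def rms_error(poly, x, t):
    # sqrt(2 E(w) / N) with E(w) = 1/2 * sum of squared residuals
    return float(np.sqrt(np.mean((poly(x) - t) ** 2)))


def rms_by_order(x_train, t_train, x_test, t_test, orders):
    """Return {order: (training RMS, test RMS)} for least-squares polynomial fits."""
    results = {}
    for m in orders:
        poly = Polynomial.fit(x_train, t_train, m)
        results[m] = (rms_error(poly, x_train, t_train),
                      rms_error(poly, x_test, t_test))
    return results


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Assumed toy data: 10 noisy training samples of a sinusoid.
    x_train = np.linspace(0, 1, 10)
    t_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.3, 10)
    x_test = np.linspace(0, 1, 100)
    t_test = np.sin(2 * np.pi * x_test) + rng.normal(0, 0.3, 100)
    for m, (tr, te) in rms_by_order(x_train, t_train, x_test, t_test, range(10)).items():
        print(f"M = {m}: train RMS = {tr:.3f}, test RMS = {te:.3f}")
```

With 10 training points, the 9th-order fit passes through every point, so its training RMS is essentially zero while its test RMS is not. That gap is the over-fitting signature this slide refers to.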

Data Set Size: 9th Order Polynomial Slide credit: Chris Bishop

Regularization
• Penalize large coefficient values
• (Remember: We want to minimize this expression.)
Adapted from Chris Bishop
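The expression to be minimized is not present in the extracted text. Assuming it is the usual quadratic (ridge) penalty, E~(w) = 1/2 * sum_n (y(x_n, w) - t_n)^2 + (lambda / 2) * ||w||^2, setting the gradient to zero gives the linear system (Phi^T Phi + lambda I) w = Phi^T t, where Phi is the matrix of powers of x. The sketch below solves that system directly. The function names are hypothetical, and for simplicity the bias term w_0 is penalized along with the other coefficients.

```python
import numpy as np


def design_matrix(x, degree):
    # Columns are x**0, x**1, ..., x**degree.
    return np.vander(np.asarray(x, dtype=float), degree + 1, increasing=True)


def fit_ridge(x, t, degree, lam):
    """Minimize 1/2 * ||Phi w - t||^2 + lam/2 * ||w||^2 in closed form."""
    phi = design_matrix(x, degree)
    a = phi.T @ phi + lam * np.eye(degree + 1)
    return np.linalg.solve(a, phi.T @ np.asarray(t, dtype=float))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.linspace(0, 1, 10)
    t = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, 10)  # assumed toy data
    for lam in (1e-3, 1e-1, 10.0):
        w = fit_ridge(x, t, 9, lam)
        print(f"lambda = {lam:g}: ||w|| = {np.linalg.norm(w):.3f}")
```

Larger values of lambda shrink the coefficient vector toward zero, which is the effect of penalizing large coefficient values.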

Regularization: Slide credit: Chris Bishop

Polynomial Coefficients Slide credit: Chris Bishop

Polynomial Coefficients
[Table: fitted coefficients, with columns ranging from no regularization to huge regularization]
Adapted from Chris Bishop

Regularization: vs. Slide credit: Chris Bishop

Training vs. test error
[Figure: training error and test error vs. model complexity. Low complexity (high bias, low variance) is the underfitting regime; high complexity (low bias, high variance) is the overfitting regime.]
Slide credit: D. Hoiem

The effect of training set size
[Figure: test error vs. model complexity, with one curve for few training examples and one for many training examples. The complexity axis runs from high bias, low variance to low bias, high variance.]
Slide credit: D. Hoiem

The effect of training set size
[Figure: error vs. number of training examples for a fixed prediction model. Labels: Testing, Training, Generalization Error.]
Adapted from D. Hoiem
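A learning curve of this kind can be reproduced with a small simulation. The sketch below is illustrative only: the sinusoidal data, the cubic model, and the name learning_curve are assumptions, not details from the slide. It keeps the model fixed, varies the number of training examples, and averages training and test RMS error over repeated draws.

```python
import numpy as np
from numpy.polynomial import Polynomial


def true_fn(x):
    # Assumed ground-truth function, for illustration only.
    return np.sin(2 * np.pi * x)


def rms(poly, x, t):
    return float(np.sqrt(np.mean((poly(x) - t) ** 2)))


def learning_curve(sizes, degree=3, noise_sd=0.3, n_trials=50, seed=0):
    """Return a list of (n, mean training RMS, mean test RMS) for a fixed model."""
    rng = np.random.default_rng(seed)
    x_test = rng.uniform(0, 1, 2000)
    t_test = true_fn(x_test) + rng.normal(0, noise_sd, x_test.size)
    rows = []
    for n in sizes:
        train_err, test_err = [], []
        for _ in range(n_trials):
            x = rng.uniform(0, 1, n)
            t = true_fn(x) + rng.normal(0, noise_sd, n)
            poly = Polynomial.fit(x, t, degree)
            train_err.append(rms(poly, x, t))
            test_err.append(rms(poly, x_test, t_test))
        rows.append((n, float(np.mean(train_err)), float(np.mean(test_err))))
    return rows


if __name__ == "__main__":
    for n, tr, te in learning_curve([10, 30, 100, 1000]):
        print(f"N = {n}: train RMS = {tr:.3f}, test RMS = {te:.3f}")
```

In this setup the training error rises and the test error falls as N grows, so the gap between the two curves narrows.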

Choosing the trade-off between bias and variance
• Need validation set (separate from the test set)
[Figure: training error and validation error vs. model complexity. The complexity axis runs from high bias, low variance to low bias, high variance.]
Slide credit: D. Hoiem
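One common way to use a validation set is to fit each candidate model on the training data and keep the one with the lowest validation error, leaving the test set untouched until the end. The sketch below applies that idea to polynomial order; select_degree and validation_rms are hypothetical names, and the polynomial model family is an assumption carried over from the curve fitting slides.

```python
import numpy as np
from numpy.polynomial import Polynomial


def validation_rms(poly, x, t):
    return float(np.sqrt(np.mean((poly(x) - t) ** 2)))


def select_degree(x_train, t_train, x_val, t_val, degrees):
    """Fit one polynomial per degree; return the degree with lowest validation RMS."""
    errors = {}
    for d in degrees:
        poly = Polynomial.fit(x_train, t_train, d)
        errors[d] = validation_rms(poly, x_val, t_val)
    best = min(errors, key=errors.get)
    return best, errors


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Assumed toy data: noisy samples of a sinusoid, split into train and validation.
    x_train = rng.uniform(0, 1, 30)
    t_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.3, 30)
    x_val = rng.uniform(0, 1, 200)
    t_val = np.sin(2 * np.pi * x_val) + rng.normal(0, 0.3, 200)
    best, errors = select_degree(x_train, t_train, x_val, t_val, range(10))
    print("selected degree:", best)
```

The selected model's validation error is an optimistic estimate, because the choice was tuned on that data, which is why the test set stays separate.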

Bias-variance (Bishop Sec. 3.2) Figure from Chris Bishop

How to reduce variance?
• Get more training data
• Regularize the parameters
• Choose a simpler classifier
Slide credit: D. Hoiem

Remember…
• Three kinds of error
  • Inherent: unavoidable
  • Bias: due to over-simplifications
  • Variance: due to inability to perfectly estimate parameters from limited data
• Try simple classifiers first
• Use increasingly powerful classifiers with more training data (bias-variance trade-off)
Adapted from D. Hoiem
