CS 2750: Machine Learning
Line Fitting + Bias-Variance Trade-off
Prof. Adriana Kovashka, University of Pittsburgh
January 26, 2017
Generalization
• How well does a learned model generalize from the data it was trained on to a new test set?
• Training set (labels known); test set (labels unknown)
Slide credit: L. Lazebnik
Generalization
• Components of expected loss
  • Noise in our observations: unavoidable
  • Bias: how much the average model over all training sets differs from the true model
    • Error due to inaccurate assumptions/simplifications made by the model
  • Variance: how much models estimated from different training sets differ from each other
• Underfitting: model is too "simple" to represent all the relevant class characteristics
  • High bias and low variance
  • High training error and high test error
• Overfitting: model is too "complex" and fits irrelevant characteristics (noise) in the data
  • Low bias and high variance
  • Low training error and high test error
Adapted from L. Lazebnik
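The three components above can be written as the standard decomposition of the expected squared loss (a sketch in Bishop's notation, which this deck follows: h(x) is the true function, y(x; D) the model learned from training set D, and the expectation over D averages over training sets — the notation is an assumption, not taken verbatim from the slides):

```latex
\underbrace{\mathbb{E}_{x,t,D}\!\big[(y(x;D) - t)^2\big]}_{\text{expected loss}}
= \underbrace{\mathbb{E}_{x}\!\big[(\mathbb{E}_D[y(x;D)] - h(x))^2\big]}_{\text{bias}^2}
+ \underbrace{\mathbb{E}_{x,D}\!\big[(y(x;D) - \mathbb{E}_D[y(x;D)])^2\big]}_{\text{variance}}
+ \underbrace{\mathbb{E}_{x,t}\!\big[(t - h(x))^2\big]}_{\text{noise}}
```

The noise term does not depend on the model at all, which is why it is unavoidable; only the bias and variance terms can be traded off.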
Bias-Variance Trade-off
• Models with too few parameters are inaccurate because of a large bias (not enough flexibility).
• Models with too many parameters are inaccurate because of a large variance (too much sensitivity to the sample).
Figure legend: purple dots = possible test points; red dots = training data (all that we see before we ship off our model!); green curve = true underlying model; blue curve = our predicted model/fit.
Adapted from D. Hoiem
Polynomial Curve Fitting Slide credit: Chris Bishop
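The model in Bishop's running example (the figure is not preserved in this transcript) is a polynomial of order M with weights w:

```latex
y(x, \mathbf{w}) = w_0 + w_1 x + w_2 x^2 + \cdots + w_M x^M = \sum_{j=0}^{M} w_j x^j
```

The model is nonlinear in the input x but linear in the parameters w, which is what makes the least-squares fit below tractable in closed form.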
Sum-of-Squares Error Function Slide credit: Chris Bishop
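The error function minimized on this slide is the sum of squared residuals between the polynomial's predictions and the targets t_n over the N training points (reproduced here in Bishop's notation since the figure is missing):

```latex
E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \big( y(x_n, \mathbf{w}) - t_n \big)^2
```

The factor of 1/2 is a convention that cleans up the gradient; the minimizer is unchanged by it.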
0th Order Polynomial Slide credit: Chris Bishop
1st Order Polynomial Slide credit: Chris Bishop
3rd Order Polynomial Slide credit: Chris Bishop
9th Order Polynomial Slide credit: Chris Bishop
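The sequence of fits on these slides can be reproduced with a short NumPy sketch. The data-generating function sin(2πx) plus Gaussian noise follows Bishop's example, but the specific sample sizes, noise level, and seed here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10
x = np.linspace(0, 1, N)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=N)  # noisy targets

# Dense, noise-free grid standing in for the true underlying curve
x_test = np.linspace(0, 1, 200)
t_test = np.sin(2 * np.pi * x_test)

def fit_and_rms(M):
    """Fit an order-M polynomial by least squares; return (train RMS, test RMS)."""
    w = np.polyfit(x, t, M)
    train = np.sqrt(np.mean((np.polyval(w, x) - t) ** 2))
    test = np.sqrt(np.mean((np.polyval(w, x_test) - t_test) ** 2))
    return train, test

for M in (0, 1, 3, 9):
    train, test = fit_and_rms(M)
    print(f"M={M}: train RMS {train:.4f}, test RMS {test:.4f}")
```

With 10 points, the 9th-order fit passes through every training point (training error near zero) while oscillating wildly between them, whereas M=3 keeps both errors moderate — exactly the pattern the next slide plots.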
Over-fitting
Root-Mean-Square (RMS) Error
Slide credit: Chris Bishop
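The RMS error plotted on this slide rescales the sum-of-squares error E evaluated at the fitted weights w*:

```latex
E_{\mathrm{RMS}} = \sqrt{\,2\,E(\mathbf{w}^{\star})/N\,}
```

Dividing by N lets us compare data sets of different sizes, and the square root puts the error on the same scale as the target variable t.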
Data Set Size: 9th Order Polynomial Slide credit: Chris Bishop
Data Set Size: 9th Order Polynomial Slide credit: Chris Bishop
Regularization
• Penalize large coefficient values
• (Remember: we want to minimize this expression.)
Adapted from Chris Bishop
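The penalized objective on this slide adds a squared-norm term to the sum-of-squares error, with λ controlling the trade-off between fitting the data and keeping the weights small (reproduced in Bishop's notation since the formula image is missing):

```latex
\widetilde{E}(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \big( y(x_n, \mathbf{w}) - t_n \big)^2 + \frac{\lambda}{2} \lVert \mathbf{w} \rVert^2
```

With this quadratic penalty the minimizer still has a closed form; the technique is known as ridge regression in statistics.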
Regularization: Slide credit: Chris Bishop
Regularization: Slide credit: Chris Bishop
Polynomial Coefficients Slide credit: Chris Bishop
Polynomial Coefficients: no regularization vs. huge regularization
Adapted from Chris Bishop
Regularization: error vs. regularization strength
Slide credit: Chris Bishop
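The shrinking coefficient magnitudes in the table two slides back can be reproduced with the closed-form ridge solution. This is a sketch: the data setup and the λ value are illustrative assumptions, not values from the slides:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 10
x = np.linspace(0, 1, N)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=N)

degree = 9
# Polynomial design matrix: Phi[n, j] = x_n ** j
Phi = np.vander(x, degree + 1, increasing=True)

# Unregularized least-squares fit (interpolates the 10 points exactly)
w_unreg, *_ = np.linalg.lstsq(Phi, t, rcond=None)

# Ridge fit: w = (Phi^T Phi + lam * I)^{-1} Phi^T t
lam = np.exp(-3)
w_ridge = np.linalg.solve(Phi.T @ Phi + lam * np.eye(degree + 1), Phi.T @ t)

print("max |w| without regularization:", np.abs(w_unreg).max())
print("max |w| with lambda = e^-3:    ", np.abs(w_ridge).max())
```

Even a modest λ collapses the coefficients by orders of magnitude, which is the mechanism behind the smoother fits on the preceding slides.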
Training vs. test error
Plot of error vs. model complexity: training error decreases as complexity grows, while test error decreases and then rises again. The left region (high bias, low variance) is underfitting; the right region (low bias, high variance) is overfitting.
Slide credit: D. Hoiem
The effect of training set size
Plot of test error vs. model complexity for few vs. many training examples: with few examples, test error climbs steeply in the high-complexity (low bias, high variance) regime; with many examples, more complex models remain usable.
Slide credit: D. Hoiem
The effect of training set size
Plot of error vs. number of training examples for a fixed prediction model: training error rises and testing error falls as they converge toward the generalization error.
Adapted from D. Hoiem
Choosing the trade-off between bias and variance
• Need validation set (separate from the test set)
Plot of training and validation error vs. model complexity: pick the complexity (low bias/high variance to the right, high bias/low variance to the left) that minimizes validation error.
Slide credit: D. Hoiem
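One way to implement this selection: hold out part of the data as a validation set and pick the regularization strength with the lowest validation error. A minimal sketch, assuming the polynomial setup from the earlier slides; the split sizes and candidate λ grid are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

def poly_ridge(x, t, degree, lam):
    """Closed-form ridge fit in the polynomial basis."""
    Phi = np.vander(x, degree + 1, increasing=True)
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(degree + 1), Phi.T @ t)

def rms(w, x, t):
    """RMS error of weight vector w on points (x, t)."""
    Phi = np.vander(x, len(w), increasing=True)
    return np.sqrt(np.mean((Phi @ w - t) ** 2))

# Training and validation splits (the test set stays untouched)
x_train = rng.uniform(0, 1, 15)
t_train = np.sin(2 * np.pi * x_train) + rng.normal(scale=0.2, size=15)
x_val = rng.uniform(0, 1, 15)
t_val = np.sin(2 * np.pi * x_val) + rng.normal(scale=0.2, size=15)

lams = [np.exp(k) for k in range(-20, 1, 2)]  # candidate ln(lambda) values
val_errs = [rms(poly_ridge(x_train, t_train, 9, lam), x_val, t_val) for lam in lams]
best = int(np.argmin(val_errs))
print(f"best ln(lambda) = {np.log(lams[best]):.0f}, validation RMS = {val_errs[best]:.3f}")
```

Only after λ is frozen this way would we touch the test set, so the final test error remains an honest estimate of generalization.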
Bias-variance (Bishop Sec. 3.2) Figure from Chris Bishop
How to reduce variance?
• Get more training data
• Regularize the parameters
• Choose a simpler classifier
Slide credit: D. Hoiem
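The first remedy can be checked empirically: refit the same flexible model on many independently drawn training sets and measure how much its predictions vary at a fixed input. A sketch under the same sin(2πx) assumption used earlier; the sample sizes and evaluation point are illustrative, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(3)
DEGREE, TRIALS, X0 = 9, 50, 0.5  # measure prediction spread at x = 0.5

def prediction_variance(n_train):
    """Variance at X0 of a degree-9 fit across TRIALS independent training sets."""
    preds = []
    for _ in range(TRIALS):
        x = rng.uniform(0, 1, n_train)
        t = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=n_train)
        w = np.polyfit(x, t, DEGREE)
        preds.append(np.polyval(w, X0))
    return np.var(preds)

var_small = prediction_variance(15)
var_large = prediction_variance(200)
print(f"prediction variance, N=15:  {var_small:.4f}")
print(f"prediction variance, N=200: {var_large:.4f}")
```

The same model class goes from erratic to stable purely because of the larger sample, which is why "get more training data" heads the list; regularization and simpler classifiers attack the variance term from the model side instead.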
Remember…
• Three kinds of error
  • Inherent: unavoidable
  • Bias: due to over-simplifications
  • Variance: due to inability to perfectly estimate parameters from limited data
• Try simple classifiers first
• Use increasingly powerful classifiers with more training data (bias-variance trade-off)
Adapted from D. Hoiem