1 / 47

Resampling MEthods

Resampling MEthods. Chapter 05 Disclaimer: This PPT is modified based on IOM 530: Intro. to Statistical Learning "Some of the figures in this presentation are taken from "An Introduction to Statistical Learning, with applications in R"  (Springer, 2013) with permission from the authors: 

pettyl
Download Presentation

Resampling MEthods

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. STT592-002: Intro. to Statistical Learning Resampling MEthods Chapter 05 Disclaimer: This PPT is modified based on IOM 530: Intro. to Statistical Learning "Some of the figures in this presentation are taken from "An Introduction to Statistical Learning, with applications in R"  (Springer, 2013) with permission from the authors:  G. James, D. Witten,  T. Hastie and R. Tibshirani " 

  2. STT592-002: Intro. to Statistical Learning Outline • Cross Validation • The Validation Set Approach • Leave-One-Out Cross Validation • K-fold Cross Validation • Bias-Variance Trade-off for k-fold Cross Validation • Cross Validation on Classification Problems

  3. STT592-002: Intro. to Statistical Learning What are resampling methods? • Tools that involves repeatedly drawing samples from a training set and refitting a model of interest on each sample in order to obtain more information about the fitted model • Model Assessment: estimate test error rates • Model Selection: select appropriate level of model flexibility • They are computationally expensive! But these days we have powerful computers  • Two resampling methods: • Cross Validation • Bootstrapping

  4. STT592-002: Intro. to Statistical Learning Simulation study for resample techniques • Simulation study: • Data: 1, 3, 4, 6, 7, 9, 12, 15, 17, 20. • Validation-set method • K-fold CV (eg: k=5) • LOO-CV

  5. STT592-002: Intro. to Statistical Learning 5.1.1 Typical Approach: Validation Set Approach • To find a set of variables that give lowest test (not training) error rate • With a large data set, we can randomly split data into training and validation(testing) sets • Then use training set to build a model, and choose the model with lowest error rate for validation data Training Data Testing Data

  6. STT592-002: Intro. to Statistical Learning Example: Auto Data • Suppose that we want to predict mpg from horsepower • Two models: • mpg ~ horsepower • mpg ~ horsepower + horspower2 • Which model gives a better fit? • Randomly split Auto data set into training (196 obs.) and validation data (196 obs.) • Fit both models using training data set • Then, evaluate both models using validation data set • The model with lowest validation (testing) MSE is the winner!

  7. STT592-002: Intro. to Statistical Learning Results: Auto Data • Left: Validation error rate for a single split • Right: Validation method repeated 10 times, each time the split is done randomly! • There is a lot of variability among the MSE’s… Not good! We need more stable methods!

  8. STT592-002: Intro. to Statistical Learning The Validation Set Approach • Advantages: • Simple • Easy to implement • Disadvantages: • The validation MSE can be highly variable • Only a subset of observations are used to fit the model (training data). Statistical methods tend to perform worse when trained on fewer observations

  9. STT592-002: Intro. to Statistical Learning 5.1.2 Leave-One-Out Cross Validation (LOOCV) • This method is similar to the Validation Set Approach, but it tries to address the latter’s disadvantages • For each suggested model, do: • Split the data set of size n into • Training data set (blue) size: n -1 • Validation data set (beige) size: 1 • Fit the model using the training data • Validate model using the validation data, and compute the corresponding MSE • Repeat this process n times • The MSE for the model is computed as follows:

  10. STT592-002: Intro. to Statistical Learning LOOCV vs. the Validation Set Approach • LOOCV has less bias • We repeatedly fit the statistical learning method using training data that contains n-1 obs., i.e. almost all the data set is used • LOOCV produces a less variable MSE • The validation approach produces different MSE when applied repeatedly due to randomness in the splitting process, while performing LOOCV multiple times will always yield the same results, because we split based on 1 obs. each time • LOOCV is computationally intensive (disadvantage) • We fit the each model n times!

  11. STT592-002: Intro. to Statistical Learning 5.1.3 k-fold Cross Validation • LOOCV is computationally intensive, so we can run k-fold Cross Validation instead • With k-fold Cross Validation, we divide the data set into K different parts (e.g. K = 5, or K = 10, etc.) • We then remove the first part, fit the model on the remaining K-1 parts, and see how good the predictions are on the left out part (i.e. compute the MSE on the first part) • We then repeat this K different times taking out a different part each time • By averaging the K different MSE’s we get an estimated validation (test) error rate for new observations

  12. STT592-002: Intro. to Statistical Learning K-fold Cross Validation

  13. STT592-002: Intro. to Statistical Learning K-fold Cross Validation

  14. STT592-002: Intro. to Statistical Learning Auto Data: LOOCV vs. K-fold CV • Left: LOOCV error curve • Right: 10-fold CV was run many times, and the figure shows the slightly different CV error rates • LOOCV is a special case of k-fold, where k = n • They are both stable, but LOOCV is more computationally intensive!

  15. STT592-002: Intro. to Statistical Learning Auto Data: Validation Set Approach vs. K-fold CV Approach • Left: Validation Set Approach • Right: 10-fold Cross Validation Approach • Indeed, 10-fold CV is more stable!

  16. STT592-002: Intro. to Statistical Learning K-fold Cross Validation on Three Simulated Date • Blue: True Test MSE • Black: LOOCV MSE • Orange: 10-fold MSE • Refer to chapter 2 for the top graphs, Fig 2.9, 2.10, and 2.11

  17. STT592-002: Intro. to Statistical Learning The Bias-variance trade-off Let’s Go back to Chapter 02: Statistical Learning

  18. Where does the error come from? Disclaimer: This PPT is modified based on Dr. Hung-yi Lee http://speech.ee.ntu.edu.tw/~tlkagk/courses_ML17.html

  19. Estimator Bias + Variance From training data, we find is an estimator of

  20. Bias and Variance of Estimator unbiased • Estimate the mean of a variable x • assume the mean of x is • assume the variance of x is • Estimator of mean • Sample N points:

  21. Bias and Variance of Estimator unbiased • Estimate the mean of a variable x • assume the mean of x is • assume the variance of x is • Estimator of mean • Sample N points: Smaller N Larger N Variance depends on the number of samples

  22. Bias and Variance of Estimator Increase N • Estimate the mean of a variable x • assume the mean of x is • assume the variance of x is • Estimator of variance • Sample N points: Biased estimator

  23. Variance Bias

  24. from 100 samples y = b + w xcp y = b + w1 xcp + w2 (xcp)2 + w3 (xcp)3 y = b + w1 xcp + w2 (xcp)2 + w3 (xcp)3 + w4 (xcp)4 + w5 (xcp)5

  25. Variance y = b + w1 xcp + w2 (xcp)2 + w3 (xcp)3 + w4 (xcp)4 + w5 (xcp)5 y = b + w xcp Large Variance Small Variance Simpler model is less influenced by the sampled data Consider the extreme case f(x) of degree 5

  26. Bias • Bias: If we average all the , is it close to ? Large Bias Assume this is Small Bias

  27. Degree=1 Black curve: the true function Red curves: 5000 Blue curve: the average of 5000 Degree=5 Degree=3

  28. Bias y = b + w1 xcp + w2 (xcp)2 + w3 (xcp)3 + w4 (xcp)4 + w5 (xcp)5 y = b + w xcp Small Bias Large Bias model model

  29. What to do with large bias? • Diagnosis: • If your model cannot even fit the training examples, then you have large bias • If you can fit the training data, but large error on testing data, then you probably have large variance • For bias, redesign your model: • Add more features as input • A more complex model Underfitting Overfitting large bias

  30. What to do with large variance? • More data • Regularization Very effective, but not always practical 10 examples 100 examples May increase bias

  31. Bias v.s. Variance Error from bias Error from variance Overfitting Error observed Underfitting Horizontal Axis: Model Complex Small Bias Large Bias Large Variance Small Variance

  32. Consequently… Average Error on Testing Data error due to "bias" and error due to "variance" A more complex model does not always lead to better performance on testing data.

  33. Cross Validation public private Testing Set Training Set Testing Set Training Set Validation set Using the results of public testing data to tune your model You are making public set better than private set. Err = 0.9 Model 1 Not recommend Err = 0.7 Model 2 Err > 0.5 Model 3 Err = 0.5 Err > 0.5

  34. N-fold Cross Validation Training Set Model 2 Model 3 Model 1 Train Train Val Err = 0.2 Err = 0.4 Err = 0.4 Train Val Train Err = 0.4 Err = 0.5 Err = 0.5 Val Train Train Err = 0.3 Err = 0.6 Err = 0.3 Avg Err = 0.3 Avg Err = 0.5 Avg Err = 0.4 Testing Set Testing Set public private

  35. STT592-002: Intro. to Statistical Learning Resampling Methods—continue... Chapter 05 Disclaimer: This PPT is modified based on IOM 530: Intro. to Statistical Learning

  36. STT592-002: Intro. to Statistical Learning 5.1.4 Bias- Variance Trade-off for k-fold CV • Putting aside that LOOCV is more computationally intensive than k-fold CV… Which is better LOOCV or K-fold CV? • LOOCV is less bias than k-fold CV (when k < n) • But, LOOCV has higher variance than k-fold CV (when k < n) • Thus, there is a trade-off between what to use • Conclusion: • We tend to use k-fold CV with (K = 5 and K = 10) • These are the magical K’s  • It has been empirically shown that they yield test error rate estimates that suffer neither from excessively high bias, nor from very high variance

  37. STT592-002: Intro. to Statistical Learning 5.1.5 Cross Validation on Classification Problems • So far, we have been dealing with CV on regression problems • We can use cross validation in a classification situation in a similar manner • Divide data into K parts • Hold out one part, fit using the remaining data and compute the error rate on the hold out data • Repeat K times • CV error rate is the average over the K errors we have computed

  38. STT592-002: Intro. to Statistical Learning CV to Choose Order of Polynomial • The data set used is simulated (refer to Fig 2.13) • The purple dashed line is the Bayes’ boundary Bayes’ Error Rate: 0.133

  39. STT592-002: Intro. to Statistical Learning CV to Choose Order of Polynomial • Linear Logistic regression (Degree 1) is not able to fit the Bayes’ decision boundary • Quadratic Logistic regression does better than linear Error Rate: 0.201 Error Rate: 0.197

  40. STT592-002: Intro. to Statistical Learning CV to Choose Order of Polynomial • Using cubic and quartic predictors, the accuracy of the model improves Error Rate: 0.160 Error Rate: 0.162

  41. STT592-002: Intro. to Statistical Learning CV to Choose the Order • Brown: Test Error • Blue: Training Error • Black: 10-fold CV Error Logistic Regression KNN

  42. STT592-002: Intro. to Statistical Learning The Bootstrap • Bootstrap is a widely applicable and extremely powerful statistical tool. • Can be used to quantify the uncertainty associated with a given estimator or statistical learning method. • Ideally, we can repeatedly obtain independent data sets from the population

  43. STT592-002: Intro. to Statistical Learning The Bootstrap • In practice, it will not work because for real data we can NOT generate new samples from original population. • Instead obtain distinct data sets by repeatedly sampling observations from original data set. • The sampling is performed with replacement, which means that the replacement same observation can occur more than once in the bootstrap data set.

  44. STT592-002: Intro. to Statistical Learning The Bootstrap: Example #1

  45. STT592-002: Intro. to Statistical Learning Bootstrap: Example #2

  46. STT450-550: Statistical Data Mining Toy examples set.seed(100) sample(1:3, 3, replace = T) # set.seed(15) sample(1:3, 3, replace = T) # set.seed(594) sample(1:3, 3, replace = T) # set.seed(500) sample(1:3, 3, replace = T) # set.seed(200) sample(1:3, 3, replace = T)

  47. STT592-002: Intro. to Statistical Learning 5-fold CV for Time Series Data • Blogs: • https://stats.stackexchange.com/questions/14099/using-k-fold-cross-validation-for-time-series-model-selection • http://francescopochetti.com/pythonic-cross-validation-time-series-pandas-scikit-learn/ • Papers: • https://www.sciencedirect.com/science/article/pii/S0020025511006773

More Related