

  1. Lecture 9. Model Inference and Averaging • Instructed by Jinzhu Jia

  2. Outline • Bootstrap and ML method • Bayesian method • EM algorithm • MCMC (Gibbs sampler) • Bagging • General model average • Bumping

  3. The Bootstrap and ML Methods • One example with one-dimensional data • Cubic spline model: μ(x) = Σ_{j=1}^{7} β_j h_j(x), where the h_j, j = 1, 2, …, 7, are cubic-spline basis functions • Let H be the N×7 basis matrix with H_ij = h_j(x_i); the least squares estimate is β̂ = (HᵀH)⁻¹Hᵀy, with noise variance estimate σ̂² = Σ_i (y_i − μ̂(x_i))² / N • Standard error of the prediction μ̂(x) = h(x)ᵀβ̂: se[μ̂(x)] = [h(x)ᵀ(HᵀH)⁻¹h(x)]^{1/2} σ̂
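As a rough sketch of this setup (not code from the lecture): a seven-function cubic-spline basis, a least squares fit, and the pointwise standard error of the fitted curve. The truncated power basis with three knots and the names basis, fit_spline, se_curve are illustrative assumptions; ESL's Figure 8.2 uses B-splines, but the least squares algebra is identical.

    import numpy as np

    def basis(x, knots):
        # Truncated power basis for a cubic spline: 1, x, x^2, x^3, (x - knot)^3_+.
        # With three interior knots this gives seven basis functions h_1(x), ..., h_7(x).
        cols = [np.ones_like(x), x, x**2, x**3]
        cols += [np.clip(x - k, 0.0, None)**3 for k in knots]
        return np.column_stack(cols)

    def fit_spline(x, y, knots):
        H = basis(x, knots)                        # N x 7 basis matrix, H_ij = h_j(x_i)
        beta = np.linalg.solve(H.T @ H, H.T @ y)   # least squares estimate of beta
        sigma2 = np.mean((y - H @ beta) ** 2)      # noise variance estimate
        return beta, sigma2

    def se_curve(x_grid, x, knots, sigma2):
        # Pointwise standard error of the fitted curve:
        #   se[mu_hat(x)] = sqrt(h(x)^T (H^T H)^{-1} h(x)) * sigma_hat
        H, Hg = basis(x, knots), basis(x_grid, knots)
        cov = np.linalg.inv(H.T @ H) * sigma2
        return np.sqrt(np.einsum('ij,jk,ik->i', Hg, cov, Hg))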

  4. One Example

  5. Bootstrap for the above example • 1. Draw B datasets, each of size N = 50, by sampling the (x_i, y_i) pairs with replacement • 2. For each bootstrap dataset Z*, fit a cubic spline • 3. Using B = 200 bootstrap samples, take pointwise percentiles of the fitted curves to obtain 95% confidence bands at each x_i (see the sketch below)
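A minimal sketch of this nonparametric bootstrap, reusing the basis and fit_spline helpers from the sketch above; B = 200 and the percentile-band construction follow the slide, and everything else (names, defaults) is illustrative.

    def bootstrap_bands(x, y, knots, x_grid, B=200, alpha=0.05, rng=None):
        # Nonparametric bootstrap: resample (x_i, y_i) pairs with replacement,
        # refit the spline on each sample, and take pointwise percentiles.
        rng = np.random.default_rng() if rng is None else rng
        N = len(y)
        curves = np.empty((B, len(x_grid)))
        for b in range(B):
            idx = rng.integers(0, N, size=N)             # N draws with replacement
            beta_b, _ = fit_spline(x[idx], y[idx], knots)
            curves[b] = basis(x_grid, knots) @ beta_b
        lower = np.percentile(curves, 100 * alpha / 2, axis=0)
        upper = np.percentile(curves, 100 * (1 - alpha / 2), axis=0)
        return lower, upper                              # pointwise 95% bands for alpha = 0.05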

  6. Connections • Non-parametric bootstrap: resample the raw data, model-free • Parametric bootstrap: simulate new responses from the fitted model, y_i* = μ̂(x_i) + ε_i*, with ε_i* ~ N(0, σ̂²) • The process is repeated B times, say B = 200 • Each bootstrap dataset is refit in exactly the same way as the original • Conclusion: for this example the parametric bootstrap agrees with least squares (see the sketch below) • In general, the parametric bootstrap agrees with maximum likelihood
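A companion sketch of the parametric bootstrap for the same example, again reusing the basis and fit_spline helpers; responses are simulated by adding Gaussian noise to the fitted curve, which is why the resulting bootstrap fits agree with least squares here.

    def parametric_bootstrap_curves(x, y, knots, x_grid, B=200, rng=None):
        # Parametric bootstrap: simulate y_i* = mu_hat(x_i) + eps_i*, eps_i* ~ N(0, sigma_hat^2),
        # then refit the spline on each simulated data set.
        rng = np.random.default_rng() if rng is None else rng
        beta_hat, sigma2_hat = fit_spline(x, y, knots)
        mu_hat = basis(x, knots) @ beta_hat
        curves = np.empty((B, len(x_grid)))
        for b in range(B):
            y_star = mu_hat + rng.normal(0.0, np.sqrt(sigma2_hat), size=len(y))
            beta_b, _ = fit_spline(x, y_star, knots)
            curves[b] = basis(x_grid, knots) @ beta_b
        return curves     # the spread of these curves matches the least squares / ML theory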

  7. ML Inference • Density or probability mass function for an observation: z_i ~ g_θ(z) • Likelihood function: L(θ; Z) = Π_{i=1}^{N} g_θ(z_i) • Log-likelihood function: ℓ(θ; Z) = Σ_{i=1}^{N} log g_θ(z_i)

  8. ML Inference • Score function: ℓ̇(θ; Z) = Σ_i ∂ℓ(θ; z_i)/∂θ, which equals zero at the maximum likelihood estimate θ̂ • Information matrix: I(θ) = −Σ_i ∂²ℓ(θ; z_i)/∂θ∂θᵀ • Observed information matrix: I(θ̂), i.e. I(θ) evaluated at θ = θ̂

  9. Fisher Information Matrix • Fisher (expected) information: i(θ) = E_θ[I(θ)] • Asymptotic result: θ̂ → N(θ₀, i(θ₀)⁻¹) as N → ∞ • where θ₀ is the true parameter

  10. Estimate for the standard error of θ̂_j: [i(θ̂)⁻¹]_{jj}^{1/2} or [I(θ̂)⁻¹]_{jj}^{1/2} • Confidence interval: θ̂_j ± z^{(1−α)} · se(θ̂_j) (see the sketch below)
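To make these formulas concrete, here is a small sketch for i.i.d. N(mu, sigma^2) data: the observed information is the negative Hessian of the log-likelihood at the MLE (computed here by finite differences), and the Wald interval is theta_hat_j ± z^(1-alpha) · se. The model choice and all names are illustrative, not taken from the lecture.

    import numpy as np
    from scipy.stats import norm

    def loglik(theta, z):
        mu, log_sigma = theta                          # sigma parametrized on the log scale
        return np.sum(norm.logpdf(z, loc=mu, scale=np.exp(log_sigma)))

    def observed_information(theta, z, eps=1e-4):
        # Negative Hessian of the log-likelihood, by central finite differences.
        p = len(theta)
        E = np.eye(p)
        H = np.zeros((p, p))
        for a in range(p):
            for b in range(p):
                H[a, b] = (loglik(theta + eps*E[a] + eps*E[b], z)
                           - loglik(theta + eps*E[a] - eps*E[b], z)
                           - loglik(theta - eps*E[a] + eps*E[b], z)
                           + loglik(theta - eps*E[a] - eps*E[b], z)) / (4 * eps**2)
        return -H

    def wald_intervals(z, alpha=0.05):
        theta_hat = np.array([z.mean(), np.log(z.std())])   # closed-form MLE for this model
        se = np.sqrt(np.diag(np.linalg.inv(observed_information(theta_hat, z))))
        zq = norm.ppf(1 - alpha / 2)
        return np.column_stack([theta_hat - zq * se, theta_hat + zq * se])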

  11. ML Inference • Confidence region: {θ : 2[ℓ(θ̂; Z) − ℓ(θ; Z)] ≤ χ²_p(1−α)}, using the chi-squared quantile with p = dim(θ) degrees of freedom • Example: revisit the previous smoothing example

  12. Bootstrap vs. ML • The advantage of the bootstrap: it lets us compute maximum likelihood estimates of standard errors and other quantities in settings where no analytical formulas are available

  13. Bayesian Methods • Two ingredients: • 1. a sampling model for our data given the parameters: Pr(Z|θ) • 2. a prior distribution for the parameters: Pr(θ) • Together these give the posterior distribution: Pr(θ|Z) = Pr(Z|θ)·Pr(θ) / ∫ Pr(Z|θ)·Pr(θ) dθ

  14. Bayesian methods • Differences between Bayesian methods and standard ('frequentist') methods: • a Bayesian method uses a prior distribution to express the uncertainty present before seeing the data, • and it expresses the uncertainty remaining after seeing the data in the form of a posterior distribution.

  15. Bayesian methods: prediction • Predictive distribution: Pr(z_new|Z) = ∫ Pr(z_new|θ)·Pr(θ|Z) dθ • In contrast, the ML method uses Pr(z_new|θ̂), the density evaluated at the maximum likelihood estimate, to predict future data

  16. Bayesian methods: Example • Revisit the previous smoothing example • We first assume the noise variance σ² is known • Prior for the coefficients: a Gaussian prior centered at zero, β ~ N(0, τΣ)

  17. Bayesian methods: Example
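A sketch of the posterior computation for this example under the simplifying assumption of a N(0, τ·I) prior on β (ESL allows a general prior covariance τΣ) and known σ²; it reuses the basis helper from the first sketch, and all names are illustrative.

    def posterior_beta(x, y, knots, sigma2, tau):
        # Gaussian prior beta ~ N(0, tau * I) with a Gaussian likelihood and known sigma^2
        # gives a Gaussian posterior:
        #   cov  = (H^T H / sigma2 + I / tau)^{-1}
        #   mean = cov @ H^T y / sigma2
        H = basis(x, knots)
        p = H.shape[1]
        cov = np.linalg.inv(H.T @ H / sigma2 + np.eye(p) / tau)
        mean = cov @ (H.T @ y) / sigma2
        return mean, cov

    # Draws from the posterior of the curve give pointwise credible bands, e.g.:
    #   rng = np.random.default_rng()
    #   betas = rng.multivariate_normal(mean, cov, size=1000)
    #   curves = betas @ basis(x_grid, knots).T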

  18. How to choose a prior? • Difficult in general • Sensitivity analysis is needed

  19. EM algorithm • It is used to simplify difficult maximum likelihood problems, especially when there are missing data.

  20. Gaussian Mixture Model • Two-component model: Y₁ ~ N(μ₁, σ₁²), Y₂ ~ N(μ₂, σ₂²), and Y = (1−Δ)·Y₁ + Δ·Y₂, where Δ ∈ {0, 1} with Pr(Δ = 1) = π • Density: g_Y(y) = (1−π)·φ_{θ₁}(y) + π·φ_{θ₂}(y)

  21. Gaussian Mixture Model • Introduce the latent (missing) indicators Δ_i that tell which component generated each y_i • But the Δ_i are unknown • Iterative method: compute the expected value of each Δ_i given the current parameters (E step), then maximize the resulting expected complete-data log-likelihood (M step)

  22. Gaussian Mixture Model

  23. EM algorithm
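A compact sketch of the EM iteration for the two-component Gaussian mixture, in the spirit of ESL Algorithm 8.1; the responsibilities gamma_i play the role of the expected Δ_i. The initialization and function names are illustrative choices.

    import numpy as np
    from scipy.stats import norm

    def em_mixture(y, n_iter=100, seed=0):
        # EM for y_i ~ (1 - pi) N(mu1, s1) + pi N(mu2, s2).
        rng = np.random.default_rng(seed)
        mu1, mu2 = rng.choice(y, size=2, replace=False)   # crude initial guesses
        s1 = s2 = y.var()
        pi = 0.5
        for _ in range(n_iter):
            # E step: responsibilities gamma_i = Pr(Delta_i = 1 | y_i, current parameters)
            p1 = (1 - pi) * norm.pdf(y, mu1, np.sqrt(s1))
            p2 = pi * norm.pdf(y, mu2, np.sqrt(s2))
            gamma = p2 / (p1 + p2)
            # M step: weighted means, variances and mixing proportion
            mu1 = np.sum((1 - gamma) * y) / np.sum(1 - gamma)
            mu2 = np.sum(gamma * y) / np.sum(gamma)
            s1 = np.sum((1 - gamma) * (y - mu1) ** 2) / np.sum(1 - gamma)
            s2 = np.sum(gamma * (y - mu2) ** 2) / np.sum(gamma)
            pi = gamma.mean()
        return pi, (mu1, s1), (mu2, s2), gamma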

  24. MCMC for sampling from the Posterior • MCMC is used to draw samples from some (posterior) distribution • Gibbs sampling -- basic idea: • To sample from the joint distribution of U₁, …, U_K, cycle through the coordinates • Draw U₁ from Pr(U₁ | U₂, …, U_K) • Draw U₂ from Pr(U₂ | U₁, U₃, …, U_K), and so on through U_K • Repeat until the joint distribution of the draws stabilizes (the Markov chain reaches its stationary distribution)

  25. Gibbs sampler: Example

  26. Gibbs sampling for mixtures
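A sketch of a Gibbs sampler for the same two-component mixture, in the spirit of ESL Algorithm 8.3: the variances and the mixing proportion are held fixed (for instance at their ML values), flat priors are assumed for the two means, and the sampler alternates between the latent indicators and the means. All names and defaults are illustrative.

    import numpy as np
    from scipy.stats import norm

    def gibbs_mixture(y, pi, s1, s2, n_sweeps=2000, rng=None):
        # Alternate between drawing the indicators Delta_i and the means mu1, mu2.
        rng = np.random.default_rng() if rng is None else rng
        mu1, mu2 = y.min(), y.max()                  # crude starting values
        draws = []
        for _ in range(n_sweeps):
            # Draw Delta_i | mu1, mu2, y_i  (Bernoulli with the current responsibility)
            p1 = (1 - pi) * norm.pdf(y, mu1, np.sqrt(s1))
            p2 = pi * norm.pdf(y, mu2, np.sqrt(s2))
            delta = rng.random(len(y)) < p2 / (p1 + p2)
            # Draw mu1, mu2 from their Gaussian full conditionals (flat priors on the means)
            n1, n2 = np.sum(~delta), np.sum(delta)
            if n1 > 0:
                mu1 = rng.normal(y[~delta].mean(), np.sqrt(s1 / n1))
            if n2 > 0:
                mu2 = rng.normal(y[delta].mean(), np.sqrt(s2 / n2))
            draws.append((mu1, mu2))
        return np.array(draws)                       # sampled values of (mu1, mu2)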

  27. Bagging • The bootstrap can be used to assess the accuracy of a prediction or parameter estimate • The bootstrap can also be used to improve the estimate or prediction itself • Bagging averages the prediction over bootstrap samples to reduce its variance

  28. Bagging • Bagging estimate: f̂_bag(x) = (1/B) Σ_b f̂*ᵇ(x), the average of the fits to B bootstrap samples • If f̂(x) is linear in the data, the bagged estimate is just f̂(x) itself (as B → ∞) • Take the cubic smoothing spline as an example: bagging does not change a linear smoother • Property (for x fixed): for the idealized aggregate f_ag(x) = E_P f̂*(x), E_P[Y − f̂*(x)]² ≥ E_P[Y − f_ag(x)]², so aggregation never increases the mean squared error (see the sketch below)
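A generic bagging sketch: average the predictions of B base learners, each fit to a bootstrap sample. A regression tree (scikit-learn's DecisionTreeRegressor) is used here as an example of a nonlinear, high-variance learner where bagging actually helps; a linear smoother such as the cubic smoothing spline above would be reproduced unchanged. The names and the max_depth setting are illustrative.

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def bagged_predict(X, y, X_new, B=200, rng=None):
        # Bagging: fit the base learner on B bootstrap samples and average the predictions.
        rng = np.random.default_rng() if rng is None else rng
        N = len(y)
        preds = np.zeros((B, len(X_new)))
        for b in range(B):
            idx = rng.integers(0, N, size=N)                       # bootstrap sample
            tree = DecisionTreeRegressor(max_depth=4).fit(X[idx], y[idx])
            preds[b] = tree.predict(X_new)
        return preds.mean(axis=0)                                  # variance-reduced prediction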

  29. Bagging • Bagging does not work well under 0-1 loss: the variance-reduction argument breaks down, and while bagging a good classifier can improve it, bagging a bad classifier can make it worse

  30. Model Averaging and Stacking • A Bayesian viewpoint: average the predictions of candidate models M_1, …, M_M, weighting each by its posterior probability Pr(M_m | Z)

  31. Model Weights • Get the weights from BIC: approximate the posterior model probabilities by w_m = exp(−BIC_m/2) / Σ_l exp(−BIC_l/2) (see the sketch below)
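A sketch of the BIC weighting, assuming the BIC of each candidate model has already been computed; the weight of model m approximates its posterior probability, w_m = exp(−BIC_m/2) / Σ_l exp(−BIC_l/2).

    import numpy as np

    def bic_weights(bics):
        # w_m proportional to exp(-BIC_m / 2); subtracting the minimum avoids underflow.
        bics = np.asarray(bics, dtype=float)
        w = np.exp(-0.5 * (bics - bics.min()))
        return w / w.sum()

    # Model-averaged prediction at a point: sum_m weights[m] * predictions[m]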

  32. Model Averaging • Frequentist viewpoint: choose the weights to minimize prediction error (stacking) • Trade-off: better prediction but less interpretability

  33. Bumping • Rather than averaging models, bumping uses bootstrap sampling to search the model space for a better single model: fit the model on each bootstrap sample and keep the fit with the smallest error on the original training data (see the sketch below)
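A sketch of bumping for a generic learner with fit/predict methods (the DecisionTreeRegressor from the bagging sketch works as make_model): candidates are fit on bootstrap samples, the original-sample fit is kept in the pool, and the single fit with the smallest training error wins. All names are illustrative.

    import numpy as np

    def bumping(X, y, make_model, B=50, rng=None):
        # Bumping: search model space via bootstrap sampling, then keep the ONE fit
        # that has the smallest error on the original training data.
        rng = np.random.default_rng() if rng is None else rng
        N = len(y)
        candidates = [make_model().fit(X, y)]            # include the original-sample fit
        for b in range(B):
            idx = rng.integers(0, N, size=N)
            candidates.append(make_model().fit(X[idx], y[idx]))
        errors = [np.mean((y - m.predict(X)) ** 2) for m in candidates]
        return candidates[int(np.argmin(errors))]

    # Example: best = bumping(X, y, lambda: DecisionTreeRegressor(max_depth=4))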

  34. Example: Bumping

  35. Homework • Due May 23 • 1. Reproduce Figure 8.2 • 2. Reproduce Figures 8.5 and 8.6 • 3. Exercise 8.6, p. 293 in ESLII_print5
