
Chapter 8: Model Inference and Averaging


Presentation Transcript


  1. Chapter 8: Model Inference and Averaging Presented by Hui Fang

  2. Basic Concepts • Statistical inference • Using data to infer the distribution that generated the data • We observe data $X_1, \dots, X_n \sim F$. • We want to infer (or estimate or learn) F, or some feature of F such as its mean. • Statistical model • A set of distributions (or a set of densities) • Parametric model • Nonparametric model

  3. Statistical Model(1) • Parametric model • A set that can be parameterized by a finite number of parameters • E.g., assume the data come from a normal distribution; the model is $\mathfrak{F} = \{ f(x;\mu,\sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp(-\frac{(x-\mu)^2}{2\sigma^2}) : \mu \in \mathbb{R}, \sigma > 0 \}$ • In general, a parametric model takes the form $\mathfrak{F} = \{ f(x;\theta) : \theta \in \Theta \}$, where $\theta$ is an unknown parameter in a parameter space $\Theta$

  4. Statistical Model(2) • Nonparametric model • A set that cannot be parameterized by a finite number of parameters • E.g., assume the data come from $\mathfrak{F}_{\text{ALL}} = \{ \text{all CDFs} \}$ • Cumulative distribution function, CDF, F(x): $F(x) = P(X \le x)$ • Probability density function, PDF, f(x): $f(x) = F'(x)$

  5. Outline • Model Inference • Maximum likelihood inference (8.2.2) • EM Algorithm (8.5) • Bayesian inference (8.3) • Gibbs Sampling (8.6) • Bootstrap (8.2.1,8.2.3,8.4) • Model Averaging and improvement • Bagging (8.7) • Bumping (8.9)

  6. Parametric Inference • Parametric models: $\mathfrak{F} = \{ f(x;\theta) : \theta \in \Theta \}$ • The problem of inference reduces to the problem of estimating the parameter $\theta$ • Methods • Maximum Likelihood Inference • Bayesian Inference

  7. An Example of MLE • Suppose you have data drawn from some parametric distribution, but you don't know its parameters • MLE: for which parameter values are the observed data most likely?

  8. A General MLE Strategy • Suppose $\theta = (\theta_1, \dots, \theta_p)^T$ is a vector of parameters. Task: find the MLE $\hat\theta$ • Likelihood function: $L(\theta; Z) = \prod_{i=1}^{N} g_\theta(z_i)$; the maximum likelihood estimator maximizes the likelihood function • 1. Write down the log-likelihood function $\ell(\theta; Z) = \sum_{i=1}^{N} \log g_\theta(z_i)$ 2. Work out $\partial \ell / \partial \theta$ using high-school calculus 3. Solve the set of simultaneous equations $\partial \ell / \partial \theta_j = 0$, $j = 1, \dots, p$ 4. Check you are at a maximum
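A minimal numeric sketch of this strategy, assuming i.i.d. Gaussian data (the distribution and data are my illustrative choices, not from the slides): write the log-likelihood once and maximize it numerically instead of solving the score equations by hand.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
z = rng.normal(loc=5.0, scale=2.0, size=200)    # observed data (simulated here)

def neg_log_likelihood(theta, z):
    """Negative log-likelihood of N(mu, sigma^2); theta = (mu, log_sigma)."""
    mu, log_sigma = theta
    sigma = np.exp(log_sigma)                    # parameterize via log to keep sigma > 0
    return 0.5 * np.sum(np.log(2 * np.pi * sigma**2) + (z - mu)**2 / sigma**2)

# Steps 1-4 done numerically: maximizing l(theta) = minimizing -l(theta)
res = minimize(neg_log_likelihood, x0=np.array([0.0, 0.0]), args=(z,))
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])
print(mu_hat, sigma_hat)           # close to the closed-form MLEs below
print(z.mean(), z.std(ddof=0))     # closed form: sample mean and (biased) sample sd
```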

  9. Properties of MLE • The sampling distribution of the maximum likelihood estimator has a limiting normal distribution (p. 230): $\hat\theta \rightarrow N(\theta_0, \mathbf{i}(\theta_0)^{-1})$, where $\theta_0$ is the true value of $\theta$ • Information matrix: $\mathbf{I}(\theta) = -\sum_{i=1}^{N} \frac{\partial^2 \ell(\theta; z_i)}{\partial \theta \, \partial \theta^T}$ • Fisher information (expected information): $\mathbf{i}(\theta) = E_\theta[\mathbf{I}(\theta)]$

  10. An Example for EM Algorithm(1) • Model Y as a mixture of two normal distributions: $Y = (1 - \Delta) Y_1 + \Delta Y_2$, where $Y_1 \sim N(\mu_1, \sigma_1^2)$, $Y_2 \sim N(\mu_2, \sigma_2^2)$, and $\Delta \in \{0, 1\}$ with $\Pr(\Delta = 1) = \pi$ • The parameters are $\theta = (\pi, \mu_1, \sigma_1^2, \mu_2, \sigma_2^2)$ • The log-likelihood based on the N training cases is $\ell(\theta; Z) = \sum_{i=1}^{N} \log[(1 - \pi)\phi_{\theta_1}(y_i) + \pi \phi_{\theta_2}(y_i)]$, where $\phi_{\theta_k}$ is the normal density with parameters $\theta_k = (\mu_k, \sigma_k^2)$ • A sum of terms sits inside the logarithm => difficult to maximize directly

  11. An Example for EM Algorithm(2) • Consider unobserved latent variables $\Delta_i$: $\Delta_i = 1$ means $y_i$ comes from model 2; otherwise it comes from model 1. If we knew the values of the $\Delta_i$, maximization would be straightforward. • 1. Take initial guesses for the parameters $\hat\mu_1, \hat\sigma_1^2, \hat\mu_2, \hat\sigma_2^2, \hat\pi$ 2. Expectation Step: compute the responsibilities $\hat\gamma_i = \frac{\hat\pi \, \phi_{\hat\theta_2}(y_i)}{(1 - \hat\pi)\phi_{\hat\theta_1}(y_i) + \hat\pi \, \phi_{\hat\theta_2}(y_i)}$ 3. Maximization Step: compute the values of the parameters that maximize the expected log-likelihood given the $\hat\gamma_i$ 4. Iterate steps 2 and 3 until convergence.

  12. An Example for EM Algorithm(3)
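The slides above walk through EM for the two-component Gaussian mixture; below is a minimal sketch of that iteration on simulated data (the data and starting values are my own, standing in for the book's example).

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
# Simulated mixture data: component 1 ~ N(1, 0.5^2), component 2 ~ N(4, 1)
y = np.concatenate([rng.normal(1, 0.5, 60), rng.normal(4, 1.0, 40)])

# 1. Initial guesses for the parameters
mu1, mu2 = y.min(), y.max()
s1 = s2 = y.var()
pi = 0.5

for _ in range(100):
    # 2. E-step: responsibilities gamma_i = P(point i came from component 2)
    p1 = (1 - pi) * norm.pdf(y, mu1, np.sqrt(s1))
    p2 = pi * norm.pdf(y, mu2, np.sqrt(s2))
    gamma = p2 / (p1 + p2)

    # 3. M-step: responsibility-weighted means, variances, and mixing proportion
    mu1 = np.sum((1 - gamma) * y) / np.sum(1 - gamma)
    mu2 = np.sum(gamma * y) / np.sum(gamma)
    s1 = np.sum((1 - gamma) * (y - mu1) ** 2) / np.sum(1 - gamma)
    s2 = np.sum(gamma * (y - mu2) ** 2) / np.sum(gamma)
    pi = gamma.mean()

print(mu1, mu2, s1, s2, pi)   # estimates after iterating steps 2 and 3
```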

  13. Bayesian Inference • Prior (knowledge before we see the data): $\Pr(\theta)$ • Sampling model: $\Pr(Z | \theta)$ • After observing data Z, we update our beliefs and form the posterior distribution $\Pr(\theta | Z) = \frac{\Pr(Z | \theta) \Pr(\theta)}{\int \Pr(Z | \theta) \Pr(\theta) \, d\theta}$ • Posterior is proportional to likelihood times prior! • Doesn't it cause a problem to throw away the normalizing constant? No: we can always recover it, since the posterior must integrate to 1.
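To see why discarding the constant is harmless, here is a tiny grid-based illustration (a hypothetical example of mine: 7 successes in 10 Bernoulli trials with a flat prior on the success probability). Compute likelihood times prior on a grid, then renormalize at the end.

```python
import numpy as np
from scipy.stats import binom

theta = np.linspace(0.001, 0.999, 999)   # grid over the parameter
prior = np.ones_like(theta)              # flat (non-informative) prior Pr(theta)
likelihood = binom.pmf(7, 10, theta)     # sampling model Pr(Z | theta): 7 successes in 10 trials

unnormalized = likelihood * prior        # posterior up to the normalizing constant
dtheta = theta[1] - theta[0]
posterior = unnormalized / (unnormalized.sum() * dtheta)   # recover the constant by renormalizing

print(posterior.sum() * dtheta)          # ~1.0: the posterior now integrates to one
print(theta[np.argmax(posterior)])       # posterior mode, ~0.7 under the flat prior
```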

  14. Prediction Using Inference • Task: predict the values of a future observation $z^{\text{new}}$ • Bayesian approach: use the predictive distribution $\Pr(z^{\text{new}} | Z) = \int \Pr(z^{\text{new}} | \theta) \Pr(\theta | Z) \, d\theta$ • Maximum likelihood approach: use $\Pr(z^{\text{new}} | \hat\theta)$, plugging in the MLE $\hat\theta$

  15. MCMC(1) • General problem: evaluating $E[f(X)] = \int f(x) \, p(x) \, dx$ can be difficult • However, if we can draw samples $X^{(1)}, \dots, X^{(T)} \sim p(x)$, then we can estimate $E[f(X)] \approx \frac{1}{T} \sum_{t=1}^{T} f(X^{(t)})$ • This is Monte Carlo (MC) integration

  16. MCMC(2) • A stochastic process is an indexed random variable $X^{(t)}$, where t may be time and X is a random variable • A Markov chain is generated by sampling $X^{(t+1)} \sim p(x \mid X^{(t)})$; so $X^{(t+1)}$ depends only on $X^{(t)}$, not on $X^{(0)}, \dots, X^{(t-1)}$; p is the transition kernel • As $t \rightarrow \infty$, the Markov chain converges to its stationary distribution

  17. MCMC(3) • Problem: how do we construct a Markov chain whose stationary distribution is our target distribution? This is called Markov chain Monte Carlo (MCMC) • Two key objectives: • Generate a sample from a joint probability distribution • Estimate expectations using generated sample averages (i.e., doing MC integration)

  18. Gibbs Sampling(1) • Purpose: draw from a joint distribution • Method: iterative conditional sampling: draw each variable in turn from its conditional distribution given the current values of all the other variables

  19. Gibbs Sampling(2) • Suppose that $U = (U_1, U_2, \dots, U_K)$ • Sample or update in turn: $U_1^{(t)} \sim \Pr(U_1 \mid U_2^{(t-1)}, \dots, U_K^{(t-1)})$, $U_2^{(t)} \sim \Pr(U_2 \mid U_1^{(t)}, U_3^{(t-1)}, \dots, U_K^{(t-1)})$, ..., $U_K^{(t)} \sim \Pr(U_K \mid U_1^{(t)}, \dots, U_{K-1}^{(t)})$ • Always use the most recent values

  20. An Example for Conditional Sampling • Target distribution: • How to draw samples?
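The slide's target distribution is not reproduced here, so the sketch below uses a stand-in target: a bivariate normal with correlation ρ = 0.8 (my assumption), whose full conditionals are normal and easy to sample.

```python
import numpy as np

rng = np.random.default_rng(3)
rho = 0.8                 # assumed correlation of the bivariate normal target
T = 10_000
x = np.zeros(T)
y = np.zeros(T)

for t in range(1, T):
    # Draw x | y: N(rho * y, 1 - rho^2)
    x[t] = rng.normal(rho * y[t - 1], np.sqrt(1 - rho**2))
    # Draw y | x, using the freshly drawn x (always use the most recent values)
    y[t] = rng.normal(rho * x[t], np.sqrt(1 - rho**2))

burn = 1000
print(np.corrcoef(x[burn:], y[burn:])[0, 1])   # ~0.8, matching the target correlation
```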

  21. Same Example as for EM(1) • Recall: model Y as a mixture of two normal distributions $Y = (1 - \Delta) Y_1 + \Delta Y_2$, where $Y_1 \sim N(\mu_1, \sigma_1^2)$, $Y_2 \sim N(\mu_2, \sigma_2^2)$, with $\Pr(\Delta = 1) = \pi$ • For simplicity, assume the parameters $\sigma_1^2$, $\sigma_2^2$, and $\pi$ are fixed and known, so only $\mu_1$ and $\mu_2$ need to be inferred

  22. Comparison between EM and Gibbs Sampling • EM: 1. Take initial guesses for the parameters 2. Expectation Step: compute the responsibilities $\hat\gamma_i$ 3. Maximization Step: compute the values of the parameters that maximize the log-likelihood given the $\hat\gamma_i$ 4. Iterate steps 2 and 3 until convergence • Gibbs: 1. Take initial guesses for the parameters $\mu_1^{(0)}, \mu_2^{(0)}$ 2. Repeat for t = 1, 2, ...: for i = 1, 2, ..., N generate $\Delta_i^{(t)} \in \{0, 1\}$ with $\Pr(\Delta_i^{(t)} = 1) = \hat\gamma_i$; then generate $\mu_1^{(t)}, \mu_2^{(t)}$ from their conditional distributions given the current $\Delta_i^{(t)}$ 3. Continue step 2 until the joint distribution of $(\Delta^{(t)}, \mu_1^{(t)}, \mu_2^{(t)})$ doesn't change
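As a rough sketch of the Gibbs column above, the code below alternates between drawing the latent Δ_i and drawing the two means. The fixed variances and mixing proportion, the simulated data, and the flat prior on the means (which makes each conditional a normal centered at the group average) are all my illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
# Simulated mixture data (stand-in for the book's example)
y = np.concatenate([rng.normal(1, 0.5, 60), rng.normal(4, 1.0, 40)])

s1, s2, pi = 0.25, 1.0, 0.4      # variances and mixing proportion held fixed (slide 21)
mu1, mu2 = y.min(), y.max()      # step 1: initial guesses for the means

draws = []
for t in range(2000):
    # step 2a: generate latent Delta_i in {0,1} with Pr(Delta_i = 1) = responsibility
    p1 = (1 - pi) * norm.pdf(y, mu1, np.sqrt(s1))
    p2 = pi * norm.pdf(y, mu2, np.sqrt(s2))
    delta = rng.random(y.size) < p2 / (p1 + p2)

    # step 2b: generate the means from their conditionals given the current Delta_i
    # (flat prior on each mean => normal centered at the group average)
    if (~delta).any():
        mu1 = rng.normal(y[~delta].mean(), np.sqrt(s1 / (~delta).sum()))
    if delta.any():
        mu2 = rng.normal(y[delta].mean(), np.sqrt(s2 / delta.sum()))
    draws.append((mu1, mu2))

# step 3: inspect the approximate joint posterior of (mu1, mu2) after burn-in
print(np.mean(draws[500:], axis=0))
```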

  23. Bootstrap(0) • Basic idea: • Randomly draw datasets with replacement from the training data • Each sample has the same size as the original training set • (Diagram: training sample $Z$ => bootstrap samples $Z^{*1}, Z^{*2}, \dots, Z^{*B}$)

  24. Example for Bootstrap(1) • Bioequivalence (patch) study: each subject yields two measurements, Z (old patch minus placebo) and Y (new patch minus old patch) • (Data table not reproduced)

  25. Example for Bootstrap(2) • We want to estimate $\theta = \frac{E(Y)}{E(Z)}$ • The plug-in estimator is $\hat\theta = \frac{\bar{Y}}{\bar{Z}}$ • What is the accuracy of the estimator?

  26. Bootstrap(1) • The bootstrap was introduced as a general method for assessing the statistical accuracy of an estimator • Data: $Z = (z_1, z_2, \dots, z_N)$, drawn from a distribution F • Statistic (any function of the data): $S(Z)$ • We want to know $\text{Var}_F[S(Z)]$ • Real world: $F \Rightarrow Z \Rightarrow S(Z)$; Bootstrap world: $\hat{F} \Rightarrow Z^* \Rightarrow S(Z^*)$ • $\text{Var}_F[S(Z)]$ can be estimated with $\text{Var}_{\hat{F}}[S(Z^*)]$

  27. Bootstrap(2)---Detour • Suppose we draw a sample $Z = (z_1, z_2, \dots, z_N)$ from a distribution F • The empirical distribution $\hat{F}$ puts probability 1/N on each observed value $z_i$; drawing from $\hat{F}$ means sampling the $z_i$ with replacement

  28. Bootstrap(3): Bootstrap Variance Estimation • Real world: $F \Rightarrow Z \Rightarrow S(Z)$ • Bootstrap world: $\hat{F} \Rightarrow Z^* \Rightarrow S(Z^*)$ • 1. Draw a bootstrap sample $Z^{*b} \sim \hat{F}$ (i.e., sample N points with replacement from Z) 2. Compute $S(Z^{*b})$ 3. Repeat steps 1 and 2, B times, to get $S(Z^{*1}), \dots, S(Z^{*B})$ 4. Let $\widehat{\text{Var}}[S(Z)] = \frac{1}{B-1} \sum_{b=1}^{B} \big( S(Z^{*b}) - \bar{S}^* \big)^2$, where $\bar{S}^* = \frac{1}{B} \sum_{b=1}^{B} S(Z^{*b})$
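A minimal sketch of this variance recipe, with the sample median as the statistic S(Z) and simulated data standing in for the original sample (both are my illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(5)
z = rng.exponential(scale=2.0, size=100)     # original sample Z (simulated)

def statistic(sample):
    return np.median(sample)                 # S(Z): any function of the data

B = 1000
boot_stats = np.empty(B)
for b in range(B):
    # 1. Draw Z*_b: a sample of the same size, with replacement
    z_star = rng.choice(z, size=z.size, replace=True)
    # 2. Compute S(Z*_b)
    boot_stats[b] = statistic(z_star)

# 3./4. Bootstrap estimate of the variance of S(Z)
var_boot = np.var(boot_stats, ddof=1)
print(statistic(z), var_boot)
```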

  29. Bootstrap(4) • Non-parametric bootstrap • Uses the raw data, not a specific parametric model, to generate new datasets • Parametric bootstrap • Simulates new responses by adding Gaussian noise to the predicted values • Example from the book: fit $\hat{\mu}(x)$ to the training data, then simulate new (x, y) pairs by $y_i^* = \hat{\mu}(x_i) + \varepsilon_i^*$ with $\varepsilon_i^* \sim N(0, \hat{\sigma}^2)$
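A hedged sketch of the parametric bootstrap step above, using a straight-line fit in place of the book's smoother (the model, data, and noise estimate here are my illustrative choices): new responses are the fitted values plus Gaussian noise.

```python
import numpy as np

rng = np.random.default_rng(6)
x = np.linspace(0, 1, 50)
y = 2.0 + 3.0 * x + rng.normal(0, 0.5, x.size)    # observed data (simulated)

# Fit the model and estimate the residual noise level
coef = np.polyfit(x, y, deg=1)
y_hat = np.polyval(coef, x)
sigma_hat = np.std(y - y_hat, ddof=2)

B = 1000
boot_coefs = np.empty((B, 2))
for b in range(B):
    # Simulate new responses: y* = mu_hat(x) + Gaussian noise
    y_star = y_hat + rng.normal(0, sigma_hat, x.size)
    boot_coefs[b] = np.polyfit(x, y_star, deg=1)

print(boot_coefs.std(axis=0))   # bootstrap standard errors of (slope, intercept)
```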

  30. Bootstrap(5)---Summary • Nonparametric bootstrap • Makes no assumption about the underlying distribution • Parametric bootstrap agrees with maximum likelihood • The bootstrap distribution approximates the posterior distribution of the parameters under a non-informative prior (?)

  31. Bagging(1) • Bootstrap: a way of assessing the accuracy of a parameter estimate or a prediction • Bagging (Bootstrap Aggregating): use bootstrap samples to build many predictors (e.g., classifiers) and aggregate their predictions; for classification, aggregation becomes majority voting • (Diagram: original sample => bootstrap samples => bootstrap estimators/classifiers)
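A compact sketch of bagging by majority vote, using decision trees from scikit-learn as the unstable base learner and a synthetic dataset (both my assumptions; the slides don't name a specific classifier):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(7)
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

B = 25
votes = np.zeros((B, X.shape[0]), dtype=int)
for b in range(B):
    # Bootstrap sample of the training set (same size, with replacement)
    idx = rng.integers(0, X.shape[0], size=X.shape[0])
    tree = DecisionTreeClassifier(random_state=b).fit(X[idx], y[idx])
    votes[b] = tree.predict(X)

# Bagged prediction = majority vote over the B bootstrap classifiers
bagged = (votes.mean(axis=0) > 0.5).astype(int)
print((bagged == y).mean())    # accuracy of the bagged ensemble on the training data
```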

  32. Bagging(2) • Pros • The estimator can be significantly improved if the learning algorithm is unstable, i.e., a small change to the training set causes a large change in the output hypothesis • Reduces the variance; the bias is roughly unchanged • Cons • Can degrade the performance of stable procedures (?) • The structure of the individual model (e.g., a single tree) is lost after bagging

  33. Bumping • Bumping (Bootstrap Umbrella of Model Parameters): a stochastic flavor of model selection • Sample a bootstrap data set and train a model on it; repeat until we are satisfied or tired • Compare the different fitted models on the original training data and keep the best one • (Diagram: original sample => bootstrap samples => bootstrap estimators)
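Under the same assumptions as the bagging sketch, here is a minimal bumping loop: fit a model to each bootstrap sample, score every candidate on the original training data, and keep the single best fit (the original sample itself is included as one candidate):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(8)
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

B = 25
best_model, best_acc = None, -np.inf
for b in range(B + 1):
    if b == 0:
        idx = np.arange(X.shape[0])                        # include the original sample
    else:
        idx = rng.integers(0, X.shape[0], size=X.shape[0]) # bootstrap sample
    model = DecisionTreeClassifier(max_depth=3, random_state=b).fit(X[idx], y[idx])
    acc = (model.predict(X) == y).mean()                   # compare on the original training data
    if acc > best_acc:
        best_model, best_acc = model, acc

print(best_acc)    # training accuracy of the chosen ("bumped") model
```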

  34. Conclusions • Maximum Likelihood vs. Bayesian Inference • EM vs. Gibbs Sampling • Bootstrap • Bagging • Bumping
