Ch 1. Introduction Pattern Recognition and Machine Learning, C. M. Bishop, 2006.

  1. Ch 1. Introduction. Pattern Recognition and Machine Learning, C. M. Bishop, 2006. Summarized by K.I. Kim, Biointelligence Laboratory, Seoul National University, http://bi.snu.ac.kr/

  2. Contents • 1.1 Example: Polynomial Curve Fitting • 1.2 Probability Theory • 1.2.1 Probability densities • 1.2.2 Expectations and covariances • 1.2.3 Bayesian probabilities • 1.2.4 The Gaussian distribution • 1.2.5 Curve fitting re-visited • 1.2.6 Bayesian curve fitting • 1.3 Model Selection (C) 2006, SNU Biointelligence Lab, http://bi.snu.ac.kr/

  3. Pattern Recognition • Training set {x_1, …, x_N} • Target vector t • Training (learning) phase • Determine the function y(x) • Generalization: correct prediction for new, unseen inputs • Test set • Preprocessing • Feature selection (C) 2006, SNU Biointelligence Lab, http://bi.snu.ac.kr/

  4. Supervised, Unsupervised and Reinforcement Learning • Supervised Learning: with target vector • Classification • Regression • Unsupervised learning: w/o target vector • Clustering • Density estimation • Visualization • Reinforcement learning: maximize a reward • Trade-off between exploration & exploitation (C) 2006, SNU Biointelligence Lab, http://bi.snu.ac.kr/

  5. Example: Polynomial Curve Fitting • N observations x = (x_1, …, x_N)^T with targets t = (t_1, …, t_N)^T • Polynomial model y(x, w) = w_0 + w_1 x + … + w_M x^M = Σ_{j=0}^{M} w_j x^j • Minimizing the error function E(w) = (1/2) Σ_{n=1}^{N} {y(x_n, w) − t_n}^2 (C) 2006, SNU Biointelligence Lab, http://bi.snu.ac.kr/
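
A minimal sketch of this least-squares fit, assuming synthetic sin(2πx) data with Gaussian noise in the spirit of the book's running example; the function name, noise level, and choice of M are illustrative assumptions, not part of the slides.

```python
# Least-squares polynomial curve fitting: minimize E(w) = 0.5 * sum_n (y(x_n, w) - t_n)^2.
import numpy as np

def fit_polynomial(x, t, M):
    """Fit an order-M polynomial to targets t by ordinary least squares."""
    Phi = np.vander(x, M + 1, increasing=True)      # columns x^0, x^1, ..., x^M
    w, *_ = np.linalg.lstsq(Phi, t, rcond=None)
    return w

rng = np.random.default_rng(0)
N, M = 10, 3
x = np.linspace(0, 1, N)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=N)   # noisy targets (assumed data)
w = fit_polynomial(x, t, M)
print("fitted coefficients:", w)
```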

  6. Model Selection & Over-fitting (1/2) (C) 2006, SNU Biointelligence Lab, http://bi.snu.ac.kr/

  7. Model Selection & Over-fitting (2/2) • RMS (root-mean-square) error: E_RMS = sqrt(2 E(w*) / N) • Too large a polynomial order M → over-fitting • The more data, the better the generalization • Over-fitting is a general property of maximum likelihood (C) 2006, SNU Biointelligence Lab, http://bi.snu.ac.kr/

  8. Regularization • Regularized error function: Ẽ(w) = (1/2) Σ_{n=1}^{N} {y(x_n, w) − t_n}^2 + (λ/2) ‖w‖^2 • Shrinkage methods • Quadratic regularizer → ridge regression • In neural networks: weight decay (C) 2006, SNU Biointelligence Lab, http://bi.snu.ac.kr/
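
A sketch of the closed-form ridge solution for the same polynomial model; the synthetic data, the value of λ, and the helper name are assumptions made for illustration.

```python
# Ridge regression (quadratic "weight decay" regularizer) in closed form:
#   w = (lam*I + Phi^T Phi)^{-1} Phi^T t  minimizes 0.5*||Phi w - t||^2 + 0.5*lam*||w||^2.
import numpy as np

def fit_polynomial_ridge(x, t, M, lam):
    Phi = np.vander(x, M + 1, increasing=True)
    A = lam * np.eye(M + 1) + Phi.T @ Phi
    return np.linalg.solve(A, Phi.T @ t)

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=10)
# A large M with tiny lam tends to over-fit; increasing lam shrinks the weights.
print(fit_polynomial_ridge(x, t, M=9, lam=np.exp(-18)))
```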

  9. Probability Theory • Example: a red box and a blue box, each containing apples and oranges; a box is picked at random and a fruit is drawn from it • “What is the overall probability that the selection procedure will pick an apple?” • “Given that we have chosen an orange, what is the probability that the box we chose was the blue one?” (C) 2006, SNU Biointelligence Lab, http://bi.snu.ac.kr/

  10. Rules of Probability (1/2) • Joint probability p(X, Y): probability that X and Y occur together • Marginal probability p(X): obtained by summing the joint probability over Y • Conditional probability p(Y | X): probability of Y given that X has occurred (C) 2006, SNU Biointelligence Lab, http://bi.snu.ac.kr/

  11. Rules of Probability (2/2) • Sum rule: p(X) = Σ_Y p(X, Y) • Product rule: p(X, Y) = p(Y | X) p(X) • Bayes’ theorem: p(Y | X) = p(X | Y) p(Y) / p(X), i.e. posterior = likelihood × prior / normalizing constant (C) 2006, SNU Biointelligence Lab, http://bi.snu.ac.kr/
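
A short numeric check of the sum and product rules on the box-and-fruit question from slide 9. The counts (red box: 2 apples, 6 oranges; blue box: 3 apples, 1 orange; p(red) = 0.4, p(blue) = 0.6) are the ones used in Bishop's example and are assumptions as far as these slides go.

```python
# Sum rule and Bayes' theorem on the box/fruit example (numbers assumed from Bishop, Sec. 1.2).
p_box = {"red": 0.4, "blue": 0.6}                      # prior p(B)
p_fruit_given_box = {                                  # likelihood p(F | B)
    "red":  {"apple": 2 / 8, "orange": 6 / 8},
    "blue": {"apple": 3 / 4, "orange": 1 / 4},
}

# Sum rule: p(F = orange) = sum_B p(F = orange | B) p(B)
p_orange = sum(p_fruit_given_box[b]["orange"] * p_box[b] for b in p_box)

# Bayes' theorem: p(B = blue | F = orange)
p_blue_given_orange = p_fruit_given_box["blue"]["orange"] * p_box["blue"] / p_orange

print(p_orange)             # 0.45
print(p_blue_given_orange)  # 1/3, so p(B = red | F = orange) = 2/3
```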

  12. Probability densities • Probability of x falling in an interval: p(x ∈ (a, b)) = ∫_a^b p(x) dx • Nonnegativity and normalization: p(x) ≥ 0, ∫ p(x) dx = 1 • Cumulative distribution function: P(z) = ∫_{−∞}^{z} p(x) dx (C) 2006, SNU Biointelligence Lab, http://bi.snu.ac.kr/

  13. Expectations and Covariances • Expectation: E[f] = Σ_x p(x) f(x) (discrete) or ∫ p(x) f(x) dx (continuous) • Variance: var[f] = E[(f(x) − E[f(x)])^2] = E[f(x)^2] − E[f(x)]^2 • Covariance: cov[x, y] = E_{x,y}[(x − E[x])(y − E[y])] = E_{x,y}[xy] − E[x] E[y] (C) 2006, SNU Biointelligence Lab, http://bi.snu.ac.kr/
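
A quick sketch of the finite-sample approximations E[f] ≈ (1/N) Σ_n f(x_n) implied by these definitions; the Gaussian sampling distribution and the linear relation between x and y are purely illustrative assumptions.

```python
# Sample-based estimates of expectation, variance, and covariance (sketch).
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=2.0, size=100_000)   # samples of x
y = 0.5 * x + rng.normal(size=100_000)             # y correlated with x

print(np.mean(x))          # E[x]      ~ 1.0
print(np.var(x))           # var[x]    ~ 4.0
print(np.cov(x, y)[0, 1])  # cov[x, y] ~ 0.5 * var[x] = 2.0
```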

  14. Bayesian Probabilities - Frequentist vs. Bayesian • Likelihood: p(D | w) • Frequentist view • w: a fixed parameter whose value is determined by an ‘estimator’ • Maximum likelihood: error function = −ln p(D | w) • Error bars: obtained from the distribution of possible data sets • e.g. the bootstrap • Bayesian view • There is only the single observed data set D • The uncertainty in the parameters w is expressed as a probability distribution over w • Allows the inclusion of prior knowledge • Noninformative prior when little is known (C) 2006, SNU Biointelligence Lab, http://bi.snu.ac.kr/

  15. Bayesian Probabilities - Expansion of Bayesian Application • The full Bayesian procedure dates from the 18th century, but its practical application was long limited • It requires marginalization over the whole of parameter space • Markov chain Monte Carlo: sampling methods, practical for small-scale problems • Highly efficient deterministic approximation schemes for large-scale problems • e.g. variational Bayes, expectation propagation (C) 2006, SNU Biointelligence Lab, http://bi.snu.ac.kr/

  16. Gaussian distribution • Univariate: N(x | μ, σ^2) = (2πσ^2)^{−1/2} exp{ −(x − μ)^2 / (2σ^2) } • D-dimensional multivariate: N(x | μ, Σ) = (2π)^{−D/2} |Σ|^{−1/2} exp{ −(1/2)(x − μ)^T Σ^{−1} (x − μ) } (C) 2006, SNU Biointelligence Lab, http://bi.snu.ac.kr/

  17. Gaussian distribution - Example (1/2) • Estimating the unknown parameters μ and σ^2 from observations x = (x_1, …, x_N)^T • Data points are i.i.d., so the likelihood is p(x | μ, σ^2) = Π_{n=1}^{N} N(x_n | μ, σ^2) • Maximizing with respect to μ → sample mean: μ_ML = (1/N) Σ_{n=1}^{N} x_n • Maximizing with respect to the variance → sample variance: σ^2_ML = (1/N) Σ_{n=1}^{N} (x_n − μ_ML)^2 (C) 2006, SNU Biointelligence Lab, http://bi.snu.ac.kr/

  18. Gaussian distribution - Example (2/2) • Bias phenomenon: the ML variance estimate is biased, E[σ^2_ML] = ((N − 1)/N) σ^2, so it systematically underestimates the true variance • This bias is a limitation of the maximum likelihood approach and lies at the root of the over-fitting problem (C) 2006, SNU Biointelligence Lab, http://bi.snu.ac.kr/
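
A quick simulation of the bias result: averaging σ^2_ML over many data sets of size N drawn from a known Gaussian should give roughly ((N − 1)/N) σ^2 rather than σ^2. The true parameters and the number of trials are arbitrary choices for the sketch.

```python
# Simulating the bias of the maximum-likelihood variance estimate (sketch).
import numpy as np

rng = np.random.default_rng(0)
mu_true, sigma2_true = 0.0, 4.0
N, trials = 5, 200_000

samples = rng.normal(mu_true, np.sqrt(sigma2_true), size=(trials, N))
mu_ml = samples.mean(axis=1, keepdims=True)
sigma2_ml = ((samples - mu_ml) ** 2).mean(axis=1)   # divides by N, not N - 1

print(sigma2_ml.mean())             # close to 3.2
print((N - 1) / N * sigma2_true)    # 3.2, the predicted (biased) expectation
```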

  19. Curve Fitting Re-visited (1/2) • Goal in the curve fitting problem: prediction for the target variable t given some new input variable x • Express the uncertainty over t with a Gaussian: p(t | x, w, β) = N(t | y(x, w), β^{−1}), where the precision β is the inverse variance • Determine the unknown w and β by maximum likelihood (C) 2006, SNU Biointelligence Lab, http://bi.snu.ac.kr/

  20. Curve Fitting Re-visited (2/2) • Maximizing the likelihood with respect to w = minimizing the sum-of-squares error function, giving w_ML • Maximizing with respect to β: 1/β_ML = (1/N) Σ_{n=1}^{N} {y(x_n, w_ML) − t_n}^2 • Predictive distribution: p(t | x, w_ML, β_ML) = N(t | y(x, w_ML), β_ML^{−1}) (C) 2006, SNU Biointelligence Lab, http://bi.snu.ac.kr/
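
A sketch joining the pieces: fit w_ML by least squares as on slide 5, estimate β_ML from the residuals, and report the predictive mean and variance at a new input. The data and the query point are illustrative assumptions.

```python
# Maximum-likelihood curve fit with its Gaussian predictive distribution (sketch).
import numpy as np

rng = np.random.default_rng(0)
N, M = 10, 3
x = np.linspace(0, 1, N)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=N)

Phi = np.vander(x, M + 1, increasing=True)
w_ml, *_ = np.linalg.lstsq(Phi, t, rcond=None)       # minimizes the sum-of-squares error
residuals = Phi @ w_ml - t
beta_ml = 1.0 / np.mean(residuals ** 2)              # 1/beta_ML = mean squared residual

x_new = np.array([0.35])                             # assumed query point
mean = np.vander(x_new, M + 1, increasing=True) @ w_ml
print(mean[0], 1.0 / beta_ml)                        # predictive mean y(x_new, w_ML) and variance 1/beta_ML
```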

  21. Maximum Posterior (MAP) • Add a prior probability over w: p(w | α) = N(w | 0, α^{−1} I), with hyperparameter α (the prior precision) • Maximizing the posterior p(w | x, t, α, β) is equivalent to minimizing the regularized sum-of-squares error of (1.4) with λ = α/β (C) 2006, SNU Biointelligence Lab, http://bi.snu.ac.kr/

  22. Bayesian Curve Fitting • Marginalization over w: the predictive distribution is p(t | x, x_train, t_train) = ∫ p(t | x, w) p(w | x_train, t_train) dw • With the Gaussian likelihood and prior used here, this integral is itself Gaussian, N(t | m(x), s^2(x)), with input-dependent mean and variance (C) 2006, SNU Biointelligence Lab, http://bi.snu.ac.kr/
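
A compact sketch of the closed-form Bayesian predictive distribution for the polynomial model, assuming a Gaussian prior N(w | 0, α^{−1} I) and known precisions α and β; the specific values of α, β, M and the synthetic data are illustrative assumptions only.

```python
# Bayesian curve fitting for the polynomial model (sketch):
#   S^{-1} = alpha*I + beta * sum_n phi(x_n) phi(x_n)^T
#   m(x)   = beta * phi(x)^T S sum_n phi(x_n) t_n        (predictive mean)
#   s2(x)  = 1/beta + phi(x)^T S phi(x)                  (predictive variance)
import numpy as np

def bayesian_predict(x_train, t_train, x_new, M, alpha, beta):
    Phi = np.vander(x_train, M + 1, increasing=True)              # rows are phi(x_n)^T
    S = np.linalg.inv(alpha * np.eye(M + 1) + beta * Phi.T @ Phi)
    phi_new = np.vander(np.atleast_1d(x_new), M + 1, increasing=True)
    mean = beta * phi_new @ S @ Phi.T @ t_train
    var = 1.0 / beta + np.sum(phi_new @ S * phi_new, axis=1)
    return mean, var

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=10)
m, s2 = bayesian_predict(x, t, x_new=0.5, M=9, alpha=5e-3, beta=11.1)  # alpha, beta assumed
print(m, s2)
```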

  23. Model Selection • Proper model complexity → good generalization and the best model • Measuring generalization performance • If data are plentiful, divide them into training, validation and test sets • Otherwise, cross-validate • Leave-one-out technique as the extreme case (number of folds S = N) • Drawbacks • The number of training runs grows with the number of folds → expensive computation • A single model may have several complexity parameters, so exploring all combinations may require exponentially many runs • Alternative measures of performance that penalize complexity directly • e.g. Akaike information criterion (AIC), Bayesian information criterion (BIC) (C) 2006, SNU Biointelligence Lab, http://bi.snu.ac.kr/
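
A sketch of S-fold cross-validation for choosing the polynomial order M, with leave-one-out as the special case S = N; the mean-squared-error score and the synthetic data are assumptions of the sketch.

```python
# S-fold cross-validation to select the polynomial order M (sketch).
import numpy as np

def cv_error(x, t, M, S):
    """Average held-out squared error over S folds; S = len(x) gives leave-one-out."""
    folds = np.array_split(np.arange(len(x)), S)
    errors = []
    for held_out in folds:
        train = np.setdiff1d(np.arange(len(x)), held_out)
        Phi = np.vander(x[train], M + 1, increasing=True)
        w, *_ = np.linalg.lstsq(Phi, t[train], rcond=None)
        pred = np.vander(x[held_out], M + 1, increasing=True) @ w
        errors.append(np.mean((pred - t[held_out]) ** 2))
    return float(np.mean(errors))

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 30)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=30)
best_M = min(range(10), key=lambda M: cv_error(x, t, M, S=len(x)))  # leave-one-out
print("selected polynomial order:", best_M)
```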
