Ch 1. Introduction. Pattern Recognition and Machine Learning, C. M. Bishop, 2006. Summarized by K.I. Kim, Biointelligence Laboratory, Seoul National University, http://bi.snu.ac.kr/
Contents • 1.1 Example: Polynomial Curve Fitting • 1.2 Probability Theory • 1.2.1 Probability densities • 1.2.2 Expectations and covariance • 1.2.3 Bayesian probabilities • 1.2.4 The Gaussian distribution • 1.2.5 Curve fitting re-visited • 1.2.6 Bayesian curve fitting • 1.3 Model Selection (C) 2006, SNU Biointelligence Lab, http://bi.snu.ac.kr/
Pattern Recognition • Training set: $\{\mathbf{x}_1, \ldots, \mathbf{x}_N\}$ • Target vector: $\mathbf{t}$ • Training (learning) phase: determine the function $y(\mathbf{x})$ • Generalization: correct predictions on a test set of new examples • Preprocessing • Feature selection
Supervised, Unsupervised and Reinforcement Learning • Supervised learning: with target vectors • Classification • Regression • Unsupervised learning: without target vectors • Clustering • Density estimation • Visualization • Reinforcement learning: maximize a reward • Trade-off between exploration and exploitation
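Example: Polynomial Curve Fitting • N observations $\mathbf{x} = (x_1, \ldots, x_N)^T$ with targets $\mathbf{t} = (t_1, \ldots, t_N)^T$ • Polynomial model: $y(x, \mathbf{w}) = \sum_{j=0}^{M} w_j x^j$ • Minimize the error function $E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \{ y(x_n, \mathbf{w}) - t_n \}^2$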
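A minimal sketch of the fit, assuming NumPy and a synthetic data set (noisy samples of $\sin(2\pi x)$ with noise standard deviation 0.3, an illustrative choice that the slide does not specify). The helper names `fit_polynomial` and `sum_of_squares_error` are hypothetical. Because the model is linear in $\mathbf{w}$, minimizing the sum-of-squares error is an ordinary least-squares problem.

```python
import numpy as np

def fit_polynomial(x, t, M):
    """Least-squares coefficients w_0..w_M for y(x, w) = sum_j w_j x^j."""
    Phi = np.vander(x, M + 1, increasing=True)  # columns x^0, x^1, ..., x^M
    w, *_ = np.linalg.lstsq(Phi, t, rcond=None)
    return w

def sum_of_squares_error(x, t, w):
    """E(w) = 1/2 * sum_n (y(x_n, w) - t_n)^2."""
    y = np.vander(x, len(w), increasing=True) @ w
    return 0.5 * np.sum((y - t) ** 2)

# Assumed toy data: 10 noisy samples of sin(2*pi*x).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 10)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.shape)

w3 = fit_polynomial(x, t, M=3)
print("cubic coefficients:", w3)
print("training error E(w):", sum_of_squares_error(x, t, w3))
```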
Model Selection & Over-fitting (1/2)
Model Selection & Over-fitting (2/2) • RMS (root-mean-square) error: $E_{RMS} = \sqrt{2 E(\mathbf{w}^\star) / N}$ • Polynomial order $M$ too large → over-fitting • The more data, the better the generalization • Over-fitting is a general property of maximum likelihood
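The effect can be explored with a small experiment. The sketch below assumes NumPy, the same kind of synthetic $\sin(2\pi x)$ data as a stand-in for the slide's figure, and hypothetical helpers `fit` and `rms_error`. It compares training and test RMS error for several polynomial orders.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def fit(x, t, M):
    """Least-squares polynomial coefficients w_0..w_M."""
    Phi = np.vander(x, M + 1, increasing=True)
    return np.linalg.lstsq(Phi, t, rcond=None)[0]

def rms_error(x, t, w):
    """E_RMS = sqrt(2 E(w) / N), i.e. the root of the mean squared residual."""
    return float(np.sqrt(np.mean((P.polyval(x, w) - t) ** 2)))

# Assumed toy data: small training set, larger independent test set.
rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 1.0, 10)
t_train = np.sin(2 * np.pi * x_train) + rng.normal(scale=0.3, size=10)
x_test = np.linspace(0.0, 1.0, 100)
t_test = np.sin(2 * np.pi * x_test) + rng.normal(scale=0.3, size=100)

results = {}
for M in (0, 1, 3, 9):
    w = fit(x_train, t_train, M)
    results[M] = (rms_error(x_train, t_train, w), rms_error(x_test, t_test, w))
    print(f"M={M}: train RMS={results[M][0]:.3f}, test RMS={results[M][1]:.3f}")
```

The training error can only go down as $M$ grows, since each model contains the previous one. The test error is the quantity that reveals over-fitting.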
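Regularization • Add a penalty term to the error function: $\tilde{E}(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \{ y(x_n, \mathbf{w}) - t_n \}^2 + \frac{\lambda}{2} \|\mathbf{w}\|^2$ (1.4) • Also known as shrinkage, ridge regression, or weight decay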
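For a quadratic penalty the minimizer has a closed form, $\mathbf{w} = (\lambda \mathbf{I} + \boldsymbol{\Phi}^T \boldsymbol{\Phi})^{-1} \boldsymbol{\Phi}^T \mathbf{t}$, where $\boldsymbol{\Phi}$ is the matrix of powers of the inputs. The following is a sketch under assumptions: NumPy, synthetic $\sin(2\pi x)$ data, a hypothetical helper `fit_ridge`, and two arbitrary values of $\lambda$ chosen only to show the shrinkage effect.

```python
import numpy as np

def fit_ridge(x, t, M, lam):
    """Minimize 1/2 * sum (y - t)^2 + lam/2 * ||w||^2 for a degree-M polynomial."""
    Phi = np.vander(x, M + 1, increasing=True)
    A = lam * np.eye(M + 1) + Phi.T @ Phi
    return np.linalg.solve(A, Phi.T @ t)

# Assumed toy data.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 10)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.shape)

w_small = fit_ridge(x, t, M=9, lam=1e-6)  # weak penalty
w_large = fit_ridge(x, t, M=9, lam=1.0)   # strong penalty

print("||w|| with lam=1e-6:", np.linalg.norm(w_small))
print("||w|| with lam=1.0: ", np.linalg.norm(w_large))
```

A larger $\lambda$ pulls the coefficients towards zero, which is why the technique is called shrinkage.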
Probability Theory • “What is the overall probability that the selection procedure will pick an apple?” • “Given that we have chosen an orange, what is the probability that the box we chose was the blue one?”
Rules of Probability (1/2) • Joint probability • Marginal probability • Conditional probability
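Rules of Probability (2/2) • Sum rule: $p(X) = \sum_Y p(X, Y)$ • Product rule: $p(X, Y) = p(Y \mid X)\, p(X)$ • Bayes' theorem: $p(Y \mid X) = \frac{p(X \mid Y)\, p(Y)}{p(X)}$, where $p(Y \mid X)$ is the posterior, $p(X \mid Y)$ the likelihood, $p(Y)$ the prior, and $p(X) = \sum_Y p(X \mid Y)\, p(Y)$ the normalizing constant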
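The two questions about boxes of fruit can be answered with the sum rule, the product rule and Bayes' theorem. The numbers below are not given on the slides. They are assumed from the textbook's version of the example: the red box is chosen with probability 0.4 and holds 2 apples and 6 oranges, the blue box is chosen with probability 0.6 and holds 3 apples and 1 orange. Exact fractions are used so the results are easy to check by hand.

```python
from fractions import Fraction

# Assumed prior over boxes, p(B).
prior = {"red": Fraction(4, 10), "blue": Fraction(6, 10)}

# Assumed fruit probabilities within each box, p(F | B).
likelihood = {
    "red": {"apple": Fraction(2, 8), "orange": Fraction(6, 8)},
    "blue": {"apple": Fraction(3, 4), "orange": Fraction(1, 4)},
}

def marginal(fruit):
    """Sum and product rules: p(F) = sum_B p(F | B) p(B)."""
    return sum(likelihood[box][fruit] * prior[box] for box in prior)

def posterior(box, fruit):
    """Bayes' theorem: p(B | F) = p(F | B) p(B) / p(F)."""
    return likelihood[box][fruit] * prior[box] / marginal(fruit)

print("p(apple) =", marginal("apple"))
print("p(blue | orange) =", posterior("blue", "orange"))
```

Under these assumed numbers, observing an orange lowers the probability of the blue box from its prior of 3/5, because oranges are more common in the red box.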
Probability densities
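Expectations and Covariances • Expectation: $\mathbb{E}[f] = \sum_x p(x) f(x)$ • Variance: $\mathrm{var}[f] = \mathbb{E}[(f(x) - \mathbb{E}[f(x)])^2]$ • Covariance: $\mathrm{cov}[x, y] = \mathbb{E}_{x,y}[xy] - \mathbb{E}[x]\, \mathbb{E}[y]$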
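A small worked example may help make these definitions concrete. The joint distribution below is invented purely for illustration, and NumPy is assumed. Each quantity is computed directly from its definition.

```python
import numpy as np

# Hypothetical discrete joint distribution p(x, y); rows index x, columns index y.
x_vals = np.array([0.0, 1.0])
y_vals = np.array([0.0, 1.0, 2.0])
p_xy = np.array([[0.10, 0.20, 0.10],
                 [0.30, 0.10, 0.20]])

p_x = p_xy.sum(axis=1)  # marginal p(x) via the sum rule
p_y = p_xy.sum(axis=0)  # marginal p(y)

E_x = np.sum(p_x * x_vals)                       # E[x]
E_y = np.sum(p_y * y_vals)                       # E[y]
var_x = np.sum(p_x * (x_vals - E_x) ** 2)        # var[x] = E[(x - E[x])^2]
E_xy = np.sum(p_xy * np.outer(x_vals, y_vals))   # E[xy]
cov_xy = E_xy - E_x * E_y                        # cov[x, y]

print(E_x, E_y, var_x, cov_xy)
```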
Bayesian Probabilities: Frequentist vs. Bayesian • Likelihood: $p(\mathcal{D} \mid \mathbf{w})$ • Frequentist • $\mathbf{w}$: a fixed parameter determined by an 'estimator' • Maximum likelihood: error function = $-\ln p(\mathcal{D} \mid \mathbf{w})$ • Error bars: obtained from the distribution of possible data sets • Bootstrap • Bayesian • A single data set $\mathcal{D}$ • A probability distribution over $\mathbf{w}$ expresses the uncertainty in the parameters • Prior knowledge • Noninformative prior
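The bootstrap idea on the frequentist side can be sketched as follows: resample the observed data with replacement many times, recompute the estimator on each resample, and use the spread of those estimates as an error bar. This is an illustration under assumptions (NumPy, Gaussian toy data, the sample mean as the estimator, and a hypothetical helper `bootstrap_std_error`), not a procedure taken from the slides.

```python
import numpy as np

def bootstrap_std_error(data, estimator, n_resamples=1000, seed=0):
    """Standard deviation of the estimator across bootstrap resamples."""
    rng = np.random.default_rng(seed)
    n = len(data)
    estimates = np.array([
        estimator(data[rng.integers(0, n, size=n)])  # draw n points with replacement
        for _ in range(n_resamples)
    ])
    return float(estimates.std(ddof=1))

# Assumed toy data set.
rng = np.random.default_rng(42)
data = rng.normal(loc=5.0, scale=2.0, size=200)

se = bootstrap_std_error(data, np.mean)
print("bootstrap error bar on the mean:", se)
print("analytic s / sqrt(N):          ", data.std(ddof=1) / np.sqrt(len(data)))
```

For the sample mean the two printed numbers should roughly agree, which gives a quick sanity check before applying the same function to estimators with no simple analytic error bar.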
Bayesian Probabilities: Expansion of Bayesian Applications • The full Bayesian procedure dates from the 18th century but long had limited practical application • It requires marginalizing over the whole of parameter space • Markov chain Monte Carlo • Small-scale problems • Highly efficient deterministic approximation schemes • e.g. variational Bayes, expectation propagation • Large-scale problems
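Gaussian Distribution • Univariate: $\mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left\{ -\frac{1}{2\sigma^2} (x - \mu)^2 \right\}$ • D-dimensional multivariate Gaussian distribution: $\mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \frac{1}{(2\pi)^{D/2} |\boldsymbol{\Sigma}|^{1/2}} \exp\left\{ -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right\}$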
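As a sketch, the D-dimensional Gaussian density can be evaluated directly from its definition. The function name `gaussian_pdf` is hypothetical and NumPy is assumed. The code works in log space and solves a linear system instead of inverting $\boldsymbol{\Sigma}$, a common numerical choice rather than anything the slides prescribe.

```python
import numpy as np

def gaussian_pdf(x, mu, Sigma):
    """Density N(x | mu, Sigma) of a D-dimensional Gaussian at the point x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    mu = np.atleast_1d(np.asarray(mu, dtype=float))
    Sigma = np.atleast_2d(np.asarray(Sigma, dtype=float))
    D = mu.size
    diff = x - mu
    maha = diff @ np.linalg.solve(Sigma, diff)  # (x - mu)^T Sigma^{-1} (x - mu)
    _, logdet = np.linalg.slogdet(Sigma)        # ln |Sigma|
    log_p = -0.5 * (D * np.log(2.0 * np.pi) + logdet + maha)
    return float(np.exp(log_p))

print(gaussian_pdf(0.0, 0.0, 1.0))                      # univariate case, D = 1
print(gaussian_pdf([0.0, 0.0], [0.0, 0.0], np.eye(2)))  # D = 2, identity covariance
```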
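Gaussian Distribution: Example (1/2) • Estimating the unknown parameters $\mu$ and $\sigma^2$ from data $\mathbf{x} = (x_1, \ldots, x_N)^T$ • Data points are i.i.d.: $p(\mathbf{x} \mid \mu, \sigma^2) = \prod_{n=1}^{N} \mathcal{N}(x_n \mid \mu, \sigma^2)$ • Maximizing with respect to $\mu$ • Sample mean: $\mu_{ML} = \frac{1}{N} \sum_{n=1}^{N} x_n$ • Maximizing with respect to the variance • Sample variance: $\sigma^2_{ML} = \frac{1}{N} \sum_{n=1}^{N} (x_n - \mu_{ML})^2$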
Gaussian Distribution: Example (2/2) • Bias phenomenon: $\mathbb{E}[\mu_{ML}] = \mu$, but $\mathbb{E}[\sigma^2_{ML}] = \frac{N-1}{N} \sigma^2$ • Limitation of the maximum likelihood approach: the variance is systematically underestimated
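The bias can be checked by simulation. The sketch below assumes NumPy and arbitrary illustrative settings (N = 5 points per data set, true variance 4, many repeated data sets). It averages the maximum likelihood variance over the data sets and compares it with the estimate that divides by N - 1 instead of N.

```python
import numpy as np

rng = np.random.default_rng(0)
N, trials, sigma2 = 5, 200_000, 4.0  # assumed settings for the experiment

# Each row is one data set of N i.i.d. points from N(0, sigma2).
samples = rng.normal(0.0, np.sqrt(sigma2), size=(trials, N))

mu_ml = samples.mean(axis=1)                # maximum likelihood mean
var_ml = samples.var(axis=1)                # divides by N: the ML variance
var_unbiased = samples.var(axis=1, ddof=1)  # divides by N - 1

print("average mu_ML:          ", mu_ml.mean())
print("average sigma^2_ML:     ", var_ml.mean())
print("theory (N-1)/N * sigma2:", (N - 1) / N * sigma2)
print("average unbiased var:   ", var_unbiased.mean())
```

With only five points per data set the maximum likelihood variance falls short of the true value by a factor of (N-1)/N on average. The gap shrinks as N grows.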
Curve Fitting Re-visited (1/2) • Goal in the curve fitting problem • Prediction for the target variable $t$ given some new input variable $x$ • Gaussian noise model: $p(t \mid x, \mathbf{w}, \beta) = \mathcal{N}(t \mid y(x, \mathbf{w}), \beta^{-1})$ • Determine the unknown $\mathbf{w}$ and $\beta$ by maximum likelihood
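Curve Fitting Re-visited (2/2) • Maximizing the likelihood = minimizing the sum-of-squares error function • Noise precision: $\frac{1}{\beta_{ML}} = \frac{1}{N} \sum_{n=1}^{N} \{ y(x_n, \mathbf{w}_{ML}) - t_n \}^2$ • Predictive distribution: $p(t \mid x, \mathbf{w}_{ML}, \beta_{ML}) = \mathcal{N}(t \mid y(x, \mathbf{w}_{ML}), \beta_{ML}^{-1})$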
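In code, the maximum likelihood solution has two parts: the weights come from least squares, and the noise precision $\beta$ comes from the mean squared residual. The sketch below assumes NumPy, hypothetical helpers `fit_ml` and `predict`, and synthetic $\sin(2\pi x)$ data. It returns the mean and standard deviation of the Gaussian predictive distribution at new inputs.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def fit_ml(x, t, M):
    """Maximum likelihood weights and noise precision for a degree-M polynomial."""
    Phi = np.vander(x, M + 1, increasing=True)
    w_ml = np.linalg.lstsq(Phi, t, rcond=None)[0]  # same as minimizing sum of squares
    residuals = Phi @ w_ml - t
    beta_ml = 1.0 / np.mean(residuals ** 2)        # 1/beta_ML = mean squared residual
    return w_ml, beta_ml

def predict(x_new, w_ml, beta_ml):
    """Mean and standard deviation of N(t | y(x, w_ML), 1/beta_ML)."""
    mean = P.polyval(x_new, w_ml)
    std = np.sqrt(1.0 / beta_ml)
    return mean, std

# Assumed toy data.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.shape)

w_ml, beta_ml = fit_ml(x, t, M=3)
mean, std = predict(np.array([0.25, 0.75]), w_ml, beta_ml)
print("predictive means:", mean, "predictive std:", std)
```

Note that the predictive standard deviation is the same at every input. The maximum likelihood treatment ignores the uncertainty in $\mathbf{w}$ itself.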
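Maximum Posterior (MAP) • Add a prior probability over the weights: $p(\mathbf{w} \mid \alpha) = \mathcal{N}(\mathbf{w} \mid \mathbf{0}, \alpha^{-1} \mathbf{I})$ • $\alpha$: hyperparameter • The minimum of $\frac{\beta}{2} \sum_{n=1}^{N} \{ y(x_n, \mathbf{w}) - t_n \}^2 + \frac{\alpha}{2} \mathbf{w}^T \mathbf{w}$ equals the minimum of the regularized error function (1.4) with $\lambda = \alpha / \beta$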
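The link between the MAP estimate and the regularized error function can be traced in three steps. The derivation below is a sketch that assumes the Gaussian noise model with precision $\beta$ from the curve-fitting slides and an isotropic zero-mean Gaussian prior with precision $\alpha$, the standard textbook choice.

```latex
\begin{align*}
% Step 1: Bayes' theorem, posterior is proportional to likelihood times prior
p(\mathbf{w} \mid \mathbf{x}, \mathbf{t}, \alpha, \beta)
  &\propto p(\mathbf{t} \mid \mathbf{x}, \mathbf{w}, \beta)\, p(\mathbf{w} \mid \alpha) \\
% Step 2: negative logarithm of both Gaussian factors
-\ln p(\mathbf{w} \mid \mathbf{x}, \mathbf{t}, \alpha, \beta)
  &= \frac{\beta}{2} \sum_{n=1}^{N} \{ y(x_n, \mathbf{w}) - t_n \}^2
   + \frac{\alpha}{2}\, \mathbf{w}^{T} \mathbf{w} + \text{const} \\
% Step 3: divide by beta, which does not move the minimizer
\mathbf{w}_{MAP}
  &= \arg\min_{\mathbf{w}} \left[ \frac{1}{2} \sum_{n=1}^{N} \{ y(x_n, \mathbf{w}) - t_n \}^2
   + \frac{\lambda}{2}\, \|\mathbf{w}\|^2 \right],
   \qquad \lambda = \frac{\alpha}{\beta}
\end{align*}
```

A broad prior (small $\alpha$) or precise data (large $\beta$) therefore corresponds to weak regularization.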
Bayesian Curve Fitting • Marginalization: $p(t \mid x, \mathbf{x}, \mathbf{t}) = \int p(t \mid x, \mathbf{w})\, p(\mathbf{w} \mid \mathbf{x}, \mathbf{t})\, d\mathbf{w}$
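For the polynomial model with Gaussian noise and a Gaussian prior, this integral has a closed form: the predictive distribution is Gaussian with mean $m(x) = \beta\, \boldsymbol{\phi}(x)^T \mathbf{S} \sum_n \boldsymbol{\phi}(x_n) t_n$ and variance $s^2(x) = \beta^{-1} + \boldsymbol{\phi}(x)^T \mathbf{S}\, \boldsymbol{\phi}(x)$, where $\mathbf{S}^{-1} = \alpha \mathbf{I} + \beta \sum_n \boldsymbol{\phi}(x_n) \boldsymbol{\phi}(x_n)^T$ and $\boldsymbol{\phi}(x) = (1, x, \ldots, x^M)^T$. The sketch below assumes NumPy, a hypothetical helper `bayesian_predictive`, synthetic data, and fixed values of $\alpha$ and $\beta$ chosen for illustration only.

```python
import numpy as np

def bayesian_predictive(x_train, t_train, x_new, M, alpha, beta):
    """Predictive mean m(x) and variance s^2(x) for Bayesian polynomial fitting."""
    Phi = np.vander(x_train, M + 1, increasing=True)
    S_inv = alpha * np.eye(M + 1) + beta * Phi.T @ Phi
    S = np.linalg.inv(S_inv)
    x_new = np.atleast_1d(np.asarray(x_new, dtype=float))
    phi_new = np.vander(x_new, M + 1, increasing=True)
    mean = beta * phi_new @ S @ Phi.T @ t_train
    # 1/beta is the noise; the second term is the uncertainty in w.
    var = 1.0 / beta + np.einsum("ij,jk,ik->i", phi_new, S, phi_new)
    return mean, var

# Assumed toy data and hyperparameters.
rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 1.0, 10)
t_train = np.sin(2 * np.pi * x_train) + rng.normal(scale=0.3, size=10)
x_grid = np.linspace(0.0, 1.0, 50)

mean, var = bayesian_predictive(x_train, t_train, x_grid, M=9, alpha=5e-3, beta=11.1)
print("predictive std ranges from", np.sqrt(var.min()), "to", np.sqrt(var.max()))
```

Unlike the maximum likelihood predictive distribution, the variance here depends on $x$, because it includes the uncertainty in the weights.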
Model Selection • Proper model complexity → good generalization & best model • Measuring the generalization performance • If data are plentiful, divide into training, validation & test sets • Otherwise, cross-validate • Leave-one-out technique • Drawbacks • Expensive computation: one training run per fold • With multiple complexity parameters, the number of training runs can grow exponentially • New measures of performance that do not need separate data • e.g. Akaike information criterion (AIC), Bayesian information criterion (BIC)
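Cross-validation for choosing the polynomial order can be sketched as follows. The helper `cross_val_rms` is hypothetical, NumPy is assumed, and the data are synthetic. Setting `n_folds` equal to the number of data points gives the leave-one-out technique.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def cross_val_rms(x, t, M, n_folds, seed=0):
    """Average held-out RMS error of a degree-M polynomial over n_folds folds."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(x))  # shuffle so folds are not contiguous in x
    errors = []
    for held_out in np.array_split(perm, n_folds):
        train = np.setdiff1d(perm, held_out)
        Phi = np.vander(x[train], M + 1, increasing=True)
        w = np.linalg.lstsq(Phi, t[train], rcond=None)[0]
        y = P.polyval(x[held_out], w)
        errors.append(np.sqrt(np.mean((y - t[held_out]) ** 2)))
    return float(np.mean(errors))

# Assumed toy data.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=30)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=30)

scores = {M: cross_val_rms(x, t, M, n_folds=5) for M in range(7)}
best_M = min(scores, key=scores.get)
print("cross-validated RMS by order:", scores)
print("selected order:", best_M)
```

Each candidate order costs `n_folds` training runs, which illustrates the computational drawback listed above.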