
Ch 1. Introduction Pattern Recognition and Machine Learning, C. M. Bishop, 2006.


Presentation Transcript


  1. Ch 1. Introduction. Pattern Recognition and Machine Learning, C. M. Bishop, 2006. Summarized by J.W. Ha, Biointelligence Laboratory, Seoul National University, http://bi.snu.ac.kr/

  2. Contents • 1.4 The Curse of Dimensionality • 1.5 Decision Theory • 1.6 Information Theory (C) 2006, SNU Biointelligence Lab, http://bi.snu.ac.kr/

  3. 1.4 The Curse of Dimensionality • The High-Dimensionality Problem • Ex. Mixture of Oil, Water, and Gas - 3 classes (Homogeneous, Annular, Laminar) - 12 input variables - Scatter plot of x6 vs. x7 - Predict the class of a new point x - Simple and naïve approach: divide the input space into cells and vote (C) 2006, SNU Biointelligence Lab, http://bi.snu.ac.kr/

  4. 1.4 The Curse of Dimensionality (Cont’d) • The Shortcomings of the Naïve Approach - The number of cells increases exponentially with the dimensionality (see the sketch below). - A very large training data set is needed so that the cells are not empty. (C) 2006, SNU Biointelligence Lab, http://bi.snu.ac.kr/
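A minimal sketch (not part of the original slides) of why the naïve cell-counting approach breaks down: with an assumed 3 cells per axis, the number of cells is 3^D and explodes as the dimensionality D grows.

```python
# Hypothetical illustration: cells in a regular grid over D input variables.
bins_per_dim = 3              # assumed number of cells along each axis
for D in (1, 2, 3, 12):       # D = 12 matches the oil-flow example's inputs
    print(D, bins_per_dim ** D)   # 3, 9, 27, 531441 cells to fill with data
```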

  5. 1.4 The Curse of Dimensionality (Cont’d) • Polynomial Curve Fitting Method (Order M) - As D increases, the number of coefficients grows only proportionally to D^M. • The Volume of a High-Dimensional Sphere - Concentrated in a thin shell near the surface (see the sketch below). (C) 2006, SNU Biointelligence Lab, http://bi.snu.ac.kr/
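To make the thin-shell claim concrete, a small sketch (assuming a unit sphere; not from the slides): the fraction of the volume lying within a shell of thickness ε just below the surface is 1 − (1 − ε)^D, which approaches 1 as D grows.

```python
# Fraction of a unit D-sphere's volume in a shell of thickness eps at the surface.
eps = 0.01
for D in (1, 2, 20, 200):
    print(D, 1 - (1 - eps) ** D)   # rises from 0.01 toward ~0.87 at D = 200
```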

  6. 1.4 The Curse of Dimensionality (Cont’d) • Gaussian Distribution - In high dimensions the probability mass likewise concentrates in a thin shell at a large radius (sketch below). (C) 2006, SNU Biointelligence Lab, http://bi.snu.ac.kr/
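A hedged Monte Carlo sketch of the same effect for the Gaussian (cf. Bishop Fig. 1.23), assuming a standard normal in D dimensions: the radius of a sample concentrates around √D, so almost all of the mass sits in a thin shell far from the origin.

```python
import numpy as np

rng = np.random.default_rng(0)
for D in (1, 20, 500):
    r = np.linalg.norm(rng.standard_normal((10_000, D)), axis=1)
    print(D, round(r.mean(), 2), round(r.std(), 2))  # mean ~ sqrt(D), spread stays O(1)
```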

  7. 1.5 Decision Theory • Making Optimal Decisions - Inference Step & Decision Step - Select the class with the higher posterior probability • Minimizing the Misclassification Rate • MAP • → Minimizes the shaded (colored) region of decision errors (C) 2006, SNU Biointelligence Lab, http://bi.snu.ac.kr/
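A minimal sketch of the MAP decision rule discussed here (the posterior values are made up for illustration): assign x to the class with the largest posterior probability, which minimizes the misclassification rate.

```python
import numpy as np

posterior = np.array([0.1, 0.7, 0.2])             # assumed p(C_k | x) for three classes
print("decide class", int(np.argmax(posterior)))  # MAP decision: class 1
```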

  8. 1.5 Decision Theory (Cont’d) • Minimizing the Expected Loss - The damage caused by a misclassification differs from class to class. - Introduce a loss function (cost function) • MAP • → Minimizing the expected loss • The Reject Option • Threshold θ • Reject if the largest posterior probability falls below θ (see the sketch below) (C) 2006, SNU Biointelligence Lab, http://bi.snu.ac.kr/
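A hedged sketch combining the two ideas on this slide (the loss matrix and threshold are invented for illustration): pick the decision that minimizes the expected loss Σ_k p(C_k|x) L[k, j], but reject when the largest posterior falls below θ.

```python
import numpy as np

posterior = np.array([0.45, 0.40, 0.15])   # assumed p(C_k | x)
L = np.array([[0, 10, 1],                  # assumed loss matrix: L[k, j] is the
              [1,  0, 1],                  # cost of deciding j when the truth is k
              [5,  1, 0]])
theta = 0.6                                # assumed rejection threshold

expected_loss = posterior @ L              # expected loss of each possible decision j
if posterior.max() < theta:
    print("reject")                        # too uncertain: defer the decision
else:
    print("decide class", int(np.argmin(expected_loss)))
```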

  9. 1.5 Decision Theory (Cont’d) • Inference and Decision - Three Distinct Approaches 1. Generative models: model the class-conditional densities and priors, then obtain the posterior probabilities 2. Discriminative models: obtain the posterior probabilities directly 3. Find a discriminant function that maps x directly to a class label (C) 2006, SNU Biointelligence Lab, http://bi.snu.ac.kr/

  10. 1.5 Decision Theory (Cont’d) • Reasons to Compute the Posterior 1. Minimizing risk 2. Reject option 3. Compensating for class priors 4. Combining models (C) 2006, SNU Biointelligence Lab, http://bi.snu.ac.kr/

  11. 1.5 Decision Theory (Cont’d) • Loss Function for Regression - Extension to a vector of multiple target variables (C) 2006, SNU Biointelligence Lab, http://bi.snu.ac.kr/
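The equations on this slide did not survive extraction; for the common squared-loss case the expected loss and its minimizer (the regression function, i.e. the conditional mean of t) take the standard form:

```latex
\mathbb{E}[L] = \iint \{\,y(\mathbf{x}) - t\,\}^{2}\, p(\mathbf{x}, t)\, \mathrm{d}\mathbf{x}\, \mathrm{d}t,
\qquad
y^{*}(\mathbf{x}) = \mathbb{E}_{t}\!\left[\,t \mid \mathbf{x}\,\right]
```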

  12. 1.5 Decision Theory (Cont’d) • Minkowski Loss (C) 2006, SNU Biointelligence Lab, http://bi.snu.ac.kr/
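For reference (the slide's formula was also lost in extraction), the Minkowski loss generalizes the squared loss to an exponent q; its minimizer is the conditional mean for q = 2, the conditional median for q = 1, and the conditional mode as q → 0:

```latex
\mathbb{E}[L_{q}] = \iint \lvert\, y(\mathbf{x}) - t \,\rvert^{q}\, p(\mathbf{x}, t)\, \mathrm{d}\mathbf{x}\, \mathrm{d}t
```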

  13. 1.6 Information Theory • Entropy - The noiseless coding theorem states that the entropy is a lower bound on the number of bits needed to transmit the state of a random variable. - Higher entropy means larger uncertainty (see the sketch below). (C) 2006, SNU Biointelligence Lab, http://bi.snu.ac.kr/
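A minimal sketch of the discrete entropy H[x] = −Σ_i p_i log2 p_i (in bits), with made-up distributions: the uniform distribution has the highest entropy, a sharply peaked one much less.

```python
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                            # convention: 0 * log 0 = 0
    return float(-(p * np.log2(p)).sum())

print(entropy([0.25, 0.25, 0.25, 0.25]))    # 2.0 bits: maximal for 4 states
print(entropy([0.97, 0.01, 0.01, 0.01]))    # ~0.24 bits: little uncertainty
```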

  14. 1.6 Information Theory (Cont’d) • Maximum Entropy Configuration for a Continuous Variable - Constraints: normalization, fixed mean, and fixed variance - Result: the distribution that maximizes the differential entropy is the Gaussian • Conditional Entropy: H[x, y] = H[y|x] + H[x] (C) 2006, SNU Biointelligence Lab, http://bi.snu.ac.kr/
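For reference, maximizing the differential entropy H[x] = −∫ p(x) ln p(x) dx under the three constraints listed above (normalization, fixed mean μ, fixed variance σ²) gives the Gaussian, whose entropy is:

```latex
p(x) = \frac{1}{\sqrt{2\pi\sigma^{2}}} \exp\!\left\{-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right\},
\qquad
H[x] = \frac{1}{2}\left\{\,1 + \ln\!\left(2\pi\sigma^{2}\right)\right\}
```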

  15. 1.6 Information Theory (Cont’d) • Relative Entropy (Kullback-Leibler divergence) • Convex Functions and Jensen’s Inequality (used to show KL ≥ 0) (C) 2006, SNU Biointelligence Lab, http://bi.snu.ac.kr/
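A minimal sketch of the KL divergence KL(p‖q) = Σ_i p_i ln(p_i/q_i) for discrete distributions (the example distributions are made up): it is non-negative, zero only when p = q, and not symmetric.

```python
import numpy as np

def kl(p, q):
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return float((p[mask] * np.log(p[mask] / q[mask])).sum())

p, q = [0.5, 0.4, 0.1], [1 / 3, 1 / 3, 1 / 3]
print(kl(p, q), kl(q, p))   # both positive, and generally unequal
print(kl(p, p))             # 0.0 when the distributions coincide
```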

  16. 1.6 Information Theory (Cont’d) • Mutual Information - I[x, y] = H[x] – H[x|y] = H[y] – H[y|x] - If x and y are independent, I[x, y] = 0 - The reduction in uncertainty about x by virtue of being told the value of y (sketch below) (C) 2006, SNU Biointelligence Lab, http://bi.snu.ac.kr/
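A hedged sketch computing the mutual information from an assumed joint table p(x, y), using the equivalent identity I[x, y] = H[x] + H[y] − H[x, y]; it drops to 0 when the joint factorizes into independent marginals.

```python
import numpy as np

def H(p):
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

joint = np.array([[0.4, 0.1],              # assumed joint distribution p(x, y)
                  [0.1, 0.4]])
px, py = joint.sum(axis=1), joint.sum(axis=0)
print(H(px) + H(py) - H(joint.ravel()))    # ~0.28 bits; 0 for an independent joint
```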
