
Classification: Probabilistic Generative Model

Explore how probabilistic generative models work for various applications like credit scoring, medical diagnosis, and recognition tasks. Learn about training data, classification functions, and model optimization techniques.


Presentation Transcript


  1. Classification: Probabilistic Generative Model Disclaimer: This PPT is adapted from Dr. Hung-yi Lee's course slides: http://speech.ee.ntu.edu.tw/~tlkagk/courses_ML17.html

  2. Classification as a Function: input x → class n • Credit Scoring — Input: income, savings, profession, age, past financial history, …; Output: accept or refuse • Medical Diagnosis — Input: current symptoms, age, gender, past medical history, …; Output: which kind of disease • Handwritten Digit Recognition — Input: image of a digit; Output: the digit (e.g., "0") • Face Recognition — Input: image of a face; Output: the person

  3. Example Application

  4. Example Application

  5. How to do Classification • Training data for classification • Classification as regression? Binary classification as an example • Training: Class 1 means the target is 1; Class 2 means the target is -1 • Testing: output closer to 1 → class 1; closer to -1 → class 2

  6. Why regression for classification is problematic • Model: y = b + w1x1 + w2x2, boundary: b + w1x1 + w2x2 = 0 • Outliers with y >> 1 on the correct side of the boundary still contribute large error, so the boundary shifts to decrease error — regression penalizes examples that are "too correct" (Bishop, p. 186) • Multiple classes: Class 1 means the target is 1; Class 2 means the target is 2; Class 3 means the target is 3, … — also problematic, since it imposes an artificial ordering between classes

  7. Ideal Alternatives • Function (Model): f(x) with output = class 1 or output = class 2 • Loss function (indicator function): L(f) = Σₙ δ(f(xⁿ) ≠ ŷⁿ), the number of times f gets incorrect results on the training data • Find the best function: e.g., Perceptron, SVM — not today

  8. Two Boxes (STT315: Law of Total Probability) • Box 1: prior P(B1) = 2/3, P(Blue|B1) = 4/5, P(Green|B1) = 1/5 • Box 2: prior P(B2) = 1/3, P(Blue|B2) = 2/5, P(Green|B2) = 3/5 • A ball is drawn from one of the boxes. Given that it is blue, where does it come from? Compute P(B1 | Blue) with Bayes' Rule.
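A worked computation for this example (not spelled out on the slide, but it follows directly from Bayes' Rule with the numbers above):

```latex
P(B_1 \mid \text{Blue})
  = \frac{P(\text{Blue}\mid B_1)\,P(B_1)}
         {P(\text{Blue}\mid B_1)\,P(B_1) + P(\text{Blue}\mid B_2)\,P(B_2)}
  = \frac{\tfrac{4}{5}\cdot\tfrac{2}{3}}
         {\tfrac{4}{5}\cdot\tfrac{2}{3} + \tfrac{2}{5}\cdot\tfrac{1}{3}}
  = \frac{8/15}{10/15} = 0.8
```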

  9. From Two Boxes to Two Classes • Replace the boxes with classes: priors P(C1), P(C2) and class-conditional probabilities P(x|C1), P(x|C2), estimated from training data • Given an x, which class does it belong to? Compute P(C1|x) with Bayes' Rule • This is a Generative Model: with these probabilities we can also compute P(x) = P(x|C1)P(C1) + P(x|C2)P(C2)

  10. Prior Probability • Class 1: Water, Class 2: Normal • Water- and Normal-type Pokémon with ID < 400 are used for training, the rest for testing • Training set: 79 Water, 61 Normal • P(C1) = 79 / (79 + 61) = 0.56, P(C2) = 61 / (79 + 61) = 0.44

  11. Probability from Class • P(x|C1) = P(x|Water) = ? • Each Pokémon is represented as a feature vector of its attributes • Training data: 79 Water-type Pokémon in total

  12. Probability from Class - Feature • P(x|C1) = P(x|Water) = ? • Consider only Defense and SP Defense: each Water-type Pokémon is a 2-D point • For a new x that never appears among the 79 training points, is P(x|Water) = 0? No — assume the points are sampled from a Gaussian distribution

  13. Maximum Likelihood (start) … in order to find the mean 𝝁 and covariance matrix 𝜮

  14. Review: Gaussian (Normal) Distribution • Input: vector x; output: the probability (density) of sampling x • f_{𝝁,𝜮}(x) = (2π)^(−D/2) |𝜮|^(−1/2) exp(−(1/2)(x − 𝝁)ᵀ 𝜮⁻¹ (x − 𝝁)), where D is the dimension of x • The shape of the function is determined by the mean 𝝁 and the covariance matrix 𝜮 • Figure sources: http://www.k-wave.org/documentation/getWin.php https://www.google.com/search?q=2d+gaussian+model&rlz=1C1GCEU_enUS821US821&source=lnms&tbm=isch&sa=X&ved=0ahUKEwj0ivPqgsjgAhUQpFkKHfJpAD0Q_AUIDigB&biw=1024&bih=546
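As an illustration, here is a minimal sketch of evaluating this density with NumPy/SciPy; the mean, covariance, and query point are hypothetical numbers, not the slides' actual Pokémon statistics:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical mean and covariance for a 2-D feature (Defense, SP Defense)
mu = np.array([75.0, 70.0])
sigma = np.array([[380.0, 150.0],
                  [150.0, 420.0]])

# Density of sampling a new point x from the Gaussian f_{mu, Sigma}
x = np.array([90.0, 60.0])
density = multivariate_normal(mean=mu, cov=sigma).pdf(x)
print(density)  # a small positive number, not zero, even if x was never observed
```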

  15. Gaussian Distribution • Input: vector x; output: probability of sampling x • The shape of the function is determined by the mean 𝝁 and the covariance matrix 𝜮 • Same covariance matrix 𝜮, different means 𝝁: the distribution moves to a different location (figure: https://blog.slinuxer.com/tag/pca)

  16. Gaussian Distribution • Input: vector x; output: probability of sampling x • Same mean 𝝁, different covariance matrices 𝜮: the spread and shape of the distribution change (figure: https://blog.slinuxer.com/tag/pca)

  17. Probability from Class • Assume the 79 Water-type points x¹, x², …, x⁷⁹ are sampled from a Gaussian distribution • Find the Gaussian (𝝁, 𝜮) behind them, then use it to compute the probability for new points: P(new x | Water) = f_{𝝁,𝜮}(new x) • How to find 𝝁 and 𝜮?

  18. Maximum Likelihood • A Gaussian with any mean 𝝁 and covariance matrix 𝜮 could have generated these points — but with different likelihood • Likelihood of a Gaussian (𝝁, 𝜮) = the probability of the Gaussian sampling x¹, x², …, x⁷⁹: L(𝝁, 𝜮) = f_{𝝁,𝜮}(x¹) f_{𝝁,𝜮}(x²) ⋯ f_{𝝁,𝜮}(x⁷⁹)

  19. Maximum Likelihood to estimate the mean and covariance matrix • We have the 79 "Water"-type Pokémon: x¹, x², …, x⁷⁹ • We assume x¹, …, x⁷⁹ are generated from the Gaussian (𝝁*, 𝜮*) with the maximum likelihood: 𝝁*, 𝜮* = arg max L(𝝁, 𝜮) • Solution: 𝝁* = (1/79) Σₙ xⁿ (sample mean) and 𝜮* = (1/79) Σₙ (xⁿ − 𝝁*)(xⁿ − 𝝁*)ᵀ (sample covariance matrix)
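A minimal sketch of this maximum-likelihood estimate in NumPy; the array water_x is a random placeholder standing in for the 79 training feature vectors, not the real dataset:

```python
import numpy as np

# Placeholder: 79 Water-type feature vectors, one per row (e.g., 2 features each)
water_x = np.random.default_rng(0).normal(size=(79, 2))  # stand-in for real data

# Maximum-likelihood estimates for a Gaussian
mu_star = water_x.mean(axis=0)                     # sample mean
centered = water_x - mu_star
sigma_star = centered.T @ centered / len(water_x)  # sample covariance (divide by N, not N-1)

print(mu_star)
print(sigma_star)
```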

  20. Maximum Likelihood to find mean 𝝁 and covariance matrix 𝜮 • Apply the same estimate to each class separately • Class 1 (Water): 𝝁¹, 𝜮¹ • Class 2 (Normal): 𝝁², 𝜮²

  21. Maximum Likelihood (end) … now we can find P(x|C1) and P(x|C2)

  22. Now we can do classification 🙂 • Priors: P(C1) = 79 / (79 + 61) = 0.56, P(C2) = 61 / (79 + 61) = 0.44 • Class-conditionals: P(x|C1) = f_{𝝁¹,𝜮¹}(x), P(x|C2) = f_{𝝁²,𝜮²}(x) • Posterior: P(C1|x) = P(x|C1)P(C1) / (P(x|C1)P(C1) + P(x|C2)P(C2)) • If P(C1|x) > 0.5, x belongs to class 1 (Water)
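Putting the pieces together, a minimal sketch of the resulting classifier; the class counts come from the slides, while the parameters mu1, sigma1, mu2, sigma2 are assumed to come from the maximum-likelihood step above:

```python
import numpy as np
from scipy.stats import multivariate_normal

def posterior_c1(x, mu1, sigma1, mu2, sigma2, n1=79, n2=61):
    """P(C1|x) for a two-class generative model with Gaussian class-conditionals."""
    p_c1, p_c2 = n1 / (n1 + n2), n2 / (n1 + n2)       # priors: 0.56 and 0.44
    p_x_c1 = multivariate_normal(mu1, sigma1).pdf(x)  # P(x|C1)
    p_x_c2 = multivariate_normal(mu2, sigma2).pdf(x)  # P(x|C2)
    return p_x_c1 * p_c1 / (p_x_c1 * p_c1 + p_x_c2 * p_c2)

def classify(x, mu1, sigma1, mu2, sigma2):
    """Class 1 (Water) if the posterior exceeds 0.5, otherwise class 2 (Normal)."""
    return "Water" if posterior_c1(x, mu1, sigma1, mu2, sigma2) > 0.5 else "Normal"
```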

  23. How are the results? • Blue points: C1 (Water), red points: C2 (Normal) • With 2 features (Defense, SP Defense), testing data: 47% accuracy 🙁 • With all 7 features (total, hp, att, sp att, def, sp def, speed): 7-dim feature vectors, 7 × 7 covariance matrices, 64% accuracy …

  24. Modifying Model • Class 1 (Water) and Class 2 (Normal) keep their own means 𝝁¹, 𝝁² but share the same covariance matrix 𝜮 • Fewer parameters: a full covariance matrix grows with the square of the feature dimension, so sharing one 𝜮 across classes reduces the risk of overfitting

  25. Modifying Model (Ref: Bishop, chapter 4.2.2) • Maximum likelihood with a shared covariance: "Water"-type Pokémon x¹, …, x⁷⁹ and "Normal"-type Pokémon x⁸⁰, …, x¹⁴⁰ • Find 𝝁¹, 𝝁², 𝜮 maximizing the likelihood L(𝝁¹, 𝝁², 𝜮) = f_{𝝁¹,𝜮}(x¹) ⋯ f_{𝝁¹,𝜮}(x⁷⁹) · f_{𝝁²,𝜮}(x⁸⁰) ⋯ f_{𝝁²,𝜮}(x¹⁴⁰) • 𝝁¹ and 𝝁² are the same as before (sample means), and 𝜮 = (79/140) 𝜮¹ + (61/140) 𝜮² (weighted average of the per-class covariances)
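A minimal sketch of the shared-covariance estimate, assuming water_x and normal_x are the per-class feature matrices from the training set (names chosen for this example):

```python
import numpy as np

def gaussian_mle(x):
    """Sample mean and (biased) sample covariance for one class."""
    mu = x.mean(axis=0)
    centered = x - mu
    return mu, centered.T @ centered / len(x)

def shared_covariance(water_x, normal_x):
    """Per-class means plus one covariance shared by both classes."""
    mu1, sigma1 = gaussian_mle(water_x)
    mu2, sigma2 = gaussian_mle(normal_x)
    n1, n2 = len(water_x), len(normal_x)
    # Weighted average of the per-class covariances, e.g. (79/140)*Sigma1 + (61/140)*Sigma2
    sigma = (n1 * sigma1 + n2 * sigma2) / (n1 + n2)
    return mu1, mu2, sigma
```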

  26. Modifying Model • With the same covariance matrix, the decision boundary is linear • 2 features: 54% accuracy • All 7 features (total, hp, att, sp att, def, sp def, speed): 73% accuracy

  27. Three Steps • Function Set (Model): if P(C1|x) > 0.5, output class 1; otherwise, output class 2 • Goodness of a function: the mean 𝝁 and covariance 𝜮 that maximize the likelihood (the probability of generating the training data) • Find the best function: easy

  28. Naïve Bayes Classifier — choosing the probability distribution • You can always choose the distribution you like 🙂 • If you assume all the dimensions are independent, P(x|C) = P(x1|C) P(x2|C) ⋯, with each factor a 1-D Gaussian (equivalently, a diagonal covariance matrix) — then you are using the Naïve Bayes Classifier • For binary features, you may assume they come from Bernoulli distributions • If the independence assumption is not true → fall back to the full generative model (non-diagonal covariance); see the sketch below
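For illustration, a minimal sketch of the Gaussian Naïve Bayes assumption (independent 1-D Gaussians per dimension); the function names are hypothetical and not part of the slides:

```python
import numpy as np
from scipy.stats import norm

def naive_bayes_fit(x):
    """Per-dimension 1-D Gaussian MLE: one mean and std per feature (diagonal covariance)."""
    return x.mean(axis=0), x.std(axis=0)

def naive_bayes_likelihood(x, means, stds):
    """P(x|C) under the independence assumption: product of 1-D densities."""
    return np.prod(norm.pdf(x, loc=means, scale=stds))
```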

  29. Posterior Probability • P(C1|x) = P(x|C1)P(C1) / (P(x|C1)P(C1) + P(x|C2)P(C2)) = 1 / (1 + exp(−z)) = σ(z), where z = ln [ P(x|C1)P(C1) / (P(x|C2)P(C2)) ] • σ(z) is the sigmoid function

  30. Warning of Math

  31. Posterior Probability • P(C1|x) = σ(z) with z = ln [ P(x|C1)P(C1) / (P(x|C2)P(C2)) ] • Plugging in the Gaussian class-conditionals with a shared covariance matrix 𝜮, the quadratic terms cancel and z is linear in x: z = wᵀx + b, where wᵀ = (𝝁¹ − 𝝁²)ᵀ 𝜮⁻¹ and b = −(1/2)(𝝁¹)ᵀ𝜮⁻¹𝝁¹ + (1/2)(𝝁²)ᵀ𝜮⁻¹𝝁² + ln(N1/N2) • So P(C1|x) = σ(w·x + b), which is why the boundary is linear

  32. End of Warning

  33. Two routes to w and b • (1) In the generative model, we estimate N1, N2, 𝝁¹, 𝝁², 𝜮 based on the Gaussian (Normal) assumption, and then compute w and b from them • (2) How about directly finding w and b?
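A minimal sketch of route (1) — computing w and b from the shared-covariance estimates; the inputs mu1, mu2, sigma, n1, n2 are assumed to come from the earlier maximum-likelihood step:

```python
import numpy as np

def generative_to_wb(mu1, mu2, sigma, n1, n2):
    """w and b such that P(C1|x) = sigmoid(w @ x + b) under the shared-covariance model."""
    sigma_inv = np.linalg.inv(sigma)
    w = sigma_inv @ (mu1 - mu2)
    b = (-0.5 * mu1 @ sigma_inv @ mu1
         + 0.5 * mu2 @ sigma_inv @ mu2
         + np.log(n1 / n2))
    return w, b

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))
```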

  34. Reference • Bishop: Chapter 4.1 – 4.2 • Data: https://www.kaggle.com/abcsds/pokemon • Useful posts: • https://www.kaggle.com/nishantbhadauria/d/abcsds/pokemon/pokemon-speed-attack-hp-defense-analysis-by-type • https://www.kaggle.com/nikos90/d/abcsds/pokemon/mastering-pokebars/discussion • https://www.kaggle.com/ndrewgele/d/abcsds/pokemon/visualizing-pok-mon-stats-with-seaborn/discussion

  35. Review: STT315 2.10 — The Law of Total Probability and Bayes' Rule • Definition: For some positive integer k, let B1, B2, …, Bk be mutually exclusive sets whose union is the sample space S. Then the collection of sets {B1, B2, …, Bk} is said to be a partition of S. • Example: the two boxes B1 and B2 above form a partition of the sample space.

  36. Review: STT315 2.10 — The Law of Total Probability and Bayes' Rule • Thm 2.8 (Law of Total Probability): If the events B1, B2, …, Bk constitute a partition of the sample space S such that P(Bi) ≠ 0 for i = 1, 2, …, k, then for any event A of S, P(A) = Σᵢ P(Bi) P(A|Bi) • Thm 2.9 (Bayes' Rule): If the events B1, B2, …, Bk constitute a partition of the sample space S such that P(Bi) ≠ 0 for i = 1, 2, …, k, then for any event A in S such that P(A) ≠ 0, P(Br|A) = P(Br) P(A|Br) / Σᵢ P(Bi) P(A|Bi), for r = 1, 2, …, k

  37. Review: STT315 2.10 — The Law of Total Probability and Bayes' Rule • With a two-set partition (k = 2), the special case is: P(B1|A) = P(B1)P(A|B1) / (P(B1)P(A|B1) + P(B2)P(A|B2)) • Ex 2.124: 40% of voters are Republicans and 60% are Democrats. 30% of the Republicans and 70% of the Democrats favor an election issue. If a randomly picked voter is in favor of the issue, find the probability that this person is a Democrat.
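The slide stops at the question; a worked solution using the two-set special case above (the numeric answer is not given on the slide, but it follows directly from the stated percentages):

```latex
P(\text{Dem} \mid \text{Favor})
  = \frac{P(\text{Dem})\,P(\text{Favor}\mid\text{Dem})}
         {P(\text{Dem})\,P(\text{Favor}\mid\text{Dem}) + P(\text{Rep})\,P(\text{Favor}\mid\text{Rep})}
  = \frac{0.6 \times 0.7}{0.6 \times 0.7 + 0.4 \times 0.3}
  = \frac{0.42}{0.54} \approx 0.778
```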
