Explore how probabilistic generative models work for various applications like credit scoring, medical diagnosis, and recognition tasks. Learn about training data, classification functions, and model optimization techniques.
Classification: Probabilistic Generative Model. Disclaimer: this slide deck is adapted from Dr. Hung-yi Lee's course, http://speech.ee.ntu.edu.tw/~tlkagk/courses_ML17.html
Classification: a function that maps an input x to a class n.
• Credit scoring. Input: income, savings, profession, age, past financial history, … Output: accept or refuse.
• Medical diagnosis. Input: current symptoms, age, gender, past medical history, … Output: which kind of disease.
• Handwritten digit recognition. Input: an image of a digit (e.g. a picture of "0"). Output: the digit "0".
• Face recognition. Input: an image of a face. Output: the person.
How to do Classification
• Training data for classification: pairs (x^n, ŷ^n), where ŷ^n is the class label of example x^n.
• Classification as regression? Take binary classification as an example. Training: Class 1 means the target is +1; Class 2 means the target is -1. Testing: if the regression output is closer to +1, predict class 1; if it is closer to -1, predict class 2 (a sketch of this idea follows below).
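A minimal sketch of this regression-as-classification idea (the data and variable names below are made up for illustration, not from the lecture): fit y = b + w1*x1 + w2*x2 to targets +1/-1 by least squares and classify by the sign of the output.

```python
import numpy as np

# Toy 2-D training data; class 1 -> target +1, class 2 -> target -1
X = np.array([[1.0, 2.0], [2.0, 1.5], [1.5, 1.8],    # class 1
              [4.0, 4.5], [5.0, 4.0], [4.5, 5.0]])   # class 2
t = np.array([+1.0, +1.0, +1.0, -1.0, -1.0, -1.0])

# Least-squares fit of y = b + w1*x1 + w2*x2 (bias absorbed by a column of ones)
Xb = np.hstack([np.ones((len(X), 1)), X])
w, *_ = np.linalg.lstsq(Xb, t, rcond=None)           # w[0] = b, w[1], w[2] = w1, w2

# Testing: output closer to +1 -> class 1, closer to -1 -> class 2
def predict(x):
    y = w[0] + w[1:] @ x
    return 1 if y >= 0 else 2

print(predict(np.array([1.2, 1.9])))   # expected: 1
```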
[Figure: scatter plots of x1 vs x2 for Class 1 and Class 2, with the regression boundary y = b + w1x1 + w2x2 = 0.] Regression is problematic for classification: it penalizes examples that are "too correct" (outliers with y >> 1), so the boundary shifts toward them to decrease the squared error even though they are on the correct side (Bishop, p. 186). The multi-class version (Class 1 means the target is 1, Class 2 means the target is 2, Class 3 means the target is 3, …) is also problematic, because it imposes an artificial ordering, e.g. it assumes classes 1 and 2 are closer to each other than classes 1 and 3.
Ideal Alternatives
• Function (model): f(x) outputs class 1 if g(x) > 0, and class 2 otherwise.
• Loss function (indicator function): L(f) = Σ_n δ(f(x^n) ≠ ŷ^n), the number of times f gets an incorrect result on the training data.
• Find the best function: minimize L(f). This loss is not differentiable, so it cannot be minimized by gradient descent directly. Example approaches: Perceptron, SVM (not today).
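A small illustration of the indicator loss (a sketch with made-up data, not the lecture's code): it simply counts misclassified training examples.

```python
import numpy as np

def zero_one_loss(f, X, y):
    """Number of training examples on which f outputs the wrong class."""
    return sum(1 for x, label in zip(X, y) if f(x) != label)

# Example: classify by the sign of an inner function g(x)
g = lambda x: x[0] - x[1]            # any real-valued inner function
f = lambda x: 1 if g(x) > 0 else 2   # output class 1 if g(x) > 0, class 2 otherwise

X = np.array([[3.0, 1.0], [0.5, 2.0], [2.0, 2.5]])
y = [1, 2, 1]
print(zero_one_loss(f, X, y))        # -> 1 (only the last example is misclassified)
```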
Two Boxes (STT315: Law of Total Probability and Bayes' Rule)
• Box 1: prior P(B1) = 2/3, P(Blue|B1) = 4/5, P(Green|B1) = 1/5.
• Box 2: prior P(B2) = 1/3, P(Blue|B2) = 2/5, P(Green|B2) = 3/5.
A ball is drawn from one of the boxes and turns out to be blue. Where does it come from? By Bayes' rule: P(B1|Blue) = P(Blue|B1)P(B1) / (P(Blue|B1)P(B1) + P(Blue|B2)P(B2)).
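Plugging in the numbers above (a worked computation, not on the original slide): P(B1|Blue) = (4/5 × 2/3) / (4/5 × 2/3 + 2/5 × 1/3) = (8/15) / (10/15) = 0.8, so the blue ball most likely came from Box 1.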
Two Classes: treat the two boxes as two classes. Given an x, which class does it belong to?
P(C1|x) = P(x|C1)P(C1) / (P(x|C1)P(C1) + P(x|C2)P(C2))
We estimate the probabilities P(C1), P(C2), P(x|C1), P(x|C2) from training data. Because this also lets us compute P(x) = P(x|C1)P(C1) + P(x|C2)P(C2), i.e. the probability of generating any x, it is called a Generative Model.
Prior Probability. Class 1: Water; Class 2: Normal. Water- and Normal-type Pokémon with ID < 400 are used for training, the rest for testing.
• Training set: 79 Water, 61 Normal.
• P(C1) = 79 / (79 + 61) = 0.56, P(C2) = 61 / (79 + 61) = 0.44.
Probability from Class: P(x|C1) = P(x|Water) = ? Each Pokémon is represented as a vector of its attributes (a feature vector). Training data: 79 Water-type Pokémon in total.
Probability from Class - Feature. Consider only Defense and SP Defense, so each Water-type training example x^1, …, x^79 is a 2-D vector. [Figure: scatter plot of the 79 Water-type Pokémon in the Defense / SP Defense plane.] What is P(x|Water) for a new x that never appears among the 79 training examples: is it 0? No: assume the points are sampled from a Gaussian distribution, so every x gets a nonzero probability.
Maximum Likelihood (start): in order to find the mean and the covariance matrix.
Review: Gaussian (Normal) Distribution. Input: a vector x; output: the probability (density) of sampling x.
f_{μ,Σ}(x) = 1 / ((2π)^{D/2} |Σ|^{1/2}) · exp(-½ (x - μ)^T Σ^{-1} (x - μ))
The shape of the function is determined by the mean μ and the covariance matrix Σ. (Image source: http://www.k-wave.org/documentation/getWin.php)
Gaussian Distribution. Input: a vector x; output: the probability (density) of sampling x. The shape of the function is determined by the mean μ and the covariance matrix Σ: with the same Σ but different μ, the location of the peak moves; with the same μ but different Σ, the spread and orientation of the distribution change. (Image source: https://blog.slinuxer.com/tag/pca)
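A minimal sketch of evaluating the density above with NumPy (the helper name and all numerical values below are my own, chosen only for illustration):

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Density of a D-dimensional Gaussian with mean mu and covariance sigma at x."""
    D = len(mu)
    diff = x - mu
    norm = 1.0 / np.sqrt((2 * np.pi) ** D * np.linalg.det(sigma))
    return norm * np.exp(-0.5 * diff @ np.linalg.solve(sigma, diff))

# Example: evaluate a 2-D Gaussian at one point
mu = np.array([75.0, 70.0])
sigma = np.array([[870.0, 330.0],
                  [330.0, 800.0]])
print(gaussian_pdf(np.array([80.0, 65.0]), mu, sigma))
```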
Probability from Class: assume the Water-type points x^1, …, x^79 are sampled from a Gaussian distribution. Find the Gaussian (μ, Σ) behind them; then, for a new x, P(x|Water) = f_{μ,Σ}(x). How do we find μ and Σ?
Maximum Likelihood. A Gaussian with any mean μ and covariance matrix Σ could in principle generate the points x^1, …, x^79, but with different likelihoods. The likelihood of a Gaussian with mean μ and covariance matrix Σ is the probability that this Gaussian samples x^1, …, x^79:
L(μ, Σ) = f_{μ,Σ}(x^1) · f_{μ,Σ}(x^2) · … · f_{μ,Σ}(x^79)
Maximum Likelihood to estimate the mean and covariance matrix. We have the Water-type Pokémon x^1, x^2, …, x^79, and we assume they are generated from the Gaussian (μ*, Σ*) with the maximum likelihood:
μ*, Σ* = arg max_{μ,Σ} L(μ, Σ)
The solution is the sample mean and the sample covariance matrix:
μ* = (1/79) Σ_{n=1}^{79} x^n,  Σ* = (1/79) Σ_{n=1}^{79} (x^n - μ*)(x^n - μ*)^T
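In code, these estimates are just the sample mean and the 1/N sample covariance (note that np.cov uses 1/(N-1) by default, so it is computed explicitly here). The array values below are invented stand-ins, not the real Pokémon stats:

```python
import numpy as np

# Hypothetical (Defense, SP Defense) vectors standing in for the 79 Water-type examples
X_water = np.array([[48.0, 50.0], [65.0, 64.0], [60.0, 54.0], [79.0, 100.0]])

N = len(X_water)
mu_star = X_water.mean(axis=0)            # mu* = (1/N) * sum_n x^n
centered = X_water - mu_star
sigma_star = centered.T @ centered / N    # Sigma* = (1/N) * sum_n (x^n - mu*)(x^n - mu*)^T

print(mu_star)
print(sigma_star)
```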
Maximum Likelihood to find the mean 𝝁 and covariance matrix 𝜮 of each class. Class 1 (Water): μ^1, Σ^1 estimated from the 79 Water-type examples. Class 2 (Normal): μ^2, Σ^2 estimated from the 61 Normal-type examples.
Maximum Likelihood (end): now we can find P(x|C1) and P(x|C2).
Now we can do classification. P(C1) = 79 / (79 + 61) = 0.56, P(C2) = 61 / (79 + 61) = 0.44, and
P(C1|x) = P(x|C1)P(C1) / (P(x|C1)P(C1) + P(x|C2)P(C2))
If P(C1|x) > 0.5, x belongs to class 1 (Water).
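Putting the pieces together, a sketch of the classification rule (the function name is mine; the priors are the 0.56/0.44 from the slide, while mu1, sigma1, mu2, sigma2 would come from the maximum-likelihood step; SciPy is used only for the Gaussian density):

```python
import numpy as np
from scipy.stats import multivariate_normal

def classify(x, mu1, sigma1, mu2, sigma2, p_c1=0.56, p_c2=0.44):
    """Return the predicted class and P(C1|x), via Bayes' rule with Gaussian class-conditionals."""
    p_x_c1 = multivariate_normal.pdf(x, mean=mu1, cov=sigma1)    # P(x|C1)
    p_x_c2 = multivariate_normal.pdf(x, mean=mu2, cov=sigma2)    # P(x|C2)
    post_c1 = p_x_c1 * p_c1 / (p_x_c1 * p_c1 + p_x_c2 * p_c2)    # P(C1|x)
    return ("Water (C1)" if post_c1 > 0.5 else "Normal (C2)"), post_c1
```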
How are the results? Blue points: C1 (Water); red points: C2 (Normal). Using only Defense and SP Defense: 47% accuracy on the testing data. Using all seven features (Total, HP, Attack, SP Attack, Defense, SP Defense, Speed): each x is a 7-dim vector, the covariance matrices are 7 × 7, and the accuracy is 64%.
Modifying Model: Class 1 (Water) and Class 2 (Normal) keep their own means μ^1 and μ^2, but share the same covariance matrix Σ. Fewer parameters reduces the risk of overfitting, since a full covariance matrix has on the order of D² entries.
Modifying Model (Ref: Bishop, Chapter 4.2.2). Maximum likelihood with a shared covariance: the Water-type Pokémon are x^1, …, x^79 and the Normal-type Pokémon are x^80, …, x^140. Find μ^1, μ^2, Σ maximizing the likelihood
L(μ^1, μ^2, Σ) = f_{μ^1,Σ}(x^1) ⋯ f_{μ^1,Σ}(x^79) · f_{μ^2,Σ}(x^80) ⋯ f_{μ^2,Σ}(x^140)
μ^1 and μ^2 are the same as before (the per-class sample means), and Σ is the weighted average Σ = (79/140) Σ^1 + (61/140) Σ^2.
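A sketch of the shared-covariance estimate described above (following Bishop 4.2.2: each class keeps its own mean, and Σ is the frequency-weighted average of the per-class covariances; the helper name is mine):

```python
import numpy as np

def shared_covariance_mle(X1, X2):
    """Per-class means plus the single covariance matrix shared by both classes."""
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    n1, n2 = len(X1), len(X2)
    sigma1 = (X1 - mu1).T @ (X1 - mu1) / n1
    sigma2 = (X2 - mu2).T @ (X2 - mu2) / n2
    sigma = (n1 * sigma1 + n2 * sigma2) / (n1 + n2)   # e.g. (79/140)*Sigma1 + (61/140)*Sigma2
    return mu1, mu2, sigma
```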
Modifying Model: with the same covariance matrix, the class boundary becomes linear. Using only Defense and SP Defense: 54% accuracy (up from 47%). Using all seven features (Total, HP, Attack, SP Attack, Defense, SP Defense, Speed): 73% accuracy (up from 64%).
Three Steps
• Function set (model): f(x) outputs class 1 if P(C1|x) > 0.5, otherwise class 2.
• Goodness of a function: the means and covariance that maximize the likelihood (the probability of generating the training data).
• Find the best function: easy, because the maximum-likelihood solution has a closed form.
Naïve Bayes Classifier: choosing the probability distribution. You can always choose the distribution you like. If you assume all the dimensions of x are independent, P(x|C1) = P(x_1|C1) P(x_2|C1) ⋯ P(x_K|C1), then you are using a Naïve Bayes classifier; each factor can be a 1-D Gaussian (equivalent to a diagonal covariance matrix). For binary features, you may assume they come from Bernoulli distributions. If the independence assumption does not hold, the Naïve Bayes model can be badly biased, and a more general generative model is needed.
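A sketch of the independence assumption in code (one 1-D Gaussian per dimension, i.e. a diagonal covariance; the function names are mine):

```python
import numpy as np

def fit_naive_gaussian(X):
    """Per-dimension mean and variance; equivalent to a Gaussian with a diagonal covariance."""
    return X.mean(axis=0), X.var(axis=0)

def naive_likelihood(x, mu, var):
    """P(x|C) = product over dimensions k of the 1-D Gaussian P(x_k|C)."""
    return np.prod(np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var))
```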
Posterior Probability.
P(C1|x) = P(x|C1)P(C1) / (P(x|C1)P(C1) + P(x|C2)P(C2))
        = 1 / (1 + P(x|C2)P(C2) / (P(x|C1)P(C1)))
        = 1 / (1 + exp(-z)) = σ(z),  where z = ln[ P(x|C1)P(C1) / (P(x|C2)P(C2)) ]
σ(z) is the sigmoid function.
Posterior Probability (continued). Expanding z for Gaussian class-conditionals with a shared covariance matrix Σ, the quadratic terms in x cancel and z becomes linear in x:
z = w · x + b, with w^T = (μ^1 - μ^2)^T Σ^{-1} and b = -½ (μ^1)^T Σ^{-1} μ^1 + ½ (μ^2)^T Σ^{-1} μ^2 + ln(N1/N2)
So P(C1|x) = σ(w · x + b): in the generative model with a shared Σ, the posterior is a sigmoid of a linear function of x.
(1) In the generative model, we estimate N1, N2, μ^1, μ^2, Σ under the Gaussian assumption, and then compute w and b from them. (2) How about directly finding w and b (the discriminative approach)?
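A sketch of step (1): with a shared Σ, w and b follow in closed form from the generative estimates, using the expressions on the previous slide (the function name and default counts are mine; 79 and 61 are the class counts from the Pokémon example):

```python
import numpy as np

def generative_w_b(mu1, mu2, sigma, n1=79, n2=61):
    """w and b of the linear posterior P(C1|x) = sigmoid(w @ x + b), shared-covariance case."""
    sigma_inv = np.linalg.inv(sigma)
    w = sigma_inv @ (mu1 - mu2)                      # w = Sigma^{-1} (mu1 - mu2)
    b = (-0.5 * mu1 @ sigma_inv @ mu1                # b = -1/2 mu1' Sigma^{-1} mu1
         + 0.5 * mu2 @ sigma_inv @ mu2               #     + 1/2 mu2' Sigma^{-1} mu2
         + np.log(n1 / n2))                          #     + ln(N1/N2)
    return w, b
```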
Reference • Bishop: Chapter 4.1 – 4.2 • Data: https://www.kaggle.com/abcsds/pokemon • Useful posts: • https://www.kaggle.com/nishantbhadauria/d/abcsds/pokemon/pokemon-speed-attack-hp-defense-analysis-by-type • https://www.kaggle.com/nikos90/d/abcsds/pokemon/mastering-pokebars/discussion • https://www.kaggle.com/ndrewgele/d/abcsds/pokemon/visualizing-pok-mon-stats-with-seaborn/discussion
Review, STT315 2.10: the Law of Total Probability and Bayes' Rule.
• Def: for some positive integer k, let B1, B2, …, Bk be such that B1 ∪ B2 ∪ … ∪ Bk = S and Bi ∩ Bj = ∅ for i ≠ j. Then the collection of sets {B1, B2, …, Bk} is said to be a partition of S. (In the two-boxes example above, {B1, B2} is a partition of the sample space.)
Review, STT315 2.10 (continued).
• Thm 2.8 (Law of Total Probability): if the events B1, B2, …, Bk constitute a partition of the sample space S such that P(Bi) ≠ 0 for i = 1, 2, …, k, then for any event A of S,
P(A) = Σ_{i=1}^{k} P(Bi) P(A|Bi)
• Thm 2.9 (Bayes' Rule): if the events B1, B2, …, Bk constitute a partition of the sample space S such that P(Bi) ≠ 0 for i = 1, 2, …, k, then for any event A in S such that P(A) ≠ 0,
P(Br|A) = P(Br) P(A|Br) / Σ_{i=1}^{k} P(Bi) P(A|Bi),  for r = 1, 2, …, k
Review: STT315: 2.10 The Law of total prob and bayes’ rules • When r=2, the special case is : • Ex2.124: 40% rep and 60% dem. 30% of rep and 70% of the dem favor an election issue. If a random picked voter is in favor of the issue, find the prob that this person is a dem.