Explore how probabilistic generative models work for various applications like credit scoring, medical diagnosis, and recognition tasks. Learn about training data, classification functions, and model optimization techniques.
Classification: Probabilistic Generative Model. Disclaimer: this slide deck is adapted from Dr. Hung-yi Lee's course, http://speech.ee.ntu.edu.tw/~tlkagk/courses_ML17.html
Classification: a function that maps an input x to a class n.
• Credit scoring. Input: income, savings, profession, age, past financial history, … Output: accept or refuse.
• Medical diagnosis. Input: current symptoms, age, gender, past medical history, … Output: which kind of disease.
• Handwritten digit recognition. Input: an image of a digit (e.g. a picture of "0"). Output: the digit "0".
• Face recognition. Input: an image of a face. Output: the person.
How to do Classification
• Training data for classification: pairs (x^n, ŷ^n), where ŷ^n is the class label of example x^n.
• Classification as regression? Take binary classification as an example. Training: Class 1 means the target is +1; Class 2 means the target is -1. Testing: if the regression output is closer to +1, predict class 1; if it is closer to -1, predict class 2 (a sketch of this idea follows below).
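A minimal sketch of this regression-as-classification idea (the data and variable names below are made up for illustration, not from the lecture): fit y = b + w1*x1 + w2*x2 to targets +1/-1 by least squares and classify by the sign of the output.

```python
import numpy as np

# Toy 2-D training data; class 1 -> target +1, class 2 -> target -1
X = np.array([[1.0, 2.0], [2.0, 1.5], [1.5, 1.8],    # class 1
              [4.0, 4.5], [5.0, 4.0], [4.5, 5.0]])   # class 2
t = np.array([+1.0, +1.0, +1.0, -1.0, -1.0, -1.0])

# Least-squares fit of y = b + w1*x1 + w2*x2 (bias absorbed by a column of ones)
Xb = np.hstack([np.ones((len(X), 1)), X])
w, *_ = np.linalg.lstsq(Xb, t, rcond=None)           # w[0] = b, w[1], w[2] = w1, w2

# Testing: output closer to +1 -> class 1, closer to -1 -> class 2
def predict(x):
    y = w[0] + w[1:] @ x
    return 1 if y >= 0 else 2

print(predict(np.array([1.2, 1.9])))   # expected: 1
```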
[Figure: scatter plots of x1 vs x2 for Class 1 and Class 2, with the regression boundary y = b + w1x1 + w2x2 = 0.] Regression is problematic for classification: it penalizes examples that are "too correct" (outliers with y >> 1), so the boundary shifts toward them to decrease the squared error even though they are on the correct side (Bishop, p. 186). The multi-class version (Class 1 means the target is 1, Class 2 means the target is 2, Class 3 means the target is 3, …) is also problematic, because it imposes an artificial ordering, e.g. it assumes classes 1 and 2 are closer to each other than classes 1 and 3.
Ideal Alternatives
• Function (model): f(x) outputs class 1 if g(x) > 0, and class 2 otherwise.
• Loss function (indicator function): L(f) = Σ_n δ(f(x^n) ≠ ŷ^n), the number of times f gets an incorrect result on the training data.
• Find the best function: minimize L(f). This loss is not differentiable, so it cannot be minimized by gradient descent directly. Example approaches: Perceptron, SVM (not today).
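A small illustration of the indicator loss (a sketch with made-up data, not the lecture's code): it simply counts misclassified training examples.

```python
import numpy as np

def zero_one_loss(f, X, y):
    """Number of training examples on which f outputs the wrong class."""
    return sum(1 for x, label in zip(X, y) if f(x) != label)

# Example: classify by the sign of an inner function g(x)
g = lambda x: x[0] - x[1]            # any real-valued inner function
f = lambda x: 1 if g(x) > 0 else 2   # output class 1 if g(x) > 0, class 2 otherwise

X = np.array([[3.0, 1.0], [0.5, 2.0], [2.0, 2.5]])
y = [1, 2, 1]
print(zero_one_loss(f, X, y))        # -> 1 (only the last example is misclassified)
```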
Two Boxes (STT315: Law of Total Probability and Bayes' Rule)
• Box 1: prior P(B1) = 2/3, P(Blue|B1) = 4/5, P(Green|B1) = 1/5.
• Box 2: prior P(B2) = 1/3, P(Blue|B2) = 2/5, P(Green|B2) = 3/5.
A ball is drawn from one of the boxes and turns out to be blue. Where does it come from? By Bayes' rule: P(B1|Blue) = P(Blue|B1)P(B1) / (P(Blue|B1)P(B1) + P(Blue|B2)P(B2)).
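Plugging in the numbers above (a worked computation, not on the original slide): P(B1|Blue) = (4/5 × 2/3) / (4/5 × 2/3 + 2/5 × 1/3) = (8/15) / (10/15) = 0.8, so the blue ball most likely came from Box 1.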
Two Classes: treat the two boxes as two classes. Given an x, which class does it belong to?
P(C1|x) = P(x|C1)P(C1) / (P(x|C1)P(C1) + P(x|C2)P(C2))
We estimate the probabilities P(C1), P(C2), P(x|C1), P(x|C2) from training data. Because this also lets us compute P(x) = P(x|C1)P(C1) + P(x|C2)P(C2), i.e. the probability of generating any x, it is called a Generative Model.
Prior Probability. Class 1: Water; Class 2: Normal. Water- and Normal-type Pokémon with ID < 400 are used for training, the rest for testing.
• Training set: 79 Water, 61 Normal.
• P(C1) = 79 / (79 + 61) = 0.56, P(C2) = 61 / (79 + 61) = 0.44.
Probability from Class: P(x|C1) = P(x|Water) = ? Each Pokémon is represented as a vector of its attributes (a feature vector). Training data: 79 Water-type Pokémon in total.
Probability from Class - Feature. Consider only Defense and SP Defense, so each Water-type training example x^1, …, x^79 is a 2-D vector. [Figure: scatter plot of the 79 Water-type Pokémon in the Defense / SP Defense plane.] What is P(x|Water) for a new x that never appears among the 79 training examples: is it 0? No: assume the points are sampled from a Gaussian distribution, so every x gets a nonzero probability.
Maximum Likelihood (start): in order to find the mean and the covariance matrix.
Review: Gaussian (Normal) Distribution. Input: a vector x; output: the probability (density) of sampling x.
f_{μ,Σ}(x) = 1 / ((2π)^{D/2} |Σ|^{1/2}) · exp(-½ (x - μ)^T Σ^{-1} (x - μ))
The shape of the function is determined by the mean μ and the covariance matrix Σ. (Image source: http://www.k-wave.org/documentation/getWin.php)
Gaussian Distribution. Input: a vector x; output: the probability (density) of sampling x. The shape of the function is determined by the mean μ and the covariance matrix Σ: with the same Σ but different μ, the location of the peak moves; with the same μ but different Σ, the spread and orientation of the distribution change. (Image source: https://blog.slinuxer.com/tag/pca)
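A minimal sketch of evaluating the density above with NumPy (the helper name and all numerical values below are my own, chosen only for illustration):

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Density of a D-dimensional Gaussian with mean mu and covariance sigma at x."""
    D = len(mu)
    diff = x - mu
    norm = 1.0 / np.sqrt((2 * np.pi) ** D * np.linalg.det(sigma))
    return norm * np.exp(-0.5 * diff @ np.linalg.solve(sigma, diff))

# Example: evaluate a 2-D Gaussian at one point
mu = np.array([75.0, 70.0])
sigma = np.array([[870.0, 330.0],
                  [330.0, 800.0]])
print(gaussian_pdf(np.array([80.0, 65.0]), mu, sigma))
```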
Probability from Class: assume the Water-type points x^1, …, x^79 are sampled from a Gaussian distribution. Find the Gaussian (μ, Σ) behind them; then, for a new x, P(x|Water) = f_{μ,Σ}(x). How do we find μ and Σ?
Maximum Likelihood. A Gaussian with any mean μ and covariance matrix Σ could in principle generate the points x^1, …, x^79, but with different likelihoods. The likelihood of a Gaussian with mean μ and covariance matrix Σ is the probability that this Gaussian samples x^1, …, x^79:
L(μ, Σ) = f_{μ,Σ}(x^1) · f_{μ,Σ}(x^2) · … · f_{μ,Σ}(x^79)
Maximum Likelihood to estimate the mean and covariance matrix. We have the Water-type Pokémon x^1, x^2, …, x^79, and we assume they are generated from the Gaussian (μ*, Σ*) with the maximum likelihood:
μ*, Σ* = arg max_{μ,Σ} L(μ, Σ)
The solution is the sample mean and the sample covariance matrix:
μ* = (1/79) Σ_{n=1}^{79} x^n,  Σ* = (1/79) Σ_{n=1}^{79} (x^n - μ*)(x^n - μ*)^T
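In code, these estimates are just the sample mean and the 1/N sample covariance (note that np.cov uses 1/(N-1) by default, so it is computed explicitly here). The array values below are invented stand-ins, not the real Pokémon stats:

```python
import numpy as np

# Hypothetical (Defense, SP Defense) vectors standing in for the 79 Water-type examples
X_water = np.array([[48.0, 50.0], [65.0, 64.0], [60.0, 54.0], [79.0, 100.0]])

N = len(X_water)
mu_star = X_water.mean(axis=0)            # mu* = (1/N) * sum_n x^n
centered = X_water - mu_star
sigma_star = centered.T @ centered / N    # Sigma* = (1/N) * sum_n (x^n - mu*)(x^n - mu*)^T

print(mu_star)
print(sigma_star)
```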
Maximum Likelihood to find the mean 𝝁 and covariance matrix 𝜮 of each class. Class 1 (Water): μ^1, Σ^1 estimated from the 79 Water-type examples. Class 2 (Normal): μ^2, Σ^2 estimated from the 61 Normal-type examples.
Maximum Likelihood (end): now we can find P(x|C1) and P(x|C2).
Now we can do classification. P(C1) = 79 / (79 + 61) = 0.56, P(C2) = 61 / (79 + 61) = 0.44, and
P(C1|x) = P(x|C1)P(C1) / (P(x|C1)P(C1) + P(x|C2)P(C2))
If P(C1|x) > 0.5, x belongs to class 1 (Water).
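Putting the pieces together, a sketch of the classification rule (the function name is mine; the priors are the 0.56/0.44 from the slide, while mu1, sigma1, mu2, sigma2 would come from the maximum-likelihood step; SciPy is used only for the Gaussian density):

```python
import numpy as np
from scipy.stats import multivariate_normal

def classify(x, mu1, sigma1, mu2, sigma2, p_c1=0.56, p_c2=0.44):
    """Return the predicted class and P(C1|x), via Bayes' rule with Gaussian class-conditionals."""
    p_x_c1 = multivariate_normal.pdf(x, mean=mu1, cov=sigma1)    # P(x|C1)
    p_x_c2 = multivariate_normal.pdf(x, mean=mu2, cov=sigma2)    # P(x|C2)
    post_c1 = p_x_c1 * p_c1 / (p_x_c1 * p_c1 + p_x_c2 * p_c2)    # P(C1|x)
    return ("Water (C1)" if post_c1 > 0.5 else "Normal (C2)"), post_c1
```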
How are the results? Blue points: C1 (Water); red points: C2 (Normal). Using only Defense and SP Defense: 47% accuracy on the testing data. Using all seven features (Total, HP, Attack, SP Attack, Defense, SP Defense, Speed): each x is a 7-dim vector, the covariance matrices are 7 × 7, and the accuracy is 64%.
Modifying Model: Class 1 (Water) and Class 2 (Normal) keep their own means μ^1 and μ^2, but share the same covariance matrix Σ. Fewer parameters reduces the risk of overfitting, since a full covariance matrix has on the order of D² entries.
Modifying Model (Ref: Bishop, Chapter 4.2.2). Maximum likelihood with a shared covariance: the Water-type Pokémon are x^1, …, x^79 and the Normal-type Pokémon are x^80, …, x^140. Find μ^1, μ^2, Σ maximizing the likelihood
L(μ^1, μ^2, Σ) = f_{μ^1,Σ}(x^1) ⋯ f_{μ^1,Σ}(x^79) · f_{μ^2,Σ}(x^80) ⋯ f_{μ^2,Σ}(x^140)
μ^1 and μ^2 are the same as before (the per-class sample means), and Σ is the weighted average Σ = (79/140) Σ^1 + (61/140) Σ^2.
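A sketch of the shared-covariance estimate described above (following Bishop 4.2.2: each class keeps its own mean, and Σ is the frequency-weighted average of the per-class covariances; the helper name is mine):

```python
import numpy as np

def shared_covariance_mle(X1, X2):
    """Per-class means plus the single covariance matrix shared by both classes."""
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    n1, n2 = len(X1), len(X2)
    sigma1 = (X1 - mu1).T @ (X1 - mu1) / n1
    sigma2 = (X2 - mu2).T @ (X2 - mu2) / n2
    sigma = (n1 * sigma1 + n2 * sigma2) / (n1 + n2)   # e.g. (79/140)*Sigma1 + (61/140)*Sigma2
    return mu1, mu2, sigma
```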
Modifying Model: with the same covariance matrix, the class boundary becomes linear. Using only Defense and SP Defense: 54% accuracy (up from 47%). Using all seven features (Total, HP, Attack, SP Attack, Defense, SP Defense, Speed): 73% accuracy (up from 64%).
Three Steps
• Function set (model): f(x) outputs class 1 if P(C1|x) > 0.5, otherwise class 2.
• Goodness of a function: the means and covariance that maximize the likelihood (the probability of generating the training data).
• Find the best function: easy, because the maximum-likelihood solution has a closed form.
Naïve Bayes Classifier: choosing the probability distribution. You can always choose the distribution you like. If you assume all the dimensions of x are independent, P(x|C1) = P(x_1|C1) P(x_2|C1) ⋯ P(x_K|C1), then you are using a Naïve Bayes classifier; each factor can be a 1-D Gaussian (equivalent to a diagonal covariance matrix). For binary features, you may assume they come from Bernoulli distributions. If the independence assumption does not hold, the Naïve Bayes model can be badly biased, and a more general generative model is needed.
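A sketch of the independence assumption in code (one 1-D Gaussian per dimension, i.e. a diagonal covariance; the function names are mine):

```python
import numpy as np

def fit_naive_gaussian(X):
    """Per-dimension mean and variance; equivalent to a Gaussian with a diagonal covariance."""
    return X.mean(axis=0), X.var(axis=0)

def naive_likelihood(x, mu, var):
    """P(x|C) = product over dimensions k of the 1-D Gaussian P(x_k|C)."""
    return np.prod(np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var))
```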
Posterior Probability.
P(C1|x) = P(x|C1)P(C1) / (P(x|C1)P(C1) + P(x|C2)P(C2))
        = 1 / (1 + P(x|C2)P(C2) / (P(x|C1)P(C1)))
        = 1 / (1 + exp(-z)) = σ(z),  where z = ln[ P(x|C1)P(C1) / (P(x|C2)P(C2)) ]
σ(z) is the sigmoid function.
Posterior Probability (continued). Expanding z for Gaussian class-conditionals with a shared covariance matrix Σ, the quadratic terms in x cancel and z becomes linear in x:
z = w · x + b, with w^T = (μ^1 - μ^2)^T Σ^{-1} and b = -½ (μ^1)^T Σ^{-1} μ^1 + ½ (μ^2)^T Σ^{-1} μ^2 + ln(N1/N2)
So P(C1|x) = σ(w · x + b): in the generative model with a shared Σ, the posterior is a sigmoid of a linear function of x.
(1) In the generative model, we estimate N1, N2, μ^1, μ^2, Σ under the Gaussian assumption, and then compute w and b from them. (2) How about directly finding w and b (the discriminative approach)?
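A sketch of step (1): with a shared Σ, w and b follow in closed form from the generative estimates, using the expressions on the previous slide (the function name and default counts are mine; 79 and 61 are the class counts from the Pokémon example):

```python
import numpy as np

def generative_w_b(mu1, mu2, sigma, n1=79, n2=61):
    """w and b of the linear posterior P(C1|x) = sigmoid(w @ x + b), shared-covariance case."""
    sigma_inv = np.linalg.inv(sigma)
    w = sigma_inv @ (mu1 - mu2)                      # w = Sigma^{-1} (mu1 - mu2)
    b = (-0.5 * mu1 @ sigma_inv @ mu1                # b = -1/2 mu1' Sigma^{-1} mu1
         + 0.5 * mu2 @ sigma_inv @ mu2               #     + 1/2 mu2' Sigma^{-1} mu2
         + np.log(n1 / n2))                          #     + ln(N1/N2)
    return w, b
```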
Reference • Bishop: Chapter 4.1 – 4.2 • Data: https://www.kaggle.com/abcsds/pokemon • Useful posts: • https://www.kaggle.com/nishantbhadauria/d/abcsds/pokemon/pokemon-speed-attack-hp-defense-analysis-by-type • https://www.kaggle.com/nikos90/d/abcsds/pokemon/mastering-pokebars/discussion • https://www.kaggle.com/ndrewgele/d/abcsds/pokemon/visualizing-pok-mon-stats-with-seaborn/discussion
Review, STT315 2.10: the Law of Total Probability and Bayes' Rule.
• Def: for some positive integer k, let B1, B2, …, Bk be such that B1 ∪ B2 ∪ … ∪ Bk = S and Bi ∩ Bj = ∅ for i ≠ j. Then the collection of sets {B1, B2, …, Bk} is said to be a partition of S. (In the two-boxes example above, {B1, B2} is a partition of the sample space.)
Review, STT315 2.10 (continued).
• Thm 2.8 (Law of Total Probability): if the events B1, B2, …, Bk constitute a partition of the sample space S such that P(Bi) ≠ 0 for i = 1, 2, …, k, then for any event A of S,
P(A) = Σ_{i=1}^{k} P(Bi) P(A|Bi)
• Thm 2.9 (Bayes' Rule): if the events B1, B2, …, Bk constitute a partition of the sample space S such that P(Bi) ≠ 0 for i = 1, 2, …, k, then for any event A in S such that P(A) ≠ 0,
P(Br|A) = P(Br) P(A|Br) / Σ_{i=1}^{k} P(Bi) P(A|Bi),  for r = 1, 2, …, k
Review: STT315: 2.10 The Law of total prob and bayes’ rules • When r=2, the special case is : • Ex2.124: 40% rep and 60% dem. 30% of rep and 70% of the dem favor an election issue. If a random picked voter is in favor of the issue, find the prob that this person is a dem.