The Naïve Bayes Classifier Chapter 8
Samsung Galaxy S10 Survey • Suppose we survey 50 users about whether they like the new Samsung Galaxy S10. • 60% of the users say they like it • 40% of the users say they don’t like it • For a new user, what will be your prediction about whether they will like the phone?
Samsung Galaxy S10 Survey • Suppose the survey asked the following questions: • Do you like the display? (Yes/No) • Do you like the speed? (Yes/No) • Now, if we have a new user, how can we use the user’s answers to these questions to predict whether the user likes the phone or not? • Find the probability of liking the phone based on the conditions – the two questions are the conditions.
Bayes’ Approach • P(like | Yes, Yes) • In Bayes’ approach, we predict the probability of a class given the values of categorical predictor variables. • This is a very useful method in situations where obtaining the categorical value of a variable is easy, but obtaining its exact numerical value is difficult.
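As a reminder, Bayes’ rule expresses this conditional probability in terms of quantities that can be estimated from the survey data (the standard formula, written here for the two yes/no questions):

$$P(\text{like} \mid \text{Yes}, \text{Yes}) = \frac{P(\text{Yes}, \text{Yes} \mid \text{like})\, P(\text{like})}{P(\text{Yes}, \text{Yes})}$$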
Other Possible Applications • Online security software • Malicious user vs. normal user • Medical diagnosis • Predicting disease based on symptoms • Recommendation Systems • Recommending items on web sites • Profiling • User profiling for promotions
Characteristics • Data-driven, not model-driven • Makes no assumptions about the form of the data (no statistical model is fitted)
Bayes Classifier: Usage • Requires categorical variables • Numerical variables must be binned (categorized) • Can be used with very large data sets • Example: spell-check programs assign your misspelled word to an established “class” (i.e., a correctly spelled word)
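For instance, a numerical predictor such as salary can be binned before the classifier is applied. A minimal sketch using pandas (the column name, values, and bin edges are invented for illustration; they anticipate the S1/S2 salary bins used on a later slide):

```python
import pandas as pd

# Hypothetical data: salaries of a few users (illustrative values only)
df = pd.DataFrame({"salary": [45500, 52000, 49800, 54100]})

# Bin the numerical variable into the two categories used later in the slides
df["salary_bin"] = pd.cut(
    df["salary"],
    bins=[45000, 50000, 55000],
    labels=["S1 (45000-50000)", "S2 (50001-55000)"],
)
print(df)
```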
Exact Bayes: How does it Work? • A new person is observed with Runny nose and Sneezing symptoms. • What is the classification for this person? Flu? Or not Flu? • What is the probability that the person has Flu?
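A minimal sketch of the exact Bayes calculation, assuming a small hypothetical table of symptom records (the column names and data are invented for illustration):

```python
import pandas as pd

# Hypothetical training records: 1 = Flu, 0 = not Flu (invented for illustration)
records = pd.DataFrame({
    "runny_nose": ["Yes", "Yes", "No",  "Yes", "No", "Yes"],
    "sneezing":   ["Yes", "No",  "Yes", "Yes", "No", "Yes"],
    "flu":        [1,     0,     0,     1,     0,    0],
})

# Exact Bayes: keep only the records that match ALL predictor values of the new person
match = records[(records["runny_nose"] == "Yes") & (records["sneezing"] == "Yes")]

# P(Flu | RN = Yes, S = Yes) = proportion of Flu among the matching records
p_flu = match["flu"].mean()
print(f"P(Flu | RN=Yes, S=Yes) = {p_flu:.2f}")  # 2 of the 3 matching records have flu
```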
Some Examples for Practice • Calculate the following, showing the use of Bayes’ formula: • P(R = Buy | MC = 2, NPM = 1)? (Interesting!!) • P(R = Buy | MC = 2, NPM = 3)? • P(R = Sell | MC = 3, NPM = 3)? • P(R = Hold | MC = 2, NPM = 1)?
Problems with Exact Bayes Classification • A large amount of data is required • Suppose a class Flu (0/1) is predicted based on three symptoms: • Runny nose (Yes/No) • Sneezing (Yes/No) • Fever (Yes/No) • The total data required covers 2*2*2*2 = 16 cases, where each case represents a combination of values: • (Flu, RN, S, F) = (1,Yes,Yes,Yes), (1,No,Yes,Yes), …, (0,Yes,Yes,Yes), (0,No,Yes,Yes), … – at least 16 records in all
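More generally, extending the slide’s arithmetic: with a binary class and $p$ binary predictors, exact Bayes needs to see every combination of values at least once, so the minimum number of records grows exponentially:

$$2 \times \underbrace{2 \times 2 \times \cdots \times 2}_{p \text{ predictors}} = 2^{\,p+1} \qquad (p = 3 \;\Rightarrow\; 2^{4} = 16)$$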
Solution – Naïve Bayes • Assume independence of the predictor variables given the class. • Use the multiplication rule. • Find the probability that a record belongs to class C, given its predictor values, without limiting the calculation to records that share all of those same values. • In the previous example of (Flu, RN, S, F), estimating the probabilities requires at least 4 + 4 + 4 = 12 records: • (Flu, RN): (1,Yes), (1,No), (0,Yes), (0,No) – 4 cases • (Flu, S): (1,Yes), (1,No), (0,Yes), (0,No) – 4 cases • (Flu, F): (1,Yes), (1,No), (0,Yes), (0,No) – 4 cases
Conditional Independence Assumption • Suppose you see two symptoms: Runny nose (RN) and Sneezing (S) • The conditional independence assumption is • P(RN, S | Flu) = P(RN | Flu) * P(S | Flu) • REMEMBER: • P(RN, S | Flu) = P(RN | Flu) * P(S | Flu) DOES NOT IMPLY P(RN, S) = P(RN) * P(S) (full independence) • “P(RN, S | Flu) = P(RN | Flu) * P(S | Flu)” is the conditional independence assumption; the condition is the class (Flu = 0 or 1)
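Combining this assumption with Bayes’ rule gives the naïve Bayes estimate for the flu example (the standard form of the classifier, written out for the two symptoms):

$$P(\text{Flu}=1 \mid RN, S) = \frac{P(\text{Flu}=1)\,P(RN \mid \text{Flu}=1)\,P(S \mid \text{Flu}=1)}{\sum_{c\in\{0,1\}} P(\text{Flu}=c)\,P(RN \mid \text{Flu}=c)\,P(S \mid \text{Flu}=c)}$$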
Calculation Illustration • Possible classes: Fraud (F), Genuine (G) • Record categories: • Salary – 45000 to 50000 (S1); 50001 to 55000 (S2) • Age – 25-30 (A1); 31-35 (A2) • Do this calculation for P(G | S2, A2)
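Under the naïve Bayes assumption, the requested quantity breaks into pieces that can each be estimated from the data table (written symbolically here, since the underlying counts appear on a data slide not reproduced above):

$$P(G \mid S2, A2) = \frac{P(G)\,P(S2 \mid G)\,P(A2 \mid G)}{P(G)\,P(S2 \mid G)\,P(A2 \mid G) + P(F)\,P(S2 \mid F)\,P(A2 \mid F)}$$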
Example: Financial Fraud • Target variable: Audit finds fraud, no fraud • Predictors: • Prior pending legal charges (yes/no) • Size of firm (small/large)
Example • Compute using conditional independence assumption (naïve Bayes’): • P(T | Charges = Y, Size = small)
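A minimal sketch of this computation in Python, assuming a small hypothetical training table (the records below are invented for illustration; the class labels T and F follow the slide’s notation):

```python
import pandas as pd

# Hypothetical training data (invented): class is T or F,
# predictors are prior legal charges (Y/N) and firm size (small/large)
data = pd.DataFrame({
    "charges": ["Y", "N", "N", "N", "Y", "Y", "N", "Y", "Y", "N"],
    "size":    ["small", "small", "large", "large", "small",
                "small", "small", "large", "large", "large"],
    "class":   ["F", "T", "T", "T", "F", "F", "T", "F", "T", "T"],
})

def naive_bayes_score(df, cls, charges, size):
    """Unnormalized naive Bayes score: P(class) * P(charges|class) * P(size|class)."""
    subset = df[df["class"] == cls]
    p_class = len(subset) / len(df)
    p_charges = (subset["charges"] == charges).mean()
    p_size = (subset["size"] == size).mean()
    return p_class * p_charges * p_size

# Normalize the per-class scores so they sum to 1
scores = {c: naive_bayes_score(data, c, "Y", "small") for c in ["T", "F"]}
total = sum(scores.values())
print({c: s / total for c, s in scores.items()})  # P(T | Y, small) and P(F | Y, small)
```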
Use Naïve Bayes Now • Using Naïve Bayes, you can compute P(R = Buy | MC = 2, NPM = 3), which you cannot compute using Exact Bayes (Exact Bayes fails when no training record matches that exact combination of predictor values). • Try it!
Class Exercise • Compute using the conditional independence assumption (naïve Bayes): • P(F | Charges = Y, Size = large) • P(F | Charges = N, Size = small) • Also, find the exact Bayes values. • Classify a case as T if the conditional probability of T > 0.5. • For P(F | Charges = N, Size = small), what is the classification (T or F)? • Are the classifications (based on the ordering of the probabilities of T and F) different between Naïve Bayes and Exact Bayes?
Naïve Bayes, cont. • The probability estimate usually does not differ greatly from the exact one • All records are used in the calculations, not just those matching the predictor values • This makes the calculations practical in most circumstances • Relies on the assumption of independence between the predictor variables within each class
Independence Assumption • Not strictly justified (variables are often correlated with one another) • Often “good enough”
Advantages • Handles purely categorical data well • Works well with very large data sets • Simple and computationally efficient • Probability rankings are more accurate than the actual probability estimates
Shortcomings • Requires a large number of records • Exact Bayes requires more data than Naïve Bayes • For the calculations to be reliable, more data gives better estimates • Problematic when a predictor category is not present in the training data: the classifier then assigns a probability of 0 to the response, ignoring the information in the other variables
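To see the last point concretely, here is a small sketch with hypothetical probabilities showing how a single unseen predictor category zeroes out the whole naïve Bayes product (add-one / Laplace smoothing is mentioned in the comment only as a common remedy and is not covered in these slides):

```python
# Hypothetical conditional probabilities estimated from training data
p_class = 0.6                 # P(C)
p_x1_given_class = 0.5        # P(x1 | C)
p_x2_given_class = 0.0        # P(x2 | C): category x2 never appeared with class C

# Naive Bayes multiplies the pieces, so one zero wipes out all other evidence
score = p_class * p_x1_given_class * p_x2_given_class
print(score)  # 0.0 - class C can never be predicted for records with x2

# A common remedy (beyond these slides) is add-one / Laplace smoothing:
# estimate P(x2 | C) as (count + 1) / (n_C + n_categories) instead of count / n_C.
```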
Chapter in the Book • Chapter 8 in the book