Bayes Net Classifiers: The Naïve Bayes Model
Oliver Schulte, Machine Learning 726
Classification
• Suppose we have a target node V such that all queries of interest are of the form P(V=v | values for all other variables).
• Example: predict whether a patient has bronchitis given values for all other nodes.
• Because we know the form of the query, we can optimize the Bayes net.
• V is called the class variable.
• v is called the class label.
• The other variables are called features.
Optimizing the Structure
• Some nodes are irrelevant to a target node, given the others.
• Examples: can you guess the pattern?
The Markov Blanket
• The Markov blanket of a node contains:
• Its neighbors (parents and children).
• The spouses (co-parents of its children).
• Given its Markov blanket, a node is independent of all the other nodes.
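To make this concrete, here is a minimal Python sketch (not from the slides) that extracts the Markov blanket of a node from a Bayes net given as a parent list; the PlayTennis structure below is a hypothetical example.

```python
# Compute the Markov blanket of a node in a Bayes net represented as a
# dict mapping each node to the list of its parents.

def markov_blanket(node, parents):
    """Return the node's parents, its children, and its children's
    other parents (the 'spouses' / co-parents)."""
    blanket = set(parents[node])                                     # parents
    children = [n for n, ps in parents.items() if node in ps]
    blanket.update(children)                                         # children
    for child in children:
        blanket.update(p for p in parents[child] if p != node)       # co-parents
    blanket.discard(node)
    return blanket

# Hypothetical Naive Bayes structure: the class is the parent of every feature.
structure = {
    "PlayTennis": [],
    "Outlook": ["PlayTennis"],
    "Temperature": ["PlayTennis"],
    "Humidity": ["PlayTennis"],
    "Wind": ["PlayTennis"],
}
print(markov_blanket("PlayTennis", structure))
# {'Outlook', 'Temperature', 'Humidity', 'Wind'}
```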
How to Build a Bayes Net Classifier
• Eliminate nodes that are not in the class variable's Markov blanket. This is a form of feature selection.
• Learn parameters for the remaining nodes. Fewer dimensions!
Classification Models
• A Bayes net is a very general probability model.
• Sometimes we want to use more specific models:
• More intelligible for some users.
• Models make assumptions: if the assumptions are correct, learning is better.
• A widely used Bayes net-type classifier: Naïve Bayes.
The Naïve Bayes Model
• Given the class label, the features are independent.
• Intuition: the only way in which features interact is through the class label.
• Also: we don't care about correlations among the features.
(Network diagram: class node PlayTennis with feature children Outlook, Temperature, Humidity, Wind.)
The Naive Bayes Classification Model
• Exercise: use the Naive Bayes assumption to find a simple expression for P(PlayTennis=yes | o, t, w, h).
• Solution: for each class, multiply the class prior by the feature probabilities in that class's column, then divide by P(o, t, w, h).
• For example, P(PlayTennis=yes | o, t, w, h) = P(yes) P(o|yes) P(t|yes) P(w|yes) P(h|yes) / P(o, t, w, h).
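The following Python sketch spells the solution out. The conditional probabilities are the standard maximum-likelihood estimates for the PlayTennis data; I am assuming they match the table in examples.xlsx, since they reproduce the 0.0053 and 0.0206 values quoted on the next slide.

```python
# Naive Bayes classification for PlayTennis with the textbook frequency estimates.
prior = {"yes": 9/14, "no": 5/14}
likelihood = {   # P(feature value | class) for the observed values o, t, w, h
    "yes": {"outlook=sunny": 2/9, "temp=cool": 3/9, "humidity=high": 3/9, "wind=strong": 3/9},
    "no":  {"outlook=sunny": 3/5, "temp=cool": 1/5, "humidity=high": 4/5, "wind=strong": 3/5},
}
features = ["outlook=sunny", "temp=cool", "humidity=high", "wind=strong"]

def unnormalized_posterior(cls):
    p = prior[cls]                       # start with the class prior
    for f in features:
        p *= likelihood[cls][f]          # one factor per feature (the class's "column")
    return p

scores = {c: unnormalized_posterior(c) for c in prior}    # yes: ~0.0053, no: ~0.0206
evidence = sum(scores.values())                           # P(o, t, w, h)
posterior = {c: s / evidence for c, s in scores.items()}  # yes: ~0.205, no: ~0.795
print(posterior)
```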
Example
• Normalization: P(PT=yes | features) = 0.0053 / (0.0053 + 0.0206) ≈ 20.5%.
Naive Bayes Learning
• Use maximum likelihood estimates, i.e. observed frequencies.
• Linear number of parameters!
• Example: see previous slide.
• Weka.NaiveBayesSimple uses Laplace estimation.
• For another refinement, we can perform feature selection first.
• We can also apply boosting to Naive Bayes learning; it is very competitive.
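Here is a small sketch of the learning step, assuming the data comes as a list of attribute-value dictionaries; it illustrates observed-frequency estimation with optional Laplace (add-one) smoothing and is not Weka's actual implementation.

```python
from collections import Counter, defaultdict

def fit_naive_bayes(rows, class_attr, laplace=0):
    """Estimate P(class) and P(attribute value | class) from observed frequencies.
    Set laplace=1 for add-one smoothing, as in Weka's NaiveBayesSimple."""
    class_counts = Counter(r[class_attr] for r in rows)
    value_counts = defaultdict(Counter)        # (attribute, class) -> value counts
    domains = defaultdict(set)                 # attribute -> set of possible values
    for r in rows:
        for attr, val in r.items():
            if attr == class_attr:
                continue
            value_counts[(attr, r[class_attr])][val] += 1
            domains[attr].add(val)
    prior = {c: n / len(rows) for c, n in class_counts.items()}
    cpt = {}
    for attr, values in domains.items():
        for c in class_counts:
            counts = value_counts[(attr, c)]
            cpt[(attr, c)] = {v: (counts[v] + laplace) / (class_counts[c] + laplace * len(values))
                              for v in values}
    return prior, cpt
```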
Ratio/Odds Classification Formula
• If we only care about classification, we can ignore the normalization constant.
• Ratios of feature probabilities give more numeric stability.
• Exercise: use the Naive Bayes assumption to find a simple expression for the posterior odds P(class=yes | features) / P(class=no | features).
• Product = 0.26, see examples.xlsx.
• Greater or less than 1?
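A sketch of the odds computation, reusing the prior, likelihood, and features dictionaries from the classification sketch above; with those numbers it reproduces the 0.26 figure.

```python
# Posterior odds: [P(yes)/P(no)] * product over features of P(f|yes)/P(f|no).
# The normalization constant P(features) cancels out of the ratio.
def posterior_odds(features, prior, likelihood, pos="yes", neg="no"):
    odds = prior[pos] / prior[neg]
    for f in features:
        odds *= likelihood[pos][f] / likelihood[neg][f]
    return odds

odds = posterior_odds(features, prior, likelihood)
print(odds)                               # ~0.26: less than 1, so predict "no"
```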
Log-Odds Formula
• For even more numeric stability, use logs.
• Intuitive interpretation: each feature "votes" for a class, then we add up the votes.
• Sum = -1.36, see examples.xlsx.
• Positive or negative?
• Linear discriminant: add up the feature terms, accept the class if the sum is > 0.
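The same decision in log space, again reusing the dictionaries defined above: each feature contributes an additive log-odds "vote", and the sign of the sum gives the classification.

```python
import math

def log_odds(features, prior, likelihood, pos="yes", neg="no"):
    score = math.log(prior[pos] / prior[neg])                         # prior "vote"
    for f in features:
        score += math.log(likelihood[pos][f] / likelihood[neg][f])    # one vote per feature
    return score

score = log_odds(features, prior, likelihood)
print(score)                              # ~ -1.36: negative, so predict "no"
```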