CHAPTER 7: BAYESIAN NETWORK INDEPENDENCE, BAYESIAN NETWORK INFERENCE, MACHINE LEARNING ISSUES
Causality?
• When Bayesian Networks reflect the true causal patterns:
  - Often simpler (nodes have fewer parents)
  - Often easier to think about
  - Often easier to elicit from experts
• BNs need not actually be causal
  - Sometimes no causal net exists over the domain
  - E.g. consider the variables Traffic and RoofDrips
  - We end up with arrows that reflect correlation, not causation
• What do the arrows really mean?
  - Topology may happen to encode causal structure
  - Topology really encodes conditional independencies
Creating Bayes’ Nets
• Last time: we talked about how any fixed Bayesian Network encodes a joint distribution (a small sketch of this follows below)
• Today: how to represent a fixed distribution as a Bayesian Network
  - Key ingredient: conditional independence
  - The exercise we did in “causal” assembly of BNs was a kind of intuitive use of conditional independence
  - Now we have to formalize the process
• After that: how to answer queries (inference)
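To make the “a BN encodes a joint distribution” point concrete, here is a minimal Python sketch (not from the slides) that scores a full assignment of the alarm network variables by multiplying each node’s CPT entry given its parents. The network structure follows the standard alarm example; the CPT numbers are illustrative placeholders.

```python
# Minimal sketch: a Bayes' net as (parents, CPT) per variable.
# Chain rule for BNs: P(x1,...,xn) = product over i of P(xi | parents(xi)).
# CPT values below are illustrative placeholders.

alarm_net = {
    "Burglary":   ([], {(): 0.001}),
    "Earthquake": ([], {(): 0.002}),
    "Alarm":      (["Burglary", "Earthquake"],
                   {(True, True): 0.95, (True, False): 0.94,
                    (False, True): 0.29, (False, False): 0.001}),
    "JohnCalls":  (["Alarm"], {(True,): 0.90, (False,): 0.05}),
    "MaryCalls":  (["Alarm"], {(True,): 0.70, (False,): 0.01}),
}

def joint_probability(net, assignment):
    """Probability of a full assignment: product of local CPT entries."""
    p = 1.0
    for var, (parents, cpt) in net.items():
        parent_vals = tuple(assignment[par] for par in parents)
        p_true = cpt[parent_vals]                      # P(var = True | parents)
        p *= p_true if assignment[var] else 1.0 - p_true
    return p

# Example: burglary, no earthquake, alarm sounds, both neighbors call.
print(joint_probability(alarm_net, {
    "Burglary": True, "Earthquake": False,
    "Alarm": True, "JohnCalls": True, "MaryCalls": True,
}))
```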
Variable Elimination
• Still lots of redundant work in the computation tree!
• We can save time if we cache all partial results
• This is the basic idea behind the variable elimination algorithm
• Compute and store factors over variables, which represent results of intermediate computations (see the sketch after this slide)
  - All CPDs are factors, but not all factors are CPDs
  - Thus factors are not always “human interpretable”
• Just improves efficiency; doesn’t improve worst-case time complexity
  - Still exponential in the number of variables
• That’s all we’ll expect you to know!
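As a rough illustration of what a “factor” is, here is a small Python sketch (my own layout, not the course’s reference code) of the two operations variable elimination repeats: multiplying factors together and summing a variable out. Variables are assumed binary for simplicity.

```python
from itertools import product

# A factor is (variables, table), where table maps a tuple of values
# (one True/False entry per variable) to a number.

def multiply(f1, f2):
    """Pointwise product of two factors over the union of their variables."""
    vars1, t1 = f1
    vars2, t2 = f2
    out_vars = vars1 + [v for v in vars2 if v not in vars1]
    table = {}
    for vals in product([True, False], repeat=len(out_vars)):
        assign = dict(zip(out_vars, vals))
        table[vals] = (t1[tuple(assign[v] for v in vars1)] *
                       t2[tuple(assign[v] for v in vars2)])
    return out_vars, table

def sum_out(var, factor):
    """Marginalize (sum) a variable out of a factor."""
    vars_, table = factor
    i = vars_.index(var)
    out_vars = vars_[:i] + vars_[i + 1:]
    out_table = {}
    for vals, p in table.items():
        key = vals[:i] + vals[i + 1:]
        out_table[key] = out_table.get(key, 0.0) + p
    return out_vars, out_table

# Example: eliminate A from P(A) * P(B | A), leaving a factor over B, i.e. P(B).
pA  = (["A"], {(True,): 0.3, (False,): 0.7})
pBA = (["A", "B"], {(True, True): 0.9, (True, False): 0.1,
                    (False, True): 0.2, (False, False): 0.8})
print(sum_out("A", multiply(pA, pBA)))
```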
What to Do About Errors?
• Need more features: words aren’t enough!
  - Have you emailed the sender before?
  - Have 1K other people just gotten the same email?
  - Is the sending information consistent?
  - Is the email in ALL CAPS?
  - Do inline URLs point where they say they point?
  - Does the email address you by (your) name?
• Naïve Bayes models can incorporate a variety of features, but tend to do best in homogeneous cases (e.g. all features are word occurrences)
Features
• A feature is a function which signals a property of the input
• Examples:
  - ALL_CAPS: value is 1 iff the email is in all caps
  - HAS_URL: value is 1 iff the email contains a URL
  - NUM_URLS: number of URLs in the email
  - VERY_LONG: 1 iff the email is longer than 1K
  - SUSPICIOUS_SENDER: 1 iff the reply-to domain doesn’t match the originating server
• Features are anything you can write code to evaluate on an input (see the sketch below)
  - Some are cheap, some are very, very expensive to calculate
  - They can even be the output of another classifier
  - Domain knowledge goes here!
• In Naïve Bayes, how did we encode features?
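A minimal sketch of “features as functions of the input”, mirroring a few of the feature names on the slide; the regex and the 1K threshold are my own illustrative choices, not a specification from the course.

```python
import re

URL_RE = re.compile(r"https?://\S+")  # illustrative URL pattern

def all_caps(email_text):
    return 1 if email_text.isupper() else 0           # ALL_CAPS

def has_url(email_text):
    return 1 if URL_RE.search(email_text) else 0      # HAS_URL

def num_urls(email_text):
    return len(URL_RE.findall(email_text))            # NUM_URLS

def very_long(email_text):
    return 1 if len(email_text) > 1024 else 0         # VERY_LONG: longer than ~1K characters

def extract_features(email_text):
    """Evaluate every feature function on one input email."""
    return {
        "ALL_CAPS": all_caps(email_text),
        "HAS_URL": has_url(email_text),
        "NUM_URLS": num_urls(email_text),
        "VERY_LONG": very_long(email_text),
    }

print(extract_features("Meeting moved to 3pm, agenda at http://example.com/agenda"))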
Generative vs. Discriminative
• Generative classifiers:
  - E.g. Naïve Bayes
  - We build a causal model of the variables
  - We then query that model for causes, given evidence
• Discriminative classifiers:
  - E.g. Perceptron (next)
  - No causal model, no Bayes rule, often no probabilities
  - Try to predict the output directly
  - Loosely: mistake driven rather than model driven
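To preview what “mistake driven” means, here is a small Python sketch of the binary perceptron update: no probability model is built, and the weights change only when the current prediction is wrong. The feature names and toy data are illustrative placeholders.

```python
# Minimal sketch of a mistake-driven (discriminative) learner: the binary perceptron.

def perceptron_train(examples, num_passes=20):
    """examples: list of (feature_dict, label) with label in {+1, -1}."""
    weights = {}
    for _ in range(num_passes):
        for features, label in examples:
            score = sum(weights.get(f, 0.0) * v for f, v in features.items())
            predicted = 1 if score >= 0 else -1
            if predicted != label:                     # update only on mistakes
                for f, v in features.items():
                    weights[f] = weights.get(f, 0.0) + label * v
    return weights

# Toy data (illustrative): label +1 only when both indicator features fire.
data = [({"ALL_CAPS": 1, "HAS_URL": 1, "BIAS": 1}, +1),
        ({"ALL_CAPS": 1, "HAS_URL": 0, "BIAS": 1}, -1),
        ({"ALL_CAPS": 0, "HAS_URL": 1, "BIAS": 1}, -1),
        ({"ALL_CAPS": 0, "HAS_URL": 0, "BIAS": 1}, -1)]
print(perceptron_train(data))
```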