CHAPTER 7: BAYESIAN NETWORK INDEPENDENCE, BAYESIAN NETWORK INFERENCE, MACHINE LEARNING ISSUES
Causality?
• When Bayesian Networks reflect the true causal patterns:
  - Often simpler (nodes have fewer parents)
  - Often easier to think about
  - Often easier to elicit from experts
• BNs need not actually be causal
  - Sometimes no causal net exists over the domain
  - E.g. consider the variables Traffic and RoofDrips
  - We end up with arrows that reflect correlation, not causation
• What do the arrows really mean?
  - Topology may happen to encode causal structure
  - Topology really encodes conditional independencies
Creating Bayes’ Nets
• Last time: we talked about how any fixed Bayesian Network encodes a joint distribution (a small sketch of this follows below)
• Today: how to represent a fixed distribution as a Bayesian Network
  - Key ingredient: conditional independence
  - The exercise we did in “causal” assembly of BNs was a kind of intuitive use of conditional independence
  - Now we have to formalize the process
• After that: how to answer queries (inference)
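To make the “a BN encodes a joint distribution” point concrete, here is a minimal Python sketch (not from the slides) that scores a full assignment of the alarm network variables by multiplying each node’s CPT entry given its parents. The network structure follows the standard alarm example; the CPT numbers are illustrative placeholders.

```python
# Minimal sketch: a Bayes' net as (parents, CPT) per variable.
# Chain rule for BNs: P(x1,...,xn) = product over i of P(xi | parents(xi)).
# CPT values below are illustrative placeholders.

alarm_net = {
    "Burglary":   ([], {(): 0.001}),
    "Earthquake": ([], {(): 0.002}),
    "Alarm":      (["Burglary", "Earthquake"],
                   {(True, True): 0.95, (True, False): 0.94,
                    (False, True): 0.29, (False, False): 0.001}),
    "JohnCalls":  (["Alarm"], {(True,): 0.90, (False,): 0.05}),
    "MaryCalls":  (["Alarm"], {(True,): 0.70, (False,): 0.01}),
}

def joint_probability(net, assignment):
    """Probability of a full assignment: product of local CPT entries."""
    p = 1.0
    for var, (parents, cpt) in net.items():
        parent_vals = tuple(assignment[par] for par in parents)
        p_true = cpt[parent_vals]                      # P(var = True | parents)
        p *= p_true if assignment[var] else 1.0 - p_true
    return p

# Example: burglary, no earthquake, alarm sounds, both neighbors call.
print(joint_probability(alarm_net, {
    "Burglary": True, "Earthquake": False,
    "Alarm": True, "JohnCalls": True, "MaryCalls": True,
}))
```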
Variable Elimination
• Still lots of redundant work in the computation tree!
• We can save time if we cache all partial results
• This is the basic idea behind the variable elimination algorithm
• Compute and store factors over variables, which represent results of intermediate computations (see the sketch after this slide)
  - All CPDs are factors, but not all factors are CPDs
  - Thus factors are not always “human interpretable”
• Just improves efficiency; doesn’t improve worst-case time complexity
  - Still exponential in the number of variables
• That’s all we’ll expect you to know!
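As a rough illustration of what a “factor” is, here is a small Python sketch (my own layout, not the course’s reference code) of the two operations variable elimination repeats: multiplying factors together and summing a variable out. Variables are assumed binary for simplicity.

```python
from itertools import product

# A factor is (variables, table), where table maps a tuple of values
# (one True/False entry per variable) to a number.

def multiply(f1, f2):
    """Pointwise product of two factors over the union of their variables."""
    vars1, t1 = f1
    vars2, t2 = f2
    out_vars = vars1 + [v for v in vars2 if v not in vars1]
    table = {}
    for vals in product([True, False], repeat=len(out_vars)):
        assign = dict(zip(out_vars, vals))
        table[vals] = (t1[tuple(assign[v] for v in vars1)] *
                       t2[tuple(assign[v] for v in vars2)])
    return out_vars, table

def sum_out(var, factor):
    """Marginalize (sum) a variable out of a factor."""
    vars_, table = factor
    i = vars_.index(var)
    out_vars = vars_[:i] + vars_[i + 1:]
    out_table = {}
    for vals, p in table.items():
        key = vals[:i] + vals[i + 1:]
        out_table[key] = out_table.get(key, 0.0) + p
    return out_vars, out_table

# Example: eliminate A from P(A) * P(B | A), leaving a factor over B, i.e. P(B).
pA  = (["A"], {(True,): 0.3, (False,): 0.7})
pBA = (["A", "B"], {(True, True): 0.9, (True, False): 0.1,
                    (False, True): 0.2, (False, False): 0.8})
print(sum_out("A", multiply(pA, pBA)))
```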
What to Do About Errors?
• Need more features: words aren’t enough!
  - Have you emailed the sender before?
  - Have 1K other people just gotten the same email?
  - Is the sending information consistent?
  - Is the email in ALL CAPS?
  - Do inline URLs point where they say they point?
  - Does the email address you by (your) name?
• Naïve Bayes models can incorporate a variety of features, but tend to do best in homogeneous cases (e.g. all features are word occurrences)
Features
• A feature is a function which signals a property of the input
• Examples:
  - ALL_CAPS: value is 1 iff the email is in all caps
  - HAS_URL: value is 1 iff the email contains a URL
  - NUM_URLS: number of URLs in the email
  - VERY_LONG: 1 iff the email is longer than 1K
  - SUSPICIOUS_SENDER: 1 iff the reply-to domain doesn’t match the originating server
• Features are anything you can write code to evaluate on an input (see the sketch below)
  - Some are cheap, some are very, very expensive to calculate
  - They can even be the output of another classifier
  - Domain knowledge goes here!
• In Naïve Bayes, how did we encode features?
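A minimal sketch of “features as functions of the input”, mirroring a few of the feature names on the slide; the regex and the 1K threshold are my own illustrative choices, not a specification from the course.

```python
import re

URL_RE = re.compile(r"https?://\S+")  # illustrative URL pattern

def all_caps(email_text):
    return 1 if email_text.isupper() else 0           # ALL_CAPS

def has_url(email_text):
    return 1 if URL_RE.search(email_text) else 0      # HAS_URL

def num_urls(email_text):
    return len(URL_RE.findall(email_text))            # NUM_URLS

def very_long(email_text):
    return 1 if len(email_text) > 1024 else 0         # VERY_LONG: longer than ~1K characters

def extract_features(email_text):
    """Evaluate every feature function on one input email."""
    return {
        "ALL_CAPS": all_caps(email_text),
        "HAS_URL": has_url(email_text),
        "NUM_URLS": num_urls(email_text),
        "VERY_LONG": very_long(email_text),
    }

print(extract_features("Meeting moved to 3pm, agenda at http://example.com/agenda"))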
Generative vs. Discriminative
• Generative classifiers:
  - E.g. Naïve Bayes
  - We build a causal model of the variables
  - We then query that model for causes, given evidence
• Discriminative classifiers:
  - E.g. Perceptron (next)
  - No causal model, no Bayes rule, often no probabilities
  - Try to predict the output directly
  - Loosely: mistake driven rather than model driven
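To preview what “mistake driven” means, here is a small Python sketch of the binary perceptron update: no probability model is built, and the weights change only when the current prediction is wrong. The feature names and toy data are illustrative placeholders.

```python
# Minimal sketch of a mistake-driven (discriminative) learner: the binary perceptron.

def perceptron_train(examples, num_passes=20):
    """examples: list of (feature_dict, label) with label in {+1, -1}."""
    weights = {}
    for _ in range(num_passes):
        for features, label in examples:
            score = sum(weights.get(f, 0.0) * v for f, v in features.items())
            predicted = 1 if score >= 0 else -1
            if predicted != label:                     # update only on mistakes
                for f, v in features.items():
                    weights[f] = weights.get(f, 0.0) + label * v
    return weights

# Toy data (illustrative): label +1 only when both indicator features fire.
data = [({"ALL_CAPS": 1, "HAS_URL": 1, "BIAS": 1}, +1),
        ({"ALL_CAPS": 1, "HAS_URL": 0, "BIAS": 1}, -1),
        ({"ALL_CAPS": 0, "HAS_URL": 1, "BIAS": 1}, -1),
        ({"ALL_CAPS": 0, "HAS_URL": 0, "BIAS": 1}, -1)]
print(perceptron_train(data))
```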