Tooth Ache BN
(Figure: a network with Cavity as the parent of Tooth Ache and Dental Probe)
• Cavity: yes, no
• Tooth Ache: yes, no
• Dental Probe: catches, doesn't
• Conditional independences
Is this a naïve Bayes structure? How many distributions? How many parameters? What kind of distributions? Do we have prior beliefs about the distributions? What kind? How to express?
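A sketch of one way to count, assuming the structure shown in the figure (Cavity as the parent of both observations):

```latex
P(\text{Cavity}, \text{Ache}, \text{Probe})
  = P(\text{Cavity}) \, P(\text{Ache} \mid \text{Cavity}) \, P(\text{Probe} \mid \text{Cavity})
```

Three distributions; with every variable Boolean, one free parameter for P(Cavity) and two for each conditional gives five free parameters, and the structure is naïve Bayes with Cavity playing the role of the class.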
Conjugate Distributions / Priors
• Pr(Ache | No Cavity)
• What is the distribution? How do we estimate the parameters? Count… what?
• Observations are Binomial – a sequence of Bernoulli trials
• Like flipping a weighted coin
• B(n, p) takes two parameters: n is the number of flips; p is the weighting for heads
• The M.L. estimate of p is k/n, where k is the number of heads observed
• What kind of prior is natural?
• Some guess at p… but how confident are we? We want a prior distribution for p.
(Figure: the Binomial distribution B(20, 1/6))
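A minimal sketch of the maximum-likelihood count, with hypothetical observations (1 = ache, 0 = no ache in the no-cavity condition):

```python
# Maximum-likelihood estimate of p from a sequence of Bernoulli trials
# (hypothetical data).
trials = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]

k = sum(trials)   # number of aches ("heads")
n = len(trials)   # number of trials ("flips")
p_ml = k / n      # M.L. estimate: p = k / n
print(f"M.L. estimate of p: {p_ml:.2f}")  # 0.30 for this data
```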
Beta is Conjugate to Binomial
• A distribution for p with parameters α and β
• Think: α − 1 "aches" and β − 1 "non-aches" in the no-cavity condition
• The range of Beta… is [0, 1]
• Beta(1, 1) is uniform on [0, 1], i.e., no prior preference
• In general, α and β need not be integers, but they must be positive
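In symbols, the standard Beta-Binomial conjugacy being described (a sketch, using the usual parameterization):

```latex
% Beta prior on p:
\mathrm{Beta}(p \mid \alpha, \beta) \;\propto\; p^{\alpha - 1}(1 - p)^{\beta - 1},
  \qquad p \in [0, 1], \quad \alpha, \beta > 0

% After observing k aches in n trials (a Binomial likelihood),
% the posterior is again a Beta:
p \mid \text{data} \;\sim\; \mathrm{Beta}(\alpha + k, \; \beta + n - k)
```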
False Data as Prior
• Since we calculate p from observed data…
• Prime the pump with some hallucinated data
• Initialize our counts with x aches and y non-aches
• x / (x + y) is the desired M.L. value for p
• (x + y) is the strength or sharpness of the prior belief
• What about non-Boolean random variables?
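A sketch of the pseudo-count arithmetic, with hypothetical prior counts x, y and hypothetical observations:

```python
# "False data" (pseudo-count) prior for Pr(Ache | No Cavity).
# x hallucinated aches and y hallucinated non-aches encode the prior belief;
# x + y is its strength. All numbers here are hypothetical.
x, y = 2, 10                      # prior mean x / (x + y) = 1/6
aches, non_aches = 3, 5           # real observations

p_hat = (x + aches) / (x + y + aches + non_aches)
print(f"Smoothed estimate of p: {p_hat:.3f}")  # 0.250
```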
In General for BNs
• Discrete random variables take values from a set of more than 2 values
• Multinomial distribution instead of Binomial
• The conjugate prior is the Dirichlet
• Dirichlet is essentially a multivariate Beta:
• Instead of p and 1 − p
• Use p1, p2, …, pn−1, and 1 − (p1 + p2 + … + pn−1)
• False-data priors work the same way
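A sketch of the same idea for a multi-valued variable, using Outlook's values and hypothetical counts (Dirichlet pseudo-counts generalize Beta's α and β):

```python
from collections import Counter

# Dirichlet pseudo-counts for a discrete variable with more than two values.
# One hallucinated count per value; the counts and data are hypothetical.
values = ["sunny", "overcast", "rain"]
pseudo = {"sunny": 1, "overcast": 1, "rain": 1}          # Dirichlet(1,1,1): uniform prior
observed = Counter(["sunny", "sunny", "rain", "overcast", "sunny"])

total = sum(pseudo.values()) + sum(observed.values())
p = {v: (pseudo[v] + observed[v]) / total for v in values}
print(p)   # estimates sum to 1: {'sunny': 0.5, 'overcast': 0.25, 'rain': 0.25}
```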
Naïve Bayes
(Figure: the naïve Bayes structure, with PlayTennis as the parent of Outlook, Temperature, Humidity, and Wind)
• Outlook: sunny, overcast, rain
• Temperature: hot, mild, cool
• Humidity: high, normal
• Wind: weak, strong
• PlayTennis: yes, no
What is the structure? How many distributions? How many parameters?
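A sketch of one counting convention, assuming the structure in the figure (PlayTennis as the class, the four attributes conditionally independent given it):

```python
# Counting distributions and free parameters for the PlayTennis naive Bayes.
arity = {"Outlook": 3, "Temperature": 3, "Humidity": 2, "Wind": 2}
class_values = 2   # PlayTennis: yes, no

# One distribution over the class, plus one per attribute per class value.
num_distributions = 1 + class_values * len(arity)
# Free parameters: (number of values - 1) for each distribution.
num_parameters = (class_values - 1) + class_values * sum(k - 1 for k in arity.values())

print(num_distributions, num_parameters)   # 9 distributions, 13 free parameters
```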
Strong Assumptions
• Each category of interest to be inferred (Cavity vs. No Cavity, or Play vs. Can't Play) is modeled linearly
• Linear = independent = Markov = no interactions = easy
• The log probability of a category is a hyperplane
• Generative model
• Calculate the probability of each assignment
• Choose the higher (highest) one
• What is the decision surface? I.e., what space of discriminative models are we committing to?
• What about Bayes nets generally (not just naïve Bayes)?
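One way to see the hyperplane claim, a standard log-odds derivation sketched here for two classes and 0/1 features:

```latex
\log \frac{P(c_1 \mid x_1, \dots, x_n)}{P(c_2 \mid x_1, \dots, x_n)}
  = \log \frac{P(c_1)}{P(c_2)}
  + \sum_{i=1}^{n} \log \frac{P(x_i \mid c_1)}{P(x_i \mid c_2)}
```

Each term in the sum is linear in the 0/1 feature x_i, so the log-odds has the form w_0 + Σ_i w_i x_i and the decision surface (log-odds = 0) is a hyperplane; naïve Bayes therefore commits to the space of linear decision surfaces.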