Bayesian Inference and Networks: Why should a biologist care? Paul E. Anderson, Ph.D.
Why should we care? • It lets us answer the questions we really want answered! http://www.sciencemag.org/content/294/5550/2310.full.pdf P. Anderson, College of Charleston
Introduction • Suppose you are trying to determine if a patient has pneumonia. You observe the following symptoms: • The patient has a cough • The patient has a fever • The patient has difficulty breathing
Introduction You would like to determine how likely it is that the patient has pneumonia, given that the patient has a cough, a fever, and difficulty breathing. These symptoms alone do not make us 100% certain that the patient has pneumonia. We are dealing with uncertainty!
Introduction Now suppose you order a chest x-ray and the results are positive. Your belief that the patient has pneumonia is now much higher.
Introduction • In the previous slides, what you observed affected your belief that the patient has pneumonia • This is called reasoning with uncertainty • Wouldn’t it be nice if we had some methodology for reasoning with uncertainty? Why, in fact, we do...
Bayesian Networks • Bayesian networks help us reason with uncertainty • In the opinion of many AI researchers, Bayesian networks are the most significant contribution to AI in the last 10 years • They are used in many applications, e.g.: • Spam filtering / text mining • Speech recognition • Robotics • Diagnostic systems • Syndromic surveillance
Bayesian Networks (An Example) From: Aronsky, D. and Haug, P.J., Diagnosing community-acquired pneumonia with a Bayesian network, In: Proceedings of the Fall Symposium of the American Medical Informatics Association, (1998) 632-636.
The intuition behind the statistics Rephrase the questions in ways we can answer! P. Anderson, College of Charleston
Answering questions about _____ • Fruit on an assembly line • Oranges, grapefruit, lemons, cherries, apples • Sensors measure: • Red intensity • Yellow intensity • Mass (kg) • Approximate volume • At the end of the line, a gate switches to deposit the fruit into the correct bin
Training the algorithm Sensors, scales, etc… Red = 2.125, Yellow = 6.143, Mass = 134.32, Volume = 24.21 → Apple
Training (2) Red = 2.125, Yellow = 6.143, Mass = 134.32, Volume = 24.21 + label “Apple” → Classifier M. Raymer – WSU, FBS
Testing Red = 2.125, Yellow = 6.143, Mass = 134.32, Volume = 24.21 + unknown label (??) → Classifier → predicted class (!) M. Raymer – WSU, FBS
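A minimal sketch of this train-then-classify pipeline, using scikit-learn's GaussianNB (a naive Bayes classifier, developed over the following slides). All sensor readings beyond the slide's example row are invented:

```python
# Train a classifier on labeled fruit, then classify an unlabeled one.
from sklearn.naive_bayes import GaussianNB

# Each row: [red, yellow, mass, volume]
X_train = [
    [2.125, 6.143, 134.32, 24.21],  # apple (the slide's example)
    [2.30, 5.90, 140.10, 25.02],    # apple (hypothetical)
    [9.10, 3.20, 180.50, 28.40],    # orange (hypothetical)
    [8.90, 3.50, 175.20, 27.90],    # orange (hypothetical)
]
y_train = ["apple", "apple", "orange", "orange"]

clf = GaussianNB()
clf.fit(X_train, y_train)  # training: estimate per-class feature distributions

# Testing: classify a new, unlabeled fruit
print(clf.predict([[2.0, 6.0, 130.0, 25.0]]))  # -> ['apple']
```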
Pattern Matrix M. Raymer – WSU, FBS
Distributions • Bayesian classifiers start with an estimate of the distribution of the features, e.g. a Gaussian distribution (continuous) or a binomial distribution (discrete) M. Raymer – WSU, FBS
Density Estimation • Parametric • Assume a Gaussian (e.g.) distribution • Estimate the parameters (μ, σ) • Non-parametric • Histogram sampling • Bin size is critical • Gaussian smoothing can help M. Raymer – WSU, FBS
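A short sketch of both estimation routes on made-up apple-diameter samples (the data and bin count are assumptions):

```python
# Parametric vs. non-parametric density estimation on hypothetical data.
import numpy as np
from scipy import stats

diameters = np.array([3.1, 3.4, 3.3, 3.6, 3.2, 3.5, 3.8, 3.0])  # made up

# Parametric: assume a Gaussian, estimate its parameters
mu, sigma = diameters.mean(), diameters.std(ddof=1)

# Non-parametric: a histogram (the bin size/count is critical) ...
counts, edges = np.histogram(diameters, bins=4, density=True)

# ... or Gaussian (kernel) smoothing of the samples
kde = stats.gaussian_kde(diameters)
print(mu, sigma, kde(3.4))  # density estimate at a 3.4" diameter
```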
The Gaussian distribution
Univariate: $p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$
Multivariate (d-dimensional): $p(\mathbf{x}) = \frac{1}{(2\pi)^{d/2}\,|\boldsymbol{\Sigma}|^{1/2}} \exp\!\left(-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{\top} \boldsymbol{\Sigma}^{-1} (\mathbf{x}-\boldsymbol{\mu})\right)$
A parametric Bayesian classifier must estimate $\mu$ and $\sigma$ (or $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$) from the training samples. M. Raymer – WSU, FBS
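As a sanity check, the univariate formula can be evaluated by hand and compared with scipy's built-in density; all parameter values below are hypothetical:

```python
# Evaluate the Gaussian density formulas directly and via scipy.
import numpy as np
from scipy import stats

mu, sigma, x = 3.4, 0.3, 4.0
by_hand = np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)
print(by_hand, stats.norm.pdf(x, mu, sigma))  # the two values agree

# Multivariate version for two features (hypothetical mean and covariance)
mean = np.array([9.5, 6.0])
cov = np.array([[1.0, 0.2], [0.2, 1.0]])
print(stats.multivariate_normal.pdf([8.2, 7.6], mean=mean, cov=cov))
```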
Making decisions • Once you have the distributions for • Each feature and • Each class • You can ask questions like… If I have an apple, what is the probability that the diameter will be between 3.2 and 3.5 inches? M. Raymer – WSU, FBS
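A sketch of how that question reduces to a difference of Gaussian CDF values; the apple parameters are invented for illustration:

```python
# P(3.2" <= diameter <= 3.5" | apple) as a difference of CDFs.
from scipy import stats

mu, sigma = 3.4, 0.3  # hypothetical apple-diameter parameters
p = stats.norm.cdf(3.5, mu, sigma) - stats.norm.cdf(3.2, mu, sigma)
print(p)  # ~0.38 with these made-up parameters
```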
More decisions… Non-parametric Parametric Count Diameter M. Raymer – WSU, FBS
A Simple Example • You are given a fruit with a diameter of 4” – is it a pear or an apple? • To begin, we need to know the distributions of diameters for pears and apples. M. Raymer – WSU, FBS
Maximum Likelihood Class-Conditional Distributions [Figure: class-conditional densities P(x) for apples and pears over diameters 1”–6”] M. Raymer – WSU, FBS
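A minimal sketch of the maximum-likelihood decision this sets up: pick the class whose class-conditional density is highest at 4”. Both Gaussians are invented for illustration:

```python
# Maximum-likelihood classification with hypothetical class-conditionals.
from scipy import stats

x = 4.0
like_apple = stats.norm.pdf(x, loc=3.4, scale=0.3)  # p(4" | apple) ~ 0.18
like_pear = stats.norm.pdf(x, loc=4.2, scale=0.4)   # p(4" | pear)  ~ 0.88
print("pear" if like_pear > like_apple else "apple")
```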
What are we asking? • If the fruit is an apple, how likely is itto have a diameter of 4”? • If the fruit is a xenofruit from planet Xircon, how likely is it to have a diameter of 4”? Is this the right question to ask? M. Raymer – WSU, FBS
A Key Problem • We based this decision on the class-conditional probability, $p(4'' \mid \text{apple})$ • What we really want to use is the posterior probability, $P(\text{apple} \mid 4'')$ • What if we found the fruit in a pear orchard? • We need to know the prior probability of finding an apple or a pear! M. Raymer – WSU, FBS
Statistical decisions… • If a fruit has a diameter of 4”, how likely is it to be an apple? [Figure: Venn diagram of 4” fruit vs. apples] M. Raymer – WSU, FBS
“Inverting” the question Given an apple, what is the probability that it will have a diameter of 4”? Given a 4” diameter fruit, what is the probability that it is an apple? M. Raymer – WSU, FBS
Prior Probabilities • Prior probability + Evidence → Posterior Probability • Without evidence, what is the “prior probability” that a fruit is an apple? M. Raymer – WSU, FBS
The heart of it all • Bayes Rule: $P(A \mid B) = \dfrac{P(B \mid A)\,P(A)}{P(B)}$ M. Raymer – WSU, FBS
Bayes Rule $P(\text{apple} \mid 4'') = \dfrac{p(4'' \mid \text{apple})\,P(\text{apple})}{p(4'')}$ or, in words, $\text{posterior} = \dfrac{\text{likelihood} \times \text{prior}}{\text{evidence}}$ M. Raymer – WSU, FBS
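A quick numeric check of the rule, reusing the hypothetical apple/pear numbers from the sketch above:

```python
# Bayes' rule with made-up numbers: all three inputs are hypothetical.
prior = 0.8        # P(apple), assumed
likelihood = 0.18  # p(4" | apple), from the assumed Gaussian above
evidence = 0.32    # p(4") = 0.18*0.8 + 0.88*0.2 (apple + pear terms)

posterior = likelihood * prior / evidence
print(posterior)   # 0.45 = P(apple | 4")
```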
Example Revisited • Is it an ordinary apple or an uncommon pear? M. Raymer – WSU, FBS
Bayes Rule Example M. Raymer – WSU, FBS
Bayes Rule Example M. Raymer – WSU, FBS
Solution M. Raymer – WSU, FBS
Marginal Distributions M. Raymer – WSU, FBS
Combining Marginals • Assuming independent features: $p(x_1, \ldots, x_d \mid \text{class}) = \prod_{i=1}^{d} p(x_i \mid \text{class})$ • If we assume independence and use Bayes rule, we have a Naïve Bayes decision maker (classifier). M. Raymer – WSU, FBS
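A sketch of the naive combination, multiplying independent per-feature Gaussians; the parameters are invented:

```python
# Naive combination: the joint class-conditional likelihood is the
# product of per-feature (marginal) densities.
from scipy import stats

def naive_likelihood(features, params):
    """p(x | class) = product over features of p(x_i | class)."""
    p = 1.0
    for x, (mu, sigma) in zip(features, params):
        p *= stats.norm.pdf(x, mu, sigma)
    return p

apple_params = [(3.4, 0.3), (140.0, 15.0)]  # (mu, sigma) for diameter, mass
print(naive_likelihood([4.0, 150.0], apple_params))
```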
Bayes Decision Rule • Provably optimal when the true class-conditional distributions and priors are used; here, that means the features (evidence) really are independent and Gaussian. M. Raymer – WSU, FBS
Likelihood Ratios • When deciding between two possibilities, we don’t need the exact probabilities; we only need to know which one is greater. • The denominator (the evidence) is the same for all classes • It can be eliminated • Useful when there are many possible classes M. Raymer – WSU, FBS
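A sketch of the comparison, reusing the hypothetical apple/pear numbers from the earlier sketches:

```python
# Only the numerators matter: the evidence p(x) is the same for every
# class and cancels out of the comparison.
num_apple = 0.18 * 0.8  # p(4" | apple) * P(apple)
num_pear = 0.88 * 0.2   # p(4" | pear)  * P(pear)
ratio = num_pear / num_apple
print("pear" if ratio > 1 else "apple", ratio)  # pear, ratio ~1.22
```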
Likelihood Ratio Example M. Raymer – WSU, FBS
Likelihood Ratio Example M. Raymer – WSU, FBS
In-class example: Oranges vs. Grapefruit M. Raymer – WSU, FBS
Example (cont’d) • After watching several hundred fruit pass down the assembly line, we observe that • 72% are oranges • 28% are grapefruit • Fruit ‘x’ • Red intensity = 8.2 • Mass = 7.6 What shall we predict for the class of fruit ‘x’? M. Raymer – WSU, FBS
The whole enchilada $P(\text{orange} \mid \text{red}, \text{mass}) = \dfrac{p(\text{red}, \text{mass} \mid \text{orange})\,P(\text{orange})}{p(\text{red}, \text{mass})}$ and… (naïve assumption) $p(\text{red}, \text{mass} \mid \text{orange}) = p(\text{red} \mid \text{orange})\,p(\text{mass} \mid \text{orange})$ Repeat for grapefruit and predict the more probable class. M. Raymer – WSU, FBS
The whole enchilada (2) M. Raymer – WSU, FBS
The whole enchilada (3) M. Raymer – WSU, FBS
Conclusion Predict that fruit ‘x’ is a grapefruit, despite the relative scarcity of grapefruits on the conveyor belt. M. Raymer – WSU, FBS
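A sketch reproducing the fruit-‘x’ decision end to end. The class-conditional Gaussian parameters are invented (the original slide’s numbers were in a figure that did not survive); only the priors and the observation come from the text:

```python
# End-to-end naive Bayes decision for fruit 'x' with assumed parameters.
from scipy import stats

x_red, x_mass = 8.2, 7.6
priors = {"orange": 0.72, "grapefruit": 0.28}
params = {  # per class: [(mu, sigma) for red intensity, (mu, sigma) for mass]
    "orange": [(9.5, 1.0), (6.0, 1.0)],
    "grapefruit": [(7.5, 1.5), (8.0, 1.0)],
}

scores = {}
for fruit, [(mu_r, s_r), (mu_m, s_m)] in params.items():
    likelihood = stats.norm.pdf(x_red, mu_r, s_r) * stats.norm.pdf(x_mass, mu_m, s_m)
    scores[fruit] = likelihood * priors[fruit]  # Bayes-rule numerator

print(max(scores, key=scores.get))  # grapefruit, despite the 72% orange prior
```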
Abbreviated • Since the denominator is the same for all classes, we can just compare: $p(x \mid \text{orange})\,P(\text{orange})$ and $p(x \mid \text{grapefruit})\,P(\text{grapefruit})$ M. Raymer – WSU, FBS
Likelihood comparison M. Raymer – WSU, FBS
What if we want more complexity? Bayesian Networks P. Anderson, College of Charleston
Bayesian Networks are built upon Independence Variables A and B are independent if any of the following hold: • P(A,B) = P(A)P(B) • P(A | B) = P(A) • P(B | A) = P(B) This says that knowing the outcome of A does not tell me anything new about the outcome of B.
Independence How is independence useful? • Suppose you have n coin flips and you want to calculate the joint distribution P(C1, …, Cn) • If the coin flips are not independent, you need 2^n values in the table • If the coin flips are independent, then P(C1, …, Cn) = P(C1)P(C2)⋯P(Cn) • Each P(Ci) table has 2 entries, and there are n of them, for a total of 2n values
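A quick check of the table-size arithmetic, plus the product form for independent flips (the per-coin probabilities are made up):

```python
# 2^n entries for the full joint table vs. 2n entries with independence.
from math import prod

n = 10
print(2 ** n)  # 1024 entries in the full joint table
print(2 * n)   # 20 entries when the flips are independent

# With independence, any joint probability is a product of marginals:
p_heads = [0.5, 0.6, 0.7]  # hypothetical per-coin P(heads), n = 3
print(prod(p_heads))       # P(C1=H, C2=H, C3=H) = 0.21
```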
Conditional Independence Variables A and B are conditionally independent given C if any of the following hold: • P(A, B | C) = P(A | C)P(B | C) • P(A | B, C) = P(A | C) • P(B | A, C) = P(B | C) Knowing C tells me everything A could tell me about B; I gain nothing more by also knowing A (either because A doesn’t influence B, or because knowing C already provides all the information knowing A would give).
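A tiny numeric illustration: build a joint distribution that satisfies the first definition, then verify the second. All probabilities are made up:

```python
# Construct P(A, B, C) under conditional independence given C, then
# check that P(A | B, C) equals P(A | C).
p_c = {0: 0.4, 1: 0.6}          # P(C)
p_a_given_c = {0: 0.2, 1: 0.7}  # P(A=1 | C)
p_b_given_c = {0: 0.5, 1: 0.9}  # P(B=1 | C)

def joint(a, b, c):
    """P(A=a, B=b, C=c) under the conditional-independence assumption."""
    pa = p_a_given_c[c] if a else 1 - p_a_given_c[c]
    pb = p_b_given_c[c] if b else 1 - p_b_given_c[c]
    return p_c[c] * pa * pb

c = 1
p_a_given_bc = joint(1, 1, c) / (joint(1, 1, c) + joint(0, 1, c))
print(p_a_given_bc, p_a_given_c[c])  # both equal 0.7 (up to float rounding)
```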