Bayesian Statistics: Asking the “Right” Questions Michael L. Raymer, Ph.D.
Statistical Games “The defendant’s DNA is consistent with the evidentiary sample, and the defendant’s DNA type occurs with a frequency of one in 10,000,000,000.” “Only about 0.1% of wife batterers actually murder their wives. Therefore, evidence of abuse and battering should not be admissible in a murder trial.” M. Raymer – WSU, FBS
The Question • “Given the evidentiary DNA type and the defendant’s DNA type, what is the probability that the evidence sample contains the defendant’s DNA?” • Information available: • How common is each allele in a particular population? • CPI (combined probability of inclusion), RMP (random match probability), etc.
An Example Problem • Suppose the rate of breast cancer is 1% • Mammograms detect breast cancer in 80% of cases where it is present • 10% of the time, mammograms will indicate breast cancer in a healthy patient • If a woman has a positive mammogram result, what is the probability that she has breast cancer?
Results • 75% -- 3 • 50% -- 1 • 25% -- 2 • <10% -- a lot
Determining Probabilities • Counting all possible outcomes • If you flip a coin 4 times, what is the probability that you will get heads exactly twice? • TTTT THTT HTTT HHTT • TTTH THTH HTTH HHTH • TTHT THHT HTHT HHHT • TTHH THHH HTHH HHHH • P(2 heads) = 6/16 = 0.375
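The counting on this slide can be reproduced mechanically. The sketch below enumerates all 16 sequences of four flips and counts those with exactly two heads; the variable names are illustrative, not from the talk.

```python
from itertools import product

# Enumerate every equally likely sequence of 4 coin flips (2**4 = 16 outcomes).
outcomes = ["".join(seq) for seq in product("HT", repeat=4)]

# Keep the sequences that contain exactly two heads.
two_heads = [seq for seq in outcomes if seq.count("H") == 2]

p_two_heads = len(two_heads) / len(outcomes)
print(len(two_heads), len(outcomes), p_two_heads)  # 6 16 0.375
```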
Statistical Preliminaries • Frequency and Probability • We can guess at probabilities by counting frequencies: • P(heads) = 0.5 • The law of large numbers: the more samples we take, the closer the observed frequency will get to 0.5.
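A small simulation can illustrate the law of large numbers for a fair coin. This is only a sketch: the seed and the sample sizes are arbitrary choices, and `heads_frequency` is a hypothetical helper name.

```python
import random

def heads_frequency(n_flips, rng):
    """Return the fraction of heads observed in n_flips fair coin flips."""
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

rng = random.Random(0)  # fixed seed (an arbitrary choice) so runs are repeatable
for n in (10, 100, 10_000, 100_000):
    # The observed frequency tends to settle near 0.5 as n grows.
    print(n, heads_frequency(n, rng))
```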
Distributions • Counting frequencies gives us distributions • Figures: Gaussian distribution (continuous); binomial distribution (discrete)
Density Estimation • Parametric • Assume a Gaussian (e.g.) distribution. • Estimate the parameters (μ, σ). • Non-parametric • Histogram sampling • Bin size is critical • Gaussian smoothing can help
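The two approaches can be compared on a toy sample. The sketch below assumes a made-up list of fruit diameters (not data from the talk) and a hypothetical helper `histogram_density`; it fits a Gaussian with the standard library and builds a histogram estimate at two bin widths to show how much the bin size matters.

```python
from collections import Counter
from statistics import NormalDist

# Hypothetical fruit diameters in inches; not data from the talk.
diameters = [2.9, 3.1, 3.4, 3.5, 3.6, 3.8, 3.9, 4.0, 4.2, 4.6]

# Parametric: assume a Gaussian and estimate its mean and standard deviation.
gaussian = NormalDist.from_samples(diameters)
print(round(gaussian.mean, 2), round(gaussian.stdev, 2))
print(gaussian.pdf(4.0))  # estimated density at a 4-inch diameter

def histogram_density(samples, bin_width):
    """Non-parametric estimate: fraction of samples per bin, divided by bin width."""
    counts = Counter(int(x // bin_width) for x in samples)
    return {
        (b * bin_width, (b + 1) * bin_width): c / (len(samples) * bin_width)
        for b, c in sorted(counts.items())
    }

# The same data look quite different with different bin sizes.
print(histogram_density(diameters, 0.5))
print(histogram_density(diameters, 1.0))
```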
Combining Probabilities • Non-overlapping outcomes: P(A or B) = P(A) + P(B) • Possible overlap: P(A or B) = P(A) + P(B) − P(A and B) • Independent events: P(A and B) = P(A) × P(B) (the product rule)
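The three combination rules named on this slide (addition for non-overlapping outcomes, addition with overlap, and the product rule for independent events) can be checked by brute-force enumeration. The two-dice setup below is a hypothetical example chosen for illustration, not one from the talk.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely rolls of two fair dice (a hypothetical example).
rolls = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event, given as a predicate over (first, second) rolls."""
    return Fraction(sum(1 for r in rolls if event(r)), len(rolls))

first_is_1 = lambda r: r[0] == 1
first_is_2 = lambda r: r[0] == 2
second_is_6 = lambda r: r[1] == 6

# Non-overlapping outcomes: probabilities add.
assert prob(lambda r: first_is_1(r) or first_is_2(r)) == prob(first_is_1) + prob(first_is_2)

# Possible overlap: subtract the part that would otherwise be counted twice.
both = prob(lambda r: first_is_1(r) and second_is_6(r))
assert prob(lambda r: first_is_1(r) or second_is_6(r)) == prob(first_is_1) + prob(second_is_6) - both

# Independent events: the product rule.
assert both == prob(first_is_1) * prob(second_is_6)
print(both)  # 1/36
```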
Product Rule Example • P(Engine > 200 H.P.) = 0.2 • P(Color = red) = 0.3 • Assuming independence: • P(Red & Fast) = 0.2 × 0.3 = 0.06 • 1/4 × 1/10 × 1/6 × 1/8 × 1/5 = 1/9,600 ≈ 1/10,000
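The arithmetic on this slide can be checked with exact fractions, which avoids floating-point rounding in the long product. Variable names are illustrative only.

```python
from fractions import Fraction

# Product rule for the two independent car features on the slide.
p_fast = Fraction(2, 10)   # P(Engine > 200 H.P.)
p_red = Fraction(3, 10)    # P(Color = red)
p_red_and_fast = p_fast * p_red
print(float(p_red_and_fast))  # 0.06

# The five-factor product from the slide, kept exact.
factors = [Fraction(1, 4), Fraction(1, 10), Fraction(1, 6), Fraction(1, 8), Fraction(1, 5)]
combined = Fraction(1)
for f in factors:
    combined *= f
print(combined)  # 1/9600
```

The exact result is 1/9,600, which the slide rounds to roughly 1 in 10,000.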
Statistical Decision Making • One variable: A ring was found at the scene of the crime. The ring is size 11. The defendant’s ring size is also 11. If a random ring were left at the crime scene, what is the probability that it would have been size 11?
Multiple Variables • The ring is size 11, and also made of platinum. • Assume independence: P(size 11 and platinum) = P(size 11) × P(platinum) • Note what happens to significant digits!
Which Question? • If a fruit has a diameter of 4”, how likely is it to be an apple? • Figure labels: 4” Fruit, Apples
“Inverting” the question • Given an apple, what is the probability that it will have a diameter of 4”? • Given a 4” diameter fruit, what is the probability that it is an apple?
Forensic DNA Evidence • Given alleles (17, 17), (19, 21), (14, 15.1), what is the probability that a DNA sample belongs to Bob? • Find all (17, 17), (19, 21), (14, 15.1) individuals: how many of them are Bob? • How common are 17, 19, 21, 14, and 15.1 in “the population”?
Conditional Probabilities • For related events, we can express probability conditionally: P(A | B) = P(A and B) / P(B) • Statistical independence: P(A | B) = P(A)
Bayesian Decision Making • Terminology • We have an object, and we want to decide if it belongs to a class • Is this fruit a type of apple? • Does this DNA come from a Caucasian American? • Is this car a sports car? • We measure features of the object (evidence): • Size, weight, color • Alleles at various loci
Bayesian Notation • Feature/Evidence Vector: • Classes & Posterior Probability:
A Simple Example • You are given a fruit with a diameter of 4”. Is it a pear or an apple? • To begin, we need to know the distributions of diameters for pears and apples.
Maximum Likelihood • Figure: class-conditional distributions, P(x) plotted against diameter (1” to 6”)
A Key Problem • We based this decision on P(x | class) (class conditional) • What we really want to use is P(class | x) (posterior probability) • What if we found the fruit in a pear orchard? • We need to know the prior probability of finding an apple or a pear!
Prior Probabilities • Prior probability + Evidence → Posterior probability • Without evidence, what is the “prior probability” that a fruit is an apple? • What is the prior probability that a DNA sample comes from the defendant?
The heart of it all • Bayes Rule
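Bayes Rule • P(class | x) = P(x | class) × P(class) / P(x) • or: posterior = (class-conditional probability × prior probability) / probability of the evidence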
Example Revisited • Is it an ordinary apple or an uncommon pear?
Bayes Rule Example
Bayes Rule Example
Posing the question • What are the classes? • What is the evidence? • What is the prior probability? • What is the class-conditional probability?
An Example Problem • Suppose the rate of breast cancer is 1% • Mammograms detect breast cancer in 80% of cases where it is present • 10% of the time, mammograms will indicate breast cancer in a healthy patient • If a woman has a positive mammogram result, what is the probability that she has breast cancer?
Practice Problem Revisited • Classes: healthy, cancer • Evidence: positive mammogram (pos), negative mammogram (neg) • If a woman has a positive mammogram result, what is the probability that she has breast cancer?
A Counting Argument • Suppose we have 1000 women • 10 will have breast cancer • 8 of these will have a positive mammogram • 990 will not have breast cancer • 99 of these will have a positive mammogram • Of the 107 women with a positive mammogram, 8 have breast cancer • 8/107 ≈ 0.075 = 7.5%
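The counting argument translates directly into a few lines of arithmetic. The sketch below uses only the rates stated in the problem; the variable names are illustrative.

```python
# Rates stated in the problem.
population = 1000
cancer_rate = 0.01          # 1% of women have breast cancer
sensitivity = 0.80          # positive mammogram when cancer is present
false_positive_rate = 0.10  # positive mammogram in a healthy patient

with_cancer = round(population * cancer_rate)                  # 10
true_positives = round(with_cancer * sensitivity)              # 8
without_cancer = population - with_cancer                      # 990
false_positives = round(without_cancer * false_positive_rate)  # 99

all_positives = true_positives + false_positives               # 107
p_cancer_given_pos = true_positives / all_positives
print(f"{true_positives}/{all_positives} = {p_cancer_given_pos:.3f}")  # 8/107 = 0.075
```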
Solution
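One way to write out the solution is to apply Bayes rule to the numbers given in the problem, expanding the probability of a positive mammogram over the two classes (cancer and healthy). The notation here is an assumption, since the original equation is not shown:

```latex
\begin{align*}
P(\text{cancer} \mid \text{pos})
  &= \frac{P(\text{pos} \mid \text{cancer})\,P(\text{cancer})}
          {P(\text{pos} \mid \text{cancer})\,P(\text{cancer})
           + P(\text{pos} \mid \text{healthy})\,P(\text{healthy})} \\
  &= \frac{0.80 \times 0.01}{0.80 \times 0.01 + 0.10 \times 0.99}
   = \frac{0.008}{0.107} \approx 0.075
\end{align*}
```

This agrees with the 8/107 figure from the counting argument.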
An Example Problem • Suppose the chance of a randomly chosen person being guilty is 0.001 • When a person is guilty, a DNA sample will match that individual 99% of the time. • A DNA test will exhibit a false match for an innocent individual 0.0001 of the time • If a DNA test demonstrates a match, what is the probability of guilt?
Solution
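The DNA problem can be worked the same way as the mammogram problem, with two classes (guilty, innocent) and one piece of evidence (a match). The sketch below uses only the three numbers stated in the problem; `posterior` is a hypothetical helper name.

```python
def posterior(prior, p_evidence_given_class, p_evidence_given_other):
    """Two-class Bayes rule: P(class | evidence)."""
    numerator = p_evidence_given_class * prior
    evidence = numerator + p_evidence_given_other * (1.0 - prior)
    return numerator / evidence

p_guilty = 0.001              # prior: a randomly chosen person is guilty
p_match_if_guilty = 0.99      # match rate when the person is guilty
p_match_if_innocent = 0.0001  # false-match rate for an innocent person

p_guilty_given_match = posterior(p_guilty, p_match_if_guilty, p_match_if_innocent)
print(f"{p_guilty_given_match:.3f}")  # 0.908
```

Under these assumptions a match raises the probability of guilt from 0.1% to about 91%, which is high but well short of certainty.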
Marginal Distributions
Combining Marginals • Assuming independent features: P(x1, …, xn | class) = P(x1 | class) × … × P(xn | class) • If we assume independence and use Bayes rule, we have a Naïve Bayes decision maker (classifier).
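A Naïve Bayes decision maker of this kind can be sketched in a few lines. Everything numeric below is hypothetical (the priors, the feature names, and the per-feature probabilities are made up for illustration); only the structure, prior times the product of per-feature class-conditional probabilities, follows the slide.

```python
from math import prod

# Hypothetical priors and per-feature class-conditional probabilities.
priors = {"apple": 0.7, "pear": 0.3}
marginals = {
    "apple": {"diameter=4in": 0.30, "color=green": 0.40},
    "pear":  {"diameter=4in": 0.10, "color=green": 0.80},
}

def naive_bayes_posteriors(features):
    """Posterior for each class, treating the features as independent given the class."""
    # Numerator of Bayes rule: prior times the product of the marginals.
    scores = {c: priors[c] * prod(marginals[c][f] for f in features) for c in priors}
    evidence = sum(scores.values())  # shared denominator
    return {c: s / evidence for c, s in scores.items()}

post = naive_bayes_posteriors(["diameter=4in", "color=green"])
print(post, max(post, key=post.get))
```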
Bayes Decision Rule • Provably optimal when the features (evidence) follow Gaussian distributions and are independent.
Forensic DNA • Classes: DNA from defendant, DNA not from defendant • Evidence: Allele matches at various loci • Assumption of independence • Prior Probabilities? • Assumed equal (0.5) • What is the true prior probability that an evidence sample came from a particular individual?
The Importance of Priors
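To see how strongly the prior drives the answer, the sketch below reuses the match rates from the earlier DNA example problem (0.99 and 0.0001) and varies only the prior. The three priors are illustrative choices: 0.5 is the "assumed equal" value, 0.001 is the value from the example problem, and one in a million is a hypothetical smaller prior.

```python
def posterior_given_match(prior, p_match_if_source=0.99, p_match_if_not_source=0.0001):
    """P(sample came from the defendant | match), by two-class Bayes rule."""
    numerator = p_match_if_source * prior
    return numerator / (numerator + p_match_if_not_source * (1.0 - prior))

# Same evidence, different priors.
for prior in (0.5, 0.001, 1e-6):
    print(prior, round(posterior_given_match(prior), 4))
```

With equal priors the posterior is nearly 1, while with a one-in-a-million prior the same match leaves the posterior at roughly 1%.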
Likelihood Ratios • When deciding between two possibilities, we don’t need the exact probabilities. We only need to know which one is greater. • The denominator is the same for every class, so it can be eliminated • Useful when there are many possible classes
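As a sketch of the idea, the mammogram numbers from the earlier example problem can be compared without ever computing P(pos). The code forms the ratio of the two Bayes-rule numerators (class-conditional probability times prior); calling this quantity `ratio` and thresholding it at 1 are choices made here for illustration.

```python
# Numerators of Bayes rule for each class, using the mammogram numbers from the talk.
score_cancer = 0.80 * 0.01   # P(pos | cancer) * P(cancer)
score_healthy = 0.10 * 0.99  # P(pos | healthy) * P(healthy)

# The shared denominator P(pos) cancels, so the ratio alone decides.
ratio = score_cancer / score_healthy
decision = "cancer" if ratio > 1 else "healthy"
print(round(ratio, 3), decision)  # 0.081 healthy
```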
Likelihood Ratio Example
Likelihood Ratio Example
From alleles to identity: • It is relatively easy to find the allele frequencies in the population • Marginal probability distributions • Independence assumption • Class conditional probabilities • Equal prior probabilities • Bayesian posterior probability estimate
Thank you.
A Key Advantage • The oldest citation: T. Bayes. “An essay towards solving a problem in the doctrine of chances.” Phil. Trans. Roy. Soc., 53, 1763.