Uncertainty • Logical approach problem: we do not always know the complete truth about the environment • Example: Leave(t) = leave for airport t minutes before flight • Query: will Leave(t) get me there on time?
Problems • Why can’t we determine t exactly? • Partial observability • road state, other drivers’ plans • Uncertainty in action outcomes • flat tire • Immense complexity of modelling and predicting traffic
Problems • Three specific issues: • Laziness • Too much work to list all antecedents or consequents • Theoretical ignorance • Not enough information on how the world works • Practical ignorance • Even if we know all the “physics”, we may not have all the facts
What happens with a purely logical approach? • Either it risks falsehood: • “Leave(45) will get me there on time” • Or it leads to conclusions too weak to do anything with: • “Leave(45) will get me there on time if there’s no snow and there’s no train crossing Route 19 and my tires remain intact and...” • Leave(1440) might work fine, but then I’d have to spend the night in the airport
Solution: Probability • Given the available evidence, Leave(35) will get me there on time with probability 0.04 • Probability addresses uncertainty, not degree of truth • Degree of truth is handled by fuzzy logic • IsSnowing is true to degree 0.2 • Probabilities summarize the effects of laziness and ignorance • We will use a combination of probabilities and utilities to make decisions
Subjective or Bayesian probability • We will make probability estimates based on knowledge about the world • P(Leave(45) | No Snow) = 0.55 • These are not assertions about the world • They assess how probable something is if the world is a certain way • Probabilities change with new information • P(Leave(45) | No Snow, 5 AM) = 0.75 • Analogous to entailment, not truth
Making decisions under uncertainty • Suppose I believe the following: • P(Leave(35) gets me there on time | ...) = 0.04 • P(Leave(45) gets me there on time | ...) = 0.55 • P(Leave(60) gets me there on time | ...) = 0.95 • P(Leave(1440) gets me there on time | ...) = 0.9999 • Which action do I choose? • Depends on my preferences for missing the flight vs. eating in the airport, etc. • Utility theory is used to represent preferences • Decision theory takes into account both utilities and probabilities (a small worked sketch follows)
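To make the choice concrete, here is a minimal sketch of the decision-theoretic calculation: pick the action with the highest expected utility. Only the on-time probabilities come from the slide; the utility values (reward for catching the flight, penalty for missing it, cost per minute of waiting) are invented purely for illustration.

```python
# Expected-utility sketch for the airport example.
# Probabilities are from the slide; the utility numbers below are
# illustrative assumptions, not part of the original example.

p_on_time = {35: 0.04, 45: 0.55, 60: 0.95, 1440: 0.9999}

def expected_utility(t):
    """Assumed utilities: catching the flight is worth +100,
    missing it -500, and each minute spent waiting costs 0.1."""
    p = p_on_time[t]
    wait_cost = 0.1 * t
    return p * 100 + (1 - p) * (-500) - wait_cost

for t in sorted(p_on_time):
    print(f"Leave({t}): expected utility = {expected_utility(t):.1f}")

best = max(p_on_time, key=expected_utility)
print(f"Best action under these assumptions: Leave({best})")
```

Under these made-up utilities the best choice is Leave(60); different preferences (e.g. a far larger penalty for missing the flight) would shift the answer.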
Axioms of Probability • For any propositions A and B: • 0 ≤ P(A) ≤ 1 • P(True) = 1 and P(False) = 0 • P(A ∨ B) = P(A) + P(B) - P(A ∧ B) • Example: • A = computer science major • B = born in Minnesota
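As a quick numeric illustration of the last axiom, here is a sketch using the slide’s example propositions. The specific probabilities are invented purely for the arithmetic.

```python
# Checking P(A or B) = P(A) + P(B) - P(A and B) with assumed numbers.
# A = computer science major, B = born in Minnesota (per the slide);
# the values below are illustrative, not real statistics.

p_a = 0.20        # P(A), assumed
p_b = 0.15        # P(B), assumed
p_a_and_b = 0.03  # P(A and B), assumed

p_a_or_b = p_a + p_b - p_a_and_b
print(f"P(A or B) = {p_a_or_b:.2f}")  # 0.32

# Axiom bounds: every probability lies in [0, 1].
assert 0.0 <= p_a_or_b <= 1.0
```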
Notation and Concepts • Unconditional probability or prior probability: • P(Cavity) = 0.1 • P(Weather = Sunny) = 0.55 • corresponds to belief prior to arrival of any new evidence • Weather is a multivalued random variable • Could be one of <Sunny, Rain, Cloudy, Snow> • P(Cavity) shorthand for P(Cavity=true)
Probability Distributions • A probability distribution gives probability values for all possible values of a random variable • P(Weather) = <0.55, 0.05, 0.2, 0.2> • must be normalized: the values sum to 1 • A joint probability distribution gives probability values for all combinations of values of the random variables • P(Weather, Cavity) = a 4 x 2 matrix of values (see the sketch below)
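Here is a sketch of how that 4 x 2 joint table might be stored and checked for normalization. Only P(Weather) = <0.55, 0.05, 0.2, 0.2> and P(Cavity) = 0.1 come from the slides; the entries below assume, purely for illustration, that Weather and Cavity are independent.

```python
# P(Weather, Cavity) as a 4 x 2 table: 4 weather values x 2 cavity values.
# Entries are built assuming (for illustration only) that Weather and
# Cavity are independent, with P(Weather) and P(Cavity) from the slides.

weather_values = ["Sunny", "Rain", "Cloudy", "Snow"]
p_weather = [0.55, 0.05, 0.2, 0.2]
p_cavity = 0.1

joint = {
    (w, c): pw * (p_cavity if c else 1 - p_cavity)
    for w, pw in zip(weather_values, p_weather)
    for c in (True, False)
}

# A proper joint distribution must be normalized (sum to 1).
assert abs(sum(joint.values()) - 1.0) < 1e-9
print(joint[("Sunny", True)])   # 0.055
```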
Posterior Probabilities • Conditional or posterior probability: • P(Cavity | Toothache) = 0.8 • For conditional distributions: • P(Weather | Earthquake) = a 4 x 2 table of values, with one column for Earthquake=false and one for Earthquake=true
Posterior Probabilities • New evidence does not invalidate what we already knew, but it may make some of the old knowledge unnecessary • P(Cavity | Toothache, Cavity) = 1 • New evidence may also be irrelevant • P(Cavity | Toothache, Schiller in Mexico) = 0.8
Definition of Conditional Probability • P(A | B) = P(A ∧ B) / P(B), defined whenever P(B) > 0 • Two ways to think about it
Definition of Conditional Probability • Another way to think about it: the product rule • P(A ∧ B) = P(A | B) P(B) • Sanity check: why isn’t it just P(A ∧ B) = P(A) P(B)? • Because that only holds when A and B are independent • General version holds for whole probability distributions: • P(Weather, Cavity) = P(Weather | Cavity) P(Cavity) • This is a 4 x 2 set of equations (see the sketch below)
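As a concrete check of the general version, here is a sketch that computes P(Weather | Cavity) from a joint table by dividing each column by P(Cavity). The joint entries are the same illustrative numbers used in the earlier sketch, not values given on the slides.

```python
# P(Weather | Cavity) = P(Weather, Cavity) / P(Cavity), one equation per
# (weather, cavity) pair: a 4 x 2 set of equations.
# The joint entries below are illustrative assumptions.

weather_values = ["Sunny", "Rain", "Cloudy", "Snow"]
joint = {  # (weather, cavity) -> probability
    ("Sunny", True): 0.055,  ("Sunny", False): 0.495,
    ("Rain", True): 0.005,   ("Rain", False): 0.045,
    ("Cloudy", True): 0.020, ("Cloudy", False): 0.180,
    ("Snow", True): 0.020,   ("Snow", False): 0.180,
}

for cavity in (True, False):
    p_cavity = sum(joint[(w, cavity)] for w in weather_values)
    conditional = {w: joint[(w, cavity)] / p_cavity for w in weather_values}
    print(f"P(Weather | Cavity={cavity}) = {conditional}")
```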
Bayes’ Rule • Product rule: P(A ∧ B) = P(A | B) P(B) = P(B | A) P(A) • Bayes’ Rule: P(A | B) = P(B | A) P(A) / P(B) • Bayes’ rule is extremely useful for inferring the probability of a diagnosis (the cause) when the probability of the observed symptom given that cause is known
Bayes’ Rule example • Does my car need a new drive axle? • If a car needs a new drive axle, with 30% probability this car jerks around • P(jerks | needs axle) = 0.3 • Unconditional probabilities: • P(car jerks) = 1/1000 • P(needs axle) = 1/10,000 • Then: • P(needs axle | jerks) = P(jerks | needs axle) P(needs axle) / P(jerks) • = (0.3 x 1/10,000) / (1/1000) = 0.03 • Conclusion: 3 of every 100 cars that jerk need a new axle
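The same arithmetic as a short sketch, plugging the slide’s numbers into Bayes’ rule:

```python
# Bayes' rule: P(A | B) = P(B | A) * P(A) / P(B),
# applied to the drive-axle example with the numbers from the slide.

def bayes(p_b_given_a, p_a, p_b):
    return p_b_given_a * p_a / p_b

p_jerks_given_axle = 0.3      # P(jerks | needs axle)
p_needs_axle = 1 / 10_000     # P(needs axle)
p_jerks = 1 / 1_000           # P(car jerks)

p_axle_given_jerks = bayes(p_jerks_given_axle, p_needs_axle, p_jerks)
print(f"P(needs axle | jerks) = {p_axle_given_jerks:.2f}")  # 0.03
```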
Not a dumb question • Question: • Why should I have to provide an estimate of P(B|A) in order to get P(A|B)? • Why not just estimate P(A|B) directly and be done with the whole thing?
Not a dumb question • Answer: • Diagnostic knowledge is often more tenuous than causal knowledge • Suppose drive axles start to go bad in an “epidemic” • e.g. poor construction in a major drive axle brand two years ago is now haunting us • P(needs axle) goes way up, and is easy to measure • P(needs axle | jerks) should (and does) go up accordingly, but how do we estimate it? • P(jerks | needs axle) is based on causal information and doesn’t change