Announcements
• Homework 8 due today, November 13
• ½ to 1 page description of final project due Thursday, November 15
• Current Events
  • Christian - now
  • Jeff - Thursday
• Research Paper due Tuesday, November 20
Probabilistic Reasoning Lecture 15
Probabilistic Reasoning
• Logic deals with certainties
  • A → B
• Probabilities are expressed in a notation similar to that of predicates in First Order Predicate Calculus:
  • P(R) = 0.7
  • P(S) = 0.1
  • P(¬(A Λ B) V C) = 0.2
• 1 = certain; 0 = certainly not
What's the probability that either A is true or B is true?
• Picture A and B as overlapping regions in a Venn diagram; the overlap is A Λ B.
• P(A V B) = P(A) + P(B) - P(A Λ B)
Conditional Probability
• Conditional probability refers to the probability of one thing given that we already know another to be true:
  • P(B|A) = P(A Λ B) / P(A)
• This states the probability of B, given A.
Calculate
• P(R|S), given that the probability of rain is 0.7, the probability of sun is 0.1, and the probability of rain and sun is 0.01
  • P(R|S) = P(R Λ S) / P(S) = 0.01 / 0.1 = 0.1
• Note: P(A|B) ≠ P(B|A); here P(S|R) = 0.01 / 0.7 ≈ 0.014
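A minimal Python sketch of this calculation, using the rain/sun figures from the slide above:

    # Conditional probability from the rain/sun example.
    p_rain = 0.7             # P(R)
    p_sun = 0.1              # P(S)
    p_rain_and_sun = 0.01    # P(R Λ S)

    p_rain_given_sun = p_rain_and_sun / p_sun    # P(R|S) = 0.1
    p_sun_given_rain = p_rain_and_sun / p_rain   # P(S|R) ≈ 0.014

    print(p_rain_given_sun, p_sun_given_rain)    # shows that P(A|B) ≠ P(B|A)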
Joint Probability Distributions
• A joint probability distribution represents the combined probabilities of two or more variables. For example:

            B       ¬B
    A      0.11    0.63
    ¬A     0.09    0.17

• This table shows, for example, that
  • P(A Λ B) = 0.11
  • P(¬A Λ B) = 0.09
  (the remaining cell, P(¬A Λ ¬B) = 0.17, follows because the four entries must sum to 1)
• Using this, we can calculate P(A):
  • P(A) = P(A Λ B) + P(A Λ ¬B) = 0.11 + 0.63 = 0.74
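A small sketch of marginalizing this joint distribution in Python (the table values are the ones above):

    # Joint distribution over A and B from the table above.
    joint = {
        (True, True): 0.11,    # P(A Λ B)
        (True, False): 0.63,   # P(A Λ ¬B)
        (False, True): 0.09,   # P(¬A Λ B)
        (False, False): 0.17,  # P(¬A Λ ¬B); the four cells sum to 1
    }

    # Marginalize out B to get P(A).
    p_a = sum(p for (a, b), p in joint.items() if a)
    print(p_a)   # 0.74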
Bayes' Theorem
• Bayes' theorem lets us calculate a conditional probability:
  • P(B|A) = P(A|B) P(B) / P(A)
• P(B) is the prior probability of B.
• P(B|A) is the posterior probability of B.
Bayes' Theorem Deduction
• Recall: P(A Λ B) = P(A|B) P(B) and P(A Λ B) = P(B|A) P(A)
• Setting the two right-hand sides equal and dividing by P(A) gives
  • P(B|A) = P(A|B) P(B) / P(A)
Medical Diagnosis
• Data
  • 80% of the time you have a cold, you also have a high temperature.
  • At any one time, 1 in every 10,000 people has a cold.
  • 1 in every 1,000 people has a high temperature.
• Suppose you have a high temperature. What is the likelihood that you have a cold?
  • P(C|HT) = P(HT|C) P(C) / P(HT) = (0.8 × 0.0001) / 0.001 = 0.08
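The same calculation as a minimal Python sketch, using the figures from the slide:

    # Bayes' theorem for the cold / high-temperature example.
    p_cold = 0.0001          # P(C): 1 in 10,000
    p_high_temp = 0.001      # P(HT): 1 in 1,000
    p_ht_given_cold = 0.8    # P(HT|C)

    p_cold_given_ht = p_ht_given_cold * p_cold / p_high_temp
    print(p_cold_given_ht)   # 0.08: a high temperature alone makes a cold fairly unlikely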
Witness Reliability
• A hit-and-run incident has been reported, and an eyewitness has stated she is certain that the car was a white taxi.
• How likely is it that she is right?
• Facts:
  • The yellow taxi company has 90 cars.
  • The white taxi company has 10 cars.
  • An expert says that, given the foggy weather, the witness has a 75% chance of correctly identifying the taxi.
Witness Reliability – Prior Probability
• Imagine the witness is shown a sequence of 1,000 cars.
• We expect 900 to be yellow and 100 to be white.
• Given 75% accuracy, how many will she say are white and how many yellow?
  • Of the 900 yellow cars, she says yellow for 675 and white for 225.
  • Of the 100 white cars, she says white for 75 and yellow for 25.
• What is the probability that the witness says white?
  • P(says white) = (225 + 75) / 1000 = 0.3
• How likely is she right?
  • P(white | says white) = 75 / 300 = 0.25
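A short sketch of this counting argument in Python (the 90/10 fleet split and 75% accuracy come from the slides):

    # Witness reliability by counting over an imagined 1,000 cars.
    yellow, white = 900, 100   # expected colours among 1,000 taxis
    accuracy = 0.75            # chance the witness identifies a colour correctly

    says_white_correctly = white * accuracy        # 75
    says_white_wrongly = yellow * (1 - accuracy)   # 225

    p_says_white = (says_white_correctly + says_white_wrongly) / (yellow + white)
    p_right_given_says_white = says_white_correctly / (says_white_correctly + says_white_wrongly)
    print(p_says_white, p_right_given_says_white)  # 0.3 and 0.25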
Comparing Conditional Probabilities
• Medical diagnosis
  • Probability of cold (C) is 0.0001
  • P(HT|C) = 0.8
  • Probability of plague (P) is 0.000000001
  • P(HT|P) = 0.99
• Relative likelihood of cold and plague:
  • P(C|HT) / P(P|HT) = (P(HT|C) P(C)) / (P(HT|P) P(P)) = (0.8 × 0.0001) / (0.99 × 0.000000001) ≈ 80,000
  • Given a high temperature, a cold is roughly 80,000 times more likely than plague.
Simple Bayesian Concept Learning (1)
• P(H|E) is used to represent the probability that some hypothesis, H, is true, given evidence E.
• Let us suppose we have a set of hypotheses H1…Hn.
• For each Hi:
  • P(Hi|E) = P(E|Hi) P(Hi) / P(E)
• Hence, given a piece of evidence, a learner can determine which is the most likely explanation by finding the hypothesis that has the highest posterior probability.
Simple Bayesian Concept Learning (2)
• In fact, this can be simplified.
• Since P(E) is independent of Hi, it will have the same value for each hypothesis.
• Hence, it can be ignored, and we can find the hypothesis with the highest value of:
  • P(E|Hi) P(Hi)
• We can simplify this further if all the hypotheses are equally likely, in which case we simply seek the hypothesis with the highest value of P(E|Hi).
• This is the likelihood of E given Hi.
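A minimal sketch of this selection rule in Python; the hypothesis names and numbers are invented purely for illustration:

    # Pick the maximum a posteriori (MAP) hypothesis: argmax over P(E|Hi) * P(Hi).
    priors = {"H1": 0.5, "H2": 0.3, "H3": 0.2}        # P(Hi), illustrative values
    likelihoods = {"H1": 0.1, "H2": 0.6, "H3": 0.4}   # P(E|Hi), illustrative values

    scores = {h: likelihoods[h] * priors[h] for h in priors}
    map_hypothesis = max(scores, key=scores.get)
    print(map_hypothesis, scores)   # H2 has the highest P(E|Hi) * P(Hi)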
Bayesian Belief Networks (1)
• A belief network shows the dependencies between a group of variables.
• Two variables A and B are independent if the likelihood that A will occur has nothing to do with whether B occurs.
• In the example network, C and D are dependent on A; D and E are dependent on B.
• The Bayesian belief network has probabilities associated with each link, e.g. P(C|A) = 0.2, P(C|¬A) = 0.4.
Bayesian Belief Networks (2)
• A complete set of probabilities for this belief network might be:
  • P(A) = 0.1
  • P(B) = 0.7
  • P(C|A) = 0.2
  • P(C|¬A) = 0.4
  • P(D|A Λ B) = 0.5
  • P(D|A Λ ¬B) = 0.4
  • P(D|¬A Λ B) = 0.2
  • P(D|¬A Λ ¬B) = 0.0001
  • P(E|B) = 0.2
  • P(E|¬B) = 0.1
Bayesian Belief Networks (3)
• We can now calculate joint and conditional probabilities. By the chain rule:
  • P(A Λ B Λ C Λ D Λ E) = P(E|A Λ B Λ C Λ D) P(D|A Λ B Λ C) P(C|A Λ B) P(B|A) P(A)
• In fact, we can simplify this, since there are no dependencies between certain pairs of variables (between E and A, for example). Hence:
  • P(A Λ B Λ C Λ D Λ E) = P(E|B) P(D|A Λ B) P(C|A) P(B) P(A)
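A small Python sketch that evaluates this factored joint probability with the numbers from the previous slide:

    # Joint probability for the A..E belief network, using the CPTs listed above.
    p_a, p_b = 0.1, 0.7
    p_c_given = {True: 0.2, False: 0.4}                       # P(C|A), P(C|¬A)
    p_d_given = {(True, True): 0.5, (True, False): 0.4,
                 (False, True): 0.2, (False, False): 0.0001}  # P(D | A, B)
    p_e_given = {True: 0.2, False: 0.1}                       # P(E|B), P(E|¬B)

    # P(A Λ B Λ C Λ D Λ E) = P(A) P(B) P(C|A) P(D|A Λ B) P(E|B)
    p_joint = p_a * p_b * p_c_given[True] * p_d_given[(True, True)] * p_e_given[True]
    print(p_joint)   # 0.1 * 0.7 * 0.2 * 0.5 * 0.2 = 0.0014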
College Life Example
• C = that you will go to college
• S = that you will study
• P = that you will party
• E = that you will be successful in your exams
• F = that you will have fun
College Life Example (belief network diagram and conditional probability tables over C, S, P, E and F)
College Example
• Using the tables, we can solve problems such as
  P(C = true, S = true, P = false, E = true, F = false) = P(C Λ S Λ ¬P Λ E Λ ¬F)
• General solution:
  • P(x1, …, xn) = ∏i P(xi | parents(Xi))
  • i.e. the joint probability is the product, over all the variables, of each variable's probability conditioned on its parents in the network.
Noisy-V Function
• We would like to assume we know all the causes of a possible event.
• E.g. the medical diagnosis system:
  • P(HT|C) = 0.8
  • P(HT|P) = 0.99
  • Assume P(HT|C V P) = 1 (?)
  • This assumption is clearly not true.
• Leak node: represents all other causes
  • P(HT|O) = 0.9
• Define noise parameters: the conditional probabilities for ¬HT
  • P(¬HT|C) = 1 – P(HT|C) = 0.2
  • P(¬HT|P) = 1 – P(HT|P) = 0.01
  • P(¬HT|O) = 1 – P(HT|O) = 0.1
• Further assumption: the causes of a high temperature are independent of each other, and the noise parameters are independent.
Noisy V-Function
• Benefit of the noisy V-function:
  • If cold, plague, and other are all false, P(¬HT) = 1.
  • Otherwise, P(¬HT) equals the product of the noise parameters for all the variables that are true.
  • E.g. if plague and other are true and cold is false, P(HT) = 1 – (0.01 × 0.1) = 0.999.
• Benefit: we don't need to store as many values as in the full Bayesian belief network.
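A minimal sketch of the noisy-V rule in Python, using the noise parameters defined above:

    # Noisy-V model for high temperature.
    noise = {"cold": 0.2, "plague": 0.01, "other": 0.1}   # P(¬HT | cause) for each cause

    def p_high_temp(active_causes):
        """P(HT) given the set of causes that are true, under the noisy-V assumptions."""
        p_not_ht = 1.0
        for cause in active_causes:
            p_not_ht *= noise[cause]        # independent noise parameters multiply
        return 1.0 - p_not_ht

    print(p_high_temp([]))                   # 0.0: no cause present, so P(¬HT) = 1
    print(p_high_temp(["plague", "other"]))  # 1 - 0.01 * 0.1 = 0.999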
Bayes' Optimal Classifier
• A system that uses Bayes' theory to classify data.
• We have a piece of data y, and are seeking the correct hypothesis from H1 … H5, each of which assigns a classification to y.
• The probability that y should be classified as cj is:
  • P(cj|x1, …, xn) = Σi=1..m P(cj|Hi) P(Hi|x1, …, xn)
• x1 to xn are the training data, and m is the number of hypotheses.
• This method provides the best possible classification for a piece of data.
• Example: given some data, we classify it as true or false by comparing
  • P(true|x1, …, xn) = Σi P(true|Hi) P(Hi|x1, …, xn)
  • P(false|x1, …, xn) = Σi P(false|Hi) P(Hi|x1, …, xn)
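A minimal Python sketch of the Bayes optimal classifier; the posteriors and per-hypothesis classifications are invented for illustration:

    # Bayes optimal classification: weight each hypothesis's vote by its posterior.
    posteriors = {"H1": 0.4, "H2": 0.3, "H3": 0.3}      # P(Hi | x1..xn), illustrative
    votes = {"H1": {"true": 1.0, "false": 0.0},         # P(cj | Hi) for each hypothesis
             "H2": {"true": 0.0, "false": 1.0},
             "H3": {"true": 0.0, "false": 1.0}}

    score = {c: sum(votes[h][c] * posteriors[h] for h in posteriors)
             for c in ("true", "false")}
    print(score, max(score, key=score.get))   # {'true': 0.4, 'false': 0.6} -> 'false'

Note that the single most probable hypothesis (H1) says "true", yet the weighted combination classifies the data as "false"; this is why the Bayes optimal classifier can beat any single hypothesis.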
The Naïve Bayes Classifier (1)
• A vector of data is classified as a single classification: we want P(ci|d1, …, dn).
• The classification with the highest posterior probability is chosen.
• The hypothesis which has the highest posterior probability is the maximum a posteriori, or MAP, hypothesis.
• In this case, we are looking for the MAP classification.
• Bayes' theorem is used to find the posterior probability:
  • P(ci|d1, …, dn) = P(d1, …, dn|ci) P(ci) / P(d1, …, dn)
The Naïve Bayes Classifier (2)
• Since P(d1, …, dn) is a constant, independent of ci, we can eliminate it and simply aim to find the classification ci for which the following is maximised:
  • P(ci) P(d1, …, dn|ci)
• We now assume that all the attributes d1, …, dn are independent, so P(d1, …, dn|ci) can be rewritten as:
  • P(d1|ci) × P(d2|ci) × … × P(dn|ci)
• The classification for which P(ci) ∏j P(dj|ci) is highest is chosen to classify the data.
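A minimal naive Bayes sketch in Python; the tiny training set is invented to stand in for the training data table on the next slide:

    # Naive Bayes: pick the class maximising P(c) * product of P(attribute = value | c).
    # Each row is ((x, y, z), class); the values are illustrative only.
    training = [((1, 2, 2), "A"), ((2, 3, 4), "A"), ((1, 3, 4), "B"),
                ((2, 2, 2), "B"), ((2, 3, 2), "B")]

    def naive_bayes(sample):
        classes = {c for _, c in training}
        scores = {}
        for c in classes:
            rows = [d for d, cls in training if cls == c]
            score = len(rows) / len(training)           # prior P(c)
            for j, value in enumerate(sample):          # independent attribute likelihoods
                score *= sum(1 for d in rows if d[j] == value) / len(rows)
            scores[c] = score
        return max(scores, key=scores.get), scores

    print(naive_bayes((2, 3, 4)))   # compares P(A)*... against P(B)*... and picks the larger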
Classifier Example
• Training data: a small table of examples, each with attribute values x, y, z and a classification of A or B (table not reproduced here).
• New piece of data to classify: (x = 2, y = 3, z = 4)
• Want P(ci|x=2, y=3, z=4), so compare:
  • P(A) × P(x=2|A) × P(y=3|A) × P(z=4|A)
  • P(B) × P(x=2|B) × P(y=3|B) × P(z=4|B)
M-estimate
• Problem with too little training data, e.g. classifying (x=1, y=2, z=2):
  • P(x=1 | B) = 1/4
  • P(y=2 | B) = 2/4
  • P(z=2 | B) = 0
  • A single zero probability wipes out the whole product.
• Avoid the problem by using the m-estimate, which pads the computation with additional virtual samples:
  • Conditional probability = (a + mp) / (b + m)
  • m = 5 (equivalent sample size)
  • p = 1/num_values_for_category (1/4 for x)
  • a = number of training examples with the category value and the classification (x=1 and B: 1)
  • b = number of training examples with the classification (B: 4)
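A short sketch of the m-estimate in Python, using the m = 5 and p = 1/4 values above:

    # m-estimate: (a + m*p) / (b + m) instead of the raw ratio a / b.
    def m_estimate(a, b, p, m=5):
        """Smoothed conditional probability with equivalent sample size m and prior p."""
        return (a + m * p) / (b + m)

    # P(z=2 | B): the raw estimate would be 0/4 = 0; the m-estimate keeps it non-zero.
    print(m_estimate(a=0, b=4, p=1/4))   # (0 + 1.25) / 9 ≈ 0.139
    # P(x=1 | B): raw estimate 1/4, pulled towards the prior by the virtual samples.
    print(m_estimate(a=1, b=4, p=1/4))   # (1 + 1.25) / 9 = 0.25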
Collaborative Filtering
• A method that uses Bayesian reasoning to suggest items that a person might be interested in, based on their known interests.
• If we know that Anne and Bob both like A, B and C, and that Anne likes D, then we guess that Bob would also like D.
• We want P(Bob likes Z | Bob likes A, Bob likes B, …, Bob likes Y).
• This can be calculated using decision trees.
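The slide's approach uses decision trees; as an alternative illustration of the same idea, here is a hedged sketch that estimates P(Bob likes D | Bob likes A, B, C) with the naive Bayes machinery from earlier, over an invented table of user preferences:

    # Collaborative filtering sketch: other users' likes (1) and dislikes (0), invented data.
    users = {
        "Anne":  {"A": 1, "B": 1, "C": 1, "D": 1},
        "Carol": {"A": 1, "B": 0, "C": 1, "D": 1},
        "Dave":  {"A": 0, "B": 1, "C": 0, "D": 0},
        "Erin":  {"A": 1, "B": 1, "C": 1, "D": 0},
    }

    def p_likes_d(evidence_items):
        scores = {}
        for d_value in (1, 0):                                  # likes D / doesn't like D
            group = [u for u in users.values() if u["D"] == d_value]
            score = len(group) / len(users)                     # prior over liking D
            for item in evidence_items:                         # naive independence of items
                score *= sum(1 for u in group if u[item] == 1) / len(group)
            scores[d_value] = score
        return scores[1] / (scores[1] + scores[0])              # normalise

    print(p_likes_d(["A", "B", "C"]))   # ≈ 0.67: Bob probably likes D given he likes A, B and C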