Understanding Probability in Artificial Intelligence

Probability in Artificial Intelligence. Unit 3, Introduction to Artificial Intelligence, Stanford online course. Made by: Maor Levy, Temple University, 2012.

Probability expresses uncertainty. • Pervasive in all of Artificial Intelligence • Machine learning • Information Retrieval (e.g., Web) • Computer Vision • Robotics • Based on mathematical calculus. • Probability of a fair coin: P(heads) = 0.5, and therefore P(tails) = 1 − P(heads) = 0.5.

Example: Probability of cancer • P(has cancer) = 0.02 • P(¬has cancer) = 0.98 • Multiple events: cancer, test result • Joint probability: P(has cancer, test positive) • The problem with joint distributions: for D binary variables, it takes 2^D − 1 numbers to specify them!

Conditional probability describes the cancer test: • P(test positive | has cancer) = 0.9 • P(test positive | ¬has cancer) = 0.2 • Put this together with the prior probability: P(has cancer) = 0.02 • P(test negative | has cancer) = 0.1 • Total probability is a fundamental rule relating marginal probabilities to conditional probabilities: P(test positive) = P(test positive | has cancer) P(has cancer) + P(test positive | ¬has cancer) P(¬has cancer).

In summary: • P(has cancer) = 0.02 • P(¬has cancer) = 0.98 • P(test positive | has cancer) = 0.9 • P(test positive | ¬has cancer) = 0.2 • P(test negative | has cancer) = 0.1 • P(test negative | ¬has cancer) = 0.8 • Together, P(cancer) and P(test positive | cancer) are called the model. • Calculating P(test positive) is called prediction. • Calculating P(cancer | test positive) is called diagnostic reasoning.
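
Prediction and diagnostic reasoning for the cancer example can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function and variable names are made up for this example, and the 0.2 entry is read as P(test positive | ¬has cancer), which is the reading consistent with the 0.8 entry for a negative test.

```python
# Model parameters from the cancer example.
p_cancer = 0.02               # prior P(has cancer)
p_pos_given_cancer = 0.9      # P(test positive | has cancer)
p_pos_given_no_cancer = 0.2   # P(test positive | not has cancer), assumed reading


def predict_positive(prior, p_pos_c, p_pos_nc):
    """Prediction: P(test positive) via the law of total probability."""
    return p_pos_c * prior + p_pos_nc * (1.0 - prior)


def diagnose(prior, p_pos_c, p_pos_nc):
    """Diagnostic reasoning: P(has cancer | test positive) via Bayes' rule."""
    return p_pos_c * prior / predict_positive(prior, p_pos_c, p_pos_nc)


p_pos = predict_positive(p_cancer, p_pos_given_cancer, p_pos_given_no_cancer)
p_cancer_given_pos = diagnose(p_cancer, p_pos_given_cancer, p_pos_given_no_cancer)

print(f"P(test positive) = {p_pos:.3f}")                            # 0.214
print(f"P(has cancer | test positive) = {p_cancer_given_pos:.3f}")  # 0.084
```

Even after a positive test the probability of cancer stays below 10%, because the low prior (0.02) outweighs the test's 0.9 hit rate.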

A belief network consists of: • A directed acyclic graph with nodes labeled with random variables • A domain for each random variable • A set of conditional probability tables for each variable given its parents (including prior probabilities for nodes with no parents). • A belief network is a graph: the nodes are random variables; there is an arc from each parent of a node into that node. • A belief network is automatically acyclic by construction. • A belief network is a directed acyclic graph (DAG) where nodes are random variables. • The parents of a node n are those variables on which n directly depends. • A belief network is a graphical representation of dependence and independence: a variable is independent of its non-descendants given its parents.

Whether light l1 is lit (L1_lit) depends only on the status of the light (L1_st) and whether there is power in wire w0 (W0). Thus, L1_lit is independent of the other variables given L1_st and W0. • In a belief network, W0 and L1_st are parents of L1_lit. • Similarly, W0 depends only on whether there is power in w1, whether there is power in w2, the position of switch s2 (S2_pos), and the status of switch s2 (S2_st).

To represent a domain in a belief network, you need to consider: • What are the relevant variables? • What will you observe? • What would you like to find out (query)? • What other features make the model simpler? • What values should these variables take? • What is the relationship between them? This should be expressed in terms of local influence. • How does the value of each variable depend on its parents? This is expressed in terms of the conditional probabilities.

The power network can be used in a number of ways: • Conditioning on the status of the switches and circuit breakers, whether there is outside power, and the position of the switches, you can simulate the lighting. • Given values for the switches, the outside power, and whether the lights are lit, you can determine the posterior probability that each switch or circuit breaker is ok or not. • Given some switch positions, some outputs, and some intermediate values, you can determine the probability of any other variable in the network.

A Bayes network is a form of probabilistic graphical model. Specifically, a Bayes network is a directed acyclic graph of nodes representing variables and arcs representing dependence relations among the variables. • It is a representation of the joint distribution over all the variables represented by nodes in the graph. Let the variables be X1, ..., Xn. • Let Parents(Xi) be the parents of the node Xi. • Then the joint distribution for X1 through Xn is represented as the product of the probability distributions P(Xi | Parents(Xi)) for i = 1 to n: P(X1, ..., Xn) = P(X1 | Parents(X1)) × ... × P(Xn | Parents(Xn)) • If Xi has no parents, its probability distribution is said to be unconditional; otherwise it is conditional.
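
The factorization of the joint distribution can be illustrated with a short Python sketch. It assumes Boolean variables and reuses the two-node cancer example (cancer → test_positive) with the numbers given earlier; the dictionary layout and the name `joint_probability` are choices made for this illustration only.

```python
from itertools import product

# Each node maps to (parents, CPT). The CPT maps a tuple of parent values
# to P(node = True | parents). Nodes with no parents use the empty tuple.
network = {
    "cancer": ((), {(): 0.02}),
    "test_positive": (("cancer",), {(True,): 0.9, (False,): 0.2}),
}


def joint_probability(network, assignment):
    """P(assignment) as the product of P(X_i | Parents(X_i)) over all nodes."""
    prob = 1.0
    for node, (parents, cpt) in network.items():
        p_true = cpt[tuple(assignment[p] for p in parents)]
        prob *= p_true if assignment[node] else 1.0 - p_true
    return prob


# Enumerate the full joint table from the local conditional tables.
names = list(network)
table = {
    values: joint_probability(network, dict(zip(names, values)))
    for values in product([True, False], repeat=len(names))
}

for values, p in table.items():
    print(dict(zip(names, values)), round(p, 4))
print("sum =", round(sum(table.values()), 10))
```

The four joint entries (0.018, 0.002, 0.196, 0.784) are all derived from just three stored numbers, and they sum to 1.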

Examples of Bayes network:

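True Bayesians actually consider conditional probabilities as more basic than joint probabilities. • It is easy to define P(A|B) without reference to the joint probability P(A,B). • Bayes’ Rule: P(A | B) = P(B | A) P(A) / P(B) • Back to the cancer example: P(has cancer | test positive) = P(test positive | has cancer) P(has cancer) / P(test positive) = (0.9 × 0.02) / (0.9 × 0.02 + 0.2 × 0.98) = 0.018 / 0.214 ≈ 0.084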

Two variables X and Y are independent if: P(X, Y) = P(X) P(Y) • It means that the occurrence of one event makes it neither more nor less probable that the other occurs. • This says that their joint distribution factors into a product of two simpler distributions • This implies: P(X | Y) = P(X) • We write: X ⊥ Y • Independence is a simplifying modeling assumption • Empirical joint distributions: at best “close” to independent • For example: • The event of getting a 6 the first time a die is rolled and the event of getting a 6 the second time are independent. • By contrast, the event of getting a 6 the first time a die is rolled and the event that the sum of the numbers seen on the first and second trials is 8 are not independent.
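
The two dice examples can be checked by brute-force enumeration. The sketch below assumes two fair six-sided dice and tests the product rule P(A, B) = P(A) P(B) with exact fractions; the helper names are made up for this illustration.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling a fair die twice.
outcomes = list(product(range(1, 7), repeat=2))


def prob(event):
    """Exact probability of an event (a predicate over outcomes)."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))


def independent(a, b):
    """True if P(a and b) == P(a) * P(b)."""
    return prob(lambda o: a(o) and b(o)) == prob(a) * prob(b)


def first_is_six(o):
    return o[0] == 6


def second_is_six(o):
    return o[1] == 6


def sum_is_eight(o):
    return o[0] + o[1] == 8


print(independent(first_is_six, second_is_six))  # True
print(independent(first_is_six, sum_is_eight))   # False
```

In the second case P(first is 6) × P(sum is 8) = 1/6 × 5/36 = 5/216, while the joint probability is 1/36, so the product rule fails.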

Two events are dependent if the outcome or occurrence of the first affects the outcome or occurrence of the second so that the probability is changed. • Example: A card is chosen at random from a standard deck of 52 playing cards. Without replacing it, a second card is chosen. What is the probability that the first card chosen is a queen and the second card chosen is a jack? • Probabilities: • P(queen on first pick) = 4/52 • P(jack on 2nd pick given queen on 1st pick) = 4/51 • P(queen and jack) = 4/52 × 4/51 = 16/2652 = 4/663
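
The card example can be verified both with the chain rule and by counting ordered pairs of cards. This is a small sketch assuming a standard 52-card deck; the rank and suit labels are arbitrary choices for the illustration.

```python
from fractions import Fraction
from itertools import permutations

# Chain rule: P(queen, then jack) = P(queen first) * P(jack second | queen first).
p_queen_first = Fraction(4, 52)
p_jack_second_given_queen = Fraction(4, 51)  # 51 cards remain, all 4 jacks still in
p_chain = p_queen_first * p_jack_second_given_queen

# Cross-check by enumerating every ordered pair of distinct cards.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = [(rank, suit) for rank in ranks for suit in suits]

pairs = list(permutations(deck, 2))  # 52 * 51 ordered draws without replacement
hits = sum(1 for first, second in pairs if first[0] == "Q" and second[0] == "J")
p_enumerated = Fraction(hits, len(pairs))

print(p_chain)       # 4/663
print(p_enumerated)  # 4/663
```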

X and Y are conditionally independent given a third event Z precisely if the occurrence or non-occurrence of X and the occurrence or non-occurrence of Y are independent events in their conditional probability distribution given Z. We write: X ⊥ Y | Z, that is, P(X, Y | Z) = P(X | Z) P(Y | Z).

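The naïve representation of the full joint distribution needs 2^n − 1 numerical probabilities for n binary variables. • The Bayes network needs only 47 numerical probabilities to specify the joint.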
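
The saving can be made concrete by counting parameters. The sketch below assumes every variable is binary, so a node with k parents needs 2^k numbers (one P(node = true | parents) per parent configuration). The five-node structure is hypothetical and is not the 47-parameter network referred to in the text.

```python
def full_joint_size(n_vars):
    """Numbers needed for an unrestricted joint over n binary variables."""
    return 2 ** n_vars - 1


def bayes_net_size(parents):
    """Numbers needed for a Bayes network over binary variables.

    `parents` maps each node to the list of its parents; a node with
    k parents needs one probability per parent configuration, i.e. 2**k.
    """
    return sum(2 ** len(ps) for ps in parents.values())


# Hypothetical structure: A -> C <- B, C -> D, C -> E.
parents = {
    "A": [],
    "B": [],
    "C": ["A", "B"],
    "D": ["C"],
    "E": ["C"],
}

print(full_joint_size(len(parents)))  # 31
print(bayes_net_size(parents))        # 1 + 1 + 4 + 2 + 2 = 10
```

The gap widens quickly: the full joint doubles with every added variable, while the network grows only by the size of the new node's own table.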

Active and inactive triples • Are X and Y conditionally independent given evidence variables {Z}? • Yes, if X and Y are “separated” by Z • Look for active paths from X to Y • No active paths = independence! • A path is active if each triple is active: • Causal chain A → B → C where B is unobserved (either direction) • Common cause A ← B → C where B is unobserved • Common effect (aka v-structure) A → B ← C where B or one of its descendants is observed • All it takes to block a path is a single inactive segment
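
The active-path test can be sketched as a small search over undirected simple paths, checking each consecutive triple against the three rules. This is an illustrative brute-force version suitable only for small graphs, not an efficient algorithm; the function names and the four-node example network (A → C ← B, C → D) are assumptions made for this sketch.

```python
def descendants(node, children):
    """All nodes reachable from `node` by following arcs forward."""
    seen, stack = set(), [node]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def triple_is_active(a, b, c, edges, children, observed):
    """Apply the triple rules to consecutive path nodes a - b - c."""
    if (a, b) in edges and (c, b) in edges:
        # Common effect a -> b <- c: active if b or a descendant is observed.
        return b in observed or bool(descendants(b, children) & observed)
    # Causal chain (either direction) or common cause: active if b is unobserved.
    return b not in observed


def d_separated(x, y, observed, edges):
    """True if no active path connects x and y given the observed set.

    `edges` is an iterable of (parent, child) arcs of a DAG.
    """
    edges, observed = set(edges), set(observed)
    children, neighbours = {}, {}
    for parent, child in edges:
        children.setdefault(parent, set()).add(child)
        neighbours.setdefault(parent, set()).add(child)
        neighbours.setdefault(child, set()).add(parent)

    def active_path_exists(path):
        last = path[-1]
        if last == y:
            return True
        for nxt in neighbours.get(last, ()):
            if nxt in path:
                continue  # only simple paths
            if len(path) >= 2 and not triple_is_active(
                path[-2], last, nxt, edges, children, observed
            ):
                continue  # a single inactive triple blocks this path
            if active_path_exists(path + [nxt]):
                return True
        return False

    return not active_path_exists([x])


# Hypothetical network: A -> C <- B, C -> D.
edges = [("A", "C"), ("B", "C"), ("C", "D")]
print(d_separated("A", "B", set(), edges))  # True: v-structure, nothing observed
print(d_separated("A", "B", {"C"}, edges))  # False: common effect observed
print(d_separated("A", "B", {"D"}, edges))  # False: descendant of C observed
print(d_separated("A", "D", {"C"}, edges))  # True: chain blocked by observed C
```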

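Examples: (figure not reproduced) conditional-independence queries on a network over the variables L, R, B, D, T, and T’; the surviving answers are all “Yes”.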

Overview: • Bayes network: • Graphical representation of joint distributions • Efficiently encode conditional independencies • Reduce number of parameters from exponential to linear (in many cases)
