CS188: Computational Models of Human Behavior
Introduction to Graphical Models
Slide credits: Kevin Murphy, Mark Paskin, Zoubin Ghahramani, and Jeff Bilmes
Reasoning under uncertainty
• In many settings, we need to understand what is going on in a system when we have imperfect or incomplete information
• For example, we might deploy a burglar alarm to detect intruders
• But the sensor could also be triggered by other events, e.g., an earthquake
• Probabilities quantify the uncertainty regarding the occurrence of such events
Probability spaces
• A probability space represents our uncertainty regarding an experiment
• It has two parts:
  • a sample space Ω, which is the set of outcomes
  • a probability measure P, which is a real-valued function on the subsets of Ω
• A set of outcomes A ⊆ Ω is called an event. P(A) represents how likely it is that the experiment's actual outcome will be a member of A
An example
• If our experiment is to deploy a burglar alarm and see if it works, then there are four possible outcomes:
  Ω = {(alarm, intruder), (no alarm, intruder), (alarm, no intruder), (no alarm, no intruder)}
• Our choice of P has to obey these simple rules …
The three axioms of probability theory
• P(A) ≥ 0 for all events A
• P(Ω) = 1
• P(A ∪ B) = P(A) + P(B) for disjoint events A and B
Example
• Let's assign a probability to each outcome ω ∈ Ω
• These probabilities must be non-negative and sum to one
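As a concrete sketch, the four-outcome alarm space can be written as a Python dictionary and the three axioms checked directly. The probability values below are invented for illustration; the slides' own numbers are not preserved in this transcript.

```python
# Hypothetical probabilities for the four alarm/intruder outcomes
# (made-up numbers, for illustration only).
P = {
    ("alarm", "intruder"):       0.002,
    ("no alarm", "intruder"):    0.001,
    ("alarm", "no intruder"):    0.047,
    ("no alarm", "no intruder"): 0.950,
}

def prob(event):
    """P(A) for an event A, represented as a set of outcomes."""
    return sum(P[w] for w in event)

# Axiom 1: non-negativity.
assert all(p >= 0 for p in P.values())
# Axiom 2: the whole sample space has probability 1.
assert abs(prob(set(P)) - 1.0) < 1e-12
# Axiom 3: additivity for disjoint events.
A = {("alarm", "intruder")}
B = {("alarm", "no intruder")}
assert abs(prob(A | B) - (prob(A) + prob(B))) < 1e-12
```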
Marginal probability
• The marginal probability P(A) is the unconditional probability of the event A: the probability of A regardless of whether event B did or did not occur
• If B′ denotes the complement of B, so that every outcome lies in exactly one of B and B′, then
  P(A) = P(A ∩ B) + P(A ∩ B′)
• This is called marginalization
Example
• Suppose P assigns a probability to each of the four outcomes above (the table of values is omitted here)
• The marginal probability that the alarm goes off is then
  P({(alarm, intruder), (alarm, no intruder)}) = P({(alarm, intruder)}) + P({(alarm, no intruder)})
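With the made-up outcome probabilities from the earlier sketch, this marginalization is a two-term sum:

```python
# Reusing the hypothetical outcome probabilities from the earlier sketch.
P = {
    ("alarm", "intruder"): 0.002, ("no alarm", "intruder"): 0.001,
    ("alarm", "no intruder"): 0.047, ("no alarm", "no intruder"): 0.950,
}
# Marginalize out the intruder variable:
# P(alarm) = P(alarm, intruder) + P(alarm, no intruder)
p_alarm = P[("alarm", "intruder")] + P[("alarm", "no intruder")]
print(p_alarm)  # 0.002 + 0.047 = 0.049 with these numbers
```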
The product rule
• The probability that A and B both happen is the probability that A happens, times the probability that B happens given that A has occurred:
  P(A ∩ B) = P(A) P(B|A)
The chain rule
• Applying the product rule repeatedly:
  P(A1, A2, …, Ak) = P(A1) P(A2|A1) P(A3|A2, A1) … P(Ak|Ak-1, …, A1)
• where P(A3|A2, A1) is shorthand for P(A3|A2 ∩ A1)
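A quick numerical sanity check of the chain rule on three binary variables; the joint table is random, so nothing here depends on the particular numbers:

```python
import numpy as np

rng = np.random.default_rng(0)
joint = rng.random((2, 2, 2))
joint /= joint.sum()  # a random joint distribution P(A1, A2, A3)

p_a1 = joint.sum(axis=(1, 2))                              # P(A1)
p_a2_given_a1 = joint.sum(axis=2) / p_a1[:, None]          # P(A2 | A1)
p_a3_given_a12 = joint / joint.sum(axis=2, keepdims=True)  # P(A3 | A1, A2)

# Chain rule: P(A1, A2, A3) = P(A1) P(A2|A1) P(A3|A2, A1)
reconstructed = (p_a1[:, None, None]
                 * p_a2_given_a1[:, :, None]
                 * p_a3_given_a12)
assert np.allclose(reconstructed, joint)
```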
Bayes' rule
• Use the product rule both ways with P(A ∩ B):
  P(A ∩ B) = P(A) P(B|A)
  P(A ∩ B) = P(B) P(A|B)
• Equating the right-hand sides and dividing by P(A) gives Bayes' rule:
  P(B|A) = P(A|B) P(B) / P(A)
Inference
• Inference is one of the central problems of computational probability theory
• Many problems can be formulated in these terms. For example, the probability that there is an intruder given that the alarm went off is pI|A(true, true)
• Inference requires manipulating densities
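Using the made-up alarm-space numbers from the earlier sketch, this inference is a direct application of the product rule rearranged (equivalently, Bayes' rule):

```python
# Reusing the hypothetical outcome probabilities from the earlier sketch.
P = {
    ("alarm", "intruder"): 0.002, ("no alarm", "intruder"): 0.001,
    ("alarm", "no intruder"): 0.047, ("no alarm", "no intruder"): 0.950,
}
# P(intruder | alarm) = P(intruder, alarm) / P(alarm)
p_joint = P[("alarm", "intruder")]
p_alarm = P[("alarm", "intruder")] + P[("alarm", "no intruder")]
print(p_joint / p_alarm)  # ~0.041: the alarm alone is weak evidence of an intruder
```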
Probabilistic graphical models
• A combination of graph theory and probability theory
• The graph structure specifies which parts of the system are directly dependent
• Local functions at each node specify how the different parts interact
• Bayesian networks = probabilistic graphical models based on directed acyclic graphs
• Markov networks = probabilistic graphical models based on undirected graphs
Bayesian networks
• Nodes are random variables
• Edges represent dependence (no directed cycles allowed)
• The joint distribution factorizes according to the graph:
  P(X1:N) = P(X1) P(X2|X1) P(X3|X1, X2) … = ∏i P(Xi | X1:i-1) = ∏i P(Xi | Xπi)
  where πi denotes the parents of node i
• [figure: example DAG with nodes x1, …, x7]
Example
• Water sprinkler Bayes net (C = cloudy, S = sprinkler, R = rain, W = wet grass):
  P(C,S,R,W) = P(C) P(S|C) P(R|C,S) P(W|C,S,R)   (chain rule)
             = P(C) P(S|C) P(R|C) P(W|C,S,R)     (since R ⊥ S | C)
             = P(C) P(S|C) P(R|C) P(W|S,R)       (since W ⊥ C | S,R)
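A runnable sketch of this network, using the CPT numbers commonly attached to the sprinkler example; treat them as illustrative rather than as the slides' own values:

```python
import itertools

# Conditional probability tables, indexed by parent values (0 = false, 1 = true).
p_c = {0: 0.5, 1: 0.5}                                   # P(C)
p_s_given_c = {0: 0.5, 1: 0.1}                           # P(S=1 | C)
p_r_given_c = {0: 0.2, 1: 0.8}                           # P(R=1 | C)
p_w_given_sr = {(0, 0): 0.0, (0, 1): 0.9,
                (1, 0): 0.9, (1, 1): 0.99}               # P(W=1 | S, R)

def bern(p1, x):
    """Probability of a binary value x under P(X=1) = p1."""
    return p1 if x == 1 else 1.0 - p1

def joint(c, s, r, w):
    """P(C,S,R,W) = P(C) P(S|C) P(R|C) P(W|S,R)."""
    return (p_c[c]
            * bern(p_s_given_c[c], s)
            * bern(p_r_given_c[c], r)
            * bern(p_w_given_sr[(s, r)], w))

# Sanity check: the 16 joint entries sum to one.
total = sum(joint(*x) for x in itertools.product([0, 1], repeat=4))
assert abs(total - 1.0) < 1e-12

# Marginal probability that the grass is wet.
p_w = sum(joint(c, s, r, 1) for c, s, r in itertools.product([0, 1], repeat=3))
print(p_w)  # ~0.6471 with these CPTs
```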
Problems with a naïve representation of the joint probability
• Representation: a big table of numbers is hard to understand
• Inference: computing a marginal P(Xi) takes O(2^N) time
• Learning: there are O(2^N) parameters to estimate
• Graphical models solve these problems by providing a structured representation for the joint
• Graphs encode conditional independence properties and represent families of probability distributions that satisfy those properties
Bayesian networks provide a compact representation of the joint probability
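One way to make the compactness concrete is to count free parameters: a full joint over N binary variables needs 2^N - 1 numbers, while a Bayes net needs one Bernoulli parameter per joint setting of each node's parents. A sketch for the sprinkler network above:

```python
# Full joint over 4 binary variables vs. the sprinkler factorization
# P(C) P(S|C) P(R|C) P(W|S,R).
n = 4
full_joint_params = 2 ** n - 1                # 15

# One Bernoulli parameter per joint setting of each node's parents.
num_parents = {"C": 0, "S": 1, "R": 1, "W": 2}
bayes_net_params = sum(2 ** k for k in num_parents.values())  # 1 + 2 + 2 + 4 = 9
print(full_joint_params, bayes_net_params)    # the gap widens rapidly with n
```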
Approach: build a Bayes net and use Bayes' rule to get the class probability
Conditional independence properties of Bayesian networks: chains
• In a chain X → Y → Z, X and Z are conditionally independent given Y: observing Y blocks the path between X and Z
Conditional independence properties of Bayesian networks: common cause
• With a common cause X ← Y → Z, X and Z are conditionally independent given Y: observing the shared parent blocks the path
Conditional independence properties of Bayesian networks: explaining away
• With a common effect X → Z ← Y, X and Y are marginally independent but become dependent once Z is observed: learning that one cause occurred "explains away" the other
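A numerical sketch of explaining away, with made-up numbers: two independent binary causes X and Y, and a common effect Z that is a deterministic OR of the two. Conditioning on Z = 1 raises the probability of each cause, but additionally observing Y = 1 drops X back to its prior:

```python
import itertools

# Hypothetical priors for two independent causes; Z = X OR Y deterministically.
p_x, p_y = 0.1, 0.1

def joint(x, y, z):
    """P(X,Y,Z) = P(X) P(Y) P(Z | X, Y) with Z = X OR Y."""
    pz = 1.0 if z == (x or y) else 0.0
    px = p_x if x else 1 - p_x
    py = p_y if y else 1 - p_y
    return px * py * pz

def conditional(query, evidence):
    """P(query | evidence) by brute-force enumeration over (x, y, z)."""
    num = den = 0.0
    for x, y, z in itertools.product([0, 1], repeat=3):
        w = {"x": x, "y": y, "z": z}
        if all(w[k] == v for k, v in evidence.items()):
            den += joint(x, y, z)
            if all(w[k] == v for k, v in query.items()):
                num += joint(x, y, z)
    return num / den

print(conditional({"x": 1}, {"z": 1}))          # ~0.526: Z raises belief in X
print(conditional({"x": 1}, {"z": 1, "y": 1}))  # 0.1: observing Y explains X away
```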
Joint distribution of an undirected graphical model
• The joint is a normalized product of local potential functions defined on the cliques C of the graph:
  P(x) = (1/Z) ∏C ψC(xC), where the partition function Z = Σx ∏C ψC(xC) normalizes the product
• Complexity scales exponentially as 2^n for n binary random variables if we use a naïve approach to computing the partition function
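A minimal sketch of that naïve computation on a small chain-structured Markov network with random pairwise potentials; the values are arbitrary, and the point is the 2^n-term sum:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n = 10  # number of binary variables
# Random positive pairwise potentials psi_i(x_i, x_{i+1}) along a chain.
psis = [rng.random((2, 2)) + 0.1 for _ in range(n - 1)]

def unnormalized(x):
    """Product of clique potentials for one joint configuration x."""
    p = 1.0
    for i, psi in enumerate(psis):
        p *= psi[x[i], x[i + 1]]
    return p

# Naive partition function: sum over all 2^n configurations.
Z = sum(unnormalized(x) for x in itertools.product([0, 1], repeat=n))
print(Z)  # 2^n terms; infeasible once n reaches a few dozen
```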