Part II: Graphical models
Challenges of probabilistic models • Specifying well-defined probabilistic models with many variables is hard (for modelers) • Representing probability distributions over those variables is hard (for computers/learners) • Computing quantities using those distributions is hard (for computers/learners)
Representing structured distributions Four random variables, each with domain {0,1}: X1 coin toss produces heads; X2 pencil levitates; X3 friend has psychic powers; X4 friend has two-headed coin
Joint distribution over the 16 assignments 0000, 0001, 0010, ..., 1111 • Requires 15 numbers to specify the probability of all values x1, x2, x3, x4 • N binary variables: 2^N - 1 numbers • Similar cost when computing conditional probabilities
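The count above can be checked with a short sketch. The function names are illustrative and not part of the original slides.

```python
from itertools import product

def joint_table_size(n):
    """Number of entries in a full joint table over n binary variables."""
    return 2 ** n

def free_parameters(n):
    """One entry is fixed because the probabilities must sum to 1."""
    return joint_table_size(n) - 1

# Enumerate every assignment of four binary variables as a bit string.
assignments = ["".join(bits) for bits in product("01", repeat=4)]

print(len(assignments), "assignments, from", assignments[0], "to", assignments[-1])
print("free parameters for N = 4:", free_parameters(4))
print("free parameters for N = 20:", free_parameters(20))
```

The exponential growth in N is what motivates looking for structure in the distribution.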
How can we use fewer numbers? Four random variables, each with domain {0,1}: X1 coin toss produces heads; X2 coin toss produces heads; X3 coin toss produces heads; X4 coin toss produces heads
Statistical independence • Two random variables X1 and X2 are independent if P(x1|x2) = P(x1) • e.g. coin flips: P(x1=H|x2=H) = P(x1=H) = 0.5 • Independence makes it easier to represent and work with probability distributions • We can exploit the product rule: if x1, x2, x3, and x4 are all independent, P(x1, x2, x3, x4) = P(x1)P(x2)P(x3)P(x4)
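A minimal sketch of why independence saves numbers: with four independent coins, four marginals are enough to recover all 16 joint probabilities. The fair-coin values and the function name are assumptions made for illustration.

```python
from itertools import product

# Assumed marginals P(xi = 1): four fair coins, so N numbers instead of 2^N - 1.
p_heads = [0.5, 0.5, 0.5, 0.5]

def joint_independent(xs, p_heads):
    """P(x1, ..., xN) as a product of marginals, valid only under independence."""
    prob = 1.0
    for x, p in zip(xs, p_heads):
        prob *= p if x == 1 else 1.0 - p
    return prob

# The 16 joint probabilities are all derived from the 4 stored numbers.
total = sum(joint_independent(xs, p_heads) for xs in product([0, 1], repeat=4))

print("P(1, 1, 1, 1) =", joint_independent((1, 1, 1, 1), p_heads))
print("sum over all assignments =", total)
```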
Expressing independence • Statistical independence is the key to efficient probabilistic representation and computation • This has led to the development of languages for indicating dependencies among variables • Some of the most popular languages are based on “graphical models”
Part II: Graphical models • Introduction to graphical models • representation and inference • Causal graphical models • causality • learning about causal relationships • Graphical models and cognitive science • uses of graphical models • an example: causal induction
Graphical models • Express the probabilistic dependency structure among a set of variables (Pearl, 1988) • Consist of • a set of nodes, corresponding to variables • a set of edges, indicating dependency • a set of functions defined on the graph that specify a probability distribution
Undirected graphical models • Consist of • a set of nodes • a set of edges • a potential for each clique, multiplied together and normalized to yield the distribution over variables • Examples • statistical physics: Ising model, spin glasses • early neural networks (e.g. Boltzmann machines) [Figure: undirected graph over X1, ..., X5]
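As a rough illustration of how clique potentials define a distribution, the sketch below uses a hypothetical three-node chain X1 - X2 - X3 (not the five-node graph from the slide) with an Ising-like potential that favors equal neighbors. The potential values are assumptions.

```python
from itertools import product

def pair_potential(a, b):
    """Assumed potential on an edge: neighbors that agree get a higher score."""
    return 2.0 if a == b else 1.0

def unnormalized(x1, x2, x3):
    """Product of the potentials on the two cliques (edges) of the chain."""
    return pair_potential(x1, x2) * pair_potential(x2, x3)

states = list(product([0, 1], repeat=3))

# The normalizing constant (partition function) sums the product over all states.
Z = sum(unnormalized(*s) for s in states)
prob = {s: unnormalized(*s) / Z for s in states}

print("Z =", Z)
print("P(0,0,0) =", prob[(0, 0, 0)], " P(0,1,0) =", prob[(0, 1, 0)])
```

Unlike the conditional distributions of a directed model, the potentials need not sum to one individually; only the normalized product is a probability distribution.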
Directed graphical models • Consist of • a set of nodes • a set of edges • a conditional probability distribution for each node, conditioned on its parents, multiplied together to yield the distribution over variables • Constrained to directed acyclic graphs (DAGs) • Called Bayesian networks or Bayes nets [Figure: directed graph over X1, ..., X5]
Bayesian networks and Bayes • Two different problems • Bayesian statistics is a method of inference • Bayesian networks are a form of representation • There is no necessary connection • many users of Bayesian networks rely upon frequentist statistical methods • many Bayesian inferences cannot be easily represented using Bayesian networks
Properties of Bayesian networks • Efficient representation and inference • exploiting dependency structure makes it easier to represent and compute with probabilities • Explaining away • pattern of probabilistic reasoning characteristic of Bayesian networks, especially early use in AI
Efficient representation and inference Four random variables: X1 coin toss produces heads; X2 pencil levitates; X3 friend has psychic powers; X4 friend has two-headed coin [Figure: Bayes net with links X3 → X1, X4 → X1, X3 → X2; nodes annotated with P(x3), P(x4), P(x1|x3, x4), P(x2|x3)]
The Markov assumption Every node is conditionally independent of its non-descendants, given its parents. Via the product rule, the joint distribution then factorizes as P(x1, ..., xN) = Π_i P(xi | Pa(Xi)), where Pa(Xi) is the set of parents of Xi
Efficient representation and inference Four random variables: X1 coin toss produces heads; X2 pencil levitates; X3 friend has psychic powers; X4 friend has two-headed coin • P(x1, x2, x3, x4) = P(x1|x3, x4)P(x2|x3)P(x3)P(x4) • Numbers needed: P(x4): 1, P(x3): 1, P(x1|x3, x4): 4, P(x2|x3): 2 • total = 7 (vs 15) [Figure: Bayes net with links X3 → X1, X4 → X1, X3 → X2]
Reading a Bayesian network • The structure of a Bayes net can be read as the generative process behind a distribution • Gives the joint probability distribution over variables obtained by sampling each variable conditioned on its parents
Reading a Bayesian network Four random variables: X1 coin toss produces heads; X2 pencil levitates; X3 friend has psychic powers; X4 friend has two-headed coin • P(x1, x2, x3, x4) = P(x1|x3, x4)P(x2|x3)P(x3)P(x4) [Figure: Bayes net with links X3 → X1, X4 → X1, X3 → X2; nodes annotated with P(x3), P(x4), P(x1|x3, x4), P(x2|x3)]
Reading a Bayesian network • The structure of a Bayes net can be read as the generative process behind a distribution • Gives the joint probability distribution over variables obtained by sampling each variable conditioned on its parents • Simple rules for determining whether two variables are dependent or independent • Independence makes inference more efficient
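The generative reading can be sketched as ancestral sampling: draw each variable after its parents, using its conditional distribution. The parameter values and helper names below are hypothetical.

```python
import random

# Hypothetical parameters; each entry is the probability that the variable equals 1.
P_X3 = 0.01
P_X4 = 0.05
P_X2_GIVEN_X3 = {0: 0.0001, 1: 0.9}
P_X1_GIVEN_X3_X4 = {(0, 0): 0.5, (0, 1): 1.0, (1, 0): 0.9, (1, 1): 1.0}

def flip(rng, p_one):
    """Return 1 with probability p_one, else 0."""
    return 1 if rng.random() < p_one else 0

def sample_network(rng):
    """Sample parents first (X3, X4), then their children (X2, X1)."""
    x3 = flip(rng, P_X3)
    x4 = flip(rng, P_X4)
    x2 = flip(rng, P_X2_GIVEN_X3[x3])
    x1 = flip(rng, P_X1_GIVEN_X3_X4[(x3, x4)])
    return {"x1": x1, "x2": x2, "x3": x3, "x4": x4}

rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [sample_network(rng) for _ in range(10000)]
heads_rate = sum(s["x1"] for s in samples) / len(samples)

print("estimated P(x1 = 1):", heads_rate)
```

Samples drawn this way follow the joint distribution defined by the product of the conditional distributions.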
Computing with Bayes nets P(x1, x2, x3, x4) = P(x1|x3, x4)P(x2|x3)P(x3)P(x4) • Without exploiting the factorization: sum over 8 values • Exploiting the factorization: sum over 4 values [Figure: Bayes net with links X3 → X1, X4 → X1, X3 → X2; nodes annotated with P(x3), P(x4), P(x1|x3, x4), P(x2|x3)]
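One reading of the "8 values" versus "4 values" counts, assuming the quantity being computed is the marginal P(x1): summing the full joint ranges over all values of (x2, x3, x4), while the factorized form lets the sum over x2 drop out because P(x2|x3) sums to 1. The parameter values and function names are hypothetical.

```python
from itertools import product

# Hypothetical parameters; each entry is the probability that the variable equals 1.
P_X3 = 0.01
P_X4 = 0.05
P_X2_GIVEN_X3 = {0: 0.0001, 1: 0.9}
P_X1_GIVEN_X3_X4 = {(0, 0): 0.5, (0, 1): 1.0, (1, 0): 0.9, (1, 1): 1.0}

def bernoulli(p_one, value):
    return p_one if value == 1 else 1.0 - p_one

def joint(x1, x2, x3, x4):
    return (bernoulli(P_X1_GIVEN_X3_X4[(x3, x4)], x1)
            * bernoulli(P_X2_GIVEN_X3[x3], x2)
            * bernoulli(P_X3, x3)
            * bernoulli(P_X4, x4))

def marginal_x1_naive(x1):
    """Sum the joint over every value of (x2, x3, x4): 2^3 = 8 terms."""
    terms = [joint(x1, x2, x3, x4) for x2, x3, x4 in product([0, 1], repeat=3)]
    return sum(terms), len(terms)

def marginal_x1_factorized(x1):
    """X2 is not an ancestor of X1, so its factor sums to 1: 2^2 = 4 terms."""
    terms = [bernoulli(P_X1_GIVEN_X3_X4[(x3, x4)], x1)
             * bernoulli(P_X3, x3) * bernoulli(P_X4, x4)
             for x3, x4 in product([0, 1], repeat=2)]
    return sum(terms), len(terms)

naive, naive_terms = marginal_x1_naive(1)
factorized, factorized_terms = marginal_x1_factorized(1)
print("naive:", naive, "using", naive_terms, "terms")
print("factorized:", factorized, "using", factorized_terms, "terms")
```

The saving is small here, but it grows quickly with the number of variables that can be summed out locally.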
Computing with Bayes nets • Inference algorithms for Bayesian networks exploit dependency structure • Message-passing algorithms • “belief propagation” passes simple messages between nodes, exact for tree-structured networks • More general inference algorithms • exact: “junction-tree” • approximate: Monte Carlo schemes (see Part IV)
Explaining away [Figure: Bayes net Rain → Grass Wet ← Sprinkler] Assume grass will be wet if and only if it rained last night, or if the sprinklers were left on: P(w|r, s) = 1 if r or s, and 0 otherwise
Explaining away Compute the probability that it rained last night, given that the grass is wet: P(r|w) = P(w|r)P(r) / P(w) = P(r) / [P(r) + P(¬r)P(s)] • The denominator is between 1 and P(s), so P(r|w) ≥ P(r)
Explaining away Compute the probability that it rained last night, given that the grass is wet and the sprinklers were left on: P(r|w, s) = P(w|r, s)P(r|s) / P(w|s) • Both terms P(w|r, s) and P(w|s) = 1, so P(r|w, s) = P(r|s) = P(r)
Explaining away “Discounting” to prior probability: once the sprinklers are known to have been on, the probability of rain returns to its prior P(r)
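Explaining away can be checked numerically by brute-force enumeration. This sketch assumes independent priors on Rain and Sprinkler (the values 0.2 and 0.3 are hypothetical) and the deterministic rule that the grass is wet if and only if it rained or the sprinklers were on.

```python
from itertools import product

P_RAIN = 0.2        # hypothetical prior P(r)
P_SPRINKLER = 0.3   # hypothetical prior P(s)

def joint(r, s, w):
    """P(r, s, w) = P(r) P(s) P(w|r, s), with wet a deterministic OR of r and s."""
    p_r = P_RAIN if r else 1.0 - P_RAIN
    p_s = P_SPRINKLER if s else 1.0 - P_SPRINKLER
    wet = 1 if (r or s) else 0
    p_w = 1.0 if w == wet else 0.0
    return p_r * p_s * p_w

def conditional(query, evidence):
    """P(query | evidence) by enumeration; both arguments are dicts over 'r', 's', 'w'."""
    num = den = 0.0
    for r, s, w in product([0, 1], repeat=3):
        world = {"r": r, "s": s, "w": w}
        if all(world[k] == v for k, v in evidence.items()):
            p = joint(r, s, w)
            den += p
            if all(world[k] == v for k, v in query.items()):
                num += p
    return num / den

p_rain_given_wet = conditional({"r": 1}, {"w": 1})
p_rain_given_wet_and_sprinkler = conditional({"r": 1}, {"w": 1, "s": 1})

print("P(r) =", P_RAIN)
print("P(r | w) =", p_rain_given_wet)
print("P(r | w, s) =", p_rain_given_wet_and_sprinkler)
```

Wet grass raises the probability of rain above its prior; learning that the sprinklers were on brings it back down to the prior.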
Contrast w/ production system [Figure: Rain → Grass Wet ← Sprinkler] • Formulate IF-THEN rules: • IF Rain THEN Wet • IF Wet THEN Rain • IF Wet AND NOT Sprinkler THEN Rain • Rules do not distinguish directions of inference • Requires combinatorial explosion of rules
Contrast w/ spreading activation [Figure: Rain and Sprinkler linked to Grass Wet] • Excitatory links: Rain → Wet, Sprinkler → Wet • Observing rain, Wet becomes more active • Observing grass wet, Rain and Sprinkler become more active • Observing grass wet and sprinkler, Rain cannot become less active. No explaining away!
Contrast w/ spreading activation [Figure: Rain and Sprinkler linked to Grass Wet and to each other] • Excitatory links: Rain → Wet, Sprinkler → Wet • Inhibitory link: between Rain and Sprinkler • Observing grass wet, Rain and Sprinkler become more active • Observing grass wet and sprinkler, Rain becomes less active: explaining away
Contrast w/ spreading activation [Figure: Rain, Burst pipe, and Sprinkler linked to Grass Wet] • Each new variable requires more inhibitory connections • Not modular • whether a connection exists depends on what others exist • big holism problem • combinatorial explosion
Contrast w/ spreading activation (McClelland & Rumelhart, 1981)
Graphical models • Capture dependency structure in distributions • Provide an efficient means of representing and reasoning with probabilities • Allow kinds of inference that are problematic for other representations: explaining away • hard to capture in a production system • more natural than with spreading activation
Causal graphical models • Graphical models represent statistical dependencies among variables (i.e., correlations) • can answer questions about observations • Causal graphical models represent causal dependencies among variables (Pearl, 2000) • express underlying causal structure • can answer questions about both observations and interventions (actions upon a variable)
Bayesian networks • Nodes: variables • Links: dependency • Each node has a conditional probability distribution • Data: observations of x1, ..., x4 • Four random variables: X1 coin toss produces heads; X2 pencil levitates; X3 friend has psychic powers; X4 friend has two-headed coin [Figure: Bayes net with links X3 → X1, X4 → X1, X3 → X2; nodes annotated with P(x3), P(x4), P(x1|x3, x4), P(x2|x3)]
Causal Bayesian networks • Nodes: variables • Links: causality • Each node has a conditional probability distribution • Data: observations of and interventions on x1, ..., x4 • Four random variables: X1 coin toss produces heads; X2 pencil levitates; X3 friend has psychic powers; X4 friend has two-headed coin [Figure: Bayes net with links X3 → X1, X4 → X1, X3 → X2; nodes annotated with P(x3), P(x4), P(x1|x3, x4), P(x2|x3)]
Interventions Four random variables: X1 coin toss produces heads; X2 pencil levitates; X3 friend has psychic powers; X4 friend has two-headed coin • Example intervention: hold down pencil • Cut all incoming links for the node that we intervene on • Compute probabilities with “mutilated” Bayes net [Figure: Bayes net with the link X3 → X2 crossed out]
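The difference between observing and intervening can be sketched with the same four-variable network. Here "hold down pencil" is modeled as forcing x2 = 0, which replaces P(x2|x3) with a point mass and so cuts the link X3 → X2. All parameter values, and the `do_x2` argument name, are assumptions made for illustration.

```python
from itertools import product

# Hypothetical parameters; each entry is the probability that the variable equals 1.
P_X3 = 0.01
P_X4 = 0.05
P_X2_GIVEN_X3 = {0: 0.0001, 1: 0.9}
P_X1_GIVEN_X3_X4 = {(0, 0): 0.5, (0, 1): 1.0, (1, 0): 0.9, (1, 1): 1.0}

def bernoulli(p_one, value):
    return p_one if value == 1 else 1.0 - p_one

def joint(x1, x2, x3, x4, do_x2=None):
    """Joint probability; if do_x2 is set, use the mutilated network for X2."""
    if do_x2 is None:
        factor_x2 = bernoulli(P_X2_GIVEN_X3[x3], x2)   # ordinary P(x2|x3)
    else:
        factor_x2 = 1.0 if x2 == do_x2 else 0.0        # incoming link cut
    return (bernoulli(P_X1_GIVEN_X3_X4[(x3, x4)], x1) * factor_x2
            * bernoulli(P_X3, x3) * bernoulli(P_X4, x4))

def prob_x3_given_x2(x3_value, x2_value, do_x2=None):
    """P(x3 | x2), computed in the original or the mutilated network."""
    num = den = 0.0
    for x1, x2, x3, x4 in product([0, 1], repeat=4):
        if x2 != x2_value:
            continue
        p = joint(x1, x2, x3, x4, do_x2=do_x2)
        den += p
        if x3 == x3_value:
            num += p
    return num / den

p_observe = prob_x3_given_x2(1, 0)              # we see the pencil stay down
p_intervene = prob_x3_given_x2(1, 0, do_x2=0)   # we hold the pencil down

print("P(x3 = 1) prior:", P_X3)
print("P(x3 = 1 | observe x2 = 0):", p_observe)
print("P(x3 = 1 | do(x2 = 0)):", p_intervene)
```

Observing that the pencil stays down is weak evidence against psychic powers, whereas holding it down ourselves says nothing about them, so the probability stays at its prior.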
Learning causal graphical models • Strength: how strong is a relationship? • Structure: does a relationship exist? [Figure: candidate graphs over nodes B, C, and E]
Causal structure vs. causal strength • Strength: how strong is a relationship? • requires defining nature of relationship [Figure: graphs over nodes B, C, and E, with links labeled by weights w0 and w1]