
Presentation Transcript


  1. RELATED CLASS CS 262 Z – SEMINAR IN CAUSAL MODELING: CURRENT TOPICS IN COGNITIVE SYSTEMS. INSTRUCTOR: JUDEA PEARL. Spring Quarter, Monday and Wednesday, 2–4 pm, A74 Haines Hall (location may change; check the Schedule of Classes beforehand). Prerequisites: Bayesian Networks or Commonsense.

  2. Judea Pearl University of California Los Angeles THE MATHEMATICS OF CAUSE AND EFFECT

  3. Democritus (460–370 B.C.): “I would rather discover one causal law than be King of Persia.” A. Einstein (April 23, 1953): “Development of Western science is based on two great achievements: the invention of the formal logical system (in Euclidean geometry) by the Greek philosophers, and the discovery of the possibility to find out causal relationships by systematic experiment (during the Renaissance).” [Portrait: David Hume, 1711–1776]

  4. HUME’S LEGACY • Analytical vs. empirical claims • Causal claims are empirical • All empirical claims originate from experience.

  5. THE TWO RIDDLES OF CAUSATION • What empirical evidence legitimizes a cause-effect connection? • What inferences can be drawn from causal information? And how?

  6. The Art of Causal Mentoring “Easy, man! That hurts!”

  7. OLD RIDDLES IN NEW DRESS • How should a robot acquire causal information from the environment? • How should a robot process causal information received from its creator-programmer?

  8. CAUSATION AS A PROGRAMMER'S NIGHTMARE • Input: “If the grass is wet, then it rained.” “If we break this bottle, the grass will get wet.” • Output: “If we break this bottle, then it rained.”

  9. CAUSATION AS A PROGRAMMER'S NIGHTMARE (Cont.) (Lin, 1995) • Input: A suitcase will open iff both locks are open. The right lock is open. • Query: What if we open the left lock? • Output: The right lock might get closed.

  10. THE BASIC PRINCIPLES Causation = encoding of behavior under interventions. Interventions = surgeries on mechanisms. Mechanisms = stable functional relationships = equations + graphs.

  11. CAUSATION AS A PROGRAMMER'S NIGHTMARE • Input: “If the grass is wet, then it rained.” “If we break this bottle, the grass will get wet.” • Output: “If we break this bottle, then it rained.”

  12. TRADITIONAL STATISTICAL INFERENCE PARADIGM [Diagram: Data → (Inference) → P, the Joint Distribution → Q(P) (Aspects of P)] e.g., Infer whether customers who bought product A would also buy product B: Q = P(B | A).
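As a minimal sketch of this paradigm, the aspect Q = P(B | A) can be estimated directly from transaction records, with no causal assumptions anywhere in sight. The data below are made up purely for illustration, not taken from the talk.

```python
# Traditional statistical inference: estimate an aspect Q of the joint
# distribution P straight from data.  Illustrative, made-up records.
transactions = [
    {"A": 1, "B": 1}, {"A": 1, "B": 0}, {"A": 0, "B": 1},
    {"A": 1, "B": 1}, {"A": 0, "B": 0}, {"A": 1, "B": 1},
]

bought_A = [t for t in transactions if t["A"] == 1]
q = sum(t["B"] for t in bought_A) / len(bought_A)   # Q = P(B | A)
print(f"P(B | A) ~= {q:.2f}")
```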

  13. FROM STATISTICAL TO CAUSAL ANALYSIS: 1. THE DIFFERENCES Probability and statistics deal with static relations. [Diagram: Data → (Inference) → P, the Joint Distribution → (change) → P′, a new Joint Distribution → Q(P′) (Aspects of P′)] What happens when P changes? e.g., Infer whether customers who bought product A would still buy A if we were to double the price.

  14. FROM STATISTICAL TO CAUSAL ANALYSIS: 1. THE DIFFERENCES What remains invariant when P changes, say, to satisfy P(price = 2) = 1? [Diagram: Data → (Inference) → P, the Joint Distribution → (change) → P′ → Q(P′)] Note: P(v) ≠ P(v | price = 2). P does not tell us how it ought to change. e.g., Curing symptoms vs. curing diseases; analogy: mechanical deformation.

  15. FROM STATISTICAL TO CAUSAL ANALYSIS: 1. THE DIFFERENCES (CONT.) Causal and statistical concepts do not mix. CAUSAL: Spurious correlation, Randomization, Confounding / Effect, Instrument, Holding constant, Explanatory variables. STATISTICAL: Regression, Association / Independence, “Controlling for” / Conditioning, Odds and risk ratios, Collapsibility.

  16. FROM STATISTICAL TO CAUSAL ANALYSIS: 2. MENTAL BARRIERS Causal and statistical concepts do not mix. CAUSAL: Spurious correlation, Randomization, Confounding / Effect, Instrument, Holding constant, Explanatory variables. STATISTICAL: Regression, Association / Independence, “Controlling for” / Conditioning, Odds and risk ratios, Collapsibility. • No causes in – no causes out (Cartwright, 1989): causal assumptions + statistical assumptions + data ⇒ causal conclusions • Causal assumptions cannot be expressed in the mathematical language of standard statistics.

  17. FROM STATISTICAL TO CAUSAL ANALYSIS: 2. MENTAL BARRIERS Causal and statistical concepts do not mix. CAUSAL: Spurious correlation, Randomization, Confounding / Effect, Instrument, Holding constant, Explanatory variables. STATISTICAL: Regression, Association / Independence, “Controlling for” / Conditioning, Odds and risk ratios, Collapsibility. • No causes in – no causes out (Cartwright, 1989): causal assumptions + statistical assumptions + data ⇒ causal conclusions • Causal assumptions cannot be expressed in the mathematical language of standard statistics. • Non-standard mathematics: Structural equation models (Wright, 1920; Simon, 1960); Counterfactuals (Neyman–Rubin Yx; Lewis x □→ y).

  18. THE STRUCTURAL MODEL PARADIGM [Diagram: Data → (Inference) → M, the Data-Generating Model, which induces the Joint Distribution and answers queries Q(M) (Aspects of M)] M – oracle for computing answers to Q's. e.g., Infer whether customers who bought product A would still buy A if we were to double the price.

  19. FAMILIAR CAUSAL MODEL: ORACLE FOR MANIPULATION [Diagram: a circuit-like model with variables X, Y, Z between INPUT and OUTPUT]

  20. STRUCTURAL CAUSAL MODELS • Definition: A structural causal model is a 4-tuple <V, U, F, P(u)>, where • V = {V1, ..., Vn} are observable variables • U = {U1, ..., Um} are background variables • F = {f1, ..., fn} are functions determining V: vi = fi(v, u) • P(u) is a distribution over U • P(u) and F induce a distribution P(v) over the observable variables
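The sketch below spells this 4-tuple out in Python. The particular variables (U1, U2, X, Y) and mechanisms are illustrative assumptions, not part of the slide; the point is only that V, U, F and P(u) become concrete objects, and that pushing draws of U through F yields the induced P(v).

```python
import random

# Minimal sketch of an SCM <V, U, F, P(u)>; the concrete variables and
# mechanisms here are made up for illustration.
V = ["X", "Y"]                       # observable (endogenous) variables
U = ["U1", "U2"]                     # background (exogenous) variables

def sample_u():                      # P(u): a distribution over U
    return {"U1": random.random() < 0.5, "U2": random.random() < 0.1}

F = {                                # v_i = f_i(pa_i, u_i), in causal order
    "X": lambda vals: vals["U1"],
    "Y": lambda vals: vals["X"] or vals["U2"],
}

def solve(u, equations=F, order=V):  # P(u) and F induce P(v): push u through F
    vals = dict(u)
    for v in order:
        vals[v] = equations[v](vals)
    return {v: vals[v] for v in order}

print(solve(sample_u()))             # one draw from the induced P(v)
```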

  21. CAUSAL MODELS AT WORK (The impatient firing-squad) [Graph: U (Court order) → C (Captain) → A, B (Riflemen) → D (Death)]

  22. CAUSAL MODELS AT WORK (Glossary) U: Court orders the execution. C: Captain gives a signal. A: Rifleman A shoots. B: Rifleman B shoots. D: Prisoner dies. =: functional equality (new symbol). Equations: C = U, A = C, B = C, D = A ∨ B.
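As a sketch, these equations translate directly into Python, with interventions implemented as surgery on the equation set (the encoding below is my rendering, not code from the talk). It already answers two of the sentences evaluated on the next slides.

```python
# The firing-squad model as structural equations -- a minimal sketch.
EQUATIONS = {
    "C": lambda v: v["U"],              # C = U
    "A": lambda v: v["C"],              # A = C
    "B": lambda v: v["C"],              # B = C
    "D": lambda v: v["A"] or v["B"],    # D = A or B
}
ORDER = ["C", "A", "B", "D"]

def solve(u, interventions=None):
    """Evaluate the model for background value u; `interventions` performs
    surgery: the listed variables keep their set values, their equations are ignored."""
    v = {"U": u, **(interventions or {})}
    for name in ORDER:
        if name not in v:
            v[name] = EQUATIONS[name](v)
    return v

# Prediction-style check: if the court ordered the execution, the prisoner dies.
print(solve(u=True)["D"])                                 # True
# S4 (action): no court order, but rifleman A decides to shoot: do(A = True).
print(solve(u=False, interventions={"A": True})["D"])     # True  -- prisoner dies
print(solve(u=False, interventions={"A": True})["B"])     # False -- B does not shoot
```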

  23. SENTENCES TO BE EVALUATED S1. Prediction: A ⇒ D. S2. Abduction: ¬D ⇒ ¬C. S3. Transduction: A ⇒ B. S4. Action: ¬C ⇒ DA. S5. Counterfactual: D ⇒ D¬A. S6. Explanation: Caused(A, D). [Graph: U → C → A, B → D]

  24. STANDARD MODEL FOR STANDARD QUERIES • S1 (prediction): If rifleman A shot, the prisoner is dead: A ⇒ D. • S2 (abduction): If the prisoner is alive, then the Captain did not signal: ¬D ⇒ ¬C. • S3 (transduction): If rifleman A shot, then B shot as well: A ⇒ B. [Graph: C iff U; A iff C; B iff C; D = A OR B]

  25. WHY CAUSAL MODELS? GUIDE FOR SURGERY S4 (action): If the captain gave no signal and Mr. A decides to shoot, the prisoner will die: ¬C ⇒ DA; and B will not shoot: ¬C ⇒ ¬BA. [Graph: U → C → {A, B} → D]

  26. WHY CAUSAL MODELS? GUIDE FOR SURGERY S4 (action): If the captain gave no signal and Mr. A decides to shoot, the prisoner will die: ¬C ⇒ DA; and B will not shoot: ¬C ⇒ ¬BA. [Graph after surgery: the arrow C → A is cut and A is set to TRUE]

  27. 3 STEPS TO COMPUTING COUNTERFACTUALS S5. If the prisoner is dead, he would still be dead if A were not to have shot: D ⇒ D¬A. Step 1 – Abduction: from the evidence D = TRUE, infer the background state U = TRUE. Step 2 – Action: perform the surgery do(A = FALSE). Step 3 – Prediction: evaluate D in the modified model; D = TRUE (B still shoots, since B = C = U = TRUE). [Diagrams: the model before and after surgery, with U = TRUE, A = FALSE, D = TRUE]
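A minimal sketch of the three steps on the firing-squad equations (my rendering in Python, not code from the talk). Abduction is trivial here because the model is deterministic: the evidence D = TRUE pins down U = TRUE.

```python
# Counterfactual D_{not-A} given evidence D = True, in the deterministic
# firing-squad model, via abduction -> action -> prediction.
EQS = {"C": lambda v: v["U"], "A": lambda v: v["C"],
       "B": lambda v: v["C"], "D": lambda v: v["A"] or v["B"]}
ORDER = ["C", "A", "B", "D"]

def solve(u, do=None):
    v = {"U": u, **(do or {})}
    for name in ORDER:
        if name not in v:
            v[name] = EQS[name](v)
    return v

# Step 1 -- Abduction: background states consistent with the evidence D = True.
consistent_u = [u for u in (True, False) if solve(u)["D"]]            # -> [True]

# Step 2 -- Action: surgery do(A = False);  Step 3 -- Prediction: evaluate D.
counterfactual_D = [solve(u, do={"A": False})["D"] for u in consistent_u]
print(counterfactual_D)    # [True]: the prisoner would still be dead
```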

  28. COMPUTING PROBABILITIES OF COUNTERFACTUALS P(S5): The prisoner is dead. How likely is it that he would be dead if A were not to have shot? P(D¬A | D) = ? Step 1 – Abduction: update the prior P(u) to the posterior P(u | D). Step 2 – Action: perform the surgery do(A = FALSE). Step 3 – Prediction: compute P(D¬A | D) in the modified model under P(u | D). [Diagrams: the model before and after surgery, annotated with P(u) and P(u | D)]
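A sketch of the same three steps with an explicit prior P(u); the value P(U = true) = 0.6 is an illustrative assumption, not from the talk. Abduction becomes a Bayesian update of P(u) to P(u | D), and the counterfactual probability is a weighted prediction in the mutilated model (here it comes out 1, since the model is deterministic given U).

```python
# Compute P(D_{not-A} | D) by abduction -> action -> prediction.
P_U = {True: 0.6, False: 0.4}        # illustrative prior over the court order U

EQS = {"C": lambda v: v["U"], "A": lambda v: v["C"],
       "B": lambda v: v["C"], "D": lambda v: v["A"] or v["B"]}
ORDER = ["C", "A", "B", "D"]

def solve(u, do=None):
    v = {"U": u, **(do or {})}
    for name in ORDER:
        if name not in v:
            v[name] = EQS[name](v)
    return v

# Step 1 -- Abduction: posterior P(u | D) by Bayes' rule on the evidence D = True.
evidence_mass = sum(p for u, p in P_U.items() if solve(u)["D"])
posterior = {u: (p / evidence_mass if solve(u)["D"] else 0.0) for u, p in P_U.items()}

# Steps 2 and 3 -- Action do(A = False), then predict D under the posterior.
p_counterfactual = sum(posterior[u] for u in P_U if solve(u, do={"A": False})["D"])
print(f"P(D_not-A | D) = {p_counterfactual}")   # 1.0 in this deterministic model
```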

  29. CAUSAL MODEL (FORMAL) M = <U, V, F>, or <U, V, F, P(u)>. U – background variables. V – endogenous variables. F – set of functions {fi : U ∪ (V \ Vi) → Vi}, vi = fi(pai, ui). Submodel: Mx = <U, V, Fx>, representing do(x); Fx replaces the equation for X with X = x. Actions and counterfactuals: Yx(u) = solution of Y in Mx; P(y | do(x)) ≜ P(Yx = y).

  30. WHY COUNTERFACTUALS? Action queries are triggered by (modifiable) observations, demanding an abductive step, i.e., counterfactual processing. E.g., Troubleshooting. Observation: The output is low. Action query: Will the output get higher if we replace the transistor? Counterfactual query: Would the output be higher had the transistor been replaced?

  31. APPLICATIONS • Predicting effects of actions and policies • Learning causal relationships from assumptions and data • Troubleshooting physical systems and plans • Finding explanations for reported events • Generating verbal explanations • Understanding causal talk • Formulating theories of causal thinking

  32. CAUSAL MODELS AND COUNTERFACTUALS Definition: The sentence “Y would be y (in situation u), had X been x,” denoted Yx(u) = y, means: the solution for Y in the mutilated model Mx (i.e., the equations for X replaced by X = x), with input U = u, is equal to y. Joint probabilities of counterfactuals: the super-distribution P* is derived from M. Parsimonious, consistent, and transparent.

  33. AXIOMS OF CAUSAL COUNTERFACTUALS “Y would be y, had X been x (in state U = u)” • Definiteness • Uniqueness • Effectiveness • Composition • Reversibility

  34. GRAPHICAL – COUNTERFACTUAL SYMBIOSIS Every causal graph expresses counterfactual assumptions, e.g., for a graph over X, Y, Z: 1. Missing arrows (e.g., X → Z) 2. Missing arcs (e.g., Y ↔ Z). Assumptions are guaranteed to be consistent. Assumptions are readable from the graph.

  35. DERIVATION IN CAUSAL CALCULUS [Graph: Smoking → Tar → Cancer, with an unobserved Genotype affecting both Smoking and Cancer]
P(c | do{s}) = Σt P(c | do{s}, t) P(t | do{s})   (Probability Axioms)
= Σt P(c | do{s}, do{t}) P(t | do{s})   (Rule 2)
= Σt P(c | do{s}, do{t}) P(t | s)   (Rule 2)
= Σt P(c | do{t}) P(t | s)   (Rule 3)
= Σs′ Σt P(c | do{t}, s′) P(s′ | do{t}) P(t | s)   (Probability Axioms)
= Σs′ Σt P(c | t, s′) P(s′ | do{t}) P(t | s)   (Rule 2)
= Σs′ Σt P(c | t, s′) P(s′) P(t | s)   (Rule 3)
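The last line is the front-door formula, and it uses only observational quantities. The sketch below evaluates it numerically from a joint distribution P(s, t, c); the probability table is fabricated solely to exercise the formula, and the helper functions (marg, cond, p_c_do_s) are ad-hoc names, not from any library.

```python
from itertools import product

# Front-door adjustment: P(c | do(s)) = sum_t P(t | s) * sum_s' P(c | t, s') P(s').
P = {}   # illustrative joint distribution P[(s, t, c)]
for s, t, c in product([0, 1], repeat=3):
    p_s = 0.5
    p_t_given_s = 0.8 if t == s else 0.2          # smoking strongly raises tar
    p_c_given_t = 0.7 if c == t else 0.3          # tar raises cancer
    P[(s, t, c)] = p_s * p_t_given_s * p_c_given_t

def marg(**fixed):
    return sum(p for (s, t, c), p in P.items()
               if all({"s": s, "t": t, "c": c}[k] == v for k, v in fixed.items()))

def cond(target, value, **given):
    return marg(**{target: value, **given}) / marg(**given)

def p_c_do_s(c, s):
    return sum(cond("t", t, s=s) *
               sum(cond("c", c, t=t, s=s2) * marg(s=s2) for s2 in [0, 1])
               for t in [0, 1])

print(p_c_do_s(c=1, s=1))
```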

  36. THE BACK-DOOR CRITERION Graphical test of identification: P(y | do(x)) is identifiable in G if there is a set Z of variables such that Z d-separates X from Y in the subgraph Gx obtained by removing all arrows emanating from X. Moreover, P(y | do(x)) = Σz P(y | x, z) P(z) (“adjusting” for Z). [Diagrams: a graph G over Z1, ..., Z6, Z, X, Y, and the pruned graph Gx]
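Given a set Z that satisfies the criterion, the adjustment formula is a plain sum. The sketch below evaluates it on a fabricated joint distribution P(z, x, y) in which Z confounds X and Y, so the adjusted and unadjusted quantities visibly differ; the numbers are illustrative assumptions only.

```python
from itertools import product

# Back-door adjustment: P(y | do(x)) = sum_z P(y | x, z) P(z),
# valid when Z blocks every back-door path from X to Y.
P = {}   # illustrative confounded joint P[(z, x, y)]
for z, x, y in product([0, 1], repeat=3):
    p_z = 0.3 if z == 1 else 0.7
    p_x_given_z = 0.9 if x == z else 0.1            # Z is a common cause of X ...
    p_y1_given_xz = 0.2 + 0.5 * x + 0.2 * z         # ... and of Y
    P[(z, x, y)] = p_z * p_x_given_z * (p_y1_given_xz if y == 1 else 1 - p_y1_given_xz)

def marg(**fixed):
    return sum(p for (z, x, y), p in P.items()
               if all({"z": z, "x": x, "y": y}[k] == v for k, v in fixed.items()))

def adjust(y, x):
    return sum(marg(z=z) * marg(y=y, x=x, z=z) / marg(x=x, z=z) for z in [0, 1])

print("P(y=1 | x=1)     =", round(marg(y=1, x=1) / marg(x=1), 3))   # associational
print("P(y=1 | do(x=1)) =", round(adjust(1, 1), 3))                 # causal
```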

  37. RECENT RESULTS ON IDENTIFICATION • do-calculus is complete • Complete graphical criterion for identifying causal effects (Shpitser and Pearl, 2006). • Complete graphical criterion for empirical testability of counterfactuals (Shpitser and Pearl, 2007).

  38. DETERMINING THE CAUSES OF EFFECTS (The Attribution Problem) • Your Honor! My client (Mr. A) died BECAUSE he used that drug.

  39. DETERMINING THE CAUSES OF EFFECTS (The Attribution Problem) • Your Honor! My client (Mr. A) died BECAUSE he used that drug. • Court to decide if it is MORE PROBABLE THAN NOT that A would be alive BUT FOR the drug! • PN = P(? | A is dead, took the drug) > 0.50

  40. THE PROBLEM • Semantical Problem: What is the meaning of PN(x,y): “Probability that event y would not have occurred if it were not for event x, given that x and y did in fact occur.”

  41. THE PROBLEM • Semantical Problem: What is the meaning of PN(x,y): “Probability that event y would not have occurred if it were not for event x, given that x and y did in fact occur.” • Answer: Computable from M

  42. THE PROBLEM • Semantical Problem: What is the meaning of PN(x,y): “Probability that event y would not have occurred if it were not for event x, given that x and y did in fact occur.” • Analytical Problem: Under what conditions can PN(x,y) be learned from statistical data, i.e., observational, experimental, and combined?

  43. TYPICAL THEOREMS (Tian and Pearl, 2000) • Bounds on PN given combined nonexperimental and experimental data • Identifiability under monotonicity (combined data): the corrected Excess Risk Ratio

  44. CAN FREQUENCY DATA DECIDE LEGAL RESPONSIBILITY?

                   Experimental            Nonexperimental
                   do(x)     do(x′)        x         x′
  Deaths (y)          16        14         2         28
  Survivals (y′)     984       986       998        972
                   1,000     1,000     1,000      1,000

• Nonexperimental data: drug usage predicts longer life • Experimental data: drug has negligible effect on survival • Plaintiff: Mr. A is special. He actually died. He used the drug by choice. • Court to decide (given both data): Is it more probable than not that A would be alive but for the drug?
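A sketch of the computation on these numbers, using the Tian–Pearl bounds on PN from the combined-data result cited on slide 43 (the bound formulas are supplied here from that paper, not printed on this slide). With this table the bounds collapse to a point, reproducing the conclusion on the next slide.

```python
# Probability of necessity (PN) bounds from combined experimental and
# nonexperimental data (Tian & Pearl, 2000) -- a sketch on the slide's numbers.
# Experimental arms: P(y | do(x)) and P(y | do(x')).
P_y_do_x  = 16 / 1000
P_y_do_x_ = 14 / 1000

# Nonexperimental (observational) joint frequencies over 2,000 subjects.
P_xy   = 2 / 2000       # took the drug and died
P_x_y  = 28 / 2000      # did not take the drug and died
P_x_y_ = 972 / 2000     # did not take the drug and survived
P_y    = P_xy + P_x_y   # overall death rate

# Tian-Pearl bounds: max{0, [P(y) - P(y_{x'})] / P(x,y)} <= PN
#                    <= min{1, [P(y'_{x'}) - P(x',y')] / P(x,y)}
lower = max(0.0, (P_y - P_y_do_x_) / P_xy)
upper = min(1.0, ((1 - P_y_do_x_) - P_x_y_) / P_xy)
print(f"{lower:.3f} <= PN <= {upper:.3f}")   # 1.000 <= PN <= 1.000
```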

  45. SOLUTION TO THE ATTRIBUTION PROBLEM WITH PROBABILITY ONE: 1 ≤ P(y′x′ | x, y) ≤ 1 • Combined data tell more than each study alone

  46. CONCLUSIONS • Structural-model semantics, enriched with logic and graphs, provides: • Complete formal basis for causal reasoning • Powerful and friendly causal calculus • Lays the foundations for asking more difficult questions: What is an action? What is free will? Should robots be programmed to have this illusion?

  47. RELATED CLASS CS 262 Z – SEMINAR IN CAUSAL MODELING: CURRENT TOPICS IN COGNITIVE SYSTEMS. INSTRUCTOR: JUDEA PEARL. Spring Quarter, Monday and Wednesday, 2–4 pm, A74 Haines Hall (location may change; check the Schedule of Classes beforehand). Prerequisites: Bayesian Networks or Commonsense.
