A Strategy for Making Predictions under Manipulation
Ioannis Tsamardinos, Assistant Professor, Computer Science Department, University of Crete; ICS, Foundation for Research and Technology - Hellas
Laura E. Brown, Ph.D. Candidate, Dept. of Biomedical Informatics, Vanderbilt University
Selecting a Formulation of Causality
• Causal Bayesian Networks
• Cross-sectional data
• No explicit notion of time
• No feedback cycles allowed
• Edges express causal relations
• Distribution expressed as P(V) = Πi P(Vi | Pa(Vi)), where Pa(Vi) are the parents (direct causes) of Vi
[Figure: example causal graph over V1–V6 and the target T]
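A minimal sketch of this factorization, assuming binary variables; the two-node graph and CPT numbers below are illustrative placeholders, not the network from the talk:

```python
# Sketch of the CBN factorization P(V) = prod_i P(Vi | Pa(Vi)).
parents = {"V1": [], "T": ["V1"]}                 # tiny chain: V1 -> T
cpt = {                                           # P(var = 1 | parent values)
    "V1": {(): 0.3},
    "T": {(0,): 0.1, (1,): 0.8},
}

def joint_prob(assignment):
    """P(V = assignment) as a product of the local conditionals."""
    p = 1.0
    for var, pa in parents.items():
        p1 = cpt[var][tuple(assignment[q] for q in pa)]
        p *= p1 if assignment[var] == 1 else 1.0 - p1
    return p

print(joint_prob({"V1": 1, "T": 1}))              # 0.3 * 0.8 = 0.24
```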
Effect of Manipulation
• Manipulate V1, V5
• An external manipulator E becomes a direct cause of each manipulated variable
• The edges from all other parents of a manipulated variable are removed
• M denotes the set of manipulated variables
[Figure: the example graph before and after manipulating V1 and V5, with E as their new parent]
J. Pearl. Causality: Models, Reasoning, and Inference, 2000.
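Restating how manipulation changes the factorization, assuming the slides follow Pearl's manipulation theorem (the notation P_M(Vi | E) is taken from the next slide):

```latex
% Manipulated distribution: the local conditionals of the unmanipulated
% variables are kept; those of the variables in M are replaced by the
% (possibly unknown) conditionals set by the external manipulator E.
P_M(V) \;=\; \prod_{V_i \notin M} P(V_i \mid Pa(V_i))
        \;\prod_{V_i \in M} P_M(V_i \mid E)
```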
Types of Predictive Tasks
• No manipulations
• Known set of manipulated variables M
  • From data following P(V), predict data following P_M(V)
  • The way manipulations are performed is unknown, i.e., the P_M(Vi | E) are unknown
• Unknown M
The Markov Blanket of T, MB(T)
• The set of direct causes, direct effects, and direct causes of direct effects of T
[Figure: MB(T) highlighted in the example graph]
The Manipulated Markov Blanket of T, MB_M(T)
• The set of direct causes, direct effects, and direct causes of direct effects of T in the manipulated distribution
• E.g., the blanket after manipulating V1 and V5
[Figure: MB_M(T) highlighted in the manipulated example graph]
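A short sketch of both definitions on a directed graph. The edge list is a guess at the slides' example graph (the talk's actual edges may differ); manipulating a variable severs the edges from its other parents, exactly as on the previous slide:

```python
import networkx as nx

def markov_blanket(g, t):
    """MB(t): parents, children, and the children's other parents (spouses)."""
    parents = set(g.predecessors(t))
    children = set(g.successors(t))
    spouses = {p for c in children for p in g.predecessors(c)} - {t}
    return parents | children | spouses

def manipulate(g, m):
    """Remove the incoming edges of every manipulated variable
    (the external manipulator E is left implicit)."""
    gm = g.copy()
    gm.remove_edges_from([(p, v) for v in m for p in list(g.predecessors(v))])
    return gm

# Guessed example graph: V1 -> T <- V3, V2 -> V3, T -> V4 <- V5, V4 -> V6.
g = nx.DiGraph([("V1", "T"), ("V3", "T"), ("V2", "V3"),
                ("T", "V4"), ("V5", "V4"), ("V4", "V6")])
print(markov_blanket(g, "T"))                            # V1, V3, V4, V5
print(markov_blanket(manipulate(g, {"V1", "V4"}), "T"))  # V1, V3 only
```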
Properties of MB(T)
• The smallest-size, most predictive subset of variables
• All and only the variables needed to build optimal predictive models
I. Tsamardinos and C. F. Aliferis. Towards Principled Feature Selection: Relevancy, Filters and Wrappers. AI & Statistics, 2003.
A. No Manipulations
• Find MB(T)
• Fit a model for P(T | MB(T)) from the training data, using only the variables of MB(T)
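A minimal sketch of this step, assuming the blanket has already been found by a BN learner; the classifier choice and the column names are placeholders:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

def fit_on_blanket(data: pd.DataFrame, target: str, mb: list[str]):
    """Fit a model for P(target | MB(target)) using only the blanket
    variables.  Any probabilistic classifier would do here."""
    return LogisticRegression(max_iter=1000).fit(data[mb], data[target])

# e.g., with a hypothetical MB(T) = {V1, V3, V4, V5}:
# model = fit_on_blanket(train_df, "T", ["V1", "V3", "V4", "V5"])
```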
B. Known M
• Find MB_M(T)
• Fit a model from the training data, using only the variables of MB_M(T)
• Proposition: P_M(T | MB_M(T)) = P(T | MB_M(T)), provided no manipulated spouse of T is a descendant of T in the unmanipulated distribution
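A sketch of checking the proposition's proviso on a graph; this is one reading of the condition (spouses taken in the manipulated graph, descendants in the unmanipulated one), not a verbatim transcription of the talk's method:

```python
import networkx as nx

def proviso_holds(g: nx.DiGraph, t, m):
    """True when no manipulated spouse of T is also a descendant of T
    in the unmanipulated graph g."""
    gm = g.copy()  # manipulation severs the incoming edges of M
    gm.remove_edges_from([(p, v) for v in m for p in list(g.predecessors(v))])
    spouses_m = {p for c in gm.successors(t) for p in gm.predecessors(c)} - {t}
    return not (spouses_m & set(m) & nx.descendants(g, t))
```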
Can Be Fit from Unmanipulated Data
• M = {V1, V5}
• P_M(T | MB_M(T)) = P(T | MB_M(T))
[Figure: the example graph with V1 and V5 manipulated]
Cannot Be Fit from Unmanipulated Data
• M = {V1, V4}
• P_M(T | MB_M(T)) ≠ P(T | MB_M(T))
[Figure: the example graph with V1 and V4 manipulated]
C. Unknown Manipulations M
• Find the direct causes of T
• Fit a model from the training data, using only the variables that are direct causes of T
• Only the direct causes remain in MB_M(T) under any manipulation
Learning Bayesian Networks
• Many algorithms exist that can learn the network
  • Discrete data: MMHC [1]
  • Mixed data: Bach [2]
• Find the graph, find MB_M(T), fit a model, and you are done…
• …or are you?
1. I. Tsamardinos, L. E. Brown, and C. F. Aliferis. Machine Learning, 65(1):31, 2006.
2. F. R. Bach and M. I. Jordan. NIPS-02.
Faithfulness and Parity Functions
• All BN methods assume Faithfulness
  • Causes and effects have detectable pairwise conditional associations with T
• Counter-example: T = V1 XOR V3
  • No pairwise association between T and V1
[Figure: V1 and V3 pointing to T]
Parity Functions in Feature Space
• T = V1 XOR V2
• No pairwise association between T and V1
• Construct the new feature V1V2
• The pairwise associations become apparent in the feature space
[Table: the data with the constructed column V1V2 alongside T, V1, V2]
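A quick numerical check of this effect, reading the constructed feature V1V2 as the product V1·V2 (an assumption; the slide does not spell out the construction):

```python
import numpy as np

rng = np.random.default_rng(0)
v1 = rng.integers(0, 2, 10_000)
v2 = rng.integers(0, 2, 10_000)
t = v1 ^ v2                              # T = V1 XOR V2

print(np.corrcoef(t, v1)[0, 1])          # ~0: no pairwise association with V1
print(np.corrcoef(t, v1 * v2)[0, 1])     # ~ -0.58: the constructed feature
                                         # is clearly associated with T
```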
Feature Space Markov Blanket
• Map the data to a feature space
  • Brute-force construction is inefficient
  • Instead, indirectly map to the feature space using an SVM
  • Assumption: a low SVM weight for a feature implies a low association of the feature with T
  • Produce only the top-weighted features (a recently developed heuristic method)
• Learn the Markov blanket in the feature space
  • Run HITON [1]
1. C. F. Aliferis, I. Tsamardinos, and A. Statnikov. AMIA 2003.
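The heuristic itself (FSMB) is unpublished here, so the following is only a rough illustration of the idea, using an explicit degree-2 feature expansion and a linear SVM (sklearn ≥ 1.0 assumed); the real method works with kernels and never enumerates the feature space:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.svm import LinearSVC

def top_weighted_features(X, y, k=10, degree=2):
    """Expand to an explicit polynomial feature space, fit a linear SVM,
    and keep the k constructed features with the largest |weight|."""
    poly = PolynomialFeatures(degree=degree, include_bias=False)
    Xf = poly.fit_transform(X)
    svm = LinearSVC(dual=False).fit(Xf, y)
    order = np.argsort(-np.abs(svm.coef_[0]))[:k]
    return [poly.get_feature_names_out()[i] for i in order]
```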
Inducing the MB(T)
• Run MMMB [1], RFE [2], FSMB [3], and no feature selection
• Build predictive models from each selected set
• If there is a large discrepancy in predictive performance, consult FSMB
• If there are "parity"-like variables, add the corresponding constructed features to the data before learning the network
1. I. Tsamardinos, C. F. Aliferis, and A. Statnikov. KDD 2003.
2. I. Guyon, et al. Machine Learning, 46(1-3):389-422, 2002.
3. FSMB: submitted for publication.
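A sketch of the discrepancy check, under stated assumptions: each selector's output is given as column indices, a plain logistic-regression model stands in for the predictive models, and the 0.05 threshold is a placeholder:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def selector_discrepancy(X, y, selections, threshold=0.05):
    """`selections` maps a selector name ('MMMB', 'RFE', 'FSMB', 'none')
    to the column indices it kept.  A large gap between the best and
    worst cross-validated scores flags a 'parity'-like situation."""
    scores = {name: cross_val_score(LogisticRegression(max_iter=1000),
                                    X[:, cols], y, cv=5).mean()
              for name, cols in selections.items()}
    return scores, (max(scores.values()) - min(scores.values())) > threshold
```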
Hidden Variables and Confounding
• H1, H2: hidden variables
• Dashed edges appear in the marginal network
• The marginal MB(T) is shown in green
• Reddish edges are "removed" by manipulations
• Manipulations of V5 and V3 lead to errors in estimating MB_M(T) (bluish nodes)
[Figure: the example graph with hidden variables H1, H2, the induced marginal edges, and the erroneously estimated nodes marked]
Finding Non-Confounded Edges
Proposition: Let V = O ∪ H, where O are observable variables and H are not, and let P(V) be faithful to a Causal Bayesian Network. If
• ∀S ⊆ O: ¬I(V1; T | S)
• ∀S ⊆ O: ¬I(V3; T | S)
• ∀S ⊆ O: ¬I(V5; T | S)
• ∃Z1 ⊆ O s.t. I(V1; V3 | Z1)
• ∃Z2 ⊆ O s.t. ¬I(V1; V5 | Z2)
• ¬I(V1; V3 | Z1 ∪ {T})
• I(V1; V5 | Z2 ∪ {T})
then there is a causal path from T to V5 (the edge T → V5 is causal), even in the presence of hidden variables.
[Figure: the Y-structure V1 → T ← V3 with T → V5, shown with and without a hidden confounder H between T and V5]
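To make the conditions concrete, a sketch of the test given a conditional-independence oracle `indep(a, b, s)` over the observables O. The exhaustive subset search is exponential and only illustrative; practical implementations use statistical CI tests and restrict the conditioning-set size:

```python
from itertools import chain, combinations

def subsets(xs):
    """All subsets of xs, as sets."""
    return (set(c) for c in
            chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1)))

def non_confounded(indep, O, v1, v3, t, v5):
    # (1)-(3): no observable subset separates V1, V3, or V5 from T.
    for a in (v1, v3, v5):
        if any(indep(a, t, s)
               for s in subsets([o for o in O if o not in (a, t)])):
            return False
    # (4) and (6): some Z1 separates V1 from V3, and adding T re-creates
    # the dependence (collider V1 -> T <- V3).
    if not any(indep(v1, v3, z) and not indep(v1, v3, z | {t})
               for z in subsets([o for o in O if o not in (v1, v3, t)])):
        return False
    # (5) and (7): V1 and V5 are dependent given some Z2, and adding T
    # screens them off (chain T -> V5 with no hidden confounder of T, V5).
    return any(not indep(v1, v5, z) and indep(v1, v5, z | {t})
               for z in subsets([o for o in O if o not in (v1, v5, t)]))
```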
Finding Non-Confounded Edges
• Use the test to:
  • Orient some edges
  • Find truly causal (non-confounded) edges
• An extension of the basic idea presented in [1]
1. S. Mani, P. Spirtes, and G. F. Cooper. UAI 2006.
Finding the MB_M(T)
• Edge existence: BN learning algorithm
• Edge orientation:
  • Learn the network, convert to a PDAG, obtain the compelled edges
  • Confounding test
• Edge confounding:
  • Confounding test
• Weigh the evidence and decide on orientation and absence of confounding
Finding the MB_M(T)
• Are V7 and V3 part of MB_M(T)?
• Is V4 part of MB_M(T)?
[Figure: the example graph with edges marked non-confounded, oriented-but-possibly-confounded, or undirected, and the manipulated nodes highlighted]
Results
Limitations
• Most time was spent on REGED
• The conditional independence tests were sometimes inappropriate
• The new methods are not optimized or fully tested
• Model averaging should be used
• Formal methods for weighing the evidence are needed
Conclusions
• A general basis of theory and algorithms for predictions under manipulation
• New algorithms for addressing lack of faithfulness and hidden confounding variables
• The strategy can be implemented using the new and existing algorithms
• Many open directions/problems:
  • Faithfulness
  • Acyclicity
  • Hidden variables
  • Timed data