A Strategy for Making Predictions under Manipulation
Ioannis Tsamardinos, Assistant Professor, Computer Science Department, University of Crete; ICS, Foundation for Research and Technology - Hellas
Laura E. Brown, Ph.D. Candidate, Dept. of Biomedical Informatics, Vanderbilt University
Selecting a Formulation of Causality
• Causal Bayesian Networks
• Cross-sectional data
• No explicit notion of time
• No feedback cycles allowed
• Edges express causal relations
• Distribution expressed as P(V) = Πi P(Vi | Pa(Vi)), where Pa(Vi) are the parents (direct causes) of Vi
[Figure: example causal graph over V1–V6 and the target T]
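A minimal sketch of this factorization, assuming binary variables; the two-node graph and CPT numbers below are illustrative placeholders, not the network from the talk:

```python
# Sketch of the CBN factorization P(V) = prod_i P(Vi | Pa(Vi)).
parents = {"V1": [], "T": ["V1"]}                 # tiny chain: V1 -> T
cpt = {                                           # P(var = 1 | parent values)
    "V1": {(): 0.3},
    "T": {(0,): 0.1, (1,): 0.8},
}

def joint_prob(assignment):
    """P(V = assignment) as a product of the local conditionals."""
    p = 1.0
    for var, pa in parents.items():
        p1 = cpt[var][tuple(assignment[q] for q in pa)]
        p *= p1 if assignment[var] == 1 else 1.0 - p1
    return p

print(joint_prob({"V1": 1, "T": 1}))              # 0.3 * 0.8 = 0.24
```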
Effect of Manipulation
• Manipulate V1, V5
• An external manipulator E becomes a direct cause of each manipulated variable
• The edges from all other parents of a manipulated variable are removed
• M denotes the set of manipulated variables
[Figure: the example graph before and after manipulating V1 and V5, with E as their new parent]
J. Pearl. Causality: Models, Reasoning, and Inference, 2000.
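Restating how manipulation changes the factorization, assuming the slides follow Pearl's manipulation theorem (the notation P_M(Vi | E) is taken from the next slide):

```latex
% Manipulated distribution: the local conditionals of the unmanipulated
% variables are kept; those of the variables in M are replaced by the
% (possibly unknown) conditionals set by the external manipulator E.
P_M(V) \;=\; \prod_{V_i \notin M} P(V_i \mid Pa(V_i))
        \;\prod_{V_i \in M} P_M(V_i \mid E)
```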
Types of Predictive Tasks
• No manipulations
• Known set of manipulated variables M
  • From data following P(V), predict data following P_M(V)
  • The way manipulations are performed is unknown, i.e., the P_M(Vi | E) are unknown
• Unknown M
The Markov Blanket of T, MB(T)
• The set of direct causes, direct effects, and direct causes of direct effects of T
[Figure: MB(T) highlighted in the example graph]
The Manipulated Markov Blanket of T, MB_M(T)
• The set of direct causes, direct effects, and direct causes of direct effects of T in the manipulated distribution
• E.g., the blanket after manipulating V1 and V5
[Figure: MB_M(T) highlighted in the manipulated example graph]
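A short sketch of both definitions on a directed graph. The edge list is a guess at the slides' example graph (the talk's actual edges may differ); manipulating a variable severs the edges from its other parents, exactly as on the previous slide:

```python
import networkx as nx

def markov_blanket(g, t):
    """MB(t): parents, children, and the children's other parents (spouses)."""
    parents = set(g.predecessors(t))
    children = set(g.successors(t))
    spouses = {p for c in children for p in g.predecessors(c)} - {t}
    return parents | children | spouses

def manipulate(g, m):
    """Remove the incoming edges of every manipulated variable
    (the external manipulator E is left implicit)."""
    gm = g.copy()
    gm.remove_edges_from([(p, v) for v in m for p in list(g.predecessors(v))])
    return gm

# Guessed example graph: V1 -> T <- V3, V2 -> V3, T -> V4 <- V5, V4 -> V6.
g = nx.DiGraph([("V1", "T"), ("V3", "T"), ("V2", "V3"),
                ("T", "V4"), ("V5", "V4"), ("V4", "V6")])
print(markov_blanket(g, "T"))                            # V1, V3, V4, V5
print(markov_blanket(manipulate(g, {"V1", "V4"}), "T"))  # V1, V3 only
```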
Properties of MB(T)
• The smallest-size, most predictive subset of variables
• All and only the variables needed to build optimal predictive models
I. Tsamardinos and C. F. Aliferis. Towards Principled Feature Selection: Relevancy, Filters and Wrappers. AI & Statistics, 2003.
A. No Manipulations
• Find MB(T)
• Fit a model for P(T | MB(T)) from the training data, using only the variables of MB(T)
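A minimal sketch of this step, assuming the blanket has already been found by a BN learner; the classifier choice and the column names are placeholders:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

def fit_on_blanket(data: pd.DataFrame, target: str, mb: list[str]):
    """Fit a model for P(target | MB(target)) using only the blanket
    variables.  Any probabilistic classifier would do here."""
    return LogisticRegression(max_iter=1000).fit(data[mb], data[target])

# e.g., with a hypothetical MB(T) = {V1, V3, V4, V5}:
# model = fit_on_blanket(train_df, "T", ["V1", "V3", "V4", "V5"])
```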
B. Known M
• Find MB_M(T)
• Fit a model from the training data, using only the variables of MB_M(T)
• Proposition: P_M(T | MB_M(T)) = P(T | MB_M(T)), provided no manipulated spouse of T is a descendant of T in the unmanipulated distribution
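A sketch of checking the proposition's proviso on a graph; this is one reading of the condition (spouses taken in the manipulated graph, descendants in the unmanipulated one), not a verbatim transcription of the talk's method:

```python
import networkx as nx

def proviso_holds(g: nx.DiGraph, t, m):
    """True when no manipulated spouse of T is also a descendant of T
    in the unmanipulated graph g."""
    gm = g.copy()  # manipulation severs the incoming edges of M
    gm.remove_edges_from([(p, v) for v in m for p in list(g.predecessors(v))])
    spouses_m = {p for c in gm.successors(t) for p in gm.predecessors(c)} - {t}
    return not (spouses_m & set(m) & nx.descendants(g, t))
```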
Can Be Fit from Unmanipulated Data
• M = {V1, V5}
• P_M(T | MB_M(T)) = P(T | MB_M(T))
[Figure: the example graph with V1 and V5 manipulated]
Cannot Be Fit from Unmanipulated Data
• M = {V1, V4}
• P_M(T | MB_M(T)) ≠ P(T | MB_M(T))
[Figure: the example graph with V1 and V4 manipulated]
C. Unknown Manipulations M
• Find the direct causes of T
• Fit a model from the training data, using only the variables that are direct causes of T
• Only the direct causes remain in MB_M(T) under any manipulation
Learning Bayesian Networks
• Many algorithms exist that can learn the network
  • Discrete data: MMHC [1]
  • Mixed data: Bach [2]
• Find the graph, find MB_M(T), fit a model, and you are done…
• …or are you?
1. I. Tsamardinos, L. E. Brown, and C. F. Aliferis. Machine Learning, 65(1):31, 2006.
2. F. R. Bach and M. I. Jordan. NIPS-02.
Faithfulness and Parity Functions
• All BN methods assume Faithfulness
  • Causes and effects have detectable pairwise conditional associations with T
• Counter-example: T = V1 XOR V3
  • No pairwise association between T and V1
[Figure: V1 and V3 pointing to T]
Parity Functions in Feature Space
• T = V1 XOR V2
• No pairwise association between T and V1
• Construct the new feature V1V2
• The pairwise associations become apparent in the feature space
[Table: the data with the constructed column V1V2 alongside T, V1, V2]
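A quick numerical check of this effect, reading the constructed feature V1V2 as the product V1·V2 (an assumption; the slide does not spell out the construction):

```python
import numpy as np

rng = np.random.default_rng(0)
v1 = rng.integers(0, 2, 10_000)
v2 = rng.integers(0, 2, 10_000)
t = v1 ^ v2                              # T = V1 XOR V2

print(np.corrcoef(t, v1)[0, 1])          # ~0: no pairwise association with V1
print(np.corrcoef(t, v1 * v2)[0, 1])     # ~ -0.58: the constructed feature
                                         # is clearly associated with T
```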
Feature Space Markov Blanket
• Map the data to a feature space
  • Brute-force construction is inefficient
  • Instead, indirectly map to the feature space using an SVM
  • Assumption: a low SVM weight for a feature implies a low association of the feature with T
  • Produce only the top-weighted features (a recently developed heuristic method)
• Learn the Markov blanket in the feature space
  • Run HITON [1]
1. C. F. Aliferis, I. Tsamardinos, and A. Statnikov. AMIA 2003.
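The heuristic itself (FSMB) is unpublished here, so the following is only a rough illustration of the idea, using an explicit degree-2 feature expansion and a linear SVM (sklearn ≥ 1.0 assumed); the real method works with kernels and never enumerates the feature space:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.svm import LinearSVC

def top_weighted_features(X, y, k=10, degree=2):
    """Expand to an explicit polynomial feature space, fit a linear SVM,
    and keep the k constructed features with the largest |weight|."""
    poly = PolynomialFeatures(degree=degree, include_bias=False)
    Xf = poly.fit_transform(X)
    svm = LinearSVC(dual=False).fit(Xf, y)
    order = np.argsort(-np.abs(svm.coef_[0]))[:k]
    return [poly.get_feature_names_out()[i] for i in order]
```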
Inducing the MB(T)
• Run MMMB [1], RFE [2], FSMB [3], and no feature selection
• Build predictive models from each selected set
• If there is a large discrepancy in predictive performance, consult FSMB
• If there are "parity"-like variables, add the corresponding constructed features to the data before learning the network
1. I. Tsamardinos, C. F. Aliferis, and A. Statnikov. KDD 2003.
2. I. Guyon, et al. Machine Learning, 46(1-3):389-422, 2002.
3. FSMB: submitted for publication.
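A sketch of the discrepancy check, under stated assumptions: each selector's output is given as column indices, a plain logistic-regression model stands in for the predictive models, and the 0.05 threshold is a placeholder:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def selector_discrepancy(X, y, selections, threshold=0.05):
    """`selections` maps a selector name ('MMMB', 'RFE', 'FSMB', 'none')
    to the column indices it kept.  A large gap between the best and
    worst cross-validated scores flags a 'parity'-like situation."""
    scores = {name: cross_val_score(LogisticRegression(max_iter=1000),
                                    X[:, cols], y, cv=5).mean()
              for name, cols in selections.items()}
    return scores, (max(scores.values()) - min(scores.values())) > threshold
```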
Hidden Variables and Confounding
• H1, H2: hidden variables
• Dashed edges appear in the marginal network
• The marginal MB(T) is shown in green
• Reddish edges are "removed" by manipulations
• Manipulations of V5 and V3 lead to errors in estimating MB_M(T) (bluish nodes)
[Figure: the example graph with hidden variables H1, H2, the induced marginal edges, and the erroneously estimated nodes marked]
Finding Non-Confounded Edges
Proposition: Let V = O ∪ H, where O are observable variables and H are not, and let P(V) be faithful to a Causal Bayesian Network. If
• ∀S ⊆ O: ¬I(V1; T | S)
• ∀S ⊆ O: ¬I(V3; T | S)
• ∀S ⊆ O: ¬I(V5; T | S)
• ∃Z1 ⊆ O s.t. I(V1; V3 | Z1)
• ∃Z2 ⊆ O s.t. ¬I(V1; V5 | Z2)
• ¬I(V1; V3 | Z1 ∪ {T})
• I(V1; V5 | Z2 ∪ {T})
then there is a causal path from T to V5 (the edge T → V5 is causal), even in the presence of hidden variables.
[Figure: the Y-structure V1 → T ← V3 with T → V5, shown with and without a hidden confounder H between T and V5]
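To make the conditions concrete, a sketch of the test given a conditional-independence oracle `indep(a, b, s)` over the observables O. The exhaustive subset search is exponential and only illustrative; practical implementations use statistical CI tests and restrict the conditioning-set size:

```python
from itertools import chain, combinations

def subsets(xs):
    """All subsets of xs, as sets."""
    return (set(c) for c in
            chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1)))

def non_confounded(indep, O, v1, v3, t, v5):
    # (1)-(3): no observable subset separates V1, V3, or V5 from T.
    for a in (v1, v3, v5):
        if any(indep(a, t, s)
               for s in subsets([o for o in O if o not in (a, t)])):
            return False
    # (4) and (6): some Z1 separates V1 from V3, and adding T re-creates
    # the dependence (collider V1 -> T <- V3).
    if not any(indep(v1, v3, z) and not indep(v1, v3, z | {t})
               for z in subsets([o for o in O if o not in (v1, v3, t)])):
        return False
    # (5) and (7): V1 and V5 are dependent given some Z2, and adding T
    # screens them off (chain T -> V5 with no hidden confounder of T, V5).
    return any(not indep(v1, v5, z) and indep(v1, v5, z | {t})
               for z in subsets([o for o in O if o not in (v1, v5, t)]))
```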
Finding Non-Confounded Edges
• Use the test to:
  • Orient some edges
  • Find truly causal (non-confounded) edges
• An extension of the basic idea presented in [1]
1. S. Mani, P. Spirtes, and G. F. Cooper. UAI 2006.
Finding the MB_M(T)
• Edge existence: BN learning algorithm
• Edge orientation:
  • Learn the network, convert to a PDAG, obtain the compelled edges
  • Confounding test
• Edge confounding:
  • Confounding test
• Weigh the evidence and decide on orientation and absence of confounding
Finding the MB_M(T)
• Are V7 and V3 part of MB_M(T)?
• Is V4 part of MB_M(T)?
[Figure: the example graph with edges marked non-confounded, oriented-but-possibly-confounded, or undirected, and the manipulated nodes highlighted]
Results
Limitations
• Most time was spent on REGED
• The conditional independence tests were sometimes inappropriate
• The new methods are not optimized or fully tested
• Model averaging should be used
• Formal methods for weighing the evidence are needed
Conclusions
• A general basis of theory and algorithms for predictions under manipulation
• New algorithms for addressing lack of faithfulness and hidden confounding variables
• The strategy can be implemented using the new and existing algorithms
• Many open directions/problems:
  • Faithfulness
  • Acyclicity
  • Hidden variables
  • Timed data