From Association Analysis to Causal Discovery Prof Jiuyong Li University of South Australia
Association analysis • Diapers -> Beer • Bread & Butter -> Milk
Positive correlation of birth rate to stork population • Would increasing the stork population increase the birth rate?
Further evidence for Causality ≠ Association: Simpson's paradox
Association and Causal Relationship • Two variables X and Y. • If Prob(Y | X) ≠ Prob(Y), X is associated with Y (association rules). • Prob(Y | do X): how does Y vary when we intervene on X? In general, Prob(Y | do X) ≠ Prob(Y | X). • The key question: how to estimate Prob(Y | do X)? • In association analysis, the relationship of X and Y is analysed in isolation. • However, the relationship between X and Y is affected by other variables.
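As a concrete illustration of the association condition above, the following sketch compares the estimated Prob(Y | X) with the marginal Prob(Y) on a toy set of binary records (the diapers/beer data, field names, and counts are made up for illustration):

```python
# Sketch: X is associated with Y when P(Y=1 | X=1) differs from P(Y=1).
# All data below is an illustrative toy example, not from the talk.

def cond_prob(data, x_col, y_col):
    """Estimate P(Y=1 | X=1) from a list of record dicts."""
    exposed = [r for r in data if r[x_col] == 1]
    return sum(r[y_col] for r in exposed) / len(exposed)

def marg_prob(data, y_col):
    """Estimate the marginal P(Y=1)."""
    return sum(r[y_col] for r in data) / len(data)

records = [
    {"diapers": 1, "beer": 1}, {"diapers": 1, "beer": 1},
    {"diapers": 1, "beer": 0}, {"diapers": 0, "beer": 0},
    {"diapers": 0, "beer": 0}, {"diapers": 0, "beer": 1},
]

p_cond = cond_prob(records, "diapers", "beer")   # 2/3
p_marg = marg_prob(records, "beer")              # 1/2
print(p_cond != p_marg)  # True: diapers and beer are associated
```

Note that this inequality establishes only association; nothing in it tells us how Y would respond to an intervention on X.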
Causal discovery 1 • Randomised controlled trials • The gold standard method • Expensive • Often infeasible • In a randomised trial, association = causation
Causal discovery 2 • Bayesian network based causal inference • Do-calculus (Pearl, 2000) • IDA (Maathuis et al., 2009) • Both infer causal effects from a Bayesian network. • However: • Constructing a Bayesian network is NP-hard • Low scalability to large numbers of variables
Learning causal structures • PC algorithm (Spirtes, Glymour and Scheines) • If not (A ╨ B | Z), there is an edge between A and B. • The search space increases exponentially with the number of variables. • Constraint-based search • CCC (G. F. Cooper, 1997) • CCU (C. Silverstein et al., 2000) • Efficiently removes non-causal relationships. • [Diagrams: the CCC and CCU constraints on variable triples A, B, C and the causal orderings they admit]
Association rules • Many efficient algorithms • But: hundreds of thousands to millions of rules, many of them spurious • Limited interpretability • Association rules do not indicate causal effects.
Causal rules • Discover causal relationships using partial association tests and simulated cohort studies. • Do not rely on Bayesian network structure learning; the discovery of causal rules also has strong theoretical support. • Discover both single and combined causes. • Can be discovered efficiently. • Z. Jin, J. Li, L. Liu, T. D. Le, B. Sun, and R. Wang. Discovery of causal rules using partial association. In Proceedings of ICDM, 2012. • J. Li, T. D. Le, L. Liu, J. Liu, Z. Jin, and B. Sun. Mining causal association rules. In Proceedings of ICDM Workshop on Causal Discovery (CD), 2013.
Problem • Discover causal rules from large databases of binary variables, e.g. A → Y, C → Y, BF → Y, DE → Y
Partial association test • Tests whether the association between I and J persists when the data are partitioned by the remaining variables K. • If it does, the partial association of I and J given K is nonzero (M. W. Birch, 1964).
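The test can be sketched with a Mantel-Haenszel-style statistic, a standard way to test whether an I–J association persists across the strata defined by K. This is a sketch under that assumption, not necessarily the exact formulation used in the paper, and the stratum counts below are illustrative:

```python
# Sketch of a partial association test across strata of K.
# Each stratum is a 2x2 table (a, b, c, d):
#   a = #(X=1, Y=1), b = #(X=1, Y=0), c = #(X=0, Y=1), d = #(X=0, Y=0).

def mh_partial_association(strata):
    """Mantel-Haenszel chi-square statistic (1 degree of freedom)
    summarising the X-Y association over all strata."""
    num = 0.0
    var = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n <= 1:
            continue  # a degenerate stratum contributes nothing
        expected_a = (a + b) * (a + c) / n          # E[a] under independence
        num += a - expected_a
        var += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
    return num * num / var

# Two illustrative strata in which the X-Y association points the same way:
strata = [(30, 10, 10, 30), (20, 5, 5, 20)]
stat = mh_partial_association(strata)
print(stat > 3.841)  # True: nonzero partial association at the 5% level
```

When the association reverses direction across strata (as in Simpson's paradox), the per-stratum deviations cancel in `num` and the statistic stays small, which is exactly the behaviour the talk relies on.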
Partial association test – an example (step 4 of the algorithm)
Fast partial association test • K ranges over all possible variable combinations, so the number of partitions is very large. • Counting the frequencies of the combinations is also time consuming. • Our solution: • Sort the data and count frequencies of the equivalence classes. • Only use the combinations that actually exist in the data set.
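The equivalence-class idea can be sketched as follows: group records by the value pattern they actually take (equivalent to sorting and scanning), so combinations absent from the data are never enumerated. The records below are illustrative:

```python
from collections import Counter

# Sketch: count frequencies of equivalence classes of binary records.
# Only value patterns present in the data are materialised, never all 2^m.
records = [
    (1, 0, 1), (1, 0, 1), (0, 1, 1), (1, 0, 0),
    (0, 1, 1), (1, 0, 1), (0, 0, 0),
]

classes = Counter(records)       # one pass over the data
print(classes[(1, 0, 1)])        # 3
print(len(classes))              # 4 distinct patterns, not 2**3 = 8
```

The stratified counts that the partial association test needs can then be read off these class frequencies instead of being recomputed per combination.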
Pruning strategies • Definition (Redundant causal rules): Assume that X ⊂ W. If X → Y is a causal rule, the rule W → Y is redundant as it does not provide new information. • Definition (Condition for testing combined causal rules): We only test a combined rule XV → Y if X and Y have zero association and V and Y have zero association (i.e. they cannot pass the chi-square test in step 3).
Algorithm • 1. Prune the variable set (support). • 2. Create the contingency table for each variable X. • 3. Calculate the chi-square statistic for X and Y. • If the association is positive, go to the next step. • If the association is zero, move X to a set N. • 4. Partial association test: if PA(X, Y, K) is nonzero, then X → Y is a causal rule. • 5. Repeat 1–4 for each variable that is a combination of variables in set N.
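Step 3's chi-square test on the 2×2 contingency table of X and Y can be sketched like this (the table counts are illustrative; 3.841 is the 5% critical value for one degree of freedom):

```python
# Sketch of step 3: Pearson chi-square for a 2x2 contingency table of X and Y.
#   a = #(X=1, Y=1), b = #(X=1, Y=0), c = #(X=0, Y=1), d = #(X=0, Y=0).

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic, closed form for a 2x2 table."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Illustrative counts: X and Y co-occur far more than independence predicts.
stat = chi_square_2x2(40, 10, 10, 40)
positive = stat > 3.841 and 40 * 40 > 10 * 10  # significant and a*d > b*c
print(positive)  # True: proceed to the partial association test (step 4)
```

A variable failing this test (zero association) goes into set N and is only reconsidered as part of a combined rule in step 5.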
Experimental evaluations • We use the Arrhythmia data set from the UCI machine learning repository. • The task is to classify the presence or absence of cardiac arrhythmia. The data set contains 452 records; each record has 279 data attributes and one class attribute. • Our results are quite consistent with the results from the CCC method. • Some rules found by CCC are removed by our method because they cannot pass the partial association test. • Our method can discover combined rules; the CCC and CCU methods are not designed to discover such rules.
Experimental evaluations • Figure 1: Extraction Time Comparison (20K Records) • Figure 2: Extraction Time Comparison (100K Records)
Summary 1 • Simpson's paradox: associations might be inconsistent in subsets. • Partial association test: tests the persistence of associations in all possible partitions. • Statistically sound. • Efficient in sparse data. • What else?
Cohort study 1 • A defined population is split into exposed and not-exposed cohorts, and each cohort is followed to see who develops the disease and who does not. • Prospective: follow up. • Retrospective: look back (historic study).
Cohort study 2 • Cohorts share common characteristics but differ in exposure. • Determine how the exposure causes an outcome. • Measure: odds ratio = (a/b) / (c/d), where a and b are the numbers of exposed subjects with and without the outcome, and c and d the numbers of unexposed subjects with and without the outcome.
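The odds-ratio computation is simple enough to show directly (the counts below are illustrative):

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a cohort 2x2 table:
      a = exposed with outcome,    b = exposed without outcome,
      c = unexposed with outcome,  d = unexposed without outcome."""
    return (a / b) / (c / d)

# Illustrative counts: exposure triples the odds of the outcome.
print(odds_ratio(30, 10, 10, 10))  # 3.0
```

An odds ratio near 1 suggests no effect of the exposure; values well above 1 (with a suitable significance test) are taken as evidence that the exposure raises the odds of the outcome.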
Limitations of cohort study • A hypothesis must be known beforehand. • Domain experts determine the control variables. • Data are collected to test the hypothesis. • Not suitable for data exploration. • We need: • To start from a given data set without any hypotheses. • An automatic method to find and validate hypotheses. • Suitability for data exploration.
Control variables • If we do not control covariates (especially those correlated with the outcome), we cannot determine the true cause. • Too many control variables result in too few matched cases in the data: how many people share the same race, gender, blood type, hair colour, eye colour, education level, …? • Irrelevant variables should not be controlled; eye colour may not be relevant to the study.
Matches • Exact matching • Exact matches on all covariates: infeasible. • Limited exact matching • Exact matches on a few key covariates. • Nearest neighbour matching • Find the closest neighbours. • Propensity score matching • Based on the predicted probability of treatment given the covariates.
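A minimal sketch of limited exact matching, assuming records are dictionaries and the key covariates are chosen in advance (all field names and data are illustrative):

```python
# Sketch: greedily pair each exposed record with an unexposed record that
# agrees exactly on a few key covariates (limited exact matching).

def limited_exact_match(exposed, unexposed, keys):
    """Return (exposed, unexposed) pairs that agree on all key covariates.
    Each control record is used at most once."""
    pool = list(unexposed)
    pairs = []
    for e in exposed:
        for u in pool:
            if all(e[k] == u[k] for k in keys):
                pairs.append((e, u))
                pool.remove(u)  # a control is matched at most once
                break
    return pairs

exposed = [{"sex": "F", "age_band": "30-39", "y": 1}]
unexposed = [{"sex": "M", "age_band": "30-39", "y": 0},
             {"sex": "F", "age_band": "30-39", "y": 0}]
pairs = limited_exact_match(exposed, unexposed, ["sex", "age_band"])
print(len(pairs))  # 1: matched on sex and age band only
```

Matching on only a few key covariates trades some confounding control for a usable number of matched pairs, which is the point of the "too many control variables" caveat above.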
Method 1 • Discover causal association rules, e.g. A → Y, from large databases of binary variables, using a fair dataset.
Methods • A: exposure variable. • {B, C, D, E, F}: controlled variable set. • Rows with the same colour on the controlled variable set are called matched record pairs; together the matched pairs form the fair dataset. • An association rule A → Y is a causal association rule if the odds ratio computed on its fair dataset indicates a causal effect.
Algorithm • 1. Remove irrelevant variables (support, local support, association). • 2. For each association rule (e.g. A → Y), find the exclusive variables of the exposure variable (support, association), e.g. G and F; the controlled variable set is then {B, C, D, E}. • 3. Find the fair dataset: search for all matched record pairs. • 4. Calculate the odds ratio to identify whether the tested rule is causal. • 5. Repeat 2–4 for each variable that is a combination of variables, considering only combinations of non-causal factors.
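Step 4 can be sketched with the usual matched-pair (conditional) odds-ratio estimator, which uses only the discordant pairs. This is a common choice for matched designs, not necessarily the exact estimator in the paper, and the pair outcomes below are illustrative:

```python
# Sketch: odds ratio over matched record pairs from the fair dataset.
# Each pair is (y_exposed, y_unexposed): the binary outcomes of the
# exposed record and its matched unexposed record.

def matched_pair_odds_ratio(pairs):
    """Conditional odds ratio: ratio of the two kinds of discordant pairs.
    Concordant pairs carry no information about the exposure's effect."""
    n10 = sum(1 for ye, yu in pairs if ye == 1 and yu == 0)
    n01 = sum(1 for ye, yu in pairs if ye == 0 and yu == 1)
    return n10 / n01

pairs = [(1, 0), (1, 0), (1, 0), (0, 1), (1, 1), (0, 0)]
print(matched_pair_odds_ratio(pairs))  # 3.0
```

A ratio well above 1 on the fair dataset is the evidence used to promote an association rule to a causal association rule.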
Experimental evaluations • Figure: Extraction Time Comparison of CAR, CCC and CCU (20K Records)
Causality – Judea Pearl • Judea Pearl. Causality: Models, Reasoning, and Inference. Cambridge University Press, 2000.
Methods • IDA • Maathuis, H. M., Colombo, D., Kalisch, M., and Buhlmann, P. (2010). Predicting causal effects in large-scale systems from observational data. Nature Methods, 7(4), 247–249.
Conclusions • Association analysis has been widely used in data mining, but associations do not indicate causal relationships. • Association rule mining can be adapted for causal relationship discovery by combining it with statistical methods: • Partial association tests • Cohort studies • These are efficient alternatives to causal Bayesian network based methods. • They are capable of finding combined causal factors.
Discussions • Causality and classification • Estimate Prob(Y | do X) instead of Prob(Y | X). • Feature selection versus controlled variable selection. • Evaluation of causes: not classification accuracy. • Bayesian networks??
Research Collaborators • Jixue Liu • Lin Liu • Thuc Le • Jin Zhou • Bin-yu Sun
Thank you for listening. Questions, please.