Results of the Causality Challenge


  1. Results of the Causality Challenge. Isabelle Guyon, Clopinet; Constantin Aliferis and Alexander Statnikov, Vanderbilt Univ.; André Elisseeff and Jean-Philippe Pellet, IBM Zürich; Gregory F. Cooper, University of Pittsburgh; Peter Spirtes, Carnegie Mellon. clopinet.com/causality

  2. Causal discovery: what affects… your health? …climate changes? …the economy? Which actions will have beneficial effects?

  3. Systemic causality. [Figure: the system acted upon by an external agent]

  4. Feature Selection: predict Y from features X1, X2, …; select the most predictive features.
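
To make the setting concrete, here is a minimal sketch (in Python; not the challenge's reference code) of ranking features by predictive power. The synthetic data, the mutual-information criterion, and the cutoff k=5 are all illustrative assumptions.

    # Minimal feature-selection sketch: rank candidate features X1, X2, ...
    # by mutual information with Y and keep the top k. All specifics
    # (synthetic data, MI criterion, k=5) are illustrative assumptions.
    import numpy as np
    from sklearn.feature_selection import mutual_info_classif

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 20))           # 500 examples, 20 candidate features
    y = (X[:, 3] + X[:, 7] > 0).astype(int)  # Y actually depends on two features only

    scores = mutual_info_classif(X, y, random_state=0)
    top_k = np.argsort(scores)[::-1][:5]     # indices of the 5 most predictive features
    print("selected features:", sorted(top_k))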

  5. Causation: predict the consequences of actions. Under “manipulations” by an external agent, some features are no longer predictive.
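
This point can be reproduced with a three-variable toy simulation (an assumed chain X1 -> Y -> X2, not one of the challenge tasks): the effect X2 predicts Y observationally, but loses all predictive power once an external agent sets it.

    # Toy simulation, assuming the chain X1 -> Y -> X2. Observationally the
    # effect X2 predicts Y well, but once an external agent manipulates X2
    # (sets it at random), it is no longer predictive; the cause X1 still is.
    import numpy as np

    rng = np.random.default_rng(1)
    n = 100_000
    x1 = rng.normal(size=n)                                # a cause of Y
    y = (x1 + 0.5 * rng.normal(size=n) > 0).astype(float)  # target
    x2_natural = y + 0.5 * rng.normal(size=n)              # an effect of Y
    x2_manipulated = rng.normal(size=n)                    # set by an external agent

    def corr(a, b):
        return round(float(np.corrcoef(a, b)[0, 1]), 2)

    print("corr(X2, Y), natural:    ", corr(x2_natural, y))      # high: predictive
    print("corr(X2, Y), manipulated:", corr(x2_manipulated, y))  # ~0: not predictive
    print("corr(X1, Y):             ", corr(x1, y))              # the cause stays predictive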

  6. Challenge Design

  7. Available data
  • A lot of “observational” data is available, but correlation ≠ causality!
  • Experiments are often needed, but they can be costly, unethical, or infeasible.
  • This challenge uses semi-artificial data:
    • Re-simulated data
    • Real data with artificial “probes”

  8. Challenge datasets: toy datasets (LUCAS, LUCAP) and four tasks (REGED, SIDO, CINA, MARTI). [Dataset table lost in transcription]

  9. On-line feed-back

  10. Difficulties
  • Violated assumptions: causal sufficiency, Markov equivalence, faithfulness, linearity, “Gaussianity”.
  • Overfitting (statistical complexity): finite sample size.
  • Algorithm efficiency (computational complexity): thousands of variables, tens of thousands of examples.

  11. Evaluation
  • Fulfillment of an objective:
    • Prediction of a target variable
    • Predictions under manipulations
  • Causal relationships: existence, strength, degree.

  12. Setting
  • Predict a target variable (on training and test data).
  • Return the set of features used.
  • Flexibility: sorted or unsorted list of features; single prediction or table of results.
  • Complete entry = xxx0, xxx1, xxx2 results (for at least one dataset).

  13. Metrics
  • Results are ranked according to the test-set target prediction performance, the “Tscore”.
  • We also assess the feature set directly with an “Fscore”, which is not used for ranking.
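
The Tscore formula itself does not survive in the transcript; since later slides report AUC distributions, a reasonable assumption is that Tscore is the test-set area under the ROC curve. Under that assumption, a minimal sketch with made-up predictions (not challenge data):

    # Tscore-style evaluation, assuming (as the AUC slides suggest) that
    # Tscore is the area under the ROC curve on the test set. The labels
    # and scores below are made up for illustration.
    import numpy as np
    from sklearn.metrics import roc_auc_score

    y_test = np.array([0, 0, 1, 1, 0, 1])                   # true test-set labels
    y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9])     # classifier confidences
    print("Tscore (AUC):", roc_auc_score(y_test, y_score))  # ~0.89 here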

  14. Toy Examples

  15. Causality assessment with manipulations. LUCAS0: natural. [Figure: causal graph over Anxiety, Peer Pressure, Born an Even Day, Yellow Fingers, Smoking, Genetics, Allergy, Lung Cancer, Attention Disorder, Coughing, Fatigue, and Car Accident]

  16. Causality assessment with manipulations. LUCAS1: manipulated. [Figure: the same causal graph with some variables manipulated by an external agent]

  17. Causality assessment with manipulations. LUCAS2: manipulated. [Figure: the same causal graph under another manipulation]

  18. Goal-driven causality
  • Participants return: S = selected subset (ordered or not).
  • We define: V = variables of interest (e.g., MB, direct causes, …).
  • We assess causal relevance: Fscore = f(V, S).
  [Figure: example graph over numbered variables, with an example selected subset]
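
One plausible instantiation of f (the challenge may have used a different definition) is the balanced F-measure between the selected subset S and the target set V:

    # Hypothetical instantiation of Fscore = f(V, S): precision/recall/F1 of
    # the selected subset S against the relevant variable set V (e.g., the
    # Markov blanket of the target). The example sets are made up.
    def set_fscore(V, S):
        V, S = set(V), set(S)
        tp = len(V & S)                          # correctly selected variables
        precision = tp / len(S) if S else 0.0
        recall = tp / len(V) if V else 0.0
        if precision + recall == 0:
            return 0.0
        return 2 * precision * recall / (precision + recall)

    V = {1, 2, 3, 4, 11}   # variables of interest (e.g., MB of the target)
    S = {1, 2, 3, 7, 11}   # participant's selected subset
    print("Fscore:", set_fscore(V, S))  # 0.8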

  19. Causality assessment without manipulation?

  20. Using artificial “probes”. LUCAP0: natural. [Figure: the LUCAS causal graph (Anxiety, Peer Pressure, …, Car Accident) augmented with probe variables P1, P2, P3, …, PT]

  21. Using artificial “probes”. LUCAP1&2: manipulated. [Figure: the same probe-augmented graph with the probes manipulated]

  22. Scoring using “probes”
  • What we can compute (Fscore): negative class = probes (here, all “non-causes”, all manipulated); positive class = other variables (may include causes and non-causes).
  • What we want (Rscore): positive class = causes; negative class = non-causes.
  • What we get (asymptotically): Fscore = (NTruePos / NReal) · Rscore + 0.5 · (NTrueNeg / NReal)
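
A quick numeric illustration of the asymptotic relation (all counts hypothetical): with 30 true causes among 100 real variables and an Rscore of 0.9, the observable probe-based Fscore settles around 0.62.

    # Numeric check of the asymptotic relation above. All counts and the
    # Rscore are hypothetical; NReal = number of real (non-probe) variables.
    n_real = 100                      # real variables
    n_true_pos = 30                   # real variables that are genuine causes
    n_true_neg = n_real - n_true_pos  # real variables that are non-causes
    rscore = 0.9                      # hypothetical score against the true causes

    fscore = (n_true_pos / n_real) * rscore + 0.5 * (n_true_neg / n_real)
    print("expected Fscore:", fscore)  # 0.27 + 0.35 = 0.62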

  23. Results

  24. Challenge statistics
  • Start: December 15, 2007.
  • End: April 30, 2008.
  • Total duration: 20 weeks.
  • Last (complete) entry ranked, number of ranked entrants, number of ranked submissions: [counts lost in transcription].

  25. Learning curves. [Figure: four panels, REGED, SIDO, MARTI, and CINA, each plotting Tscore (0.3–1.0) against days into the challenge (0–140) for dataset versions 0, 1, and 2]

  26. AUC distribution

  27. REGED

  28. SIDO

  29. CINA

  30. MARTI

  31. Pairwise comparisons

  32. Top-ranking methods
  • According to the rules of the challenge:
    • Yin-Wen Chang: SVM => best prediction accuracy on REGED and CINA. Prize: $400, donated by Microsoft.
    • Gavin Cawley: Causal Explorer + linear ridge regression ensembles => best prediction accuracy on SIDO and MARTI. Prize: $400, donated by Microsoft.
  • According to pairwise comparisons:
    • Jianxin Yin and Prof. Zhi Geng’s group: Partial Orientation and Local Structural Learning => best on the Pareto front; a new, original causal discovery algorithm. Prize: free WCCI 2008 registration.

  33. Pairwise comparisons. [Figure: pairwise-comparison panels for REGED, SIDO, MARTI, and CINA]

  34. Conclusion
  • We found a good correlation between causation and prediction under manipulations.
  • Several algorithms demonstrated effectiveness at discovering causal relationships.
  • We still need to investigate what makes them fail in some cases.
  • We need to capitalize on the power of classical feature selection methods.
  clopinet.com/causality
