Causal Modeling for Anomaly Detection • Andrew Arnold, Machine Learning Department, Carnegie Mellon University • Summer Project with Naoki Abe, Predictive Modeling Group, IBM • Rick Lawrence, Manager • June 23, 2006
Contributions • Consistent causal structure can be learned from passive observational data • Anomalous examples have a causal structure that is quantitatively distinguishable from that of normal ones • Causal structure is a significant addition to the standard analysis tools of independence and likelihood
Outline • Motivation & Problem • Causation Definition • Causal Discovery • Causal Comparison • Conclusions & Ongoing Work
Motivation • Processors: • Detection: Is this wafer good or bad? • Causation: Why is this wafer bad? • Intervention: How can we fix the problem? • Business: • Detection: Is this business functioning well or not? • Causation: Why is this business not functioning well? • Intervention: What can IBM do to improve performance?
Problem • Interventions are expensive and flawed • What can passively observed data tell us about the causal structure of a process?
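Direct Causation • X is a direct cause of Y relative to S iff $\exists\, z,\ x_1 \neq x_2$ such that $P(Y \mid X\ \text{set}{=}\ x_1,\ Z\ \text{set}{=}\ z) \neq P(Y \mid X\ \text{set}{=}\ x_2,\ Z\ \text{set}{=}\ z)$, where $Z = S \setminus \{X, Y\}$ • "set=" means intervening to set Z = z, not just observing Z = z • The relation is asymmetric [Scheines (2005)]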
Causal Graphs • Causal Directed Acyclic Graph G = {V,E} • Each edge X → Y represents a direct causal claim: X is a direct cause of Y relative to V [Scheines (2005)]
Probabilistic Independence • X and Y are independent iff $\forall\, x_1 \neq x_2$: $P(Y \mid X = x_1) = P(Y \mid X = x_2)$ • X and Y are associated iff X and Y are not independent [Scheines (2005)]
The Causal Markov Axiom • Connects causal structure to probabilistic independence through the Markov Condition • Markov Condition: in a causal graph, each variable V is independent of its non-effects, conditional on its direct causes [Scheines (2005)]
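A standard consequence of the Markov Condition, written out here as a worked equation: the joint distribution over the graph's variables factorizes into one term per variable given its direct causes. The notation $\mathrm{pa}(V_i)$ for the direct causes (parents) of $V_i$ and the three-variable chain are illustrative assumptions, not taken from the slides.

```latex
% Factorization implied by the Markov Condition on a causal DAG
P(V_1, \dots, V_n) = \prod_{i=1}^{n} P\bigl(V_i \mid \mathrm{pa}(V_i)\bigr)

% Illustrative chain X_1 \to X_2 \to X_3:
P(X_1, X_2, X_3) = P(X_1)\, P(X_2 \mid X_1)\, P(X_3 \mid X_2)
\quad\Longrightarrow\quad X_3 \perp X_1 \mid X_2
```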
Causal Structure Statistical Data [Scheines (2005)]
Causal Discovery: Statistical Data → Causal Structure • Statistical inference • Background knowledge • Faithfulness • X2 before X3 • No unmeasured common causes [Scheines (2005)]
Causal Discovery Algorithm • PC algorithm [Spirtes et al., 2000] • Constraint-based search • Only need to know how to test conditional independence • Do not need to measure all causes • Asymptotically correct
PC algorithm • Begin with the fully connected undirected graph • For each pair of nodes, test their independence conditional on all subsets of their neighbors: • i.e., (X _||_ Y | Z)? • If independent for any conditioning set: remove the edge and record the subset conditioned upon • If dependent for all conditioning sets: leave the edge • Orient edges, where possible
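The edge-removal steps above can be sketched roughly as follows. This is a minimal illustration, not the implementation used in the project: the function name `pc_skeleton`, the `indep(x, y, cond)` callback interface, and the toy oracle for the chain A → B → C are all assumptions made for the example, and edge orientation is left out.

```python
from itertools import combinations

def pc_skeleton(nodes, indep):
    """Skeleton (edge-removal) phase of a PC-style search.

    nodes: list of variable names.
    indep: hypothetical callback (x, y, cond_set) -> bool, True when x is
           judged independent of y given cond_set.
    Returns (adjacency dict of sets, dict of recorded separating sets).
    """
    # Start from the fully connected undirected graph.
    adj = {v: set(nodes) - {v} for v in nodes}
    sepset = {}
    size = 0
    # Grow the conditioning-set size until no pair has enough neighbors left.
    while any(len(adj[x] - {y}) >= size for x in nodes for y in adj[x]):
        for x in nodes:
            for y in list(adj[x]):
                for cond in combinations(sorted(adj[x] - {y}), size):
                    if indep(x, y, set(cond)):
                        # Independent for some conditioning set: drop the edge
                        # and remember what was conditioned upon.
                        adj[x].discard(y)
                        adj[y].discard(x)
                        sepset[frozenset((x, y))] = set(cond)
                        break
        size += 1
    return adj, sepset

def chain_oracle(x, y, cond):
    """Toy independence oracle for A -> B -> C: only A _||_ C | B holds."""
    return {x, y} == {"A", "C"} and "B" in cond

adj, sepset = pc_skeleton(["A", "B", "C"], chain_oracle)
```

In practice the oracle would be replaced by a statistical conditional independence test on data; the search itself only needs a yes/no answer for each query.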
Independence Tests [Scheines (2005)]
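One common choice of conditional independence test for continuous data is a partial-correlation test with Fisher's z-transform. The slides do not say which test was used, so the sketch below is only an assumed example; it presumes roughly linear-Gaussian data, and the function names and the simulated chain A → B → C are hypothetical.

```python
import math
import random

def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def partial_corr(cols, x, y, cond):
    """Partial correlation of x and y given cond, via the standard recursion."""
    if not cond:
        return pearson(cols[x], cols[y])
    z, rest = cond[0], cond[1:]
    rxy = partial_corr(cols, x, y, rest)
    rxz = partial_corr(cols, x, z, rest)
    ryz = partial_corr(cols, y, z, rest)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

def fisher_z_independent(cols, x, y, cond=(), alpha=0.05):
    """Return True when independence of x and y given cond is NOT rejected."""
    cond = tuple(cond)
    r = partial_corr(cols, x, y, cond)
    r = max(min(r, 0.999999), -0.999999)  # guard the log below
    n = len(cols[x])
    z = 0.5 * math.log((1 + r) / (1 - r))
    stat = math.sqrt(n - len(cond) - 3) * abs(z)
    p_value = math.erfc(stat / math.sqrt(2))  # two-sided normal p-value
    return p_value > alpha

# Simulated data from the chain A -> B -> C (illustrative only).
rng = random.Random(0)
a = [rng.gauss(0, 1) for _ in range(2000)]
b = [ai + rng.gauss(0, 1) for ai in a]
c = [bi + rng.gauss(0, 1) for bi in b]
cols = {"A": a, "B": b, "C": c}

marginal = fisher_z_independent(cols, "A", "C")             # A vs C, no conditioning
conditional = fisher_z_independent(cols, "A", "C", ("B",))  # A vs C given B
```

For discrete data a chi-square or G-test on the conditional contingency tables would play the same role.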
Edge Orientation Rule 1: Colliders [Scheines (2005)]
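The collider rule is usually stated as: for an unshielded triple X - Z - Y (X and Y not adjacent), orient X → Z ← Y when Z was not in the set that separated X and Y. A minimal sketch of that rule follows; the data layout (an adjacency dict of sets plus a dict of separating sets keyed by `frozenset` pairs) and the function name are assumptions for illustration.

```python
from itertools import combinations

def orient_colliders(adj, sepset):
    """Return directed edges (tail, head) implied by the collider rule.

    adj:    dict mapping each node to the set of its undirected neighbors.
    sepset: dict mapping frozenset({x, y}) to the set that separated x and y.
    """
    directed = set()
    for z in adj:
        for x, y in combinations(sorted(adj[z]), 2):
            if y in adj[x]:
                continue  # x and y are adjacent: the triple is shielded
            if z not in sepset.get(frozenset((x, y)), set()):
                # z was not needed to separate x and y, so z is a collider.
                directed.add((x, z))
                directed.add((y, z))
    return directed

# X and Y were found marginally independent (empty separating set),
# and both remain adjacent to Z.
adj = {"X": {"Z"}, "Y": {"Z"}, "Z": {"X", "Y"}}
sepset = {frozenset(("X", "Y")): set()}
colliders = orient_colliders(adj, sepset)
```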
More Orientation Rules: Rule 2: Avoid forming new colliders [Scheines (2005)]
More Orientation Rules: Rule 3: Avoid forming cycles • If there is an undirected edge between X and Y • And there is a directed path from X to Y • Then direct X - Y as X → Y • [Figure: three graphs over X, Y, Z labeled Given, OK, and BAD (cycle)]
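The cycle-avoidance rule can be sketched as a small fixed-point loop: whenever an undirected edge X - Y has a directed path from X to Y, orient it X → Y. The edge representation (sets of tuples) and the function names below are assumptions for illustration, and the example graph is a guess at the "Given" case on the slide.

```python
def has_directed_path(directed, src, dst):
    """Depth-first search for a directed path src -> ... -> dst."""
    stack, seen = [src], set()
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(head for tail, head in directed if tail == node)
    return False

def orient_to_avoid_cycles(undirected, directed):
    """Orient X - Y as X -> Y whenever a directed path X -> ... -> Y exists."""
    undirected = {frozenset(edge) for edge in undirected}
    directed = set(directed)
    changed = True
    while changed:
        changed = False
        for edge in list(undirected):
            x, y = tuple(edge)
            for a, b in ((x, y), (y, x)):
                if has_directed_path(directed, a, b):
                    # Orienting b -> a would close a cycle, so choose a -> b.
                    undirected.discard(edge)
                    directed.add((a, b))
                    changed = True
                    break
    return undirected, directed

# Assumed "Given" graph: X - Y undirected, with X -> Z and Z -> Y directed.
remaining, oriented = orient_to_avoid_cycles(
    undirected=[("X", "Y")],
    directed=[("X", "Z"), ("Z", "Y")],
)
```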
Our Example • Rule 2: Colliders • Rule 3: No new V-structures • Truth fully recovered [Scheines (2005)]
Using causal structure to explain anomalies • Why is one wafer good, and another bad? • Separate data into classes • Form causal graphs on each class • Compare causal structures
Form causal graphs • [Figure: causal graphs learned for the Good Train, Good Test, and Bad classes]
How to compare? • Similarity Score for graphs A and B over common nodes V: • Consider undirected edges as bi-directed • Of all the ordered pairs of variables (x, y) in V with an arc x → y in either A or B, in what percentage is there also an arc x → y in the other graph? • i.e., (AdjA(x,y) || AdjB(x,y)) && (AdjA(x,y) == AdjB(x,y)) • Difference Graph: • If there is an arc x → y in either A or B, but not in both, place the arc x → y in the difference graph • i.e., if (AdjA(x,y) != AdjB(x,y)) then AdjDiff(x,y) = True
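The similarity score and difference graph described above can be sketched with plain sets of ordered pairs. The helper names and the small example graphs are hypothetical, and returning 1.0 when neither graph has any arc is an assumption the slides do not address.

```python
def as_arcs(directed=(), undirected=()):
    """Represent a graph as a set of ordered pairs (x, y) meaning x -> y.

    Undirected edges are treated as bi-directed, as in the score definition.
    """
    arcs = set(directed)
    for x, y in undirected:
        arcs.add((x, y))
        arcs.add((y, x))
    return arcs

def restrict(arcs, nodes):
    """Keep only arcs whose endpoints are both in the common node set."""
    return {(x, y) for x, y in arcs if x in nodes and y in nodes}

def similarity(arcs_a, arcs_b):
    """Fraction of arcs present in either graph that are present in both."""
    union = arcs_a | arcs_b
    if not union:
        return 1.0  # assumption: two empty graphs count as identical
    return len(arcs_a & arcs_b) / len(union)

def difference_graph(arcs_a, arcs_b):
    """Arcs that appear in exactly one of the two graphs."""
    return arcs_a ^ arcs_b

# Illustrative graphs over the common nodes X, Y, Z.
common = {"X", "Y", "Z"}
graph_a = restrict(as_arcs(directed=[("X", "Y")], undirected=[("Y", "Z")]), common)
graph_b = restrict(as_arcs(directed=[("X", "Y"), ("Y", "Z")]), common)

score = similarity(graph_a, graph_b)
diff = difference_graph(graph_a, graph_b)
```

Under this reading the score is the Jaccard index of the two arc sets, and the difference graph is their symmetric difference.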
Comparison: Good Train vs. Good Test • 59% similar • [Figure: difference graph]
Comparison: Good Train vs. Bad • 37% similar • [Figure: difference graph]
Comparison: Good Test vs. Bad • 35% similar • [Figure: difference graph]
Conclusions • Consistent causal structure can be learned from passive observational data • Anomalous examples have a causal structure that is quantitatively distinguishable from that of normal ones • Causal structure is a significant addition to the standard analysis tools of independence and likelihood
Ongoing work • Comparing to maximum likelihood and minimum description length techniques • Looking at time-ordering • How do variables influence each other over time? • Using one-class SVM to do clustering • Avoids the need for labeled data • Relaxing assumptions • Allow latent variables • Evaluation is difficult without a domain expert • Using causal structure to help in clustering
Thank You • Questions? References • J. Pearl (2000). Causality: Models, Reasoning, and Inference, Cambridge Univ. Press • R. Scheines (2005). Causality Slides, http://www.gatsby.ucl.ac.uk/~zoubin/SALD/scheines.pdf • P. Spirtes, C. Glymour, and R. Scheines (2000). Causation, Prediction, and Search, 2nd Edition, MIT Press