Causal Data Mining
Richard Scheines
Dept. of Philosophy, Machine Learning, & Human-Computer Interaction
Carnegie Mellon
1. Predictive Data Mining
Finding predictive relationships in data
• What feature of student behavior predicts learning
• Who will default on credit cards
• Who will get an “A” in your course
• Which HS students will do well at CMU
• Do students cluster by “learning style”
Causal Data Mining
Finding causal relationships in data
• What feature of student behavior causes learning
• What will happen when we make everyone take a reading quiz before each class
• What will happen when we program our tutor to intervene to give hints after an error
Predictive Data Mining
Data Mining Search
Predictive Model: Y = f(X1, X2, …, Xk)
Predictive Data Mining
• Model Classes
  • Simple Regression
  • Locally Weighted Regression
  • Logistic Regression
  • Neural Nets
  • Support Vector Machines
  • Decision Trees
  • Bayes Nets
  • Naïve Bayes Classifier
  • Independent Components
  • Clustering
  • Etc.
Data Mining Search
Predictive Model: Y = f(X1, X2, …, Xk)
Predictive Data Mining
Data Mining Search
Predictive Model under Constraints: Y = f(X1, X2, …, Xk), e.g., f ∈ Additive functions
Predictive Data Mining
Data Mining Search
Predictive Model under Constraints: Y = f(X1, X2, …, Xk)
Or Probability Model under Constraints: P(Y | X1, X2, …, Xk), where P ∈ Gaussian, with mean 0
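The search for a predictive model within a constrained class can be illustrated with a minimal sketch. It assumes the simplest class from the list of model classes (simple regression, Y = a + b·X) and searches over candidate predictors for the one with the lowest squared error. The variable names (`quiz_attempts`, `pages_printed`, `final_score`) and the simulated data are hypothetical and are not taken from the course data.

```python
import random


def fit_simple_regression(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b, sse)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, sse


def search_best_predictor(features, ys):
    """Search the model class: one simple regression per candidate feature.

    Returns (name, a, b, sse) for the feature with the lowest squared error.
    """
    best = None
    for name, xs in features.items():
        a, b, sse = fit_simple_regression(xs, ys)
        if best is None or sse < best[3]:
            best = (name, a, b, sse)
    return best


# Simulated (hypothetical) data: the outcome depends on one feature only.
rng = random.Random(0)
n = 500
quiz_attempts = [rng.gauss(0, 1) for _ in range(n)]
pages_printed = [rng.gauss(0, 1) for _ in range(n)]
final_score = [2.0 + 3.0 * q + rng.gauss(0, 0.5) for q in quiz_attempts]

best = search_best_predictor(
    {"quiz_attempts": quiz_attempts, "pages_printed": pages_printed},
    final_score,
)
print(best[0], round(best[1], 2), round(best[2], 2))
```

The result is a predictive model only: a low squared error says nothing yet about what would happen if the chosen feature were changed by intervention.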
Predictive Data Mining
Decision Tree Search
Predictive Data Mining ≠ Causal Data Mining
P(Y | X1, X2, …, Xk)
P(Y | X1 set, X2, …, Xk)
Conditioning is not the same as intervening
Teeth Slides
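A small simulation can make the difference between conditioning and intervening concrete. This is a sketch under assumed conditions, not an example from the talk: a hypothetical binary common cause Z influences both X and Y, X has no effect on Y, and all probabilities are made up for illustration.

```python
import random


def sample(rng, set_x=None):
    """Draw one (x, y) pair from a toy model in which Z causes both X and Y."""
    z = rng.random() < 0.5  # hypothetical unmeasured common cause
    if set_x is None:
        x = rng.random() < (0.9 if z else 0.1)  # observational: X depends on Z
    else:
        x = set_x  # intervention: X is set from outside, ignoring Z
    y = rng.random() < (0.8 if z else 0.2)  # Y depends on Z only, never on X
    return x, y


def p_y_given_x(rng, n, x_value):
    """Estimate P(Y = 1 | X = x_value) by conditioning on observed X."""
    draws = [sample(rng) for _ in range(n)]
    ys = [y for x, y in draws if x == x_value]
    return sum(ys) / len(ys)


def p_y_set_x(rng, n, x_value):
    """Estimate P(Y = 1 | X set to x_value) by intervening on X."""
    draws = [sample(rng, set_x=x_value) for _ in range(n)]
    return sum(y for _, y in draws) / n


rng = random.Random(42)
conditioned = p_y_given_x(rng, 20000, True)
intervened = p_y_set_x(rng, 20000, True)
print(f"P(Y | X = 1)     ~ {conditioned:.2f}")
print(f"P(Y | X set = 1) ~ {intervened:.2f}")
```

In this toy model the exact values are 0.74 under conditioning and 0.50 under intervention: observing X = 1 is evidence about Z, while setting X = 1 carries no information about Z and changes nothing downstream.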
Causal Discovery
Statistical Data → Causal Structure
Statistical Inference
Background Knowledge
- X2 before X3
- no unmeasured common causes
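One kind of statistical inference that can feed causal discovery is a conditional independence check. The sketch below is an illustration under assumptions, not the procedure used in the talk or in any particular software: it simulates a hypothetical linear chain X1 → X2 → X3 and compares the marginal correlation of X1 and X3 with their partial correlation given X2.

```python
import math
import random


def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    rxy, rxz, ryz = corr(x, y), corr(x, z), corr(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Simulated chain X1 -> X2 -> X3 (coefficients are made up for illustration).
rng = random.Random(1)
n = 20000
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [0.8 * a + rng.gauss(0, 1) for a in x1]
x3 = [0.8 * b + rng.gauss(0, 1) for b in x2]

marginal = corr(x1, x3)
conditional = partial_corr(x1, x3, x2)
print(f"corr(X1, X3)      = {marginal:.3f}")
print(f"corr(X1, X3 | X2) = {conditional:.3f}")
```

A clearly nonzero marginal correlation together with a near-zero partial correlation is compatible with several structures in which X2 screens off X1 from X3. Background knowledge such as "X2 before X3" and "no unmeasured common causes" is what narrows those candidates down.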
Causal Discovery Software: TETRAD IV
www.phil.cmu.edu/projects/tetrad
Full Semester Online Course in Causal & Statistical Reasoning
Full Semester Online Course in Causal & Statistical Reasoning
• Course is tooled to record certain events:
  • Logins, page requests, print requests, quiz attempts, quiz scores, voluntary exercises attempted, etc.
• Each event was associated with attributes:
  • Time
  • Student-id
  • Session-id
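The event log described above could be represented along the following lines. This is a hypothetical sketch: the class name `Event`, the field names, the event-kind strings, and the sample records are all assumptions made for illustration, not the course's actual schema or data.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    """One logged event with the three attributes named on the slide."""
    kind: str          # e.g. "login", "print_request", "quiz_attempt" (assumed labels)
    time: datetime
    student_id: str
    session_id: str


def count_events(events, kind):
    """Count events of one kind per student, a typical per-student feature."""
    return Counter(e.student_id for e in events if e.kind == kind)


# Made-up sample records, for illustration only.
events = [
    Event("login", datetime(2003, 1, 15, 9, 0), "s01", "a1"),
    Event("print_request", datetime(2003, 1, 15, 9, 5), "s01", "a1"),
    Event("print_request", datetime(2003, 1, 16, 14, 30), "s01", "a2"),
    Event("quiz_attempt", datetime(2003, 1, 16, 14, 45), "s02", "b1"),
]

prints_per_student = count_events(events, "print_request")
print(prints_per_student)
```

Aggregating raw events into per-student counts like this yields variables (for example, number of print requests) that can then enter a predictive or causal analysis.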
Printing and Voluntary Comprehension Checks: 2002 --> 2003
[Figure panels: 2002, 2003]