Discovering predictive and causal relationships in data: which features of student behavior predict learning, who will default on credit cards, who will earn an "A" in a course, which high school students will do well at CMU, and whether students cluster by learning style. Techniques include regression, neural networks, decision trees, and clustering.
Causal Data Mining
Richard Scheines
Dept. of Philosophy, Machine Learning, & Human-Computer Interaction
Carnegie Mellon
1. Predictive Data Mining
Finding predictive relationships in data
• What feature of student behavior predicts learning
• Who will default on credit cards
• Who will get an "A" in your course
• Which HS students will do well at CMU
• Do students cluster by "learning style"
Causal Data Mining
Finding causal relationships in data
• What feature of student behavior causes learning
• What will happen when we make everyone take a reading quiz before each class
• What will happen when we program our tutor to intervene to give hints after an error
Predictive Data Mining
Data Mining Search → Predictive Model: Y = f(X1, X2, …, Xk)
Predictive Data Mining
Model classes:
• Simple Regression
• Locally Weighted Regression
• Logistic Regression
• Neural Nets
• Support Vector Machines
• Decision Trees
• Bayes Nets
• Naïve Bayes Classifier
• Independent Components
• Clustering
• Etc.
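To make one of these model classes concrete, here is a minimal sketch of a Naïve Bayes classifier over categorical features, with add-one (Laplace) smoothing. The function names and the toy data (two binary features, say "did voluntary exercises" and "printed notes", predicting whether a student earns an "A") are hypothetical and made up for illustration; they are not results from the course.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    # feature_counts[label][j][value] = times feature j took `value` in class `label`
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    values = defaultdict(set)  # distinct values observed for each feature j
    for row, y in zip(rows, labels):
        for j, v in enumerate(row):
            feature_counts[y][j][v] += 1
            values[j].add(v)
    return class_counts, feature_counts, values

def predict_naive_bayes(model, row):
    """Return the label maximizing log P(y) + sum_j log P(x_j | y).
    Features are assumed independent given the class (the 'naive' part)."""
    class_counts, feature_counts, values = model
    n = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for y, n_y in class_counts.items():
        score = math.log(n_y / n)
        for j, v in enumerate(row):
            # Laplace smoothing: (count + 1) / (class size + number of values)
            score += math.log((feature_counts[y][j][v] + 1) / (n_y + len(values[j])))
        if score > best_score:
            best_label, best_score = y, score
    return best_label

# Hypothetical toy data: (did_voluntary_exercises, printed_notes) -> grade
rows = [(1, 1), (1, 0), (1, 1), (0, 0), (0, 1), (0, 0)]
labels = ["A", "A", "A", "not A", "not A", "not A"]
model = train_naive_bayes(rows, labels)
print(predict_naive_bayes(model, (1, 1)))  # A
print(predict_naive_bayes(model, (0, 0)))  # not A
```

This is purely predictive: it says which label is most probable given the features, and nothing about what would happen if the features were changed.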
Predictive Data Mining
Data Mining Search → Predictive Model under Constraints: Y = f(X1, X2, …, Xk), e.g., f restricted to additive functions
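The simplest additive constraint takes f to be linear, Y = b0 + b1·X1 + … + bk·Xk, so the search reduces to least squares. A minimal, dependency-free sketch (the function name `fit_linear` is hypothetical) that solves the normal equations by Gaussian elimination:

```python
def fit_linear(X, y):
    """Least-squares fit of y = b0 + b1*x1 + ... + bk*xk; returns [b0, ..., bk]."""
    A = [[1.0] + list(row) for row in X]  # prepend an intercept column
    p, n = len(A[0]), len(A)
    # Normal equations: (A^T A) b = A^T y
    M = [[sum(A[i][r] * A[i][c] for i in range(n)) for c in range(p)] for r in range(p)]
    v = [sum(A[i][r] * y[i] for i in range(n)) for r in range(p)]
    # Forward elimination with partial pivoting
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        v[col], v[piv] = v[piv], v[col]
        for r in range(col + 1, p):
            factor = M[r][col] / M[col][col]
            for c in range(col, p):
                M[r][c] -= factor * M[col][c]
            v[r] -= factor * v[col]
    # Back substitution
    b = [0.0] * p
    for r in reversed(range(p)):
        b[r] = (v[r] - sum(M[r][c] * b[c] for c in range(r + 1, p))) / M[r][r]
    return b

# Data generated exactly from y = 1 + 2*x1 - 3*x2
X = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y = [1, 3, -2, 0, 2]
print([round(c, 6) for c in fit_linear(X, y)])
```

A general additive model would allow each term to be a nonlinear function fj(Xj); the linear case is just the easiest member of that class to fit.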
Predictive Data Mining
Data Mining Search → Predictive Model under Constraints: Y = f(X1, X2, …, Xk), or a Probability Model under Constraints: P(Y | X1, X2, …, Xk), where P is Gaussian with mean 0
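One common way to read a Gaussian constraint, assumed here rather than taken from the slide, is Y = f(X) + e with e ~ N(0, σ²), so that P(Y | X) is Gaussian around f(X). A minimal sketch of that conditional density and the log-likelihood a search over f could score; the function names and the example f are hypothetical:

```python
import math

def gaussian_conditional_density(y, x, f, sigma):
    """P(Y = y | X = x), assuming Y = f(x) + e with e ~ N(0, sigma^2)."""
    resid = y - f(x)  # the error term, assumed Gaussian with mean 0
    return math.exp(-resid ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def log_likelihood(data, f, sigma):
    """Sum of log conditional densities over (x, y) pairs."""
    return sum(math.log(gaussian_conditional_density(y, x, f, sigma)) for x, y in data)

# Hypothetical candidate model and data
f = lambda x: 1 + 2 * x[0] - 3 * x[1]
data = [((0, 0), 1.0), ((1, 0), 3.0), ((1, 1), 0.0)]
print(log_likelihood(data, f, sigma=1.0))
print(log_likelihood(data, lambda x: 0.0, sigma=1.0))  # a worse-fitting candidate
```

Under this reading, maximizing the Gaussian log-likelihood with fixed σ is equivalent to minimizing squared error.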
Predictive Data Mining Decision Tree Search
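Decision tree search is usually greedy: at each node, pick the split that best separates the classes, then recurse on the children. A minimal sketch of that single greedy step, assuming Gini impurity as the split criterion (one standard choice; others such as information gain are also common). The function names and the data are hypothetical:

```python
def gini(labels):
    """Gini impurity: 1 - sum_c p_c^2 over the class proportions p_c."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for lab in labels:
        counts[lab] = counts.get(lab, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split(rows, labels):
    """Return (feature index, threshold, weighted child impurity) for the
    split x[j] <= t that minimizes weighted Gini impurity."""
    best = (None, None, gini(labels))  # no split yet: impurity of the parent
    n = len(rows)
    for j in range(len(rows[0])):
        for t in sorted({r[j] for r in rows}):
            left = [lab for r, lab in zip(rows, labels) if r[j] <= t]
            right = [lab for r, lab in zip(rows, labels) if r[j] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (j, t, score)
    return best

# Hypothetical data: (exercises_attempted, minutes_online) -> outcome
rows = [(2, 10), (3, 40), (5, 15), (8, 35), (9, 20)]
labels = ["fail", "pass", "fail", "pass", "fail"]
print(best_split(rows, labels))  # (1, 20, 0.0)
```

A full tree applies `best_split` recursively to each child until nodes are pure or a depth or size limit is reached.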
Predictive Data Mining ≠ Causal Data Mining
P(Y | X1, X2, …, Xk), conditioning on observed X1, versus P(Y | X1 set, X2, …, Xk), X1 set by intervention
Conditioning is not the same as intervening.
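A small worked example shows why the two quantities differ. Assume a hypothetical structure in which a common cause Z (say, prior ability) influences both X and Y, while X has no effect on Y at all. Conditioning on X = 1 carries information about Z; setting X = 1 by intervention cuts the Z → X dependence (Pearl's truncated factorization). All probabilities below are made up for illustration, and the probabilities are computed exactly by enumeration:

```python
from itertools import product

# Hypothetical structure: Z -> X and Z -> Y; X has NO effect on Y.
P_Z = {1: 0.5, 0: 0.5}
P_X1_GIVEN_Z = {1: 0.8, 0: 0.2}  # P(X=1 | Z=z)
P_Y1_GIVEN_Z = {1: 0.9, 0: 0.3}  # P(Y=1 | Z=z)

def joint(z, x, y, set_x=None):
    """P(Z=z, X=x, Y=y). If set_x is given, X is set by intervention:
    the factor P(x | z) is replaced by 1 if x == set_x, else 0."""
    pz = P_Z[z]
    if set_x is None:
        px = P_X1_GIVEN_Z[z] if x == 1 else 1 - P_X1_GIVEN_Z[z]
    else:
        px = 1.0 if x == set_x else 0.0
    py = P_Y1_GIVEN_Z[z] if y == 1 else 1 - P_Y1_GIVEN_Z[z]
    return pz * px * py

def prob_y1_given_x1(set_x=None):
    """P(Y=1 | X=1) in the observed or the manipulated distribution."""
    num = sum(joint(z, 1, 1, set_x) for z in (0, 1))
    den = sum(joint(z, 1, y, set_x) for z, y in product((0, 1), repeat=2))
    return num / den

print(round(prob_y1_given_x1(), 3))         # conditioning: 0.78
print(round(prob_y1_given_x1(set_x=1), 3))  # intervening: 0.6
```

Observing X = 1 raises the probability of Y = 1 from 0.6 to 0.78 because it makes Z = 1 more likely, yet setting X = 1 leaves P(Y = 1) at its baseline of 0.6. A purely predictive model would be right about the first number and wrong about the effect of the intervention.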
Causal Discovery
Input: Statistical Data, plus Background Knowledge (e.g., X2 before X3; no unmeasured common causes)
Method: Statistical Inference
Output: Causal Structure
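Constraint-based causal discovery (for example, the PC algorithm) works from statistical constraints such as conditional independence. A minimal sketch, assuming a linear Gaussian model where zero partial correlation corresponds to conditional independence: data simulated from a hypothetical chain X1 → X2 → X3 show X1 and X3 correlated, but nearly uncorrelated once X2 is partialled out. The helper names are hypothetical:

```python
import math
import random

def pearson(a, b):
    """Sample Pearson correlation."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def partial_corr(a, b, given):
    """Correlation of a and b after linearly partialling out one variable."""
    r_ab, r_ag, r_bg = pearson(a, b), pearson(a, given), pearson(b, given)
    return (r_ab - r_ag * r_bg) / math.sqrt((1 - r_ag ** 2) * (1 - r_bg ** 2))

# Simulated (hypothetical) linear Gaussian chain X1 -> X2 -> X3
random.seed(0)
n = 5000
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [v + random.gauss(0, 1) for v in x1]
x3 = [v + random.gauss(0, 1) for v in x2]

print(round(pearson(x1, x3), 2))            # clearly nonzero (about 0.58)
print(round(partial_corr(x1, x3, x2), 2))   # close to zero
```

The same independence pattern is produced by X1 → X2 → X3, X1 ← X2 ← X3, and X1 ← X2 → X3, so statistics alone cannot orient these edges; background knowledge such as "X2 before X3" is what narrows the set of candidate structures.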
Causal Discovery Software: TETRAD IV, www.phil.cmu.edu/projects/tetrad
Full Semester Online Course in Causal & Statistical Reasoning
• The course is instrumented to record certain events: logins, page requests, print requests, quiz attempts, quiz scores, voluntary exercises attempted, etc.
• Each event is associated with these attributes:
  • time
  • student-id
  • session-id
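Event logs like this become data mining variables only after aggregation to one row per student. A minimal sketch, assuming a hypothetical record layout (the field names, event kinds, and helpers below are illustrative, not the course's actual schema), that turns the log into per-student counts such as prints or sessions:

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    """One logged course event (hypothetical layout)."""
    time: float       # e.g., seconds since epoch
    student_id: str
    session_id: str
    kind: str         # e.g., "login", "page_request", "print", "quiz_attempt"

def counts_per_student(events, kind):
    """Number of events of the given kind for each student."""
    return Counter(e.student_id for e in events if e.kind == kind)

def sessions_per_student(events):
    """Number of distinct sessions for each student."""
    seen = {}
    for e in events:
        seen.setdefault(e.student_id, set()).add(e.session_id)
    return {s: len(ids) for s, ids in seen.items()}

events = [
    Event(0.0, "s1", "a", "login"),
    Event(5.0, "s1", "a", "print"),
    Event(9.0, "s1", "b", "print"),
    Event(3.0, "s2", "c", "login"),
]
print(counts_per_student(events, "print"))  # Counter({'s1': 2})
print(sessions_per_student(events))         # {'s1': 2, 's2': 1}
```

Students with no events of a kind simply get a count of 0 from the `Counter`, which keeps the resulting feature table complete.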
Printing and Voluntary Comprehension Checks: 2002 → 2003
[Figure: panels for 2002 and 2003]
References
• Spirtes, P., Glymour, C., and Scheines, R. (2000). Causation, Prediction, and Search, 2nd Edition. MIT Press.
• Pearl, J. (2000). Causality: Models, Reasoning, and Inference. Cambridge University Press.
• Shih, B., Koedinger, K., and Scheines, R. (2008). "A Response Time Model for Bottom-Out Hints as Worked Examples." Proceedings of the First Educational Data Mining Conference.
• Shih, B., Koedinger, K., and Scheines, R. (2007). "Optimizing Student Models for Causality." Proceedings of the 13th International Conference on Artificial Intelligence in Education.
• Arnold, A., Beck, J., and Scheines, R. (2006). "Feature Discovery in the Context of Educational Data Mining: An Inductive Approach." Proceedings of the AAAI 2006 Workshop on Educational Data Mining, Boston, MA.
• Scheines, R., Leinhardt, G., Smith, J., and Cho, K. (2005). "Replacing Lecture with Web-Based Course Materials." Journal of Educational Computing Research, 32(1), 1-26.