
Educational Data Mining: Discovery with Models

Presentation Transcript


  1. Educational Data Mining: Discovery with Models • Ryan S.J.d. Baker, PSLC/HCII, Carnegie Mellon University • Ken Koedinger, CMU Director of PSLC, Professor of Human-Computer Interaction & Psychology, Carnegie Mellon University

  2. In this segment… • We will discuss Discovery with Models in (some) detail

  3. Last time… • We gave a very simple example of Discovery with Models using Bayesian Knowledge Tracing

  4. Uses of Knowledge Tracing • Can be interpreted to learn about skills
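
To make the interpretation concrete, here is a minimal sketch of the standard Bayesian Knowledge Tracing update. The four parameters (initial knowledge, learning, guess, slip) and the values used below are illustrative, not estimates fitted to the tutor discussed here.

```python
# Minimal sketch of the standard Bayesian Knowledge Tracing update.
# Parameter values are illustrative, not fitted estimates from this tutor.

def bkt_update(p_known, correct, p_guess, p_slip, p_transit):
    """Return P(skill is known) after observing one student response."""
    if correct:
        evidence = p_known * (1 - p_slip)
        posterior = evidence / (evidence + (1 - p_known) * p_guess)
    else:
        evidence = p_known * p_slip
        posterior = evidence / (evidence + (1 - p_known) * (1 - p_guess))
    # Allow for the chance the student learned the skill on this step
    return posterior + (1 - posterior) * p_transit

p_known = 0.2                                  # P(L0), initial knowledge
for correct in [True, True, False, True]:
    p_known = bkt_update(p_known, correct,
                         p_guess=0.2, p_slip=0.1, p_transit=0.15)
    print(round(p_known, 3))
```

Inspecting the four fitted parameters, skill by skill, is what makes it possible to "learn about skills" in the sense of the following slides.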

  5. Skills from the Algebra Tutor

  6. Which skills could probably be removed from the tutor?

  7. Which skills could use better instruction?
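
These questions are answered by reading the fitted parameters skill by skill. As a hedged illustration with made-up values (not the actual Algebra Tutor estimates from the slide): a very high initial-knowledge estimate suggests a skill most students already have, and a very low learning rate suggests instruction that is not working.

```python
# Illustrative only: skill names and parameter values are made up,
# not the Algebra Tutor estimates shown on the slide.

skills = {
    # skill: (P(L0) initial knowledge, P(T) learning rate)
    "combine-like-terms": (0.96, 0.30),
    "distribute":         (0.40, 0.25),
    "factor-quadratic":   (0.20, 0.02),
}

for name, (p_l0, p_t) in skills.items():
    if p_l0 > 0.90:
        print(f"{name}: already known by most students -> candidate to remove")
    elif p_t < 0.05:
        print(f"{name}: very slow learning -> candidate for better instruction")
```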

  8. Why do Discovery with Models? • We have a model of some construct of interest or importance • Knowledge • Meta-Cognition • Motivation • Affect • Collaborative Behavior • Helping Acts, Insults • Etc.

  9. Why do Discovery with Models? • We can now use that model to • Find outliers of interest by finding out where the model makes extreme predictions • Inspect the model to learn what factors are involved in predicting the construct • Find out the construct’s relationship to other constructs of interest, by studying its correlations/associations/causal relationships with data/models on the other constructs • Study the construct across contexts or students, by applying the model within data from those contexts or students • And more…

  10. Finding Outliers of Interest • Finding outliers of interest by finding out where the model makes extreme predictions • As in the example from Bayesian Knowledge Tracing • As in Ken’s example yesterday of finding upward spikes in learning curves
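
As a small sketch of what this looks like in practice: apply a fitted detector to every student and flag the extreme predictions. The detector outputs and the 0.9 cutoff below are illustrative assumptions.

```python
# Illustrative: per-student detector outputs and the cutoff are made up.
predictions = {"s01": 0.12, "s02": 0.95, "s03": 0.48, "s04": 0.91}

outliers = {s: p for s, p in predictions.items() if p > 0.9}
print(outliers)   # the students the model is most confident about
```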

  11. Model Inspection • By looking at the features in the Gaming Detector, Baker, Corbett, & Koedinger (2004, in press) were able to see that • Students who game the system and have poor learning • game the system on steps they don’t know • Students who game the system and have good learning • game the system on steps they already know

  12. Model Inspection: A tip • The simpler the model, the easier this is to do • Decision Trees and Linear/Step Regression: Easy.

  13. Model Inspection: A tip • The simpler the model, the easier this is to do • Decision Trees and Linear/Step Regression: Easy. • Neural Networks and Support Vector Machines: Fuhgeddaboudit!
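
A brief sketch of why the simpler models are easier to inspect: a linear model's weights can be read off feature by feature, whereas a neural network or SVM offers no such direct reading. The features and data below are stand-ins, not the actual gaming-detector features.

```python
# Illustrative: features and data are stand-ins, not the gaming detector's.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.random((200, 3))                 # e.g., error rate, hint use, speed
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.1, 200)

model = LinearRegression().fit(X, y)
for name, coef in zip(["error_rate", "hint_use", "speed"], model.coef_):
    print(f"{name}: {coef:+.2f}")        # each weight has a direct reading
```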

  14. Correlations to Other Constructs

  15. Take Model of a Construct • And see whether it co-occurs with other constructs of interest

  16. Example • Detector of gaming the system (in fashion associated with poorer learning) correlated with questionnaire items assessing various motivations and attitudes (Baker et al., 2008)

  17. Example • Detector of gaming the system (in fashion associated with poorer learning) correlated with questionnaire items assessing various motivations and attitudes (Baker et al., 2008) • Surprise: Nothing correlated very well (correlations between gaming and some attitudes statistically significant, but very weak – r < 0.2)
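
A minimal sketch of this kind of analysis, with fabricated data: correlate the detector's per-student gaming frequency against a questionnaire scale and report r (and r squared).

```python
# Illustrative: both columns of data are fabricated.
from scipy.stats import pearsonr

gaming_freq   = [0.02, 0.15, 0.08, 0.30, 0.05, 0.22, 0.11, 0.01]
dislike_scale = [2, 4, 3, 4, 1, 5, 3, 2]   # e.g., a "dislike of math" item

r, p = pearsonr(gaming_freq, dislike_scale)
print(f"r = {r:.2f}, p = {p:.3f}, r^2 = {r * r:.2f}")
```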

  18. Example • More on this in a minute…

  19. Studying a Construct Across Contexts • Often, but not always, involves:

  20. Model Transfer

  21. Model Transfer • Richard said that prediction assumes that the • Sample where the predictions are made • Is “the same as” • The sample where the prediction model was made • Not entirely true

  22. Model Transfer • It’s more that prediction assumes the differences “aren’t important” • So how do we know that’s the case?

  23. Model Transfer • You can use a classifier in contexts beyond where it was trained, with proper validation • This can be really nice • you may only have to train on data from 100 students and 4 lessons • and then you can use your classifier in cases where there is data from 1000 students and 35 lessons • Especially nice if you have some unlabeled data set with nice properties • Additional data such as questionnaire data (cf. Baker, 2007; Baker, Walonoski, Heffernan, Roll, Corbett, & Koedinger, 2008)

  24. Validate the Transfer • You should make sure your model is valid in the new context (cf. Roll et al., 2005; Baker et al., 2006) • Depending on the type of model, and what features go into it, your model may or may not be valid for data taken • From a different system • In a different context of use • With a different population

  25. Validate the Transfer • For example • Will an off-task detector trained in schools work in dorm rooms?

  26. Validate the Transfer • For example • Will a gaming detector trained in a tutor where {gaming=systematic guessing, hint abuse} • Work in a tutor where {gaming=point cartels}?

  27. Validate the Transfer • However • Will a gaming detector trained in a tutor unit where {gaming=systematic guessing, hint abuse} • Work in a different tutor unit where {gaming=systematic guessing, hint abuse}?

  28. Maybe…

  29. Baker, Corbett, Koedinger, & Roll (2006) • We tested whether • A gaming detector trained in a tutor unit where {gaming=systematic guessing, hint abuse} • Would work in a different tutor unit where {gaming=systematic guessing, hint abuse}

  30. Scheme • Train on data from three lessons, test on a fourth lesson • For all possible combinations of 4 lessons (4 combinations)

  31. Transfer lessons vs. training lessons • Ability to distinguish students who game from non-gaming students • Overall performance in training lessons: A' = 0.85 • Overall performance in test lessons: A' = 0.80 • Difference is NOT significant, Z = 1.17, p = 0.24 (using Strube's Adjusted Z)
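
A sketch of the evaluation scheme under stated assumptions: hold out each lesson in turn, train on the other three, and compute A' on the held-out lesson (computed here as ROC AUC, which is equivalent for two classes). The features, labels, and classifier are stand-ins for the real gaming detector, and Strube's adjusted Z test is not reproduced.

```python
# Illustrative leave-one-lesson-out evaluation; data and classifier are
# stand-ins for the actual gaming detector and its log features.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
lessons = {}
for name in ["3DGeometry", "Percents", "Probability", "Scatterplot"]:
    X = rng.random((60, 4))                                   # per-student features
    y = (X[:, 0] + rng.normal(0, 0.3, 60) > 0.7).astype(int)  # 1 = gamed
    lessons[name] = (X, y)

for held_out in lessons:
    train = [lessons[l] for l in lessons if l != held_out]
    X_tr = np.vstack([X for X, _ in train])
    y_tr = np.concatenate([y for _, y in train])
    X_te, y_te = lessons[held_out]
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    a_prime = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
    print(f"test on {held_out}: A' = {a_prime:.2f}")
```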

  32. So transfer is possible… • Of course 4 successes over 4 lessons from the same tutor isn’t enough to conclude that any model trained on 3 lessons will transfer to any new lesson

  33. What we can say is…

  34. If… • If we posit that these four cases are “successful transfer”, and assume they were randomly sampled from lessons in the middle school tutor…

  35. Maximum Likelihood Estimation
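
The transcript does not show the computation behind this slide, so the following is only a hedged guess at the shape of the argument: treat the four lessons as Bernoulli trials of "does the detector transfer?", take the maximum-likelihood estimate of the transfer probability, and check how strongly four-for-four speaks against lower values.

```python
# Hedged sketch; the actual slide's computation is not in the transcript.
successes, trials = 4, 4
p_hat = successes / trials            # MLE of the transfer probability: 1.0
print("MLE:", p_hat)

# Likelihood of seeing 4 of 4 successes under lower transfer probabilities
for p in (0.5, 0.7, 0.8, 0.9):
    print(f"P(4/4 | p = {p}) = {p ** trials:.2f}")
```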

  36. Studying a Construct Across Contexts • Using this detector (Baker, 2007)

  37. Research Question • Do students game the system because of state or trait factors? • If trait factors are the main explanation, differences between students will explain much of the variance in gaming • If state factors are the main explanation, differences between lessons could account for many (but not all) state factors, and explain much of the variance in gaming • So: is the student or the lesson a better predictor of gaming?

  38. Application of Detector • After validating its transfer • We applied the gaming detector across 35 lessons, used by 240 students, from a single Cognitive Tutor • Giving us, for each student in each lesson, a gaming frequency

  39. Model • Linear Regression models • Gaming frequency = Lesson + a0 • Gaming frequency = Student + a0

  40. Model • Categorical variables transformed to a set of binaries • i.e. Lesson = Scatterplot becomes • 3DGeometry = 0 • Percents = 0 • Probability = 0 • Scatterplot = 1 • Boxplot = 0 • Etc…
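
A small sketch of the two models under assumptions: the categorical predictor is expanded into 0/1 dummy columns, as on the slide, and an ordinary least-squares model is fit. The lesson names follow the slide; the gaming frequencies and the choice of pandas/statsmodels are illustrative.

```python
# Illustrative: the gaming frequencies are fabricated; only the
# dummy-coding scheme mirrors the slide.
import pandas as pd
import statsmodels.api as sm

data = pd.DataFrame({
    "student": ["s01", "s01", "s02", "s02", "s03", "s03"],
    "lesson":  ["Scatterplot", "Percents", "Scatterplot",
                "Boxplot", "Percents", "Boxplot"],
    "gaming_freq": [0.12, 0.03, 0.25, 0.18, 0.02, 0.07],
})

# Gaming frequency = Lesson + a0, with Lesson expanded into dummies
X_lesson = pd.get_dummies(data["lesson"], drop_first=True).astype(float)
lesson_model = sm.OLS(data["gaming_freq"], sm.add_constant(X_lesson)).fit()
print(lesson_model.rsquared)

# The student model is built the same way, from student dummies
X_student = pd.get_dummies(data["student"], drop_first=True).astype(float)
student_model = sm.OLS(data["gaming_freq"], sm.add_constant(X_student)).fit()
print(student_model.rsquared)
```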

  41. Metrics

  42. r2 • The correlation, squared • The proportion of variability in the data set that is accounted for by a statistical model

  43. r2 • The correlation, squared • The proportion of variability in the data set that is accounted for by a statistical model

  44. r2 • However, a limitation • The more variables you have, the more variance you should expect to predict just by chance

  45. r2 • We should expect • 240 students • To predict gaming better than • 35 lessons • Just by overfitting

  46. So what can we do?

  47. Our good friend BiC • Bayesian Information Criterion (Raftery, 1995) • Makes a trade-off between goodness of fit and flexibility of fit (number of parameters)

  48. Predictors

  49. The Lesson • Gaming frequency = Lesson + a0 • 35 parameters • r2 = 0.55 • BiC’ = -2370 • Model is significantly better than chance would predict given model size & data set size
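
As a hedged sketch of how such a BiC' value can be computed: one common reading of Raftery's (1995) approximation for regression is BiC' = n ln(1 - r^2) + p ln(n), measured against the intercept-only model, with negative values favouring the model over chance. The number of observations below is an assumption, so the result will not reproduce the slide's -2370 exactly.

```python
# Hedged sketch of Raftery-style BiC' for a regression model; n is assumed.
import math

def bic_prime(n_obs, r2, n_params):
    # Compared against the intercept-only (null) model
    return n_obs * math.log(1 - r2) + n_params * math.log(n_obs)

n_obs = 5000                      # assumed; not the study's actual N
print(round(bic_prime(n_obs, r2=0.55, n_params=35)))
```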
