
Educational Data Mining


Presentation Transcript


  1. Educational Data Mining • Ryan S.J.d. Baker, PSLC/HCII, Carnegie Mellon University • Richard Scheines, Professor of Statistics, Machine Learning, and Human-Computer Interaction, Carnegie Mellon University • Ken Koedinger, CMU Director of PSLC, Professor of Human-Computer Interaction & Psychology, Carnegie Mellon University

  2. In this segment… • We will give a brief overview of classes of Educational Data Mining methods • Discussing in detail • Causal Data Mining • An important Educational Data Mining method • Bayesian Knowledge Tracing • One of the key building blocks of many Educational Data Mining analyses

  3. EDM Methods (Baker, under review) • Prediction • Clustering • Relationship Mining • Discovery with Models • Distillation of Data for Human Judgment

  4. Coverage at EDM 2008 (of 31 papers; not mutually exclusive) • Prediction – 45% • Clustering – 6% • Relationship Mining – 19% • Discovery with Models – 13% • Distillation of Data for Human Judgment – 16% • None of the Above – 6%

  5. We will talk about three approaches now • 2 types of Prediction • 1 type of Relationship Mining • Tomorrow, 9:30am: Discovery with Models • Yesterday: Some examples of Distillation of Data for Human Judgment

  6. Prediction • Pretty much what it says • A student is using a tutor right now. Is he gaming the system or not? (“attempting to succeed in an interactive learning environment by exploiting properties of the system rather than by learning the material”) • A student has used the tutor for the last half hour. How likely is it that she knows the knowledge component in the next step? • A student has completed three years of high school. What will be her score on the SAT-Math exam?

  7. Two Key Types of Prediction This slide adapted from slide by Andrew W. Moore, Google http://www.cs.cmu.edu/~awm/tutorials

  8. Classification • There is something you want to predict (“the label”) • The thing you want to predict is categorical • The answer is one of a set of categories, not a number • CORRECT/WRONG (sometimes expressed as 0,1) • HELP REQUEST/WORKED EXAMPLE REQUEST/ATTEMPT TO SOLVE • WILL DROP OUT/WON’T DROP OUT • WILL SELECT PROBLEM A,B,C,D,E,F, or G

  9. Classification • Associated with each label are a set of “features”, which maybe you can use to predict the label

     KnowledgeComp   pknow   time   totalactions   right
     ENTERINGGIVEN   0.704     9         1         WRONG
     ENTERINGGIVEN   0.502    10         2         RIGHT
     USEDIFFNUM      0.049     6         1         WRONG
     ENTERINGGIVEN   0.967     7         3         RIGHT
     REMOVECOEFF     0.792    16         1         WRONG
     REMOVECOEFF     0.792    13         2         RIGHT
     USEDIFFNUM      0.073     5         2         RIGHT
     …

  10. Classification • The basic idea of a classifier is to determine which features, in which combination, can predict the label

     KnowledgeComp   pknow   time   totalactions   right
     ENTERINGGIVEN   0.704     9         1         WRONG
     ENTERINGGIVEN   0.502    10         2         RIGHT
     USEDIFFNUM      0.049     6         1         WRONG
     ENTERINGGIVEN   0.967     7         3         RIGHT
     REMOVECOEFF     0.792    16         1         WRONG
     REMOVECOEFF     0.792    13         2         RIGHT
     USEDIFFNUM      0.073     5         2         RIGHT
     …

  11. Many algorithms you can use • Decision Trees (e.g. C4.5, J48, etc.) • Logistic Regression • Etc, etc • In your favorite Machine Learning package • WEKA • RapidMiner • KEEL
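To make the classification idea concrete, here is a minimal sketch in Python using scikit-learn's DecisionTreeClassifier (scikit-learn is my package choice here, not one named on the slide); the feature names and the handful of rows mirror the example table above, and both the data and the tiny tree are purely illustrative.

```python
# Minimal classification sketch (illustrative; rows mirror the slide's example table).
from sklearn.tree import DecisionTreeClassifier

# Features: pknow, time (seconds), totalactions -- label: right (WRONG/RIGHT)
X = [
    [0.704,  9, 1],
    [0.502, 10, 2],
    [0.049,  6, 1],
    [0.967,  7, 3],
    [0.792, 16, 1],
    [0.792, 13, 2],
    [0.073,  5, 2],
]
y = ["WRONG", "RIGHT", "WRONG", "RIGHT", "WRONG", "RIGHT", "RIGHT"]

# Fit a small decision tree (comparable in spirit to C4.5/J48) and classify a new action
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(clf.predict([[0.65, 8, 1]]))  # predicted label for a hypothetical new action
```

The packages named above (WEKA, RapidMiner, KEEL) do the same job; WEKA's J48 is its implementation of the C4.5 decision tree.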

  12. Regression • There is something you want to predict (“the label”) • The thing you want to predict is numerical • Number of hints student requests (0, 1, 2, 3...) • How long student takes to answer (4.7 s., 8.9 s., 88.2 s., 0.3 s.) • What will the student’s test score be (95%, 84%, 33%, 100%)

  13. Regression

     KnowledgeComp   pknow   time   totalactions   numhints
     ENTERINGGIVEN   0.704     9         1            0
     ENTERINGGIVEN   0.502    10         2            0
     USEDIFFNUM      0.049     6         1            3
     ENTERINGGIVEN   0.967     7         3            0
     REMOVECOEFF     0.792    16         1            1
     REMOVECOEFF     0.792    13         2            0
     USEDIFFNUM      0.073     5         2            0
     …

  • Associated with each label are a set of “features”, which maybe you can use to predict the label

  14. Regression

     KnowledgeComp   pknow   time   totalactions   numhints
     ENTERINGGIVEN   0.704     9         1            0
     ENTERINGGIVEN   0.502    10         2            0
     USEDIFFNUM      0.049     6         1            3
     ENTERINGGIVEN   0.967     7         3            0
     REMOVECOEFF     0.792    16         1            1
     REMOVECOEFF     0.792    13         2            0
     USEDIFFNUM      0.073     5         2            0
     …

  • The basic idea of regression is to determine which features, in which combination, can predict the label’s value

  15. Linear Regression • The most classic form of regression is linear regression • Numhints = 0.12*Pknow + 0.932*Time – 0.11*Totalactions
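As a sketch of how coefficients like these might be estimated, the following Python snippet fits ordinary least squares with NumPy on the rows from the regression table above; the data are illustrative, and the fitted coefficients will not reproduce the example equation on the slide.

```python
# Minimal linear-regression sketch (illustrative; rows echo the slide's regression table).
import numpy as np

# Features: pknow, time, totalactions -- label: numhints
X = np.array([
    [0.704,  9, 1],
    [0.502, 10, 2],
    [0.049,  6, 1],
    [0.967,  7, 3],
    [0.792, 16, 1],
    [0.792, 13, 2],
    [0.073,  5, 2],
], dtype=float)
y = np.array([0, 0, 3, 0, 1, 0, 0], dtype=float)

# Add an intercept column and solve the least-squares problem
A = np.column_stack([X, np.ones(len(X))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print("coefficients (pknow, time, totalactions, intercept):", coef)

# Predicted hint count for a hypothetical new action: pknow=0.3, 12 s, 1 action so far
print("prediction:", np.array([0.3, 12, 1, 1]) @ coef)
```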

  16. Many more complex algorithms… • Neural Networks • Support Vector Machines • Surprisingly, Linear Regression performs quite well in many cases despite being overly simple • Particularly when you have a lot of data • Which increasingly is not a problem in EDM…

  17. Relationship Mining • Richard Scheines will now talk about one type of relationship mining, Causal Data Mining

  18. Bayesian Knowledge-Tracing • The algorithm behind the skill bars … • Being improved by Educational Data Mining • Key in many EDM analyses and models

  19. Bayesian Knowledge Tracing • Goal: For each knowledge component (KC), infer the student’s knowledge state from performance. • Suppose a student has six opportunities to apply a KC and makes the following sequence of correct (1) and incorrect (0) responses. Has the student learned the rule? 0 0 1 0 1 1

  20. Model Learning Assumptions • Two-state learning model • Each skill is either learned or unlearned • In problem-solving, the student can learn a skill at each opportunity to apply the skill • A student does not forget a skill, once he or she knows it • Only one skill per action

  21. Model Performance Assumptions • If the student knows a skill, there is still some chance the student will slip and make a mistake. • If the student does not know a skill, there is still some chance the student will guess correctly.

  22. Corbett and Anderson’s Model • [Two-state diagram: “Not learned” and “Learned”; the student starts in “Learned” with probability p(L0), moves from “Not learned” to “Learned” with probability p(T), and responds correctly with probability p(G) when the skill is not learned and 1-p(S) when it is] • Two Learning Parameters • p(L0): Probability the skill is already known before the first opportunity to use the skill in problem solving. • p(T): Probability the skill will be learned at each opportunity to use the skill. • Two Performance Parameters • p(G): Probability the student will guess correctly if the skill is not known. • p(S): Probability the student will slip (make a mistake) if the skill is known.

  23. Bayesian Knowledge Tracing • Whenever the student has an opportunity to use a skill, the probability that the student knows the skill is updated using formulas derived from Bayes’ Theorem.

  24. Formulas
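The equations on this slide are not reproduced in the transcript. For reference, the standard knowledge-tracing update equations from Corbett and Anderson (1995), which this slide presumably presented, are:

```latex
% Posterior probability the skill was known, given the observed response at opportunity n
P(L_n \mid \text{correct})   = \frac{P(L_n)\,(1 - P(S))}{P(L_n)\,(1 - P(S)) + (1 - P(L_n))\,P(G)}
P(L_n \mid \text{incorrect}) = \frac{P(L_n)\,P(S)}{P(L_n)\,P(S) + (1 - P(L_n))\,(1 - P(G))}

% Knowledge estimate carried into the next opportunity (the student may also learn, with probability P(T))
P(L_{n+1}) = P(L_n \mid \text{evidence}) + \bigl(1 - P(L_n \mid \text{evidence})\bigr)\,P(T)

% Predicted probability of a correct response at opportunity n
P(\text{correct}_n) = P(L_n)\,(1 - P(S)) + (1 - P(L_n))\,P(G)
```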

  25. Knowledge Tracing • How do we know if a knowledge tracing model is any good? • Our primary goal is to predict knowledge

  26. Knowledge Tracing • How do we know if a knowledge tracing model is any good? • Our primary goal is to predict knowledge • But knowledge is a latent trait

  27. Knowledge Tracing • How do we know if a knowledge tracing model is any good? • Our primary goal is to predict knowledge • But knowledge is a latent trait • But we can check those knowledge predictions by checking how well the model predicts performance

  28. Fitting a Knowledge-Tracing Model • In principle, any set of four parameters can be used by knowledge-tracing • But parameters that predict student performance better are preferred

  29. Knowledge Tracing • So, we pick the knowledge tracing parameters that best predict performance • Defined as whether a student’s action will be correct or wrong at a given time • Effectively a classifier
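To make the “effectively a classifier” point concrete, here is a short Python sketch that runs the update equations above over the 0 0 1 0 1 1 sequence from slide 19, printing at each opportunity the predicted probability of a correct response (the quantity the parameters are scored on) and the updated knowledge estimate. The parameter values are placeholders chosen for illustration, not fitted estimates.

```python
# Bayesian Knowledge Tracing sketch: update P(Ln) over a response sequence.
# Parameter values below are made-up placeholders, not fitted estimates.
p_L0, p_T, p_G, p_S = 0.25, 0.15, 0.20, 0.10

def trace(responses, p_L=p_L0):
    """Yield (predicted P(correct), updated P(Ln)) for each response (1=correct, 0=incorrect)."""
    for correct in responses:
        p_correct = p_L * (1 - p_S) + (1 - p_L) * p_G           # predicted performance (what we score against)
        if correct:
            p_L_given = p_L * (1 - p_S) / p_correct              # Bayes update after a correct response
        else:
            p_L_given = p_L * p_S / (p_L * p_S + (1 - p_L) * (1 - p_G))  # ...after an incorrect response
        p_L = p_L_given + (1 - p_L_given) * p_T                  # allow learning at this opportunity
        yield p_correct, p_L

for p_correct, p_L in trace([0, 0, 1, 0, 1, 1]):                 # the sequence from slide 19
    print(f"predicted P(correct)={p_correct:.3f}  updated P(Ln)={p_L:.3f}")
```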

  30. Recent Advances • Recently, there has been work towards contextualizing the guess and slip parameters (Baker, Corbett, & Aleven, 2008a, 2008b) • The intuition: Do we really think the chance that an incorrect response was a slip is equal when • Student has never gotten action right; spends 78 seconds thinking; answers; gets it wrong • Student has gotten action right 3 times in a row; spends 1.2 seconds thinking; answers; gets it wrong

  31. Recent Advances • In this work, P(G) and P(S) are determined by a model that looks at time, previous history, the type of action, etc. • Significantly improves predictive power of method • Probability of distinguishing correct from incorrect increases by about 15% of potential gain • To 71%, so still room for improvement
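As a rough illustration of what “P(G) and P(S) are determined by a model” could look like, the sketch below trains a logistic regression on hypothetical per-action features (response time, recent correct streak, whether the action was a help request) to output a contextual slip probability. The features, the labels, and the choice of logistic regression are my assumptions for illustration only; they are not the actual models or feature set of Baker, Corbett, & Aleven (2008).

```python
# Illustrative sketch only: one way to picture a contextual slip model.
# Features, labels, and model choice are assumptions, not the published method.
from sklearn.linear_model import LogisticRegression

# Hypothetical per-action features: seconds spent, previous correct-in-a-row count, 1 if help request
X = [
    [78.0, 0, 0],
    [ 1.2, 3, 0],
    [12.5, 1, 0],
    [ 4.0, 2, 1],
]
# Hypothetical labels: 1 = the incorrect response was judged a slip, 0 = a genuine error
y = [0, 1, 1, 0]

slip_model = LogisticRegression().fit(X, y)
# Contextual P(S) estimate for a fast response on a well-practiced step
print(slip_model.predict_proba([[1.5, 4, 0]])[:, 1])
```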

  32. Uses • Outside of EDM, can be used to drive tutorial decisions • Within educational data mining, there are several things you can do with these models

  33. Uses of Knowledge Tracing • Often key components in models of other constructs • Help-Seeking and Metacognition (Aleven et al, 2004, 2008) • Gaming the System (Baker et al, 2004, in press) • Off-Task Behavior (Baker, 2007)

  34. Uses of Knowledge Tracing • If you want to understand a student’s strategic/meta-cognitive choices, it is helpful to know whether the student knew the skill • Gaming the system means something different if a student already knows the step, versus if the student doesn’t know it • A student who doesn’t know a skill should ask for help; a student who does, shouldn’t

  35. Uses of Knowledge Tracing • Can be interpreted to learn about skills

  36. Skills from the Algebra Tutor

  37. Which skills could probably be removed from the tutor?

  38. Which skills could use better instruction?

  39. END • This last example is a simple example of Discovery with Models • Tomorrow at 9:30am, we’ll discuss some more complex examples
