Educational Data Mining

Presentation Transcript


  1. Educational Data Mining March 3, 2010

  2. Today’s Class • EDM • Assignment #5 • Mega-Survey

  3. Educational Data Mining • “Educational Data Mining is an emerging discipline, concerned with developing methods for exploring the unique types of data that come from educational settings, and using those methods to better understand students, and the settings which they learn in.” • www.educationaldatamining.org

  4. Classes of EDM Method (Romero & Ventura, 2007) • Information Visualization • Web mining • Clustering, Classification, Outlier Detection • Association Rule Mining/Sequential Pattern Mining • Text Mining

  5. Classes of EDM Method (Baker & Yacef, 2009) • Prediction • Clustering • Relationship Mining • Discovery with Models • Distillation of Data For Human Judgment

  6. Prediction • Develop a model which can infer a single aspect of the data (predicted variable) from some combination of other aspects of the data (predictor variables) • Which students are using CVS? • Which students will fail the class?

  7. Clustering • Find points that naturally group together, splitting full data set into set of clusters • Usually used when nothing is known about the structure of the data • What behaviors are prominent in domain? • What are the main groups of students?

  8. Relationship Mining • Discover relationships between variables in a data set with many variables • Association rule mining • Correlation mining • Sequential pattern mining • Causal data mining • Beck & Mostow (2008) article is a great example of this

  9. Discovery with Models • Pre-existing model (developed with EDM prediction methods… or clustering… or knowledge engineering) • Applied to data and used as a component in another analysis

  10. Distillation of Data for Human Judgment • Making complex data understandable by humans to leverage their judgment • Text replays are a simple example of this

  11. Focus of today’s class • Prediction • Clustering • Relationship Mining • Discovery with Models • Distillation of Data For Human Judgment • There will be a term-long class on this, taught by Joe Beck, in coordination with Carolina Ruiz’s Data Mining class, in a future year • Strongly recommended

  12. Prediction • Pretty much what it says • A student is using a tutor right now. Is he gaming the system or not? • A student has used the tutor for the last half hour. How likely is it that she knows the knowledge component in the next step? • A student has completed three years of high school. What will be her score on the SAT-Math exam?

  13. Two Key Types of Prediction • This slide adapted from a slide by Andrew W. Moore, Google: http://www.cs.cmu.edu/~awm/tutorials

  14. Classification • General Idea • Canonical Methods • Assessment • Ways to do assessment wrong

  15. Classification • There is something you want to predict (“the label”) • The thing you want to predict is categorical • The answer is one of a set of categories, not a number • CORRECT/WRONG (sometimes expressed as 0,1) • HELP REQUEST/WORKED EXAMPLE REQUEST/ATTEMPT TO SOLVE • WILL DROP OUT/WON’T DROP OUT • WILL SELECT PROBLEM A,B,C,D,E,F, or G

  16. Classification • Associated with each label is a set of “features”, which you may be able to use to predict the label:

  Skill           pknow   time   totalactions   right
  ENTERINGGIVEN   0.704   9      1              WRONG
  ENTERINGGIVEN   0.502   10     2              RIGHT
  USEDIFFNUM      0.049   6      1              WRONG
  ENTERINGGIVEN   0.967   7      3              RIGHT
  REMOVECOEFF     0.792   16     1              WRONG
  REMOVECOEFF     0.792   13     2              RIGHT
  USEDIFFNUM      0.073   5      2              RIGHT
  …

  17. Classification • The basic idea of a classifier is to determine which features, in which combination, can predict the label:

  Skill           pknow   time   totalactions   right
  ENTERINGGIVEN   0.704   9      1              WRONG
  ENTERINGGIVEN   0.502   10     2              RIGHT
  USEDIFFNUM      0.049   6      1              WRONG
  ENTERINGGIVEN   0.967   7      3              RIGHT
  REMOVECOEFF     0.792   16     1              WRONG
  REMOVECOEFF     0.792   13     2              RIGHT
  USEDIFFNUM      0.073   5      2              RIGHT
  …
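  To make the idea concrete, here is a minimal sketch (not the original analysis from the lecture) of fitting a classifier to a feature table like the one above, using pandas and scikit-learn; the file name actions.csv, the choice of a random forest, and the pipeline details are all assumptions made for illustration.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline

# Hypothetical file with one row per student action:
# columns skill, pknow, time, totalactions, right
data = pd.read_csv("actions.csv")
X = data[["skill", "pknow", "time", "totalactions"]]
y = data["right"]                       # the label: RIGHT / WRONG

# skill is categorical, so one-hot encode it; the numeric features pass through
model = Pipeline([
    ("encode", ColumnTransformer(
        [("skill", OneHotEncoder(handle_unknown="ignore"), ["skill"])],
        remainder="passthrough")),
    ("classify", RandomForestClassifier(random_state=0)),
])
model.fit(X, y)
```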

  18. Classification • Of course, usually there are more than 4 features • And more than 7 actions/data points • I’ve recently done analyses with 800,000 student actions, and 26 features

  19. Classification • Of course, usually there are more than 4 features • And more than 7 actions/data points • I’ve recently done analyses with 800,000 student actions, and 26 features • 5 years ago that would’ve been a lot of data • These days, in the EDM world, it’s just a medium-sized data set

  20. Classification • One way to classify is with a Decision Tree (like J48):

  PKNOW
    <0.5  -> TIME
               <6 s.  -> RIGHT
               >=6 s. -> WRONG
    >=0.5 -> TOTALACTIONS
               <4  -> RIGHT
               >=4 -> WRONG

  21. Classification • One way to classify is with a Decision Tree (like J48):

  PKNOW
    <0.5  -> TIME
               <6 s.  -> RIGHT
               >=6 s. -> WRONG
    >=0.5 -> TOTALACTIONS
               <4  -> RIGHT
               >=4 -> WRONG

  New action to classify:

  Skill          pknow   time   totalactions   right
  COMPUTESLOPE   0.544   9      1              ?
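  As a sketch of the same idea in code: scikit-learn's DecisionTreeClassifier (which implements CART rather than J48/C4.5, so it is a stand-in for the algorithm named on the slide) trained on the seven example actions from the earlier table, with the categorical skill column dropped for simplicity.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# pknow, time, totalactions for the seven actions in the earlier table
X = [[0.704, 9, 1], [0.502, 10, 2], [0.049, 6, 1], [0.967, 7, 3],
     [0.792, 16, 1], [0.792, 13, 2], [0.073, 5, 2]]
y = ["WRONG", "RIGHT", "WRONG", "RIGHT", "WRONG", "RIGHT", "RIGHT"]

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(tree, feature_names=["pknow", "time", "totalactions"]))

# Classify the new action from the slide: COMPUTESLOPE, pknow=0.544, time=9, totalactions=1
print(tree.predict([[0.544, 9, 1]]))
```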

  22. Classification • Another way to classify is with step regression • Linear regression (discussed later), with a cut-off
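  A minimal sketch of the "linear regression with a cut-off" idea. Full step regression also does stepwise feature selection, which this sketch skips, and the 0.5 threshold is an illustrative choice rather than something prescribed by the slide.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Same seven example actions; WRONG = 0, RIGHT = 1
X = np.array([[0.704, 9, 1], [0.502, 10, 2], [0.049, 6, 1], [0.967, 7, 3],
              [0.792, 16, 1], [0.792, 13, 2], [0.073, 5, 2]])
y = np.array([0, 1, 0, 1, 0, 1, 1])

reg = LinearRegression().fit(X, y)
predicted_labels = (reg.predict(X) >= 0.5).astype(int)   # the cut-off turns regression into classification
print(predicted_labels)
```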

  23. And of course… • There are lots of other classification algorithms you can use... • SMO (support vector machine) • In your favorite Machine Learning package

  24. And of course… • There are lots of other classification algorithms you can use... • SMO (support vector machine) • In your favorite Machine Learning package • WEKA

  25. And of course… • There are lots of other classification algorithms you can use... • SMO (support vector machine) • In your favorite Machine Learning package • WEKA • RapidMiner

  26. And of course… • There are lots of other classification algorithms you can use... • SMO (support vector machine) • In your favorite Machine Learning package • WEKA • RapidMiner • KEEL

  27. And of course… • There are lots of other classification algorithms you can use... • SMO (support vector machine) • In your favorite Machine Learning package • WEKA • RapidMiner • KEEL • RapidMiner

  28. And of course… • There are lots of other classification algorithms you can use... • SMO (support vector machine) • In your favorite Machine Learning package • WEKA • RapidMiner • KEEL • RapidMiner • RapidMiner

  29. And of course… • There are lots of other classification algorithms you can use... • SMO (support vector machine) • In your favorite Machine Learning package • WEKA • RapidMiner • KEEL • RapidMiner • RapidMiner • RapidMiner

  30. Comments? Questions?

  31. How can you tell if a classifier is any good?

  32. How can you tell if a classifier is any good? • What about accuracy? • accuracy = (# correct classifications) / (total number of classifications) • 9200 actions were classified correctly, out of 10000 actions = 92% accuracy, and we declare victory.

  33. What are some limitations of accuracy?

  34. Biased training set • What if the underlying distribution that you were trying to predict was: • 9200 correct actions, 800 wrong actions • And your model predicts that every action is correct • Your model will have an accuracy of 92% • Is the model actually any good?
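  A sketch of the problem described on this slide, using made-up data with the same 9200/800 split: a detector that labels every action as correct reaches 92% accuracy while learning nothing.

```python
import numpy as np
from sklearn.metrics import accuracy_score

truth = np.array([1] * 9200 + [0] * 800)   # 9200 correct actions, 800 wrong actions
predictions = np.ones_like(truth)          # the model predicts that every action is correct

print(accuracy_score(truth, predictions))  # 0.92 -- yet the model is useless
```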

  35. What are some alternate metrics you could use?

  36. What are some alternate metrics you could use? • Kappa = (Accuracy – Expected Accuracy) / (1 – Expected Accuracy)
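  Tying the formula to the 92%-accuracy example above, as a sketch: expected accuracy is the agreement you would get by chance, given how often the model and the data each say RIGHT (100% vs. 92%) and WRONG (0% vs. 8%).

```python
accuracy = 0.92
expected_accuracy = (1.00 * 0.92) + (0.00 * 0.08)                 # chance agreement = 0.92
kappa = (accuracy - expected_accuracy) / (1 - expected_accuracy)
print(kappa)   # 0.0 -- the "92% accurate" majority-class model is no better than chance
```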

  37. What are some alternate metrics you could use? • A’ • The probability that if the model is given an example from each category, it will accurately identify which is which
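  In the two-category case, A' as described here corresponds to the area under the ROC curve (the Wilcoxon statistic), so a sketch can lean on scikit-learn's roc_auc_score; the labels and confidences below are made-up illustration values.

```python
from sklearn.metrics import roc_auc_score

truth = [1, 0, 1, 1, 0, 0, 1]                       # actual categories
confidence = [0.9, 0.3, 0.6, 0.8, 0.4, 0.7, 0.5]    # model's confidence that each case is a 1

# Probability that a randomly chosen positive case gets a higher confidence
# than a randomly chosen negative case
print(roc_auc_score(truth, confidence))
```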

  38. Comparison • Kappa • easier to compute • works for an unlimited number of categories • wacky behavior when things are worse than chance • difficult to compare two kappas in different data sets (K=0.6 is not always better than K=0.5)

  39. Comparison • A’ • more difficult to compute • only works for two categories (without complicated extensions) • meaning is invariant across data sets (A’=0.6 is always better than A’=0.55) • very easy to interpret statistically

  40. Comments? Questions?

  41. What data set should you generally test on? • A vote… • Raise your hands as many times as you like

  42. What data set should you generally test on? • The data set you trained your classifier on • A data set from a different tutor • Split your data set in half (by students), train on one half, test on the other half • Split your data set in ten (by actions). Train on each set of 9 sets, test on the tenth. Do this ten times. • Votes?

  43. What data set should you generally test on? • The data set you trained your classifier on • A data set from a different tutor • Split your data set in half (by students), train on one half, test on the other half • Split your data set in ten (by actions). Train on each set of 9 sets, test on the tenth. Do this ten times. • What are the benefits and drawbacks of each?

  44. The dangerous one (though still sometimes OK) • The data set you trained your classifier on • If you do this, there is serious danger of over-fitting

  45. The dangerous one (though still sometimes OK) • You have ten thousand data points. • You fit a parameter for each data point. • “If data point 1, RIGHT. If data point 78, WRONG…” • Your accuracy is 100% • Your kappa is 1 • Your model will neither work on new data, nor will it tell you anything.
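  A toy sketch of this failure mode (not from the lecture): a tree deep enough to memorize every data point scores perfectly on the data it was trained on and at chance on anything new.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = np.arange(10000).reshape(-1, 1)         # "data point 1, data point 2, ..."
y = rng.integers(0, 2, size=10000)          # labels with no real structure

memorizer = DecisionTreeClassifier().fit(X, y)      # unbounded depth: effectively one rule per point
print(accuracy_score(y, memorizer.predict(X)))      # 1.0 on the data it trained on

X_new = np.arange(10000, 20000).reshape(-1, 1)      # new data points it has never seen
y_new = rng.integers(0, 2, size=10000)
print(accuracy_score(y_new, memorizer.predict(X_new)))   # ~0.5 -- no better than guessing
```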

  46. The dangerous one (though still sometimes OK) • The data set you trained your classifier on • When might this one still be OK?

  47. K-fold cross validation (standard) • Split your data set in ten (by action). Train on each set of 9 sets, test on the tenth. Do this ten times. • What can you infer from this?

  48. K-fold cross validation (standard) • Split your data set in ten (by action). Train on each set of 9 sets, test on the tenth. Do this ten times. • What can you infer from this? • Your detector will work with new data from the same students
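  A sketch of the standard scheme with scikit-learn: 10-fold cross-validation at the action level, so actions from the same student can land in both the training and test folds. The features, labels, and choice of classifier are placeholders.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.random((1000, 3))                  # placeholder features (e.g. pknow, time, totalactions)
y = rng.integers(0, 2, size=1000)          # placeholder RIGHT/WRONG labels

scores = cross_val_score(DecisionTreeClassifier(), X, y, cv=10, scoring="accuracy")
print(scores.mean())
```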

  49. K-fold cross validation (student-level) • Split your data set in half (by student), train on one half, test on the other half • What can you infer from this?

  50. K-fold cross validation (student-level) • Split your data set in half (by student), train on one half, test on the other half • What can you infer from this? • Your detector will work with data from new students from the same population (whatever it was)
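  The student-level version, sketched with scikit-learn's GroupKFold, which keeps every action from a given student in the same fold so the test students are genuinely new to the model; the student IDs here are placeholders.

```python
import numpy as np
from sklearn.model_selection import cross_val_score, GroupKFold
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.random((1000, 3))                  # placeholder features
y = rng.integers(0, 2, size=1000)          # placeholder RIGHT/WRONG labels
students = rng.integers(0, 50, size=1000)  # placeholder student ID for each action

cv = GroupKFold(n_splits=10)
scores = cross_val_score(DecisionTreeClassifier(), X, y, groups=students, cv=cv)
print(scores.mean())
```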
