Classification: Hands-on Activity
Hands-on activity • Predicting gaming the system in RapidMiner
Open RapidMiner • And open classifier.xml • We currently have the model set up to build a decision tree with action-level cross-validation • Click the Run button (blue triangle)
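If you would rather follow along outside RapidMiner, here is a minimal sketch of the same setup in Python/scikit-learn: a decision tree (a rough stand-in for W-J48) evaluated with plain 10-fold cross-validation, which splits at the action (row) level. This is not the actual classifier.xml process; the file name is taken from later in this activity, and the label column name "GAMING" is an assumption.

    # Minimal sketch, not the actual classifier.xml process.
    # Assumptions: the data file mentioned later in this activity, and a label column named "GAMING".
    import pandas as pd
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.model_selection import cross_val_score

    data = pd.read_csv("WEKA-CTA1Z04-fordev.csv")
    y = data["GAMING"]                                  # assumed label column ("G" = gaming, "N" = not)
    X = pd.get_dummies(data.drop(columns=["GAMING"]))   # one-hot encode nominal features

    tree = DecisionTreeClassifier()                     # rough stand-in for W-J48
    # Plain 10-fold cross-validation splits at the action (row) level,
    # so the same student's actions can land in both training and test folds.
    print(cross_val_score(tree, X, y, cv=10).mean())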
Check model goodness • Go to AUC (AUC is another way of referring to A’) • Pretty good, eh? • What’s the problem?
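A' / AUC is the probability that the model ranks a randomly chosen gaming action above a randomly chosen non-gaming action (0.5 = chance, 1.0 = perfect). Continuing the sketch above, one hedged way to compute it in scikit-learn:

    # Continues the sketch above (reuses tree, X, y); an illustration, not the RapidMiner output.
    from sklearn.model_selection import cross_val_predict
    from sklearn.metrics import roc_auc_score

    y_bin = (y == "G").astype(int)            # 1 = gaming, 0 = not gaming
    # predicted probability of gaming for each action, from 10-fold cross-validation
    probs = cross_val_predict(tree, X, y_bin, cv=10, method="predict_proba")[:, 1]
    print(roc_auc_score(y_bin, probs))        # AUC, i.e. A'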
Check model goodness • This model uses data from the same student both to train and test • And more seriously…
Re-run • Move W-J48 above XValidation • Disable XValidation: right-click XValidation and click Enable Operator to toggle it off • Run the model • Click on text view • What do you see?
Check model goodness • W-J48 • J48 pruned tree:
------------------
pknow <= 0.071168
|   student = N46z59pQP58: N (10.0)
|   student = N5LMy832c47: N (5.0)
|   student = N668lBbaKFE: N (16.0)
|   student = N6O31vedZbI: N (1.0)
|   student = tB4vqSxzqo: G (10.0)
|   student = N3hSu07XfGd: N (6.0)
|   student = N6bJ4auIa8L: N (10.0)
…
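The scikit-learn analogue of this text view, continuing the earlier sketch, is to fit the tree on all of the data and print its rules; with the one-hot student columns still in X, the same kind of per-student splits show up (the exact splits are whatever your fit produces, not the J48 output above):

    # Continues the earlier sketch (reuses tree, X, y); an analogue of RapidMiner's text view.
    from sklearn.tree import export_text

    tree.fit(X, y)                                           # no cross-validation, just one fit
    print(export_text(tree, feature_names=list(X.columns)))  # one-hot "student" columns appear as splits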
What’s wrong with this? • It’s fitting to the student! • Definitely not a model that could be used with new students!
Which features should we remove • For a model that has some hope of being generalizable • (Open the data set in Excel to take a look) • WEKA-CTA1Z04-fordev.csv on your USB flash drive
Which features should we remove • For a model that has some hope of being generalizable • Num • Student • Lesson • Lspair • Skill • Cell • Leave group in, it’s a special case
So… • Let’s quickly go into excel, do that, and re-save (with a new file name) • Now go back to RapidMiner • Change the file name in CSVExampleSource to your new file name • Run again
So… • Anything wrong here?
So… • Yup, no model. • So everything we had was over-fitting
Let’s take • A more extensive data set • Specifically, one with additional distilled features • WEKA-CTA1Z04-allfeatures.csv
What to do • Remove the same over-fitting features as before • Change the file name in CSVExampleSource to your new file name • Run! • What A’ do you get?
But wait… • We are still using data from the same student in both the training set and the test set • Replace XValidation with BatchXValidation • Right-click on XValidation, click Replace Operator, go to Validation, go to Other • If you look at ChangeAttributeRole, you'll see that we are using "group" to set up the cross-validation level (each group is a pre-chosen set of students, with the groups approximately equal in number of students and number of actions) • Run! • What A' do you get?
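The idea behind BatchXValidation is student-level (group-level) cross-validation: no student's actions appear in both a training fold and its test fold. A hedged scikit-learn sketch of the same idea, assuming the over-fitting columns have already been removed and again assuming a label column named "GAMING":

    # Group-level cross-validation sketch, analogous in spirit to BatchXValidation.
    import pandas as pd
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.model_selection import GroupKFold, cross_val_score

    data = pd.read_csv("WEKA-CTA1Z04-allfeatures.csv")
    groups = data["group"]                                  # pre-chosen student groups, as above
    y = (data["GAMING"] == "G").astype(int)                 # assumed label column
    X = pd.get_dummies(data.drop(columns=["group", "GAMING"]))

    tree = DecisionTreeClassifier()
    # n_splits must not exceed the number of distinct groups
    scores = cross_val_score(tree, X, y, groups=groups,
                             cv=GroupKFold(n_splits=10), scoring="roc_auc")
    print(scores.mean())                                    # cross-validated A'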
A’ • What A’ do you get? • For me, the incorrect cross-validation did not actually make a difference… this time. • Did it make a difference for any of you? • (It does make a difference sometimes!)
More things worth trying • Adding interaction features • F1 * F2… • You just need to enable a disabled operator…
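In the same hedged scikit-learn sketch, pairwise interaction features (every F1 * F2 product) can be generated like this; with many features the number of columns grows quadratically, which is why the next slide's question matters.

    # Adding F1 * F2 interaction terms (keeps the original features too).
    import pandas as pd
    from sklearn.preprocessing import PolynomialFeatures

    poly = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
    X_int = pd.DataFrame(poly.fit_transform(X),                 # reuses X from the sketch above
                         columns=poly.get_feature_names_out(X.columns),
                         index=X.index)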
That takes a while! • Can you filter out the interaction features that are closely correlated with each other?
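One simple (and still hedged) way to do that filtering in the Python sketch: drop any column whose absolute correlation with an earlier column exceeds a cutoff (0.95 here, an arbitrary choice). Computing the full correlation matrix is itself slow when there are thousands of interaction columns.

    # Drop interaction features that are nearly redundant with an earlier column.
    import numpy as np

    corr = X_int.corr().abs()                               # reuses X_int from the sketch above
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))   # upper triangle only
    to_drop = [col for col in upper.columns if (upper[col] > 0.95).any()]
    X_filtered = X_int.drop(columns=to_drop)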
Trying out other algorithms • What if you want to use step regression instead of J48? • Or logistic regression? • Or decision stumps? • Can you replace the W-J48 operator with these?
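In the scikit-learn sketch, swapping algorithms is just swapping the estimator. Scikit-learn has no direct step-regression operator, but logistic regression and a depth-1 tree (a decision stump) are shown below as rough analogues of the other two; none of these are the exact W-* operators from RapidMiner.

    # Try other learners with the same group-level cross-validation as above
    # (reuses X, y, groups from the earlier sketch).
    from sklearn.linear_model import LogisticRegression
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.model_selection import GroupKFold, cross_val_score

    learners = {
        "logistic regression": LogisticRegression(max_iter=1000),
        "decision stump": DecisionTreeClassifier(max_depth=1),   # a one-split tree
    }
    for name, learner in learners.items():
        scores = cross_val_score(learner, X, y, groups=groups,
                                 cv=GroupKFold(n_splits=10), scoring="roc_auc")
        print(name, scores.mean())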