Determining the Significance of Item Order In Randomized Problem Sets

Zachary A. Pardos, Neil T. Heffernan
Worcester Polytechnic Institute, Department of Computer Science

The Problem
• In which order should we present tutor content to students?
• Many problem sets in intelligent tutoring systems (ITS) present their items in a random order
• Item order is mostly randomized when there is no obvious ordering of items that would benefit learning
• Can we data mine user responses to infer orderings that are reliably more beneficial to learning than others?
Pardos, Z. A., & Heffernan, N. T. (2009, in press). Detecting the Learning Value of Items in a Randomized Problem Set. In Proceedings of the 14th International Conference on Artificial Intelligence in Education, Brighton, UK. IOS Press.

Solution Approach
• Possible approach: evaluate each full sequence for its learning value (e.g., seq1: probability of learning 0.13; seq2: probability of learning 0.19)
• In this paper we evaluate the learning rates of ordered item pairs: should Q1 go before Q2, or should Q2 go before Q1? (e.g., item pair (1,2): probability of learning 0.09; item pair (2,1): probability of learning 0.14)
• If multiple reliable orderings are found, a full sequence could be determined to be best for learning
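The pairwise idea above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' method: it assumes an ordered pair (a, b) means item a is asked immediately before item b, reuses the slide's 0.09/0.14 values for (1,2)/(2,1), and uses made-up rates for the other pairs. The helper names `better_ordering` and `best_sequence`, and the rule of scoring a full sequence by summing its adjacent pair rates, are hypothetical.

```python
from itertools import permutations

# Hypothetical learning rates for ordered item pairs: (a, b) means item a is asked
# immediately before item b. (1, 2) and (2, 1) reuse the slide's example values;
# the other four values are made up for illustration.
PAIR_RATES = {
    (1, 2): 0.09, (2, 1): 0.14,
    (1, 3): 0.15, (3, 1): 0.17,
    (2, 3): 0.11, (3, 2): 0.10,
}


def better_ordering(a, b, rates):
    """Return whichever ordering of items a and b has the higher learning rate (None on a tie)."""
    if rates[(a, b)] > rates[(b, a)]:
        return (a, b)
    if rates[(b, a)] > rates[(a, b)]:
        return (b, a)
    return None


def best_sequence(items, rates):
    """Hypothetical scoring rule: rank each full order by the summed rates of its adjacent pairs."""
    def score(order):
        return sum(rates[(x, y)] for x, y in zip(order, order[1:]))
    return max(permutations(items), key=score)


print(better_ordering(1, 2, PAIR_RATES))      # (2, 1)
print(best_sequence((1, 2, 3), PAIR_RATES))   # (2, 1, 3)
```

In practice only orderings that pass the reliability test would be worth chaining; the brute-force search over all n! orders is only reasonable for the small problem sets considered here.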

Solution Application Example
[Figure: ordered item pairs compared by learning rate (0.14 > 0.09; 0.15 < 0.17; a further ordering at 0.17)]

Model
• Modeling or measuring learning requires modeling knowledge
• Knowledge Tracing is used to model learning
• Model parameters can be learned with the EM algorithm
[Figure: Knowledge Tracing network. Latent nodes S represent dichotomous skill knowledge, linked by the probability of learning P(Skill: 0 → 1). Observable nodes are the question answers (incorrect, correct, correct), with guess = P(correct | Skill = 0) and slip = P(incorrect | Skill = 1).]
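As a rough sketch of what the Knowledge Tracing network computes, the forward pass below tracks P(Skill = 1) across one student's answers using the prior, learn, guess, and slip parameters from the figure. It assumes no forgetting, and the parameter values in the example call are hypothetical; fitting the parameters with EM, as the slide describes, is not shown. The function name `knowledge_tracing` is illustrative.

```python
import math


def knowledge_tracing(responses, prior, learn, guess, slip):
    """Standard Knowledge Tracing forward pass over one student's 0/1 responses.

    Returns (log_likelihood, predictions), where predictions[t] is P(correct) at step t
    given the answers before t. Forgetting is assumed not to occur (P(Skill: 1 -> 0) = 0).
    """
    p_known = prior  # P(Skill = 1) before the first question
    log_lik = 0.0
    predictions = []
    for correct in responses:
        # Observation model: slip = P(incorrect | Skill = 1), guess = P(correct | Skill = 0)
        p_correct = p_known * (1 - slip) + (1 - p_known) * guess
        predictions.append(p_correct)
        if correct:
            log_lik += math.log(p_correct)
            p_known = p_known * (1 - slip) / p_correct   # Bayes update on a correct answer
        else:
            log_lik += math.log(1 - p_correct)
            p_known = p_known * slip / (1 - p_correct)   # Bayes update on an incorrect answer
        # Learning transition P(Skill: 0 -> 1) before the next question
        p_known = p_known + (1 - p_known) * learn
    return log_lik, predictions


# Hypothetical parameters; the answer pattern matches the figure (incorrect, correct, correct).
ll, preds = knowledge_tracing([0, 1, 1], prior=0.3, learn=0.15, guess=0.2, slip=0.1)
print(round(ll, 4), [round(p, 3) for p in preds])
```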

Model
• The six sequence permutations of a three-question problem set are modeled with shared Bayesian parameters, also known as equivalence classes of CPTs (conditional probability tables)
• Novel contribution of the paper: harnessing randomization to help estimate accurate parameters using all of the response data
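One way to picture the parameter sharing is a sketch in which a single dictionary of learning rates, keyed by the ordered pair (previous question, current question), and per-question guess/slip dictionaries are reused by every permutation. This is an assumption-laden illustration, not the authors' Bayes net implementation: the pair indexing of learning rates and the function names (`pair_usage`, `order_model_log_likelihood`, `dataset_log_likelihood`) are hypothetical. It also shows why randomization helps: across the six orders of three questions, every ordered pair occurs as a consecutive transition, so all students' data inform the shared parameters.

```python
import math
from collections import Counter
from itertools import permutations


def pair_usage(items):
    """Count, over every presentation order, how often each ordered pair occurs as consecutive questions."""
    counts = Counter()
    for order in permutations(items):
        counts.update(zip(order, order[1:]))
    return counts


def order_model_log_likelihood(order, responses, prior, pair_learn, guess, slip):
    """Log-likelihood of one student's answers when the learning rate is tied to the ordered pair
    (previous question, current question) and guess/slip are tied to the question.
    The same parameter dictionaries are shared by all orderings."""
    p_known = prior
    log_lik = 0.0
    prev = None
    for q, correct in zip(order, responses):
        if prev is not None:
            # Learning opportunity between prev and q, with a rate specific to that ordered pair
            p_known += (1 - p_known) * pair_learn[(prev, q)]
        p_correct = p_known * (1 - slip[q]) + (1 - p_known) * guess[q]
        if correct:
            log_lik += math.log(p_correct)
            p_known = p_known * (1 - slip[q]) / p_correct
        else:
            log_lik += math.log(1 - p_correct)
            p_known = p_known * slip[q] / (1 - p_correct)
        prev = q
    return log_lik


def dataset_log_likelihood(data, prior, pair_learn, guess, slip):
    """Sum over students; data is a list of (order, responses), each student with any ordering."""
    return sum(order_model_log_likelihood(o, r, prior, pair_learn, guess, slip) for o, r in data)


print(pair_usage((1, 2, 3)))  # each of the 6 ordered pairs appears in 2 of the 6 orderings
```

An EM or other optimizer would then adjust the shared dictionaries to maximize `dataset_log_likelihood` over all students at once.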

Reliability measure
• The data for a problem set were randomly split by student into 10 equal-size bins
• Each bin was evaluated separately by the model
• A binomial test was used to compute the probability of the observed result under the null hypothesis that each ordering is equally likely to have the highest learning rate, e.g., binopdf(best_choice_mode, 20, 0.25)
[Figure: ordered pair learning rates]
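The binning and binomial calculation can be sketched in Python as below. `binom_pmf` computes the same quantity as MATLAB's binopdf(k, n, p) used on the slide; `binom_upper_tail` gives the one-sided P(X ≥ k), the more conventional p-value, though which form the authors applied beyond the slide's binopdf call is not stated. The helper names, the seeded shuffle, and the illustrative null probability of 0.5 (one ordering of a pair versus its reverse) are assumptions.

```python
import math
import random


def split_into_bins(student_ids, n_bins=10, seed=0):
    """Randomly assign students to n_bins bins whose sizes differ by at most one."""
    ids = list(student_ids)
    random.Random(seed).shuffle(ids)
    return [ids[i::n_bins] for i in range(n_bins)]


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p); the quantity MATLAB's binopdf(k, n, p) returns."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_upper_tail(k, n, p):
    """One-sided binomial test p-value: P(X >= k) under the null success probability p."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))


# Hypothetical case: one ordering of a pair beats the reverse ordering in all 10 bins,
# with a null probability of 0.5 per bin.
print(binom_upper_tail(10, 10, 0.5))  # 0.0009765625 (= 1/1024)
```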

Dataset
• Student responses (correct/incorrect) to the main problems of 5 problem sets of 3 questions each
• Questions within a problem set relate to the same skill
• Between 295 and 674 students completed each problem set (2006-2007 school year data)
• Questions in the problem sets were presented in a randomized order (required for this analysis)
[Figure: example item showing the main problem and a hint]

Confound
• Since only main-problem responses are analyzed, learning from the main question is confounded with learning from the problem's scaffolding and hints
• In an item pair, learning could be attributed to:
  • The immediate feedback on question 1's main problem
  • The scaffolding of question 1
  • Applying concepts from question 1 to question 2's main problem
[Figure: example item showing the main problem and a hint]

Results
• Of the 5 problem sets evaluated, two returned statistically reliable orderings
• Other item relationships could be tested
• In Problem Set 36, (2,1) > (3,1) in 10 out of 10 bins

Results
[Table: guess and slip values per question]
• Values are within a reasonable range (< 0.50)
• The same problem sets were run with the AIED model and the sequence model
• The same guess and slip values were returned
• This indicates high stability in parameter estimation across methods

Simulation Validation
• Since the ground truth of learning rates in the real world is impossible to know, a simulation study was run
• The simulation set a variety of values for the prior, guess/slip, and learning rate parameters and then simulated user responses
• These responses could then be analyzed by the method, using the same technique as was used on the real data
• 160 simulations were run using different combinations of parameters
• Parameters for the simulation were drawn from a distribution fit to a previous year's analysis of ASSISTment data
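A simulation of this kind might look like the sketch below: each simulated student receives a randomized question order, a latent skill state is sampled from the prior, learning happens between consecutive questions at a pair-specific rate, and answers are sampled from guess/slip. The pair-specific learning rates, function names, and all parameter values are hypothetical; the study drew its parameters from a distribution fit to earlier ASSISTment data, which is not reproduced here.

```python
import random


def simulate_student(order, prior, pair_learn, guess, slip, rng):
    """Sample one student's 0/1 answers from a Knowledge Tracing model with pair-specific learning rates."""
    known = rng.random() < prior  # latent skill state before the first question
    responses = []
    prev = None
    for q in order:
        if prev is not None and not known:
            known = rng.random() < pair_learn[(prev, q)]  # chance to learn between prev and q
        p_correct = (1 - slip[q]) if known else guess[q]
        responses.append(1 if rng.random() < p_correct else 0)
        prev = q
    return responses


def simulate_dataset(n_students, items, prior, pair_learn, guess, slip, seed=0):
    """Give each simulated student a randomized item order, as in the real problem sets."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_students):
        order = list(items)
        rng.shuffle(order)
        data.append((tuple(order), simulate_student(order, prior, pair_learn, guess, slip, rng)))
    return data


# Hypothetical parameters with one planted "better" ordering, (2, 1), for the method to recover.
items = (1, 2, 3)
pair_learn = {(a, b): 0.1 for a in items for b in items if a != b}
pair_learn[(2, 1)] = 0.3
data = simulate_dataset(500, items, 0.4, pair_learn, {q: 0.2 for q in items}, {q: 0.1 for q in items})
```

Because the planted rates are known, counting how often the analysis reports an ordering that was not planted gives a direct estimate of the false positive rate.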

Simulation Results
• More data leads to more reliable rules being found
• The false positive rate remains low, independent of the number of users
• The average false positive rate is 6.3%, close to the 5% p-value cutoff of our reliability estimator
• The simulation suggests that the results are trustworthy

Limitations
• Only problem sets of five or fewer questions can reasonably be evaluated
• Larger problem sets become intractable to compute because the number of nodes and permutations grows factorially with the question count
  • For a four-question set: (4+4) × 24 = 192 nodes
  • For a five-question set: (5+5) × 120 = 1,200 nodes
• A possible optimization is to model only the sequences for which there is data
• Question order must be randomized to control for factors including problem difficulty and to allow the learning rates of all item pairs in the problem set to be detected
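The node counts above follow from n latent plus n observed nodes per sequence, times n! sequences. The short Python check below reproduces them; `node_count_observed` is a hypothetical illustration of the suggested optimization, counting nodes only for the distinct orders that actually appear in the data.

```python
from math import factorial


def node_count(n_questions):
    """Nodes in the full model: (n latent + n observed) nodes per sequence, times n! sequences."""
    return (n_questions + n_questions) * factorial(n_questions)


def node_count_observed(orders):
    """Hypothetical optimization: build only the sequences that actually occur in the data."""
    distinct = set(map(tuple, orders))
    return sum(2 * len(order) for order in distinct)


for n in range(3, 7):
    print(n, node_count(n))  # 3 36, 4 192, 5 1200, 6 8640
```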

Conclusions & Future Work
• We think that this method, and ones built on it, will facilitate better tutoring systems
• Randomization gives many of the properties of a randomized controlled experiment (RCE); this method can perform a similar function, but in the form of data mining
• Best orderings might exist for a variety of reasons. Applying this method to investigate those reasons could inform content authors and scientists about best practices, much as randomized controlled experiments do, but through the far more economical means of data mining
