Classical and Bayesian Computerized Adaptive Testing Algorithms • Richard J. Swartz, Department of Biostatistics (rswartz@mdanderson.org)
Outline • Principle of computerized adaptive testing • Basic statistical concepts and notation • Trait estimation methods • Item selection methods • Comparisons between methods • Current CAT research topics
Computerized Adaptive Tests (CAT) • First developed for assessment testing • Test tailored to an individual • Only questions relevant to individual trait level • Shorter tests • Sequential adaptive selection problem • Requires item bank • Fit with IRT models • Extensive initial development before CAT implementation
Item Bank Development I • Qualitative item development • Content experts • Response categories • Test model fit • Likelihood ratio based methods • Model fit indices
Item Bank Development II • Test assumption: Unidimensionality • Factor analysis • Confirmatory factor analysis • Multidimensional IRT models • Test assumption: Local independence • Residual correlation after 1st factor removed (checks for local dependence) • Multidimensional IRT models
Item Bank Development III • Test assumption: Invariance • DIF = differential item functioning • Over time and across groups (e.g. men vs. women) • Many different methods (logistic regression method, area between response curves, and others)
CAT Implementation • [Figure: numbered items from the item bank, with response options a, b, c, placed along a scale running from Lo Depression to Hi Depression]
Estimating Traits • Assumes item parameters are known • The trait represents the individual’s ability • Done sequentially in CAT • Estimate is updated after each additional response • Maximum Likelihood Estimator • Bayesian Estimators
Likelihood Model describing a person’s response pattern:
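The equation from this slide did not survive extraction. A common way to write the likelihood, assuming local independence and using notation that is an assumption here rather than the slide's own (θ for the trait, u_i for the response to item i, n for the number of items answered so far):

```latex
L(\theta \mid u_1, \dots, u_n) \;=\; \prod_{i=1}^{n} P(U_i = u_i \mid \theta)
```

Each factor is the item response model's probability of the observed response at trait level θ, with item parameters treated as known.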
Maximum Likelihood Estimate • Frequentist: the trait value most likely to have generated the observed responses • Consistency and efficiency depend on the selection method and item bank used • Does not always exist
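A minimal sketch of why the MLE "does not always exist". It assumes dichotomous items under a two-parameter logistic (2PL) model, which is a simplification chosen for brevity and not the model used later in the study; the item parameters, the grid bounds, and the function names are all hypothetical.

```python
import math

def p_correct(theta, a, b):
    """2PL probability of a positive (1) response; a = discrimination, b = location."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def log_likelihood(theta, items, responses):
    """Log-likelihood of a 0/1 response pattern, assuming local independence."""
    total = 0.0
    for (a, b), u in zip(items, responses):
        p = p_correct(theta, a, b)
        total += math.log(p) if u == 1 else math.log(1.0 - p)
    return total

def grid_mle(items, responses, lo=-4.0, hi=4.0, step=0.01):
    """Grid point maximizing the log-likelihood, or None when the maximum
    sits on the grid boundary (no finite MLE is found on this grid)."""
    n = int(round((hi - lo) / step))
    grid = [lo + i * step for i in range(n + 1)]
    best = max(grid, key=lambda t: log_likelihood(t, items, responses))
    if best == grid[0] or best == grid[-1]:
        return None
    return best

items = [(1.0, -1.0), (1.5, 0.0), (0.8, 1.0)]  # hypothetical (a, b) pairs
print(grid_mle(items, [1, 0, 1]))  # mixed pattern: a finite interior estimate
print(grid_mle(items, [1, 1, 1]))  # all-positive pattern: None on this grid
```

For a pattern with all responses in the same extreme category the likelihood keeps increasing (or decreasing) in θ, so the search runs to the edge of whatever grid is chosen.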
Bayesian Framework • θ is a random variable • A distribution on θ describes knowledge prior to data collection (prior distribution) • Information about θ (the trait) is updated as data are collected (posterior distribution) • Describes a distribution of values instead of a point estimate
Bayes Rule • Posterior ∝ Likelihood × Prior • Combines information about θ (prior) with information from the data (likelihood)
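Written out in full, with π(θ) denoting the prior density and p(θ | ·) the posterior (symbols assumed here, since the slide's own notation was lost):

```latex
p(\theta \mid u_1, \dots, u_n)
  \;=\; \frac{L(\theta \mid u_1, \dots, u_n)\,\pi(\theta)}
             {\int L(t \mid u_1, \dots, u_n)\,\pi(t)\,dt}
  \;\propto\; L(\theta \mid u_1, \dots, u_n)\,\pi(\theta)
```

The denominator does not depend on θ, which is why the proportional form is enough for locating the mode or for normalizing numerically on a grid.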
Maximum A Posteriori (MAP) Estimate • Properties: • With a uniform prior, equivalent to the MLE over the support of the prior • For some prior/likelihood combinations, the posterior can be multimodal
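The defining equation is missing from the extracted slide. The usual definition, in assumed notation (p for the posterior density, π for the prior, L for the likelihood), is the posterior mode:

```latex
\hat{\theta}_{\mathrm{MAP}}
  \;=\; \arg\max_{\theta}\; p(\theta \mid u_1, \dots, u_n)
  \;=\; \arg\max_{\theta}\; L(\theta \mid u_1, \dots, u_n)\,\pi(\theta)
```

When π(θ) is constant on an interval [c, d] and zero elsewhere, the product is maximized wherever the likelihood is maximized within [c, d], which is the first property listed.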
Expected A Posteriori (EAP) Estimate • Properties: • Always exists for a proper prior • Easy to calculate with numerical integration techniques • Prior influences estimate
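A rough sketch of EAP estimation by numerical integration, illustrating that it returns a finite value even for an all-positive pattern and that the prior influences the estimate. It assumes 2PL dichotomous items and a normal prior centered at zero; the item parameters, quadrature range, and function names are hypothetical choices for illustration.

```python
import math

def p_correct(theta, a, b):
    """2PL probability of a positive (1) response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def eap(items, responses, prior_sd=1.0, lo=-6.0, hi=6.0, n=241):
    """Posterior mean of theta by rectangle-rule quadrature,
    using an (unnormalized) N(0, prior_sd^2) prior."""
    step = (hi - lo) / (n - 1)
    num = 0.0
    den = 0.0
    for i in range(n):
        t = lo + i * step
        w = math.exp(-0.5 * (t / prior_sd) ** 2)  # prior weight at t
        for (a, b), u in zip(items, responses):
            p = p_correct(t, a, b)
            w *= p if u == 1 else 1.0 - p          # multiply in the likelihood
        num += t * w
        den += w
    return num / den

items = [(1.0, -1.0), (1.5, 0.0), (0.8, 1.0)]  # hypothetical (a, b) pairs
print(eap(items, [1, 1, 1], prior_sd=1.0))  # finite, pulled toward the prior mean
print(eap(items, [1, 1, 1], prior_sd=2.0))  # wider prior: less shrinkage
```

The normalizing constants of the prior and posterior cancel in the ratio, so unnormalized weights are sufficient.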
Posterior Variance • Describes the variability of θ given the responses • Its square root (the posterior standard deviation) can be used as the conditional Standard Error of Measurement (SEM) for a given response pattern
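In assumed notation (p for the posterior density, u for the vector of responses so far), the posterior mean and variance are:

```latex
\hat{\theta}_{\mathrm{EAP}} \;=\; E[\theta \mid u] \;=\; \int \theta\, p(\theta \mid u)\, d\theta
\qquad
\mathrm{Var}(\theta \mid u) \;=\; \int \bigl(\theta - \hat{\theta}_{\mathrm{EAP}}\bigr)^{2}\, p(\theta \mid u)\, d\theta
```

Because the variance is in squared trait units, a standard error on the trait scale would be taken as its square root.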
Item Selection Algorithms • Choose the item that is “best” for the individual being tested • Define “best” • Most information about trait estimate • Greatest reduction in expected variability of trait estimate
Fisher’s Information • Information provided by a given item at a trait value θ
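The formula itself is missing from the extraction. The standard definition, with I_i(θ) as an assumed symbol for the information of item i, is shown below together with its well-known closed form for a 2PL dichotomous item (the 2PL case is added purely as an illustration; the study later in the deck uses polytomous items):

```latex
I_i(\theta) \;=\; -\,E\!\left[\frac{\partial^{2}}{\partial \theta^{2}} \log P(U_i \mid \theta)\right]
\qquad
\text{2PL: } I_i(\theta) \;=\; a_i^{2}\, P_i(\theta)\,\bigl(1 - P_i(\theta)\bigr)
```

Here a_i is the item's discrimination and P_i(θ) its probability of a positive response, so a 2PL item is most informative near its own location parameter.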
Maximum Fisher’s Information • Myopic algorithm • Pick the item i_k at stage k (i_k ∈ R_k, the set of items remaining in the bank) that maximizes Fisher’s information at the current trait estimate (classically the MLE):
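The selection rule amounts to choosing the remaining item with the largest I_i(θ̂) at the current estimate θ̂. A minimal sketch, assuming 2PL dichotomous items for simplicity; the bank, its parameters, and the function names are hypothetical.

```python
import math

def fisher_info_2pl(theta, a, b):
    """Fisher information of a 2PL item at theta: a^2 * P * (1 - P)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def select_mfi(theta_hat, bank, administered):
    """Return the id of the not-yet-administered item with maximum
    Fisher information at the current trait estimate theta_hat."""
    remaining = [item_id for item_id in bank if item_id not in administered]
    return max(remaining, key=lambda i: fisher_info_2pl(theta_hat, *bank[i]))

# Hypothetical three-item bank: id -> (a, b)
bank = {"A": (1.0, -2.0), "B": (1.0, 0.0), "C": (1.0, 2.0)}
print(select_mfi(0.0, bank, administered=set()))   # item located nearest theta_hat
print(select_mfi(0.5, bank, administered={"B"}))   # best of what remains
```

The rule is myopic in the sense that it looks only one item ahead and evaluates information at a single point, so an unstable early estimate directly affects which item is chosen.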
Minimum Expected Posterior Variance (MEPV) • Selects the item that yields the minimum predicted posterior variance given the previous responses • Uses the predictive distribution • Is a myopic Bayesian decision-theoretic approach (minimizes Bayes risk) • First described by Owen (1969, 1975)
Predictive Distribution • Predicts the probability of each response to a candidate item given the previous responses
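In assumed notation (p for the current posterior after k−1 responses, u for a possible response to candidate item i), the predictive probability averages the item's response probability over the current posterior:

```latex
P(U_i = u \mid u_1, \dots, u_{k-1})
  \;=\; \int P(U_i = u \mid \theta)\; p(\theta \mid u_1, \dots, u_{k-1})\; d\theta
```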
Bayesian Decision Theory • Dictates optimal (sequential adaptive) decisions • In addition to the prior and likelihood, specify a loss function (squared error loss):
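The loss function was an image on the original slide. Squared error loss and its posterior expectation can be written as follows, using ℓ for the loss and θ̂ for the estimate (assumed symbols):

```latex
\ell(\theta, \hat{\theta}) \;=\; (\theta - \hat{\theta})^{2}
\qquad
E\bigl[\ell(\theta, \hat{\theta}) \mid u\bigr]
  \;=\; \mathrm{Var}(\theta \mid u) \;+\; \bigl(E[\theta \mid u] - \hat{\theta}\bigr)^{2}
```

The second term vanishes when θ̂ is the posterior mean, so under this loss the minimized posterior expected loss is exactly the posterior variance.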
Bayesian Decision Theory: Item Selection • The optimal estimator for squared-error loss is the posterior mean (EAP) • Select the item that minimizes Bayes risk:
Minimum Expected Posterior Variance (MEPV) • Pick the item i_k remaining in the bank at stage k (i_k ∈ R_k) that minimizes the expected posterior variance (with respect to the predictive distribution):
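A sketch of this criterion on a discrete grid: for each candidate item, the posterior variance that would result from each possible response is weighted by the predictive probability of that response, and the item with the smallest total is chosen. It assumes 2PL dichotomous items and a standard normal starting posterior; the bank and all names are hypothetical, and a polytomous version would loop over all response categories instead of (0, 1).

```python
import math

def p_correct(theta, a, b):
    """2PL probability of a positive (1) response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def variance(grid, post):
    """Variance of theta under normalized grid weights post."""
    mean = sum(t * w for t, w in zip(grid, post))
    return sum((t - mean) ** 2 * w for t, w in zip(grid, post))

def expected_posterior_variance(grid, post, a, b):
    """Average, over the predictive distribution of the response,
    of the posterior variance after observing that response."""
    epv = 0.0
    for u in (0, 1):
        like = [p_correct(t, a, b) if u == 1 else 1.0 - p_correct(t, a, b)
                for t in grid]
        pred = sum(l * w for l, w in zip(like, post))      # predictive P(U = u)
        new_post = [l * w / pred for l, w in zip(like, post)]
        epv += pred * variance(grid, new_post)
    return epv

def select_mepv(grid, post, bank, administered):
    """Id of the remaining item with minimum expected posterior variance."""
    remaining = [i for i in bank if i not in administered]
    return min(remaining,
               key=lambda i: expected_posterior_variance(grid, post, *bank[i]))

grid = [-4.0 + 0.05 * i for i in range(161)]
raw = [math.exp(-0.5 * t * t) for t in grid]               # N(0, 1) shape
prior = [w / sum(raw) for w in raw]
bank = {"A": (1.0, -2.0), "B": (1.0, 0.0), "C": (1.0, 2.0)}  # hypothetical
print(select_mepv(grid, prior, bank, administered=set()))
```

Unlike maximum Fisher information, this rule never needs a point estimate of θ, at the cost of one posterior update per candidate item and response category.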
Other Information Measures • Weighted Measures • Maximum Likelihood Weighted Fisher’s Information (MLWI) • Maximum Posterior Weighted Fisher’s Information (MPWI) • Kullback-Leibler Information: Global Information Measure
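The formulas for these measures are not in the extracted text. The forms below are the ones commonly given in the CAT literature, written in assumed notation (I_i for item information, L for the likelihood, p for the posterior, θ̂ for the current estimate), and are offered as a reading aid rather than as the slide's own equations:

```latex
\text{MLWI: } \int I_i(\theta)\, L(\theta \mid u_1, \dots, u_{k-1})\, d\theta
\qquad
\text{MPWI: } \int I_i(\theta)\, p(\theta \mid u_1, \dots, u_{k-1})\, d\theta

\text{KL: } K_i(\theta \,\|\, \hat{\theta})
  \;=\; \sum_{u} P(U_i = u \mid \hat{\theta})\,
        \log \frac{P(U_i = u \mid \hat{\theta})}{P(U_i = u \mid \theta)}
```

The weighted measures average Fisher information over plausible trait values instead of evaluating it at one point; the Kullback-Leibler measure compares response distributions at two trait values, which is what makes it global.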
Hybrid Algorithms • Maximum Expected Information (MEI) • Use observed information • Predict information for next item • Maximum Expected Posterior Weighted Information (MEPWI) • Use observed information • Predict information for next item • Weight with posterior • MEPWI ≈ MPWI
Mix-N-Match • MAP with uniform prior to approximate MLE • MFI using EAP instead of MLE (any point information function) • Use EAP for item selection, but MLE for final trait estimate
Study Design • Real Item Bank • Depressive symptom items (62) • 4 categories (fit with Graded Response IRT Model) • Peaked Bank: Items have “narrow” coverage • Flat Bank: Items have “wider” coverage • Fixed length: 5-, 10-, and 20-item CATs
Datasets Used • Post hoc simulation using real data: • 730 patients and caregivers at MDA • Real bank only • Simulated data: • θ grid: -3 to 3 in steps of 0.5 • 500 “simulees” per θ value • Simulated and real banks
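A small sketch of how simulated responses of this kind might be generated: a θ grid from -3 to 3 in steps of 0.5, 500 simulees per grid point, and a 4-category item under a graded response model. The item parameters, the seed, and the function names are hypothetical; only the grid, the number of simulees, the number of categories, and the model family come from the slides.

```python
import math
import random

def grm_category_probs(theta, a, thresholds):
    """Graded response model: P(X >= k) is logistic in a * (theta - b_k);
    category probabilities are differences of adjacent cumulative curves."""
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cum += [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]

def simulate_response(theta, a, thresholds, rng):
    """Draw one response in categories 1..m for a simulee at theta."""
    probs = grm_category_probs(theta, a, thresholds)
    return rng.choices(range(1, len(probs) + 1), weights=probs)[0]

theta_grid = [-3.0 + 0.5 * i for i in range(13)]   # -3, -2.5, ..., 3
rng = random.Random(0)                             # fixed seed for repeatability
item = (1.8, [-1.0, 0.0, 1.2])                     # hypothetical a and ordered thresholds
data = {t: [simulate_response(t, item[0], item[1], rng) for _ in range(500)]
        for t in theta_grid}
print(len(theta_grid), len(data[0.0]))
```

The thresholds must be in increasing order for the category probabilities to be non-negative. A full study would repeat the draw for every item in the bank and feed the responses through each CAT algorithm.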
Summary • Polytomous items • Choi and Swartz, In press • Classic MFI with MLE, and MLWI not as good as others. • MFI with EAP, and all others essentially perform similarly. • Dichotomous items • (van der Linden, 1998) • MFI with MLE not as good as all others* • Difference more pronounced for shorter tests
Adaptations / Active Research Areas • Constrained adaptive tests / content balancing • Exposure control • a-stratified adaptive testing • Item selection including burden • Cheating detection • Response times
References and Further Reading • Choi SW, Swartz RJ. (in press). Comparison of CAT item selection criteria for polytomous items. Applied Psychological Measurement. • Owen RJ. (1969). A Bayesian approach to tailored testing (Research Report 69-92). Princeton, NJ: Educational Testing Service. • Owen RJ. (1975). A Bayesian sequential procedure for quantal response in the context of adaptive mental testing. Journal of the American Statistical Association, 70, 351-356. • van der Linden WJ. (1998). Bayesian item selection criteria for adaptive testing. Psychometrika, 63(2), 201-216. • van der Linden WJ, Glas CAW (Eds.). (2000). Computerized Adaptive Testing: Theory and Practice. Dordrecht; Boston: Kluwer Academic.
MLE Properties • Usually has desirable asymptotic properties • Consistency and efficiency depend on the selection criterion and item bank • A finite estimate does not exist when all responses fall in category 1 or all fall in category m