Classical and Bayesian Computerized Adaptive Testing Algorithms

Classical and Bayesian Computerized Adaptive Testing Algorithms Richard J. Swartz Department of Biostatistics (rswartz@mdanderson.org)

Outline • Principle of computerized adaptive testing • Basic statistical concepts and notation • Trait estimation methods • Item selection methods • Comparisons between methods • Current CAT research topics

Computerized Adaptive Tests (CAT) • First developed for assessment testing • Test tailored to an individual • Only questions relevant to individual trait level • Shorter tests • Sequential adaptive selection problem • Requires item bank • Fit with IRT models • Extensive initial development before CAT implementation

Item Bank Development I • Qualitative item development • Content experts • Response categories • Test model fit • Likelihood ratio based methods • Model fit indices

Item Bank Development II • Test assumption: Unidimensionality • Factor analysis • Confirmatory factor analysis • Multidimensional IRT models • Test assumption: Local independence • Residual correlation after 1st factor removed • Multidimensional IRT models

Item Bank Development III • Test assumption: Invariance • DIF = differential item functioning • Over time and across groups (e.g., men vs. women) • Many different methods (logistic regression method, area between response curves, and others)

CAT Implementation [Figure: numbered items from the item bank, with response categories a, b, c, placed along a scale running from Lo Depression to Hi Depression]

CAT Item Selection

Basic Concepts/ Notation

Basic Concepts/ Notation II

Trait Estimation

Estimating Traits • Assumes Item parameters are known • Represent the individual’s ability • Done sequentially in CAT • Estimate is updated after each additional response • Maximum Likelihood Estimator • Bayesian Estimators

Likelihood Model describing a person’s response pattern:
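The formula on this slide did not survive extraction. Under local independence, a likelihood of this kind is usually written as the product below. The notation is an assumption for illustration, not necessarily the slide's own: $u_j$ is the response to the $j$-th administered item $i_j$, $k$ is the number of items administered so far, and $P_{i_j}(u_j \mid \theta)$ is the IRT model probability of that response at trait level $\theta$.

```latex
L(\theta \mid u_1, \dots, u_k) \;=\; \prod_{j=1}^{k} P_{i_j}(u_j \mid \theta)
```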

Maximum Likelihood Estimate • Frequentist: the trait value most likely to have generated the observed responses • Consistency and efficiency depend on the selection method and item bank used • Does not always exist
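A minimal sketch of why the MLE may not exist, assuming a dichotomous two-parameter logistic (2PL) model rather than the polytomous model used later in the talk. The item parameters, the grid bounds, and the function names are hypothetical. With a mixed response pattern the grid search finds an interior maximum; when every item is endorsed, the likelihood keeps increasing and the search simply stops at the upper bound of the grid.

```python
import math

def p_endorse(theta, a, b):
    """2PL probability of endorsing an item (assumed model for this sketch)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def log_likelihood(theta, items, responses):
    """Log-likelihood of 0/1 responses, assuming local independence."""
    total = 0.0
    for (a, b), u in zip(items, responses):
        p = p_endorse(theta, a, b)
        total += math.log(p) if u == 1 else math.log(1.0 - p)
    return total

def grid_mle(items, responses, lo=-4.0, hi=4.0, step=0.01):
    """Return the grid point with the highest log-likelihood."""
    n = int(round((hi - lo) / step))
    grid = [lo + i * step for i in range(n + 1)]
    return max(grid, key=lambda t: log_likelihood(t, items, responses))

items = [(1.2, -0.5), (0.8, 0.0), (1.5, 0.7)]  # hypothetical (a, b) pairs
mixed = grid_mle(items, [1, 1, 0])    # interior maximum
all_yes = grid_mle(items, [1, 1, 1])  # pushed to the edge of the grid
```

Widening the grid moves `all_yes` further out, which is the practical symptom of a non-existent finite estimate.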

Bayesian Framework • θ is a random variable • A distribution on θ describes knowledge prior to data collection (prior distribution) • Update information about θ (the trait) as data are collected (posterior distribution) • Describes a distribution of θ values instead of a point estimate

Bayes Rule • Posterior ∝ Likelihood × Prior • Combines information about θ (prior) with information from the data (likelihood)

Maximum A Posteriori (MAP) Estimate • Properties: • With a uniform prior, equivalent to the MLE over the support of the prior • For some prior/likelihood combinations, the posterior can be multimodal
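The defining formula is missing from the extracted slide. The MAP estimate is the mode of the posterior, which can be written as below. The symbols are assumed notation: $L(\theta \mid \mathbf{u})$ is the likelihood of the observed responses $\mathbf{u}$ and $\pi(\theta)$ is the prior density.

```latex
\hat{\theta}_{\mathrm{MAP}} \;=\; \arg\max_{\theta} \; L(\theta \mid \mathbf{u}) \, \pi(\theta)
```

When $\pi(\theta)$ is constant on its support, the maximizer is the same as the maximizer of $L$ alone on that support, which is the equivalence with the MLE noted in the first property.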

Expected A Posteriori (EAP) Estimate • Properties: • Always exists for a proper prior • Easy to calculate with numerical integration techniques • Prior influences estimate
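The EAP estimate is the posterior mean of the trait. A minimal sketch of the numerical integration, assuming a dichotomous 2PL model, a standard normal prior, and an evenly spaced grid; the item parameters and function names are hypothetical. The all-endorsed pattern is used on purpose: it has no finite MLE, yet the EAP is finite.

```python
import math

def p_endorse(theta, a, b):
    """2PL probability of endorsing an item (assumed model for this sketch)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def likelihood(theta, items, responses):
    """Likelihood of 0/1 responses, assuming local independence."""
    value = 1.0
    for (a, b), u in zip(items, responses):
        p = p_endorse(theta, a, b)
        value *= p if u == 1 else 1.0 - p
    return value

def eap(items, responses, lo=-4.0, hi=4.0, n_points=81):
    """Posterior mean on an evenly spaced grid with a standard normal prior."""
    step = (hi - lo) / (n_points - 1)
    grid = [lo + i * step for i in range(n_points)]
    # Unnormalized posterior weight = likelihood x prior density;
    # normalizing constants cancel in the ratio below.
    weights = [likelihood(t, items, responses) * math.exp(-0.5 * t * t)
               for t in grid]
    total = sum(weights)
    return sum(t * w for t, w in zip(grid, weights)) / total

items = [(1.2, -0.5), (0.8, 0.0), (1.5, 0.7)]  # hypothetical (a, b) pairs
estimate = eap(items, [1, 1, 1])   # finite, pulled toward the prior mean
prior_only = eap([], [])           # no data: returns the prior mean
```

The pull of `estimate` toward zero illustrates the last bullet: the prior influences the estimate, most strongly when few items have been administered.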

Posterior Variance • Describes variability of θ • Can be used as conditional Standard Error of Measurement (SEM) for a given response pattern.
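The formula is missing from the extracted slide. In assumed notation, with $p(\theta \mid \mathbf{u})$ the posterior density given the responses $\mathbf{u}$ and $\hat{\theta}_{\mathrm{EAP}} = E[\theta \mid \mathbf{u}]$ the posterior mean, the posterior variance and the SEM derived from it can be written as:

```latex
\operatorname{Var}(\theta \mid \mathbf{u})
  \;=\; \int \bigl(\theta - \hat{\theta}_{\mathrm{EAP}}\bigr)^{2} \, p(\theta \mid \mathbf{u}) \, d\theta ,
\qquad
\mathrm{SEM}(\mathbf{u}) \;=\; \sqrt{\operatorname{Var}(\theta \mid \mathbf{u})}
```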

ITEM SELECTION

Item Selection Algorithms • Choose the item that is “best” for the individual being tested • Define “best” • Most information about trait estimate • Greatest reduction in expected variability of trait estimate

Fisher’s Information Information of a given item at a trait value
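The formula is missing from the extracted slide. The general definition of the information of item $i$ at trait value $\theta$ is given first; the second line is the special case for a dichotomous 2PL item, which is an assumption added here for illustration (the symbols $a_i$ for discrimination and $P_i(\theta)$ for the endorsement probability are not from the slide).

```latex
I_i(\theta) \;=\; -\,E\!\left[ \frac{\partial^{2}}{\partial \theta^{2}} \log P_i(U_i \mid \theta) \right]
\\[6pt]
I_i(\theta) \;=\; a_i^{2} \, P_i(\theta) \, \bigl(1 - P_i(\theta)\bigr)
\quad \text{(2PL special case)}
```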

Maximum Fisher’s Information • Myopic algorithm • Pick the item i_k at stage k (i_k ∈ R_k) that maximizes Fisher’s information at the current trait estimate (classically the MLE):
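The selection rule amounts to taking, over the items still in the bank, the one with the largest information at the current trait estimate. A minimal sketch, assuming dichotomous 2PL items (information a² · P · (1 − P)); the bank, the trait estimates, and the function names are hypothetical.

```python
import math

def p_endorse(theta, a, b):
    """2PL probability of endorsing an item (assumed model for this sketch)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def fisher_info(theta, a, b):
    """2PL item information: a^2 * P * (1 - P)."""
    p = p_endorse(theta, a, b)
    return a * a * p * (1.0 - p)

def select_mfi(theta_hat, bank, remaining):
    """Return the id of the remaining item with maximum information at theta_hat."""
    return max(remaining, key=lambda i: fisher_info(theta_hat, *bank[i]))

# Hypothetical bank: item id -> (discrimination a, location b).
bank = {1: (1.0, -1.5), 2: (1.0, 0.0), 3: (1.0, 1.5)}
remaining = {1, 2, 3}

first = select_mfi(0.0, bank, remaining)   # starting estimate of 0
remaining.discard(first)                   # an item is used at most once
second = select_mfi(1.2, bank, remaining)  # estimate after the first response
```

The rule is myopic in the sense that each call looks only one item ahead and treats the current point estimate as if it were the true trait value.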

MFI - Selection

Minimum Expected Posterior Variance (MEPV) • Selects the item that yields the minimum predicted posterior variance given previous responses • Uses the predictive distribution • Is a myopic Bayesian decision-theoretic approach (minimizes Bayes risk) • First described by Owen (1969, 1975)

Predictive Distribution Predict the probability of a response to an item given previous responses
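The formula is missing from the extracted slide. The posterior predictive probability of response $u$ to a candidate item $i$ averages the item's response probability over the current posterior. The notation is assumed: $u_1, \dots, u_{k-1}$ are the responses observed so far and $p(\theta \mid u_1, \dots, u_{k-1})$ is the posterior after those responses.

```latex
P(U_i = u \mid u_1, \dots, u_{k-1})
  \;=\; \int P_i(u \mid \theta) \; p(\theta \mid u_1, \dots, u_{k-1}) \, d\theta
```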

Bayesian Decision Theory • Dictates optimal (sequential adaptive) decisions • In addition to prior and likelihood, specify a loss function (squared error loss):

Bayesian Decision Theory: Item Selection • Optimal estimator for squared-error loss is the posterior mean (EAP) • Select the item that minimizes Bayes risk:
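The formulas are missing from the extracted slides. A short derivation in assumed notation shows why the posterior mean is optimal and why the resulting risk is a posterior variance. Here $d$ is any candidate estimate and $\mathbf{u}$ the observed responses.

```latex
\ell(\theta, d) \;=\; (\theta - d)^{2}
\\[6pt]
E\bigl[(\theta - d)^{2} \mid \mathbf{u}\bigr]
  \;=\; \operatorname{Var}(\theta \mid \mathbf{u}) + \bigl(E[\theta \mid \mathbf{u}] - d\bigr)^{2}
\\[6pt]
\Rightarrow\;
d^{*} = E[\theta \mid \mathbf{u}] = \hat{\theta}_{\mathrm{EAP}},
\qquad
\min_{d} E\bigl[(\theta - d)^{2} \mid \mathbf{u}\bigr] = \operatorname{Var}(\theta \mid \mathbf{u})
```

The second term in the decomposition is non-negative and vanishes only at the posterior mean, so the expected loss left after estimating with the EAP is exactly the posterior variance. Averaging that quantity over the not-yet-observed response to a candidate item gives the risk that item selection minimizes.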

Minimum Expected Posterior Variance (MEPV) • Pick the item i_k remaining in the bank at stage k (i_k ∈ R_k) that minimizes the expected posterior variance (with respect to the predictive distribution):
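A minimal sketch of this criterion, assuming dichotomous 2PL items, a standard normal prior, and a grid approximation to the posterior; the bank and all function names are hypothetical. For each candidate item, the posterior is updated once per possible response, and the resulting posterior variances are averaged with the predictive probabilities of those responses as weights.

```python
import math

def p_endorse(theta, a, b):
    """2PL probability of endorsing an item (assumed model for this sketch)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def make_grid(lo=-4.0, hi=4.0, n_points=81):
    step = (hi - lo) / (n_points - 1)
    return [lo + i * step for i in range(n_points)]

def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]

def posterior_variance(grid, post):
    mean = sum(t * w for t, w in zip(grid, post))
    return sum((t - mean) ** 2 * w for t, w in zip(grid, post))

def expected_posterior_variance(grid, post, a, b):
    """Average the updated posterior variance over the predictive distribution."""
    expected = 0.0
    for u in (0, 1):
        like = [p_endorse(t, a, b) if u == 1 else 1.0 - p_endorse(t, a, b)
                for t in grid]
        joint = [l * w for l, w in zip(like, post)]
        predictive = sum(joint)  # P(u | previous responses)
        expected += predictive * posterior_variance(grid, normalize(joint))
    return expected

def select_mepv(grid, post, bank, remaining):
    """Return the id of the remaining item with minimum expected posterior variance."""
    return min(remaining,
               key=lambda i: expected_posterior_variance(grid, post, *bank[i]))

grid = make_grid()
prior = normalize([math.exp(-0.5 * t * t) for t in grid])  # current posterior
bank = {1: (0.5, 0.0), 2: (2.0, 0.0), 3: (2.0, 3.5)}       # hypothetical (a, b)
choice = select_mepv(grid, prior, bank, {1, 2, 3})
```

In this toy bank the highly discriminating item located near the center of the current posterior is chosen; an equally discriminating item far out in the tail reduces the expected variance very little because one of its responses is almost certain in advance.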

Other Information Measures • Weighted Measures • Maximum Likelihood Weighted Fisher’s Information (MLWI) • Maximum Posterior Weighted Fisher’s Information (MPWI) • Kullback-Leibler Information: Global Information Measure
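The formulas are missing from the extracted slide. As commonly written in the CAT literature, the two weighted criteria integrate Fisher's information over the trait scale, weighting by the likelihood or by the posterior, and the Kullback-Leibler information compares an item's response distributions at two trait values. The notation is assumed: $I_i(\theta)$ is the information of item $i$, $L(\theta \mid \mathbf{u})$ the likelihood, $p(\theta \mid \mathbf{u})$ the posterior, $R_k$ the set of items remaining at stage $k$, and $\theta_0$ a reference trait value.

```latex
\text{MLWI:}\quad i_k \;=\; \arg\max_{i \in R_k} \int L(\theta \mid \mathbf{u}) \, I_i(\theta) \, d\theta
\\[6pt]
\text{MPWI:}\quad i_k \;=\; \arg\max_{i \in R_k} \int p(\theta \mid \mathbf{u}) \, I_i(\theta) \, d\theta
\\[6pt]
\text{KL:}\quad K_i(\theta \,\|\, \theta_0) \;=\; \sum_{x} P_i(x \mid \theta_0) \,
  \log \frac{P_i(x \mid \theta_0)}{P_i(x \mid \theta)}
```

Fisher's information is local (it describes the item at a single $\theta$), whereas the Kullback-Leibler measure is defined for any pair of trait values, which is the sense in which the slide calls it global.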

Hybrid Algorithms • Maximum Expected Information (MEI) • Use observed information • Predict information for next item • Maximum Expected Posterior Weighted Information (MEPWI) • Use observed information • Predict information for next item • Weight with posterior • MEPWI ≈ MPWI

Mix-n-Match • MAP with uniform prior to approximate MLE • MFI using EAP instead of MLE (any point information function) • Use EAP for item selection, but MLE for final trait estimate

COMPARISONS

Study Design • Real Item Bank • Depressive symptom items (62) • 4 categories (fit with Graded Response IRT Model) • Peaked Bank: Items have “narrow” coverage • Flat Bank: Items have “wider” coverage • Fixed length: 5-, 10-, and 20-item CATs

Datasets Used • Post hoc simulation using real data: • 730 patients and caregivers at MDA • Real bank only • Simulated data: • θ grid: -3 to 3 by 0.5 • 500 “simulees” per θ • Simulated and real banks

Real Item Bank Characteristics

Real Bank, Real Data, 5 Items

Real Bank, Real Data, 5 items

Peaked Bank, Sim. Data, 5 Item

Summary • Polytomous items • (Choi and Swartz, in press) • Classic MFI with MLE, and MLWI, not as good as others • MFI with EAP, and all others, perform essentially the same • Dichotomous items • (van der Linden, 1998) • MFI with MLE not as good as all others* • Difference more pronounced for shorter tests

Adaptations/ Active Research Areas • Constrained adaptive tests/ content balancing • Exposure control • A-stratified adaptive testing • Item selection including burden • Cheating detection • Response times

Thank You!

References and Further Reading • Choi SW, Swartz RJ. (in press). Comparison of CAT item selection criteria for polytomous items. Applied Psychological Measurement. • Owen RJ. (1969). A Bayesian approach to tailored testing (Research Report 69-92). Princeton, NJ: Educational Testing Service. • Owen RJ. (1975). A Bayesian sequential procedure for quantal response in the context of adaptive mental testing. Journal of the American Statistical Association, 70, 351-356. • van der Linden WJ. (1998). Bayesian item selection criteria for adaptive testing. Psychometrika, 63(2), 201-216. • van der Linden WJ, Glas CAW (Eds.). (2000). Computerized Adaptive Testing: Theory and Practice. Dordrecht; Boston: Kluwer Academic.

MLE Properties • Usually has desirable asymptotic properties • Consistency and efficiency depend on selection criteria and item bank • A finite estimate does not exist when every response falls in the lowest category (1) or every response falls in the highest category (m)
