Classical and Bayesian Computerized Adaptive Testing Algorithms

  1. Classical and Bayesian Computerized Adaptive Testing Algorithms Richard J. Swartz Department of Biostatistics (rswartz@mdanderson.org)

  2. Outline Principle of computerized adaptive testing Basic statistical concepts and notation Trait estimation methods Item selection methods Comparisons between methods Current CAT Research Topics

  3. Computerized Adaptive Tests (CAT) • First developed for assessment testing • Test tailored to an individual • Only questions relevant to individual trait level • Shorter tests • Sequential adaptive selection problem • Requires item bank • Fit with IRT models • Extensive initial development before CAT implementation
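
The core select-administer-update cycle can be sketched in code. The following is a minimal illustration only (not the presenter's implementation), assuming a hypothetical bank of dichotomous 2PL items, maximum-information selection, and a grid-based EAP update; the study discussed later uses a 62-item graded response bank.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical item bank: 2PL discrimination (a) and difficulty (b) parameters
a = rng.uniform(0.8, 2.0, size=30)
b = rng.uniform(-2.5, 2.5, size=30)

def prob(theta, i):
    """2PL probability of a positive response to item i at trait level theta."""
    return 1.0 / (1.0 + np.exp(-a[i] * (theta - b[i])))

def info(theta, i):
    """Fisher information of 2PL item i at theta."""
    p = prob(theta, i)
    return a[i] ** 2 * p * (1.0 - p)

def eap(items, responses, grid=np.linspace(-4, 4, 161)):
    """Grid-based EAP estimate under a standard normal prior."""
    post = np.exp(-0.5 * grid ** 2)          # prior (up to a constant)
    for i, u in zip(items, responses):
        p = prob(grid, i)
        post *= p if u == 1 else 1.0 - p     # multiply in each item's likelihood
    post /= post.sum()
    return float(np.sum(grid * post))

def run_cat(true_theta, test_length=5):
    """Administer a fixed-length CAT: select, observe, re-estimate, repeat."""
    remaining = list(range(len(a)))
    items, responses = [], []
    theta_hat = 0.0                          # start at the prior mean
    for _ in range(test_length):
        best = max(remaining, key=lambda i: info(theta_hat, i))
        remaining.remove(best)
        u = int(rng.random() < prob(true_theta, best))   # simulated examinee
        items.append(best)
        responses.append(u)
        theta_hat = eap(items, responses)    # update after each response
    return theta_hat

print(run_cat(true_theta=1.0))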

  4. Item Bank Development I • Qualitative item development • Content experts • Response categories • Test model fit • Likelihood ratio based methods • Model fit indices

  5. Item Bank Development II • Test Assumption: Unidimensionality • Factor analysis • Confirmatory factor analysis • Multidimensional IRT models • Test assumption: Local Dependence • Residual correlation after 1st factor removed • Multidimensional IRT models

  6. Item Bank Development III • Test assumption: Invariance • DIF = differential item functioning • Over time and across groups (e.g., men vs. women) • Many different methods (logistic regression, area between response curves, and others)

  7. CAT Implementation [Figure: schematic of adaptive selection from the item bank, with numbered items and their response options (a, b, c) arranged along the trait continuum from Lo Depression to Hi Depression]

  8. CAT Item Selection

  9. Basic Concepts/ Notation

  10. Basic Concepts/ Notation II

  11. Trait Estimation

  12. Estimating Traits • Assumes Item parameters are known • Represent the individual’s ability • Done sequentially in CAT • Estimate is updated after each additional response • Maximum Likelihood Estimator • Bayesian Estimators

  13. Likelihood Model describing a person’s response pattern:
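
The slide's formula was displayed as an image; under local independence the likelihood for k administered items is standardly written as

L(\theta \mid u_1,\dots,u_k) \;=\; \prod_{j=1}^{k} P_{i_j}(u_j \mid \theta),

where u_j is the response to the j-th administered item i_j and P_{i_j}(u_j \mid \theta) is that item's category response probability under the IRT model (e.g., the graded response model used for the bank described later).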

  14. Maximum Likelihood Estimate Frequentist: the trait value most "likely" to have generated the observed responses Consistency and efficiency depend on the selection method and item bank used Does not always exist
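
In symbols (a reconstruction of the missing slide formula):

\hat{\theta}_{\mathrm{MLE}} \;=\; \arg\max_{\theta}\; L(\theta \mid u_1,\dots,u_k)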

  15. Bayesian Framework θ is a random variable A distribution on θ describes knowledge prior to data collection (prior distribution) Update information about θ (the trait) as data are collected (posterior distribution) Describes a distribution of θ values instead of a point estimate

  16. Bayes Rule • Posterior ∝ Likelihood × Prior • Combines information about θ (prior) with information from the data (likelihood)
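
Written out with a prior density \pi(\theta) (reconstructed; the slide states only the proportionality):

f(\theta \mid u_1,\dots,u_k) \;=\; \frac{L(\theta \mid u_1,\dots,u_k)\,\pi(\theta)}{\int L(\theta \mid u_1,\dots,u_k)\,\pi(\theta)\,d\theta} \;\propto\; L(\theta \mid u_1,\dots,u_k)\,\pi(\theta)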

  17. Maximum A Posteriori (MAP) Estimate • Properties: • With a uniform prior, equivalent to the MLE over the support of the prior • For some prior/likelihood combinations, the posterior can be multimodal
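
In symbols:

\hat{\theta}_{\mathrm{MAP}} \;=\; \arg\max_{\theta}\; f(\theta \mid u_1,\dots,u_k)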

  18. Expected A Posteriori (EAP) Estimate • Properties: • Always exists for a proper prior • Easy to calculate with numerical integration techniques • Prior influences estimate
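
In symbols (the slide's formula was an image):

\hat{\theta}_{\mathrm{EAP}} \;=\; E(\theta \mid u_1,\dots,u_k) \;=\; \int \theta\, f(\theta \mid u_1,\dots,u_k)\, d\theta

In practice the integral is approximated by summing over a grid of quadrature points, as in the code sketch shown earlier.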

  19. Posterior Variance Describes the variability of θ Can be used as a conditional standard error of measurement (SEM) for a given response pattern.
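
A standard way to write this:

\mathrm{Var}(\theta \mid u_1,\dots,u_k) \;=\; \int \bigl(\theta - \hat{\theta}_{\mathrm{EAP}}\bigr)^{2} f(\theta \mid u_1,\dots,u_k)\, d\theta, \qquad \mathrm{SEM} \;=\; \sqrt{\mathrm{Var}(\theta \mid u_1,\dots,u_k)}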

  20. ITEM SELECTION

  21. Item Selection Algorithms • Choose the item that is “best” for the individual being tested • Define “best” • Most information about trait estimate • Greatest reduction in expected variability of trait estimate

  22. Fisher’s Information Information of a given item at a trait value

  23. Maximum Fisher’s Information • Myopic algorithm • Pick the item ik at stage k, (ik Rk) that maximizes Fisher’s information at current trait estimate, (Classically MLE):

  24. MFI - Selection

  25. Minimum Expected Posterior Variance (MEPV) Selects the item that yields the minimum predicted posterior variance given the previous responses Uses the predictive distribution Is a myopic Bayesian decision-theoretic approach (minimizes the Bayes risk) First described by Owen (1969, 1975)

  26. Predictive Distribution Predict the probability of a response to an item given previous responses
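
In symbols (a reconstruction of the missing slide formula):

P(U_i = u \mid u_1,\dots,u_{k-1}) \;=\; \int P_i(u \mid \theta)\, f(\theta \mid u_1,\dots,u_{k-1})\, d\theta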

  27. Bayesian Decision Theory Dictates optimal (sequential adaptive) decisions In addition to prior and Likelihood, specify a loss function (squared error loss):
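
With squared-error loss this is simply

L(\theta, \hat{\theta}) \;=\; (\theta - \hat{\theta})^{2}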

  28. Bayesian Decision Theory: Item Selection The optimal estimator under squared-error loss is the posterior mean (EAP) Select the item that minimizes the Bayes risk:
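
The standard result being used here: the posterior expected loss E[(\theta - \hat{\theta})^{2} \mid u_1,\dots,u_{k-1}] is minimized by \hat{\theta} = E(\theta \mid u_1,\dots,u_{k-1}), and its minimum value is the posterior variance \mathrm{Var}(\theta \mid u_1,\dots,u_{k-1}); this is why minimizing the Bayes risk leads to the MEPV criterion on the next slide.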

  29. Minimum Expected Posterior Variance (MEPV) Pick the item ik remaining in the bank at stage k (ik ∈ Rk) that minimizes the expected posterior variance (with respect to the predictive distribution):
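
A common way to write the criterion (reconstructed; the slide's formula was an image):

i_k \;=\; \arg\min_{i \in R_k} \sum_{u} P(U_i = u \mid u_1,\dots,u_{k-1})\; \mathrm{Var}(\theta \mid u_1,\dots,u_{k-1}, U_i = u)

A minimal Python sketch of this computation for a hypothetical dichotomous 2PL bank, using a grid posterior (an illustration under assumed item parameters, not the presenter's code):

import numpy as np

grid = np.linspace(-4, 4, 161)               # quadrature grid for theta

def posterior(post_prev, p_item, u):
    """Update a normalized grid posterior with one dichotomous response."""
    post = post_prev * (p_item if u == 1 else 1.0 - p_item)
    return post / post.sum()

def posterior_variance(post):
    """Variance of theta under a normalized grid posterior."""
    mean = np.sum(grid * post)
    return float(np.sum((grid - mean) ** 2 * post))

def mepv_select(post_prev, a, b, remaining):
    """Pick the remaining item with the minimum expected posterior variance."""
    best_item, best_epv = None, np.inf
    for i in remaining:
        p_item = 1.0 / (1.0 + np.exp(-a[i] * (grid - b[i])))   # P_i(theta) on the grid
        epv = 0.0
        for u in (0, 1):
            # predictive probability of response u given the previous responses
            pred = np.sum(post_prev * (p_item if u == 1 else 1.0 - p_item))
            epv += pred * posterior_variance(posterior(post_prev, p_item, u))
        if epv < best_epv:
            best_item, best_epv = i, epv
    return best_item

# Usage with a hypothetical 3-item bank and a standard normal prior serving as
# the current posterior (i.e., before any responses have been observed):
a = np.array([1.2, 0.9, 1.8])
b = np.array([-0.5, 0.0, 0.7])
prior = np.exp(-0.5 * grid ** 2)
prior /= prior.sum()
print(mepv_select(prior, a, b, remaining=[0, 1, 2]))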

  30. Other Information Measures • Weighted Measures • Maximum Likelihood Weighted Fisher's Information (MLWI) • Maximum Posterior Weighted Fisher's Information (MPWI) • Kullback-Leibler Information: a global information measure
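
These criteria are commonly written as follows (reconstructions, not the slide's original notation):

MLWI: i_k = \arg\max_{i \in R_k} \int I_i(\theta)\, L(\theta \mid u_1,\dots,u_{k-1})\, d\theta

MPWI: i_k = \arg\max_{i \in R_k} \int I_i(\theta)\, f(\theta \mid u_1,\dots,u_{k-1})\, d\theta

KL: K_i\bigl(\theta \,\|\, \hat{\theta}_{k-1}\bigr) = E_{U \mid \theta}\!\left[\log \frac{P_i(U \mid \theta)}{P_i(U \mid \hat{\theta}_{k-1})}\right], evaluated globally by integrating over \theta in a region around \hat{\theta}_{k-1} (or weighting by the posterior).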

  31. Hybrid Algorithms • Maximum Expected Information (MEI) • Use observed information • Predict information for the next item • Maximum Expected Posterior Weighted Information (MEPWI) • Use observed information • Predict information for the next item • Weight with the posterior • MEPWI ≈ MPWI
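
One common way to write MEI (following van der Linden, 1998; the notation here is a reconstruction rather than the slide's own):

i_k \;=\; \arg\max_{i \in R_k} \sum_{u} P(U_i = u \mid u_1,\dots,u_{k-1})\; J_i\!\bigl(\hat{\theta}_{k}^{(u)}\bigr)

where J_i is the observed information for item i and \hat{\theta}_{k}^{(u)} is the trait estimate updated as if response u had been observed; MEPWI additionally weights the information by the posterior, which is why it behaves essentially like MPWI.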

  32. Mix-N-Match MAP with a uniform prior to approximate the MLE MFI using the EAP instead of the MLE (any point information function) Use EAP for item selection, but MFI for the final trait estimate

  33. COMPARISONS

  34. Study Design • Real Item Bank • Depressive symptom items (62) • 4 categories (fit with the Graded Response IRT model) • Peaked bank: items have "narrow" coverage • Flat bank: items have "wider" coverage • Fixed-length 5-, 10-, and 20-item CATs

  35. Datasets Used • Post hoc simulation using real data: • 730 patients and caregivers at MDA • Real bank only • Simulated data: • θ grid: -3 to 3 by 0.5 • 500 "simulees" per θ value • Simulated and real banks

  36. Real Item Bank Characteristics

  37. Real Bank, Real Data, 5 Items

  38. Real Bank, Real Data, 5 items

  39. Peaked Bank, Sim. Data, 5 Item

  40. Peaked Bank, Sim. Data, 5 Item

  41. Summary • Polytomous items • Choi and Swartz (in press) • Classic MFI with MLE, and MLWI, not as good as the others • MFI with EAP and all other methods perform essentially the same • Dichotomous items • van der Linden (1998) • MFI with MLE not as good as all others • Differences more pronounced for shorter tests

  42. Adaptations/ Active Research Areas Constrained adaptive tests/ content balancing Exposure Control A-stratified adaptive testing Item selection including burden Cheating detection Response times

  43. Thank You!

  44. References and Further Reading
Choi SW, Swartz RJ (in press). Comparison of CAT item selection criteria for polytomous items. Applied Psychological Measurement.
Owen RJ (1969). A Bayesian approach to tailored testing (Research Report 69-92). Princeton, NJ: Educational Testing Service.
Owen RJ (1975). A Bayesian sequential procedure for quantal response in the context of adaptive mental testing. Journal of the American Statistical Association, 70, 351-356.
van der Linden WJ (1998). Bayesian item selection criteria for adaptive testing. Psychometrika, 63, 201-216.
van der Linden WJ, Glas CAW (Eds.) (2000). Computerized Adaptive Testing: Theory and Practice. Dordrecht: Kluwer Academic.

  45. MLE Properties Usually has desirable asymptotic properties Consistency and efficiency depend on the selection criteria and item bank A finite estimate does not exist when all responses fall in the lowest category (1) or the highest category (m)
