The Impact of Item Response Theory in Educational Assessment: A Practical Point of View
Cees A.W. Glas
University of Twente, The Netherlands
c.a.w.glas@gw.utwente.nl
Measuring body height with a questionnaire
1. I bump my head quite often
2. For school pictures I was always asked to stand in the first row
3. In bed, I often suffer from cold feet
4. When walking down the stairs, I often take two steps at a time
5. I think I would do well in a basketball team
6. As a police officer, I would not make much of an impression
7. In most cars I sit uncomfortably
8. I literally look up to most of my friends
9. Etc.
Test of Body Height [figure: example response patterns of three respondents, Ann, Jim, and Jo, on the items of the test]
Item Response Curve (Rasch model) [figure: probability of a correct response plotted against the latent ability scale]
Item Response Function [figure: probability of success plotted against ability, with discrimination, difficulty, and guessing parameters]
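A minimal sketch of the three-parameter logistic item response function suggested by the figure above; the names `a`, `b`, and `c` for discrimination, difficulty, and guessing are conventional and not taken from the slides.

```python
import numpy as np

def irf_3pl(theta, a, b, c):
    """Three-parameter logistic item response function.

    theta : latent ability
    a     : discrimination (slope at the inflection point)
    b     : difficulty (location on the ability scale)
    c     : guessing parameter (lower asymptote)
    """
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

# Probability of success over a range of abilities for one item.
abilities = np.linspace(-3, 3, 7)
print(irf_3pl(abilities, a=1.2, b=0.0, c=0.2))
```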
Applications
• Local reliability and optimal test construction
• Test equating
• Multilevel item response theory in school effectiveness research
Item and Test Information
• Information is a local measure of reliability
• Item and test information functions
• In adaptive testing, items are selected to maximize information at the estimated ability of the examinee
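For reference, a standard form of the item and test information functions, in conventional IRT notation (these exact formulas are not preserved in the scraped slides):

```latex
% Item information for item i with response function P_i(theta),
% and test information as the sum over the items in the test.
I_i(\theta) = \frac{\bigl[P_i'(\theta)\bigr]^2}{P_i(\theta)\,\bigl[1 - P_i(\theta)\bigr]},
\qquad
I(\theta) = \sum_{i \in \mathrm{test}} I_i(\theta).
```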
Adaptive Item Selection [sequence of figures: the item information functions of Item 1, Item 2, and Item 3 are added one by one, building up the test information function over the ability scale; the test information is the sum of the item information functions]
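A minimal Python sketch of this selection rule, assuming a 3PL item bank and a current ability estimate; the function names and toy parameters are illustrative, not taken from the presentation.

```python
import numpy as np

def item_information_3pl(theta, a, b, c):
    """Fisher information of a 3PL item at ability theta (Birnbaum's formula)."""
    p = c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))
    q = 1.0 - p
    return (a ** 2) * (q / p) * ((p - c) / (1.0 - c)) ** 2

def select_next_item(theta_hat, bank, administered):
    """Return the index of the not-yet-administered item with maximum
    information at the current ability estimate."""
    best, best_info = None, -np.inf
    for i, (a, b, c) in enumerate(bank):
        if i in administered:
            continue
        info = item_information_3pl(theta_hat, a, b, c)
        if info > best_info:
            best, best_info = i, info
    return best

# Toy item bank: (discrimination, difficulty, guessing) per item.
bank = [(1.0, -1.0, 0.2), (1.5, 0.0, 0.2), (0.8, 1.0, 0.2)]
print(select_next_item(theta_hat=0.3, bank=bank, administered={0}))
```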
Adaptive Testing with Content Constraints
• Psychometrically optimal adaptive, individualized testing
• Test content specifications
• Psychometrically optimal within content constraints and practical constraints
• A discrete optimization problem
Adaptive Testing with Content Constraints: the Law School Admission Test
• content constraints
• item type constraints
• word count constraints
• answer key constraints
• gender / minority orientation
• clusters of items (testlets)
• some items contain clues to each other
Test Constraints
• Constraints are imposed by linear programming techniques
• For every item i, a binary decision variable is defined (item i is selected for the test or not)
Test Assembly Model
• Objective: maximize the information in the test
• Decision variables: item i is selected for the test or not
• At most 5 items on statistics
• Items 12 and 35 contain clues to each other, so at most one of them may be selected
• Time available is 60 minutes
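A sketch of how these requirements could be written as a 0-1 linear program; the decision variables x_i, the item time estimates t_i, the set of statistics items, and the target ability point θ₀ are illustrative notation, not copied from the slide.

```latex
% 0-1 linear programming formulation of the test assembly model:
% x_i = 1 if item i is selected, 0 otherwise.
\begin{aligned}
\text{maximize}\quad   & \textstyle\sum_{i} I_i(\theta_0)\, x_i              && \text{(information in the test)}\\
\text{subject to}\quad & \textstyle\sum_{i \in S_{\mathrm{stat}}} x_i \le 5  && \text{(at most 5 statistics items)}\\
                       & x_{12} + x_{35} \le 1                               && \text{(items with mutual clues)}\\
                       & \textstyle\sum_{i} t_i\, x_i \le 60                 && \text{(time limit of 60 minutes)}\\
                       & x_i \in \{0,1\} \ \text{for all } i                 && \text{(item selected or not)}
\end{aligned}
```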
Equating of Examinations
• Problem: the level of students and the difficulty of examinations fluctuate over the years
• Objective: to determine pass/fail cut-off scores on examinations in such a way that they reflect the same level of proficiency on the latent scale,
  • taking into account the difficulty level of the examinations
  • and differences in proficiency level over the years
Simple Deterministic Model
• Important feature of the model: parameter separation (distinct parameters for persons and items)
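The formula on this slide is not preserved. As a hedged sketch of one plausible reading: in a deterministic model with parameter separation, person j answers item i correctly exactly when the person parameter exceeds the item parameter, and the Rasch model shown earlier is its probabilistic counterpart.

```latex
% Deterministic model: correct response iff theta_j > b_i.
X_{ij} = \begin{cases} 1 & \text{if } \theta_j > b_i,\\ 0 & \text{otherwise;} \end{cases}
\qquad
% Probabilistic (Rasch) counterpart with the same parameter separation.
P(X_{ij} = 1) = \frac{\exp(\theta_j - b_i)}{1 + \exp(\theta_j - b_i)}.
```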
Model for an Item with 5 Response Categories [figure: category response curves, i.e. the probability of each response category X = 0, 1, 2, 3, 4 plotted against the latent ability scale]
Multidimensional IRT model
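The formula for this slide is not preserved; a common form of a two-parameter multidimensional IRT model, as a sketch in conventional notation (vector of abilities θ_j, item discriminations a_i, intercept d_i):

```latex
% Multidimensional 2PL: the probability of a correct response depends on a
% weighted combination of several latent abilities.
P(X_{ij} = 1 \mid \boldsymbol{\theta}_j) =
\frac{\exp\!\bigl(\mathbf{a}_i^{\top}\boldsymbol{\theta}_j + d_i\bigr)}
     {1 + \exp\!\bigl(\mathbf{a}_i^{\top}\boldsymbol{\theta}_j + d_i\bigr)}.
```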
Problems with the Anchor Item Design
• Student ability increases between test administrations due to learning
• Differences in ability and item ordering between the anchor test and the examination due to low motivation of students
• If the anchor test becomes known, the test functions differently over the years
• All these effects violate the model and bias the estimated cut-off scores
Measurement Model: GPCM
• Generalized Partial Credit Model (Muraki)
• Alternatives to the GPCM:
  • Graded Response Model (Samejima)
  • Sequential Model (Tutz)
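For reference, a standard statement of the GPCM category probabilities; the notation (discrimination a_i, step parameters b_ik) is conventional rather than copied from the slides.

```latex
% GPCM: probability that person j scores in category x (x = 0, ..., m_i) on
% item i; the empty sum for x = 0 is taken to be zero.
P(X_{ij} = x \mid \theta_j) =
\frac{\exp\!\Bigl(\sum_{k=1}^{x} a_i(\theta_j - b_{ik})\Bigr)}
     {\sum_{h=0}^{m_i} \exp\!\Bigl(\sum_{k=1}^{h} a_i(\theta_j - b_{ik})\Bigr)}.
```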
Structural Model
• Takane and de Leeuw (1987): the model is equivalent to a factor analysis model
  • Discrimination parameters are factor loadings
  • Ability parameters are factor scores
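A sketch of the usual underlying-variable argument behind this equivalence (notation is illustrative): the observed item response is a dichotomized latent continuous response that follows a factor model.

```latex
% Latent continuous response Z_ij with loadings a_i and factor scores theta_j;
% dichotomizing it at threshold b_i yields the normal-ogive IRT model.
Z_{ij} = \mathbf{a}_i^{\top}\boldsymbol{\theta}_j + \varepsilon_{ij},
\quad \varepsilon_{ij} \sim N(0,1),
\quad X_{ij} = \begin{cases} 1 & \text{if } Z_{ij} > b_i,\\ 0 & \text{otherwise,} \end{cases}
\quad\Rightarrow\quad
P(X_{ij} = 1 \mid \boldsymbol{\theta}_j) = \Phi\!\bigl(\mathbf{a}_i^{\top}\boldsymbol{\theta}_j - b_i\bigr).
```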
Problems with “Ordinary” Regression and Analysis of Variance Models
• Different aggregation levels: school level and student level
• Variance structure: students within schools are more similar than students from different schools
• Old, unsatisfactory solutions:
  • aggregating to the school level
  • disaggregating to the student level
• Newer solution: multilevel models (Bryk & Raudenbush, Longford, Goldstein)
Motivation for This Approach: All the Niceties of IRT Are Available in Multilevel Analysis
• A method to model unreliability in the dependent and independent variables
• Heteroscedasticity: reliability is defined locally
• Incomplete test administration and calibration designs (with the possibility to include selection models)
• No assumption of normally distributed scores
• Fewer ceiling problems
An Example (Shalabi, Fox, Glas, & Bosker)
• 3384 grade-seven pupils in 119 schools in the West Bank
• Mathematics test
• Gender
• SES
• IQ
• School leadership
• School climate
Model and Intra-Class Correlation [slide formulas not preserved; see the sketch below]
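A sketch of the conventional two-level formulation and the intra-class correlation it implies; in the multilevel IRT setting the dependent variable is the latent ability from the measurement model rather than an observed score, and the exact parameterization on the original slide is not preserved.

```latex
% Two-level model for student i in school j: school-level random effect u_j
% and student-level residual e_ij; rho is the proportion of variance at the
% school level (the intra-class correlation).
y_{ij} = \beta_0 + \boldsymbol{\beta}^{\top}\mathbf{x}_{ij} + u_j + e_{ij},
\qquad u_j \sim N(0,\tau^2), \qquad e_{ij} \sim N(0,\sigma^2),
\qquad
\rho = \frac{\tau^2}{\tau^2 + \sigma^2}.
```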
Conclusions
• IRT is based on the idea of parameter separation
• An IRT measurement model can be combined with a structural model
• The combined model is equivalent to factor analysis and latent variable models, and as such is a generalization of other well-known regression models
• Applications of IRT:
  • Local reliability and optimal test construction
  • Test equating
  • Multilevel IRT in school effectiveness research