Introduction to IRT/Rasch Measurement with Winsteps. Ken Conrad, University of Illinois at Chicago; Barth Riley and Michael Dennis, Chestnut Health Systems
Agenda • 12:30. Ken Conrad: PowerPoint presentation on classical test theory compared to Rasch, including history and an introduction to the Rasch model. • 2:15. Break. • 2:30. Discussion of an application of Rasch analysis in the measurement of posttraumatic stress disorder, with interpretation of Rasch/Winsteps output. • 3:15. Barth Riley: Implications and extensions of Rasch measurement. • 4:15. Break. • 4:30. Mike Dennis: Practical applications of IRT/Rasch in SUD screening and outcome assessment. • 5:15. Open discussion and Q&A. • 5:30. End of workshop.
The Dream of Rulers of Human Functioning • Beyond organ function to human function (WHO, 1947) • E.g., quality of life: need to ask the person • 1970s: physical, social, and mental health issues • Measuring many constructs requires many items: time, money, respondent burden • Today: need for psychometric efficiency without loss of reliability and construct validity
Prevailing Paradigm, Classical Test Theory • CTT: more items for more reliability • Since we seek efficiency (fewer items), items tend to be where most of the people are, around the mean. • Result: redundancy at mid-range, few items at the extremes, ceiling and floor effects • Impossible to measure improvement of those at the ceiling or decline of those at the floor.
How children measure wooden rods (from Piaget) • Classification: separate the rods from the cups, the balls, etc. (nominal) • Seriation: line them up by size (ordinal) • Iteration: develop a unit to know how much bigger (interval) • Standardization: make a rule(r) and a process for determining how many units each rod has • Children know that classification and seriation are not measurement; Stevens, whose typology counts nominal, ordinal, interval, and ratio scales all as levels of measurement, did not.
Improvement: IRT/Rasch measurement and computers • The Rasch measurement model enables construction of a ruler with as many items as we want at any level of the construct. • The computer enables choice of items based on each person’s pattern of responses. • Each test is tailored to the individual, and not all of the items are needed.
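As a rough illustration of the two ideas on this slide, the sketch below uses the standard dichotomous Rasch model, in which the probability of a correct (or endorsed) response depends only on the difference between person ability and item difficulty, both in logits. The item difficulties, the ability estimate, and the rule of choosing the unused item nearest the current estimate are illustrative assumptions, not Winsteps output or the presenters' procedure.

```python
import math

def rasch_probability(ability, difficulty):
    """Probability of a correct/endorsed response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def next_item(ability_estimate, item_difficulties, administered):
    """Pick the unused item whose difficulty is closest to the current ability estimate.

    This nearest-difficulty rule is one simple tailoring strategy, assumed here
    for illustration only.
    """
    candidates = [i for i in range(len(item_difficulties)) if i not in administered]
    return min(candidates, key=lambda i: abs(item_difficulties[i] - ability_estimate))

# Hypothetical item bank (difficulties in logits), ordered easy -> hard.
item_difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]

# Hypothetical current estimate of 0.8 logits; item 3 has already been given.
chosen = next_item(0.8, item_difficulties, administered={3})
print("next item index:", chosen)
print("chance of success on it:", round(rasch_probability(0.8, item_difficulties[chosen]), 3))
```

Because each person only sees items near his or her own level, two people can be placed on the same ruler without answering the same items.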
Classical Test Theory • A measure is a sample of items from an infinite domain of items that represent the attribute of interest. • Items are treated as replicates of one another in the sense that differences among the items are ignored in scaling. • More items = more reliability • Everyone gets the same items • Answers are needed to all items
Ranking is sample dependent • E.g., NBA players and jockeys. Height could be rated on the same 1-5 ordinal metric, so that both a jockey and an NBA player could be rated 5, but such a rating can only be interpreted with reference to a particular sample. • With ordinal ranking, the sample defines height. With interval scaling, height defines the sample: over 6’ = NBA, under 6’ = jockey.
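A minimal sketch of the jockey/NBA point, assuming made-up heights in inches and a hypothetical `ordinal_rating` helper that assigns a 1-5 rating from a person's position within a sample. It is meant only to show that the same rating can refer to very different heights.

```python
def ordinal_rating(height, sample, levels=5):
    """Rate a height from 1..levels by its position within the given sample."""
    below = sum(1 for h in sample if h < height)
    return min(levels, 1 + (levels * below) // len(sample))

# Hypothetical heights in inches.
jockeys_in = [60, 62, 63, 64, 66]
nba_in = [75, 78, 80, 82, 86]

tallest_jockey = ordinal_rating(66, jockeys_in)   # rated within the jockey sample
tallest_nba = ordinal_rating(86, nba_in)          # rated within the NBA sample
jockey_among_nba = ordinal_rating(66, nba_in)     # same jockey, different sample

print(tallest_jockey, tallest_nba, jockey_among_nba)
```

The tallest jockey and the tallest NBA player receive the same rating within their own samples, while the jockey's rating changes as soon as the reference sample changes. A height in inches stays the same regardless of who else is measured.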
Classical Test Theory • Treats ordinal data as interval. • The usual argument: using presumably impermissible transformations, i.e., treating ordinal data as interval, usually makes little, if any, difference to the results of most analyses. • Thus, if it behaves like an interval scale, it can be treated as one. • Just use the raw scores. Add ‘em up. • Clean and easy
Assumption: all items are created equal • But we know that is not true. Is that how we measure potatoes? How about spelling? • Items actually range from easy -> hard, like addition -> division. • E.g., Guttman: 1111100000 • Lack of recent practice on item 5: 1111011000 • Educated guess on item 8: 1111100100 • Slow, nervous start: 0111111000
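One simple way to quantify how far each pattern above departs from the ideal Guttman pattern is to count Guttman errors, here defined as pairs in which an easier item is failed and a harder item is passed. The sketch assumes the items are already ordered easy -> hard, as on the slide; the function name is hypothetical.

```python
def guttman_errors(pattern):
    """Count (easier item wrong, harder item right) pairs in an easy->hard pattern."""
    errors = 0
    zeros_seen = 0
    for response in pattern:
        if response == "0":
            zeros_seen += 1
        else:
            # Every earlier (easier) failure paired with this success is one error.
            errors += zeros_seen
    return errors

patterns = {
    "Guttman": "1111100000",
    "Lack of recent practice on item 5": "1111011000",
    "Educated guess on item 8": "1111100100",
    "Slow, nervous start": "0111111000",
}

error_counts = {label: guttman_errors(p) for label, p in patterns.items()}
for label, count in error_counts.items():
    print(f"{label}: {count}")
```

A raw score discards this information entirely; a count like this is one crude way to flag response strings that deserve a closer look.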
No Difficulty Parameter in CTT • What if two students both got 5 out of 10 correct, but one got the 5 easiest right and the other the 5 hardest? • Easy -> hard: Peter 1111100000, Paul 0000011111 • Do they have the same ability? Wouldn’t you like to get a better idea of what happened on Paul’s test? Did he arrive late? Were test pages missing? Maybe they were word problems, and Paul is a foreign student.
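The Peter and Paul comparison can be sketched under the dichotomous Rasch model. The ten item difficulties below are hypothetical, evenly spaced and symmetric around 0 logits, so that a raw score of 5 corresponds to an ability estimate of 0 logits for both students. The raw scores are identical, but the log-likelihood of each response string at that estimate is very different, which is the kind of signal a fit analysis picks up.

```python
import math

def rasch_probability(ability, difficulty):
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def log_likelihood(pattern, ability, difficulties):
    """Log-likelihood of a 0/1 response string given an ability and item difficulties."""
    total = 0.0
    for response, difficulty in zip(pattern, difficulties):
        p = rasch_probability(ability, difficulty)
        total += math.log(p if response == "1" else 1.0 - p)
    return total

# Hypothetical difficulties in logits, easy -> hard: -2.25, -1.75, ..., 2.25.
difficulties = [-2.25 + 0.5 * i for i in range(10)]

peter = "1111100000"
paul = "0000011111"

raw_peter = peter.count("1")
raw_paul = paul.count("1")

# Assumed ability estimate implied by a raw score of 5 on this symmetric test.
ability = 0.0
ll_peter = log_likelihood(peter, ability, difficulties)
ll_paul = log_likelihood(paul, ability, difficulties)

print("raw scores:", raw_peter, raw_paul)
print("log-likelihoods:", round(ll_peter, 2), round(ll_paul, 2))
```

Paul's string is far less probable than Peter's at the same ability, which is a prompt to ask what happened on Paul's test rather than a reason to give him a different score.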
With CTT, it is extremely difficult to compare a person’s scores on two or more different tests; the usual approach is to compare z-scores. • Assumes that samples of both tests center on the same mean. • Assumes that all of the tests are normally distributed, which is rarely the case.
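A small sketch of why z-scores depend on the reference sample. The two score lists are invented; the point is only that the same raw score yields different z-scores when the samples are centered differently.

```python
import statistics

def z_score(score, sample):
    """Standard score of `score` relative to the mean and SD of `sample`."""
    return (score - statistics.mean(sample)) / statistics.pstdev(sample)

# Hypothetical norming samples for the same test.
sample_a = [10, 12, 14, 16, 18]   # lower-scoring group
sample_b = [14, 16, 18, 20, 22]   # higher-scoring group

z_in_a = z_score(18, sample_a)
z_in_b = z_score(18, sample_b)

print(round(z_in_a, 3), round(z_in_b, 3))
```

A score of 18 is well above average in the first sample and exactly average in the second, so a z-score comparison across tests is only as good as the assumption that the samples are equivalent.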
Assumptions of CTT • CTT = take the test, e.g., strongly disagree (SD), disagree (D), agree (A), or strongly agree (SA) on 50 items. What if there is missing data? • CTT uses ordinal scaling but assumes equal intervals in the rating scale. However, we know that the distances between scale points usually are not equal, e.g., “The President is doing a good job.” SD D A SA. To WWII veterans: “Do you wear fashionable shoes?” N SD D A SA • CTT gives us very limited ability to examine the performance of our rating scales. Do they really work the way we want them to?
Cronbach’s Alpha • Adding items improves alpha, but are they good items? • Ceiling and floor effects improve alpha. • CTT assumes homoscedasticity: that the error of measurement is the same at the high end of the scale as in the middle or at the low end. • However, ordinal measures are biased, especially at the extremes, where there is much more error.
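To make the first bullet concrete, the sketch below computes Cronbach's alpha from a small person-by-item table and then uses the standardized-alpha formula to show how alpha rises with the number of items when the average inter-item correlation is held fixed. The response data and the correlation of 0.2 are invented for illustration.

```python
import statistics

def cronbach_alpha(responses):
    """Cronbach's alpha; `responses` is a list of per-person lists, one score per item."""
    k = len(responses[0])
    item_variances = [
        statistics.variance([person[i] for person in responses]) for i in range(k)
    ]
    total_variance = statistics.variance([sum(person) for person in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

def standardized_alpha(k, mean_inter_item_r):
    """Alpha implied by k items with a given average inter-item correlation."""
    return (k * mean_inter_item_r) / (1 + (k - 1) * mean_inter_item_r)

# Hypothetical ratings: 4 persons x 3 items.
responses = [
    [1, 2, 1],
    [2, 2, 3],
    [3, 4, 3],
    [4, 4, 4],
]
alpha_raw = cronbach_alpha(responses)

# Same modest item quality (assumed r = 0.2), different test lengths.
alpha_5 = standardized_alpha(5, 0.2)
alpha_20 = standardized_alpha(20, 0.2)

print(round(alpha_raw, 3), round(alpha_5, 3), round(alpha_20, 3))
```

With the item quality unchanged, quadrupling the number of items raises alpha substantially, which is why a high alpha alone says little about whether the items are good.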
To Count > To Measure • E.g., from counting potatoes to measuring their quality. • From counting number of drinks to measuring substance use disorders. • From summing Likert ratings to linear, interval measurement.
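A first approximation of the move from counts to measures is the log-odds (logit) transformation of a raw score. The sketch below is only that approximation, assuming a 20-point test; it is not what Rasch software such as Winsteps computes, since a full analysis estimates person measures and item difficulties together from the model.

```python
import math

def raw_score_to_logit(raw_score, max_score):
    """Log-odds of the proportion of the maximum score; undefined at 0 and at max_score."""
    if not 0 < raw_score < max_score:
        raise ValueError("extreme scores have no finite logit")
    proportion = raw_score / max_score
    return math.log(proportion / (1.0 - proportion))

max_score = 20  # hypothetical test length

# One raw-score point in the middle of the scale versus one near the top.
middle_step = raw_score_to_logit(11, max_score) - raw_score_to_logit(10, max_score)
extreme_step = raw_score_to_logit(19, max_score) - raw_score_to_logit(18, max_score)

print(round(middle_step, 3), round(extreme_step, 3))
```

One raw-score point near the top of the scale covers a much larger distance in logits than one point in the middle, which is why equal differences in counts are not equal differences in the measured construct.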