Implications and Extensions of Rasch Measurement
New Rules of Measurement • The Rasch model has introduced several new “rules” of measurement, which are in stark contrast to the old rules.
Rule 1: Standard Errors • Old Rule: • The standard error of measurement applies to all scores in a population. • "if the score distribution approaches normality, and if obtained scores do not extend over the entire possible range, the standard error of measurement is probably uniform at all score levels" (Guilford, 1965, p. 445). • New Rule: • The standard error of measurement varies across persons with different abilities/trait levels.
Implications of Rule 1 • In classical test theory, standard errors of raw scores can lead one to believe that zero and perfect scores are perfectly estimated! • The opposite is the case in Rasch measurement: extreme scores carry the largest standard errors. • In Rasch, each examinee measure has its own standard error, irrespective of who else, if anyone, takes the same test.
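To make Rule 1 concrete, here is a minimal Python sketch of the model standard error for the dichotomous Rasch model, computed as one over the square root of the test information. The 21-item test and its difficulties are hypothetical, and the function names are chosen only for this illustration.

```python
import math

def rasch_prob(theta, b):
    """Probability of a correct/endorsed response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def standard_error(theta, difficulties):
    """Model standard error of a person measure: 1 / sqrt(test information)."""
    info = 0.0
    for b in difficulties:
        p = rasch_prob(theta, b)
        info += p * (1.0 - p)  # each item contributes p * (1 - p)
    return 1.0 / math.sqrt(info)

# Hypothetical 21-item test with difficulties spread evenly from -2 to +2 logits.
difficulties = [-2.0 + 0.2 * i for i in range(21)]

for theta in (-4.0, -2.0, 0.0, 2.0, 4.0):
    print(f"theta = {theta:+.1f}  SE = {standard_error(theta, difficulties):.3f}")
```

The standard error is smallest for persons located near the middle of the item difficulties and grows toward the extremes, which is the opposite of the impression given by a single, uniform standard error.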
Rule 2: Test Length and Reliability • Old Rule: • Longer tests are more reliable • New Rule: • Shorter tests can be more reliable than longer tests. • While a longer test with the same sort of items is more reliable, this does not preclude the possibility that a shorter test with different items could be equally or more reliable.
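A small Python sketch can illustrate Rule 2 in terms of measurement precision for one person. It compares a hypothetical 10-item test targeted at the person with a hypothetical 30-item test that is far too hard. The item difficulties and the person measure are assumptions, and the standard error stands in here for reliability at that trait level.

```python
import math

def item_information(theta, b):
    """Information a dichotomous Rasch item provides at trait level theta."""
    p = 1.0 / (1.0 + math.exp(-(theta - b)))
    return p * (1.0 - p)

def standard_error(theta, difficulties):
    """Standard error of the measure: 1 / sqrt(sum of item information)."""
    return 1.0 / math.sqrt(sum(item_information(theta, b) for b in difficulties))

theta = 0.0                    # hypothetical person measure (logits)
short_targeted = [0.0] * 10    # 10 items located at the person's level
long_off_target = [3.0] * 30   # 30 items that are far too hard for this person

se_short = standard_error(theta, short_targeted)
se_long = standard_error(theta, long_off_target)
print(f"10 targeted items:   SE = {se_short:.3f}")
print(f"30 off-target items: SE = {se_long:.3f}")
```

Under these assumptions the shorter, well-targeted test yields the smaller standard error, because off-target items contribute little information.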
Rule 3: Interchangeable Test Forms • Old Rule: • Comparing scores from different forms of an instrument requires test parallelism. • Test forms must be comparable in item difficulty. • New Rule: • Equating test forms that vary in item difficulty is not only possible, but it results in better estimation of trait levels.
Rule 4: Item Properties • Old Rule: • Unbiased assessment of item properties (i.e., difficulty) requires representative samples from the target population. • New Rule: • Unbiased estimates of item properties may be obtained from unrepresentative samples.
Rule 4 • Bias: incorrect decisions due to poor test-to-sample targeting. • Representative: The sample trait distribution matches the distribution of the population. • In Rasch measurement, unbiased estimates of item difficulty parameters can be obtained regardless of the way in which person measures are distributed.
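One way to see why item difficulty estimates need not depend on the person distribution is the conditional argument sketched below in Python. For two Rasch items, the probability that item 1 rather than item 2 is the one answered correctly, given exactly one correct response, depends only on the two difficulties. The difficulties and trait levels used here are hypothetical.

```python
import math

def rasch_prob(theta, b):
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def prob_item1_given_one_correct(theta, b1, b2):
    """P(item 1 correct and item 2 incorrect | exactly one of the two is correct)."""
    p1, p2 = rasch_prob(theta, b1), rasch_prob(theta, b2)
    only_1 = p1 * (1.0 - p2)
    only_2 = (1.0 - p1) * p2
    return only_1 / (only_1 + only_2)

b1, b2 = -0.5, 1.0  # hypothetical item difficulties in logits

# The conditional probability is the same for low, average, and high trait levels.
for theta in (-3.0, 0.0, 3.0):
    print(f"theta = {theta:+.1f}  P = {prob_item1_given_one_correct(theta, b1, b2):.6f}")

# Closed form that involves only the item difficulties.
closed_form = 1.0 / (1.0 + math.exp(b1 - b2))
print(f"closed form   P = {closed_form:.6f}")
```

Because the person measure cancels out of this conditional probability, the relative difficulty of the two items can be estimated from any group of persons, whatever their trait distribution.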
Rule 5: Meaningful Measures • Old Rule: • Meaningful interpretations of scores are obtained by comparing scores relative to a distribution (standardization sample). • Conversion of scores into T scores, percentiles. • New Rule: • Meaningful interpretations of measures are obtained by comparing the distance of measures to various items. • Item and person maps.
Rule 6: Interval Measurement • Old Rule: • Interval measurement is achieved to the extent that items produce normally distributed scale scores. • New Rule: • Interval measurement is achieved to the extent that the data fit the Rasch model.
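For reference, the dichotomous Rasch model that Rule 6 refers to can be written as follows. The symbols are assumptions of this note rather than notation from the slides: \theta_n is the measure of person n, b_i is the difficulty of item i, and X_{ni} is the scored response.

```latex
P(X_{ni} = 1 \mid \theta_n, b_i) = \frac{\exp(\theta_n - b_i)}{1 + \exp(\theta_n - b_i)},
\qquad
\ln \frac{P(X_{ni} = 1)}{1 - P(X_{ni} = 1)} = \theta_n - b_i
```

The log-odds of success is a simple difference between person measure and item difficulty, so when the data fit the model a one-logit difference has the same meaning anywhere along the scale.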
Summary • The Rasch model, with its new rules of measurement, makes it possible to: • Achieve measurement that is free of the distributional properties of samples of persons and items. • More easily equate different instrument forms. • Analyze an item’s characteristics irrespective of other items or sample characteristics. • Create better and shorter instruments, including: • Computerized adaptive testing
Short vs. Long Instruments • Short instruments: • Floor and ceiling effects • Limited content validity • Lack precision • Long instruments: • Burden on respondent • Redundant information • May lack specificity • Difficult to crosswalk without common items • [Figure: instrument length axis running from Short to Long]
Computer Adaptive Testing • A CAT works much like a trained clinical interviewer: • Selects questions based on the client’s previous responses. • Can cover a broad range of potential problems/diagnoses quickly. • Continues to ask questions until sufficient information for a diagnosis has been obtained.
Benefits of CAT & Item Banking • [Figure labels: CAT; Item Bank; Respondent burden; Tailoring/specificity; Coverage of content domains; Floor and ceiling effects]
Item Banking • [Figure: items for Instruments A, B, and C are combined into an item pool; the pooled items are calibrated with a Rasch/IRT model, based on data collected from a representative sample, to form the item bank]
CAT and the Rasch Model • The Rasch model is ideal as the underlying measurement model for CAT: • Standard errors can be estimated for each respondent independent of other respondents (Rule 1). • Shorter tests can be as reliable as longer tests (Rule 2). • CAT-based measures can be equated regardless of the items administered in each CAT session (Rule 3).
Benefits of CAT • CAT provides a way to obtain precise measures while minimizing respondent burden. • Measures obtained with CAT can be directly compared even though respondents receive different sets of items. • Instruments measuring the same construct can be combined to form a larger item bank.
Benefits of CAT • CAT of course shares the benefits of computer-based testing: • Standardized scoring procedures • Automated data entry • Immediate feedback • Automatic report generation • Greater privacy
CAT Process • Score is calculated and the next best item is selected based on item difficulty. • [Figure: typical pattern of responses, starting at middle difficulty and moving toward increased or decreased difficulty after correct and incorrect responses, with a band of +/- 1 standard error]
Logical Components of CAT • Start Rule • Item Selection • Measure Estimation • Stop Rule(s)
The Start Rule • Used to select first item • What measure is assigned to the respondent prior to selecting the first item? • Can be an arbitrary value (0 on the logit scale) or can be based on previously gathered information.
Item Selection • Several methods available. • Common approach is to select item providing maximum information relative to the current measure. • Can be modified to include other criteria: • Content domains • Items needed for diagnosis
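Item Information • [Figure: information curve for an item with difficulty = 0.5; maximum information occurs at trait level = 0.5 and falls off where the item is too easy or too difficult for the respondent]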
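The information curve for a dichotomous Rasch item can be sketched in a few lines of Python. The item difficulty of 0.5 comes from the slide; the trait levels evaluated are arbitrary choices for illustration.

```python
import math

def item_information(theta, b):
    """Fisher information of a dichotomous Rasch item at theta: p * (1 - p)."""
    p = 1.0 / (1.0 + math.exp(-(theta - b)))
    return p * (1.0 - p)

b = 0.5  # item difficulty from the slide

# Information peaks where theta equals b and falls off symmetrically on both sides.
for theta in (-2.5, -0.5, 0.5, 1.5, 3.5):
    print(f"theta = {theta:+.1f}  information = {item_information(theta, b):.3f}")
```

The peak value is 0.25, reached when the respondent has a 50% chance of a correct response. This is why items located far from a respondent's level add little precision.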
Item Selection • [Figure: three candidate items (Item 1, Item 2, Item 3), with one marked "Select"]
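Under the Rasch model, the item with maximum information at the current measure is the unused item whose difficulty lies closest to that measure. The Python sketch below applies that rule with an optional content-domain filter. The bank layout, item ids, and domain names are hypothetical.

```python
def select_next_item(theta, bank, administered, domain=None):
    """Return the id of the unused item whose difficulty is closest to theta.

    bank maps item id -> (difficulty, content domain). Both this structure and
    the optional domain filter are assumptions made for this sketch.
    """
    candidates = [
        (abs(theta - difficulty), item_id)
        for item_id, (difficulty, item_domain) in bank.items()
        if item_id not in administered and (domain is None or item_domain == domain)
    ]
    if not candidates:
        return None  # nothing left to administer
    return min(candidates)[1]

# Hypothetical calibrated item bank (difficulties in logits).
bank = {
    "item1": (-1.2, "mood"),
    "item2": (0.4, "mood"),
    "item3": (1.5, "sleep"),
    "item4": (0.1, "sleep"),
}

print(select_next_item(0.0, bank, administered=set()))
print(select_next_item(0.0, bank, administered={"item4"}))
print(select_next_item(0.0, bank, administered=set(), domain="mood"))
```

Restricting the candidates before picking the closest item is one simple way to add criteria such as content coverage. Operational CAT systems often use more elaborate balancing or exposure-control schemes.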
Estimating the Measure • Once an item is selected and a response to the item is obtained, the CAT system will re-estimate the respondent’s measure and the standard error of measurement. • As with all Rasch measures, the measure estimated by CAT is on a logit scale ranging from negative to positive infinity.
Estimation Methods • Maximum Likelihood • No distributional assumptions • Cannot estimate measures for zero or perfect scores. • Bayesian • Assumes the latent trait has a given distribution, e.g., a normal distribution • Easier to program • Provides estimates for persons with extreme (zero or perfect) scores. • Measures at the extremes are biased.
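The contrast between the two approaches can be sketched in Python as follows. The maximum likelihood routine uses Newton-Raphson with a step limit. The Bayesian routine is an expected a posteriori (EAP) estimate on a fixed grid with a normal prior. EAP is only one of several Bayesian estimators, and the grid size, prior, and item difficulties here are assumptions made for illustration.

```python
import math

def rasch_prob(theta, b):
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def ml_estimate(responses, difficulties, theta=0.0, max_iter=50, tol=1e-6):
    """Newton-Raphson maximum likelihood estimate; returns (measure, SE)."""
    score = sum(responses)
    if score == 0 or score == len(responses):
        raise ValueError("ML estimate is not finite for zero or perfect scores")
    for _ in range(max_iter):
        probs = [rasch_prob(theta, b) for b in difficulties]
        info = sum(p * (1.0 - p) for p in probs)
        step = (score - sum(probs)) / info
        step = max(-1.0, min(1.0, step))  # limit each step to one logit
        theta += step
        if abs(step) < tol:
            break
    info = sum(p * (1.0 - p) for p in (rasch_prob(theta, b) for b in difficulties))
    return theta, 1.0 / math.sqrt(info)

def eap_estimate(responses, difficulties, prior_mean=0.0, prior_sd=1.0, n_points=81):
    """EAP estimate with a normal prior on a grid; returns (measure, posterior SD)."""
    grid = [prior_mean + prior_sd * (-4.0 + 8.0 * k / (n_points - 1))
            for k in range(n_points)]
    weights = []
    for t in grid:
        weight = math.exp(-0.5 * ((t - prior_mean) / prior_sd) ** 2)  # prior density
        for x, b in zip(responses, difficulties):
            p = rasch_prob(t, b)
            weight *= p if x == 1 else (1.0 - p)  # likelihood of each response
        weights.append(weight)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(grid, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, weights)) / total
    return mean, math.sqrt(var)

difficulties = [-1.0, -0.5, 0.0, 0.5, 1.0]  # hypothetical calibrated difficulties
mixed = [1, 1, 1, 0, 0]
perfect = [1, 1, 1, 1, 1]

print("ML, mixed:   ", ml_estimate(mixed, difficulties))
print("EAP, mixed:  ", eap_estimate(mixed, difficulties))
print("EAP, perfect:", eap_estimate(perfect, difficulties))
```

The EAP estimate stays finite for the perfect score because the prior pulls it toward the prior mean. That same pull is the source of the bias at the extremes noted above.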
Stop Rules • Determines when sufficient information has been collected • Types of Stop Rules • Measurement precision • Number of items administered • Test-taking time • Some combination of the above
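A combined stop rule might be sketched in Python as below. All thresholds are illustrative defaults rather than recommendations, and the minimum-items guard is an extra assumption that does not appear in the list above.

```python
def should_stop(standard_error, items_administered, elapsed_seconds,
                target_se=0.3, max_items=30, max_seconds=1200.0, min_items=5):
    """Return (stop, reason), combining precision, length, and time stop rules.

    Every threshold here is a hypothetical default chosen for illustration.
    """
    if items_administered >= max_items:
        return True, "maximum number of items reached"
    if elapsed_seconds >= max_seconds:
        return True, "time limit reached"
    if items_administered >= min_items and standard_error <= target_se:
        return True, "target precision reached"
    return False, "continue testing"

print(should_stop(standard_error=0.45, items_administered=8, elapsed_seconds=120.0))
print(should_stop(standard_error=0.28, items_administered=12, elapsed_seconds=200.0))
print(should_stop(standard_error=0.50, items_administered=30, elapsed_seconds=400.0))
```

Checking the length and time limits first guarantees the session ends even when the target precision cannot be reached with the items available.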
Are CAT and Paper and Pencil Tests Equivalent? • Numerous studies have documented the equivalence of paper-and-pencil and CAT administration, including: • Equal ability estimates (Bergstrom, 1992) • Equal variances • High correlations (> .90) • CATs provide comparable and in some cases improved construct and predictive validity
How Many Items? • Short Answer: The more, the better. • Not uncommon to have hundreds of items in an item bank. • Number of items will depend on • Stop rule used • Number of constructs or domains being assessed • Measurement range • Purpose of the CAT: to estimate a measure or classify persons into groups
Comments • Even large item banks fail to provide adequate precision over the entire measurement range (though they can come close). • Bank size matters, but so do item quality and the targeting of items to the intended population. • An important question is: • Where along the measurement continuum is precision most critical?
Potential of CAT in Clinical Practice • Reduce respondent burden • Reduce staff resources • Reduce data fragmentation • Streamline complex assessment procedures
Limitations of CAT • Expensive to develop and maintain • Reviewing/changing answers to previous items is usually not allowed, and when allowed can complicate CAT procedures.