A Practitioner’s Introduction to Equating with Primers on Classical Test Theory (CTT) and Item Response Theory (IRT) Joseph Ryan, Arizona State University Frank Brockmann, Center Point Assessment Solutions Workshop: Assessment, Research and Evaluation Colloquium Neag School of Education, University of Connecticut October 22, 2010
Acknowledgments • Council of Chief State School Officers (CCSSO) • Technical Issues in Large Scale Assessment (TILSA) and Subcommittee on Equating, part of the State Collaborative on Assessment and Student Standards (SCASS) • Doug Rindone and Duncan MacQuarrie, CCSSO TILSA Co-Advisers • Phoebe Winter, Consultant • Michael Muenks, TILSA Equating Subcommittee Chair • Technical Special Interest Group of National Assessment of Educational Progress (NAEP) coordinators • Hariharan Swaminathan, University of Connecticut • Special thanks to Michael Kolen, University of Iowa
Workshop Topics The workshop covers the following topics: • Overview - Key concepts of assessment, linking, and equating • Measurement Primer - Classical Test Theory and Item Response Theory • Equating Basics • The Mechanics of Equating • Equating Issues
1. Overview Key Concepts in Assessment, Linking, Equating
Assessment, Linking, and Equating Validity is… … an integrated evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores or other modes of assessment. (Messick, 1989, p. 13) Validity is the essential motivation for developing and evaluating appropriate linking and equating procedures.
Linking and Equating • Equating • Scale aligning • Predicting/Projecting (Holland, in Dorans, Pommerich, and Holland, 2007)
Misconceptions About Equating Equating is… • …a threat to measuring gains. (MYTH) • …a tool for universal applications. (WISHFUL THOUGHT) • …a repair shop. (MISCONCEPTION) • …a semantic misappropriation. (MISUNDERSTANDING)
2. Measurement Primer Classical Test Theory (CTT) Item Response Theory (IRT)
Classical Test Theory The Basic Model: O = T + E (with some MAJOR assumptions), where O is the observed score, T is the true score, and E is the error score • Reliability is derived from the ratio of true score variance to observed score variance • Key item features include: • Difficulty • Discrimination • Distractor Analysis
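In symbols, a minimal sketch of the model and the reliability coefficient it implies, assuming true scores and errors are uncorrelated:

$$O = T + E, \qquad \sigma_O^2 = \sigma_T^2 + \sigma_E^2, \qquad \rho_{XX'} = \frac{\sigma_T^2}{\sigma_O^2} = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2}$$

Reliability is thus the proportion of observed score variance attributable to true scores; it approaches 1 as error variance shrinks.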
Classical Test Theory Reliability reflects the consistency of students' scores • Over time, test-retest • Over forms, alternate form • Within forms, internal consistency Validity reflects the degree to which scores assess what the test is designed to measure in terms of • Content • Criterion-related measures • Construct
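As a concrete illustration of internal consistency, here is a minimal sketch of coefficient alpha computed from an examinee-by-item score matrix; the function name and sample data are hypothetical:

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Coefficient alpha for an (examinees x items) score matrix."""
    k = scores.shape[1]                         # number of items
    item_vars = scores.var(axis=0, ddof=1)      # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Example: 5 examinees by 4 dichotomously scored items
scores = np.array([[1, 1, 1, 0],
                   [1, 0, 1, 1],
                   [0, 0, 1, 0],
                   [1, 1, 1, 1],
                   [0, 0, 0, 0]])
print(f"alpha = {cronbach_alpha(scores):.3f}")
```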
Item Response Theory (IRT) The Concept • An approach to item and test analysis that estimates students’ probable responses to test questions, based on • the ability of the students • one or more characteristics of the test items
Item Response Theory (IRT) • IRT is now used in most large-scale assessment programs • IRT models apply to items that use • dichotomous scoring with right (1) or wrong (0) answers and • polytomous scoring with items scored with ordered categories (1, 2, 3, 4) common with written essays and open-ended constructed response items • IRT is used in addition to procedures from CTT INFO
Item Response Theory (IRT) IRT Models • All IRT models reflect the ability of students. In addition, the most common basic IRT models include: • The 1-parameter model (aka the Rasch model) – models item difficulty • The 2-parameter model – models item difficulty and discrimination • The 3-parameter model – models item difficulty, discrimination, and pseudo-guessing
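As a sketch of how these models nest, the 3-parameter logistic model gives the probability that a student with ability $\theta$ answers item $i$ correctly as

$$P_i(\theta) = c_i + (1 - c_i)\,\frac{1}{1 + e^{-a_i(\theta - b_i)}}$$

where $b_i$ is difficulty, $a_i$ is discrimination, and $c_i$ is the pseudo-guessing lower asymptote. Setting $c_i = 0$ yields the 2-parameter model; additionally fixing $a_i$ at a common value yields the 1-parameter (Rasch) model.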
Item Response Theory (IRT) IRT Assumptions • Item Response Theory requires major assumptions: • Unidimensionality • Item Independence • Data-Model Fit • Fixed but arbitrary scale origin
Item Response Theory (IRT) A Simple Conceptualization [Figure: an ability scale divided into BASIC, PROFICIENT, and ADVANCED performance levels, with cut scores at -1.5 and +2.25]
Item Response Theory (IRT) Probability of a Student Answer
Item Response Theory (IRT) Item Characteristic Curve for Item 2
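A minimal sketch of an item characteristic curve like the one pictured, evaluated at a few ability levels; the item parameters here are purely illustrative:

```python
import numpy as np

def icc_3pl(theta: float, a: float = 1.2, b: float = 0.5, c: float = 0.2) -> float:
    """3PL item characteristic curve: probability of a correct
    response at ability theta (parameters are illustrative)."""
    return c + (1 - c) / (1 + np.exp(-a * (theta - b)))

# Probability of success for students of low, middle, and high ability
for theta in (-1.5, 0.0, 2.25):
    print(f"theta = {theta:+.2f}  ->  P(correct) = {icc_3pl(theta):.2f}")
```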
IRT and Flexibility IRT provides considerable flexibility in terms of • constructing alternate test forms • administering tests well matched or adapted to students’ ability level • building sets of connected tests that span a wide range (perhaps two or more grades) • inserting or embedding new items into existing test forms for field testing purposes so new items can be placed on the measurement scale INFO
3. Equating Basics Basic Terms (Sets 1, 2, and 3) Equating Designs (a, b, c) Item Banking (a, b, c, d)
Basic Terms Set 1 Match each term in Column A with a definition in Column B:
Column A              Column B
__ Anchor Items       A. Sleepwear
__ Appended Items     B. Nautically themed apparel
__ Embedded Items     C. Vestigial organs
                      D. EMIP learning module
USEFUL TERMS
Basic Terms Set 2 For each term, make some notes on your handout: Pre-equating - Post equating - USEFUL TERMS
Basic Terms Set 3 For each term, make some notes on your handout: Horizontal Equating – Vertical Equating (Vertical Scaling) – Form-to-Form (Chained) Equating – Item Banking – USEFUL TERMS
Equating Designs Random Equivalent Groups Single Group Anchor Items
Equating Designs a. Random Equivalent Groups
Equating Designs b. Single Group The potential for order effects is significant; equating designs that use this data collection method should always be counterbalanced! CAUTION
Equating Designs b. Single Group with Counterbalance
Equating Designs c. Anchor Item Design (anchor items are not always placed at the end of the form)
Equating Designs c. Anchor Item Set
Equating Designs c. Anchor Item Designs • Internal/Embedded • Internal/Appended • External USEFUL TERMS
Equating Designs Internal Embedded Anchor Items
Equating Designs Internal Appended Anchor Items
Equating Designs External Anchor Items
Equating Designs Guidelines for Anchor Items • Mini-Test • Similar Location • No Alterations • Item Format Representation RULES of THUMB
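One way these guidelines are checked in practice is to screen anchor items for difficulty drift between administrations. A minimal sketch, using hypothetical data and an illustrative flagging threshold:

```python
import numpy as np

# Proportion-correct (p-values) for the same anchor items on the old
# and new administrations; all values are hypothetical
p_old = np.array([0.72, 0.55, 0.80, 0.43, 0.66])
p_new = np.array([0.70, 0.52, 0.81, 0.33, 0.69])

r = np.corrcoef(p_old, p_new)[0, 1]
print(f"Anchor p-value correlation: {r:.3f}")

# Items whose difficulty shifted beyond the (illustrative) threshold
# are candidates for removal from the anchor set
flagged = np.where(np.abs(p_old - p_new) > 0.05)[0]
print("Flagged anchor items (0-indexed):", flagged)
```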
3. Equating Basics Basic Terms (Sets 1, 2, and 3) Equating Designs (a, b, c) Item Banking (a, b, c, d)
Item Banking Basic Concepts Anchor-item Based Field Test Matrix Sampling Spiraling Forms
Item Banking a. Basic Concepts • An item bank is a large collection of calibrated and scaled test items representing the full range, depth, and detail of the content standards • Item Bank development is supported by field testing a large number of items, often with one or more anchor item sets. • Item banks are designed to provide a pool of items from which equivalent test forms can be built. • Pre-equated forms are based on a large and stable item bank.
Item Banking b. Anchor Item Based Field Test Design Field test items are most appropriately embedded within, not appended to, the common items. RULE of THUMB
Item Banking c. Matrix Sampling • Items can be assembled into relatively small blocks (or sets) of items. • A small number of blocks can be assigned to each test form to reduce test length. • Blocks may be assigned to multiple forms to enhance equating. • Blocks need not be assigned to multiple forms if randomly equivalent groups are used.
Item Banking c. Matrix Sampling
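A minimal sketch of one possible matrix-sampling layout, with hypothetical block labels: every pair of blocks shares a form, so all forms are linked for equating:

```python
from itertools import combinations

# Four hypothetical item blocks, two blocks per form, arranged so
# every pair of blocks appears together on exactly one form
blocks = ["B1", "B2", "B3", "B4"]
for i, (first, second) in enumerate(combinations(blocks, 2), start=1):
    print(f"Form {i}: {first} + {second}")
```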
Item Banking d. Spiraling Forms • Test forms can be assigned to individual students, or to students grouped in classrooms, schools, districts, or some other units. • “Spiraling” at the student level involves assigning different forms to different students within a classroom. • “Spiraling” at the classroom level involves assigning different forms to different classrooms within a school. • “Spiraling” at the school or district level follows a similar pattern.
Item Banking d. Spiraling Forms
Item Banking d. Spiraling Forms • Spiraling at the student level is technically desirable: • provides randomly equivalent groups • minimizes classroom effects on IRT estimates (most IRT procedures assume independent responses) • Spiraling at the student level is logistically problematic: • exposes all items in one location • requires careful monitoring of test packets and distribution • requires matching test form to answer key at the student level
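A minimal sketch of student-level spiraling, rotating forms through a classroom roster; the form and student names are hypothetical:

```python
from itertools import cycle

# Rotate forms A, B, C through the students in one classroom so that
# adjacent students receive different forms
forms = cycle(["Form A", "Form B", "Form C"])
roster = [f"Student {i}" for i in range(1, 8)]
for student in roster:
    print(f"{student} -> {next(forms)}")
```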
It’s Never Simple! Linking and equating procedures are employed in the broader context of educational measurement, which includes, at least, the following sources of random variation (statistical error variance) or imprecision: • Content and process representation • Errors of measurement • Sampling errors • Violations of assumptions • Parameter estimation variance • Equating estimation variance IMPORTANT CAUTION
4. The Mechanics of Equating The Linking-Equating Continuum Classical Test Theory (CTT) Approaches Item Response Theory (IRT) Approaches
The Linking-Equating Continuum • Linking is the broadest term, referring to a collection of procedures through which performance on one assessment is associated or paired with performance on a second assessment. • Equating is the strongest claim made about the relationship between performance on two assessments and asserts that the scores that are equated have the same substantive meaning. USEFUL TERMS
The Linking-Equating Continuum [Figure: a continuum running from weaker forms of linking up to equating, the strongest kind of linking]
The Linking-Equating Continuum Frameworks • There are a number of frameworks for describing various forms of linking: • Mislevy, 1992 • Linn, 1993 • Holland, 2007 • (in Dorans, Pommerich, and Holland 2007)
The Linking-Equating Continuum In 1992, Mislevy described four types of linking between test forms: moderation, projection, calibration, and equating (Mislevy, 1992, pp. 21-26). In his framework, moderation is the weakest form of linking, while equating is the strongest. Equating is done to make scores as interchangeable as possible.
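As a concrete illustration of the equating end of the continuum, here is a minimal sketch of classical linear equating under a random-groups design; the score distributions are simulated, not from any real administration:

```python
import numpy as np

def linear_equate(y: float, x_scores: np.ndarray, y_scores: np.ndarray) -> float:
    """Map a Form Y raw score onto the Form X scale by matching
    standardized scores (CTT linear equating)."""
    mu_x, sd_x = np.mean(x_scores), np.std(x_scores, ddof=1)
    mu_y, sd_y = np.mean(y_scores), np.std(y_scores, ddof=1)
    return mu_x + (sd_x / sd_y) * (y - mu_y)

# Simulated raw scores from two randomly equivalent groups
rng = np.random.default_rng(2010)
x_scores = rng.normal(32, 5, size=2000)  # group taking Form X
y_scores = rng.normal(30, 6, size=2000)  # group taking Form Y
print(f"Form Y raw score 30 maps to about "
      f"{linear_equate(30, x_scores, y_scores):.1f} on the Form X scale")
```

Under this design, a score one standard deviation above the Form Y mean is mapped to the score one standard deviation above the Form X mean.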