A Practitioner’s Introduction to Equating with Primers on Classical Test Theory (CTT) and Item Response Theory (IRT) Joseph Ryan, Arizona State University Frank Brockmann, Center Point Assessment Solutions Workshop: Assessment, Research and Evaluation Colloquium Neag School of Education, University of Connecticut October 22, 2010
Acknowledgments • Council of Chief State School Officers (CCSSO) • Technical Issues in Large Scale Assessment (TILSA) and Subcommittee on Equating, part of the State Collaborative on Assessment and Student Standards (SCASS) • Doug Rindone and Duncan MacQuarrie, CCSSO TILSA Co-Advisers • Phoebe Winter, Consultant • Michael Muenks, TILSA Equating Subcommittee Chair • Technical Special Interest Group of National Assessment of Educational Progress (NAEP) coordinators • Hariharan Swaminathan, University of Connecticut • Special thanks to Michael Kolen, University of Iowa
Workshop Topics The workshop covers the following topics: • Overview - Key concepts of assessment, linking, and equating • Measurement Primer - Classical Test Theory and Item Response Theory • Equating Basics • The Mechanics of Equating • Equating Issues
1. Overview Key Concepts in Assessment, Linking, Equating
Assessment, Linking, and Equating Validity is… … an integrated evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores or other modes of assessment. (Messick, 1989, p. 13) Validity is the essential motivation for developing and evaluating appropriate linking and equating procedures.
Linking and Equating • Equating • Scale aligning • Predicting/Projecting (Holland, in Dorans, Pommerich, and Holland, 2007)
Misconceptions About Equating Equating is… • …a threat to measuring gains. (MYTH) • …a tool for universal applications. (WISHFUL THOUGHT) • …a repair shop. (MISCONCEPTION) • …a semantic misappropriation. (MISUNDERSTANDING)
2. Measurement Primer Classical Test Theory (CTT) Item Response Theory (IRT)
Classical Test Theory The Basic Model: O = T + E (with some MAJOR assumptions), where O is the observed score, T is the true score, and E is the error score • Reliability is derived from the ratio of true score variance to observed score variance • Key item features include: • Difficulty • Discrimination • Distractor Analysis
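In symbols, a minimal sketch of the model and the reliability coefficient it implies, assuming true scores and errors are uncorrelated:

$$O = T + E, \qquad \sigma_O^2 = \sigma_T^2 + \sigma_E^2, \qquad \rho_{XX'} = \frac{\sigma_T^2}{\sigma_O^2} = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2}$$

Reliability is thus the proportion of observed score variance attributable to true scores; it approaches 1 as error variance shrinks.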
Classical Test Theory Reliability reflects the consistency of students' scores • Over time, test-retest • Over forms, alternate form • Within forms, internal consistency Validity reflects the degree to which scores assess what the test is designed to measure in terms of • Content • Criterion-related measures • Construct
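As a concrete illustration of internal consistency, here is a minimal sketch of coefficient alpha computed from an examinee-by-item score matrix; the function name and sample data are hypothetical:

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Coefficient alpha for an (examinees x items) score matrix."""
    k = scores.shape[1]                         # number of items
    item_vars = scores.var(axis=0, ddof=1)      # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Example: 5 examinees by 4 dichotomously scored items
scores = np.array([[1, 1, 1, 0],
                   [1, 0, 1, 1],
                   [0, 0, 1, 0],
                   [1, 1, 1, 1],
                   [0, 0, 0, 0]])
print(f"alpha = {cronbach_alpha(scores):.3f}")
```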
Item Response Theory (IRT) The Concept • An approach to item and test analysis that estimates students’ probable responses to test questions, based on • the ability of the students • one or more characteristics of the test items
Item Response Theory (IRT) • IRT is now used in most large-scale assessment programs • IRT models apply to items that use • dichotomous scoring with right (1) or wrong (0) answers and • polytomous scoring with items scored with ordered categories (1, 2, 3, 4) common with written essays and open-ended constructed response items • IRT is used in addition to procedures from CTT INFO
Item Response Theory (IRT) IRT Models • All IRT models reflect the ability of students. In addition, the most common basic IRT models include: • The 1-parameter model (aka the Rasch model) – models item difficulty • The 2-parameter model – models item difficulty and discrimination • The 3-parameter model – models item difficulty, discrimination, and pseudo-guessing
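As a sketch of how these models nest, the 3-parameter logistic model gives the probability that a student with ability $\theta$ answers item $i$ correctly as

$$P_i(\theta) = c_i + (1 - c_i)\,\frac{1}{1 + e^{-a_i(\theta - b_i)}}$$

where $b_i$ is difficulty, $a_i$ is discrimination, and $c_i$ is the pseudo-guessing lower asymptote. Setting $c_i = 0$ yields the 2-parameter model; additionally fixing $a_i$ at a common value yields the 1-parameter (Rasch) model.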
Item Response Theory (IRT) IRT Assumptions • Item Response Theory requires major assumptions: • Unidimensionality • Item Independence • Data-Model Fit • Fixed but arbitrary scale origin
Item Response Theory (IRT) A Simple Conceptualization [Figure: an ability scale divided into BASIC, PROFICIENT, and ADVANCED performance levels, with cut scores at -1.5 and +2.25]
Item Response Theory (IRT) Probability of a Student Answer
Item Response Theory (IRT) Item Characteristic Curve for Item 2
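A minimal sketch of an item characteristic curve like the one pictured, evaluated at a few ability levels; the item parameters here are purely illustrative:

```python
import numpy as np

def icc_3pl(theta: float, a: float = 1.2, b: float = 0.5, c: float = 0.2) -> float:
    """3PL item characteristic curve: probability of a correct
    response at ability theta (parameters are illustrative)."""
    return c + (1 - c) / (1 + np.exp(-a * (theta - b)))

# Probability of success for students of low, middle, and high ability
for theta in (-1.5, 0.0, 2.25):
    print(f"theta = {theta:+.2f}  ->  P(correct) = {icc_3pl(theta):.2f}")
```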
IRT and Flexibility IRT provides considerable flexibility in terms of • constructing alternate test forms • administering tests well matched or adapted to students’ ability level • building sets of connected tests that span a wide range (perhaps two or more grades) • inserting or embedding new items into existing test forms for field testing purposes so new items can be placed on the measurement scale INFO
3. Equating Basics Basic Terms (Sets 1, 2, and 3) Equating Designs (a, b, c) Item Banking (a, b, c, d)
Basic Terms Set 1 Match each term in Column A with a definition in Column B:
Column A              Column B
__ Anchor Items       A. Sleepwear
__ Appended Items     B. Nautically themed apparel
__ Embedded Items     C. Vestigial organs
                      D. EMIP learning module
USEFUL TERMS
Basic Terms Set 2 For each term, make some notes on your handout: Pre-equating - Post equating - USEFUL TERMS
Basic Terms Set 3 For each term, make some notes on your handout: Horizontal Equating – Vertical Equating (Vertical Scaling) – Form-to-Form (Chained) Equating – Item Banking – USEFUL TERMS
Equating Designs Random Equivalent Groups Single Group Anchor Items
Equating Designs a. Random Equivalent Groups
Equating Designs b. Single Group The potential for order effects is significant; equating designs that use this data collection method should always be counterbalanced! CAUTION
Equating Designs b. Single Group with Counterbalance
Equating Designs c. Anchor Item Design (anchor items are not always placed at the end of the form)
Equating Designs c. Anchor Item Set
Equating Designs c. Anchor Item Designs • Internal/Embedded • Internal/Appended • External USEFUL TERMS
Equating Designs Internal Embedded Anchor Items
Equating Designs Internal Appended Anchor Items
Equating Designs External Anchor Items
Equating Designs Guidelines for Anchor Items • Mini-Test • Similar Location • No Alterations • Item Format Representation RULES of THUMB
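One way these guidelines are checked in practice is to screen anchor items for difficulty drift between administrations. A minimal sketch, using hypothetical data and an illustrative flagging threshold:

```python
import numpy as np

# Proportion-correct (p-values) for the same anchor items on the old
# and new administrations; all values are hypothetical
p_old = np.array([0.72, 0.55, 0.80, 0.43, 0.66])
p_new = np.array([0.70, 0.52, 0.81, 0.33, 0.69])

r = np.corrcoef(p_old, p_new)[0, 1]
print(f"Anchor p-value correlation: {r:.3f}")

# Items whose difficulty shifted beyond the (illustrative) threshold
# are candidates for removal from the anchor set
flagged = np.where(np.abs(p_old - p_new) > 0.05)[0]
print("Flagged anchor items (0-indexed):", flagged)
```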
3. Equating Basics Basic Terms (Sets 1, 2, and 3) Equating Designs (a, b, c) Item Banking (a, b, c, d)
Item Banking Basic Concepts Anchor-item Based Field Test Matrix Sampling Spiraling Forms
Item Banking a. Basic Concepts • An item bank is a large collection of calibrated and scaled test items representing the full range, depth, and detail of the content standards • Item Bank development is supported by field testing a large number of items, often with one or more anchor item sets. • Item banks are designed to provide a pool of items from which equivalent test forms can be built. • Pre-equated forms are based on a large and stable item bank.
Item Banking b. Anchor Item Based Field Test Design Field test items are most appropriately embedded within, not appended to, the common items. RULE of THUMB
Item Banking c. Matrix Sampling • Items can be assembled into relatively small blocks (or sets) of items. • A small number of blocks can be assigned to each test form to reduce test length. • Blocks may be assigned to multiple forms to enhance equating. • Blocks need not be assigned to multiple forms if randomly equivalent groups are used.
Item Banking c. Matrix Sampling
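A minimal sketch of one possible matrix-sampling layout, with hypothetical block labels: every pair of blocks shares a form, so all forms are linked for equating:

```python
from itertools import combinations

# Four hypothetical item blocks, two blocks per form, arranged so
# every pair of blocks appears together on exactly one form
blocks = ["B1", "B2", "B3", "B4"]
for i, (first, second) in enumerate(combinations(blocks, 2), start=1):
    print(f"Form {i}: {first} + {second}")
```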
Item Banking d. Spiraling Forms • Test forms can be assigned to individual students, or to students grouped in classrooms, schools, districts, or some other units. • “Spiraling” at the student level involves assigning different forms to different students within a classroom. • “Spiraling” at the classroom level involves assigning different forms to different classrooms within a school. • “Spiraling” at the school or district level follows a similar pattern.
Item Banking d. Spiraling Forms
Item Banking d. Spiraling Forms • Spiraling at the student level is technically desirable: • provides randomly equivalent groups • minimizes classroom effects on IRT estimates (most IRT procedures assume independent responses) • Spiraling at the student level is logistically problematic: • exposes all items in one location • requires careful monitoring of test packets and distribution • requires matching test form to answer key at the student level
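A minimal sketch of student-level spiraling, rotating forms through a classroom roster; the form and student names are hypothetical:

```python
from itertools import cycle

# Rotate forms A, B, C through the students in one classroom so that
# adjacent students receive different forms
forms = cycle(["Form A", "Form B", "Form C"])
roster = [f"Student {i}" for i in range(1, 8)]
for student in roster:
    print(f"{student} -> {next(forms)}")
```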
It’s Never Simple! Linking and equating procedures are employed in the broader context of educational measurement, which includes, at least, the following sources of random variation (statistical error variance) or imprecision: • Content and process representation • Errors of measurement • Sampling errors • Violations of assumptions • Parameter estimation variance • Equating estimation variance IMPORTANT CAUTION
4. The Mechanics of Equating The Linking-Equating Continuum Classical Test Theory (CTT) Approaches Item Response Theory (IRT) Approaches
The Linking-Equating Continuum • Linking is the broadest term, referring to a collection of procedures through which performance on one assessment is associated or paired with performance on a second assessment. • Equating is the strongest claim made about the relationship between performance on two assessments and asserts that the scores that are equated have the same substantive meaning. USEFUL TERMS
The Linking-Equating Continuum [Figure: a continuum running from weaker forms of linking up to equating, the strongest kind of linking]
The Linking-Equating Continuum Frameworks • There are a number of frameworks for describing various forms of linking: • Mislevy, 1992 • Linn, 1993 • Holland, 2007 • (in Dorans, Pommerich, and Holland 2007)
The Linking-Equating Continuum In 1992, Mislevy described four types of linking between test forms: moderation, projection, calibration, and equating (Mislevy, 1992, pp. 21-26). In his framework, moderation is the weakest form of linking, while equating is the strongest. Equating is done to make scores as interchangeable as possible.
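As a concrete illustration of the equating end of the continuum, here is a minimal sketch of classical linear equating under a random-groups design; the score distributions are simulated, not from any real administration:

```python
import numpy as np

def linear_equate(y: float, x_scores: np.ndarray, y_scores: np.ndarray) -> float:
    """Map a Form Y raw score onto the Form X scale by matching
    standardized scores (CTT linear equating)."""
    mu_x, sd_x = np.mean(x_scores), np.std(x_scores, ddof=1)
    mu_y, sd_y = np.mean(y_scores), np.std(y_scores, ddof=1)
    return mu_x + (sd_x / sd_y) * (y - mu_y)

# Simulated raw scores from two randomly equivalent groups
rng = np.random.default_rng(2010)
x_scores = rng.normal(32, 5, size=2000)  # group taking Form X
y_scores = rng.normal(30, 6, size=2000)  # group taking Form Y
print(f"Form Y raw score 30 maps to about "
      f"{linear_equate(30, x_scores, y_scores):.1f} on the Form X scale")
```

Under this design, a score one standard deviation above the Form Y mean is mapped to the score one standard deviation above the Form X mean.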