Improving the Accessibility of Large-scale Reading Assessments for Students with Disabilities
National Accessible Reading Assessment Projects (NARAP) Martha Thurlow National Center on Educational Outcomes
NARAP Overview • Partnership for Accessible Reading Assessments (PARA) • National Center on Educational Outcomes • CRESST • Westat • Designing Accessible Reading Assessments (DARA) • Educational Testing Service (ETS)
Goals for Project • Develop a definition of reading proficiency • Research the assessment of reading proficiency • Develop research-based principles and guidelines for making large-scale reading assessments more accessible for students who have disabilities that affect reading • Develop and field-trial a prototype reading assessment
Timeline • Years 1-3: Goal 1 (Definition) • Years 4-5: Goal 3 (Principles)
Definition NARAP gathered relevant information toward developing a definition of reading proficiency by (a) reviewing existing definitions, (b) convening a panel of experts to provide input, and (c) conducting focus groups. NARAP also adopted as a foundational definition the definition from NCLB’s Reading First provisions.
Principles and Issues Principle 1: Definitions of reading proficiency must be consistent with core NCLB provisions. Principle 2: Reading proficiency must be defined in such a way that flexible expressions of reading are allowed while preserving the essential nature of reading. This is essential as we seek to make assessments accessible to students with a variety of disabilities.
Principles and Issues Principle 3: Definitions of reading proficiency must reflect both comprehension and foundational skills.
Three Major Research Question Areas • What characteristics of students require more accessible assessment of reading? • What characteristics of current assessment practices hinder accessibility? • What characteristics would accessible assessment have?
Current Assessment Characteristics – Exploration • State documents – Reviews of state standards, test specifications, accommodations policies, test forms. • State data - Item response analyses; DIF and distractor analysis. • Cognitive labs – Think aloud studies to see how students deal with questionable assessment materials and practices.
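As a hedged illustration of the distractor-analysis step above, the following Python sketch compares how two groups of students distribute their answers across the options of a single multiple-choice item. The data, column names ("group", "item_7"), and response codes are invented for illustration; they are not from the NARAP data.

```python
# Minimal distractor-analysis sketch: compare how two groups of students
# spread their responses across the options of one multiple-choice item.
# All data, column names ("group", "item_7"), and codes are illustrative.
import pandas as pd

responses = pd.DataFrame({
    "group":  ["SD"] * 6 + ["non-SD"] * 6,
    "item_7": ["A", "B", "C", "D", "B", "A",     # key = "B"
               "B", "B", "C", "B", "A", "B"],
})

# Proportion of each group choosing each option; a distractor drawing a
# much larger share in one group is a flag for follow-up review
dist = (responses.groupby("group")["item_7"]
        .value_counts(normalize=True)
        .unstack(fill_value=0.0))
print(dist.round(2))
```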
Accessible Assessment Characteristics - Exploration • Ideas for accessible assessment – from potentially accessible instructional practices through IEP-recommended practices, CRESST OTL instruments, classroom observations. • Solicit nominations for accessible assessment practices through surveys of selected groups (seeking help here). • Expert review via focus groups, MACB, Delphi to select accessible assessment practices based on likelihood of effectiveness (improve performance), validity (retain reading construct integrity), feasibility (practical to implement).
Designing Accessible Reading Assessments (DARA) Cara Cahalan-Laitusis Educational Testing Service
DARA Project Focus • Students with Reading-based Learning Disabilities • Component Approach to Reading • Isolate components by subtests and provide information on overall proficiency and by component • Components based on NRP definition of reading (fluency, decoding, vocabulary, and comprehension)
DARA Staff • ETS • John Sabatini, goal 1 construct definition • Cara Cahalan-Laitusis, goal 2 research • Linda Cook, goal 3 guidelines • Jennifer Dean, goal 4 test development • Laurie Cutting, Kennedy Krieger Institute • Jerry Tindal, University of Oregon • Westat • Council for Exceptional Children
DARA Project Research • Psychometric Properties of Current State ELA Assessments for students with and without learning disabilities • Differential Boost from Read Aloud on Reading test • Cognitive Labs to explore performance of current and proposed item types
Psychometric Research • Differential Item Functioning (DIF) on State ELA Assessments • Factor Analysis on State ELA Assessments
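To make the factor-analysis step concrete, here is a minimal sketch using scikit-learn. The item-score matrix is random placeholder data, and the one- versus two-factor fit comparison is only one simple way to probe dimensionality; it is not the project's actual procedure.

```python
# Exploratory factor-analysis sketch for an item-score matrix:
# rows = students, columns = scored items. Comparing model fit for one
# vs. two factors is a crude check on unidimensionality.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
item_scores = rng.integers(0, 2, size=(500, 20)).astype(float)  # placeholder

for k in (1, 2):
    fa = FactorAnalysis(n_components=k, random_state=0).fit(item_scores)
    print(f"{k}-factor mean log-likelihood: {fa.score(item_scores):.3f}")
```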
Data Collected • 2 Reading Comprehension Tests • Extra Time • Extra Time with Read Aloud via CD • 2 Fluency Measures • 2 Decoding Measures (4th grade only) • Student Survey • Teacher Survey
Sample • 1170 4th Graders • 522 Students with RLD • 648 Students without a disability • 855 8th Graders • 394 Students with RLD • 461 Students without a disability
Preliminary Findings • Differential boost at both 4th and 8th grades (i.e., students with LD had significantly greater score gains from read aloud than non-LD students) • Teachers' ratings of students' reading ability are a good predictor of score gains from read aloud
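One hedged way to read "differential boost": each student has a score with extra time alone and with the read-aloud CD, and the per-student gains are compared across groups. The sketch below tests that gap with Welch's t-test on simulated placeholder gains; the project's actual analysis may well differ.

```python
# Differential-boost sketch: compare per-student score gains
# (read-aloud score minus extra-time-only score) between LD and
# non-LD students. Gains below are simulated placeholders.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)
gain_ld    = rng.normal(4.0, 3.0, 400)   # simulated LD gains
gain_nonld = rng.normal(1.0, 3.0, 500)   # simulated non-LD gains

t, p = ttest_ind(gain_ld, gain_nonld, equal_var=False)  # Welch's t-test
print(f"mean boost LD: {gain_ld.mean():.2f}, non-LD: {gain_nonld.mean():.2f}")
print(f"t = {t:.2f}, p = {p:.4f}")  # a significant gap => differential boost
```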
Plans for Cognitive Labs • Think Aloud studies with LD and non-LD students to examine how students approach • items shown to have DIF • new item types designed to assess fluency and decoding in a large scale assessment • Families of items with slight variations (e.g., operational item and universally designed item)
Investigating Item Format for Accessible Reading Assessments Improving the Accessibility of Large-Scale Reading Assessment for Students with Disabilities Jamal Abedi CRESST/University of California, Davis
Selected Factors Affecting Accessibility of Reading • Item type • Number of distracters in MC items • Question length • Word or sentence length • Page crowdedness • Font type • Color/background
Item Type • Some types of items may be more challenging for students with disabilities than other types • For example, short and extended constructed-response items may be more linguistically and cognitively demanding • We are exploring the possible impact of item type on the performance of SDs.
Number of distracters • Results of extant data analyses suggested that patterns of response to distracters in MC items differ across SD/non-SD categories • The guessing factor does not affect performance outcomes similarly across SD designations • Students with no apparent disabilities make more educated guesses among the distracters than students who are identified as having disabilities
Number of distracters • Non-SD students unsure about the correct response are more likely to narrow the choices down • SD students unsure about the correct response are more likely to select randomly among the alternatives, with no apparent preference for one alternative over the others
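A simple way to formalize the guessing pattern described above is to compare the distribution of wrong-answer choices between groups with a chi-square test. The counts below are invented to mimic the reported pattern (non-SD students converging on one attractive distracter, SD students spreading nearly uniformly).

```python
# Chi-square sketch: do SD and non-SD students distribute their wrong
# answers differently across one item's distracters? Counts are invented.
from scipy.stats import chi2_contingency

# Rows: non-SD, SD; columns: counts choosing wrong options A, C, D
wrong_choice_counts = [
    [60, 25, 15],   # non-SD: concentrated on one attractive distracter
    [34, 33, 33],   # SD: nearly uniform, consistent with random guessing
]
chi2, p, dof, _ = chi2_contingency(wrong_choice_counts)
print(f"chi2 = {chi2:.2f}, df = {dof}, p = {p:.4f}")
```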
Conclusions and Recommendations • Reducing the number of alternatives in multiple-choice format could reduce the differential performance of items across students’ SD designation. • We are investigating this issue further in a field study.
Question, sentence & word length • The literature has shown question length and word length to affect the performance of ELL students • Research findings on the impact of question length and word length may apply to students with learning disabilities • The impact of question length and word length as a factor affecting the accessibility of assessments for SDs will be investigated
Page crowdedness • Too much information on a page, such as many items or crowded charts, may be more challenging for SDs than for non-SDs • A crowded page may be more confusing for students at the lower end of the performance distribution, particularly those with disabilities
Font and background Studies on font have shown mixed results. For instance, comparisons of serif versus sans serif fonts have produced varying findings. Similarly, font color and background contrast need more systematic investigation.
Concluding Remarks • Analyses of existing data will help identify accessibility issues • Nuisance variables that have differential impact on student performance across SD designation may reduce accessibility level • In a series of field studies, NARAP will identify factors affecting accessibility and provide guidance to promote accessibility
Using Differential Item Functioning to Analyze a State English-language Arts Assessment Linda Cook Educational Testing Service
Overview of Presentation • What is DIF and why are DIF procedures useful • Some issues related to using DIF procedures for students with disabilities • How we have addressed these issues • Results of fourth grade analyses of state English-language arts assessment
Differential Item Functioning (DIF) • DIF refers to a difference in item performance between two comparable groups of test takers • DIF exists if test takers who have the same underlying ability level are not equally likely to get an item correct • Some recent DIF studies • Bielinski, Thurlow, Ysseldyke, Friedebach & Friedebach, 2001 • Bolt, 2004 • Barton & Finch, 2004
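The matched-ability idea in this definition can be illustrated in a few lines: stratify test takers by total score, then compare each group's proportion correct on the item within every stratum. Everything below (data, column names, strata cutoffs) is a simulated placeholder.

```python
# DIF idea in miniature: within each total-score stratum, compare the
# proportion answering one item correctly in the reference vs. focal
# group. All data and names are simulated placeholders.
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n = 1000
df = pd.DataFrame({
    "group": rng.choice(["reference", "focal"], size=n),
    "total_score": rng.integers(10, 40, size=n),
    "item_correct": (rng.random(n) < 0.6).astype(int),   # placeholder item
})
df["stratum"] = pd.cut(df.total_score, bins=[9, 19, 29, 39],
                       labels=["low", "mid", "high"])

p_correct = df.pivot_table(index="stratum", columns="group",
                           values="item_correct", aggfunc="mean",
                           observed=True)
print(p_correct.round(2))  # matched strata with unequal rates suggest DIF
```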
Issues Related to the Use of DIF Procedures for Students with Disabilities • Group characteristics • Definition of group membership • Differences between ability levels of reference and focal groups • The characteristics of the criterion • Unidimensional • Reliable • Same meaning across groups
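Since a reliable matching criterion is one of the requirements just listed, here is a minimal Cronbach's alpha sketch one could use to compare candidate criteria (e.g., total test vs. reading subtest). The item matrix is random placeholder data, so the printed alpha will be near zero; real item sets should score far higher.

```python
# Cronbach's alpha sketch for checking the reliability of a candidate
# matching criterion. `items` is a students x items score matrix;
# the random placeholder data will yield an alpha near zero.
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Rows = students, columns = scored items."""
    k = items.shape[1]
    item_var_sum = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

rng = np.random.default_rng(3)
items = rng.integers(0, 2, size=(800, 42)).astype(float)  # e.g., 42 reading items
print(f"alpha = {cronbach_alpha(items):.2f}")
```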
Addressing Issues in DIF Applications for Students with Disabilities • Study Groups • Used intact groups from state assessment • Used large samples of students • Multiple Procedures • Mantel-Haenszel • Logistic regression • DIF analysis paradigm
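The two procedures named above can be sketched as follows, on simulated data with a built-in uniform DIF effect. The Mantel-Haenszel step uses statsmodels' StratifiedTable to pool 2x2 tables across score strata; the logistic-regression step reads uniform DIF off the group term and non-uniform DIF off the ability-by-group interaction. None of this is the project's actual code.

```python
# Sketch of both DIF procedures named above, on simulated data that
# includes uniform DIF against the focal group. Names are illustrative.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.contingency_tables import StratifiedTable

rng = np.random.default_rng(4)
n = 2000
df = pd.DataFrame({
    "focal": rng.integers(0, 2, n),     # 1 = focal group (e.g., LD)
    "theta": rng.normal(0, 1, n),       # underlying ability
})
logit_p = 1.2 * df.theta - 0.6 * df.focal           # -0.6 = built-in DIF
df["correct"] = (rng.random(n) < 1 / (1 + np.exp(-logit_p))).astype(int)
df["stratum"] = pd.qcut(df.theta, 4, labels=False)  # matching-score strata

# (1) Mantel-Haenszel: one 2x2 (group x correct) table per stratum
tables = [pd.crosstab(g.focal, g.correct).to_numpy()
          for _, g in df.groupby("stratum")]
mh = StratifiedTable(tables)
print(f"MH pooled odds ratio: {mh.oddsratio_pooled:.2f}")  # 1.0 => no DIF

# (2) Logistic regression: group term = uniform DIF,
#     interaction term = non-uniform DIF
fit = smf.logit("correct ~ theta + focal + theta:focal", data=df).fit(disp=0)
print(fit.params)
```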
Addressing Issues in DIF Applications for Students with Disabilities (cont.) • Evaluated three criteria • Total Test • Reading • Writing • Analyzed items in total test
Results of Mantel-Haenszel Analyses • Purpose of the study • To examine the effect of a learning disability and of test accommodations on items on a state standards-based English-language arts assessment • To evaluate the characteristics of three matching criteria
Description of the Tests • Grade 4 ELA test contains reading and writing strands for a total of 75 items • Reading subtest has three strands for a total of 42 items • Writing subtest has three strands for a total of 33 items
Description of the Samples • Four groups of students • Students without disabilities • Students with LD who took the test without an accommodation • Students with LD who took the test with an accommodation defined by 504 plan or IEP • Students with LD who took the test with a read-aloud accommodation
Comparison Groups Used for DIF Analyses • Reference: Without disabilities; Focal: LD, no accommodations • Reference: Without disabilities; Focal: LD, IEP/504 accommodations • Reference: Without disabilities; Focal: LD, read-aloud accommodation • Reference: LD, no accommodations; Focal: LD, IEP/504 accommodations • Reference: LD, no accommodations; Focal: LD, read-aloud accommodation
No. of DIF Items Identified Using Reading Subtest as Criterion
No. of DIF Items Identified Using Writing Subtest as Criterion
Summary of Results • Criterion • Total test score most reliable • Writing subtest score least reliable • Total test may be multidimensional • Total test identifies the fewest DIF items • Reading and writing subtests identify similar numbers of DIF items • Using reading as the criterion identifies mostly writing items as having DIF, and using writing as the criterion identifies mostly reading items as having DIF
Summary of Results (cont.) • When the reference group is students without disabilities, students with disabilities who took the test with accommodations showed more DIF than students with disabilities who took the test without accommodations • Read-aloud accommodations result in increased DIF • DIF is decreased for accommodated groups when the reference group is students with disabilities
Conclusions • Choice of matching criterion affects results • Disability alone results in DIF • Accommodations result in DIF • Read-aloud accommodations result in the most DIF • Accommodations specified in a 504 plan/IEP do not result in DIF when the reference group is students with disabilities who took the test without accommodations