CCSSO National Conference on Student Assessment New Orleans, LA June 27, 2014 Strand 11A

Accountability and Students with Disabilities: Assuring Valid Inferences about Teachers and Schools

IES-funded 5-state longitudinal study on student growth • Research on teacher evaluation • National advisory panel member for NCAASE • NCEO research • Research supported by IES-funded National Research and Development Center on Assessment and Accountability for Special Education • Observations from Massachusetts and discussant remarks

Assuring valid inferences about teachers and schools…

Describing and Using Growth from Students with Disabilities on Summative Assessments Heather Buzick Educational Testing Service Princeton, NJ A portion of this research was funded by the Institute of Education Sciences (Award #R324A120224)

Current research

Motivation and importance • Approximately 14% of students have a diagnosed disability • The majority spend most of their instructional time in general education classrooms* • Approximately 80% of teachers have at least one student with a disability in their classroom** • At least 75% of students with disabilities take the general assessment* • Students’ disabilities can have an impact on access to test content, students may require testing accommodations, and teaching and learning may differ from other students • How should students’ test scores be included in accountability systems? *From Historical State-Level IDEA Data Files (http://tadnet.public.tadnet.org/pages/712) **Estimate. Sources available from author

Study 1 Research Questions

Some definitions of growth within individual students • Differences in vertically scaled scores (gains) • E.g., 300 scaled score in grade 3, 320 scaled score in grade 4 • Transitions across proficiency levels • E.g., “basic” in grade 3, “proficient” in grade 4 • Student growth percentiles • E.g., from grade 3 to grade 4, the student grew as much as or more than 70 percent of peers in the state who had similar grade 3 test scores
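The three definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the method used in the research: the `LEVELS` ordering, the function names, and the peer scores are hypothetical. The percentile function is a simple rank among peers with similar prior scores, whereas operational student growth percentiles are typically estimated with quantile regression.

```python
from bisect import bisect_right

# Hypothetical ordering of proficiency levels, for illustration only.
LEVELS = ["below basic", "basic", "proficient", "advanced"]


def gain(prior_score, current_score):
    """Difference in vertically scaled scores across two grades."""
    return current_score - prior_score


def level_transition(prior_level, current_level):
    """Number of proficiency levels moved (positive means upward)."""
    return LEVELS.index(current_level) - LEVELS.index(prior_level)


def simple_growth_percentile(student_current, peer_currents):
    """Percent of academic peers (students with similar prior scores)
    whose current score is at or below this student's current score."""
    ranked = sorted(peer_currents)
    at_or_below = bisect_right(ranked, student_current)
    return 100.0 * at_or_below / len(ranked)


# Made-up grade 4 scores for ten peers who all scored near 300 in grade 3;
# the student of interest (grade 4 score of 320) is included in the list.
peer_grade4_scores = [305, 310, 312, 315, 318, 319, 320, 325, 330, 340]

score_gain = gain(300, 320)
levels_moved = level_transition("basic", "proficient")
growth_pct = simple_growth_percentile(320, peer_grade4_scores)
```

With these made-up numbers the student gains 20 scale points, moves up one proficiency level, and grew as much as or more than 70 percent of peers, which mirrors the examples on the slide.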

Conclusions • The model matters for the inferences we make about schools, teachers, and individual students (sometimes in subtle but important ways) • Policymakers: identify the claims they wish to make • Measurement experts: help policymakers understand the meaning derived from a particular model • How much growth is enough? • Norms based on accumulated growth data from multiple sources • Prediction associated with college- and career-ready standards

Accountability Dilemmas for Students with Disabilities and Policy Alternatives Ann Schulte Arizona State University Natalie Murr North Carolina State University Joseph Stevens University of Oregon

National Center on Assessment & Accountability for Special Education • NCAASE www.ncaase.com • Institute of Education Sciences, 2011-2016 • Co-PIs • Stephen Elliott & Ann Schulte, Arizona State Univ • Joseph Stevens & Gerald Tindal (Project Director), Univ of Oregon This work is supported by the Institute of Education Sciences, U.S. Department of Education, through grant R324C110004 awarded to the University of Oregon. The opinions expressed are those of the authors and do not necessarily represent views of the Institute or the U.S. Department of Education.

NCAASE 2011-2016: Our Key Research Questions • What is the natural developmental progress in achievement for students with disabilities? • What models best characterize achievement growth for students with disabilities who are participating in general achievement tests? • How do various growth models represent school effects for students with and without disabilities, and how do results compare to those derived from the status models now in use? • How do results from different types of interim assessments of students’ achievement meaningfully contribute to a model of academic growth for students with disabilities? • How can information about opportunity to learn and achievement growth be used to enhance academic outcomes for students with disabilities?

Persistent Accountability Dilemmas • Bias introduced by including only current students with disabilities in the students with disabilities (SWD) subgroup (Ysseldyke & Bielinski, 2002) • “One shot” model of assessing proficiency and SWD performance variability: retests to assure assessment fairness (Wei, 2012) • Students start at differing levels, and status measures do not consider student progress relative to starting point, hence the importance of looking at growth to assess school and teacher effects (Dunn & Allen, 2009; Stevens, 2005)

Data Sources for Presentation • North Carolina test data (NCAASE also looking at AZ, OR, PA) • Cross-sectional: 2010 • Allowed retests for non-proficient students, inclusion of students who had exited special education for two years or less • State-level growth metric: residual gain score using two prior years’ test scores, z-score based on mean gain and SD in standard-setting year • Longitudinal: Math 2001-2005 cohort, Reading 2003-2007 cohort
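The slide does not give the formula for the state-level growth metric, so the following is only one plausible reading, offered as a sketch. It assumes that scores are first placed on a z-score scale using reference-year (standard-setting year) means and SDs, and that growth is the current z-score minus a discounted average of the two prior z-scores. The function names, the `norms` values, and the `discount` of 0.9 are all hypothetical and are not the state's actual parameters.

```python
def to_z(score, ref_mean, ref_sd):
    """Standardize a scale score against reference-year mean and SD."""
    return (score - ref_mean) / ref_sd


def residual_gain(current, prior1, prior2, norms, discount=0.9):
    """Current z-score minus a discounted average of two prior z-scores.

    norms maps "current", "prior1", and "prior2" to (mean, sd) tuples
    from the reference year. discount is a hypothetical adjustment for
    regression toward the mean, not an official coefficient.
    """
    z_now = to_z(current, *norms["current"])
    z_prior_avg = (
        to_z(prior1, *norms["prior1"]) + to_z(prior2, *norms["prior2"])
    ) / 2
    return z_now - discount * z_prior_avg


# Made-up reference-year norms for three consecutive grades.
example_norms = {
    "prior2": (340.0, 10.0),
    "prior1": (350.0, 10.0),
    "current": (360.0, 10.0),
}

# A student one SD below the mean in both prior years who is now
# half an SD below the mean has a positive residual gain.
example_growth = residual_gain(355.0, 340.0, 330.0, example_norms)
```

A positive value means the student scored higher than expected given the two prior scores; a negative value means lower than expected.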

Impact of Two Specific Policies • Including students who have exited special education • Allowing retesting for students who do not reach proficiency
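To make the two policies concrete, the sketch below computes a school's SWD percent proficient under each combination of them. The `Student` fields, the function names, and the seven records are hypothetical and invented for illustration; the two-year exit window follows the data description given earlier in the presentation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Student:
    current_swd: bool                 # currently receiving special education
    years_since_exit: Optional[int]   # None if never exited
    proficient_first: bool            # proficient on the first attempt
    proficient_retest: bool = False   # proficient on the retest, if taken


def in_subgroup(s, include_exiters, max_exit_years=2):
    """Current SWD always count; recent exiters count only if allowed."""
    if s.current_swd:
        return True
    return (
        include_exiters
        and s.years_since_exit is not None
        and s.years_since_exit <= max_exit_years
    )


def percent_proficient(students, include_exiters=False, allow_retest=False):
    """Percent of the SWD subgroup counted as proficient under a policy."""
    group = [s for s in students if in_subgroup(s, include_exiters)]
    if not group:
        return None
    n_prof = sum(
        1 for s in group
        if s.proficient_first or (allow_retest and s.proficient_retest)
    )
    return 100.0 * n_prof / len(group)


# Made-up school roster.
school = [
    Student(True, None, False),
    Student(True, None, False, proficient_retest=True),
    Student(True, None, True),
    Student(True, None, False),
    Student(False, 1, True),     # exited one year ago
    Student(False, 3, True),     # exited three years ago: outside the window
    Student(False, None, True),  # never identified: never in the subgroup
]

baseline = percent_proficient(school)
with_exiters = percent_proficient(school, include_exiters=True)
with_retest = percent_proficient(school, allow_retest=True)
with_both = percent_proficient(school, include_exiters=True, allow_retest=True)
```

In this invented roster each policy raises the subgroup's percent proficient without any change in what students know, which is the kind of policy effect on school-level outcomes that the presentation examines.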

Stable Subgroup Membership Matters: Mathematics Achievement Gap

Change in Mean Number of Students Reaching Proficiency

Change in School-level Percent Proficient for SWD w/ Exiters Included

SWDs Reaching Math Proficiency With and Without Retest

SWDs Reaching Reading Proficiency With and Without Retest

Growth vs. Proficiency • What does growth across grades look like for specific exceptionalities? • Relationship between status and growth for students with and without disabilities

Mathematics Growth by Exceptionality

Reading Growth by Exceptionality

Growth by Starting Proficiency Level: Math (General Ed, SWD)

Growth by Starting Proficiency Level: Reading (General Ed, SWD)

Conclusions • SWD subgroup is not stable, and policy changes allowing a longer time to “count” in the subgroup improve school SWD outcomes • Retesting benefits SWDs and may also benefit other groups characterized by large achievement gaps • SWDs show growth in mathematics and reading achievement across grades, although improvement may not be reflected in changes in status (non-proficient/proficient) • Large differences in starting-point achievement skills within SWD group, smaller differences in growth

Accountability and Students with Disabilities: Assuring valid inferences about teachers and schools Jim Ysseldyke

Purposes of Monitoring Student Growth • District/State Accountability • Individual Progress Monitoring/instructional planning • Teacher evaluation (value added)

Typical Accountability Models for SWD • Cross-sectional • Cohort Static • Cohort Dynamic

Typical Scores • Scaled Scores • Proficiency Levels • Effect Sizes • More recently, Student Growth Percentiles (à la Betebenner) or Student Deciles in some of our work
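As an illustration of how an effect size can express the achievement gap between general education and special education students, the sketch below computes a standardized mean difference (Cohen's d with a pooled standard deviation). The slide does not say which effect size the presenters used, so this choice is an assumption, and the scores are made up.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample SD."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Made-up scale scores for two small groups.
general_ed = [350, 355, 360, 365, 370]
special_ed = [340, 345, 350, 355, 360]

gap_effect_size = cohens_d(general_ed, special_ed)
```

Because the difference is divided by the pooled SD, the gap is expressed in standard deviation units rather than scale-score points, which allows comparison across tests and grades that use different scales.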

Main Issues • Reducing achievement gap (GE v SE) • Nobody wants SWD in their accountability profile • How long should SWD count? • What model should be used? • What scores should be used?

Major Points I Heard • Students start at differing levels, so status measures do not consider student progress relative to starting point • Use of cross-sectional models is dangerous • SWD are growing, but many may not meet proficiency standards

Major Points I Heard • We have limited data on growth norms for SWD (small Ns) • Much concern about how long to count SWD (at district or state level).

Achievement Gap Using 3 Analytic Methods over 6 Years

Spring-Fall SGP Growth by Category v National Norms

STAR Reading Growth for Grade 10 SED Students

Spring to Fall SGP for Students with SED in Differing Programs
