Consequential Validity by Suzanne Lane Presentation for CCSSO SCASS-ASR Accountability Systems and Reporting January 20 2005
NCLB Goals • Better, more demanding instruction • Challenging content standards • Same educational opportunities for all students • All students reach same level of achievement
Standards-Based Accountability and Assessment Systems • Goal: improve instruction and student learning • Assessments: grounded in theories and models of student cognition and student learning • Impact: instruction and student learning are improving in meaningful ways
Overview of Presentation • Need for consequential validity evidence • Summary of research on consequential validity • Factors affecting the impact of standards-based reform • Measuring the impact of standards-based reform
Standards and Assessments Peer Review Guidelines • Has the State ascertained that the decisions based on the results of its assessments are consistent with the purposes for which the assessments were designed? • Has the State ascertained whether the assessments produce intended and unintended consequences? (U.S. Department of Education, 2004)
Standards for Educational and Psychological Testing and Consequential Validity • 13.1 When educational testing programs are mandated by school, district, state or other authorities, the ways in which test results are intended to be used should be clearly described. It is the responsibility of those who mandate the use of tests to monitor their impact and to identify and minimize potential negative consequences. Consequences resulting from the uses of the test, both intended and unintended, should also be examined by the test user. Efforts should be made to document the provision of instruction in tested content and skills (AERA, APA, NCME, 1999)
Need for Consequential Validity Evidence • Success of a reform policy is typically evaluated in terms of assessment scores • Assessment scores in accountability systems can increase in the first few years without actual student learning with respect to the larger construct domain (Linn et al., 1990) • Examination of the impact of assessments on instruction and student learning provides validity evidence addressing the effectiveness of standards-based accountability systems • Such studies allow a deeper understanding of whether improved performance on assessments reflects meaningful improvements in student achievement and learning
Success of a reform policy in terms of assessment scores • Compared NAEP results in high-stakes states against the national average for NAEP scores; concluded that states that introduced consequences to their statewide tests did not show any particular gains in their statewide NAEP scores (Amrein & Berliner, 2002) • Using a comparison group, found that states that attached consequences outperformed the comparison group of states at each grade (Rosenshine, 2003; Braun, 2004) • Strong positive associations between gains and accountability index for grade 8 but weaker for grade 4 (Carnoy & Loeb, 2003)
Consequential Validity Research • Many studies have reported a narrowing of the curriculum (KY: Koretz et al., 1996; CT: Chudowsky et al., 1997) • Using interviews and classroom artifacts, the KY and NC assessment systems led to new instructional strategies, but the depth and complexity of the content covered in instruction did not change in any essential way (McDonnell & Choisser, 1997) • Teacher-reported professional activities and attitudes toward the state assessment, especially the scoring rubric, were related to changes in instructional practices (KS: Pomplun, 1997)
• 2/3 of on-grade teachers reported that standards and state assessment extended-response items promoted better instruction and student learning (WA: Stecher et al., 2000) • Teachers used reform-oriented strategies in a meaningful way; mathematics and writing scoring rubrics were used in instruction in a way that reinforced meaningful learning (WA: Borko et al., 2001)
Few studies have examined the relationship between changes in instruction and improved performance on the assessments KIRIS (Stecher et al., 1998) • Few consistent findings across subject areas and grades • Positive relationship between standards-based practices in writing instruction and direct writing assessment at the middle school level • E.g., More 7th grade teachers in high- versus low-gain score schools had reported integration of writing with other subjects and increased emphasis on the writing process
QUASAR (Stein & Lane, 1996) Examined the relationship between the presence of reform features of math instruction and student performance on a mathematics performance assessment
Greatest gains on performance assessment when instruction tasks engaged students in high levels of cognitive processing • Moderate gains when instruction tasks began as cognitively demanding but implemented so students were not engaged in high levels of reasoning and problem solving • Relatively small gains when instruction tasks were procedurally based and able to be solved with a single, easily accessible strategy
Maryland - MSPAP (Lane et al.; Stone & Lane, 2003) • MSPAP Study: the majority of the teachers indicated that MSPAP had a positive impact on their instruction • Teacher-reported reform-oriented instruction-related variables explained performance differences across schools in reading, writing, math, and science, i.e., schools in which teachers reported that their instruction over the years reflected more reform-oriented problem types and learning outcomes similar to those assessed on MSPAP had higher levels of school performance on MSPAP
Teacher-reported reform-oriented instruction-related variables explained differences in MSPAP school performance gains in reading and writing, i.e., increased reported use of reform-oriented tasks in writing and reading and a focus on the reading and writing learning outcomes in instruction was associated with greater rates of change in MSPAP school performance over a 5-year period • Teacher-perceived impact of MSPAP on instructional practices explained differences in MSPAP school performance gains in math and science
MSPAP study examined contextual variables such as SES • Percent free or reduced lunch, which served as a proxy for SES, was significantly related to school performance in all content areas; schools with a higher percent tended to perform more poorly on MSPAP • No significant relationship between percent free or reduced lunch and growth in school-level performance
Factors Affecting the Impact of Standards-Based Reform on Improving Instruction and Student Learning • Quality of Content Standards • Assessments aligned to standards • Variability in defining proficiency • Classification Error
Content Standards • Carefully crafted • Coherent across grade levels and across content areas, reflecting a developmental progression • Accessible to teachers • Teachers need to have a shared understanding of the standards • Cognitive complexity of the standards • Implications for instructional practices
Content Standards Mathematics (NCTM, 2000) • Develop & evaluate mathematical arguments & proofs • Organize mathematical thinking through communication • Create & use representations to organize, record, & communicate mathematical ideas Science (NRC, 1996) Abilities necessary to do scientific inquiry: • Identify questions that can be answered through scientific investigations • Design & conduct a scientific investigation • Use appropriate tools & techniques to gather, analyze & interpret data • Develop descriptions, explanations, predictions, & models using evidence
Alignment of Content Standards and Assessment Depth-of-knowledge consistency between standards and assessment indicates alignment if what is elicited from students on the assessment is as demanding cognitively as what students are expected to know and do as stated in the standards (Webb, 2002, p. 5). Levels: Recall, Skill or Concept, Strategic Thinking, Extended Thinking
Alignment (Webb, 2002 – 4 states) • Math: Over 50% of objectives under the standards required a more complex depth-of-knowledge than the corresponding items • Reading: 3 of 5 state/grade combinations had 66% of the objectives requiring a more complex depth-of-knowledge than the corresponding items • Most objectives judged to have a depth-of-knowledge level of 2 or 3 • Most multiple-choice items judged to have a depth-of-knowledge of 1 or 2
Alignment • Inconsistency of the cognitive level of content standards across subject matters within states • Inconsistency of the cognitive level of content standards across states • Assessments tend to measure lower cognitive levels than reflected in content standards • Implications for the quality and level of instruction
Proficiency Levels: Grade 8 AYP Starting Points for 42 States on the State Assessment and Performance on NAEP (Linn, 2003)
Proficiency Levels: Grade 8 AYP Starting Points for 42 States on the State Assessment (Linn, 2003, 2004) • Most stringent (AZ): 7.0% • Most lenient (CO): 74.6% • 75th percentile: 56.5% • Median: 39.4% • 25th percentile: 23.6%
Variability in Starting Points and Effect on Rates of Improvement • At the starting point, less than 15% were proficient in CA and only 25% were proficient in NY • 60% were proficient in GA and VA • Relatively consistent rate of growth during the first few years across the 6 states • Considerable variability in rate of growth during the remaining years
Variability in Proficiency Levels • Meaningfulness of the term proficiency • Need for coherency of performance standards across grades and across content areas • Impact on instructional practices
Classification Errors • Concern with false negatives, i.e., misclassifying passing students as non-passing and proficient students as not proficient • Impact on grade promotion, retention, and high school graduation • Need to consider measurement error and form confidence bands • Need to provide validity evidence for the psychometric models used (Stone, Weissman, & Lane, in press)
[Cross-classification table of proficiency decisions under the 1P and 3P models; row percentages in parentheses] • Approximately .90 overall agreement • The 1P model underestimates proficiency in two categories • The 1P model overestimates proficiency for one category
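The agreement summary above can be reproduced from any two sets of proficiency classifications. A minimal sketch in Python, using made-up classifications rather than the study's data:

```python
from collections import Counter

# Hypothetical proficiency classifications for the same ten students
# under a one-parameter (1P) and a three-parameter (3P) IRT model.
levels = ["Below Basic", "Basic", "Proficient", "Advanced"]
class_1p = ["Basic", "Proficient", "Basic", "Advanced", "Below Basic",
            "Proficient", "Basic", "Advanced", "Proficient", "Basic"]
class_3p = ["Basic", "Proficient", "Proficient", "Advanced", "Below Basic",
            "Proficient", "Basic", "Advanced", "Proficient", "Proficient"]

# Cross-classification table with row percentages in parentheses,
# mirroring the "row % in parens" layout of the slide.
table = Counter(zip(class_1p, class_3p))
for row in levels:
    row_total = sum(table[(row, col)] for col in levels)
    if row_total == 0:
        continue
    cells = [f"{table[(row, col)]} ({100 * table[(row, col)] / row_total:.0f}%)"
             for col in levels]
    print(row, cells)

# Overall agreement: proportion of students assigned the same
# classification by both models.
agreement = sum(a == b for a, b in zip(class_1p, class_3p)) / len(class_1p)
print(f"overall agreement = {agreement:.2f}")
```

With these invented data the two models agree on 8 of 10 students; disagreements fall in the Basic/Proficient rows, which is where the slide's under/overestimation pattern would show up.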
Measuring the Impact of Standards-Based Reform on Instructional Practice • Information about instructional practice provides evidence to evaluate the validity of test performance and performance gains as well as to evaluate the effectiveness of reform programs “While policy makers and reformers at all levels of the system are crucial if these reforms are to be enacted locally, teachers are the key agents when it comes to changing classroom practice. They are the final policy brokers” (Spillane, 1999)
Intended Impact of Assessment and Accountability Program • Motivation and effort to adapt the curriculum and instruction to content standards • Beliefs about the content standards and the assessment • Student motivation to learn and put forth their best effort on the assessment • Professional development support • Instructional practice – content and strategies • Classroom assessments – content and format • Use and nature of test preparation activities • Improved student learning
Unintended Impact of Assessment and Accountability Program • Narrowing of curriculum and instruction • Use of test preparation materials closely linked to the assessment without making changes to instruction • Decreased confidence and motivation to learn and perform well on the assessment • Differential performance gains for subgroups of students
Methods • Surveys - cost effective - increases generalizability - limited in capturing complex instructional practices - lack of shared understanding of terminology - self-report bias • Observations/Case Studies - captures complexities of instructional practices - costly, time consuming, labor intensive - lacks generalizability
Classroom Artifacts - captures instructional practices - more direct evidence of whether the cognitive demands placed on students in the classroom reflect the cognitive demands required of students by the standards and assessments - increases generalizability - time-consuming to analyze
Consequential Validity Evidence:Familiarity, Beliefs, Morale, Effort • Familiarity with the content standards and assessment • Beliefs and attitudes toward the assessment • Principal, teacher, student morale • Teacher and student effort Surveys
Consequential Validity Evidence:Professional Development • Nature of professional development support • Amount of professional development support Surveys and artifacts
Consequential Validity Evidence:Instruction and Curriculum • Extent to which instruction and classroom assessments reflect the state standards and assessment - content - cognitive demands - reform oriented Classroom Artifacts- instruction, classroom assessments, scoring rubrics, test preparation materials Surveys
Conceptual Modeling of the Relationship Between Score Gains and School, Principal, Teacher, and Student Variables • Evaluation of consequential evidence involves examining variation in school performance in terms of contextual and evidential variables • To model the change process and examine agents of change, growth models have been advocated (e.g., Rogosa & Willett, 1985; Willett & Sayer, 1994)
Between School Model (Lane & Stone, 2002) [path diagram] • Contextual Variables: % free or reduced lunch, % minority students, funding per student, stability • Evidential Variables: motivation and effort, professional development, classroom instruction, student motivation • Change in Evidential Variables: motivation and effort, professional development, classroom instruction, student motivation • These variables predict two latent growth factors, Initial Performance Level (intercept) and Rate of Change (slope), which underlie the Year 1 through Year 4 school scores
Between School Model • Two latent variables are used to describe school performance on the assessment: initial performance and rate of change • The degree of variability in these latent variables reflects the degree of variability between the schools • Contextual and evidential factors can be introduced to explain any variability in these factors
Effects that may be evaluated • relevant contextual variables on school level initial performance • relevant evidential variables at Year 1 on school level initial performance and rates of change • changes in relevant evidential variables (Year 1-4) on school-level initial performance and rates of change
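As a concrete illustration of the intercept/slope logic, the sketch below fits a simple per-school regression line to yearly scores and relates the resulting growth parameters to a contextual variable. This is a plain-OLS stand-in for the latent growth models cited above, and all values (scores, the % free/reduced lunch variable) are invented:

```python
import numpy as np

# Hypothetical yearly assessment means for 4 schools over 4 years,
# plus one contextual variable (% free/reduced lunch) per school.
scores = np.array([
    [410.0, 418.0, 425.0, 431.0],
    [395.0, 399.0, 404.0, 407.0],
    [430.0, 441.0, 450.0, 462.0],
    [405.0, 407.0, 410.0, 411.0],
])
pct_frl = np.array([62.0, 78.0, 35.0, 70.0])

years = np.arange(scores.shape[1])                 # 0, 1, 2, 3
X = np.column_stack([np.ones_like(years), years])  # design: intercept + slope

# Per-school OLS fit: the intercept approximates the initial
# performance level, the slope the rate of change, standing in
# for the two latent growth factors.
coefs = np.linalg.lstsq(X, scores.T, rcond=None)[0]  # shape (2, n_schools)
intercepts, slopes = coefs

# Relate the contextual variable to each growth parameter,
# mirroring the "effects that may be evaluated" above.
r_init = np.corrcoef(pct_frl, intercepts)[0, 1]
r_slope = np.corrcoef(pct_frl, slopes)[0, 1]
print("initial status:", intercepts.round(1))
print("rate of change:", slopes.round(2))
print("r(FRL, initial) =", round(r_init, 2), " r(FRL, slope) =", round(r_slope, 2))
```

In a full latent growth model the intercepts and slopes are estimated jointly with their between-school variances rather than school by school, but the interpretation of the two factors is the same.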
Within School Model [path diagram] • Contextual Variables (Class Level): teacher education, teacher experience • Evidential Variables (Class Level): motivation and effort, professional development, classroom instruction, student motivation • Change in Evidential Variables (Class Level): motivation and effort, professional development, classroom instruction, student motivation • These variables predict Initial Performance Level (intercept) and Rate of Change (slope), which underlie the 2001/02 through 2004/05 class scores
Within School Model • Examines variability in relevant evidential variables at the class level within schools • Effects that may be evaluated • Contextual variables on class-level initial performance and rates of change • Evidential variables at Year 1 on class-level initial performance and rates of change • Changes in evidential variables (Year 1-4) on class-level initial performance and rates of change
Multilevel Model with 3 Levels • Level 1: Changes in performance at the class level • Level 2: Variability within schools • Level 3: Variability between schools (Bryk & Raudenbush, 1992; Muthén, 1995)
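A descriptive sketch of the three-level structure on simulated data may help fix the idea. This is not the HLM estimation of the cited references; the three quantities below are simple empirical summaries of a balanced, invented dataset in which scores are built from school, class, and student components:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated balanced data: 20 schools x 5 classes x 25 students.
# All standard deviations are hypothetical.
n_schools, n_classes, n_students = 20, 5, 25
school_fx = rng.normal(0, 8, size=(n_schools, 1, 1))
class_fx = rng.normal(0, 5, size=(n_schools, n_classes, 1))
student_fx = rng.normal(0, 15, size=(n_schools, n_classes, n_students))
scores = 400 + school_fx + class_fx + student_fx

# Descriptive decomposition matching the three levels above:
# Level 3: variability between schools (variance of school means)
between_schools = scores.mean(axis=(1, 2)).var(ddof=1)
# Level 2: variability between classes within schools
between_classes = scores.mean(axis=2).var(axis=1, ddof=1).mean()
# Level 1: variability between students within classes
within_classes = scores.var(axis=2, ddof=1).mean()

print(f"between schools : {between_schools:6.1f}")
print(f"between classes : {between_classes:6.1f}")
print(f"within classes  : {within_classes:6.1f}")
```

A multilevel model would estimate these variance components jointly (and let class- and school-level predictors explain them); the descriptive version simply shows where the variability sits.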
Case Studies • Case studies can be used to supplement large-scale studies • Provide richer, contextualized information on instructional practices, classroom assessment, and professional development • Can focus on effective programs and illustrate what factors contribute to quality instruction and student learning as a result of the standards-based reform
Conclusion • Information on teachers’ instruction and classroom assessment practices is pivotal in understanding the success or failure of accountability systems and reform efforts • Need to better understand the extent to which performance gains on assessments reflect improved instruction and student learning rather than more superficial interventions such as narrow test preparation activities