A prospective approach to standard setting Isaac I. Bejar, Henry I. Braun, Rick Tannenbaum Educational Testing Service Presented at ASSESSING AND MODELING COGNITIVE DEVELOPMENT IN SCHOOL: INTELLECTUAL GROWTH AND STANDARD SETTING Maryland Assessment Research Center for Education Success University of Maryland October 19-20, 2006
Outline • Present the rationale for a prospective approach to the standard setting process in K-12 that is explicitly informed by learning and developmental considerations • Review the evolution of validity over the last 60 years, focusing on the implications for standard setting and assessment design • Review conceptual developments in standard setting and argue that a prospective approach is a natural step in the evolution of the standard setting process • Sketch the steps in a prospective standard setting process • Discuss remaining challenges
Why are performance standards important? • Increasingly, academic performance is being communicated in terms of standards (e.g. 30% of students at or above proficient) • Consequential decisions about students and/or schools are being made on the basis of results framed in terms of standards • Policy-makers and the public make inferences about public schools based on their interpretations of the standards and standards-based reports
What are we making inferences about? “Standard setting still can not be reduced to a problem of statistical estimation. Fundamentally, standard setting involves the development of a policy about what is to be required for each level of performance. This policy is stated in the performance standards and implemented through the cutscores.” (Kane, 2001, p. 85, emphasis added)
Some inferences of interest • Inferences about individual students’ level of achievement at one point in time • Inferences about individual students’ performance next year • Population inferences about the proportion of students at different levels of achievement • Inferences as to the progress of a school or district [Figure: performance levels Advanced, Proficient, Basic]
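The population inference listed above (the proportion of students at each level) follows mechanically once cut scores exist. The sketch below is only an illustration: the level names come from the slides, while the score scale, the cut scores in `CUTS`, and the function names are hypothetical and not taken from the presentation.

```python
from bisect import bisect_right
from collections import Counter

# Level names as used in the slides, ordered from lowest to highest.
LEVELS = ["Below Basic", "Basic", "Proficient", "Advanced"]
# Hypothetical cut scores: the lowest scale score for Basic, Proficient, Advanced.
CUTS = [200, 250, 300]

def classify(score):
    """Return the performance level whose score band contains `score`."""
    return LEVELS[bisect_right(CUTS, score)]

def proportions(scores):
    """Proportion of students classified into each performance level."""
    counts = Counter(classify(s) for s in scores)
    return {level: counts[level] / len(scores) for level in LEVELS}

def at_or_above(scores, level="Proficient"):
    """Proportion of students at or above `level`."""
    idx = LEVELS.index(level)
    if idx == 0:
        return 1.0  # every student is at or above the lowest level
    floor = CUTS[idx - 1]
    return sum(s >= floor for s in scores) / len(scores)
```

Every number such a report produces depends on where `CUTS` is placed, which is why the defensibility of the cut scores matters for the inferences listed on this slide.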
Problems with current standard setting practice • Historically, standard setting has been a retrospective judgmental process carried out • independently of other factors that inform the design of the assessment, • after the assessment is administered the first time. • The consequences of a retrospective approach are • Reliance on subject matter expertise rather than research on student learning and development • Potential conflation of policy and psychometrics • Difficulty in achieving coherence of cut scores across grades • Risks • Cut scores may not be well supported psychometrically • Insufficient evidence to adequately support desired inferences
Validity overview • Validation as theory testing • Cronbach and Meehl (1955): Gathering evidence for score interpretation follows scientific principles “The investigation of a test's construct validity is not essentially different from the general scientific procedures for developing and confirming theories.” • Items increasingly seen as validity-building blocks • Fischer (1973): LLTM • Embretson (1983): Construct representation
Validity overview (cont.) • Validity is an ongoing argument that seeks to clarify what a measurement means and to understand the limitations of each score interpretation (adapted from Cronbach, 1988) • Validity as consequence • “Validity is an overall evaluative judgment, founded on empirical evidence and theoretical rationales, of the adequacy and appropriateness of inferences and actions based on test scores.” (Messick, 1989)
Validity overview (cont.) • Validity as argument (Kane, 2004) • Kane elaborates Cronbach’s “validation as argument” thesis through specification of • Interpretive argument • Build a chain of reasoning from the test construction process to the desired claims. • Validity argument • Amass theoretical and empirical support for the truthfulness of the claims and set appropriate boundaries.
Validity through design: ECD (e.g., Mislevy et al. 2003) • Evidence Centered Design • Make explicit the claim(s) you will want to make about scores at individual and aggregate levels • Determine the student observables that would provide support for the claims we wish to make. • Carefully design and write tasks that would elicit those observables. • Assemble assessments targeted to support the desired claims as strongly as possible
Some history • Through the 1980s, standard setting was mainly concerned with procedural issues, but signs of concern (e.g., Glass, 1978; Shepard, 1980) begin to emerge • NAGB calls for the use of performance standards (see Lissitz and Bourke, 1995) • Kane (1993) emphasizes the need to separate policy from procedure • Performance level descriptors become more prominent (Hansche, 1998) • The judgmental task imposed on standard setting panelists is strongly criticized (Pellegrino, Jones, and Mitchell, 1999) • Response by Hambleton et al. (1999) does not address the basic criticism
Some history (cont.) • Cizek (2001) • Zieky (2001) on how standard setting has changed • Kane (2001) on how standard setting has not changed and the importance of separating policy and method • Camilli et al. (2001) • “In the long run, standard setting will make its most valuable contribution to teaching and learning at all levels if procedures are developed that are more closely aligned with cognitive and developmental models of competence in content disciplines” (2001, p. 471, italics added). • Validity-oriented standard setting and the idea of “canonical response patterns” • Haertel and Lorie (2004) • Lorie (2001) • On the importance of coherent standards (Ferrara, Johnson, Cheng, 2005; Lewis and Haug, 2005)
Outline of an approach • Standard-Setting for K-12 • Mastery of material at grade “n” is not an end in itself but a milestone in a student’s progression through school. • Common-sense meanings of achieving proficiency in grade n are: • (i) The student has met the requirements for grade n • (ii) All things being equal, the student has a high probability of achieving proficiency in grade n+1
Standard-Setting for K-12 • Ideally, (i) and (ii) should be consistent. To support forward-looking inferences, we should have: • A developmental perspective in the creation of content frameworks and content standards (e.g., Wilson, 2004). • A prospective approach to standard-setting in which both content frameworks and preliminary performance standards guide the assessment design process • “In a coherent educational assessment system, all components should work to prepare the student to meet or exceed that cut score; each component suggests the cut score” (Lewis and Haug, 2005, p. 12, emphasis added)
[Figure: overview of the prospective approach across Grade n-1, Grade n, and Grade n+1. Design phase: multi-grade content standards; research-based competency model; task model library; performance level descriptors (PLDs); performance standards; pragmatic & psychometric constraints; test specifications (blueprint); pro-forma canonical response patterns. Develop phase: assessment instrument developed; assessment administered, calibrated, and scaled; final cut scores.]
Hansche, L., Hambleton, R., Mills, C. N., Jaeger, R. M. (1998) Handbook for the development of performance standards.
Multi grade content standards Downloaded from http://www.nctm.org/focalpoints/downloads.asp, on October 10, 2005
Multi grade content standards (grades n-1 and n) Downloaded from http://www.nctm.org/focalpoints/downloads.asp, on October 10, 2005
Research-based Competency model • A competency model is a recasting and fleshing out of a broad framework, such as the NCTM curricular guidelines, for developing assessments • A competency model is assembled from various sources, including basic research on student learning • A central goal in developing a competency model is to structure it such as to facilitate the translation of policy into performance standards.
Performance level descriptors (PLDs) • Performance level descriptors are typically narratives that elaborate the meaning of performance standards • PLDs are developed with reference to a competency model • PLDs are associated with “evidence rules”
Fragment of a PLD: The student is capable of formulating a persuasive argument appropriate to a specific audience or recipient. Associated evidence rule: If [evidence (T3, T10, T11)]
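One way to read the pairing of a PLD fragment with an evidence rule over task models T3, T10, and T11 is sketched below. The slide does not say how evidence from the three tasks is combined, so the "all named tasks must be answered successfully" logic is an assumption made purely for illustration, as are the class name `EvidenceRule` and its method.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EvidenceRule:
    """Links a PLD claim to the task models expected to provide evidence for it."""
    claim: str
    tasks: tuple  # task-model identifiers named by the rule

    def is_supported(self, results):
        """True if every named task was answered successfully.

        Requiring success on all tasks is an assumption; the slide only
        lists the tasks and does not state a combination rule.
        Tasks missing from `results` count as not successful.
        """
        return all(results.get(task, False) for task in self.tasks)

# Claim text and task identifiers are taken from the PLD fragment on the slide.
rule = EvidenceRule(
    claim=("The student is capable of formulating a persuasive argument "
           "appropriate to a specific audience or recipient."),
    tasks=("T3", "T10", "T11"),
)
```

Calling `rule.is_supported({"T3": True, "T10": True, "T11": False})` returns `False` under this assumed rule, because one of the three named tasks was not answered successfully.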
[Figure: task model library (T1, T2, ..., Tn) cross-referenced with the PLDs for grades n-1, n, and n+1; together with performance standards and pragmatic & psychometric constraints, this feeds the test specifications (blueprint).]
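Pro-forma Canonical Response Patterns (CRP) [Figure: for grade n-1, response patterns over tasks T1, T2, ..., T9 at the boundaries of the Basic, Proficient, and Advanced levels: CRP for top basic, CRP for bottom proficient, CRP for top proficient, CRP for bottom advanced.]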
Setting final cut scores (from pro-forma canonical response patterns to final cut scores) • The panel starts with preliminary cut scores that have been obtained by directly mapping canonical response patterns to a scale once it is available. Are there any inconsistencies? • The panel’s role is to accept or adjust preliminary cut scores in light of data from the administration. • The panel’s cognitive task is less burdensome than the usual standard setting task • Arbitrariness (Glass, 1978) is greatly reduced since much thought has gone into where the cuts should be
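The mapping of canonical response patterns to a scale can be sketched as follows. The presentation does not name a scaling model or a rule for locating the cut, so this example assumes a Rasch model with already calibrated item difficulties and places the preliminary cut midway between the scale locations of the two boundary patterns. The difficulties, patterns, and function names are hypothetical.

```python
import math

def rasch_p(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def theta_for_pattern(pattern, difficulties, lo=-6.0, hi=6.0, tol=1e-6):
    """Maximum-likelihood scale location for a 0/1 response pattern.

    Under the Rasch model the ML estimate solves
    sum_i P_i(theta) = raw score; it is found here by bisection.
    """
    raw = sum(pattern)
    if not 0 < raw < len(pattern):
        raise ValueError("all-0 or all-1 patterns have no finite ML estimate")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        expected = sum(rasch_p(mid, b) for b in difficulties)
        if expected < raw:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def preliminary_cut(top_lower, bottom_upper, difficulties):
    """Midpoint between the locations of two adjacent boundary patterns.

    Using the midpoint is an assumption made for this sketch.
    """
    below = theta_for_pattern(top_lower, difficulties)
    above = theta_for_pattern(bottom_upper, difficulties)
    if below >= above:
        raise ValueError("inconsistent patterns: lower level maps at or above upper level")
    return (below + above) / 2

# Hypothetical calibrated difficulties for nine tasks T1..T9.
difficulties = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
top_basic = [1, 1, 1, 1, 0, 0, 0, 0, 0]          # hypothetical CRP for top basic
bottom_proficient = [1, 1, 1, 1, 1, 0, 0, 0, 0]  # hypothetical CRP for bottom proficient
cut = preliminary_cut(top_basic, bottom_proficient, difficulties)
```

The ordering check in `preliminary_cut` is one simple form of the "Are there any inconsistencies?" question: if the pattern written for the lower level lands at or above the pattern for the upper level, the panel has something to resolve before accepting the cut.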
[Figure: content strands WI, WII, WIII, WIV, each scored 0 to 3, shown against the performance levels Below Basic, Basic, Proficient, and Advanced.]
Some attributes of the model • Prospective: The competency model influences test development through the early specification of performance standards • Progressive: The approach calls for coordination in content frameworks and performance standards across grades • Predictive: PLDs and performance standards are explicitly based on theoretical and empirical evidence about trajectories of student learning and development
Rationale redux • A prospective approach • requires a coordinated set of standards, which encourages articulated pedagogy across grades and reduces the possibility of confusing accountability outcomes • provides better support for forward-looking inferences • strengthens foundations for consequential validity
Some specific challenges • Explicate the approach to an operational level • Address complications entailed by intervening treatment of variable effectiveness (i.e. next year’s instruction). • Formulate and implement feasible validation strategies?