Race (or a long, slow slog) to the Top Assessment Program: Suggestions, Reflections and Cautions
Henry Braun, Lynch School of Education, Boston College
Public Meeting of the U.S. Department of Education, Boston, MA, November 12, 2009
Idealism vs. Realism
• The design framework constitutes a worthwhile goal
• In principle, we know how to construct such a system; starting with a "blank slate," we could probably do it in 3-5 years
• In the real world, we are unlikely to achieve it because of:
  • Technical and logistical obstacles
  • Capacity constraints
  • Contractual issues
  • Resistance and inertia
What should we aim for?
• Chart a new "developmental pathway" that will jumpstart innovation and lead toward an approximation to the ideal
• Adopt a systems approach that takes account of the multiple interacting systems in which assessment design, development, and implementation take place
• The first instantiation should:
  • Model new patterns of collaboration in assessment design
  • Have superior measurement properties with respect to indicators of status and growth
  • Effectively employ new paradigms for assessment
  • Exhibit the potential of novel assessment platforms
Some Prerequisites
• A comprehensive model of each domain
• Models of student learning in the domain (pathways to expertise)
• Content standards that are complete and vertically articulated
• Performance standards that are rigorous and vertically articulated
• A technology platform to support instruction and assessment
Hansche, L., Hambleton, R., Mills, C. N., & Jaeger, R. M. (1998). Handbook for the development of performance standards.
A Four-Component System
• Diagnostic
  • Provides instructional support
  • Frequent
  • May be technology-based
• Extended project(s)
  • Targets higher-order standards
  • Integrated
  • Technology-based and teacher-marked
• On-demand, based on previewed materials
  • Student-produced responses
  • May be technology-based
  • Centrally marked
• On-demand (standard)
  • Forced-choice and short-answer responses
  • May be technology-based
Intended Outcomes
• Enhanced construct validity
  • Better construct representation
  • Multiple assessment modes
• Improved systemic validity
  • Reduces the incentive to narrow the curriculum
  • Reduces the value of inappropriate test preparation
• Potential link to better professional development
  • Focus on student work products
  • Moderated marking leads to collaborative learning
• Ramp-up to a full-service technology platform
  • Lower-stakes use of technology
  • Opportunity to build capacity and experience
Specifics: Standard-Setting
• Mastery of material at grade n is not an end in itself, but a milestone in a student's trajectory through school
• Some common-sense meanings of achieving proficiency in grade n:
  • The student has met the requirements for grade n
  • All else being equal, the student has a high probability of achieving proficiency in grade n+1
• This argues for cross-grade coherence in standard-setting
• Develop standards through "backward" induction from standards for college and career readiness, so that "meeting" the standard in an earlier grade signals the student is on track for post-high-school readiness (see the sketch below)
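Below is a minimal sketch of how the "backward" induction idea might be operationalized. It assumes a normal linear linkage between adjacent-grade scores and an illustrative 70% "on-track" probability; every number, parameter, and name here is a hypothetical assumption, not a value from the talk.

```python
# Hypothetical sketch of "backward" induction of grade-level cut scores from a
# college/career-readiness target. All numbers and the linear linkage model
# are illustrative assumptions.
from statistics import NormalDist

READINESS_CUT_GRADE_12 = 500.0   # assumed readiness cut on the grade-12 scale
ON_TRACK_PROB = 0.70             # assumed probability defining "on track"

# Assumed linkage: score in grade g+1 ~ Normal(a + b * score_g, sigma),
# which would in practice be estimated from longitudinal data per transition.
LINKS = {            # grade g -> (a, b, sigma) predicting the grade g+1 score
    8:  (40.0, 0.95, 25.0),
    9:  (35.0, 0.96, 25.0),
    10: (30.0, 0.97, 24.0),
    11: (25.0, 0.98, 24.0),
}

def backward_cuts(final_cut: float, links: dict, p: float) -> dict:
    """Work backward from the exit standard: the cut at grade g is the lowest
    score giving at least probability p of reaching the grade g+1 cut."""
    z = NormalDist().inv_cdf(p)
    cuts = {max(links) + 1: final_cut}
    for g in sorted(links, reverse=True):
        a, b, sigma = links[g]
        # Solve P(a + b*x + noise >= next cut) >= p for x under normal noise.
        cuts[g] = (cuts[g + 1] + z * sigma - a) / b
    return cuts

for grade, cut in sorted(backward_cuts(READINESS_CUT_GRADE_12, LINKS, ON_TRACK_PROB).items()):
    print(f"grade {grade}: proficiency cut ≈ {cut:.1f}")
```

In practice the linkage parameters would be estimated from longitudinal data, and the on-track probability criterion would itself be a policy decision made by the standard-setting body.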
The 3P Paradigm
• Prospective: The domain model and agreement on competencies shape test development through the early specification of performance standards
• Progressive: It is essential that content frameworks and performance standards be coordinated across grades
• Predictive: Descriptions of performance standards are explicitly based on theoretical and empirical evidence about trajectories of student learning and development
Technology Considerations
• Technology as a means to an end (to what end?)
• The ambiguity of the term "technology platform"
• Capacity (and the lack of it) at all levels
• The need for an implementation strategy
• The promise of technology:
  • Novel item types (improved construct validity)
  • Adaptive test designs (improved precision; see the sketch below)
  • Automated scoring of student-produced responses (reduced cost and turnaround time)
  • Accessibility (for some students)
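As a hedged illustration of the adaptive-test-design bullet, the sketch below implements maximum-information item selection under a Rasch (one-parameter IRT) model. The item bank, the fixed-step ability update, and the fixed-length stopping rule are simplifying assumptions; an operational CAT would use a likelihood-based ability estimate, a precision-based stopping rule, and item-exposure controls.

```python
# Minimal computerized-adaptive-testing (CAT) sketch under a Rasch model.
# Item difficulties and the step-size ability update are illustrative.
import math

def p_correct(theta: float, b: float) -> float:
    """Rasch probability of a correct response at ability theta, difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def next_item(theta: float, bank: dict, used: set) -> str:
    """Choose the unadministered item maximizing Fisher information p*(1-p),
    which for the Rasch model peaks when difficulty matches ability."""
    def info(item):
        p = p_correct(theta, bank[item])
        return p * (1.0 - p)
    return max((i for i in bank if i not in used), key=info)

def run_cat(bank: dict, answer_fn, n_items: int = 5) -> float:
    """Administer n_items adaptively; crude fixed-step ability update."""
    theta, used = 0.0, set()
    for _ in range(n_items):
        item = next_item(theta, bank, used)
        used.add(item)
        theta += 0.5 if answer_fn(item) else -0.5  # placeholder for MLE/EAP
    return theta

# Toy usage: an examinee who answers items easier than 0.3 logits correctly.
bank = {f"item{k}": b for k, b in enumerate([-1.5, -0.5, 0.0, 0.5, 1.0, 1.5])}
print(run_cat(bank, lambda i: bank[i] < 0.3))
```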
Implications for Assessment Design
• Strategic planning for a multi-phase sequence of assessments incorporating both evolutionary and revolutionary advances
• Integration of cognitive and developmental perspectives, in concert with "traditional" psychometric and logistical requirements
• Greater complexity in balancing goals and constraints
• The need for multi-disciplinary teams
Thoughts on Accountability
• Reauthorization of ESEA is in progress
• Expect that indicators of both status and growth (value-added?) will be employed
• Superior test design can enhance the validity of the process ... but test design alone does not address the fundamental problems of making causal inferences from aggregate data drawn from an observational (i.e., non-randomized) study; a typical value-added formulation is sketched below
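To make the causal-inference caveat concrete, here is one common value-added formulation (an assumed, textbook-style model included for illustration, not one presented in the talk):

```latex
% Illustrative value-added model (assumed, not from the talk):
% y_{ig}: student i's score in grade g; x_i: observed covariates;
% \theta_{j(i)}: effect of the school/teacher j serving student i.
\[
  y_{ig} = \beta_0 + \beta_1\, y_{i,g-1} + \gamma^{\top} x_i
           + \theta_{j(i)} + \varepsilon_{ig}
\]
```

Interpreting the estimated effects causally requires that students be assigned to schools or teachers as good as randomly, given prior score and covariates; that is precisely the assumption an observational accountability system cannot guarantee, however well the test itself is designed.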
Challenges
• Resistance and inertia, due to reluctance to abandon familiar pathways and the constraints of current contractual arrangements
• Identifying a viable state consortium
• Encouraging innovation without being overly prescriptive
• Complaint: the assessment tail wagging the instructional dog
• Complaint: relying on teachers' judgments for "objective" summative evaluations
Final Thoughts
• Federal dollars should be invested in building capacity that can be leveraged over time:
  • Support alternative strategies
  • Amortize costs through dual-use strategies
  • Encourage the involvement of educators at all stages of development
  • Build in requirements for formative evaluation and independent audits
• Provide incentives for states to do the "right thing"
  • Allow states some flexibility in the timing of adoption