American Education Policy: Trends, Innovations and Dilemmas
Henry Braun
Boston College, Chestnut Hill, MA, USA
Van Leer Jerusalem Institute Workshop
Outline • Educator Evaluation • Value-added Modeling • Design of Accountability Systems
Education Policy & Educator Accountability • Education policy involves the design, management, monitoring and restructuring (if necessary) of the systems that deliver education services • Four key dimensions of education policy (AQEE) • Access • Quality • Equity • Efficiency • Educator accountability is a system with management/monitoring/improvement functions. Its impact depends on: • The quality and utility of the evidence it generates • The willingness/ability to act on that evidence • The structure of the system as experienced by different stakeholders
U.S. Educator Accountability: Current Scene • State-to-state variation and district-to-district variation within states • Traditional system • Reliance on principals’ judgments of practice • Inconsistent, unreliable and inaccurate • Compensation based on seniority and credentials • General unhappiness with such systems • Poor tool for management • Does not support improvement in teaching and learning • Lots of changes now taking place at both the state and district levels, encouraged by the federal government
What is “Teacher Effectiveness”? • Effectiveness should be defined in relation to the full range of valued learning goals • Arguably, these goals are broader than “… making a year’s worth of progress each academic year in English/Language Arts & Mathematics” • Other academic subjects • Non-cognitive skills • Behaviors and dispositions • General goals vary by community, grade and subject • Student-specific goals may depart from general goals
“Teachers are the Most Important School-related Factor in Student Achievement” • General agreement with the proposition, but disagreement about the magnitude of the “effect” (on test scores) • Estimates of the magnitude vary substantially, depending on • The grade-subject-test combination • How the criterion is scaled and/or adjusted • Whether estimates are corrected for measurement error • Overblown claims (e.g., “Having superior teachers three years in a row would eliminate achievement gaps”) add urgency to policymakers’ rush toward approaches that promise to identify superior teachers. • Note: Non-school factors account for at least 50-60% of outcome variance.
The Federal Role President Johnson’s Great Society Program 1964-5: Elementary and Secondary Education Act (ESEA) • ESEA (1965): Evaluation of effectiveness of Title I • ESEA (1994): Test-based accountability for Title I schools • ESEA (2002): Test-based accountability for all schools • ESEA (2012?): Test-based accountability for schools, teachers and teacher training programs
The Intuitive Logic Behind Test-based Accountability If good schools and teachers are critical to student learning, then why can’t evidence of student learning (or its absence) tell us something about the quality of schools and teachers? Note 1: In practice learning is identified with gains in test scores. Note 2: Other contributing factors are treated incompletely or not at all. Note 3: Aggregating student test scores to the classroom or school level in order to make inferences about relative effectiveness introduces deep questions about making causal inferences from non-experimental studies.
“Test-based” Accountability (I) • Improve QEE by adopting performance monitoring strategies in education closer to business world models: Augment “input” indicators with “output” indicators • Raises critical questions: • Which student outcomes can/should be used to derive output indicators? • What are the practical/technical/financial constraints? • What are the properties of the indicators? • How should the indicators be combined for evaluation?
“Test-based” Accountability (II) • Usual compromise is to employ students’ results on centrally developed standardized tests • Possible indicators (contrasted in the sketch below) • Current status • Simple growth • Student growth percentiles • Value-added estimates • Raises more questions • How does this contribute to the improvement of QEE? • What are some other (possible) consequences of using these indicators? • How fair are evaluations based on these indicators? • What are the messages to different stakeholders?
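To make the contrast among the first three indicators concrete, here is a minimal sketch in Python. All scores are invented, the ±10-point peer band is an arbitrary assumption, and operational student growth percentiles use quantile regression rather than the crude peer matching shown here; value-added estimation is sketched separately in the VAM section.

```python
import numpy as np

# Invented scores for one class of five students (illustration only).
prior = np.array([480, 510, 495, 530, 500])    # last year's scores
current = np.array([505, 520, 512, 548, 509])  # this year's scores

# Indicator 1 -- current status: the mean of this year's scores.
status = current.mean()

# Indicator 2 -- simple growth: the mean gain over last year.
growth = (current - prior).mean()

# Indicator 3 -- a rough student growth percentile: rank each student's
# current score among simulated statewide peers with a similar prior
# score (within +/- 10 points), then average over the class.
rng = np.random.default_rng(0)
state_prior = rng.normal(500, 30, 10_000)
state_current = state_prior + rng.normal(15, 20, 10_000)

def growth_percentile(p, c):
    peers = state_current[np.abs(state_prior - p) <= 10]
    return 100.0 * (peers < c).mean()

sgp = np.mean([growth_percentile(p, c) for p, c in zip(prior, current)])
print(f"status={status:.1f}  growth={growth:.1f}  mean SGP={sgp:.0f}")
```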
Limitations of Test-based Indicators • Focus on a narrow range of valued outcomes • Credibility depends on many factors • Test quality • Completeness of data • Technical characteristics of the indicators • Perceived fairness • Substantial uncertainty around an individual teacher’s “score” [random error] • Vulnerable to bias, distortion and corruption [systematic error] • Issues related to transforming indicator values (normative) into decision categories (absolute) • Little to modest amounts of information for improving practice
The Importance of Being Observed • Evidence about teachers’ practices is essential to improving pedagogy and enhancing student learning • Evidence should be obtained through a protocol that is • Theory-based and empirically supported • Comprehensive (multi-dimensional) • Adequate in its measurement characteristics • Implemented by trained personnel • Subject to audit • Favored frameworks • Danielson • Pianta • National Board for Professional Teaching Standards
Rationale for Multiple Indicators • All performance indicators are fallible (subject to random and systematic errors) • Each indicator has a unique “target” that captures one aspect of teacher competency • Using multiple indicators can reduce incentives for distortion and corruption • Combining indicators in an appropriate manner should provide a reasonable basis for making evaluative distinctions • Choice of a combination algorithm depends on • Policy/value considerations • Technical properties of indicators
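One simple combination algorithm, shown only as a sketch: standardize each indicator and take a weighted average. The indicator values and weights below are invented; as the slide notes, actual weights reflect policy and value choices as much as the indicators' technical properties.

```python
import numpy as np

# Invented indicators for three teachers (rows): value-added estimate,
# observation rating, student survey score (deliberately on different scales).
scores = np.array([
    [ 0.8, 3.2, 3.5],
    [-0.3, 2.7, 3.1],
    [ 0.1, 3.6, 2.9],
])

# Hypothetical policy weights, e.g., loosely tied to assumed reliability.
weights = np.array([0.4, 0.4, 0.2])

# Standardize each column so no single scale dominates the composite.
z = (scores - scores.mean(axis=0)) / scores.std(axis=0, ddof=1)

composite = z @ weights   # one composite score per teacher
print(composite)
```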
Value-added Methodology (I) “Value-added methodology refers to quantitative approaches to estimating the specific contributions to the achievement of students of their current teachers, schools or programs, taking account of the differences in prior achievement and (perhaps) other measured characteristics students bring with them to school”.
Value-added Methodology (II) • The results of a value-added analysis are intended to support causal inferences (i.e. attribution of responsibility) – but this is always challenging when the data are derived from an observational study in which there are considerable selection effects. • Value-added analysis is an example of “statistical salvage” – trying to reconstruct (through conditioning) the characteristics of a randomized experiment that never was! • This is an attempt to “level the playing field” in comparing teachers working in very different contexts • There is an inherent difficulty in determining the degree of success of the effort.
How Does VAM Work? • Estimate the relationship between current test scores and both prior test scores and student characteristics, using all available data • Use the estimated model to predict a current test score for each student. The predicted score is a counterfactual (i.e. an estimate of the outcome after exposure to the average unit) • Compute: Residual = Observed – Predicted • Aggregate residuals to the desired unit level and compute the unit mean • Adjust unit means (optional) • (Adjusted) unit means are value-added estimates. A value-added analysis for a cohort of teachers yields a distribution of estimates of comparative effectiveness; a minimal computational sketch follows.
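The sketch below walks through those steps with a plain linear regression as the prediction model (operational VAMs typically use richer mixed-effects or layered models). The data, covariates, and column names are all invented for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Simulated student records: prior score, one demographic covariate,
# teacher assignment, and current score (all invented).
rng = np.random.default_rng(42)
n = 1_000
df = pd.DataFrame({
    "prior": rng.normal(500, 30, n),
    "frl": rng.integers(0, 2, n),       # e.g., a free/reduced-lunch flag
    "teacher": rng.integers(0, 25, n),  # 25 hypothetical teachers
})
df["current"] = 0.9 * df["prior"] + 40 - 5 * df["frl"] + rng.normal(0, 15, n)

# Steps 1-2: fit the model on all students, then predict each student's
# counterfactual score (the outcome under exposure to the "average" unit).
X = df[["prior", "frl"]]
model = LinearRegression().fit(X, df["current"])
df["predicted"] = model.predict(X)

# Step 3: residual = observed - predicted.
df["residual"] = df["current"] - df["predicted"]

# Steps 4-6: aggregate residuals by teacher; the mean residual is the
# (unadjusted) value-added estimate, centered near zero by construction.
vam = df.groupby("teacher")["residual"].mean().sort_values()
print(vam)
```

Note that the estimates describe each teacher's class relative to the average class, which is exactly the point developed in "Unpacking the VAM Claim" below.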
VAM is not a Panacea (1) The properties of VAM estimates of teacher effectiveness are determined by • Extent and completeness of data • Psychometric characteristics of test scores • Other measurement issues • Analytic issues related to the interaction of model structure, model assumptions and data for adjustment
VAM is not a Panacea (2) Student growth and student status provide complementary descriptions of what is happening with respect to one aspect of student learning. Estimates based on VAMs are also descriptions. Using such descriptions for purposes of accountability implicitly assumes that they are accurate indicators of school (or teacher) effectiveness.
VAM is not a Panacea (3) Making attributions of “treatment” effectiveness is equivalent to making causal inferences on the basis of statistical descriptions. This can be dangerous when the data for analysis come from an observational study and not a randomized experiment. Obtaining an unbiased estimate of a treatment effect depends on having a credible counterfactual available. This is the fundamental problem of causal inference (Holland, 1986).
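In potential-outcomes notation (notation supplied here for clarity, not taken from the slide), the problem is that each student's treatment effect involves an outcome we never observe:

```latex
% Effect on student i of teacher T relative to a comparison unit C:
\tau_i = Y_i(T) - Y_i(C)
% Only one of Y_i(T), Y_i(C) is ever observed for a given student; the
% other is the counterfactual. Randomization makes the average effect
% E[\tau] estimable; observational data require the model to stand in
% for the missing counterfactual.
```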
VAM is not a Panacea (4) • Students’ test results can be influenced by factors not under (or only partially under) a teacher’s control: • School-level factors • leadership • professional collaboration • climate • resources • Peer groups • Extra-school support, including contributions from family/community • Model-based statistical adjustments using observables cannot fully eliminate the selection bias caused by non-random linking of students, teachers and schools. • Examine the inference chain more carefully!
Unpacking the VAM Claim (1)

Teacher Effects:
Teacher    A    B    C    D    E
Effect    +6   +2    0   -3   -5
Unpacking the VAM Claim (2) • VAM really produces an estimate of how much the average gain in a particular class in a particular year* differs from the average gain in all classes in that year.*
*after adjusting for the experiences of the students in the class in previous (and possibly subsequent) years
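Written out (in notation supplied here, not the presenter's), the estimate for class j is the class's mean residual, i.e. a comparison of its adjusted average gain with that of all classes:

```latex
\widehat{VA}_j \;=\; \frac{1}{n_j}\sum_{i \in j}\bigl(y_i - \hat{y}_i\bigr),
\qquad \hat{y}_i = \text{prediction from the all-classes model}
```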
Unpacking the VAM Claim (3)

Class Effects:
Class      A    B    C    D    E
Effect    +6   +2    0   -3   -5
Unpacking the VAM Claim (4) Step 1: Interpret the class effect as the causal contribution of being in the class to the gains of the students in that class. Step 2: Attribute all of the causal contribution of the class to the teacher of the class.
Unpacking the VAM Claim (5) Examining Step 1: • Question: Under what circumstances can we “unequivocally” interpret the result of a statistical analysis as a causal effect? • Answer: If we have conducted a large well-designed randomized experiment.
Unpacking the VAM Claim (6) • If there are systematic differences among classes in the characteristics of the students (and their interactions) that are not captured by the variables in the model, • If there are also systematic differences among schools in resources and student populations that are not captured by the variables in the model, • And if some of those differences are related to students' score gains, • Then the VAM class effects aren't accurate measures of the “contributions” of classes to student score gains (i.e. they are biased estimates; stated formally below).
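This is the familiar omitted-variable problem; a sketch in regression notation (the notation is mine, not the presenter's):

```latex
% Suppose the true model includes an unmeasured factor u (peer effects,
% school resources, ...) that the fitted model omits:
%   true:    y_i = x_i'\beta + \theta_{j(i)} + \gamma\, u_i + e_i
%   fitted:  y_i = x_i'\beta + \theta_{j(i)} + e_i
% The estimated class effect then absorbs the class mean of u:
\mathbb{E}\bigl[\hat{\theta}_j\bigr] \;\approx\; \theta_j + \gamma\,\bar{u}_j
% so classes that differ systematically in \bar{u}_j are mis-ranked.
```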
Unpacking the VAM Claim (7) Examining Step 2 • Question: Under what circumstances can we attribute all (or almost all) of the progress in the class to the teacher? • Answer: ??
Unpacking the VAM Claim (8) • Teachers surely bear a substantial degree of responsibility for the learning of their students. • But the degree varies with the school and community environment, the mix of students in the classroom, the number of “unexpected events”, and so on. • Thus, there is: (i) an irreducible uncertainty regarding the accuracy of an individual VAM estimate and (ii) a recognition that the model-based estimate of the error associated with the VAM estimate is too small, because it ignores the bias. • This argues for caution in using VAM estimates for high-stakes purposes.
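Point (ii) can be stated with the usual error decomposition (standard statistics, supplied here for clarity): the model-based standard error reflects only the variance term, so total error is understated whenever bias is present.

```latex
\mathrm{MSE}\bigl(\hat{\theta}_j\bigr)
  \;=\; \underbrace{\mathrm{Var}\bigl(\hat{\theta}_j\bigr)}_{\text{reported by the model}}
  \;+\; \underbrace{\mathrm{Bias}\bigl(\hat{\theta}_j\bigr)^2}_{\text{ignored}}
```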
Current Research on VAMs (I) • Operating Characteristics (empirical) • Newton et al. (2010) • Briggs and Domingue (2011) • Corcoran (2010) • McCaffrey et al. (2009) • Operating Characteristics (theoretical) • Raudenbush (2009) • Lockwood and McCaffrey (2007)
Current Research on VAMs (II) • Robustness to Assumptions • Reardon and Raudenbush (2009) • J. Rothstein (2009) • Validity of Indicators • Boyd et al. (2009) • Rockoff and Speroni (2011) • MET Project (see J. Rothstein, 2011)
Current Research on Accountability Incentives • General Treatments • R. Rothstein (2008) • McCaffrey et al. (2003) • NRC/BOTA (2011) • Braun (2005) • Systems • OECD (2008) • R. Rothstein et al. (2008) • NRC/BOTA (2010) • Fuhrman & Elmore (2004) • Harris (2011) • Springer et al. (2010)
Thinking in Systems (I) • Accountability operates as a system comprising multiple interacting components. The operating characteristics of the system are a function of: • The properties of each component • The interactions of the components • The context in which the system operates
[Diagram: components of an educator accountability system. Data/Sources feed Analytic Engines, which produce Indicators; a Classification Engine combines the Indicators with Other Teacher Characteristics to yield the Current Educator Classification, which in turn determines Consequences.]
Thinking in Systems (II) • The design of an accountability system should include (at least) • A logic model • Standards and design specifications for all components • Provisions for modifications over time • Means for monitoring the ongoing operations and the impact of the system in multiple dimensions • Addressing “evidential asymmetry”
Quis custodiet ipsos custodes? (Who will watch the watchers?) (Juvenal)
Thinking in Systems (III) The designers must consider the statistical (and other) properties of the components, as well as feasibility, cost and stakeholders’ perspectives. There are many tensions that system designers must resolve: • Conflicting agendas among stakeholders • Data quality vs. Cost and feasibility • Lack of a gold standard vs. Need for consequential decisions • Fairness to students vs. Fairness to educators
Thinking in Systems (IV) The “science” of accountability system design is still in its infancy. A priori and ongoing evaluations are too rare. Much is known about what not to do, but less about what to do. Current systems are largely driven by political forces, with technical considerations playing a secondary role. Speed is valued over thoughtful analysis. We have observed, and can expect, negative unintended consequences to follow such implementations.
“Anything worth doing, is worth doing slowly.” (Mae West)