I. Revisiting a 30+ year old argument

Systematic Naturalistic Inquiry: Toward a Science of Performance Improvement (aka improvement research). Anthony S. Bryk, Carnegie Foundation for the Advancement of Teaching. Society for Research on Educational Effectiveness, March 2010.

I. Revisiting a 30+ year old argument • Is design really the answer? • The randomized treatment control paradigm as the gold standard circa 1975 • Takes me back to the spring of 1978: “evaluating program impact: a time to cast away stones, a time to gather stones together” • And, is this really the right question?

II. What Information Does an RCT Actually Provide? • Two marginal distributions, Y_T and Y_C: the distributions of outcomes under the treatment and control conditions. • Provides answers to questions that can be addressed in terms of observed differences in these two marginal distributions.

Evidentiary Limits of the Treatment-Control Group Paradigm • Suppose now that we define a treatment effect for individual i as αi. • We can estimate the mean treatment effect, μα. • But, interestingly we cannot estimate the median effect or any percentile points in the αi distribution.
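A minimal sketch of why the two marginal distributions pin down the mean effect but not the median. The numbers and variable names are hypothetical, chosen only for illustration: both scenarios have exactly the same control and treated outcome values, so an RCT could not tell them apart, yet the individual effects αi are paired differently.

```python
from statistics import mean, median

def individual_effects(y_control, y_treated):
    """Per-person effects alpha_i = Y_T,i - Y_C,i (never observable in practice)."""
    return [t - c for c, t in zip(y_control, y_treated)]

# Hypothetical potential outcomes for five individuals (illustrative only).
y_control = [0, 1, 2, 3, 4]

# Scenario A and Scenario B contain the SAME treated values, so the
# marginal distribution of Y_T is identical; only the pairing differs.
y_treated_a = [1, 2, 3, 4, 10]
y_treated_b = [10, 1, 2, 3, 4]

effects_a = individual_effects(y_control, y_treated_a)  # [1, 1, 1, 1, 6]
effects_b = individual_effects(y_control, y_treated_b)  # [10, 0, 0, 0, 0]

# The mean effect equals mean(Y_T) - mean(Y_C), so it agrees across scenarios.
mean_a, mean_b = mean(effects_a), mean(effects_b)

# The median effect depends on the unobserved pairing, so it does not.
median_a, median_b = median(effects_a), median(effects_b)
```

Because the mean of differences equals the difference of means, both scenarios give the same μα; the median, like any other percentile of the αi distribution, depends on a joint distribution that the design never reveals.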

Evidentiary Limits (continued) • Nor can we assess how the αi might change over time or depend on individual and context characteristics. • To accomplish the latter, we need to know about the treatment effect distribution conjoint with multivariate data on individual and program characteristics.

Evidentiary Limits (continued) • Of course, we can add a limited number of factors into the design and estimate these interaction effects. • So we can do something on a limited scale within the T/C paradigm. • But we need to know the factors in advance. • And they have to be small in number. • Pushing the envelope here would be time-consuming, expensive and cumbersome.

My conclusions back then • We need a different methodology for learning about programs and the multiple factors that may affect their outcomes. • An accumulating evidence strategy (Light and Smith) from multiple efforts at systematic inquiry over time. • Needs to be dynamic in design: as we learn from practice, we are changing it. • A system orientation: “elements standing in strong interaction.”

The Paradox of Anti-depressant-Induced Suicidality (H.I. Weisberg, V.C. Hayden, V.P. Pontes (2009). Clinical Trials, Vol. 6, No. 2, 109-118.) • Key conclusions: • When the causal effect of an intervention varies across individuals, the threat to validity can be serious. • RCTs should not automatically be considered definitive, especially when the results conflict with those of observational studies. • Not only the magnitude but even the direction of the population causal effect may be erroneous.

III. So, New Directions 2010: Basic Principles • Returning to this idea of a prospective accumulating evidence strategy • Simplest version: the multi-site trial vs. cluster randomized trial. • Extend this idea out to all three facets – contexts, teachers, and students.

Basic Principles • Anchored in a working theory about advancing improvements reliably at scale - Assume a systems perspective: interventions, as operationally defined, stand in strong interaction with the specific people who take them up and the contexts in which they work. • Gathering and using empirical evidence about such phenomena should be the organizing goal.

Basic Principles (continued) • Accelerated longitudinal design + a value-added analytic model. • Counterfactual comes from a baseline comparison. • In principle, we have some evidence about variable effects attached to individuals, their teachers and their context. • Any individual piece is not very precise, but if we have enough cases there is power to see many signals.
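A minimal sketch of the two ideas on this slide: a value-added effect measured against a baseline comparison, and the gain in precision from pooling many imprecise cases. All numbers and function names here are hypothetical, and the pooling step assumes independent units, which the hierarchical model used in the actual study does not require.

```python
from math import sqrt
from statistics import mean, stdev

def value_added(observed_gain, baseline_gain):
    """Gain beyond what the baseline (pre-intervention) comparison predicts."""
    return observed_gain - baseline_gain

def pooled_estimate(effects):
    """Mean effect and its standard error, ASSUMING independent units."""
    return mean(effects), stdev(effects) / sqrt(len(effects))

# Hypothetical baseline growth and observed classroom gains (illustrative only).
baseline_gain = 1.0
classroom_gains = [1.4, 0.9, 1.3, 1.1, 1.5, 1.0, 1.2, 1.2]

effects = [value_added(g, baseline_gain) for g in classroom_gains]

# Two classrooms alone give a noisy answer; all eight give a sharper one.
mean_few, se_few = pooled_estimate(effects[:2])
mean_all, se_all = pooled_estimate(effects)
```

The standard error shrinks roughly with the square root of the number of cases, which is the sense in which many imprecise pieces can still carry a visible signal.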

Basic Principles (continued) • A key internal validity concern – a coterminous intervention to worry about. • But we also now have an evidentiary resource not typically found in an RCT: a capacity to examine questions of replicability over many different contexts of intervention. • This is the generalizability evidence that relates directly to our reliability consideration. • Can we make this happen with any reliability over many different situations?

What makes it naturalistic? • Easily engaged in practice. Could be routinely done. • Could imagine gathering such data at large scale. • Immediacy of evidence – possibility of learning as you go. • And as it will turn out, actually moot (opportunistic) on the question of an appropriate design analysis paradigm

IV. Elaborate through an Example • A recently completed study of the efficacy of Literacy Collaborative Professional Development • Co-contributor: Gina Biancarosa, University of Oregon • Detailing the causal cascade from the intentional design of professional education through changes in instructional practice and then on through to improvements in student learning gains over time.

Setting the Context: Typical District Approach to a Coaching Initiative Credit to: A Framework for Effective Management of School System Performance. Lauren Resnick, Mary Besterfield-Sacre, Matthew Mehalik, Jennifer Zoltners Sherer and Erica Halverson.

And then voila! (aka the zone of wishful thinking!)

Peering inside the “Black Box”: the actual work of coaches

Data for Performance Improvement • What do coaches need to know and be able to do, and how do we know if they can? • How do coaches actually spend their time? • Who is being coached on what topics? • What about the individual teacher might affect these social exchanges? • Quality of coach-teacher trust: social resources for improvement • Quality of the trust dependency/relationship

Data for Performance Improvement • Teacher practice development • Evidence of teacher learning?

Filling out the account: an information system to support instructional improvement • Surveys of teacher-coach trust and school-based professional community • Coaching logs: the who and what of PD as delivered, and what’s next? • Teacher practice development • Coaching performance assessments • Surveys of coach-principal trust: respect, regard, competence and integrity • Observational evidence of teacher learning and practice

Formal School Structure and Informal Organization Joined in a Working Theory of Practice Improvement • Individual teacher: background; willingness to engage innovation; experiment with new practices in the classroom; expertise; prior experiences in comprehensive literacy teaching (ZPD) • LC intervention: amount, quality and content of PD • Classroom literacy practice • Impact on student learning • School-wide support for teacher learning: work relations among teachers; influence of informal leaders; professional norms; principal leadership; coach quality/role relationship; resource allocations (time); school size • It is hard to improve what you do not really understand.

Linked to evidence about variability in effects on student learning associated with teachers and schools • Assessing (even crudely) the value added to learning associated with individual classrooms and schools and investigating what might be driving observed variability in these effects.

Accelerated Cohort Design: 6 cohorts studied over 4 years [Figure: cohorts by grade across the training year and Years 1, 2 and 3 of implementation]
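A minimal sketch of how an accelerated cohort layout produces 6 cohorts from 4 years of data collection. The grade span is an assumption made only for illustration (three consecutive grades, coded 0 to 2); the slide text does not state which grades were studied, and the function name is hypothetical.

```python
def cohort_grid(grades, years):
    """Group (year, grade) cells by cohort.

    A cohort is labeled by the year in which it is (or would have been)
    in the lowest grade, so students advancing one grade per year stay
    in the same cohort.
    """
    grid = {}
    for year in years:
        for grade in grades:
            entry_year = year - (grade - grades[0])
            grid.setdefault(entry_year, []).append((year, grade))
    return grid

grades = [0, 1, 2]      # ASSUMPTION: three consecutive grades, not from the source
years = [0, 1, 2, 3]    # 0 = training year, 1-3 = years of implementation

grid = cohort_grid(grades, years)
n_cohorts = len(grid)
n_cells = sum(len(cells) for cells in grid.values())
```

Under these assumed inputs the layout yields six cohorts: some are seen only once (the oldest in the training year, the youngest in the final year), while the middle cohorts are followed across all three grades. Early cohorts contribute baseline observations and later cohorts contribute observations under implementation.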

Hierarchical Crossed Value-added Effects Model • Individual growth parameters • Overall value-added effects • Teacher-level value-added effects • School-level value-added effects

Value-added effects by year • Average student learning growth is 1.02 per academic year

Variability in school value-added, year 1 [Figure labels: average student gain per academic year; Year 1 mean effect; no effect]

Variability in school value-added, year 2 [Figure labels: average student gain per academic year; Year 2 mean effect; Year 1 mean effect; no effect]

Variability in school value-added, year 3 [Figure labels: average student gain per academic year; Year 3 mean effect; Year 2 mean effect; Year 1 mean effect; no effect]

Variability in teacher value-added within schools, year 1 [Figure labels: average student gain per academic year; no effect]

Variability in teacher value-added within schools, year 2 [Figure labels: average student gain per academic year; no effect]

Variability in teacher value-added within schools, year 3 [Figure labels: average student gain per academic year; no effect]

Exploring variation in trends • Which teachers and schools improved most? • Why? Under what conditions?

V. To Sum Up • The accelerated multi-cohort design is relatively easy to implement in school settings (a naturalistic data design). • It affords treatment effect results not easily obtainable through the “gold standard”: a multivariate distribution of effects, linked with potential sources of their variation and dynamic over time.

To Sum Up • More generally, an argument for an evolutionary, exploratory approach to accumulating evidence • Data designs are now practical and analytic tools exist. • Imagine if we had such information now on the 750+ schools that have been involved with LC over the past 15 years. • A stronger empirical base for a design-engineering-development orientation to the improvement of schooling.

To Sum Up: Usable Knowledge for Improving Schooling • Anchored in: placing problems of practice improvement at the center; a working theory of practice and its improvement • Measure core work activities and outcomes • Aim for a science of performance improvement • Variation is the natural state of affairs; make it an object of study • Reliability is a key improvement concern in a human- and social-resource-intensive enterprise
