CLASS Project
Value-Added Measures Workshop Central Oregon, January 20, 2011 Western Oregon, January 21, 2011
Technical Assistance Consultants • Dr. Allison McKie Seifullah, Mathematica • Dr. Kevin Booker, Mathematica • Districts: Albany, Bend
Technical Assistance Consultants • Jackson Miller, MPP, Westat • Dr. Chris Thorn, Westat • Districts: Crook County, Lebanon, Oregon City, Redmond, Salem
Three Frames: • Time • Process • A Story
Timeline • 1. TIF Award (Oct 1) • 2. VAM Introduction (Oct 25) • 3. Research and Policy (Dec 2) • 4. Begin Deeper Study (Jan 20) • 5. Initial Construction (Jan–June) • 6. Trial First Run (July–Sept) • 7. Editing and Adjusting (Oct–May)
Process • Commitment to multiple measures • VAM not basis for high stakes decisions • Multiple year process • Don’t jump ahead to compensation
Teacher Incentive Fund Overview January 2011 Presentation to Chalkboard Project & Partner Districts Allison McKie Seifullah
Teacher Incentive Fund (TIF) Purpose "To support projects that develop and implement performance-based compensation systems (PBCSs) for teachers, principals, and other personnel in order to increase educator effectiveness and student achievement, measured in significant part by student growth, in high-need schools" Source: Teacher Incentive Fund Frequently Asked Questions
TIF Goals • Improve student achievement by increasing teacher and principal effectiveness • Reform educator compensation systems so that educators are rewarded for increases in student achievement • Increase the number of effective teachers teaching poor, minority, and disadvantaged students in hard-to-staff subjects • Create sustainable PBCSs Source: http://www2.ed.gov/programs/teacherincentive/index.html
Mathematica Policy Research • Mathematica: a nonpartisan policy research firm with over 40 years of experience • TIF roles: • Conduct national evaluation • Provide technical assistance to evaluation districts
National TIF Evaluation • Research questions: • What is the impact of differentiated effectiveness incentives (DEIs) on student achievement and educator effectiveness and mobility? • DEIs reward, at differentiated levels, teachers and principals who demonstrate their effectiveness by improving student achievement. • Incentive amounts vary based on performance. • Is a particular type of DEI associated with greater student achievement gains? • What are the experiences and challenges districts face in implementing these programs?
Evaluation Design • Schools participating in the evaluation are assigned by lottery to two groups • Group 1 schools: differentiated effectiveness incentive • Group 2 schools: 1% across-the-board bonus • Group 1 & 2 schools: all other PBCS components (roles & responsibilities incentives, evaluations, professional development)
Resources • Technical Assistance • From Mathematica and Vanderbilt for evaluation districts (Albany and Bend-La Pine in Oregon) • From Westat for all other districts • Center for Educator Compensation Reform website: http://cecr.ed.gov/
Required PBCS Components • Differentiated effectiveness incentives • Additional responsibilities and leadership roles incentives • Rigorous, transparent, and fair evaluation systems for teachers and principals • Needs-based professional development • Data management system that links student achievement data to payroll and HR systems • Plan for effectively communicating PBCS elements and promoting use of PBCS data
Differentiated Effectiveness Incentives • Must give "significant weight" to student growth • Must include observation-based assessments of teacher and principal performance at multiple points in the year • Must be “substantial”: “likely high enough to create change in behavior…in order to ultimately improve student outcomes” • May be individual (e.g. teacher), group (e.g. grade, team, or school), or mixed incentives
Student Growth and Value-Added Models • Student growth: change in student achievement • Chalkboard given competitive preference for using a value-added measure of the impact on student growth as a “significant factor” in calculating differentiated effectiveness awards • Student growth VAM (Kevin’s discussion)
Background on Value-Added Models January 2011 Presentation to Chalkboard Project & Partner Districts Kevin Booker
Background on value-added models (VAMs) • VAMs aim to estimate contribution of teacher or school to student learning growth • Use prior achievement and other student data to factor out external influences on achievement • Assess whether students across a classroom or school are doing better or worse than predicted • Can be used to assess performance at different levels, including school, teacher teams, grade/subject teams, and individual teachers
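One common way to write down such a model, offered here only as a sketch (the notation and the particular set of controls are illustrative, not the specification any specific district or vendor uses):

```latex
% y_{ijt}: end-of-year score of student i in school/classroom j, year t
% y_{i,t-1}: the student's prior-year score; X_i: other student-level controls
\[
  y_{ijt} \;=\; \beta_0 + \beta_1\, y_{i,t-1} + \gamma' X_i + \mu_j + \varepsilon_{ijt}
\]
% \mu_j is the value-added of unit j (school, team, or teacher);
% \varepsilon_{ijt} is student- and year-specific noise.
```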
How does value-added compare to alternatives? • Percent proficient • Unfair, makes inefficient use of data • Average test scores • Unfair, doesn’t account for baseline performance • Changes in average test scores • Better, but changing samples of tested students over time make it problematic • Average test score gains • This is closest to value-added conceptually
Value-added = average test score of own students − score of similar students • Example from the figure: own students and similar students both start the year at a predicted score of 440; own students end the year at 540 and similar students at 535, so value-added = 540 − 535 = 5
VAMs rely on residuals: what is left after accounting for other known factors • Account for everything we know • Assume that prior scores capture other factors that would be unobservable • A student’s innate ability, accumulated achievement, and family, neighborhood, and peer influences affect current achievement, but they also affected last year’s achievement, so prior scores largely absorb them • Time-specific events for individual students add “noise” and reduce the precision of estimates
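A minimal sketch of this residual approach in Python (illustrative only; the file name and column names such as score, prior_score, frl, ell, and sped are placeholders, and a production VAM would use a much richer specification):

```python
import pandas as pd
import statsmodels.formula.api as smf

# One row per student: current score, prior score, controls, school ID
df = pd.read_csv("student_scores.csv")  # hypothetical input file

# Step 1: predict each student's score from prior achievement and controls
model = smf.ols("score ~ prior_score + frl + ell + sped", data=df).fit()
df["residual"] = df["score"] - model.predict(df)

# Step 2: the average residual in a school is its (unshrunken) value-added:
# how much better or worse its students did than similar students elsewhere
school_va = df.groupby("school_id")["residual"].agg(["mean", "count"])
print(school_va.sort_values("mean", ascending=False).head())
```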
Issues to consider when using VAMs • There will be some tested grades/subjects where a VAM is infeasible • Earliest tested grade • If prior test scores are not a good predictor of current performance • The results from a VAM are inherently relative to the sample included, rather than benchmarked to an external standard • When the analysis sample includes schools for the entire state, the VAM can tell you how a particular school performed compared to other schools in the state • Could say that School A is in the 85th percentile of schools in the state, based on this VAM
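As a small illustration of the "relative to the sample" point, converting the hypothetical school_va estimates from the sketch above into percentile ranks is a one-line step, and the ranks only have meaning within the schools included:

```python
# Express each school's value-added as a percentile of all schools in the sample
school_va["percentile"] = school_va["mean"].rank(pct=True) * 100
# A school at the 85th percentile outperformed 85% of schools in this sample,
# but the ranking says nothing about performance against an external standard.
```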
What VAMs don’t do • Don’t measure student learning that isn’t captured in student assessments • Don’t adjust for differences in resources • Don’t account for unobserved changes in student circumstances • Don’t determine how or why some teachers/ schools are performing better than others
Balancing risks of errors: a policy decision • Like all evaluation methods, VAMs are susceptible to some error • Unlike most other methods (e.g. classroom observation), VAM error rates are measured and reported • Particular error rate adopted is a policy question that depends on tolerance for different kinds of mistakes • Confidence level/error rate might vary depending on use of results
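As a hedged illustration of how that policy choice plays out, one can attach a standard error to each estimate and flag only schools whose confidence interval excludes zero; the confidence level is the policy lever (the numbers below are hypothetical):

```python
from scipy import stats

def flag_school(estimate, std_error, confidence=0.95):
    """Return 'above average', 'below average', or 'not distinguishable',
    given a value-added estimate, its standard error, and a policy-chosen
    confidence level. Higher confidence -> fewer false positives, but more
    schools left undistinguished from average."""
    z = stats.norm.ppf(1 - (1 - confidence) / 2)
    lower, upper = estimate - z * std_error, estimate + z * std_error
    if lower > 0:
        return "above average"
    if upper < 0:
        return "below average"
    return "not distinguishable from average"

# Same estimate, different policy choices about tolerable error
print(flag_school(3.0, 1.7, confidence=0.95))  # not distinguishable from average
print(flag_school(3.0, 1.7, confidence=0.90))  # above average
```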
Frequently asked questions about VAM • How does a VAM compare schools with different grade ranges? • Which factors should a VAM control for? • How many students are necessary to get a valid VAM estimate? • How will issues of data quality be addressed? • Can a VAM work when the test changes from one year/grade to the next? • Can a VAM incorporate multiple measures of performance?
Common pitfalls when rolling out VAM • The most common mistake is pushing to use VAM for high-stakes decisions too soon • The data linking students to classrooms and teachers is typically the most problematic • Need both short-term and long-term goals • Short-term goals: • Identify VAM models that can be reliably estimated with existing data • Start improving data systems so that more and better VAM measures can be included moving forward
Goals for Year 1 VAM • Identify VAM levels feasible in first year • School-level VAM • Grade-level team VAM • Subject or grade-by-subject team VAMs? • Identify tests to include in first year • State-wide assessments a good starting point • Tests need to be comparable across schools • Can add additional tests in future years • VAM is flexible in terms of including different types of tests • Aim for a trial run of a teacher-level VAM sometime during first year
Which tests to include in the VAM? • An advantage of a statewide test is that the VAM can identify when all schools in the district improve • Can set performance standards based on meeting a certain percentile in the state performance distribution • Allows for more schools to meet the standard as the district improves • The VAM can use tests given only within the district, but results will be relative to other schools in the district • For instance, reward schools in the top 30%
VAM as part of teacher performance measure • Multiple VAM levels can be included in the measure of teacher performance • Could be 30% teacher team VAM, 30% school-level VAM, and 40% other outcome measures • Which VAM levels are included can vary across teachers • Teachers in tested grades and subjects • Teachers in untested grades • Teachers in untested subjects
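A minimal sketch of combining components with policy-chosen weights (the 30/30/40 split and the component names come from the illustrative example above; component values are assumed to be on a common standardized scale):

```python
def teacher_performance_score(components, weights):
    """Weighted combination of performance components.
    components and weights are dicts keyed by component name;
    the weights for the components used should sum to 1."""
    return sum(weights[name] * value for name, value in components.items())

# Hypothetical standardized scores for a teacher in a tested grade/subject
components = {"team_vam": 0.4, "school_vam": 0.1, "other_measures": 0.7}
weights = {"team_vam": 0.30, "school_vam": 0.30, "other_measures": 0.40}
print(teacher_performance_score(components, weights))  # 0.43
```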
Using a teacher-level VAM • Teacher-level VAM is a useful tool to inform district policy, even if not used for high stakes • Many interventions take place at the classroom level • Successful rollout of VAM takes small steps to build trust • Start with school and team-level VAM to build understanding and confidence • As data systems improve, roll out teacher-level VAM in a low stakes setting • Once trust and understanding are in place and multiple years of teacher VAM are available, build up to other uses
Improving Data Quality • Key challenge is to correctly identify the source(s) of instruction associated with each outcome, for each student • Student mobility • Team teaching • Teaching assistants • Other sharing arrangements • Policy question: how much time is necessary to be instructionally relevant? • Roster verification is a crucial component • Once data is available, the VAM can allocate credit appropriately
VAMs with team teaching • Whenever multiple sources share responsibility for a particular student outcome, VAM uses dosage to allocate credit • A student who switched schools during the year may get 60% dosage at one school and 40% at another • Even if not interested in teacher-level VAM, improved data quality can allow for more realistic groupings of teacher teams • Not necessary for entire analysis sample to have the same data quality
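A sketch of how dosage weights might allocate a student's growth across multiple schools or teachers (illustrative only; it assumes the roster data already record each student's share of instructional time with each source, and the residuals are the growth-versus-prediction values from the earlier sketch):

```python
import pandas as pd

# One row per (student, school) pair, with the share of the year spent there
roster = pd.DataFrame({
    "student_id": [1, 1, 2],
    "school_id":  ["A", "B", "A"],
    "dosage":     [0.6, 0.4, 1.0],   # student 1 split the year 60/40
})
residuals = pd.Series({1: 8.0, 2: -2.0}, name="residual")  # growth vs. predicted

merged = roster.merge(residuals, left_on="student_id", right_index=True)
merged["weighted_residual"] = merged["dosage"] * merged["residual"]

# Each school's credit is the dosage-weighted average of its students' residuals
school_credit = (merged.groupby("school_id")["weighted_residual"].sum()
                 / merged.groupby("school_id")["dosage"].sum())
print(school_credit)
```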
Key VAM decision: Which control variables? • Potential control variables include: • Student characteristics such as gender, race/ethnicity, disability status, parental income/education • Student programmatic variables such as ELL status, free or reduced price lunch status, special education status, gifted status • Student mobility, indicator for grade repeater • Classroom-level aggregates, class size • “Dosage” variables indicating how much of the year each student spent with that school • Is the control variable outside of the control of the school, and comparable across schools?
Shrinkage Estimator • Not fair to treat an estimate based on 40 students with the same weight as an estimate based on 400 students • Final estimate for each school is a weighted average of: • The value-added estimate calculated for the school • The value-added estimate for the average school • The fewer students a school has: • The less evidence we have for this school • The more we weight the average school
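A minimal empirical-Bayes-style sketch of how those shrinkage weights could work (illustrative; the variance parameters below are placeholders that would in practice come from the estimation model, and real implementations differ in detail):

```python
def shrink(school_estimate, n_students, avg_estimate=0.0,
           between_school_var=4.0, within_school_var=100.0):
    """Pull a school's value-added estimate toward the average school.
    The weight on the school's own estimate grows with the number of
    students: more students -> more evidence -> less shrinkage."""
    sampling_var = within_school_var / n_students
    weight = between_school_var / (between_school_var + sampling_var)
    return weight * school_estimate + (1 - weight) * avg_estimate

print(shrink(school_estimate=6.0, n_students=40))   # heavily shrunk: about 3.7
print(shrink(school_estimate=6.0, n_students=400))  # barely shrunk: about 5.6
```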
Shrinkage: An Example (figure: school estimates shown against the district average, with a “Top 20%” threshold marked)
Options for combining outcome measures • Unweighted average across all tests • Give each subject equal weight • Base weights on precision: More weight to outcomes that are more precisely estimated • Use weights chosen according to district policy priorities
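A sketch of the precision-based option described above (illustrative; the standard errors would come from the VAM output, and a district could just as easily substitute policy-chosen weights):

```python
def precision_weighted_average(estimates, std_errors):
    """Combine outcome measures, weighting each by 1 / SE^2 so that
    more precisely estimated outcomes count for more."""
    weights = [1.0 / se ** 2 for se in std_errors]
    return sum(w * est for w, est in zip(weights, estimates)) / sum(weights)

# Hypothetical math and reading value-added estimates for one school
print(precision_weighted_average(estimates=[4.0, 1.0], std_errors=[1.0, 2.0]))
# Math (SE = 1.0) gets 4x the weight of reading (SE = 2.0): result = 3.4
```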
HS issues: end-of-course (EOC) assessments • Can be difficult to accurately predict EOC scores • Prior test scores are typically from different subject areas • Students take EOC exams in different grades • Algebra I EOC exam typically taken in 8th or 9th grade • Differences likely related to high-school academic track • Patterns can vary dramatically across schools
HS issues: Dropouts • Attrition includes dropouts and students who leave the data for other reasons • Rates of attrition from 8th grade to 12th grade vary substantially across high schools • Commonly see attrition ranges from 10% to 50% • If dropouts would have experienced lower growth, then schools with high dropout rates would show artificially inflated value-added