Washington State Teacher and Principal Evaluation Project Maximizing Rater Agreement
Entry Task: Confidence Conversation As you enter, please have a brief discussion with your district team to decide your level of confidence in whether the following statements are true for your district: • Our evaluators demonstrate accuracy and strong rater agreement when using observation data to score teacher performance. • Our district’s new evaluation system includes frequent, structured opportunities for evaluators to practice and calibrate their observation and rating skills. • Our teachers and principals trust their evaluators to rate their performance accurately and reliably. Write your district name on three sticky notes and place them on the confidence scales posted on [INSERT LOCATION] for each statement.
Welcome! • Introductions • Logistics • Agenda: • Connecting • Learning • Implementing • Reflecting • Wrap-Up
Modules • Introduction to Educator Evaluation in Washington • Using Instructional and Leadership Frameworks in Educator Evaluation • Preparing and Applying Formative Multiple Measures of Performance: An Introduction to Self-Assessment, Goal Setting, and Criterion Scoring • Including Student Growth in Educator Evaluation • Conducting High-Quality Observations and Maximizing Rater Agreement • Providing High-Quality Feedback for Continuous Professional Growth and Development • Combining Multiple Measures into a Summative Rating
TPEP Core Principles “We Can’t Fire Our Way to Finland” • The critical importance of teacher and leadership quality • The professional nature of teaching and leading a school • The complex relationship between the system for teacher and principal evaluation and district systems and negotiations • The belief in professional learning as an underpinning of the new evaluation system • The understanding that the career continuum must be addressed in the new evaluation system • The system must determine the balance of “inputs or acts” and “outputs or results”
Session Norms • Pausing • Paraphrasing • Posing Questions • Putting Ideas on the Table • Providing Data • Paying Attention to Self and Others • Presuming Positive Intentions • What Else?
Connecting Builds community, prepares the team for learning, and links to prior knowledge, other modules, and current work
Module Overview: 2 Parts • Conducting High-Quality Observations • Maximizing Rater Agreement Reminder! • This module provides an orientation to the basic concepts. • This module does not go into great depth about evidence relating to any of the specific instructional or leadership frameworks and instead leaves it up to the districts to seek additional training.
Overview of Intended Participant Outcomes Participants will know and be able to: • Describe the OSPI working definition of rater agreement and the stages for development. • Identify common rating errors in their own and others’ practice. • Utilize appropriate strategies for minimizing bias and error in the observation and rating process. • Understand the elements of high-quality training required to achieve maximum rater agreement.
Connecting Content: Importance of Rater Agreement • Even if you select a high-quality instructional or leadership framework AND • Observers use best practices in collecting the observation data: The results will be meaningless if observers are unable to demonstrate accuracy and consistency in scoring using the framework. KEY POINT: An educator’s observation scores should be the same regardless of the observer.
Importance of Rater Agreement • Demonstrating rater agreement is critical to ensuring that: • Educators can trust the new evaluation system. • Educators receive relevant, useful information for professional growth. • The new system is legally defensible for personnel decisions.
Rater Agreement Background • The new law requires that evaluators of both teachers and principals “must engage in professional development designed to implement the revised systems and maximize rater agreement.” • The Teacher and Principal Evaluation Project (TPEP) has relied heavily on the growing body of research, the framework authors, and the practical input from practitioners in pilot sites to create a “working definition” of rater agreement for the 2012-13 school year.
OSPI Definition of Rater Agreement The extent to which raters’ scores are consistent with one another and accurate against predetermined standards. The predetermined standards are the instructional and leadership frameworks and rubrics that define the basis for summative criterion-level scores.
OSPI Definition of Rater Agreement • Consistency: A measure of observer data quality indicating the extent to which an observer is assigning scores that agree with scores assigned to the same observation of practice by another typical observer. • Accuracy: A measure of observer data quality indicating the extent to which an observer is assigning scores that agree with scores assigned to the same observation by an expert rater; that is, the extent to which a rater’s scores agree with the true or “correct” score for the performance.
Calculating Rater Agreement Table I. Illustrating Rater Agreement (table not reproduced)
Calculating Rater Agreement Table II. Illustrating Rater Agreement, continued (table not reproduced)
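Because the illustrative tables are not reproduced here, a minimal arithmetic sketch may help. The snippet below computes exact and adjacent (within one point) agreement for one observer’s component scores against a peer (consistency) and against an expert rater (accuracy). The 1-4 scale, scores, and rater labels are hypothetical and not drawn from OSPI materials.

```python
# Hypothetical rater-agreement arithmetic: exact vs. adjacent agreement.
# Scores, raters, and the 1-4 scale are illustrative assumptions only.

def agreement_rates(rater_a, rater_b):
    """Return (exact, adjacent) agreement as fractions of all scored components."""
    assert len(rater_a) == len(rater_b), "raters must score the same components"
    n = len(rater_a)
    exact = sum(a == b for a, b in zip(rater_a, rater_b))          # identical scores
    adjacent = sum(abs(a - b) <= 1 for a, b in zip(rater_a, rater_b))  # within one point
    return exact / n, adjacent / n

# Component scores for the same observation, on a 1-4 scale.
observer = [3, 2, 4, 3, 1, 3, 2, 3]
peer     = [3, 3, 4, 2, 1, 3, 2, 4]   # consistency: vs. another typical observer
expert   = [3, 2, 4, 3, 2, 3, 2, 3]   # accuracy: vs. an expert's "true" scores

exact, adjacent = agreement_rates(observer, peer)
print(f"Consistency (vs. peer)   exact: {exact:.0%}  adjacent: {adjacent:.0%}")

exact, adjacent = agreement_rates(observer, expert)
print(f"Accuracy (vs. expert)    exact: {exact:.0%}  adjacent: {adjacent:.0%}")
```

Exact agreement is the strictest check; districts often track adjacent agreement as well, since it tolerates one-point differences on a multi-point rubric.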
Connecting Activity: Where Can You Assess Rater Agreement? Evidence → Framework Score → Summative Criterion Score • Evidence: observation evidence, student growth data, artifacts, other relevant evidence • Framework Score: framework scales (e.g., components, domains, dimensions) • Summative Criterion Score: Criteria 1-8
Learning Understand common sources of rater error and strategies for minimizing their influence in observer ratings Understand the role of high-quality observation training in achieving rater agreement
Learning Content I: Avoiding Rater Error Recall that a skilled observer: 1. Understands each component and indicator on the district rubric thoroughly and deeply. 2. Gathers and sorts sufficient evidence of practice as it happens in the classroom or school. 3. Recognizes and puts aside preferences and biases. 4. Interprets the evidence appropriately to give an accurate rating using the evaluation instrument. (McClellan, Atkinson, & Danielson, 2012)
Avoiding Common Rater Errors • Central Tendency • A rater evaluates the observation using points on the middle of the scale and avoids extremely high or low ratings. • Strategy to avoid this error? • Pay careful attention to behavioral anchors that define performance at each scale point. • Compare observation evidence with the behavioral anchors. • Keep in mind that behavioral anchors are examples—you do not have to have observational evidence for every single anchor for a particular rating.
Avoiding Common Rater Errors • Contrast Effect • A rater directly compares the performance of one educator to that of another educator. • This is particularly problematic when a group of educators select a common criterion on a focused evaluation cycle. • Strategy to avoid this error? • When assigning observation ratings, do not use another educator’s performance as a point of reference. Raters should only compare the observation evidence against the anchors on the rating scale.
Avoiding Common Rater Errors • Focusing on One or Two Incidents • Ratings are based on only a small sample of observation evidence that typically includes either very strong or weak examples of practice. • Strategy to avoid this error? • Be sure to take into account the full range of performance described in the observation evidence. Assess the frequency and depth of the behaviors recorded against the behavioral indicators in the rubric.
Avoiding Common Rater Errors • Halo Error • A rater allows ratings on one component/scale to influence ratings on another component/scale. • Strategy to avoid this error? • Remember that framework components are scored separately. Your ratings on one component should not influence ratings on another component. • Consider the observation evidence for each component separately and only use the information that is relevant to the component you are considering.
Avoiding Common Rater Errors • Potential Error • A rater gives higher or lower ratings to an educator than is warranted by the observation evidence because he or she believes the educator has (or does not have) the potential to be an excellent educator. • Strategy to avoid this error? • Remember to consider all instances of an educator’s actual observation data. Ratings should be based only on the observation evidence collected, not on anticipated improvements or declines.
Avoiding Common Rater Errors • Leniency and Severity Errors • A rater gives mostly high (lenient) or low (severe) ratings to an educator in a manner inconsistent with the observation data collected. • Strategy to avoid this error? • Pay careful attention to the scale anchors when making your ratings. Also, review the anchors in order to understand how performance is defined at each scale point. Do not try to be intentionally “easy” or “hard” in your ratings.
Avoiding Common Rater Errors • Recency Bias • A rater is inclined to remember recent events better than earlier ones; thus, raters often place greater weight or emphasis on evidence collected near the end of the observation. • Strategy to avoid this error? • Consider all of the observation evidence collected over the entire class period. Remind yourself that the educator’s performance at the beginning of the observation is just as important as his or her performance at the end.
Avoiding Common Rater Errors • Similar-to-Me Bias • A rater gives higher ratings to educators who are similar to him or her and lower ratings to educators who are dissimilar. • Strategy to avoid this error? • Avoid incorporating personal preferences, feelings, or perceptions about the educator into your ratings. Only actual observation evidence should be used to make an observation rating.
Learning Activity I. Practicing Observation Rating • You will need the following: • Your observation notes from the Conducting High-Quality Observations module • Your district’s instructional framework • Identify sections of your framework aligned to Criterion 5:
Learning Activity I. Practicing Observation Rating • Step 1: • As a group: select two indicators from the list, read them through, and discuss the key differences between the performance levels in each. • Step 2: • As an individual: • Read your observation notes and code the evidence relevant to each indicator (e.g., use highlighters, make notations, etc.). • Select a rating for each indicator based on your coded evidence.
Learning Activity I. Practicing Observation Rating • Step 3: • Select one person to be the “recorder” and write down ratings in Handout 4: Ratings Record. • Share your indicator ratings for recording.
Learning Activity I. Practicing Observation Rating • Step 4: • Identify any indicator without exact rater agreement. • Discuss and attempt to reach a rating consensus on each (e.g., explain your ratings with reference to evidence). • During the discussion, note and record any common rating errors you find in your own ratings or those of others in your group (see Handout 3: Common Rating Errors for reference).
Learning Activity I: Debrief/Wrap-Up • Did anyone achieve exact rater agreement on at least one indicator? On both? • Were you able to reach consensus on ratings where you did not have exact rater agreement? • What rater errors did you identify, and what strategies could you use in the future to avoid them?
Learning Content II: Observer Training to Achieve Rater Agreement • Intensive Training to Achieve Rater Agreement • Orientation and deep understanding of standards and framework, components, and tools • Practice rating using a combination of videos and live observations • Feedback, coaching, and discussion of ratings • Assessment of rater agreement (e.g., certification testing)
OSPI’s Stages of Rater Agreement Training • Stage 1: 2-3 day foundational training • Stage 2: Ongoing rater agreement training
Certification and Calibration Exams • The certification exam should cover all grades and subjects the observer will observe. • There are a variety of ways to reduce the time burden of certification: • Include a knowledge assessment of the observation rubric • Mix shorter videos of practice with longer, full-lesson videos of practice • The calibration exam should test the observer on a representative selection of skills and content to ensure continued accuracy in rating. • Certification and calibration exams are high-stakes exams.
Ongoing Calibration, Practice, and Monitoring • Rater agreement is NOT ensured by a single training or certification test.
Ongoing Calibration, Practice, and Monitoring • Rater drift will naturally occur unless evaluators have: • Periodic opportunities to re-calibrate. • Access to practice videos for difficult-to-score domains/components. • Expectations that their ratings will be monitored.
Ongoing Calibration, Practice, and Monitoring • Lessons from TPEP Pilots: • Informal calibration through discussion forums where observers share challenges and best practices has a big impact. • Use pre-existing professional learning groups (such as principal PLCs) to practice and calibrate. • To practice, co-observe a classroom lesson, score separately, and meet to compare scores (see the sketch below).
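As one way to make that co-observation comparison concrete, the sketch below flags components whose score gap exceeds a chosen threshold for a recalibration discussion. The component names, scores, 1-4 scale, and one-point threshold are hypothetical assumptions for illustration, not OSPI guidance.

```python
# Hypothetical co-observation debrief: two observers score the same lesson
# separately, then meet to compare. Components, scores (1-4 scale), and the
# one-point threshold are illustrative assumptions only.

DISCUSS_THRESHOLD = 1  # score gaps larger than this trigger a recalibration talk

co_observation = {
    "Component A: student engagement":     (3, 3),
    "Component B: questioning strategies": (2, 4),
    "Component C: learning environment":   (4, 3),
}

for component, (score_1, score_2) in co_observation.items():
    gap = abs(score_1 - score_2)
    status = "discuss and recalibrate" if gap > DISCUSS_THRESHOLD else "in agreement"
    print(f"{component:38} {score_1} vs {score_2} -> {status}")
```

Running the comparison after every co-observation, rather than only at certification time, gives evaluators the periodic recalibration the slides above call for.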
Learning Activity II: Identifying Opportunities for Calibration • Discuss with your team: What opportunities already exist in your district for ongoing calibration? Identify at least two. • Share: What opportunities have you identified?
Implementing Develop a district plan for ongoing assessment and monitoring of rater agreement Develop a district plan for ongoing rater calibration and practice
Implementing Activity: Monitoring and Maintaining Rater Agreement • Read “Maximizing Rater Agreement: A Primer” and “Rater Agreement in Washington State’s Evaluation System” (20 minutes) • Use the Implementation Planning Tool (Handout 5) to begin developing your district’s plan for monitoring and maintaining rater agreement over time
Implementing Activities Debrief • Each team shares two things to debrief our implementing tasks: • One decision you made today (a key decision, a preliminary decision, a change of course, etc.) • One immediate next step you will take when you return to your district
Revisiting Our Confidence Conversations • Our evaluators demonstrate accuracy and strong rater agreement when using observation data to score teacher performance. • Our district’s new evaluation system includes frequent, structured opportunities for evaluators to practice and calibrate their observation and rating skills. • Our teachers and principals trust their evaluators to rate their performance accurately and reliably.
What’s Next • Homework options: • District or school teams: use your observation notes to practice scoring additional indicators in your framework and discuss ratings to achieve agreement. • Identify specific components or dimensions you think will be particularly hard for your observers to score. Prioritize those components or dimensions in your ongoing calibration and practice sessions.
Thank you! Presenter Name XXX-XXX-XXXX xxxxxxxxxxx@xxx.xxx 1234 Street Address City, State 12345-1234 800-123-1234