Teachers' Voices About the Effectiveness of Benchmark Testing

1. Teachers' Voices About the Effectiveness of Benchmark Testing
Lisa Abrams, James McMillan, and Angela Wetzel, Virginia Commonwealth University
CREATE National Evaluation Institute, Williamsburg, Virginia, October 7, 2010

2. Need for Benchmark Testing: Role of NCLB
The No Child Left Behind Act (NCLB) created an accountability context and contributed to a data-driven culture through its reporting requirements: tracking student achievement over time, demonstrating progress toward Adequate Yearly Progress (AYP) goals, and disaggregating achievement data. As a result, it dramatically increased the importance of learning from student assessment results.

3. Need for Benchmark Testing: Limitations of State-Mandated Test Results
"Relying on high-stakes test results for instructional guidance is like trying to get to the Empire State Building with a map of the United States" (Supovitz & Klein, 2003, p. 1).
"Receiving test data in July is like driving a school bus looking out of the rearview mirror. I can see where my students have been but I cannot see where we are going" (Salpeter, 2004, p. 30).

4. Rationale & Purpose
Are students making progress toward meeting the requirements of the state test? Are students on track to pass the state tests? Are subgroups of students on track to meet AYP targets?
Schools need information that can measure student progress relative to a set of specific content standards and skills; identify content areas of strength and areas for improvement; shape instructional decisions; serve as an "early warning" system; inform strategies to support the learning of individual students; and be aggregated across levels (student → classroom → grade/team → school → district).
Many districts have adopted formal local interim assessments to obtain diagnostic information that can be acted on quickly.

5. Range of Instructional Uses (see Supovitz & Klein, 2003)
Planning: deciding on content, pacing, and instructional strategies or approaches (e.g., a mastery orientation).
Delivery: targeted instruction for the whole class or for small groups, depending on mastery of content and skills; providing feedback and/or re-teaching selected content and skills; selecting and using supplemental or additional resources.
Remediation: identifying low-performing students; designing plans for providing additional support and assistance.
Evaluation: monitoring and tracking student progress; examining the effectiveness of interventions; determining instructional effectiveness.

6. What We Know About Benchmark Tests
Benchmark tests are widely used across districts in Virginia and nationally (Marsh, Pane & Hamilton, 2006).
Teachers hold mixed views on the usefulness of benchmark test results. Compared with their own classroom assessments, benchmark results are seen as less useful and as providing redundant information. Compared with state test results, they are seen as more useful to "identify and correct gaps in their teaching."
Factors that influence teachers' views include quick turnaround of results, alignment with the curriculum, capacity and support, instructional leadership, perceived validity, reporting, and added value.

7. Impact on Teachers
Benchmark results inform instructional adjustments (Brunner et al., 2005; Marsh, Pane & Hamilton, 2006; Yeh, 2006).
Reported impacts include increased collaboration and problem solving (Lachat & Smith, 2005; Wayman & Cho, 2009; Yeh, 2006) and enhanced self-efficacy and increased reflection (Brunner et al., 2005; Yeh, 2006).
They also include an increased emphasis on testing and test preparation, with colleagues and standards as the primary influences on practice (Loeb, Knapp & Elfers, 2008).
Use varies within schools: some teachers use the information and others do not. Of the variability in teacher survey responses, 80% was within rather than between schools (Marsh, Pane & Hamilton, 2006).

8. Impact on Students
Achievement: although research is limited, it suggests the impact may be mixed. Targeted instruction led to improvements in student test scores (Lachat & Smith, 2005; Nelson & Eddy, 2008; Trimble, Gay & Matthews, 2005; Yeh, 2006) and in proficiency in reading and mathematics (Peterson, 2007). However, Henderson, Petrosino & Guckenburg (2008) found no difference in student achievement gains between schools that implemented benchmark assessments and a group of comparison schools that did not.
Other reported impacts include increased engagement and motivation (Yeh, 2006); increased access to learning opportunities such as tutoring and remedial services (Marsh, Pane & Hamilton, 2006); and instruction targeted toward the "bubble kids" (students scoring just below the proficiency cutoff).

9. Purpose of the Study
The study explored the extent to which teachers use benchmark test results to support learning, guided by three questions:
What is the policy context and nature of benchmark testing?
How do teachers use benchmark testing data in formative ways?
What factors support and/or hinder teachers' formative use of benchmark testing data?

10. Research Design and Methods
Design: a qualitative, double-layer category focus-group design (Krueger & Casey, 2009), with school type and district (N = 6) as the layers.
Protocol topics: the general nature of benchmark testing policies and the type of data teachers receive; expectations for using benchmark test results; instructional uses of benchmark test results; and general views on benchmark testing policies, practices, and procedures.
Data were gathered in focus group sessions.

11. Participants
Selection: a two-stage convenience sampling process (district → school principal → teachers).
Data collection: Spring 2009/Fall 2010; 15 focus groups with 67 core-content-area teachers.
Demographic profile: the majority were white (82%) and female (88%) and taught at the elementary level (80%). Participants averaged 11.5 years of classroom experience (range 1-34 years); 33% were beginning teachers with 1-3 years of teaching experience, and 20% had been teaching for over 20 years. 20% were middle school teachers in civics, science, mathematics, and language/reading.

12. Data Analysis
A transcript-based approach within a constant-comparative analytic framework was used to identify emergent patterns or trends (Krueger & Casey, 2009). Analysis focused on the frequency and extensiveness of viewpoints or ideas. Codes were created in nine key areas (e.g., "alignment," "test quality," "individualized instruction," and "testing time") and applied to the text. Inter-coder agreement was high.

13. Findings: Similar Policies Across Districts; Differences Within Districts
Theme 1: Benchmark testing policies related to test construction and administration were similar among school divisions, but inconsistencies were evident across content areas and grade levels within districts. Example: use of results for grades.
"They are graded, but they are not part of their grade. So they [benchmark test results] will show up on their report card as a separate category just so parents know and the students know what the grade is, but it doesn't have any effect on their class grade."
"A lot of that is actually left up to the teacher. It's not implemented county wide that it has to count as a test... it's been left up to the teacher [to determine] what parts of the tests they actually count... if you spent more time on one [content area] and didn't get to cover another, it is left up to the teacher to take those questions out, recalculate what their grade was based on what you taught."

14. Findings: Expectations
Theme 2: There are clear and consistent district- and building-level expectations for teachers' analysis and use of benchmark test results to make instructional adjustments in an effort to support student achievement.
"We are asked to be accountable for each and every one of those students and sit face-to-face with an administrator and she says to you, how are you going to address those needs? And then we have to be able to say, well, I'm pulling them for remediation during this time, or I'm working with a small group or I've put them on additional enrichment, or whatever it is, but we've got to be able to explain how we're addressing those weaknesses."
"Our principal expects when you have a grade level meeting to be able to say, this is what I'm doing about these results, because it is an unwritten expectation but it is clearly passed on, usually from mentor to mentee by sitting down with them the first time they are giving the test and describing how you do data analysis and literally walking them through it and showing them patterns to look for."

15. Findings: Access to Results
Theme 3: Timely access to test results and the use of a software program supported data analysis and reporting.
"You can break it [benchmark test results] down by SOL [Standards of Learning] strand as well on this computer system. And I think it's important because it can show you whether or not the class as a whole did well or if it was just that one particular student where you need to go back and remediate, or it could reflect on you as a teacher, maybe I didn't teach that particular concept thoroughly enough and it might not necessarily be the child which I think is really helpful... It helps you know what you need to hit on for the SOLs."
"That if we are supposed to be using this information to guide instruction we need immediate feedback, like the day of, so we can plan to adjust instruction for the following day."

16. Findings: Thoughtful Discussion Essential
Theme 4: It was important for teachers to have time to discuss results with colleagues.
"We have achievement team meetings where we look at every single teacher, every single class, everything, and look at the data really in depth to try to figure out what's going on. What is the problem with this class? Why is this one doing better?"
"I think we just review what it is [the results] and we say, hey, your kids did really good on that. What did you do to make them understand? Maybe mine didn't do too well. We exchange ideas, and brainstorm together. We work as a team."
"We don't do anything with it [test results]. To be very honest with you, we as individuals look at it [test results], we look at the item analysis, but as a department we never come together."

17. Findings: Informing Instruction
Theme 5: Teachers analyze benchmark test results at the class and individual student levels to inform review, re-teaching, and remediation or enrichment. Whether the need lies with individual students or with the whole class guides teachers' next steps.
"If I see a large number of my students missing in this area, I am going to try to re-teach it to the whole class using a different method. If it is only a couple of [students], I will pull them aside and instruct one-on-one."
"I definitely look at how many kids missed a certain question. Did the majority of the class miss this question? Can I see a trend with this topic?"
"I don't want to say that I ever brush any skills off to the side, but I do hit those weaknesses and the gaps hard."
"It makes a difference in my instruction. I mean, I think I'm able to help students more that are having difficulty based on it. I am able to hone in on exactly where the problem is. I don't have to fish around."

18. Findings: Factors that Impact Use of Results
Theme 6: A variety of factors impact teachers' use of benchmark test data, including the alignment of the test with the content of instruction, the quality of the test items, the accuracy of the scoring, and the technology available to support the test administration.
"The other problem too is when you have your pacing guide and they tell you to hit this [content] the first nine weeks, a lot of times the questions on the benchmark aren't correlated with what you were teaching the first nine weeks, so they will have questions about things that they didn't tell you to go over."
"We really need to focus on the tests being valid. It is hard to take it seriously when you don't feel like it is valid. When you look at it and you see mistakes or passages you know your students aren't going to be able to read because it is way above their reading level."
"Many times the 9-week assessments are so all encompassing that it is difficult for the students... you may only have one question that addresses a specific objective. And so that is not really a true representation of what the child knows about that objective."

19. Findings: Testing Time vs. Learning Time
Theme 7: Teachers expressed significant concerns about the amount of instructional time that is devoted to testing and the implications for the quality of their instruction.
"I think it has definitely made us change the way we teach because you are looking for how can I teach this the most effectively and the fastest... that is the truth, you have got to hurry up and get through it [curriculum] so that you can get to the next thing so that they get everything [before the test]."
"I do feel like sometimes I don't teach things as well as I used to because of the time constraints."
"You are sacrificing learning time for testing time... we leave very little time to actually teaching. These kids are losing four weeks out of the year of instructional time."

20. Discussion
In the main, the findings are consistent with other research.
Primary purpose: teachers used results to identify weaknesses and decide whether additional instruction was needed. Secondary purpose: judging instructional effectiveness and evaluating the curriculum. Results were rarely used predictively.
Teachers were rarely surprised by the results and typically evaluated them in light of other evidence of student understanding. Benchmark testing holds much promise but had less impact on teacher practice.
Conversations among teachers were important. They were most effective with clear expectations and regular, structured, yet informal meetings; they enhanced critical analysis and interpretation of results and led to sharing of ideas and collaboration on re-teaching.
Use of results was most effective within a supportive school culture; the more pressure there was to improve, the less useful the results were.

21. Discussion, continued
There was relatively little emphasis on instructional correctives, which suggests a surface-level impact on only a few teachers. Without some structure, discussions about re-teaching tend to be general, so professional development on re-teaching is needed. Is this truly formative assessment?
Alignment and high-quality items are essential. Instructional emphasis must match the tests and their item formats, and teachers needed to see the test items when reviewing results. What is more meaningful: individual items, groups of items, or total scores?
Are benchmark tests worth the cost and the lost instructional time? Total costs and the loss of instructional time need to be documented.

22. Factors Influencing Use of Benchmark Assessments

23. Recommendations for Practice
Ensure item quality. Emphasize one purpose: instruction. Use a variety of item types. Make reports accessible to teachers and provide results immediately. Provide ongoing professional development in the interpretation and use of results. Provide guidelines for teachers on interpreting and using results and on incorporating other information about student understanding. Evaluate the effectiveness of the assessments.

24. Recommendations for Research
More research is needed on factors related to appropriate teacher interpretation and use of test results; the use of re-teaching strategies and subsequent improvements in student achievement; the effectiveness of professional development; test items that can diagnose errors in knowledge, understanding, and thinking; validity claims; the technical qualities of items; and student motivation when taking the tests.

25. Teachers' Voices About the Effectiveness of Benchmark Testing
Questions? (PowerPoint available at http://www.soe.vcu.edu/merc/index.html)
