Explore the promise of formative assessment in education and how it can enhance teaching and learning. Discover how assessment can be linked to instructional practices and reflect student knowledge and skills. Learn about the power of feedback and its role in improving student outcomes.
Assessment Lessons from the National Research Council and the Promise of Formative Assessment
Michael C. Rodriguez
Quantitative Methods in Education, Educational Psychology
NAAD 2019
“The principal function of measurement is to contribute directly or indirectly to the effectiveness of teaching and learning.” from Tests and Measurements for Teachers, Tiegs (1931)
Measurement and assessment have been co-opted by accountability systems. The work of educators is not about the tests. It’s about teaching and learning. … a few thoughts
Standardized assessments provide a narrow glimpse of learning. • Globally, curriculum does not appear to matter; instructional approaches matter. • Local assessment is more informative when it is tied to instructional practices and designed to reflect what students know and can do; we can strive to connect local objectives and assessments to state standards. … thoughts
Simply paying more attention to our decision-making process offers no guarantee that new ways of solving the problem will miraculously come into awareness. • Doing data-driven decision making (DDDM) does not automatically translate into improved outcomes. • No amount of increased effort can compensate for limited knowledge and skill about how to solve problems that require special training. …thoughts
Shepard, L. A., Penuel, W. R., & Pellegrino, J. W. (2018, Spring). Using learning and motivation theories to coherently link formative assessment, grading practices, and large-scale assessment. Educational Measurement: Issues and Practice. Moving Forward
All learning is fundamentally social, involving the student’s use of shared language, tools, norms and practices in interaction with their social context. • One’s cognitive development and social identity are jointly constituted through participation in multiple social worlds of family, community, and school. Sociocultural Learning Theory
Compared to other theories, sociocultural theory offers a more powerful, integrative account of how motivational aspects of learning—such as self-regulation, self-efficacy, sense of belonging, and identity—are completely entwined with cognitive development. • To support equitable and ambitious teaching practices, classroom assessment design must be grounded in a research-based theory of learning. Shepard, Penuel, & Pellegrino
Instead of “scores,” which offer teachers little information about what to do next, it is much more important that formative assessment questions, tasks, and activities provide instructional “insights” about student thinking and about what productive next steps might be taken. Shepard, Penuel, & Pellegrino
Given what we know about how interim assessments are constructed, it is not surprising that identifying standards not-yet-mastered does not give teachers access to student thinking. • Typical interim tests offer students an impoverished vision of intended learning goals. Shepard, Penuel, & Pellegrino
formative assessment activities to surface student thinking and further learning within instructional units • summative unit assessments used for grading that explicitly address transfer and extensions from previous instructional activities • district-level assessments designed in parallel to unit summative measures but with particular attention to program-level evaluation Goals to Strive For (Shepard, Penuel, & Pellegrino)
Formative assessment is a special kind of test or series of tests that teachers learn to use to find out what their students know. • Formative assessment is a program that teachers adopt and add to what they already do. • Any practice that gathers information for the purpose of improving programs or improving teaching is a part of formative assessment. Advancing Formative Assessment in Every Classroom by Connie M. Moss and Susan M. Brookhart Misconceptions about Formative Uses
Feedback is not advice, praise, or evaluation. Feedback is information about how we are doing in our efforts to reach a goal. Educational Leadership, Sept. 2012
If students know the classroom is a safe place to make mistakes, they are more likely to use feedback for learning. • The feedback students give teachers can be more powerful than the feedback teachers give students. • When we give a grade as part of our feedback, students routinely read only as far as the grade.
Effective feedback occurs during the learning, while there is still time to act on it. • Most of the feedback that students receive about their classroom work is from other students—and much of that feedback is wrong. • Students need to know their learning target—the specific skill they’re supposed to learn—or else “feedback” is just someone telling them what to do.
http://culturalorganizing.org Rethinking Equity
Phelps, R.P. (2012). The effect of testing on student achievement, 1910-2010. International Journal of Testing, 12, 21-43.
Reviewed several hundred studies published between 1910 and 2010. • The pure testing effect is an increase in achievement that occurs simply because students take a test instead of spending the same amount of time some other way, such as studying. • Taking a test, responding to questions, generating responses – has a more durable effect on memory and understanding than listening/reading. • 177 empirical studies with 640 effects
Quantitative results: d = 0.55 to 0.88 • Testing with feedback produced strongest positive effects • Adding stakes and testing with greater frequency increased positive effects
A Report of the National Research Council: Lessons Learned about Testing
In many situations, standardized tests provide the most objective way to compare the performance of a large group of examinees across places and times. • A test score is an estimate rather than an exact measure of what a person knows and can do. • High-stakes decisions about individuals should not be made on the basis of a single test score. • Tests should not be used for high-stakes decisions if test takers have not had an opportunity to learn the material on which they will be tested. Lessons re: Uses
The people who design and mandate tests must be constantly vigilant about equity concerns, including opportunity to learn, cultural bias, and adverse impact. • In the absence of effective services for low-performing students, better tests will not lead to better educational outcomes. • Test results may be invalidated by teaching narrowly to a particular test. Lessons re: Consequences
Test developers and policy makers should clearly explain to the public the purpose for a test and the meaning of different levels of test performance. • When test results are reported to students, teachers, and the public, the limitations of the test should be explained clearly to a lay audience. Lessons re: Public Understanding
Board on Testing and Assessment A test score is an estimate rather than an exact measure of what a person knows and can do. The items on any test are a sample from some larger universe of knowledge and skills, and scores for individual students are affected by the particular questions included. National Academies (2009)
A student may have done better or worse on a different sample of questions. In addition, guessing, motivation, momentary distractions, and other factors introduce uncertainty into individual scores. National Academies (2009)
We observe a score, and due to sampling of items/time/occasions/settings/etc., we know an individual’s score will vary on average by the SEM, simply as a function of measurement error. Interpreting Individual Scores
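A minimal sketch of that interpretation, using hypothetical numbers (the scale score and SEM below are illustrative, not drawn from any MCA table): the SEM turns an observed score into an interval estimate rather than a point estimate.

```python
observed_score = 350.0   # hypothetical scale score for one student
sem = 10.0               # hypothetical standard error of measurement

# An approximate 95% band for the student's true score:
# observed score +/- 1.96 * SEM.
lower = observed_score - 1.96 * sem
upper = observed_score + 1.96 * sem
print(f"True score plausibly lies in about [{lower:.1f}, {upper:.1f}]")
```

On retesting with a different sample of items, occasions, or settings, the observed score would be expected to wander within roughly this band even if nothing about the student changed.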
2016-2017 Technical Manual for Minnesota’s Title I and Title III Assessments: Understanding Measurement Error. When interpreting test scores, it is important to remember that test scores always contain some amount of measurement error. That is, test scores are not infallible measures of student characteristics… measurement error must always be considered when making score interpretations. (p. 71)
Because measurement error tends to behave in a fairly random fashion, when aggregating over students, these errors in the measurement of students tend to cancel out. (p. 71) 2016-2017 Technical Manual
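The cancel-out claim can be illustrated with a small simulation (hypothetical values throughout: every student is given the same true score so that any deviation in an observed score is pure measurement error).

```python
import random

random.seed(1)

n_students = 10_000
true_score = 350.0  # same hypothetical true score for every student
sem = 10.0          # hypothetical SEM on the score scale

# Each observed score = true score + random measurement error.
observed = [true_score + random.gauss(0.0, sem) for _ in range(n_students)]

# Individually, observed scores miss the true score by roughly the SEM...
typical_miss = sum(abs(x - true_score) for x in observed) / n_students

# ...but the group mean lands far closer: random errors cancel in aggregate.
group_mean = sum(observed) / n_students
print(f"typical individual miss: {typical_miss:.1f}")
print(f"group mean error: {abs(group_mean - true_score):.2f}")
```

This is why the same SEM that makes individual-level inferences shaky leaves group averages comparatively trustworthy: the error of a mean shrinks with the square root of the group size.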
2016-2017 Technical Manual: Using Objective/Strand-Level Information. Strand or substrand level information can be useful as a preliminary survey to help identify skill areas in which further diagnosis is warranted. The standard error of measurement associated with these generally brief scales makes drawing inferences from them at the individual level very suspect; more confidence in inferences is gained when analyzing group averages. (p. 72)
2016-2017 Technical Manual When considering data at the strand or substrand level, the error of measurement increases because the number of possible items is small. In order to provide comprehensive diagnostic data for each strand or substrand, the test would have to be prohibitively lengthened. (p. 72)
MCA for Individual Interpretation 2016-17 Yearbook Tables for Minnesota’s Title I and Title III Assessments Example: Grade 3 Reading Score Distributions, p. 96
2017 MCA-III Summary Statistics, Grade 3 Reading (p. 125) 2016-2017 Technical Manual
2017 MCA-III Subscale Correlations, Grade 3 Reading (p. 158) 2016-2017 Technical Manual
We know that an observed correlation can be no larger than the square root of the product of the score reliabilities: r_XY ≤ √(r_XX′ · r_YY′)
Correlations can be corrected (disattenuated) for measurement error. To do so, we rearrange the above relation and divide the observed correlation by the square root of the product of the corresponding reliabilities: r_corrected = r_XY / √(r_XX′ · r_YY′)
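The correction is a one-line computation. A minimal sketch with hypothetical inputs (the correlation and reliabilities below are illustrative, not values from the MCA tables):

```python
from math import sqrt

# Hypothetical subscale values for illustration only.
r_xy = 0.70    # observed correlation between two subscales
rel_x = 0.80   # reliability of subscale X scores
rel_y = 0.75   # reliability of subscale Y scores

# Ceiling on the observed correlation implied by the reliabilities.
max_r = sqrt(rel_x * rel_y)

# Disattenuated (corrected) correlation between the underlying true scores.
r_corrected = r_xy / sqrt(rel_x * rel_y)
print(f"ceiling: {max_r:.3f}, corrected: {r_corrected:.3f}")
```

Note how an observed correlation of 0.70 between two modestly reliable subscales implies a true-score correlation above 0.90, which is why short MCA subscales can look nearly redundant once measurement error is removed.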
3rd Grade Mathematics Subscale Corrected Correlations 2017 MCAs
6th Grade Mathematics Subscale Corrected Correlations 2017 MCAs
8th Grade Mathematics Subscale Corrected Correlations 2017 MCAs
2017 MCA Reading Subscale Corrected Correlations: Literature & Information
Interim assessments …school leaders want to use them.
Tests that mimic the structure of large-scale, high-stakes, summative tests, which lightly sample broad domains of content taught over an extended period of time, are unlikely to provide the kind of fine-grained, diagnostic information that teachers need to guide their day-to-day instructional decisions. National Academies 2009
…BOTA urges the Department to clarify that assessments that simply reproduce the formats of large-scale, high-stakes, summative tests are not sufficient for instructional improvement systems. National Academies 2009
Test score information that more appropriately represents the challenges facing MN teachers and schools
Student Score Distributions Vary by Race (Grade 3 Reading): consider the role of variability
Phelps, R.P. (2009). Educational achievement testing: Critiques and rebuttals. In R.P. Phelps (Ed.), Correcting fallacies about educational and psychological testing (pp. 89-146). Washington, DC: American Psychological Association.