Strengthening Claims-based Interpretations and Uses of Local and Large-scale Science Assessment Scores (SCILLSS) National Council on Measurement in Education 2019 Annual Meeting April 6, 2019 Toronto, Canada
Welcome and Introduction Liz Summers, Ph.D., Executive Vice President, edCount, LLC
About SCILLSS • One of two projects funded by the US Department of Education’s Enhanced Assessment Instruments Grant Program (EAG), announced in December 2016 • Collaborative partnership including three states, four organizations, and 10 expert panel members • Nebraska is the grantee and lead state; Montana and Wyoming are partner states • Four-year timeline (April 2017 – December 2020)
SCILLSS Project Goals • Create a science assessment design model that establishes alignment with three-dimensional standards by eliciting common construct definitions that drive curriculum, instruction, and assessment • Strengthen a shared knowledge base among instruction and assessment stakeholders for using principled-design approaches to create and evaluate science assessments that generate meaningful and useful scores • Establish a means for state and local educators to connect statewide assessment results with local assessments and instruction in a coherent, standards-based system
Ensuring Rigor in State and Local Assessment Systems: A Self-Evaluation Protocol Andrew Wiley, Ph.D., Assessment Design & Evaluation Expert, ACS Ventures, LLC
Self-evaluation Protocols Purposes The local and state self-evaluation protocols provide frameworks that local schools or districts can use to determine how best to implement their local assessment programs, and that states can use to evaluate their options for their statewide assessment program.
Self-evaluation Protocols Goals and objectives • Identify intended use(s) of test scores • Foster an internal dialogue to clarify the goals and intended uses of an assessment program • Evaluate whether given test interpretations are appropriate • Identify gaps in assessment programs • Identify overlap across multiple assessment programs
Gather and Evaluate Validity Evidence FOUR FUNDAMENTAL VALIDITY QUESTIONS • Construct Coherence: To what extent does the assessment as designed capture the knowledge and skills defined in the target domain? • Comparability: To what extent does the assessment as implemented yield scores that are comparable across students, sites, time, and forms? • Accessibility and Fairness: To what extent are students able to demonstrate what they know and can do in relation to the target knowledge and skills on the test in a manner that can be recognized and accurately scored? • Consequences: To what extent does the test yield information that can be and is used appropriately within a system to achieve specific goals?
Self-Evaluation Protocol, Steps One and Two: Identifying Purposes and Assessments Used to Serve those Purposes • Evaluate science curricula: ISTEP+ science in grades 4, 6, and 10 • Monitor learning and guide instruction in math: MAP Growth in grades K-10 • Monitor reading development: Edmentum Reading Eggs
MAP Growth in grades K-10 • Who: all* students in grades K-10 (*except some students with disabilities and English learners?) • When: four times annually, in September, December, February, and April
Ellen Forte, Ph.D., CEO & Chief Scientist, edCount, LLC The SCILLSS Digital Workbook on Educational Assessment Design and Evaluation
Digital Workbook Purpose • Inform state and local educators and other stakeholders on the purposes of assessments; • Ensure a common understanding of the purposes and uses of assessment scores, and how those purposes and uses guide decisions about test design and evaluation; • Complement the needs assessment by providing background information and resources for educators to grow their knowledge about foundational assessment topics; and • Address construct coherence, comparability, accessibility and fairness, and consequences.
Digital Workbook Audience • State and district administrators who may be— • Instructional leaders • Content specialists • Assessment specialists • Decision-makers regarding state or local assessments • Responsible for implementing State and Local Self-Evaluation Protocols • These educational leaders will strengthen their assessment literacy by building their knowledge base, understanding the nuances of validity and reliability, and applying their knowledge in the evaluation of their own systems.
Assessment Literacy • Being assessment literate means that one understands key principles about how tests are designed, developed, administered, scored, analyzed, and reported upon in ways that yield meaningful and useful scores. • An assessment literate person can accurately interpret assessment scores and use them appropriately for making decisions.
Digital Workbook Series Organizing Principles: Validity Questions and Phases of the Test Life Cycle • Validity questions: Construct Coherence, Comparability, Fairness & Accessibility, Consequences
Digital Workbook Series Organizing Principles: Validity Questions • Construct Coherence: To what extent do the test scores reflect the knowledge and skills we’re intending to measure, for example, those defined in the academic content standards? • Comparability: To what extent are the test scores reliable and consistent in meaning across all forms, students, test sites, and time? • Fairness & Accessibility: To what extent does the test allow all students to demonstrate what they know and can do? • Consequences: To what extent are the test scores used appropriately to achieve specific goals?
Examples of Construct Coherence Questions • What are you intending to measure with this test? (What are the measurement targets?) • How was the assessment developed to measure these measurement targets? • How were items reviewed and evaluated during the development process to ensure they appropriately address the intended measurement targets and not other content, skills, or irrelevant student characteristics? • How are items scored in ways that allow students to demonstrate, and scorers to recognize and evaluate, their knowledge and skills? How are the scoring processes evaluated to ensure they accurately capture and assign value to students’ responses? • How are scores for individual items combined to yield a total test score? What evidence supports the meaning of this total score in relation to the measurement target(s)? How do items contribute to subscores and what evidence supports the meaning of these subscores? • What independent evidence supports the alignment of the assessment items and forms to the measurement targets? • How are scores reported in relation to the measurement targets? Do the reports provide adequate guidance for interpreting and using the scores?
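As a concrete, hypothetical illustration of the total-score and subscore questions above, the sketch below aggregates a handful of invented item scores into a raw total and reporting-category subscores. The item identifiers, categories, and point values are assumptions made up for this example; they do not come from any SCILLSS instrument.

```python
# Hypothetical illustration: combining item scores into a total score and
# reporting-category subscores. Items, categories, and points are invented.

from collections import defaultdict

# Each item: (item_id, reporting_category, points_earned, points_possible)
item_scores = [
    ("item01", "Physical Science", 1, 1),
    ("item02", "Physical Science", 0, 2),
    ("item03", "Life Science", 2, 2),
    ("item04", "Life Science", 1, 1),
    ("item05", "Earth and Space Science", 1, 2),
]

# Raw total score across all items
total_earned = sum(earned for _, _, earned, _ in item_scores)
total_possible = sum(possible for _, _, _, possible in item_scores)

# Subscores accumulated by reporting category
subscores = defaultdict(lambda: [0, 0])
for _, category, earned, possible in item_scores:
    subscores[category][0] += earned
    subscores[category][1] += possible

print(f"Total raw score: {total_earned}/{total_possible}")
for category, (earned, possible) in subscores.items():
    print(f"{category}: {earned}/{possible}")
```

In practice, operational programs typically apply scaling and weighting on top of this kind of raw aggregation, which is why the questions above ask for documentation and evidence about how scores are combined and what the resulting totals and subscores mean.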
Examples of Comparability Questions • How is the assessment designed to yield consistent, reliable scores? What evidence supports score reliability? • How is the assessment designed to support comparability of scores across forms? What evidence supports such comparability? • How is the assessment designed to support comparability of scores across time? What evidence supports such comparability? • How is the assessment designed to support comparability of scores across administration sites? What evidence supports such comparability? • How is the assessment administered to protect against various types of cheating so that the scores reflect students’ knowledge and skills and not inappropriate access to testing materials? What evidence supports the implementation of these safeguards and security protocols? • How is the assessment scored such that scores reflect students’ knowledge and skills and not inaccuracies or inconsistencies in the scoring process? What evidence supports implementation of these scoring procedures? • How are scores reported in ways that appropriately support or disrupt comparability in score interpretation across time, administration sites, or variations in student characteristics?
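One way to make the across-forms comparability questions concrete: the sketch below applies a simple linear equating adjustment that places raw scores from a hypothetical Form B onto the scale of a hypothetical Form A by matching means and standard deviations. The score data and function name are invented for illustration and are not drawn from any operational program.

```python
# Hypothetical illustration: linear equating of two test forms.
# Places Form B raw scores on the Form A scale by matching means and
# standard deviations. All numbers below are invented for illustration.

from statistics import mean, stdev

# Raw scores from two hypothetical, randomly equivalent groups
form_a_scores = [28, 31, 25, 34, 29, 27, 33, 30, 26, 32]
form_b_scores = [24, 27, 22, 30, 25, 23, 29, 26, 21, 28]

mu_a, sd_a = mean(form_a_scores), stdev(form_a_scores)
mu_b, sd_b = mean(form_b_scores), stdev(form_b_scores)

def equate_b_to_a(raw_b: float) -> float:
    """Linear equating: express a Form B raw score on the Form A scale."""
    return mu_a + (sd_a / sd_b) * (raw_b - mu_b)

# A Form B raw score of 26 expressed on the Form A scale
print(round(equate_b_to_a(26), 1))
```

Operational equating designs are usually more involved (for example, anchor-item or item response theory approaches), but documentation of whatever procedure is used is exactly the kind of evidence these questions call for.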
Examples of Fairness & Accessibility Questions • How were the assessment questions developed to ensure that scores do not reflect student characteristics that are irrelevant to the measurement targets? • How were the needs of students with disabilities addressed during assessment development? What evidence supports these efforts and their effectiveness? • How were the needs of English learners addressed during assessment development? What evidence supports these efforts and their effectiveness? • How are students with disabilities able to demonstrate their knowledge and skills through the availability and use of any necessary accommodations? What evidence supports the identification and use of these accommodations at the time of testing? • How are English learners able to demonstrate their knowledge and skills through the availability and use of any necessary accommodations? What evidence supports the identification and use of these accommodations at the time of testing? • How are students’ responses scored in ways that reflect only the construct-relevant aspects of those responses? What evidence supports the minimization of construct-irrelevant influences on students’ responses? • How are assessment scores interpreted in relation to knowledge and skills that test takers have had an opportunity to learn or are preparing to learn? What evidence supports the interpretation of students’ scores in relation to their learning opportunities?
Examples of Consequences Questions • How is the assessment developed, administered, scored, and reported in ways that deter and limit instances of cheating by students or others associated with the assessment or its stakes? What evidence supports the implementation and effectiveness of these efforts? • How are the scores from the assessment intended to be used as described by the test developers and how are they used by your state? How well do these uses align? • If your state is using test scores for purposes other than those for which the test developers intended, what evidence supports those uses? • If assessment scores are associated with recommendations for instruction or other interventions for individual students, what evidence supports such interpretations and uses of these scores? What tools and resources are available to educators for evaluating and implementing these recommendations? • If assessment scores are associated with recommendations for whole-class or group instruction, what evidence supports such interpretations and uses of these scores? What tools and resources are available to educators for evaluating and implementing these recommendations? • If assessment scores are associated with high stakes decisions for teachers, administrators, schools, or other entities or individuals, what evidence supports such interpretations and uses of these scores? • How are scores reported to students and parents in ways that support their understanding of the scores and any associated recommendations or decisions?
Rhonda True, M.A., EAG Grant Coordinator, Nebraska Department of Education The Benefits, Challenges, and Lessons Learned from Using the SCILLSS Resources
Development Timeline • Adoption: Sept. 2017 • Visioning: Nov. 2017 • Task Force: Jan. 2018 • Pilot Development: Summer 2018 • Pilot: Spring 2019 • Field Test Development: Summer 2019 • Field Test: Spring 2020 • Future Field Test Development: Summer 2020 • Operational: Spring 2021
Tools & Processes Developed by the SCILLSS Project Support the Science System
Theory of Action Outcomes • Educators are qualified/credentialed, effective leaders who critically use CIA products, processes, and data to support student college and career readiness. • ALL students integrate science learning based on the three-dimensional science standards to ensure application and transfer of knowledge and skills in interdisciplinary ways. • The assessment system contributes to shifting the underlying skill set of the state’s workforce to draw new business via NE students. • Every student, upon completion of secondary education, is prepared for postsecondary education, career, and civic opportunities and possesses technological and digital readiness. • Curriculum, instruction, and assessments designed for NE-CCRS are implemented systemically and systematically.
Assessment System Components* • Statewide Summative (grades 5, 8, 11**) • Monitoring Tasks (grades 3, 4, 6, 7, 9, 10) • Curriculum Embedded Tasks (K-12) • Task Library (K-12) *Common thread: professional learning for educators **ACT for 3rd-year cohort
Features to Maintain Consistency • Grade-appropriate 3D targets • Phenomena- and problem-focused • Require reasoning with evidence • Engage diverse sense-making • Demonstrate science understanding by doing science
Figuring out vs. learning about
Goals for Student Learning • Exploring unknown/novel situations • Flexible & logical thinkers • Transfer of understanding • Science learning in all grades • Integrated application of knowledge & skill • Cross-content integration • Reasoning and evidence in all content areas • Problem solving & critical thinking • Learning science by doing science • Grade-appropriate, progressive three-dimensional learning
Nebraska’s Claims • Overall Claim: Students can demonstrate the scientific literacy necessary to be civic-minded decision makers and demonstrate readiness for college, career, and lifelong learning through application of science and engineering practices and crosscutting concepts within and among the disciplines of science. • Critical Consumers of Information: Students can gather, analyze, and communicate information from multiple sources to use as evidence to make sense of familiar and unfamiliar phenomena and problems. • Interconnectedness of Science: Students can make connections between disciplinary core ideas within the physical science, life science, and Earth and space science domains, across multiple science domains, and across multiple content areas (such as mathematics and English language arts) to make sense of familiar and unfamiliar phenomena and problems.
Ensuring Rigor in State Assessment Systems: A Self-evaluation Protocol
Purpose • To support state departments of education in evaluating each of their assessments as well as their overall assessment system.
Protocol is Designed To • Provide a framework for educators at the state level to use in evaluating any aspect of their state assessment system • Focus on assessments that are state-mandated or on support programs supplied by the state • Educators at the state level can use and modify the protocol as needed • The SCILLSS Digital Workbook is designed as a resource to support the protocol
Self-Evaluation Protocol Step 1: Articulate your current and planned needs for assessment scores • What are your intended uses of assessment scores? For what purposes will they be used?
Self-Evaluation Protocol Step 2: Identify the complete array of assessments you use to address specific needs • Identify all current and planned assessments • Helps identify overlap as well as areas with gaps
Self-Evaluation Protocol Step 3: Gather and evaluate the data and evidence available to support the interpretation and use of assessment scores for their intended purposes • Gather and evaluate evidence for each assessment
Evidence For Construct Coherence • Does the assessment have evidence for construct coherence with your overall standards? • Has the assessment been designed in such a way as to ensure that the content of the assessment is consistent with your state standards and the curriculum in the classroom? • In other words, to what extent does the assessment as designed capture the knowledge and skills defined in the target domain?
Evidence For Comparability and Reliability • Are the test scores comparable; that is, are the test scores reliable and consistent in meaning across all students, classes, and schools? • Is there evidence to support the claim that the test scores mean the same thing for all students, regardless of which year the student takes the test or the exact test form that is taken? • Is there evidence that includes reliability estimates, including documentation of how the estimates were determined and whether the estimates apply across all students who take the assessment? (A worked illustration of one such estimate follows below.)
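To ground the idea of a reliability estimate, here is a minimal sketch, assuming a small invented item-by-student score matrix, that computes Cronbach's alpha, one common internal-consistency coefficient. It illustrates what such an estimate is; it is not a procedure prescribed by the protocol or by any particular test publisher.

```python
# Hypothetical illustration: estimating internal-consistency reliability
# (Cronbach's alpha) from an item-by-student score matrix.
# The data below are invented for illustration only.

import numpy as np

# Rows = students, columns = items (0/1 dichotomous item scores)
scores = np.array([
    [1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0],
    [1, 0, 1, 1, 1],
    [1, 1, 1, 0, 1],
])

def cronbach_alpha(x: np.ndarray) -> float:
    """alpha = k/(k-1) * (1 - sum of item variances / total-score variance)."""
    k = x.shape[1]
    item_vars = x.var(axis=0, ddof=1)
    total_var = x.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

print(round(cronbach_alpha(scores), 2))
```

Technical documentation would typically report such coefficients for the overall tested population and for relevant student groups, along with an account of how they were computed, which is the kind of evidence the questions above are asking for.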
Evidence For Fairness and Accessibility • Are the tests accessible and fair for all students? • Has the test publisher provided evidence that all students can complete the assessment and fully understand the concepts being assessed? • To what extent are students able to demonstrate what they know and understand in your state and within your current curriculum?