New England Common Assessment Program Item Review Committee Meeting March 30, 2005 Portsmouth, NH
Welcome and Introductions • Tim Kurtz – Director of Assessment New Hampshire Department of Education • Michael Hock – Director of Educational Assessment Vermont Department of Education • Mary Ann Snider – Director of Assessment & Accountability Rhode Island Department of Education • Tim Crockett – Assistant Vice President Measured Progress
Logistics • Meeting Agenda • Committee Member Expense Reimbursement Form • Substitute Reimbursement Form • NECAP Nondisclosure Form • Handouts of presentations
Morning Agenda Test Development: Past, Present & Future • How did we get here? – Tim Kurtz, NH DoE • Statistical Analyses – Tim Crockett, MP • Bias/Sensitivity – Michael Hock, VT DoE • Depth of Knowledge – Ellen Hedlund & Betsy Hyman, RI DoE • 2005-2006 Schedule – Tim Kurtz, NH DoE • So, what am I doing here?
Item Review Committee How did we get to where we are today? Tim Kurtz Director of Assessment New Hampshire Department of Education
NECAP Pilot Review 2004-05 • 1st Bias Committee meeting – March • 1st Item Review Committee meeting – April • 2nd Item Review Committee meeting – July • 2nd Bias Committee meeting – July • Face-to-Face meetings – August • Test Form Production and DOE Reviews – August
NECAP Pilot Review 2004-05 Reading and Mathematics • Printing and Distribution – September • Test Administration Workshops – October • Test Administration – October 25 – 29 • Scoring – December • Data Analysis & Item Statistics – January • Teacher Feedback Review – February (feedback has shaped item review, accommodations, the style guide, and administration policies) • Item Selection meetings – February & March
NECAP Pilot Review 2004-05 Writing • Printing and Distribution – December & January • Test Administration – January 24 - 28 • Scoring – March • Data Analysis & Item Statistics – April • Item Selection meetings – April & May
NECAP Pilot Review 2004-05 What data was generated from the pilot and what do we do with it? Tim Crockett Assistant Vice President Measured Progress
Item Statistics ●The review of data and items is a judgmental process ●Data provides clues about the item ●Difficulty ●Discrimination ●Differential Item Functioning
Item Difficulty (multiple-choice items) ●Percent of students with a correct response. Range is from 0.00 (difficult) to 1.00 (easy) ●NECAP needs a range of difficulty, but • below .30 may be too difficult • above .80 may be too easy
Item Difficulty (constructed-response items) • Average score on the item. • Range is from 0.00 to 2.00 or 0.00 to 4.00 On 2-point items • below 0.4 may be too difficult • above 1.6 may be too easy On 4-point items • below 0.8 may be too difficult • above 3.0 may be too easy
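The difficulty guidelines on the two slides above can be sketched in code. This is a minimal, hypothetical illustration: the function names are invented for this example, and the flag thresholds (.30/.80 for multiple-choice; 0.4/1.6 on 2-point and 0.8/3.0 on 4-point items) are the review guidelines quoted from the slides, not fixed psychometric rules.

```python
# Hypothetical sketch of the item-difficulty statistics described above.
# Names (mc_difficulty, flag_mc, etc.) are illustrative, not NECAP code.

def mc_difficulty(responses):
    """Proportion correct for a multiple-choice item (0/1 scores)."""
    return sum(responses) / len(responses)

def cr_difficulty(scores):
    """Mean score for a constructed-response item."""
    return sum(scores) / len(scores)

def flag_mc(p):
    """Apply the slide's review guidelines to an MC p-value."""
    if p < 0.30:
        return "may be too difficult"
    if p > 0.80:
        return "may be too easy"
    return "in range"

def flag_cr(mean_score, max_points):
    """Apply the slide's review guidelines to a 2- or 4-point CR item."""
    low, high = {2: (0.4, 1.6), 4: (0.8, 3.0)}[max_points]
    if mean_score < low:
        return "may be too difficult"
    if mean_score > high:
        return "may be too easy"
    return "in range"

p = mc_difficulty([1, 0, 1, 1, 0, 1, 1, 1, 0, 1])  # p = 0.7
print(p, flag_mc(p))  # in range
```

A difficulty value is only a clue, as the earlier slide notes: a flagged item is reviewed, not automatically discarded.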
Item Discrimination ●How well an item separates higher performing students from lower performing students ●Range is from -1.00 to 1.00 ●The higher the discrimination the better ●Items with discriminations below .20 may not be effective and should be reviewed
Differential Item Functioning ●DIF (F-M) – females and males who performed the same on the overall test are compared on their performance on the item ● a positive number reflects females scoring higher ● a negative number reflects males scoring higher ● NS means no significant difference
Dorans and Holland, 1993 For CR items: • between –.20 and +.20 represents negligible DIF • beyond –.30 or +.30 represents low DIF • beyond –.40 or +.40 represents high DIF
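The DIF comparison described on the last two slides can be sketched as a standardization-style statistic: students are matched on total test score, the focal-minus-reference difference in item means is averaged with focal-group weights, and the result is scaled by the item's point value. This is a simplified, hypothetical illustration (the function names and the matching scheme are assumptions, not NECAP's operational analysis); the band labels follow the Dorans and Holland thresholds quoted on the slide, which leave a gap between .20 and .30.

```python
# Hypothetical sketch of a standardization DIF statistic for a CR item.
# focal/reference are lists of (total_score, item_score) pairs; a
# positive result means the focal group (e.g., females in F-M DIF)
# scored higher on the item than matched reference-group students.
from collections import defaultdict

def standardized_dif(focal, reference, max_points):
    f_by_total, r_by_total = defaultdict(list), defaultdict(list)
    for t, s in focal:
        f_by_total[t].append(s)
    for t, s in reference:
        r_by_total[t].append(s)
    num, denom = 0.0, 0
    for t, f_scores in f_by_total.items():
        if t not in r_by_total:
            continue  # no matched reference students at this total score
        f_mean = sum(f_scores) / len(f_scores)
        r_mean = sum(r_by_total[t]) / len(r_by_total[t])
        num += len(f_scores) * (f_mean - r_mean)  # focal-weighted difference
        denom += len(f_scores)
    return num / denom / max_points if denom else 0.0

def dif_band(smd):
    """Label a standardized difference using the slide's bands."""
    a = abs(smd)
    if a > 0.40:
        return "high DIF"
    if a > 0.30:
        return "low DIF"
    if a <= 0.20:
        return "negligible DIF"
    return "borderline"  # the slide's bands leave a gap between .20 and .30

focal = [(10, 1.0), (10, 1.0), (12, 2.0)]
reference = [(10, 2.0), (10, 2.0), (12, 2.0)]
smd = standardized_dif(focal, reference, 2)
print(round(smd, 2), dif_band(smd))  # → -0.33 low DIF
```

As with difficulty and discrimination, a flagged DIF value is a clue for the Bias-Sensitivity and Item Review Committees, not an automatic rejection.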
Bias/Sensitivity Review How do we ensure that this test works well for students from diverse backgrounds? Michael Hock Director of Educational Assessment Vermont Department of Education
What Is Item Bias? • Bias is the presence of some characteristic of an assessment item that results in the differential performance of two individuals of the same ability but from different student subgroups • Bias is not the same thing as stereotyping although we don’t want either in NECAP • We need to ensure that ALL students have an equal opportunity to demonstrate their knowledge and skills
How Do We Prevent Item Bias? • Item Development • Bias-Sensitivity Review • Item Review • Field-Testing Feedback • Pilot-Testing Data Analysis (DIF)
Role of the Bias-Sensitivity Review Committee The Bias-Sensitivity Review Committee DOES need to make recommendations concerning… • Sensitivity to different cultures, religions, ethnic and socio-economic groups, and disabilities • Balance of gender roles • Use of positive language, situations and images • In general, items and text that may elicit strong emotions in specific groups of students, and as a result, may prevent those groups of students from accurately demonstrating their skills and knowledge
Role of the Bias-Sensitivity Review Committee The Bias-Sensitivity Review Committee DOES NOT need to make recommendations concerning… • Reading Level • Grade Level Appropriateness • GLE Alignment • Instructional Relevance • Language Structure and Complexity • Accessibility • Overall Item Design
Passage Review Rating Form “This passage does not raise bias and/or sensitivity concerns that would interfere with the performance of a group of students”
Universal Design Improved Accessibility through Universal Design • Inclusive assessment population • Precisely defined constructs • Accessible, non-biased items • Amenable to accommodations • Simple, clear, and intuitive instructions and procedures • Maximum readability and comprehensibility • Maximum legibility
Item Complexity How do we control item complexity? Ellen Hedlund and Betsy Hyman Office of Assessment and Accountability Rhode Island Department of Elementary and Secondary Education
Depth of Knowledge A presentation adapted from Norman Webb for the NECAP Item Review Committee March 30, 2005
Bloom's Taxonomy • Knowledge – Recall of specifics and generalizations; of methods and processes; and of pattern, structure, or setting. • Comprehension – Knows what is being communicated and can use the material or idea without necessarily relating it. • Application – Use of abstractions in particular and concrete situations. • Analysis – Make clear the relative hierarchy of ideas in a body of material, or make explicit the relations among the ideas, or both. • Synthesis – Assemble parts into a whole. • Evaluation – Judgments about the value of material and methods used for particular purposes.
U.S. Department of Education Guidelines Dimensions important for judging the alignment between standards and assessments • Comprehensiveness: Does assessment reflect full range of standards? • Content and Performance Match: Does assessment measure what the standards state students should both know & be able to do? • Emphasis: Does assessment reflect same degree of emphasis on the different content standards as is reflected in the standards? • Depth: Does assessment reflect the cognitive demand & depth of the standards? Is assessment as cognitively demanding as standards? • Consistency with achievement standards: Does assessment provide results that reflect the meaning of the different levels of achievement standards? • Clarity for users: Is the alignment between the standards and assessments clear to all members of the school community?
Mathematical Complexity of Items NAEP 2005 Framework The demand on thinking that an item requires: Low Complexity Relies heavily on the recall and recognition of previously learned concepts and principles. Moderate Complexity Involves more flexibility of thinking and choice among alternatives than do those in the low-complexity category. High Complexity Places heavy demands on students, who must engage in more abstract reasoning, planning, analysis, judgment, and creative thought.
Depth of Knowledge (1997) • Level 1 – Recall: Recall of a fact, information, or procedure. • Level 2 – Skill/Concept: Use information or conceptual knowledge; two or more steps, etc. • Level 3 – Strategic Thinking: Requires reasoning, developing a plan or a sequence of steps; some complexity; more than one possible answer. • Level 4 – Extended Thinking: Requires an investigation; time to think and process multiple conditions of the problem.
Practice Exercise • Read the passage, The End of the Storm • Read and assign a DOK level to each of the 5 test questions • Form groups of 4-5 to discuss your work and reach consensus on a DOK level for each test question
Issues in Assigning Depth-of-Knowledge Levels • Variation by grade level • Complexity vs. difficulty • Item type (MC, CR, ER) • Central performance in objective • Consensus process in training • Aggregation of DOK coding • Reliabilities
Web Sites http://facstaff.wcer.wisc.edu/normw/ Alignment Tool http://www.wcer.wisc.edu/WAT/index.aspx Survey of the Enacted Curriculum http://www.SECsurvey.org
NECAP Operational Test 2005-06 What is the development cycle for this year? What is your role in all this? Tim Kurtz Director of Assessment New Hampshire Department of Education
NECAP Operational Test 2005-06 • 1st Bias Committee meeting – March 8-9 18 teachers – 6 from each state • 1st Item Review Committee meeting – March 30 72 teachers – 12 from each state in each content area • 2nd Item Review Committee meeting – April 27-28 • Practice Test on DoE website – early May • 2nd Bias Committee meeting – May 3-4 • Face-to-Face meetings – May 25-27 & June 1-3 • Test Form Production and DOE Reviews – July
NECAP Operational Test 2005-06 • Printing – August • Test Administration Workshops – Aug & Sept • Shipments to schools – September 12-16 • Test Administration Window – October 3-21 204,000 students and 25,000 teachers from the 3 states • Scoring – November • Standard Setting – December Teachers and educators from the three states • Reports shipped to schools – Late January
TIRC – So, why are we here? This assessment has been designed to support a quality program in mathematics and English language arts. It has been grounded in the input of hundreds of NH, RI, and VT educators. Because we intend to release assessment items each year, the development process continues to depend on the experience, professional judgment, and wisdom of classroom teachers from our three states.
TIRC – Our role. We have worked hard to get to this point. Today, you will be looking at passages in reading and some items in mathematics. The role of Measured Progress staff is to keep the work moving along productively. The role of DoE content specialists is to listen and ask clarifying questions as necessary.
TIRC – Your role? You are here today to represent your diverse contexts. We hope that you… • share your thoughts vigorously, and listen just as intently – we have different expertise and we can learn from each other, • use the pronouns “we” and “us” rather than “they” and “them” – we are all working together to make this the best assessment possible, and • grow from this experience – I know we will. And we hope that today will be the beginning of some new interstate friendships.
Information, Questions and Comments • Tim Kurtz Director of Assessment NH Department of Education TKurtz@ed.state.nh.us (603) 271-3846 • Mary Ann Snider Director of Assessment and Accountability Rhode Island Department of Elementary and Secondary Education MaryAnn.Snider@ride.ri.gov (401) 222-4600 ext. 2100 • Michael Hock Director of Educational Assessment Vermont Department of Education MichaelHock@education.state.vt.us (802) 828-3115