1 / 26

Purpose of High Stake Tests is to Determine Ability Estimates Unidimensional Tests (IRT Models) are Designed for Policy

NCES National Forum Stats-WDC 2009 NESAC Presentation July 28, 2009 Al Larson, Ph.D., LEA Connecticut Longitudinal Data System and Effective Use of Data at the LEA Level.

arleen
Download Presentation

Purpose of High Stake Tests is to Determine Ability Estimates Unidimensional Tests (IRT Models) are Designed for Policy

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. NCES National ForumStats-WDC 2009NESAC Presentation July 28, 2009Al Larson, Ph.D., LEA ConnecticutLongitudinal Data System and Effective Use of Data at the LEA Level

  2. Purpose of High Stake Tests is to Determine Ability EstimatesUnidimensional Tests (IRT Models) are Designed for Policy MakersSummative Assessments High (400) Low (100) Vertical Scale Scores measure “growth” (not proficiency levels)

  3. To be more diagnostic, state’s report strands Math Strand 1: Place Value (about 4 types of questions) Math Strand 8: Computation with fractions and integers Need a finer “grain size” (NAEP and ETS)

  4. Reading: Critical Thinking Reading Strands (Based on NAEP Contexts) Strand A: Forming a General Understanding     Strand B: Developing Interpretation     Strand C: Making Reader/Text Connections     Strand D: Examining the Content and Structure Strong use of inference … no literal questions Connecticut: 60% to 65% p-value • NAEP study by Zwick, 1987; and • Strand raw score reported to teachers and administrators (without basic assessment literacy) Need a finer “grain size”

  5. LDS at LEA Over 300 data files linked by LEA ID number SPSS Programming Merge by ID if …then … do if … compute …sort … print … report

  6. Red = bad Green = good

  7. For this student, our assessments have failed to tell us “what to do next” in a timely manner. We need to design tests for a new purpose: to help teachers and students identify cognitive errors to help guide current instruction (formative assessment).

  8. LEA Assessment Purpose (finer grain size): Develop a district-wide assessment system that is meaningful to teachers in Math and Reading for grades 2-9 (3 to 4 or 5 administrations per year) • Similar to Cognitive Diagnostic Assessments, multiple-choice items were constructed by designing foils/distracators that mimic typical student cognitive processing errors (teacher input: add constructed response) • Utilize the error vocabulary of each domain to report error descriptions teachers understand The transition from report data to instruction, is more difficult in reading than math …

  9. The Cognitive Task MC Cognitive Task: evaluation of the differences between foils A 1/6 B 2/4 C 2/6 D 3/4 E 1/8 ?

  10. The Cognitive Task and Error Description in Math MC Cognitive Task: evaluation of the differences between foils A 1/6 B 2/4 C 2/6 D 3/4 E 1/8 ? The “Effective Use of Data” is the Error Description: “adding both numerator and denominator” or “did not complete last step of multi-step problem” (Diagnostic Reports within 3 to 5 days) What is meaningful to teachers: a finer “grain size” in a timely manner

  11. A Finer Grain Size: Sample of other Cognitive Error Descriptions in Math About 4 times a year x 4 foils x 45 items x 7 grades = 5,040 error descriptions #8 = ' ' answer8='absent or left blank' . /* strand 3 'Equivalent Fract, Decimal & Percents'. #8 = ‘A’ answer8='error: word prob; chose 1:6 vs. 1:3'. #8 = ‘B’ answer8='error: word prob; reversed ratio, chose 3:1 vs 1:3'. #8 = ‘C’ answer8='correct: word prob; 3:9 is the same as 1/3'. #8 = ‘D’ answer8='error: word prob; chose 3:12 (1:4) ratio vs. 1:3'. #8 = ‘E’ answer8='error: word prob; chose 2:3 ratio vs. 1:3'. #9 = ' ' answer9='absent or left blank' . /* strand 4 'Order, Magnitude, and Rounding of Numbers'. #9 = ‘A’ answer9='error: ordering from table: selected 3rd place' . #9 = ‘B’ answer9='error: ordering from table: selected 1st place' . #9 = ‘C’ answer9='error: ordering from table: selected 2nd place' . #9 = ‘D’ answer9='correct: ordering from table: found 4 th place ordering from G to L' . #9 = ‘E’ answer9='error: ordering from table: selected 5th place' . #10 = ' ' answer10='absent or left blank' . /* strand 5 'Models for Operations (one item)' . #10 = ‘A’ answer10='correct: word prob; chose correct number sentence ((6+4)X$5.00) for situation' . #10 = ‘B’ answer10='error: word prob; divided instead of multiplying' . #10 = ‘C’ answer10='error: word prob; divided instead of multiplying' . #10 = ‘D’ answer10='error: word prob; added all data vs. adding 6 & 4, then multiplying' . #10 = ‘E’ answer10='error: word prob; subtracted instead of multiplying' .

  12. Reading is Different from Math • You can “see” math errors … but not reading inferential thinking errors • Math teachers are trained in an error vocabulary that is aligned with foil misconceptions … reading literature emphasize strategies, not errors • Math methods and materials (text books)are very similar in both format and rigor to state and federal high stakes tests …

  13. Disconnect: Testing vs. Teacher Training/Experience High Stakes Reading Tests are Inferential • Multiple Choice Items are difficult (p-values .3 to .7) Teacher Experience with Purchased Materials • too few multiple-choice and too easy (p-values .7 to .9) • too literal • current teaching methods emphasize constructed-response items that are often too accepting, and without a rubric • no assessment literacy Teacher Training and Vocabulary • Metacognition and fix-up strategies: look back to clarify, predict, author's purpose, main idea, activate background knowledge, etc. (Based upon Literature and NAEP contexts)

  14. Cognitive Model of Task Performance for Reading Comprehension Multiple-choice Items High Scoring versus Low Scoring Students Motivated and will spend time and effort to: • Be metacognitively aware; • “look back” to clarify/re-read; and • Evaluate differences between foils • Reading comprehension of the passage • Reading comprehension of the item stem • Reading comprehension of each foil • Sometimes just the nuance of one word in a foil (SAT, ACT, GRE, etc.) (identified error: “X”) (identified error: “T”) (identified error: “R”)

  15. Reading Errors that are Meaningful to Teachers Each EIa* foil is coded (a finer grain size): * Presented at 2009 AERA Convention under name of Error Identification assessments (EIa)

  16. Error Identification assessments (EIa)Text Matching and No-Support Foils

  17. Error Identification assessments (EIa)The Carefully Crafted … Related Foil: (SAT, ACT, GRE …)requires students to evaluate subtle differences between foils (critical thinking) The related foil, as a constructed-response, would be an acceptable summative answer

  18. Error Identification assessments (EIa)Sample Summary Error Identification Report to a Teacher Summative Levels … and … Formative Diagnostics (Students need to explain their reasoning)

  19. Error Identification assessments (EIa)The Assessment is Consumed for Instruction EIa test items and foils are used as instructional aids Lesson plans are in development Teachers conference with students: • an “internal view” with “retroactive verbal reports” (Leighton & Gierl; Norris; Gorin; 2007); “think alouds” (Davey, 1983); • Help students get involved in their own learning by making their thinking visible to themselves, peers and teachers; (Black & Wiliam, et al, 1998; Stiggins, et al, 2004, 2006); and • For the teaching of critical thinking (inference) and understanding of ideas in the text [foils] (Wells, 2000; Block, Gambrell & Pressley, 2002).

  20. Error Identification assessments (EIa)Teacher Opinion of EIa: • I use them to see what they are thinking … which errors seem to be used consistently - what they are “tricked on”. I turn it into a game-type activity: students vs. teacher(reading teacher, grades 2-5). • If students are aware of the errors they make, determined students will change their behavior both in reading and testing(classroom teacher, grade 5). • They now don’t grab the first answer that they connect to, they take more time to evaluate and critique each choice(reading teacher, unknown grade). • No support errors identifies students who clearly can't read on grade level or aren't taking it seriously(classroom teacher, grade 4).

  21. Error Identification assessments (EIa)Teacher Opinion of EIa: • I meet with students in small groups … Keeping the test booklets and handing them back to students is helpful indiscussing why they chose their answersand itmakes them accountablefor their choices (classroom teacher, grade 4). • Understanding their misconceptions is one thing – getting them to change the misconception is the difficult part (classroom teacher, grade 7). • As we are working on it, [EIa post-conferencing] often students will “get” it when it is a clear error. The related [versus the] correct answer does not come as easily during the explanation (classroom teacher, grade 4). (Related is an important foil for “ability estimates” or critical thinking, some student’s will need more reading experience and scaffolding)

  22. Error Identification assessments (EIa) Summary: EIa with Teachers for Instruction With a combination of: EIa reports, items and teacher conferencing with students,teachers diagnose student misconceptions and provide scaffolding during repeated critical thinking activities. (Instructional Utilization of Diagnostic Data) Teacher EIa Student

  23. Error Identification assessments (EIa)Reliability, Validity and Prediction

  24. Effective Use of Data: Matched Average Vertical Scale Score Growth in Reading Across Grades 4 to 8; F=16.6 P< .000 2008-2009 low high Instructional Utilization of Diagnostic Data

  25. Effective Use of Data: Matched Average Vertical Scale Score Growth in Reading 2008-2009 Gr 4 and 5 Gr 6 to 8 low high Instructional Unitization of Diagnostic Data

More Related