
Presentation Transcript


    1. Three-part colloquium series:
       - Can Large-Scale Tests be Fair to All Students: Research on Bias Issues for WASL (November 2)
       - WASL History and Early Research: Everything You Needed to Know About WASL but Didn’t Think to Ask (December 1)
       - Classroom-Based Assessments and State Standards: Implementing Alternatives to Standardized Tests (December 11)

    2. Can Large-Scale Tests be Fair to All Students? Bias Issues Related to WASL Catherine S. Taylor University of Washington November 2, 2006

    3. Background:
       - Ten years' experience in test development (1981–1991) prior to coming to the University of Washington
       - Moved to the University of Washington in 1991 (School Reform Law passed in 1993)
       - Principal Investigator for R&D Grant (1994–1995): to support development of prototype assessments of the Essential Academic Learning Requirements (EALRs)
       - Washington State Technical Advisory Committee for Assessment (1995–1999)
       - Principal Investigator for WASL Validity Research Grant (2000–2004): to investigate the validity of WASL scores

    4. The focuses of my research:
       - How to prepare teachers for effective classroom-based assessments
       - Validity theory
       - Validity and large-scale testing policy
       - Threats to the validity of large-scale tests

    5. Focuses of this presentation:
       - Study of Bias and Sensitivity Review procedures used for WASL (2004)
       - Report of input from two Public Forums on Bias and Sensitivity (2004): Yakima and Seattle
       - Studies of ‘Differential Item Functioning’ (AKA statistical bias) in WASL test items (1997–2001)

    6. What is an Item?
       - An item is a question or set of directions (prompt)
       - Multiple-choice item:
         - A question or prompt
         - 3-4 answer choices, only one of which is correct
       - Performance item:
         - A question or prompt
         - Space in which students construct an answer
         - A rule for assigning points to students’ answers
       - WASL performance items: short answer (0-2 points); extended response (0-4 points)
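A minimal sketch of how the item types above might be represented in code, just to make the answer-choice and point-range structure concrete. This is purely illustrative and not part of the WASL system; all class names and fields are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Item:
    """Illustrative test item; names and fields are assumptions, not WASL's."""
    prompt: str                          # the question or set of directions
    choices: Optional[List[str]] = None  # 3-4 options for a multiple-choice item
    answer_key: Optional[int] = None     # index of the single correct choice
    max_points: int = 1                  # 1 (MC), 2 (short answer), or 4 (extended response)

    def score_multiple_choice(self, response: int) -> int:
        """Multiple-choice items are scored right/wrong: 1 point or 0."""
        return int(response == self.answer_key)

# Performance items (short answer, extended response) would instead be scored
# 0-2 or 0-4 by trained raters applying the item's scoring rule (rubric).
```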

    7. WASL items are developed using state-of-the-art procedures:
       - Test specifications: define how many and what types of items will be on a test
       - Item specifications: define exactly what kinds of items will assess each Grade Level Expectation (GLE)
       - Item writing: overseen by skilled test developers
       - Item reviews: teachers check each item for its match to the GLEs
       - Bias and sensitivity reviews: conducted by individuals who represent the diversity of WA State students

    8. WASL test items are ‘tested’ using state-of-the-art procedures:
       - Item pilots: items are randomly assigned to students throughout WA State
       - Item data reviews: based on students’ performances
         - Statistical difficulty: Is the item easy or difficult because of the content tested, NOT some flaw in the item?
         - Statistical validity: Do high-performing students do better on the item than low-performing students?
         - Statistical bias: Is item performance related to level of knowledge and skill, NOT group membership?
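The first two statistical checks above (difficulty and item-total discrimination) can be sketched with generic classical-test-theory formulas; the third (statistical bias) is the DIF analysis described later in the talk. This sketch is a generic illustration, not the contractor's actual pilot analysis; the function names and the simulated data are assumptions.

```python
import numpy as np

def item_difficulty(item_scores: np.ndarray, max_points: int = 1) -> float:
    """Proportion of available points earned (the p-value); lower = harder item."""
    return float(item_scores.mean() / max_points)

def item_discrimination(item_scores: np.ndarray, total_scores: np.ndarray) -> float:
    """Corrected item-total correlation: do students who do well on the rest
    of the test also do well on this item?"""
    rest_of_test = total_scores - item_scores
    return float(np.corrcoef(item_scores, rest_of_test)[0, 1])

# Illustrative use with simulated (not real WASL) pilot data
rng = np.random.default_rng(0)
totals = rng.normal(25, 6, size=500)                              # simulated rest-of-test scores
item = (totals + rng.normal(0, 4, size=500) > 25).astype(float)   # simulated 0/1 item scores
print(item_difficulty(item), item_discrimination(item, totals + item))
```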

    9. Study 1: Bias & Sensitivity Reviews
       - Committee members represent diversity in the student population (regions, ethnicity, gender, socio-economic status, religion, special population issues)
       - Members review reading passages and items for:
         - Implied or overt stereotyping or negative representations of any group
         - Too much or too little representation of any group
         - Terms that may be confusing to students based on language, region, culture, socio-economic status, etc.
         - Controversial issues and topics that may affect some groups more than others

    10. Procedures Used to Observe Bias & Sensitivity Reviews:
       - Participant-observer
       - Recorded panelists’ comments during the review process
       - Cross-checked records with facilitator notes
       - Looked for patterns in notes/records in relation to reading passages and items

    11. Results of Bias and Sensitivity Review Observations:
       - Few passages or test items are identified as problematic
       - Reading passages present the greatest potential for bias
       - Sources of bias in reading passages are subtle

    12. Reading passages present the greatest potential for bias:
       - WASL includes narrative and informative passages, and passages with social studies, science, and literary content
       - WASL reading passages are from published sources
       - Authors resist changes to their published writing (even when changes lessen bias/stereotyping)

    13. Sources of bias in reading passages are subtle:
       - Alterations of original narratives:
         - Legends and folk tales may be altered to fit Western notions of literature
         - Language changes can change meaning (“first feast” vs. “barbeque”)
       - “Othering”:
         - Biographies may focus on how individuals overcame or coped with their minority status (Jackie Robinson; Helen Keller)
         - Informational passages about cultural groups may have a patronizing tone (e.g., aren’t “their” ways cute)
       - Interpretations:
         - Items may focus on interpretations that are unique to middle-class values rather than the values of the culture of origin

    14. Study 2: Bias & Sensitivity Forums
       - Two community forums (Yakima and Seattle)
       - Community members came together to discuss concerns about WASL
       - Participants included:
         - Teachers and school administrators
         - Tribal elders
         - Latino community leaders
         - Parents and community members

    15. Procedures Used to Gather Data during Bias & Sensitivity Forums:
       - Conducted a mock bias & sensitivity review
       - Presented the methods used for statistical “bias” analysis (also called differential item functioning, or DIF)
       - Showed items flagged for DIF and asked for likely causes
       - Held small-group discussions with reports to the larger group
       - Recorded participant ideas about bias issues in WASL
       - Examined written notes and chart paper for themes

    16. Themes in Participant Comments
       - Need for involvement of minority teachers in all stages of WASL development work
       - Need for sensitivity to cultural values in selection of reading passages, item content, and the types of questions (particularly in reading)
       - Need for inclusion of tribal elders in selection of text and contexts for WASL items
       - Need for inclusion of individuals with cultural expertise in bias/sensitivity review panels

    17. Study 3: Differential Item Functioning (DIF) Analyses
       Typical steps in a DIF analysis:
       - Identify the groups to be compared
       - Compute item performance for students in different groups at each total test score
       - Summarize the differences in performance across all test scores
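As a concrete illustration of these three steps, here is a minimal sketch of one common DIF index, the standardization (STD P-DIF) approach: match students on total test score, compare focal-group and reference-group item performance at each matched score, then take a weighted average of the differences. This is a generic sketch, not the flagging procedure actually used for WASL; the function name and arguments are assumptions.

```python
import numpy as np

def standardized_p_dif(item_scores, total_scores, is_focal, max_points=1):
    """Standardization-style DIF index.

    Positive values mean the focal group outperforms the matched reference
    group on this item; negative values mean the reverse.
    """
    item = np.asarray(item_scores, dtype=float) / max_points   # put the item on a 0-1 scale
    total = np.asarray(total_scores)
    focal = np.asarray(is_focal, dtype=bool)

    diffs, weights = [], []
    for score in np.unique(total):          # steps 1-2: compare groups matched on total score
        at_score = total == score
        f = at_score & focal
        r = at_score & ~focal
        if f.any() and r.any():
            diffs.append(item[f].mean() - item[r].mean())
            weights.append(f.sum())         # weight each score level by its focal-group count
    # step 3: summarize the differences across all matched score levels
    return float(np.average(diffs, weights=weights))
```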

    19. DIF Can Go Both Ways:
       - When individual students get their total scores from different items, that’s normal
       - When there is a pattern in how groups of students get their total scores, that’s DIF
       - When students in a group do better than expected on an item based on their total test score, DIF is in favor of the group
       - When students in a group do more poorly than expected on an item based on their total test score, DIF is against the group
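Continuing the sketch above, the direction of DIF described on this slide is just the sign of the index; the cut-off used here is an illustrative assumption, not the operational flagging rule.

```python
def dif_direction(std_p_dif: float, threshold: float = 0.05) -> str:
    """Classify the sign of a DIF index (the threshold is an assumed, illustrative cut-off)."""
    if std_p_dif > threshold:
        return "DIF in favor of the focal group"
    if std_p_dif < -threshold:
        return "DIF against the focal group"
    return "no meaningful DIF flagged"
```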

    20. Typical Causes of DIF:
       - Impact: Students from different groups receive different educational experiences, such that item performance differences reflect true differences in knowledge/skills.
       - Culture/Background: Students from different backgrounds bring unique perspectives to bear on test items.
       - Flaws: Flaws in items that cause one group to respond differently than another.

    21. Research on DIF for WASL Test Items:
       - Studies conducted after items had been:
         - reviewed by the bias & sensitivity committee
         - examined for statistical bias
         - used in an operational test
       - Compared performance of:
         - Males and females
         - White students and Black/African American students
         - White students and Latino/Hispanic students
         - White students and Native American students
         - White students and Asian/Pacific Islander students

    22. Research on DIF for WASL Test Items: Examined test items from:
       - Grade 4 Reading and Mathematics: 1997, 1998, 1999, 2000, 2001
       - Grade 7 Reading and Mathematics: 1998, 1999, 2000, 2001
       - Grade 10 Reading and Mathematics: 1999, 2000, 2001

    23. DIF Results for Reading:
       - Most reading items showed no statistical bias
       - Reading items flagged for Gender DIF:
         - Multiple-choice items tend to favor boys
         - Performance items tend to favor girls
         - DIF items favoring boys tend to be related to informational passages
       - Reading items flagged for Ethnic DIF:
         - Multiple-choice items asking for text interpretation tend to favor white students
         - Performance items asking for text interpretation tend to favor minority students
       - Patterns became more extreme across grade levels

    24. Mean Number of Reading Items Flagged for DIF (Males & Females)

    25. Mean Number of Reading Items Flagged for DIF (Asian/Pacific Islander & White)

    26. Mean Number of Reading Items Flagged for DIF (Black/African & White)

    27. Mean Number of Reading Items Flagged for DIF (Native American & White)

    28. Mean Number of Reading Items Flagged for DIF (Latino/Hispanic & White)

    29. Excerpt from a reading passage: The best looking fences are often the simplest. A simple fence around a beautiful home can be like a frame around a picture. The house isn’t hidden; its beauty is enhanced by the frame. But a fence can be a massive, ugly thing, too, made of bricks and mortar. Sometimes the insignificant little fences do their job just as well as the ten-foot walls. Maybe it’s only a string stretched between here and there in a field. The message is clear; don’t cross here. Every fence has its own personality and some don’t have much. There are friendly fences. A friendly fence takes kindly to being leaned on. There are friendly fences around some playgrounds. And some playground fences are more fun to play on than anything they surround. There are more mean fences than friendly fences overall, though. Some have their own built-in invitation not to be sat upon. Unfriendly fences get it right back sometimes. You seldom see one that hasn’t been hit, bashed, or bumped or in some way broken or knocked down.

    30. Example of a Reading Item that Shows Statistical Bias in Favor of Focal Groups:
       In the sixth paragraph, the author talks about friendly and unfriendly fences. How can you tell them apart?
       [lines provided for student response]
       * Favors Latinos, Blacks/African Americans, and Asian/Pacific Islanders

    31. Example of a Reading Item that Shows Statistical Bias in Favor of Focal Groups:
       What is the author’s attitude toward fences? Give three pieces of evidence from the essay to support your point.
       [lines provided for student response]
       * Favors females, Asian/Pacific Islanders, and Latinos

    32. Example of a Reading Item that Shows Statistical Bias in Favor of Males and Whites

    33. DIF Results for Mathematics:
       - Most mathematics items showed no statistical bias
       - Mathematics items flagged for Gender DIF:
         - Multiple-choice items tend to favor boys
         - Performance items tend to favor girls
         - DIF items favoring boys tend to require simple applications of mathematical procedures in number, algebra, geometry, and statistics
         - DIF items favoring girls tend to assess data analysis, measurement, complex applications, reasoning, and problem-solving
       - The number of items flagged for DIF increased across grade levels

    34. DIF Results for Mathematics: Ethnic DIF statistical patterns:
       - Performance items were flagged for DIF more often than multiple-choice items
       - Slightly more of the flagged performance items favored minority students, although differences were small

    35. DIF Results for Mathematics: Content analysis of mathematics items flagged for Ethnic DIF:
       - Flagged items favoring Asian/Pacific Islander students generally assessed number concepts, computation, geometric procedures, algebraic procedures, and simple statistics
       - Flagged items favoring Black/African American, Native American, and Latino/Hispanic students generally assessed number, number patterns, computation, and logical reasoning
       - Flagged items favoring White students generally assessed data analysis, data representation, measurement, reasoning, and problem-solving

    36. Mean Number of Mathematics Items Flagged for DIF (Males & Females)

    37. Mean Number of Mathematics Items Flagged for DIF (Asian/Pacific Islander & White)

    38. Mean Number of Mathematics Items Flagged for DIF (Black/African & White)

    39. Mean Number of Mathematics Items Flagged for DIF (Native American & White)

    40. Mean Number of Mathematics Items Flagged for DIF (Latino/Hispanic & White)

    41. Example of a Mathematics Item that Shows Statistical Bias in Favor of Focal Groups:
       * Favors Latinos, Native Americans, Asian/Pacific Islanders, Black/African Americans, and Females
