1. Three-part colloquium series: Can Large-Scale Tests be Fair to All Students: Research on Bias Issues for WASL (November 2)
WASL History and Early Research: Everything You Needed to Know About WASL but Didn’t Think to Ask (December 1)
Classroom-Based Assessments and State Standards: Implementing Alternatives to Standardized Tests (December 11)
2. Can Large-Scale Tests be Fair to All Students? Bias Issues Related to WASL Catherine S. Taylor
University of Washington
November 2, 2006
3. Background: Ten years' experience in test development (1981 – 1991) prior to coming to the University of Washington
Moved to the University of Washington in 1991 (School Reform Law passed in 1993)
Principal Investigator for R&D Grant (1994 - 1995): to support development of prototype assessments of the Essential Academic Learning Requirements (EALRs)
Washington State Technical Advisory Committee for Assessment (1995-1999)
Principal Investigator for WASL Validity Research Grant (2000-2004): to investigate validity of WASL scores
4. The focuses of my research: How to prepare teachers for effective classroom-based assessments
Validity theory
Validity and large scale testing policy
Threats to the validity of large scale tests
5. Focuses of this presentation: Study of Bias and Sensitivity Review procedures used for WASL (2004)
Report of input from two Public Forums on Bias and Sensitivity (2004)
Yakima
Seattle
Studies of ‘Differential Item Functioning’ (AKA statistical bias) in WASL test items (1997-2001)
6. What is an Item? An item is a question or set of directions (prompt)
Multiple-choice item:
A question or prompt
3-4 answer choices, only one of which is correct
Performance item:
A question or prompt
Space in which students construct an answer
A rule for assigning points to students’ answers
WASL performance items:
Short answer (0-2 points)
Extended response (0-4 points); see the data-structure sketch below
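A minimal sketch, assuming nothing about the actual WASL item-banking format, of how these item types and their scoring rules might be represented in code; all class and field names here are illustrative.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Item:
    """A test item: a question or prompt plus a rule for assigning points."""
    prompt: str
    max_points: int

@dataclass
class MultipleChoiceItem(Item):
    """A question or prompt with 3-4 answer choices, only one correct; scored 0 or 1."""
    choices: List[str] = field(default_factory=list)
    correct_choice: int = 0
    max_points: int = 1

    def score(self, chosen: int) -> int:
        # One point for the keyed answer, zero otherwise
        return 1 if chosen == self.correct_choice else 0

@dataclass
class PerformanceItem(Item):
    """Students construct an answer that is scored against a rubric.
    Short-answer items: 0-2 points; extended-response items: 0-4 points."""
    rubric: str = ""

# Illustrative instance (not an actual WASL item)
short_answer = PerformanceItem(prompt="Explain how you found your answer.",
                               max_points=2,
                               rubric="2 = complete; 1 = partial; 0 = none")
```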
7. WASL items are developed using state-of-the-art procedures: Test Specifications: define how many and what types of items will be on a test
Item Specifications: define exactly what kinds of items will assess each Grade Level Expectation (GLE)
Item writing: overseen by skilled test developers
Item reviews: check for match to GLEs by teachers
Bias and sensitivity reviews: by individuals who represent the diversity of WA State students
8. WASL test items are 'tested' using state-of-the-art procedures: Item pilots: items are randomly assigned to students throughout WA State
Item data reviews: based on students’ performances
Statistical difficulty: Is the item easy or difficult because of the content tested, not some flaw in the item?
Statistical validity: Do high-performing students do better on the item than low-performing students?
Statistical bias: Is item performance related to level of knowledge and skill, not group membership? (A computational sketch of the first two statistics follows below.)
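A minimal sketch of how the first two statistics might be computed for a dichotomously scored pilot item; the function name, variable names, and the tiny data set are illustrative assumptions, not the operational WASL analysis. Statistical bias is sketched under the DIF study later in the talk.

```python
import numpy as np

def pilot_item_statistics(item_scores, total_scores):
    """Compute basic pilot statistics for one dichotomous (0/1) item.

    item_scores  -- array of 0/1 responses to the item
    total_scores -- array of each student's total test score
    """
    item_scores = np.asarray(item_scores, dtype=float)
    total_scores = np.asarray(total_scores, dtype=float)

    # Statistical difficulty: proportion of students answering correctly
    # (the classical p-value); very high or very low values prompt a closer look.
    difficulty = item_scores.mean()

    # Statistical "validity" (discrimination): do students who score high on the
    # whole test do better on this item? Point-biserial correlation between the
    # item score and the total score (a rest-score would be cleaner; this is a sketch).
    discrimination = np.corrcoef(item_scores, total_scores)[0, 1]

    return {"difficulty": difficulty, "discrimination": discrimination}

# Hypothetical pilot data for six students
print(pilot_item_statistics([1, 0, 1, 1, 0, 1], [38, 21, 35, 40, 18, 30]))
```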
9. Study 1: Bias & Sensitivity Reviews Committee members represent diversity in the student population (regions, ethnicity, gender, socio-economic status, religion, special population issues)
Members review reading passages and items for:
Implied or overt stereotyping or negative representations of any group
Too much or too little representation of any group
Terms that may be confusing to students based on language, region, culture, socio-economic status, etc.
Controversial issues and topics that may affect some groups more than others
10. Procedures Used to Observe Bias & Sensitivity Reviews: Participant-observer
Recorded panelists' comments during the review process
Cross-checked records with facilitator notes
Looked for patterns in notes/records in relation to reading passages and items
11. Results of Bias and Sensitivity Review Observations: Few passages or test items are identified as problematic
Reading passages present the greatest potential for bias
Sources of bias in reading passages are subtle
12. Reading passages present the greatest potential for bias: WASL includes:
narrative and informative passages
passages with social studies, science, and literary content
WASL reading passages are from published sources
Authors resist changes to their published writing (even when changes lessen bias/stereotyping)
13. Sources of bias in reading passages are subtle: Alterations of original narratives:
Legends and folk tales may be altered to fit Western notions of literature
Language changes can alter meaning ('first feast' vs. 'barbecue')
“Othering”:
Biographies may focus on how individuals overcame or coped with their minority status (Jackie Robinson; Helen Keller)
Informational passages about cultural groups may have a patronizing tone (e.g., "aren't 'their' ways cute?")
Interpretations: Items may focus on interpretations that are unique to middle-class values rather than values of the culture of origin
14. Study 2: Bias & Sensitivity Forums Two community forums (Yakima and Seattle)
Community members came together to discuss concerns about WASL
Participants included:
Teachers and school administrators
Tribal elders
Latino community leaders
Parents and community members
15. Procedures Used to Gather Data during Bias & Sensitivity Forums: Conducted a mock bias & sensitivity review
Presented methods used for statistical "bias" analysis (also called differential item functioning, or DIF)
Showed items flagged for DIF and asked for likely causes
Small group discussion with reports to larger group
Recorded participant ideas about bias issues in WASL
Examined written notes and chart paper for themes
16. Themes in Participant Comments: Need for involvement of minority teachers in all stages of WASL development work
Need for sensitivity to cultural values in selection of reading passages, item content, and the types of questions (particularly in reading)
Need for inclusion of tribal elders in selection of text and contexts for WASL items
Need for inclusion of individuals with cultural expertise in bias/sensitivity review panels
17. Study 3: Differential Item Functioning (DIF) Analyses Typical Steps in a DIF Analysis: Identify groups to be compared
Compute item performance for students in different groups at each total test score
Summarize the differences in performance across all test scores (a sketch of one common summary procedure follows below)
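One common way to carry out these steps is the Mantel-Haenszel procedure: match reference- and focal-group students on total test score, compare item performance within each score level, and combine the comparisons into a single index. The sketch below is illustrative only; the group labels are placeholders, and operational flagging rules add significance tests and effect-size thresholds that this sketch omits.

```python
import numpy as np

def mantel_haenszel_dif(item, total, group, reference="reference", focal="focal"):
    """Mantel-Haenszel DIF index for a dichotomous (0/1) item.

    item  -- 0/1 scores on the item of interest
    total -- total test scores, used to match students of similar ability
    group -- group label for each student (reference vs. focal)
    """
    item, total, group = map(np.asarray, (item, total, group))
    num = den = 0.0
    for s in np.unique(total):                        # one stratum per total-score level
        at = total == s
        ref, foc = at & (group == reference), at & (group == focal)
        A, B = item[ref].sum(), (1 - item[ref]).sum()   # reference group: right, wrong
        C, D = item[foc].sum(), (1 - item[foc]).sum()   # focal group: right, wrong
        n = A + B + C + D
        if n > 0:
            num += A * D / n
            den += B * C / n
    if den == 0:
        return float("nan"), float("nan")
    odds_ratio = num / den                 # > 1: item relatively harder for the focal group
    delta = -2.35 * np.log(odds_ratio)     # ETS delta scale; negative = DIF against the focal group
    return odds_ratio, delta
```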
19. DIF Can Go Both Ways: When individual students get their total scores from different items – that’s normal
When there is a pattern in how groups of students get their total scores - that’s DIF
When students in a group do better than expected on an item based on their total test score, DIF is in favor of the group
When students in a group do more poorly than expected on an item based on their total test score, DIF is against the group (see the direction sketch below).
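A rough way to see the direction for a given group, under the same matching idea and with made-up names (this is not the operational flagging rule): compare the focal group's observed proportion correct at each total-score level with the other students' proportion correct at that level, and average the differences.

```python
import numpy as np

def dif_direction(item, total, group, focal="focal"):
    """Average (observed - expected) proportion correct for the focal group,
    computed within total-score levels. Positive values suggest DIF in favor
    of the focal group; negative values suggest DIF against it."""
    item, total, group = map(np.asarray, (item, total, group))
    diffs = []
    for s in np.unique(total):
        at = total == s
        foc, other = at & (group == focal), at & (group != focal)
        if foc.any() and other.any():
            diffs.append(item[foc].mean() - item[other].mean())
    return float(np.mean(diffs)) if diffs else 0.0
```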
20. Typical Causes of DIF: Impact: Students from different groups receive different educational experiences such that item performance differences reflect true differences in knowledge/skills.
Culture/Background: Students from different backgrounds bring unique perspectives to bear on test items.
Flaws: Flaws in items that cause one group to respond differently than another.
21. Research on DIF for WASL Test Items: Studies conducted after items had been:
reviewed by bias & sensitivity committee
examined for statistical bias
used in an operational test
Compared performance of:
Males and Females
White students and Black/African American students
White students and Latino/Hispanic students
White students and Native American students
White students and Asian/Pacific Islander students
22. Research on DIF for WASL Test Items: Examined test items from:
1997, 1998, 1999, 2000, 2001 Grade 4 Reading and Mathematics
1998, 1999, 2000, 2001 Grade 7 Reading and Mathematics
1999, 2000, 2001 Grade 10 Reading and Mathematics
23. DIF Results for Reading: Most reading items showed no statistical bias
Reading items flagged for Gender DIF:
Multiple-choice items tend to favor boys
Performance items tend to favor girls
DIF items favoring boys tend to be related to informational passages
Reading items flagged for Ethnic DIF
Multiple-choice items asking for text interpretation tend to favor White students
Performance items asking for text interpretation tend to favor minority students
Patterns became more extreme across grade levels
24. Mean Number of Reading Items Flagged for DIF (Males & Females)
25. Mean Number of Reading Items Flagged for DIF (Asian/Pacific Islander & White)
26. Mean Number of Reading Items Flagged for DIF (Black/African American & White)
27. Mean Number of Reading Items Flagged for DIF (Native American & White)
28. Mean Number of Reading Items Flagged for DIF (Latino/Hispanic & White)
29. Excerpt from a reading passage: The best looking fences are often the simplest. A simple fence around a beautiful home can be like a frame around a picture. The house isn’t hidden; its beauty is enhanced by the frame. But a fence can be a massive, ugly thing, too, made of bricks and mortar. Sometimes the insignificant little fences do their job just as well as the ten-foot walls. Maybe it’s only a string stretched between here and there in a field. The message is clear; don’t cross here.
Every fence has its own personality and some don't have much. There are friendly fences. A friendly fence takes kindly to being leaned on. There are friendly fences around some playgrounds. And some playground fences are more fun to play on than anything they surround. There are more mean fences than friendly fences overall, though. Some have their own built-in invitation not to be sat upon. Unfriendly fences get it right back sometimes. You seldom see one that hasn't been hit, bashed, or bumped or in some way broken or knocked down.
30. Example of a Reading Item that Shows Statistical Bias in Favor of Focal Groups: In the sixth paragraph, the author talks about friendly and unfriendly fences. How can you tell them apart?
[Lines provided for student-constructed response]
* Favors Latinos, Blacks/African Americans, and Asian/Pacific Islanders
31. Example of a Reading Item that Shows Statistical Bias in Favor of Focal Groups: What is the author’s attitude toward fences? Give three pieces of evidence from the essay to support your point.
[Lines provided for student-constructed response]
* Favors females, Asian/Pacific Islanders, and Latinos
32. Example of a Reading Item that Shows Statistical Bias in Favor of Males and Whites
33. DIF Results for Mathematics: Most mathematics items showed no statistical bias
Mathematics items flagged for Gender DIF:
Multiple-choice items tend to favor boys
Performance items tend to favor girls
DIF items favoring boys tend to require simple applications of mathematical procedures in number, algebra, geometry, and statistics
DIF items favoring girls tend to assess data analysis, measurement, complex applications, reasoning, and problem-solving
Number of items flagged for DIF increased across grade levels
34. DIF Results for Mathematics: Ethnic DIF statistical patterns:
Performance items were flagged for DIF more often than multiple-choice items
Slightly more of the flagged performance items favored minority students, although differences were small
35. DIF Results for Mathematics: Content analysis of Mathematics items flagged for Ethnic DIF:
Flagged items favoring Asian/Pacific Islander students generally assessed number concepts, computation, geometric procedures, algebraic procedures, and simple statistics
Flagged items favoring Black/African American, Native American, and Latino/Hispanic students generally assessed number, number patterns, computation, and logical reasoning
Flagged items favoring White students generally assessed data analysis, data representation, measurement, reasoning, and problem-solving
36. Mean Number of Mathematics Items Flagged for DIF (Males & Females)
37. Mean Number of Mathematics Items Flagged for DIF (Asian/Pacific Islander & White)
38. Mean Number of Mathematics Items Flagged for DIF (Black/African American & White)
39. Mean Number of Mathematics Items Flagged for DIF (Native American & White)
40. Mean Number of Mathematics Items Flagged for DIF (Latino/Hispanic & White)
41. Example of a Mathematics Item that Shows Statistical Bias in Favor of Focal Groups: Favors Latinos, Native Americans, Asian/Pacific Islanders, Black/African Americans, and Females