ONR Advanced Distributed Learning Impact of Language Factors on the Reliability and Validity of Assessment for ELLs Jamal Abedi University of California, Los Angeles National Center for Research on Evaluation, Standards, and Student Testing (CRESST) July 18, 2003.
Classical Test Theory: Reliability
$\sigma^2_X = \sigma^2_T + \sigma^2_E$
X: observed score; T: true score; E: error score
$\rho_{XX'} = \sigma^2_T / \sigma^2_X = 1 - \sigma^2_E / \sigma^2_X$
Textbook examples of possible sources that contribute to measurement error:
• Rater
• Occasion
• Item
• Test form
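A minimal Python sketch of the two equivalent reliability formulas above, using hypothetical variance components (not data from any study cited here). The last call illustrates why extra error sources matter: if some additional source inflates the error variance while true-score variance stays the same, reliability drops.

```python
def reliability(var_true: float, var_error: float) -> float:
    """Classical test theory reliability, rho_XX' = var_T / var_X.

    Assumes observed variance decomposes as var_X = var_T + var_E,
    i.e. true and error scores are uncorrelated.
    """
    var_observed = var_true + var_error
    return var_true / var_observed


def reliability_from_error(var_true: float, var_error: float) -> float:
    """Equivalent form: rho_XX' = 1 - var_E / var_X."""
    return 1.0 - var_error / (var_true + var_error)


# Hypothetical variance components, chosen only for illustration.
baseline = reliability(80.0, 20.0)               # 80 / 100
same_value = reliability_from_error(80.0, 20.0)  # 1 - 20 / 100
# Same true-score variance plus 10 units of extra (hypothetical) error variance.
with_extra_error = reliability(80.0, 30.0)       # 80 / 110
print(baseline, same_value, with_extra_error)
```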
Generalizability Theory: Partitioning Error Variance into Its Components
$\sigma^2(X_{pro}) = \sigma^2_p + \sigma^2_r + \sigma^2_o + \sigma^2_{pr} + \sigma^2_{po} + \sigma^2_{ro} + \sigma^2_{pro,e}$
p: person; r: rater; o: occasion
Are there any sources of measurement error that may specifically influence ELL performance?
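The decomposition above can be sketched in Python for a fully crossed person × rater × occasion design. The component values are hypothetical. The relative generalizability coefficient shown is the standard G-theory form for rank-ordering persons: only the components that interact with persons (pr, po, pro,e) count as error, each divided by the number of raters and/or occasions averaged over.

```python
from dataclasses import dataclass


@dataclass
class PROComponents:
    """Variance components for a fully crossed person x rater x occasion design."""
    p: float      # persons (universe-score variance)
    r: float      # raters
    o: float      # occasions
    pr: float     # person x rater
    po: float     # person x occasion
    ro: float     # rater x occasion
    pro_e: float  # person x rater x occasion, confounded with residual error

    def total(self) -> float:
        """Variance of a single observation X_pro: the sum of all seven components."""
        return self.p + self.r + self.o + self.pr + self.po + self.ro + self.pro_e


def relative_g_coefficient(vc: PROComponents, n_r: int, n_o: int) -> float:
    """Generalizability coefficient for relative decisions when each person's
    score is averaged over n_r raters and n_o occasions."""
    relative_error = vc.pr / n_r + vc.po / n_o + vc.pro_e / (n_r * n_o)
    return vc.p / (vc.p + relative_error)


# Hypothetical components (not estimates from any study cited here).
vc = PROComponents(p=50.0, r=2.0, o=1.0, pr=5.0, po=4.0, ro=1.0, pro_e=10.0)
print(vc.total())                        # 73.0
print(relative_g_coefficient(vc, 2, 2))  # 50 / 57, about 0.877
```

Averaging over more raters and occasions shrinks the person-interaction error terms, which is why the coefficient rises with n_r and n_o.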
Validity of Academic Achievement Measures
We will focus on construct and content validity approaches:
• A test's construct validity is the degree to which it measures the theoretical construct or trait that it was designed to measure (Allen & Yen, 1979, p. 108).
• A test's content validity involves the careful definition of the domain of behaviors to be measured by a test and the logical design of items to cover all the important areas of this domain (Allen & Yen, 1979, p. 96).
Examples:
• A content-based achievement test has construct validity if it measures the content that it is supposed to measure.
• A content-based achievement test has content validity if the test content is representative of the content being measured.
Two major questions on the psychometric properties of academic achievement tests for ELLs:
• Are there any sources of measurement error that may specifically influence ELL performance?
• Do achievement tests accurately measure ELLs' content knowledge?
Study #9: Impact of students' language background on content-based performance: analyses of extant data (Abedi & Leon, 1999). Analyses were performed on extant data, such as the Stanford 9 and ITBS.
SAMPLE: Over 900,000 students from four different sites nationwide.
Study #10: Examining ELL and non-ELL student performance differences and their relationship to background factors (Abedi, Leon, & Mirocha, 2001). Data were analyzed for the impact of language on the assessment and accommodation of ELL students.
SAMPLE: Over 700,000 students from four different sites nationwide.
Findings
• The higher the language demand of the test items, the larger the performance gap between ELL and non-ELL students.
• There is a large performance gap (about 15 NCE score points) between ELL and non-ELL students in reading, science, and math problem solving.
• This performance gap was reduced to zero in math computation.
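To put "about 15 NCE score points" in perspective: Normal Curve Equivalents are normalized scores with mean 50 and standard deviation 21.06, so a gap in NCE points can be converted to standard-deviation units of the norming distribution. A minimal Python sketch (function names are illustrative):

```python
NCE_MEAN = 50.0
NCE_SD = 21.06  # chosen so NCEs of 1, 50 and 99 coincide with those percentile ranks


def z_to_nce(z: float) -> float:
    """Convert a normal-deviate (z) score to the NCE scale."""
    return NCE_MEAN + NCE_SD * z


def nce_gap_in_sd_units(gap: float) -> float:
    """Express a difference between two NCE means in standard-deviation units."""
    return gap / NCE_SD


# The roughly 15-point ELL/non-ELL gap, in SD units of the NCE scale.
print(round(nce_gap_in_sd_units(15.0), 2))  # 0.71
```

So a 15-point NCE gap corresponds to roughly 0.7 of a standard deviation on the normed scale.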
Normal Curve Equivalent (NCE) Means and Standard Deviations for Students in Grades 10 and 11, Site 3 School District

                Reading           Science           Math
Group           Mean  Std.Dev.    Mean  Std.Dev.    Mean  Std.Dev.
Grade 10
SD only         16.4  12.7        25.5  13.3        22.5  11.7
LEP only        24.0  16.4        32.9  15.3        36.8  16.0
LEP & SD        16.3  11.2        24.8   9.3        23.6   9.8
Non-LEP & SD    38.0  16.0        42.6  17.2        39.6  16.9
All students    36.0  16.9        41.3  17.5        38.5  17.0
Grade 11
SD only         14.9  13.2        21.5  12.3        24.3  13.2
LEP only        22.5  16.1        28.4  14.4        45.5  18.2
LEP & SD        15.5  12.7        26.1  20.1        25.1  13.0
Non-LEP & SD    38.4  18.3        39.6  18.8        45.2  21.1
All students    36.2  19.0        38.2  18.9        44.0  21.2

Note. LEP = limited English proficient. SD = students with disabilities. Std.Dev. = standard deviation.
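A small Python sketch that computes, from the Site 3 table above, how far the LEP-only mean falls below the All-students mean in each subject. Using "All students" as the reference group is a choice made for this sketch; because that group includes LEP students, these gaps understate the LEP versus non-LEP difference.

```python
# Mean NCE scores transcribed from the Site 3 table above.
MEANS = {
    10: {
        "LEP only":     {"reading": 24.0, "science": 32.9, "math": 36.8},
        "All students": {"reading": 36.0, "science": 41.3, "math": 38.5},
    },
    11: {
        "LEP only":     {"reading": 22.5, "science": 28.4, "math": 45.5},
        "All students": {"reading": 36.2, "science": 38.2, "math": 44.0},
    },
}


def gaps_vs_all_students(grade: int) -> dict[str, float]:
    """All-students mean minus LEP-only mean, per subject, in NCE points."""
    lep = MEANS[grade]["LEP only"]
    everyone = MEANS[grade]["All students"]
    return {subject: round(everyone[subject] - lep[subject], 1) for subject in lep}


for grade in (10, 11):
    print(grade, gaps_vs_all_students(grade))
```

In both grades the gap is largest in reading, smaller in science, and smallest in math (slightly negative in Grade 11). This ordering is consistent with the finding that gaps grow with the language demand of the items.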
Disparity Index (DI): an index of performance differences between LEP and non-LEP students.
Site 3 Disparity Index (DI), Non-LEP/Non-SD Students Compared to LEP-Only Students

Grade   Reading   Math Total   Math Calculation   Math Analytical
3         53.4       25.8           12.9               32.8
6         81.6       37.6           22.2               46.1
8        125.2       36.9           25.2               44.0
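The slide does not give the DI formula. The sketch below assumes the DI is the percentage by which the non-LEP mean exceeds the LEP mean, 100 × (M_nonLEP − M_LEP) / M_LEP. Treat that definition, and the group means used, as assumptions for illustration.

```python
def disparity_index(mean_non_lep: float, mean_lep: float) -> float:
    """ASSUMED definition: percentage by which the non-LEP mean exceeds the LEP mean."""
    if mean_lep <= 0:
        raise ValueError("LEP mean must be positive")
    return 100.0 * (mean_non_lep - mean_lep) / mean_lep


def mean_ratio_from_di(di: float) -> float:
    """Invert the assumed definition: non-LEP mean divided by LEP mean."""
    return 1.0 + di / 100.0


# Hypothetical group means, for illustration only.
print(disparity_index(45.0, 30.0))          # 50.0
# Under the assumed definition, Grade 8 reading (DI = 125.2) would mean the
# non-LEP mean is about 2.25 times the LEP-only mean.
print(round(mean_ratio_from_di(125.2), 2))  # 2.25
```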
Issues and problems in classification of students with limited English proficiency
Findings
The relationship between language proficiency test scores and LEP classification: Since LEP classification is based on students' level of language proficiency, and because the LAS (Language Assessment Scales) is a measure of language proficiency, one would expect to find a perfect correlation between LAS scores and LEP levels (LEP versus non-LEP). The results of the analyses, however, indicated a weak relationship between language proficiency test scores and language classification codes (LEP categories).
• Correlation between LAS rating and LEP classification for Site 4
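Correlating a continuous proficiency score with a binary LEP code (1 = LEP, 0 = non-LEP) is a point-biserial correlation, which is simply Pearson's r with one dichotomous variable. This minimal Python sketch uses toy data (not Site 4 data) and checks that the point-biserial formula, r = (M1 − M0) / s × √(pq) with s the population SD, agrees with Pearson's r.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def point_biserial(scores: list[float], group: list[int]) -> float:
    """r_pb = (M1 - M0) / s * sqrt(p * q), with s the population SD of scores."""
    n = len(scores)
    g1 = [s for s, g in zip(scores, group) if g == 1]
    g0 = [s for s, g in zip(scores, group) if g == 0]
    m1, m0 = sum(g1) / len(g1), sum(g0) / len(g0)
    mean = sum(scores) / n
    sd = math.sqrt(sum((s - mean) ** 2 for s in scores) / n)
    p = len(g1) / n
    return (m1 - m0) / sd * math.sqrt(p * (1 - p))


# Toy proficiency ratings and LEP codes (1 = LEP), for illustration only.
las = [1, 2, 2, 3, 3, 4, 4, 5]
lep = [1, 1, 1, 0, 1, 0, 0, 0]
print(pearson(las, lep), point_biserial(las, lep))  # both about -0.82
```

A negative value means higher proficiency ratings go with the non-LEP code. If classification tracked proficiency closely, the magnitude would be near 1.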
Correlation coefficients between LEP classification code and ITBS subscales for Site 1
Generalizability Theory: Language as an Additional Source of Measurement Error
$\sigma^2(X_{prl}) = \sigma^2_p + \sigma^2_r + \sigma^2_l + \sigma^2_{pr} + \sigma^2_{pl} + \sigma^2_{rl} + \sigma^2_{prl,e}$
p: person; r: rater; l: language
Are there any sources of measurement error that may specifically influence ELL performance?
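One way to read the person × rater × language decomposition is to ask what fraction of observed-score variance comes from components that involve the language facet. This Python sketch uses hypothetical component values. The σ²_prl,e term is included in the language share even though it is confounded with residual error, so the result overstates the purely language-related part.

```python
# Hypothetical variance components for a person x rater x language design.
COMPONENTS = {
    "p": 50.0, "r": 2.0, "l": 6.0,
    "pr": 5.0, "pl": 8.0, "rl": 1.0,
    "prl,e": 10.0,  # three-way interaction confounded with residual error
}


def share_involving(components: dict[str, float], facet: str) -> float:
    """Fraction of total observed-score variance carried by components whose
    subscript includes the given facet letter (the ',e' residual tag is ignored)."""
    total = sum(components.values())
    involved = sum(v for key, v in components.items()
                   if facet in key.split(",")[0])
    return involved / total


print(share_involving(COMPONENTS, "l"))  # (6 + 8 + 1 + 10) / 82, about 0.30
print(share_involving(COMPONENTS, "p"))  # (50 + 5 + 8 + 10) / 82
```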