Operational Data or Experimental Design? A Variety of Approaches to Examining the Validity of Test Accommodations
Cara Cahalan-Laitusis
Review types of evidence • Review current research designs • Pros/Cons for each approach
Types of Validity Evidence • Psychometric research • Experimental research • Survey research • Argument based approach
Psychometric Indicators (National Academy of Sciences, 1982) • Reliability • Factor Structure • Item functioning • Predicted Performance • Admission Decisions
Psychometric Evidence • Is the test as reliable when taken with and without accommodations? (Reliability) • Does the test (or test items) appear to measure the same construct for each group? (Validity) • Are test items of relatively equal difficulty for students with and without a disability who are matched on total test score? (Fairness/Validity)
Psychometric Evidence • Are completion rates relatively equal between students with and without a disability who are matched on total test score? (Fairness) • Is equal access provided to testing accommodations across different disability, racial/ethnic, language, gender, and socio-economic groups? (Fairness) • Do test scores under- or over-predict an alternate measure of performance (e.g., grades, teacher ratings, other test scores, postgraduate success) for students with disabilities? (Validity)
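A minimal sketch of the under-/over-prediction check (a Cleary-style differential prediction analysis), assuming hypothetical arrays test_scores, criterion (e.g., grades or another performance measure), and a boolean mask focal flagging students with disabilities; the prediction line is fit on the reference group, and a non-zero mean residual for the focal group suggests systematic mis-prediction:

```python
import numpy as np

def differential_prediction(test_scores, criterion, focal):
    """Mean focal-group residual under the reference-group regression.
    Positive -> the test under-predicts the focal group's performance;
    negative -> it over-predicts."""
    ref = ~focal
    # Regression of criterion on test score, reference group only
    slope, intercept = np.polyfit(test_scores[ref], criterion[ref], deg=1)
    residuals = criterion - (slope * test_scores + intercept)
    return residuals[focal].mean()
```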
Advantages of Operational Data • Cost effective • Quick results • Easy to replicate • Provides evidence of validity • Large sample size • Motivated test takers
Limitations of Operational Data • Disability and accommodation are confounded • Order effects cannot be controlled for • Sample size can be insufficient • Difficult to show reasons why data are not comparable between subgroups • Disability and accommodation codes are not always accurate • Approved accommodations may not be used • Disability category may be too broad
Types of Analyses • Correlations • Factor Analysis • Differential Item Functioning • Descriptive analyses
Relationship Among Content Areas • Correlation between content areas (e.g., reading and writing) can also assess a test's reliability. • Compare correlations among content areas by population (e.g., LD with read aloud vs. LD without an accommodation) • Does the accommodation alter the construct being measured? (e.g., correlations between reading and writing may be lower if read aloud is used for writing but not reading) • Is the correlation significantly lower for one population? (a difference of .10 or greater)
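A minimal sketch of that comparison, assuming hypothetical reading-writing correlations for two populations; Fisher's r-to-z transformation supplies a significance test to accompany the .10 rule of thumb:

```python
import numpy as np
from scipy.stats import norm

def compare_correlations(r1, n1, r2, n2):
    """Test whether two independent correlations differ (Fisher's z)."""
    z1, z2 = np.arctanh(r1), np.arctanh(r2)    # Fisher r-to-z
    se = np.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))  # SE of the z difference
    z = (z1 - z2) / se
    return z, 2 * norm.sf(abs(z))              # statistic, two-tailed p

# Hypothetical: r = .65 for LD with read aloud (n = 250)
# vs. r = .78 for LD without an accommodation (n = 400)
z, p = compare_correlations(0.65, 250, 0.78, 400)
```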
Reliability • Examine internal consistency measures • with and without specific accommodations • with and without a disability • Examine test-retest reliability between different populations • with and without specific accommodations • with and without a disability
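A minimal sketch of one internal-consistency check, computing Cronbach's alpha from a hypothetical items matrix (rows = examinees in a single group, columns = scored items); running it separately by accommodation and disability status allows the reliabilities to be compared:

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (examinees x items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_var_sum = items.var(axis=0, ddof=1).sum()  # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)       # variance of total scores
    return (k / (k - 1)) * (1 - item_var_sum / total_var)
```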
Factor Structure • Types of questions • Are the number of factors invariant? • Are the factor loadings invariant for each of the groups? • Are the intercorrelations of the factors invariant for each of the groups?
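A minimal sketch of a loading-invariance check, assuming hypothetical score matrices group_a and group_b; it fits the same number of factors in each group and compares matched loadings with Tucker's congruence coefficient (values near 1.0 suggest invariance). This simple version ignores rotational alignment, which a fuller analysis would handle before comparing:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

def tucker_phi(load_a, load_b):
    """Congruence between matched rows of two (factors x items) matrices."""
    num = (load_a * load_b).sum(axis=1)
    den = np.sqrt((load_a ** 2).sum(axis=1) * (load_b ** 2).sum(axis=1))
    return num / den

def loading_invariance(group_a, group_b, n_factors=2):
    fa_a = FactorAnalysis(n_components=n_factors).fit(group_a)
    fa_b = FactorAnalysis(n_components=n_factors).fit(group_b)
    # components_ holds the (factors x items) loading matrix
    return tucker_phi(fa_a.components_, fa_b.components_)
```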
Differential Item Functioning • DIF refers to a difference in item performance between two comparable groups of test takers • DIF exists if test takers who have the same underlying ability level are not equally likely to get an item correct • Some recent DIF studies on accommodations/disability • Bielinski, Thurlow, Ysseldyke, Freidebach & Friedebach, 2001 • Bolt, 2004 • Barton & Finch, 2004 • Cahalan-Laitusis, Cook, & Aicher, 2004
Issues Related to the Use of DIF Procedures for Students with Disabilities • Group characteristics • Definition of group membership • Differences between ability levels of reference and focal groups • The characteristics of the criterion • Unidimensional • Reliable • Same meaning across groups
Procedures/Sample • DIF procedures (e.g., Mantel-Haenszel, logistic regression, DIF analysis paradigm, SIBTEST) • Reference/focal groups: minimum of 100 per group; ETS uses a minimum of 300 for most operational tests • Select groups that are specific (e.g., LD with read aloud) rather than broad (e.g., all students with an IEP or 504 plan)
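A minimal sketch of Mantel-Haenszel DIF for a single item, assuming hypothetical arrays correct (0/1 on the studied item), total (the matching score), and a boolean focal for group membership; examinees are stratified on total score and the common odds ratio is mapped onto the ETS delta scale:

```python
import numpy as np

def mh_d_dif(correct, total, focal):
    """Mantel-Haenszel D-DIF for one item, stratified on total score.
    Negative values flag items that are harder for the focal group."""
    correct, total, focal = map(np.asarray, (correct, total, focal))
    num = den = 0.0
    for score in np.unique(total):
        stratum = total == score
        ref, foc = stratum & ~focal, stratum & focal
        a, b = correct[ref].sum(), (1 - correct[ref]).sum()  # ref right/wrong
        c, d = correct[foc].sum(), (1 - correct[foc]).sum()  # focal right/wrong
        n = a + b + c + d
        if n > 0:
            num += a * d / n
            den += b * c / n
    alpha_mh = num / den              # common odds ratio across strata
    return -2.35 * np.log(alpha_mh)   # ETS delta scale (|D-DIF| > 1.5 is large)
```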
DIF with hypotheses • Generate hypotheses on why items may function differently • Code items based on hypotheses • Compare DIF results with item coding • Examine DIF results to generate new hypotheses
Other Psychometric Research • DIF to examine fatigue effects under extended time • Item completion rates between groups matched on ability • Log-linear analysis to examine whether specific demographic subgroups (SES, race/ethnicity, geographic region, gender) use specific accommodations less often than other groups
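A minimal sketch of that access question, substituting a simpler chi-square test of independence for a full log-linear model, with a hypothetical contingency table of accommodation use by subgroup:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical counts: rows = subgroups, cols = [used accommodation, did not]
table = np.array([[120, 480],
                  [ 35, 365],
                  [ 60, 340]])
chi2, p, dof, expected = chi2_contingency(table)
# A small p suggests accommodation use differs across subgroups;
# standardized residuals show which cells drive the difference.
resid = (table - expected) / np.sqrt(expected)
```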
Other Research Studies • Experimental Research • Differential Boost • Survey/Field Test Research • Argument-based Evidence
Advantages of Collecting Data • Disability and accommodation can be examined separately • Form and Order effects can be controlled • Sample can be specific (e.g., reading-based LD rather than all LD or LD with or without ADHD) • Opportunity to collect additional information • Reasons for differences can be tested • Data can be reused for psychometric analyses
Disadvantages • Cost of large data collection • Test takers may not be as motivated • More time consuming than psychometric research • Over-testing of students
Differential Boost (Fuchs & Fuchs, 1999) • Would students without disabilities benefit as much from the accommodation as students with disabilities? • If yes, the accommodation is not valid. • If no, the accommodation may be valid.
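A minimal sketch of a differential-boost analysis as a 2 x 2 (disability x accommodation) ANOVA, assuming a hypothetical DataFrame df with columns score, disability, and accommodation; the pattern the Fuchs & Fuchs criterion looks for is a significant interaction with the larger accommodation gain going to students with disabilities:

```python
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

def differential_boost(df):
    """Two-way ANOVA on scores; the disability-by-accommodation
    interaction term is the differential-boost test."""
    model = smf.ols("score ~ C(disability) * C(accommodation)", data=df).fit()
    return anova_lm(model, typ=2)  # inspect the interaction row's p-value
```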
Ways to reduce cost: • Decrease sample size • Randomly assign students to one of two conditions • Use operational test data for one of the two sessions
Additional data to collect: • Alternate measure of performance on construct being assessed • Teacher survey (ratings of student performance, history of accommodation use) • Student survey • Observational data (how student used accommodation) • Timing data
Additional Analyses • Differential Boost • by subgroups • controlling for ability level • Psychometric properties (e.g., DIF) • Predictive validity (alternate performance measure required)
Field Testing Survey • How well does item type measure intended construct (e.g., reading comprehension, problem solving)? • Did you have enough time to complete this item type? • How clear were the directions (for this type of test question)?
Field Testing Survey • How would you improve this item type? • To make the directions clearer • To measure the intended construct • What specific accommodations would improve this item type? • Which presentation approach did the test takers prefer?
Additional Types of Surveys • How accommodation decisions are made • Expert opinion on how/if an accommodation interferes with the construct being measured • Information on how test scores with and without accommodations are interpreted • Correlation between use of accommodations in class and on standardized tests
Additional Research Designs • Think Aloud Studies or Cognitive Labs • Item Timing Studies • Scaffolded Accommodations
Argument-Based Validity • Clearly Define Construct Assessed • Evidence Centered Design • Decision Tree