Large-scale testing: Uses and abuses. Richard P. Phelps Universidad Finis Terrae, Santiago, Chile January 7, 2014. Large-scale testing: Uses and abuses. 3 types of large-scale tests Measuring test quality A chronology of mistakes Economists misunderstand testing How SIMCE is affected.
Large-scale testing: Uses and abuses Richard P. Phelps Universidad Finis Terrae, Santiago, Chile January 7, 2014
Large-scale testing: Uses and abuses
1. 3 types of large-scale tests
2. Measuring test quality
3. A chronology of mistakes
4. Economists misunderstand testing
5. How SIMCE is affected
1. Three types of large-scale tests
- Achievement
- Aptitude
- Non-cognitive
Achievement tests
- Historically, were larger versions of classroom tests
- ~1900: “scientific” achievement tests developed (Germany & USA)
- J.M. Rice: systematically analyzed test structures & effects
- E.L. Thorndike: developed scoring scales
SOURCE: Phelps, Standardized Testing Primer, 2007
Achievement tests
- Purpose: to measure how much you know and can recall
- Developed using: content coverage analysis
- How validated: retrospective or concurrent validity (correlation with past measures, such as high school grades)
- Requires a mastery of content prior to test
- Fairness assumes that all have same opportunity to learn content
- Coachable: specific content is known in advance
SOURCE: Phelps, Standardized Testing Primer, 2007
Aptitude tests
1890s: A. Binet & T. Simon (France)
- Pre-school children with mental disabilities
- Achievement test not possible
- Developed content-free test of mental abilities (association, attention, memory, motor skills, reasoning)
1917: Adapted by U.S. Army to select and assign soldiers in World War I
1930s: Harvard University president J. Conant wanted a new admission test to identify students from lower social classes with the potential to succeed at Harvard; developed the first Scholastic Aptitude Test (SAT)
SOURCE: Phelps, Standardized Testing Primer, 2007
Aptitude tests
- Purpose: predict how much can be learned
- Developed using: skills/job analysis
- How validated: predictive validity, correlation with future activity (e.g., university or job evaluations)
- Content independent. Measures:
  - what student does with content provided
  - how student applies skills & abilities developed over a lifetime
- Not easily coachable. The content is either:
  - not known in advance,
  - basic, broad, commonly known by all, curriculum-free;
  - less dependent on the quality of schools
SOURCE: Phelps, Standardized Testing Primer, 2007
Aptitude tests
Aptitude tests can identify:
- Students bored in school who study what interests them on their own
- Students not well adapted to high school, but well adapted to university
- Students of high ability stuck in poor schools
SOURCE: Phelps, Standardized Testing Primer, 2007
Non-cognitive tests
More recently developed; measure values, attitudes, preferences
Types:
- integrity tests
- career exploration
- matchmaking
- employment “fit”
Non-cognitive tests
- Purpose: to identify “fit” with others or a situation
- Developed using: surveys, personal interviews
- How validated: success rate in future activities
- Content is personal, not learned
- “Faking” can be an issue (e.g., “honesty” tests)
2. Measuring test quality
Test reports can be “data dumps”
3 measures are important:
1. Predictive validity
2. Content coverage
3. Sub-group differences
Predictive validity (values from -1.0 to +1.0)
- Measures how well higher scores on an admission test match better outcomes at university (e.g., grades, completion)
- A test with low predictive validity provides little information
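As a worked formula, and assuming predictive validity is reported as the ordinary Pearson correlation coefficient (the slides name a correlation coefficient but not which one), it can be written as follows. The symbols are illustrative and not from the slides: $x_i$ is student $i$'s admission test score, $y_i$ is the same student's later university outcome (for example, a first-year grade average), and $n$ is the number of students.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\;\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad -1 \le r \le +1
```

A value near +1 means higher test scores go with better outcomes, a value near 0 means the score says almost nothing about the outcome, and a value near -1 means higher scores go with worse outcomes.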
A positive correlation between two measures Source: NIST, Engineering Statistics Handbook
A negative correlation between two measures Source: NIST, Engineering Statistics Handbook
No correlation between two measures Source: NIST, Engineering Statistics Handbook
How does one measure predictive capacity?
Correlation coefficient: a scale running from -1 through 0 to +1
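The correlation coefficient can be computed directly from paired observations. The following is a minimal sketch, assuming the Pearson coefficient is the one intended; the function name `pearson_r` and all data values are hypothetical, invented only to illustrate a positive, a negative, and a zero correlation. They are not taken from the SAT, the PSU, or any real study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data for six students (illustration only).
admission_scores = [450, 500, 550, 600, 650, 700]
first_year_grades = [4.1, 4.6, 4.4, 5.2, 5.5, 5.9]  # tends to rise with score
missed_classes = [12, 10, 9, 7, 4, 3]               # tends to fall with score
unrelated_measure = [2, 1, 3, 3, 1, 2]              # no linear relation to score

r_positive = pearson_r(admission_scores, first_year_grades)
r_negative = pearson_r(admission_scores, missed_classes)
r_none = pearson_r(admission_scores, unrelated_measure)

print(f"positive: {r_positive:+.2f}")
print(f"negative: {r_negative:+.2f}")
print(f"none:     {r_none:+.2f}")
```

In this framing, the predictive validity of an admission test would be the coefficient between test scores and a later university outcome, so the first of the three cases is the one of interest.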
Predictive validities: SAT and PSU SOURCE: Pearson, Final Report Evaluation of the Chile PSU, January 2013
Predictive validities: SAT and PSU (faculty: Administracion) SOURCE: Pearson, Final Report Evaluation of the Chile PSU, January 2013