Changes in Test Scores with Multiple Sittings of CanTEST

Changes in Test Scores with Multiple Sittings of CanTEST
Philip Nagy

Rationale
Research Questions
• Do test scores change on repeating the test?
• Is change related to length of time between sittings?
Test Development Questions
• Can data from repeaters be used in test calibration for new form development?
Context: Receptive Skills
Official Languages and Bilingualism Institute

The Data
Listening Tests: Six forms with 15 short and 25 long passage items
Reading Tests: Seven forms with 15 skim-and-scan, 20 reading passage, and 25 cloze items
The Sample: Mean first score of 3.6, compared to 4.3 for those who write only once
Assumptions
• Difficulty of forms is balanced across sittings (true)
• Samples writing each form are equivalent (untested)

Listening Results: Sitting 2 minus Sitting 1 (N=179)

Listening Results, another look

Listening Results Interpretation
• How important is the improvement?
• On average, 3.6 points needed out of 40 to improve one band
• So, 2.6 points is about 75% of a band improvement
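The band arithmetic above can be checked with a minimal sketch. The function name `band_fraction` and its parameter names are illustrative assumptions, not part of the presentation; only the two figures (2.6 and 3.6) come from the slide. Note that the exact ratio is a little under the rounded "about 75%" quoted on the slide.

```python
def band_fraction(mean_gain: float, points_per_band: float) -> float:
    """Return a mean score gain expressed as a fraction of one band.

    Hypothetical helper for illustration; not from the original slides.
    """
    if points_per_band <= 0:
        raise ValueError("points_per_band must be positive")
    return mean_gain / points_per_band


# Listening figures quoted on the slide (40-point scale).
listening_fraction = band_fraction(mean_gain=2.6, points_per_band=3.6)
print(f"Listening gain: {listening_fraction:.0%} of a band")
```

The same helper applies to any scale, as long as the gain and the points-per-band value are expressed in the same units.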

Listening Results Interpretation
• Can the data be used for test calibration?
• The changes in average item difficulty are different for the subtests
• .088 for short passages
• .052 for long passages
• The difference of .036 (.088 - .052) is about the same as the standard error of the difficulty indices
• Listening data from repeaters should not be used for item calibration

Changes in Listening by Length of Time between Sittings
1 Difference significant, p=0.05
Those who repeat sooner do better than those who repeat later

Reading Results: Sitting 2 minus Sitting 1 (N=284)
Note: Reading Score is doubled to give a total out of 80 rather than 60.

Reading Results, another look

Reading Results Interpretation
• How important is the improvement?
• On average, 6.5 points needed (out of 80) to improve one band
• So, 3.45 points is about 55% of a band improvement

Reading Results Interpretation
• Can the data be used for test calibration?
• The changes in average item difficulty are different for the subtests
• +0.072 for skim-and-scan
• +0.050 for reading passages
• +0.002 for cloze
• The largest difference of .070 (.072 - .002) is two to three times larger than the standard error of the difficulty indices
• Reading data from repeaters should not be used for item calibration
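The comparison across reading subtests can be sketched as follows. This is only an illustration of the arithmetic: the names `pairwise_gaps` and `gap_in_se_units` are hypothetical, and the slides do not report the standard error itself, so it is left as a parameter to be supplied rather than filled in here. The three change values are the ones listed on the slide.

```python
from itertools import combinations

# Changes in average item difficulty by reading subtest, as listed on the slide.
reading_changes = {
    "skim-and-scan": 0.072,
    "reading passages": 0.050,
    "cloze": 0.002,
}


def pairwise_gaps(changes: dict[str, float]) -> dict[tuple[str, str], float]:
    """Absolute gap in difficulty change for every pair of subtests."""
    return {
        (a, b): abs(changes[a] - changes[b])
        for a, b in combinations(changes, 2)
    }


def gap_in_se_units(gap: float, standard_error: float) -> float:
    """Express a gap as a multiple of the item-difficulty standard error.

    The standard error is not given in the slides; pass the actual value.
    """
    if standard_error <= 0:
        raise ValueError("standard_error must be positive")
    return gap / standard_error


gaps = pairwise_gaps(reading_changes)
largest_pair = max(gaps, key=gaps.get)  # pair of subtests that differ most
largest_gap = gaps[largest_pair]
print(largest_pair, round(largest_gap, 3))
```

A gap that is several standard errors wide suggests that repeaters improve unevenly across item types, which is the stated reason for excluding their data from calibration.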

Changes in Reading by Length of Time between Sittings
1 Difference significant, p=0.05
Those who repeat later actually do worse than those who repeat sooner

Conclusion
• Listening:
  • 30% of sample do more poorly on 2nd sitting
  • Average gain is 75% of a band score
  • Gains differ across item types by about one item standard error
• Reading:
  • 40% of sample do more poorly on 2nd sitting
  • Average gain is 55% of a band score
  • Gains differ across item types by 2-3 times an item standard error
• Both:
  • Those who rewrite within six months do better
  • Data from repeaters should not be used for item calibration
