Choosing appropriate summative tests. Presented by Philip Holmes-Smith School Research Evaluation and Measurement Services
Overview of this module • Choosing Appropriate Summative Tests • The reliability of summative (standardised) tests. • Choosing appropriate summative tests. • When should you administer summative tests?
The Reliability of Summative Tests
Three Questions • Do you believe that your students’ NAPLAN and/or On-Demand and/or PAT results accurately reflect their level of performance? • If we acknowledge that the odd student will have a lucky guessing day or a horror day, what about the majority? • Do your weakest students usually receive low scores? • Do your average students usually receive scores at about the expected level? • Do your best students usually receive high scores? • However, think about your students who received high and low scores: • Are your low scores too low? (i.e. indicatively correct but too low) • Are your high scores too high? (i.e. indicatively correct but too high)
Examples of high highs and low lows • Is this reading score reliable? This high is probably too high. • Is this reading score reliable? This low is probably too low.
Item difficulties for a typical test (A test pitched at average year level standard does not have enough easy or hard questions to reliably or accurately reflect low or high scores)
Summary Statements about Scores • Low scores (i.e. more than a year below expected) indicate poor performance but the actual values should be considered as indicative only (i.e. such scores are associated with high levels of measurement error). • High scores (i.e. more than a year above expected) indicate good performance but the actual values should be considered as indicative only (i.e. such scores are associated with high levels of measurement error). • Average scores indicate roughly expected levels of performance and the actual values are more reliable (i.e. such scores are associated with lower levels of measurement error).
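The pattern in these summary statements can be illustrated with a minimal sketch. It assumes a Rasch-type measurement model, which the slides do not name, and uses hypothetical item difficulties in logit units rather than values from any real test. Under that assumption, the standard error of an ability estimate is 1 / sqrt(test information), and a test whose items cluster around the average gives little information about very low or very high abilities.

```python
import math

def rasch_p(ability, difficulty):
    """Probability of a correct answer under the Rasch model (logit units)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def standard_error(ability, item_difficulties):
    """Standard error of measurement = 1 / sqrt(test information).

    Each item contributes p * (1 - p) to the test information, which is
    largest when the item difficulty matches the student's ability.
    """
    probs = [rasch_p(ability, d) for d in item_difficulties]
    information = sum(p * (1.0 - p) for p in probs)
    return 1.0 / math.sqrt(information)

# HYPOTHETICAL test: 35 items with difficulties spread evenly from -1 to +1
# logits, i.e. pitched at the "average" student (ability 0).
ITEMS = [-1.0 + 2.0 * i / 34 for i in range(35)]

se_low = standard_error(-3.0, ITEMS)   # student well below the test's level
se_mid = standard_error(0.0, ITEMS)    # student matched to the test
se_high = standard_error(3.0, ITEMS)   # student well above the test's level

print(f"SE for low ability:     {se_low:.2f}")
print(f"SE for average ability: {se_mid:.2f}")
print(f"SE for high ability:    {se_high:.2f}")
```

In this sketch the error is smallest for the student whose ability sits in the middle of the item difficulties and grows towards both extremes, which is the sense in which very low and very high scores are "indicative only".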
Summative (Standardised) Testing • Summative testing is essential to monitor the effectiveness of your teaching, but: • NAPLAN is not reliable for all students. Furthermore, if used incorrectly, the other summative tests you administer (e.g. On-Demand, PAT, etc.) may also be unreliable. • More importantly, if NAPLAN is the only summative data used in your school you are not gathering enough information to monitor the effectiveness of your teaching at all year levels. What about Prep, Yr1, Yr2, Yr4, Yr6, Yr8 and Yr10? For example: • Year 3 NAPLAN reflects the effectiveness of your Prep-Yr2 teaching but what about the Prep teaching vs. Yr1 teaching vs. the Yr2 teaching? • Year 9 NAPLAN reflects the effectiveness of your Yr7-Yr8 teaching but what about the Yr 7 teaching vs. Yr 8 teaching?
Summative (Standardised) Testing • We need to maximise the reliability of the tests we use to monitor the effectiveness of our teaching (by better matching the difficulty of the items to the ability of the students). • We need to choose appropriate summative tests to monitor the effectiveness of our teaching at all year levels from Prep – Yr10!
Choosing appropriate summative tests
For whom is this test most appropriate? Prep? Yr4? Yr10? [Figure: Item difficulties for Booklet 6 on the PAT-R (Comprehension) scale score scale, with the average item difficulty marked] • Test is too easy for the average Yr10 student • Test is about right for the average Yr4 student • Test is too hard for the average Prep student
Converting raw test scores to PAT-R (Comprehension) scale scores • A Yr10 student of ability 144 who answers every question correctly (35/35) would be falsely placed at ability 169.0 (i.e. an unreliable high high) • A Yr4 student of ability 120 who answers approximately half the questions correctly (18/35) would be accurately placed at ability 120.2 • A Prep student of ability 79 who answers every question incorrectly (0/35) would be falsely placed at ability 67.4 (i.e. an unreliable low low)
Test difficulties of the PAT-R (Comprehension) Tests on the PAT-R score scale together with Year Level mean scores
Item difficulties of the PAT-R (Comprehension) Tests on the PAT-R score scale together with Year Level mean scores • Test Booklet 2 would be a good test to give to a typical Yr 1 student because the typical item difficulties are close to the ability level of typical Yr 1 students
Test difficulties of the PAT-Maths Tests on the PATM scale score scale together with Year Level mean scores • Which is the best test for an average Year 4 student? [Figure: test difficulties plotted against Year Level means for Year 1, Year 2, Year 3, Year 4, Year 5, Year 6&7, Year 8&9 and Year 10] • Source: ACER, 2006
Test difficulties of the PAT-Maths Tests on the PATM scale score scale together with Year Level mean scores • The best test for an average Year 4 student is probably Test 5 (or perhaps Test 4) [Figure: test difficulties plotted against Year Level means for Year 1, Year 2, Year 3, Year 4, Year 5, Year 6&7, Year 8&9 and Year 10] • Source: ACER, 2006
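The matching rule used in these slides (give the test whose item difficulties sit closest to the student's likely ability) can be sketched as follows. The function name and all of the numbers are hypothetical and chosen only for illustration; they are not taken from the ACER charts, which are not reproduced in this transcript.

```python
def best_matched_test(expected_ability, test_mean_difficulties):
    """Return the test whose mean item difficulty is closest to the ability.

    expected_ability: the student's likely scale score, e.g. the year-level
        mean or a previous result.
    test_mean_difficulties: dict mapping test name -> mean item difficulty,
        on the SAME scale as expected_ability.
    """
    return min(
        test_mean_difficulties,
        key=lambda name: abs(test_mean_difficulties[name] - expected_ability),
    )

# HYPOTHETICAL mean item difficulties on a made-up scale; real values
# would come from the test publisher's manual.
tests = {
    "Test A": 20.0,
    "Test B": 35.0,
    "Test C": 50.0,
    "Test D": 65.0,
}

# A student whose expected score is 47 is best matched by Test C.
choice = best_matched_test(47.0, tests)
print(choice)
```

For a student known to be working well above or below the year-level mean, the same rule would be applied with that student's own estimated score rather than the year-level mean.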
Things to look for in a summative test • Needs to have a single developmental scale that shows increasing levels of achievement over all the year levels at your school. • Needs to have “norms” or expected levels for each year level (e.g. the national “norm” for Yr 3 students on TORCH is an average of 34.7). • Needs to be able to demonstrate growth from one year to the next (e.g. during Yr 4, the average student grows from a score of 34.7 in Yr 3 to an expected score of 41.4 in Yr 4 – that is 6.7 score points). • As a bonus, the test could also provide diagnostic information.
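The growth arithmetic in the TORCH example above can be written out as a short sketch. The two norm values are the ones quoted in the slide; the student scores and the helper name are hypothetical.

```python
# TORCH national norms quoted in the slide above.
NORM_YR3 = 34.7
NORM_YR4 = 41.4

# Expected growth over the year: 41.4 - 34.7 = 6.7 score points.
expected_growth = round(NORM_YR4 - NORM_YR3, 1)

def growth_vs_expected(score_start, score_end, expected):
    """Return (actual growth, difference from expected growth)."""
    actual = round(score_end - score_start, 1)
    return actual, round(actual - expected, 1)

# HYPOTHETICAL student: 30.0 at the end of Yr 3, 38.5 at the end of Yr 4.
actual, difference = growth_vs_expected(30.0, 38.5, expected_growth)

print(f"Expected growth: {expected_growth}")
print(f"Actual growth:   {actual}")
print(f"Difference:      {difference:+}")
```

This hypothetical student is still below the Yr 4 norm but has grown by more than the expected 6.7 points, which is why growth needs to be reported alongside the raw score.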
N.B. Don’t expect growth to be linear (Growth in the early and later years is more rapid than in the middle years) [Figure: TORCH norms, showing the 90th, 50th and 10th percentiles]
My Recommended Summative Tests (Pen & Paper) • Reading Comprehension • Progressive Achievement Test - Reading (Comprehension) (PAT-R, 4th Edition) • TORCH (2nd Ed.) and TORCH plus • Mathematics • Progressive Achievement Test - Mathematics (PAT-Maths, 3rd Edition) combined with the I Can Do Maths • Spelling • South Australian Spelling (Use Test A and Test B alternately) • Single Word Spelling Test (SWST)
My Recommended Summative Tests (On-Line) • On-Demand - Reading Comprehension • The 30-item “On-Demand” Adaptive Reading test (Yr3 – Yr10) • On-Demand - Spelling • The 30-item “On-Demand” Adaptive Spelling test (Yr3 – Yr10) • On-Demand - Writing Conventions • The 30-item “On-Demand” Adaptive Writing Conventions test (Yr3 – Yr10) • On-Demand - General English (Comprehension, Spelling & Writing Conventions) (Yr3 – Yr10) • The 60-item “On-Demand” Adaptive General English test • English Online (Victorian Gov. Schools) • Prep-Yr2 Individual interview • On-Demand - Number • The 30-item “On-Demand” Adaptive Number test (Yr3 – Yr10) • On-Demand - Measurement, Chance & Data • The 30-item “On-Demand” Adaptive Measurement, Chance & Data test (Yr3 – Yr10) • On-Demand - Space • The 30-item “On-Demand” Adaptive Space test (Yr3 – Yr10) • On-Demand - Structure • The 30-item “On-Demand” Adaptive Structure test (Yr3 – Yr10) • On-Demand - Mathematics (Number, Measurement, Chance & Data and Space) (Yr3 – Yr10) • The 60-item “On-Demand” Adaptive General Mathematics test • PAT-Maths Plus • 10 tests from Yr1 to Yr10
Available “Adaptive” ENGLISH Tests (Choosing the right starting point is still important)
Available “Adaptive” MATHEMATICS Tests (Choosing the right starting point is still important)
Summative Testing and Triangulation • Even if you give the right test to the right student, sometimes, the test score does not reflect the true ability of the student – every measurement is associated with some error. • To overcome this we should aim to get at least three independent measures – what researchers call TRIANGULATION. • This may include: • Teacher judgment • NAPLAN results • Other pen & paper summative tests (e.g. TORCH, PAT-R, PAT-Maths, I Can Do Maths) • On-line summative tests (e.g. On-Demand ‘Adaptive’ testing, PAT-Maths Plus, English Online)
Summative Testing and Triangulation • BUT remember, more summative testing does not lead to improved learning outcomes so keep the summative testing to a minimum
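The triangulation idea described above can be illustrated with a small sketch. The slides do not say how the three measures should be combined, so everything here is an assumption made for illustration: each measure is taken to have already been converted to a common "year-level equivalent" scale, the median is used as the combined estimate, and measures more than one year apart are flagged for a closer look.

```python
from statistics import median

def triangulate(measures, max_spread=1.0):
    """Combine independent measures of one student's level.

    measures: dict mapping source name -> estimate, all on the same
        (assumed) year-level-equivalent scale.
    max_spread: ASSUMED threshold; a larger gap between the highest and
        lowest measure marks the set as inconsistent.
    """
    values = list(measures.values())
    if len(values) < 3:
        raise ValueError("triangulation needs at least three independent measures")
    spread = max(values) - min(values)
    return {
        "estimate": median(values),
        "spread": spread,
        "consistent": spread <= max_spread,
    }

# HYPOTHETICAL student whose three measures broadly agree.
agree = triangulate({"teacher_judgment": 4.0, "naplan": 4.5, "pat_r": 3.8})

# HYPOTHETICAL student with one suspiciously high result (a "high high").
outlier = triangulate({"teacher_judgment": 4.0, "on_demand": 6.5, "pat_r": 4.2})

print(agree)
print(outlier)
```

The median is used here because a single lucky or horror day shifts it far less than it would shift an average; an inconsistent set is a prompt for teacher follow-up rather than for more testing.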
When should you administer summative tests?
Timing for Summative Testing • Should be done at a time when teachers are trying to triangulate on each student’s level of performance. (i.e. mid-year and end-of-year reporting time.) • Should be done at a time that enables teachers to monitor growth – say, every six months. (i.e. From the beginning of the year to the middle of the year and from the middle of the year to the end of the year.)
Suggested timing • For Year 1 – Year 6 and Year 8 – Year 10 • Late May/Early June (for mid-year reporting and six-monthly growth*) • Late October/Early November (for end-of-year reporting and six-monthly growth) • For Prep and Year 7 and new students at other levels • Beginning of the year (for base-line data) – but record as November the year before • Late May/Early June (for mid-year reporting and six-monthly growth) • Late October/Early November (for end-of-year reporting and six-monthly growth) * November results from the year before form the base-line data for the current year. (i.e. February testing is not required for Year 1 – Year 6 or for Year 8 – Year 10)