Standard setting and maintenance for Reformed GCSEs
Robert Coe
Defining ‘standards’
• Don’t
  • Think about criteria or intended meanings of grades
  • Think about subject-specific knowledge
• Do
  • Focus on when (and why) grades are treated interchangeably
  • Focus on the actual interpretations given to grades
How are grades interpreted?
• The claim by teachers in the recent GCSE English dispute that students who met the criteria deserve a C
  • The grade indicates specific competences within the subject domain that have been demonstrated on the assessment occasion.
• The use of a B in GCSE maths as a filter for A level study in maths
  • The grade indicates specific competences within the subject domain that the candidate is likely to be able to reproduce in the future.
• Employers requiring C in maths and English (or 5Cs)
  • The grade indicates competences transferable to employment contexts that the candidate is likely to be able to reproduce in the future.
• Use of GCSE results in league tables to judge schools
  • Average grades for a class or school (especially if referenced against prior attainment) indicate the impact (and hence quality) of the teaching experienced.
Standard setting and maintenance in high performing jurisdictions
Typology of methodologies for standard setting and maintaining
• Judgement-based methods
  • Criterion-based judgement
  • Item-based judgement
  • Comparative judgement
  • Judgement of demand
• Statistics-based methods
  • Classical equating models
  • IRT equating
  • Equating designs
    • Reference/anchor test
    • Common candidate methods
    • Pre-testing designs
  • Norm/cohort referencing
Jurisdictions
• England (GCSE & GCE)
• China (Gaokao)
• Finland (Matriculation Exam)
• Australia (NSW)
• USA (Delaware, Texas)
• South Korea
• Hong Kong
• PISA
Judgement against criteria (p49)
Pros
• Familiarity & (perceived) continuity with the current system
• Grades are readily interpretable in terms of performance (skills, knowledge)
• Provides a sense check on statistical methods
Cons
• Hard to develop criteria that are neither vague nor constrainingly narrow.
• Very difficult to maintain a standard – systemic grade inflation and annual fluctuations are likely.
• Undermined by ‘compensation’ – overall grades do not guarantee specific competences.
• Criterion-referenced interpretations are problematic if approach is blended with other (statistical) approaches.
• Awarding using collective judgement of experts is expensive (if done properly) and adds time to the process.
Norm referencing (p54)
Pros
• Simple to understand
• Quick and easy to apply
• Prevents spurious rises in grades (‘grade inflation’)
Cons
• Does not allow interchangeability of grades across years or subjects.
• Grades have no extrinsic meaning.
• Cannot measure change in performance over time.
• Big discontinuity with previous standards if all subjects have same norm.
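To make the norm-referencing trade-off concrete, here is a minimal Python sketch. The function name, the grade shares and the two cohorts are invented for illustration and do not come from the slides. It sets each boundary at the mark that gives a fixed share of the cohort that grade or better (only roughly that share if several candidates tie on the boundary mark).

```python
def norm_referenced_boundaries(marks, cumulative_shares):
    """Return the minimum mark per grade so that roughly the given
    cumulative share of candidates achieves that grade or better.

    Hypothetical helper for illustration only.
    """
    ranked = sorted(marks, reverse=True)
    n = len(ranked)
    boundaries = {}
    for grade, share in cumulative_shares.items():
        # Position of the last candidate who falls inside this share.
        k = max(1, round(share * n))
        boundaries[grade] = ranked[k - 1]
    return boundaries


# Assumed norm: top 20% get A or better, 50% B or better, 80% C or better.
shares = {"A": 0.2, "B": 0.5, "C": 0.8}

# Two invented cohorts; the second scores 10 marks higher throughout.
cohort_one = [30, 35, 40, 45, 50, 55, 60, 65, 70, 75]
cohort_two = [m + 10 for m in cohort_one]

print(norm_referenced_boundaries(cohort_one, shares))  # {'A': 70, 'B': 55, 'C': 40}
print(norm_referenced_boundaries(cohort_two, shares))  # {'A': 80, 'B': 65, 'C': 50}
```

Under these assumptions the boundaries simply move with the cohort, so both cohorts receive the same grade distribution even though the second performed better. That is the sense in which norm-referenced grades cannot measure change in performance over time.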
Scaling against a reference test (p62)
Pros
• Clear reference points for standard setting.
• Definite reference points for maintaining standards.
• Rigorous, academically supported (research papers etc.)
• Stops ‘grade inflation’
• Allows international comparison.
• Allows grades to be interpreted against construct.
• Can measure change in performance over time.
• Allows interchangeability of grades across years and subjects.
Cons
• New to UK, unfamiliar and potentially controversial (Rasch, etc).
• Complex. May be hard to explain to public and other stakeholders.
• Annual cost (financial and time) of additional testing.
• Initial development cost of reference test.
• Security of reference test may be hard to maintain.
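One way a reference test could anchor a grade across years is sketched below in Python. This is a simplified illustration, not the procedure proposed in the slides: it uses raw scores and a plain percentile link rather than Rasch or IRT scaling, it ignores sampling and measurement error, and all function names and data are hypothetical. The idea is to fix the reference-test score associated with a grade in year 1, measure what share of the year 2 cohort reaches that same score on the unchanged reference test, and then set the year 2 exam boundary to award the grade to that share.

```python
def share_at_or_above(scores, cut):
    """Fraction of candidates scoring at or above `cut`."""
    return sum(s >= cut for s in scores) / len(scores)


def cut_for_share(scores, share):
    """Lowest score such that roughly `share` of candidates reach it."""
    ranked = sorted(scores, reverse=True)
    k = max(1, round(share * len(ranked)))
    return ranked[k - 1]


def carry_boundary_forward(ref_year1, share_year1, ref_year2, exam_year2):
    """Hypothetical sketch: carry a grade standard from year 1 to year 2."""
    # Reference-test score tied to the grade in the base year.
    ref_cut = cut_for_share(ref_year1, share_year1)
    # Share of the new cohort reaching that same level of performance.
    share_year2 = share_at_or_above(ref_year2, ref_cut)
    # Exam boundary that awards the grade to that share of the new cohort.
    return share_year2, cut_for_share(exam_year2, share_year2)


# Invented data: the year 2 cohort does 2 marks better on the reference test.
ref_year1 = [10, 12, 14, 16, 18, 20, 22, 24, 26, 28]
ref_year2 = [s + 2 for s in ref_year1]
exam_year2 = [35, 40, 45, 50, 55, 60, 65, 70, 75, 80]

share, boundary = carry_boundary_forward(ref_year1, 0.5, ref_year2, exam_year2)
print(share, boundary)  # 0.6 55
```

In this toy example the share awarded the grade rises from 50% to 60% because the cohort genuinely did better on the fixed reference test. This is the contrast with norm referencing: performance change over time becomes visible in the grades.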
Recommendations (1-4)
1. The SS&M approach should combine elements of criterion-referencing, norm-referencing and scaling against a reference test.
2. There should be a clear statement of the interpretations of outcomes (grades or scores) that are intended or supported, and of any expected but unintended interpretations that are not supported.
3. Outcomes should be reported both as broad grades and as fine scores.
4. Development of a high-quality reference test must include piloting, psychometric analysis and validation against intended interpretations.
Recommendations (5-7)
5. The reference test should be taken during the normal examination window.
6. Initial standard setting should draw on a range of approaches including expert judgement against grade descriptors and specific competences, analysis of demand, Angoff and bookmark methods; using population norms as a guide and scaling against items from international benchmarks in a reference test.
7. Standards across subjects that may be treated interchangeably must be aligned.
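The Angoff method named in the initial standard-setting recommendation can be illustrated with a short Python sketch. In its basic form, each judge estimates, item by item, the probability that a borderline candidate would answer correctly; each judge's estimates are summed to give that judge's implied cut score, and the cut scores are averaged across judges. The function name and ratings below are hypothetical, and the sketch assumes one-mark items with no subsequent discussion or adjustment rounds.

```python
from statistics import mean


def angoff_cut_score(ratings):
    """Basic Angoff cut score for one-mark items.

    ratings[j][i] is judge j's estimated probability that a borderline
    candidate answers item i correctly (hypothetical input format).
    """
    # Each judge's expected total mark for a borderline candidate.
    per_judge = [sum(judge) for judge in ratings]
    # The panel's cut score is the mean of the judges' totals.
    return mean(per_judge)


# Invented ratings: three judges, four items.
ratings = [
    [0.5, 0.75, 0.25, 1.0],   # judge 1 total: 2.5
    [0.5, 0.5, 0.5, 0.5],     # judge 2 total: 2.0
    [0.75, 0.75, 0.5, 1.0],   # judge 3 total: 3.0
]

print(angoff_cut_score(ratings))  # 2.5
```

The spread of the per-judge totals (2.0 to 3.0 here) gives a rough sense of how far the panel agrees, which is one reason such judgemental methods are usually combined with statistical evidence.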
Recommendations (8-10)
8. Maintaining standards in subsequent years should depend largely on the reference test, supported by judgement methods and checked against changes in population norms.
9. A strategy for updating and releasing items from the reference test needs to be developed.
10. As far as possible, the principle of full transparency and disclosure of the SS&M procedures and results should be observed, supported by a strong communications strategy.