We explore how well judges predict candidates' exam performance under the modified Angoff method of criterion-referenced standard setting, comparing the predictions of senior examiners with those of post-membership trainees for the MRCPCH Part II written examination.
Criterion Referencing Judges: Who are the best predictors?
Kumar P, Dinwiddie R, Davis L, Muir-Davies A, Muir G, Newell SJ
Criterion referencing of the Part II written paper
• The RCPCH uses a modified Angoff method to set the standard for the MRCPCH Part II written paper and determine the pass mark.
• A panel of judges, trained in the Angoff methodology, is asked to estimate the proportion of a group of ‘borderline’ candidates they would expect to answer each question correctly.
Angoff Methodology
• In 1971, William Angoff proposed a method of assessment in which “a panel of judges are asked to independently think of a ‘borderline’ candidate who would answer the question correctly”.
• When Angoff first proposed the method, his instruction was to think of only one candidate. In the modified method, however, a hypothetical pool of 100 borderline candidates is used.
RCPCH Judging Panel
• The panel usually comprises 7 or 10 judges.
• The judges are a mix of senior examiners and post-membership trainees (SpRs).
• Each judge attends an Angoff training day before taking part in the judging process.
• We were interested to see who was better at predicting the candidates' exam performance.
The Borderline Candidate
• The concept of the borderline candidate is at the core of the Angoff methodology, and judges should keep it in mind throughout the procedure.
• The borderline candidate is the “just passing” candidate, who has a 50% chance of passing the exam.
Modified Angoff Method for MRCPCH Part II
• First step: Homework Grade. Each panellist judges all of the questions independently, giving their estimate of the proportion of borderline candidates they would expect to answer each question correctly.
• Second step: Angoff Day. Shortly after the exam has been sat, the judges meet; all the individual judgements from the first step are shown and discussed, and the judges then judge again.
• Third step: Presentation of Normative Data. The judges are shown the proportion of candidates who actually answered each question correctly, and are asked to judge again.
• Final step: Consensus Grade. The mean of the judges' final judgements becomes the pass mark for that question (a computational sketch follows below).
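The arithmetic of the final step is simple enough to sketch in code. The following is a minimal illustration in Python, not RCPCH code: the judge estimates, the two example questions and the equal weighting of questions across the paper are all assumptions made for the example.

```python
from statistics import mean

# Final-round estimates: for each question, each judge's estimate (as a
# percentage) of the borderline candidates expected to answer correctly.
# Seven judges and two questions are invented for illustration.
final_judgements = {
    "Q1": [55, 60, 65, 50, 70, 60, 55],
    "Q2": [40, 45, 50, 45, 35, 50, 40],
}

# The consensus grade (pass mark) for each question is the mean of the
# judges' final judgements for that question.
question_cut_scores = {q: mean(ests) for q, ests in final_judgements.items()}

# A pass mark for the whole paper, assuming equally weighted questions,
# is then the mean of the per-question cut scores.
paper_pass_mark = mean(question_cut_scores.values())

print(question_cut_scores)          # {'Q1': 59.28..., 'Q2': 43.57...}
print(round(paper_pass_mark, 2))    # 51.43
```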
Who are the best predictors?
• Hypothesis: senior examiners are better than the SpRs at predicting the candidates' exam performance.
Method
• In 2007, three exam papers were criterion referenced using the modified Angoff method described above.
• 8000 judgements were analysed.
• Individual judges' homework grades were compared with the consensus Angoff grade to see whether there was a difference between the examiners and the SpRs (a sketch of such a comparison follows this list).
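The poster does not state which statistical test produced the p-values reported in the results; as a hedged illustration, the sketch below compares the two groups' homework grades with a Mann-Whitney U test, one plausible nonparametric choice given that medians are reported. All grades in the sketch are invented.

```python
from statistics import median
from scipy.stats import mannwhitneyu

# Hypothetical homework grades (%) for the two groups of judges;
# the real data set comprised some 8000 judgements.
examiner_grades = [50, 55, 60, 45, 50, 55, 60]
spr_grades      = [60, 65, 70, 75, 60, 65, 70]

print("examiner median:", median(examiner_grades))
print("SpR median:     ", median(spr_grades))

# Nonparametric comparison of the two groups (an assumed choice of test).
stat, p = mannwhitneyu(examiner_grades, spr_grades, alternative="two-sided")
print(f"Mann-Whitney U = {stat:.1f}, p = {p:.4f}")
```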
Results
• In 2007, the judging panel was 33% examiners and 67% SpRs.
• In the 2007(1) and 2007(2) diets, there was no difference between the homework grades and the consensus Angoff rating (median 60%, p > 0.05).
• In 2007(3), the median homework grade was 50% for the examiners and 70% for the SpRs.
• The percentage change between the homework grade and the final consensus grade was similar in two of the diets: +3.16% for 2007(1) and +3.96% for 2007(2) (see the note on the formula after this list).
• In 2007(3), the percentage change between the judges' homework grade and the final consensus grade was +13.3% for the examiners and -6.67% for the SpRs.
• The examiners gave a lower final pass mark (63.3%) than the SpRs (66.7%).
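The poster does not define “percentage change” exactly; the two natural readings are a relative change and an absolute percentage-point difference. As a hedged sketch, writing the homework grade as g_h and the final consensus grade as g_c:

```latex
% Two plausible definitions (assumptions, not stated on the poster):
% relative change from homework grade g_h to final consensus grade g_c,
\[ \Delta_{\mathrm{rel}} = \frac{g_c - g_h}{g_h} \times 100\% \]
% or the absolute difference in percentage points:
\[ \Delta_{\mathrm{abs}} = g_c - g_h \]
% e.g. g_h = 60, g_c = 62 gives Delta_rel = +3.33% but Delta_abs = +2 points.
```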
Discussion
• There was little difference in the median gradings between the examiners and the SpRs.
• The SpRs were better at predicting consensus scores and the candidates' performance.
• The SpRs gave a slightly higher final pass mark than the examiners.
• The examiners had a higher mean percentage change in grades between their first and final vote.
• The selection procedure for judges must consider not only judge expertise but also the anticipated ability to conceptualise a pool of ‘borderline’ examinees.
• A judge's concept of minimal competence should remain constant over the entire process, uninfluenced by exposure to test items, panel discussion or fatigue.
• It is recommended that the judges' knowledge of the examinee population be considered when selecting judges (Berk 1986).
• The more confident judges are about their expertise, the less likely they are to be unduly swayed by empirical data.
• Judges should have a good understanding of how examinees think during the exam (Jaeger 1991).
• Taking the test themselves is one way for judges to better understand the thought processes of the test takers (Hambleton and Plake 1995).
• The greater the agreement between item estimates and actual performance data, the lower the error associated with the cut-off score (Kane and Wilson 1984).
We propose a number of reasons for our results:
• The trainees have recently sat the examination themselves and so understand the thought processes of the candidates.
• They work closely with the examinees and help them build their knowledge base for the exam.
• They can contribute to an open discussion without introducing bias.
• They can conceptualise a borderline candidate throughout the Angoff process.
Summary
• The RCPCH judging panel for the Part II written paper is unique in that it comprises both senior examiners and SpRs.
• The data show that the SpRs have a similar ability to the examiners in predicting final scores, and are good predictors of consensus scores and the candidates' performance.
• This supports our policy of including trainees as judges in the criterion referencing of this high-stakes exam.
References
• Berk, R.A. (1986). A consumer's guide to setting performance standards on criterion-referenced tests. Review of Educational Research, 56, 137-172.
• Hambleton, R.K., and Plake, B.S. (1995). Using an extended Angoff procedure to set standards on complex performance assessments. Applied Measurement in Education, 8, 41-55.
• Jaeger, R.M. (1991). Selection of judges for standard setting. Educational Measurement: Issues and Practice, 10(2), 3-6, 10.
• Kane, M.T., and Wilson, J. (1984). Errors of measurement and standard setting in mastery testing. Applied Psychological Measurement, 8, 107-115.
• Plake, B.S., and Impara, J.C. (2001). Ability of panelists to estimate item performance for a target group of candidates: An issue in judgemental standard setting. Educational Assessment, 7, 87-97.