MAKING APPROPRIATE PASS-FAIL DECISIONS DWIGHT HARLEY, Ph.D. DIVISION OF STUDIES IN MEDICAL EDUCATION, UNIVERSITY OF ALBERTA
PASSING SCORES • Essential component of high-stakes exams • Reaffirm standards • Their purpose is to ensure that qualified candidates pass and unqualified candidates do not pass • How much is enough? • Is 50% the passing score on this exam?
REAFFIRMING STANDARDS • Performance standard • Minimally adequate level of performance to enter practice • Passing score • Point on the score scale which separates those who are successful and those who are not
THE BASIS FOR PASSING SCORES • Arbitrary judgment unavoidable • Reflect consensus of experts on reasonable expectations for evidence of competence • Imposing discrete categories on a continuum • Set to serve the interests of public and profession • Process should be as open as possible • Based on as much relevant data as possible • Rationale presented as clearly as possible
PROCESS OF SETTING PASSING SCORES • Unreasonable to expect 100% correct • Possible to construct tests with predetermined passing scores • Possible to adjust passing scores to achieve an acceptable pass rate • Possible to estimate a minimum passing score by combining estimates of the importance of individual test items
PASSING SCORE LEVEL • Determined by the situation and purpose • Provide society with enough sufficiently competent practitioners • Raising the passing score increases the average competence of those who pass but decreases their number • Proportions passing should remain constant • The more relevant and demanding the requirements for writing the test, the fewer are expected to fail • If more than a small proportion of candidates who have met those requirements fail the exam, its validity may be subject to serious challenge.
CRITERIA FOR DEFENSIBILITY A standard setting method should … • produce appropriate classification information • be sensitive to candidate performance • be sensitive to instruction • be statistically sound • identify the “true” standard • be easy to implement and compute • be credible and easily interpretable by lay people
STANDARD SETTING METHODS • More than 3 dozen methods • Some of the better known methods include • Nedelsky • Angoff • Bookmark • Ebel • Jaeger • IRT methods
“THE INDUSTRY STANDARD” The Angoff Method is: • the most commonly used method • convenient to use • well-researched • easily explained • easily customized • applicable to several response formats
ANGOFF METHOD • Judges assign probabilities that a hypothetical minimally competent borderline candidate will be able to answer each item correctly. • For each judge, probabilities are summed to get a minimum performance level (MPL) • MPLs are averaged to get a final passing score
MINIMALLY COMPETENT • The effectiveness of the Angoff method rests on the judges’ ability to accurately conceptualize a “minimally competent, borderline candidate.” • Repeated reference to a formal summary of the behaviours and performance indicators is required • Judge training and calibration are essential
ANGOFF CALCULATIONS Passing score for this test is 3.1 items correct out of 5.
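The table behind this slide is not reproduced here, so the following is only a minimal sketch of the calculation. The judge names and ratings are hypothetical, invented for illustration and not taken from the original slide. Each judge's item probabilities are summed into an MPL, and the MPLs are averaged.

```python
from statistics import mean

# Hypothetical ratings (NOT the values from the original slide): each judge's
# estimated percent chance that a minimally competent borderline candidate
# answers each of the 5 items correctly.
ratings_percent = {
    "judge_a": [50, 60, 70, 60, 60],
    "judge_b": [60, 70, 60, 70, 60],
}


def angoff_passing_score(ratings):
    """Sum each judge's item probabilities into an MPL, then average the MPLs."""
    # Percent points are summed as integers to avoid floating-point drift.
    mpl_percent = {judge: sum(items) for judge, items in ratings.items()}
    mpls = {judge: total / 100 for judge, total in mpl_percent.items()}
    passing_score = mean(list(mpl_percent.values())) / 100
    return mpls, passing_score


mpls, passing_score = angoff_passing_score(ratings_percent)
print("MPL per judge:", mpls)
print("Passing score (items correct):", passing_score)
```

With these made-up ratings the two MPLs are 3.0 and 3.2, so the passing score works out to 3.1 items correct out of 5.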
A MINOR VARIANT • Judges are asked to imagine a pool of 100 minimally competent borderline students and then estimate the number of these students who would answer the item correctly • Reduces cognitive complexity of the task
VARIATIONS ON A THEME • Scales • Iterative process • Feedback between rounds • Judges’ results • Past item performance • p-values • % passing • Yes/No procedure
SCALES • Probability scales are sometimes provided to simplify the process. For example: • 5%, 20%, 40%, 60%, 75%, 90%, 95% • 0%, 5%, 10%, 15% … 95%, 100% • 20%, 25%, 30% … 95%, 100%
ANGOFF WITH ITERATION • Most commonly used modification. • “Angoff-ing” is done a number of times. • Time between rounds is used for discussion among judges. • Intent is to reduce variability among judges on item estimates.
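One way to see whether discussion is reducing variability is to measure the spread of the judges' estimates on each item after every round. The sketch below uses the sample standard deviation and a cut-off of 0.15 for deciding which items to discuss. Both choices, and all of the ratings, are assumptions made for illustration, not part of the method as presented.

```python
from statistics import mean, stdev

# Hypothetical item estimates (probabilities) from three judges on three items,
# before and after discussion. All values are invented for illustration.
round_1 = {
    "item_1": [0.40, 0.70, 0.90],
    "item_2": [0.60, 0.65, 0.70],
    "item_3": [0.30, 0.50, 0.80],
}
round_2 = {
    "item_1": [0.55, 0.65, 0.75],
    "item_2": [0.60, 0.65, 0.70],
    "item_3": [0.45, 0.50, 0.60],
}


def judge_spread(estimates):
    """Sample standard deviation of the judges' estimates for each item."""
    return {item: stdev(values) for item, values in estimates.items()}


def items_to_discuss(estimates, threshold=0.15):
    """Items whose spread exceeds the (hypothetical) threshold."""
    return [item for item, sd in judge_spread(estimates).items() if sd > threshold]


print("Discuss after round 1:", items_to_discuss(round_1))
print("Discuss after round 2:", items_to_discuss(round_2))
print("Mean spread, round 1:", round(mean(judge_spread(round_1).values()), 3))
print("Mean spread, round 2:", round(mean(judge_spread(round_2).values()), 3))
```

Items flagged after a round are natural candidates for the between-round discussion. Once nothing is flagged, a further round is unlikely to change much.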
NORMATIVE DATA • Normative or impact data is presented just prior to the final iteration. • Improves inter-rater reliability. • Greatest impact on items that have been greatly over- or underestimated.
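A rough sketch of how past item performance might be set beside the judges' estimates follows. The data, the 0.20 tolerance, and the function name `flag_misjudged_items` are all hypothetical. A p-value describes all past candidates rather than borderline ones, so a gap is only a prompt for discussion, not proof of a misjudged item.

```python
from statistics import mean

# Hypothetical data: judges' Angoff estimates per item and the observed
# p-value (proportion of all past candidates answering correctly).
angoff_estimates = {
    "item_1": [0.80, 0.90, 0.85],
    "item_2": [0.60, 0.55, 0.65],
    "item_3": [0.30, 0.40, 0.35],
}
observed_p_values = {"item_1": 0.45, "item_2": 0.62, "item_3": 0.75}


def flag_misjudged_items(estimates, p_values, tolerance=0.20):
    """Label items whose mean estimate is far from the observed p-value."""
    flags = {}
    for item, values in estimates.items():
        gap = mean(values) - p_values[item]
        if gap > tolerance:
            flags[item] = "overestimated"
        elif gap < -tolerance:
            flags[item] = "underestimated"
    return flags


flags = flag_misjudged_items(angoff_estimates, observed_p_values)
print(flags)
```

In this invented example, item_1 is flagged as overestimated and item_3 as underestimated, while item_2 sits close to its p-value and is left alone.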
YES/NO PROCEDURE • Judges decide whether a single minimally competent borderline student would answer the item correctly • Attempt to simplify the cognitive complexity of the judges’ task • Comparable results to the traditional method
YES/NO CALCULATIONS • Passing score = Average of MPLs = (3 + 2)/2 = 2.5 items correct
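The same arithmetic can be sketched in code. The item-level yes/no ratings below are hypothetical, and the five-item test length is an assumption. Only the two MPLs (3 and 2) come from the slide.

```python
from statistics import mean

# Hypothetical yes/no judgments on an assumed 5-item test. True means the judge
# expects a minimally competent borderline student to answer correctly.
yes_no_ratings = {
    "judge_a": [True, True, False, True, False],   # 3 "yes" judgments
    "judge_b": [True, False, False, True, False],  # 2 "yes" judgments
}


def yes_no_passing_score(ratings):
    """Count each judge's 'yes' judgments as an MPL, then average the MPLs."""
    mpls = {judge: sum(answers) for judge, answers in ratings.items()}
    return mpls, mean(list(mpls.values()))


yes_no_mpls, yes_no_score = yes_no_passing_score(yes_no_ratings)
print("MPL per judge:", yes_no_mpls)
print("Passing score (items correct):", yes_no_score)
```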
IN AN EMERGENCY • When a committee is not available, Angoff-ing can be done solo • Assign Angoff values to each item and sum the values • Ask a colleague to review your Angoff assignments • Use an item analysis as a reality check
ROUNDING PASSING SCORES • Rarely do derived passing scores produce exact whole numbers • Rounding may have an impact on the pass/fail rate • Consider the consequences of rounding
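To make the consequences concrete, the sketch below applies a derived passing score of 3.1 on a 5-item test to a set of candidate scores, rounding the cut score first down and then up. The candidate scores are hypothetical.

```python
import math


def pass_rate(scores, cut_score):
    """Proportion of candidates scoring at or above the cut score."""
    return sum(score >= cut_score for score in scores) / len(scores)


derived_passing_score = 3.1

# Hypothetical candidate scores (items correct out of 5).
candidate_scores = [2, 3, 3, 4, 5, 3, 4, 1, 5, 3]

rate_rounded_down = pass_rate(candidate_scores, math.floor(derived_passing_score))
rate_rounded_up = pass_rate(candidate_scores, math.ceil(derived_passing_score))

print("Pass rate with cut score 3:", rate_rounded_down)
print("Pass rate with cut score 4:", rate_rounded_up)
```

In this invented cohort, rounding down passes 8 of 10 candidates and rounding up passes 4 of 10. On short tests the direction of rounding can matter as much as the standard-setting exercise itself.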