Evaluation Rating Forms

Evaluation Rating Forms • Craig McClure, MD • May 15, 2003 • Educational Outcomes Service Group

Typical Use of Rating Scales • End of rotation (global) • After single encounter (focused) • To incorporate input from multiple evaluators • Videotaped encounters • NOT as a Yes/No checklist for single encounters

Alternate Forms • Multiple episodes versus focused (single) episode • Measuring global (six domains) versus task-specific behavior

Global Rating of Learner • Domains of competence, not specific skills, tasks, or behaviors • Completed retrospectively concerning multiple days and activities • May be from multiple sources • Use rating scales

Focused Rating Scale • Single patient encounter • Concerning specific task, skill, behavior

Advantages (Global) • Easy to develop • Easy to use (training minimal) • Can be used to evaluate all domains • Reasonable reliability when the evaluation is focused and tailored to the competencies measured

Systematic Rater Errors (Global) • Leniency/Severity • Range Restriction • Halo Effect • Inappropriate Weighting
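
Two of the rater errors listed above, leniency/severity and range restriction, can be screened for with simple per-rater summaries. The following is a minimal sketch, not a validated method: the sample data, the function name `screen_raters`, and every threshold are hypothetical and chosen only for illustration. Halo effect and inappropriate weighting need per-domain data and are not covered here.

```python
from statistics import mean, pstdev

# Hypothetical data: each rater's scores on a 1-5 scale across several learners.
ratings = {
    "rater_a": [5, 5, 4, 5, 5],  # consistently high
    "rater_b": [2, 4, 3, 5, 1],  # uses the whole scale
    "rater_c": [3, 3, 3, 3, 3],  # never leaves the midpoint
}


def screen_raters(ratings, lenient_above=4.5, severe_below=1.5, min_spread=0.5):
    """Map each rater to a list of possible systematic-error flags.

    All thresholds are illustrative assumptions, not validated cutoffs.
    """
    flags = {}
    for rater, scores in ratings.items():
        found = []
        avg = mean(scores)
        spread = pstdev(scores)  # population standard deviation of the scores
        if avg >= lenient_above:
            found.append("possible leniency")
        elif avg <= severe_below:
            found.append("possible severity")
        if spread < min_spread:
            found.append("possible range restriction")
        flags[rater] = found
    return flags


for rater, found in screen_raters(ratings).items():
    print(rater, found or "no flags")
```

A flag is only a prompt for review: a rater who happened to work with strong learners would also show a high mean and a narrow spread.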

Drawbacks (Global) • Content validity uncertain • Questionable validity of general assessments extrapolated to whole domain • Inefficient at directing learner improvement • Accuracy variable • Generosity factor • Poor discrimination between learners

Mixed Research Results • Discriminating between competence levels • Reliably rating more skilled physicians higher than less skilled • Reliability of ratings • Reproducibility • Best: knowledge • Harder: patient care, interpersonal skills

Clarify Evaluative Objectives • Global versus focused • Define using competency-based language emphasized by ACGME

Group the Competencies • Patient Care • Medical Knowledge • Practice-Based Learning and Improvement • Interpersonal and Communication Skills • Professionalism • Systems-Based Practice

Composition of Form • Short is better than long • Big font is better than small • Clean better than cluttered

Each Behavior Is Evaluated Independently • Otherwise: • Evaluator uncertain what to evaluate • Learner uncertain what to address

Decide on Options in the Scale • Best if minimum of five • Best if a descriptor present for each • Absence of middle labels skews ratings toward the positive side

Primacy Effect • “The results showed that when the positive side of the scale was on the left, the ratings were more positive and had reduced variance than when the positive label was on the right.”

Lake Wobegon Effect • Where all the children are above average • Faculty tend to interpret anchors as more negative than their literal meaning • Generosity effect

Consider Changing Anchors • If you wish to keep evaluative anchors: • Poor, fair, below average, average, above average, and excellent • Very poor, poor, fair, good, very good, excellent

Consider Using Frequency Anchors • Frequency of observable resident behaviors from “never” to “always” • Judgmental ratings, by contrast, need considerable evaluator education to minimize inter-rater variability • Permits competency judgment by the program director (PD)

Example of Stem for Frequency Anchor • Resident demonstrates respect in speaking to patient… • Never • 25% • 50% • 75% • Always

Competency Judgment at Program Level • Permits competency definitions to vary by year of training • Diminishes effect of inter-rater variability • Focuses on observable behavior • Requires less training of evaluators
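
The separation described above, where evaluators report frequencies and the program makes the competency judgment, can be sketched as follows. This is an illustration under stated assumptions, not the method used by any program: the numeric values assigned to the anchors, the `REQUIRED_BY_YEAR` thresholds, and the function name `program_judgment` are all hypothetical.

```python
from statistics import mean

# Frequency anchors from the example stem, mapped to percentages.
# Treating "Never" as 0 and "Always" as 100 is an assumption.
FREQUENCY_ANCHORS = {"Never": 0, "25%": 25, "50%": 50, "75%": 75, "Always": 100}

# Hypothetical program-level standards that rise with year of training.
REQUIRED_BY_YEAR = {1: 50, 2: 65, 3: 80}


def program_judgment(responses, training_year):
    """Pool evaluators' frequency responses and compare to the year's standard.

    Evaluators only report what they observed; the competency decision
    is made here, at the program level.
    """
    observed = mean(FREQUENCY_ANCHORS[r] for r in responses)
    required = REQUIRED_BY_YEAR[training_year]
    return {
        "observed": observed,
        "required": required,
        "meets_standard": observed >= required,
    }


# Four evaluators rate the same resident on one stem.
responses = ["75%", "Always", "50%", "75%"]
print(program_judgment(responses, training_year=2))
print(program_judgment(responses, training_year=3))
```

The same pooled observations can meet the standard in one training year and fall short in a later one, which is how the competency definition varies by year without asking evaluators to adjust their ratings.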

References • Evaluations, S. Swing, Academic Emergency Medicine 2002;9:1278-88 • Assessment of Communication and Interpersonal Skills Competencies, Academic Emergency Medicine 2002;9:1257-69 • ACGME/ABMS Joint Initiative Toolbox of Assessment Methods, September 2000

References (2) • Challenges in using rater judgments in medical education, M.A. Albanese, Journal of Evaluation in Clinical Practice 6(3):305-319
