“Sterling Examples of Computer Simulations & OSCEs (Objective Structured Clinical Examinations)”
Carol O’Byrne, Jeffrey Kelley, Richard Hawkins, Sydney Smee
Presented at the 2005 CLEAR Annual Conference, September 15-17, Phoenix, Arizona
Session Format
• Introduction: 25+ years of Performance Assessment
• Presentations
• Richard Hawkins, National Board of Medical Examiners: overview of a new national OSCE
• Jeff Kelley, Applied Measurement Professionals: development of a new real estate computer simulation
• Sydney Smee, Medical Council of Canada: setting performance standards for a national OSCE
• Carol O’Byrne, Pharmacy Examining Board of Canada: scoring performance and reporting results to candidates for a national OSCE
• Q&A
Session Goals
• Consider the role and importance of simulations in a professional qualifying examination context
• Explore development and large-scale implementation challenges
• Observe how practice analysis results are integrated with the implementation of a simulation examination
• Consider options for scoring, standard setting and reporting to candidates
• Consider means to enhance fairness and consistency
• Identify issues for further research and development
Defining ‘Performance Assessment’
...the assessment of the integration of two or more learned capabilities, i.e., observing how a candidate performs a physical examination (technical skill) is not performance-based assessment unless findings from the examination are used for purposes such as generating a problem list or deciding on a management strategy (cognitive skills) (Mavis et al., 1996)
Why Test Performance?
To determine if individuals can ‘do the job’
• integrating knowledge, skills and abilities to solve complex client and practice problems
• meeting job-related performance standards
To complement MC tests
• measuring important skills, abilities and attitudes that are difficult or impossible to measure through MCQs alone
• reducing the impact of factors, such as cuing, logical elimination and luck or chance, that may confound MC test results
A 25+ Year Spectrum of Performance Assessment
• ‘Pot luck’ direct observation: apprenticeship, internship and residency programs
• Oral and paper-and-pencil short- or long-answer questions
• Hands-on job samples: military, veterinary medicine, mechanics, plumbers
• Portfolios: advanced practice, continuing competency
Simulations
• Electronic: architecture, aviation, respiratory care, real estate, nursing, medicine, etc.
• Objective Structured Clinical Examination (OSCE): medicine, pharmacy, physiotherapy, chiropractic medicine, massage therapy, as well as the legal profession, psychology, and others
Simulation Promotes Evidence-based Testing…
1900 Wright brothers flight test: flew a manned kite 200 feet in 20 seconds
1903 Wright brothers flight test: flew a manned, powered aircraft 852 feet in 59 seconds, 8 to 12 feet in the air!
In between they built a wind tunnel
• to simulate flight under various wind direction and speed conditions, varying wing shapes, curvatures and aspect ratios
• to test critical calculations and glider lift
• to assess performance in important and potentially risky situations without incurring actual risk
Attitudes, Skills and Abilities Tested through Simulations
Attitudes:
• client centeredness
• alignment with ethical and professional values and principles
Skills:
• interpersonal and communication
• clinical, e.g. patient/client care
• technical
Abilities to:
• analyze and manage risk, exercise sound judgment
• gather, synthesize and critically evaluate information
• act systematically and adaptively, independently and within teams
• defend, evaluate and/or modify decisions/actions taken
• monitor outcomes and follow up appropriately
Performance / Simulation Assessment Design Elements
• Domain(s) of interest and sampling plan
• Realistic context: practice-related problems and scenarios
• Clear, measurable performance standards
• Stimuli and materials to elicit performance
• Administrative, observation and data collection procedures
• Assessment criteria that reflect standards
• Scoring rules that incorporate assessment criteria
• Cut scores/performance profiles reflecting standards
• Quality assurance processes
• Meaningful data summaries for reports to candidates and others
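To suggest how these design elements might hang together in practice, here is a minimal sketch of a machine-readable station blueprint. Every field name and value below is invented for illustration; it is not drawn from any of the presenters’ programs.

```python
# Hypothetical OSCE/simulation station blueprint mapping the design elements
# above (domain, scenario, stimuli, criteria, scoring rules, cut score, QA)
# onto a single data structure. Purely illustrative.
station_blueprint = {
    "domain": "patient assessment and management",
    "scenario": "ambulatory patient presenting with uncontrolled hypertension",
    "performance_standard": "entry-to-practice",
    "stimuli": ["standardized patient script", "medication profile", "lab values"],
    "administration": {
        "time_limit_min": 10,
        "observers": 1,
        "data_capture": "checklist + global rating",
    },
    "assessment_criteria": [
        "gathers relevant history",
        "identifies the key problem",
        "recommends appropriate management",
        "communicates clearly and respectfully",
    ],
    "scoring_rule": {"checklist_weight": 0.6, "global_rating_weight": 0.4},
    "cut_score": 0.60,
    "quality_assurance": ["double scoring of a sample", "post-administration item review"],
}

print(station_blueprint["scenario"])
```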
Score Variability and Reliability
• Multiple factors interact and influence scores
• differential and compensatory aptitudes of candidates (knowledge, skills, abilities, attitudes)
• format, difficulty and number of tasks or problems
• consistency of presentation between candidates, locations and occasions
• complex scoring schemes (checklists, ratings, weights)
• rater consistency between candidates, locations and occasions
• Designs are often complex (not crossed)
• examinees ‘nested’ within raters, within tasks, within sites, etc.
• Problems and tasks are multidimensional
Analyzing Performance Assessment Data
• Generalizability (G) studies – to identify and quantify sources of variation
• Dependability (D) studies – to determine how to minimize the impact of error and optimize score reliability
• Hierarchical linear modeling (HLM) studies – to quantify and rank sources of variation in complex nested designs
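To make the G-study and D-study ideas concrete, the sketch below assumes a simple fully crossed persons x tasks design with simulated scores (the sample sizes and variance values are invented, not taken from any of the programs discussed). It estimates variance components from a two-way ANOVA decomposition, then projects the generalizability coefficient for different numbers of tasks, which is the kind of projection a D-study provides.

```python
# Minimal G-study / D-study sketch for a fully crossed persons x tasks design.
# Illustrative only: data, sample sizes and variances are assumptions.
import numpy as np

rng = np.random.default_rng(42)

n_persons, n_tasks = 50, 8
# Simulated scores: person ability + task difficulty + residual (interaction + error)
person_effect = rng.normal(0, 1.0, size=(n_persons, 1))
task_effect = rng.normal(0, 0.5, size=(1, n_tasks))
residual = rng.normal(0, 1.2, size=(n_persons, n_tasks))
scores = 50 + person_effect + task_effect + residual

# Two-way ANOVA sums of squares (one observation per person-task cell)
grand = scores.mean()
person_means = scores.mean(axis=1)
task_means = scores.mean(axis=0)

ss_p = n_tasks * np.sum((person_means - grand) ** 2)
ss_t = n_persons * np.sum((task_means - grand) ** 2)
ss_res = np.sum((scores - person_means[:, None] - task_means[None, :] + grand) ** 2)

ms_p = ss_p / (n_persons - 1)
ms_t = ss_t / (n_tasks - 1)
ms_res = ss_res / ((n_persons - 1) * (n_tasks - 1))

# G study: variance components from expected mean squares
var_res = ms_res                        # person x task interaction + error
var_p = (ms_p - ms_res) / n_tasks       # persons (the variance we want to generalize)
var_t = (ms_t - ms_res) / n_persons     # tasks (difficulty differences)

# D study: relative G coefficient projected for different test lengths
for nt in (4, 8, 12, 16):
    g_coef = var_p / (var_p + var_res / nt)
    print(f"{nt:2d} tasks -> G coefficient ~ {g_coef:.2f}")
```

The D-study loop also illustrates the later “Wisdom Bytes” point: adding tasks shrinks the error term dividing by the number of tasks, which typically improves generalizability more than adding raters to the same tasks.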
Standard Setting
• What score or combination of scores (profile) indicates that the candidate is able to meet expected standards of performance, thereby fulfilling the purpose(s) of the test?
• What methods can be used to determine this standard?
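As one illustration of methods that can answer the second question, the sketch below computes cut scores two common ways for performance tests: a modified Angoff panel estimate and a borderline-group cut score. The judge ratings and candidate scores are invented placeholders; neither the methods chosen nor the numbers reflect the presenters’ actual procedures.

```python
# Hedged sketch of two standard-setting approaches for an OSCE-style exam.
import statistics

# Modified Angoff: each judge estimates the proportion of minimally competent
# candidates expected to succeed on each station; the cut is the overall mean.
angoff_ratings = {
    "station_1": [0.60, 0.55, 0.65],
    "station_2": [0.70, 0.75, 0.70],
    "station_3": [0.50, 0.45, 0.55],
}
angoff_cut = statistics.mean(statistics.mean(r) for r in angoff_ratings.values())

# Borderline group: examiners flag candidates as borderline; the cut is the
# median (or mean) station score of that group.
borderline_scores = [58, 61, 55, 63, 59, 60]
borderline_cut = statistics.median(borderline_scores)

print(f"Angoff cut (proportion correct): {angoff_cut:.2f}")
print(f"Borderline-group cut (station score): {borderline_cut}")
```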
Reporting Results to Candidates
Pass-fail (classification)
May also include:
• Individual test score and passing score
• Sub-scores by objective(s) and/or other criteria
• Quantile standing among all candidates, or among those who failed
• Group data: score ranges, means, standard deviations
• Reliability and validity evidence (narrative, indices and/or error estimates and their interpretation)
• Other
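A minimal sketch of how such a report might be assembled, assuming a simple total-score examination: the passing score, cohort scores and field names are hypothetical, and a real program would add sub-scores and measurement-error information as listed above.

```python
# Hypothetical candidate results report: classification, percentile standing
# and group statistics. All values are invented for illustration.
import statistics

def candidate_report(candidate_id, score, passing_score, all_scores):
    below = sum(1 for s in all_scores if s < score)
    percentile = 100 * below / len(all_scores)
    return {
        "candidate": candidate_id,
        "result": "PASS" if score >= passing_score else "FAIL",
        "score": score,
        "passing_score": passing_score,
        "percentile": round(percentile, 1),
        "group_mean": round(statistics.mean(all_scores), 1),
        "group_sd": round(statistics.stdev(all_scores), 1),
        "group_range": (min(all_scores), max(all_scores)),
    }

cohort = [52, 61, 66, 70, 58, 74, 63, 69, 55, 72]
print(candidate_report("C-1023", 66, 60, cohort))
```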
Some Validity Questions
• Exactly what are we measuring with each simulation? Does it support the test purpose?
• To what extent is each candidate presented with the same or equivalent challenges?
• How consistently are candidates’ performances assessed, no matter who or where the assessor is?
• Are the outcomes similar to findings in other comparable evaluations?
• How ought we to inform and report to candidates about performance standards/expectations and their own performance strengths/gaps?
Evaluation Goals
Validity evidence
• Strong links from job analysis to interpretation of test results
• Simulation performance relates to performance in training and on other tests of similar capabilities
• Reliable, generalizable scores and ratings
• Dependable pass-fail (classification) standards
Feasibility and sustainability
• For program scale (number of candidates, sites, etc.)
• Economic, human, physical and technological resources
Continuous evaluation and enhancement plan
Wisdom Bytes
• Simulations should be as true to life as possible (fidelity)
• Simulations should test capabilities that cannot be tested in more efficient formats
• Simulation tests should focus on the integration of multiple capabilities rather than on a single basic capability
• The nature of each simulation/task should be clear, but candidates should be ‘cued’ only as far as is realistic in practice
• Increasing the number of tasks contributes more to the generalizability and dependability of results than increasing the number of raters
Expect the Unpredictable…
Candidate diversity
• Language
• Training
• Test format familiarity
• Accommodation requests
Logistical challenges
• Technological glitches
• Personnel fatigue and/or attention gaps
• Site variations
Security cracks
• Test content exposure in prep programs and study materials, in various languages