Explore basic principles of educational measurement by creating a tool for assessing chocolates. Learn to identify and control errors in performance-based measures: develop criteria and a rating scale, train raters, and evaluate reliability.
A “Sweet Approach” to Understanding Basic Principles of Educational Measurement: Assessing the Performance of Chocolates
Objectives
At the end of instruction, participants will:
• Describe sources of error that threaten the reliability and validity of performance assessment measures
• List specific strategies to address these threats
• Define and appropriately employ performance measurement terminology
  • Anchors, Likert scales, horns and halo effects
• Develop and test a performance-based measurement instrument
  • Construct the scale
  • Train raters to use it effectively
  • Assess the validity and reliability of the measures
  • Identify and explain sources of error
Lesson Plan: Set the Task
• You are a judge at the Wisconsin State Fair for “Open Class” commercial chocolates
• Develop key factors on which to rate chocolates
• Develop the rating scale
• Train other “judges”
• Taste and rate the chocolates
• Overview of key measurement principles
  • Goal: identify sources of error and strategies to control them
• The step-by-step approach to this task mirrors the process used in educational measurement
Timeline
• Introduction to measurement: 20 min
• Development of criteria: 20 min
• Develop scale: 10 min
• Train raters: 10 min
• Sample and rate chocolates: 20 min
• Identify sources of error: 10 min
Underlying Assumption of Performance-Based Measures
• An individual's observed performance/score is a combination of:
  • True Score + Errors of Measurement
    • Random
    • Controllable
• All measurement seeks to control errors so that the measured score = true score
• Does the OBSERVED score = TRUE score? (a simulation sketch follows)
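To make the assumption concrete, here is a minimal Python sketch, not from the original slides, that simulates observed scores as a true score plus random noise and a systematic rater bias; all numbers are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

n_subjects = 100
true_scores = rng.normal(loc=75, scale=8, size=n_subjects)   # latent "true" performance
random_error = rng.normal(loc=0, scale=5, size=n_subjects)   # noise we cannot control
systematic_error = 3.0                                       # hypothetical lenient-rater bias

observed = true_scores + random_error + systematic_error

# With error present, observed scores only approximate true scores
print(f"Mean true score:     {true_scores.mean():.1f}")
print(f"Mean observed score: {observed.mean():.1f}")
print(f"Correlation(true, observed): {np.corrcoef(true_scores, observed)[0, 1]:.2f}")
```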
Familiar EBM Terminology
• Validity:
  • Relevance: does the measure actually reflect the variable of interest?
  • Appropriateness: relevant to the purpose of the study
  • Meaningfulness: the measure reflects the variable of interest
  • Usefulness: aids decision-making
• Accuracy: is the measurement free from error?
  • Random error: e.g., varies with who happens to take the test
  • Systematic error: bias
Principles of Measurement: Common Types of Validity Evidence
1. Content-related evidence
“Looks like a duck, sounds like a duck = duck”
• Appropriateness: items logically get at the intended performance
• Expert review (face validity)
• Representative sample from the content domain (objectives)
Principles of Measurement: 3 Types of Validity Evidence
2. Criterion-related evidence
“Sugar content of the grapes = best wine”
• Relationship between this measure and others
  • Predictive: MCAT and medical school performance
  • Concurrent: comparison to a gold standard
• Often expressed as:
  • Correlation coefficients for continuous variables
  • Cross-tabulations for dichotomous variables
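A minimal sketch of how criterion-related (predictive) evidence is often quantified: correlate the measure with the criterion. The data below are invented for illustration; only the Pearson correlation itself is the point.

```python
import numpy as np
from scipy import stats

# Hypothetical data: admission test scores and later exam performance
admission_scores = np.array([498, 505, 511, 503, 517, 509, 521, 500, 514, 507])
school_exam_pct  = np.array([68,  72,  80,  70,  88,  78,  92,  66,  84,  75])

# The correlation coefficient serves as the predictive validity estimate
r, p = stats.pearsonr(admission_scores, school_exam_pct)
print(f"Predictive validity coefficient r = {r:.2f} (p = {p:.3f})")
```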
Principles of Measurement: 3 Types of Validity Evidence
3. Construct-related evidence
• A psychological construct or characteristic for which no gold standard exists
  • Ex. medicine: self-efficacy → medication adherence
  • Ex. education: bodily/kinesthetic IQ → suturing skills
• Test the relationship between performance and a theoretical model
  • Multiple regression, correlation, factor analysis
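One of the tactics named on the slide is multiple regression against a theoretical model. A sketch with simulated data, assuming a hypothetical kinesthetic-IQ construct and years of training as predictors of suturing skill (the effect sizes are invented):

```python
import numpy as np

rng = np.random.default_rng(seed=2)
n = 50
kinesthetic = rng.normal(100, 15, n)               # hypothetical construct score
years = rng.integers(1, 6, n).astype(float)        # years of training
suturing = 0.4 * kinesthetic + 2.0 * years + rng.normal(0, 5, n)

# Ordinary least squares: suturing = b0 + b1*kinesthetic + b2*years
X = np.column_stack([np.ones(n), kinesthetic, years])
coefs, *_ = np.linalg.lstsq(X, suturing, rcond=None)
print(f"b0={coefs[0]:.2f}, kinesthetic={coefs[1]:.2f}, years={coefs[2]:.2f}")
```

If the construct contributes to the prediction as the theory says it should, that counts as construct-related evidence.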
Validity: Accuracy and Reliability
Examples of Error
Random error: things you cannot control
• Subject variability
• A snowstorm delays arrival
Controllable error: things you CAN control
• Instrument variability
• Observer variability (intra- vs. inter-observer)
• Halo/horns effects
Principles of Measurement: Reliability
• Defined: consistency of the scores obtained
  • at one time
  • over time
• Which color is:
  • not reliable?
  • reliable but not valid?
  • reliable and valid?
• A minimal test-retest sketch follows
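One simple way to quantify consistency over time is a test-retest correlation. A minimal sketch, assuming hypothetical ratings from the same eight raters scoring the same chocolate at two sittings:

```python
import numpy as np

# Hypothetical: 8 raters score the same chocolate on two occasions
time1 = np.array([4, 5, 3, 4, 5, 2, 4, 3], dtype=float)
time2 = np.array([4, 5, 4, 4, 5, 2, 3, 3], dtype=float)

# Test-retest reliability as the correlation between occasions
r = np.corrcoef(time1, time2)[0, 1]
print(f"Test-retest reliability: r = {r:.2f}")
```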
Controlling Error in Performance Measures
Subject
• Motivation, energy, anxiety
• Location
• Maturation
• History
• Regression (high/low)
Rater
• Characteristics: age, gender, ethnicity
• Biases
• Halo/horns
• Fatigue
Instrument
• Scales: normative vs. criterion
• Length
• Inadequate instructions
• Poor formatting
• Illogical order
• Vague terminology
• Responses that fail to fit the question/scale
• Items that favor one group over another
KEY: Always Think of the 4 Common Categories of ERRORS
• Instrument
• Raters
• Design/Administration
• Subjects
Strategies to Control Errors
• Standardize conditions (location, instrument, attitude)
• Write clear, specific descriptors of desired behaviors
• Use trained raters/proctors
• Specify how and under what conditions data are collected
• Obtain more information on subjects
  • relevant characteristics (e.g., do they like chocolate?)
• Obtain more information on details
  • location, instrumentation, history, subject attitude
• Use an appropriate design
AND NOW . . .
You have been selected to create the measurement tool with which to judge . . . WISCONSIN'S BEST CHOCOLATE
Review Your Tasks: Chocolate Judge
• List criteria indicative of the “best chocolate”
• Develop a Likert-scale rating item, with descriptive anchors, for each criterion
• Train other raters to use your item
• Pilot all items on samples of chocolate
• Identify potential sources of error
• Examine the reliability of the ratings
  • Which errors contributed, and which could be controlled?
A Few Words About Scales . . .
• Summated rating scale (Likert):
  • The length of the scale affects the accuracy with which raters can make decisions
  • Assumes equal intervals between decision points (an interval scale)
    • The gap between “excellent” and “good” = the gap between “good” and “satisfactory”
  • So provide scale anchors to inform raters of your intent and control error (a hypothetical example follows)
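A hypothetical example of descriptive anchors for a single criterion (“smoothness”); the wording is invented, but the structure shows how an explicit anchor at every scale point tells raters what each number means:

```python
# Hypothetical anchors for one 5-point Likert item. Spelling out every
# point, rather than only the endpoints, is what lets raters treat the
# intervals as roughly equal and narrows instrument error.
SMOOTHNESS_ANCHORS = {
    5: "Excellent    - melts uniformly; no detectable grain",
    4: "Good         - mostly smooth; faint grain at the finish",
    3: "Satisfactory - noticeable grain but not distracting",
    2: "Fair         - gritty texture throughout",
    1: "Poor         - coarse, sandy; texture dominates flavor",
}

for score in sorted(SMOOTHNESS_ANCHORS, reverse=True):
    print(f"{score}: {SMOOTHNESS_ANCHORS[score]}")
```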
A Few Words About Scale Assumptions . . .
• Normative: compared/relative to other learners
  • Standard score or mean score (USMLE Steps)
  • Compared to the other chocolates in the group, this one is “average” or “above average”
• Criterion-based: compared to a pre-established “gold standard” or minimum threshold
  • 80% correct on an examination
  • Godiva? Smooth like silk or granular like sand
• A sketch contrasting the two scoring approaches follows
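A minimal sketch of the distinction, using hypothetical mean ratings: normative scoring expresses each chocolate relative to the group (z-scores), while criterion-based scoring compares each against a pre-set threshold.

```python
import numpy as np

ratings = np.array([3.2, 4.1, 2.8, 4.6, 3.9, 3.5, 2.4])  # hypothetical mean ratings

# Normative: each chocolate relative to the group (standard scores)
z = (ratings - ratings.mean()) / ratings.std(ddof=1)

# Criterion-based: pass/fail against a pre-established threshold
THRESHOLD = 4.0
passes = ratings >= THRESHOLD

for i, (score, zi, ok) in enumerate(zip(ratings, z, passes), start=1):
    print(f"Chocolate {i}: rating={score:.1f}  z={zi:+.2f}  meets threshold: {ok}")
```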
Step 5: Sources of Error (True Score ≠ Observed Score)
• Turn rating sheets in now
• Note on the Sources of Error Worksheet any factors that would affect the reliability (consistency) of your ratings
• Goal: partition observed variance into true variability (in the chocolate) + variance due to raters + variance due to instrumentation + variance due to administration + etc. (see the sketch below)
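A crude sketch of the idea, assuming a hypothetical raters-by-chocolates matrix: variability across chocolate means is what we want to measure, while variability across rater means is error to control. A full analysis would use ANOVA or generalizability theory; this only illustrates the decomposition.

```python
import numpy as np

# Hypothetical ratings: rows = 4 raters, columns = 7 chocolates
ratings = np.array([
    [4, 5, 2, 3, 5, 3, 2],
    [4, 4, 2, 3, 4, 3, 3],
    [5, 5, 3, 4, 5, 4, 2],
    [3, 4, 2, 2, 4, 3, 2],
], dtype=float)

chocolate_means = ratings.mean(axis=0)   # variability we WANT (true differences)
rater_means = ratings.mean(axis=1)       # variability we want to CONTROL

print(f"Variance across chocolates (true variability): {chocolate_means.var(ddof=1):.2f}")
print(f"Variance across raters (error to control):     {rater_means.var(ddof=1):.2f}")
```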
Potential Sources of Error
• Categories to consider: Instrument, Raters, Design/Admin, Subjects (chocolate)
Step 6: Review of Scores
• Within-individual variance
  • e.g., all ratings “high”
• Between-individual variance
  • Halo/horns effects
Measurement = Chocolate
• Validity: do the criteria measure the essence of Wisconsin's “best” chocolate?
• Reliability: control errors due to
  • Instrumentation: sample numbers confused; clarity of the criteria
  • Raters: competence to judge; biases
  • Administration: allowing soda; eating in any order; time, directions, location, standardization
Chocolates
• #1: Confections Solid Milk Chocolate (1997 Wisconsin State Fair Seal of Excellence)
• #2: Hershey's Extra Dark 60% Cocoa
• #3: Regal Dynasty Milk
• #4: Hershey's Cookies 'n' Cream
• #5: Dove Dark Hearts
• #6: Palmer Milk Hearts
• #7: Nestlé Milk Hearts
Do Your Sources of Error Explain the Ratings?
• Raters
  • Did they chat amongst themselves?
  • Contamination? Did they drink Diet Coke or coffee?
  • Rating strategy: taste them all and then rate, or rate each independently?
  • Did they recognize brands? Bias?
  • Rater fatigue?
• Instrument
  • Were the scale descriptors clear?
  • Were all variables considered in scale development (e.g., cookie pieces; cf. non-traditional students/learners)?
Beyond Chocolate: Apply Principles to Assessment of Learner Performance
• What are common errors/problems in learner assessment?
Beyond Chocolate: Apply Principles to Assessment of Learner Performance
• Rating forms for resident/student performance
  • Criterion (behavioral anchors) vs. normative
  • What content/dimensions?
  • Scales
• OSCEs (added variability: why?)
  • What else do you need to control?
• Why OSVE?
SUMMARY: Part I
Observed Score (faculty rating, MCQ) = True Score + Controllable Errors (sources of error) + Random Error
• Sources of error:
  • Instrument (valid dimensions, clear directions; pilot and revise)
  • Administration/Design (standardize)
  • Raters (train)
SUMMARY: Part II (to the tune of “Row, Row, Row Your Boat”)
Think, think, think a-bout
Errors of mea-sure-ment
Rater bias, va-li-di-ty too
Every time you test
Follow-up: Chocolates
• Seven chocolates were evaluated on seven indicators during the 2006 Chocolate Survey.
Results: Descriptive Stats
[Descriptive statistics table not reproduced]
*Scale: 7 = lowest rating, 1 = highest rating
Inter-rater Reliability
• Kendall's coefficient of concordance (W) ranges from 0 (no agreement) to 1 (perfect agreement)
• Aim for concordance > 0.7
• A minimal computation sketch follows
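Kendall's W is not shipped in scipy.stats as far as I know, so here is a minimal implementation from the standard formula (no tie correction), applied to a hypothetical raters-by-chocolates matrix:

```python
import numpy as np
from scipy.stats import rankdata

def kendalls_w(ratings: np.ndarray) -> float:
    """Kendall's coefficient of concordance W for a raters-by-items matrix.

    Each rater's scores are converted to ranks; W = 12*S / (m^2 * (n^3 - n)),
    where m raters rank n items and S is the sum of squared deviations of
    the item rank sums from their mean. (No tie correction in this sketch.)
    """
    m, n = ratings.shape
    ranks = np.apply_along_axis(rankdata, 1, ratings)  # rank within each rater
    rank_sums = ranks.sum(axis=0)
    s = ((rank_sums - rank_sums.mean()) ** 2).sum()
    return 12 * s / (m ** 2 * (n ** 3 - n))

# Hypothetical ratings: 4 raters x 7 chocolates
ratings = np.array([
    [4, 5, 2, 3, 5, 3, 2],
    [4, 4, 2, 3, 4, 3, 3],
    [5, 5, 3, 4, 5, 4, 2],
    [3, 4, 2, 2, 4, 3, 2],
], dtype=float)

print(f"Kendall's W = {kendalls_w(ratings):.2f}  (aim for > 0.7)")
```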