Introduction to Test Development Graham McMahon, MD, MMSc. Sarah E. Peyre, EdD Educational Research Methods Program
Learning Objectives • Understand the pros and cons of various question types for written examinations • Learn how to determine item difficulty and item discrimination • Understand the psychometrics of a high-stakes test: validity, reliability, and standard setting
Come to our Workshop! • Work in small groups to… • Review problematic multiple choice items • Establish validity and reliability for a test • Participate in standard setting exercise
Question Types – Pros and Cons • Essay Items • Short Answer and Completion Items • Matching Items • True-False and Multiple-Choice Tests • Interviews • Portfolios • All of these can be scored and can be subjected to test development.
Multiple-Choice Items • [Figure labels: stem, lead-in, responses (correct response and distractors)] • An 85-year-old woman has difficulty raising her arms above her head and combing her hair. She has morning aches in her shoulders and neck. Her reflexes are symmetrical and normal. There is no muscle tenderness or joint swelling. Which one of the following laboratory tests should be obtained to confirm the most likely diagnosis? • A. Anti-nuclear antibody. • B. Erythrocyte sedimentation rate. • C. Serum concentration of creatine kinase. • D. Serum concentration of angiotensin-converting enzyme. • E. Urine microscopy.
Tips for writing discriminating MCQs • Be sure that each item reflects a clearly defined learning outcome. • Stem • The stem of the item should be self-contained and written in clear and precise language. • Avoid ‘trigger’ words (e.g., pill-rolling tremor). • Avoid negatives, ‘except’ questions, absolutes and qualifiers in question stems. • Responses • All answers should be plausible and homogeneous. • Items need to be independent of one another. • Answer choices should be similar in length and grammatical form. • List answer choices in alphabetical or numerical order. • Avoid ‘all of the above’ as a response. • Avoid technical flaws (e.g., tense or plurality).
Pros and Cons of MCQs • Pros • Useful for measuring learning outcomes at almost any level • Easy to understand • Easy to score • Easily analyzed for effectiveness • Allow broad coverage efficiently • Cons • Good questions take a long time to write and are difficult to write • Constrain creative responses from learners • May have more than one correct answer
Item Analysis • Qualitative: looks at whether the content matches the information, attitude, characteristic or behavior being assessed • Quantitative: • Item difficulty • Item discrimination
Determining item difficulty • The percentage of participants who answer the item correctly • Item difficulty scores can range from 0 to 100% • Low value = high difficulty • High value = low difficulty
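As a minimal sketch, item difficulty can be computed in Python, assuming each examinee's answer to the item has already been scored 1 (correct) or 0 (incorrect). The function name `item_difficulty` and the sample responses are hypothetical.

```python
from typing import Sequence


def item_difficulty(responses: Sequence[int]) -> float:
    """Percentage of examinees who answered the item correctly (0 to 100).

    `responses` holds one score per examinee: 1 = correct, 0 = incorrect.
    Higher values mean an easier item; lower values mean a harder one.
    """
    if len(responses) == 0:
        raise ValueError("need at least one response")
    return 100.0 * sum(responses) / len(responses)


# Hypothetical item: 7 of 10 examinees answered correctly.
print(item_difficulty([1, 1, 0, 1, 1, 0, 1, 1, 0, 1]))  # 70.0
```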
Discrimination Index • The Discrimination Index distinguishes, for each item, between the performance of students who did well on the exam and students who did poorly. • Index of discrimination: • The proportion of the upper (high-scoring) group answering the item correctly minus the proportion of the lower (low-scoring) group answering it correctly • Item discrimination scores can range from -1.00 to +1.00 • Example • 100 test takers: 20 of the top 25 students answered correctly, but only 5 of the lowest 25 did. • DI = (20 - 5)/25 = 0.6
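The calculation can be sketched in Python under a few assumptions: items are scored 0/1, examinees are ranked by total test score, and the upper and lower groups are the top and bottom 25% (matching the example of 25 out of 100). The helper names are hypothetical, and ties in total score are broken arbitrarily by sort order.

```python
from typing import List, Sequence, Tuple


def split_extreme_groups(
    item_scores: Sequence[int],
    total_scores: Sequence[float],
    fraction: float = 0.25,
) -> Tuple[List[int], List[int]]:
    """Return (upper, lower): item scores of the top and bottom `fraction` of examinees by total score."""
    n = max(1, int(len(total_scores) * fraction))
    order = sorted(range(len(total_scores)), key=lambda i: total_scores[i])
    lower = [item_scores[i] for i in order[:n]]
    upper = [item_scores[i] for i in order[-n:]]
    return upper, lower


def discrimination_index(upper: Sequence[int], lower: Sequence[int]) -> float:
    """Proportion correct in the upper group minus proportion correct in the lower group (-1.0 to +1.0)."""
    if len(upper) == 0 or len(lower) == 0:
        raise ValueError("both groups must be non-empty")
    return sum(upper) / len(upper) - sum(lower) / len(lower)


# Worked example: 20 of the top 25 and 5 of the bottom 25 answered correctly.
upper = [1] * 20 + [0] * 5
lower = [1] * 5 + [0] * 20
print(round(discrimination_index(upper, lower), 2))  # 0.6
```

A negative index means low scorers outperformed high scorers on the item, which often points to a miskeyed or ambiguous question.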
Item Analysis Report • The left half shows percentages; the right half shows counts. • The correct option is indicated in parentheses. • Point biserial is similar to the discrimination index, but is not based on fixed upper and lower groups. For each item, it compares the mean total score of students who chose the correct answer with the mean total score of students who chose a wrong answer. • [Figure: sample item analysis report, with columns for order ID and group number, percentages, and counts]
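One common way to compute the point-biserial coefficient is r_pb = (M1 - M0) / s * sqrt(p * q), where M1 and M0 are the mean total scores of examinees who answered the item correctly and incorrectly, s is the population standard deviation of all total scores, p is the proportion answering correctly, and q = 1 - p. The Python sketch below assumes 0/1 item scoring and uses the uncorrected total score (some report generators remove the item from the total first); the function name is hypothetical.

```python
import math
from typing import Sequence


def point_biserial(item: Sequence[int], totals: Sequence[float]) -> float:
    """Point-biserial correlation between a 0/1 item score and the total test score."""
    n = len(item)
    if n != len(totals) or n < 2:
        raise ValueError("need matching item and total scores for at least two examinees")
    right = [t for x, t in zip(item, totals) if x == 1]
    wrong = [t for x, t in zip(item, totals) if x == 0]
    if not right or not wrong:
        raise ValueError("undefined when everyone answered the item the same way")
    mean_all = sum(totals) / n
    # Population standard deviation of the total scores (divide by n, not n - 1).
    s = math.sqrt(sum((t - mean_all) ** 2 for t in totals) / n)
    if s == 0:
        raise ValueError("total scores have no spread")
    p = len(right) / n
    q = 1.0 - p
    m1 = sum(right) / len(right)  # mean total of examinees who got the item right
    m0 = sum(wrong) / len(wrong)  # mean total of examinees who got it wrong
    return (m1 - m0) / s * math.sqrt(p * q)


# Hypothetical data: four examinees; the two strongest answered the item correctly.
print(round(point_biserial([1, 1, 0, 0], [10, 8, 6, 4]), 3))  # 0.894
```

With the population standard deviation in the denominator, this formula gives the same value as a Pearson correlation between the 0/1 item column and the total-score column.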
Test Validity • Validity: • The extent to which inferences made from a test are appropriate, meaningful, or useful. • Does my test measure what it is intended to measure? • Content validity • Expert review • Criterion validity – Predictive/Concurrent • Scores can be related to another known metric • Construct validity • Successfully differentiates between levels of learners
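For criterion validity, one common way to relate test scores to another known metric is a Pearson correlation coefficient. The Python sketch below is an illustration only; the function name `pearson_r` and the paired data (exam scores and a hypothetical later performance rating) are invented for the example.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between paired scores, e.g. test score vs. a criterion measure."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant variable has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: exam scores and a later performance rating for five students.
exam = [62, 70, 75, 81, 90]
rating = [3.1, 3.4, 3.3, 3.9, 4.2]
print(round(pearson_r(exam, rating), 2))
```

For predictive validity the criterion is collected later (as in this made-up example); for concurrent validity it is collected at about the same time as the test.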
Kissing Cousins • A test cannot be valid unless it is reliable.
Test Reliability • Reliability: the extent to which a test measures the underlying construct consistently (trustworthiness/stability) • Test-retest reliability • Alternate-forms reliability • Internal consistency reliability (Cronbach’s alpha) • Inter-rater reliability
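Cronbach's alpha is usually written as alpha = k/(k - 1) * (1 - sum of item variances / variance of total scores), where k is the number of items. A minimal Python sketch, assuming a complete score matrix with one row per examinee and one column per item (no missing responses); the names and data are hypothetical.

```python
from typing import Sequence


def _variance(values: Sequence[float]) -> float:
    """Sample variance (n - 1 denominator); alpha is unchanged if n is used consistently instead."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


def cronbach_alpha(scores: Sequence[Sequence[float]]) -> float:
    """Internal-consistency reliability; scores[e][i] is examinee e's score on item i."""
    if len(scores) < 2 or len(scores[0]) < 2:
        raise ValueError("need at least two examinees and two items")
    k = len(scores[0])
    item_vars = [_variance([row[i] for row in scores]) for i in range(k)]
    total_var = _variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have no variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical 5 examinees x 4 items (1 = correct, 0 = incorrect).
matrix = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
print(round(cronbach_alpha(matrix), 2))  # 0.8
```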
How do I set a passing grade? • Standard Setting • Norm-referenced: Z-scores • Number of standard deviations below the mean • Criterion-referenced: Angoff Method • A panel of experts is asked to evaluate each item and estimate the fraction of minimally competent students who would answer it correctly • Ratings are averaged across the experts for each item, discussed, and then summed to obtain the panel’s raw cut score
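Both approaches reduce to simple arithmetic once the inputs exist. The Python sketch below assumes (a) for the norm-referenced method, a cut score placed `z` sample standard deviations below the mean, where `z` is a choice the examiners make (no value is specified here), and (b) for the Angoff method, a ratings matrix with one row per expert and one column per item, each entry the expert's estimated fraction (0 to 1) of minimally competent students answering that item correctly. The discussion step happens among the experts and is not modeled; all names and numbers are hypothetical.

```python
import statistics
from typing import Sequence


def norm_referenced_cut(scores: Sequence[float], z: float) -> float:
    """Passing score set `z` standard deviations below the mean of the observed scores."""
    return statistics.mean(scores) - z * statistics.stdev(scores)


def angoff_cut_score(ratings: Sequence[Sequence[float]]) -> float:
    """Raw cut score: average each item's ratings across experts, then sum over items."""
    n_items = len(ratings[0])
    item_means = [statistics.mean([expert[i] for expert in ratings]) for i in range(n_items)]
    return sum(item_means)


# Hypothetical panel of 3 experts rating a 4-item test.
ratings = [
    [0.6, 0.8, 0.5, 0.9],
    [0.7, 0.7, 0.4, 0.8],
    [0.5, 0.9, 0.6, 1.0],
]
print(round(angoff_cut_score(ratings), 2))  # 2.8 (out of 4 items)

# Hypothetical score distribution with the cut one standard deviation below the mean.
print(round(norm_referenced_cut([55, 60, 65, 70, 75], z=1.0), 1))  # 57.1
```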
Welcome to Our Workshop on Test Development! Graham McMahon, MD, MMSc. Sarah E. Peyre, EdD Educational Research Methods The Academy at Harvard Medical School
Outline • Learning Objectives • Creating MCQ Items • Item Template • Item Flaws • Tips for Success • Establishing Validity and Reliability for a Test • Mock Standard Setting
Item Creation • [Diagram: learning activities, objectives, evaluation] • Consider beginning with the end in mind. • What should the medical student demonstrate that he or she knows or knows how to do? • This should be an objective from your lesson plan.
Item Stems: Clinical Vignettes • Things to consider: • Patient description (46-year-old woman) • Functional disability (difficulty rising from a seated position, but no difficulty flexing her legs) • The question based on this item template: • A 46-year-old woman has difficulty rising from a seated position, but has no difficulty flexing her legs. Which of the following muscles has been injured? [Objective: Identify and explain the function of the muscles in the…. ]
Item Creation • Lead-in: “The most likely diagnosis is” (options: disorders, diseases; objective: Describe the signs and symptoms of X. Compare and contrast the signs and symptoms of X, Y, and Z.) • Lead-in: “Which of the following additional symptoms would you expect to be present?” (options: symptoms; objective: same as above) • Lead-in: “The most likely cause is” (options: bacteria, toxins, medications, metabolic defects; objective: List and explain the causes of X.) • Lead-in: “The most likely mechanism is” (options: disease mechanisms, pharmacologic mechanisms; objective: Diagram and explain the mechanism of drug X.)
Item Templates • Other considerations: • Age, gender, race, ethnicity • Site of care (ER, office visit) • Presenting complaint • presents for a routine physical exam • presents with a headache • Duration • Patient history, family history • There is no history of… • He has a history of… • Physical findings • Lab values, imaging studies, pathology reports • Treatment, subsequent findings
Item Creation • Add the lead-in (question) and the options • Which of the following pulmonary variables is most likely to be lower than normal in this patient? A. Alveolar-arterial PO2 difference B. Compliance of the lung C. Oncotic pressure of the alveolar fluid D. Work of breathing E. Residual volume
Item Creation: Taking Recall up to Another Level Recall question: What area is supplied with blood by the posterior inferior cerebellar artery? [Objective: Identify the areas of the brain supplied by the major cerebral arteries.]
Item Creation: Taking Recall up to Another Level Application question: A 62-year-old man develops left-sided limb ataxia, Horner’s syndrome, nystagmus, and loss of facial pain and temperature sensation. Which artery is most likely to be occluded? [Objective: Differentiate the signs and symptoms that would occur upon occlusion of each of the major cerebral arteries.]
Your Turn! Review the distributed questions and identify the strengths and weaknesses of each.
Question • Acute intermittent porphyria is the result of a defect in the biosynthetic pathway for • A. collagen • B. corticosteroid • C. fatty acid • D. glucose • E. heme
Rewritten: • An otherwise healthy 33-year-old male has mild weakness and occasional episodes of steady, severe abdominal pain with some cramping but no diarrhea. One aunt and a cousin have had similar episodes. During an episode, his abdomen is distended, and bowel sounds are decreased. Neurological examination shows mild weakness in the upper arms. These findings suggest a defect in the biosynthetic pathway for: • A. collagen • B. corticosteroid • C. fatty acid • D. glucose • E. heme
Question A 52-year-old male presents to the office with a one-week history of flank pain and hematuria. Past medical history is unremarkable. Physical examination reveals a left-sided abdominal mass. The greatest risk factor for renal cell carcinoma is A. diabetes B. female gender C. hyperlipidemia D. low body mass index E. smoking
Question Which of the following is a correct statement about cystic fibrosis (CF)? A. The incidence of CF is 1:2000. B. Children with CF usually die in their teens. C. Males with CF are sterile. D. CF is an autosomal recessive disease. E. Symptoms of CF only appear in infancy. What other flaws can you detect in this question?
Item Flaws: Unfocused Items • Which of the following is correct regarding [topic]? • There is not enough information in the stem to answer the question without looking at the options. • The responses are disparate. • The distractors have to be 100% false, so the question basically becomes a true/false question. • Avoid these!
A 45-year-old man comes to the physician because of a 6-week history of a non-productive cough. An X-ray film of the chest shows a 0.8-cm, well-circumscribed peripheral nodule in the right lung. Biopsy shows a necrotizing granuloma. Which of the following is the most likely diagnosis? • Pulmonary embolus • Small cell carcinoma • Pseudomonas aeruginosa infection • Histoplasma capsulatum • Herpes pneumonitis • Metastatic renal cell carcinoma
A healthy 57-year-old woman comes to the physician because of a 2-cm mass in her right breast. Biopsy reveals an invasive ductal carcinoma. Which of the following is the most important prognostic factor? • High grade tumor cytology • Infiltrative nature of tumor into benign breast • Numerous mitotic figures • Amount of tumor fibrosis • Presence of lymph node metastasis • Number of plasma cells in tumor
A 63-year-old man comes to the physician because of a 6-week history of progressive dyspnea on exertion, orthopnea, and ankle edema. He has received multiagent chemotherapy for Waldenström’s macroglobulinemia for the past year. Urinalysis shows proteinuria. A bone marrow biopsy shows a partial response to therapy with ongoing marrow involvement still identified. Which of the following is the most likely diagnosis? • Cardiac amyloidosis • Viral myocarditis • Cardiac sarcoidosis • Myocardial infarct • Hypertrophic cardiomyopathy
A submitted question: In aortic stenosis, what other abnormal heart sounds might accompany the resulting murmur? • Physiological splitting of S2 • An accentuated S2 • Paradoxical splitting of S2 • A muffled S2
Revised question: A 60-year-old patient with an active lifestyle is found to have a systolic murmur on a routine physical exam. He currently has no symptoms. If this were aortic stenosis, what other abnormal heart sounds might accompany the systolic murmur? A.) Physiological splitting of S2 B.) An accentuated S2 C.) Paradoxical splitting of S2 D.) A muffled S2
Summary • Utilize action verbs to write objectives • Write your exam items based on the objectives • Tie the clinical vignette to the lead-in • Choose appropriate options with one best answer • Avoid technical flaws • Utilize an item checklist to ensure that you have done all you can to write the best items possible. • Pretest your items
Standard Setting (Groups)
Graham McMahon gmcmahon@partners.org
Item Discrimination: Examples • [Table: example items with discrimination indices of 0.7, 0.1, 1, 0, 0, and -0.4; number of students per group = 100]
Distracter Analysis: Examples (*) marks the correct answer.
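A distracter analysis tabulates how often each option was chosen, overall and within the upper and lower scoring groups. Here is a Python sketch under these assumptions: each examinee's chosen option letter and total score are available, the extreme groups are the top and bottom 25% by total score, and a distracter is flagged if nobody chose it or if the upper group chose it more often than the lower group. The flag rules, names, and data are hypothetical choices for illustration; as on the slide, (*) marks the correct answer.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def distractor_table(
    choices: Sequence[str],
    totals: Sequence[float],
    key: str,
    options: str = "ABCDE",
    fraction: float = 0.25,
) -> Dict[str, Tuple[int, int, int, str]]:
    """Map each option to (chosen overall, chosen by upper group, chosen by lower group, flag)."""
    n = max(1, int(len(totals) * fraction))
    order = sorted(range(len(totals)), key=lambda i: totals[i])
    overall = Counter(choices)
    lower = Counter(choices[i] for i in order[:n])
    upper = Counter(choices[i] for i in order[-n:])
    table = {}
    for opt in options:
        flag = ""
        if opt != key and overall[opt] == 0:
            flag = "never chosen"
        elif opt != key and upper[opt] > lower[opt]:
            flag = "attracts strong examinees"
        table[opt] = (overall[opt], upper[opt], lower[opt], flag)
    return table


# Hypothetical item keyed "B", answered by 8 examinees.
choices = ["B", "B", "A", "C", "B", "B", "A", "B"]
totals = [90, 85, 40, 50, 80, 95, 30, 60]
for opt, (n_all, n_up, n_low, flag) in distractor_table(choices, totals, key="B", options="ABCD").items():
    mark = "(*)" if opt == "B" else "   "
    print(f"{opt} {mark} overall={n_all} upper={n_up} lower={n_low} {flag}")
```

A good distracter is chosen by some examinees, and more often by the lower group than by the upper group; the two flags above catch the opposite cases.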