CREATING EFFECTIVE QUESTIONS FOR ASSESSMENT AND AS AIDS IN LEARNING IN TODAY'S PHARMACOLOGY PROGRAMS IS THERE MORE TO TESTING THAN WRITING QUESTIONS? George A. Dunaway, Ph.D. Emeritus Professor Department of Pharmacology Southern Illinois University School of Medicine Springfield, IL Experimental Biology Meetings April 10, 2011 Washington, DC
MAKING THE MOST OF AN ITEM ANALYSIS
• Item analysis information can be used to make important decisions for high-risk examinations.
• For the present examination:
  • Validity of each question, to decide whether to retain or omit exam questions
  • Whole-test validity, to make pass/fail decisions
• For subsequent examinations, it can provide insights into improving questions.
ITEM ANALYSIS INFORMATION
• Test information
  • Date, number of examinees, and number of test items
  • High and low scores, median and mean scores, SEM and SD
  • Test reliability, e.g., Cronbach's alpha, which can be interpreted as the mean of all possible split-half coefficients
  • Ranking of individual examination scores
• Question information
  • Difficulty, i.e., the percentage answering correctly, or "p value"
  • Discrimination power of each question, e.g., the biserial or point-biserial (rpb) correlation
  • Frequency of selection of each option
  • Test performance of the group selecting each option, correct or incorrect
(A computational sketch of the test-level statistics follows.)
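The test-level statistics above are straightforward to compute. Below is a minimal sketch, assuming scored responses are available as a 0/1 NumPy matrix (rows = examinees, columns = items); the function name and return structure are illustrative, not from the presentation.

```python
import numpy as np

def test_statistics(scores: np.ndarray) -> dict:
    """Summarize an examinees-by-items matrix of 0/1 item scores."""
    totals = scores.sum(axis=1)            # raw score per examinee
    k = scores.shape[1]                    # number of items
    item_var = scores.var(axis=0, ddof=1)  # variance of each item
    total_var = totals.var(ddof=1)         # variance of total scores
    # Cronbach's alpha: k/(k-1) * (1 - sum of item variances / total variance),
    # interpretable as the mean of all possible split-half coefficients
    alpha = (k / (k - 1)) * (1 - item_var.sum() / total_var)
    return {"examinees": scores.shape[0], "items": k,
            "high": int(totals.max()), "low": int(totals.min()),
            "mean": float(totals.mean()), "median": float(np.median(totals)),
            "sd": float(totals.std(ddof=1)), "cronbach_alpha": float(alpha)}
```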
MAKING SENSE OF THE POINT-BISERIAL CORRELATION COEFFICIENT

$$ r_{pb} = \frac{\bar{Y}_c - \bar{Y}_t}{S}\,\sqrt{\frac{N_c}{N_t - N_c}\cdot\frac{N_t}{N_t - 1}} $$

rpb = the discrimination power of a question, i.e., how well the ranking of students on the question correlates with their ranking by overall test average
Yc = mean test score of students answering the question correctly
Yt = mean test score of all students
S = standard deviation of the test scores
Nc = number answering the question correctly
Nt = total number answering the question
• If Yc > Yt, the correlation is positive: students who answered the question correctly scored above the test average.
• A larger S reduces rpb; the spread of test scores has an inverse effect.
• The final term weights for the relative sizes of the correct-answer and whole-test populations.
• The magnitude of rpb indicates how strongly student performance on the question correlates with performance on the whole test.
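A direct transcription of this formula into code might look like the sketch below, assuming a 0/1 vector marking who answered the item correctly and a parallel vector of whole-test scores. S is taken here as the sample standard deviation, consistent with the Nt/(Nt − 1) correction term in the formula; names are illustrative.

```python
import numpy as np

def point_biserial(item_correct: np.ndarray, test_scores: np.ndarray) -> float:
    """r_pb for one question, following the formula above."""
    n_t = len(test_scores)                       # Nt: total answering the question
    n_c = int(item_correct.sum())                # Nc: number answering correctly
    y_t = test_scores.mean()                     # Yt: mean test score, all students
    y_c = test_scores[item_correct == 1].mean()  # Yc: mean score, correct group
    s = test_scores.std(ddof=1)                  # S: sample SD of test scores
    return float((y_c - y_t) / s
                 * np.sqrt(n_c / (n_t - n_c) * n_t / (n_t - 1)))
```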
SETTING STANDARDS FOR QUESTIONS
• The average (p value) on the question is similar to the test average.
• A majority of students (~70%) chose the correct answer.
• All responses have been selected, i.e., no options could be easily eliminated by guessing.
• Performance of individuals on the question correlates with their performance on the whole test.
• Realistically, we often have to settle for less.
PRACTICAL USE OF ITEM ANALYSIS TO EVALUATE TEST QUESTIONS
• Consider MCQs used in a high-risk testing environment.
• Use item analysis information to consider two aspects:
  • Question suitability for the current high-risk examination
  • Potential modifications for use on a later exam
• Item analysis values that are particularly useful include the p value, rpb, frequency of selection of each option, and the test scores of students choosing each option.
• For each question, determine the following:
  • What is the p value?
  • How well did the rpb for the correct-answer population correlate with the whole-test population?
  • How well does the population selecting each incorrect option correlate with the whole test?
  • What is the frequency of selection of each option?
GENERALITIES USING ITEM ANALYSIS INFORMATION
• If the p value is high, the question is likely too easy and testing only memorization.
• If the p value is low, the question is likely confusing, poorly written, or testing obscure information not in the learning issues.
  • Action step: For high or low p values, (1) revise and retain the tested concept, (2) discard the question, and/or (3) improve the learning resources.
• If the rpb for the correct-answer population correlates poorly with the whole-test population:
  • Action step: Consider a possible keying error or a poorly written question needing to be edited.
• If one or more of the incorrect options is selected too rarely or too frequently:
  • Action step: Consider revisions that exploit predictable misconceptions.
(A screening sketch applying these rules of thumb follows.)
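Here is a minimal screening sketch that applies the generalities above to one item's statistics. The thresholds (0.30/0.90 for the p value, 0.15 for rpb, 0.05 for distractor selection) are illustrative assumptions, not values from the presentation.

```python
def flag_item(p_value: float, rpb: float, option_rates: dict[str, float],
              correct: str) -> list[str]:
    """Return a list of concerns for one MCQ item."""
    flags = []
    if p_value > 0.90:
        flags.append("p value high: likely too easy / memorization only")
    if p_value < 0.30:
        flags.append("p value low: possibly confusing, poorly written, or obscure")
    if rpb < 0.15:
        flags.append("low rpb: check for keying error or ambiguous wording")
    for option, rate in option_rates.items():
        if option != correct and rate < 0.05:
            flags.append(f"distractor {option} rarely chosen: consider revising")
    return flags

# Example: the statistics from Example 2 below
print(flag_item(0.77, 0.182,
                {"A": 0.02, "B": 0.77, "C": 0.05, "D": 0.08, "E": 0.08},
                correct="B"))
```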
ITEM ANALYSIS INFORMATION FROM A RECENT EXAMINATION (2010)
• Student test scores were segregated into deciles, which yielded a "pseudo-normal" distribution.
  • Mean: 74.0% (median score: 73.7%)
  • Std. dev.: 6%
  • High score: 92.2%
  • Low score: 55.7%
  • Test reliability: 0.88
• Outcomes (cutoffs follow from the mean and SD; see the sketch below)
  • Pass: score no more than 1 SD below the mean, i.e., score ≥ 68%
  • Concern*: score between 1 and 2 SD below the mean, i.e., 62% < score < 68%
  • Failure*: score ≤ 62%
  • *Unit-specific remediation(s) required at end of term
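A tiny sketch of how the pass/concern/fail cutoffs above follow from the mean (74.0%) and SD (6%); the band labels mirror the slide, and the function itself is illustrative.

```python
def outcome(score: float, mean: float = 74.0, sd: float = 6.0) -> str:
    """Classify a test score into the slide's outcome bands."""
    if score >= mean - sd:       # within 1 SD of the mean: >= 68%
        return "Pass"
    if score > mean - 2 * sd:    # between 1 and 2 SD below: 62-68%
        return "Concern (remediation required)"
    return "Failure (remediation required)"

assert outcome(73.7) == "Pass"
assert outcome(65.0).startswith("Concern")
assert outcome(60.0).startswith("Failure")
```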
EXAMPLE 1
• p value: 0.26
• A: 18% selected  Test score: 67.1%  rpb: -0.109
• B: 19% selected  Test score: 68.6%  rpb: -0.118
• C: 20% selected  Test score: 72.8%  rpb: -0.008
• D: 17% selected  Test score: 65.8%  rpb: -0.088
• E: 26% selected  Test score: 81.4%  rpb: +0.339
What are the primary concerns for this question?
• Are there concerns with the p value for the question?
• What is suggested by the rpb for the correct-answer population?
• Was the average test performance of those selecting incorrect options consistent with their test performance?
• What was the distribution of selection of the question options?
• Keep (test-worthy) or discard (not suitable for the current test)?
• For later use, what question-specific modifications would you consider?
(A sketch of how these per-option statistics can be derived from raw responses follows.)
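The per-option statistics shown in Examples 1-4 might be derived from raw responses as in the sketch below, assuming two parallel lists: each examinee's chosen option and their whole-test score. It reuses the point-biserial formula from the earlier slide (sample SD, hence the n/(n − 1) correction); all names are illustrative.

```python
import numpy as np

def option_analysis(choices: list[str], test_scores: list[float], correct: str):
    """Print selection rate, mean test score, and rpb for each option."""
    scores = np.asarray(test_scores, dtype=float)
    n = len(scores)
    for option in sorted(set(choices)):
        chose = np.asarray([c == option for c in choices])
        k = int(chose.sum())
        rpb = ((scores[chose].mean() - scores.mean()) / scores.std(ddof=1)
               * np.sqrt(k / (n - k) * n / (n - 1)))
        print(f"{option}: {k / n:4.0%} selected  "
              f"mean test score {scores[chose].mean():5.1f}%  rpb {rpb:+.3f}")
    print(f"p value: {sum(c == correct for c in choices) / n:.2f}")
```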
EXAMPLE 2
• p value: 0.77
• A: 02% selected  Test score: 65.6%  rpb: -0.228
• B: 77% selected  Test score: 77.4%  rpb: +0.182
• C: 05% selected  Test score: 73.5%  rpb: -0.045
• D: 08% selected  Test score: 67.8%  rpb: -0.213
• E: 08% selected  Test score: 71.1%  rpb: -0.159
What are the primary concerns for this question?
• Are there concerns with the p value for the question?
• How well does the rpb for the correct-answer population correlate with the whole-test population?
• Was the average test performance of those selecting incorrect options consistent with their test performance?
• Was there adequate selection of the question options?
• Keep (test-worthy) or discard (not suitable for the current test)?
• For later use, what question-specific modifications would you consider?
EXAMPLE 3
• p value: 0.18
• A: 18% selected  Test score: 74.0%  rpb: -0.066
• B: 06% selected  Test score: 62.6%  rpb: -0.452
• C: 06% selected  Test score: 74.4%  rpb: -0.019
• D: 12% selected  Test score: 72.9%  rpb: -0.109
• E: 58% selected  Test score: 77.0%  rpb: +0.351
What are the primary concerns for this question?
• Are there concerns with the p value for the question?
• How well does the rpb for the correct-answer population correlate with the whole-test population?
• Was the average test performance of those selecting incorrect options consistent with their test performance?
• Was there adequate selection of the question options?
• Keep (test-worthy) or discard (not suitable for the current test)?
• For later use, what question-specific modifications would you consider?
EXAMPLE 4
• p value: 0.06
• A: 45% selected  Test score: 76.5%  rpb: +0.209
• B: 05% selected  Test score: 66.5%  rpb: -0.267
• C: 06% selected  Test score: 71.3%  rpb: -0.136
• D: 30% selected  Test score: 75.4%  rpb: +0.101
• E: 06% selected  Test score: 70.3%  rpb: -0.171
What are the primary concerns for this question?
• Are there concerns with the p value for the question?
• How well does the rpb for the correct-answer population correlate with the whole-test population?
• Was the average test performance of those selecting incorrect options consistent with their test performance?
• Was there adequate selection of the question options?
• Keep (test-worthy) or discard (not suitable for the current test)?
• For later use, what question-specific modifications would you consider?
APPENDIX
GOOD TEST RESULTS ARE GENERATED BY BOTH GOOD TEACHING AND GOOD QUESTIONS
• Give the students a reasonable expectation of the test material.
  • That is, a reasonable set of objectives or expected outcomes, and references for attaining the information.
• Use a question format that tests the information to be learned by assessing skills, facts, and knowledge in the context in which they will be used.
  • That is, as the student will apply them as a professional.
• Common mistakes reducing test effectiveness:
  • No reasonable or predictable association between expectations and the tested material
  • Questions that do not require adequate understanding of the tested material
  • Poor syntax and grammar, which make expectations difficult to predict and lead to poor responses
COMMON (AVOIDABLE) MISTAKES LEADING TO UNRELIABLE MCQ ASSESSMENTS
• Question construction that gives clues to the correct answer or allows elimination of incorrect answers:
  • Heterogeneous or nonparallel content among the choices
  • A series of true/false options with no particular relevance to the stem
  • Use of "all of the above are correct" or "none of the above are correct" as answer choices
COMPOSING AND ASSESSING ESSAY QUESTIONS (EQs)
• EQs require significant time and effort to compose.
• EQs can be difficult to grade effectively and objectively.
• The level of knowledge that can be assessed by EQs is somewhat different from that of other question types.
  • That is, a well-designed EQ can assess conceptual clarity, organizational skills, and problem-solving skills.
  • Further, the teacher can gain insight into the effectiveness of their teaching and curriculum design.
• The benefit to the student is that this type of problem-solving environment simulates science career experiences.
  • EQ assessment gives the student and teacher insight into basic knowledge and the ability to use it, together with an existing knowledge base, to solve problems.
• Another advantage is that EQs can be stimulating and exciting for graduate students, which could reduce test anxiety.
• With feedback, the student can use this experience to identify the status and accessibility of their knowledge and to recognize the need for improvement.
ELEMENTS OF EQ CONSTRUCTION
• Examine the learning objectives to decide what information is to be tested.
  • Incorporate into an EQ as many concepts from the learning objectives as is practical, to minimize the number of probes needed for their assessment.
• After deciding on the concepts each EQ will evaluate:
  • Determine the extent of knowledge to be required.
  • Consider what background knowledge the student should know and what you will provide.
  • Create a "scenario" that poses a problem requiring recall and use of the information (new and existing) to be tested.
  • To help evaluate the student's response, the conundrum can have multiple embedded problems and distracters of varying levels of conceptual difficulty.
  • The goal is to measure the student's ability to use their new knowledge effectively, in concert with an existing knowledge base.
• Provocative EQs present situations that have not been previously discussed and are likely to be unfamiliar, but can be analyzed using the student's knowledge.
• After the student reads the EQ, the goal is to provoke:
  • Recall of all acquired knowledge relevant to the problem
  • Assembly of the information
  • Coherent integration of all knowledge (previous and newly acquired)
  • Composition of a cogent response
ELEMENTS OF EQ CONSTRUCTION
• A critical aspect of EQ construction is minimizing unintended distractions and ensuring that the breadth and depth of the expected response are understood.
  • Students should understand clearly what problem(s) must be considered.
  • Remember that, unlike with MCQs, test takers do not have response options to cue them into what is being asked.
  • Ambiguous questions often mislead informed students.
• Expectations should be consistent with how the graduate student will need to use the information to solve career-associated problems.
  • Instead of fact recall, questions should probe concepts primarily or secondarily associated with research or work-related experiences.
• Ambiguities can also be reduced by avoiding words with multiple usages that could be confusing, and by using correct grammar, spelling, and punctuation.
• Provide clear instructions concerning how the question SHOULD and SHOULD NOT be answered.
  • For example, indicate that responses to be graded should be complete sentences and that outlines will not substitute for answers.
  • Adding time expectations for answering each question is instructive.
• Ask others to read your question for clarity.
  • Ask the reader whether they can tell you what you are asking of the student.
  • Reviewers do not need to know the correct answer to provide feedback.
EQ ASSESSMENT
• Prior to grading, construct for each question an outline listing all of the components expected for a perfect score.
  • Assign relative point values to each component to obtain a score.
  • Use the outline to explain to students what you expected.
• Preserve anonymity until final grading decisions are made.
• Prior to assessment, collate the responses into sets of identical questions.
  • Referring to the question outline, evaluate all responses to one question before proceeding to grade the next question.
• Using a grading outline for each question has many advantages (a minimal scoring sketch follows):
  • It is easier to be consistent and fair.
  • It facilitates consistent discussion of grading standards.
  • It minimizes more subtle forms of unrecognized bias.
• Depending on the EQ's risk (e.g., course exam or Ph.D. exam), a percentage grade or a pass/fail recommendation can be chosen.
  • For example, if a percentage grade is not needed, student performance could be categorized as (1) meeting, (2) exceeding, or (3) beneath passing standards.
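Here is a minimal sketch of the outline-based scoring described above: each expected component carries a point value, and a response's score is the sum of the points for the components it demonstrates. The component names and weights are illustrative assumptions, not from the presentation.

```python
# Hypothetical grading outline: component -> point value
RUBRIC = {
    "states the core mechanism": 4,
    "integrates prior coursework": 3,
    "addresses the embedded distracter": 2,
    "coherent, complete-sentence response": 1,
}

def score_response(components_met: set[str],
                   rubric: dict[str, int] = RUBRIC) -> float:
    """Percentage score: points earned over total rubric points."""
    earned = sum(pts for name, pts in rubric.items() if name in components_met)
    return 100.0 * earned / sum(rubric.values())

# Example: a response that hits everything except the distracter
print(score_response({"states the core mechanism",
                      "integrates prior coursework",
                      "coherent, complete-sentence response"}))  # 80.0
```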