
Multiple Choice Test Item Analysis


Presentation Transcript


  1. Multiple Choice Test Item Analysis Facilitator: Sophia Scott

  2. Workshop Format • What is Multiple Choice Test Item Analysis? • Background information • Fundamentals • Guided Practice • Individual Practice

  3. What is Multiple Choice Test Item Analysis? Statistically analyzing your multiple choice test items to make sure each item effectively evaluates student learning.

  4. Background information • What does a test score mean? • Reliability and Validity • Norm-referenced or Criterion-referenced

  5. What does a Test Score Mean? • A score reflects what you really knew (true score) plus error (factors such as the testing atmosphere, nerves, etc., that modify your true score). In classical test theory: observed score = true score + error. • The purpose of a systematic approach to test design is to reduce error in test taking.

  6. Reliability and Validity • Reliability – the test scores are consistent • Test-retest reliability (an individual's score is consistent over time) • Inter-rater reliability (consistency of individual judges' ratings of a performance) • Validity – the test measures what it is supposed to measure. You want your test to be both reliable and valid.

  7. Norm-referenced or Criterion-referenced • Norm-referenced – defines the performance of test-takers in relation to one another. Uses the frequency distribution and can rank students. Often used to predict success, as with the GRE or GMAT. • Criterion-referenced – defines the performance of each test-taker without regard to the performance of others. Success means being able to perform a specific task or set of competencies. Uses a mastery curve.

  8. Item analysis How you interpret the results of a test and use individual item statistics to improve the quality of a test. Terms used • Standard deviation (SD) – spread of scores above and below the average score; the more the scores are spread out, the higher the SD • Mean – average score • N – number of items on the test • Raw scores – actual scores • Variance = standard deviation squared
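
  A minimal sketch of these summary statistics in Python (the raw scores below are hypothetical):

    import statistics

    raw_scores = [72, 85, 91, 64, 78, 88, 70, 95, 81, 76]  # hypothetical raw scores

    mean = statistics.mean(raw_scores)   # average score
    sd = statistics.stdev(raw_scores)    # sample standard deviation
    variance = sd ** 2                   # variance = standard deviation squared

    print(f"mean={mean:.2f}, sd={sd:.2f}, variance={variance:.2f}")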

  9. Fundamentals of Item Analysis • Were any of the items too difficult or easy? • Do the items discriminate between those students who really knew the material and those who did not? • What is the reliability of the exam?

  10. 1. Were any of the items too difficult or too easy? • Use the Difficulty Factor of a question • Proportion of respondents selecting the right answer to that item: D = c / n, where D = difficulty factor, c = number of correct answers, n = number of respondents • Range: 0 to 1 • The HIGHER the difficulty factor, the easier the question; a value of 1 means all the students got the question correct, so it may be too easy

  11. Difficulty Factor • Optimal Level is .5 • To be able to discriminate between different levels of achievement, the difficulty factor should be between .3 and .7 • If you want the students to master the topic area, high difficulty values should be expected. D = c / n
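
  As a quick illustration, the difficulty factor and these rules of thumb in Python (the counts are hypothetical):

    def difficulty_factor(num_correct, num_respondents):
        # D = c / n: proportion of respondents answering the item correctly
        return num_correct / num_respondents

    d = difficulty_factor(24, 30)  # hypothetical item: 24 of 30 students correct
    if d > 0.7:
        verdict = "may be too easy"
    elif d < 0.3:
        verdict = "may be too difficult"
    else:
        verdict = "in the discriminating range"
    print(f"D = {d:.2f} ({verdict})")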

  12. Guided Practice What is the D for Items 1-3?

  13. Difficulty Factor • Item # 1 = .8 • Item # 2 = .6 • Item # 3 = .4 What does it mean? • Item # 1 = .8 may be too easy • Item # 2 = .6 good • Item # 3 = .4 good

  14. Individual Practice What is the D for Items 4-5?

  15. Difficulty Factor • Item # 4 = .5 • Item # 5 = .6 What does it mean? • Item # 4 = .5 optimal • Item # 5 = .6 good Overall, you can say that only item #1 may be too easy

  16. 2. Do the items discriminate between those students who really knew the material and those who did not? • The Discrimination Index: DI = (a - b) / n, where a = response frequency (number correct) of the High group, b = response frequency (number correct) of the Low group, n = number of respondents • Point-biserial correlation

  17. 2. Do the items discriminate between those students who really knew the material and those who did not? Point-biserial correlation • Correlates each test-taker's performance on a single test item with their total score. • Range: +1.00 to -1.00 • Items which discriminate well are those with difficulties between .3 and .7

  18. 2. Do the items discriminate between those students who really knew the material and those who did not? • A positive coefficient means that test-takers who got the item right generally did well on the test as a whole, while those who missed the item did poorly on the test. • A negative coefficient means that test-takers who did well on the test missed the item, while those who did poorly got the item right. • A coefficient near zero means the item does not discriminate; for example, when all test-takers got the item correct or all got it incorrect.
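
  Where SciPy is available, a minimal sketch of this correlation (the item responses and total scores below are hypothetical):

    import numpy as np
    from scipy import stats

    # 1 = student answered the item correctly, 0 = incorrectly (hypothetical)
    item = np.array([1, 0, 1, 1, 0, 1, 0, 1, 1, 1])
    # Each student's total test score (hypothetical)
    totals = np.array([18, 9, 15, 17, 8, 16, 7, 19, 14, 13])

    r_pb, p_value = stats.pointbiserialr(item, totals)
    print(f"point-biserial r = {r_pb:.2f}")  # positive r: item tracks overall performance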

  19. 2. Do the items discriminate between those students who really knew the material and those who did not? The Discrimination Index Steps • Rank test scores from highest to lowest, so the highest is at the top of the list • Define the high group (top 27%) • Define the low group (bottom 27%) • Calculate DI = (a - b) / n
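
  A sketch of these steps in Python. The denominator follows this slide's definition (total number of respondents n); note as an assumption that some texts divide by the size of one 27% group instead. The data are hypothetical:

    import math

    def discrimination_index(total_scores, item_correct, group_frac=0.27):
        # Rank students from highest to lowest total score
        order = sorted(range(len(total_scores)),
                       key=lambda i: total_scores[i], reverse=True)
        k = max(1, math.floor(len(order) * group_frac))  # size of each 27% group
        high, low = order[:k], order[-k:]
        a = sum(item_correct[i] for i in high)  # correct answers in high group
        b = sum(item_correct[i] for i in low)   # correct answers in low group
        return (a - b) / len(total_scores)      # slide's definition: divide by n

    # Hypothetical total scores and 0/1 correctness on one item
    totals = [19, 18, 17, 16, 15, 12, 10, 9, 8, 7]
    item = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
    print(discrimination_index(totals, item))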

  20. What does it mean? Point Biserial • Item # 1 = .48 • Item # 2 = .43 • Item # 3 = .47 • Item # 4 = .62 • Item # 5 = .83 All five coefficients are positive and well above zero, so every item discriminates. Overall the test does discriminate.

  21. 3. What is the reliability of the exam? • Kuder-Richardson 20 • Kuder-Richardson 21 • Cronbach's alpha

  22. 3. What is the reliability of the exam? • Range: 0 to 1 • A higher value indicates a stronger relationship between the items and the test • A lower value indicates a weaker relationship between the items and the test Kuder-Richardson 20: r = (n / (n - 1)) × ((s² - Σpᵢqᵢ) / s²) n = number of items on the test s = standard deviation of the total test scores pᵢ = proportion of correct responses to item i qᵢ = 1 - pᵢ
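
  A minimal NumPy sketch of KR-20 following the formula above; the 0/1 response matrix is hypothetical, and using the sample (n - 1) form of the variance is an assumption:

    import numpy as np

    def kr20(responses):
        # responses: rows = students, columns = items, entries 1 (correct) or 0
        n_items = responses.shape[1]
        totals = responses.sum(axis=1)   # each student's total score
        s2 = totals.var(ddof=1)          # variance of total scores (sample form)
        p = responses.mean(axis=0)       # proportion correct per item
        q = 1 - p
        return (n_items / (n_items - 1)) * ((s2 - (p * q).sum()) / s2)

    # Hypothetical 6-student, 5-item response matrix
    data = np.array([[1, 1, 1, 1, 0],
                     [1, 0, 1, 1, 1],
                     [1, 1, 0, 1, 1],
                     [0, 1, 1, 0, 0],
                     [1, 0, 0, 1, 0],
                     [0, 0, 1, 0, 0]])
    print(f"KR-20 = {kr20(data):.2f}")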

  23. What does it mean? Kuder-Richardson 20 • Item # 1 = .88 • Item # 2 = .63 • Item # 3 = .40 • Item # 4 = .76 • Item # 5 = .89 Item 3 may not relate as well to the test as a whole. Overall the test is reliable.

  24. Review Purpose – statistically analyze multiple choice test items to ensure items are effectively evaluating student learning. • Were any of the items too difficult or easy? (Difficulty factor) • Do the items discriminate between those students who really knew the material and those who did not? (Discrimination index or point-biserial) • What is the reliability of the exam? (Kuder-Richardson 20)

  25. More Practice…

  26. Thank you for your time. Any questions or comments?
