Understanding Test Development: Classical Test Theory and Item Response Theory

Explore the stages of test development, including test conceptualization, construction, tryout, and item analysis. Learn about Classical Test Theory and Item Response Theory, their advantages, disadvantages, and application in assessing student abilities. Understand the importance of item calibration and ability estimation in creating effective tests.

jrichins

Presentation Transcript


  1. Chapter 8 Test Development

  2. Test Development: 5 Stages • Test conceptualization • Test construction • Test tryout: when all items are new, the test is called a beta test; when new items are inserted into an existing test, those items are called field test items. • Item analysis • Test revision

  3. Classical True Score Theory • Began in the 1920s with Classical Test Theory (true score theory) • Focus: Observed Score = True Score + Error • The theory assumes that traits are constant and that the variation in observed scores is caused by random errors. • Over many repeated measurements, these random errors are expected to cancel each other out; in the long run, the expected mean of the measurement errors is zero. • Disadvantage of CTT • Student scores are item-dependent: ability is estimated only from the number of correct answers.
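A short simulation can make the true-score model concrete. This is a sketch under stated assumptions: the true score of 75, the error standard deviation of 5, and normally distributed errors are illustrative choices, not values from the slides. Each simulated observed score is the true score plus a random error; averaged over many repetitions, the errors come out close to zero and the mean observed score comes out close to the true score.

```python
import random

def simulate_observed_scores(true_score, error_sd, n_measurements, seed=0):
    """Simulate Observed = True + Error with normally distributed error (an assumption)."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    errors = [rng.gauss(0.0, error_sd) for _ in range(n_measurements)]
    observed = [true_score + e for e in errors]
    return observed, errors

observed, errors = simulate_observed_scores(true_score=75.0, error_sd=5.0, n_measurements=10_000)
mean_error = sum(errors) / len(errors)
mean_observed = sum(observed) / len(observed)
print(f"mean error: {mean_error:.3f}")              # close to 0
print(f"mean observed score: {mean_observed:.3f}")  # close to the true score of 75
```

With 10,000 repetitions the standard error of the mean error is 5 / 100 = 0.05, so the average error lands very near zero, as the slide's "in the long run" claim states.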

  4. Introduction to IRT • Item response theory models the relationship between characteristics of items (item parameters) and characteristics of individuals (latent traits) to estimate the probability of a correct response. • Advantages of item response theory • Improved precision of measurement • Persons can be measured with different sets of items, which enables adaptive testing.
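Adaptive testing is where measuring persons with different sets of items matters most. A minimal sketch, assuming a Rasch (one-parameter) model and a hypothetical five-item bank: under that model an item's information at ability θ is P(1 − P), which peaks when the item's difficulty equals θ, so the next item chosen is the unused one whose difficulty is closest to the current ability estimate. Operational adaptive tests also handle content balancing and item exposure, which this sketch omits.

```python
import math

def rasch_probability(theta, b):
    """Rasch (1PL) probability of a correct response at ability theta for difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def item_information(theta, b):
    """Fisher information of a Rasch item at ability theta: P * (1 - P)."""
    p = rasch_probability(theta, b)
    return p * (1.0 - p)

def select_next_item(theta_estimate, item_bank, administered):
    """Return the unadministered item that is most informative at the current estimate."""
    candidates = {name: b for name, b in item_bank.items() if name not in administered}
    return max(candidates, key=lambda name: item_information(theta_estimate, candidates[name]))

# Hypothetical item bank: item name -> difficulty on the logit scale.
bank = {"Q1": -2.0, "Q2": -1.0, "Q3": 0.0, "Q4": 1.0, "Q5": 2.0}

# Two students who both saw Q3 are routed to different next items.
print(select_next_item(0.8, bank, administered={"Q3"}))   # -> Q4
print(select_next_item(-1.2, bank, administered={"Q3"}))  # -> Q2
```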

  5. Introduction to IRT • Disadvantage: when there is no variance in the responses, there is nothing IRT can estimate. • When a student answers all items correctly (100%), IRT cannot estimate his or her ability. • When an item is too easy (all students answer it correctly) or too difficult (no student does), IRT cannot estimate its psychometric attributes.
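The all-correct case can be seen directly in the likelihood. A sketch assuming the Rasch model and three hypothetical item difficulties: for a perfect score, every term log P grows as θ grows, so the log-likelihood keeps rising toward 0 and never reaches a finite maximum. The maximum-likelihood ability estimate therefore runs off toward +∞. The mirror-image argument, with the item's difficulty heading toward −∞, applies to an item that every student answers correctly.

```python
import math

def log_likelihood(theta, difficulties, responses):
    """Rasch log-likelihood of a response pattern (1 = correct, 0 = incorrect)."""
    total = 0.0
    for b, x in zip(difficulties, responses):
        p = 1.0 / (1.0 + math.exp(-(theta - b)))
        total += math.log(p) if x == 1 else math.log(1.0 - p)
    return total

difficulties = [-1.0, 0.0, 1.0]   # hypothetical item difficulties
all_correct = [1, 1, 1]           # no variance: every item answered correctly

# The log-likelihood only increases with theta, so there is no finite maximum.
for theta in [0.0, 2.0, 4.0, 8.0]:
    print(theta, round(log_likelihood(theta, difficulties, all_correct), 4))
```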

  6. A teacher wants to give her students a math test to assess their skill levels at the beginning of the school year. • She uses Item Response Theory (IRT) to determine the characteristics of each question, for example whether it is a HARD, AVERAGE, or EASY question.

  7. Item Calibration and Ability Estimation • Is there a problem? We cannot judge a student's ability based solely on the number of items answered correctly; item attributes, such as difficulty level, should be taken into account.
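Ability estimation that takes item difficulty into account can be sketched as a maximum-likelihood estimate under the Rasch (one-parameter) model, solved with Newton-Raphson. Assumptions: the item difficulties are already calibrated and known, and the difficulties and response patterns below are hypothetical. Both students answer two of three items correctly, yet the student who did so on the harder item set receives an ability estimate 3 logits higher, exactly the amount by which the two item sets are shifted.

```python
import math

def estimate_ability(difficulties, responses, theta=0.0, iterations=50, tol=1e-8):
    """Maximum-likelihood ability estimate under the Rasch model via Newton-Raphson.

    Needs a mixed pattern: all-correct or all-incorrect patterns have no finite estimate.
    """
    for _ in range(iterations):
        probs = [1.0 / (1.0 + math.exp(-(theta - b))) for b in difficulties]
        gradient = sum(x - p for x, p in zip(responses, probs))  # first derivative of log-likelihood
        information = sum(p * (1.0 - p) for p in probs)           # minus the second derivative
        step = gradient / information
        theta += step
        if abs(step) < tol:
            break
    return theta

# Two hypothetical students, each with 2 of 3 correct, but on different item sets.
easy_items = [-2.0, -1.5, -1.0]
hard_items = [1.0, 1.5, 2.0]
theta_easy = estimate_ability(easy_items, [1, 1, 0])
theta_hard = estimate_ability(hard_items, [1, 1, 0])
print(round(theta_easy, 3), round(theta_hard, 3))
```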

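  8. Item Calibration and Ability Estimation • The ideal case: a Guttman pattern, in which a student who answers a harder item correctly also answers every easier item correctly. (Figure: items grouped as HARD, AVERAGE, and EASY; students ordered from more proficient to less proficient.)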
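The Guttman condition can be checked mechanically. A minimal sketch with hypothetical response patterns: once a student's items are ordered from easiest to hardest, the pattern is a Guttman pattern exactly when no correct answer follows an incorrect one.

```python
def is_guttman_pattern(responses_by_difficulty):
    """True if responses (items ordered easiest to hardest; 1 = correct) form a Guttman pattern,
    i.e. every correct answer comes before every incorrect answer."""
    seen_incorrect = False
    for x in responses_by_difficulty:
        if x == 0:
            seen_incorrect = True
        elif seen_incorrect:
            return False  # a harder item was answered correctly after an easier one was missed
    return True

# Hypothetical students; items ordered EASY -> AVERAGE -> HARD.
students = {
    "more proficient": [1, 1, 1, 1, 0],
    "less proficient": [1, 1, 0, 0, 0],
    "unexpected":      [0, 1, 0, 1, 1],
}
for name, pattern in students.items():
    print(name, sum(pattern), is_guttman_pattern(pattern))
```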

  9. Who is better? • Although both students answered the same number of items correctly, we cannot firmly conclude that they have the same level of proficiency, because Student 4 answered two easy items correctly, whereas Student 6 answered two hard items correctly. (Figure: items grouped as HARD, AVERAGE, and EASY questions.)
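One simple way to quantify why Student 6's pattern is harder to interpret is to count Guttman errors: pairs of items in which an easier item is missed but a harder item is answered correctly. This is a nonparametric person-fit count, not an IRT statistic, and the six-item patterns below are hypothetical, since the slide's actual response data are not reproduced here. Both students have the same raw score, but only Student 6's pattern departs from the ideal ordering.

```python
def guttman_errors(responses_by_difficulty):
    """Count item pairs where an easier item is wrong but a harder item is right.

    Items must be ordered from easiest to hardest; 1 = correct, 0 = incorrect.
    """
    errors = 0
    for i, easier in enumerate(responses_by_difficulty):
        for harder in responses_by_difficulty[i + 1:]:
            if easier == 0 and harder == 1:
                errors += 1
    return errors

# Hypothetical six-item test ordered EASY, EASY, AVERAGE, AVERAGE, HARD, HARD.
student_4 = [1, 1, 0, 0, 0, 0]  # two easy items correct
student_6 = [0, 0, 0, 0, 1, 1]  # two hard items correct
for name, pattern in [("Student 4", student_4), ("Student 6", student_6)]:
    print(name, "raw score:", sum(pattern), "Guttman errors:", guttman_errors(pattern))
```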

  10. Item Characteristic Curve • The item characteristic curve is the basic building block of item response theory; all the other constructs of the theory depend upon this curve. • The shape of the item characteristic curve is related to item difficulty and student proficiency (skill level). (Figure: item characteristic curve plotting the probability of a correct answer against skill level, with the item difficulty marked.)

  11. One-Parameter Item Characteristic Curve • (Figure: the vertical axis is the probability of answering an item correctly; the horizontal axis is student skill level on a standardized scale, with the average marked; the curve shows the relationship between skill level and the probability of a correct answer.)
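The one-parameter (Rasch) form of the curve is conventionally written as below, where θ is the student's skill level and b_i is the difficulty of item i, both on the same standardized (logit) scale. The notation is the standard textbook one rather than something taken from the slides. The worked lines show that b_i is the skill level at which the probability of a correct answer is exactly 0.5, and that a student one unit above an item's difficulty answers it correctly about 73% of the time.

```latex
P_i(\theta) = \frac{e^{\theta - b_i}}{1 + e^{\theta - b_i}} = \frac{1}{1 + e^{-(\theta - b_i)}}

\text{At } \theta = b_i:\quad P_i(b_i) = \frac{1}{1 + e^{0}} = 0.5

\text{Example } (\theta = 1,\ b_i = 0):\quad P_i(1) = \frac{1}{1 + e^{-1}} \approx 0.731
```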

  12. Difficult Item vs. Easy Item • (Figure: item characteristic curve for a difficult item; item characteristic curve for an easy item.)
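The contrast between a difficult and an easy item can be tabulated with the one-parameter curve. A sketch assuming hypothetical difficulties of −1.5 (easy) and +1.5 (hard) on the logit scale: the easy item's curve sits to the left, so at every skill level a student is more likely to answer the easy item correctly, and each curve crosses 0.5 at its own difficulty.

```python
import math

def icc(theta, b):
    """One-parameter item characteristic curve: probability of a correct answer."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

easy_b, hard_b = -1.5, 1.5  # hypothetical difficulties on the logit scale
skill_levels = [-3, -2, -1, 0, 1, 2, 3]

print("skill  easy   hard")
for theta in skill_levels:
    print(f"{theta:>5}  {icc(theta, easy_b):.2f}   {icc(theta, hard_b):.2f}")
```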

  13. Misfit (Optional) • MS (mean square) = chi-square / degrees of freedom • Don’t worry about what “out” means for now.
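The mean-square idea can be sketched for a single item under the Rasch model. Assumptions: student abilities and the item difficulty are already estimated, and all numbers below are hypothetical. The sketch computes the unweighted ("outfit") mean square: each response's squared standardized residual, (x − P)² / [P(1 − P)], averaged over students. Under the model the expected value is about 1; a value well above 1 flags an item whose responses run against the rest of the data.

```python
import math

def outfit_mean_square(abilities, b, responses):
    """Unweighted (outfit) mean square for one Rasch item: the average squared
    standardized residual, i.e. a chi-square divided by the number of responses."""
    total = 0.0
    for theta, x in zip(abilities, responses):
        p = 1.0 / (1.0 + math.exp(-(theta - b)))
        total += (x - p) ** 2 / (p * (1.0 - p))  # squared standardized residual
    return total / len(responses)

# Hypothetical students ordered from least to most able; one item of average difficulty.
abilities = [-2.0, -1.0, 0.0, 1.0, 2.0]
well_behaved = [0, 0, 1, 1, 1]  # more able students succeed, less able students fail
misbehaving = [1, 1, 0, 0, 0]   # the reverse: the pattern contradicts the rest of the test

print(round(outfit_mean_square(abilities, 0.0, well_behaved), 2))
print(round(outfit_mean_square(abilities, 0.0, misbehaving), 2))
```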

  14. Summary • IRT considers student proficiency (skill level) AND item difficulty together. • The item characteristic curve (ICC) gives the probability of answering an item correctly at a given proficiency level. • Problematic (misfitting) items can be identified by the mean square (chi-square/df): in a misbehaving item, the response pattern does not correspond to the pattern of the rest of the test.

  15. Think-aloud protocol: Qualitative Item Analysis • Also known as a mental protocol • User interface design: What do users think during the process? • Diagnostic tool: What are the learners thinking? What are their misconceptions? • Instructional design and assessment design: How do experts solve the problem? • Assessment tool: Does the learner really know what he or she is doing?

  16. User interface design and ergonomics • A method of collecting data for usability testing of a user interface • Introduced by Clayton Lewis at IBM • The user verbalizes what he or she is thinking while performing the task

  17. Diagnosis • The goal of having participants think aloud is to reveal what information is kept in short-term memory during problem solving. • Short-term memory corresponds to the "voice" one is aware of during self-talk. • Because most cognitive theories hold that short-term memory is the working space for problem solving and for holding information retrieved from long-term memory, think-aloud protocols have been suggested as an essential data-collection device for understanding cognitive processes in diagnosis.

  18. Evidence-centered design • Introduced by Robert Mislevy at ETS (now at the University of Maryland, College Park) • We need to know the knowledge experts use to solve real problems • We can interview them or ask them to do a think-aloud protocol • What evidence should be shown to prove that the examinee has the expertise?

  19. Exercise 1 • Construct to be assessed: post-processing skills in photography • Method: think-aloud protocol. The best photographer at APU will verbalize the process of enhancing photos in Adobe Photoshop. The item authors (you) will observe the process and write test items that provide evidence of competence in post-processing. • Item format: multiple-choice, T/F, or short essay

  20. Exercise 2 • Select a task or a software application that you are familiar with. • Perform a think-aloud protocol; do not lecture. Lecturing is presenting information to an audience, whereas thinking aloud is talking to yourself while doing the job. • Write three test items that provide evidence of mastering the task or the software package. • Item format: multiple-choice or short essay (use a concept map if possible)

  21. Exercise 2 (continued) • If you do not want to do a software demo, you can choose a non-computer-related activity, e.g., CPR.
