
The Design of Statistical Specifications for a Test



  1. The Design of Statistical Specifications for a Test • Mark D. Reckase, Michigan State University

  2. Procedures for Test Design • Test design has been considered to be a subjective, artistic endeavor. • But, with the development of item response theory, test design has become more scientific. • Lord suggested that tests be constructed to match a target information function. • Very sophisticated methods have been developed to select items to match target information functions. • Little work has been done on the design of test information functions.

  3. Purposes for this Paper • Present a methodology for designing target information functions or item difficulty distributions for a test. • Demonstrate the methodology for several common testing situations: • Measure all examinees from a normal distribution of the trait to a desired level of precision. • Measure a range of the trait to a desired level of precision.

  4. Basic Concepts • If an examinee's θ is known, the optimal test contains the set of items that provides the required information at that θ. • Information from an item covers a range of θ, so items that are optimal for one person supply some information for other persons. • The general approach is to randomly select persons from the target population, then select the optimal items for each person. • For each additional person, select only the additional items needed to reach the information target.

  5. Example • Suppose the target examinee population is N(0,1). • Randomly select an examinee. • Information of 10 is equivalent to a reliability of .90 (with unit trait variance, the error variance is 1/10 = .10, so reliability = 1 - .10 = .90). • Assuming the Rasch model, select items with b = θ until information of 10 is reached. • Randomly select additional examinees. • Select items for those examinees until a test length of 50 is reached.

  6. Results -- Comments • Results are from one random sample of 6 examinees. • 14 items were needed for the first examinee. • Other examinees need fewer additional items because their information functions overlap. • The effects of sampling variation need to be considered.

  7. Information from One Item

  8. Results – Selected Items

  9. Results – Information Function

  10. The Complete Process • Create the ideal set of items for one sample. • Replicate the process many times (500 replications seem to work well). • Average the information functions across samples. • Average the number of items in .2-unit bins to determine the difficulty spread. • Check the specifications against the target.

  11. Conditions for Rasch-based Design • N(0,1) trait distribution • 50-item test • Rasch model • 500 replications • Minimum information 10

  12. Average Test Information

  13. Item Difficulty Distribution

  14. Match of Test to Target

  15. Comments • The minimum information requirement is met from -2.3 to 2.3. • Information accumulates to higher values in the middle of the distribution. • The difficulty distribution is essentially rectangular. • Test information exceeds the target because only whole items can be added, so item counts are rounded upward in many cases.

  16. Process Can Help Select Test Length • Run the process for different test lengths. • Forcing the first examinee to be at 0.0 can also be considered. • What test length allows the criteria to be met?

  17. Effect of Test Length

  18. Results – Test Length • With increased test length, the information function widens and increases in height. • A test length of 15 is too short to meet the requirements unless it is focused at 0.0. • Forcing the first examinee to be at 0.0 makes the information function narrower and more peaked. • 75 items is the maximum number of items that makes sense for the criteria specified here.

  19. Test Designed to Measure with Precision over a Range • Brian Junker suggested the following procedure: • Select the range. • Pick items at the extremes of the range. • Fill in with items between the extremes to yield a flat information function. • Continue until the information criterion is reached over the entire range.

  20. Increment of Information with Each Added Item

  21. Target Information Function for Range from -2 to 2

  22. Items that Match Target

  23. Specifications Counter to Traditional Specifications • Most tests have normal distributions of difficulties. • These results seem very odd compared to traditional results. • Need to investigate further. • What is distribution of scores? • What is distribution of p-values?

  24. Number-Correct Score Distribution

  25. P-value Distribution

  26. Odd Results • The distribution of scores is near normal. • The distribution of p-values mirrors the b-parameter distribution. • The most extreme item difficulties (p-values) are .08 and .92. • It is surprising that these items yield a normal distribution of scores. • Look at the test characteristic curve.
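
The p-value of a Rasch item is its expected proportion correct over the examinee population, p = ∫ P(θ; b) φ(θ) dθ for θ ~ N(0,1). The sketch below evaluates that integral with the trapezoid rule, assuming D = 1.7. Because the p-value decreases as b increases, each difficulty maps to a single p-value, which is consistent with the p-value distribution mirroring the b-parameter distribution. The function names and integration settings are hypothetical.

```python
import math

D = 1.7  # assumed scaling constant


def p_correct(theta, b):
    """Rasch probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-D * (theta - b)))


def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def p_value(b, lim=6.0, n=2001):
    """Expected proportion correct for difficulty b when theta ~ N(0, 1),
    using the trapezoid rule on [-lim, lim]."""
    h = 2.0 * lim / (n - 1)
    total = 0.0
    for k in range(n):
        theta = -lim + k * h
        weight = 0.5 if k in (0, n - 1) else 1.0  # trapezoid end weights
        total += weight * p_correct(theta, b) * normal_pdf(theta)
    return total * h


for b in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"b = {b:+.1f}  p-value = {p_value(b):.3f}")
```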

  27. Test Characteristic Curve

  28. Test Characteristic Curve • The test characteristic curve is virtually linear from -2 to 2. • When the curve is linear, the form of the θ distribution is carried over to the estimated true-score scale. • In this case, since the θ distribution was normal, so is the number-correct score distribution.
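
The linearity argument can be checked numerically. If T(θ) ≈ α + βθ over the region where most examinees lie, and θ ~ N(μ, σ²), then the expected number-correct score is approximately N(α + βμ, β²σ²), so a normal θ distribution stays normal on the score scale. The sketch below fits a least-squares line to the test characteristic curve on [-2, 2] and reports the correlation. The evenly spaced difficulties are a hypothetical rectangular design, not the item set from the slides, and D = 1.7 is assumed.

```python
import math

D = 1.7  # assumed scaling constant


def p_correct(theta, b):
    """Rasch probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-D * (theta - b)))


def tcc(theta, difficulties):
    """Test characteristic curve: expected number-correct score at theta."""
    return sum(p_correct(theta, b) for b in difficulties)


def linearity(difficulties, lo=-2.0, hi=2.0, n=81):
    """Least-squares line through the TCC on [lo, hi] plus the Pearson
    correlation between theta and the TCC (near 1 means nearly linear)."""
    thetas = [lo + (hi - lo) * k / (n - 1) for k in range(n)]
    scores = [tcc(t, difficulties) for t in thetas]
    mt, ms = sum(thetas) / n, sum(scores) / n
    sxy = sum((t - mt) * (s - ms) for t, s in zip(thetas, scores))
    sxx = sum((t - mt) ** 2 for t in thetas)
    syy = sum((s - ms) ** 2 for s in scores)
    slope = sxy / sxx
    return slope, ms - slope * mt, sxy / math.sqrt(sxx * syy)


# Hypothetical rectangular spread: 41 items evenly spaced from -2 to 2.
rect = [-2.0 + 0.1 * k for k in range(41)]
slope, intercept, r = linearity(rect)
print(f"T(theta) ~ {intercept:.2f} + {slope:.2f} * theta, r = {r:.4f}")
```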

  29. Test Information Function for Test with c = .16

  30. Items that Match Target
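
A lower asymptote c = .16 implies the three-parameter logistic (3PL) model rather than the Rasch model. The sketch below uses the standard 3PL item information formula and the closed-form location of its peak; the discrimination a = 1, difficulty b = 0, and D = 1.7 are hypothetical values chosen only for illustration. With c > 0 an item's peak information is lower and sits slightly above b, so meeting the same target would generally take more items.

```python
import math

D = 1.7  # assumed scaling constant


def p3(theta, a, b, c):
    """Three-parameter logistic probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def info3(theta, a, b, c):
    """3PL item information: D^2 a^2 (Q / P) ((P - c) / (1 - c))^2."""
    p = p3(theta, a, b, c)
    return (D * a) ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2


def theta_of_max_info(a, b, c):
    """Closed-form location of the 3PL information peak (equals b when c = 0)."""
    return b + math.log((1.0 + math.sqrt(1.0 + 8.0 * c)) / 2.0) / (D * a)


a, b = 1.0, 0.0  # hypothetical discrimination and difficulty
for c in (0.0, 0.16):
    peak = theta_of_max_info(a, b, c)
    print(f"c = {c:.2f}: peak at theta = {peak:.3f}, max information = {info3(peak, a, b, c):.3f}")
```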

  31. Conclusions • A process has been developed for designing target information functions and item difficulty distributions for tests. • The process suggests that either a rectangular or a U-shaped distribution is appropriate if the goal is to measure with equal precision over a range. • The number of items needed is related to the range of the scale that needs to be measured. • The U-shaped item difficulty distribution works best if the goal is to recover the underlying θ distribution. • The results are quite different from those of traditional test development procedures.
