The Design of Statistical Specifications for a Test

The Design of Statistical Specifications for a Test Mark D. Reckase Michigan State University

Procedures for Test Design • Test design has been considered to be a subjective, artistic endeavor. • But, with the development of item response theory, test design has become more scientific. • Lord suggested that tests be constructed to match a target information function. • Very sophisticated methods have been developed to select items to match target information functions. • Little work has been done on the design of test information functions.

Purposes for this Paper • Present methodology for designing target information functions or item difficulty distributions for a test. • Demonstrate that methodology for several common testing situations. • Measure all examinees from a normal distribution of the trait to a desired level of precision. • Measure a range of a trait to a desired level of precision.

Basic Concepts • If an examinee's θ is known, the optimal test should contain a set of items that provide the required information at that θ. • Information from an item covers a range of θ, so items that are optimal for one person supply some information for other persons. • The general approach is to randomly select persons from the target population, then select optimal items for each person. • For each additional person, select only the additional items needed to reach the information target.

Example • Suppose the target examinee population is N(0,1). • Randomly select an examinee. • Information equivalent to reliability .90 is 10. • Select items until information 10 is reached, assuming the Rasch model (b = θ). • Randomly select additional examinees. • Select items for those examinees until a test length of 50 is reached.
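The arithmetic behind this example can be checked with a short sketch. It assumes reliability is defined as 1 minus the ratio of error variance to trait variance, with error variance approximated by 1/information, and it assumes a logistic scaling constant D = 1.7. Neither the constant nor the function names below appear in the slides; D = 1.7 is chosen because it gives 14 items for the first examinee, which is the count reported in the results.

```python
import math

D = 1.7  # assumed scaling constant; not stated in the slides


def rasch_prob(theta, b, d=D):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-d * (theta - b)))


def rasch_info(theta, b, d=D):
    """Item information at theta; it peaks at d**2 / 4 when b == theta."""
    p = rasch_prob(theta, b, d)
    return d * d * p * (1.0 - p)


def info_for_reliability(rho, trait_var=1.0):
    """Information needed so that 1 - (1 / info) / trait_var equals rho."""
    return 1.0 / (trait_var * (1.0 - rho))


target = info_for_reliability(0.90)      # about 10 for an N(0,1) trait
per_item = rasch_info(0.0, 0.0)          # information of one item with b = theta
n_items = math.ceil(target / per_item)   # items needed for the first examinee
print(target, per_item, n_items)
```

Under these assumptions each optimally placed item contributes about 0.72 units of information, so 14 items are needed to reach 10.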

Results -- Comments • Results are from one sample of 6 examinees randomly selected. • 14 items needed for first examinee. • Other examinees need fewer additional items because of overlap of information functions. • Need to consider the effects of sampling variation.

Information from One Item

Results – Selected Items

Results – Information Function

The Complete Process • Create an ideal set of items for a sample. • Replicate the process many times (500 seems to work well). • Average the information functions from the samples. • Average the number of items in .2-unit bins to determine the difficulty spread. • Check the specifications against the target.

Conditions for Rasch-based Design • N(0,1) trait distribution • 50 item test • Rasch model • 500 replications • Minimum information 10

Average Test Information

Item Difficulty Distribution

Match of Test to Target

Comments • Minimum information requirement met from -2.3 to 2.3. • Information accumulates to higher values in the middle of the distribution. • Difficulty distribution is essentially rectangular. • Test information exceeds the target because item numbers are rounded upward in many cases.

Process Can Help Select Test Length • Run process for different test lengths. • Also can consider forcing selection of first examinee at 0.0. • What test length allows criteria to be met?

Effect of Test Length

Results – Test Length • With increased test length, the information function widens and increases in height. • A test length of 15 is too short to meet requirements unless it is focused at 0.0. • Forcing the first examinee at 0.0 makes the information function narrower and more peaked. • 75 items is the maximum number of items that makes sense for the criteria specified here.

Test Designed to Measure with Precision over a Range • Brian Junker suggested the following procedure. • Select range • Pick items at extremes of range • Fill in with items between extremes to yield flat information function • Continue until information criterion is reached over entire range
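One possible reading of this procedure is a greedy fill, sketched below. The slides do not say how the in-between items are chosen, so the rule used here (add an item at the grid point where information is currently lowest) is an assumption, as are the scaling constant D = 1.7, the 0.1 grid step, the item cap, and the function names.

```python
import math

D = 1.7  # assumed scaling constant (not stated in the slides)


def item_info(theta, b):
    """Rasch item information at theta for an item of difficulty b."""
    p = 1.0 / (1.0 + math.exp(-D * (theta - b)))
    return D * D * p * (1.0 - p)


def fill_range(lo=-2.0, hi=2.0, target=10.0, step=0.1, max_items=500):
    """Start with items at both extremes, then fill in where information is lowest."""
    n = round((hi - lo) / step)
    grid = [lo + i * step for i in range(n + 1)]
    bs = [lo, hi]  # items at the extremes of the range
    info = [item_info(t, lo) + item_info(t, hi) for t in grid]
    while min(info) < target and len(bs) < max_items:
        b = grid[info.index(min(info))]  # weakest point on the grid (assumed rule)
        bs.append(b)
        info = [v + item_info(t, b) for v, t in zip(info, grid)]
    return sorted(bs), grid, info


difficulties, grid, info = fill_range()
print(len(difficulties), min(info), max(info))
```

The returned difficulties can be tabulated in bins to inspect the shape of the difficulty distribution, and `info` shows how flat the resulting information function is over the range.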

Increment of Information with Each Added Item

Target Information Function for Range from -2 to 2

Items that Match Target

Specifications Counter to Traditional Specifications • Most tests have normal distributions of difficulties. • These results seem very odd compared to traditional results. • Need to investigate further. • What is distribution of scores? • What is distribution of p-values?

Number-Correct Score Distribution

P-value Distribution

Odd Results • Distribution of scores is near normal. • Distribution of p-values mirrors b-parameter distribution. • Extreme item difficulties are .08 and .92. • Surprising that these items yield normal distribution of scores. • Look at test characteristic curve.

Test Characteristic Curve

Test Characteristic Curve • The test characteristic curve is virtually linear from -2 to 2. • When the curve is linear, the form of the distribution of θ is mapped to the estimated true score scale. • In this case, since the θ distribution was normal, so is the number-correct score distribution.
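The reasoning can be written out as a short worked statement. The symbols τ (the test characteristic curve), α and β (intercept and slope of its linear approximation), and μ and σ² (mean and variance of θ) are notation introduced here for illustration. The result is only approximate, because the linearity holds on the range from -2 to 2 and a small share of a normal θ distribution falls outside it.

```latex
\tau(\theta) = \sum_{i=1}^{n} P_i(\theta) \approx \alpha + \beta\,\theta
\quad \text{for } -2 \le \theta \le 2,
\qquad
\theta \sim N(\mu, \sigma^2)
\;\Longrightarrow\;
\tau(\theta) \;\text{approximately}\; N\!\left(\alpha + \beta\mu,\; \beta^2\sigma^2\right)
```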

Test Information Function for Test with c = .16

Items that Match Target

Conclusions • A process has been developed for designing target information functions and item difficulty distributions for tests. • The process suggests that either a rectangular or a U-shaped distribution is appropriate if it is desired to measure with equal precision over a range. • The number of items needed is related to the range of the scale that needs to be measured. • The U-shaped item difficulty distribution works best if it is desired to recover the underlying θ distribution. • The results are quite different from those of traditional test development procedures.
