Optimal Test Design With Rule-Based Item Generation • Wim J. van der Linden • Cees A.W. Glas • Hanneke Geerlings
Item generation and automated test assembly • Hierarchical item response theory • A second-level distribution for the item parameters of each family • Family calibration • A new item generated from a family does not need to be calibrated • Only family parameters are known, so a “family-information function” replaces the “item-information function” • Family information depends on the degree of within-family item-parameter variability across all families.
Three different cases of item generation • Test assembly from a pool of pregenerated individually calibrated items. • Baseline • Test generation on the fly from pools with calibrated item families. • Test generation on the fly using calibrated radicals that define the item families.
Rule-Based Item Generation • Radicals • Systematic use can help to ensure the content validity of the items • Incidentals • Do not have any systematic effect on item difficulty • Item cloning • Items with identical radicals but different incidentals
Are the features radicals or incidentals? • Coding existing items on the presence or absence of the specified radicals and performing • Exploratory • Developers define their radicals and incidentals, and model checking is used. • Confirmatory
Modeling Rule-Based Item Generation • Between family variation – radicals • Within family variation – incidentals • Multilevel response model • Item cloning model (Glas & van der Linden, 2003) • Level 1 • Level 2 • Common Σ : all families are generated by the same set of incidentals, and no interaction between radicals and incidentals
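The Level 1 and Level 2 formulas did not survive on this slide. As a rough orientation only, a two-level model of this kind is often written as below. The 3PL-type response function, the link function F (normal ogive or logistic), the parameterization a(θ − b), and the transformed parameter vector ξ are all assumptions, not a reconstruction of the authors' exact equations.

```latex
% Level 1 (assumed form): response of person p to item i
P(U_{pi} = 1 \mid \theta_p) = c_i + (1 - c_i)\, F\bigl(a_i(\theta_p - b_i)\bigr)

% Level 2 (assumed form): item parameters of item i in family f
\xi_i = (\log a_i,\; b_i,\; \operatorname{logit} c_i)^{\top} \sim N(\mu_f, \Sigma_f), \qquad i \in f

% Common covariance matrix across families
\Sigma_f = \Sigma \quad \text{for all } f
```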
Modeling Rule-Based Item Generation • The mean family difficulty as a weighted sum of the effects of radicals • Linear Item Cloning Model (LICM) • with Σf : LICM-F • with Σ: LICM-C
Family-Information Function • Item-information • Family-information • Expected information about θ in the response to a random item from family f.
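The weighted-sum idea can be sketched in a few lines of Python. The β values are the ones listed on the simulation-study slide. Treating the first element as an intercept and the remaining five as radical effects is an assumption, as are the function name and the 0/1 coding of radicals.

```python
import numpy as np

# Radical effects from the simulation-study slide.
# Assumption: beta[0] is an intercept, beta[1:] are the five radical effects.
beta = np.array([-2.0, 1.0, 0.3, 0.9, 0.6, 1.2])


def family_mean_difficulty(radicals, beta):
    """Mean family difficulty as a weighted sum of radical effects.

    radicals: 0/1 indicators (hypothetical coding) for the presence of each radical.
    """
    radicals = np.asarray(radicals, dtype=float)
    if radicals.shape[0] != beta.shape[0] - 1:
        raise ValueError("one indicator per radical effect is required")
    return beta[0] + radicals @ beta[1:]


# A family built from the first and third radical only.
mu_b = family_mean_difficulty([1, 0, 1, 0, 0], beta)
```

Under these assumptions, two families with the same radical pattern share the same mean difficulty, and only the incidentals make their items differ.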
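The formulas on this slide are missing, so the following is only a sketch of how family information could be approximated numerically. It assumes a logistic 3PL response function, a multivariate normal family distribution on the (log a, b, logit c) scale, and Monte Carlo integration. The family response function is taken to be the item response function averaged over the family distribution, and the information is computed from that averaged curve. All function names are hypothetical.

```python
import numpy as np


def p3pl(theta, a, b, c):
    """Logistic 3PL response probability (assumed model)."""
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))


def dp3pl(theta, a, b, c):
    """Derivative of the 3PL response probability with respect to theta."""
    logistic = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return (1.0 - c) * a * logistic * (1.0 - logistic)


def item_information(theta, a, b, c):
    """Fisher information about theta in the response to one known item."""
    p = p3pl(theta, a, b, c)
    return dp3pl(theta, a, b, c) ** 2 / (p * (1.0 - p))


def family_information(theta, mean, cov, n_draws=20000, seed=0):
    """Monte Carlo approximation of the information in a random family item.

    mean, cov: family mean and covariance on the (log a, b, logit c) scale.
    """
    rng = np.random.default_rng(seed)
    xi = rng.multivariate_normal(mean, cov, size=n_draws)
    a = np.exp(xi[:, 0])
    b = xi[:, 1]
    c = 1.0 / (1.0 + np.exp(-xi[:, 2]))
    p_f = p3pl(theta, a, b, c).mean()      # family response function
    dp_f = dp3pl(theta, a, b, c).mean()    # its derivative in theta
    return dp_f ** 2 / (p_f * (1.0 - p_f))


# Family with mean a = 1, b = 0, c = 0.2 and variability in b only.
mean = np.array([0.0, 0.0, np.log(0.25)])
cov = np.diag([0.0, 0.5, 0.0])
info_family = family_information(0.0, mean, cov)
info_item = item_information(0.0, 1.0, 0.0, 0.2)
```

With a zero covariance matrix the two quantities coincide. With variance in b, the family value at θ = μb falls below the item information evaluated at the family means.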
Three Cases of Automated Test Design • Test Assembly From a Pregenerated Item Pool
Three Cases of Automated Test Design • Test Generation on the Fly From Calibrated Families
Three Cases of Automated Test Design • Test Generation on the Fly Using Calibrated Radicals Only
Effect of Within-Family Item-Parameter Variability on Family Information • σ² = 0 means all item parameters are equal to their respective family means. • ρ = .0, .5, or -.5
σ²_b: decreases the family information • σ²_a: decreases the family information at abilities away from the optimal difficulty value • σ²_c: no effect
σ²_a = 0.05, σ²_b = 0.5, σ²_c = 0.00 • ρ_ab = .5: • For θ > μ_b, family information increases; for θ < μ_b, family information decreases, because items with larger b tend to have larger a, which yields more information
σ²_a = 0.00, σ²_b = 0.5, σ²_c = 0.2 • ρ_bc = .5: small shift to the left • Large guessing parameters result in less information
σ²_a = 0.05, σ²_b = 0.00, σ²_c = 0.2 • ρ_ac = .5: counterbalancing effect, because a and c increase together • ρ_ac = -.5: higher family information
If σ² ≠ 0 (ρ ≠ .0), using the item information leads to an overestimate of the information about θ in a random item from the family.
Simulation Study • Illustrate Case 1 and Case 2 • The use of family information instead of item information • Effects of test assembly based on knowledge of item families only.
β = (-2.0, 1.0, 0.3, 0.9, 0.6, 1.2) • μ_a ~ 0.8(0.01)1.7, μ_c ~ 0.1(0.01)0.2 • Common Σ; within-between (W-B) variance ratio: 0.01, 0.05, 0.1, and 0.2 • 10 or 20 items were sampled from the family distribution
M1: without any constraints on radicals • M2: with constraints on radicals • θp = -1, 0, 1; Rp = (1, 1, 1) or (1, 2, 1) • Number of families l = 10 or 20 (one item in each family) • In M2, each of the five radicals occurs 5 to 6 times (l = 10) or 10 to 12 times (l = 20) • Case 1 (pregenerated item pool, PIP) vs. Case 2 (calibrated families, CF): function (8) vs. (17)
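Relative targets Rp at a few ability points θp are commonly handled with a maximin criterion: maximize a common factor y such that the test information at each θp is at least Rp·y. The slides do not show the optimization model itself, so the sketch below assumes that criterion. It also replaces the mixed-integer solver a real application would need with brute-force enumeration, which is only feasible for tiny pools. The function name and the information values are made up for illustration.

```python
from itertools import combinations

import numpy as np


def maximin_select(info, targets, test_length):
    """Pick `test_length` rows of `info` that maximize min_p (sum of info at theta_p) / R_p.

    info: array of shape (n_items, n_theta), item or family information at each theta_p.
    targets: relative target weights R_p, one per theta_p.
    """
    info = np.asarray(info, dtype=float)
    targets = np.asarray(targets, dtype=float)
    best_y, best_set = -np.inf, None
    for subset in combinations(range(info.shape[0]), test_length):
        # Smallest ratio of achieved information to relative target over all theta_p.
        y = np.min(info[list(subset)].sum(axis=0) / targets)
        if y > best_y:
            best_y, best_set = y, subset
    return best_set, best_y


# Hypothetical information values at theta_p = -1, 0, 1 for four items or families.
info = [
    [0.5, 0.1, 0.0],
    [0.0, 0.1, 0.5],
    [0.2, 0.3, 0.2],
    [0.1, 0.6, 0.1],
]
selected_uniform, y_uniform = maximin_select(info, (1, 1, 1), test_length=2)
selected_peaked, y_peaked = maximin_select(info, (1, 2, 1), test_length=2)
```

For Case 1 the rows would hold the information of individually calibrated items. For Case 2 they would hold family information, with one row per family. Constraints on radicals (M2) would further restrict which subsets are admissible.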
10 families and 10 items per family in pool, Rp = (1, 1, 1) • PIP: as the W-B ratio increases, information increases because the most informative items are selected • Family information decreases because of the larger uncertainty • Family information < true item information (large W-B ratio) • M2 (constrained radicals) < M1 (unconstrained radicals)
10 families and 20 items per family in pool, Rp = (1, 1, 1) • Doubling the pool per family gives a slight increase
20 families and 10 items per family in pool, Rp = (1, 1, 1) • The difference between PIP and CF increases • The shape is the same
Figure 4 • Rp = (1, 2, 1) • The values are smaller than for the uniform target
Figure 5. uniform • l = 10, lf = 10, M1 • As the W-B ratio increases: • CF decreases • PIP tends toward the target
Discussion • Model fit • Exposure control • Item uncertainty: capitalization on chance • Small calibration samples • Variability of the true item parameters • Bank size to test length ratio • Content constraints can mitigate it
Questions • Radicals and incidentals are a bit abstract • Certain combinations of radicals and incidentals may result in an invalid item. How do we know that an item is invalid? • "If the cognitive processes involved in solving the test items are known", which is nearly impossible
Questions • μb=0.258 (Wolfe, 1981)