Optimal Test Design With Rule-Based Item Generation

Optimal Test Design With Rule-Based Item Generation • Wim J. van der Linden • Cees A.W. Glas • Hanneke Geerlings

Item generation and automated test assembly • Hierarchical item response theory • A second-level distribution for the item parameters of each family • Family calibration • A new item generated from a family does not need to be calibrated • When only the family parameters are known, the “family-information function” replaces the “item-information function” • Depends on the degree of within-family item-parameter variability across all families.

Three different cases of item generation • Test assembly from a pool of pregenerated individually calibrated items. • Baseline • Test generation on the fly from pools with calibrated item families. • Test generation on the fly using calibrated radicals that define the item families.

Rule-Based Item Generation • Radicals • Systematic use can help to ensure the content validity of the items • Incidentals • Do not have any systematic effect on item difficulty • Item cloning • Items with identical radicals but different incidentals

Are the features radicals or incidentals? • Coding existing items on the presence or absence of the specified radicals and performing • Exploratory • Developers define their radicals and incidentals. Model checking is used. • Confirmatory

Modeling Rule-Based Item Generation • Between family variation – radicals • Within family variation – incidentals • Multilevel response model • Item cloning model (Glas & van der Linden, 2003) • Level 1 • Level 2 • Common Σ : all families are generated by the same set of incidentals, and no interaction between radicals and incidentals
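The two levels can be illustrated with a minimal sketch. It assumes a 3PL logistic response function at Level 1 and, at Level 2, a multivariate normal distribution over transformed item parameters (log a, b, logit c). Both choices, as well as the function names `sample_family_items` and `p_correct`, are assumptions made for illustration and are not taken from the slides.

```python
import numpy as np

def sample_family_items(mu, sigma, n_items, rng):
    """Level 2: draw (a, b, c) for n_items clones of one family.

    mu and sigma are the family mean vector and covariance matrix on the
    assumed transformed scale (log a, b, logit c).
    """
    xi = rng.multivariate_normal(mu, sigma, size=n_items)
    a = np.exp(xi[:, 0])                  # log a -> a > 0
    b = xi[:, 1]                          # difficulty stays on its own scale
    c = 1.0 / (1.0 + np.exp(-xi[:, 2]))   # logit c -> 0 < c < 1
    return a, b, c

def p_correct(theta, a, b, c):
    """Level 1: 3PL probability of a correct response (assumed form)."""
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

rng = np.random.default_rng(1)
mu_f = np.array([0.0, 0.5, -1.5])         # hypothetical family means
sigma_f = np.diag([0.05, 0.5, 0.2])       # hypothetical within-family covariance
a, b, c = sample_family_items(mu_f, sigma_f, n_items=10, rng=rng)
probs = p_correct(0.0, a, b, c)
```

With a common Σ, the same `sigma_f` would be passed for every family; only `mu_f` would change between families.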

Modeling Rule-Based Item Generation • The mean family difficulty as a weighted sum of the effects of radicals • Linear Item Cloning Model (LICM) • with Σf : LICM-F • with Σ: LICM-C
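The weighted sum can be sketched as a matrix product between a family-by-radical design matrix and a vector of radical effects. The design matrix, the effect values, and the function name `family_mean_difficulty` below are hypothetical and serve only to show the arithmetic.

```python
import numpy as np

def family_mean_difficulty(design, beta):
    """Mean difficulty of each family as a weighted sum of radical effects.

    design[f, r] is the weight (here 0 or 1) of radical r in family f;
    beta[r] is the effect of radical r on difficulty.
    """
    return design @ beta

# Hypothetical example: three families built from three radicals.
design = np.array([[1, 0, 0],
                   [1, 1, 0],
                   [0, 1, 1]])
beta = np.array([1.0, 0.3, 0.9])
mu_b = family_mean_difficulty(design, beta)   # one mean difficulty per family
```

Here the second family combines the first two radicals, so its mean difficulty is 1.0 + 0.3 = 1.3.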

Family-Information Function • Item-information • Family-information • Expected information about θ in the response to a random item from family f.
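One way to read "expected information in the response to a random item" is as the average item information over items drawn from the family distribution. The sketch below approximates that average by Monte Carlo. It assumes this reading, a 3PL logistic model, and a normal family distribution on the transformed scale (log a, b, logit c); the function names are hypothetical and the slides' own formula may differ.

```python
import numpy as np

def item_information(theta, a, b, c):
    """Fisher information of a 3PL item at ability theta."""
    p = c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))
    return a**2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c))**2

def family_information(theta, mu, sigma, n_draws=5000, seed=0):
    """Monte Carlo average of item information over one family.

    mu, sigma: family mean and covariance on the assumed scale
    (log a, b, logit c).
    """
    rng = np.random.default_rng(seed)
    xi = rng.multivariate_normal(mu, sigma, size=n_draws)
    a = np.exp(xi[:, 0])
    b = xi[:, 1]
    c = 1.0 / (1.0 + np.exp(-xi[:, 2]))
    return item_information(theta, a, b, c).mean()

mu_f = np.array([0.0, 0.0, -20.0])            # a = 1, b = 0, c close to 0
no_var = family_information(0.0, mu_f, np.zeros((3, 3)))
with_b_var = family_information(0.0, mu_f, np.diag([0.0, 0.5, 0.0]))
```

In this example `no_var` matches the information of the single item at the family means, while `with_b_var` is lower, because spreading the difficulties moves most sampled items away from the point where they are most informative.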

Three Cases of Automated Test Design • Test Assembly From a Pregenerated Item Pool

Three Cases of Automated Test Design • Test Generation on the Fly From Calibrated Families

Three Cases of Automated Test Design • Test Generation on the Fly Using Calibrated Radicals Only

Effect of Within-Family Item-Parameter Variability on Family Information • σ² = 0 means all item parameters are equal to their respective family means. • ρ = .0, .5 or -.5

σ²_b: decrease in the family information • σ²_a: decrease at abilities away from the optimal difficulty value • σ²_c: no effect

σ²_a = 0.05, σ²_b = 0.5, σ²_c = 0.00 • ρ_ab = .5: • For θ > μ_b, family information increases; for θ < μ_b, family information decreases, because a↑ together with b↑ gives information↑

σ²_a = 0.00, σ²_b = 0.5, σ²_c = 0.2 • ρ_bc = .5: small shift to the left • Larger guessing parameters result in less information

σ²_a = 0.05, σ²_b = 0.00, σ²_c = 0.2 • ρ_ac = .5: counterbalancing effect, a↑ c↑ • ρ_ac = -.5: higher family information

If σ² ≠ 0 (ρ ≠ .0), the use of item information leads to overestimation of the information about θ in a random item from the family.

Simulation Study • Illustrate Case 1 and Case 2 • The use of family information instead of item information • Effects of test assembly based on knowledge of item families only.

β=(-2.0, 1.0, 0.3, 0.9, 0.6, 1.2) • μa~0.8(0.01)1.7, μc~0.1(0.01)0.2 • Common Σ, W-B ratio: 0.01, 0.05, 0.1 and 0.2 • 10 or 20 items were sampled from the family distribution

M1: without any constraints on radicals • M2: with constraints on radicals • θp = -1, 0, 1. Rp = (1, 1, 1) or (1, 2, 1) • Number of families l = 10 or 20 (one item in each family) • In M2, each of the five radicals occurs 5~6 times (l = 10) or 10~12 times (l = 20) • Case 1 (PIP) vs. Case 2 (CF): function (8) vs. (17)
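The role of θp and Rp can be illustrated with a maximin selection rule: choose the items (or families) that maximize the smallest ratio of test information to relative target over the points θp. The sketch below assumes this maximin reading and uses exhaustive enumeration, which is only feasible for very small pools and is a stand-in for whatever solver the study used. The function name `maximin_select` and the information values are hypothetical, and the radical constraints of M2 are not modeled.

```python
from itertools import combinations
import numpy as np

def maximin_select(info, targets, n_select):
    """Pick n_select rows of info maximizing min_p (sum_i info[i, p] / targets[p]).

    info[i, p]: information of item (or family) i at ability point theta_p.
    targets[p]: relative target R_p for the test information at theta_p.
    """
    best, best_y = None, -np.inf
    for subset in combinations(range(info.shape[0]), n_select):
        y = (info[list(subset)].sum(axis=0) / targets).min()
        if y > best_y:
            best, best_y = subset, y
    return best, best_y

# Hypothetical information values at two ability points.
info = np.array([[1.0, 0.0],
                 [0.0, 1.0],
                 [0.6, 0.6]])
uniform = maximin_select(info, np.array([1.0, 1.0]), n_select=2)
peaked = maximin_select(info, np.array([1.0, 2.0]), n_select=2)
```

Under the uniform target the first two rows are selected; doubling the target at the second point shifts the selection to the last two rows. For Case 1 the rows of `info` would hold item information, for Case 2 family information.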

10 families and 10 items per family in pool, Rp = (1, 1, 1) • PIP: W-B↑, information↑, the most informative items are selected • Family information↓, large uncertainty • FamInf < True Item Inf (large W-B) • M2 (constrained radicals) < M1 (unconstrained radicals)

10 families and 20 items per family in pool, Rp = (1, 1, 1) • Doubling the pool size per family gives a slight increase

20 families and 10 items per family in pool, Rp = (1, 1, 1) • The difference between PIP and CF increases • The shape is the same

Figure 4 • Rp = (1, 2, 1) • The values are smaller than for the uniform target

Figure 5. Uniform target • l = 10, lf = 10, M1 • As W-B increases: • CF decreases • PIP tends toward the target

Figure 6. Rp= (1, 2, 1)

Discussion • Model fit • Exposure control • Item uncertainty: capitalization on chance • Small calibration samples • Variability of the true item parameters • Bank size to test length ratio • Content constraints can mitigate it

Questions • Radicals and incidentals are a bit abstract • Certain combinations of radicals and incidentals may result in an invalid item. How do we know that an item is invalid? • "If the cognitive processes involved in solving the test items are known", which is nearly impossible

Questions • μb=0.258 (Wolfe, 1981)
