An Investigation of Sample Size Splitting on ATFIND and DIMTEST Alan Socha and Christine E. DeMars James Madison University
Overview • Background: Unidimensionality; DIMTEST/ATFIND; Research Questions/Hypotheses • Method: Simulated Parameters; Simulated Conditions • Results: Type I Error; Power • Conclusions: Conclusions & Implications; Limitations & Future Research
Background: Unidimensionality • Unidimensionality is an assumption of one-, two-, and three-parameter IRT models • All of these models estimate a unidimensional latent variable • Unjustified use of these models can result in serious statistical errors • Bias of item parameter and examinee ability estimates • Loss of information
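For reference, the three-parameter logistic (3PL) model is commonly written as below, where a_i, b_i, and c_i are the discrimination, difficulty, and lower-asymptote (guessing) parameters of item i and θ_j is the ability of examinee j. The 2PL sets c_i = 0, and the 1PL further constrains a_i to be equal across items. Some presentations multiply the exponent by a scaling constant D ≈ 1.7; the slides do not say whether it was used here.

```latex
P(X_{ij} = 1 \mid \theta_j) = c_i + \frac{1 - c_i}{1 + \exp\left[-a_i(\theta_j - b_i)\right]}
```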
Background: Unidimensionality • Multidimensional data can sometimes be modeled as unidimensional • When the additional dimensions are weak • When all items measure the same weighted composite of multiple dimensions • When examinees vary on only one of the abilities • But a unidimensional latent variable is not always appropriate • When unidimensionality is rejected, other options include testlet scoring, multidimensional modeling, and splitting the test into several unidimensional subtests • Various procedures exist for testing unidimensionality
Background: DIMTEST • DIMTEST is a non-IRT procedure • Compares an Assessment Subtest (AT) with a Partitioning Subtest (PT) • Computes the pairwise covariances of the AT items, conditioned on score on the PT • These conditional covariances will be near zero if the AT items do not share a secondary dimension, because examinees within each PT score group will have the same estimated score on the primary dimension
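The conditional-covariance idea can be illustrated with a short Python sketch. The function `mean_conditional_covariance` is hypothetical: it groups examinees by their number-correct PT score, computes the covariance of every AT item pair within each group, and averages those covariances with groups weighted by size. It is a simplified illustration only, not the DIMTEST T statistic, which adds a bias correction and a standardization that are omitted here.

```python
import numpy as np


def mean_conditional_covariance(responses, at_items, pt_items, min_group_size=2):
    """Size-weighted mean of AT inter-item covariances within PT score groups.

    Simplified illustration only: the actual DIMTEST T statistic adds a bias
    correction and standardization that are not reproduced here.
    responses: (n_examinees, n_items) array of 0/1 item scores.
    """
    if len(at_items) < 2:
        raise ValueError("need at least two AT items to form item pairs")
    responses = np.asarray(responses, dtype=float)
    at = responses[:, at_items]
    pt_scores = responses[:, pt_items].sum(axis=1)  # number-correct score on the PT
    pairs = np.triu_indices(len(at_items), k=1)      # each AT item pair counted once

    weighted_sum, total_n = 0.0, 0
    for score in np.unique(pt_scores):
        group = at[pt_scores == score]
        if group.shape[0] < min_group_size:
            continue  # a covariance needs at least two examinees
        cov = np.cov(group, rowvar=False)  # AT covariance matrix within this score group
        weighted_sum += group.shape[0] * cov[pairs].mean()
        total_n += group.shape[0]
    if total_n == 0:
        raise ValueError("no PT score group is large enough")
    return weighted_sum / total_n
```

Values near zero are consistent with the AT items sharing only the primary dimension; clearly positive values point toward a secondary dimension that the AT items share.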
Background: ATFIND • The AT can be chosen theoretically or empirically • When it is chosen empirically, the sample must be split: one part is used to find the AT and the other to test whether the AT measures a second dimension • ATFIND is an empirical method for finding the AT • Uses an agglomerative hierarchical cluster analysis (HCA/CCPROX) procedure • The DETECT statistic is used to determine the AT
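A minimal sketch of the split, assuming examinees are assigned to the two parts completely at random (the slides do not say how the split was drawn). `split_sample` is a hypothetical helper; the two index arrays it returns would select the rows of the response matrix passed to ATFIND and to DIMTEST, respectively.

```python
import numpy as np


def split_sample(n_examinees, atfind_fraction, seed=None):
    """Randomly partition examinee indices into ATFIND and DIMTEST subsamples.

    atfind_fraction is the share used to find the AT (e.g. 0.25 for a 25/75 split);
    the remaining examinees are held out for the DIMTEST significance test.
    """
    if not 0.0 < atfind_fraction < 1.0:
        raise ValueError("atfind_fraction must be strictly between 0 and 1")
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_examinees)
    n_atfind = int(round(atfind_fraction * n_examinees))
    return np.sort(shuffled[:n_atfind]), np.sort(shuffled[n_atfind:])


atfind_idx, dimtest_idx = split_sample(500, 0.25, seed=1)
print(len(atfind_idx), len(dimtest_idx))  # 125 375
```

Keeping the two subsamples disjoint is the point of the design: the AT is selected on one set of examinees and tested on another, so the significance test is not run on the same data that chose the AT.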
Background: Previous Research • DIMTEST maintains the nominal α level and has power to detect multidimensionality when the AT and PT are chosen appropriately • DIMTEST tends to have higher Type I error rates when tests are short • Different studies have used different versions of DIMTEST and different procedures for deriving the AT • The proportion of the sample used for ATFIND versus DIMTEST has varied across the literature • No studies have investigated how splitting the sample between deriving the AT and testing whether the AT represents a secondary dimension affects the results
Background: Hypotheses/Research Questions • Should a smaller sample be used to select the AT, leaving a larger sample for the statistical significance test, or vice versa? • Larger sample for ATFIND: better selection of the AT, which should increase power • Larger sample for DIMTEST: greater power for the statistical test • Hypothesis 1: Power will be greater when the abilities have simple structure • Hypothesis 2: Power will increase as the interability correlation decreases
Method: Simulated Parameters • Unidimensional data follow the 3PL model • Multidimensional data follow the MC3PL (multidimensional compensatory 3PL) model with 2 dimensions • Discriminations: lognormal with M = 0, SD = .5 (on the log scale) • Difficulties: normal with M = 0, SD = .6 • Values beyond ±2 were regenerated • Guessing: uniform from 0 to .2 • Abilities: normal with M = 0, SD = 1
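The parameter distributions above can be turned into a small data generator. This is a sketch under assumptions the slides do not state: no scaling constant D, simple structure with half of the items on each dimension, and an intercept d_i = -a_i b_i so that each simple-structure item reduces to an ordinary 3PL item on its own dimension. How complex-structure items were parameterized is not given, so it is left to the caller through the loading matrix `A`. All function names are hypothetical.

```python
import numpy as np


def draw_item_parameters(n_items, rng):
    """Draw 3PL item parameters from the distributions listed on the slide."""
    a = rng.lognormal(mean=0.0, sigma=0.5, size=n_items)  # log(a) ~ N(0, .5)
    b = rng.normal(0.0, 0.6, size=n_items)
    out = np.abs(b) > 2
    while out.any():  # regenerate difficulties beyond |2|
        b[out] = rng.normal(0.0, 0.6, size=out.sum())
        out = np.abs(b) > 2
    c = rng.uniform(0.0, 0.2, size=n_items)
    return a, b, c


def draw_abilities(n_examinees, rho, rng):
    """Bivariate standard-normal abilities with interability correlation rho."""
    cov = np.array([[1.0, rho], [rho, 1.0]])
    return rng.multivariate_normal(np.zeros(2), cov, size=n_examinees)


def simulate_mc3pl(theta, A, d, c, rng):
    """Compensatory MIRT 3PL: P = c + (1 - c) / (1 + exp(-(theta @ A.T + d)))."""
    logit = theta @ A.T + d  # (n_examinees, n_items)
    p = c + (1.0 - c) / (1.0 + np.exp(-logit))
    return (rng.uniform(size=p.shape) < p).astype(int)


rng = np.random.default_rng(2024)
n_items = 20
a, b, c = draw_item_parameters(n_items, rng)
dim = np.repeat([0, 1], n_items // 2)  # assumed simple structure: half the items per dimension
A = np.zeros((n_items, 2))
A[np.arange(n_items), dim] = a
d = -a * b  # gives a_i * (theta_k - b_i) on the item's single dimension
theta = draw_abilities(1000, 0.35, rng)
responses = simulate_mc3pl(theta, A, d, c, rng)
```

Unidimensional 3PL data can be produced with the same function by passing a single ability column (`theta[:, :1]`) and a one-column loading matrix (`a[:, None]`).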
Method: Simulated Conditions • Test Sizes: 20; 40; 60 • Sample Sizes: 500; 1,000; 2,000; 4,000 • Interability Correlations: 0; .35; .70 • Dimensional Structure: Simple; Complex • Sample Size Splits: 25/75; 50/50; 75/25 • Replications: 1,000
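Assuming the factors were fully crossed (the slide lists the levels but not the design), the conditions and the resulting subsample sizes can be enumerated as below. Reading each split as ATFIND %/DIMTEST %, and treating the unidimensional (Type I error) data as crossing only test size, sample size, and split, are further assumptions.

```python
from itertools import product

# Levels copied from the slide; the fully crossed design and the
# (ATFIND %, DIMTEST %) reading of each split are assumptions.
TEST_LENGTHS = [20, 40, 60]
SAMPLE_SIZES = [500, 1000, 2000, 4000]
CORRELATIONS = [0.0, 0.35, 0.70]
STRUCTURES = ["simple", "complex"]
SPLITS = [(25, 75), (50, 50), (75, 25)]
N_REPLICATIONS = 1000


def multidimensional_cells():
    """Yield one dict per condition for the two-dimensional (power) data."""
    for n_items, n, rho, structure, (pct_at, pct_dim) in product(
        TEST_LENGTHS, SAMPLE_SIZES, CORRELATIONS, STRUCTURES, SPLITS
    ):
        yield {
            "n_items": n_items,
            "n": n,
            "rho": rho,
            "structure": structure,
            "n_atfind": n * pct_at // 100,    # examinees used to find the AT
            "n_dimtest": n * pct_dim // 100,  # examinees used to test the AT
        }


cells = list(multidimensional_cells())
# Unidimensional (Type I error) data have no correlation or structure factor.
unidimensional_cells = list(product(TEST_LENGTHS, SAMPLE_SIZES, SPLITS))

print(len(cells))                 # 216
print(len(unidimensional_cells))  # 36
```

The smallest cell, a 25/75 split of 500 examinees, leaves only 125 examinees for ATFIND, which is where the trade-off between selecting the AT well and testing it with adequate power is sharpest.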
Conclusions: Conclusions & Implications • The results suggest that a 50/50 split maximizes power and keeps the Type I error rate below the nominal level unless the test is short and the sample size is large • For short tests with large samples, a 75/25 split controls Type I error better • Hypothesis 1: Power will be greater when the abilities have simple structure – supported • Hypothesis 2: Power will increase as the interability correlation decreases – supported
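Type I error (on unidimensional data) and power (on two-dimensional data) are usually estimated as the proportion of replications in which the test rejects unidimensionality. The sketch below shows that calculation and the binomial Monte Carlo standard error for 1,000 replications. The nominal level α = .05 is an assumption, since the slides do not state it; under that assumption, an empirical Type I error rate above roughly .064 would be hard to attribute to simulation noise alone.

```python
import math

ALPHA = 0.05  # assumed nominal level; the slides do not state it


def rejection_rate(p_values, alpha=ALPHA):
    """Share of replications in which unidimensionality is rejected.

    On unidimensional data this estimates the Type I error rate;
    on two-dimensional data it estimates power.
    """
    return sum(p < alpha for p in p_values) / len(p_values)


def monte_carlo_se(rate, n_replications):
    """Binomial standard error of a rejection rate estimated from n replications."""
    return math.sqrt(rate * (1.0 - rate) / n_replications)


# Simulation noise around the nominal level with 1,000 replications per condition
se = monte_carlo_se(ALPHA, 1000)
low, high = ALPHA - 1.96 * se, ALPHA + 1.96 * se
print(round(se, 4), round(low, 4), round(high, 4))  # 0.0069 0.0365 0.0635
```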
Conclusions: Limitations & Future Research • The ideal conditions of simulated data do not always occur in practice • A noncompensatory IRT model may have produced different results • What about more than 2 dimensions? • Effects of variations in ability distributions were not investigated
Thank you. Questions?