
An Investigation of Sample Size Splitting on ATFIND and DIMTEST

Presentation Transcript


  1. An Investigation of Sample Size Splitting on ATFIND and DIMTEST • Alan Socha and Christine E. DeMars • James Madison University

  2. Overview • Background • Unidimensionality • DIMTEST/ATFIND • Research Questions/Hypotheses • Method • Simulated Parameters • Simulated Conditions • Results • Type I Error • Power • Conclusions • Conclusions • Limitations & Future Research

  3. Background: Unidimensionality • Unidimensionality is an assumption of one-, two-, and three-parameter IRT models • All of these models estimate a unidimensional latent variable • Unjustified use of these models can result in serious statistical errors • Bias of item parameter and examinee ability estimates • Loss of information

  4. Background: Unidimensionality • Conditions where multidimensional data can be modeled as unidimensional • When the individual dimensions are not strong • Tests that contain items that measure the same weighted composite of multiple dimensions • When examinees only vary in their level of one of the abilities • But a unidimensional latent variable is not always appropriate • When unidimensionality is rejected, other options include testlet scoring, multidimensional modeling, and separating the test into several unidimensional subtests • Various procedures exist for testing unidimensionality

  5. Background: DIMTEST • DIMTEST is a non-IRT procedure • Compares an Assessment Subtest (AT) with a Partitioning Subtest (PT) • Computes the pairwise item covariances of the AT, conditioned on score on the PT • These covariances will be near zero if the AT items do not share a secondary dimension, because examinees within each PT score group have approximately the same estimated level on the primary dimension (see the sketch below)
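The conditioning step can be illustrated with a minimal sketch. It assumes a 0/1 response matrix with one row per examinee and hypothetical index lists at_items and pt_items, and averages the pairwise covariances of the AT items within each PT score group. This shows only the covariance-conditioning idea, not the full DIMTEST statistic, which further standardizes these covariances and applies a bias correction.

import numpy as np

def mean_conditional_at_covariance(responses, at_items, pt_items):
    """Average pairwise covariance of AT items within PT score groups.
    Hypothetical helper for illustration only, not the DIMTEST statistic."""
    at = responses[:, at_items]
    pt_score = responses[:, pt_items].sum(axis=1)   # conditioning score
    weighted_sum, weight = 0.0, 0
    for s in np.unique(pt_score):
        group = at[pt_score == s]
        if group.shape[0] < 2:                      # need >= 2 examinees for a covariance
            continue
        cov = np.cov(group, rowvar=False)           # AT item-pair covariances in this score group
        iu = np.triu_indices_from(cov, k=1)         # unique item pairs
        weighted_sum += cov[iu].sum() * group.shape[0]
        weight += iu[0].size * group.shape[0]
    return weighted_sum / weight                    # near zero when no secondary dimension is shared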

  6. Background: ATFIND • The AT can be chosen theoretically or empirically • When chosen empirically, the sample must be split, with part used to find the AT and part used to test whether the AT measures a second dimension • ATFIND is an empirical method for finding the AT • It uses an agglomerative hierarchical cluster analysis (HCA/CCPROX) procedure • The DETECT statistic is used to determine the AT (see the sketch below)
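As an illustration of how DETECT scores a candidate partition, the sketch below weights estimated conditional covariances +1 for within-cluster item pairs and -1 for between-cluster pairs and averages over all pairs. The conditional-covariance matrix and the cluster labels (which ATFIND obtains from HCA/CCPROX) are assumed to be computed elsewhere; this is not the ATFIND implementation itself.

import numpy as np

def detect_index(cond_cov, cluster_labels):
    """Illustrative DETECT index for one candidate item partition.
    cond_cov: items-by-items matrix of estimated conditional covariances
    (assumed computed separately); cluster_labels: one label per item,
    e.g., from the HCA/CCPROX clustering."""
    labels = np.asarray(cluster_labels)
    n = labels.size
    total = 0.0
    for i in range(n - 1):
        for j in range(i + 1, n):
            sign = 1.0 if labels[i] == labels[j] else -1.0   # within- vs. between-cluster pair
            total += sign * cond_cov[i, j]
    return 2.0 * total / (n * (n - 1))   # often reported multiplied by 100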

  7. Background: Previous Research • DIMTEST maintains the nominal α level and has power to detect multidimensionality when the AT and PT are chosen appropriately • DIMTEST is more likely to have inflated Type I error rates when tests are short • Different studies have used different versions of DIMTEST and different procedures for deriving the AT • The proportion of the sample used for ATFIND versus DIMTEST has not been consistent across the literature • No studies have investigated the effects of splitting the sample between deriving the AT and testing whether the AT represents a secondary dimension

  8. Background: Hypotheses/Research Questions • Should a smaller sample be used to select the AT, leaving a larger sample for the statistical significance test, or vice versa? (see the splitting sketch below) • Larger sample for ATFIND: better selection of the AT, which should increase power • Larger sample for DIMTEST: better power for the statistical test • Hypothesis 1: Power will be greater when the abilities have simple structure • Hypothesis 2: Power will increase as the interability correlation decreases
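A minimal sketch of the splitting step under study, assuming a response matrix with one examinee per row. The proportion routed to ATFIND (.25, .50, or .75 in this study) is the manipulated factor; the remainder is reserved for the DIMTEST significance test. Function and variable names here are hypothetical.

import numpy as np

def split_sample(responses, atfind_prop, seed=None):
    """Randomly split examinees into an ATFIND subsample (used to select
    the AT) and a DIMTEST subsample (used for the significance test)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(responses.shape[0])
    cut = int(round(atfind_prop * responses.shape[0]))
    return responses[order[:cut]], responses[order[cut:]]

# Example with placeholder data: a 25/75 ATFIND/DIMTEST split
responses = np.random.binomial(1, 0.5, size=(1000, 40))
atfind_data, dimtest_data = split_sample(responses, atfind_prop=0.25)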

  9. Method: Simulated Parameters • Unidimensional data follow the 3PL model • Multidimensional data follow the MC3PL (multidimensional compensatory 3PL) model with 2 dimensions • Discriminations: lognormal with M = 0, SD = .5 • Difficulties: normal with M = 0, SD = .6 • Values beyond |2| were regenerated • Guessing: uniform from 0 to .2 • Abilities: normal with M = 0, SD = 1 (see the data-generation sketch below)
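For the unidimensional conditions, a minimal data-generation sketch following the distributions above. It treats the lognormal M and SD as the mean and SD of the underlying normal (log scale), and uses the logistic 3PL without a 1.7 scaling constant; both choices are assumptions, as the original study's exact generating code is not given here.

import numpy as np

def simulate_3pl(n_examinees, n_items, seed=None):
    """Sketch of the unidimensional 3PL generating model:
    P(X = 1 | theta) = c + (1 - c) / (1 + exp(-a * (theta - b)))."""
    rng = np.random.default_rng(seed)
    a = rng.lognormal(mean=0.0, sigma=0.5, size=n_items)   # discriminations
    b = rng.normal(0.0, 0.6, size=n_items)                 # difficulties
    while np.any(np.abs(b) > 2):                           # regenerate values beyond |2|
        extreme = np.abs(b) > 2
        b[extreme] = rng.normal(0.0, 0.6, size=extreme.sum())
    c = rng.uniform(0.0, 0.2, size=n_items)                # guessing
    theta = rng.normal(0.0, 1.0, size=n_examinees)         # abilities
    p = c + (1 - c) / (1 + np.exp(-a * (theta[:, None] - b)))
    return (rng.uniform(size=p.shape) < p).astype(int)     # scored 0/1 responses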

  10. Method: Simulated Conditions • Test lengths: 20, 40, 60 • Sample sizes: 500; 1,000; 2,000; 4,000 • Interability correlations: 0, .35, .70 • Dimensional structure: simple, complex • Sample size splits: 25/75, 50/50, 75/25 • 1,000 replications per condition (see the design sketch below)
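The fully crossed design can be enumerated as below. This is only a sketch of how the conditions combine (the interability correlations and dimensional structure apply to the multidimensional, power conditions); the per-replication work of generating data, splitting the sample, running ATFIND, and running DIMTEST is omitted.

from itertools import product

test_lengths = [20, 40, 60]
sample_sizes = [500, 1000, 2000, 4000]
correlations = [0.0, 0.35, 0.70]                     # interability correlations
structures = ["simple", "complex"]                   # dimensional structure
splits = [(0.25, 0.75), (0.50, 0.50), (0.75, 0.25)]  # ATFIND / DIMTEST proportions
n_replications = 1000

conditions = list(product(test_lengths, sample_sizes, correlations, structures, splits))
print(f"{len(conditions)} crossed conditions x {n_replications} replications each")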

  11. Results: Type I Error

  12. Results: Power

  13. Results: Power

  14. Results: Power

  15. Results: Power

  16. Conclusions: Conclusions & Implications • The results suggest that a 50/50 split maximizes power and keeps the Type I error rate below the nominal level unless the test is short and the sample size is large • In that case, a 75/25 split controls the Type I error rate better • Hypothesis 1: Power will be greater when the abilities have simple structure – supported • Hypothesis 2: Power will increase as the interability correlation decreases – supported

  17. Conclusions: Limitations & Future Research • The ideal conditions of simulated data do not always occur in practice • A noncompensatory IRT model may have produced different results • What about more than 2 dimensions? • Effects of variations in the ability distributions were not investigated

  18. Thank you. Questions?
