
Vertical Scaling: A Comparison of Equating Methods

Zachary R. Smith, Matthew Finkelman, Michael L. Nering, Wonsuk Kim. July 26, 2007.


Presentation Transcript


  1. Vertical Scaling: A Comparison of Equating Methods • Zachary R. Smith, Matthew Finkelman, Michael L. Nering, Wonsuk Kim • July 26, 2007

  2. Introduction • The No Child Left Behind Act (NCLB) requires measurement of growth • Vertical scaling and growth have received more attention since the act was passed in 2001 • Students, parents, and teachers are all interested in how much growth is achieved

  3. Vertical Scaling • Vertical scaling puts tests with similar constructs and different difficulties onto the same scale • Controversial topic among researchers and policy-makers • Vertical scales can be sensitive, and violations of their assumptions can cause them to fail

  4. Vertical Scaling Difficulties • Statistical Considerations • Sparseness in the data • Error accumulating within the system • No specific technology around vertical scaling • Content Considerations • Unknown how the vertical scale will relate to a construct across grades • For example, elementary school science tests

  5. Purpose of the Study • Compare 5 equating methods to determine which is best for vertical scaling • Provide guidance for researchers, practitioners, and policy-makers on the best equating method for vertical scaling • Display the severity of departures from unidimensionality

  6. Method • Monte Carlo simulation study • 5 IRT vertical scaling transformation methods • Stocking-Lord • FCIP • Haebara • mean/mean • mean/sigma methods
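
As a rough illustration of the two simplest methods listed above, the sketch below computes mean/sigma and mean/mean linking constants from common-item parameter estimates. It assumes the usual linear IRT scale transformation (theta* = A·theta + B, a* = a/A, b* = A·b + B); the function names and the item parameters are hypothetical and do not come from the presentation.

```python
from statistics import mean, pstdev

def mean_sigma(b_from, b_to):
    """Linking constants (A, B) from common-item difficulties (mean/sigma)."""
    A = pstdev(b_to) / pstdev(b_from)
    B = mean(b_to) - A * mean(b_from)
    return A, B

def mean_mean(a_from, b_from, a_to, b_to):
    """Linking constants (A, B) from mean discriminations and difficulties (mean/mean)."""
    A = mean(a_from) / mean(a_to)
    B = mean(b_to) - A * mean(b_from)
    return A, B

def rescale_item(a, b, A, B):
    """Move one item's (a, b) parameters onto the target scale."""
    return a / A, A * b + B

# Hypothetical scaling-item parameters on the "from" scale.
a_from = [0.8, 1.0, 1.2, 1.5]
b_from = [-1.0, -0.2, 0.4, 1.1]

# Build error-free "to" parameters from known constants so the recovery can be checked.
A_true, B_true = 1.2, 0.5
a_to = [a / A_true for a in a_from]
b_to = [A_true * b + B_true for b in b_from]

print(mean_sigma(b_from, b_to))
print(mean_mean(a_from, b_from, a_to, b_to))
```

With error-free parameters both methods recover the same constants; with estimated parameters they generally differ, which is one reason to compare them. The characteristic-curve methods (Haebara, Stocking-Lord) instead choose A and B by minimizing a loss over item or test characteristic curves and are not shown here.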

  7. Method • A vertical scale was created for grades 3 – 8 • Each test consisted of 45 operational items and 15 scaling items varying in grade level, with multiple forms for each grade • Grade 5 was used as the base for equating and all other grades were equated back to it
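
One way to picture equating every grade back to grade 5 is to chain adjacent-grade links. The sketch below assumes each pair of adjacent grades has its own linear link (A, B) and composes those links toward the base grade. Whether the study chained links this way is an assumption, since the linking design table is not reproduced in the transcript, and all constants shown are made up.

```python
def compose(first, second):
    """Return the single (A, B) equivalent to applying `first`, then `second`."""
    A1, B1 = first
    A2, B2 = second
    # second(first(theta)) = A2 * (A1 * theta + B1) + B2
    return A2 * A1, A2 * B1 + B2

def chain_to_base(grade, base, links):
    """Compose adjacent-grade links from `grade` to `base`.

    links[(g, h)] is the hypothetical (A, B) that moves grade g's scale
    onto the scale of the adjacent grade h.
    """
    A, B = 1.0, 0.0              # identity transformation
    step = 1 if grade < base else -1
    g = grade
    while g != base:
        A, B = compose((A, B), links[(g, g + step)])
        g += step
    return A, B

# Made-up linking constants for grades 3 to 8 with grade 5 as the base.
links = {
    (3, 4): (1.1, 0.4),
    (4, 5): (0.9, 0.5),
    (6, 5): (1.0, -0.5),
    (7, 6): (1.05, -0.45),
    (8, 7): (0.95, -0.6),
}

for grade in range(3, 9):
    print(grade, chain_to_base(grade, 5, links))
```

Under this assumed design, a grade two or three steps from the base inherits the estimation error of every link in its chain.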

  8–20. Table 1: Linking Design

  21. Method • We used both unidimensional and multidimensional approaches • Condition 1 – Unidimensional • Grade 5 assigned a true ability of • Other grades increased or decreased by 0.5 • Lower grades: • Upper grades:
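
The unidimensional condition can be sketched as a simple generator of true abilities whose grade means move in steps of 0.5 around grade 5. The distributions themselves did not survive in the transcript, so the normal shape, the base mean of 0, and the standard deviation of 1 used below are placeholder assumptions, and the function names are hypothetical.

```python
import random

def grade_means(grades, base_grade=5, base_mean=0.0, step=0.5):
    """Mean true ability per grade: the base grade's mean shifted by `step` per grade."""
    return {g: base_mean + step * (g - base_grade) for g in grades}

def simulate_thetas(grades, n_per_grade, sd=1.0, seed=0,
                    base_grade=5, base_mean=0.0, step=0.5):
    """Draw true abilities for each grade (normal shape and sd are assumptions)."""
    rng = random.Random(seed)
    means = grade_means(grades, base_grade, base_mean, step)
    return {g: [rng.gauss(means[g], sd) for _ in range(n_per_grade)]
            for g in grades}

grades = list(range(3, 9))
print(grade_means(grades))
thetas = simulate_thetas(grades, n_per_grade=1000)
print({g: round(sum(t) / len(t), 2) for g, t in thetas.items()})
```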

  22. Condition 2 - Multidimensional • Case 1 – Each examinee has a different ability dependent upon the grade level of the items they take • Theta values increased for lower grades (examinees expanded on previous knowledge and increased their ability) • Theta values decreased for upper grades (examinees have not seen this material before and their ability drops)

  23. Condition 2 - Multidimensional • Case 2 – Each examinee has the same ability level no matter what grade level the items are • Examinees still have 3 abilities, but they are all the same • This is an experimental case that is not generalizable
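
The two multidimensional cases can be illustrated with an examinee who carries three abilities, one for each grade level of items seen (below grade, on grade, above grade). The sketch below assumes a 2PL response model and a hypothetical shift `delta`; the transcript gives neither the model nor the size of the shifts. Setting `delta` to 0 reproduces Case 2, where the three abilities coincide.

```python
import math

def abilities(theta, delta):
    """Three abilities for one examinee, keyed by the grade level of the items.

    Case 1: delta > 0 raises ability on below-grade items and lowers it on
    above-grade items. Case 2: delta == 0 makes all three abilities equal.
    """
    return {"below": theta + delta, "on": theta, "above": theta - delta}

def p_correct(theta, a, b):
    """2PL probability of a correct response (the model choice is an assumption)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_probability(theta, delta, item_level, a, b):
    """Response probability using the ability that matches the item's grade level."""
    return p_correct(abilities(theta, delta)[item_level], a, b)

# Same hypothetical item (a=1.0, b=0.0) presented at each level to one examinee.
for level in ("below", "on", "above"):
    case1 = item_probability(0.0, 0.5, level, 1.0, 0.0)
    case2 = item_probability(0.0, 0.0, level, 1.0, 0.0)
    print(level, round(case1, 3), round(case2, 3))
```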

  24. Analysis • The simulation work was conducted in R • PARSCALE was used to obtain parameter estimates • STUIRT and MP programs were used to conduct the transformations • R was used for the final analysis to obtain the sums of squared differences, RMSE, bias, and the percentage of examinees correctly ordered on the vertical scale
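
The evaluation criteria named above can be sketched as follows. The study ran its analysis in R; this Python version is only an illustration. The ordering measure here counts pairs of examinees whose estimated order matches their true order, which is one possible reading of "correctly ordered" and may differ from the study's random pull procedure.

```python
import math
from itertools import combinations

def sum_squared_diff(est, true):
    """Sum of squared differences between estimated and true values."""
    return sum((e - t) ** 2 for e, t in zip(est, true))

def rmse(est, true):
    """Root mean squared error."""
    return math.sqrt(sum_squared_diff(est, true) / len(true))

def bias(est, true):
    """Mean signed error (estimate minus truth)."""
    return sum(e - t for e, t in zip(est, true)) / len(true)

def pct_correctly_ordered(est, true):
    """Percentage of examinee pairs ordered the same way by estimates and truth.

    Pairs with identical true values are skipped; tied estimates count as wrong.
    """
    correct = total = 0
    for i, j in combinations(range(len(true)), 2):
        if true[i] == true[j]:
            continue
        total += 1
        if (est[i] - est[j]) * (true[i] - true[j]) > 0:
            correct += 1
    return 100.0 * correct / total

# Hypothetical true and estimated abilities for four examinees.
true = [-1.0, 0.0, 1.0, 2.0]
est = [-0.5, 0.5, 0.5, 2.5]
print(sum_squared_diff(est, true), rmse(est, true), bias(est, true))
print(pct_correctly_ordered(est, true))
```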

  25. Results • Considering only the unidimensional approach, every equating method performs worse the farther a grade is from the base grade • This raises the question of how far is too far from the base grade for vertical scaling • All methods seem to perform about the same, with the FCIP method slightly farther than the rest

  26. Table 2: Average Sum of Squared Differences (Unidimensional)

  27. Figure 1: Average Sum of Squared Differences (by grade level)

  28. Figure 2: Average Sum of Squared Differences (by method)

  29. Table 3: Average Root Mean Squared Error (Unidimensional)

  30. Table 4: Average Bias (Unidimensional)

  31. Table 5: Average Random Pull Percentage (Unidimensional)

  32. Discussion • Overall, there was a large difference between the unidimensional and multidimensional cases • Multidimensional case was far from what is expected from a good vertical scale • The two cases show different severities of how a vertical scale can fail

  33. Discussion • Problems occurred for some of the multidimensional transformations • These only happened with the grade 8 equating, since it was farthest from the base grade, and with the Stocking-Lord procedure and Haebara method • These problems need to be examined before the paper is finalized

  34. Future Research • We would like to increase the replications to 100 to eliminate variability • More multidimensional cases should be added • More linking methods should be included to determine the effects of placing the base grade elsewhere

  35. Contact Info Thank you! Zachary R. Smith zrs012@gmail.com
