Vertical Scaling: A Comparison of Equating Methods

Zachary R. Smith, Matthew Finkelman, Michael L. Nering, Wonsuk Kim
July 26, 2007

Introduction
• The No Child Left Behind Act (NCLB) requires measurement of growth
• Vertical scaling and growth have received more attention since the act was passed in 2001
• Students, parents, and teachers are all interested in how much growth is achieved

Vertical Scaling
• Vertical scaling puts tests with similar constructs and different difficulties onto the same scale
• It is a controversial topic among researchers and policy-makers
• Vertical scales can be sensitive, and violations of their assumptions can cause them to fail

Vertical Scaling Difficulties
• Statistical Considerations
  • Sparseness in the data
  • Error accumulating within the system
  • No specific technology around vertical scaling
• Content Considerations
  • Unknown how the vertical scale will relate to a construct across grades
  • For example, elementary school science tests

Purpose of the Study
• Compare five equating methods to determine which is best for vertical scaling
• Provide guidance for researchers, practitioners, and policy-makers on the best equating method for vertical scaling
• Display the severity of departures from unidimensionality

Method
• Monte Carlo simulation study
• Five IRT vertical scaling transformation methods
  • Stocking-Lord
  • FCIP (fixed common item parameter)
  • Haebara
  • mean/mean
  • mean/sigma
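The two moment-based methods in the list above can be sketched in a few lines. This is a minimal illustration, not the study's code (the transformations there were run in STUIRT and MP). It assumes the common textbook formulation in which the linking constants A and B map the new form's scale onto the base scale through theta_base = A * theta_new + B, and it assumes the common items carry discrimination (a) and difficulty (b) estimates on both scales. All function and variable names are hypothetical.

```python
from statistics import mean, pstdev


def mean_sigma(b_new, b_base):
    """Mean/sigma constants from common-item difficulties on both scales."""
    A = pstdev(b_base) / pstdev(b_new)
    B = mean(b_base) - A * mean(b_new)
    return A, B


def mean_mean(a_new, b_new, a_base, b_base):
    """Mean/mean constants: slope from discriminations, intercept from difficulties."""
    A = mean(a_new) / mean(a_base)
    B = mean(b_base) - A * mean(b_new)
    return A, B


def rescale_items(a_new, b_new, A, B):
    """Place new-form item parameters on the base scale."""
    return [a / A for a in a_new], [A * b + B for b in b_new]


# Toy common items (made-up values) whose base-scale parameters were
# generated with A = 0.5 and B = 1.0, so both methods should recover them.
a_new, b_new = [1.0, 1.2, 0.8], [-1.0, 0.0, 1.0]
a_base, b_base = [2.0, 2.4, 1.6], [0.5, 1.0, 1.5]
print(mean_sigma(b_new, b_base))
print(mean_mean(a_new, b_new, a_base, b_base))
```

Stocking-Lord and Haebara differ in that they choose A and B by minimizing a distance between characteristic curves rather than by matching moments, so they need a numerical optimizer and are not shown here.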

Method
• A vertical scale was created for grades 3–8
• The test consisted of 45 operational items and 15 scaling items varying in grade level, with multiple forms for each grade
• Grade 5 was used as the base for equating, and all other grades were equated back to it

Table 1: Linking Design
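One way to equate a distant grade back to the grade 5 base is to chain linear transformations through the grades in between. The sketch below shows only that composition step. It assumes adjacent-grade links of the form theta -> A * theta + B; the actual linking design in Table 1 is not reproduced in this transcript, so the chaining itself is an assumption, and the names and numbers are hypothetical.

```python
def compose(first, second):
    """Combine two linear links: apply `first`, then `second`.

    Each link is a pair (A, B) representing theta -> A * theta + B.
    """
    A1, B1 = first
    A2, B2 = second
    return A2 * A1, A2 * B1 + B2


def chain_to_base(links):
    """Fold an ordered list of links into one transformation to the base scale."""
    A, B = 1.0, 0.0  # identity transformation
    for link in links:
        A, B = compose((A, B), link)
    return A, B


# Hypothetical example: grade 3 -> grade 4, then grade 4 -> grade 5.
grade3_to_base = chain_to_base([(2.0, 1.0), (0.5, -1.0)])
print(grade3_to_base)
```

Because each link is estimated with error, a chained transformation inherits the error of every link it passes through, which is one way error can accumulate for grades far from the base.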

Method
• We used both unidimensional and multidimensional approaches
• Condition 1 – Unidimensional
  • Grade 5 assigned a true ability of
  • Other grades increased or decreased by 0.5
  • Lower grades:
  • Upper grades:
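The unidimensional condition can be pictured with a small generator of true abilities. This is a sketch under stated assumptions, not the study's R code: the grade 5 ability distribution is not recoverable from the slide, so the base mean and standard deviation are left as arguments, the normal distribution is an assumption, and the 0.5 shift is read here as 0.5 per grade step away from grade 5. All names are hypothetical.

```python
import random

GRADES = [3, 4, 5, 6, 7, 8]
BASE_GRADE = 5
STEP = 0.5  # shift per grade; reading it as "per grade step" is an assumption


def grade_means(base_mean, step=STEP):
    """Mean true ability per grade, shifted relative to the base grade."""
    return {g: base_mean + step * (g - BASE_GRADE) for g in GRADES}


def simulate_thetas(base_mean, sd, n, seed=0):
    """Draw n true abilities per grade from an assumed normal distribution."""
    rng = random.Random(seed)
    means = grade_means(base_mean)
    return {g: [rng.gauss(m, sd) for _ in range(n)] for g, m in means.items()}


# Placeholder values only: base_mean=0.0 and sd=1.0 are not taken from the study.
thetas = simulate_thetas(base_mean=0.0, sd=1.0, n=1000)
print(grade_means(0.0))
```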

Condition 2 - Multidimensional
• Case 1 – Each examinee has a different ability dependent upon the grade level of the items they take
  • Theta values increased for lower grades (examinees expanded on previous knowledge and increased their ability)
  • Theta values decreased for upper grades (examinees have not seen this material before and their ability drops)

Condition 2 - Multidimensional
• Case 2 – Each examinee has the same ability level no matter what grade level the items are
  • Examinees still have three abilities, but they are all the same
  • This is an experimental case that is not generalizable

Analysis
• The simulation work was conducted in R
• PARSCALE was used to obtain parameter estimates
• The STUIRT and MP programs were used to conduct the transformations
• R was used for the final analysis to obtain the sums of squared differences, RMSE, bias, and the percentage of examinees correctly ordered on the vertical scale
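The four evaluation criteria named above can be sketched as follows. The study computed them in R; this Python version is only an illustration. The first three follow their standard definitions for estimated versus true abilities. The ordering criterion is an assumption: the slides do not define it, so it is read here as the share of randomly drawn examinee pairs whose estimated order matches their true order. All names are hypothetical.

```python
import math
import random


def sum_squared_diff(est, true):
    """Sum of squared differences between estimated and true values."""
    return sum((e - t) ** 2 for e, t in zip(est, true))


def rmse(est, true):
    """Root mean squared error."""
    return math.sqrt(sum_squared_diff(est, true) / len(true))


def bias(est, true):
    """Mean signed error (estimate minus truth)."""
    return sum(e - t for e, t in zip(est, true)) / len(true)


def pct_correctly_ordered(est, true, n_pairs=1000, seed=0):
    """Percentage of random examinee pairs ordered the same way by est and true.

    Assumed definition; pairs with tied true values are skipped.
    """
    rng = random.Random(seed)
    correct = used = 0
    for _ in range(n_pairs):
        i, j = rng.sample(range(len(true)), 2)
        if true[i] == true[j]:
            continue
        used += 1
        if (est[i] - est[j]) * (true[i] - true[j]) > 0:
            correct += 1
    return 100.0 * correct / used if used else float("nan")


# Made-up values: every estimate is 0.5 too high, so the order is preserved.
true = [0.0, 1.0, 2.0, 3.0]
est = [0.5, 1.5, 2.5, 3.5]
print(sum_squared_diff(est, true), rmse(est, true), bias(est, true))
print(pct_correctly_ordered(est, true))
```

A constant shift like the one in this toy example produces nonzero bias and RMSE while leaving the ordering intact, which is why the ordering percentage is reported alongside the error measures.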

Results
• Considering only the unidimensional approach, every equating method performs worse the farther a grade is from the base grade
• This raises the question of how far is too far from the base grade for vertical scaling
• All methods seem to perform about the same, with the FCIP method slightly farther off than the rest

Table 2: Average Sum of Squared Differences (Unidimensional)

Figure 1: Average Sum of Squared Differences (by grade level)

Figure 2: Average Sum of Squared Differences (by method)

Table 3: Average Root Mean Squared Error (Unidimensional)

Table 4: Average Bias (Unidimensional)

Table 5: Average Random Pull Percentage (Unidimensional)

Discussion
• Overall, there was a large difference between the unidimensional and multidimensional cases
• The multidimensional case was far from what is expected from a good vertical scale
• The two cases show different severities of how a vertical scale can fail

Discussion
• Problems occurred for some of the multidimensional transformations
• These happened only with the grade 8 equating, since it was farthest from the base grade, and only with the Stocking-Lord and Haebara methods
• These problems need to be examined before the paper is finalized

Future Research
• We would like to increase the replications to 100 to reduce variability
• More multidimensional cases should be added
• More linking methods should be included to determine the effects of placing the base grade elsewhere

Contact Info
Thank you!
Zachary R. Smith
zrs012@gmail.com
