Measuring Mathematical Knowledge for Teaching: Measurement and Modeling Issues in Constructing and Using Teacher Assessments DeAnn Huinker, Daniel A. Sass, & Cindy M. Walker University of Wisconsin-Milwaukee
Introduction • We have been using the Learning Mathematics for Teaching (LMT) assessments to evaluate the impact of the Milwaukee Mathematics Partnership (MMP) on teacher content knowledge • We appreciate the strong theoretical foundation; however, several pragmatic challenges exist • The purpose of this presentation is to share our experiences, challenges, and concerns
Item Response Theory (IRT) 101 • A mathematical function that relates item parameters (i.e., difficulty and discrimination) to examinee characteristics • IRT ability and item parameter interpretation • IRT parameter estimates are invariant up to a linear transformation (i.e., indeterminacy of scale) • Several competing models to choose from • How does IRT differ from classical test theory (CTT)?
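To ground the bullets above, the 2-PL model referenced later in this deck can be written as follows (standard IRT notation, not taken from the LMT documentation):

```latex
% 2-PL model: probability that examinee j answers item i correctly
P(X_{ij} = 1 \mid \theta_j) = \frac{1}{1 + \exp\left[-a_i(\theta_j - b_i)\right]}
```

Here \theta_j is the examinee's ability, b_i the item's difficulty, and a_i its discrimination. The scale indeterminacy noted above follows because replacing \theta_j with A\theta_j + B, b_i with Ab_i + B, and a_i with a_i/A leaves the probability unchanged for any A > 0.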
Issue 1: Lack of Item Equating • Multiple sets of item parameters, which can occur when the same items are scaled from 1) different test compositions, 2) different groups of examinees, or 3) both • Which set of item parameters should be used? • Will repeated measures be used? • Need to generalize to the population?
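As a minimal sketch of what item equating involves, the mean-sigma method below finds the linear transformation that places one calibration's difficulty estimates on another's scale. The anchor-item values are invented for illustration and are not LMT estimates:

```python
import numpy as np

# Hypothetical difficulty estimates for the same anchor items obtained
# from two separate calibrations (values invented for illustration).
b_form_x = np.array([-1.2, -0.4, 0.3, 0.9, 1.6])
b_form_y = np.array([-0.9, -0.1, 0.6, 1.3, 2.0])

# Mean-sigma method: find theta_y = A * theta_x + B that puts
# Form X parameters on the Form Y scale.
A = b_form_y.std(ddof=1) / b_form_x.std(ddof=1)
B = b_form_y.mean() - A * b_form_x.mean()

# Transform Form X difficulties; for a 2-PL calibration the
# discriminations would also be divided by A.
b_x_on_y = A * b_form_x + B
print(f"A = {A:.3f}, B = {B:.3f}")
print("Form X difficulties on the Form Y scale:", np.round(b_x_on_y, 3))
```

Characteristic-curve methods such as Stocking-Lord are common alternatives; the point here is only that a common scale requires an explicit transformation, which is what is missing when multiple parameter sets circulate.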
Issue 2: Scale Development • In using the LMT measures, projects must decide whether to use established LMT scales or to construct their own assessments by choosing problems from the item pool • Which method is best, and when? • Content validity issue • Need to generalize ability estimates • Test length • Matching the ability distribution to maximize test information (sketched below) • Equating concern • Should the pre- and post-test measures be equated? • IRT vs. CTT
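To make the test-information bullet concrete: under the 2-PL model, test information at ability theta is the sum of the item informations a_i^2 P_i(theta)(1 - P_i(theta)), so a form is most precise where its information peaks. A small sketch with invented item parameters:

```python
import numpy as np

def icc_2pl(theta, a, b):
    """2-PL item characteristic curve: P(correct | theta)."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def test_information(theta, a, b):
    """Test information: sum over items of a_i^2 * P_i * (1 - P_i)."""
    p = icc_2pl(theta[:, None], a, b)   # shape (n_theta, n_items)
    return (a**2 * p * (1 - p)).sum(axis=1)

# Hypothetical item parameters (invented for illustration).
a = np.array([0.8, 1.1, 1.4, 0.9, 1.2])
b = np.array([-1.0, -0.3, 0.2, 0.8, 1.5])

theta = np.linspace(-3, 3, 121)
info = test_information(theta, a, b)
print("Information peaks near theta =", theta[np.argmax(info)])
```

A form whose information peaks far from the examinees' ability distribution yields imprecise ability estimates, which is the matching concern raised above.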
Issue 2: Scale Development • How do researchers decide which items to use in constructing assessments? • We have found that the LMT item pool often contains too few items to achieve the desired match to project goals and state standards for student learning • Need to match item characteristics to the expected ability distribution (see the selection sketch below) • In some content areas there are too few items and/or the item characteristics are not ideal
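One simple way to operationalize matching item characteristics to an expected ability distribution, hypothetical here rather than the LMT procedure, is to select from a calibrated pool the items that are most informative near the anticipated mean ability:

```python
import numpy as np

def item_information(theta, a, b):
    """2-PL item information at ability theta."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a**2 * p * (1 - p)

# Hypothetical calibrated item pool (values invented for illustration).
pool_a = np.array([0.7, 1.3, 1.0, 1.6, 0.9, 1.2, 0.8, 1.4])
pool_b = np.array([-1.5, -0.6, 0.0, 0.3, 0.7, 1.1, 1.8, -0.2])

# Pick the items most informative at the expected mean ability of the
# teacher sample (here assumed to be theta = 0.5).
target_theta, n_items = 0.5, 4
info = item_information(target_theta, pool_a, pool_b)
chosen = np.argsort(info)[::-1][:n_items]
print("Selected item indices:", sorted(chosen.tolist()))
```

A fuller approach would target information across the whole ability distribution and respect content constraints, which is exactly where a thin item pool becomes limiting.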
Issue 3: Model Selection • Which IRT model should be selected, and how does it influence score interpretation? • One issue when modeling dichotomous data with IRT is selecting the most appropriate or best-fitting model (i.e., 1-, 2-, or 3-PL) • Why not use polytomous models? • To date, items have been scored either with CTT (i.e., summing the number of correct items) or with the 2-PL model • Comparability of models • Role of the item discrimination parameter (see the toy example below) • Score interpretation for CTT and the 2-PL
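The role of the discrimination parameter in score interpretation can be seen directly: under the 2-PL, the sufficient statistic for ability is the discrimination-weighted sum of correct responses, whereas CTT weights every item equally. A toy example with invented responses and discriminations:

```python
import numpy as np

# Hypothetical response matrix (rows = teachers, columns = items) and
# 2-PL discriminations; all values invented for illustration.
responses = np.array([[1, 1, 0, 0],
                      [0, 1, 1, 0],
                      [1, 0, 0, 1]])
a = np.array([0.5, 0.9, 1.4, 1.8])   # item discriminations

# CTT: every correct item counts equally.
ctt_scores = responses.sum(axis=1)

# 2-PL: correct answers weighted by discrimination (the sufficient
# statistic for theta), so equal raw scores can rank differently.
weighted_scores = responses @ a

print("CTT sum scores:          ", ctt_scores)       # [2 2 2]
print("Discrimination-weighted: ", weighted_scores)  # [1.4 2.3 2.3]
```

All three teachers earn the same CTT score, yet the 2-PL weighting separates them, which is why the choice of model changes score interpretation.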
Table 1 • Data taken from the Mathematical Explorations for Elementary Teachers course
Conclusions • There are two primary issues related to analyzing data from the Michigan measures that need to be addressed 1) Item equating, to ensure that item parameters are on the same measurement scale • Benefit of the invariance property (i.e., test length and item selection) 2) Determining which IRT model is most appropriate for the data, and the degree to which fitting different models affects score interpretation
Questions and Concerns • How have you addressed some of these issues? • What are some issues that you have encountered when using this measure? • Related measurement questions?