
Goals of this session

The measurement model: what does it mean and what you can do with it? Presented by Michael Nering, Ph.D. Goals of this session: present commonly used measurement models (IRT); show how these models form the backbone of any large-scale assessment program (equating, scaling).


Presentation Transcript


  1. The measurement model: what does it mean and what you can do with it? Presented by Michael Nering, Ph.D.

  2. Goals of this session • Present commonly used measurement models • IRT • Show how these models form the backbone of any large-scale assessment program • Equating • Scaling • Discuss the meaning of the measurement models • Ability estimation • Item characteristics

  3. Now, really …. • This session is an introduction to the world of psychometrics • My goal is for you to understand that psychometrics: • Is not a black box • Is really just a set of procedures

  4. A little about me • Yes, I’m a psychometrician • B.A. in psychology at Kent State • Ph.D. in psychology at Univ of Minn • Working at Measured Progress since 1999 • Research areas of interest: IRT, equating, scaling, person fit, adaptive testing

  5. Why Psychology? • Psychometricians typically come from: • Educational measurement programs • Psychometric programs • I/O programs • Ultimately, we are all in pursuit of understanding people by way of “quantification”

  6. Psychometrics Defined • Psycho (psychological) + metrics (measurement) • The business of measuring psychological “things”

  7. What are psychological things? • Any “latent” trait • Any “characteristic” that is not directly “observable” • Examples: • Depression, bi-polar, personality disorder • Math, reading, writing, science abilities • We don’t care – let’s use “θ”

  8. Counterparts to Psychometrics • Econometrics • Measurement of economic things • Sociometrics • Measurement of social things

  9. All “metrics” are ultimately a blend of things

  10. Quantification in Psychology • Deep roots that came originally from philosophy • Philosophy in the 1500s branched into several disciplines because of the need to quantify certain things to better understand human beings

  11. Philosophy’s Many Branches • This desire to better understand humans led to two primary areas of study • Physiology • 1543: Belgian physiologists practice the dissection of cadavers • Psychology • 1524: Marco Marulik publishes The Psychology of Human Thought

  12. Yes, I did just use the word “cadaver” … but trust me it’s okay

  13. The last 100 years of psychometrics • Classical test theory and Spearman’s 1904 contribution • True score theory • Reliability theory, p-values, point biserial coefficients • Item response theory

  14. Let’s talk about IRT • When I say “measurement model” I really do mean some sort of IRT model • Lots of historical developments • Lord & Novick text of 1968 • Many advantages over CTT

  15. So, what is IRT? • A family of mathematical models that describe the interaction between examinees and test items • Examinee performance can be predicted in terms of the underlying trait • Provides a means for estimating scores for people and characteristics of items • Common framework for describing people and items

  16. The ogive • Naturally occurring form that describes something about people • Used throughout science, engineering, and the social sciences • Also used in architecture, carpentry, photography, art, and so forth

  17. The ogive

  18. The ogive

  19. A little jargon • The item characteristic curve (ICC) • Also called: • Item response function • Trace line • Etc. • Stochastic: 1) involving a random variable, or 2) involving chance or probability

  20. The ICC • Does this one little function really do everything? • Scale items & people onto a common metric? • Help in standard setting? • Foundation of equating? • Some meaning in terms of student ability?

  21. Does this one little function really do everything? Absolutely !! Let’s talk more about the ICC

  22. The ICC • Any line in a Cartesian system can be defined by a formula • The simplest formula for the ogive is the logistic function: p_ij = exp(b_j − d_i) / (1 + exp(b_j − d_i))

  23. The ICC • Where d is the item parameter, and b is the person parameter • The function represents the probability of responding correctly to item i given the ability of person j.

  24. d is the inflection point • Item i: d_i = 0.125

  25. We can now use the item parameter to calculate p • Let’s assume we have a student with b = 1.0, and we have our d = 0.125 • Then we can simply plug the numbers into our formula

  26. Using the item parameters to calculate p • With b_j = 1.00 and d_i = 0.125, p = 0.705
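
The calculation on the last two slides can be reproduced in a few lines. This is a minimal sketch that assumes the logistic function has the one-parameter form p = 1 / (1 + exp(−(b − d))), which is the form consistent with the reported p = 0.705; the function name `rasch_probability` is hypothetical and not from the presentation.

```python
import math


def rasch_probability(b: float, d: float) -> float:
    """Probability of a correct response for person ability b and item parameter d.

    Assumes the one-parameter logistic form; this is an illustration,
    not necessarily the exact formula shown on the slide.
    """
    return 1.0 / (1.0 + math.exp(-(b - d)))


# Student with b = 1.0 answering the item with d = 0.125
p = rasch_probability(b=1.0, d=0.125)
print(f"p = {p:.4f}")

# At b == d the curve passes through its inflection point, p = 0.5
p_at_inflection = rasch_probability(b=0.125, d=0.125)
print(f"p at inflection = {p_at_inflection:.1f}")
```

The first value comes out at roughly 0.706, which agrees with the slide's 0.705 up to rounding.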

  27. Wait a minute • What do you mean a student with an ability of 1.0?? • Does an ability of 0.0 mean that a student has NO ability? • What if my student has a reading ability estimate of -1.2? What in the world does that mean????

  28. The ability scale • Ability is on an arbitrary scale that just so happens to be centered around 0.0 • We use arbitrary scales all the time: • Fahrenheit • Celsius • Decibels • DJIA

  29. Scaled Scores • Although ability estimates are centered around zero – reported scores are not • However, scaled scores are typically a linear transformation of ability estimates • Example of a linear transformation: • (Ability × Slope) + Intercept
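
As a quick illustration of the linear transformation above, here is a minimal sketch. The slope of 10 and intercept of 250 are made-up values chosen only so the outputs look like reported scores; they are not taken from the presentation or from any real assessment program.

```python
def to_scaled_score(ability: float, slope: float = 10.0, intercept: float = 250.0) -> float:
    """Apply (ability * slope) + intercept.

    The default slope and intercept are hypothetical.
    """
    return ability * slope + intercept


# Ability estimates centered around zero, including a negative one
abilities = [-1.2, 0.0, 1.0]
scaled = [to_scaled_score(a) for a in abilities]

for ability, score in zip(abilities, scaled):
    print(f"ability {ability:+.1f} -> scaled score {score:.0f}")
```

Because the transformation is linear with a positive slope, the ordering of students is preserved and the negative ability estimate becomes an ordinary positive reported score.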

  30. The need for scaled scores • Half the kids will have negative ability estimates

  31. Scaled Scores

  32. Use of scaled scores • Student/parent level report • School/district report • Cross year comparisons • Performance level categorization

  33. There’s a lot here • Scaled scores are surface level information • Behind the scenes: • we use fancy formulas to depict interaction between students and test items • there’s a “probabilistic” relationship between students and test items

  34. Unfortunately, life can get a lot worse • Items vary from one another in a variety of ways: • Difficulty • Discrimination • Guessing • Item type (MC vs. CR)

  35. Items can vary in terms of difficulty • (Figure labels: easier item, harder item, ability of a student)

  36. Items can vary in terms of discrimination • Discrimination is reflected by the “pitch” in the ICC • Thus, we allow the ICCs to vary in terms of their slope

  37. Good item discrimination • Noticeable difference in p between 2 close ability levels

  38. Poor item discrimination • Smaller difference in p between the same 2 ability levels

  39. Guessing • This item is asymptotically approaching 0.25
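
The three ways items vary (difficulty, discrimination, guessing) can be sketched with one function. This assumes the standard three-parameter logistic form, which the slides do not write out; the parameter names and all numeric values below are hypothetical.

```python
import math


def icc_3pl(ability: float, difficulty: float, discrimination: float, guessing: float) -> float:
    """Three-parameter logistic ICC (an assumed form, not quoted from the slides).

    guessing is the lower asymptote; discrimination controls the slope.
    """
    logistic = 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))
    return guessing + (1.0 - guessing) * logistic


# Two close ability levels on either side of an item with difficulty 0.25
low, high = 0.0, 0.5

# Steeper slope: noticeable difference in p between the two ability levels
good_gap = icc_3pl(high, 0.25, 2.0, 0.0) - icc_3pl(low, 0.25, 2.0, 0.0)
# Flatter slope: smaller difference for the same two ability levels
poor_gap = icc_3pl(high, 0.25, 0.5, 0.0) - icc_3pl(low, 0.25, 0.5, 0.0)
print(f"good discrimination gap: {good_gap:.3f}")
print(f"poor discrimination gap: {poor_gap:.3f}")

# With guessing = 0.25, very low ability still gives p close to 0.25
floor_p = icc_3pl(-20.0, 0.0, 1.0, 0.25)
print(f"p at very low ability: {floor_p:.3f}")
```

Setting discrimination to 1 and guessing to 0 reduces this function to the simple logistic curve used earlier.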

  40. Polytomous Items

  41. I’m sure by now you might be having a couple of thoughts • I’m stuck in a “psycho”metric prison … help me! • How can I get up, open the door, and walk out without anybody noticing?

  42. But, trust me … I’m really trying to make a simple point

  43. Items and people • Interact in a variety of ways • We can use IRT to show that there exists a nice little s-shaped curve that shows this interaction • As ability increases – the probability of a correct response increases

  44. Advantages of IRT • Because of the stochastic nature of IRT there are many statistical principles we can take advantage of • A test is a sum of its parts

  45. The test characteristic curve • A test is made up of many items • The TCC can be used to summarize across all of our items • The TCC is simply the summation of ICCs along our ability continuum • For any ability level we can use the TCC to estimate the overall test score for an examinee
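
Summing ICCs into a TCC can be sketched as follows. The example assumes the simple one-parameter logistic ICC and a made-up five-item test; the item parameters and function names are hypothetical.

```python
import math


def icc(ability: float, difficulty: float) -> float:
    """One-parameter logistic ICC (assumed form)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def tcc(ability: float, difficulties: list[float]) -> float:
    """Test characteristic curve: the sum of the ICCs at a given ability."""
    return sum(icc(ability, d) for d in difficulties)


# Hypothetical five-item test
difficulties = [-1.0, -0.5, 0.125, 0.5, 1.0]

# Expected total test score at a few points along the ability continuum
expected_scores = {ability: tcc(ability, difficulties) for ability in (-2.0, 0.0, 2.0)}
for ability, score in expected_scores.items():
    print(f"ability {ability:+.1f} -> expected score {score:.2f} out of {len(difficulties)}")
```

Each ICC contributes a probability between 0 and 1, so the TCC runs from 0 up to the number of items and rises with ability.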

  46. A bunch of ICCs are on a test

  47. The test characteristic curve

  48. The test characteristic curve • From an observed test score (i.e., a student’s total test score) we can estimate ability • The TCC is used in standard setting to establish performance levels • The TCC can also be used to equate tests from one year to the next
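
Going from an observed total score back to ability amounts to reading the TCC in reverse. The sketch below does this by bisection, assuming the one-parameter logistic ICC and a hypothetical five-item test. It is an illustration of the idea only; operational programs may use other estimation procedures.

```python
import math


def tcc(ability: float, difficulties: list[float]) -> float:
    """Sum of one-parameter logistic ICCs (assumed form) at a given ability."""
    return sum(1.0 / (1.0 + math.exp(-(ability - d))) for d in difficulties)


def ability_from_score(score: float, difficulties: list[float],
                       lo: float = -6.0, hi: float = 6.0, tol: float = 1e-6) -> float:
    """Find the ability whose expected test score equals the observed score.

    Uses bisection, which works because the TCC increases with ability.
    The search bounds lo and hi are arbitrary choices for this sketch.
    """
    if not tcc(lo, difficulties) < score < tcc(hi, difficulties):
        raise ValueError("score is outside the range the TCC covers on [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if tcc(mid, difficulties) < score:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical five-item test and an observed total score of 3 out of 5
difficulties = [-1.0, -0.5, 0.125, 0.5, 1.0]
estimate = ability_from_score(3.0, difficulties)
print(f"estimated ability: {estimate:.3f}")
print(f"TCC at that ability: {tcc(estimate, difficulties):.3f}")
```

Scores of zero or a perfect score have no finite solution on this curve, which is why the function rejects them.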
