The end of construct validity

The end of construct validity. Denny Borsboom University of Amsterdam. Two kinds of validity. The working researcher’s idea: Validity concerns the question of whether a test measures what it should measure

Presentation Transcript


  1. The end of construct validity Denny Borsboom University of Amsterdam

  2. Two kinds of validity • The working researcher’s idea: Validity concerns the question of whether a test measures what it should measure • The construct validity idea: Validity is an evaluative, integrated judgement of the degree to which test score interpretations are justified in the light of empirical evidence and theoretical rationales (and, possibly, social consequences that follow from test use)

  3. What I will argue • The working researcher’s conception is theoretically and practically superior • The construct validity position has some sophistication, but that is mainly window dressing; in general, it precisely misses the point of what validity is

  4. The pillars of construct validity Construct validity is • an evaluative judgement • about ‘test score interpretations’ • in terms of ‘constructs’ • that is a function of evidence • and a matter of degree • I will argue that this view • does not align with the working researcher’s view at all • has quite unreasonable consequences that one should not be comfortable with

  5. Why construct validity theory is dysfunctional

  6. The social consequences of construct validity theory

  7. The social consequences of construct validity theory

  8. A black hole that traps all psychometric problems

  9. Why construct validity has nothing to do with tests (and why this is wrong)

  10. Every interpretation can have construct validity

  11. There are as many ‘construct validities’ as there are judges

  12. Measurement instruments can ‘become valid’

  13. Some measurement instruments ‘were valid’...

  14. ...but then ‘ceased to be’ valid...

  15. Reference is unimportant ‘Aether’ ‘DNA’ ‘Phlogiston’ ‘Black hole’

  16. Validity depends on the presence of ‘interpreters’

  17. How construct validity is sold • Construct validity is an evaluative, integrated judgement of the degree to which test score interpretations are justified in the light of empirical evidence and theoretical rationales (and, possibly, social consequences that follow from test use)

  18. What construct validity really is • Somebody’s evaluative, integrated and fluctuating judgement of the degree to which test score interpretations, that may have nothing to do with measurement, are justified in the light of time-dependent empirical evidence and that person’s theoretical rationales (and, possibly, that person’s guesses about social consequences that follow from test use as well as his or her valuation of these outcomes)

  19. Why all this sophistication misses the point

  20. Construct validity is an evaluative, integrated judgement of the degree to which test score interpretations are justified in the light of empirical evidence and theoretical rationales (and, possibly, social consequences that follow from test use) • However, validity is... • a property, not a judgement • a property of instruments, not of inferences • a function of truth, not of evidence • the object of validation research, not its result

  21. VALIDITY

  22. A simple alternative: • A test is valid for measuring an attribute if and only if variation in the attribute causally produces variation in the measurement outcomes

  23. Attribute structure

  24. Attribute structure

  25. Score structure Attribute structure

  26. Score structure Response process Attribute structure

  27. IQ-scores 82 134 70 115 99 Response process g

  28. X IQ-scores 82 134 70 115 99 Response process f(X| ) g 

  29. Substantive theory Formal model X IQ-score patterns Response process f(X| ) g 

  30. Substantive theory Formal model X IQ-score patterns ? Response process f(X| ) ? g  ?

  31. Where to look for validity • Traditionally, evidence for validity is sought in external relations: relations between test scores and other test scores • In criterion validity, the evidence comes from correlations with a criterion (or with the criterion) • In construct validity, the evidence comes from correlations with lots of other variables (multitrait-multimethod matrices, MTMMs)

  32. [Figure: diagram of correlations (ranging from .09 to .78) among variables including IQ-scores, attractiveness, extraversion, working memory, masculinity, race, visual memory, annual income, job performance, sex, numerical ability, SES, physique, length, and genetic differences] • But even if we knew all correlations between all conceivable tests, the validity problem would remain

  33. Where to look for validity • Validity is not a matter of external relations between the test scores and other test scores • It is a matter of which processes take attribute differences into response differences • For many tests we have no idea of what happens between item administration and item response • This is the reason that the validity problem has proven hard to crack
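The contrast between external relations and causal processes can be illustrated with a small simulation. This is only a sketch under invented assumptions: the two scoring functions, the nuisance variable, and all coefficients are hypothetical and do not come from the presentation. In both cases the test score correlates substantially with an external criterion, yet only in the first case does variation in the attribute itself produce variation in the score.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation, written out to avoid extra dependencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def score_valid(attribute, nuisance, noise):
    # Hypothetical test A: the attribute itself drives the score.
    return attribute + noise


def score_invalid(attribute, nuisance, noise):
    # Hypothetical test B: only a correlated nuisance variable drives the score.
    return nuisance + noise


def criterion_correlation(score_fn, n=5000, seed=1):
    """Correlation between test scores and an external criterion."""
    rng = random.Random(seed)
    scores, criterion = [], []
    for _ in range(n):
        attribute = rng.gauss(0, 1)
        nuisance = 0.8 * attribute + rng.gauss(0, 0.6)  # correlated with the attribute
        scores.append(score_fn(attribute, nuisance, rng.gauss(0, 0.5)))
        criterion.append(attribute + rng.gauss(0, 1))
    return pearson(scores, criterion)


def effect_of_attribute(score_fn):
    """Change the attribute by one unit while holding everything else fixed."""
    return score_fn(1.0, 0.0, 0.0) - score_fn(0.0, 0.0, 0.0)


r_valid = criterion_correlation(score_valid)
r_invalid = criterion_correlation(score_invalid)
print(f"criterion correlation, test A: {r_valid:.2f}, test B: {r_invalid:.2f}")
print("effect of attribute on score:",
      effect_of_attribute(score_valid), effect_of_attribute(score_invalid))
```

Both tests would look respectable in a table of correlations; only the intervention on the attribute separates them, and that requires knowing how the response process works.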

  34. Where to look for validity • Ingredients for validity: • A theory on the structure of the attribute • A theory on the processes that take levels of the attribute into observed score patterns • A formal model to test the theory against data • The question of validity then becomes: is this theory true?

  35. Example: The balance scale test [Figure: balance scale items: weight item, distance item, conflict-weight item] • What happens when the blocks are removed?

  36. Example: The balance scale test • Theory on the structure of the attribute: • Cognitive development involves an ordered series of discrete transitions between stages • Theory on the processes that take levels of the attribute into observed score patterns: • Children in different developmental stages use different cognitive rules to solve balance scale items, which results in different response patterns • Statistical model to test the theory against data: • Developmental stages are conceptualized as latent classes with theoretically driven response vectors

  37. Balance scale Test scores X 001100 111100 001100 110011 110011 Response process Rule 1 Rule 2 Rule 3 P(X=x| ) Developmental stages Latent classes
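The idea of latent classes with theoretically driven response vectors might be sketched as follows. This is a minimal illustration under stated assumptions, not the model used in the presentation: the assignment of response vectors to rules, the equal class sizes, and the single slip probability are all hypothetical choices made for the example.

```python
from math import prod

# Hypothetical ideal response vectors per latent class (1 = correct, 0 = incorrect).
# The slides list patterns such as 001100, 111100, and 110011 but do not say
# which rule produces which pattern; this mapping is assumed for illustration.
IDEAL = {
    "Rule 1": (0, 0, 1, 1, 0, 0),
    "Rule 2": (1, 1, 1, 1, 0, 0),
    "Rule 3": (1, 1, 0, 0, 1, 1),
}

# Assumed equal class sizes and one slip probability shared by all items.
PRIOR = {rule: 1 / len(IDEAL) for rule in IDEAL}
SLIP = 0.1


def parse(pattern):
    """Turn a string like '001100' into a tuple of 0/1 item responses."""
    return tuple(int(ch) for ch in pattern)


def likelihood(pattern, ideal, slip=SLIP):
    """P(X = x | class): each item matches the ideal response with prob. 1 - slip."""
    return prod((1 - slip) if x == e else slip for x, e in zip(pattern, ideal))


def posterior(pattern):
    """Posterior probability of each latent class given an observed pattern."""
    joint = {rule: PRIOR[rule] * likelihood(pattern, ideal)
             for rule, ideal in IDEAL.items()}
    total = sum(joint.values())
    return {rule: p / total for rule, p in joint.items()}


for observed in ("001100", "111100", "110011", "011100"):
    post = posterior(parse(observed))
    best = max(post, key=post.get)
    print(observed, "->", best, {rule: round(p, 3) for rule, p in post.items()})
```

In an actual analysis the class sizes and error rates would be estimated from data, and the fit of the model would bear on whether the theory of the response process is correct.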

  38. The question of validity: Is this theory of response behavior correct?

  39. How does this relate to other issues? • The validity concept is usually applied to many questions simultaneously: • (1) Does the test measure the intended attribute? • (2) How well do the test scores predict other attributes? • (3) Is the use of the test legally defensible? • (4) Will using the test improve the human condition? • These are all put under one umbrella; I only deal with (1) • Questions (2) onward are better left to psychometrics, law, politics, etc.

  40. Does this mean that other issues are unimportant? • No. Interpretations, uses, and consequences matter a great deal • But they are not thereby issues of validity • Moreover, they usually belong in the public sphere, not in the domain of validity theory

  41. Bottom line • To find out what you measure, you have to find out how your instrument works; there is no other way • If you know how your instrument is supposed to work, and you know how it works, you have a definite answer to the validity problem • However, if you don’t know how your instrument is supposed to work, and you don’t know how it works, you are in trouble

  42. Validity is... ...measuring the right thing
