
Advising on test validity Denny Borsboom University of Amsterdam


  1. Advising on test validity Denny Borsboom University of Amsterdam

  2. or

  3. Things that keep me awake at night

  4. Overview Rocks and hard places The psychometric orthodoxy The validity problem What I think of validity What I advise on validity Even more miscellaneous issues

  5. [Diagram labels: environmentalism; flying, litter, attitude, relevance, others, ...]

  6. “Tell the researcher to do a PCA and be done with it!” / “Do what you can to further real scientific progress!”

  7. The Psychometric Orthodoxy • Make up a number of items you think are related to a “construct” • Compute Cronbach’s α • Run a principal components analysis • If the scree plot drops steeply, and α > .75, use the sumscore for research • Plug the sumscore into experimental designs, ANOVAs, behavior genetic analyses, fMRI studies, etc. • Publish results • Worry about validity
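The alpha-and-sumscore steps of this recipe can be sketched in a few lines. This is a minimal illustration, not the speaker's code: the function names `cronbach_alpha` and `simulate_items` are hypothetical, and the data are simulated under the assumption that one common source drives all items. The PCA step is left out to keep the sketch standard-library only.

```python
import random
import statistics


def cronbach_alpha(items):
    """Cronbach's alpha for k items.

    `items` is a list of k lists; each inner list holds one item's scores
    for the same n respondents, in the same order.
    """
    k = len(items)
    n = len(items[0])
    # Sumscore per respondent: the quantity the recipe carries into later analyses.
    sum_scores = [sum(item[p] for item in items) for p in range(n)]
    item_variance_total = sum(statistics.variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_total / statistics.variance(sum_scores))


def simulate_items(n=500, k=5, seed=1):
    """Hypothetical data: every item is one shared source plus independent noise."""
    rng = random.Random(seed)
    common = [rng.gauss(0, 1) for _ in range(n)]
    return [[c + rng.gauss(0, 1) for c in common] for _ in range(k)]


items = simulate_items()
alpha = cronbach_alpha(items)
print(f"alpha = {alpha:.2f}; passes the .75 rule of thumb: {alpha > 0.75}")
```

Here alpha clears the threshold because the data were built from a common source. The coefficient itself only summarizes how strongly the items covary; it does not say what that common source is, which is why “worry about validity” comes last in the recipe.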

  8. Disclaimer: • The psychometric orthodoxy works perfectly for mundane goals, like: • getting publishable results • predicting all sorts of things • building careers in psychology • That is not what I am concerned about

  9. Validity: does the test really measure environmentalism?

  10. The construct validity doctrine • To study validity, one should: - compute correlations with similar variables - compute correlations with dissimilar variables - examine group differences - etc. • Results will typically be inconclusive
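The first two steps of the doctrine (correlations with similar and with dissimilar variables) might look like the sketch below. Everything in it is an assumption made for illustration: the variable names, the simulated data, and the helper `pearson_r` are hypothetical and not taken from the talk.

```python
import math
import random


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(7)
n = 1000
# Hypothetical data: the test and the "similar" measure share one source;
# the "dissimilar" measure is generated independently of both.
shared = [rng.gauss(0, 1) for _ in range(n)]
test_score = [s + rng.gauss(0, 1) for s in shared]
similar_measure = [s + rng.gauss(0, 1) for s in shared]
dissimilar_measure = [rng.gauss(0, 1) for _ in range(n)]

r_similar = pearson_r(test_score, similar_measure)
r_dissimilar = pearson_r(test_score, dissimilar_measure)
print(f"similar: r = {r_similar:.2f}, dissimilar: r = {r_dissimilar:.2f}")
```

The pattern comes out as the doctrine hopes, but only because the data were generated that way. With real data the same pattern is compatible with many different accounts of what the test score reflects, which is one way to read “results will typically be inconclusive”.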

  11. The question of validity • What does it mean ‘to really measure’ something? • Does it mean more than ‘to just measure something’? • And: who is taking care of the measurement problem in the first place?

  12. [Cartoon labels: methodology mountain; substantive psychology ville] “We assume tests are valid and take it from there” / “Validity? Why don’t we ask the methodologist?!”

  13. Four questions • what do our models assume? • do these assumptions make sense in psychology? • what are we really doing? • should this keep me awake at night?

  14. Four questions • what do our models assume? <- common causes • do these assumptions make sense in psychology? <- no • what are we really doing? <- something else • should this keep me awake at night? <-?

  15. Measurement models [Figure: path diagram with three indicators X1, X2, X3]

  16. Number of firemen

  17. Number of firemen Number of paramedics

  18. Number of firemen Number of paramedics Number of spectators

  19. Number of firemen Number of paramedics Number of spectators Correlation Correlation

  20. Number of firemen Number of paramedics Number of spectators Correlation Correlation Size of fire

  21. Number of firemen Number of paramedics Number of spectators No correlation Size of fire

  22. Number of firemen Number of paramedics Number of spectators Local Independence Size of fire
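The firemen example can be checked with a small simulation. This is a sketch under invented assumptions: the coefficients, noise levels, and the helper names `pearson_r` and `simulate_fires` are hypothetical. The only structural claim it encodes is the one on the slides: fire size is a common cause of both counts.

```python
import math
import random


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_fires(n, rng, fire_size=None):
    """Firemen and paramedics both depend on fire size, not on each other.

    If `fire_size` is given, every simulated fire has exactly that size,
    which amounts to conditioning on the common cause.
    """
    firemen, paramedics = [], []
    for _ in range(n):
        size = rng.uniform(0, 10) if fire_size is None else fire_size
        firemen.append(2.0 * size + rng.gauss(0, 1))     # assumed coefficient
        paramedics.append(1.0 * size + rng.gauss(0, 1))  # assumed coefficient
    return firemen, paramedics


rng = random.Random(0)
r_marginal = pearson_r(*simulate_fires(2000, rng))
r_conditional = pearson_r(*simulate_fires(2000, rng, fire_size=5.0))
print(f"across all fires:         r = {r_marginal:.2f}")
print(f"at one fixed fire size:   r = {r_conditional:.2f}")
```

Across fires of varying size the two counts correlate strongly; among fires of the same size the correlation is close to zero. That vanishing of the correlation once the common cause is held fixed is what local independence refers to.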

  23. “I make friends easily”, “I feel comfortable around people”, “I am the life of the party”: the items correlate with one another

  24. Reflective measurement model: Extraversion as the common cause of “I make friends easily”, “I feel comfortable around people”, “I am the life of the party”

  25. Reflective measurement models • Are an instantiation of a common cause structure • So: what causal process links ‘environmentalism’ to my decision to fly or not to fly? • And: what element of that process is the same one that causes me to throw litter in the trashcan?
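In the usual single-factor notation, a reflective model for three indicators can be written as follows. The symbols are an assumption here, since the talk's own notation is not recoverable from the transcript: $\xi$ stands for the latent attribute, $\lambda_i$ for the loadings, and $\delta_i$ for the residuals.

```latex
\begin{aligned}
X_i &= \lambda_i \xi + \delta_i, \qquad i = 1, 2, 3, \\
\operatorname{Cov}(\xi, \delta_i) &= 0, \qquad
\operatorname{Cov}(\delta_i, \delta_j) = 0 \quad (i \neq j), \\
\operatorname{Cov}(X_i, X_j) &= \lambda_i \lambda_j \operatorname{Var}(\xi) \quad (i \neq j), \\
\operatorname{Cov}(X_i, X_j \mid \xi) &= 0 \quad (i \neq j).
\end{aligned}
```

The third line is the common-cause claim: all covariation among the indicators runs through $\xi$. The fourth line is local independence, the same pattern as in the firemen example once the size of the fire is held fixed.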

  26. Reflective measurement Temperature

  27. Reflective measurement with one item • What makes one thermometer a valid measurement instrument for temperature? • Its outcomes causally depend on temperature • The specification of this causal link is the most important problem in assessing validity

  28. Essence attribute test score causal process

  29. How plausible is this... ...for environmentalism and flying? ...for intelligence and IQ-scores? ...for personality and the Big Five? ...for depression and DSM-diagnoses? ...

  30. The Psychometric Orthodoxy • Make up a number of items you think are related to a “construct” • Compute Cronbach’s α • Run a principal components analysis • If the scree plot drops steeply, and α > .75, use the sumscore for research • Plug the sumscore into experimental designs, ANOVAs, behavior genetic analyses, fMRI studies, etc. • Publish results • Worry about validity

  31. So what are we really doing?

  32. significant others KLM attitude flying self-efficacy litter annual income educational level job performance Sex annual income numerical ability SES physique genetic differences length

  33. significant others KLM attitude flying self-efficacy litter annual income educational level job performance sex annual income numerical ability SES shower genetic differences length

  34. environmentalism significant others KLM attitude flying self-efficacy litter annual income educational level job performance sex annual income numerical ability SES shower genetic differences length

  35. We are constructing variables out of other variables, and labeling them as ‘constructs’

  36. Advice implications? • So: I think that psychology’s measurement story is implausible in many cases • I do not believe that it is true for environmentalism and flying • Should this play a role in my methodological advice?

  37. NO

  38. Reasons: • I do not represent a majority position • I do not know for sure that I’m right • I am uncertain what the alternative should be • This is not the researcher’s problem until the scientific community makes it his or her problem

  39. Catharsis • So what I do instead is: try to solve the researcher’s problem (not mine) • Try to push the scientific and methodological literature in the direction I think should be labelled ‘forward’ • Wait for alternative ideas to catch on, and the consensus to change

  40. Message • When you are advising, you are a window between the methodological literature and your client • If the methodological literature thinks that constructs are o.k., and your client agrees, then you are not in a position to advertise your hangups • Researchers should not suffer from your problems

  41. But...

  42. Example 1 • A researcher wants to do an ANOVA to see whether people score higher on ‘optimism’ than they do on ‘extraversion’ • Two different scales, used to measure two different attributes, thrown into a repeated-measures ANOVA • This is nonsense and will always be nonsense

  43. Example 2 • An organization wants to estimate the proportion of alternative healers that are involved in malpractice • They have a very small, very biased sample • This is not a responsible course of action

  44. Example 3 • An fMRI researcher wants to interpret correlations in very small subgroups (n=8) • She wants to satisfy a reviewer and conclude that the correlation is higher in group A than in group B • Pragmatically, I understand; scientifically, I think it’s nonsense
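A quick calculation shows how little two correlations based on n=8 each can say. The sketch below uses the standard Fisher z transformation for two independent correlations. The values r = .70 and r = .20 are invented for illustration and do not come from the talk, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def compare_correlations(r1, n1, r2, n2):
    """Fisher z test for the difference between two independent correlations.

    Returns the z statistic and a two-sided p-value.
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)
    standard_error = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (z1 - z2) / standard_error
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


def correlation_ci(r, n, level=0.95):
    """Approximate confidence interval for a correlation via Fisher z."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z_crit / math.sqrt(n - 3)
    center = math.atanh(r)
    return math.tanh(center - half_width), math.tanh(center + half_width)


# Hypothetical values: group A shows r = .70, group B shows r = .20, n = 8 each.
z, p = compare_correlations(0.70, 8, 0.20, 8)
low, high = correlation_ci(0.70, 8)
print(f"difference: z = {z:.2f}, p = {p:.2f}")
print(f"95% CI for r = .70 with n = 8: ({low:.2f}, {high:.2f})")
```

Even with a difference this large on paper, the test does not come close to conventional significance, and the interval around the larger correlation spans from about zero to above .9. A claim that the correlation is higher in group A than in group B has very little support at this sample size.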
