Composite scores

Composite scores. Paul K. Crane, MD MPH Dan M. Mungas, PhD. Disclaimer. Funding for this conference was made possible, in part by Grant R13 AG030995 from the National Institute on Aging.

Presentation Transcript


  1. Composite scores Paul K. Crane, MD MPH Dan M. Mungas, PhD

  2. Disclaimer • Funding for this conference was made possible, in part, by Grant R13 AG030995 from the National Institute on Aging. • The views expressed do not necessarily reflect the official policies of the Department of Health and Human Services; nor does mention of trade names, commercial practices, or organizations imply endorsement by the U.S. Government. • Drs. Harvey and Crane have no conflicts of interest to report.

  3. Outline • Neuropsychological practice and the utility of z scores • Why composite scores? Drinking from the fire hose • Z scores head to head with IRT scores • Conclusions

  4. Neuropsychological practice • Often focused on patterns of cognitive deficits across different domains • Useful in differential diagnosis of the cognitively impaired individual • May emphasize a premorbid estimate of ability • Multiple determinants, including occupation, educational attainment, military rank (for vets) • Vocabulary preserved in early AD, so it may be used as well

  5. Scores and communication • Neuropsychological batteries contain many tests, each with a different scoring metric • Familiarity lets experts understand what a Mattis score of 130, 4 errors on the Clock, and a Trails B time of 142 seconds imply about the examinee’s cognitive functioning • These scores are difficult to communicate to less experienced colleagues

  6. Clinical use of z scores • Z scores facilitate short-hand communication • Relatively easy to calculate • Requires some average score and some standard deviation to calculate; age-specific or race-specific or education-specific norms? • May not matter much within an individual, unless different tests have much different demographic impacts • Makes it much easier for individuals with less experience with the tests to identify domains with deficits
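The calculation described on this slide can be sketched in a few lines of Python. This is only an illustration: the Trails B time of 142 seconds is the example from the previous slide, while the normative mean and SD, the function name `z_score`, and the `higher_is_better` flag are assumptions made up for the sketch, not published norms or the presenters' code.

```python
def z_score(raw, norm_mean, norm_sd, higher_is_better=True):
    """Standardize a raw test score against a normative mean and SD.

    For timed or error-count tests, where a larger raw value means worse
    performance, the sign is flipped so that negative z always means
    "worse than the normative average".
    """
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    z = (raw - norm_mean) / norm_sd
    return z if higher_is_better else -z


# Hypothetical norms (assumed values, not real normative data).
trails_b_z = z_score(142, norm_mean=100.0, norm_sd=30.0, higher_is_better=False)
print(f"Trails B z score: {trails_b_z:.2f}")
```

Swapping in age-specific or education-specific norms only changes the `norm_mean` and `norm_sd` arguments, which is why the choice of norms is the main decision in this step.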

  7. Rationale for composite scores • Summary scores are very helpful for analyses • Better measurement properties together than any instrument on its own • Avoid problems from multiple hypotheses • True signal scenario with multiple bad tests: a few show p<0.05, a few don’t, looks like the work of chance • True signal scenario with one good test: may work • Depends on measurement properties, and whether the items pooled together measure the thing intended (dimensionality and validity issues)

  8. Logical next step in the z score story • Very simple extension to average the z scores of the tests within a domain and use that average z score in analyses • Commonly done, even considered relatively sophisticated by study sections in 2008 • But: may not be the best thing to do from a psychometrics perspective
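A minimal sketch of the averaging step described here, assuming each test in the domain has already been converted to a z score. The test names and z values are hypothetical, chosen only to show the arithmetic.

```python
from statistics import mean


def domain_composite(z_by_test):
    """Average the z scores of the tests within one domain (equal weights)."""
    if not z_by_test:
        raise ValueError("need at least one test in the domain")
    return mean(z_by_test.values())


# Hypothetical executive functioning z scores for one examinee.
executive = {"trails_b": -1.4, "letter_fluency": -0.5, "digits_backward": 0.1}
composite = domain_composite(executive)
print(f"Executive composite z: {composite:.2f}")
```

Note that every test enters with the same weight, which is exactly the assumption the following slides question.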

  9. Assumptions of z scores • Each item / scale / test has equal weight on the overall score • Is letter fluency with 3 letters 1 test or 3 tests? This matters (1/n influence on total score, or 3/n influence on total score) • The scale determined by the standard deviation • Highly variable items / scales / tests are weighted less • Less variable items / scales / tests are weighted more • Is this what we would want? • Wouldn’t we want to incorporate information about the relative difficulty of different tests?
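The two weighting points above can be made concrete with a little arithmetic. The sketch below assumes an equally weighted average of z scores; the function names, the count of four other tests, and the SD values are hypothetical.

```python
def share_of_composite(n_other_tests, fluency_entries):
    """Fraction of an equally weighted z-score average contributed by fluency.

    fluency_entries is 1 if letter fluency is entered as a single test,
    or 3 if each letter is entered as its own test.
    """
    return fluency_entries / (n_other_tests + fluency_entries)


def raw_point_weight(sd, n_tests):
    """Change in the composite produced by a 1-point raw change on one test."""
    return 1.0 / (sd * n_tests)


# Letter fluency alongside 4 other tests (hypothetical battery).
as_one_test = share_of_composite(4, 1)
as_three_tests = share_of_composite(4, 3)
print(f"Fluency share as 1 test: {as_one_test:.2f}, as 3 tests: {as_three_tests:.2f}")

# A raw point on a highly variable test moves the composite less
# than a raw point on a test with a small SD.
w_variable = raw_point_weight(sd=10.0, n_tests=5)
w_narrow = raw_point_weight(sd=2.0, n_tests=5)
print(f"Weight per raw point: SD 10 -> {w_variable:.3f}, SD 2 -> {w_narrow:.3f}")
```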

  10. Linearity • Hidden in z scores is an assumption that a 1 SD difference in scores has the same meaning, in terms of the underlying domain measured by the test (usually “ability” for neuropsychological tests), in all regions of the score range • A z score is a transformed sum score • Tests constructed without modern psychometrics tend to have a common structure: most of the items are in the middle

  11. Global cognitive tests

  12. Curvilinear scaling

  13. Curvilinearity in a longitudinal study • Where you start on the curve matters a great deal in how much change there appears to be
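One way to see why the starting point matters is to simulate a test whose items cluster in the middle of the difficulty range. The sketch below assumes a simple one-parameter logistic model for each item, which is an illustrative choice and not necessarily the model behind the figures in these slides; the item difficulties are made up.

```python
import math


def expected_sum_score(theta, difficulties):
    """Expected number of items passed at ability theta (logistic items)."""
    return sum(1.0 / (1.0 + math.exp(-(theta - b))) for b in difficulties)


# Hypothetical test: most items sit in the middle of the difficulty range.
difficulties = [-1.0, -0.5, -0.25, 0.0, 0.0, 0.25, 0.5, 1.0]

# The same 1-unit change in ability, starting from two different places.
mid_gain = (expected_sum_score(0.5, difficulties)
            - expected_sum_score(-0.5, difficulties))
high_gain = (expected_sum_score(3.0, difficulties)
             - expected_sum_score(2.0, difficulties))

print(f"Sum-score change in the middle of the range: {mid_gain:.2f}")
print(f"Sum-score change near the ceiling:           {high_gain:.2f}")
```

Under these assumptions the identical change in ability produces a much larger change in the sum score (and therefore in the z score) in the middle of the range than near the ceiling.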

  14. Linear scaling 1 • [Figure: ability axis running from low ability to high ability]

  15. Linear scaling 2 • [Figure: items (X) placed along the ability axis, easy items at the low-ability end and difficult items at the high-ability end]

  16. Linear scaling 3 • [Figure: same axis and items, with points B0 and A0 marked]

  17. Linear scaling 4 • [Figure: same axis and items, with points B0, A1, B1, and A0 marked]

  18. Linear scaling 5 • [Figure: same axis and items, with points B0, A1, B1, and A0 marked; annotations: 11 “at risk” points, 1 “at risk” point]

  19. Linear scaling 6 • [Figure: same axis and items, with a z score scale from -2 to +3 overlaid; Mean=7, SD=5]

  20. Same example with a different population • [Figure: same axis and items, with a z score scale from -6 to +3 overlaid; Mean=13, SD=2]

  21. Same issue with Fluency • Let’s say in a population the mean of /F/ is 12 in 1 minute, SD 3 • Using a z score implies that the difference between 3 and 6 words means the same thing as the difference between 33 and 36 words (a 1 SD unit difference) • 3: really awful. 6: pretty bad. • 33 and 36: both really good. Certainly 33 and 36 are not as qualitatively different as 3 and 6 are • Similarly, the z score treats the difference between 3 and 12 (awful and average) as the same as the difference between 36 and 45 (really good and superb) (3 SD units)
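The arithmetic behind this slide, worked through with the mean of 12 and SD of 3 given above. The helper name `fluency_z` is just for illustration.

```python
MEAN_F = 12.0  # population mean for /F/ fluency, from the slide
SD_F = 3.0     # population SD, from the slide


def fluency_z(words):
    """z score for a raw /F/ fluency count under the slide's mean and SD."""
    return (words - MEAN_F) / SD_F


# Pairs the z score treats as equally far apart (1 SD unit).
low_pair_gap = fluency_z(6) - fluency_z(3)
high_pair_gap = fluency_z(36) - fluency_z(33)

# Pairs the z score treats as equally far apart (3 SD units).
awful_to_average = fluency_z(12) - fluency_z(3)
good_to_superb = fluency_z(45) - fluency_z(36)

print(low_pair_gap, high_pair_gap)       # both 1 SD unit
print(awful_to_average, good_to_superb)  # both 3 SD units
```

The z scale reports identical gaps within each pair, even though the clinical meaning of the low-end gaps is very different from that of the high-end gaps.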

  22. Bias in the rate of change

  23. Zero in z scores • Average score is 0 for each test • Weights for scores different from 0 determined by the variability (in the form of the SD), not the relative difficulty of the test • Is this what we would want?

  24. Summary: issues with z scores • Dimensionality: Should we lump these items / scales / tests together? • Equal weighting: Should each item / scale / test receive equal weight in the overall composite score? • Scales based on variability: Is it appropriate to base the scale on the observed SD in the population? • Equal difficulty: Are all of the items / scales / tests equally difficult? • Linearity: Is the relationship with the underlying construct measured by the test the same across the entire spectrum?

  25. Z scores vs. IRT scores • IRT scores offer more flexibility; linear scaling • Weighting based on relative difficulties of different tests • (Different handling of demographic heterogeneity) • Facilitates specific attention to measurement error / precision
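To illustrate what weighting by relative difficulty can mean, here is a minimal sketch of IRT scoring using a two-parameter logistic item model and a grid-search maximum likelihood estimate. This is an assumed textbook-style setup for illustration, not the scoring procedure used in the studies described in these slides; the item parameters and response patterns are hypothetical.

```python
import math


def p_correct(theta, a, b):
    """2PL probability of passing an item with discrimination a, difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def log_likelihood(theta, responses, items):
    """Log likelihood of a 0/1 response pattern at ability theta."""
    total = 0.0
    for x, (a, b) in zip(responses, items):
        p = p_correct(theta, a, b)
        total += math.log(p) if x == 1 else math.log(1.0 - p)
    return total


def estimate_theta(responses, items):
    """Grid-search maximum likelihood ability estimate on [-4, 4]."""
    grid = [i / 100.0 for i in range(-400, 401)]
    return max(grid, key=lambda t: log_likelihood(t, responses, items))


# Two hypothetical 3-item tests: (discrimination, difficulty) pairs.
easy_items = [(1.0, -2.0), (1.0, -1.0), (1.0, 0.0)]
hard_items = [(1.0, 0.0), (1.0, 1.0), (1.0, 2.0)]

# Both examinees pass 2 of 3 items, so their sum scores are identical.
theta_easy = estimate_theta([1, 1, 0], easy_items)
theta_hard = estimate_theta([1, 1, 0], hard_items)

print(f"2 of 3 on the easy test: theta = {theta_easy:.2f}")
print(f"2 of 3 on the hard test: theta = {theta_hard:.2f}")
```

Because item difficulty enters the model directly, the same raw score on a harder test maps to a higher ability estimate, which a z score computed from the raw score alone cannot express.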

  26. 2 head to head studies: Study 1 • FH 2005, in press at JINS • Executive functioning battery added to SENAS • Subset had MRI evaluations • Compared IRT to z scores head to head in terms of strength of relationship with neuroimaging parameters • Demographic heterogeneity

  27. Conceptual model • [Figure: path diagram relating Demographics, Ability, items i1 … in and in+1 … in+m (grouped as items with DIF and items without DIF), Composite Score, and MRI]

  28. Findings • Strength of relationship of the executive functioning composite with MRI was similar for IRT scores and for the composite z score • Accounting for heterogeneity in ages using adjusted z scores decreased strength of relationship • Accounting for ethnicity / language, education, and gender did not impair strength of relationship • Accounting for heterogeneity using IRT and DIF did not impact strength of relationship

  29. Study 2 • Convenience sample with three known groups: AD, impaired cognition with no dementia, and normal cognition • Neuropsychological battery administered, including several measures of executive functioning

  30. Digits backwards from the CASI • I think it’s items in the CASI • How to score these items? (not clear) • Does it matter? (absolutely)

  31. Digits backwards • Score 1: more credit for 4 digits than 3 digits • Score 2: equal credit for 4 digits and 3 digits • Score 3: more credit for 4 digits than 3 digits, lots of points for both • Score 4: more credit for 4 digits than 3 digits, LOTS of points for both

  33. Strength of relationship with cognitive impairment • IRT scores were at least as good as z scores • Demographic adjustment in the z score framework was a bad idea • Demographic adjustment in the IRT framework was not as bad an idea

  34. Conclusions • There are many theoretical reasons latent trait scores (such as IRT) would be preferred to classical test theory scores (such as z scores) • Here we presented 2 specific examples of the relative validity of executive functioning composites • Latent trait scoring offers a theoretically and practically better approach to demographic heterogeneity
