This paper discusses psychometric analyses of ADNI data, including the ADNI neuropsychological battery, latent variable approaches, SEM and IRT methods, and specific analyses of memory and executive functioning. The paper also explores ways to handle categorical data in SEM.
Psychometric analyses of ADNI data
Paul K. Crane, MD, MPH
Department of Medicine, University of Washington
Disclaimer • Funding for this conference was made possible, in part, by Grant R13 AG030995 from the National Institute on Aging. • The views expressed do not necessarily reflect the official policies of the Department of Health and Human Services; nor does mention of trade names, commercial practices, or organizations imply endorsement by the U.S. Government.
Outline • ADNI neuropsychological battery • Latent variable approaches • SEM and IRT • ADAS-Cog in ADNI • Memory in ADNI • Executive functioning in ADNI
Handout • There is a handout that summarizes these tests and provides the corresponding variable names in the dataset
Alternate Word Lists • There are also two versions of the Rey AVLT that are alternated
Summary • Repeated administration of a rich neuropsychological battery at 6 month intervals for 2 (AD) or 3 (NC, MCI) years • How do we drink from that fire hose?
Strategies for analyzing these data • Pick a couple of tests and ignore the others • ADAS-Cog and MMSE • CDR and CDR-SB • Modifications of those tests • ADAS-Tree • ADAS-Rasch • Composite scores for specific domains • Z score • Something fancier using latent variable approach
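The z-score composite option in the list above can be sketched in a few lines. This is a minimal sketch, not how ADNI composites are actually built, and the raw scores below are invented for illustration:

```python
import statistics

def zscores(values):
    """Standardize a list of raw test scores against the sample mean and SD."""
    mu = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mu) / sd for v in values]

def composite(*tests):
    """Simple domain composite: average each person's z-scores across tests."""
    standardized = [zscores(t) for t in tests]
    return [statistics.mean(person) for person in zip(*standardized)]

# Hypothetical raw scores for three people on two memory tests
print(composite([10, 12, 14], [20, 25, 30]))
```

A composite like this weights every test equally in standardized units; the "something fancier" latent variable approaches below instead let the data determine how much each test contributes.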
Outline • ADNI neuropsychological battery • Latent variable approaches • SEM and IRT • ADAS-Cog in ADNI • Memory in ADNI • Executive functioning in ADNI
Latent variable approach • “Items” are not intrinsically interesting, only as indicators of the underlying thing measured by the test • Many nice properties follow
Parallel development 1: SEM • “Measurement part” of the model specifies how latent constructs are modeled • “Structural part” of the model specifies relationships among latent constructs, and between latent constructs and other covariates
http://sites.google.com/site/lvmworkshop/home/downloads-general/2010-downloads
Bunch of indicators • limmtotal, avtot1, avtot2, avtot3, avtot4, avtot5, avtotb, avtot6, ldeltotal, avdel30min, avdeltot, cot1scor, cot2scor, cot3scor, cot4totl, mmballdl, mmflagdl, mmtreedl
“Memory” • [Path diagram] An underlying single factor, Memory, with many indicators: limmtotal, avtot1–avtot6, avtotb, ldeltotal, avdel30min, avdeltot, cot1scor–cot3scor, cot4totl, mmballdl, mmflagdl, mmtreedl
[Path diagram] The same indicators grouped by source test, each test loading on Memory: LM story (limmtotal, ldeltotal), Rey word list (avtot1–avtot6, avtotb, avdel30min, avdeltot), ADAS word list (cot1scor–cot3scor, cot4totl), MMSE words (mmballdl, mmflagdl, mmtreedl)
[Path diagram] The same model with HC volume added: the relationship between Memory and HC volume. This is what we care about!
[Path diagram] Memory* modeled without the test-level structure, still related to HC volume. This is what we care about! … and typically we don’t care whether memory is modeled this way or with all that secondary structure
[Path diagram] The full model again (LM story, Rey word list, ADAS word list, and MMSE words loading on Memory; Memory related to HC volume). This is what we care about! And sometimes we care about it for 600,000 SNPs, or for voxels; we need to move outside of an SEM package for some of the analyses we want to do
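As a hedged sketch of the measurement idea in these diagrams: under a single-factor model, each observed indicator equals the latent factor scaled by a loading, plus noise. The loadings and noise level below are made up for illustration:

```python
import random

random.seed(0)

def simulate_single_factor(n_people, loadings, noise_sd=0.5):
    """Generate indicator data from one latent factor: x_j = loading_j * F + e_j."""
    data = []
    for _ in range(n_people):
        f = random.gauss(0, 1)  # latent "Memory" score for this person
        row = [lam * f + random.gauss(0, noise_sd) for lam in loadings]
        data.append(row)
    return data

# Four hypothetical indicators with different loadings on the factor
sample = simulate_single_factor(1000, [0.9, 0.8, 0.7, 0.6])
```

Fitting the model runs this generative story in reverse: given the observed indicators, estimate the loadings and each person's latent score.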
Parallel development 2: IRT • Models are nested within SEM • Single-factor confirmatory factor analysis model • Initially worked out with binary indicators • Extended in the 1960s to polytomous items (Samejima) • It’s only the measurement part • Attention to the quality of measurement and the quality of scores
Typical SEM example • [Path diagram] A single Construct measured by Indicator 1, Indicator 2, Indicator 3, and Indicator 4
Depression • [Path diagram] A Depression construct measured by four scales: Beck, Zung, CESD, and PHQ-9
A closer look at PHQ-9 • A 9-item depression scale • Standard scoring totals up the item responses • A typical SEM model would take that total score and treat it as a continuous indicator by using a linear link (single loading parameter)
SEM and IRT, then and now • SEM was initially about total scores as indicators of constructs measured in common across tests • IRT was initially about item level data that had to satisfy assumptions • More recently: merging the strengths of the two approaches • Computational rather than conceptual advances, or maybe computational advances have fueled conceptual advances
Categorical data in SEM • Runmplus code: • That little code snippet tells Mplus to treat all of the elements in the local `vlist’ as categorical data • The Mplus default estimator for categorical data is WLSMV • There are other appropriate ways of handling categorical data • This is a major reason Mplus is the dominant SEM software used at FH
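The snippet itself is not reproduced on this slide. As a hedged sketch (not the original code), a categorical declaration in Mplus syntax typically looks like the following, here using the ADAS word-list variable names from the earlier diagrams purely for illustration:

```
VARIABLE:
  NAMES ARE cot1scor cot2scor cot3scor cot4totl;
  CATEGORICAL ARE cot1scor cot2scor cot3scor cot4totl;
ANALYSIS:
  ESTIMATOR = WLSMV;  ! Mplus default for categorical outcomes
```

The runmplus wrapper in Stata generates a VARIABLE block like this from the variable list it is given, which is how a local macro such as `vlist’ ends up in the CATEGORICAL statement.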
What about IRT? • Array of tools to address measurement precision • Explicit focus on measurement properties and measurement precision differentiates it from SEM
Pretend for a moment that a single factor model was appropriate… • Item response theory (IRT) developed in the middle of the last century • Lord and Novick / Birnbaum (1968) • Polytomous extension: Samejima 1969 • Lord 1980 • Hambleton et al. 1991 • XCALIBRE, Parscale, Multilog • All variations on a single factor CFA model
Comments on that test • Essentially linear test characteristic curve • Immaterial whether the standard score or the IRT score is used in analyses • No ceiling or floor effect • People at the extremes of the thing measured by the test will get some right and get some wrong • Pretty nice test! • But that’s what we said about the last one and it had twice as many items!
Why might we want twice as many items? • Measurement precision / reliability • CTT: summarized in a single number: alpha • IRT: conceptualized as a quantity that may vary across the range of the test • Information • Mathematical relationship between information and standard error of measurement • Intuitively makes sense that a test with 2x the items will measure more precisely / more reliably than a test with 1x the items
Comments about these information and SEM curves • Information curves look more different than the SEM curves • Inverse square root relationship: SEM = 1 / √TIC • TIC 100 → SEM 0.10 (1/10) • TIC 25 → SEM 0.20 (1/5) • TIC 16 → SEM 0.25 (1/4) • TIC 9 → SEM 0.33 (1/3) • TIC 4 → SEM 0.50 (1/2) • Trade-off between test length and measurement precision
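The inverse square root relationship on this slide is simple enough to compute directly; the following sketch reproduces the TIC/SEM pairs listed above:

```python
import math

def sem_from_information(tic):
    """Standard error of measurement from test information: SEM = 1 / sqrt(TIC)."""
    return 1.0 / math.sqrt(tic)

# The TIC values from the slide
for tic in (100, 25, 16, 9, 4):
    print(tic, round(sem_from_information(tic), 2))
```

Note the diminishing returns: quadrupling information (and hence, roughly, test length) only halves the standard error of measurement.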
These were highly selected “tests” • It would be possible to design such a test if we started with a robust item pool • Almost certainly not going to happen by accident / history • What are more realistic tests?
Comments on these TCCs • Same number of items but very different shapes • Now it may matter whether you use an IRT score or a standard score in analyses • Both ceiling and floor effects
Comments on the TICs and SEMs • Comparing the red test and the blue test: the red test is better for people of moderate ability (more items close to where they are) • For people right in the middle, measurement precision is just as good as a test twice as long • Items far away from your ability level don’t help your standard error • The blue test is better for people at the extremes (more items close to where they are)
Where do information curves come from? • Item information curves use the same parameters as the item characteristic curves (difficulty level, b, and strength of association with latent trait or ability, a) (see next slides) • Test information is the sum of all of the item information curves • We can do that because of local independence