This paper discusses psychometric analyses of ADNI data, including the ADNI neuropsychological battery, latent variable approaches, SEM and IRT methods, and specific analyses of memory and executive functioning. The paper also explores ways to handle categorical data in SEM.
Psychometric analyses of ADNI data
Paul K. Crane, MD, MPH
Department of Medicine, University of Washington
Disclaimer • Funding for this conference was made possible, in part, by Grant R13 AG030995 from the National Institute on Aging. • The views expressed do not necessarily reflect the official policies of the Department of Health and Human Services; nor does mention of trade names, commercial practices, or organizations imply endorsement by the U.S. Government.
Outline • ADNI neuropsychological battery • Latent variable approaches • SEM and IRT • ADAS-Cog in ADNI • Memory in ADNI • Executive functioning in ADNI
Handout • There is a handout that summarizes these tests and provides the corresponding variable names in the dataset
Alternate Word Lists • There are also two versions of the Rey AVLT that are alternated
Summary • Repeated administration of a rich neuropsychological battery at 6 month intervals for 2 (AD) or 3 (NC, MCI) years • How do we drink from that fire hose?
Strategies for analyzing these data • Pick a couple of tests and ignore the others • ADAS-Cog and MMSE • CDR and CDR-SB • Modifications of those tests • ADAS-Tree • ADAS-Rasch • Composite scores for specific domains • Z score • Something fancier using latent variable approach
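The z-score composite option in the list above can be sketched in a few lines. This is a minimal sketch, not how ADNI composites are actually built, and the raw scores below are invented for illustration:

```python
import statistics

def zscores(values):
    """Standardize a list of raw test scores against the sample mean and SD."""
    mu = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mu) / sd for v in values]

def composite(*tests):
    """Simple domain composite: average each person's z-scores across tests."""
    standardized = [zscores(t) for t in tests]
    return [statistics.mean(person) for person in zip(*standardized)]

# Hypothetical raw scores for three people on two memory tests
print(composite([10, 12, 14], [20, 25, 30]))
```

A composite like this weights every test equally in standardized units; the "something fancier" latent variable approaches below instead let the data determine how much each test contributes.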
Outline • ADNI neuropsychological battery • Latent variable approaches • SEM and IRT • ADAS-Cog in ADNI • Memory in ADNI • Executive functioning in ADNI
Latent variable approach • “Items” are not intrinsically interesting, only as indicators of the underlying thing measured by the test • Many nice properties follow
Parallel development 1: SEM • “Measurement part” of the model specifies how latent constructs are modeled • “Structural part” of the model specifies relationships among latent constructs, and between latent constructs and other covariates
http://sites.google.com/site/lvmworkshop/home/downloads-general/2010-downloads
Bunch of indicators • limmtotal, avtot1, avtot2, avtot3, avtot4, avtot5, avtotb, avtot6, ldeltotal, avdel30min, avdeltot, cot1scor, cot2scor, cot3scor, cot4totl, mmballdl, mmflagdl, mmtreedl
“Memory” • [Path diagram] An underlying single factor, Memory, with many indicators: limmtotal, avtot1–avtot6, avtotb, ldeltotal, avdel30min, avdeltot, cot1scor–cot3scor, cot4totl, mmballdl, mmflagdl, mmtreedl
[Path diagram] The same indicators grouped by source test, each test loading on Memory: LM story (limmtotal, ldeltotal), Rey word list (avtot1–avtot6, avtotb, avdel30min, avdeltot), ADAS word list (cot1scor–cot3scor, cot4totl), MMSE words (mmballdl, mmflagdl, mmtreedl)
[Path diagram] The same model with HC volume added: the relationship between Memory and HC volume. This is what we care about!
[Path diagram] Memory* modeled without the test-level structure, still related to HC volume. This is what we care about! … and typically we don’t care whether memory is modeled this way or with all that secondary structure
[Path diagram] The full model again (LM story, Rey word list, ADAS word list, and MMSE words loading on Memory; Memory related to HC volume). This is what we care about! And sometimes we care about it for 600,000 SNPs, or for voxels; we need to move outside of an SEM package for some of the analyses we want to do
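As a hedged sketch of the measurement idea in these diagrams: under a single-factor model, each observed indicator equals the latent factor scaled by a loading, plus noise. The loadings and noise level below are made up for illustration:

```python
import random

random.seed(0)

def simulate_single_factor(n_people, loadings, noise_sd=0.5):
    """Generate indicator data from one latent factor: x_j = loading_j * F + e_j."""
    data = []
    for _ in range(n_people):
        f = random.gauss(0, 1)  # latent "Memory" score for this person
        row = [lam * f + random.gauss(0, noise_sd) for lam in loadings]
        data.append(row)
    return data

# Four hypothetical indicators with different loadings on the factor
sample = simulate_single_factor(1000, [0.9, 0.8, 0.7, 0.6])
```

Fitting the model runs this generative story in reverse: given the observed indicators, estimate the loadings and each person's latent score.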
Parallel development 2: IRT • Models are nested within SEM • Single-factor confirmatory factor analysis model • Initially worked out with binary indicators • Extended in the 1960s to polytomous items (Samejima) • It’s only the measurement part • Attention to the quality of measurement and the quality of scores
Typical SEM example • [Path diagram] A single Construct measured by Indicator 1, Indicator 2, Indicator 3, and Indicator 4
Depression • [Path diagram] A Depression construct measured by four scales: Beck, Zung, CESD, and PHQ-9
A closer look at PHQ-9 • A 9-item depression scale • Standard scoring totals up the item responses • A typical SEM model would take that total score and treat it as a continuous indicator by using a linear link (single loading parameter)
SEM and IRT, then and now • SEM was initially about total scores as indicators of constructs measured in common across tests • IRT was initially about item level data that had to satisfy assumptions • More recently: merging the strengths of the two approaches • Computational rather than conceptual advances, or maybe computational advances have fueled conceptual advances
Categorical data in SEM • Runmplus code: • That little code snippet tells Mplus to treat all of the elements in the local `vlist’ as categorical data • The Mplus default estimator for categorical data is WLSMV • There are other appropriate ways of handling categorical data • This is a major reason Mplus is the dominant SEM software used at FH
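The snippet itself is not reproduced on this slide. As a hedged sketch (not the original code), a categorical declaration in Mplus syntax typically looks like the following, here using the ADAS word-list variable names from the earlier diagrams purely for illustration:

```
VARIABLE:
  NAMES ARE cot1scor cot2scor cot3scor cot4totl;
  CATEGORICAL ARE cot1scor cot2scor cot3scor cot4totl;
ANALYSIS:
  ESTIMATOR = WLSMV;  ! Mplus default for categorical outcomes
```

The runmplus wrapper in Stata generates a VARIABLE block like this from the variable list it is given, which is how a local macro such as `vlist’ ends up in the CATEGORICAL statement.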
What about IRT? • Array of tools to address measurement precision • Explicit focus on measurement properties and measurement precision differentiates it from SEM
Pretend for a moment that a single factor model was appropriate… • Item response theory (IRT) developed in the middle of the last century • Lord and Novick / Birnbaum (1968) • Polytomous extension: Samejima 1969 • Lord 1980 • Hambleton et al. 1991 • XCALIBRE, Parscale, Multilog • All variations on a single factor CFA model
Comments on that test • Essentially linear test characteristic curve • Immaterial whether the standard score or the IRT score is used in analyses • No ceiling or floor effect • People at the extremes of the thing measured by the test will get some right and get some wrong • Pretty nice test! • But that’s what we said about the last one and it had twice as many items!
Why might we want twice as many items? • Measurement precision / reliability • CTT: summarized in a single number: alpha • IRT: conceptualized as a quantity that may vary across the range of the test • Information • Mathematical relationship between information and standard error of measurement • Intuitively makes sense that a test with 2x the items will measure more precisely / more reliably than a test with 1x the items
Comments about these information and SEM curves • Information curves look more different than the SEM curves • Inverse square root relationship: SEM = 1 / √TIC • TIC 100 → SEM 0.10 (1/10) • TIC 25 → SEM 0.20 (1/5) • TIC 16 → SEM 0.25 (1/4) • TIC 9 → SEM 0.33 (1/3) • TIC 4 → SEM 0.50 (1/2) • Trade-off between test length and measurement precision
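The inverse square root relationship on this slide is simple enough to compute directly; the following sketch reproduces the TIC/SEM pairs listed above:

```python
import math

def sem_from_information(tic):
    """Standard error of measurement from test information: SEM = 1 / sqrt(TIC)."""
    return 1.0 / math.sqrt(tic)

# The TIC values from the slide
for tic in (100, 25, 16, 9, 4):
    print(tic, round(sem_from_information(tic), 2))
```

Note the diminishing returns: quadrupling information (and hence, roughly, test length) only halves the standard error of measurement.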
These were highly selected “tests” • It would be possible to design such a test if we started with a robust item pool • Almost certainly not going to happen by accident / history • What are more realistic tests?
Comments on these TCCs • Same number of items but very different shapes • Now it may matter whether you use an IRT score or a standard score in analyses • Both ceiling and floor effects
Comments on the TICs and SEMs • Comparing the red test and the blue test: the red test is better for people of moderate ability (more items close to where they are) • For people right in the middle, measurement precision is just as good as a test twice as long • Items far away from your ability level don’t help your standard error • The blue test is better for people at the extremes (more items close to where they are)
Where do information curves come from? • Item information curves use the same parameters as the item characteristic curves (difficulty level, b, and strength of association with latent trait or ability, a) (see next slides) • Test information is the sum of all of the item information curves • We can do that because of local independence