The Psychometrics Behind Neurocognitive Evaluation for Concussion
Philip Schatz, PhD
Department of Psychology
Saint Joseph’s University
schatzSJU@gmail.com
Disclosures
• Consulting/support:
  • International Brain Research Foundation
  • Department of Defense
  • Sports Concussion Center of New Jersey
  • ImPACT Applications, Inc.
• Disclaimer:
  • No role in the conceptualization, design, collection or analysis of data, manuscript preparation or decision to submit for publication.
Overview
• Basics of correlation and variance
• Psychometric properties of concussion tests, in context of:
  • common psychological tests
  • other tests
• Psychometric properties of a two-factor theory of concussion
Reliability vs. Validity
[Figure panels: “Highly Reliable and Valid”, “Highly Reliable but Not Valid”, “Neither Reliable nor Valid”]
Psychometric Issues
• Reliability in a nutshell:
• Test-retest reliability assumes:
  • Fluctuations/changes are due to deficiencies in measure
  • Human behavior does not deviate from Time 1 -> Time 2
  • We are measuring traits and not states
Psychometric Issues
• Test-retest reliability assumes:
  • Fluctuations/changes are due to deficiencies in measure
  • Human behavior does not deviate from Time 1 -> Time 2
  • We MAY BE measuring states and not traits
• Broglio, et al., 2007: 118 student “volunteers” completed:
  • ImPACT, HeadMinders, CogSport, MACT
  • One test session
  • 40 subjects (34%) had invalid baselines
• Cameron, Schatz (unpublished thesis): 90 student “volunteers” completed ImPACT back-to-back
  • One test session
  • 18 subjects (20%) had invalid baselines
  • An additional 15 subjects (21%) had “red flag” scores (<1.5 SD)
Psychometric Issues
• Reliability in a nutshell:
• Random error:
  • Situational fluctuations or changes in mood or environment
  • sleep, fatigue, diet, metabolism
  • distractions, noise, equipment
Psychometric Issues
• Random error:
  • Situational fluctuations or changes in mood or environment
  • sleep, fatigue, diet, metabolism
    • Athletes sleeping <7 hrs performed worse on 3/4 ImPACT composite scores, and endorsed more symptoms (McClure, et al, In Review, AJSM)
  • distractions, noise, equipment
    • Athletes tested in a Group setting scored significantly worse than athletes tested in an Individual setting on:
      • Verbal: 83.4 vs 86.5 (p=.003)
      • Visual: 71.6 vs 76.7 (p=.0001)
      • Motor: 35.6 vs 38.4 (p=.0001)
      • RT: 0.61 vs 0.57 (p=.001)
      • (Moser et al., 2001, AJSM)
Psychometric Issues
• Reliability in a nutshell:
• Systematic error:
  • Factors that consistently affect measurement across the sample
  • practice effects
  • increased exposure, familiarity with measure, device
Psychometric Issues
• Evidence of systematic error on ImPACT?
  • e.g., practice effects
• No significant differences on any Composite Score:
  • Back-to-back Administrations (Cameron, Schatz: MS thesis)
  • Pre-Season -> Mid-Season -> Post-Season (Miller et al., 2007)
• Significant improvement on:
  • Processing Speed at 30 days, 1 year (Schatz, Ferris, 2013; Elbin et al, 2011)
  • Vis Memory, RT at 1 year (Elbin et al, 2011)
Psychometric Issues
• Reliability in a nutshell:
  • Can we measure or distinguish between specific types of “error”?
  • Can we measure or distinguish between specific “error” at X1 versus X2?
Psychometric Issues
• Reliability in a nutshell:
  • Can we measure or distinguish between specific types of “error”?
  • Can we measure or distinguish between specific “error” at X1 versus X2?
• Cameron (unpublished MS thesis): 90 student “volunteers” completed ImPACT back-to-back
• Using Iverson’s RCI cut-offs:
  • 8% showed significant decreases at T2 on Verbal Mem
  • 8% showed significant decreases at T2 on Visual Mem
  • 7% showed significant decreases at T2 on Motor Speed
  • 7% showed significant increases at T2 on Reaction Time
  • 26% showed significantly worse performance at T2 on 1 composite score
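A reliable change index (RCI) cut-off of the kind mentioned above can be sketched as follows. This is a minimal illustration of one common formulation (change score divided by the standard error of the difference), not necessarily the exact procedure behind Iverson's cut-offs. The standard deviations, the retest coefficient, the scores, and the 1.28 critical value (roughly an 80% two-tailed interval) are all hypothetical values chosen for the example.

```python
from math import sqrt

def reliable_change_index(score_t1, score_t2, sd_t1, sd_t2, r_retest):
    """Change score scaled by the standard error of the difference."""
    sem_t1 = sd_t1 * sqrt(1.0 - r_retest)  # standard error of measurement, Time 1
    sem_t2 = sd_t2 * sqrt(1.0 - r_retest)  # standard error of measurement, Time 2
    se_diff = sqrt(sem_t1 ** 2 + sem_t2 ** 2)
    return (score_t2 - score_t1) / se_diff

def is_reliable_change(rci, z_crit=1.28):
    """True when the change exceeds the chosen interval (z_crit is an assumption)."""
    return abs(rci) > z_crit

# Hypothetical composite-score statistics, for illustration only.
SD_T1, SD_T2, R_RETEST = 8.0, 8.0, 0.75

rci_large = reliable_change_index(90.0, 78.0, SD_T1, SD_T2, R_RETEST)  # 12-point drop
rci_small = reliable_change_index(90.0, 86.0, SD_T1, SD_T2, R_RETEST)  # 4-point drop

print(round(rci_large, 2), is_reliable_change(rci_large))
print(round(rci_small, 2), is_reliable_change(rci_small))
```

With these made-up numbers the 12-point drop exceeds the interval and the 4-point drop does not. Counting how many of the athletes exceed the interval on each composite gives percentages of the kind listed on the slide.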
Psychometric Issues: Reliability
• How do we measure reliability? Pearson’s r? Intra-class correlations?
• “There is literally no such thing as the reliability of a test, unqualified; the coefficient has meaning only when applied to specific populations” (Streiner and Norman, 1995)
Psychometric Issues: Reliability
• How do we measure reliability?
• Pearson’s r:
  • general measure of strength of linear relationship
  • considered a weak measure of reliability when
    • group means are similar but
    • there is variation in individual scores
  • does not allow for correlation of multiple trials
  • “inter-class” correlation, does not account for variation within trials
  • cannot detect “systematic error” (e.g., practice effects; Weir, 2005)
Psychometric Issues: Reliability
• How do we measure reliability?
• Pearson’s r: Example
  • considered a weak measure of reliability when group means are similar but there is variation in individual scores
  • Back-to-back administrations of ImPACT
    • Similar Group Means: 94.5 to 92.7
    • Similar Standard Deviation: 4.8 to 5.6
    • t(48)=1.22, p=.23
    • r=.01
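Both limitations of Pearson’s r can be reproduced with a small sketch. The scores below are hypothetical, not ImPACT data. In the first case every score rises by 5 points (a uniform practice effect) and r stays at 1. In the second case the same scores are reassigned across individuals, so group mean and SD are identical while r falls to near zero.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def mean(values):
    return sum(values) / len(values)

# Hypothetical Time 1 scores for five examinees.
time1 = [88.0, 92.0, 95.0, 97.0, 99.0]

# Case 1: systematic error. Everyone improves by exactly 5 points.
time2_shifted = [score + 5.0 for score in time1]
r_shifted = pearson_r(time1, time2_shifted)
mean_shift = mean(time2_shifted) - mean(time1)

# Case 2: same set of scores at Time 2, but attached to different examinees,
# so the group mean and SD are unchanged while individuals vary.
time2_reordered = [97.0, 88.0, 99.0, 92.0, 95.0]
r_reordered = pearson_r(time1, time2_reordered)

print(round(r_shifted, 3), mean_shift)
print(round(r_reordered, 3), mean(time2_reordered) - mean(time1))
```

In case 1 the coefficient is perfect even though every score changed, which is the sense in which r cannot detect systematic error. In case 2 a group-level comparison would show no difference at all, yet r is close to zero.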
Psychometric Issues: Reliability
• How do we measure reliability?
• Intra-Class Correlation Coefficient (ICC):
  • originally developed for analysis of “inter-judge” (inter-rater) effects
    • large differences between “judges” will result in low coefficients
  • indicates proportion of variability in the measure (e.g., mean) that is due to variation between individuals
  • as applied to test-retest reliability:
    • ICC is used to analyze “trial-to-trial” consistency
    • thus, reflective of the reliability of the measure
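As a rough illustration of how an ICC responds to trial-to-trial differences, the sketch below computes one specific form, ICC(2,1) (two-way random effects, absolute agreement, single measurement), from two-way ANOVA mean squares. Which ICC form each cited study used is not stated in these slides, so the choice here is an assumption, and the scores are hypothetical.

```python
def icc_2_1(rows):
    """ICC(2,1) for a subjects-by-trials table (list of equal-length lists)."""
    n = len(rows)      # number of subjects
    k = len(rows[0])   # number of trials per subject
    grand = sum(sum(r) for r in rows) / (n * k)
    row_means = [sum(r) / k for r in rows]
    col_means = [sum(r[j] for r in rows) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between trials
    ss_total = sum((x - grand) ** 2 for r in rows for x in r)
    ss_error = ss_total - ss_rows - ss_cols                  # residual

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )

# Hypothetical Time 1 scores for five examinees.
time1 = [88.0, 92.0, 95.0, 97.0, 99.0]

# Identical scores on retest: perfect agreement.
identical = [[s, s] for s in time1]
# Uniform 5-point gain on retest: perfect rank order, but a trial effect.
shifted = [[s, s + 5.0] for s in time1]

icc_identical = icc_2_1(identical)
icc_shifted = icc_2_1(shifted)

print(round(icc_identical, 3), round(icc_shifted, 3))
```

Under this form, the uniform 5-point gain lowers the coefficient because between-trial variance enters the denominator, whereas Pearson’s r for the same data would still equal 1.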
Psychometric Issues: Reliability
Five published articles on reliability of ImPACT, listed chronologically:
• Iverson, G., Lovell, M. R., & Collins, M. W. (2003). Interpreting change on ImPACT following sport concussion. Clin Neuropsychol.
• Broglio, S. P., Ferrara, M. S., Macciocchi, S. N., Baumgartner, T. A., & Elliott, R. (2007). Test-retest reliability of computerized concussion assessment programs. J Athl Train.
• Schatz, P. (2009). Long-term test-retest reliability of baseline cognitive assessments using ImPACT. Am J Sports Med.
• Elbin, R. J., Schatz, P., & Covassin, T. (2011). One-year test-retest reliability of the online version of ImPACT in high school athletes. Am J Sports Med.
• Schatz, P., & Ferris, C. (2013). One-month test-retest reliability of the ImPACT test battery. Arch Clin Neuropsych.
Psychometric Issues: Reliability
• Update to Broglio’s 2007 study:
  • Nakayama (MSU Dissertation) replicated Broglio’s 2007 study using only ImPACT.
  • Nakayama used ACSM standard for “athletically active”
    • 75mod-150vig min/wk cardio, 2-3 days/wk resistance training
  • <3% of subjects had Invalid results (vs. 34% for Broglio)
  • Higher ICCs across all Composite scores
Working Memory: Reliability Data
Test-retest reliability of other Working Memory measures:
Measure           Interval     Coefficient   Source
ImPACT (VrM)      1 month      ICC .79       Schatz, Ferris, 2013
ImPACT (VrM)      45 days      ICC .76       Nakayama, 2013
ImPACT (VrM)      1 year       ICC .62       Elbin et al., 2011
CogSport (WM)     1 year       ICC .51       Collie et al., 2001
ImPACT (VrM)      2 years      ICC .46       Schatz, 2010
ANAM (CPT)        1 week       ICC .32       Segalowitz et al., 2007
CogSport (WM)     1 hour       ICC .24       Collie et al., 2001
Digit Span        60 days      r .70         Barr et al., 2003
WMS (LM)          11 months    r .70         Tulsky et al., 2003
WMS (VR)          11 months    r .62         Tulsky et al., 2003
WMS (PA)          11 months    r .57         Tulsky et al., 2003
RAVLT             1 year       r .55         Snow et al., 1988
RVDLT-R           1 month      r .45         Benedict, 1997
Reaction Time: Reliability Data
Test-retest reliability of Reaction Time measures:
Measure           Interval     Coefficient   Source
ImPACT (RT)       1 month      ICC .77       Schatz, Ferris, 2013
ImPACT (RT)       1 year       ICC .76       Elbin et al., 2011
CogSport (RT)     1 week       ICC .76       Collie et al., 2001
ImPACT (RT)       2 years      ICC .68       Schatz, 2010
ImPACT (RT)       45 days      ICC .68       Nakayama, 2013
Analog (RT)       1 year       ICC .65       Eckner et al., 2011
CogState (RT)     1 year       ICC .51       Eckner et al., 2011
ANAM (RT)         1 week       ICC .46       Segalowitz et al., 2007
CPT-II (child)    6 months     ICC .65       Zabel et al., 2009
BOT* (adults)     1 session    ICC .53       Mercer et al., 2009
CANTAB* (kids)    10 weeks     ICC .37       Fisher et al., 2011
Laser             1 session    r .99         Matsumura et al., 2013
*Bruininks-Oseretsky Test of Motor Proficiency, Cambridge Neuropsychological Test Battery
Psychomotor Speed: Reliability Data
Test-retest reliability of other Processing Speed/Coding measures:
Measure           Interval     Coefficient   Source
ImPACT (PS)       1 month      ICC .88       Schatz, Ferris, 2013
ImPACT (PS)       45 days      ICC .86       Nakayama, 2013
ImPACT (PS)       1 year       ICC .82       Elbin et al., 2011
CogSport (CM)     1 week       ICC .76       Collie et al., 2003
ImPACT (PS)       2 years      ICC .74       Schatz, 2010
ANAM (CDS)        1 week       ICC .54       Segalowitz et al., 2007
SDMT              10 days      r .74         Hinton-Bayre et al., 1997
Digit Symbol      60 days      r .73         Barr et al., 2003
Tapping           6 months     r .71         Ruff, Parker, 1993
Trails B          60 days      r .65         Valovich, 2006; Barr, 2003
BVMT-R            55 days      r .60         Benedict, 1997
Other Tests: Reliability Data
Test-retest reliability of other/common tests:
Measure           Interval     Coefficient
Systolic BP       3 months     r .50
Diastolic BP      3 months     r .53
Heart Rate        4 visits     ICC .56
Heart Rate        1 week       ICC .74
Gluc. Metab.      immediate    r .77
BESS              60 days      r .70
Field Sobriety/Blood ETOH:
Actual BAC        immediate    r .97
Saliva ETOH       10 mins      r .90
1-leg Stand       immediate    r .61
Arrest Decis.     immediate    r .54
Est. BAC          immediate    r .68
“State-Trait” Issues
Test-retest reliability of other constructs:
Measure                        Interval   Coefficient
(Adult) Manifest Anxiety       1 week     .67-.90
Children’s Manifest Anxiety    1 week     .54-.76
Trait Anxiety-Adult            1 hour     .84 (M)  .76 (F)
State Anxiety-Adult            1 hour     .33 (M)  .16 (F)
Trait Anxiety-Adult            20 days    .86 (M)  .76 (F)
State Anxiety-Adult            20 days    .54 (M)  .27 (F)
Trait Anxiety-College          30 days    .73 (M)  .86 (F)
State Anxiety-College          30 days    .51 (M)  .36 (F)
Trait Anxiety-Children         30 days    .65 (M)  .71 (F)
State Anxiety-Children         30 days    .31 (M)  .47 (F)
Two-Factor Theory
• Rationale
• Verbal Memory:
  • Information presented visually
  • Can be encoded verbally
• Visual Memory:
  • Information presented visually
  • Cannot be easily encoded verbally
• Reaction Time:
  • Speed of responses: Simple Choice -> Complex Choice
• Visual Motor Speed:
  • Speed of information processing
• Confusion in interpretation
  • Simplified by using “Memory”, “Speed”?
Two-Factor Theory
• Factor analysis:
  • Reduce a larger number of variables to a smaller number of factors
  • Analogy: see bumps under covers on bed, hear laughing
    • one “cluster” of bumps moves in one direction
    • the other “cluster” moves in another direction
    • identify them as “Child 1” and “Child 2”
    • each “Child” is a unique “Factor”
  • Can also be used to select a subset of variables from a larger set, based on which variables have the highest correlations with the principal components (or factors)
Two-Factor Theory Factor analysis results: Baseline Group (N=22k) Concussion Group (N=560)
Two-Factor Theory Factor analysis results (data from Schatz & Sandel, 2012) Baseline Group Concussion Group
Two-Factor Theory Factor analysis results (data from Schatz & Sandel, 2012)
Two-Factor Theory
Calculated Z-scores, using normative data (Mean, SD) for both baseline and post-concussion scores:
Baseline: $Z = \frac{\text{Athlete's Score} - \text{Baseline Mean}}{\text{Baseline SD}}$
Post-concussion: $Z = \frac{\text{Athlete's Post-concussion Score} - \text{Baseline Mean}}{\text{Baseline SD}}$
Averaged Verbal/Visual, Visual Motor/Reaction Time
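The Z-score and averaging steps above can be sketched as follows. The normative means and SDs and the athlete's scores are hypothetical placeholders, not ImPACT norms, and the dictionary keys and factor names are labels invented for this example. The slides do not say whether Reaction Time (where lower is better) is sign-reversed before averaging, so this sketch averages the raw Z-scores and leaves that question open.

```python
def z_score(score, norm_mean, norm_sd):
    """Standardize a score against normative baseline mean and SD."""
    return (score - norm_mean) / norm_sd

# Hypothetical normative baseline values: (mean, SD) per composite.
NORMS = {
    "verbal": (85.0, 10.0),
    "visual": (75.0, 12.0),
    "visual_motor": (38.0, 6.0),
    "reaction_time": (0.58, 0.08),
}

def two_factor_scores(scores):
    """Average Verbal/Visual into Memory and Visual Motor/Reaction Time into Speed."""
    z = {name: z_score(scores[name], *NORMS[name]) for name in NORMS}
    memory = (z["verbal"] + z["visual"]) / 2.0
    # Reaction Time is NOT sign-reversed here; see the note above.
    speed = (z["visual_motor"] + z["reaction_time"]) / 2.0
    return {"memory": memory, "speed": speed}

# Hypothetical post-concussion composite scores for one athlete.
post_concussion = {
    "verbal": 75.0,
    "visual": 63.0,
    "visual_motor": 29.0,
    "reaction_time": 0.66,
}

factors = two_factor_scores(post_concussion)
print({name: round(value, 2) for name, value in factors.items()})
```

The same function applies to baseline scores, since both sets are standardized against the baseline mean and SD.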
Two-Factor Theory Calculated Z-scores, using normative data (Mean, SD) for both baseline and post-concussion scores:
Validity
The extent to which a test measures what it is intended to measure.
Traditionally achieved using a criterion group (e.g., clinical, diagnosed) and a control group (e.g., absence of diagnosis).
Expressed in terms of “sensitivity” and “specificity”.
[Figure panel: “Highly Reliable and Valid”]
Validity
Calculating sensitivity
Correct “positive” hits = 81.9%
(i.e., the probability that a test result will be positive when a concussion is present)
Validity
Calculating specificity
Correct “negative” hits = 89.4%
(i.e., the probability that a test result will be negative when a concussion is not present)
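Sensitivity and specificity come from the four cells of a classification table. The slides give only the percentages, so the counts below are illustrative values chosen because they round to 81.9% and 89.4%. They are an assumption of this sketch, not data taken from the slides.

```python
def sensitivity(true_pos, false_neg):
    """Proportion of concussed cases the test classifies as positive."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg, false_pos):
    """Proportion of non-concussed cases the test classifies as negative."""
    return true_neg / (true_neg + false_pos)

# Illustrative counts (assumed): 72 concussed and 66 control cases.
TRUE_POS, FALSE_NEG = 59, 13   # concussed: correctly flagged vs. missed
TRUE_NEG, FALSE_POS = 59, 7    # controls: correctly cleared vs. false alarms

sens_pct = round(100.0 * sensitivity(TRUE_POS, FALSE_NEG), 1)
spec_pct = round(100.0 * specificity(TRUE_NEG, FALSE_POS), 1)

print(sens_pct, spec_pct)
```

Both values depend only on the rows of the table they are computed from, so sensitivity is unaffected by how many controls are tested and specificity by how many concussed athletes are tested.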
Validity Data
Sensitivity of “concussion” measures:
Measure                   Sensitivity   Source
ImPACT (online-72h)       91%           Schatz, Sandel, 2013
ImPACT (desktop-72h)      82%           Schatz et al., 2005
PnP, Posture, Sym         96%           Broglio et al., 2007
ImPACT, Posture, Sym      92%           Broglio et al., 2007
ImPACT (desktop-24h)      79%           Broglio et al., 2007
HeadMinder CRI            79%           Broglio et al., 2007
Symptoms                  68%           Broglio et al., 2007
Posture                   62%           Broglio et al., 2007
Pencil/Paper (battery)    44%           Broglio et al., 2007
BESS, SAC, PnP            56%           McCrea et al., 2005
PnP battery (Day 2)       23%           McCrea et al., 2005
Validity Data
Sensitivity/Specificity measures:
Measure                   Sens.   Spec.   Source
ImPACT (ONL-72hr)         91%     69%     Schatz, Sandel, 2013
ImPACT (DT-72hr)          82%     89%     Schatz et al., 2005
SAC (immediate)           94%     76%     McCrea et al., 2001
RapScrCon, Tr B (24h)     70%     74%     DeMonte et al., 2010
Full Battery (Day 2)      56%     79%     McCrea et al., 2005
ANAM/SOT*                 50%     96%     Register-Mihalik et al., 2012
Symptoms (Day 2)          27%     100%    McCrea et al., 2005
Symptom Clusters (D2)     47%     77%     Lau et al., 2011
BESS (Day 2)              24%     91%     McCrea et al., 2005
PnP Battery (Day 2)       23%     93%     McCrea et al., 2005
SAC (Day 2)               22%     89%     McCrea et al., 2005
*Sensory Organization Test
Validity Data
Sensitivity/Specificity of common medical conditions:
Measure                   Sens.   Spec.   Source
ImPACT (online)           91%     69%     Schatz, Sandel, 2013
ImPACT (desktop)          82%     89%     Schatz et al., 2005
Oxidative Stress (Alz)    88%     70%     Lopez et al., 2013
HBP (Hypertension)        84%     82%     Nascimento et al., 2011
Mammogram (1yr)           82%     91%     Hofvind et al., 2012
Echocardiogram            77%     61%     Tanaka et al., 2010
Stress Echo               76%     87%     Sicari et al., 2007
Prostate Exam             75%     44%     Ojewola et al., 2013
PSA Test (>4)             72%     46%     Rashid et al., 2012
Cholesterol ‘At-Risk’     71%     76%     Gelsky et al., 1994
Rapid Strep Test          65%     97%     Gurol et al., 2010
Two-Factor Theory Applied to Validation Data (Schatz & Sandel, 2012): Two-Factor versus composite score and sub-scale score sensitivity and specificity.
“Psychometric” Issues?
• Is concussion testing falling under a unique level of scrutiny?
• Is there an ulterior motive for the criticism of the psychometric properties of computer-based concussion tests and not other tests?
• Would a more reliable instrument be valid (e.g., crystallized intelligence)?
• Is it necessary to focus solely on one measure (e.g., ImPACT), as part of a more comprehensive assessment, when:
  • other measures have equal or worse psychometrics
  • lone measures are not recommended for concussion diagnosis/management
Collaborators:
Tracey Covassin, Ph.D.
Mickey Collins, Ph.D.
RJ Elbin, Ph.D.
Robin Karpf, M.D.
Anthony Kontos, Ph.D.
Mark Lovell, Ph.D.
Rosemarie Moser, Ph.D.
Summer Ott, Psy.D.
Gary Solomon, Ph.D.
Student Collaborators:
Nicole Cameron
Charles Ferris
Timothy Kelley
Stacey Robertshaw
Natalie Sandel