Continuous heterogeneity

Shaun Purcell, Boulder Twin Workshop, March 2004

Raw data vs. summary statistics

Summary statistics (lower-triangular covariance matrices):

 MZ   1.03
      0.87   0.98
 DZ   0.95
      0.57   1.08

Raw data:

 Zyg    T1     T2
 1      1.2    0.8
 1     -1.3   -2.2
 2      0.7    1.9
 2      0.2   -0.8
 ..     ...    ...


Raw data vs. summary statistics

 Zyg    T1     T2     age
 1      1.2    0.8    12.3
 1     -1.3   -2.2    10.3
 2      0.7    1.9     8.7
 2      0.2   -0.8    14.5
 ..     ...    ...    ...

[Figure: bivariate normal distribution, with the data, mean and variance labelled]

Introducing definition variables
• Zygosity as a definition variable
• "Rectangular" file data.raw (columns: id, zyg, t1, t2)

 1 1  0.361769  -0.35641
 2 1  0.888986   1.46342
 3 1  0.535161   0.636073
 ...
 1 2  0.234099   0.0848318
 2 2 -0.547252  -0.22976
 3 2 -0.307926  -0.253692
 ...

 !Using definition variables
 Group1: Defines Matrices
  Calc NGroups=2
  Begin Matrices;
   X Lower 1 1 free
   Y Lower 1 1 free
   Z Lower 1 1 free
   ! M, necessary for the means model
   M Full 1 1 free
   ! H will be specified as a definition variable
   H Full 1 1
  End Matrices;
  Begin Algebra;
   A = X*X';
   C = Y*Y';
   E = Z*Z';
  End Algebra;
  Ma X 0
  Ma Y 0
  Ma Z 1
  Ma M 0
  ! Optional: request individual fit statistics for each pair
  Options MX%P=rawfit.txt
 End

 Group2: MZ & DZ twin pairs
  ! A single group for both MZ & DZ twins
  ! NObservations=0: no need to specify number of pairs
  Data NInput_vars=4 NObservations=0
  ! RE points to a "REctangular" data file
  RE file=data.raw
  Labels id zyg t1 t2
  Select t1 t2 zyg /
  ! Zygosity is a "Definition" variable
  Definition zyg /
  Matrices = Group 1
  ! A model for the means [ twin 1 | twin 2 ]
  Means M | M /
  ! Multiply A component by 1/H
  Covariances A + C + E   | (H~)@A + C _
              (H~)@A + C  | A + C + E /
  ! 1 x 1 matrix H represents each pair's zygosity
  Specify H -1
 End

Output from zyg.mx

 RE FILE=DATA.RAW
 Rectangular continuous data read initiated
 NOTE: Rectangular file contained 500 records with data
       that contained a total of 2000 observations
 LABELS ID ZYG T1 T2
 SELECT T1 T2 ZYG /
 DEFINITION ZYG /
 NOTE: Selection yields 500 data vectors for analysis
 NOTE: Vectors contain a total of 1500 observations
 NOTE: Definition yields 500 data vectors for analysis
 NOTE: Vectors contain a total of 1000 observations

Output from zyg.mx

 Summary of VL file data for group 2

                 ZYG         T1         T2
 Code        -1.0000     1.0000     2.0000
 Number     500.0000   500.0000   500.0000
 Mean         1.5000    -0.0140     0.0240
 Variance     0.2500     0.5601     0.5211
 Minimum      1.0000    -2.1941    -1.9823
 Maximum      2.0000     2.1218     2.7670

Output from zyg.mx (parameter specifications; the -1 in H comes from "Specify H -1")

 MATRIX H
 This is a FULL matrix of order 1 by 1
      1
 1   -1

 MATRIX M
 This is a FULL matrix of order 1 by 1
      1
 1    4

 MATRIX X
 This is a LOWER TRIANGULAR matrix of order 1 by 1
      1
 1    1

 MATRIX Y
 This is a LOWER TRIANGULAR matrix of order 1 by 1
      1
 1    2

 MATRIX Z
 This is a LOWER TRIANGULAR matrix of order 1 by 1
      1
 1    3

Output from zyg.mx

 Your model has 4 estimated parameters and 1000 Observed statistics
 -2 times log-likelihood of data >>> 2134.998
 Degrees of freedom >>>>>>>>>>>>>>>> 996

• Fixing X to zero

 Your model has 3 estimated parameters and 1000 Observed statistics
 -2 times log-likelihood of data >>> 2154.626
 Degrees of freedom >>>>>>>>>>>>>>>> 997
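The two fits above differ by one parameter, so they can be compared with a likelihood-ratio test. The sketch below is not part of the original slides. It assumes the difference in -2LL can be referred to a chi-square distribution with 1 df, which is only approximate when the dropped parameter lies on a boundary. The function name `lrt_1df` is made up for illustration.

```python
import math

def lrt_1df(minus2ll_full, minus2ll_reduced):
    """Chi-square difference and naive p-value for dropping one parameter."""
    chi2 = minus2ll_reduced - minus2ll_full
    # For 1 df, the chi-square upper tail equals erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p

# -2LL values taken from the zyg.mx output above.
chi2, p = lrt_1df(2134.998, 2154.626)
print(f"chi2(1) = {chi2:.3f}, p = {p:.2e}")
```

The difference is 2154.626 - 2134.998 = 19.628 on 997 - 996 = 1 df.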

Continuous moderators
• Traits often best defined continuously
• Many environmental moderators also likely to be continuous in nature
  • Age
  • Gestational age
  • Socio-economic status
  • Educational level
  • Consumption of food / alcohol / drugs
• How to test for G × E interaction in this case?

Continuous moderators
• Problems?
  • Stratification of sample → reduced sample size
  • Modelling proportions of variance
    • implicitly assumes equality of variance w.r.t. the moderator
  • Logical to assume a linear G × E interaction
    • linearity at the level of effect, not variance
  • No obvious statistical test for heterogeneity

[Figure: heritability (0% to 100%) by age stratum (4, 6, 8, 10 yrs)]

Biometrical G × E model
• At a hypothetical single locus
  • additive genetic value $a$
  • allele frequency $p$
  • QTL variance $2p(1-p)a^2$
• Assuming a linear interaction
  • additive genetic value $a + \beta M$
  • allele frequency $p$
  • QTL variance $2p(1-p)(a + \beta M)^2$
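Expanding the moderated QTL variance shows why an interaction that is linear in the effect is not linear in the variance. This block is an added illustration, not from the slides. It writes $\beta$ for the interaction coefficient that multiplies the moderator $M$ and $\sigma^2_{QTL}(M)$ for the QTL variance at a given $M$, both of which are notational assumptions.

```latex
\begin{aligned}
\sigma^2_{QTL}(M) &= 2p(1-p)\,(a + \beta M)^2 \\
                  &= 2p(1-p)\,\bigl(a^2 + 2a\beta M + \beta^2 M^2\bigr)
\end{aligned}
```

With $\beta = 0$ this reduces to the unmoderated variance $2p(1-p)a^2$. Otherwise the variance is quadratic in $M$.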

Biometrical G × E model
[Figure: genotypic values of AA, Aa and aa (a, 0, -a) plotted against the moderator M. Panels: "No interaction", "Interaction", "Equivalently…"]

Model-fitting approach to G × E
[Path diagram: latent A, C and E for each twin, with paths a, c and e to Twin 1 and Twin 2]

Model-fitting approach to GxE A C E A C E c a+XM e a+XM c e Twin 1 Twin 2 Continuous moderator variableM Can be coded 0 / 1 in the dichotomous case

Individual-specific moderators
[Path diagram: latent A, C and E for each twin; A path a+XM1 to Twin 1 and a+XM2 to Twin 2; C and E paths c and e]

E × E interactions
[Path diagram: latent A, C and E for each twin; paths a+XM1, c+YM1, e+ZM1 to Twin 1 and a+XM2, c+YM2, e+ZM2 to Twin 2]

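ACE - XYZ - M
[Path diagram: latent A, C and E for each twin; paths a+XM1, c+YM1, e+ZM1 to Twin 1 and a+XM2, c+YM2, e+ZM2 to Twin 2; means m+MM1 and m+MM2, with the moderator M shown for each twin]
• Main effects and moderating effects are statistically and conceptually distinct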

Model-fitting approach to G × E
[Figure: A, C and E components of variance (y-axis) plotted against the moderator variable (x-axis)]

Turkheimer et al (2003)
• 320 twin pairs recruited at birth from urban hospitals
• G : additive genetic variance
• E : SES
  • parental education, occupation, income
• X : IQ
  • Wechsler; Verbal, Performance, Full

[Figure: A, C and E components for Full-scale IQ, Verbal IQ and Non-verbal IQ]

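Standard model
• Means vector: $(m,\ m)$
• Covariance matrix: $\begin{pmatrix} a^2 + c^2 + e^2 & r\,a^2 + c^2 \\ r\,a^2 + c^2 & a^2 + c^2 + e^2 \end{pmatrix}$, with $r = 1$ for MZ and $r = \tfrac{1}{2}$ for DZ pairs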

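Allowing for a main effect of the moderator
• Means vector: $(m + \beta_M M_1,\ m + \beta_M M_2)$, where $M_1$ and $M_2$ are the twins' moderator values and $\beta_M$ is the main-effect coefficient (matrix B in the script)
• Covariance matrix: unchanged from the standard model, $\begin{pmatrix} a^2 + c^2 + e^2 & r\,a^2 + c^2 \\ r\,a^2 + c^2 & a^2 + c^2 + e^2 \end{pmatrix}$, with $r = 1$ for MZ and $r = \tfrac{1}{2}$ for DZ pairs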

 ! Basic model + main effect of a definition variable
 G1: Define Matrices
  Data Calc NGroups=3
  Begin Matrices;
   A full 1 1 free
   C full 1 1 free
   E full 1 1 free
   M full 1 1 free   ! grand mean
   B full 1 1 free   ! moderator-linked means model
   H full 1 1
   R full 1 1        ! twin 1 moderator (definition variable)
   S full 1 1        ! twin 2 moderator (definition variable)
  End Matrices;
  Ma M 0
  Ma B 0
  Ma A 1
  Ma C 1
  Ma E 1
  Matrix H .5
  Options NO_Output
 End

 G2: MZ
  Data NInput_vars=6 NObservations=0
  Missing =-999
  RE File=f1.dat
  Labels id zyg p1 p2 m1 m2
  Select if zyg = 1 /
  Select p1 p2 m1 m2 /
  Definition m1 m2 /
  Matrices = Group 1
  Means M + B*R | M + B*S /
  Covariance
   A*A' + C*C' + E*E' | A*A' + C*C' _
   A*A' + C*C'        | A*A' + C*C' + E*E' /
  !twin 1 moderator variable
  Specify R -1
  !twin 2 moderator variable
  Specify S -2
  Options NO_Output
 End

 G3: DZ
  Data NInput_vars=6 NObservations=0
  Missing =-999
  RE File=f1.dat
  Labels id zyg p1 p2 m1 m2
  Select if zyg = 2 /
  Select p1 p2 m1 m2 /
  Definition m1 m2 /
  Matrices = Group 1
  Means M + B*R | M + B*S /
  Covariance
   A*A' + C*C' + E*E' | H@A*A' + C*C' _
   H@A*A' + C*C'      | A*A' + C*C' + E*E' /
  !twin 1 moderator variable
  Specify R -1
  !twin 2 moderator variable
  Specify S -2
 End

Estimates (each matrix is FULL, of order 1 by 1):

 MATRIX A   1.3228
 MATRIX B   0.3381
 MATRIX C   1.1051
 MATRIX E   0.9728
 MATRIX M   0.1035

 Your model has 5 estimated parameters and 800 Observed statistics
 -2 times log-likelihood of data >>> 3123.925
 Degrees of freedom >>>>>>>>>>>>>>>> 795

Estimates (each matrix is FULL, of order 1 by 1):

 MATRIX A   1.3078
 MATRIX B   0.0000
 MATRIX C   1.1733
 MATRIX E   0.9749
 MATRIX M   0.1069

 Your model has 4 estimated parameters and 800 Observed statistics
 -2 times log-likelihood of data >>> 3138.157
 Degrees of freedom >>>>>>>>>>>>>>>> 796

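Continuous heterogeneity model
• Means vector: $(m + \beta_M M_1,\ m + \beta_M M_2)$
• Covariance matrix, for twin $i$ with moderator value $M_i$:
  • variance of twin $i$: $(a + \beta_X M_i)^2 + (c + \beta_Y M_i)^2 + (e + \beta_Z M_i)^2$
  • covariance between twins: $r\,(a + \beta_X M_1)(a + \beta_X M_2) + (c + \beta_Y M_1)(c + \beta_Y M_2)$, with $r = 1$ for MZ and $r = \tfrac{1}{2}$ for DZ pairs
• Here $\beta_X$, $\beta_Y$, $\beta_Z$ are the moderation coefficients (labelled X, Y, Z on the path diagrams; matrices T, U, V in the script) and $\beta_M$ is the main effect on the mean (matrix B)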

 ! GxE - Basic model
 G1: Define Matrices
  Data Calc NGroups=3
  Begin Matrices;
   A full 1 1 free
   C full 1 1 free
   E full 1 1 free
   T full 1 1 free   ! moderator-linked A component
   U full 1 1 free   ! moderator-linked C component
   V full 1 1 free   ! moderator-linked E component
   M full 1 1 free   ! grand mean
   B full 1 1 free   ! moderator-linked means model
   H full 1 1
   R full 1 1        ! twin 1 moderator (definition variable)
   S full 1 1        ! twin 2 moderator (definition variable)
  End Matrices;
  Ma T 0
  Ma U 0
  Ma V 0
  Ma M 0
  Ma B 0
  Ma A 1
  Ma C 1
  Ma E 1
  Matrix H .5
  Options NO_Output
 End

 G2: MZ
  Data NInput_vars=6 NObservations=0
  Missing =-999
  RE File=f1.dat
  Labels id zyg p1 p2 m1 m2
  Select if zyg = 1 /
  Select p1 p2 m1 m2 /
  Definition m1 m2 /
  Matrices = Group 1
  Means M + B*R | M + B*S /
  Covariance
   (A+T*R)*(A+T*R) + (C+U*R)*(C+U*R) + (E+V*R)*(E+V*R) |
   (A+T*R)*(A+T*S) + (C+U*R)*(C+U*S) _
   (A+T*S)*(A+T*R) + (C+U*S)*(C+U*R) |
   (A+T*S)*(A+T*S) + (C+U*S)*(C+U*S) + (E+V*S)*(E+V*S) /
  !twin 1 moderator variable
  Specify R -1
  !twin 2 moderator variable
  Specify S -2
  Options NO_Output
 End

 G3: DZ
  Data NInput_vars=6 NObservations=0
  Missing =-999
  RE File=f1.dat
  Labels id zyg p1 p2 m1 m2
  Select if zyg = 2 /
  Select p1 p2 m1 m2 /
  Definition m1 m2 /
  Matrices = Group 1
  Means M + B*R | M + B*S /
  Covariance
   (A+T*R)*(A+T*R) + (C+U*R)*(C+U*R) + (E+V*R)*(E+V*R) |
   H@(A+T*R)*(A+T*S) + (C+U*R)*(C+U*S) _
   H@(A+T*S)*(A+T*R) + (C+U*S)*(C+U*R) |
   (A+T*S)*(A+T*S) + (C+U*S)*(C+U*S) + (E+V*S)*(E+V*S) /
  !twin 1 moderator variable
  Specify R -1
  !twin 2 moderator variable
  Specify S -2
 End
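The Means and Covariance statements of the MZ and DZ groups above can be mirrored outside Mx to check what the model expects for a single pair. This is a minimal sketch, not part of the workshop scripts. The function names, the argument names and the example values are assumptions chosen for illustration. The lowercase arguments stand for the 1 x 1 matrices A, C, E, T, U, V, M and B.

```python
def expected_means(m, b, mod1, mod2):
    """Expected means [twin 1, twin 2]: M + B*R | M + B*S."""
    return [m + b * mod1, m + b * mod2]

def expected_cov(a, c, e, t, u, v, mod1, mod2, r_a):
    """2x2 expected covariance matrix for one pair.

    a, c, e    : unmoderated path coefficients (A, C, E)
    t, u, v    : moderator-linked coefficients (T, U, V)
    mod1, mod2 : moderator values of twin 1 and twin 2 (R, S)
    r_a        : 1.0 for MZ pairs, 0.5 for DZ pairs (H in the DZ group)
    """
    a1, a2 = a + t * mod1, a + t * mod2
    c1, c2 = c + u * mod1, c + u * mod2
    e1, e2 = e + v * mod1, e + v * mod2
    var1 = a1 * a1 + c1 * c1 + e1 * e1
    var2 = a2 * a2 + c2 * c2 + e2 * e2
    cov12 = r_a * a1 * a2 + c1 * c2   # E is not shared between twins
    return [[var1, cov12], [cov12, var2]]

# Hypothetical values, for illustration only.
mz = expected_cov(1.0, 0.8, 0.6, 0.3, 0.0, 0.0, mod1=-1.0, mod2=2.0, r_a=1.0)
dz = expected_cov(1.0, 0.8, 0.6, 0.3, 0.0, 0.0, mod1=-1.0, mod2=2.0, r_a=0.5)
print(mz)
print(dz)
```

Setting t, u and v to zero recovers the unmoderated ACE covariance matrix, which gives a quick sanity check on the algebra.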

Practical 1
• The script: mod.mx
• The data: f1.dat
  • columns: ID, zygosity, trait_twin_1, trait_twin_2, mod_twin_1, mod_twin_2
• Any evidence for G × E for this trait?
  • i.e. does the A latent variable show heterogeneity with respect to the moderator variable?
• If so, in what way?
  • i.e. how would you interpret/describe the effect?

Practical 1 : f1.dat
[Figure panels: MZ pairs (trait); DZ pairs (trait); moderator distribution; all twin 1's vs. moderator]

nomod.mx

 a   1.3078    a² ≈ 1.7
 c   1.1733    c² ≈ 1.4
 e   0.9749    e² ≈ 0.95

 a² + c² + e² = 4.05
 i.e. % variance is 42%, 35% and 23%
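The proportions above can be recomputed from the unrounded path estimates. This is an illustrative sketch and the variable names are arbitrary. Because the slide works from rounded squares, the C share comes out nearer 34% than 35% when the full estimates are used.

```python
# Path estimates from nomod.mx.
a, c, e = 1.3078, 1.1733, 0.9749

# Squared paths are the variance components.
components = {"A": a ** 2, "C": c ** 2, "E": e ** 2}
total = sum(components.values())

# Each component as a proportion of the total variance.
proportions = {name: vc / total for name, vc in components.items()}

for name in ("A", "C", "E"):
    print(f"{name}: variance {components[name]:.4f}, "
          f"proportion {100 * proportions[name]:.1f}%")
print(f"total variance {total:.4f}")
```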

Parameter estimates: mod.mx

Plotting VCs
• For the additive genetic VC, for example
• Given $a$, $\beta$ and a range of values for the moderator variable
• For example, $a = 0.5$, $\beta = -0.2$ and $M$ ranges from -2 to +2
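The points for such a plot can be tabulated directly from the example values on this slide. In this sketch `beta` stands for the moderation coefficient on the A path, and the 0.5 step between moderator values is an arbitrary choice. The resulting pairs can be passed to any plotting tool.

```python
def moderated_vc(a, beta, m):
    """Additive genetic variance component at moderator value m."""
    return (a + beta * m) ** 2

a, beta = 0.5, -0.2
moderator = [k / 2 for k in range(-4, 5)]   # -2.0, -1.5, ..., +2.0
curve = [(m, moderated_vc(a, beta, m)) for m in moderator]

for m, vc in curve:
    print(f"M = {m:+.1f}   VC = {vc:.4f}")
```

With these values the component falls from (0.5 + 0.4)² = 0.81 at M = -2 to (0.5 - 0.4)² = 0.01 at M = +2.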

Specific test of G×E

Other tests
All made against the full model ACE-XYZ-M, -2LL = 3024.689

Confidence intervals
• Easy to get CIs for individual parameters
• Additionally, CIs on the moderated VCs are useful for interpretation
  • e.g. a 95% CI for $(a + \beta M)^2$, for a specific M

• Define two extra vectors in Group 1

  P full 1 13
  O Unit 1 13
  Matrix P -3 -2.5 -2 -1.5 -1 -0.5 0 0.5 1 1.5 2 2.5 3

• Add a 4th group to calculate the CIs

  CIs
  Calc
  Matrices = Group 1
  Begin Algebra;
   F= ( A@O + T@P ) . ( A@O + T@P ) /
   G= ( C@O + U@P ) . ( C@O + U@P ) /
   I= ( E@O + V@P ) . ( E@O + V@P ) /
  End Algebra;
  Interval @ 95 F 1 1 to F 1 13
  Interval @ 95 G 1 1 to G 1 13
  Interval @ 95 I 1 1 to I 1 13
  End;

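Calculation of CIs

  F= ( A@O + T@P ) . ( A@O + T@P ) /

• O is a unit row vector and P is the row vector of moderator values
• A@O (Kronecker product) repeats A across all columns, and T@P multiplies each moderator value by T, so ( A@O + T@P ) is the row vector whose k-th element is A + T·P(k)
• Finally, the dot-product squares all elements, giving (A + T·P(k))² for each moderator value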
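The algebra for F can be reproduced element by element to see which quantities the Interval statements refer to. This sketch only evaluates the moderated variance component at each moderator value; it does not compute the confidence intervals, which Mx derives from the likelihood. The values of `A` and `T` are hypothetical, and the helper `kron_1x1` is an assumed name for the 1 x 1 case of the Kronecker product.

```python
def kron_1x1(scalar, row):
    """Kronecker product of a 1x1 matrix with a row vector."""
    return [scalar * x for x in row]

P = [-3 + 0.5 * k for k in range(13)]   # moderator values -3, -2.5, ..., 3
O = [1.0] * 13                          # unit vector

A, T = 0.5, -0.2                        # hypothetical estimates

# ( A@O + T@P ): element k is A + T * P[k]
inner = [x + y for x, y in zip(kron_1x1(A, O), kron_1x1(T, P))]

# Dot (element-wise) product of the vector with itself squares each element.
F = [x * x for x in inner]

for m, f in zip(P, F):
    print(f"M = {m:+.1f}   F = {f:.4f}")
```

Each of the 13 elements of F is one moderated additive genetic variance, which is why the script requests intervals on F 1 1 to F 1 13.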

Confidence intervals on VCs
[Figure: confidence intervals on the moderated A, C and E variance components]

Other considerations
• Simple approach to test for heterogeneity
  • easily adapted, e.g. for ordinal data models
• Extensions / things to watch for…
  • scalar vs. qualitative heterogeneity
  • very low power
  • the environment may show shared genetic influence with the trait
  • nonlinear effects in both mediation and moderation

[Diagram: G and E acting on the trait X, with labels for the main effect of E, the moderating G × E effect, and rGE between G and E]

Turkheimer et al, 2003
[Figure panels: IQ against SES; V(IQ) against SES]
