Marjo-Riitta Järvelin, MD, MSc, PhD, Paediatrician, Professor and Chair in Life-Course Epidemiology. Identifying causal pathways in longitudinal analysis using structural equation modelling. NFBC 1966 – 1986 Northern Finland Birth Cohorts. Ralph and Eve Seelye Charitable Trust, Liggins Institute Trust, EurHealthAging
Main points of presentation • Life course epidemiology: Why longitudinal approaches? General issues (philosophy), re-cap of study designs • Statistical models: Practical approach – how do you plan your analyses? Introduction to methodology (examples: weight, BP) • In Practice - FTO gene and obesity - Gene clusters (encoding nicotinic acetylcholine receptor subunits / dopamine metabolism), life course and smoking behaviour • Potential biases: Missing data, measurement error
Life course epidemiology: Analytical issues • Life course epidemiology involves the study of how health is related to factors operating at different stages earlier in life or across generations. • Essentially, the aim is to relate a ‘distal’ outcome to various exposures that are temporally ordered and also may: • belong to different dimensions (biological, social ...hierarchical... ) • change over time (when repeated observations are involved) • be causally related • The approach is not that new, but it is possible only when data are available over relevant periods. However, completeness, quality and coverage are variable – methodological challenges (new ”dawn” of longitudinal epidemiology)
TO SUMMARISE: Determinants of health over the lifecourse (DEVELOPMENTAL PLASTICITY). [Figure: maternal genotype, paternal genotype and GENOTYPE; environments over the prenatal, postnatal, childhood and adult periods; accumulation of positive and negative effects on health and wellbeing; outcomes: health/disease, adverse trait. Examples shown: malnutrition, stress, disease, smoking, drinking; health behaviour, education, diet, smoking, exercise, alcohol, marital status, housing, socioeconomic status. Context: sustainable communities and places, healthy standard of living, prevention.] KEY Q: WHAT ARE THE RELATIVE ROLES OF GENETIC AND ENVIRONMENTAL FACTORS? DOHaD = Developmental Origins of Health and Disease
Tens of different phenotypes have been associated with deviant foetal growth (birth weight, BW) and other pregnancy related factors by now - analyses are demanding Development and Disabilities [CP, epilepsy, intelligence] Prediction - Paula’s interest! Musculo-skeletal, dental health Health Behaviour, personality, cognitive function Asthma, atopy, lung function, infections, Immune system Behavioural disorders ADHD Foetal growth/ maternal health Birth weight – as a marker Metabolic disease and intermediate disease markers; BP, LIPIDS Schizophrenia, mental disorders Reproduction, abortions, PCOS, males
Longitudinal settings – key issues from the analysis point of view 1) Design of the Study - nature of the data (binary/continuous; accuracy) 2) Longitudinal outcome measures • Linear mixed models to deal with correlated measurements and to allow for individual variation [Growth, blood pressure models, for example] 3) Longitudinal exposures/several exposures over the life-course • ’Life-course epidemiology’
Statistical models (life-course) Aim is to: • relate a distal outcome to factors arising at earlier ages and/or earlier generations Standard multivariable regression approach (example next): • regress the distal outcome on all these factors. This: • gives estimates of effect for each factor, holding the others constant (re-cap – standard linear regression) • is not an adequate approach to address the aim if we are interested in the web of relations surrounding that exposure Multivariate joint regression approach: • specify the joint distribution (interrelationships) of all the variables in the diagram. That is, define a multivariate (as opposed to multivariable) model that corresponds to the causal diagram (Greenland & Brumback, 2005)
STUDIES ON BLOOD PRESSURE - Northern Finland 1966 and 1986 Birth Cohort (NFBC) Programme. Whole population in the area: 604 000 in 1966, 630 000 in 1986. Study populations: women (parents) and births with expected dates of delivery 1) in 1966 (N=12,231) (thesis in 1969) and 2) between 1 July 1985 and 30 June 1986 (N=9479). [Map: Oulu; ~1300 km]
NFBC 1966 AND 1986 – milestones in data collection. NFBC1966 N=12231 (96%); NFBC1986 N=9479 (99%). [Timeline: 12-16 gw, birth, 1y, 7, 8, 14-16, 24-29, 31, 46 (clinics ongoing)] • Profs. P Rantakallio, A-L Saukkonen, A-L Hartikainen, M-R Järvelin
Example: Association between birth weight and adult SBP at age 31 years in the NFBC1966. A multivariable regression approach (standard)
Statistical models (life-course) Aim is to : • relate a distal outcome to factors arising at earlier ages and/or earlier generations. Standard multivariable regression approach: • regress the distal outcome on all these factors. This: • gives estimates of effect for each factor, holding the others constant • not adequate approach to address the aim if we are interested in the web of relations surrounding that exposure. Multivariate joint regression approach (example): • specify the joint distribution (interrelationships) of all the variables in the diagram (”spider diagram”). That is, define a multivariate (as opposed to multivariable) model that corresponds to the causal diagram (Greenland & Brumback, 2005). (in the next slide BP=Blood Pressure, BMI=Body Mass Index)
1. ”Spider diagram” challenge - life-course analyses of FTO using path analysis (SEM). [Diagram, variables arranged by life stage (Prenatal, Birth, Childhood, Adolescence, Adulthood): GENE - FTO; Maternal Pre-Pregnancy BMI; Maternal Age; Parity; Gender; Maternal Smoking at the 2nd Month of Pregnancy; Maternal Blood Pressure During Pregnancy; Gestational Age; Family SES at Birth; BMI at Birth; Family SES at Age 14 Years; Physical Activity at Age 14 Years; BMI at Age 14 Years; Smoking at Age 14 Years; Alcohol Use at Age 14 Years; SES at Age 31 Years; Physical Activity at Age 31 Years; Diet at Age 31 Years; Smoking at Age 31 Years; Alcohol Use at Age 31 Years; DISTAL PHENOTYPE – BP, BMI..]
Multivariate joint regression models Two approaches: a) Structural Equation Models (SEMs; Bollen, 1989; Skrondal & Rabe-Hesketh, 2003): general family of multivariate models that include path analysis, factor analysis, latent growth models, . . . b) Chain Graph models (Cox & Wermuth, 1996; Edwards, 2000). In specific settings the two approaches overlap.
How to begin with the analyses? - Think of relevant variables - Build your model piece by piece - Start with a simple example of the complex model. [Diagram variables: Maternal smoking; Alcohol use; Smoking; SES; BMI at 14y; SES; Parity; Birth weight; BMI AT 31Y; BLOOD PRESSURE; Gestational age; Maternal BMI; Gender; Genetic effects. Regions marked: Submodel 1, Submodel 2]
Example: Association between birth weight and adult SBP in the NFBC1966. A path model approach. Consider a model where one of the explanatory factors, adult BMI, is also an intermediate outcome. [Diagram variables: Gender; BW (kg); BMI 31y (kg/m2); SBP (mmHg) 31y]
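A path model approach: Model specification. The algebraic specification corresponding to this diagram is a set of simultaneous equations. Assuming linear relations:
$\mathrm{BMI}_{31} = \alpha_1 + \beta_{11}\,\mathrm{gender} + \beta_{12}\,\mathrm{BW} + e_1$
$\mathrm{SBP}_{31} = \alpha_2 + \beta_{21}\,\mathrm{gender} + \beta_{22}\,\mathrm{BW} + \beta_{23}\,\mathrm{BMI}_{31} + e_2$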
A path model approach: Results (β, unit = kg/m2 for BMI at 31y or mmHg for SBP at 31y). BMI31 is an ‘endogenous’ variable: it is a dependent and also an explanatory variable.
A path model approach: Graphical results with β (unit = kg/m2 for BMI or mmHg for SBP). [Path diagram coefficients: BW (kg) → BMI 31y: 0.47; BMI 31y → SBP 31y: 0.77; BW (kg) → SBP 31y: -6.35; the two paths from Gender carry 12.20 and 0.97] Birth weight and gender both have a direct and an indirect effect on adult SBP
A path model approach: Direct and indirect effects • Birth weight and gender have both direct and indirect effects on adult SBP. • Their indirect effects can be quantified by multiplying the regression coefficients along the indirect pathway. • Indirect effect of BW: 1 kg in BW → BMI at 31 → SBP at 31: 0.47 × 0.77 = 0.37 • Direct effect of BW: 1 kg in BW → SBP at 31: -6.35 • These should be added to make up the total, i.e. marginal, effect: 0.37 + (-6.35) = -5.98.
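The product-and-sum rule for the birth weight example can be written out as a few lines of Python. This is only a sketch using the rounded coefficients printed on the slide; the variable names are illustrative. Recomputing from the rounded values gives an indirect effect of about 0.36 and a total of about -5.99, so the slide's 0.37 and -5.98 presumably come from unrounded estimates.

```python
# Rounded path coefficients as printed on the slide.
b_bw_to_bmi = 0.47    # 1 kg of birth weight -> BMI at 31 (kg/m2)
b_bmi_to_sbp = 0.77   # 1 kg/m2 of BMI at 31 -> SBP at 31 (mmHg)
b_bw_to_sbp = -6.35   # direct path: 1 kg of birth weight -> SBP at 31 (mmHg)

# Indirect effect: multiply the coefficients along the mediated pathway.
indirect = b_bw_to_bmi * b_bmi_to_sbp

# Total (marginal) effect: direct plus indirect.
total = b_bw_to_sbp + indirect

print(f"indirect = {indirect:.4f} mmHg per kg")
print(f"direct   = {b_bw_to_sbp:.4f} mmHg per kg")
print(f"total    = {total:.4f} mmHg per kg")
```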
Standard multivariable regression vs. path analysis • Multivariable regression provides a direct effect estimate of the association, conditional on all the other variables in the model (past and future, with no time ordering) • Causality is not addressed, i.e. there is no information on possible mediation (indirect effects) on the causal pathway.
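The contrast can be illustrated on simulated data. The sketch below is not the NFBC analysis: the data are generated at random, the intercepts, gender coefficients and noise levels are arbitrary placeholders, and `ols` is a hypothetical helper that solves the normal equations in plain Python. It fits the two path equations separately, then shows that the regression omitting adult BMI returns the total effect of birth weight, which equals direct plus indirect.

```python
import random

def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [x - f * p for x, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

# Simulated cohort (assumed values, not NFBC data).
rng = random.Random(0)
n = 2000
gender = [rng.randint(0, 1) for _ in range(n)]
bw = [rng.gauss(3.5, 0.5) for _ in range(n)]
bmi = [20.0 + 1.0 * g + 0.47 * w + rng.gauss(0, 3) for g, w in zip(gender, bw)]
sbp = [110.0 + 10.0 * g - 6.35 * w + 0.77 * b + rng.gauss(0, 10)
       for g, w, b in zip(gender, bw, bmi)]

x_short = [[1.0, g, w] for g, w in zip(gender, bw)]
x_long = [[1.0, g, w, b] for g, w, b in zip(gender, bw, bmi)]

# Path model: one regression per equation.
_, _, b12 = ols(x_short, bmi)       # BW -> BMI31
_, _, b22, b23 = ols(x_long, sbp)   # BW -> SBP31 (direct), BMI31 -> SBP31

# Regression without the mediator gives the total effect of BW.
_, _, total_bw = ols(x_short, sbp)

print("direct:", b22, "indirect:", b12 * b23, "total:", total_bw)
```

For ordinary least squares the identity total = direct + indirect holds exactly in the sample, which is why the path model adds information about mediation without changing the total association.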
Another ”look” with full data - path analysis approach: Blood pressure levels in adulthood - draw a figure! Web of variables during the life course – which variables to choose? [Diagram variables by life stage (Prenatal, Birth, Adolescence, Adulthood): SES; Maternal smoking; Smoking; Alcohol use; SES; BMI at 14y; BMI at 31y; Parity; Birth weight; Maternal age; Maternal BMI; Gestational age; BLOOD PRESSURE; Gender; Genetic effects]
More complex analyses: MODELLING STRATEGY - EXAMPLE. Northern Finland Birth Cohort 1966 - To identify sensitive periods of growth and the relative impact of growth and other factors (e.g. genetic factors) • Population-based birth cohort, N=12231 • Recruitment • Pregnant mothers living in the provinces of Oulu and Lapland • Expected delivery date in 1966 • Data collection: • Maternal background and pregnancy data • Follow-ups at 1y, 14y and 31y • Clinical examination and postal questionnaires at 31y including DNA samples (N=5753) http://kelo.oulu.fi/NFBC
A more complex setting: Analytical strategy 1) Select relevant variables and order them along the life course 2) Select outcomes (intermediate and distal) based on your hypothesis and data (chronological order etc.) [Diagram variables by life stage (Prenatal, Birth, Infancy, Childhood, Adolescence, Adulthood): Maternal BMI; Maternal age; Maternal smoking; Maternal BP; Family SES; Parity; Gender; Gestational age; Birth weight; BMI growth velocity birth-AP; BMI growth velocity AP-AR; BMI growth velocity 11-15y; BMI at 31y; Smoking at 31y; Alcohol use at 31y; Diet at 31y; SES at 31y; Physical activity at 31y; Genetic effects; BP at 31y. AP = adiposity peak; AR = adiposity rebound]
Typical change in infant/child BMI. [Figure labels: BMI at AP; BMI at AR; Age at AP; Age at AR. AP = adiposity peak; AR = adiposity rebound]
A more complex setting: Analytical strategy 3) Test associations in your submodels (Chi square tests, correlation coefficients, regression analyses) and specify whether the associations are linear / non-linear (nature of the association) 4) Think of biologically plausible pathways and combine the submodels into one pathway model 5) Run path analyses and use different goodness of fit indices to evaluate the model fit 6) Omit variables and paths that do not seem necessary, allow some variables to correlate, add relevant paths etc. to improve model fit 7) Rerun the modified model
Summary and conclusions • Strong evidence that BW is inversely associated with adult BP, taking into account postnatal growth and several other factors along the causal pathway • Postnatal growth, especially from AR onwards, is positively associated with adult BP • BMI growth between AP-AR (in females) and AR-11y (in males) is also negatively and directly associated with adult BP, i.e. slow growth during these periods is associated with higher adult BP regardless of growth later in life
Model estimation Inference is based on the multivariate likelihood function; the maximum likelihood approach Software: in Stata, MPlus, LISREL, Amos, SAS and R (but for less general models).
Path model: Assessment • i) With no missing values: same results by fitting separate univariate regression models • ii) Goodness of fit can be judged using several indices and criteria: • Chi-square test on the correlation matrix • SRMR: Standardized Root Mean Square Residual • RMSEA: Root Mean Square Error of Approximation • CFI: Comparative Fit Index • iii) However, the model could be biased, e.g. because of: • unaccounted confounding factors (Hernan et al, 2002) • model misspecification, e.g. due to interactions, non-linearities • poor data quality • iv) Points above valid more generally
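Two of the fit indices listed above can be computed directly from the model chi-square. The sketch below uses the commonly quoted textbook formulas, which are not shown on the slide; note that some programs divide by N rather than N - 1 in RMSEA. All input numbers are hypothetical and only illustrate the calculation.

```python
from math import sqrt

def rmsea(chi2, df, n):
    """Root Mean Square Error of Approximation (N - 1 convention)."""
    return sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

def cfi(chi2_model, df_model, chi2_base, df_base):
    """Comparative Fit Index relative to the baseline (independence) model."""
    num = max(chi2_model - df_model, 0.0)
    den = max(chi2_model - df_model, chi2_base - df_base, 0.0)
    return 1.0 if den == 0.0 else 1.0 - num / den

# Hypothetical chi-square values for a fitted path model and its baseline.
rmsea_value = rmsea(chi2=45.0, df=20, n=5000)
cfi_value = cfi(chi2_model=45.0, df_model=20, chi2_base=1500.0, df_base=28)

print(f"RMSEA = {rmsea_value:.4f}")
print(f"CFI   = {cfi_value:.4f}")
```

Lower RMSEA and higher CFI indicate better fit; the cut-offs used to judge them are conventions and should be checked against the software and literature being followed.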
Interpretation of results • Direct, indirect (i.e. mediation) and total effects (covered by the path model example before) • The results are often interpreted in terms of standardized regression weights (or coefficients) because • often total effects are multiplications over several paths and different scales • it is easier to compare different effects when all are on the same scale • Standardizing the coefficients is equivalent to first standardizing all the variables to the same scale (e.g. mean 0, SD 1) and then analysing the standardized variables (”SCALING”)
Standardized regression weights • For continuous covariates: bSTDYX = b × SD(X) / SD(Y) = the change in Y, in Y SD units, for a one standard deviation change in X • For binary covariates: bSTDY = b / SD(Y) = the change in Y, in Y SD units, when X changes from 0 to 1
Standardized regression weights: Example. Height Y: Mean(Y)=164.7, SD(Y)=6.3. Weight X: Mean(X)=64.9, SD(X)=11.9. height = a + b*weight + e, b=0.17: • a one kg increase in weight is associated with a 0.17 cm increase in height. bSTDYX = 0.17*11.9/6.3 = 0.32 • a one SD change in X (11.9 kg) is associated with a change in Y of 0.32 Y SD units, i.e. 0.32*6.3 cm = 2.02 cm
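The two standardization formulas can be checked against the height and weight example with a short Python sketch. The function names are illustrative, and the binary-covariate call uses a made-up coefficient purely to show the second formula.

```python
def std_yx(b, sd_x, sd_y):
    """Fully standardized weight for a continuous covariate: b * SD(X) / SD(Y)."""
    return b * sd_x / sd_y

def std_y(b, sd_y):
    """Outcome-standardized weight for a binary covariate: b / SD(Y)."""
    return b / sd_y

# Values from the slide: height (cm) regressed on weight (kg).
b = 0.17
sd_weight = 11.9
sd_height = 6.3

b_std = std_yx(b, sd_weight, sd_height)
back_in_cm = b_std * sd_height  # change in height for a 1 SD change in weight

print(f"standardized weight = {b_std:.2f} SD units")
print(f"per SD of weight    = {back_in_cm:.2f} cm")

# Hypothetical binary covariate with an unstandardized coefficient of 1.5 cm.
print(f"binary example      = {std_y(1.5, sd_height):.2f} SD units")
```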
Potential biases • Our interpretation of results obtained from a multivariate model depends on the appropriateness of the assumed structure and the quality of the available data. • We cannot interpret the estimated effects as causal without considering whether: • the conceptual model is correct; we need to ask questions like: Are there any unaccounted confounding factors? Are the measures of effect specified on the correct scale? • the quality of the data is satisfactory: Are the data affected by 1) measurement error? 2) systematic missingness?
Missing data bias. Rubin’s classification (1987): MCAR: missing completely at random; MAR: missing at random; MNAR: missing not at random. If missingness is assumed to be MAR, one approach is Multiple Imputation (MI). Its aim is to integrate the ‘substantive’ model likelihood over the missing values. In practice MI consists of an imputation step and an analytical step which are repeated m times (for stability and assessment of precision).
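After the m imputation and analysis rounds, the m estimates are usually combined with Rubin's rules. The slide does not show this pooling step, so the sketch below is an assumption about how the results would be summarized; the five estimates and variances are invented for illustration.

```python
from math import sqrt
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Pool m point estimates and their squared standard errors (Rubin's rules)."""
    m = len(estimates)
    q_bar = mean(estimates)          # pooled point estimate
    within = mean(variances)         # average within-imputation variance
    between = variance(estimates)    # between-imputation variance (divisor m - 1)
    total_var = within + (1 + 1 / m) * between
    return q_bar, total_var

# Hypothetical results from m = 5 imputed data sets (coefficient, SE squared).
estimates = [-6.0, -6.4, -6.2, -6.5, -6.1]
variances = [0.25, 0.25, 0.25, 0.25, 0.25]

q_bar, total_var = pool_rubin(estimates, variances)
print(f"pooled estimate = {q_bar:.2f}, pooled SE = {sqrt(total_var):.3f}")
```

The between-imputation term is what makes the pooled standard error larger than that of any single completed data set, reflecting the uncertainty due to the missing values.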
Advantages of a multivariate approach - • allows the joint estimation of complex relationships • assumptions underlying these relationships -although mostly untestable- are all explicit • allows dealing with measurement error directly (also can deal with misclassification within the same framework) • allows dealing with missing values directly (assumption of MAR) • assuming model is correct, gives estimates of direct and indirect effects Disease mechanisms.... With reservations
Disadvantages of a multivariate approach • heavily structured • estimated direct and indirect effects may be grossly biased (and difficult) • other approaches (e.g. marginal structural models - Hernan et al, Epidemiology, 2004) make fewer parametric assumptions (especially regarding unmeasured confounders) and therefore are more robust (but could be less efficient if the equivalent SEM were correct)
Summary, message... These analytic strategies open new ways of better understanding disease mechanisms. Need for a very careful interplay of: 1) subject-knowledge 2) data gathering across different sources – time periods 3) model specification and fitting to deal with: Structure (temporal associations, ‘causal’ association, proxy variables) and Quality (measurement error, missing values) 4) sensitivity analyses on the less developed sections 5) comparisons across different studies – REPLICATION!
Smoking and Blood Pressure • Several studies show lower BP in smokers (Leone 2011. Cardiol Res Pract 2011: 264894) • BUT, in the long run, smoking increases arterial stiffness, thus partly contributing to rising BP (Leone 2011. Cardiol Res Pract 2011: 264894)
Pathways leading to smoking behaviour – reference with blood pressure. CHRNA - GENE CLUSTER ENCODES NICOTINIC ACETYLCHOLINE RECEPTOR SUBUNITS. TTC12-ANKK1-DRD2 – DOPAMINE METABOLISM, LINKED WITH NICOTINE USE, DEPENDENCIES
To catch up: many types of changes in the genome - single-nucleotide polymorphisms (SNPs), tandem repeats, copy number variation (CNV), inversions, deletions. A = adenine, T = thymine, C = cytosine, G = guanine. ATG CTG.. “sentences” = genes. Gene -> proteins. [Figure: DNA molecule 1 differs from DNA molecule 2 at a single base-pair location (a C/T polymorphism). Sugar-phosphate backbone; rungs are nucleotide base pairs (C combines with G, A with T)]
after adjustment for sex, BMI, PCs after further adjustment for multiple testing, power issue
Shared genetics between smoking and SBP Prenatal family SES Maternal marital status at birth SES at 31 Family SES at 14 High Novelty seeking Maternal smoking during pregnancy Smoking at 14 Smoking at 31 SBP at 31 TTC12-rs10502172[G] CHRNA3-rs1051730[A] SEX(F vs M)
Conclusions and Future aspects • Some evidence for an association between variants in the CHRNA5-CHRNA3-CHRNB4 and SBP (in smokers) • Replication needed – CARTA – consortium; Mendelian randomization approach • Lifecourse analyses
1. ”Spider diagram” challenge - life-course analyses of FTO using path analysis (SEM). [Diagram, variables arranged by life stage (Prenatal, Birth, Childhood, Adolescence, Adulthood): GENE - FTO; Maternal Pre-Pregnancy BMI; Maternal Age; Parity; Gender; Maternal Smoking at the 2nd Month of Pregnancy; Maternal Blood Pressure During Pregnancy; Gestational Age; Family SES at Birth; BMI at Birth; Family SES at Age 14 Years; Physical Activity at Age 14 Years; BMI at Age 14 Years; Smoking at Age 14 Years; Alcohol Use at Age 14 Years; SES at Age 31 Years; Physical Activity at Age 31 Years; Diet at Age 31 Years; Smoking at Age 31 Years; Alcohol Use at Age 31 Years; DISTAL PHENOTYPE - BMI]
Direct, indirect and total effects of FTO on adult BMI (standardized beta values) • Direct effect: 0.04 • Indirect effects of the FTO variant on adult BMI (mat.BMI = maternal BMI, BBMI = BMI at birth): • FTO-mat.BMI-BMI31: 0.03*0.095=0.003 • FTO-mat.BMI-BBMI-BMI14-BMI31: 0.03*0.155*0.08*0.529=0.0002 • FTO-BBMI-BMI14-BMI31: 0.018*0.08*0.529=0.001 • FTO-BMI14-BMI31: 0.026*0.529=0.014 • Total indirect: 0.003+0.0002+0.001+0.014=0.018 • Total effect: 0.018+0.04=0.058, i.e. about 0.06. [Path diagram coefficients: 0.03, 0.026, 0.04, 0.16, 0.155, 0.175, 0.095, 0.018, 0.529, 0.08, 0.25]
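The same product-and-sum rule extends to any number of indirect pathways. The Python sketch below recomputes the FTO example from the standardized coefficients printed on the slide; the path labels are spelled out for readability, and small differences from the slide's figures in the last digit can arise because the printed coefficients are themselves rounded.

```python
from math import prod

# Standardized coefficients along each indirect pathway, as printed on the slide.
indirect_paths = {
    "FTO -> maternal BMI -> BMI31": [0.03, 0.095],
    "FTO -> maternal BMI -> birth BMI -> BMI14 -> BMI31": [0.03, 0.155, 0.08, 0.529],
    "FTO -> birth BMI -> BMI14 -> BMI31": [0.018, 0.08, 0.529],
    "FTO -> BMI14 -> BMI31": [0.026, 0.529],
}
direct = 0.04  # FTO -> BMI31

# Each indirect effect is the product of the coefficients along its path.
indirect = {name: prod(coefs) for name, coefs in indirect_paths.items()}
total_indirect = sum(indirect.values())
total = direct + total_indirect

for name, value in indirect.items():
    print(f"{name}: {value:.4f}")
print(f"total indirect = {total_indirect:.4f}")
print(f"total effect   = {total:.4f}")
```

Keeping the paths in a dictionary makes it easy to add or drop a pathway when the model is modified and rerun.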