Univariate Twin Analysis- Saturated Models for Continuous and Categorical Data

Univariate Twin Analysis- Saturated Models for Continuous and Categorical Data • September 2, 2014 • Elizabeth Prom-Wormley & Hermine Maes • ecpromwormle@vcu.edu • 804-828-8154

Overall Questions to be Answered • Does the data satisfy the assumptions of the classical twin study? • Does a trait of interest cluster among related individuals?

Family & Twin Study Designs • Family Studies • Classical Twin Studies • Adoption Studies • Extended Twin Studies

The Data • 569 total MZ pairs, 351 total DZ pairs • Please open twinSatConECPW Fall2014.R • Australian Twin Register • 18-30 years old, males and females • Work from this session will focus on Body Mass Index (weight/height^2) in females only • Sample size • MZF = 534 complete pairs (zyg = 1) • DZF = 328 complete pairs (zyg = 3)

A Quick Look at the Data

Classical Twin Studies Basic Background • The Classical Twin Study (CTS) uses MZ and DZ twins reared together • MZ twins share 100% of their genes • DZ twins share on average 50% of their genes • Expectation- Genetic factors are assumed to contribute to a phenotype when MZ twins are more similar than DZ twins

Classical Twin Study Assumptions • MZ twins are genetically identical • Equal Environments of MZ and DZ pairs

Basic Data Assumptions • MZ and DZ twins are sampled from the same population, therefore we expect: • Equal means/variances in Twin 1 and Twin 2 • Equal means/variances in MZ and DZ twins • Further assumptions would need to be tested if we introduce male twins and opposite-sex twin pairs

“Old Fashioned” Data Checking • Nice, but how can we actually be sure that these means and variances are truly the same?
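The descriptive check on this slide amounts to tabulating means and variances by twin order and zygosity. A minimal sketch in base R follows; the data are simulated stand-ins, and the column names bmi1/bmi2, the helper simPairs, and every numeric value are assumptions made for illustration, not the workshop data.

```r
# Simulated stand-in for the twin data; bmi1/bmi2 and all values are invented.
set.seed(2014)
simPairs <- function(n, r, mean = 21.5, sd = 0.9) {
  z1 <- rnorm(n)
  z2 <- r * z1 + sqrt(1 - r^2) * rnorm(n)   # make twin 2 correlate r with twin 1
  data.frame(bmi1 = mean + sd * z1, bmi2 = mean + sd * z2)
}
mzData <- simPairs(534, 0.8)   # same number of pairs as MZF on the data slide
dzData <- simPairs(328, 0.3)   # same number of pairs as DZF on the data slide

# Means and variances by twin order (columns) and zygosity (rows)
descr <- rbind(
  MZ = c(colMeans(mzData), diag(var(mzData))),
  DZ = c(colMeans(dzData), diag(var(dzData)))
)
colnames(descr) <- c("mean1", "mean2", "var1", "var2")
print(round(descr, 3))
```

Eyeballing such a table shows whether the values look similar, but it gives no test of equality; that is what the saturated model and its constrained submodels provide.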

Univariate Analysis: A Roadmap • 1- Use the data to test basic assumptions (equal means & variances for twin 1/twin 2 and MZ/DZ pairs) • Saturated Model • 2- Estimate contributions of genetic and environmental effects on the total variance of a phenotype • ACE or ADE Models • 3- Test ACE (ADE) submodels to identify and report significant genetic and environmental contributions • AE or CE or E Only Models

Saturated Twin Model

Saturated Code Deconstructed: Means (path diagram labels: mMZ1, mMZ2, mDZ1, mDZ2)
meanMZ <- mxMatrix( type="Full", nrow=1, ncol=ntv, free=TRUE, values=meVals, labels=c("mMZ1","mMZ2"), name="meanMZ" )
meanDZ <- mxMatrix( type="Full", nrow=1, ncol=ntv, free=TRUE, values=meVals, labels=c("mDZ1","mDZ2"), name="meanDZ" )
meanMZ = 1 x 2 matrix • meanDZ = 1 x 2 matrix

Saturated Code Deconstructed: Covariances (rows and columns indexed by twin: T1, T2)
covMZ <- mxMatrix( type="Symm", nrow=ntv, ncol=ntv, free=TRUE, values=cvVals, lbound=lbVals, labels=c("vMZ1","cMZ21","vMZ2"), name="covMZ" )
covDZ <- mxMatrix( type="Symm", nrow=ntv, ncol=ntv, free=TRUE, values=cvVals, lbound=lbVals, labels=c("vDZ1","cDZ21","vDZ2"), name="covDZ" )
covMZ = 2 x 2 matrix • covDZ = 2 x 2 matrix

Time to Play... Continue with the file twinSatConECPW Fall2014.R

Estimated Values • 10 Total Parameters Estimated • MZ: mMZ1, mMZ2, vMZ1, vMZ2, cMZ21 • DZ: mDZ1, mDZ2, vDZ1, vDZ2, cDZ21 • Standardize the covariance matrices (covMZ & covDZ) to obtain twin pair correlations
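Standardizing a covariance matrix divides each covariance by the product of the two standard deviations. A minimal sketch with base R's cov2cor follows; the numbers in covMZ and covDZ are invented placeholders, not estimates from the workshop script.

```r
# Hypothetical estimated covariance matrices (values invented for illustration)
covMZ <- matrix(c(0.73, 0.59,
                  0.59, 0.79), nrow = 2, byrow = TRUE)
covDZ <- matrix(c(0.77, 0.24,
                  0.24, 0.82), nrow = 2, byrow = TRUE)

# cov2cor() rescales a covariance matrix to a correlation matrix
corMZ <- cov2cor(covMZ)
corDZ <- cov2cor(covDZ)

# The off-diagonal element is the twin pair correlation
rMZ <- corMZ[2, 1]
rDZ <- corDZ[2, 1]
print(round(c(rMZ = rMZ, rDZ = rDZ), 3))
```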

Fitting Nested Models • Saturated Model • likelihood of data without any constraints • fitting as many means and (co)variances as possible • Equality of means & variances by twin order • test if mean of twin 1 = mean of twin 2 • test if variance of twin 1 = variance of twin 2 • Equality of means & variances by zygosity • test if mean of MZ = mean of DZ • test if variance of MZ = variance of DZ
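Each equality test compares a constrained model with the saturated model through a likelihood ratio test: the difference in -2 log-likelihood is referred to a chi-square distribution with degrees of freedom equal to the difference in the number of estimated parameters. A minimal sketch in base R follows; the helper lrt and the -2LL values are invented for illustration. OpenMx's mxCompare reports this kind of comparison for fitted models.

```r
# Likelihood ratio test between a full model and a nested submodel.
# m2ll* = minus twice the log-likelihood, ep* = number of estimated parameters.
lrt <- function(m2llFull, epFull, m2llSub, epSub) {
  diffLL <- m2llSub - m2llFull          # constrained model cannot fit better
  diffdf <- epFull - epSub              # number of constraints imposed
  p <- pchisq(diffLL, df = diffdf, lower.tail = FALSE)
  c(diffLL = diffLL, diffdf = diffdf, p = p)
}

# Saturated model: 10 parameters. Submodel with means and variances equated
# across twin order and zygosity: 1 mean, 1 variance, 2 covariances = 4.
# The -2LL values below are hypothetical.
res <- lrt(m2llFull = 4000.0, epFull = 10, m2llSub = 4003.2, epSub = 4)
print(round(res, 3))
```

A p-value above the chosen significance level means the constraints do not significantly worsen the fit, so the equality assumptions are retained.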

Estimated Values

Estimates

Stats • No significant differences between the saturated model and models where means/variances/covariances are equal by zygosity and between twins

Working with Binary and Ordinal Data • Elizabeth Prom-Wormley and Hermine Maes • Special Thanks to Sarah Medland

Transitioning from Continuous Logic to Categorical Logic • Ordinal data have one fewer degree of freedom than continuous data • MZcov, DZcov, Prevalence • No information on the variance • Thinking about our ACE/ADE model • 4 parameters being estimated • A / C / E / mean • ACE/ADE model is unidentified without adding a constraint

Two Approaches to the Liability Threshold Model • Traditional • Maps data to a standard normal distribution • Total variance constrained to be 1 • Alternate • Fixes an alternate parameter (usually E) • Estimates the remaining parameters

Time to Look at the Data! Please open BinaryWarmUp.R

Observed Binary BMI is an Imperfect Measure of an Underlying Continuous Distribution • Mean (bmiB2) = 0.39 • SD (bmiB2) = 0.49 • Prevalence of “low” BMI = 60.6% • We are interested in the liability of risk for being in the “high” BMI category

It’s Helpful to Rescale • Raw Data (Unstandardized): mean=0.39, SD=0.49 - Data not mapped to a standard normal - No easy conversion to % - Difficult to compare between groups • Standard Normal (Standardized): mean=0, SD=1 - The scaling of the underlying liability is arbitrary, so it can be mapped to a standard normal - Area under the curve between two z-values is interpreted as a probability or percentage
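The area under the standard normal curve between two z-values is the difference of two cumulative probabilities. A minimal sketch in base R; the helper name areaBetween is an assumption introduced for illustration.

```r
# Probability that a standard normal liability falls between two z-values
areaBetween <- function(zLow, zHigh) pnorm(zHigh) - pnorm(zLow)

withinOneSD <- areaBetween(-1, 1)     # roughly 68% of the distribution
belowZero   <- areaBetween(-Inf, 0)   # half of the distribution lies below the mean
print(round(c(withinOneSD = withinOneSD, belowZero = belowZero), 4))
```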

Binary Review • Threshold calculated using the cumulative normal distribution (CND) • We used frequencies and the inverse CND to do our own estimation of the threshold: qnorm(0.816) = 0.90 • The threshold is the Z value that corresponds to the proportion of the population having “low BMI”
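The threshold calculation on this slide can be reproduced and checked in both directions with base R. This is a minimal sketch that uses the proportion quoted on the slide; the variable names are assumptions.

```r
# Proportion of the sample in the lower category, as quoted on the slide
propLow <- 0.816

# Inverse cumulative normal: the z-value below which propLow of the liability lies
threshold <- qnorm(propLow)
print(round(threshold, 2))

# Going back: the cumulative normal at the threshold recovers the proportion
recovered <- pnorm(threshold)
print(recovered)
```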

Moving to Ordinal Data!

Getting a Feel for the Data • Open twinSatOrd.R • Calculate the frequencies of the 5 BMI categories for the second twins of the MZ pairs • CrossTable(mzDataOrdF$bmi2)

Estimating MZ Twin 2 Thresholds by Hand
T1 = qnorm(0.124) = -1.155
T2 = qnorm(0.124 + 0.236) = -0.358
T3 = qnorm(0.124 + 0.236 + 0.291) = 0.388
T4 = qnorm(0.124 + 0.236 + 0.291 + 0.175) = 0.939
Estimate Twin Pair Correlations for the Liabilities Too!
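The hand calculation generalizes to any number of categories: accumulate the category proportions, then apply the inverse cumulative normal. A minimal sketch in base R using the four proportions from this slide; the variable names are assumptions, and the fifth category is simply the remainder.

```r
# Proportions of the first four of the five BMI categories (from the slide)
freq <- c(0.124, 0.236, 0.291, 0.175)

# Cumulative proportions below each cut point
cumProp <- cumsum(freq)

# Thresholds on the standard normal liability scale
thresholds <- qnorm(cumProp)
print(round(thresholds, 3))
```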

Translating Back to the SEM Approach in OpenMx

Handling Ordinal Data in OpenMx • 1- Determine the 1st threshold • 2- Determine displacements between the 1st threshold and subsequent thresholds • 3- Add the 1st threshold and the displacements to obtain the subsequent thresholds

Ordinal Saturated Code Deconstructed: Defining Threshold Matrices
[Path diagram, Threshold Model: liabilities LT1 and LT2 for MZ and DZ pairs, means (μMZT1, μMZT2, μDZT1, μDZT2) fixed at 0, variance constraint of 1, correlations covMZ and covDZ]
threM <- mxMatrix( type="Full", nrow=nth, ncol=ntv, free=TRUE, values=thVal, lbound=thLB, labels=thLabMZ, name="ThreMZ" )
threD <- mxMatrix( type="Full", nrow=nth, ncol=ntv, free=TRUE, values=thVal, lbound=thLB, labels=thLabDZ, name="ThreDZ" )

Ordinal Saturated Code Deconstructed: Defining Threshold Matrices- ThreMZ • 1- Determine the 1st threshold • 2- Determine displacements between the 1st threshold and subsequent thresholds

Double Check- Moving from Frequencies to Displacements

Ordinal Saturated Code Deconstructed: Estimating Expected Threshold Matrices
Inc <- mxMatrix( type="Lower", nrow=nth, ncol=nth, free=FALSE, values=1, name="Inc" )
threMZ <- mxAlgebra( expression= Inc %*% ThreMZ, name="expThreMZ" )
expThreMZ = Inc %*% ThreMZ
3- Add the 1st threshold and the displacement to obtain the subsequent thresholds
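Steps 1 to 3 can be checked outside OpenMx with plain matrices. The sketch below, in base R, stores the first threshold followed by the displacements (as ThreMZ does for one twin) and multiplies by a lower-triangular matrix of ones (the role of Inc). The threshold values are the hand-calculated MZ twin 2 thresholds; the plain-matrix construction is an illustration, not the workshop script.

```r
nth <- 4                                          # number of thresholds
thresholds <- c(-1.155, -0.358, 0.388, 0.939)     # hand-calculated values

# Steps 1 and 2: first threshold, then displacements between successive thresholds
thre <- matrix(c(thresholds[1], diff(thresholds)), ncol = 1)

# Lower-triangular matrix of ones, mirroring type="Lower", values=1
Inc <- matrix(0, nrow = nth, ncol = nth)
Inc[lower.tri(Inc, diag = TRUE)] <- 1

# Step 3: cumulative sums of the displacements recover the thresholds
expThre <- Inc %*% thre
print(cbind(displacement = as.vector(thre), threshold = as.vector(expThre)))
```

Because every displacement is positive, the recovered thresholds are guaranteed to be in increasing order, which is a likely reason for estimating displacements rather than the thresholds themselves.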

Ordinal Saturated Code Deconstructed: Estimating Correlations & Fixing Variance
corMZ <- mxMatrix( type="Stand", nrow=ntv, ncol=ntv, free=TRUE, values=corVals, lbound=lbrVal, ubound=ubrVal, labels="rMZ", name="expCorMZ" )
corDZ <- mxMatrix( type="Stand", nrow=ntv, ncol=ntv, free=TRUE, values=corVals, lbound=lbrVal, ubound=ubrVal, labels="rDZ", name="expCorDZ" )

How Many Parameters in this Ordinal Model? • MZ correlation- rMZ • DZ correlation- rDZ • Thresholds • t1MZ1,t2MZ1,t3MZ1,t4MZ1 • t1MZ2,t2MZ2,t3MZ2,t4MZ2 • t1DZ1,t2DZ1,t3DZ1,t4DZ1 • t1DZ2,t2DZ2,t3DZ2,t4DZ2
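The count follows directly from the labels listed on this slide: one correlation per zygosity group plus one threshold per cut point, twin, and group. A minimal sketch of the arithmetic in base R; the variable names other than ntv and nth are assumptions.

```r
ntv     <- 2   # twins per pair
nth     <- 4   # thresholds per twin (5 ordered categories)
nGroups <- 2   # MZ and DZ

nCor   <- nGroups                 # rMZ and rDZ
nThre  <- nth * ntv * nGroups     # t1..t4 for each twin in each group
nTotal <- nCor + nThre
print(c(correlations = nCor, thresholds = nThre, total = nTotal))
```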

Problem Set 1 • Open twinSatOrdA.R and twinSatOrd.R • What do these scripts do? • Looking at the scripts only: How are they similar? How are they different? • What do these differences in the scripts reflect regarding conceptual differences in the two models? • Run either script and double check against your previously hand-calculated values of thresholds. Report your results. If you can’t get it to match up, don’t panic…do e-mail. • Run twinSatOrd.R • Is testing an ACE model with the usual model assumptions justified? Why or why not?

Univariate Analysis with Ordinal Data: A Roadmap • 1- Use the data to test basic assumptions inherent to standard ACE (ADE) models • Saturated Model • 2- Estimate contributions of genetic and environmental effects on the liability of a trait • ADE or ACE Models • 3- Test ADE (ACE) submodels to identify and report significant genetic and environmental contributions • AE or E Only Models

Questions?
