Statistical Analysis Overview I Session 1

Peg Burchinal, Frank Porter Graham Child Development Institute, University of North Carolina-Chapel Hill

Overview: Statistical analysis overview I
• Linear models
• Nesting
• Longitudinal models
• Mixed Model ANOVA
• Multivariate Repeated Measures
• Two Level Hierarchical Linear Models
• Latent Growth Curve Models

Overview: Linear Models
Most commonly used statistical models:
1. t-test, Analysis of Variance/Covariance: comparing means across groups
2. Correlations, Multiple Regression: estimating associations among continuous variables

Linear Models
• General model: $Y_{ij} = B_0 + B_1 X_{1ij} + B_2 X_{2ij} + \dots + e_{ij}$
• Assumptions
  • One source of random variability ($e_{ij}$)
  • Normally distributed error terms
  • Homogeneity of variance
  • Independence of observations

Linear Models
• Equivalence of models
  • t-test and ANOVA (one-way with 2 groups)
  • Regression and ANOVA
• t-test: $Y_{ij} = B_0 + B_1 X_{1ij} + e_{ij}$
  • $X_{1ij} = 1$ if in first group, 0 if in second group
• One-way ANOVA (2 groups): $Y_{ij} = B_0 + B_1 X_{1ij} + e_{ij}$
  • $X_{1ij} = 1$ if in first group, 0 if in second group
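The coefficient side of this equivalence can be checked by hand. The sketch below uses made-up scores for two groups (the data and the helper `ols_simple` are assumptions for illustration, not part of the slides). With the 0/1 coding shown above, the fitted intercept equals the second group's mean and the fitted slope equals the difference between the two group means, which is the quantity a t-test evaluates.

```python
from statistics import mean

def ols_simple(x, y):
    """Least-squares intercept and slope for a single predictor."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical scores, invented for this illustration
group1 = [5.0, 6.0, 7.0, 6.0]
group2 = [3.0, 4.0, 5.0, 4.0]

# Reference cell coding: 1 if in first group, 0 if in second group
x = [1] * len(group1) + [0] * len(group2)
y = group1 + group2

b0, b1 = ols_simple(x, y)
mean_diff = mean(group1) - mean(group2)

print(f"B0 = {b0:.2f} (mean of second group = {mean(group2):.2f})")
print(f"B1 = {b1:.2f} (difference in group means = {mean_diff:.2f})")
```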

Linear Models
• One-way ANOVA (p groups): $Y_{ij} = B_0 + B_1 X_{1ij} + B_2 X_{2ij} + \dots + B_{p-1} X_{p-1,ij} + e_{ij}$
  • $X_{1ij} = 1$ if in first group, 0 otherwise
  • $X_{2ij} = 1$ if in second group, 0 otherwise
  • etc. for p-1 groups (last group is the reference cell)
• Regression (p predictors): $Y_{ij} = B_0 + B_1 X_{1ij} + B_2 X_{2ij} + \dots + B_p X_{pij} + e_{ij}$
  • $X_{1ij}$: first continuous predictor
  • $X_{2ij}$: second continuous predictor
  • etc. for the p predictors in the model
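A minimal sketch of the reference cell coding described above: p groups become p-1 indicator columns, and the reference group is coded 0 on all of them. The function name `reference_cell_codes` and the group labels are hypothetical, chosen only for this example.

```python
def reference_cell_codes(groups, reference):
    """Return the coded levels and one row of 0/1 indicators per observation.

    Every level except `reference` gets its own indicator column, so the
    reference group is the one coded 0 on all columns.
    """
    levels = [g for g in sorted(set(groups)) if g != reference]
    rows = [[1 if g == level else 0 for level in levels] for g in groups]
    return levels, rows

# Hypothetical group membership for four observations
groups = ["a", "b", "c", "a"]
levels, design = reference_cell_codes(groups, reference="c")

for g, row in zip(groups, design):
    print(g, row)
```

With three groups and "c" as the reference cell, each observation gets two indicators, matching the p-1 columns in the one-way ANOVA equation.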

Linear Models
• One-way ANCOVA (2-level factor and 1 covariate): $Y_{ij} = B_0 + B_1 X_{1ij} + B_2 X_{2ij} + e_{ij}$
  • $X_{1ij} = 1$ if in first group, 0 otherwise
  • $X_{2ij}$: continuous predictor
• Separate slopes ANCOVA (2-level factor and 1 covariate): $Y_{ij} = B_0 + B_1 X_{1ij} + B_2 X_{2ij} + B_3 X_{3ij} + e_{ij}$
  • $X_{1ij} = 1$ if in first group, 0 otherwise
  • $X_{2ij}$: first continuous predictor
  • $X_{3ij} = X_{1ij} \times X_{2ij}$: the value of the first continuous predictor if in first group, 0 if not in first group

Linear Models
• Two-way ANOVA (two 2-level factors and their interaction): $Y_{ij} = B_0 + B_1 X_{1ij} + B_2 X_{2ij} + B_3 X_{3ij} + e_{ij}$
  • $X_{1ij} = 1$ if in first group on first factor, 0 otherwise
  • $X_{2ij} = 1$ if in first group on second factor, 0 otherwise
  • $X_{3ij} = X_{1ij} \times X_{2ij}$: 1 if in the first level of both the first and second factor, 0 otherwise

Linear Models: General Issues
• Design parameterization
  • Showed reference cell coding
  • Effect coding often preferable (use -.5 and .5 instead of 0 and 1)
• Centering variables
  • Whenever an interaction is included, you should center your data so main effects are interpretable
  • Easiest: subtract sample mean from all values
• Nested data: correlated observations
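To see why the coding choice matters once an interaction is in the model, the sketch below solves the two-way ANOVA equation exactly from four cell means under both codings. The cell means and the helper `saturated_2x2` are assumptions made up for illustration. Under reference cell coding, B1 and B2 are simple effects at the reference level of the other factor; under effect coding (-.5 and .5), B0 is the unweighted mean of the cell means and B1 and B2 are main effects averaged over the other factor.

```python
def saturated_2x2(cell_means, hi, lo):
    """Solve Y = B0 + B1*X1 + B2*X2 + B3*X1*X2 exactly from four cell means.

    cell_means[(f1, f2)] is the mean of the cell where f1 / f2 say whether
    the observation is in the first level of factor 1 / factor 2.
    `hi` is the code for "first level", `lo` the code for "otherwise".
    """
    d = hi - lo
    m11 = cell_means[(True, True)]
    m10 = cell_means[(True, False)]
    m01 = cell_means[(False, True)]
    m00 = cell_means[(False, False)]
    b3 = (m11 - m10 - m01 + m00) / d ** 2
    b1 = (m10 - m00) / d - b3 * lo
    b2 = (m01 - m00) / d - b3 * lo
    b0 = m00 - b1 * lo - b2 * lo - b3 * lo * lo
    return b0, b1, b2, b3

# Hypothetical cell means, invented for this illustration
cells = {(True, True): 10.0, (True, False): 6.0,
         (False, True): 5.0, (False, False): 3.0}

reference = saturated_2x2(cells, hi=1.0, lo=0.0)
effect = saturated_2x2(cells, hi=0.5, lo=-0.5)

print("reference cell coding (B0, B1, B2, B3):", reference)
print("effect coding         (B0, B1, B2, B3):", effect)
```

The interaction coefficient B3 is the same under both codings here; only the intercept and the lower-order terms change meaning.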

Correlations among Observations
• Many sources of nesting
  • Repeated measures over time
  • Clustering of students in a classroom, therapy group, etc.
  • Clustering of individuals in a family
• Consequence of nesting
  • Standard errors are underestimated when observations within a cluster are positively correlated
  • P-values are too small when standard errors are underestimated
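A rough sense of how large the underestimation can be comes from the standard design effect approximation for equal-sized clusters, 1 + (m - 1) × ICC, where m is the cluster size and ICC is the intraclass correlation. This formula is not on the slides; it is brought in here as a common rule of thumb for a sample mean or a cluster-level predictor. The cluster size, ICC, and naive standard error below are hypothetical.

```python
import math

def design_effect(cluster_size, icc):
    """Approximate variance inflation from clustering (equal cluster sizes)."""
    return 1 + (cluster_size - 1) * icc

# Hypothetical values: 20 students per classroom, ICC of .10
naive_se = 0.10                      # SE computed as if observations were independent
deff = design_effect(cluster_size=20, icc=0.10)
adjusted_se = naive_se * math.sqrt(deff)

print(f"design effect = {deff:.2f}")
print(f"naive SE = {naive_se:.3f}, adjusted SE = {adjusted_se:.3f}")
```

Even a modest within-cluster correlation inflates the variance noticeably when clusters are large, which is why ignoring nesting yields p-values that are too small.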

Nesting
• Longitudinal models provide the easiest nested model to understand
• Obvious that repeated assessments of individuals are not independent
• Present various approaches to modeling longitudinal data

Analytic methods to address nesting
• Mixed-model repeated measures
• Multivariate repeated measures
• Hierarchical linear models
• Latent growth curves

Overview: Additional Assumptions for Repeated Measures Analyses
General assumptions
• An adequate model to describe
  • Individual patterns of change (within-cluster patterns of change)
  • Individual differences in developmental patterns (between-cluster patterns of change)
• Both models must
  • Include important covariates and relevant interactions
  • Represent correlations in nested factors (Type I error rate control)

General statistical assumptions
• Same outcome measured in the same metric over time
  • Interval or ratio measurement (a)
  • Normally distributed variables (a)
  • Homogeneity of variance (a)
• Monotonic assessment
  • Must be able to index amount of change
  • Unit change must be uniform across scale and age
  • Standard score: not great, but can be used
    • If same outcome over time
  • Identical items not required
(a) Special methods needed if assumption not met

Longitudinal Data

Traditional Growth Curve Analysis: "Univariate" Analysis ("Mixed Model")
General model for one grouping variable and linear change related to age:
$Y_{ijk} = b_{0k} + b_{1k}\,\mathrm{Age}_{ijk} + a_{ik}\,\mathrm{Person}_{ik} + e_{ijk}$
for i = 1,...,n individuals, j = 1,...,p occasions, k = 1,...,r groups;
with 2 fixed-effect variables (Group and Age) and 3 random variables (Y, Person, E).

“Univariate” Growth Curves

Mixed-Model ANOVA
• Advantages
  • Estimates individual intercepts
  • Corrections are available to avoid inflating test statistics
• Disadvantages
  • Assumes all slopes are identical
  • Individuals with missing data are deleted if corrections are applied
  • Cannot easily accommodate repeated measures of predictors or multiple levels of nesting

Profile Analysis or Multivariate Repeated Measures Analysis
Transforms the model into separate analyses of between- and within-factors
General model for one grouping variable and linear change related to age:
$Y_{ijk} = p_{0ik} + p_{1ik}\,\mathrm{Age}_{ijk} + e_{ijk}$ (individual growth curve)
$E(Y_{jk}) = b_{0k} + b_{1k}\,\mathrm{Age}_{ijk}$ (population growth curve)
for i = 1,...,n individuals, j = 1,...,p occasions, k = 1,...,r groups

$Y_{ijk} = p_{0ik} + p_{1ik}\,\mathrm{Age}_{ijk} + e_{ijk}$
$E(Y_{jk}) = b_{0k} + b_{1k}\,\mathrm{Age}_{ijk}$
where
• $Y_{ijk}$ represents the j-th assessment of the i-th individual in the k-th group
• $p_{0ik}$ is the intercept for the i-th subject in the k-th group
• $b_{0k}$ is the intercept for the k-th group: the unweighted mean of the $p_{0ik}$ within the k-th group
• $p_{1ik}$ is the slope for the regression of Y on Age for the i-th individual in the k-th group
• $b_{1k}$ is the slope for the regression of Y on Age for the k-th group: the unweighted mean of the $p_{1ik}$ within the k-th group
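The two-step logic of these definitions can be sketched as follows: fit a separate least-squares line to each individual, then take the unweighted mean of the individual intercepts and slopes within each group. This is only an illustration of the definitions, not the multivariate test itself; the data, group labels, and the helper `fit_line` are hypothetical.

```python
from statistics import mean

def fit_line(ages, scores):
    """Least-squares intercept and slope of scores on age for one individual."""
    ma, ms = mean(ages), mean(scores)
    sxx = sum((a - ma) ** 2 for a in ages)
    sxy = sum((a - ma) * (s - ms) for a, s in zip(ages, scores))
    slope = sxy / sxx
    return ms - slope * ma, slope

# Hypothetical complete data: every individual is assessed at the same ages
ages = [1, 2, 3, 4]
data = {
    "A": [[2.0, 4.0, 6.0, 8.0], [1.0, 2.0, 3.0, 4.0]],
    "B": [[5.0, 5.0, 5.0, 5.0], [3.0, 4.0, 5.0, 6.0]],
}

group_curves = {}
for group, people in data.items():
    fits = [fit_line(ages, scores) for scores in people]   # individual curves
    group_intercept = mean(p0 for p0, _ in fits)            # unweighted mean
    group_slope = mean(p1 for _, p1 in fits)                # unweighted mean
    group_curves[group] = (group_intercept, group_slope)

for group, (b0, b1) in group_curves.items():
    print(f"group {group}: intercept = {b0:.2f}, slope = {b1:.2f}")
```

Because every individual needs a complete set of assessments at the same ages for this to work, the sketch also hints at why casewise deletion is a concern for this approach.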

Profile Analysis

Profile Analysis
• Advantages
  • Estimates individual intercepts and slopes
  • Standard errors are not inflated with moderate to large sample sizes
• Disadvantages
  • Casewise deletion of individuals with missing data
  • Forced to use categorized nesting variable
  • Cannot easily accommodate repeated measures of predictors or multiple levels of nesting

Hierarchical Linear Model ("Mixed-Effects Linear Model")
General model for one between-subjects categorical factor and linear change related to age:
$Y_{ijk} = (b_{0k} + p_{0ik}) + (b_{1k} + p_{1ik})\,\mathrm{Age}_{ijk} + e_{ijk}$
or
$Y_{ijk} = p_{0ik} + p_{1ik}\,\mathrm{Age}_{ijk} + e_{ijk}$ (Level 1 or individual growth curve)
$E(Y_{jk}) = b_{0k} + b_{1k}\,\mathrm{Age}_{ijk}$ (Level 2 or population growth curve)
for i = 1,...,n individuals, j = 1,...,p occasions, k = 1,...,r groups;
with 1 fixed-effect variable (Group) and 4 random variables (Y, individual's mean level, individual's change over Age, E).

$Y_{ijk} = (b_{0k} + p_{0ik}) + (b_{1k} + p_{1ik})\,\mathrm{Age}_{ijk} + e_{ijk}$
where
• $Y_{ijk}$ represents the j-th assessment of the i-th individual in the k-th group
• $b_{0k}$ is the intercept for the k-th group, estimated as a weighted mean of the $p_{0ik}$
• $p_{0ik}$ is the increment to the intercept for the i-th individual in the k-th group
• $b_{1k}$ is the slope for the regression of Y on Age for the k-th group, estimated as a weighted mean of the $p_{1ik}$
• $p_{1ik}$ is the increment to the slope for the i-th individual in the k-th group
• $e_{ijk}$ represents the random error of the j-th assessment of the i-th individual in the k-th group

Hierarchical Linear Model

Hierarchical Linear Model: Advantages
1. Accommodates multiple levels of nesting
2. Slopes and intercepts of individual growth curves can vary
3. Increased precision
4. Permits missing or "mistimed" data
   • ignorably missing data
   • purposefully missing data designs
   • inconsistently timed data
5. Allows repeated measures of predictors
6. Flexible specification of growth patterns
7. Fixed-effect parameter estimates fairly robust

Hierarchical Linear Models: Disadvantages
1. Assumes that an infinite number of individuals were observed, but a "large" number is sufficient. Unclear what is large enough
2. Models can get very complicated
3. No direct tests of mediation

SECCYD Example: Maternal Sensitivity
• Goal: determine whether maternal sensitivity between 6m and first grade varies as a function of
  • maternal education
  • maternal depression
  • child gender

Analysis Data

Time-varying (by assessment: age in months; G1 = first grade)

                          6      15     24     36     54     G1
Maternal sensitivity
  N                       1272   1240   1172   1161   1040   1004
  M                       3.07   3.13   3.12   3.27   3.23   3.22
  sd                      .59    .55    .59    .53    .56    .58
Maternal depression
  %                       18%    17%    18%    16%    18%    14%

Time-invariant

Maternal education        M (sd) = 14.3 (2.49)
Child gender              % male = 51%

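Model
$Y_{ij} = p_{0i} + p_{1i}\,\mathrm{Age}_{ij} + p_{2i}\,\mathrm{Age}_{ij}^2 + b_1\,\mathrm{Dep}_{ij} + b_2\,\mathrm{Dep}_{ij} \times \mathrm{Age}_{ij} + b_3\,\mathrm{Dep}_{ij} \times \mathrm{Age}_{ij}^2 + e_{ij}$ (individual component of growth curve)
$\quad + b_0 + b_4\,\mathrm{Age}_{ij} + b_5\,\mathrm{Age}_{ij}^2 + b_6\,\mathrm{Med}_i + b_7\,\mathrm{Med}_i \times \mathrm{Age}_{ij} + b_7\,\mathrm{Med}_i \times \mathrm{Age}_{ij}^2 + b_8\,\mathrm{Male}_i + b_9\,\mathrm{Male}_i \times \mathrm{Age}_{ij} + b_{10}\,\mathrm{Male}_i \times \mathrm{Age}_{ij}^2$ (group component of growth curve)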

Results
• Maternal education: mothers with more education show more sensitivity, and show less reduction in sensitivity after children enter school
• Gender: mothers are more sensitive with girls during early childhood, but show increasing levels of sensitivity with boys over time
• Maternal depression: depressed mothers show less sensitivity during early childhood, but show modest gains when children enter school

Continuous Predictors Mother's Sensitivity for Mothers with High School Degree versus Bachelor’s Degree

Categorical Predictors Mother's Sensitivity for Male versus Female Children

Categorical Predictors Mother's Sensitivity for Mothers with and without Clinical Levels of Depressive Symptoms

Analytic issues: repeated measures
• Time-varying (within-subjects) and time-invariant (between-subjects) data
• Analysis data: one record per subject, or one record per subject per assessment (software issue)
• Plotting results
• Interpreting interactions
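The two data layouts mentioned above differ only in shape. The sketch below converts one record per subject ("wide") into one record per subject per assessment ("long"), repeating the time-invariant variable on every row and skipping missed assessments. All variable names and values are hypothetical, invented for this illustration.

```python
# One record per subject: time-invariant "med" plus one column per assessment.
# None marks a missed assessment.
wide = [
    {"id": 1, "med": 12, "sens_6": 3.0, "sens_15": 3.2, "sens_24": None},
    {"id": 2, "med": 16, "sens_6": 3.4, "sens_15": 3.5, "sens_24": 3.6},
]
ages = [6, 15, 24]

# One record per subject per assessment: time-varying "age" and "sens",
# with the time-invariant "med" copied onto each row.
long_records = []
for record in wide:
    for age in ages:
        score = record[f"sens_{age}"]
        if score is None:
            continue  # a missed assessment simply contributes no row
        long_records.append(
            {"id": record["id"], "med": record["med"], "age": age, "sens": score}
        )

for row in long_records:
    print(row)
```

In the long layout a subject with a missed assessment keeps the remaining rows, whereas in the wide layout the same subject has an incomplete record.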

Latent Growth Curves
HLM Level 1 corresponds to the LISREL measurement model for Y
HLM: $Y_{ip} = p_{0p} + p_{1p}\,\mathrm{time}_i + e_{ip}$
LGC: $Y_p = [1\ t_p]\,\mathbf{p}_p + e_p = 0 + [1\ t_p]\,\mathbf{p}_p + e_p$ (endogenous variable Y)
$\quad\;\; = \tau_Y + \Lambda_Y \eta + \varepsilon_p$
where
• $Y_p$ is the vector of observed values for person p
• $\eta = \mathbf{p}_p$ is the vector of latent growth curve parameters for person p
• $\varepsilon_p$ is the individual-specific vector of unknown measurement error
and, unlike the usual practice of LISREL analysis, the $\tau_Y$ and $\Lambda_Y$ parameter matrices are constrained to contain only known values:
• $\tau_Y = 0$
• $\Lambda_Y = [1\ t_p]$: this passes the Level 1 growth curve parameters into the LISREL endogenous constraints
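The fixed loading matrix is easy to picture with numbers. The sketch below builds a loading matrix with a column of 1s and a column of assessment times, then multiplies it by one person's intercept and slope to obtain that person's error-free trajectory. The assessment times and growth parameters are hypothetical, and the names `loadings` and `eta` are chosen here only for illustration.

```python
# Hypothetical assessment times shared by everyone (time-structured data)
times = [0, 1, 2, 3]

# Loading matrix with known values only: a column of 1s (intercept)
# and a column holding the assessment times (slope)
loadings = [[1.0, float(t)] for t in times]

# Hypothetical latent growth parameters for one person: intercept, slope
eta = [3.0, 0.5]

# Implied error-free scores: each row of the loading matrix times eta
implied = [sum(l * e for l, e in zip(row, eta)) for row in loadings]

for t, y in zip(times, implied):
    print(f"time {t}: implied Y = {y:.2f}")
```

Because the loadings are fixed rather than estimated, the latent variables are forced to play the roles of intercept and slope, which is the sense in which the measurement model reproduces the Level 1 growth curve.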

Latent Growth Curves
HLM Level 2 corresponds to the LISREL structural model
HLM: $\mathbf{p} = X b + r$
LGC: $\mathbf{p} = \mu + (0\ 0)\,\mathbf{p} + [\mathbf{p} - \mu]$
which has the form of a reduced LISREL structural model $\eta = \alpha + B \eta + \zeta$, with
• $\zeta = [\mathbf{p} - \mu]$
• $\alpha = \mu$: the group growth curve parameters
• $B = (0\ 0)$

Latent Growth Curve Model (same as HLM individual curve)

Latent Growth Curves: Advantages
• Allows individual intercepts and slopes to vary
• Allows for error in predictors
• Easily handles error heterogeneity and correlated errors
• Permits latent variables with multiple indicators
• Can examine patterns of change on more than one dimension
• Easily estimates direct and indirect (intervening) effects

Latent Growth Curves: Disadvantages
• Does not easily accommodate more than one level of nesting
• Easy-to-use software requires time-structured data (Mplus)
• Number of estimated parameters gets large quickly
• Less power for testing interactions or moderating effects
Equivalence: HLM and LGC can be shown to be interchangeable when data are time structured

Latent Growth Curves Example: SECCYD Maternal Sensitivity
• Goal: describe developmental patterns in maternal sensitivity with target child from six months to first grade
• Analysis: Structural Equation Model
  • Quadratic individual growth curve
  • Maternal education and gender as predictors
  • AMOS with FIML, due to missing data

SECCYD: Maternal Sensitivity
[Results table] Bold indicates significance at p < 0.05

SECCYD-LGC Analysis of Maternal Sensitivity
• Maternal education is related to higher levels of sensitivity over time (intercept).
• Mothers are more sensitive with girls in general (intercept), but show nonlinear increases in sensitivity toward boys (quadratic slope).

Conclusions
• Growth curve analyses can provide appropriate and powerful analytic tools for examining longitudinal or other types of nested data
• Careful selection of analytic methods and models is needed
