Statistical Analysis Overview I Session 2 . Peg Burchinal Frank Porter Graham Child Development Institute, University of North Carolina-Chapel Hill. Overview: Statistical analysis overview I-b. Nesting and intraclass correlation Hierarchical Linear Models 2 level models 3 level models.
Statistical Analysis Overview I: Session 2 • Peg Burchinal, Frank Porter Graham Child Development Institute, University of North Carolina-Chapel Hill
Overview: Statistical analysis overview I-b • Nesting and intraclass correlation • Hierarchical Linear Models • 2 level models • 3 level models
Nesting • Nesting implies violation of the linear-model assumption of independence of observations • Ignoring this dependency in the data results in inflated test statistics (standard errors that are too small) when observations are positively correlated • CAN DRAW INCORRECT CONCLUSIONS
Nesting and Design • Educational data are often collected in schools, classrooms, or special treatment groups • Lack of independence among individuals -> reduction in variability within clusters • Pre-existing similarities (i.e., students within a cluster are more similar than students selected at random would be) • Shared instructional environment (i.e., instruction varies more across classrooms than within a classroom) • Educational treatments are often assigned to schools or classrooms • Advantage: avoids contamination and makes the study more acceptable (simple random assignment of individuals is often not possible) • Disadvantage: the analysis must take dependencies or relatedness of responses within clusters into account
Intraclass Correlation (ICC) • For models with clustering of individuals • “cluster effect”: proportion of variance in the outcomes that is between clusters (compares within-cluster variance to between-cluster variance) • Example – clustering of children in classroom. ICC describes proportion of variance associated with differences between classrooms
Intraclass Correlation • Intraclass correlation (ICC) – measure of relatedness or dependence of clustered data • Proportion of variance that is between clusters • ICC or $\rho = \sigma_b^2 / (\sigma_b^2 + \sigma_w^2)$, where $\sigma_b^2$ is the between-cluster variance and $\sigma_w^2$ is the within-cluster variance • ICC = 0: no correlation among individuals within a cluster • ICC = 1: all responses within a cluster are identical
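The ICC formula is a single ratio of variance components, which a few lines of Python can make concrete. This is a minimal sketch: the function name `intraclass_correlation` and the example variances are illustrative assumptions, not values from the presentation.

```python
def intraclass_correlation(var_between: float, var_within: float) -> float:
    """Return between-cluster variance as a share of total variance."""
    if var_between < 0 or var_within < 0:
        raise ValueError("variance components must be non-negative")
    total = var_between + var_within
    if total == 0:
        raise ValueError("total variance is zero; ICC is undefined")
    return var_between / total


# Hypothetical variance components chosen to show the two limiting cases.
print(intraclass_correlation(0.0, 2.0))  # no between-cluster variance -> 0.0
print(intraclass_correlation(2.0, 0.0))  # no within-cluster variance  -> 1.0
print(intraclass_correlation(1.0, 3.0))  # one quarter between clusters -> 0.25
```

The two limiting calls reproduce the interpretation on the slide: an ICC of 0 when clusters do not differ, and an ICC of 1 when all responses within a cluster are identical.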
Nesting, Design, and ICC • Taking ICC into account results in less power for a given sample size • less independent information • Effective sample size = $mk / (1 + \rho (m-1))$; the denominator $1 + \rho (m-1)$ is the design effect • $m$ = number of individuals per cluster • $k$ = number of clusters • $\rho$ = ICC • Effective sample size is the number of clusters ($k$) when ICC = 1 and the number of individuals ($mk$) when ICC = 0
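The ratio mk / (1 + ICC (m - 1)) is easy to explore numerically. The sketch below assumes equal cluster sizes, as the formula does; the function name `effective_sample_size` and the design of 20 clusters of 10 are hypothetical.

```python
def effective_sample_size(m: int, k: int, icc: float) -> float:
    """Effective number of independent observations for k clusters of size m."""
    if m < 1 or k < 1:
        raise ValueError("m and k must be positive")
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie between 0 and 1")
    inflation = 1.0 + icc * (m - 1)  # variance inflation due to clustering
    return (m * k) / inflation


# Hypothetical design: 20 clusters with 10 individuals each (mk = 200).
for icc in (0.0, 0.1, 1.0):
    print(icc, round(effective_sample_size(m=10, k=20, icc=icc), 1))
```

With an ICC of 0 the result is all 200 individuals, with an ICC of 1 it is the 20 clusters, and even a modest ICC of 0.1 cuts the effective sample size to roughly half.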
ICC and Hierarchical Linear Models • Hierarchical linear models (HLM) take nesting into account • Clustering of the data is explicitly specified by the model • ICC is taken into account when estimating standard errors, test statistics, and p-values
2 level HLM • One level of nesting • Longitudinal: Repeated measures of an individual over time • Typically: random intercepts and slopes to describe individual patterns of change over time • Clusters: Nesting of individuals within classes, families, therapy groups, etc. • Typically: random intercept to describe the cluster effect
2 level HLM Random-intercepts models • Corresponds to one-way ANOVA with random effects (mixed model ANOVA) • Example: Classrooms randomly assigned to treatment or control conditions • All study children within a classroom are in the same condition • Post-treatment outcome per child (can use the pre-treatment score as a covariate to increase power) • Level 1 = children in classroom • Level 2 = classroom • ICC reflects the degree of similarity among students within the classroom
2 Level HLM Random Intercept Model • Level 1 – individual students within the classroom • Unconditional model: $Y_{ij} = \beta_{0j} + r_{ij}$ • Conditional model: $Y_{ij} = \beta_{0j} + \beta_1 X_{ij} + r_{ij}$ • $Y_{ij}$ = outcome for ith student in jth class • $\beta_{0j}$ = intercept (e.g., mean) for jth class • $\beta_1$ = coefficient for individual-level covariate $X_{ij}$ • $r_{ij}$ = random error term for ith student in jth class, $E(r_{ij}) = 0$, $\mathrm{var}(r_{ij}) = \sigma^2$
2 Level HLM Random Intercept Model • Level 2 – classrooms • Unconditional model: $\beta_{0j} = \gamma_{00} + u_{0j}$ • Conditional model: $\beta_{0j} = \gamma_{00} + \gamma_{01} W_{j1} + \gamma_{02} W_{j2} + u_{0j}$ • $\beta_{0j}$ = intercept (e.g., mean) for jth class • $\gamma_{00}$ = grand mean in population • $\gamma_{01}$ = treatment effect for $W_{j1}$, a dummy variable indicating treatment status (-.5 if control; .5 if treatment) • $\gamma_{02}$ = coefficient for $W_{j2}$, a class-level covariate • $u_{0j}$ = random effect associated with jth classroom, $E(u_{0j}) = 0$, $\mathrm{var}(u_{0j}) = \tau_{00}$
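2 Level HLM Random Intercept Model • Combined (unconditional): $Y_{ij} = \gamma_{00} + u_{0j} + r_{ij}$, from $Y_{ij} = \beta_{0j} + r_{ij}$ and $\beta_{0j} = \gamma_{00} + u_{0j}$ • Combined (conditional): $Y_{ij} = \gamma_{00} + \gamma_{01} W_{j1} + \gamma_{02} W_{j2} + \beta_1 X_{ij} + u_{0j} + r_{ij}$, from $Y_{ij} = \beta_{0j} + \beta_1 X_{ij} + r_{ij}$ and $\beta_{0j} = \gamma_{00} + \gamma_{01} W_{j1} + \gamma_{02} W_{j2} + u_{0j}$ • $\mathrm{var}(Y_{ij}) = \mathrm{var}(u_{0j} + r_{ij}) = \tau_{00} + \sigma^2$ (given the covariates in the conditional model) • ICC = $\rho = \tau_{00} / (\tau_{00} + \sigma^2)$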
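The ICC for the random-intercept model can also be read as a correlation. The short derivation below is a sketch that assumes, as is standard for this model, that the classroom random effect and the student-level errors are mutually independent. Here $\tau_{00}$ is the variance of the classroom random effect, $\sigma^2$ is the variance of the student-level error, and $i \neq i'$ index two different students in the same class $j$.

```latex
\begin{align*}
\operatorname{cov}(Y_{ij}, Y_{i'j})
  &= \operatorname{cov}(u_{0j} + r_{ij},\; u_{0j} + r_{i'j})
   = \operatorname{var}(u_{0j}) = \tau_{00}, \\
\operatorname{corr}(Y_{ij}, Y_{i'j})
  &= \frac{\tau_{00}}{\tau_{00} + \sigma^2} = \text{ICC}.
\end{align*}
```

So the ICC is both the share of variance lying between classrooms and the expected correlation between two students drawn from the same classroom.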
Example: 2 level HLM Random Intercepts • Purdue Curriculum Study (Powell & Diamond) • Onsite or remote coaching • 27 Head Start classes randomly assigned to onsite coaching and 25 to remote coaching • Post-test scores on writing • Onsite: n=196, M=6.70, SD=1.54 • Remote: n=171, M=7.05, SD=1.64
Example: 2 level HLM Random Intercepts • Level 1: $\text{Writing}_{ij} = \beta_{0j} + \beta_1\,\text{Writing-pre}_{ij} + r_{ij}$ • $\beta_1$ = .56, se = .05, p < .001 • $E(r_{ij}) = 0$, $\mathrm{var}(r_{ij}) = 1.67$ • Level 2: $\beta_{0j} = \gamma_{00} + \gamma_{01}\,\text{Onsite}_j + u_{0j}$ • $\gamma_{00}$ (intercept: remote group adjusted mean) = 3.74, se = .31 • $\gamma_{01}$ (Onsite - Remote difference) = -.37, se = .17, p = .03 • $E(u_{0j}) = 0$, $\mathrm{var}(u_{0j}) = .137$ • ICC = $\tau_{00} / (\tau_{00} + \sigma^2)$ = .137 / (.137 + 1.66) = .076
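The arithmetic in this example can be checked directly. The sketch below only reuses the estimates printed on the slide and re-estimates nothing. The slide gives the level 1 variance as 1.67 in one place and 1.66 in the ICC line, so both are tried. The p-value uses a normal approximation, which is an assumption: the slide does not say which reference distribution was used.

```python
import math

# Estimates copied from the slide.
tau00 = 0.137          # between-classroom variance
sigma2_formula = 1.66  # within-classroom variance as used in the ICC line
sigma2_level1 = 1.67   # within-classroom variance as reported at level 1


def icc(var_between: float, var_within: float) -> float:
    return var_between / (var_between + var_within)


icc_a = icc(tau00, sigma2_formula)
icc_b = icc(tau00, sigma2_level1)

# Ratio of the onsite-minus-remote estimate to its standard error.
gamma01, se_gamma01 = -0.37, 0.17
z = gamma01 / se_gamma01

# Two-sided p-value under a normal approximation (assumed, see lead-in).
p_normal = math.erfc(abs(z) / math.sqrt(2))

print(round(icc_a, 3), round(icc_b, 3))
print(round(z, 2), round(p_normal, 3))
```

Either value of the level 1 variance gives an ICC that rounds to .076, and the ratio of about -2.18 is in line with the reported p = .03.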
2 Level HLM - Longitudinal (random-slopes and -intercepts models) • Does NOT correspond to one-way ANOVA with random effects • Example: Longitudinal assessment of children’s literacy skills during Pre-K years • Level 1 = individual growth curve • Level 2 = group growth curve
Level 1 – Longitudinal HLM • Level 1 – individual growth curve • Unconditional model: $Y_{ij} = \beta_{0j} + \beta_{1j}\,\mathrm{Age}_{ij} + r_{ij}$ • Conditional model: $Y_{ij} = \beta_{0j} + \beta_{1j}\,\mathrm{Age}_{ij} + \beta_2 X_{ij} + r_{ij}$ • $Y_{ij}$ = outcome for jth student on the ith occasion • $\mathrm{Age}_{ij}$ = age at assessment for jth student on the ith occasion • $\beta_{0j}$ = intercept for jth student • $\beta_{1j}$ = slope for Age for jth student • $\beta_2$ = coefficient for time-varying covariate $X_{ij}$ • $r_{ij}$ = random error term for jth student on the ith occasion, $E(r_{ij}) = 0$, $\mathrm{var}(r_{ij}) = \sigma^2$
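Level 2 – Longitudinal HLM • Level 2 – predicting individual trajectories • Unconditional model: $\beta_{0j} = \gamma_{00} + u_{0j}$; $\beta_{1j} = \gamma_{10} + u_{1j}$ • Conditional model: $\beta_{0j} = \gamma_{00} + \gamma_{01} W_{j1} + \gamma_{02} W_{j2} + u_{0j}$; $\beta_{1j} = \gamma_{10} + \gamma_{11} W_{j1} + \gamma_{12} W_{j2} + u_{1j}$ • $\beta_{0j}$ = intercept for jth student; $\beta_{1j}$ = slope for Age for jth student • $\gamma_{00}$ = intercept in population; $\gamma_{10}$ = slope in population • $\gamma_{01}$ = effect on the intercept of $W_{j1}$, a student-level covariate (e.g., treatment); $\gamma_{11}$ = effect on the slope of $W_{j1}$ • $\gamma_{02}$, $\gamma_{12}$ = corresponding effects of a second student-level covariate, $W_{j2}$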
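Level 2 – Longitudinal HLM • Level 2 – predicting individual trajectories • Unconditional model: $\beta_{0j} = \gamma_{00} + u_{0j}$; $\beta_{1j} = \gamma_{10} + u_{1j}$ • Conditional model: $\beta_{0j} = \gamma_{00} + \gamma_{01} W_{j1} + u_{0j}$; $\beta_{1j} = \gamma_{10} + \gamma_{11} W_{j1} + u_{1j}$ • $u_{0j}$ = random effect for individual intercept; $u_{1j}$ = random effect for individual slope • $E(u_{0j}) = 0$, $\mathrm{var}(u_{0j}) = \tau_{00}$; $E(u_{1j}) = 0$, $\mathrm{var}(u_{1j}) = \tau_{11}$ • $\mathrm{cov}(u_{0j}, u_{1j}) = \tau_{10}$, so the covariance matrix of $(u_{0j}, u_{1j})$ is $\begin{pmatrix} \tau_{00} & \tau_{01} \\ \tau_{10} & \tau_{11} \end{pmatrix}$ with $\tau_{01} = \tau_{10}$ • Level 1 and level 2 error terms are independent: $\mathrm{cov}(r_{ij}, u_{0j}) = \mathrm{cov}(r_{ij}, u_{1j}) = 0$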
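Substituting the unconditional level 2 equations into the level 1 growth model gives a single combined equation. The sketch below writes it out together with the outcome variance it implies. Symbols follow common HLM notation ($\gamma$ for fixed effects, $\tau$ for the level 2 variances and covariance, $\sigma^2$ for the level 1 variance), and the variance expression relies on the independence of level 1 and level 2 error terms.

```latex
\begin{align*}
Y_{ij} &= \gamma_{00} + \gamma_{10}\,\mathrm{Age}_{ij}
          + u_{0j} + u_{1j}\,\mathrm{Age}_{ij} + r_{ij}, \\
\operatorname{var}(Y_{ij} \mid \mathrm{Age}_{ij})
       &= \tau_{00} + 2\,\tau_{10}\,\mathrm{Age}_{ij}
          + \tau_{11}\,\mathrm{Age}_{ij}^{2} + \sigma^2 .
\end{align*}
```

Unlike the random-intercept model, the outcome variance here depends on age, so the share of variance attributable to individual differences changes over the course of the study.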
Example – Longitudinal HLM • Purdue Curriculum Study (Powell & Diamond) • Level 1 – estimating individual growth curves for children in one treatment condition (Remote) • Level 2 – estimating population growth curves for Remote condition
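Example • Level 1: $\text{blending}_{ij} = \beta_{0j} + \beta_{1j}\,\mathrm{Age}_{ij} + r_{ij}$, estimated $\sigma^2$ = 10.34 • Level 2: $\beta_{0j} = \gamma_{00} + \gamma_{01} W_{j1} + u_{0j}$; $\beta_{1j} = \gamma_{10} + u_{1j}$ • Estimated results • Intercept: $\gamma_{00}$ = 11.86 (se = .48), $\tau_{00}$ = 10.03** • Season ($W_{j1}$): $\gamma_{01}$ = 2.43* (se = .70) • Slope: $\gamma_{10}$ = 1.51* (se = .60), $\tau_{11}$ = 4.24** • $\tau_{10}$ = -1.45**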
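The covariance between intercepts and slopes is easier to interpret once it is rescaled to a correlation. The sketch below does that with the three level 2 estimates reported on the slide; the variable names are illustrative, and the scaling and centering of Age are not given in the slides, so only the correlation is computed.

```python
import math

# Level 2 variance components copied from the slide.
tau00 = 10.03   # variance of individual intercepts
tau11 = 4.24    # variance of individual slopes
tau10 = -1.45   # covariance between intercepts and slopes

# Correlation = covariance divided by the product of standard deviations.
corr_intercept_slope = tau10 / math.sqrt(tau00 * tau11)

print(round(corr_intercept_slope, 2))  # -0.22
```

A correlation of about -0.22 suggests that children with higher intercepts tended to have somewhat flatter slopes in this sample.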
3 level HLM • 2 levels of nesting • Examples • Longitudinal assessments of children in randomly assigned classrooms • Level 1 – child level data • Level 2 – child’s growth curve • Level 3 – classroom level data • Two levels of nesting such as children nested in classrooms that are nested in schools • Level 1 – child level data • Level 2 – classroom level data • Level 3 – school level data
3 level Model – Random Intercepts • Children nested in classrooms, classrooms nested in schools • Level 1 child-level model: $Y_{ijk} = \pi_{0jk} + e_{ijk}$ • $Y_{ijk}$ is achievement of child i in class j in school k • $\pi_{0jk}$ is mean score of class j in school k • $e_{ijk}$ is random “child effect” • Classroom-level model: $\pi_{0jk} = \beta_{00k} + r_{0jk}$ • $\beta_{00k}$ is mean score for school k • $r_{0jk}$ is random “class effect” • School-level model: $\beta_{00k} = \gamma_{000} + u_{00k}$ • $\gamma_{000}$ is grand mean score • $u_{00k}$ is random “school effect”
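3 level Model – Random Intercepts • Children nested in classrooms, classrooms nested in schools • Level 1 child-level model: $Y_{ijk} = \pi_{0jk} + e_{ijk}$ • $e_{ijk}$ is random “child effect”, $E(e_{ijk}) = 0$, $\mathrm{var}(e_{ijk}) = \sigma^2$ • Classroom-level (within-school) model: $\pi_{0jk} = \beta_{00k} + r_{0jk}$ • $r_{0jk}$ is random “class effect”, $E(r_{0jk}) = 0$, $\mathrm{var}(r_{0jk}) = \tau_\pi$ • Assumes the variance among classes within a school is the same for every school • School-level (between-classroom) model: $\beta_{00k} = \gamma_{000} + \gamma_{001}\,\mathrm{trt}_k + u_{00k}$, where $\mathrm{trt}_k$ is the treatment indicator • $E(u_{00k}) = 0$, $\mathrm{var}(u_{00k}) = \tau_\beta$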
Partitioning variance • Proportion of variance within classrooms: $\sigma^2 / (\sigma^2 + \tau_\pi + \tau_\beta)$ • Proportion of variance among classrooms within schools: $\tau_\pi / (\sigma^2 + \tau_\pi + \tau_\beta)$ • Proportion of variance among schools: $\tau_\beta / (\sigma^2 + \tau_\pi + \tau_\beta)$
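The three proportions share one denominator and sum to 1, which a small helper makes explicit. This is a sketch only: the function name `partition_variance`, its argument names, and the variance components passed to it are hypothetical, not estimates from the presentation.

```python
def partition_variance(sigma2: float, tau_class: float, tau_school: float) -> dict:
    """Split total variance in a 3-level random-intercept model into shares."""
    components = (sigma2, tau_class, tau_school)
    if any(c < 0 for c in components):
        raise ValueError("variance components must be non-negative")
    total = sum(components)
    if total == 0:
        raise ValueError("total variance is zero; shares are undefined")
    return {
        "within_classrooms": sigma2 / total,
        "classrooms_within_schools": tau_class / total,
        "schools": tau_school / total,
    }


# Hypothetical variance components, for illustration only.
shares = partition_variance(sigma2=6.0, tau_class=3.0, tau_school=1.0)
for level, share in shares.items():
    print(level, round(share, 2))
```

With these made-up inputs, 60% of the variance lies within classrooms, 30% among classrooms within schools, and 10% among schools.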
3 Level HLM – level 2 longitudinal and level 3 random intercepts • Typically – treatment randomly assigned at classroom level, children followed longitudinally (e.g., Purdue Curriculum Study) • (within child) Level 1: $Y_{ijk} = \pi_{0jk} + \pi_{1jk}\,\mathrm{Age}_{ijk} + e_{ijk}$, with $E(e_{ijk}) = 0$, $\mathrm{var}(e_{ijk}) = \sigma^2$ • (between child) Level 2: $\pi_{0jk} = \beta_{00k} + r_{0jk}$; $\pi_{1jk} = \beta_{10k} + r_{1jk}$, with $E(r_{0jk}) = 0$, $\mathrm{var}(r_{0jk}) = \tau_{\pi 0}$ and $E(r_{1jk}) = 0$, $\mathrm{var}(r_{1jk}) = \tau_{\pi 1}$ • (between classes) Level 3: $\beta_{00k} = \gamma_{000} + u_{00k}$; $\beta_{10k} = \gamma_{100} + u_{10k}$, with $E(u_{00k}) = 0$, $\mathrm{var}(u_{00k}) = \tau_{\beta 0}$ and $E(u_{10k}) = 0$, $\mathrm{var}(u_{10k}) = \tau_{\beta 1}$
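Substituting level 3 into level 2, and level 2 into level 1, collapses the three sets of equations into one. The sketch below shows the combined model. The three-index subscripts on $\gamma$ and the use of $e_{ijk}$ for the within-child residual are notation choices made here for clarity, with $i$ indexing occasions, $j$ children, and $k$ classes.

```latex
\begin{align*}
Y_{ijk} ={}& \underbrace{\gamma_{000} + \gamma_{100}\,\mathrm{Age}_{ijk}}_{\text{fixed part}}
  + \underbrace{u_{00k} + u_{10k}\,\mathrm{Age}_{ijk}}_{\text{class effects}} \\
  &+ \underbrace{r_{0jk} + r_{1jk}\,\mathrm{Age}_{ijk}}_{\text{child effects}}
  + \underbrace{e_{ijk}}_{\text{within-child error}} .
\end{align*}
```

Each random term corresponds to one level of nesting: classes differ in their average intercept and slope, children differ around their class averages, and individual assessments vary around each child's growth curve.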
Example: Purdue Curriculum Study • Level 1 – individual growth curve • Level 2 – classroom growth curve • Level 3 – treatment differences in classroom growth curves
Threats • Homogeneity of variance – at each level • Nonnormal data with heavy tails • Bad data • Differences in variability among groups • Normality assumption • Examine residuals • Robust standard error (large n) • Inferences with small samples
3 Level HLM: Longitudinal assessments of individuals in clustered settings