Multilevel Regression Models sean f. reardon 17 june, 2004
Outline
• What/Why Multilevel Regression?
• The Basic Multilevel Regression Model
• Growth Models
• A Taste of Advanced Topics
Part I.A. What are multilevel data and multilevel analysis?
What are multilevel data?
• Multilevel data are data in which observations are clustered within units
• On average, observations within the same unit may be more similar to one another than observations in different units
• What effect does this have on estimation and statistical inference?
Examples of multilevel data with contextual clustering
• Observations of students, clustered within schools
• Observations of siblings, clustered within families
• Observations of individuals, clustered within countries, states, or neighborhoods
Examples of multilevel data with intra-person clustering
• Repeated test scores, clustered within students
• Multiple measures of a latent construct, clustered within persons
Other examples of multilevel data
• Patients, clustered within doctors
• Coefficient estimates, clustered within studies (meta-analysis)
• Widget sizes, clustered within factories
• And so on…
What is multilevel regression analysis?
• Also called:
  • Hierarchical Linear Models
  • Mixed Models
  • Multilevel Models
  • Growth Models
  • Slopes-as-Outcomes Models
Multilevel Regression Models
• A form of regression model
• Used to answer questions about the relationship of context to individual outcomes
• Used to estimate both within-unit and between-unit relationships (and cross-level interactions)
  • e.g., within- vs. between-school relationships between SES and achievement
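As a hypothetical illustration of the within/between distinction, the Python sketch below uses made-up toy data (not ECLS-K) constructed so that SES and scores are related more steeply across schools than within them. The between-school slope comes from regressing school-mean scores on school-mean SES; the within-school slope comes from regressing deviations from school means on each other. This is one common way to separate the two relationships, not necessarily the estimator developed later in these slides.

```python
from statistics import mean

# Hypothetical toy data (not from ECLS-K): (school_id, ses, score).
data = [
    (1, 1.0, 10.0), (1, 2.0, 11.0), (1, 3.0, 12.0),
    (2, 4.0, 20.0), (2, 5.0, 21.0), (2, 6.0, 22.0),
    (3, 7.0, 30.0), (3, 8.0, 31.0), (3, 9.0, 32.0),
]

def slope(xs, ys):
    """OLS slope of ys on xs (model includes an intercept)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Group (ses, score) pairs by school.
schools = {}
for sid, x, y in data:
    schools.setdefault(sid, []).append((x, y))

# Between-school slope: school-mean score on school-mean SES.
mean_x = [mean(x for x, _ in obs) for obs in schools.values()]
mean_y = [mean(y for _, y in obs) for obs in schools.values()]
between = slope(mean_x, mean_y)

# Within-school slope: deviations from each school's own means.
dev_x, dev_y = [], []
for obs in schools.values():
    mx, my = mean(x for x, _ in obs), mean(y for _, y in obs)
    dev_x += [x - mx for x, _ in obs]
    dev_y += [y - my for _, y in obs]
within = slope(dev_x, dev_y)

print(f"between-school slope: {between:.2f}")
print(f"within-school slope:  {within:.2f}")
```

In these toy data the between-school slope (about 3.33) is more than three times the within-school slope (1.00), which is exactly the kind of difference a single pooled regression would blur together.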
Assumptions of OLS
• Linearity
• Errors are normally distributed
• Errors are homoskedastic
• Errors are uncorrelated/independent
  • Knowing the error term for one observation is not informative about the error term of any other observation
Some Example Data
• Data from the Early Childhood Longitudinal Study, Kindergarten Cohort (ECLS-K; NCES, 1998-2004)
• Longitudinal study of 21,000 kindergarten students in the kindergarten class of 1998-99
• Followed through fifth grade (2003-04)
ECLS-K data
• Subsample
  • 399 kindergarten students
  • sampled from 17 schools
• Math score
  • Fall kindergarten math test scores
  • Administered 2-3 months into the school year
• Age
  • Age in months at the time of the math assessment
  • Ranges from 60 to 79 months
What is the relationship between age and math scores?
• Note: this is NOT a growth model
  • It is a cross-sectional model
  • A growth model requires repeated measures, so we can observe intra-individual growth
OLS Regression: Math on Age
. reg math age

      Source |       SS       df       MS              Number of obs =     422
-------------+------------------------------           F(  1,   420) =   32.38
       Model |  1765.41947     1  1765.41947           Prob > F      =  0.0000
    Residual |  22896.5737   420  54.5156517           R-squared     =  0.0716
-------------+------------------------------           Adj R-squared =  0.0694
       Total |  24661.9932   421  58.5795563           Root MSE      =  7.3835

------------------------------------------------------------------------------
        math |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .4956666   .0871016     5.69   0.000     .3244572     .666876
       _cons |  -10.91381   6.049008    -1.80   0.072    -22.80391    .9762943
------------------------------------------------------------------------------
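As an arithmetic check on the output above, the header statistics follow directly from the sums of squares and degrees of freedom (R² = SS_model / SS_total, F = MS_model / MS_residual, Root MSE = √MS_residual), and each t statistic is the coefficient divided by its standard error. A short Python sketch, using only numbers copied from the output:

```python
import math

# Sums of squares and degrees of freedom copied from the regression output above.
ss_model, df_model = 1765.41947, 1
ss_resid, df_resid = 22896.5737, 420
ss_total, df_total = ss_model + ss_resid, df_model + df_resid

r_squared = ss_model / ss_total
adj_r_squared = 1 - (ss_resid / df_resid) / (ss_total / df_total)
f_stat = (ss_model / df_model) / (ss_resid / df_resid)
root_mse = math.sqrt(ss_resid / df_resid)
t_age = 0.4956666 / 0.0871016    # coefficient on age / its standard error

print(f"R2={r_squared:.4f} adjR2={adj_r_squared:.4f} F={f_stat:.2f} "
      f"RootMSE={root_mse:.4f} t(age)={t_age:.2f}")
```

Each value rounds to the figure Stata reports, which confirms that the flattened output was read correctly.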
OLS Regression: Math on Age
Next, look at the residuals from this model. Are they homoskedastic? Normally distributed? Independent?
Residuals look correlated with each other within schools
• A formal test of this dependence: a one-way ANOVA of the residuals by school
Random Effects ANOVA
                  One-way Analysis of Variance for resid: Residuals

                                              Number of obs =        422
                                                  R-squared =     0.1638

    Source                SS         df      MS            F     Prob > F
-------------------------------------------------------------------------
Between s_id1          3750.169      17    220.59817      4.65     0.0000
 Within s_id1         19146.405     404    47.392091
-------------------------------------------------------------------------
    Total             22896.574     421    54.386161

         Intraclass       Asy.
         correlation      S.E.       [95% Conf. Interval]
         ------------------------------------------------
           0.13487      0.05204      0.03287     0.23687

         Estimated SD of s_id1 effect              2.718128
         Estimated SD within s_id1                 6.884191
         Est. reliability of a s_id1 mean           0.78517
              (evaluated at n=23.44)
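The summary statistics at the bottom of this output can be recovered, approximately, from the two mean squares. The Python sketch below assumes the usual one-way ANOVA (method-of-moments) variance-component estimators and treats the average group size, 422 / (17 + 1) ≈ 23.44, as if every school had that many students. Stata's calculation can account for unequal school sizes, so the last digit or two may differ.

```python
import math

# Inputs copied from the one-way ANOVA of the OLS residuals above.
ms_between = 220.59817          # between-school mean square (17 df)
ms_within = 47.392091           # within-school mean square (404 df)
n_groups = 17 + 1               # between-groups df plus one
n_bar = 422 / n_groups          # average group size, about 23.44

# ANOVA (method-of-moments) variance-component estimates, treating n_bar
# as a common group size; this is exact only for balanced data.
var_u = (ms_between - ms_within) / n_bar    # between-school variance
var_r = ms_within                            # within-school variance

icc = var_u / (var_u + var_r)
sd_u, sd_r = math.sqrt(var_u), math.sqrt(var_r)

# Reliability of a school's observed mean as a measure of its true mean.
reliability = var_u / (var_u + var_r / n_bar)

print(f"ICC = {icc:.5f}, SD between = {sd_u:.4f}, SD within = {sd_r:.4f}, "
      f"reliability = {reliability:.5f}")
```

The intraclass correlation is simply the between-school share of the residual variance, so about 13% of the variation left over after the OLS regression lies between schools.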
Consequences of Non-Independence of Residuals
• The computation of standard errors in OLS depends on the assumption that the errors are independent
• If errors are not independent (typically, positively correlated within units), then standard errors will, in general, be too small, so the probability of a Type I error is larger than the nominal level
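How much too small? A standard back-of-the-envelope answer is the Kish design effect, 1 + (m − 1)ρ, where m is the cluster size and ρ the intraclass correlation. It gives the factor by which the sampling variance of a mean is understated when clustered observations are treated as independent. The Python sketch below plugs in the residual ICC (0.13487) and average school size (23.44) from the ANOVA above. Treat the result as an illustration for a school-level quantity: for a student-level predictor such as age, the inflation also depends on how strongly the predictor itself clusters within schools and can be considerably smaller.

```python
import math

def design_effect(cluster_size: float, icc: float) -> float:
    """Kish design effect: variance inflation for a mean from equal-sized clusters."""
    return 1 + (cluster_size - 1) * icc

# Average school size and residual ICC taken from the ANOVA output above.
deff = design_effect(cluster_size=23.44, icc=0.13487)
se_inflation = math.sqrt(deff)   # factor by which the true SE exceeds the naive SE
print(f"design effect = {deff:.2f}, SE inflation = {se_inflation:.2f}")
```

Under these assumptions the variance is understated by a factor of about 4, and the standard error by a factor of about 2.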
Two extreme examples
• n individuals observed from each of K schools (a total of nK observations)
• If $Y_{ik} = Y_{jk}$ for all i and j in school k, then knowing k completely determines Y, so there are really only K unique observations
• In this case, we can just treat each school as a single observation (with outcome $Y_{\cdot k}$) and use OLS on the sample of K schools
Two extreme examples
• n individuals observed from each of K schools (a total of nK observations)
• If $Y_{ik}$ and $Y_{jk}$ are independent for all $i \neq j$ and all k, then knowing k tells us nothing about Y, so there are really nK independent observations
• In this case, there is no dependence of the errors, so we can use OLS on the sample of nK students
When do we need multilevel regression?
• In the intermediate case, where knowing the school gives us some, but not complete, information about Y
  • e.g., test scores vary both within and between schools
  • e.g., individuals vary both within and between neighborhoods
  • e.g., mood varies both within individuals (over time) and between individuals
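The two extremes and the intermediate case can be summarized by an effective sample size, nK / (1 + (n − 1)ρ), where ρ is the intraclass correlation: it equals K when ρ = 1 and nK when ρ = 0. The Python sketch below assumes equal-sized schools and a quantity (such as a mean) that is constant within schools; the design of 20 students in each of 50 schools is hypothetical.

```python
def effective_sample_size(n_per_unit: int, n_units: int, icc: float) -> float:
    """nK observations deflated by the design effect 1 + (n - 1) * icc."""
    return n_per_unit * n_units / (1 + (n_per_unit - 1) * icc)

n, K = 20, 50   # hypothetical: 20 students in each of 50 schools
for rho in (1.0, 0.15, 0.0):
    print(f"ICC = {rho:.2f}: effective n = {effective_sample_size(n, K, rho):.1f}")
```

With ρ = 1 the 1,000 students carry the information of only 50 schools; with ρ = 0.15 they are worth roughly 260 independent observations; with ρ = 0 all 1,000 count.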
Part II.A. Farewell OLS
What we know so far
• Two observations within the same unit may be more similar than two observations chosen at random
• If the regression model does not explain all of the between-unit differences (and it is unlikely to), we will have correlated errors within units
• This violates the OLS assumption that residuals are independent
• At a minimum, this results in incorrect standard errors (too small)
How do we allow dependence in the regression model?
• We want a model that explicitly allows the level of the outcome variable to vary across level-two units
  • For example, we want to let the mean reading score differ across schools
• So let’s write a model that allows this
Some notation
• i indexes level-one units (people within schools, observations within persons)
• j indexes level-two units (e.g., schools, if we have students nested within schools)
• We will use r to denote a level-one residual, and u to denote a level-two residual
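Farewell OLS: Our first multilevel model
• Instead of:
$$Y_{ij} = \beta_0 + \beta_1 X_{ij} + \varepsilon_{ij}$$
• Let’s write:
$$Y_{ij} = \beta_0 + \beta_1 X_{ij} + u_j + r_{ij}$$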
Farewell OLS: Our first multilevel model
$$Y_{ij} = \beta_0 + \beta_1 X_{ij} + u_j + r_{ij}$$
• $Y_{ij}$: outcome for observation i in unit j
• $\beta_0$: intercept
• $\beta_1$: coefficient
• $X_{ij}$: value of X for observation i in unit j
• $u_j$: residual term specific to unit j
• $r_{ij}$: residual term specific to observation i in unit j
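To make the model concrete, here is a minimal Python sketch that simulates data from it. The distributions and parameter values are assumptions for illustration only (τ = 2.7 and σ = 6.9 loosely echo the between- and within-school SDs in the ANOVA output). Because X is drawn independently of $u_j$, the pooled OLS slope should still land near $\beta_1$: clustering threatens OLS standard errors rather than the slope itself, at least when X is unrelated to $u_j$.

```python
import random

def simulate(n_units, n_per_unit, beta0, beta1, tau, sigma, seed=0):
    """Draw (j, x, y) rows from Y_ij = beta0 + beta1*X_ij + u_j + r_ij.

    Assumed for illustration: X_ij ~ N(0, 1), u_j ~ N(0, tau^2) and
    r_ij ~ N(0, sigma^2), all drawn independently.
    """
    rng = random.Random(seed)
    rows = []
    for j in range(n_units):
        u_j = rng.gauss(0.0, tau)             # shared by every observation in unit j
        for _ in range(n_per_unit):
            x = rng.gauss(0.0, 1.0)
            r_ij = rng.gauss(0.0, sigma)      # specific to observation i in unit j
            rows.append((j, x, beta0 + beta1 * x + u_j + r_ij))
    return rows

def pooled_ols_slope(rows):
    """OLS slope of y on x that ignores which unit each row belongs to."""
    xs = [x for _, x, _ in rows]
    ys = [y for _, _, y in rows]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

rows = simulate(n_units=400, n_per_unit=25, beta0=2.0, beta1=0.5, tau=2.7, sigma=6.9)
print(f"pooled OLS slope: {pooled_ols_slope(rows):.3f} (true beta1 = 0.5)")
```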
What is $u_j$?
• A residual term
• Specific to unit j
• Common to all observations in unit j
  • Subscript j, no subscript i
• Interpretation: the difference between the intercept in unit j and the overall intercept ($u_j = \beta_{0j} - \beta_0$, where $\beta_{0j}$ is the intercept in unit j)
What is $r_{ij}$?
• A residual term
• Specific to observation i in unit j
• Has a mean of 0, so any part of $\varepsilon_{ij}$ that is common to all observations within j has been removed (it is captured by $u_j$)
• So the $r_{ij}$’s may be independent
  • But they are not guaranteed to be independent
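Features of this model
• Note that: $\varepsilon_{ij} = u_j + r_{ij}$
• If $u_j$ and $r_{ij}$ are uncorrelated, we also have:
$$\operatorname{Var}(\varepsilon_{ij}) = \operatorname{Var}(u_j + r_{ij}) = \operatorname{Var}(u_j) + \operatorname{Var}(r_{ij}) + 2\operatorname{Cov}(u_j, r_{ij}) = \operatorname{Var}(u_j) + \operatorname{Var}(r_{ij})$$
• We will come back to variance decomposition later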
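A quick simulation makes the decomposition tangible. The Python sketch below draws hypothetical $u_j$ and $r_{ij}$ independently (the SDs 2.7 and 6.9, and the 2,000 units of 20 observations, are illustrative assumptions), forms $\varepsilon_{ij} = u_j + r_{ij}$, and checks that the sample variances obey the full identity, with the covariance term close to zero because the two components were drawn independently.

```python
import random

def variance(values):
    """Sample variance with an n - 1 denominator."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

def covariance(a, b):
    """Sample covariance with an n - 1 denominator."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

tau, sigma = 2.7, 6.9              # hypothetical SDs of u_j and r_ij
rng = random.Random(42)

u, r, eps = [], [], []
for _ in range(2000):              # 2,000 level-two units
    u_j = rng.gauss(0.0, tau)
    for _ in range(20):            # 20 observations per unit
        r_ij = rng.gauss(0.0, sigma)   # drawn independently of u_j
        u.append(u_j)
        r.append(r_ij)
        eps.append(u_j + r_ij)     # composite residual eps_ij = u_j + r_ij

var_u, var_r, var_eps = variance(u), variance(r), variance(eps)
cov_ur = covariance(u, r)

# The full identity holds exactly in any sample; the covariance term is
# merely close to zero here because u and r were drawn independently.
print(f"Var(u) + Var(r) + 2Cov(u, r) = {var_u + var_r + 2 * cov_ur:.3f}")
print(f"Var(eps)                     = {var_eps:.3f}")
print(f"Cov(u, r) = {cov_ur:.3f}")
print(f"share of Var(eps) due to u: {var_u / var_eps:.3f} "
      f"(population value {tau**2 / (tau**2 + sigma**2):.3f})")
```

The last line previews the variance decomposition: the share of $\operatorname{Var}(\varepsilon_{ij})$ due to $u_j$ is the intraclass correlation.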
Features of this model
• The level of $Y_{ij}$, after adjusting for $X_{ij}$, may vary across the units
• We have made no assumptions yet about the distribution of the $u_j$’s or the $r_{ij}$’s
• The relationship between X and Y does not depend on j ($\beta_1$ does not depend on j)
So how do we estimate this model?
• We want an estimate of $\beta_1$, the relationship between $X_{ij}$ and $Y_{ij}$
• Two approaches:
  • Fixed effects estimator
  • Random effects estimator
The fixed effects estimator
$$Y_{ij} = \beta_{0j} + \beta_1 X_{ij} + r_{ij}, \qquad \beta_{0j} = \beta_0 + u_j$$
• We have ‘absorbed’ the level-two error terms (the $u_j$’s) into the intercept
• Now each aggregate unit has its own intercept, so between-unit variation is accounted for in the intercepts
• This solves the dependence problem with the $r_{ij}$’s (they may still not be independent, but not because of unexplained variation between level-two units)
The fixed effects estimator
• Three methods of obtaining the fixed effects estimator $\hat{\beta}_1$ from this model:
  • Dummy variables for each unit
  • Change or difference scores
  • Deviations from mean unit values
• All three are mathematically equivalent (difference scores coincide with the other two when each unit contributes exactly two observations)
• All can be estimated via OLS, with some adjustment of the degrees of freedom
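A small Python sketch of two of the three routes, on hypothetical data (3 units, 4 observations each): (1) OLS with one dummy per unit and no common intercept, solved with a hand-written normal-equations solver so the example stays dependency-free, and (2) OLS on deviations from unit means. The slope on X agrees to floating-point precision (the Frisch-Waugh-Lovell result). Both routes estimate J unit intercepts plus one slope, which is why the residual degrees of freedom are N − J − 1 rather than the N − 1 a naive regression on demeaned data would report.

```python
def solve(a, b):
    """Solve the square system a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

# Hypothetical data: (unit j, x, y), 3 units with 4 observations each.
data = [
    (0, 1.0, 3.1), (0, 2.0, 4.9), (0, 3.0, 7.2), (0, 4.0, 8.8),
    (1, 2.0, 9.0), (1, 3.0, 11.1), (1, 4.0, 12.8), (1, 5.0, 15.2),
    (2, 0.0, 0.9), (2, 1.0, 3.0), (2, 2.0, 5.1), (2, 3.0, 6.9),
]
units = sorted({j for j, _, _ in data})
J, N = len(units), len(data)

# Method 1: dummy variables. Design row = [D_0, ..., D_{J-1}, x]; no common intercept.
design = [[1.0 if j == k else 0.0 for k in units] + [x] for j, x, _ in data]
y = [yv for _, _, yv in data]
p = J + 1
xtx = [[sum(row[a] * row[b] for row in design) for b in range(p)] for a in range(p)]
xty = [sum(row[a] * yv for row, yv in zip(design, y)) for a in range(p)]
beta1_dummies = solve(xtx, xty)[-1]    # last coefficient is the slope on x

# Method 2: deviations from unit means (the "within" transformation).
def unit_mean(k, idx):
    vals = [row[idx] for row in data if row[0] == k]
    return sum(vals) / len(vals)

x_dev = [x - unit_mean(j, 1) for j, x, _ in data]
y_dev = [yv - unit_mean(j, 2) for j, _, yv in data]
beta1_within = sum(a * b for a, b in zip(x_dev, y_dev)) / sum(a * a for a in x_dev)

# J intercepts plus one slope are estimated either way.
df_resid = N - J - 1
print(f"dummies: {beta1_dummies:.6f}  within: {beta1_within:.6f}  df = {df_resid}")
```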