Modeling (estimating) the covariance of the data.

Modeling (estimating) the covariance of the data. Up to this point, we have concentrated on reducing (simplifying) the outcome data. Now, we turn our attention to methods where the data is kept in its longitudinal (repeated measures) form.


Presentation Transcript


  1. Modeling (estimating) the covariance of the data. • Up to this point, we have concentrated on reducing (simplifying) the outcome data. • Now, we turn our attention to methods where the data is kept in its longitudinal (repeated measures) form.

  2. Repairing Ordinary Least Squares to Account for Correlation • Getting consistent estimates of var($\hat{\beta}$).

  3. Dental Data

  4. Dental Data

  5. Want to model the effect of gender and age on dental distance. • $Y_{ij}$ is the distance at the jth measurement on the ith child, • $x_{ij1}$ is the age of the ith child at the jth measurement, • $x_{ij2}$ is the gender of the ith child (0 is male, 1 is female), the same for all j measurements.

  6. Reviewing Notation and the Linear Model • $Y_{ij}$ will represent a response variable, the jth measurement of unit i. • $x_{ij} = (1, x_{ij1}, x_{ij2}, \dots, x_{ijp})$ will be a vector of length p+1 of explanatory variables observed at the jth measurement. • $j = 1, \dots, n_i$; $i = 1, \dots, m$. • $E(Y_{ij}) = \mu_{ij}$, $\mathrm{Var}(Y_{ij}) = v_{ij}$.

  7. Nesting Observations (measurement within individual) • The set of repeated measures for unit i is collected into an $n_i$-vector $Y_i = (Y_{i1}, Y_{i2}, \dots, Y_{in_i})$. • $Y_i$ has mean $E(Y_i) = \mu_i$ and $n_i \times n_i$ covariance matrix $\mathrm{Var}(Y_i) = V_i$. • The jk element of $V_i$ is the covariance between $Y_{ij}$ and $Y_{ik}$, that is, $\mathrm{Cov}(Y_{ij}, Y_{ik}) = v_{ijk}$. • $\mathrm{Cov}(Y_{ij}, Y_{i'k}) = 0$ for $i \neq i'$, i.e., measurements on different subjects are uncorrelated.

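  8. Combining all observations into a big data set. • We will lump the responses of all units into one big vector $Y = (Y_1, \dots, Y_m)$, which is an N-vector, where $N = \sum_{i=1}^{m} n_i$ is the total number of observations. • Most of the course will focus on regression models of the sort: $Y_{ij} = \beta_0 + \beta_1 x_{ij1} + \dots + \beta_p x_{ijp} + e_{ij}$.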

  9. Combining, cont. • We can write the model for the ith person as $Y_i = X_i\beta + e_i$, • and for the entire data as $Y = X\beta + e$.

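  10. The Variance-Covariance of Yi - Most General
      $$V_i = \begin{pmatrix} v_{i11} & v_{i12} & v_{i13} & v_{i14} & \cdots & v_{i1n_i} \\ v_{i21} & v_{i22} & v_{i23} & v_{i24} & \cdots & v_{i2n_i} \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ v_{in_i1} & v_{in_i2} & v_{in_i3} & v_{in_i4} & \cdots & v_{in_in_i} \end{pmatrix}$$
      • More on this later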

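  11. The Variance-Covariance of Y
      $$V(Y) = \begin{pmatrix} V_1 & 0 & 0 & \cdots & 0 \\ 0 & V_2 & 0 & \cdots & 0 \\ 0 & 0 & V_3 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & V_m \end{pmatrix}$$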
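The block-diagonal layout above can be sketched in a few lines of Python. This is only an illustration: the helper name `block_diagonal` and the two small covariance matrices are hypothetical, not taken from the dental data.

```python
def block_diagonal(blocks):
    """Place square matrices (lists of rows) along the diagonal, zeros elsewhere."""
    total = sum(len(b) for b in blocks)
    big = [[0.0] * total for _ in range(total)]
    offset = 0
    for b in blocks:
        for j, row in enumerate(b):
            for k, value in enumerate(row):
                big[offset + j][offset + k] = value
        offset += len(b)
    return big

# Hypothetical covariance matrices: unit 1 has 2 measurements, unit 2 has 3.
V1 = [[4.0, 2.0],
      [2.0, 4.0]]
V2 = [[4.0, 2.0, 1.0],
      [2.0, 4.0, 2.0],
      [1.0, 2.0, 4.0]]

V = block_diagonal([V1, V2])   # 5 x 5 covariance matrix of the stacked vector Y
for row in V:
    print(row)
```

The zeros off the diagonal blocks encode the assumption that measurements on different subjects are uncorrelated.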

  12. Ignoring the correlation during estimation of the $\beta$'s and using least squares • What does least squares regression do? It minimizes the squared differences between the observed outcomes, $Y_{ij}$, and the predicted outcomes, $\hat{Y}_{ij} = x_{ij}^T\hat{\beta}$. • That is, it minimizes the residual sum of squares, $\sum_{i}\sum_{j} (Y_{ij} - x_{ij}^T\beta)^2$.

  13. Ordinary Least Squares • The LS solution is (in matrix form): $\hat{\beta} = (X^TX)^{-1}X^TY$. • Does the fact that the data may be correlated ($E[e_{ij}e_{ik}] \neq 0$) make this an erroneous estimate? • The first question to ask is whether this estimate is biased, i.e., does $E(\hat{\beta}) = \beta$?
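As a minimal sketch of the matrix-form least squares solution, the Python below fits an intercept and a slope on age using only the standard library. The distances are made-up numbers for two children, and the helpers `transpose`, `matmul`, and `inverse_2x2` are illustrative names, not part of the original slides.

```python
def transpose(A):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def inverse_2x2(M):
    """Invert a 2x2 matrix with the closed-form formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# Hypothetical data: two children, each measured at ages 8, 10, 12, 14.
ages = [8, 10, 12, 14, 8, 10, 12, 14]
dist = [21.0, 20.0, 21.5, 23.0, 26.0, 25.0, 29.0, 31.0]

X = [[1.0, float(a)] for a in ages]   # column of ones plus age
Y = [[y] for y in dist]               # N x 1 column vector

# Solve the normal equations: beta_hat = (X'X)^{-1} X'Y
Xt = transpose(X)
beta_hat = matmul(inverse_2x2(matmul(Xt, X)), matmul(Xt, Y))

# Residuals: observed minus fitted
residuals = [y[0] - (beta_hat[0][0] + beta_hat[1][0] * x[1]) for x, y in zip(X, Y)]
print("intercept, slope:", beta_hat[0][0], beta_hat[1][0])
```

Nothing in this calculation uses the correlation between measurements on the same child; the within-child structure only matters later, when standard errors are computed.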

  14. Expected values of coefficient estimates • $E(\hat{\beta}) = E[(X^TX)^{-1}X^TY] = (X^TX)^{-1}X^TE(Y)$. • Note that $E(Y)$ is the expectation of a vector, or simply the list of expectations of the components of the vector.

  15. Expectation of Vectors • Now looking at the vector of observations made on an individual: $E(Y_i) = E(Y_{i1}, \dots, Y_{in_i}) = (x_{i1}^T\beta, \dots, x_{in_i}^T\beta) = X_i\beta$. • Finally, look at the vectors stacked together: $E(Y) = E(Y_1, Y_2, \dots, Y_m) = (X_1\beta, X_2\beta, \dots, X_m\beta) = X\beta$, where X is $N \times (p+1)$ and $\beta$ is $(p+1) \times 1$.

  16. The Answer, Finally!! • $E(\hat{\beta}) = (X^TX)^{-1}X^TX\beta = \beta$. • That is, the estimator is unbiased, regardless of the correlation among measurements made on the same person, i. • From a simple estimation view, ignoring correlation in the linear model might be OK. • It's when inference is made about $\beta$ that correlation needs to be considered.

  17. var($\hat{\beta}$) • $\mathrm{var}(\hat{\beta}) = (X^TX)^{-1}X^TVX(X^TX)^{-1}$ (***) • So, if one knows V, then the variance of the estimated coefficients is known. • However, one never knows V, so you have to estimate V. • Estimating V will require some assumption about how the $Y_{ij}$'s are correlated (the form of the $V_i$'s). • Typically, in OLS, the observations are assumed to be independent and so V is assumed to look like:

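  18. var($\hat{\beta}$), cont.
      $$V(Y_i) = \begin{pmatrix} \sigma^2 & 0 & 0 & \cdots & 0 \\ 0 & \sigma^2 & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & \sigma^2 \end{pmatrix}$$
      • Typically, in OLS, the observations are assumed to be independent and so $V_i$ is assumed to look as above.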

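  19. OLS estimate of var($\hat{\beta}$) • It still remains to estimate $\sigma^2$, which is now the variance of the $Y_{ij}$'s. Note that: $\sigma^2 = \mathrm{Var}(Y_{ij}) = E(e_{ij}^2)$. • We don't know the $e_{ij}$'s, but we can estimate them from the residuals: $r_{ij} = Y_{ij} - x_{ij}^T\hat{\beta}$.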

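  20. OLS estimate of var($\hat{\beta}$), cont. • To get an estimate of $\sigma^2$, take an average (sort of) of the squared residuals: $\hat{\sigma}^2 = \frac{1}{N - p - 1}\sum_{i}\sum_{j} r_{ij}^2$. • Then, this is plugged back into $V_i$ to get: $\hat{V}_i = \hat{\sigma}^2 I_{n_i}$.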

  21. OLS estimate of var($\hat{\beta}$), cont. • Finally, plug this back into (***) to get the estimated variance of the coefficients: $\widehat{\mathrm{var}}(\hat{\beta}) = \hat{\sigma}^2 (X^TX)^{-1}$. • This is a matrix where the diagonal elements are the estimated variances of the coefficients. The SE($\hat{\beta}$) are the square roots of these estimated variances.
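The step from the general variance of the least squares estimator to the OLS formula can be written out as follows. This is a sketch that treats X as fixed and assumes $\mathrm{var}(Y) = V$; it uses the standard rule $\mathrm{var}(AY) = A\,\mathrm{var}(Y)\,A^T$ for a constant matrix A.

```latex
\begin{align*}
\hat{\beta} &= (X^TX)^{-1}X^TY \\
\mathrm{var}(\hat{\beta}) &= (X^TX)^{-1}X^T\,\mathrm{var}(Y)\,X(X^TX)^{-1}
                           = (X^TX)^{-1}X^TVX(X^TX)^{-1} \\
\text{if } V = \sigma^2 I:\quad
\mathrm{var}(\hat{\beta}) &= \sigma^2 (X^TX)^{-1}X^TX(X^TX)^{-1}
                           = \sigma^2 (X^TX)^{-1}
\end{align*}
```

The last line holds only under independence with constant variance; when V has non-zero off-diagonal entries, the middle term $X^TVX$ does not cancel and the OLS formula no longer gives the right variance.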

  22. Problem!! • Ignoring the correlation, as in the standard errors returned by OLS, can give biased estimates of var($\hat{\beta}$). • This can lead to erroneous confidence intervals and tests. • However, we can still use OLS if we can just repair the estimates of var($\hat{\beta}$). • That means getting a good estimate of V.

  23. Estimating V • First, we will take the case where the measurement times are structured and the same for each person ($n_i = n$), as in the dental example. • For this type of data, one can assume the most general structure for V (unstructured). • The goal is to estimate the individual components of $V_i$ (the $v_{ijk}$) using the residuals of OLS.

  24. Form of the Vi. • In order to estimate V, we will assume that the individual covariance matrices (the $V_i$) are all the same: $V_1 = V_2 = \dots = V_m$. • That is equivalent to saying that $v_{ijk} = v_{jk}$, or that the variances and covariances of equivalent observations are the same for every individual. • That way, we only have to estimate one $V_i$ in order to build V. • Thus, we can write $V_1 = V_2 = \dots = V_m = V_0$.

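  25. The Variance-Covariance of Y
      $$V(Y) = \begin{pmatrix} V_0 & 0 & 0 & \cdots & 0 \\ 0 & V_0 & 0 & \cdots & 0 \\ 0 & 0 & V_0 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & V_0 \end{pmatrix}$$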

  26. Estimating the components of V0. • If we observed the errors, then it would be easy to estimate the vjk. • We don’t observe the errors, but we can still estimate the covariance of any two measurements on the same person using the residuals.

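  27. Estimating V0 using Matrix Formulation • Let r be a matrix of residuals where each row holds the time-ordered residuals for one person:
      $$r = \begin{pmatrix} r_{11} & r_{12} & \cdots & r_{1n} \\ r_{21} & r_{22} & \cdots & r_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ r_{m1} & r_{m2} & \cdots & r_{mn} \end{pmatrix}$$
      • Then, $\hat{V}_0$ is computed from the cross-product matrix $r^Tr$.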
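One way to turn the residual matrix into an estimate of $V_0$ is to average the cross-products over people. The sketch below assumes a divisor of m (the number of people); that divisor is an assumption here, and other divisors (for example, m minus the number of regression parameters) are also used. The function name `estimate_v0` and the residuals are hypothetical.

```python
def estimate_v0(r):
    """Estimate the common n x n covariance matrix as r'r / m.

    r is an m x n list of rows: one row of time-ordered residuals per person.
    Dividing by m is an assumption of this sketch.
    """
    m, n = len(r), len(r[0])
    return [[sum(r[i][j] * r[i][k] for i in range(m)) / m for k in range(n)]
            for j in range(n)]

# Hypothetical residuals: 3 people, 2 measurement times each.
r = [[1.0, 2.0],
     [-1.0, 0.0],
     [0.0, -2.0]]

V0_hat = estimate_v0(r)
for row in V0_hat:
    print(row)
```

The (j, k) entry averages the product of the residuals at times j and k across people, so the diagonal holds the estimated variances and the off-diagonal entries hold the estimated covariances.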

  28. Summary Algorithm • Perform OLS regression to get the estimates $\hat{\beta}$ and the residuals. • Make matrix r. • Estimate $V_0$. • Create the estimate of V. • Use the estimate of V to get an estimate of var($\hat{\beta}$).
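The five steps can be sketched end to end in plain Python for a model with an intercept and age only. Everything specific here is an assumption made for illustration: the distances for three children are invented, $V_0$ is estimated with a divisor of m, and the helper names are not from the slides. Because V is block diagonal with identical blocks and every child has the same design matrix, $X^TVX$ reduces to m times $X_i^T V_0 X_i$.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def inverse_2x2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# Hypothetical balanced data: 3 children, each measured at the same 4 ages.
ages = [8, 10, 12, 14]
distances = [[20.0, 21.0, 23.0, 24.0],
             [22.0, 23.5, 24.0, 26.5],
             [25.0, 26.0, 28.5, 30.0]]
m, n = len(distances), len(ages)

Xi = [[1.0, float(a)] for a in ages]          # design matrix shared by every child
X = [row for _ in range(m) for row in Xi]     # stacked N x 2 design matrix
Y = [[y] for child in distances for y in child]

# Step 1: OLS estimates.
XtX_inv = inverse_2x2(matmul(transpose(X), X))
beta = matmul(XtX_inv, matmul(transpose(X), Y))

# Step 2: residual matrix r, one row per child.
fitted = [row[0] for row in matmul(Xi, beta)]
r = [[y - f for y, f in zip(child, fitted)] for child in distances]

# Step 3: estimate V0 (divisor m is an assumption of this sketch).
V0 = [[sum(r[i][j] * r[i][k] for i in range(m)) / m for k in range(n)]
      for j in range(n)]

# Steps 4-5: X'VX = m * Xi' V0 Xi, then the variance formula (***).
block = matmul(matmul(transpose(Xi), V0), Xi)
middle = [[m * v for v in row] for row in block]
cov_beta = matmul(matmul(XtX_inv, middle), XtX_inv)
se = [cov_beta[j][j] ** 0.5 for j in range(2)]

# For comparison: the usual OLS standard errors, which assume independence.
sigma2 = sum(e * e for row in r for e in row) / (m * n - 2)
naive_se = [(sigma2 * XtX_inv[j][j]) ** 0.5 for j in range(2)]

print("coefficients:", [b[0] for b in beta])
print("corrected SE:", se)
print("naive SE:    ", naive_se)
```

The coefficients are identical under both approaches; only the standard errors change, which mirrors the pattern in the software output on the following slides.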

  29. Applied to Dental Data - Sort of (Splus) • Simple linear model
      lm.dental <- lm(distance ~ age + gender + age * gender, data = dental)
      > summary(lm.dental)
      Call: lm(formula = distance ~ age + gender + age * gender, data = dental)
      Residuals:
          Min     1Q  Median   3Q   Max
       -5.616 -1.322 -0.1682 1.33 5.247
      Coefficients:
                     Value Std. Error  t value Pr(>|t|)
      (Intercept)  17.3727     1.7080  10.1712   0.0000    b0
              age   0.4795     0.1522   3.1515   0.0021    b1
           gender  -1.0321     2.2188  -0.4652   0.6428    b2
       age:gender   0.3048     0.1977   1.5421   0.1261    b3
      Residual standard error: 2.257 on 104 degrees of freedom
      Multiple R-Squared: 0.4227
      F-statistic: 25.39 on 3 and 104 degrees of freedom, the p-value is 2.108e-012
      Correlation of Coefficients:
                 (Intercept)     age  gender
             age     -0.9800
          gender     -0.7698  0.7544
      age:gender      0.7544 -0.7698 -0.9800

  30. Applied to Dental Data - Sort of (Splus) • Correcting for Correlation
      > gee.dental <- gee(distance ~ age + gender + age * gender, id = child, data = dental)
      > summary(gee.dental)
      Coefficients:
                     Estimate Naive S.E.   Naive z Robust S.E.   Robust z
      (Intercept)  17.3727273  1.7080306 10.171204   0.7252063 23.9555672    b0
              age   0.4795455  0.1521635  3.151515   0.0631326  7.5958453    b1
           gender  -1.0321023  2.2187969 -0.465163   1.3777851 -0.7491025    b2
       age:gender   0.3048295  0.1976661  1.542143   0.1168673  2.6083390    b3
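Each z value in the gee output is the estimate divided by the corresponding standard error. The short check below re-derives them from the numbers printed on this slide; the dictionary layout and variable names are just for illustration.

```python
# (estimate, naive S.E., robust S.E.) copied from the gee output above.
rows = {
    "(Intercept)": (17.3727273, 1.7080306, 0.7252063),
    "age":         (0.4795455, 0.1521635, 0.0631326),
    "gender":      (-1.0321023, 2.2187969, 1.3777851),
    "age:gender":  (0.3048295, 0.1976661, 0.1168673),
}

# z = estimate / standard error, once with the naive S.E. and once with the robust S.E.
z = {name: (est / naive, est / robust)
     for name, (est, naive, robust) in rows.items()}

for name, (z_naive, z_robust) in z.items():
    print(f"{name:12s} naive z = {z_naive:9.4f}   robust z = {z_robust:9.4f}")
```

The estimates are the same in both columns; the change in z comes entirely from the standard error, which is why the age:gender term looks unremarkable with the naive S.E. (z about 1.54) and much stronger with the robust one (z about 2.61).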

  31. Applied to Dental Data - Sort of (STATA) • Simple linear model
      . gen inter = age*gender
      . fit distance age gender inter

            Source |       SS       df       MS              Number of obs =     108
      -------------+------------------------------           F(  3,   104) =   25.39
             Model |  387.935027     3  129.311676           Prob > F      =  0.0000
          Residual |  529.757102   104  5.09381829           R-squared     =  0.4227
      -------------+------------------------------           Adj R-squared =  0.4061
             Total |   917.69213   107  8.57656196           Root MSE      =  2.2569

      ------------------------------------------------------------------------------
          distance |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
               age |   .4795455   .1521635     3.15   0.002     .1777996    .7812913
            gender |  -1.032102   2.218797    -0.47   0.643     -5.43206    3.367855
             inter |   .3048295   .1976661     1.54   0.126    -.0871498    .6968089
             _cons |   17.37273   1.708031    10.17   0.000     13.98564    20.75982
      ------------------------------------------------------------------------------
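The Residual MS and Root MSE in this table are the OLS estimates of the error variance and its square root: the residual sum of squares divided by its degrees of freedom. A small check using only numbers from the table (variable names are illustrative):

```python
import math

residual_ss = 529.757102   # "Residual" SS from the table above
n_obs = 108                # Number of obs
n_coef = 4                 # age, gender, inter, _cons

df = n_obs - n_coef                 # residual degrees of freedom
sigma2_hat = residual_ss / df       # estimated error variance (Residual MS)
root_mse = math.sqrt(sigma2_hat)    # estimated error standard deviation (Root MSE)

print(df, round(sigma2_hat, 6), round(root_mse, 4))
```

The divisor is the number of observations minus the number of fitted coefficients rather than the number of observations itself.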

  32. Applied to Dental Data - Sort of (STATA) • Correcting for Correlation
      . xtgee distance age gender inter, i(child) robust

      GEE population-averaged model                   Number of obs      =       108
      Group variable:                  child          Number of groups   =        27
      Link:                         identity          Obs per group: min =         4
      Family:                       Gaussian                         avg =       4.0
      Correlation:              exchangeable                         max =         4
                                                      Wald chi2(3)       =    148.77
      Scale parameter:              4.905158          Prob > chi2        =    0.0000

                                  (standard errors adjusted for clustering on child)
      ------------------------------------------------------------------------------
                   |             Semi-robust
          distance |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
               age |   .4795455   .0643352     7.45   0.000     .3534507    .6056402
            gender |  -1.032102   1.404031    -0.74   0.462    -3.783952    1.719748
             inter |   .3048295   .1190935     2.56   0.010     .0714105    .5382486
             _cons |   17.37273    .739021    23.51   0.000     15.92427    18.82118
      ------------------------------------------------------------------------------
