Multilevel Data in Outcomes Research • Types of multilevel data common in outcomes research • Random versus fixed effects • Statistical Model Choices • “Shrinkage Estimates” versus Fixed Effects • Example of CA State CABG data
What are multilevel data? • Gathering individual observations into larger groups does not create clustered data • Individual observations from a simple, random sample are never multilevel • Multilevels are a result of sampling/design • Usually from stages/levels in obtaining the individual units of observation • Repeated measures is a type of multilevel data
Other Names for Multilevel Data • Hierarchical models • Clustered data (but different from cluster analysis) • Components of Variance models • Contextual Models • Micro and macro level data
Multilevel Data in Outcomes Research • Two levels: • Hospitals and patients • Physicians and patients • Three levels: • Hospitals, physicians, and patients • Physicians, patients, and repeated measures • Four levels: • National Health Interview Survey
National Health Interview Survey • Highest level: Select Primary Sampling Units (PSUs: MSAs, counties, groups of counties) • Next level: Stratify PSUs by Census blocks and select Secondary Sampling Units (SSUs: clusters of households) • Next level: Select households within SSUs • Lowest level: Interview individuals in the households (all members in some households, a sample in others)
Characteristics of Multilevel Data • Measurements within a level are correlated (eg, measurements on the same person are more alike than measurements across persons) • Variables can be measured at each level • Standard statistical models and tests, which assume independent observations, are incorrect • The variance of the outcome can be attributed to each level
Two Parts of Multilevel Data Variance • Outcome = Patient Satisfaction Score • Level 2: Physicians: MD1 (mean = 81), MD2 (mean = 58), MD3 (mean = 74) • Level 1: Patients: MD1: 81, 85, 77; MD2: 55, 61; MD3: 68, 74, 75, 79 • Variance in the patient score divides into two parts: (1) the variance between physicians = $\sigma_B^2$ (2) the variance within the physicians = $\sigma_W^2$ • So the total variance = $\sigma_B^2 + \sigma_W^2$
Intraclass Correlation Coefficient (ICC) The intraclass correlation coefficient (ICC) is a measure of the correlation among the individual observations within the clusters. It is calculated as the ratio of the between-cluster variance to the total variance: $\mathrm{ICC} = \sigma_B^2 / (\sigma_B^2 + \sigma_W^2)$
Intraclass Correlation Coefficient (ICC) MD1 (mean = 81): 81, 81, 81; MD2 (mean = 58): 58, 58; MD3 (mean = 74): 74, 74, 74, 74. Take the extreme case where each MD’s patients all have the same score, so there is no variance within the physicians. Then $\mathrm{ICC} = \sigma_B^2 / (\sigma_B^2 + \sigma_W^2) = \sigma_B^2 / (\sigma_B^2 + 0) = 1$, a perfect correlation within the clusters.
Intraclass Correlation Coefficient (ICC) MD1 (mean = 71): 81, 61, 71; MD2 (mean = 68): 58, 78; MD3 (mean = 74): 54, 94, 84, 64. A different case, where each MD’s patients have very different scores, so most of the variance is within the physicians (ie, between patients, not between physicians). The ICC is close to 0.
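The three ICC examples can be checked numerically. The slides do not say how the variance components are estimated, so the sketch below assumes the standard one-way ANOVA (method of moments) estimator with a correction for unequal cluster sizes; the function name `anova_icc` is illustrative only. Patients are assigned to each MD by matching the scores to the MD means given on the slides.

```python
def anova_icc(groups):
    """Estimate the ICC from {cluster: [scores]} with a one-way ANOVA estimator.

    Assumption: method-of-moments variance components; a negative
    between-cluster estimate is truncated at zero.
    """
    k = len(groups)
    sizes = [len(v) for v in groups.values()]
    n_total = sum(sizes)
    grand_mean = sum(sum(v) for v in groups.values()) / n_total

    # Sums of squares between clusters and within clusters.
    ss_between = sum(len(v) * (sum(v) / len(v) - grand_mean) ** 2
                     for v in groups.values())
    ss_within = sum((x - sum(v) / len(v)) ** 2
                    for v in groups.values() for x in v)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)

    # Effective cluster size when clusters have unequal numbers of patients.
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (k - 1)

    var_within = ms_within
    var_between = max(0.0, (ms_between - ms_within) / n0)
    return var_between / (var_between + var_within)


# Patient satisfaction scores from the three slides, grouped by physician.
first_example = {"MD1": [81, 85, 77], "MD2": [55, 61], "MD3": [68, 74, 75, 79]}
extreme_case = {"MD1": [81, 81, 81], "MD2": [58, 58], "MD3": [74, 74, 74, 74]}
noisy_case = {"MD1": [81, 61, 71], "MD2": [58, 78], "MD3": [54, 94, 84, 64]}

icc_first = anova_icc(first_example)    # about 0.85
icc_extreme = anova_icc(extreme_case)   # 1.0
icc_noisy = anova_icc(noisy_case)       # 0.0 (between-MD estimate truncated at zero)
```

With only three physicians these estimates are very imprecise; the point is the contrast between the cases, not the exact values.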
Implications of ICC for Analysis • When the ICC is close to 0, most of the variation is explained by patient level measures • Less difference between results from ordinary regression and multilevel models • May be less important to use a statistical model that allows variables for physician characteristics
Implications of ICC for Analysis • When the ICC is close to 1, most of the variation is explained by physician level measures • Using a statistical model that removes physician effects leaves little variation to explain • Important to use a statistical model that allows variables for physician characteristics
Methods of Analyzing Multilevel Data • Regression model ignoring higher level variables • Regression model with an indicator variable for each level 2 unit (minus one) • Conditional regression model • Regression model with generalized estimating equations (GEE model) • Random or mixed effects regression model
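The "minus one" in the indicator-variable approach is easiest to see in code: one level 2 unit serves as the reference category, and every other unit gets its own 0/1 column, which avoids collinearity with the regression intercept. The sketch below is illustrative only; the helper name `indicator_columns` and the choice of the first sorted unit as the reference are assumptions, not part of the slides.

```python
def indicator_columns(unit_ids):
    """Build 0/1 indicator columns for level 2 units, leaving one out.

    Assumption: the first unit in sorted order is the reference category.
    Returns (reference, other_units, rows), with one row per observation.
    """
    levels = sorted(set(unit_ids))
    reference, others = levels[0], levels[1:]
    rows = [[1 if uid == level else 0 for level in others] for uid in unit_ids]
    return reference, others, rows


# Four patients seen by three physicians (hypothetical assignment).
reference, others, rows = indicator_columns(["MD1", "MD2", "MD3", "MD1"])
# reference -> "MD1"
# others    -> ["MD2", "MD3"]
# rows      -> [[0, 0], [1, 0], [0, 1], [0, 0]]
```

Patients of the reference physician have all indicators equal to 0, so each coefficient is interpreted as a difference from that physician.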
Choice of Analysis Model: Three Main Considerations • What is the research question? • How many observations are there at each level of the data? • How important is controlling unmeasured confounding at the higher level?
Fixed versus Random Effects • Effects are random when the units are a sample of a larger population • have variation because sampled; another sample would give different data • Effects are fixed if they represent all possible members of a population: • eg, male/female; treatment groups; all the regions of the U.S.
Fixed versus Random Effects • Effects treated as fixed or random depending on the research question • Random effects: generalize from the sample to a larger population • Random effects: reduce variation due to small sample size by fitting a distribution • Fixed effects: Control for unmeasured confounding at the higher level
Methods of Analyzing Multilevel Data Fixed Effects Models: - Regression model with an indicator variable for each level 2 unit (minus one) - Conditional regression model Random Effects Models: - Regression model with generalized estimating equations (GEE) - Random or mixed effects regression model
What are “shrinkage estimates”? • Also called Bayesian or empirical Bayes estimates (Iezzoni text) or best linear unbiased prediction (BLUP) estimates (SAS) • Can only be obtained from a random effects (not GEE) regression model • Variance of the higher level variable is modeled as if from a specified distribution (usually normal, but others are possible)
A Simple Random Effects Model • A simple random effects model is: $y_{ij} = \mu + \alpha_j + e_{ij}$, where $\mu$ = overall mean, $\alpha_j$ = difference for MD $j$, and $e_{ij}$ = individual error • Model says there is random variation from the mean score at the level of MDs plus variation at the level of patients • Bayesian estimates are the individual $\alpha_j$’s obtained from the overall distribution
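Under the additional assumption that the MD effects and the patient errors are both normally distributed with known variances, the Bayesian estimate of an MD's difference from the overall mean has a simple closed form. The notation here is assumed for illustration: $\mu$ is the overall mean, $\alpha_j$ the difference for MD $j$, $\bar{y}_j$ the raw mean score of that MD's $n_j$ patients, and $\sigma_B^2$ and $\sigma_W^2$ the between-MD and within-MD variances.

$$\hat{\alpha}_j = \lambda_j\,(\bar{y}_j - \mu), \qquad \lambda_j = \frac{\sigma_B^2}{\sigma_B^2 + \sigma_W^2 / n_j}$$

When both variances are positive, $0 < \lambda_j < 1$, so the estimated MD mean $\mu + \hat{\alpha}_j$ always lies between the overall mean and the raw mean $\bar{y}_j$. The weight $\lambda_j$ approaches 1 as $n_j$ grows, so MDs with many patients are shrunk very little. In practice the variances are estimated from the data, which is why results can differ somewhat by fitting method.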
Example of Shrinkage Estimates • In Patient Outcomes Research Team study of patient satisfaction with MD treatment for diabetes, raw mean patient scores by MD ranged from 53.4 to 87.1 • The random effects shrinkage estimates of the mean patient scores by MD ranged from 60.4 to 78.6 • Random effects shrinkage estimates are closer to the overall mean
Controversy in Outcomes Research • Report Cards rank hospitals or physicians • Data used has at least two levels (hospitals or physicians and their patients) • Controversy is over the choice of statistical model for evaluating variation at the hospital or physician level
Methods of Analyzing Hospital (or MD) Mortality Variance • Ignore hospital, run ordinary regression then predict average for each hospital • Remove hospital effect with indicator variables for hospitals (fixed effects model) then predict average for each hospital • Run random effects regression and obtain the Bayesian/shrinkage estimates for each hospital
Shrinkage estimates and CA State CABG Data • The unadjusted estimate for each hospital is treated as a draw from a normal distribution • More weight is given to hospitals with more CABG patients • Hospitals with smaller numbers of patients move closer to the overall mean when the normal distribution is fitted • Estimates are somewhat software dependent
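How hospital volume drives the amount of shrinkage can be sketched with a few lines of arithmetic. Everything numeric below is hypothetical and is not taken from the CA State CABG data: the overall mortality rate, the two variance components, and the two hospitals. The sketch also assumes a normal random-intercept model applied directly to mortality percentages, which is a simplification for a binary outcome.

```python
def shrink(raw_mean, n, overall_mean, var_between, var_within):
    """Shrink one hospital's raw mean toward the overall mean.

    Assumption: normal random-intercept model with known variance components.
    Returns (shrunken_mean, weight); weight is the share given to the raw mean.
    """
    weight = var_between / (var_between + var_within / n)
    return overall_mean + weight * (raw_mean - overall_mean), weight


# Hypothetical values, in percentage points of mortality.
OVERALL = 3.0        # overall CABG mortality rate (%)
VAR_BETWEEN = 1.0    # between-hospital variance
VAR_WITHIN = 291.0   # within-hospital (patient level) variance

# Two hypothetical hospitals with the same raw rate but different volumes.
small_shrunk, small_weight = shrink(6.0, 50, OVERALL, VAR_BETWEEN, VAR_WITHIN)
large_shrunk, large_weight = shrink(6.0, 1000, OVERALL, VAR_BETWEEN, VAR_WITHIN)
# small hospital: weight about 0.15, shrunken rate about 3.44
# large hospital: weight about 0.77, shrunken rate about 5.32
```

Both hospitals start at a raw rate of 6%, but the low-volume hospital is pulled most of the way to the overall mean while the high-volume hospital keeps most of its raw estimate.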
Shrinkage Estimates: Software • Obtaining shrinkage estimates involves some software choices • Not all software provides them (STATA by itself doesn’t) • Different likelihood methods of fitting models • STATA add-on GLLAMM (free download) • SAS: for linear outcomes, PROC MIXED; for non-linear outcomes, PROC NLMIXED and PROC GLIMMIX • Some other software for multilevel data