Death and Missing Data in Longitudinal Studies: Quality of Life at the End of Life

Paula Diehr • Maximising return from cohort studies: prevention of attrition and efficient analysis • London • 6-25-2006

Charge • “The use of imputation to deal with attrition in cohort studies” • I will concentrate primarily on what to do about death in longitudinal studies • In my cohorts of older or sicker adults more than half the missing values are missing due to death • Taking care of the deaths first often helps deal with the other missing data

My MO • First step: create a meaningful graph • Organize the data • A place for every observation that could have been made (if the person hadn’t died) • Do something about the deaths • assign a valid value • Impute the (remaining) missing data • Graph • Analyze

Outline • ADHC example (very simple) • C3 example (more issues) • Death • Organization • Missing data • Analysis

Example 1: ADHC Diehr and Johnson. Accounting for missing data in end-of-life research. Palliative Care 2005; 8:S50-S57.

Example: ADHC • Adult Day Health Care study • RCT (ADHC vs Usual Care) • 939 Frail Veterans • At risk of nursing home placement • 1 year study: data at 0, 6, 12 months • Findings: ADHC expensive, ineffective • Frail veterans didn’t fail • Why?

Health Variable • Utility (sort-of) • 0 to 100 • 100 is perfect health • (0 is dead, but will let dead be missing at first)

Accounting • 939 persons • 3*939=2817 observations if complete • 502 observations were missing • 302 missing because of death • 200 missing for other reasons • 60% of missing were due to death
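The accounting above can be checked with a few lines of arithmetic. This is only a sketch; the variable names are illustrative and the counts are the ones on the slide.

```python
# Counts taken from the Accounting slide
persons = 939
waves = 3            # data at 0, 6 and 12 months
missing_total = 502
missing_death = 302

possible = persons * waves                      # observations if complete
missing_other = missing_total - missing_death   # missing for other reasons
share_death = missing_death / missing_total     # fraction of missing due to death

print(possible, missing_other, round(100 * share_death))
# prints: 2817 200 60
```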

In ADHC Example: • Complete case data too optimistic – significant improvement (65% complete) • Available data even more optimistic • Accounting for the deaths showed significant decline (84% complete) • Imputing remaining missing values showed significant decline (100% complete) (ITT)

Example 2: C3 Study Complementary Comfort Care Bill Lafferty, P.I. NCI

Study Design • RCT • Effect of massage or meditation on QOL and Sx in patients at the end of life • QOL and Sx assessed ~ every week until death • In progress • 3 years of data collection • First 100 cases (DSMB ok)

Outcome Variables • Quality of Life (QOL) • Symptoms (SX) • Health Rating (Hlthrat)

QOL (pqol) How would you rate your overall quality of life during the past 7 days? 0 is NO QUALITY OF LIFE to 10 is PERFECT QUALITY OF LIFE • Note: if 0 had been “dead”, this would be a “preference-rated / utility / rating scale” variable and dead would have the value zero. Missed opportunity.

Health rating (Hlthrat) • 0=worst possible health you can imagine and still be alive • 10 = as near perfect health as you can imagine • Baseline only

2-Death Everyone is expected to die in C3.

Approaches to Handle Death • Ignore • Set death to a “low” value, perform sensitivity analysis to see if final results change (arbitrary) • Impute the values after death as if person was still alive (immortal cohort) • Joint modeling of survival and health • Health conditional on being alive • Transformation approach

Transformation Approach • Transform the outcome variable that has no value for death to another variable that does have a natural value for death. • Dichotomize, assign deaths to “low” category. • Transform to a probability • Probability of being healthy • Dead have probability 0

Probability Transformations • Probability (QOL > 7 now | QOL now) • Dichotomize (good QOL > 7 or bad QOL <7 now) • Probability (QOL > 7 next week | QOL now) • Probability (Hlthrat > 7 now | QOL now) • Diehr et al, J Clin Epidemiology, 2005

Ordinal • OK if dead is worst QOL • State worse than death • OK if nonparametric analysis (ordinal) • Mean is meaningless • Without deaths? • With deaths • Mean Difference or change or AUC is meaningless

Dichotomize to Good QOL yes/no • Dead = 0 • OK if death is not good QOL • Mean interpretable, any analysis OK • AUC=weeks with good QOL • Change meaningful • Loses information? • Bad cutpoint? • Assume death is bad QOL
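A minimal sketch of the dichotomized variable and its AUC, assuming weekly values are held in a list, missing weeks are None, and weeks after death carry a hypothetical "dead" marker. The cutpoint follows the "QOL > 7" wording, so a score of exactly 7 is treated here as not good; that choice is an assumption.

```python
DEAD = "dead"  # hypothetical marker for weeks after death

def good_qol(value, cutpoint=7):
    """Return 1 for good QOL, 0 for bad QOL or dead, None if missing."""
    if value is None:
        return None          # missing stays missing at this step
    if value == DEAD:
        return 0             # death is assigned to the "not good" category
    return 1 if value > cutpoint else 0

def weeks_with_good_qol(series, cutpoint=7):
    """AUC of the 0/1 variable = number of observed weeks with good QOL."""
    flags = [good_qol(v, cutpoint) for v in series]
    return sum(f for f in flags if f is not None)

# Illustrative values only, not study data
example = [8, 9, 6, None, DEAD, DEAD]
print([good_qol(v) for v in example], weeks_with_good_qol(example))
```

Missing weeks are skipped in the sum here; after imputation every bin would contribute.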

Pr (Good QOL 1 week later|QOL now) • Estimated from transition pairs • Dead have 0 probability of high QOL 1 week later • Mean interpretable, any analysis OK • AUC = # good QOL weeks starting 1 week after b/l • change, difference • Assume death is part of the QOL construct (dead people have bad QOL). Probably ok.
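One way to estimate these probabilities from transition pairs is simple counting, sketched below. The function name, the "dead" marker and the example series are hypothetical; the slides do not say how the estimates were actually produced (a regression on the pairs would be another option).

```python
from collections import defaultdict

DEAD = "dead"  # hypothetical marker for a week after death

def transition_probs(series_list, cutpoint=7):
    """Estimate Pr(QOL > cutpoint next week | QOL now) from adjacent pairs."""
    counts = defaultdict(lambda: [0, 0])   # qol_now -> [good next week, pairs]
    for series in series_list:
        for now, nxt in zip(series, series[1:]):
            if now is None or nxt is None or now == DEAD:
                continue                   # need an observed, living "now"
            good_next = nxt != DEAD and nxt > cutpoint
            counts[now][0] += int(good_next)
            counts[now][1] += 1
    return {q: good / n for q, (good, n) in counts.items()}

# Illustrative series only, not study data
probs = transition_probs([[8, 8, 6], [8, DEAD], [6, 8, None]])
print(probs)
```

Death in the following week counts as "not good", which is how the dead end up with probability 0.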

QOLt = Pr (Good health now |QOL now) • Dead have 0 probability of being healthy now. • Mean interpretable, any analysis OK • AUC = Healthy weeks starting at B/L • change, difference OK • Assume death part of the health construct. (Dead people not healthy). This seems obvious • Dead vs. 0

Transformation modifies relative spacing • QOL, all distances are the same • 10-9 = 1 • 2-1 = 1 • QOLt different • 75-66=9 • 8-5 = 3 • Break between 6 and 7=1, 100, 20, 10 • Use QOLt for this analysis

Transform to prob(healthy) • “Healthy” = Hlthrat score of 7 or more • Logit(healthy0) = -3.323 + .442* QOL0 • QOL = original coding • QOLt = transformed to Prob(healthy) • QOLtd = QOLt with deaths set to zero • QOLtdi = QOLtd with missing imputed
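The transformation can be sketched directly from the fitted logit. The coefficients are the ones on the slide; expressing the probability on a 0 to 100 scale is an assumption, made because it appears to reproduce the QOLt values quoted on the spacing slide (75, 66, 8, 5).

```python
import math

INTERCEPT = -3.323   # from the slide
SLOPE = 0.442        # from the slide

def qolt(qol):
    """Pr(healthy | QOL), scaled to 0-100 (the scaling is an assumption)."""
    logit = INTERCEPT + SLOPE * qol
    return 100.0 / (1.0 + math.exp(-logit))

def qoltd(qol, dead):
    """QOLt with deaths set to zero."""
    return 0.0 if dead else qolt(qol)

for q in (1, 2, 9, 10):
    print(q, round(qolt(q)))
# prints: 1 5, 2 8, 9 66, 10 75 (one pair per line)
```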

SX • Memorial Symptom Assessment Scale (MSAS) • In the past week did you have: • Difficulty concentrating, Pain, Lack of energy, Cough, Changes in skin, Dry mouth, Nausea, Feeling drowsy, Numbness/tingling in hands and feet, Difficulty sleeping, Feeling bloated, Problems with urination, Vomiting, Shortness of breath, Diarrhea, sweats, mouth sores, problems with sexual interest, itching, lack of appetite, dizziness, difficulty swallowing, change in the way food tastes, weight loss, hair loss, constipation, swelling of arms or legs, “I don’t look like myself”, other (!) • Feeling sad, worrying, feeling irritable, feeling nervous

Sx Scoring (MSAS) • First 22: • 0 did not occur, • 0.8 occurred but did not bother me at all, • 1.6 a little bit, • 2.4 somewhat, • 3.2 a lot, • 4.0 bothered me very much • Last 4: • 0 did not occur, • 1 occurred rarely, • 2 occasionally, • 3 frequently, • 4 almost constantly • Total score is average value (high is bad, 4 is max) • “Continuous”, low value is good

Transform SX to SXt • Transformation can be done for continuous variables

3-Organization

Longitudinal Data-- Ideal • Rectangular File • Spread sheet • A QOL value in every cell • ADHC • 939 rows (1 row for each person) • 3 columns (0, 6, 12 months) • C3 • 300 rows (1 row for each person) • 3*52 = 156 columns, (1 column for each week)

ADHC was not ideal • We set dead to zero • We imputed the missing • Complete 3 x 937 array

C3 not ideal • Deaths • Missing data • Unscheduled weeks • Recruited over time • persons will have unequal number of weeks • Each person has a different schedule • When did the missing interviews “not happen”?

Tidy Dataset • Person’s potential f/u = weeks from enrollment to end of data collection • Bin (cell, column) for each week of potential f/u • First enrollee will have 52*3 bins • Enrollee 2.5 years later will have 52/2=26 bins • Deaths: Set value in bins from death to the end of this person’s potential follow-up to zero

Person 34 • 50-year-old man • Referred from Hospice • Dying of cancer, frequent severe pain • QOLbase = 10 • SXbase = .75 • Lived 135 days (19 weeks) • Potential f/u 463 days (66 weeks) • (from his enrollment to end of data collection) • 328 days dead (47 weeks)
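A sketch of the tidy layout for one person, using Person 34's day counts. The function and its arguments are hypothetical, and converting days to whole weeks by integer division is an assumption; it happens to match the week counts on the slide.

```python
def weekly_bins(potential_days, survival_days=None, observed=None):
    """One bin per week of potential follow-up.

    observed: optional {week_index: value} of interviews that happened.
    Bins from the week of death onward are set to 0; other empty bins
    stay None (missing) until they are imputed.
    """
    n_bins = potential_days // 7
    bins = [None] * n_bins
    for week, value in (observed or {}).items():
        if 0 <= week < n_bins:
            bins[week] = value
    if survival_days is not None:
        death_week = survival_days // 7
        for week in range(death_week, n_bins):
            bins[week] = 0   # dead: probability of being healthy is 0
    return bins

# Person 34: lived 135 days, potential follow-up 463 days
bins = weekly_bins(potential_days=463, survival_days=135)
print(len(bins), bins.count(0))
# prints: 66 47
```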

Person 34 QOL (original coding)

Person 34 QOLt (transformed)

Person 34 QOLtd (set dead to zero)

4-Missing data and imputation

Influence of the deaths • Complete case analysis gives no weight to deaths • Transforming and setting deaths to 0 may give too much weight to deaths, because after death a person has no missing data • May need to impute other missing data as well • Can remove later as sensitivity analysis • Only during potential follow-up

Missing • All methods are based on untestable assumptions • Multiple imputation for cross-sectional missing • Software • Longitudinal, jury’s still out • No software • C3 data surely not MAR • (unless accounting for death makes them MAR?) • Gain some intuition

CHS Subjects who return from being missing • Y0 Y1 _ _ (Y4) _ Y6 Y7 • Y4 is “like” a missing value • 10 times as likely to be missing as Y1 or Y7 • This person had other missing data • Like healthier subset of missing? • Impute Y4 in various simple ways • Compare observed to imputed value of Y4 • Engels and Diehr. Journal of Clinical Epidemiology 2003; 56:968-976.

Findings • Most imputed values were biased too healthy • Best were: (before+after)/2, LOCF, NOCB, regression on baseline data • Most imputed values were under-dispersed • Best were: NOCB, LOCF • Conclusion: use the person’s own longitudinal data to impute missing data
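Three of the simple methods named above (LOCF, NOCB, and the mean of the values before and after) can be sketched as follows. The series mimics the Y0 Y1 _ _ (Y4) _ Y6 Y7 pattern with Y4 masked; the numbers are illustrative only, not CHS data.

```python
def locf(series, i):
    """Last observation carried forward: nearest observed value before i."""
    for j in range(i - 1, -1, -1):
        if series[j] is not None:
            return series[j]
    return None

def nocb(series, i):
    """Next observation carried backward: nearest observed value after i."""
    for j in range(i + 1, len(series)):
        if series[j] is not None:
            return series[j]
    return None

def before_after_mean(series, i):
    """(before + after) / 2, when both neighbours exist."""
    before, after = locf(series, i), nocb(series, i)
    if before is None or after is None:
        return None
    return (before + after) / 2

# Y0 Y1 _ _ (Y4 masked) _ Y6 Y7, hypothetical values
series = [80, 70, None, None, None, None, 50, 40]
print(locf(series, 4), nocb(series, 4), before_after_mean(series, 4))
# prints: 70 50 60.0
```

Comparing each imputed value with the observed Y4 that was masked is the check described on the CHS slide.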

Imputation of Missing • Everyone has a favorite method • I prefer imputation by a simple method, using the person’s own longitudinal data • Knowing person died helps • Scatterplot of QOLtd by several f(time) for each person who died • Log of “time until death” looked the best for all subjects.

Imputation of Missing Data (weeks with no entry) • Separate regression for each person. • Set QOLtdi = a + b* ln(days before death) if QOLtd is missing • Other approaches • Modeling • Multiple imputation
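A minimal sketch of the per-person regression, assuming ordinary least squares on the person's own observed weeks before death. The helper names are hypothetical; the slides do not say what was done for people with fewer than two observations or how out-of-range predictions were handled, so this sketch simply raises an error in the first case and does no clamping.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b

def impute_person(days_before_death, qoltd):
    """Fill missing (None) weeks with a + b * ln(days before death).

    Both lists cover only the weeks before death, so every entry of
    days_before_death is positive; weeks after death are already 0.
    """
    observed = [(math.log(d), y)
                for d, y in zip(days_before_death, qoltd) if y is not None]
    if len({x for x, _ in observed}) < 2:
        raise ValueError("need at least two observed weeks to fit a line")
    a, b = fit_line([x for x, _ in observed], [y for _, y in observed])
    return [y if y is not None else a + b * math.log(d)
            for d, y in zip(days_before_death, qoltd)]

# Hypothetical person: five weekly bins before death, one missing
days = [35, 28, 21, 14, 7]
values = [60.0, 55.0, None, 40.0, 25.0]
print(impute_person(days, values))
```

Observed values are left untouched; only the empty bins receive a fitted value.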

Person 34 QOLtdi (impute missing)

Different N Interpretation
