Survival Analysis; Evidence-Based Medicine M2 Medical Epidemiology
Survival Rates and Censoring
• Problems with naïve analyses
• Ubiquity of censored data
• Methods of adjustment for censoring
• Survival curves and their interpretation
Problems with Naïve Survival Analyses • The table below was obtained several decades ago by averaging ages at death from death certificates of physicians who had practiced in different specialties. • The investigators concluded that there was a dose-response effect associating specialties with higher exposure to radiation with shorter life expectancy. • Why was their conclusion nonsense?
Problems with Naïve Survival Analyses The following headlines, the second of which is from the Champaign-Urbana News-Gazette, feature a researcher who has repeatedly reached very surprising conclusions about smoking and mortality.
Problems with Naïve Survival Analyses
These include:
• Women are at greater risk of smoking-related death than men.
• Filter cigarettes are more dangerous than unfiltered cigarettes.
Several years ago other investigators, using a similarly designed study that received even more publicity, concluded that left-handers have a life expectancy 9 years shorter than that of right-handers. Numerous “plausible” explanations were advanced for this finding. These conclusions were reached by the same method used by the researchers above. Other similar studies have erroneously concluded that effective medical innovations have no impact. Can you explain all this silliness by a single flaw in the design of the research?
Death certificate studies such as those above can be grossly biased, because they don't start with a defined cohort. By selecting into the study population only those people who have already died, they systematically exclude the people who live longest. If the age distributions of two groups differ to start with, the resulting comparisons can lead to ridiculous scientific conclusions due to selection bias. Problems with Naïve Survival Analyses
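The selection effect can be made concrete with a small sketch. All numbers here are hypothetical: two specialties are given identical lifespans and differ only in when their members were born, yet averaging ages from death certificates makes the younger specialty look short-lived.

```python
# Hypothetical illustration: two specialties with IDENTICAL lifespans,
# differing only in the birth year of their members.
LIFESPANS = [50, 60, 70, 80, 90]  # assumed ages at death, equally likely
STUDY_YEAR = 2000                 # assumed year the death certificates are collected


def mean_age_on_death_certificates(birth_year):
    """Average age at death among members who have already died by STUDY_YEAR."""
    ages = [age for age in LIFESPANS if birth_year + age <= STUDY_YEAR]
    return sum(ages) / len(ages)


older_specialty = mean_age_on_death_certificates(1900)    # every member has died
younger_specialty = mean_age_on_death_certificates(1940)  # only the early deaths are seen

print(older_specialty, younger_specialty)
```

The true life expectancy is 70 years in both groups; the apparent 15-year gap comes entirely from excluding members of the younger specialty who are still alive.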
Problems with Naïve Survival Analyses
Thus, as a rule, only studies that begin with defined cohorts are valid. But cohorts must usually be assembled over a period of time, since people don’t all get sick at once. Members may drop out of the study or become lost to follow-up for numerous reasons, and some suffer the outcome of interest earlier than others. Hence, some members of the cohort may be observed for much shorter periods than others. Methods that don’t take these differences in observation periods into account are subject to substantial measurement bias, because the process of monitoring for the outcome differs between individuals.
One way to take observation times into account is to calculate incidence density rates using person-years. This method works well when incidence density is stable over the period of study. However, usually this assumption is false. In the study of long-term survival, we all know that the incidence density of death (the mortality rate) rises with age, which increases over time. Studies of surgical outcomes must deal with perioperative mortality, which is often much higher than later mortality. Mortality rates from cancer change substantially after treatment; 5-year survival for some cancers is regarded as cure. The rate of complications of certain diseases, such as diabetes, increases greatly with duration of the disease. For these situations, calculating overall incidence density rates using person-years pools information from different times in an inappropriate way. Other methods are necessary. Problems with Naïve Survival Analyses
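As a rough sketch of why pooling person-years can mislead, consider a surgical cohort with high perioperative mortality. The death counts and person-years below are invented for illustration only.

```python
# Hypothetical surgical cohort: (period name, deaths, person-years observed).
periods = [
    ("perioperative", 10, 10.0),
    ("later follow-up", 9, 441.0),
]

# Incidence density within each period: deaths per person-year.
period_rates = {name: deaths / person_years for name, deaths, person_years in periods}

# Pooled incidence density: all deaths over all person-years.
total_deaths = sum(deaths for _, deaths, _ in periods)
total_person_years = sum(person_years for _, _, person_years in periods)
pooled_rate = total_deaths / total_person_years

for name, rate in period_rates.items():
    print(f"{name}: {rate:.3f} deaths per person-year")
print(f"pooled: {pooled_rate:.3f} deaths per person-year")
```

The pooled rate (about 0.042) describes neither the perioperative period (1.0) nor later follow-up (about 0.020), which is the inappropriate pooling across time that the slide warns about.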
Ubiquity of Censored Data
The term “survival data” is used in medicine and public health to describe “time to event” data, where the event is any occurrence of importance to health. Time to death, renal failure, second myocardial infarction, first asthma attack after a change in therapy, or recovery are all “survival data” in this technical sense. (CAUTION) Survival data in medicine and public health are also usually “censored data,” because for some subjects we know only that they have survived at least a certain period of time, but we don’t know when death or another outcome will occur. We stop most clinical trials or vaccine field trials before most subjects die or experience an unfavorable outcome. Otherwise, it would take too long to get an answer and researchers couldn’t get tenure.
Methods of adjustment for censoring
To interpret censored survival data well, we must
• avoid pooling information from different subjects inappropriately
• avoid pooling information from different times inappropriately
• take censoring into account without introducing bias into the analysis.
We use one of two methods:
• actuarial (Cutler-Ederer)
• product-limit (Kaplan-Meier)
Both stem from the same fundamental approach: the fundamental equation of survival analysis.
Probability of surviving 2 periods (for example, 2 years)
• Probability of surviving the 1st year (e.g. 80%)
• Probability of surviving the 2nd year (not 2 years, only the 2nd year), i.e. of those alive after 1 year, the probability of surviving the 2nd year (e.g. 70%)
• Then the probability of surviving 2 years is 80% × 70% = 56%
• Can we just divide the number surviving 2 years by the starting number? NO, because patients who were censored before 2 years would be counted as if they had died.
So all we need is:
• The percent surviving each time period.
• We get that by calculating the percent dying during each time period.
• Example: we start with 90 patients. During the first year 20 withdraw and 16 die.
• Is the probability of dying during the 1st year 16 divided by 90, or 16 divided by 70?
• Halfway between: 16 divided by 80.
Actuarial method
• Number dying during the period divided by (number alive at the beginning of the period minus half of the number withdrawn).
• 16/(90 − 20/2) = 16/80 = 20%, so 80% survive.
2nd year
• We start with 90 − 20 − 16 = 54.
• During the 2nd year 8 are lost to follow-up and 15 die.
• Probability of dying in the 2nd year is 15/(54 − 8/2) = 15/50 = 30%, so 70% survive the 2nd year.
• So the probability of surviving 2 years is 80% × 70% = 56%.
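The arithmetic of this two-year example can be checked with a short sketch. It uses only the counts from the slides; the variable names are illustrative. It also shows how far off the naïve "survivors divided by starting number" estimate would be.

```python
# Counts from the two-year worked example.
start, withdrawn_y1, died_y1 = 90, 20, 16
lost_y2, died_y2 = 8, 15

# Three candidate first-year death probabilities.
q_full_denominator = died_y1 / start                     # 16/90: withdrawals counted as at risk all year
q_reduced_denominator = died_y1 / (start - withdrawn_y1) # 16/70: withdrawals counted as never at risk
q_actuarial = died_y1 / (start - withdrawn_y1 / 2)       # 16/80: withdrawals at risk for half the year

# Second year, conditional on being alive and under observation at its start.
alive_start_y2 = start - withdrawn_y1 - died_y1          # 54
q2_actuarial = died_y2 / (alive_start_y2 - lost_y2 / 2)  # 15/50

two_year_survival = (1 - q_actuarial) * (1 - q2_actuarial)

# Naive estimate: patients KNOWN to be alive at 2 years over the starting number.
known_survivors = alive_start_y2 - lost_y2 - died_y2     # 31
naive_two_year_survival = known_survivors / start

print(round(two_year_survival, 3), round(naive_two_year_survival, 3))
```

The naïve figure (about 34%) is far below the actuarial 56% because it treats all 28 withdrawn or lost patients as if they had died.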
Survival data, converted from chronological to biological time: Methods of adjustment for censoring
Methods of adjustment for censoring
Fundamental equation of survival analysis
Suppose we select a set of times, symbolized by $t_1, t_2, \ldots, t_k$. These represent not calendar time, but durations from a clinically defined starting point such as diagnosis or treatment. Suppose that patients are observed for different durations after this starting point, usually over different intervals of calendar time, as in the previous slide. We are interested in the probabilities that a patient survives until each of the given times $t_1, t_2, \ldots, t_k$ after the starting point.
Why?
• These are useful measures of prognosis in clinical practice, both for their own sake and as complements of the cumulative incidences (CIs) of death at $t_1, t_2, \ldots, t_k$.
• We may also use them for comparing cohorts with different exposures in observational epidemiological studies.
• We may also use them for comparing treatment effects in experimental clinical trials.
Methods of adjustment for censoring
To estimate these probabilities, we use the Fundamental Equation of Survival Analysis:
$$
\begin{aligned}
\Pr\{\text{surviving through } t_j\} &= 1 - \Pr\{\text{death by } t_j\} \\
&= \Pr\{\text{surviving through } t_1\} \\
&\quad \times \Pr\{\text{surviving through } t_2 \mid \text{survival through } t_1\} \\
&\quad \times \Pr\{\text{surviving through } t_3 \mid \text{survival through } t_2\} \\
&\quad \times \cdots \\
&\quad \times \Pr\{\text{surviving through } t_j \mid \text{survival through } t_{j-1}\}
\end{aligned}
$$
Methods of adjustment for censoring
Thus, the probability of surviving a given duration is expressed as the product of
• the probability of surviving an initial interval, and
• the conditional probabilities of surviving each successive interval, given survival through all previous intervals.
Each of these terms may be separately estimated by pooling data from relevant persons with possibly non-concurrent experiences!
Methods of adjustment for censoring
Notation
• $O_x$ = number alive at the beginning of interval x
• $D_x$ = number dying during interval x
• $W_x$ = number withdrawn from the study or lost to follow-up during interval x
Methods of adjustment for censoring
Cutler-Ederer (Actuarial) Approach
Intervals are specified in advance.
$$\Pr\{\text{dying during interval } x\} = \frac{D_x}{O_x - W_x/2}$$
$$\Pr\{\text{surviving interval } x\} = 1 - \Pr\{\text{dying during interval } x\}$$
Kaplan-Meier • Keep track of withdrawals all the time. • Don’t touch the curve until someone dies. • Probability of dying is number dying at this point divided by number still available at the time of death.
Example
• You start with 15 patients.
• You are notified about withdrawals.
• On July 3rd you are notified about 2 deaths (on the same day!).
• You look at the number withdrawn up to that point and find there have been 5.
• You divide 2 by (15 − 5): 2/10 = 20%.
Contd
• On July 3rd you take your line straight down from 100% to 80%.
• So the probability of dying at any death time is the number dying at that time divided by the number alive at the start of the period ending in that death, minus all withdrawals during that period.
Next
• Now we have only 8 patients (15 − 5 − 2).
• On December 23rd 1 patient dies.
• Between July 3rd and December 23rd 2 patients are withdrawn.
• Divide 1 by (8 − 2): 1/6 = 16.7%.
• Probability of surviving the 2nd period is 83.3%.
• Probability of surviving both time periods is 80% × 83.3% = 66.7%.
• So on December 23rd you take the line straight down from 80% to 66.7%.
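The bookkeeping in this example can be sketched as a small product-limit routine. This is an illustrative implementation, not a reference one; the follow-up durations in days are hypothetical, chosen only to reproduce the counts on the slides (5 withdrawals, 2 deaths, 2 more withdrawals, 1 death, 5 patients still under observation).

```python
def kaplan_meier(durations, died):
    """Product-limit estimate.

    durations: follow-up time for each patient.
    died: True if follow-up ended in death, False if withdrawn or censored.
    Returns a list of (time, at_risk, deaths, survival) for each death time.
    """
    records = sorted(zip(durations, died))
    at_risk = len(records)
    survival = 1.0
    steps = []
    i = 0
    while i < len(records):
        t = records[i][0]
        deaths = sum(1 for d, e in records if d == t and e)
        leaving = sum(1 for d, _ in records if d == t)
        if deaths:  # the curve only moves at a death time
            survival *= 1 - deaths / at_risk
            steps.append((t, at_risk, deaths, survival))
        at_risk -= leaving
        i += leaving
    return steps


# Hypothetical days of follow-up for the 15 patients.
durations = [10, 20, 30, 40, 50, 60, 60, 90, 120, 233, 300, 300, 300, 300, 300]
died = [False] * 5 + [True, True] + [False, False] + [True] + [False] * 5

for time, at_risk, deaths, survival in kaplan_meier(durations, died):
    print(time, at_risk, deaths, round(survival, 3))
```

The first step uses 10 at risk (15 − 5) and drops the curve to 0.8; the second uses 6 at risk (8 − 2) and drops it to about 0.667.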
Where do you read the curve? • At the end of the line
Methods of adjustment for censoring
Product-Limit (Kaplan-Meier) Approach
Intervals are determined by the times of death: infinitesimally small intervals around each death time and, in between, intervals during which no deaths occur.
$$\Pr\{\text{surviving an interval between deaths}\} = 1$$
$$\Pr\{\text{dying at the } x\text{th death time}\} = \frac{D_x}{O_x}$$
Here $O_x$ counts the patients still under observation at the xth death time.
Methods of adjustment for censoring Kaplan-Meier (product-limit) and Cutler-Ederer (actuarial) survival plots of the same data. Which is which?
Actuarial methods of adjustment for censoring
• Estimated chance that someone who starts the interval will die within the interval: $q_x = D_x/(O_x - W_x/2)$
• Estimated chance that someone who starts the interval will survive through it: $p_x = 1 - q_x$
• Chance of surviving from the beginning of the study to the end of the interval: $P_x = p_x \, p_{x-1} \, p_{x-2} \cdots p_1 = p_x P_{x-1}$
Actuarial Method of adjustment for censoring
$q_1 = D_1/(O_1 - W_1/2) = 27/(146 - 3/2) = 0.1869$
$p_1 = 1 - q_1 = 1 - 0.1869 = 0.8131$
$P_x = p_x \, p_{x-1} \, p_{x-2} \cdots p_1 = p_x P_{x-1}$
$P_1 = p_1 = 0.8131$
Actuarial Method of adjustment for censoring
$P_1 = p_1 = 0.8131$
$q_2 = D_2/(O_2 - W_2/2) = 18/(116 - 10/2) = 0.1622$
$p_2 = 1 - q_2 = 1 - 0.1622 = 0.8378$
$P_x = p_x \, p_{x-1} \, p_{x-2} \cdots p_1 = p_x P_{x-1}$
$P_2 = p_2 \, p_1 = 0.8378 \times 0.8131 = 0.6812$
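The recurrence above lends itself to a short life-table routine. The sketch below is one possible way to code it (the function name and row layout are illustrative), applied to the counts from this worked example: 146 patients, then 27 deaths and 3 withdrawals, then 18 deaths and 10 withdrawals.

```python
def actuarial_life_table(n_start, intervals):
    """Cutler-Ederer life table.

    n_start: number alive at the start of the first interval (O_1).
    intervals: list of (deaths, withdrawals) per interval, in order.
    Returns rows of (O_x, q_x, p_x, P_x).
    """
    rows = []
    alive = n_start
    cumulative = 1.0
    for deaths, withdrawn in intervals:
        q = deaths / (alive - withdrawn / 2)  # withdrawals count as half at risk
        p = 1 - q
        cumulative *= p                        # P_x = p_x * P_{x-1}
        rows.append((alive, q, p, cumulative))
        alive -= deaths + withdrawn            # O_{x+1} = O_x - D_x - W_x
    return rows


rows = actuarial_life_table(146, [(27, 3), (18, 10)])
for o_x, q_x, p_x, big_p_x in rows:
    print(o_x, round(q_x, 4), round(p_x, 4), round(big_p_x, 4))
```

Carrying full precision gives a two-interval survival of about 0.6813; the slide's 0.6812 comes from multiplying the already rounded values 0.8378 and 0.8131.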
Cox Proportional Hazards • Car going at constant 20 MPH through varying traffic, curves etc. • Risk of accident varies instantaneously according to traffic, road condition etc. • Another car going through exact same roads and traffic but at 40 MPH. • Risk of accident is twice(?) as much at every instant.
Proportional hazards
• The hazard varies over time, but the ratio of the hazards remains constant.
• Sir David Cox in 1972 introduced a method to estimate this hazard ratio without estimating the actual time-dependent hazard.
• The hazard ratio can be “adjusted” for covariates (Cox regression). Output: HR, the hazard ratio (similar to OR).
• Breslow introduced a way to estimate the hazard at any particular time.
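A small numerical sketch may help with what "proportional" means. It does not fit a Cox model; it only assumes a hypothetical time-varying hazard for the slower car and a constant hazard ratio of 2 for the faster one, then checks the known consequence that the second survival curve equals the first raised to the power of the hazard ratio.

```python
import math

HAZARD_RATIO = 2.0                           # assumed constant ratio (the "twice as much" car)
baseline_hazard = [0.01, 0.05, 0.02, 0.08]   # hypothetical accident hazard per hour, hour by hour


def survival_curve(hazards, step=1.0):
    """Survival after each time step: exp(-cumulative hazard)."""
    cumulative_hazard = 0.0
    curve = []
    for hazard in hazards:
        cumulative_hazard += hazard * step
        curve.append(math.exp(-cumulative_hazard))
    return curve


fast_hazard = [HAZARD_RATIO * h for h in baseline_hazard]
slow = survival_curve(baseline_hazard)
fast = survival_curve(fast_hazard)

for s, f in zip(slow, fast):
    print(round(s, 4), round(f, 4), round(s ** HAZARD_RATIO, 4))
```

Both hazards rise and fall with the road conditions, but their ratio never changes, so the faster car's accident-free probability is always the slower car's raised to the power 2.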
Survival Curves and their Interpretation
Survival curves always start at 1.0 = 100% on the vertical axis and can only decline. The only issue is how fast they decline. Further, if one follows patients long enough, all curves describing actual survival (in contrast to some other outcome that doesn't affect everyone eventually) end at zero. The issue is therefore not where they end, but how much higher one curve is relative to another, or the area between the curves. This is no surprise: it is just the cumulative incidence issue in another form, since survival “rates” are just complements, with respect to 1, of cumulative incidences.
Survival Curves and their Interpretation
Trends in survival curves may be much less accurate towards the right end than at the beginning, because fewer people contribute to the computation at the right end, most subjects having been observed for shorter intervals. However, this problem of unreliability may be somewhat mitigated by the tendency of the true survival curve to flatten out in many real situations. Note that it’s not as much the height at the end that’s less accurate as it is the slope at the end. This point is important in understanding prognostic estimates made near the ends of the curves, as described below.
Later Prognosis Survival curves can be used to estimate the outlook for a patient who has already survived a certain length of time, by dividing the height of the curve later by its present height. Thus, if a patient who has survived a myocardial infarction for 2 years wants to know the chances of surviving another year, divide the 3-year survival rate by the 2-year survival rate. This gives the estimated fraction, of those who survived the first 2 years, who will make it through another year. Survival Curves and their Interpretation
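The division described here is simple enough to sketch directly. The survival values below are hypothetical, not read from any figure in the slides.

```python
def conditional_survival(curve, survived_to, target):
    """Chance of reaching `target` given survival to `survived_to`.

    curve: mapping from years since the starting point to cumulative survival.
    """
    return curve[target] / curve[survived_to]


# Hypothetical cumulative survival after myocardial infarction, by year.
curve = {0: 1.00, 1: 0.80, 2: 0.70, 3: 0.63}

one_more_year = conditional_survival(curve, survived_to=2, target=3)
print(round(one_more_year, 2))
```

With these assumed values, a patient who has already survived 2 years has a 0.63/0.70 = 90% chance of surviving a third, even though 3-year survival from the start is only 63%.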
From the blurry curve below, can you determine roughly the chance that someone who has already survived for three years will survive for two more? Survival Curves and their Interpretation
Reiterating a previous point, survival analysis is applied to the development of any irreversible outcome, not just mortality. It is also frequently applied to the first occurrence of a reversible outcome as well. Survival curves are sometimes plotted with a logarithmic vertical scale, especially when the mortality rate is roughly constant. In that case the survival curves look like straight lines. Watch the scale or you can be badly misled. Survival Curves and their Interpretation
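To see why a constant mortality rate gives a straight line on a logarithmic scale, consider this sketch with an assumed rate of 0.2 deaths per person-year. Under a constant rate the survival curve is exponential, so its logarithm falls by the same amount every year.

```python
import math

RATE = 0.2                  # assumed constant mortality rate per person-year
times = [0, 1, 2, 3, 4, 5]  # years of follow-up

survival = [math.exp(-RATE * t) for t in times]
log_survival = [math.log(s) for s in survival]

# Year-to-year drops on each scale.
drops_linear_scale = [survival[i] - survival[i + 1] for i in range(len(times) - 1)]
drops_log_scale = [log_survival[i] - log_survival[i + 1] for i in range(len(times) - 1)]

print([round(d, 3) for d in drops_linear_scale])  # shrinking drops: a curve that flattens
print([round(d, 3) for d in drops_log_scale])     # equal drops: a straight line
```

On the ordinary scale the curve appears to flatten, which could be misread as improving prognosis; on the log scale the constant slope shows that the mortality rate has not changed.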
In interpreting survival curves, the choice of starting point is critical as well as the shapes of the curves. For instance, if you evaluate a screening program by starting at time of diagnosis, and compare survival from diagnosis of a screened and unscreened group, then screening will always look good. Why? Survival Curves and their Interpretation
Survival Curves and their Interpretation
...screening will always look good. Why?
Because survival for the screened group is being measured from an earlier point in the disease process than for the unscreened group. This is called "lead-time bias," a measurement bias. It may be that an apparent survival advantage in the screened group simply reflects the extent by which screening moved up the date of diagnosis of the disease, rather than any impact of early detection and treatment on true survival. Beware this trap!
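A toy calculation illustrates the trap. All timings are hypothetical, and the sketch deliberately assumes that screening has no effect at all on when the patient dies.

```python
# Hypothetical disease course, in years since biological onset.
ONSET_TO_DEATH = 10.0          # assumed: identical with or without screening
DIAGNOSIS_BY_SYMPTOMS = 6.0    # assumed time of diagnosis without screening
DIAGNOSIS_BY_SCREENING = 3.0   # assumed earlier diagnosis with screening

survival_unscreened = ONSET_TO_DEATH - DIAGNOSIS_BY_SYMPTOMS
survival_screened = ONSET_TO_DEATH - DIAGNOSIS_BY_SCREENING
lead_time = DIAGNOSIS_BY_SYMPTOMS - DIAGNOSIS_BY_SCREENING

print(survival_unscreened, survival_screened, lead_time)
```

Survival from diagnosis rises from 4 to 7 years, yet the patient dies at exactly the same moment; the entire 3-year "gain" is the lead time.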
The figure above compares three survival curves, but gives no indication of how reliable these curves are. They might be from large samples or very small samples, and be statistically very stable or highly variable. We can't tell. Survival Curves and their Interpretation
The graph to the right is more informative. With each curve, at the end of each interval, is the number who survived the interval without a recurrence (Ox-Dx-Wx), shown as a fraction of the number (Ox) who reached the start of the interval without a recurrence. We see from the curves that they are based on only a few patients. Specifically, we see that even though things look encouraging after two years, there is very little information in these data about that period of time. Survival Curves and their Interpretation
Survival Curves and their Interpretation
This plot gives information about variability in a different form, by using standard error bars for each survival rate. Just like a mean, a proportion, or any other statistic, a survival rate has a standard error that reflects how variable the statistic is from sample to sample under the same conditions.
Survival Curves and their Interpretation The standard error bars give us more direct information than the sample sizes as to how precisely the survival rate at each time is estimated by the given set of data. The error bars below show the survival rates are quite imprecise.
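The slides do not say how the standard errors were computed. One widely used choice, assumed here purely for illustration, is Greenwood's formula, which accumulates a variance term at each death time. The counts in the example are hypothetical (10 at risk with 2 deaths, then 6 at risk with 1 death).

```python
import math


def greenwood_standard_error(steps):
    """Survival estimate and its Greenwood standard error.

    steps: list of (number_at_risk, deaths) at each death time, in order.
    Assumes deaths < number_at_risk at every step.
    """
    survival = 1.0
    variance_sum = 0.0
    for at_risk, deaths in steps:
        survival *= 1 - deaths / at_risk
        variance_sum += deaths / (at_risk * (at_risk - deaths))
    return survival, survival * math.sqrt(variance_sum)


survival, standard_error = greenwood_standard_error([(10, 2), (6, 1)])
print(round(survival, 3), round(standard_error, 3))
```

With so few patients at risk, the estimated survival of about 0.67 carries a standard error of about 0.16, which is the kind of imprecision that wide error bars are meant to reveal.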
Survival Curves and their Interpretation
The figure below tries to combine the best features of the previous two, by including both
• the number of individuals observed to survive each interval, and
• standard error bars for the survival rates plotted at the end of each interval.
This makes the figure “busy,” but more informative than the others we have seen.
Survival Curves and their Interpretation The plot below compares survival of lung cancer patients diagnosed during three successive decades. Visually, the increase in long-term survival looks quite noticeable. What special feature of this plot makes the visual impression exaggerate the beneficial trend?
Survival Curves and their Interpretation The literature is also replete with plots of cumulative probabilities of events over time, such as the plot below. These are obtained by the same method as survival plots. The only difference is that, rather than plot the survival probability, the researchers subtract it from one first.