This lecture explores key functions and parametric distributions in survival analysis, including survival and hazard functions, median survival, and common parametric distributions. It also discusses censoring and truncation using a cognitive function recovery study in rats.
Lecture 2: Key Functions and Parametric Distributions • Survival Function • Hazard Function • Median Survival • Common Parametric Distributions
But First • Let’s think a little more about censoring and truncation using an example… • An investigator is interested in determining if treatment with amoxetine leads to recovery of cognitive function in rats with brain lesions that mimic Parkinson’s disease. • The outcome of interest is time to “complete” recovery of cognitive function • i.e. time it takes to return to baseline cognitive function after treatment with amoxetine.
Amoxetine and Cognitive Function • Collect baseline measure of cognitive function • Time to correctly perform water radial arm maze (WARM) task • Induce cognitive impairment • Treat 4 week old rats with N-(2-chloroethyl)-N-ethyl-bromo-benzylamine (DSP-4) • causes noradrenergic lesions in the locus coeruleus. • Treat lesioned animals with Amoxetine • daily dose for 4 weeks (ages 4 to 8 weeks) • 0, 0.3, 1.0, or 3.0 mg/kg • Measure cognitive performance post treatment • weekly for 16 weeks (ages 8 to 24 weeks) • Endpoint: time it takes to reach >75% baseline cognitive function
Describe the type of censoring • Rat survives to 24 weeks of age but never achieves complete cognitive recovery • Rat does not achieve complete cognitive recovery at 12 weeks but does by 13 weeks • Rat dies at 16 weeks but has not yet achieved complete cognitive recovery
Describe the type of censoring • Rat doesn’t develop brain lesions due to misplaced DSP-4 treatment and shows complete cognitive recovery at 8 weeks • Rat shows complete cognitive recovery at 8 weeks
Time to Event Outcomes • Modeled using “survival analysis” • Define X = time to event • X is a random variable • Realizations of X are denoted x • X>0 • Key characterizing functions • Survival function • Hazard rate (or function) • Probability density function • Mean residual life
PDF, survival function, hazard rate, and mean residual life • Figure panels: S(x), f(x), h(x), mrl(x)
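Survival Function • S(x) = the probability of an individual surviving to time x • Basic properties: • $S(x) = P(X > x) = 1 - F(x)$ • $S(0) = 1$ and $S(x) \to 0$ as $x \to \infty$ • S(x) is non-increasing in x
Types of time to event data • Continuous t • Observe actual time • Discrete t • Interval censoring • Grouping into intervals • $S(x) = P(X > x) = \sum_{x_j > x} p(x_j)$, where $p(x_j)$ is the probability mass function, $P(X = x_j)$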
Example of Discrete Time to Event • Discrete Uniform (3 times possible)
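The discrete uniform case can be sketched in a few lines of R. The support points 1, 2, 3 are an assumption made for illustration (the slide only says that three times are possible), and `S` is a helper name introduced here.

```r
# Hypothetical support: three equally likely event times (assumed to be 1, 2, 3)
xj <- c(1, 2, 3)
pj <- rep(1/3, 3)            # pmf: P(X = xj)

# Survival function S(x) = P(X > x): sum of p(xj) over all xj > x
S <- function(x) sapply(x, function(v) sum(pj[xj > v]))

# Step function evaluated at 0 and at each support point
S_vals <- S(c(0, 1, 2, 3))
print(S_vals)
```

Under these assumptions the survival function drops by 1/3 at each support point and reaches 0 at the last one.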
Hazard Rate • A little harder to conceptualize • Instantaneous failure rate or conditional failure rate • Interpretation: $h(x)\,\Delta x$ is approximately the probability that an individual experiences the event in the interval $(x, x+\Delta x)$, given survival to time x.
Hazard Rate • Relationship between h(x), S(x) and pdf (continuous): $h(x) = f(x)/S(x) = -\frac{d}{dx}\ln S(x)$
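For a continuous X, the link between the hazard, the density and the survival function follows from the definition of the hazard as a limit of conditional probabilities. The derivation below is the standard one, written out as a sketch:

```latex
\begin{aligned}
h(x) &= \lim_{\Delta x \to 0} \frac{P(x \le X < x + \Delta x \mid X \ge x)}{\Delta x} \\
     &= \lim_{\Delta x \to 0} \frac{S(x) - S(x + \Delta x)}{\Delta x \, S(x)} \\
     &= \frac{-S'(x)}{S(x)} = \frac{f(x)}{S(x)} = -\frac{d}{dx} \ln S(x)
\end{aligned}
```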
Hazard Function • Useful for conceptualizing how the chance of an event changes over time • i.e. consider hazard ‘relative’ over time • Examples: • Treatment related mortality • Early on, high risk of death • Later on, risk of death decreases • Aging • Early on, low risk of death • Later on, high risk of death
Shapes of Hazard Functions • Increasing • Natural aging and wear • Decreasing • Early failures due to device or transplant failures • Bathtub • Populations followed from birth • Hump Shaped • Initial risk of event, followed by decreasing chance of event
R Code for Hazard Function Shapes
# Examples of hazard function shapes
weibull.hazard <- function(x, alp, lam) {
  h <- alp * lam * x^(alp - 1)
  return(h)
}
loglogistic.hazard <- function(x, alp, lam) {
  h <- alp * lam * x^(alp - 1) / (1 + lam * x^alp)
  return(h)
}
x <- seq(0, 6, 0.05)
# Increasing hazard (Weibull, shape > 1)
h1 <- weibull.hazard(x, 1.5, 0.25)
plot(x, h1, type="l", lwd=2, ylab="Hazard Function", xlab="Time", ylim=c(0,1))
# Decreasing hazard (log-logistic, shape < 1)
h2 <- loglogistic.hazard(x, 0.5, 0.25)
lines(x, h2, lwd=2, col=2)
# Hump-shaped hazard (log-logistic, shape > 1)
h3 <- loglogistic.hazard(x, 2, 1)
lines(x, h3, lwd=2, col=3)
# Bathtub-shaped hazard (quartic centered at x = 3)
h4 <- 0.01 * (x - 3)^4
lines(x, h4, lwd=2, col=4)
Cumulative Hazard Function • Often used instead of the hazard function • Relationship between H(x) and S(x): $H(x) = \int_0^x h(u)\,du = -\ln S(x)$, so $S(x) = e^{-H(x)}$ • More on this later for model checking…
What if time is discrete? • So far we’ve focused on time X as a continuous r.v. • Discrete x • Interval censoring • Grouping into intervals • Depending on level of discreteness, use discrete data approach where $p(x_j)$ is a pmf, $P(X = x_j)$.
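A quick numerical check of the relationship $S(x) = e^{-H(x)}$, where $H(x)$ is the integral of the hazard from 0 to x. This sketch reuses the Weibull hazard from the hazard-shapes code; the parameter values and the evaluation point `x0` are illustrative choices, not values from the lecture data.

```r
# Weibull hazard in the parameterization used in the lecture's R code:
# h(x) = alp * lam * x^(alp - 1)
weibull.hazard <- function(x, alp, lam) alp * lam * x^(alp - 1)

alp <- 1.5; lam <- 0.25   # illustrative values
x0 <- 2                   # arbitrary time point

# Cumulative hazard H(x0) by numerical integration of h over (0, x0)
H_num <- integrate(weibull.hazard, lower = 0, upper = x0,
                   alp = alp, lam = lam)$value

S_from_H <- exp(-H_num)          # S(x) = exp(-H(x))
S_closed <- exp(-lam * x0^alp)   # closed-form Weibull survival function

print(c(S_from_H = S_from_H, S_closed = S_closed))
```

The two numbers should agree up to the tolerance of the numerical integration.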
Complications • How can we use this to define our “discrete” hazard function?
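One common way to define a discrete hazard is as the conditional probability of the event at $x_j$ given survival up to $x_j$, that is $h(x_j) = P(X = x_j \mid X \ge x_j) = p(x_j)/S(x_{j-1})$ with $S(x_0) = 1$. The sketch below assumes that definition and a hypothetical discrete uniform distribution on the times 1, 2, 3.

```r
xj <- c(1, 2, 3)        # hypothetical event times
pj <- rep(1/3, 3)       # discrete uniform pmf

# P(X >= xj) = S(x_{j-1}): probability of still being at risk just before xj
S_prev <- rev(cumsum(rev(pj)))

# Discrete hazard: h(xj) = p(xj) / S(x_{j-1})
hj <- pj / S_prev

# Survival recovered as a product of conditional survival probabilities:
# S(xj) = prod over k <= j of (1 - h(xk))
S_at_xj <- cumprod(1 - hj)

print(rbind(hazard = hj, survival = S_at_xj))
```

Even though the pmf is flat, the hazard rises over time because the risk set shrinks.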
Mean Residual Life • Biomedical applications • Median is very common • MRL is not common • MRL = the expected residual life • Theoretically, could be useful to predict survival times given survival to a certain point in time.
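For reference, the expected residual life for a continuous X can be written in terms of the survival function. This is the standard definition, stated here as a sketch; the overall mean is the special case at $x = 0$.

```latex
\mathrm{mrl}(x) = E[X - x \mid X > x]
  = \frac{\int_x^{\infty} (t - x)\, f(t)\, dt}{S(x)}
  = \frac{\int_x^{\infty} S(t)\, dt}{S(x)},
\qquad
\mu = \mathrm{mrl}(0) = \int_0^{\infty} S(t)\, dt
```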
Mean • We do not see the mean quantified very often in biomedical applications • Why? • Recall our censoring issue • Empirical means depend on parametric model • Means can only be ‘model-based’ • Somewhat counterintuitive, especially when alternatives exist • More common: median
Median • The most common way to express the ‘center’ of the distribution • Rarely see another quantile expressed • Find time x such that $S(x) = 0.5$ • Complication: in some applications, median is not reached empirically • Reported median based on model seems like an extrapolation • Often just state ‘median not reached’ and give alternative point estimates
X-Year Survival Rate • Many applications have ‘landmark’ times that are historically used to quantify survival • Examples: • Breast cancer: 5 year relapse-free survival • Pancreatic cancer: 6 month survival • Acute myeloid leukemia (AML): 12 month relapse-free survival • Solve for S(x) given x
Common Parametric Distributions • Course will focus on non-parametric and semi-parametric methods • But… some parametric models can be useful • Especially for trial design • Note that power and precision are improved under parametric approaches versus others, provided the assumed distribution fits the data
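When a survival function is available in closed form, the median can be found numerically as the time at which the curve crosses 0.5. The sketch below assumes a log-logistic survival function $S(x) = 1/(1 + \lambda x^{\alpha})$, which matches the log-logistic hazard used in the hazard-shapes code; the parameter values and search interval are illustrative.

```r
# Assumed log-logistic survival function: S(x) = 1 / (1 + lam * x^alp)
alp <- 2; lam <- 1      # illustrative values
S <- function(x) 1 / (1 + lam * x^alp)

# Median: root of S(x) - 0.5 on an interval that brackets the crossing
med_num <- uniroot(function(x) S(x) - 0.5,
                   interval = c(0, 100), tol = 1e-9)$root

# Closed form for this model, used only as a cross-check
med_closed <- lam^(-1 / alp)

print(c(numeric = med_num, closed_form = med_closed))
```

The same root-finding step works for any quantile by replacing 0.5 with the desired survival probability.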
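Example 1: Exponential • Recall the exponential distribution • $f(x) = \lambda e^{-\lambda x}$ • $F(x) = 1 - e^{-\lambda x}$ • What is S(x) based on F(x) and f(x) • $S(x) = 1 - F(x) = e^{-\lambda x}$
Example 1: Exponential • What about H(x) and h(x) • $H(x) = -\ln S(x) = \lambda x$ • $h(x) = f(x)/S(x) = \lambda$ • $\lambda$ represents the failure rate per unit of time • Large $\lambda$, rapid decay • Small $\lambda$, slow decay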
R Code for the Plot
# Exponential survival curves for three rates
time <- seq(0, 60, 0.1)
S1 <- exp(-0.1 * time)
S2 <- exp(-0.05 * time)
S3 <- exp(-0.01 * time)
plot(time, S1, xlab="Time", ylab="Survival Function", col=3, lwd=2, type="l")
lines(time, S2, col=2, lwd=2)
lines(time, S3, col=4, lwd=2)
labs <- c(expression(paste(lambda, " = ", 0.1, sep="")),
          expression(paste(lambda, " = ", 0.05, sep="")),
          expression(paste(lambda, " = ", 0.01, sep="")))
legend(x=45, y=.95, labs, col=c(3,2,4), lty=c(1,1,1), lwd=c(2,2,2), cex=0.9)
Example: Kidney Infection after Catheterization • Kidney infection after catheter insertion in patients using portable dialysis equipment • Time to event was time to catheter removal, but note that the catheter can be removed for reasons other than infection (right censored) • Only 76 observations (!) • Time to infection is outcome of interest • Question: can we describe it using a parametric approach?
Kidney Infection Example: Survival curve and 95% confidence intervals
Exponential • Overused due to simplicity • One parameter • Recall: $S(x) = e^{-\lambda x}$ • So let’s revisit the hazard function: $h(x) = \lambda$, constant over time
Exponential • Mean = $1/\lambda$ • Median = $\ln(2)/\lambda$
Exponential • MRL = $1/\lambda$ for every x • “lack of memory” • Realistic?
Exponential • Recall the cumulative hazard function H(x) • For exponential: $H(x) = \lambda x$, so $\ln H(x) = \ln\lambda + \ln x$ • Plot of ln(H(x)) vs. ln(x) should be a straight line with: • Slope = 1 • Intercept = $\ln\lambda$ • Use to check model against a non-parametric estimate of H(x)
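The “lack of memory” property can be checked numerically: using $\mathrm{mrl}(x) = \int_x^{\infty} S(t)\,dt / S(x)$, the expected residual life under an exponential model should be the same at every x. The rate and the time points below are illustrative choices.

```r
lam <- 0.05                          # illustrative rate
S <- function(x) exp(-lam * x)       # exponential survival function

# mrl(x): area under S beyond x, divided by S(x)
mrl <- function(x) integrate(S, lower = x, upper = Inf)$value / S(x)

# Evaluate at several time points; each should be close to 1 / lam
mrl_vals <- sapply(c(0, 10, 50), mrl)
print(mrl_vals)
```

A subject who has already survived 50 time units has the same expected remaining time as a new subject, which is rarely plausible for biological outcomes.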
R Code
library(survival)
surv.kid <- Surv(kidney$time, kidney$status)
fit.kid <- survfit(surv.kid ~ 1)
exp.kid <- survreg(surv.kid ~ 1, dist="exp")
plot(fit.kid, xlab="Time", ylab="Survival Fraction")
# summarize KM estimator to get median survival
summary(fit.kid)
names(fit.kid)
# define log cumulative hazard and log time
logHt <- log(-log(fit.kid$surv))
logt <- log(fit.kid$time)
# Plot log cumulative hazard vs. log time
plot(logt, logHt, lwd=2, type="l", xlab="log(t)", ylab="log(H(t))")
points(logt, logHt, pch=16)
# Add reference line with slope 1 and intercept log(lambda-hat) = -coef
# (survreg models log mean time). If the exponential fits, the points should track it.
abline(-exp.kid$coef, 1, lwd=2, col="red")
Exponential • Another alternative model check • What about plotting $-\ln S(x)$ versus x? • Should be a straight line with • Slope = $\lambda$ • Intercept = 0 • Why would the previous be preferred? • It can accommodate Weibull as we will see….
More Model Checking • We will build the likelihood later • For now, accept that the MLE of $\lambda$ is $\hat{\lambda} = \sum_i d_i / \sum_i t_i$ • where $d_i$ indicates whether the event is observed ($d_i = 1$) or censored ($d_i = 0$) for patient i, and $t_i$ is the event or censoring time • Here: • This implies a model such that S(x) =
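The exponential MLE is the number of observed events divided by the total follow-up time. The sketch below uses a small made-up data set (not the kidney data) purely to show the arithmetic; `t_i` and `d_i` are hypothetical.

```r
# Hypothetical data: follow-up times and event indicators
# (1 = event observed, 0 = right censored)
t_i <- c(5, 12, 20, 33, 47, 60)
d_i <- c(1, 1, 0, 1, 0, 1)

# MLE of the exponential rate: total events / total time at risk
lambda_hat <- sum(d_i) / sum(t_i)

print(lambda_hat)
```

Censored subjects contribute follow-up time to the denominator but no event to the numerator.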
What about specific survival time? Median survival? Mean survival? • Empirical: • 200 day survival = 21.0% • Median survival = 66 days • Mean survival = ? • Exponential Model: • 200 day survival = S(200) = ? • Median survival = ? • Mean survival = ?
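Given a rate estimate, the three model-based quantities follow directly from the exponential formulas. The helper `exp_summaries` is a name introduced here, and the rate 0.01 is a placeholder rather than the estimate from the kidney data.

```r
# Model-based summaries for an exponential model with rate lambda
exp_summaries <- function(lambda, x) {
  c(surv_at_x = exp(-lambda * x),   # S(x) = exp(-lambda * x)
    median    = log(2) / lambda,    # solves S(x) = 0.5
    mean      = 1 / lambda)         # area under S(x) on (0, Inf)
}

# lambda = 0.01 is a placeholder value, not the fitted estimate
out <- exp_summaries(lambda = 0.01, x = 200)
print(out)
```

Substituting the fitted rate for the placeholder gives the model-based counterparts of the empirical 200-day and median survival.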
Weibull • Generalization of the Exponential • VERY common for survival, but not always perfect • Shape and Scale parameters: $\alpha$ and $\lambda$ • Variable hazard • Increasing ($\alpha > 1$) • Decreasing ($\alpha < 1$) • Constant ($\alpha = 1$)
Weibull: Generalization of Exponential • $S(x) = e^{-\lambda x^{\alpha}}$ • $h(x) = \alpha\lambda x^{\alpha-1}$ • Shape Parameter: $\alpha$ • Scale Parameter: $\lambda$ • Note: There are different parameterizations for the Weibull
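As an example of differing parameterizations, base R's Weibull functions use $S(x) = e^{-(x/\text{scale})^{\text{shape}}}$, while the lecture's plotting code uses $S(x) = e^{-\lambda x^{\alpha}}$. The sketch below shows the conversion between the two; the parameter values are taken from the lecture's Weibull plot and the time points are arbitrary.

```r
alp <- 0.5; lam <- 0.05          # values used in the lecture's Weibull plot
x <- c(1, 10, 30, 60)            # arbitrary time points

# Lecture parameterization: S(x) = exp(-lam * x^alp)
S_lecture <- exp(-lam * x^alp)

# Base R parameterization: S(x) = exp(-(x / scale)^shape),
# so shape = alp and scale = lam^(-1 / alp)
S_baseR <- pweibull(x, shape = alp, scale = lam^(-1 / alp),
                    lower.tail = FALSE)

print(rbind(S_lecture, S_baseR))
```

Checking which form a given function or textbook uses before comparing parameter estimates avoids most confusion.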
R Code for the Weibull Plot
# Weibull
time <- seq(0, 60, 0.1)
S1 <- exp(-0.05 * time^.5)
S2 <- exp(-0.05 * time^1)
S3 <- exp(-0.01 * time^0.5)
S4 <- exp(-0.01 * time^1)
plot(time, S1, xlab="Time", ylab="Survival Function", col=2, lwd=2, type="l", ylim=c(0,1))
lines(time, S2, col=1, lwd=2)
lines(time, S3, col=3, lwd=2)
lines(time, S4, col=4, lwd=2)
labs <- c(expression(paste(lambda, " = ", 0.05, ", ", alpha, " = ", 0.5, sep="")),
          expression(paste(lambda, " = ", 0.05, ", ", alpha, " = ", 1, sep="")),
          expression(paste(lambda, " = ", 0.01, ", ", alpha, " = ", 0.5, sep="")),
          expression(paste(lambda, " = ", 0.01, ", ", alpha, " = ", 1, sep="")))
legend(x=0, y=.25, labs, col=c(2,1,3,4), lty=1, lwd=2, cex=0.9)
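Weibull • For $S(x) = e^{-\lambda x^{\alpha}}$: • Mean: $\Gamma(1 + 1/\alpha)\,/\,\lambda^{1/\alpha}$ • Median: $(\ln 2/\lambda)^{1/\alpha}$ • Model checking: $\ln H(x) = \ln\lambda + \alpha\ln x$, so a plot of ln(H(x)) vs. ln(x) should be a straight line with slope $\alpha$ and intercept $\ln\lambda$ • More later when we discuss likelihoods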
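The Weibull mean and median under $S(x) = e^{-\lambda x^{\alpha}}$ can be verified numerically. In this sketch the parameter values are illustrative; the mean is checked against the area under the survival curve and the median against the condition $S(x) = 0.5$.

```r
alp <- 1.5; lam <- 0.25               # illustrative values
S <- function(x) exp(-lam * x^alp)    # lecture parameterization

# Closed-form mean and median for this parameterization
mean_closed   <- gamma(1 + 1 / alp) / lam^(1 / alp)
median_closed <- (log(2) / lam)^(1 / alp)

# Cross-check: the mean equals the area under S(x) on (0, Inf)
mean_num <- integrate(S, lower = 0, upper = Inf)$value

print(c(mean_closed = mean_closed, mean_num = mean_num,
        median_closed = median_closed, S_at_median = S(median_closed)))
```

With $\alpha = 1$ both expressions reduce to the exponential results, $1/\lambda$ and $\ln(2)/\lambda$.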