Lecture 21: Poisson regression, log-linear regression. BMTRY 701 Biostatistical Methods II
Poisson distribution • Used for count data • generally, rare events • in space or time • upper limit is theoretically infinite • Examples: • earthquakes, hurricanes • cancer incidence (spatial) • absences in school year • AIDS deaths in a region • Assessing disease in different groups: • Probability, Risk, Rate, Incidence, Prevalence
The Poisson distribution • Probability mass function: $P(Y = y) = \lambda^{y} e^{-\lambda} / y!$ for $y = 0, 1, 2, \ldots$ • Approximates a binomial for rare events • Notice it has only ONE parameter: λ • Mean = variance = λ
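The three properties on this slide can be checked numerically. The following is a minimal sketch, assuming an arbitrary illustrative value λ = 2 and an arbitrary binomial size n = 10000 (neither comes from the lecture); it uses only base R's `dpois()`, `dbinom()` and `rpois()`.

```r
# Illustrative value only; lambda = 2 is an assumption, not from the lecture
lambda <- 2
y <- 0:10

# Poisson pmf written out by hand versus R's built-in dpois()
pmf_manual <- lambda^y * exp(-lambda) / factorial(y)
pmf_builtin <- dpois(y, lambda)
all.equal(pmf_manual, pmf_builtin)

# A binomial with large n and small p is close to a Poisson with lambda = n * p
n <- 10000
p <- lambda / n
max_diff <- max(abs(dbinom(y, size = n, prob = p) - dpois(y, lambda)))
max_diff

# Sample mean and sample variance of simulated counts are both near lambda
set.seed(701)
draws <- rpois(100000, lambda)
c(mean = mean(draws), variance = var(draws))
```

The single parameter λ fixes both the mean and the variance, which is the assumption that can fail when events cluster.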
Simple Poisson distribution example • The infection rate at a Neonatal Intensive Care Unit (NICU) is typically expressed as a number of infections per patient-day. This counts events across both time and patients. • Assume that the probability of getting an infection over a short time period is proportional to the length of the time period. In other words, a patient who stays one hour in the NICU has twice the risk of a single infection as a patient who stays 30 minutes. • Assume that for a small enough interval, the probability of getting two infections is negligible. • Assume that the probability of infection does not change over time or across infants. • Assume independence: • The probability of seeing an infection in one child does not increase or decrease the probability of seeing an infection in another child. • If an infant gets an infection during one time interval, it does not change the probability that he or she will get another infection during a later time interval.
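The first two assumptions can be illustrated with a small calculation. This is a sketch only: the rate of 5 infections per 1000 patient-days is a hypothetical number chosen for illustration, not a figure from the lecture.

```r
# Hypothetical infection rate (NOT from the lecture): 5 per 1000 patient-days
rate_per_day <- 5 / 1000

# Two stays, 30 minutes and 1 hour, expressed in days
stay_days <- c(0.5, 1) / 24

# Expected number of infections is proportional to time at risk
mu <- rate_per_day * stay_days

# Probability of at least one infection during each stay
p_any <- ppois(0, mu, lower.tail = FALSE)
risk_ratio <- p_any[2] / p_any[1]
risk_ratio      # very close to 2 for short stays

# Probability of two or more infections, relative to at least one
p_two_plus <- ppois(1, mu, lower.tail = FALSE)
p_two_plus / p_any   # negligible for short stays
```

For short intervals the one-hour risk is almost exactly double the 30-minute risk, and multiple infections within the interval contribute almost nothing.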
Poisson regression • Based on the idea that the log of the rate (probability) of disease is a linear function of risk factors • The rate ratio (“relative risk”) is modeled • Interpretation of slope:
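Written out for a single risk factor, the model and the slope interpretation take the form below. The symbols $x$, $\beta_0$ and $\beta_1$ are notation assumed here for illustration; $r$ denotes the rate.

```latex
\begin{align*}
\log r(x) &= \beta_0 + \beta_1 x \\
\log r(x+1) - \log r(x) &= \beta_1 \\
\frac{r(x+1)}{r(x)} &= e^{\beta_1}
\end{align*}
```

Under this notation, $e^{\beta_1}$ is the rate ratio for a one-unit increase in $x$.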
Implementation • $r_i$ is the rate • Often we observe • a number of events • a geographic region, time, or number of person-years • Need to account for these differences • rates based on smaller “exposure” are less precise • adjustment is made
Implementation • Unless there is uniform time, space, etc., the following is generally implemented: “OFFSET”
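In the usual formulation, the offset arises by moving the exposure to the right-hand side of the rate model. The notation below is assumed for illustration: $Y_i$ is the observed count, $\mu_i$ its expectation, $t_i$ the exposure (population, area, or person-time), and $x_i$ a single covariate.

```latex
\begin{align*}
r_i &= \frac{\mu_i}{t_i}, \qquad \mu_i = E[Y_i] \\
\log r_i &= \beta_0 + \beta_1 x_i \\
\log \mu_i &= \log t_i + \beta_0 + \beta_1 x_i
\end{align*}
```

The term $\log t_i$ is the offset: it enters the linear predictor with its coefficient fixed at 1.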
Offset term • Notice: NO COEFFICIENT on offset • Adjusts for population size or space • Example: breast cancer incidence per county in South Carolina • cases are the number of women (& men) diagnosed within a county in SC in one year. • the offset would be the population size in the county in the year (probably estimated)
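A minimal sketch of fitting a Poisson model with an offset in R follows. The data are simulated: the county populations, the covariate `x`, and the coefficient values are all hypothetical and are not the South Carolina registry data.

```r
set.seed(21)

# Simulated county-level data (hypothetical, NOT the SC registry data)
n_county <- 46
popn <- round(runif(n_county, 5000, 300000))   # county population sizes
x <- rbinom(n_county, 1, 0.5)                   # hypothetical county-level covariate
true_rate <- exp(-7 + 0.3 * x)                  # assumed true rate per person
cases <- rpois(n_county, popn * true_rate)

# Offset supplied through the offset argument: no coefficient is estimated for it
fit <- glm(cases ~ x, family = poisson, offset = log(popn))
coef(fit)
exp(coef(fit)["x"])   # estimated rate ratio for x = 1 versus x = 0

# Equivalent specification with offset() inside the formula
fit2 <- glm(cases ~ x + offset(log(popn)), family = poisson)
all.equal(coef(fit), coef(fit2))
```

Both specifications give the same estimates. The formula version is often more convenient when predictions on new data are needed, because the offset travels with the model terms.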
Caveat • Standard Poisson regression relies on the Poisson assumption about the variance • If events tend to occur in clusters, then there is “overdispersion” • This leads to a more general form of model: log-linear model (later)
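One common way to check for overdispersion is to compare the Pearson chi-square statistic with its residual degrees of freedom. The sketch below uses simulated negative binomial counts (all values hypothetical) and a quasi-Poisson fit as one possible remedy; this is an illustration of the idea, not necessarily the approach taken later in the course.

```r
set.seed(12)

# Simulated overdispersed counts (hypothetical values, for illustration only)
n <- 200
x <- rnorm(n)
mu <- exp(1 + 0.5 * x)
# Negative binomial: variance = mu + mu^2 / size, larger than the Poisson variance mu
y <- rnbinom(n, size = 1.5, mu = mu)

fit_pois <- glm(y ~ x, family = poisson)

# Pearson dispersion estimate: near 1 if the Poisson variance assumption holds
dispersion <- sum(residuals(fit_pois, type = "pearson")^2) / df.residual(fit_pois)
dispersion

# Quasi-Poisson keeps the point estimates but scales the standard errors
fit_quasi <- glm(y ~ x, family = quasipoisson)
se_pois <- coef(summary(fit_pois))[, "Std. Error"]
se_quasi <- coef(summary(fit_quasi))[, "Std. Error"]
cbind(se_pois, se_quasi)
```

A dispersion estimate well above 1 signals that Poisson standard errors are too small.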
Example: Catheter-Related Bloodstream Infections in the ICU (Critical Care Medicine, 2004). • Objective: To determine whether a multifaceted systems intervention would eliminate catheter-related bloodstream infections (CR-BSIs). • Design: prospective cohort in surgical ICU at JHU including all patients with central venous catheter in ICU. • Two ICUs • Interventions: • educating staff • creating catheter insertion cart • asking providers daily if catheters could be removed • implementing checklist to ensure adherence to guidelines • empowering nurses to stop catheter insertion if violation of guidelines was observed.
Example: Catheter-Related Bloodstream Infections in the ICU (Critical Care Medicine, 2004). • Analysis • Poisson regression • Outcome is rate of CR-BSIs • Data structure • number of infections per quarter in ICU • number of catheter days (counting every patient who has a catheter at 12 a.m. each day). Patients are each counted only once • indicator of control vs. intervention ICU • Intervention not implemented until 1st quarter 1999.
Dataset
. list

     +---------------------------------------------------------+
     | quarter   ncase   cathdays    rate   dataset   quartern |
     |---------------------------------------------------------|
  1. | Qtr1-98       6       1057    5.68         1          1 |
  2. | Qtr2-98       4       1018    3.93         1          2 |
  3. | Qtr3-98      10        899   11.12         1          3 |
  4. | Qtr4-98       8        952     8.4         1          4 |
  5. | Qtr1-99       3        952    3.15         1          5 |
     |---------------------------------------------------------|
  6. | Qtr2-99      10        939   10.65         1          6 |
  7. | Qtr3-99       5       1045    4.78         1          7 |
  8. | Qtr4-99       9        927    9.71         1          8 |
  9. | Qtr1-00       7       1060     6.6         1          9 |
 10. | Qtr2-00       7       1094     6.4         1         10 |
     |---------------------------------------------------------|
 11. | Qtr3-00       5        850    5.88         1         11 |
 12. | Qtr4-00      10        822   12.17         1         12 |
 13. | Qtr1-01      11        868   12.67         1         13 |
 14. | Qtr2-01       4        830    4.82         1         14 |
 15. | Qtr3-01       4        603    6.63         1         15 |
     |---------------------------------------------------------|
 16. | Qtr4-01       5        551    9.07         1         16 |
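The `rate` column is the number of infections per 1000 catheter days. A short check in R, using only the 16 rows listed above typed in by hand:

```r
# Values copied from the 16 listed rows (dataset == 1)
ncase <- c(6, 4, 10, 8, 3, 10, 5, 9, 7, 7, 5, 10, 11, 4, 4, 5)
cathdays <- c(1057, 1018, 899, 952, 952, 939, 1045, 927,
              1060, 1094, 850, 822, 868, 830, 603, 551)
rate_listed <- c(5.68, 3.93, 11.12, 8.4, 3.15, 10.65, 4.78, 9.71,
                 6.6, 6.4, 5.88, 12.17, 12.67, 4.82, 6.63, 9.07)

# Infections per 1000 catheter days, rounded to two decimals as in the listing
rate <- round(1000 * ncase / cathdays, 2)

# Largest absolute difference from the listed rate column
max(abs(rate - rate_listed))
```

Quarters with fewer catheter days yield less precise rates, which is why the model works with counts and an offset rather than with the rates directly.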
R code
data <- read.csv("csicu7.csv")

# scatterplot of infection rate by quarter, both ICUs
plot(data$quartern, data$rate, xlab="Quarter",
     ylab="Rate of Infection per 1000 catheter days", pch=16)

# overplot the dataset==1 points in red (col=2)
points(data$quartern[data$dataset==1], data$rate[data$dataset==1],
       pch=16, col=2)

# connect the points within each ICU
lines(data$quartern[data$dataset==0], data$rate[data$dataset==0], col=1)
lines(data$quartern[data$dataset==1], data$rate[data$dataset==1], col=2)

legend(12, 22, c("Intervention ICU","Control ICU"), col=c(1,2),
       pch=c(16,16))

# vertical line at quarter 5, when the intervention began
abline(v=5, lty=3)
Estimating the Poisson regression • Want to model change in rates • However, in the first 4 quarters there was no intervention. • Based on the observed data and on the data structure, what model is appropriate?
Poisson regression model What is the model for • IV=0 and quarter<5? • IV=0 and quarter≥5? • IV=1 and quarter<5? • IV=1 and quarter≥5?
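One way to write the answers is sketched below, assuming the coefficients are numbered $\beta_1, \ldots, \beta_6$ in the order of R's coefficient vector (intercept, intervention, quarter, knot term, and the two interactions), with $q$ the quarter number, $r$ the infection rate per catheter day, and $(q-5)_+ = \max(q-5, 0)$ the knot term.

```latex
\begin{align*}
\log r &= \beta_1 + \beta_2\,IV + \beta_3\,q + \beta_4\,(q-5)_+
          + \beta_5\,IV\,q + \beta_6\,IV\,(q-5)_+ \\[4pt]
IV=0,\ q<5:\quad \log r &= \beta_1 + \beta_3\,q \\
IV=0,\ q\ge 5:\quad \log r &= (\beta_1 - 5\beta_4) + (\beta_3+\beta_4)\,q \\
IV=1,\ q<5:\quad \log r &= (\beta_1+\beta_2) + (\beta_3+\beta_5)\,q \\
IV=1,\ q\ge 5:\quad \log r &= (\beta_1+\beta_2-5\beta_4-5\beta_6)
          + (\beta_3+\beta_4+\beta_5+\beta_6)\,q
\end{align*}
```

At $q = 5$ the knot term is zero, so the early and late expressions agree there and the fitted lines join at the knot.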
R code
ncase <- data$ncase
cathdays <- data$cathdays
control <- data$dataset          # dataset == 1 marks the control ICU
intervention <- 1 - control
quartern <- data$quartern

# create knot for spline model (slope may change after quarter 5)
k1 <- ifelse(quartern > 5, quartern - 5, 0)

# FIT MODEL WITH INTERACTIONS WITH TIME FOR BOTH GROUPS
reg <- glm(ncase ~ intervention*quartern + intervention*k1,
           family=poisson, offset=log(cathdays))
summary(reg)
Results
Call:
glm(formula = ncase ~ intervention * quartern + intervention * k1,
    family = poisson, offset = log(cathdays))

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-3.6005  -0.8439  -0.2368   0.6349   2.4233

Coefficients:
                      Estimate Std. Error z value Pr(>|z|)
(Intercept)           -5.20386    0.37944 -13.715   <2e-16 ***
intervention           0.73339    0.45986   1.595    0.111
quartern               0.07517    0.09148   0.822    0.411
k1                    -0.08774    0.10365  -0.847    0.397
intervention:quartern -0.02874    0.11302  -0.254    0.799
intervention:k1       -0.08355    0.13080  -0.639    0.523
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 108.489  on 39  degrees of freedom
Residual deviance:  61.317  on 34  degrees of freedom
AIC: 213.76
R code
# coefficient vector from the fitted model, in the order shown in the Results
b <- coef(reg)

# fitted log rates: control ICU (0) and intervention ICU (1), before and after the knot
fit.early.0 <- b[1] + b[3]*seq(1,5,1)
fit.late.0 <- (b[1]-b[4]*5) + (b[3]+b[4])*seq(5,20,1)
fit.early.1 <- (b[1]+b[2]) + (b[3]+b[5])*seq(1,5,1)
fit.late.1 <- (b[1]+b[2]-b[4]*5-b[6]*5) +
              (b[3]+b[4]+b[5]+b[6])*seq(5,20,1)
fit.early.0

# convert log rates to rates per 1000 catheter days
rate.early.0 <- exp(fit.early.0)*1000
rate.early.0
rate.early.1 <- exp(fit.early.1)*1000
rate.late.0 <- exp(fit.late.0)*1000
rate.late.1 <- exp(fit.late.1)*1000

# add lines to plot for fitted control ICU
lines(seq(1,5,1), rate.early.0, col=2)
lines(seq(5,20,1), rate.late.0, col=2)

# add lines to plot for fitted intervention ICU
lines(seq(1,5,1), rate.early.1, col=1)
lines(seq(5,20,1), rate.late.1, col=1)
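The hand-built fitted rates can be cross-checked against `predict()`. The sketch below runs on simulated stand-in data, since `csicu7.csv` is not reproduced here; the data frame `d`, its coefficient values, and the use of `offset()` inside the formula are assumptions made for the illustration.

```r
set.seed(21)

# Simulated stand-in data (NOT the study data): 20 quarters in each of two ICUs
d <- expand.grid(quartern = 1:20, intervention = 0:1)
d$cathdays <- round(runif(nrow(d), 500, 1100))
d$k1 <- pmax(d$quartern - 5, 0)   # same knot term as ifelse(quartern > 5, quartern - 5, 0)
d$ncase <- rpois(nrow(d),
                 d$cathdays * exp(-5 + 0.05 * d$quartern - 0.1 * d$intervention * d$k1))

# offset() in the formula so that predict() can rebuild it from newdata
fit <- glm(ncase ~ intervention * quartern + intervention * k1 + offset(log(cathdays)),
           family = poisson, data = d)
b <- coef(fit)

# Hand-built late-period rate per 1000 catheter days, intervention ICU
q <- 5:20
rate_hand <- exp((b[1] + b[2] - b[4] * 5 - b[6] * 5) +
                 (b[3] + b[4] + b[5] + b[6]) * q) * 1000

# Same quantity from predict(), with exposure set to 1000 catheter days
nd <- data.frame(quartern = q, intervention = 1,
                 k1 = pmax(q - 5, 0), cathdays = 1000)
rate_pred <- predict(fit, newdata = nd, type = "response")

all.equal(unname(rate_hand), unname(rate_pred))
```

Setting `cathdays = 1000` in the new data makes the predicted count equal to the rate per 1000 catheter days.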
Real question • Is the change in infection rates different in the two ICUs? • That is, are the slopes after Q5 different? • How to test that: • slope in control ICU: β3 + β4 • slope in intervention ICU: β3 + β4 + β5 + β6 • What is the hypothesis test?
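Subtracting the two post-knot slopes gives the quantity to test. A sketch of the resulting hypothesis, using the slide's coefficient numbering:

```latex
\begin{align*}
(\beta_3+\beta_4+\beta_5+\beta_6) - (\beta_3+\beta_4) &= \beta_5+\beta_6 \\
H_0:\ \beta_5+\beta_6 &= 0 \quad \text{versus} \quad H_1:\ \beta_5+\beta_6 \neq 0
\end{align*}
```

This is a test of a linear combination of coefficients with weights $(0, 0, 0, 0, 1, 1)$.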
Linear Combination of Coefficients
> library(gmodels)   # provides estimable()
> estimable(reg, c(0,0,0,0,1,1))
                Estimate Std. Error X^2 value DF   Pr(>|X^2|)
(0 0 0 0 1 1) -0.1122858 0.03091206  13.19452  1 0.0002807688
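The same kind of Wald test can be computed in base R from `coef()` and `vcov()`. The sketch below uses simulated stand-in data (the data frame `d` and its coefficient values are hypothetical), so its numbers will not match the output above; it only shows how the estimate, standard error, and chi-square statistic for a linear combination are obtained.

```r
set.seed(2004)

# Simulated stand-in for csicu7.csv (NOT the study data)
d <- expand.grid(quartern = 1:20, intervention = 0:1)
d$cathdays <- round(runif(nrow(d), 500, 1100))
d$k1 <- pmax(d$quartern - 5, 0)
# Hypothetical true coefficients, chosen only for illustration
eta <- -5.2 + 0.7 * d$intervention + 0.07 * d$quartern - 0.09 * d$k1 -
  0.03 * d$intervention * d$quartern - 0.08 * d$intervention * d$k1
d$ncase <- rpois(nrow(d), d$cathdays * exp(eta))

fit <- glm(ncase ~ intervention * quartern + intervention * k1,
           family = poisson, offset = log(cathdays), data = d)

# Weights picking out beta5 + beta6 (the two interaction terms)
cm <- c(0, 0, 0, 0, 1, 1)
est <- sum(cm * coef(fit))
se <- sqrt(drop(t(cm) %*% vcov(fit) %*% cm))

# Wald chi-square statistic on 1 degree of freedom
x2 <- (est / se)^2
p <- pchisq(x2, df = 1, lower.tail = FALSE)
c(estimate = est, se = se, x2 = x2, p = p)
```

The standard error accounts for the covariance between the two interaction estimates, which is why it cannot be read off the coefficient table directly.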
Example: Breast Cancer Incidence in SC • Cunningham et al. • Hypothesize that there are differences in subtypes of breast cancer by race • ER+ vs. ER- • Grades 1, 2, 3 • Stages 1, 2, 3, 4 • Incidence of breast cancer varies by age • Data: • Tumor registry data for SC (and Ohio) • Census data for SC
Poisson modeling • Rate of incidence per cancer type • Modeled as a function of ER, grade and race
> summary(reg1)
Call:
glm(formula = nc ~ age + age2 + age3 + bl + er + gr + age * bl +
    age2 * bl + age3 * bl + age * er + age2 * er + age3 * er +
    age * gr + age2 * gr + age3 * gr + bl * er + bl * gr + er * gr,
    family = poisson, offset = log(9 * popn))