Poisson Regression Analysis: Analysis of rates and/or counts
Poisson Regression Analysis: Data structure
Thus far we have discussed models for continuous data and for binary data.
Poisson Regression Analysis: Data structure
• Now suppose that, instead of binary data, what we observe are counts.
• Some examples of count data include:
  • the number of COPD hospitalizations in an elderly population
  • the number of URIs in a cohort of young children
  • the number of asthma attacks in an occupational cohort
• While these examples imply that we have observations on each individual, we might have only aggregate data, such as vital statistics data on the total number of incident TB cases in some population.
Poisson Regression Analysis: Data structure
• A common feature of count data is that we typically have not only counts but also some period of observation over which these counts occurred:
  • number of months of follow-up in the hospitalization study
  • person-years of observation in the TB incidence study
As a result, interest centers not so much on the absolute number of events as on the rate at which these events occur.
Poisson Regression Analysis: The Poisson model
Poisson regression analysis is a useful tool for analyzing such data. It derives its name from the Poisson distribution, a probability distribution often used to describe the probability of observing a given count.
Poisson Regression Analysis: The Poisson model
Suppose that Y is our outcome variable (e.g., the number of TB cases), and that $\lambda$ is the rate of occurrence per unit of time (e.g., number of TB cases per 10,000 population per year). The Poisson model may be written as
$$P(Y = y) = \frac{e^{-\lambda V} (\lambda V)^{y}}{y!}, \qquad y = 0, 1, 2, \ldots$$
where V is the number of time units (e.g., person-years). Think of "V" for "volume."
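As a minimal sketch of the Poisson model on this slide, the snippet below evaluates the standard Poisson probability exp(-λV)(λV)^y / y! and checks numerically that the probabilities sum to 1 and that the mean is λV. The rate and person-time values are illustrative assumptions, not data from the lecture.

```python
import math

def poisson_pmf(y, lam, volume):
    """P(Y = y) for a Poisson count with rate `lam` per time unit over `volume` time units."""
    mu = lam * volume  # expected count, E(Y) = lambda * V
    return math.exp(-mu) * mu ** y / math.factorial(y)

# Hypothetical values: 25 cases per 10,000 person-years, observed over 2,000 person-years.
lam = 25 / 10_000
volume = 2_000
mu = lam * volume

# Counts above 59 carry negligible probability when the mean is about 5.
probs = [poisson_pmf(y, lam, volume) for y in range(60)]
total = sum(probs)
mean = sum(y * p for y, p in enumerate(probs))

print(f"expected count (lambda * V) = {mu:.4f}")
print(f"sum of probabilities        = {total:.6f}")
print(f"mean of the distribution    = {mean:.4f}")
```

The same function can be reused with any rate and volume, provided both are expressed in the same time units.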
Poisson Regression Analysis: The Poisson model
In this model $E(Y) = \lambda V$, and we typically assume
$$\ln[E(Y)] = \ln(\lambda V) = \ln(\lambda) + \ln(V) = \beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k + \ln(V).$$
That is, we assume
$$\ln(\lambda) = \beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k.$$
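Poisson Regression Analysis: Interpretation of coefficients
If $\ln(\lambda) = \beta_0 + \beta_1 \text{Age} + \beta_2 \text{Male}$, then, comparing males and females of the same age,
$$RR_{\text{male vs female}} = \frac{\lambda_{\text{male}}}{\lambda_{\text{female}}} = \frac{e^{\beta_0 + \beta_1 \text{Age} + \beta_2}}{e^{\beta_0 + \beta_1 \text{Age}}} = e^{\beta_2} \;\Rightarrow\; \beta_2 = \ln(RR_{\text{male vs female}}).$$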
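The interpretation above can be checked numerically. The sketch below uses made-up coefficients (they are assumptions for illustration only) and confirms that the male-to-female rate ratio equals exp(b2) at every age, because the age term cancels.

```python
import math

def log_rate(age, male, b0, b1, b2):
    """Linear predictor ln(lambda) = b0 + b1*Age + b2*Male (Male coded 1/0)."""
    return b0 + b1 * age + b2 * male

# Hypothetical coefficients, chosen only for illustration.
b0, b1, b2 = -7.0, 0.03, 0.4

rate_ratios = []
for age in (30, 50, 70):
    rate_male = math.exp(log_rate(age, 1, b0, b1, b2))
    rate_female = math.exp(log_rate(age, 0, b0, b1, b2))
    rate_ratios.append(rate_male / rate_female)

rr_from_coef = math.exp(b2)
print("rate ratios by age:", [round(rr, 4) for rr in rate_ratios])
print("exp(b2):           ", round(rr_from_coef, 4))
```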
Poisson Regression Analysis: The offset term
$$\ln[E(Y)] = \ln(\lambda V) = \ln(\lambda) + \ln(V) = \beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k + \ln(V)$$
To obtain the proper estimates of the βs, we need to remember to tell our statistical software about the ln(V) term, which is typically referred to as the "offset" term. As we'll see, how you specify the offset can vary from one software package to another, and even within a package from one statistics module to another.
Poisson Regression Analysis: The offset term
How we express V determines the units in which our rate is expressed. For example, if we observe 250 cases of TB based on 100,000 person-years of observation, we could express this as any of:
  Rate1 = 250 cases/100,000 person-years    (V* = V/100,000)
  Rate2 = 25 cases/10,000 person-years      (V* = V/10,000)
  Rate3 = 0.0025 cases/person-year          (V* = V)
If V is originally recorded in total person-years, then how we transform it determines which rate we estimate.
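A small sketch of the rescaling point, using the 250 cases and 100,000 person-years from the slide. It assumes the simplest possible model, an intercept plus offset fitted to a single aggregate observation, for which the maximum-likelihood intercept is ln(Y/V*); the function name is hypothetical.

```python
import math

cases = 250
person_years = 100_000

def intercept_only_fit(count, volume):
    """MLE of b0 in ln(E[Y]) = b0 + ln(V*) for one observation: b0 = ln(Y / V*)."""
    return math.log(count / volume)

# Each transformation of V yields the rate in different units.
scalings = {
    "per 100,000 person-years": person_years / 100_000,
    "per 10,000 person-years": person_years / 10_000,
    "per person-year": person_years,
}

rates = {}
for label, v_star in scalings.items():
    b0 = intercept_only_fit(cases, v_star)
    rates[label] = math.exp(b0)  # back-transform the intercept to a rate
    print(f"V* = {v_star:>9}: b0 = {b0:8.4f}, rate = {rates[label]:.4f} cases {label}")
```

Only the intercept moves when V is rescaled; in a model with covariates, the slope coefficients (log rate ratios) would be unaffected.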
Poisson Regression Application (courtesy of Dr. Beth Soares)
Background
• TB rates rising in Rio favelas
• Community DOTS program introduced in June 2003
• Wanted to test impact on TB incidence
Data
• Observed number of new cases over time
• Linked to population denominator data
Poisson Regression Application: The data
[Data display not recovered; it was divided into pre-intervention and post-intervention periods.]
Poisson Regression Application: The model
We want to model a continuous, piecewise-linear trend across the pre- and post-intervention periods. For each period, we need an equation of the form $\ln(\lambda) = \beta_0 + \beta_1 \text{Time}$, with the constraint that the two equations are equal when Time = 7 (i.e., the last period of the pre-intervention phase). Any bright ideas?
Poisson Regression Application: The model
Consider the model
$$\ln(\lambda) = \beta_0 + \beta_1 \text{Time} + \beta_2 (\text{Time} - 7) \cdot \text{Exp}.$$
During the pre-intervention period (Exp = 0), this has the value
$$\ln(\lambda) = \beta_0 + \beta_1 \text{Time},$$
which is a linear function of time with slope $\beta_1$. During the post-intervention period (Exp = 1), this has the value
$$\ln(\lambda) = \beta_0 + \beta_1 \text{Time} + \beta_2 (\text{Time} - 7) = (\beta_0 - 7\beta_2) + (\beta_1 + \beta_2)\,\text{Time},$$
which again is a linear function of time, but with slope $\beta_1 + \beta_2$.
Poisson Regression Application: The model
Are the two equations equal when Time = 7?
  Pre:  $\ln(\lambda) = \beta_0 + \beta_1 \text{Time} = \beta_0 + 7\beta_1$ (when Time = 7)
  Post: $\ln(\lambda) = \beta_0 + \beta_1 \text{Time} + \beta_2 (\text{Time} - 7) = \beta_0 + 7\beta_1$ (when Time = 7)
So our model also satisfies our continuity criterion!
Poisson Regression Application: The model
Rewriting the post-intervention model as
$$\ln(\lambda) = (\beta_0 + 7\beta_1) + (\beta_1 + \beta_2)(\text{Time} - 7),$$
we see that the ln(rate) changes by $(\beta_1 + \beta_2)$ for every 6-month time period after the start of the intervention program. The intervention effect, however, is given by $\beta_2$. This is perhaps best seen by considering the original equation:
$$\ln(\lambda) = \beta_0 + \beta_1 \text{Time} + \beta_2 (\text{Time} - 7).$$
The original secular trend, indicated by $\beta_1 \text{Time}$, is assumed to continue, and we overlay on that the intervention effect of $\beta_2$ per unit of time, for an observed slope of $(\beta_1 + \beta_2)$ per unit of time.
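The continuity and slope claims for this piecewise model can be verified with a few lines of code. The coefficient values below are arbitrary assumptions used only to exercise the algebra; `exp_flag` stands in for the Exp indicator (0 before the intervention, 1 after).

```python
def log_rate(time, exp_flag, b0, b1, b2):
    """ln(lambda) = b0 + b1*Time + b2*(Time - 7)*Exp."""
    return b0 + b1 * time + b2 * (time - 7) * exp_flag

# Hypothetical coefficients for illustration.
b0, b1, b2 = 3.0, 0.05, -0.12

# Continuity: both branches give the same value at Time = 7.
pre_at_7 = log_rate(7, 0, b0, b1, b2)
post_at_7 = log_rate(7, 1, b0, b1, b2)

# Slopes: change in ln(rate) per one-period step in each phase.
pre_slope = log_rate(5, 0, b0, b1, b2) - log_rate(4, 0, b0, b1, b2)
post_slope = log_rate(10, 1, b0, b1, b2) - log_rate(9, 1, b0, b1, b2)

print(f"ln(rate) at Time = 7: pre = {pre_at_7:.4f}, post = {post_at_7:.4f}")
print(f"pre-intervention slope  = {pre_slope:.4f}")
print(f"post-intervention slope = {post_slope:.4f}")
```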
Poisson Regression Application: Regression results from Stata

poisson cases time tpost, off(lnpopn)
[tpost = (time-7)*Exp; lnpopn = ln(popn/10,000)]

Poisson regression                                Number of obs   =         12
                                                  LR chi2(2)      =      16.68
                                                  Prob > chi2     =     0.0002
Log likelihood = -45.914885                       Pseudo R2       =     0.1537

------------------------------------------------------------------------------
       cases |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        time |   .0395974   .0132241     2.99   0.003     .0136787    .0655162
       tpost |  -.1033458   .0259482    -3.98   0.000    -.1542034   -.0524882
       _cons |   3.192765   .0660234    48.36   0.000     3.063362    3.322169
      lnpopn |   (offset)
------------------------------------------------------------------------------

• The significant positive coefficient for 'time' suggests an upward trend in TB incidence rates prior to the start of the intervention.
• The significant negative coefficient for 'tpost' suggests a downward effect of the intervention on TB incidence rates.
• The 'net' slope post-intervention, 0.040 - 0.103, is negative, which gives the observed downward trend.
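Because the coefficients are on the log scale, exponentiating them gives rate ratios per 6-month period. The sketch below does this for the estimates and confidence limits reported in the Stata output above; the dictionary and variable names are assumptions made for illustration.

```python
import math

# Estimates and 95% confidence limits copied from the Stata output.
coef = {"time": 0.0395974, "tpost": -0.1033458, "_cons": 3.192765}
ci = {"time": (0.0136787, 0.0655162), "tpost": (-0.1542034, -0.0524882)}

# Rate ratio per 6-month period for each slope term.
rr = {name: math.exp(coef[name]) for name in ("time", "tpost")}
rr_ci = {name: (math.exp(lo), math.exp(hi)) for name, (lo, hi) in ci.items()}

# Net post-intervention slope and its rate ratio.
net_slope = coef["time"] + coef["tpost"]
rr_net = math.exp(net_slope)

# Fitted rate per 10,000 at time = 0 (an extrapolation to before the first period).
baseline_rate = math.exp(coef["_cons"])

for name in ("time", "tpost"):
    lo, hi = rr_ci[name]
    print(f"{name:>6}: RR = {rr[name]:.3f} (95% CI {lo:.3f} to {hi:.3f})")
print(f"net post-intervention slope = {net_slope:.4f}, RR = {rr_net:.3f}")
print(f"fitted rate at time 0 = {baseline_rate:.1f} cases per 10,000")
```

Read this way, the fitted rate rises by roughly 4% per period before the intervention and falls by roughly 6% per period afterwards.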
Poisson Regression Application: A reality test … checking the fit
[Figure not recovered: observed rates (A) and fitted rates (B), with the intervention effect marked.]
Simple Linear Regression: Rio TB data example
• ln(RR)s are fine, but we also want to be able to describe the pre- and post-intervention trends in absolute case rates on a linear scale.
• Still use the earlier model to characterize the significance of terms, but use this new model to talk about the trends in terms people might better understand.
Simple Linear Regression: Rio TB data example

glm obs_rate time tpost
[this is just simple linear regression with obs_rate = cases/(popn/10,000)]

Generalized linear models                          No. of obs      =         12
Optimization     : ML                              Residual df     =          9
                                                   Scale parameter =   4.366374
Gaussian (normal) distribution, identity link

------------------------------------------------------------------------------
    obs_rate |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        time |   1.105171    .351993     3.14   0.002     .4152775    1.795065
       tpost |   -2.84941    .699114    -4.08   0.000    -4.219649   -1.479172
       _cons |   24.18027   1.700288    14.22   0.000     20.84776    27.51277
------------------------------------------------------------------------------

• We still see a significant upward trend, which we now interpret as an average increase of 1.1 cases per 10,000 per 6 months during the pre-intervention period.
• We also see that the intervention effect equates to a reduction in incident case rates of 2.8/10,000 population/6 months.
• The observed post-intervention trend in the case rate is 1.1 - 2.8 = -1.7 cases/10,000/6 months.
[Figure not recovered. Annotations: pre-intervention observed trend = 1.1 cases/10K/6 mos; intervention effect = -2.8 cases/10K; post-intervention observed trend = -1.7 cases/10K/6 mos.]
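To see how closely the two models agree, the sketch below computes fitted rates per 10,000 from both sets of reported coefficients. It assumes the 12 observations are periods 1 through 12 with Exp = 1 only after period 7; the slides report N = 12 but do not list the time coding, so treat that as an assumption.

```python
import math

# Coefficients copied from the two Stata outputs in the slides.
poisson_fit = {"_cons": 3.192765, "time": 0.0395974, "tpost": -0.1033458}
linear_fit = {"_cons": 24.18027, "time": 1.105171, "tpost": -2.84941}

def tpost(time):
    """tpost = (time - 7) * Exp, assuming Exp = 1 only for periods after time 7."""
    return max(time - 7, 0)

def linear_predictor(fit, time):
    return fit["_cons"] + fit["time"] * time + fit["tpost"] * tpost(time)

def poisson_rate(time):
    """Fitted cases per 10,000 from the Poisson model (log link)."""
    return math.exp(linear_predictor(poisson_fit, time))

def linear_rate(time):
    """Fitted cases per 10,000 from the linear model (identity link)."""
    return linear_predictor(linear_fit, time)

periods = range(1, 13)  # assumed time coding
gaps = [abs(poisson_rate(t) - linear_rate(t)) for t in periods]

print("time  poisson   linear")
for t in periods:
    print(f"{t:>4} {poisson_rate(t):8.2f} {linear_rate(t):8.2f}")

# Post-intervention slope on the linear scale (cases per 10,000 per 6 months).
post_slope = linear_fit["time"] + linear_fit["tpost"]
print(f"post-intervention slope = {post_slope:.2f}")
```

Over this range the two sets of fitted rates differ by well under one case per 10,000, which is why the linear model is a reasonable way to describe the trends in absolute terms.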
Poisson Regression Application: A word about the offset parameter
Recall that, in the Poisson model,
$$\ln[E(Y)] = \ln(\lambda V) = \ln(\lambda) + \ln(V) = \beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k + \ln(V),$$
and the term ln(V) is referred to as an "offset". Be sure you know how your regression package expects this term to be specified. I naively assumed Stata would know to take the natural log, and so initially specified "offset(popn)" rather than "offset(lnpopn)". I got wacky output, and only realized the coefficients weren't what I thought they were because their signs didn't make any sense.
Poisson Regression Application: A word about the offset parameter
Note that, in the output from the mis-specified model, the 'time' coefficient is now negative, even though the fitted values from this model looked almost identical to those from the previous model. My only clue was that a negative coefficient made no sense if the coefficient meant what I thought it did! BEWARE BLACK BOX STATISTICS!!!
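A minimal sketch of why the mis-specified offset changes what the model means. It assumes, as Stata's offset() does, that the software adds the offset to the linear predictor with its coefficient fixed at 1 and takes no log itself. The rate and population values are hypothetical.

```python
import math

def expected_count(linear_predictor, offset):
    """E(Y) under a log link: exp(x'b + offset), with the offset's coefficient fixed at 1."""
    return math.exp(linear_predictor + offset)

# Hypothetical values: rate of 25 cases per 10,000, population of 32,000.
log_rate = math.log(25)   # x'b, i.e. ln(lambda) per 10,000
volume = 32_000 / 10_000  # V in units of 10,000 people

# Correct: offset is ln(V), so E(Y) = lambda * V.
correct = expected_count(log_rate, math.log(volume))

# Mistake: offset is V itself, so the model implies E(Y) = lambda * exp(V).
wrong = expected_count(log_rate, volume)

print(f"offset = ln(V): E(Y) = {correct:.1f}")
print(f"offset = V:     E(Y) = {wrong:.1f}")
```

With the raw V as the offset, the coefficients must compensate for a term that grows exponentially in V, so they no longer describe ln(rate). Stata's poisson command also accepts exposure(varname), which takes the log of the variable for you.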