Exact Logistic Regression Larry Cook
Outline • Review the logistic regression model • Explore an example where model assumptions fail • Brief algebraic interlude • Explore an example with a different issue where logistic regression fails • Computational considerations • Example SAS code
Logistic Regression • Model a binary outcome, Y, with one or more predictors • Success/failure • Disease/not disease • Model the outcome in terms of the log odds of a success • log(odds of Yi = 1) = α + βxi
Why Log Odds? • Canonical link function • Makes a binary outcome continuous • Solves this problem • Probability is constrained to [0,1] • Odds are constrained to [0, ∞) • Log odds are in (-∞, ∞) • Exponentiating coefficients gives us estimates of odds ratios
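The three scales on this slide can be illustrated with a minimal Python sketch. The function names `odds`, `log_odds`, and `inv_logit` are chosen here for illustration and do not come from the slides.

```python
import math

def odds(p):
    """Odds of success for a probability p in [0, 1)."""
    return p / (1.0 - p)

def log_odds(p):
    """Logit transform: maps a probability in (0, 1) onto the real line."""
    return math.log(odds(p))

def inv_logit(z):
    """Inverse of the logit: maps any real z back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# Probabilities near 0 or 1 map to large negative or positive log odds.
for p in (0.01, 0.5, 0.99):
    print(p, odds(p), log_odds(p))
```

Because the log odds are unbounded, a linear predictor α + βx can take any value and still map back to a valid probability through `inv_logit`.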
Example: Motor Vehicle Crash Fatalities • What are the odds of being hospitalized or killed in a motor vehicle crash for drivers using safety restraints vs. those who are not? • Outcome: Hospitalized/killed or not • Covariate: safety belt use
Hospitalized/Killed by Restraint Use: OR = 0.22, p-value < 0.001
Example: Motor Vehicle Crash Fatalities • What are the odds of being hospitalized or killed in a motor vehicle crash for drivers using safety restraints vs. those who are not? • Outcome: Hospitalized/killed or not • Covariates: safety belt use, gender, age, alcohol, rural area
Assumptions • Conditional probabilities follow a logistic function of the independent variables • Observations are independent • Asymptotics • Sample size is large enough • Minimum of 50 to 100 observations • 10 successes/failures per variable
Corneal Graft Rejections • What if we are studying a rare disease? • Data for eight children in the younger age group and eight in the older age group • Hypothesis is that rejection is more likely in older children
Graft Rejections: OR = 21, chi-square p-value = 0.012, but 100% of cells have expected counts < 5! Fisher’s exact test p-value: 0.0406 (2-sided), 0.0203 (1-sided)
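The figures on this slide can be reproduced from the counts given elsewhere in the deck (8 children per age group, 1 rejection among the younger and 6 among the older). The sketch below is one way to do that with the hypergeometric distribution. The variable names are illustrative, and the two-sided p-value uses the common "sum of tables no more likely than the observed one" convention, which is an assumption about how the slide's value was computed.

```python
from math import comb

# 2x2 table implied by the slides: 8 younger and 8 older children,
# with 1 and 6 rejections respectively.
rej_young, n_young = 1, 8
rej_old, n_old = 6, 8
total_rej = rej_young + rej_old
n_tables = comb(n_young + n_old, total_rej)

def hypergeom_pmf(k):
    """P(k rejections among the older children | both margins fixed)."""
    return comb(n_old, k) * comb(n_young, total_rej - k) / n_tables

support = range(max(0, total_rej - n_young), min(n_old, total_rej) + 1)

# Sample odds ratio: (rejections/non-rejections in old) / (same in young).
odds_ratio = (rej_old * (n_young - rej_young)) / ((n_old - rej_old) * rej_young)

# One-sided: tables with at least as many rejections among the old.
one_sided = sum(hypergeom_pmf(k) for k in support if k >= rej_old)

# Two-sided: tables no more probable than the observed one.
p_obs = hypergeom_pmf(rej_old)
two_sided = sum(hypergeom_pmf(k) for k in support
                if hypergeom_pmf(k) <= p_obs * (1 + 1e-9))

print(odds_ratio, round(one_sided, 4), round(two_sided, 4))
```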
Let’s Tackle the Graft Rejection Example as Logistic Regression
Graft Rejections: sample size << 50! Don’t have 10 successes or 10 failures!
Exact (Conditional) Logistic Regression • Rather than using unconditional logistic regression, we will condition on nuisance parameters • Use conditional maximum likelihood for estimation and inference
Warning Algebra Ahead Proceed with Caution
Conditioning • If we are only trying to describe the relationship between rejection and age, do we care about the value of the intercept? • Remove the intercept, α, from the likelihood by conditioning on its sufficient statistic, t0 = Σyi • Let S(t0) = set of all tables with Σyi = t0 and the observed sample sizes
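The algebra behind this slide did not survive in the transcript. What follows is the standard textbook form of the conditional likelihood, offered as a sketch rather than a reconstruction of the original slide. The symbol c(u), the number of tables in S(t0) with Σxiyi = u, is notation introduced here.

```latex
% Joint likelihood: depends on the data only through t_0 and t_1
P(Y = y) = \prod_i \frac{\exp\{y_i(\alpha + \beta x_i)\}}{1 + \exp(\alpha + \beta x_i)}
         = \frac{\exp(\alpha t_0 + \beta t_1)}{\prod_i \{1 + \exp(\alpha + \beta x_i)\}},
\qquad t_0 = \sum_i y_i, \quad t_1 = \sum_i x_i y_i

% Conditioning on t_0: the factor e^{\alpha t_0} and the denominator cancel
P(T_1 = t_1 \mid T_0 = t_0) = \frac{c(t_1)\, e^{\beta t_1}}{\sum_{u} c(u)\, e^{\beta u}}
```

The intercept α no longer appears, so inference about β can be based on this distribution alone.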
End of Algebra Back to Example
Graft Rejections: Sufficient Statistics • t0 = Σyi = # of rejections = 7 • t1 = Σxiyi = 0*(# of rejections in young) + 1*(# of rejections in old) = 0*1 + 1*6 = 6
Conditional Distribution for Graft Rejection • Need to enumerate all possible tables that have exactly 7 rejections • Calculate how often each value of t1 occurs among those tables • Calculate the CMLE (conditional maximum likelihood estimate) • Calculate how rare our table is to obtain the p-value
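The four steps above can be sketched in Python for the graft data (8 children per group, t0 = 7, observed t1 = 6). This is an illustrative implementation, not the method used by any particular package. The bisection search, its bounds, and all function names are assumptions made for the sketch.

```python
from math import comb, exp

# c[u] = number of tables with 7 rejections in total, u of them among
# the 8 older children (x = 1) and the other 7 - u among the 8 younger.
t0, t1_obs, n_young, n_old = 7, 6, 8, 8
c = {u: comb(n_old, u) * comb(n_young, t0 - u) for u in range(t0 + 1)}

def cond_pmf(beta):
    """Conditional distribution of t1 given t0, for a given beta."""
    w = {u: cu * exp(beta * u) for u, cu in c.items()}
    total = sum(w.values())
    return {u: wu / total for u, wu in w.items()}

def cond_mean(beta):
    """E[T1 | t0; beta], which increases monotonically in beta."""
    return sum(u * p for u, p in cond_pmf(beta).items())

def cmle(lo=-20.0, hi=20.0, tol=1e-10):
    """Bisection on the conditional score equation E[T1 | t0] = t1_obs."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cond_mean(mid) < t1_obs:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

beta_hat = cmle()
# One-sided exact p-value: how rare is t1 >= 6 when beta = 0?
p_upper = sum(p for u, p in cond_pmf(0.0).items() if u >= t1_obs)

print(sum(c.values()), beta_hat, exp(beta_hat), p_upper)
```

The search only works when the observed t1 lies strictly inside its range. At the minimum or maximum the conditional likelihood has no finite maximizer.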
Confidence Interval • Lower bound, β−: if t1 = t1,min, then β− = −∞; otherwise β− is the value of β that produces an upper p-value of α/2 • Upper bound, β+: if t1 = t1,max, then β+ = ∞; otherwise β+ is the value of β that produces a lower p-value of α/2
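The rule on this slide can be sketched as a root-finding problem on the conditional tail probabilities. The code below applies it to the graft data (t0 = 7, t1 = 6, 8 children per group). The helper names, the bisection bounds, and α = 0.05 are assumptions made for the illustration.

```python
from math import comb, exp, inf

t0, t1_obs, n_young, n_old = 7, 6, 8, 8
c = {u: comb(n_old, u) * comb(n_young, t0 - u) for u in range(t0 + 1)}
t1_min, t1_max = min(c), max(c)

def tail(beta, upper):
    """Upper tail P(T1 >= t1_obs) or lower tail P(T1 <= t1_obs) at beta."""
    w = {u: cu * exp(beta * u) for u, cu in c.items()}
    keep = (lambda u: u >= t1_obs) if upper else (lambda u: u <= t1_obs)
    return sum(v for u, v in w.items() if keep(u)) / sum(w.values())

def solve(upper, target, lo=-30.0, hi=30.0, tol=1e-10):
    """Bisection for the beta at which the chosen tail equals target."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        # The upper tail increases with beta; the lower tail decreases.
        if (tail(mid, upper) < target) == upper:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

alpha = 0.05
# Boundary cases give an infinite bound, as on the slide.
beta_lo = -inf if t1_obs == t1_min else solve(True, alpha / 2)
beta_hi = inf if t1_obs == t1_max else solve(False, alpha / 2)

print(beta_lo, beta_hi)
```

Exponentiating the two bounds gives the corresponding interval for the odds ratio.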
Example 2 PECARN C-Spine Study
Case Control Study Any problems estimating the odds ratio? Could exact logistic regression help?
What sufficient statistics are needed? • Σy = 2 • Σxy = 0
Conditional Density • One-sided p-value = 0.438 • Two-sided p-value = 2*0.438 = 0.876 • 95% confidence interval: (−∞, 2.345) • Point estimate?
One More Example Dose Response
Toxicology Experiment • 400 mice randomized to one of four levels of a drug • Drug administered to each animal • Outcome is the number of deaths in each dose level • Σy = 19 • Σxy = 3 + 10 + 30 = 43
Exact vs. Unconditional
            Exact          Unconditional
Estimate    0.710          0.712
SE          0.246          0.246
OR          2.03           2.04
CI          (1.26, 3.52)   (1.26, 3.30)
p-value     0.002          0.004
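The result set with estimate 0.712 and SE 0.246 is consistent with ordinary Wald (large-sample) inference, which is what an unconditional fit would typically report. The short check below shows this, assuming a 1.96 normal quantile for the 95% interval and a two-sided normal p-value.

```python
from math import erfc, exp, sqrt

estimate, se = 0.712, 0.246

# Wald inference on the log-odds scale, then exponentiate.
z = estimate / se
odds_ratio = exp(estimate)
ci = (exp(estimate - 1.96 * se), exp(estimate + 1.96 * se))
p_value = erfc(z / sqrt(2))  # two-sided normal tail area

print(round(odds_ratio, 2), tuple(round(b, 2) for b in ci), round(p_value, 3))
```

The other interval, (1.26, 3.52), is not symmetric around its estimate on the log scale, as expected for an interval obtained by inverting exact tail probabilities.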
Counting All the Tables • One of the main hurdles for conditional logistic regression is counting all the tables in the sample space • Graft rejections – 11,440 possibilities • PECARN C-Spine – 1,277,601 • Toxicology – 2.79 × 10^33 • Obviously, we don’t want to generate tables one at a time
Network Algorithm • Graphical representation of the sample space • Nodes represent a partial sum of the sufficient statistic • Arcs have combinatorial weighting value • One path through the graph represents a table in the sample space
Example: Sufficient Statistics • t0 = Σyi = 4 • t1 = Σxiyi = 1*0 + 2*1 + 3*1 + 4*2 = 13
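The idea behind the network algorithm can be sketched as a stage-by-stage dynamic program: each stage handles one covariate group, each node stores a pair of partial sums, and each arc carries a binomial coefficient. This is a simplified illustration rather than the algorithm as implemented in LogXact or SAS. The function name `conditional_counts` is invented here, and the group sizes of 3 used for the four-level example are hypothetical, since the transcript does not give them.

```python
from collections import defaultdict
from math import comb

def conditional_counts(groups, t0):
    """Count tables by t1 = sum(x * successes), given sum(successes) = t0.

    groups is a list of (x, n) pairs: covariate value and group size.
    A node is the pair (partial t0, partial t1); an arc that places k
    successes in a group of size n carries the weight C(n, k).
    """
    nodes = {(0, 0): 1}
    for x, n in groups:
        nxt = defaultdict(int)
        for (s0, s1), weight in nodes.items():
            # Never exceed the required total number of successes.
            for k in range(min(n, t0 - s0) + 1):
                nxt[(s0 + k, s1 + x * k)] += weight * comb(n, k)
        nodes = nxt
    # Keep only terminal nodes that hit the conditioning total t0.
    return {s1: w for (s0, s1), w in sorted(nodes.items()) if s0 == t0}

# Graft data: x = 0 for the 8 younger children, x = 1 for the 8 older.
graft = conditional_counts([(0, 8), (1, 8)], t0=7)

# Four-level example with HYPOTHETICAL group sizes of 3 per level.
dose = conditional_counts([(1, 3), (2, 3), (3, 3), (4, 3)], t0=4)

print(sum(graft.values()), graft[6], sum(dose.values()))
```

Paths that reach the same node are merged and their weights added, so the work grows with the number of distinct partial sums rather than with the number of tables.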
What About Multiple Covariates? More Conditioning!
Osteogenic Sarcoma (LogXact Manual) • 46 patients surgically treated for osteogenic sarcoma and then observed for disease recurrence within 3 years • Covariates • Sex: Male = 1, Female = 0 • Any Osteoid Pathology (AOP): Present = 1, not = 0 • Interested in the effect of AOP
Estimating the Effect of AOP • New statistics to condition on • Group sizes • Sufficient statistic for the intercept, Σy = 17 • Sufficient statistic for the coefficient for sex, Σx1y = 15 • Calculate the conditional distribution of Σx2y • Sufficient statistic for the coefficient for AOP • Number of recurrences with AOP present (= 13) • Given exactly 17 with recurrence, 15 of whom are male
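In symbols, the conditioning described above takes the following standard form. This is a sketch in notation introduced here, not taken from the slides: β2 is the AOP coefficient, T0, T1, and T2 are the three sufficient statistics, and c(t0, t1, u) is the number of tables with the observed group sizes whose sufficient statistics equal (t0, t1, u).

```latex
P(T_2 = t_2 \mid T_0 = t_0,\; T_1 = t_1)
  = \frac{c(t_0, t_1, t_2)\, e^{\beta_2 t_2}}
         {\sum_{u} c(t_0, t_1, u)\, e^{\beta_2 u}},
\qquad t_0 = 17,\quad t_1 = 15,\quad t_2^{\text{obs}} = 13
```

Both the intercept and the sex coefficient drop out, so the distribution depends only on β2.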