Logistic regression for binary response variables
Space shuttle example
• n = 24 space shuttle launches prior to the Challenger disaster on January 28, 1986
• Response y is an indicator variable:
  • y = 1 if there were O-ring failures during launch
  • y = 0 if there were no O-ring failures during launch
• Predictor x1 is launch temperature, in degrees Fahrenheit
The mean of a binary response
If there are 20% smokers and 80% non-smokers, and $Y_i = 1$ if smoker and $0$ if non-smoker, then:
$$E(Y_i) = 1(0.20) + 0(0.80) = 0.20$$
If $\pi_i = P(Y_i = 1)$ and $1 - \pi_i = P(Y_i = 0)$, then:
$$E(Y_i) = 1(\pi_i) + 0(1 - \pi_i) = \pi_i$$
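A linear regression model for a binary response
If the simple linear regression model is:
$$Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i \quad \text{for } Y_i = 0, 1$$
then the mean response
$$E(Y_i) = \beta_0 + \beta_1 x_i = \pi_i$$
is the probability that $Y_i = 1$ when the level of the predictor variable is $x_i$.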
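Alternative formulation of (simple) logistic regression function
$$\pi_i = \frac{\exp(\beta_0 + \beta_1 x_i)}{1 + \exp(\beta_0 + \beta_1 x_i)}$$
(algebra)
$$\ln\left(\frac{\pi_i}{1 - \pi_i}\right) = \beta_0 + \beta_1 x_i$$
The left-hand side is the “logit” of $\pi_i$.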
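The two formulations are inverses of each other, which a short numerical check can make concrete. The following is a minimal sketch; the function names `logistic` and `logit` are illustrative choices, not taken from the slides.

```python
import math

def logistic(eta):
    """Map a linear predictor eta = b0 + b1*x to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

def logit(p):
    """Map a probability p in (0, 1) to the log odds ln(p / (1 - p))."""
    return math.log(p / (1.0 - p))

# Round trip: taking the logit of the logistic function recovers eta.
for eta in (-2.0, 0.0, 1.5):
    p = logistic(eta)
    print(f"eta={eta:+.1f}  p={p:.4f}  logit(p)={logit(p):+.4f}")
```

Writing the probability as 1 / (1 + exp(-eta)) is algebraically the same as exp(eta) / (1 + exp(eta)).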
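Odds
If $\pi_i = P(Y_i = 1)$ and $1 - \pi_i = P(Y_i = 0)$, then:
$$\text{odds}(Y_i = 1) = \frac{\pi_i}{1 - \pi_i} \quad \text{and} \quad \text{odds}(Y_i = 0) = \frac{1 - \pi_i}{\pi_i}$$
If there are 20% smokers and 80% non-smokers:
$$\text{odds(smoker)} = \frac{0.20}{0.80} = 0.25 \quad \text{and} \quad \text{odds(non-smoker)} = \frac{0.80}{0.20} = 4$$
“Odds are 4 to 1” … 4 non-smokers to 1 smoker.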
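Odds ratio
MALE: 20% smokers and 80% non-smokers:
$$\text{odds(non-smoker)} = \frac{0.80}{0.20} = 4$$
FEMALE: 40% smokers and 60% non-smokers:
$$\text{odds(non-smoker)} = \frac{0.60}{0.40} = 1.5$$
$$\text{odds ratio} = \frac{4}{1.5} = 2.67$$
The odds that a male is a nonsmoker are 2.67 times the odds that a female is a nonsmoker.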
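The arithmetic behind the smoking example can be checked in a few lines. This is a minimal sketch; the helper name `odds` and the variable names are illustrative.

```python
def odds(p):
    """Odds of an event with probability p: p / (1 - p)."""
    return p / (1.0 - p)

# Probability of being a nonsmoker in each group (figures from the slide).
p_nonsmoker_male = 0.80
p_nonsmoker_female = 0.60

odds_male = odds(p_nonsmoker_male)      # 0.80 / 0.20
odds_female = odds(p_nonsmoker_female)  # 0.60 / 0.40
odds_ratio = odds_male / odds_female    # male relative to female

print(f"male odds = {odds_male:.2f}, female odds = {odds_female:.2f}")
print(f"odds ratio (male vs. female) = {odds_ratio:.2f}")
```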
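Odds ratio
With $\pi_1$ and $\pi_2$ denoting $P(Y = 1)$ in Group 1 and Group 2:
$$\text{odds}_1 = \frac{\pi_1}{1 - \pi_1} \qquad \text{odds}_2 = \frac{\pi_2}{1 - \pi_2}$$
The odds ratio:
$$\frac{\text{odds}_1}{\text{odds}_2} = \frac{\pi_1 / (1 - \pi_1)}{\pi_2 / (1 - \pi_2)}$$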
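Space shuttle example
Predicted odds, where $b_0$ and $b_1$ are the estimated intercept and slope:
$$\widehat{\text{odds}} = \exp(b_0 + b_1 x_1)$$
Predicted odds at $x_1 = 55$ degrees:
$$\exp(b_0 + 55\, b_1)$$
Predicted odds at $x_1 = 80$ degrees:
$$\exp(b_0 + 80\, b_1)$$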
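Space shuttle example
Predicted odds ratio for $x_1 = 55$ relative to $x_1 = 80$, where $b_0$ and $b_1$ are the estimated intercept and slope:
$$\frac{\exp(b_0 + 55\, b_1)}{\exp(b_0 + 80\, b_1)} = \exp\big(b_1 (55 - 80)\big)$$
The odds of O-ring failure at 55 degrees Fahrenheit are 76 times the odds of O-ring failure at 80 degrees Fahrenheit!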
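The predicted odds and their ratio can be recomputed from the fitted coefficients. The sketch below assumes the coefficients printed in the deck's logistic regression output (constant 10.875, temp -0.17132); the function name `predicted_odds` is illustrative. With these rounded coefficients the ratio comes out in the low 70s, somewhat below the 76 quoted on the slide, a gap that presumably reflects rounding in the slide's intermediate values.

```python
import math

# Coefficients as printed in the fitted-model output (constant and temp).
B0 = 10.875
B1 = -0.17132

def predicted_odds(temp_f):
    """Predicted odds of O-ring failure at a launch temperature in deg F."""
    return math.exp(B0 + B1 * temp_f)

odds_55 = predicted_odds(55)
odds_80 = predicted_odds(80)
ratio = odds_55 / odds_80  # algebraically equal to exp(B1 * (55 - 80))

print(f"odds at 55 F: {odds_55:.3f}")
print(f"odds at 80 F: {odds_80:.4f}")
print(f"odds ratio, 55 F relative to 80 F: {ratio:.1f}")
```

The intercept cancels in the ratio, so only the slope and the 25-degree temperature difference matter.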
Interpretation of slope coefficients
The ratio of the odds at $X_1 = A$ relative to the odds at $X_1 = B$ (for fixed values of the other X’s) is:
$$\exp\big(\beta_1 (A - B)\big)$$
Maximum likelihood estimation
• Choose as estimates of the parameters the values that assign the highest probability to (“maximize the likelihood of”) the observed outcome.
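Suppose $\alpha$ is the intercept and $\beta$ is the slope of the logistic regression function.
For the first observation, $Y_1 = 1$ and $x_1 = 53$:
$$P(Y_1 = 1) = \frac{\exp(\alpha + 53\beta)}{1 + \exp(\alpha + 53\beta)}$$
… for the second observation, $Y_2 = 1$ and $x_2 = 56$:
$$P(Y_2 = 1) = \frac{\exp(\alpha + 56\beta)}{1 + \exp(\alpha + 56\beta)}$$
… and for the 24th observation, $Y_{24} = 0$ and $x_{24} = 81$:
$$P(Y_{24} = 0) = \frac{1}{1 + \exp(\alpha + 81\beta)}$$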
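If $\alpha = 10$ and $\beta = -0.15$, what is the probability of the observed outcome?
The likelihood of the observed outcome is:
$$L(\alpha, \beta) = \prod_{i=1}^{24} \pi_i^{\,y_i} (1 - \pi_i)^{1 - y_i}, \qquad \pi_i = \frac{\exp(\alpha + \beta x_i)}{1 + \exp(\alpha + \beta x_i)}$$
The log likelihood of the observed outcome is:
$$\ln L(\alpha, \beta) = \sum_{i=1}^{24} \big[\, y_i \ln \pi_i + (1 - y_i) \ln(1 - \pi_i) \,\big]$$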
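The log likelihood can be evaluated for any trial pair of parameter values. The sketch below is illustrative only: the function names are assumptions, and it uses just the three observations quoted in the slides (the 1st, 2nd and 24th launches), since the other 21 launches are not listed here. The printed values are therefore not the full-sample log likelihood.

```python
import math

def prob_failure(alpha, beta, x):
    """P(Y = 1) at temperature x under the logistic regression function."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta * x)))

def log_likelihood(alpha, beta, xs, ys):
    """Sum of y*ln(p) + (1 - y)*ln(1 - p) over the supplied observations."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = prob_failure(alpha, beta, x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

# Only the three observations quoted in the slides, not the full data set.
temps = [53, 56, 81]
failed = [1, 1, 0]

# The two trial parameter pairs posed as questions in the slides.
for alpha, beta in [(10.0, -0.15), (10.8, -0.17)]:
    ll = log_likelihood(alpha, beta, temps, failed)
    print(f"alpha={alpha}, beta={beta}: log likelihood = {ll:.4f}, "
          f"likelihood = {math.exp(ll):.4f}")
```

Maximum likelihood estimation amounts to searching for the pair of values that makes this quantity, computed over all 24 launches, as large as possible.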
If α = 10.8 and β = -0.17, what is the probability of the observed outcome? The likelihood and the log likelihood of the observed outcome are evaluated again at these values; the pair of values that gives the larger likelihood is the better estimate.
Space shuttle example

Link Function: Logit

Response Information
Variable  Value  Count
failure   1          7  (Event)
          0         17
          Total     24

Logistic Regression Table
                                               Odds     95% CI
Predictor      Coef   SE Coef      Z      P   Ratio  Lower  Upper
Constant     10.875     5.703   1.91  0.057
temp       -0.17132   0.08344  -2.05  0.040    0.84   0.72   0.99
Properties of MLEs
• If a model is correct and the sample size is large enough:
  • MLEs are essentially unbiased.
  • Formulas exist for estimating the standard errors of the estimators.
  • The estimators are about as precise as any nearly unbiased estimators.
  • MLEs are approximately normally distributed.
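Inference for $\beta_j$
Test statistic:
$$z = \frac{b_j}{\text{se}(b_j)}$$
follows an approximate standard normal distribution when $\beta_j = 0$.
Confidence interval:
$$b_j \pm z^{*}\, \text{se}(b_j)$$
where $b_j$ is the estimate of $\beta_j$ and $z^{*}$ is the standard normal critical value (1.96 for 95% confidence).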
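As a check on these formulas, the Z, P and odds-ratio columns of the space shuttle output can be recomputed from the coefficient and its standard error. This is a minimal sketch: `wald_summary` is a hypothetical helper name, and the 1.96 critical value assumes a 95% interval.

```python
import math

def wald_summary(coef, se, z_star=1.96):
    """Wald z statistic, two-sided p-value, and odds-ratio interval."""
    z = coef / se
    # Two-sided standard normal tail probability: P(|Z| > |z|).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    lower = coef - z_star * se
    upper = coef + z_star * se
    return {
        "z": z,
        "p": p_value,
        "odds_ratio": math.exp(coef),
        "or_lower": math.exp(lower),
        "or_upper": math.exp(upper),
    }

# Coefficient and standard error for temp from the space shuttle output.
temp = wald_summary(-0.17132, 0.08344)
print(f"z = {temp['z']:.2f}, p = {temp['p']:.3f}")
print(f"odds ratio = {temp['odds_ratio']:.2f} "
      f"(95% CI {temp['or_lower']:.2f} to {temp['or_upper']:.2f})")
```

The interval is built on the coefficient scale and then exponentiated, which is why the odds-ratio interval is not symmetric about 0.84.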
Space shuttle example
• There is sufficient evidence, at the α = 0.05 level, to conclude that temperature is related to the probability of O-ring failure.
• For every 1-degree increase in temperature, the odds of O-ring failure are estimated to be multiplied by 0.84 (95% CI is 0.72 to 0.99).
Survival in the Donner Party
• In 1846, the Donner and Reed families traveled from Illinois to California by covered wagon.
• The group became stranded in the eastern Sierra Nevada mountains when hit by heavy snow.
• 40 of 87 members died from famine and exposure.
• Are females better able to withstand harsh conditions than are males?
Survival in the Donner Party

Link Function: Logit

Response Information
Variable  Value     Count
STATUS    SURVIVED     20  (Event)
          DIED         25
          Total        45

Logistic Regression Table
                                               Odds     95% CI
Predictor      Coef   SE Coef      Z      P   Ratio  Lower  Upper
Constant      1.633     1.110   1.47  0.141
AGE        -0.07820   0.03729  -2.10  0.036    0.92   0.86   0.99
Gender       1.5973    0.7555   2.11  0.034    4.94   1.12  21.72
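The odds ratios in this output follow from exponentiating the coefficients, and the same rule extends to differences larger than one unit. The sketch below is illustrative: `odds_ratio` is a hypothetical helper, the ages 30 and 20 are arbitrary example values, and the output does not say which gender is coded 1, so the Gender ratio is read only as a one-unit increase in that indicator.

```python
import math

# Coefficients and standard error as printed in the Donner Party output.
AGE_COEF = -0.07820
GENDER_COEF = 1.5973
GENDER_SE = 0.7555

def odds_ratio(coef, a, b):
    """Odds at predictor value a relative to value b, other predictors fixed."""
    return math.exp(coef * (a - b))

or_age_1 = odds_ratio(AGE_COEF, 1, 0)     # one extra year of age
or_age_10 = odds_ratio(AGE_COEF, 30, 20)  # ten extra years (example ages)
or_gender = odds_ratio(GENDER_COEF, 1, 0) # one-unit increase in Gender

# 95% interval for the Gender odds ratio, built on the coefficient scale.
gender_lower = math.exp(GENDER_COEF - 1.96 * GENDER_SE)
gender_upper = math.exp(GENDER_COEF + 1.96 * GENDER_SE)

print(f"survival odds ratio per year of age: {or_age_1:.2f}")
print(f"survival odds ratio per ten years of age: {or_age_10:.2f}")
print(f"survival odds ratio for Gender: {or_gender:.2f} "
      f"(95% CI {gender_lower:.2f} to {gender_upper:.2f})")
```

Because the event modeled is SURVIVED, each ratio describes the odds of survival, with the other predictor held fixed.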