130 likes | 143 Views
Learn about logistic regression, odds, odds ratio, and the interpretation of the model in this comprehensive guide.
E N D
Chapter 13 Logistic regression
Odds The odds of an event A are calculated as the probability of observing event A divided by the probability of not observing event. For example, in humans an average of 51 boys are born in every 100 births, so the odds of a randomly chosen delivery being a boy are: If the probability of an event is 0.5, then the odds of that event is one. If the odds of an event are greater than one, then the event is more likely to happen than not (the odds of an event that is certain to happen is infinite); if the odds are less than one, the event is less likely to happen.
Odds ratio The odds ratio is the ratio of two odds from different groups or conditions. If we let p1 denote the probability of some event in one group and p2 denote the probability in another group, then is the odds ratio of A. An odds ratio of 1 means that some event is equally likely in both groups. An odds ratio greater than one implies that the event is more likely in the first group, while an odds ratio less than one implies that the event is less likely in the first group relative to the second group. The odds ratio measures the effect size between two binary data values.
Example: Risk of Avadex The increasing risk of the fungicide Avadex on pulmonary cancer in mice was studied. Sixteen male mice were continuously fed small concentrations of Avadex (the treatment population), while 79 male mice were given the usual diet (the control population). After 85 weeks, all animals were examined for tumors, with the results shown below:
Example: Risk of Avadex The odds of tumors in the Avadex group and the odds in control group are respectively and Thus, the odds ratio is The odds of having tumors are 4.9 times as great for Avadex mice as for control mice.
Logistic regression model The logistic regression model is defined by We can formulate the logistic regression model in the same way as we defined the multiple regression model. The model can handle both quantitative and categorical explanatory variables. Unlike in linear models, we see that there is no error term. This is because the uncertainty of binary response of some event is determined directly by the probability of the event. All estimates are on the log odds scale.
Logistic regression model We model the probability of success for observations, and that it is linear on the logit scale. This ensures that the probabilities pi are always between 0 and 1 regardless of the right-hand side of linear regression. It is possible to back-transform the results if we desire to present the results as actual probabilities: βj is the estimated additive effect on the log odds if variable xj is increased by one unit, whereas exp(βj) is the estimated change in odds.
Example: Lethality of insecticide In a study groups of 20 male and female moths were exposed to various doses of trans-cypermethrin in order to examine the lethality of the insecticide. After three days it was registered how many moths were dead or immobilized. Data are shown in the table below:
Example: Lethality of insecticide Here we will look only at male moths, and we would like to model the effect of dose on the proportion of moths that die. We use a logistic regression model, and state that logit of the probability of the i-th group that moth dies. We have two parameters in this model: α and β. The intercept α should be interpreted as log odds of a male moth dying if it is not exposed to the insecticide (ie, the dose is zero), and β is the increase in log odds of dying for every unit increase in dose.
Example: Lethality of insecticide We can calculate the probability of dying for each dose based on the estimated parameters. For dose 16, the probability of death is
Example: Lethality of insecticide We can also get the estimates and standard errors shown in the table below: Then we can compute the 95% confidence interval for α by Likewise, the 95% confidence interval for β is 0.29723 ±1.96 · 0.06254 = (0.1747, 0.4198).
Example: Lethality of insecticide We may test the hypotheses We reject the null hypothesis H0 and conclude that the log odds for moths that receive dose 0 are significantly different from zero. This means that the odds for moths that receive dose 0 are significantly different from 1. We also reject the null hypothesis H1 and conclude that there is a significant effect of the insecticide, and furthermore, that the proportion of dead moths increases with increasing dose.