Session 10
Outline
• Binary Logistic Regression
• Why?
• Theoretical and practical difficulties in using regular (continuous) dependent variables
• How?
• Minitab procedure
• Interpreting results
• Some diagnostics
• Making predictions
• Comparison with regular regression model
Applied Regression -- Prof. Juran
Logistic Regression
In our previous discussions of regression analysis, we have implicitly assumed that the dependent variable is continuous. We have learned some methods for operationalizing binary independent variables (using dummy variables), but have not discussed any method for dealing with categorical or binary dependent variables with regression analysis. (One non-regression method is discriminant analysis.) There are a number of tools available, but we will focus here on logistic regression.
The basic idea: instead of predicting the exact value of the (binary) dependent variable, we will try to model the probability that the dependent variable takes on the value of 1. In symbols, we model $P(Y = 1 \mid X)$. In English, this is the probability that the dependent variable is 1, given a particular vector of values for the independent variables.
Example: Rick Beck Consumer Credit
Why not a normal multiple regression model?
Here the fitted value from the regression is an estimated probability, so it shouldn’t go outside of the range from zero to one. But our regression equation is unbounded, and in this data set it sometimes takes on illogical estimated values.
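To make the problem concrete, here is a minimal sketch that fits an ordinary least-squares line to a 0/1 outcome. The data are hypothetical (they are not the consumer credit data from the example) and are chosen only to show fitted "probabilities" falling below 0 and above 1.

```python
# Hypothetical data: one predictor and a binary (0/1) outcome.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 0, 0, 1, 1, 1, 1]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Ordinary least-squares slope and intercept for a single predictor.
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Fitted values are meant to be probabilities, but nothing bounds them.
fitted = [intercept + slope * xi for xi in x]
out_of_range = [round(f, 3) for f in fitted if f < 0 or f > 1]
print("fitted values outside [0, 1]:", out_of_range)
```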
We address this problem with a logistic response function:

$$P(Y = 1 \mid X) = \frac{e^{\beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k}}{1 + e^{\beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k}}$$
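A minimal sketch of a logistic response function, assuming the standard form p = 1 / (1 + exp(-z)), where z is the linear combination of the independent variables. The function name and the sample values of z are illustrative and not taken from the slides.

```python
import math


def logistic_response(z):
    """Map a linear predictor z (any real number) to a value between 0 and 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use the algebraically equivalent form to avoid overflow.
    e = math.exp(z)
    return e / (1.0 + e)


# Illustrative linear-predictor values; the output never leaves the (0, 1) range.
for z in (-10, -2, 0, 2, 10):
    print(z, round(logistic_response(z), 5))
```

Unlike the straight-line fit, the curve flattens out as z becomes very large or very small, which is what keeps the estimated probability within range.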
This sort of relationship will meet our criterion of keeping the estimated probability in the proper range. (Note: the cumulative normal distribution has a similar shape, and is the basis for the probit model.) What we need is a transformation of either X or the estimated probability such that the relationship is linear. This would enable us to use linear regression to create a model.
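One transformation with this property is the logit, the log of the odds p / (1 - p). The sketch below assumes a logistic relationship with hypothetical coefficients b0 and b1 (not estimates from the slides) and shows that the log odds of the resulting probabilities are linear in X.

```python
import math


def logit(p):
    """Log odds of a probability p, for 0 < p < 1."""
    return math.log(p / (1.0 - p))


# Hypothetical intercept and slope, for illustration only.
b0, b1 = -3.0, 0.5
xs = [0, 2, 4, 6, 8, 10]

# Probabilities follow the S-shaped logistic curve in x ...
probs = [1.0 / (1.0 + math.exp(-(b0 + b1 * x))) for x in xs]

# ... but their log odds fall on the straight line b0 + b1 * x.
log_odds = [logit(p) for p in probs]
for x, p, lo in zip(xs, probs, log_odds):
    print(x, round(p, 4), round(lo, 4))
```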
Minitab Results
Response Information
Here we get the number of observations that fall into each of the two response categories. The response value that has been designated as the “reference event” is the first entry under Value and is labeled as the event. In this case, the reference event is “being in default”.
Logistic Regression Table: This shows the estimated coefficients (parameter estimates), standard errors of the coefficients, z-values, p-values, the odds ratios, and a 95% confidence interval for each odds ratio. A rule of thumb: if the confidence interval for the odds ratio is entirely below 1, then the relative odds are decreased by this variable (e.g. Children). Similarly, if the confidence interval for the odds ratio is entirely above 1, then the relative odds are increased by this variable (e.g. Single).
From the output, we can see that all five independent variables have p-values less than 0.05, indicating that there is sufficient evidence that the parameters are not zero using a significance level of 0.05. The coefficient of 0.9699 for Single represents the estimated change in the log of P(default)/P(not default) when the subject is single compared to when he/she is not single, with the other independent variables held constant. The coefficient of –0.019388 for Debt is the estimated change in the log of P(default)/P(not default) with a $1000 increase in Debt, with the other independent variables held constant.
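Because each coefficient is a change in the log odds, exponentiating it gives the corresponding odds ratio. The sketch below applies this to the two coefficients quoted above. The other coefficients and the intercept are not given in the text, so they are left out.

```python
import math

# Coefficients quoted in the text (changes in the log odds of default).
coefficients = {"Single": 0.9699, "Debt": -0.019388}

# The odds ratio is exp(coefficient): the multiplicative change in the odds of
# default for a one-unit increase, with the other variables held constant.
odds_ratios = {name: math.exp(b) for name, b in coefficients.items()}

for name, ratio in odds_ratios.items():
    print(f"{name}: odds ratio = {ratio:.4f}")
```

An odds ratio above 1 (Single) means the odds of default increase. An odds ratio below 1 (Debt) means they decrease.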
Assumptions in Logit Regression
Minitab displays the last Log-Likelihood from the maximum likelihood iterations along with the statistic G. This statistic tests the null hypothesis that all the coefficients associated with predictors equal zero versus these coefficients not all being equal to zero. In this example, G = 283.811, with a p-value of 0.000, indicating that there is sufficient evidence that at least one of the coefficients is different from zero. The G test is analogous to the F test in regular regression. It can be used to determine whether any of the coefficients are significantly different from zero, or whether a full model is significantly better than a reduced model.
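As a rough check on the reported p-value, G can be compared with a chi-square distribution whose degrees of freedom equal the number of coefficients being tested. The sketch below assumes 5 degrees of freedom (one per independent variable in this example) and uses the closed-form upper-tail probability for that case. The helper name chi2_sf_5df is hypothetical.

```python
import math


def chi2_sf_5df(x):
    """Upper-tail probability P(X > x) for a chi-square variable with 5 df."""
    root = math.sqrt(x)
    # Closed form for odd degrees of freedom: a normal-tail term plus a
    # finite series (two terms when df = 5).
    normal_part = math.erfc(root / math.sqrt(2.0))
    series_part = (
        math.sqrt(2.0 / math.pi) * math.exp(-x / 2.0) * (root + root ** 3 / 3.0)
    )
    return normal_part + series_part


G = 283.811  # test statistic reported in the Minitab output
p_value = chi2_sf_5df(G)
print(f"p-value for G = {G}: {p_value:.3e}")
```

The result is far below any conventional significance level, which is why Minitab displays it as 0.000.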
This Table of Observed and Expected Frequencies allows us to see how well the model fits the data by comparing the observed and expected frequencies. There is evidence here that the model fits the data well, as the observed and expected frequencies are similar. This supports the conclusions made by the Goodness of Fit Tests.
This table is calculated by pairing the observations with different response values. Here, you have 153 individuals in default and 847 not in default, resulting in 153 * 847 = 129,591 pairs with different response values. Based on the model, a pair is concordant if the individual in default has the higher predicted probability of being in default, discordant if the opposite is true, and tied if the predicted probabilities are equal. 90.2% of pairs are concordant, 9.7% are discordant, and 0.2% are tied. Somers' D, Goodman-Kruskal Gamma, and Kendall's Tau are summaries of this table. These measures can range from -1 to 1, but for a model with any predictive ability they lie between 0 and 1, where larger values indicate that the model has a better predictive ability.
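The pairing logic can be sketched as follows. The predicted probabilities are hypothetical and are not the 1,000 observations from the example. Somers' D is computed as (concordant - discordant) / all pairs and Goodman-Kruskal Gamma as (concordant - discordant) / (concordant + discordant). Kendall's Tau is omitted for brevity.

```python
from itertools import product


def concordance_summary(probs_event, probs_nonevent):
    """Classify every (event, non-event) pair by predicted probability."""
    concordant = discordant = tied = 0
    for p_event, p_non in product(probs_event, probs_nonevent):
        if p_event > p_non:
            concordant += 1  # the observed event got the higher probability
        elif p_event < p_non:
            discordant += 1
        else:
            tied += 1
    pairs = concordant + discordant + tied
    untied = concordant + discordant
    return {
        "pairs": pairs,
        "concordant": concordant,
        "discordant": discordant,
        "tied": tied,
        "somers_d": (concordant - discordant) / pairs,
        "gamma": (concordant - discordant) / untied if untied else 0.0,
    }


# Hypothetical predicted probabilities of default.
in_default = [0.8, 0.6, 0.3]
not_in_default = [0.1, 0.3, 0.5, 0.2]
summary = concordance_summary(in_default, not_in_default)
print(summary)
```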
Making Predictions
Summary
• Binary Logistic Regression
• Why?
• Theoretical and practical difficulties in using regular (continuous) dependent variables
• How?
• Minitab procedure
• Interpreting results
• Some diagnostics
• Making predictions
• Comparison with regular regression model
For Sessions 11 and 12
• Student presentations