Logistic Regression: An Analysis of Categorical DV and IV

Sociology 690: Multivariate Analysis, Logistic Regression

The Analysis of Categories

DV \ IV | Category | Quantity | Model family
Quantity | 1) Analysis of Variance Models (ANOVA) | 2) Structural Equation Models (SEM) | Linear Models
Category | 3) Log Linear Models (LLM) | 4) Logistic Regression Models (LRM) | Category Models

What Is Logistic Regression?
• Logistic regression is typically used as an extension of multiple regression, adapted to situations where the DV is categorical and the IVs are continuous.
• It is not, however, restricted to quantitative IVs. To the extent that the IVs are themselves categorical, logistic regression can be thought of as an extension of log linear modeling in which we distinguish between the IVs and the DV (a log linear model treats all of its variables symmetrically).
• If the categorical DV is dichotomous (two outcomes), the technique is called binomial (or binary) logistic regression. If the DV has more than two categories, it is called multinomial logistic regression. If the DV categories can be ranked, it is called ordinal logistic regression.

The Premise of Logistic Regression
• Logistic regression is similar to OLS regression, except that the IVs are used to predict the probability, the odds, and the logarithm of the odds of a categorical DV, rather than specific values of a quantitative DV.
• For example, age and income become predictors of the “likelihood” of a dichotomous DV such as union membership, rather than of a quantitative variable such as occupational prestige, which would be predicted in a multiple regression or path analysis.

Probability and Odds
• Consider the following distribution of union membership for 650 respondents: members = 212; non-members = 438.
• The “probability” of being a member is simply the number of members (the outcome of interest) expressed as a proportion of the total (e.g., P(M) = 212 / 650 = .326).
• The “odds” are the ratio of the probability P(x) to its complement, 1 - P(x). Using the example above, the odds of being a member are P(M) / (1 - P(M)) = .326 / .674 = .484.
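The arithmetic on this slide can be checked with a short Python sketch that uses the slide's own counts. It also shows that the odds equal the ratio of members to non-members, since the total of 650 cancels out.

```python
import math

# Union membership counts from the slide (650 respondents)
members = 212
non_members = 438
total = members + non_members

# Probability: the outcome of interest as a proportion of all cases
p_member = members / total

# Odds: the probability divided by its complement, P / (1 - P)
odds_member = p_member / (1 - p_member)

# The same odds can be read directly off the counts: members / non-members
assert math.isclose(odds_member, members / non_members)

print(f"P(M) = {p_member:.3f}")     # P(M) = 0.326
print(f"odds = {odds_member:.3f}")  # odds = 0.484
```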

The Logit of Logistic Regression
• The quantity analyzed by logistic regression is the log of the odds. In our example, the odds were .484, and the log of the odds is ln(.484) = -.726. This is called a “logit”: the natural logarithm of the odds of being in that category.
• In our union membership example, we might want to know the effect that a one-unit change in age or income has on predicting union membership. In logistic regression this effect is expressed as an odds ratio, which SPSS reports as Exp(B), i.e., e raised to the power of the coefficient B. The odds ratio is defined as the ratio of the odds of being classified in one category of the DV for two different values of the IV (for Exp(B), two values one unit apart).
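A small Python sketch can make both ideas concrete. The first part takes the natural log of the union membership odds. The second part assumes a hypothetical model, ln(odds) = b0 + b1 × age, with made-up values for b0 and b1, to show that the odds ratio for a one-year change in age is the same at every age and equals e raised to b1, the quantity SPSS labels Exp(B).

```python
import math

# Logit of union membership: natural log of the odds (212 members vs. 438 non-members)
odds_member = 212 / 438
logit = math.log(odds_member)
print(f"logit = {logit:.3f}")  # logit = -0.726

# Hypothetical logit model ln(odds) = b0 + b1 * age; b0 and b1 are made up
b0, b1 = -2.0, 0.03

def odds_at(age):
    """Odds of membership implied by the hypothetical model at a given age."""
    return math.exp(b0 + b1 * age)

# The odds ratio for a one-year change in age does not depend on the starting age
for age in (25, 45, 65):
    print(f"age {age} -> {age + 1}: odds ratio = {odds_at(age + 1) / odds_at(age):.4f}")

# That constant odds ratio equals e**b1, the quantity SPSS labels Exp(B)
print(f"Exp(B) = {math.exp(b1):.4f}")
```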

The Formulae of Logistic Regression
• Taking the definitions of probability, odds and logits into account, we produce a formula that is equivalent to a regression equation and is characterized by the value π, the probability of being in the DV category of interest (B0 is the constant):

$$\ln\left[\frac{\pi}{1-\pi}\right] = B_0 + B_1X_1 + B_2X_2 + \dots + B_kX_k$$

• Estimating the B coefficients is a somewhat involved (iterative, maximum likelihood) calculation, but for us, let’s look at π as a number that gets us to where we can comment on the probability of observing one outcome or another on the DV, given the best linear combination of IV predictors. Solving the equation above for π gives:

$$\pi = \frac{e^{B_0 + B_1X_1 + \dots + B_kX_k}}{1 + e^{B_0 + B_1X_1 + \dots + B_kX_k}}$$
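Because the log of the odds is linear in the IVs, the relationship can be inverted to turn any linear combination of IVs, L, into a predicted probability, π = 1 / (1 + e^(-L)). The Python sketch below does this; the coefficients b0, b1, b2 and the case (age 40, 45 hours worked) are hypothetical values chosen only for illustration, not estimates from any real data.

```python
import math

def inverse_logit(linear_predictor):
    """Convert a logit (log odds) back to a probability: 1 / (1 + e^-L)."""
    return 1.0 / (1.0 + math.exp(-linear_predictor))

def logit(pi):
    """Log odds of a probability: ln(pi / (1 - pi))."""
    return math.log(pi / (1.0 - pi))

# Hypothetical coefficients, for illustration only
b0, b1, b2 = -1.5, -0.01, 0.05   # constant, age, hours worked

age, hours = 40, 45
L = b0 + b1 * age + b2 * hours   # the linear combination of IVs (the predicted logit)
pi = inverse_logit(L)
print(f"L = {L:.3f}, predicted probability = {pi:.3f}")

# Round trip: taking the logit of pi recovers the linear combination
assert math.isclose(logit(pi), L)
```

A logit of 0 corresponds to a probability of .5; positive logits give probabilities above .5 and negative logits give probabilities below it.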

Example of Logistic Regression
• Suppose we had a dichotomous dependent variable such as job satisfaction (satisfied vs. not satisfied), and wanted to know how well age and hours worked (as IVs) predict the likelihood of being satisfied with one’s job or not (i.e., the likelihood of being in one category or the other).
• This would be equivalent to a multiple regression analysis if job satisfaction were a continuous dependent variable. But it is not. Therefore, we use the binary logistic regression procedure to obtain the equivalents of beta weights, the multiple R, and residuals.

SPSS Input for Logistic Regression
• In SPSS, this procedure is accessed through the menus Analyze > Regression > Binary Logistic.
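To see roughly what the Binary Logistic procedure does behind the menus, here is a minimal Python sketch that estimates the coefficients by maximum likelihood using Newton-Raphson iterations. The ten cases of age, hours worked, and job satisfaction are made up for illustration, and Newton-Raphson is one standard way to reach the maximum likelihood estimates; the sketch is not claimed to reproduce SPSS's internal algorithm or its output.

```python
import math

# Hypothetical toy data (made up for illustration): age, hours worked, satisfied (1) or not (0)
data = [
    (25, 20, 0), (30, 40, 1), (35, 35, 0), (40, 45, 1), (45, 50, 1),
    (50, 30, 0), (55, 40, 1), (60, 25, 0), (28, 45, 0), (52, 48, 1),
]
X = [[1.0, age, hours] for age, hours, _ in data]  # leading 1.0 carries the constant
y = [sat for _, _, sat in data]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_logistic(X, y, max_iter=100, tol=1e-10):
    """Maximum likelihood estimates of the B coefficients via Newton-Raphson."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        # Predicted probabilities from the current coefficients
        p = [1.0 / (1.0 + math.exp(-sum(b * x for b, x in zip(beta, row)))) for row in X]
        # Gradient of the log-likelihood: X'(y - p)
        grad = [sum((yi - pi) * row[j] for row, yi, pi in zip(X, y, p)) for j in range(k)]
        # Information matrix (negative Hessian): X'WX with weights p(1 - p)
        info = [[sum(pi * (1 - pi) * row[i] * row[j] for row, pi in zip(X, p))
                 for j in range(k)] for i in range(k)]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta

beta = fit_logistic(X, y)
for name, b in zip(("constant", "age", "hours"), beta):
    print(f"{name:>8}: B = {b:8.4f}  Exp(B) = {math.exp(b):.4f}")
```

At the maximum likelihood solution the gradient X'(y - p) is zero, so the predicted probabilities sum to the number of satisfied cases; that property is a quick check that the iterations converged.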

Output 1 for Logistic Regression
There are two important pieces of output to review in assessing the effect of the IVs. The first is a classification table that uses the predicted probabilities to generate predicted frequencies for each category of the DV. By comparing these with the observed frequencies, we can determine the percentage of cases correctly classified when our IVs are used to predict DV outcomes.
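A classification table can be sketched in a few lines of Python. The observed outcomes and predicted probabilities below are hypothetical, and the 0.5 cut value mirrors SPSS's default classification cutoff; assigning cases at or above the cut to category 1 is an assumption of this sketch (none of the made-up probabilities falls exactly on the cut).

```python
from collections import Counter

# Hypothetical observed outcomes and predicted probabilities (not from real output)
observed = [1, 0, 1, 1, 0, 0, 1, 0]
predicted_prob = [0.81, 0.35, 0.62, 0.44, 0.52, 0.18, 0.90, 0.30]
cut_value = 0.5

# Predicted category: 1 when the predicted probability reaches the cut value
predicted = [1 if p >= cut_value else 0 for p in predicted_prob]

# Cross-tabulate (observed, predicted) pairs
table = Counter(zip(observed, predicted))

print("Observed  Predicted 0  Predicted 1  % correct")
for obs in (0, 1):
    row_total = table[(obs, 0)] + table[(obs, 1)]
    pct = 100 * table[(obs, obs)] / row_total
    print(f"{obs:>8}  {table[(obs, 0)]:>11}  {table[(obs, 1)]:>11}  {pct:9.1f}")

overall = 100 * sum(table[(k, k)] for k in (0, 1)) / len(observed)
print(f"Overall percentage correct: {overall:.1f}")  # 75.0
```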

Output 2 for Logistic Regression
The second piece of output to examine is the table of coefficients. Here it would show the coefficient (B) for each variable and indicate that the log odds of being satisfied are marginally lower for each additional year of age and marginally higher for each additional hour worked in the previous week. However, because its coefficient is not statistically significant, age is a weak predictor.
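Significance in the coefficients table is usually judged with the Wald statistic, (B / S.E.)², which SPSS reports alongside df, Sig. and Exp(B). The minimal Python sketch below uses hypothetical B and S.E. values (not from any real output) to show how those columns relate; with 1 df, the chi-square p-value equals the two-sided normal p-value for z = B / S.E.

```python
import math

def wald_summary(b, se):
    """Wald chi-square (1 df), its p-value, and Exp(B) for one coefficient."""
    z = b / se
    wald = z ** 2
    # For 1 df, P(chi-square > z**2) equals the two-sided normal p-value
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return wald, p_value, math.exp(b)

# Hypothetical coefficients and standard errors, for illustration only
coefficients = {"age": (-0.012, 0.010), "hours": (0.035, 0.012)}

for name, (b, se) in coefficients.items():
    wald, p_value, exp_b = wald_summary(b, se)
    print(f"{name:>5}: B = {b:.3f}  S.E. = {se:.3f}  Wald = {wald:.2f}  "
          f"Sig. = {p_value:.3f}  Exp(B) = {exp_b:.3f}")
```

In this made-up case, age has a negative B (Exp(B) below 1, so slightly lower odds of satisfaction per year) but a Sig. value well above .05, while hours worked is both positive and significant.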
