190 likes | 359 Views
Introduction to logistic regression. When and why to use Logistic Regression?. The response variable has to be binary or ordinal. Predictors can be continuous, discrete, or combinations of variables. Predictors do not have to be normally distributed
E N D
When and why to use Logistic Regression? • The response variable has to be binary or ordinal. • Predictors can be continuous, discrete, or combinations of variables. • Predictors do not have to be normally distributed • Predictors does not have to be linearly related. • estimated probabilities lie between 0 and 1. • Non-linear relationships between the response and predictors
When and why to use Logistic Regression? • A non-parametric method that requires no specific distribution of the errors. • Offers easy model-building or variable selection procedures. • Parameter estimates are obtained by maximum likelihood methods
{ logit of y The Logistic Model • If (p) is the probability of the event , then odds of the event is : • The simple logistic model is based on a linear relationship between natural logarithm (ln) of the odds of an event and independent variables
The Logistic Model • Using the laws of exponents and logs to express (p) in terms of L : Logistic Function and :
The Logistic Model probability (s shaped curve) L=ln(o)
Basic interpretation of . • When x1 = x and x2 = x +1, then the log odds changes by amount • which means that, the odds becomes exp() times the original.
Data Form The data could be collected either : • Individually ( binary data ) • As a group (if there are more observations on each x value) In this case, it is sufficient to report the total number of ‘1’s at each x value .
Example 1: binary data OR P(mastitis)=1
OR compares the odds of an event for two cows, one with and the one with With X values 1 unit apart :
Odds ratios • Odds ratios range from 0 to positive infinity • O R < 1 = (P) < .50 , • OR > 1 = ( p ) > .50 .
Deviance • Measure of deviation between the estimated and observed values • analogous to SS residual for linear model
95% confidence limits ? 95% confidence interval around the odds ratio :
Summary of main points • Logistic regression model is used to analyze the association between a binary outcome and one or many determinants. • The determinants can be binary, categorical or continuous measurements • The model is logit (p) = log[p / (1-p)] = a + bX where X is a factor, and a and b must be estimated