Recall the simple linear regression model:

y = b0 + b1x + e

where we are trying to predict a continuous dependent variable y from a continuous independent variable x. This model can be extended to the multiple linear regression model:

y = b0 + b1x1 + b2x2 + … + bpxp + e

Here we are trying to predict a continuous dependent variable y from several continuous independent variables x1, x2, … , xp.
Now suppose the dependent variable y is binary: it takes on two values, "Success" (1) or "Failure" (0). We are interested in predicting y from a continuous independent variable x. This is the situation in which logistic regression is used.
Example: We are interested in how the success (y) of a new antibiotic cream in curing "acne problems" depends on the amount (x) that is applied daily. The values of y are 1 (Success) or 0 (Failure); the values of x range over a continuum.
The Logistic Regression Model

Let p denote P[y = 1] = P[Success]. This quantity will increase with the value of x.

The ratio

p / (1 − p)

is called the odds ratio. This quantity will also increase with the value of x, ranging from zero to infinity.

The quantity

ln( p / (1 − p) )

is called the log odds ratio.
Example: odds ratio, log odds ratio

Suppose a die is rolled: Success = "roll a six", so p = 1/6.

The odds ratio: (1/6) / (5/6) = 1/5 = 0.2

The log odds ratio: ln(0.2) ≈ −1.61
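The die arithmetic above can be checked with a short sketch (the helper names odds and log_odds are ours, not from the slides):

```python
import math

def odds(p):
    """Odds ratio p / (1 - p)."""
    return p / (1 - p)

def log_odds(p):
    """Log odds ratio ln(p / (1 - p)), also called the logit of p."""
    return math.log(odds(p))

# Die example: Success = "roll a six", p = 1/6
p = 1 / 6
print(odds(p))      # 1/5 = 0.2
print(log_odds(p))  # ln(0.2), about -1.61
```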
The Logistic Regression Model

Assumes the log odds ratio is linearly related to x, i.e.:

ln( p / (1 − p) ) = b0 + b1x

In terms of the odds ratio:

p / (1 − p) = e^(b0 + b1x)
The Logistic Regression Model

Solving for p in terms of x:

p = e^(b0 + b1x) / (1 + e^(b0 + b1x))

or

p = 1 / (1 + e^(−(b0 + b1x)))
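The solved-for-p formula is just the logistic function of b0 + b1x. A minimal sketch, with coefficients b0 and b1 chosen purely for illustration:

```python
import math

def logistic_p(x, b0, b1):
    """Success probability p = 1 / (1 + exp(-(b0 + b1*x)))."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

# Hypothetical coefficients, for illustration only
b0, b1 = -4.0, 2.0

print(logistic_p(0.0, b0, b1))  # small p when b0 + b1*x is very negative
print(logistic_p(2.0, b0, b1))  # p = 0.5 exactly when b0 + b1*x = 0
```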
Interpretation of the parameter b0 (determines the intercept)

At x = 0 the model gives p = e^(b0) / (1 + e^(b0)), so b0 fixes where the curve sits at x = 0.

Interpretation of the parameter b1 (determines, along with b0, when p is 0.50)

p = 0.50 when b0 + b1x = 0, i.e. when x = −b0 / b1.

Also, since dp/dx = b1 p(1 − p), the rate of increase in p with respect to x when p = 0.50 is b1/4: b1 determines the slope of the curve where p = 0.50.
The data

The data will, for each case, consist of:
• a value for x, the continuous independent variable
• a value for y (1 or 0) (Success or Failure)

Total of n = 250 cases.
Estimation of the parameters

The parameters are estimated by Maximum Likelihood estimation, which requires a statistical package such as SPSS.
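The slides use SPSS for the fitting itself. As a rough illustration of what maximum likelihood estimation is doing, here is a minimal gradient-ascent sketch in Python on a small synthetic data set (the data, step count, and learning rate are made up for illustration, not taken from the slides):

```python
import math
import random

def logistic_p(x, b0, b1):
    """p = 1 / (1 + exp(-(b0 + b1*x)))."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def fit_mle(xs, ys, steps=5000, lr=0.1):
    """Maximise the log-likelihood sum of y*ln(p) + (1-y)*ln(1-p)
    by gradient ascent on (b0, b1), starting from (0, 0)."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        ps = [logistic_p(x, b0, b1) for x in xs]
        g0 = sum(y - p for y, p in zip(ys, ps)) / n
        g1 = sum((y - p) * x for x, y, p in zip(xs, ys, ps)) / n
        b0 += lr * g0
        b1 += lr * g1
    return b0, b1

# Synthetic data: success becomes more likely as x grows
random.seed(0)
xs = [i / 10 for i in range(100)]
ys = [1 if random.random() < logistic_p(x, -4.0, 1.0) else 0 for x in xs]

b0_hat, b1_hat = fit_mle(xs, ys)
print(b0_hat, b1_hat)  # estimates of the generating values -4 and 1
```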
Using SPSS to perform Logistic regression Open the data file:
Choose from the menu: Analyze -> Regression -> Binary Logistic
The following dialogue box appears. Select the dependent variable (y) and the independent variable (x) (covariate). Press OK.
Here is the output: the estimates and their standard errors (S.E.).
Interpretation of the parameter b0 (determines the intercept): at x = 0, p = e^(b0) / (1 + e^(b0)). Interpretation of the parameter b1 (determines, along with b0, when p is 0.50): p = 0.50 when x = −b0 / b1.
Another interpretation of the parameter b1: b1/4 is the rate of increase in p with respect to x when p = 0.50.
Here we attempt to predict the outcome of a binary response variable Y from several independent variables X1, X2, … , Xp.
Multiple Logistic Regression: an example

In this example we are interested in determining the risk that infants who were born prematurely develop BPD (bronchopulmonary dysplasia). More specifically, we are interested in developing a predictive model which will determine the probability of developing BPD from X1 = gestational age and X2 = birth weight.
For n = 223 infants in a prenatal ward the following measurements were determined:
• X1 = gestational age (weeks)
• X2 = birth weight (grams)
• Y = presence of BPD
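The multiple model extends the single-covariate formula to p = 1 / (1 + exp(−(b0 + b1x1 + b2x2))). A minimal sketch of evaluating such a model, with made-up coefficients (the real estimates would come from fitting the BPD data in a package such as SPSS):

```python
import math

def multi_logistic_p(x1, x2, b0, b1, b2):
    """p = 1 / (1 + exp(-(b0 + b1*x1 + b2*x2))) for two covariates."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x1 + b2 * x2)))

# Hypothetical coefficients for illustration only -- NOT fitted BPD estimates.
# The negative signs encode the idea that risk falls as gestational age
# and birth weight increase.
b0, b1, b2 = 20.0, -0.5, -0.002

# Predicted BPD probability for an infant of 28 weeks, 1200 g
print(multi_logistic_p(28.0, 1200.0, b0, b1, b2))

# A more mature, heavier infant (34 weeks, 2200 g) gets a lower predicted risk
print(multi_logistic_p(34.0, 2200.0, b0, b1, b2))
```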