LOGISTIC REGRESSION. A statistical procedure that relates the probability of an event to explanatory variables. Used in epidemiology to describe and evaluate the effect of a risk factor on the occurrence of a disease event. Example: Framingham Heart Study, coronary heart disease and blood pressure.
LOGISTIC REGRESSION: AN EXAMPLE
Event: coronary heart disease (CHD). Occurrence is the dependent variable, which takes 2 values: Yes or No.
Risk factor: blood pressure. Systolic blood pressure (SBP) is the independent variable X, a continuous measurement.
The probability of getting coronary heart disease depends on blood pressure.
PROPORTION WITH CHD BY SBP GROUP
Systolic BP range   CHD cases / total   Proportion
130-149 mmHg        0/3                 0.00
150-169 mmHg        2/4                 0.50
170-189 mmHg        3/3                 1.00
LOGISTIC REGRESSION PROBABILITY MODEL
$$p(X) = \frac{1}{1 + \exp(-b_0 - b_1 X)}$$
The probability of the event varies as an S-shaped function of the risk factor X: the logistic curve.
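To make the S-shape concrete, the following minimal sketch evaluates the probability model over a range of SBP values. It assumes the CHD coefficients quoted elsewhere in these slides (b0 = -6.082, b1 = 0.0243); the function name `logistic_probability` is only illustrative.

```python
import math

# Coefficients quoted in these slides for CHD vs. SBP (natural-log scale).
B0 = -6.082
B1 = 0.0243


def logistic_probability(x, b0=B0, b1=B1):
    """Return p(X) = 1 / (1 + exp(-b0 - b1*X))."""
    return 1.0 / (1.0 + math.exp(-b0 - b1 * x))


# The curve is bounded by 0 and 1 and crosses 0.5 where b0 + b1*X = 0.
midpoint = -B0 / B1

if __name__ == "__main__":
    for sbp in (100, 140, 180, 220, 260, 300):
        print(f"SBP {sbp:3d} mmHg -> p = {logistic_probability(sbp):.3f}")
    print(f"p = 0.5 at SBP = {midpoint:.1f} mmHg")
```

The probability rises with X whenever b1 > 0 and never leaves the interval (0, 1), which is what makes the logistic curve suitable for modelling a yes/no outcome.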
LOGISTIC CURVE MODEL: OCCURRENCE OF CHD AS A FUNCTION OF SBP
LOGISTIC MODEL: LOG ODDS
$$\log \frac{p(X)}{1 - p(X)} = b_0 + b_1 X$$
The log of the odds of the event is a linear function of X.
Log(odds of CHD) = -6.08 + 0.0243(SBP)
ODDS The odds of an event are the chance that the event occurs divided by the chance that it does not occur: Odds = p/(1 - p) = p/q, where q = 1 - p.
b1: KEY PARAMETER OF THE LOGISTIC MODEL
$$\log \frac{p(X)}{1 - p(X)} = b_0 + b_1 X$$
The parameter b1 is like the slope of a linear regression model. b1 = 0 indicates that X has no effect on the probability, e.g., a man’s chance of CHD does not depend on his SBP.
b1: KEY PARAMETER
$$\log \frac{p(X)}{1 - p(X)} = b_0 + b_1 X$$
The coefficient b1 measures the amount of change in the log of the odds per unit change in X.
b1: KEY PARAMETER
$$\log \text{odds}(X+1) = b_0 + b_1 (X+1) = b_0 + b_1 X + b_1$$
$$\log \text{odds}(X) = b_0 + b_1 X$$
Difference in log odds = b1
E.g., the log of the odds of getting CHD increases by 0.0243 for an increase of 1 mmHg of systolic blood pressure. (Hard to explain to a patient!)
THE COEFFICIENT b1 AND THE ODDS RATIO
The difference in log odds given by b1 translates into the odds ratio (OR).
exp(b1) = OR = ratio of the odds at risk level X+1 to the odds at risk level X
b1 = 0 corresponds to OR = 1.
THE COEFFICIENT b1 AND THE ODDS RATIO
For example, the odds of CHD are multiplied by the factor exp(0.0243) = 1.025 for every increase of 1 mmHg in SBP. A difference of 10 mmHg multiplies the odds of CHD by $\exp(0.0243)^{10} = \exp(0.243)$, or 1.275.
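The odds-ratio arithmetic can be checked with a short sketch. It assumes the coefficients b0 = -6.082 and b1 = 0.0243 quoted in these slides; the helper `odds` is a hypothetical name introduced here.

```python
import math

B0, B1 = -6.082, 0.0243  # coefficients quoted in these slides


def odds(x, b0=B0, b1=B1):
    """Odds of the event at risk level x: exp(b0 + b1*x)."""
    return math.exp(b0 + b1 * x)


or_per_1_mmhg = math.exp(B1)        # odds ratio for a 1 mmHg increase
or_per_10_mmhg = math.exp(10 * B1)  # odds ratio for a 10 mmHg increase

# The ratio odds(X+1) / odds(X) is the same at every starting level X.
ratio_at_120 = odds(121) / odds(120)
ratio_at_180 = odds(181) / odds(180)

if __name__ == "__main__":
    print(f"OR per 1 mmHg:  {or_per_1_mmhg:.4f}")
    print(f"OR per 10 mmHg: {or_per_10_mmhg:.4f}")
    print(f"odds(121)/odds(120) = {ratio_at_120:.4f}")
    print(f"odds(181)/odds(180) = {ratio_at_180:.4f}")
```

Because the ratio does not depend on the starting level of X, a single number, exp(b1), summarizes the effect of the risk factor on the odds scale.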
ESTIMATION OF THE PARAMETERS Technique: Maximum likelihood estimation For large sample sizes, the normal distribution is used to put a confidence interval around the estimate of the coefficient b1.
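As a rough illustration of maximum likelihood estimation, the sketch below fits b0 and b1 by Newton-Raphson and derives a normal-theory standard error for b1. The ten individual SBP values are hypothetical (the slides give only grouped counts), the function name `fit_logistic` is an assumption, and a sample this small is far from the large-sample setting in which the normal approximation is reliable.

```python
import math

# HYPOTHETICAL individual-level data (not from the slides): SBP in mmHg and
# CHD status (1 = yes, 0 = no), chosen so that cases and non-cases overlap.
SBP = [135, 140, 145, 158, 165, 152, 160, 172, 180, 185]
CHD = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]


def fit_logistic(xs, ys, iterations=25):
    """Maximize the likelihood of log-odds = b0 + b1*x by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p          # gradient with respect to b0
            g1 += (y - p) * x    # gradient with respect to b1
            h00 += w             # entries of the 2x2 information matrix
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve (information matrix) * delta = gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    # Standard error of b1 from the inverse information matrix.
    se_b1 = math.sqrt(h00 / det)
    return b0, b1, se_b1


if __name__ == "__main__":
    b0, b1, se_b1 = fit_logistic(SBP, CHD)
    low, high = b1 - 1.96 * se_b1, b1 + 1.96 * se_b1
    print(f"b0 = {b0:.3f}, b1 = {b1:.4f}, SE(b1) = {se_b1:.4f}")
    print(f"approximate 95% CI for b1: ({low:.4f}, {high:.4f})")
```

In practice a statistics package performs this fit; the loop is shown only to expose what "maximum likelihood" and "normal-based confidence interval" mean computationally.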
HYPOTHESIS TESTING
H0: b1 = 0
No difference in risk at different levels of the risk factor X. No association between risk factor X and probability of occurrence.
HYPOTHESIS TESTING
Ha: b1 ≠ 0, or b1 > 0 (risk increases with X), or b1 < 0 (risk goes down as X increases)
HYPOTHESIS TESTING
H0: OR = 1
Ha: OR ≠ 1, or OR > 1 (risk increases with X), or OR < 1 (X is protective)
RESULTS OF LOGISTIC REGRESSION
The OR, with its confidence interval and p value, indicates whether there is a significant association between the level of the risk factor and the chance of occurrence.
OR = 1.025 (1.015, 1.034), p < 0.001
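The reported interval and p value can be related to each other with a short sketch. It assumes, since the slide does not say so, that the interval is a 95% normal-based (Wald) interval computed on the log-odds scale; under that assumption the standard error of b1 can be backed out and a two-sided test of H0: b1 = 0 carried out.

```python
import math

# Values reported on the slide.
or_hat, ci_low, ci_high = 1.025, 1.015, 1.034

# ASSUMPTION: the interval is a 95% Wald interval on the log-odds scale.
Z_95 = 1.96

b1 = math.log(or_hat)
se_b1 = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)

# Wald statistic and two-sided p value from the standard normal distribution.
z = b1 / se_b1
p_two_sided = math.erfc(abs(z) / math.sqrt(2))

if __name__ == "__main__":
    print(f"b1 = {b1:.4f}, SE = {se_b1:.4f}, z = {z:.2f}, p = {p_two_sided:.2e}")
```

Under these assumptions the p value comes out well below 0.001, in line with the reported result, and the interval excluding 1 leads to the same conclusion as the test.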
RESULTS OF LOGISTIC REGRESSION
Can be used to predict an individual’s risk, e.g., the prob. of CHD when SBP = 180:
p/q = exp{-6.082 + 0.0243(180)} = exp(-1.708) ≈ 0.181
Solve for p: p = 0.181/(1 + 0.181), so prob. of CHD ≈ 0.153
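The "solve for p" step can be sketched as follows, using the coefficients on this slide; the function name `chd_probability` is hypothetical. With these coefficients the sketch returns about 0.153 at SBP = 180 and about 0.124 at SBP = 170.

```python
import math

B0, B1 = -6.082, 0.0243  # coefficients quoted on this slide


def chd_probability(sbp):
    """Convert the fitted log odds at a given SBP into a probability."""
    log_odds = B0 + B1 * sbp
    odds = math.exp(log_odds)   # p/q
    return odds / (1.0 + odds)  # p = odds / (1 + odds)


if __name__ == "__main__":
    for sbp in (170, 180):
        print(f"SBP {sbp} mmHg -> predicted probability {chd_probability(sbp):.3f}")
```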
MULTIVARIATE LOGISTIC REGRESSION
Model with additional risk factors:
$$\log \frac{p(X)}{1 - p(X)} = b_0 + b_1 X_1 + b_2 X_2 + b_3 X_3$$
Log(odds of CHD) = b0 + b1(SBP) + b2(CHOL) + b3(smoker)
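A minimal sketch of how the multivariate model is used follows. The slides give no fitted values for this model, so every coefficient below is hypothetical and chosen only for illustration; the point is that exp(b3) is the odds ratio for smoking with SBP and cholesterol held fixed.

```python
import math

# HYPOTHETICAL coefficients for illustration only; the slides report
# no fitted values for the multivariate model.
COEF = {"intercept": -7.0, "sbp": 0.02, "chol": 0.005, "smoker": 0.6}


def chd_probability(sbp, chol, smoker):
    """p from log odds = b0 + b1*SBP + b2*CHOL + b3*smoker (smoker is 0 or 1)."""
    log_odds = (COEF["intercept"] + COEF["sbp"] * sbp
                + COEF["chol"] * chol + COEF["smoker"] * smoker)
    return 1.0 / (1.0 + math.exp(-log_odds))


def odds(p):
    return p / (1.0 - p)


# Same SBP and cholesterol, differing only in smoking status.
p_nonsmoker = chd_probability(150, 220, 0)
p_smoker = chd_probability(150, 220, 1)
adjusted_or = odds(p_smoker) / odds(p_nonsmoker)  # equals exp(b3)

if __name__ == "__main__":
    print(f"non-smoker: {p_nonsmoker:.3f}, smoker: {p_smoker:.3f}")
    print(f"adjusted OR for smoking: {adjusted_or:.3f}")
```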