Logistic regression analysis. Martin van der Esch, PhD. Discovering statistics using SPSS Andy Field http://www.youtube.com/watch?v=OvQShzJ7Sns (part 1) http://www.youtube.com/watch?v=zdJhydkcqv4 (part 2) http://www.youtube.com/watch?v=hxcDOoupB4Y (part 3) etc.
Logistic regression analysis • The basic principle of logistic regression is much the same as in linear regression analysis • The aim is to predict a transformation of the dichotomous dependent variable • This transformation is the logit transformation
Steps to follow • Step 1: start from the simple linear regression equation for a binary dependent variable • Step 2: formulate the estimated probability of Y • Step 3: in logistic regression, express the estimated probability as odds
Steps to follow 2 • Step 4: in case of skewed data (right-skewed), apply the logit transformation, which gives the log odds • Step 5: different ways of presentation: the estimated probability p can be calculated from a combination of variables
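As a sketch, the five steps are usually written as follows in standard textbook notation. The symbols $b_0$, $b_1$, $X$ and $\hat{p}$ are assumptions chosen for illustration, not notation taken from the slides:

```latex
\begin{align*}
\text{Step 1:}\quad & Y = b_0 + b_1 X \\
\text{Step 2:}\quad & \hat{p} = P(Y = 1) \\
\text{Step 3:}\quad & \text{odds} = \frac{\hat{p}}{1 - \hat{p}} \\
\text{Step 4:}\quad & \operatorname{logit}(\hat{p}) = \ln\!\left(\frac{\hat{p}}{1 - \hat{p}}\right) = b_0 + b_1 X \\
\text{Step 5:}\quad & \hat{p} = \frac{1}{1 + e^{-(b_0 + b_1 X)}}
\end{align*}
```

Step 5 is simply Step 4 solved for $\hat{p}$, which is why a probability can be recovered from any combination of predictor values.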
Binary instead of continuous outcome • We are interested in a binary outcome measure • For example: heart attack • Y = 0 (“no”) • Y = 1 (“yes”)
… and we want But, how do we get there…?
Analysing a binary variable (Y) as if it were a continuous variable • Not possible, because Y (heart attack) is no or yes (0 or 1)
Possible… • [Figure: relation between age (x-axis, 20 to 90) and probability of heart attack, p(y=1) (y-axis, 0 to 1)]
Use of the logistic model • The dichotomous outcome event itself is NOT modelled • Instead, the probability of the outcome event is modelled, given a set of prognostic factors • probability (D=1 | X1,X2,…,Xn) • probability (death | man, 80 yrs, with hypertension, normal cholesterol level)
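To make the idea of a conditional probability concrete, here is a minimal Python sketch for the patient profile mentioned above. Every coefficient in it is hypothetical and chosen only for illustration; none of them comes from the slides or from real data.

```python
import math

# Hypothetical coefficients on the ln(odds) scale (NOT from the slides).
COEFFICIENTS = {
    "intercept": -6.0,
    "male": 0.5,
    "age_years": 0.05,
    "hypertension": 0.7,
    "high_cholesterol": 0.4,
}


def predicted_probability(male, age_years, hypertension, high_cholesterol):
    """Return P(D=1 | X) by back-transforming the linear predictor."""
    log_odds = (
        COEFFICIENTS["intercept"]
        + COEFFICIENTS["male"] * male
        + COEFFICIENTS["age_years"] * age_years
        + COEFFICIENTS["hypertension"] * hypertension
        + COEFFICIENTS["high_cholesterol"] * high_cholesterol
    )
    return 1.0 / (1.0 + math.exp(-log_odds))


# Man, 80 years, with hypertension, normal cholesterol level.
p_death = predicted_probability(male=1, age_years=80,
                                hypertension=1, high_cholesterol=0)
print(round(p_death, 3))
```

The model never outputs "death" or "no death"; it outputs a probability between 0 and 1 for that combination of prognostic factors.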
Estimated probability of outcome • But the distribution of the probability is skewed…
Logit(p) of outcome • Logit transformation of the proportion to remove skewness!
Logit(p) of outcome • A probability can be transformed into a number between minus infinity and infinity in two steps • 1. Obtain the odds (2 out of 5 are sick: odds = 2/3) • 2. Take the natural logarithm • The natural logarithm is the logarithm with base e (e = 2.71828…): 'elog' or 'ln'
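The two steps can be sketched in a few lines of Python, using the slide's own example of 2 sick people out of 5. The function names are illustrative choices.

```python
import math


def odds_from_probability(p):
    """Step 1: odds = p / (1 - p)."""
    return p / (1.0 - p)


def logit(p):
    """Step 2: natural logarithm of the odds."""
    return math.log(odds_from_probability(p))


p_sick = 2 / 5                               # 2 out of 5 are sick
odds_sick = odds_from_probability(p_sick)    # 2/3
logit_sick = logit(p_sick)                   # ln(2/3), a negative number

print(odds_sick, logit_sick)
```

Probabilities below 0.5 give a negative logit, 0.5 gives exactly 0, and probabilities above 0.5 give a positive logit.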
Model • the ln(odds) of an event is modelled • the model is similar to the linear regression model
Model • It is far easier to model along the whole number line, as in linear regression • from minus infinity to infinity • but a probability is defined as being between 0 and 1
Logit • Solution: logit transformation (is linear in x)
Model for logistic regression • Outcome = natural logarithm of the odds of the outcome
Summary • 1. Rewrite the outcome as a probability of the outcome • 2. Logit transformation: rewrite the outcome as ln(odds)
Model for logistic regression • β’s (betas) are estimated with the maximum likelihood procedure
Logistic regression analysis • ‘Best’ line is calculated with ‘maximum likelihood procedure’ • Maximum likelihood: obtained by several repeated cycles of calculation
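The "repeated cycles of calculation" can be illustrated with a small Newton-Raphson sketch in Python. This is one common way to maximise the likelihood and is not necessarily the exact algorithm used by SPSS. The counts are the placebo and medication 1 rows of the recovery table elsewhere in these slides; the function and variable names are assumptions.

```python
import math

# (x, events, total): x = 0 for placebo, x = 1 for medication 1.
GROUPS = [(0, 20, 100), (1, 35, 100)]


def fit_logistic(groups, cycles=25, tol=1e-10):
    """Estimate b0 and b1 by repeated Newton-Raphson cycles."""
    b0, b1 = 0.0, 0.0
    for _ in range(cycles):
        g0 = g1 = i00 = i01 = i11 = 0.0
        for x, events, total in groups:
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            resid = events - total * p        # observed minus expected events
            w = total * p * (1.0 - p)         # binomial variance weight
            g0 += resid                       # gradient of the log-likelihood
            g1 += x * resid
            i00 += w                          # information matrix entries
            i01 += w * x
            i11 += w * x * x
        det = i00 * i11 - i01 * i01
        step0 = (i11 * g0 - i01 * g1) / det   # solve the 2x2 system by hand
        step1 = (i00 * g1 - i01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) < tol and abs(step1) < tol:
            break                             # estimates no longer change
    return b0, b1


b0_hat, b1_hat = fit_logistic(GROUPS)
print(b0_hat, b1_hat, math.exp(b1_hat))
```

With a single binary predictor the result can be checked by hand: b0 converges to ln(20/80) and b1 to the log of the odds ratio (35/65)/(20/80).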
Example: Binary outcome (heart attack) and one binary predictor (smoking)
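$\ln(\text{odds})_{\text{heart attack}} = -0.171 + 0.8 \times \text{smoking}$ • What is $\beta_0$? • $\beta_0 = \ln(\text{odds})_{\text{heart attack, non-smoker}}$ • $\text{odds}_{\text{heart attack, non-smoker}} = \exp(\beta_0)$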
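$\ln(\text{odds})_{\text{heart attack}} = -0.171 + 0.8 \times \text{smoking}$ • $\ln(\text{odds})_{\text{smoking}} - \ln(\text{odds})_{\text{non-smoking}} = (\beta_0 + \beta_1) - \beta_0 = \beta_1$ • $\ln\left[\text{odds}_{\text{smoking}} / \text{odds}_{\text{non-smoking}}\right] = \beta_1$ • $\ln(\text{OR}) = \beta_1$ • $\text{OR} = \exp(\beta_1) = \exp(0.8) = 2.23$ • Interpretation?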
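The arithmetic on this slide can be checked with a short Python sketch. The two coefficients are the ones in the equation above; the helper name `probability_from_odds` is an illustrative choice.

```python
import math

B0 = -0.171   # intercept: ln(odds) of heart attack for non-smokers
B1 = 0.8      # coefficient for smoking (0 = non-smoker, 1 = smoker)


def probability_from_odds(odds):
    """Convert odds back to a probability."""
    return odds / (1.0 + odds)


odds_non_smoker = math.exp(B0)
odds_smoker = math.exp(B0 + B1)
odds_ratio = odds_smoker / odds_non_smoker   # identical to exp(B1)

p_non_smoker = probability_from_odds(odds_non_smoker)
p_smoker = probability_from_odds(odds_smoker)

print(round(odds_ratio, 2), round(p_non_smoker, 3), round(p_smoker, 3))
```

One way to read the result: the odds of a heart attack among smokers are a little more than twice the odds among non-smokers. The odds ratio is a ratio of odds, not of probabilities.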
Hypothesis testing: statistical difference between smokers and non-smokers • Wald test • 95% CI of the odds ratio • Likelihood ratio test (see M2-HC7 diagnosis)
Wald test = $(b / SE(b))^2$ • Chi-square distributed with one degree of freedom • $(0.7997 / 0.2454)^2 = 10.6231$
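A minimal Python sketch of the Wald test and the 95% CI of the odds ratio, using the coefficient and standard error quoted on the slide. The 1.96 multiplier and the formula exp(b ± 1.96 × SE) are the usual normal-approximation choices and are assumptions here, since the slides do not spell them out.

```python
import math

B = 0.7997     # regression coefficient for smoking (from the slide)
SE = 0.2454    # its standard error (from the slide)
Z_95 = 1.96    # normal quantile for a two-sided 95% interval

# Wald statistic, chi-square distributed with 1 degree of freedom.
wald = (B / SE) ** 2

# Upper-tail p-value of a chi-square(1) variable: erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(wald / 2.0))

# 95% confidence interval of the odds ratio.
ci_low = math.exp(B - Z_95 * SE)
ci_high = math.exp(B + Z_95 * SE)

print(round(wald, 2), p_value, round(ci_low, 2), round(ci_high, 2))
```

With these rounded inputs the statistic comes out at about 10.62; a value quoted to four decimals was presumably computed from unrounded estimates. The interval excludes 1, which agrees with the significant Wald test.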
Testing the regression coefficient • Likelihood ratio test: • The -2 log likelihood of the model with the determinant is compared with the -2 log likelihood of the model without the determinant • The difference is chi-square distributed • The number of degrees of freedom equals the difference in the number of estimated parameters between the two models
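The comparison of the two -2 log likelihoods can be sketched in Python. As an illustration it uses the placebo and medication 1 counts from the recovery table elsewhere in these slides, treating medication (yes/no) as the determinant; the variable names are assumptions.

```python
import math

# (recovered, total) for placebo and for medication 1.
GROUPS = [(20, 100), (35, 100)]


def log_lik(events, total, p):
    """Binomial log-likelihood; the constant term cancels in the comparison."""
    return events * math.log(p) + (total - events) * math.log(1.0 - p)


# Model without the determinant: one pooled probability for everyone.
pooled = sum(e for e, _ in GROUPS) / sum(n for _, n in GROUPS)
ll_without = sum(log_lik(e, n, pooled) for e, n in GROUPS)

# Model with the determinant: each group gets its own fitted probability.
ll_with = sum(log_lik(e, n, e / n) for e, n in GROUPS)

# Difference in -2 log likelihood, chi-square distributed.
lr_statistic = (-2.0 * ll_without) - (-2.0 * ll_with)
df = 1   # one extra parameter in the larger model
p_value = math.erfc(math.sqrt(lr_statistic / 2.0))   # valid for df = 1 only

print(round(lr_statistic, 2), df, round(p_value, 3))
```

The `erfc` shortcut for the p-value only works for one degree of freedom; with more degrees of freedom a chi-square table or a statistics library would be needed.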
Logistic regression with categorical predictor Analysis of three groups
Frequency of ‘recovery’
group          recovery: yes   recovery: no
medication1    35              65
medication2    40              60
placebo        20              80
What to do? • We compare each medication group with the placebo group using dummy variables
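A short Python sketch of the dummy coding, with placebo as the reference group and the counts from the recovery table. With only this one categorical predictor, each dummy coefficient equals the log of the odds ratio of that group against placebo, so the odds ratios can be computed straight from the table. The names are illustrative.

```python
import math

# group: (recovered, not recovered)
RECOVERY = {
    "medication1": (35, 65),
    "medication2": (40, 60),
    "placebo": (20, 80),
}
REFERENCE = "placebo"


def dummy_code(group):
    """Return (d_medication1, d_medication2); placebo is coded (0, 0)."""
    return (int(group == "medication1"), int(group == "medication2"))


def odds(yes, no):
    return yes / no


reference_odds = odds(*RECOVERY[REFERENCE])
odds_ratios = {
    group: odds(*counts) / reference_odds
    for group, counts in RECOVERY.items()
    if group != REFERENCE
}
# Each dummy's regression coefficient is ln(OR) versus placebo.
coefficients = {group: math.log(value) for group, value in odds_ratios.items()}

for group in RECOVERY:
    print(group, dummy_code(group), odds_ratios.get(group))
```

Two dummies are enough for three groups: the reference group is identified by both dummies being 0.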
We can also analyse the relationship between a continuous variable and a binary outcome with logistic regression analysis
Logistic regression analysis with a continuous variable • Relation between age and pain (no/yes) • The effect size is the odds ratio for a change of one unit in the determinant
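What "odds ratio per unit" means for a continuous determinant can be sketched as follows. The coefficient for age is hypothetical, chosen only for illustration; the slides give no estimate for the age and pain example.

```python
import math

BETA_AGE = 0.05   # hypothetical ln(odds) change per year of age


def odds_ratio_for_change(beta, units=1.0):
    """Odds ratio for a change of `units` in the determinant."""
    return math.exp(beta * units)


or_per_year = odds_ratio_for_change(BETA_AGE)
or_per_decade = odds_ratio_for_change(BETA_AGE, units=10)

print(round(or_per_year, 3), round(or_per_decade, 3))
```

Because the model is linear on the ln(odds) scale, the odds ratio for ten years is the one-year odds ratio raised to the tenth power, not ten times the one-year odds ratio.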
Linearity check • Similar to linear regression analysis • No scatter plot, but histogram: • Add a quadratic term and split the exposure variable into groups • Be careful: do not use OR, but !