Lecture 14: Logistic regression
Risk: 2×2 table (contingency table)
              Depression   No depression   Total
Divorce           10            20           30
No divorce         1            29           30
2×2 table (contingency table)
              Disease   Healthy   Total
Exposed          a         b       a+b
Not exposed      c         d       c+d
Risk (absolute) • Proportion of initially healthy individuals who contracted the disease during a given period • The observation period must be the same for everyone • Given as a proportion (0-1) or a percentage (0-100%) • Estimation of the risk • For exposed: Re = a / (a+b) • For non-exposed: Rne = c / (c+d)
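As a small sketch, the absolute risk in each group can be computed directly from the cell counts of the 2×2 table. The counts are those of the divorce/depression table earlier in the lecture; treating divorce as the exposure and depression as the outcome is an assumption made for illustration, and the function name `risk` is hypothetical.

```python
def risk(cases, non_cases):
    """Absolute risk: proportion of a group that developed the outcome."""
    return cases / (cases + non_cases)

# Divorce/depression table from the lecture: a=10, b=20, c=1, d=29.
# Assumption: divorce is the exposure, depression is the outcome.
risk_exposed = risk(10, 20)      # Re  = a / (a+b)
risk_non_exposed = risk(1, 29)   # Rne = c / (c+d)

print(f"Re  = {risk_exposed:.3f}")
print(f"Rne = {risk_non_exposed:.3f}")
```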
Relative risk • Ratio of the risk for the exposed to the risk for the non-exposed (no unit) • RR = Re / Rne • Interpretation • RR>1: exposure increases the risk • RR=1: exposure does not modify the risk • RR<1: exposure decreases the risk • When RR<1, the relative risk reduction (in %) may be presented: RRR = (1-RR)*100
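The two formulas on this slide can be sketched as follows. The first call reuses the divorce/depression counts from the lecture; the protective example (risks of 2% and 5%) is invented purely to show a case where RR<1, and the function names are hypothetical.

```python
def relative_risk(risk_exposed, risk_non_exposed):
    """RR = Re / Rne (unitless)."""
    return risk_exposed / risk_non_exposed

def relative_risk_reduction(rr):
    """RRR in percent; meaningful when RR < 1."""
    return (1 - rr) * 100

# Lecture table: Re = 10/30, Rne = 1/30.
rr_divorce = relative_risk(10 / 30, 1 / 30)

# Hypothetical protective exposure: Re = 2%, Rne = 5%.
rr_protective = relative_risk(0.02, 0.05)
rrr_protective = relative_risk_reduction(rr_protective)

print(f"RR (divorce example)    = {rr_divorce:.1f}")
print(f"RR (protective example) = {rr_protective:.2f}")
print(f"RRR                     = {rrr_protective:.0f}%")
```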
Caveat • Relative risk does not provide any information on the magnitude of the compared risks • RR = 3 may correspond to: • Re = 45% Rne = 15% • Re = 3% Rne = 1% • Re = 0.006% Rne = 0.002% • …
Risk difference • Difference between the risks of the exposed and non-exposed individuals • DR = Re − Rne • Given as a proportion or a percentage • Interpretation • DR>0: exposure increases the risk • DR=0: exposure does not modify the risk • DR<0: exposure decreases the risk
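To illustrate why the risk difference complements the relative risk, the sketch below computes both for the three pairs of risks listed on the caveat slide. All three share RR = 3, but their risk differences differ by orders of magnitude. Variable names are illustrative only.

```python
# (Re, Rne) pairs from the caveat slide, in percent.
risk_pairs = [(45.0, 15.0), (3.0, 1.0), (0.006, 0.002)]

results = []
for re_pct, rne_pct in risk_pairs:
    rr = re_pct / rne_pct   # relative risk (unitless)
    dr = re_pct - rne_pct   # risk difference, in percentage points
    results.append((rr, dr))
    print(f"Re={re_pct}%  Rne={rne_pct}%  RR={rr:.1f}  DR={dr:.3f} points")
```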
Test and estimation • Null hypothesis: no effect of exposure: H0: Re = Rne • RR = 1 • DR = 0 • Tested with a χ² test • It is also possible to compute a standard error for each of these statistics, and to obtain a confidence interval
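A minimal sketch of the test and of one possible confidence interval, applied to the divorce/depression table. The lecture does not say which variant of the χ² test or which interval is used, so the choices here are assumptions: the Pearson χ² without continuity correction, and a 95% interval for RR built on the log scale with the usual large-sample standard error. Function names are hypothetical.

```python
import math

def chi2_test_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for a 2x2 table."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

def relative_risk_ci(a, b, c, d, z=1.96):
    """Approximate 95% CI for RR, computed on the log scale (assumed method)."""
    rr = (a / (a + b)) / (c / (c + d))
    se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper

chi2, p = chi2_test_2x2(10, 20, 1, 29)
rr, ci_low, ci_high = relative_risk_ci(10, 20, 1, 29)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
print(f"RR = {rr:.1f}, 95% CI [{ci_low:.2f}, {ci_high:.2f}]")
```

With a cell count as small as 1, the interval is very wide, which is a reminder that the large-sample approximation is rough for a table like this one.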
Prevalence • Proportion of individuals having a disease (or presenting a characteristic) at a given moment • Example: prevalence of smoking in a freshman class • The moment is defined by • Date (e.g., 19 October 2011) • Event (e.g., at birth)
Incidence rate • Risk of contracting a disease during one unit of time • Numerator: new cases • Denominator: sum of the person-time at risk • Person-years • Person-days • …
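The numerator and denominator can be sketched with a tiny cohort. The follow-up data below are entirely hypothetical and serve only to show how person-time is summed: each person contributes the time they were observed and at risk.

```python
# Hypothetical follow-up: (years at risk, became a case?)
follow_up = [
    (2.0, False),
    (0.5, True),   # contributes time only until diagnosis
    (3.0, False),
    (1.5, True),
    (3.0, False),
]

new_cases = sum(1 for _, is_case in follow_up if is_case)
person_years = sum(years for years, _ in follow_up)
incidence_rate = new_cases / person_years

print(f"{new_cases} cases / {person_years} person-years "
      f"= {incidence_rate:.2f} per person-year")
```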
Comparison with prospective study: case-control study… • Allows studying very rare conditions (e.g., autism, suicide) • Can be carried out more quickly • Requires fewer observations (lower cost) • Allows testing several risk factors for the outcome • But… • Does not allow computing a risk or an incidence rate (question: why?) • Risk of bias in the measurement of the risk factor • Difficult to choose appropriate controls • Can only study one condition at a time
Case-control study
                Case                  Control
                (we take everyone)    (we take only a sample)
Exposed           a                     b
Non-exposed       c                     d
• The sums a+b and c+d are meaningless! • Impossible to compute risk and relative risk
Solution: odds ratio • Transform a proportion into odds: odds = p / (1 − p) • Transform odds into a proportion: p = odds / (1 + odds)
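The two transformations can be sketched as a pair of functions, using the standard definitions odds = p / (1 − p) and p = odds / (1 + odds). The function names are hypothetical.

```python
def proportion_to_odds(p):
    """Odds corresponding to a proportion p (0 <= p < 1)."""
    return p / (1 - p)

def odds_to_proportion(odds):
    """Proportion corresponding to given odds (odds >= 0)."""
    return odds / (1 + odds)

for p in (0.2, 0.5, 0.9):
    odds = proportion_to_odds(p)
    print(f"p = {p:.1f} -> odds = {odds:.2f} -> p = {odds_to_proportion(odds):.1f}")
```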
Case-control study
                 Case   Control
Exposed            a       b
Non-exposed        c       d
Exposure odds     a/c     b/d
• Exposure odds ratio: numerator a/c (cases), denominator b/d (controls)
Prospective study
                 Disease   Healthy   Disease odds
Exposed             a         b          a/b
Non-exposed         c         d          c/d
Exposure odds      a/c       b/d
Property of odds ratio • Exposure odds ratio = disease odds ratio: (a/c) / (b/d) = ad / bc = (a/b) / (c/d) • The odds ratio is the same whether it is computed from a prospective study or a case-control study
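A quick numerical check of this property, using the divorce/depression counts from the start of the lecture as the 2×2 table (a=10, b=20, c=1, d=29). Function names are hypothetical.

```python
def exposure_odds_ratio(a, b, c, d):
    """Odds of exposure among cases divided by odds of exposure among non-cases."""
    return (a / c) / (b / d)

def disease_odds_ratio(a, b, c, d):
    """Odds of disease among exposed divided by odds of disease among non-exposed."""
    return (a / b) / (c / d)

a, b, c, d = 10, 20, 1, 29
or_exposure = exposure_odds_ratio(a, b, c, d)
or_disease = disease_odds_ratio(a, b, c, d)
or_cross_product = (a * d) / (b * c)

print(or_exposure, or_disease, or_cross_product)
```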
Odds ratio and relative risk • When the condition is rare (a << b and c << d), OR is approximately equal to RR
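The approximation can be illustrated with two hypothetical cohorts that both have RR = 3: one where the disease is common and one where it is rare. The counts are invented for the illustration.

```python
def relative_risk(a, b, c, d):
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    return (a * d) / (b * c)

# Hypothetical cohorts: (a, b, c, d) = diseased/healthy among exposed, then non-exposed.
common = (450, 550, 150, 850)    # risks 45% vs 15%
rare = (30, 9970, 10, 9990)      # risks 0.3% vs 0.1%

rr_common, or_common = relative_risk(*common), odds_ratio(*common)
rr_rare, or_rare = relative_risk(*rare), odds_ratio(*rare)

print(f"common disease: RR = {rr_common:.2f}, OR = {or_common:.2f}")
print(f"rare disease:   RR = {rr_rare:.2f}, OR = {or_rare:.2f}")
```

When the disease is common the odds ratio overstates the relative risk; when it is rare the two nearly coincide.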
573 patients with facial clefts • 763 controls • Exposure: • Taking more than 400 µg of folic acid supplements • BMJ 2007; 334:464-470
Results • Odds ratio for folic acid < 400 µg
Why not compute the RR? • The prevalence is not correct, since the goal of the case-control design is to have as many cases as controls. • Thus, a/(a+b) does not make sense.
Interpretation of odds ratio • Similar to relative risk: • OR>1: IV is positively associated with DV (e.g., exposure is associated with disease) • OR=1: IV is not associated with DV • OR<1: IV is negatively associated with DV • Folic acid and facial cleft: • Not taking folic acid supplements (> 400 µg) is associated with a 40% increase in the odds of facial cleft (approximately the risk, since the condition is rare)
Continuous IV • Compare the means of cases and controls (t test) • Divide the IV into several categories and compute an odds ratio for each category • Model:
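Model: linear regression • Predicted proportion of males can exceed 100% • Predicted proportion of females can fall below 0% • The distribution of the residuals [y − (a + bx)] is not normal • This is not the right method! [Figure: linear regression line fitted to male/female data]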
Logistic regression • The DV must be transformed • Y: probability that sex = 1 (female), instead of 0 (male) • logit(Y) = ln(Y / (1 − Y)): the “logit” is the log of the odds!
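The transformation can be sketched as follows, using the standard definition of the logit as the natural log of the odds, together with its inverse. Unlike a straight line fitted to a 0/1 variable, the inverse logit always returns a value strictly between 0 and 1. Function names are hypothetical.

```python
import math

def logit(p):
    """Log of the odds for a probability p (0 < p < 1)."""
    return math.log(p / (1 - p))

def inverse_logit(z):
    """Map any real number back to a probability in (0, 1)."""
    return 1 / (1 + math.exp(-z))

for z in (-5.0, 0.0, 5.0):
    print(f"logit = {z:+.1f} -> probability = {inverse_logit(z):.4f}")
```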
Interpretation of “b” • In linear regression: • b: mean change in Y expected for a one-unit increase in X • By analogy, in logistic regression: • b: mean change in logit(Y = 1) expected for a one-unit increase in X • e^b = odds ratio of Y for a one-unit increase in X
Odds ratio and logistic regression coefficients • Dependent variable: Y = 1 (case) or = 0 (control) • Independent variable: X = 1 (exposed) or = 0 (non-exposed) • Model: logit(y) = a + bx • Equation among exposed: logit(y_exp) = a + b*1 = a + b • Equation among non-exposed: logit(y_non-exp) = a + b*0 = a • Equation for b: b = (a + b) − a = logit(y_exp) − logit(y_non-exp) = ln(odds_exp / odds_non-exp) = ln(OR), so e^b = OR
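The derivation can be checked numerically. With a single binary X, the logistic model has two parameters for two groups, so its fitted probabilities equal the observed proportions and a and b can be read off the 2×2 table without an iterative fit. The counts below reuse the divorce/depression table from the lecture as an illustration; variable names are hypothetical.

```python
import math

def logit(p):
    return math.log(p / (1 - p))

# 2x2 table: a, b = outcome yes/no among exposed; c, d = among non-exposed.
a_count, b_count, c_count, d_count = 10, 20, 1, 29

p_exposed = a_count / (a_count + b_count)
p_non_exposed = c_count / (c_count + d_count)

intercept = logit(p_non_exposed)        # "a" in logit(y) = a + bx
slope = logit(p_exposed) - intercept    # "b" in logit(y) = a + bx

odds_ratio_from_b = math.exp(slope)
odds_ratio_from_table = (a_count * d_count) / (b_count * c_count)

print(f"b = {slope:.4f}, exp(b) = {odds_ratio_from_b:.2f}, "
      f"ad/bc = {odds_ratio_from_table:.2f}")
```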
Example weight - sex • Weight in 4 categories: • odds ratio = 8.9 • The odds of being a man are multiplied by 8.9 for each increase to a higher category of weight • Weight in kilos: • odds ratio = 1.2 • The odds of being a man are multiplied by 1.2 for each additional kilo of weight
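Because the odds ratio applies per unit of X, it compounds multiplicatively over larger differences. The sketch below takes the per-kilo odds ratio of 1.2 from the slide and, assuming the model is linear on the logit scale, derives the odds ratio implied for differences of several kilos. The function name is hypothetical.

```python
import math

OR_PER_KG = 1.2              # from the slide
b = math.log(OR_PER_KG)      # corresponding logistic regression coefficient

def odds_ratio_for_difference(kilos):
    """Odds ratio implied by the model for a weight difference of `kilos` kg."""
    return math.exp(b * kilos)   # identical to OR_PER_KG ** kilos

for kilos in (1, 5, 10):
    print(f"{kilos:>2} kg -> OR = {odds_ratio_for_difference(kilos):.2f}")
```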
Multiple IVs • Example: • Results: • Odds ratio for one additional kg: 1.15 (p<0.001), adjusted for height • Odds ratio for one additional cm: 1.20 (p<0.001), adjusted for weight
Conclusions • Case-control study: design to examine associations between risk factors and disease • Mostly for rare diseases • Efficient and cost-effective • Odds ratio: measure of association, often similar to relative risk • Logistic regression: modeling method for binary dependent variables
Multilevel logistic regression This is a random intercept multilevel logistic regression
Correct judgment of normality • The Kolmogorov-Smirnov (KS) normality test was correct for 71.4% of the distributions. • Only 57.1% for the Anderson-Darling (AD) and Jarque-Bera (JB) tests. • Levene test correct in both cases (because the data were normally distributed) • You were correct for 71.4% of the distributions, but your errors were not the same as those of the statistical tests. • All of you correctly identified the bimodal distribution • You were not influenced by sample size