Basic epidemiologic analysis with Stata
Biostatistics 212, Lecture 5
Housekeeping
• Final Project – by the last session you should:
  • Have your dataset imported into Stata
  • Clean up the variables you will use
  • Sketch out (paper and pencil) a table and a figure
  • Be ready to write analysis do files
Today...
• Introduction to 3 “suites” of commands: st, svy, and epitab
• Exploring interaction and confounding with Stata
  • Epitab commands
• Adjusting for many things at once
  • Logistic regression
• General notes about regression, post-estimation
• Testing for trends
Three special “suites” of Stata commands
• Very quick introduction to 2:
  • Survival analysis (st suite)
  • Complex survey data (svy suite)
• Most of the lecture is on the third:
  • Tables for epidemiologists (epitab suite)
  • Stata’s tools for analyzing confounding/interaction
Survival analysis commands
• “Time-to-event” data are common and need special handling
  • Outcome is specified by TWO variables
  • time (continuous) and “failure” (y/n)
• Special set of commands: the st suite
  • help st and help sts
• Two-step process:
  1) Set up the data with stset
  2) Use special commands (e.g. sts graph, stcox)
• Special handout coming…
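The two-step process can be sketched as follows. This is a minimal illustration, not the course handout: it assumes Stata's downloadable example dataset drugtr (with variables studytime, died, drug and age) rather than the course data.

```stata
* Sketch only: drugtr is a Stata example dataset, not the course dataset
webuse drugtr, clear

* Step 1: tell Stata which variable is time and which is failure (1 = event)
stset studytime, failure(died)

* Step 2: st commands now pick up the outcome from the stset declaration
sts graph, by(drug)     // Kaplan-Meier survival curves by treatment group
stcox drug age          // Cox proportional hazards model (hazard ratios)
```

Note that neither sts graph nor stcox names the outcome; both rely on what stset declared.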
Complex survey commands
• Complex survey data
  • Multistage sampling, weights, clustering
  • Must account for this when analyzing
  • Common with national surveys like NHANES
• Special set of commands: the svy suite
  • help svy and help svy estimation
• Two-step process:
  1) Set up the data with svyset
  2) Use the svy: prefix (supported by many standard commands)
     svy: total
     svy, subpop(if male==1): tab smoke diabetes
     svy: logistic cac age male
• Special handout coming…
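A hedged sketch of the same two steps for survey data. It assumes Stata's example dataset nhanes2 and its design variables (psuid, stratid, finalwgt); the analysis variables shown are from that example file, not from the lecture dataset.

```stata
* Sketch only: nhanes2 is a Stata example dataset, not the course dataset
webuse nhanes2, clear

* Step 1: declare the design (PSU, sampling weight, strata)
svyset psuid [pweight = finalwgt], strata(stratid)

* Step 2: prefix ordinary commands with svy:
svy: mean bmi                                   // design-based mean
svy, subpop(female): tabulate race diabetes     // analysis within a subpopulation
svy: logistic highbp age female                 // design-based logistic regression
```

Using subpop() instead of an if qualifier keeps the full design in the variance calculation, which is why the slide's example uses it.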
Confounding and Interaction
• The “meat” of this lecture, and the lab today
Sidenote: Stata is a great tool for understanding theory
• Hands-on vs. theoretical teaching
• Use Stata to get your hands on the data
  • See the dataset
  • Write the command
  • See the output
• Lab 1: Exposure to basic stats
• Today: Exposure to basic epi concepts
  • Confounding and interaction
Confounding and Interaction
• Practical questions
  • What is confounding?
  • What does it mean to “adjust” for something?
  • When to adjust and what to adjust for?
  • When to stratify?
  • What do the adjusted estimates mean?
An Example
• Does binge drinking cause atherosclerosis?
• Research question (RQ): Is there an association between self-reported binge drinking and presence of coronary calcium among young adults?
• CARDIA Year 15 examination
Binge drinking and coronary calcium

. tab binge cac, row chi2

+----------------+
| Key            |
|----------------|
|   frequency    |
| row percentage |
+----------------+

     Binge |
   pattern |
(>5 drinks |    coronary artery
        on |   calcium detected?
 occasion) |         0          1 |     Total
-----------+----------------------+----------
         0 |     2,165        186 |     2,351
           |     92.09       7.91 |    100.00
-----------+----------------------+----------
         1 |       585        106 |       691
           |     84.66      15.34 |    100.00
-----------+----------------------+----------
     Total |     2,750        292 |     3,042
           |     90.40       9.60 |    100.00

          Pearson chi2(1) =  33.9612   Pr = 0.000
Binge drinking and coronary calcium
• Reading the output of tab binge cac, row chi2:
  • 8% with CAC if no binge
  • 15% with CAC if binge
  • p<.001
Binge drinking and coronary calcium
• Answer to RQ: Yes! There is an association.
• But does binge drinking CAUSE atherosclerosis?
Binge drinking and coronary calcium
• Possible explanations*
  • Chance
  • Bias
  • Effect-cause
  • Confounding
  • Cause-effect
* Hulley et al. Designing Clinical Research
Binge drinking and coronary calcium
• Possible explanations
  • Chance: very unlikely
  • Bias: possible (not the focus here!)
  • Effect-cause: unlikely?
  • Confounding: YES!
  • Cause-effect: ?
Binge drinking and coronary calcium
• Male gender could “confound” the association
[Diagram: Male, Binge drinking and Coronary calcium, with a “?” on the link from Binge drinking to Coronary calcium]
Binge drinking and coronary calcium
• Male gender could “confound” the association
[Diagram: Male, Binge drinking and Coronary calcium, with a “?” on each of the three links]
Binge drinking and coronary calcium
• Men are more likely to binge
  • 34% of men, 14% of women
• Men have more coronary calcium
  • 15% of men, 7% of women
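Percentages like these come from simple cross-tabulations of the potential confounder against the exposure and against the outcome. A minimal sketch, assuming the lecture's variable names (male, binge, cac) are all coded 0/1:

```stata
* Is the potential confounder associated with the exposure?
tab male binge, row chi2    // row % = share of women / men who binge

* Is the potential confounder associated with the outcome?
tab male cac, row chi2      // row % = share of women / men with coronary calcium
```

A variable has to be related to both exposure and outcome before it can confound their association, which is why both tables are checked.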
Binge drinking and coronary calcium
• Male gender could “confound” the association
• Now what do we do??
[Diagram: Male, Binge drinking and Coronary calcium, with a “?” on the link from Binge drinking to Coronary calcium]
2 x 2 Tables
• Practical tools
• “Contingency tables” are a traditional analytic tool of the epidemiologist

                  Outcome
                  +     -
Exposure    +     a     b
            -     c     d

OR = (a/b) / (c/d) = ad/bc
RR = [a/(a+b)] / [c/(c+d)]
2 x 2 Tables

                      Coronary calcium
                      +        -      Total
Binge drinking   +   106      585      691
                 -   186     2165     2351
Total                292     2750     3042

OR = 2.1 (1.6 – 2.7)
RR = 1.9 (1.6 – 2.4)
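The point estimates can be checked by hand with Stata's display command, plugging the four cell counts into OR = ad/bc and RR = [a/(a+b)] / [c/(c+d)]. This is only an arithmetic check; the confidence intervals need cs or csi.

```stata
* a = 106, b = 585 (binge row); c = 186, d = 2165 (no-binge row)
display "OR = " (106*2165)/(585*186)     // about 2.1
display "RR = " (106/691)/(186/2351)     // about 1.9
```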
2 x 2 Tables
• How do we use 2x2 tables to “adjust” for a confounder?
  1) Stratify
  2) Examine stratum-specific estimates (for interaction)
  3) Combine estimates if appropriate (if no interaction)
     • Weighted average of stratum-specific estimates
2 x 2 Tables
• First, stratify…
[Three 2x2 tables of Binge by CAC; cell counts not shown]
  Overall:   RR = 1.94 (1.55-2.42)
  In men:    RR = 1.50 (1.16-1.93)
  In women:  RR = 1.57 (0.94-2.62)
2 x 2 Tables
• …compare stratum-specific estimates…
  • (they’re about the same)
[Two 2x2 tables of Binge by CAC; cell counts not shown]
  In men (34% binge, 15% with CAC):    RR = 1.50 (1.16-1.93)
  In women (14% binge, 7% with CAC):   RR = 1.57 (0.94-2.62)
2 x 2 Tables
• …and then “combine” the estimates.
[Two 2x2 tables of Binge by CAC; cell counts not shown]
  In men:    RR = 1.50 (1.16-1.93)
  In women:  RR = 1.57 (0.94-2.62)
  Combined:  RRadj = 1.51 (1.21-1.89)
Summary
[Three 2x2 tables of Binge by CAC; cell counts not shown]
  Overall (crude):                     RR = 1.94 (1.55-2.42)
  In men (34% binge, 15% with CAC):    RR = 1.50 (1.16-1.93)
  In women (14% binge, 7% with CAC):   RR = 1.57 (0.94-2.62)
  Adjusted for gender:                 RRadj = 1.51 (1.21-1.89)
2 x 2 Tables
• How do we do this with Stata?
• Tabulate – output not exactly what we want.
• The “epitab” commands
  • Stata’s answer to stratified analyses
    cs, cc
    csi, cci
    tabodds, mhodds
2 x 2 Tables
• Example – demo using Stata
Binge drinking and coronary calcium
  cs cac binge
  cs cac binge, by(male)
Moderate drinking and coronary calcium
  cs cac modalc
  cs cac modalc, by(racegender)
Case-control studies
  cc cac binge
2 x 2 Tables
• Example of a crude association (unadjusted)

. cs cac binge

                 | Binge pattern [>5 drinks|
                 |      on occasion]      |
                 |   Exposed   Unexposed  |      Total
-----------------+------------------------+------------
           Cases |       106         186  |        292
        Noncases |       585        2165  |       2750
-----------------+------------------------+------------
           Total |       691        2351  |       3042
                 |                        |
            Risk |  .1534009    .0791153  |   .0959895
                 |                        |
                 |      Point estimate    |    [95% Conf. Interval]
                 |------------------------+------------------------
 Risk difference |         .0742856       |    .0452852     .103286
      Risk ratio |         1.938954       |    1.551487    2.423187
 Attr. frac. ex. |          .484258       |     .355457    .5873203
 Attr. frac. pop |         .1757923       |
                 +-------------------------------------------------
                               chi2(1) =    33.96  Pr>chi2 = 0.0000
2 x 2 Tables
• Example of Confounding

. cs cac binge, by(male)

            male |       RR       [95% Conf. Interval]   M-H Weight
-----------------+-------------------------------------------------
               0 |   1.570175     .9402789    2.622042     9.339759
               1 |   1.497071     1.164201    1.925117     39.53256
-----------------+-------------------------------------------------
           Crude |   1.938954     1.551487    2.423187
    M-H combined |   1.511042     1.205656     1.89378
-------------------------------------------------------------------
Test of homogeneity (M-H)    chi2(1) =     0.027  Pr>chi2 = 0.8700
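The “M-H combined” row is the weighted average of the stratum-specific risk ratios, using the M-H weights printed in the last column. A quick arithmetic check with display, using only numbers from the cs cac binge, by(male) output:

```stata
* Weighted average of the stratum RRs, weights from the "M-H Weight" column
display (9.339759*1.570175 + 39.53256*1.497071) / (9.339759 + 39.53256)
* about 1.511, the "M-H combined" estimate
```

Men carry roughly four times the weight of women here, so the combined estimate sits close to the RR in men.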
2 x 2 Tables
• Example of Effect Modification

. cs cac modalc, by(racegender)

      racegender |       RR       [95% Conf. Interval]   M-H Weight
-----------------+-------------------------------------------------
     Black women |     .75888     .3595892    1.601547     8.043758
     White women |   .8960739     .4971477     1.61511     11.07552
       Black men |   1.945668     1.114927      3.3954     8.304878
       White men |   .9279831       .66551    1.293974     29.45557
-----------------+-------------------------------------------------
           Crude |    1.30072     1.023022    1.653798
    M-H combined |   1.046446     .8225915    1.331218
-------------------------------------------------------------------
Test of homogeneity (M-H)    chi2(3) =     6.245  Pr>chi2 = 0.1003
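2 x 2 Tables
• Immediate commands
  • csi, cci
  • No dataset required – just 2x2 cell frequencies
    csi a b c d
    csi 106 186 585 2165     (for cac binge)
  • Cell order for csi follows the cs output: exposed cases, unexposed cases, exposed noncases, unexposed noncases. Here b and c are swapped relative to the exposure-by-outcome table used to define OR = ad/bc.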
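A short sketch of the immediate commands using the binge/CAC cell counts. The cci line treats the same four numbers as a case-control table purely for illustration; the CARDIA data are from a cohort study.

```stata
* Cell order: exposed cases, unexposed cases, exposed noncases, unexposed noncases
csi 106 186 585 2165         // risk difference, risk ratio, attributable fractions
csi 106 186 585 2165, or     // same table, also reporting the odds ratio

* Case-control version: same cell order, reports the odds ratio
cci 106 186 585 2165
```

Immediate commands are handy for checking a published table or a hand calculation without loading any data.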
Multivariable adjustment
• Binge drinking appears to be associated with coronary calcium
• Association partially due to confounding by gender
• What about race? Age? SES? Smoking?
Multivariable adjustment
manual stratification

                                          # 2x2 tables
Crude association                                  1
Adjust for gender                                  2
Adjust for gender, race                            4
Adjust for gender, race, age                      68
Adjust for “” + income, education                816
Adjust for “” + “” + smoking                    2448
Multivariable adjustment
cs command
• cs command
  • Does manual stratification for you
  • Lists results from every stratum
  • Tests for overall homogeneity
  • Adjusted and crude results
• Demo
  cs cac binge, by(male black age)
• Can’t interpret interactions!
Multivariable adjustment
mhodds command
• mhodds allows you to look at specific interactions, adjusted for multiple covariates
  • Does same stratification for you
  • Adjusted results for each interaction variable
  • P-value for specific interaction (homogeneity)
  • Summary adjusted result
• Demo
  mhodds cac binge age, by(racegender)
• But strata get “thin”!
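One way to see how “thin” the strata get is to count the observations in each one. This is a sketch, assuming the lecture's variable names (racegender, age); stratum, n_stratum and first are new, hypothetical variable names created only for this check.

```stata
* One id per combination of the stratifying variables
egen stratum = group(racegender age)

* Number of observations in each stratum
bysort stratum: generate n_stratum = _N

* Flag one observation per stratum, then summarize stratum sizes
egen first = tag(stratum)
summarize n_stratum if first, detail
count if first & n_stratum < 10     // strata with fewer than 10 people
```

Strata with very few people, or with no cases, contribute little or nothing to the Mantel-Haenszel estimate, which motivates moving to a regression model.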
Multivariable adjustment
logistic command
• Assumes “logit” model
  • Await biostats class for details!
• Coefficients estimated, no actual stratification
• Continuous variables used as they are
Multivariable adjustment
logistic command
Basic syntax:
  logistic outcomevar [predictorvar1 predictorvar2 predictorvar3…]
Multivariable adjustment
logistic command
If using any categorical predictors:
  logistic outcomevar [i.catvar var2…]
• The i. prefix creates “dummy variables” on the fly
• If you forget it, Stata won’t know the variable is categorical; it will treat the category codes as a continuous measurement, and you’ll get the wrong answer!
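The difference the i. prefix makes can be sketched with the lecture's smoking variable, assuming smoke is coded 0, 1, 2 as the regression output in this lecture suggests.

```stata
* Without i.: smoke enters as one continuous term (one OR per 1-unit step)
logistic cac binge male black age smoke

* With i.: one OR for each category, compared with the lowest category
logistic cac binge male black age i.smoke

* ib#. picks a different reference category (here category 2)
logistic cac binge male black age ib2.smoke
```

The first model is not meaningless for an ordered variable (it is a linear trend on the log-odds scale), but it is rarely what is intended for a nominal one.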
Multivariable adjustment
logistic command
Demo
  logistic cac binge
  logistic cac binge male
  logistic cac binge male black
  logistic cac binge male black age
  logistic cac binge male black age i.smoke
  logistic cac binge##i.racegender age i.smoke
  testparm binge#racegender
  logistic cac modalc##racegender age i.smoke
  testparm modalc#racegender
Multivariable adjustment
logistic command
Demo

. logistic cac binge male black age i.smoke

Logistic regression                               Number of obs   =       3036
                                                  LR chi2(6)      =     211.95
                                                  Prob > chi2     =     0.0000
Log likelihood = -852.99988                       Pseudo R2       =     0.1105

------------------------------------------------------------------------------
         cac | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       binge |   1.387573   .1985356     2.29   0.022      1.04825    1.836736
        male |   3.253031   .4608842     8.33   0.000     2.464287    4.294227
       black |   .7282563   .0994953    -2.32   0.020     .5571755    .9518675
         age |    1.19833    .025771     8.41   0.000     1.148869     1.24992
             |
       smoke |
          1  |   1.357694   .2308652     1.80   0.072     .9728859    1.894707
          2  |   2.120925   .3302699     4.83   0.000     1.563063     2.87789
------------------------------------------------------------------------------
logistic command
interaction demo

. logistic cac modalc##racegender age i.smoke

Logistic regression                               Number of obs   =       2795
                                                  LR chi2(10)     =     186.28
                                                  Prob > chi2     =     0.0000
Log likelihood = -739.54359                       Pseudo R2       =     0.1119

------------------------------------------------------------------------------
         cac | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    1.modalc |   .6024889   .2430813    -1.26   0.209     .2732258    1.328546
             |
  racegender |
          2  |   1.018361   .3137632     0.06   0.953     .5567262    1.862783
          3  |   1.601149    .519393     1.45   0.147     .8478374    3.023786
          4  |   4.119486   1.100853     5.30   0.000     2.439922    6.955209
             |
     modalc# |
  racegender |
        1 2  |   1.422897   .7314808     0.69   0.493     .5195041    3.897247
        1 3  |   2.867897   1.473405     2.05   0.040     1.047736    7.850102
        1 4  |   1.546468   .7057105     0.96   0.339     .6322751    3.782472
             |
         age |   1.184036   .0271845     7.36   0.000     1.131937    1.238534
             |
       smoke |
          1  |   1.438413   .2623889     1.99   0.046      1.00603    2.056629
          2  |   2.464978   .4157232     5.35   0.000     1.771154    3.430597
------------------------------------------------------------------------------
logistic command
interaction demo
The testparm command is a “post-estimation” command (used after a regression) that provides a single p-value for a simultaneous test of multiple regression terms.

. testparm modalc#racegender

 ( 1)  [cac]1.modalc#2.racegender = 0
 ( 2)  [cac]1.modalc#3.racegender = 0
 ( 3)  [cac]1.modalc#4.racegender = 0

           chi2(  3) =    4.84
         Prob > chi2 =    0.1842

There are many other “post-estimation” commands that use stored data/results from the prior command…
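As one example of another post-estimation command, lincom combines coefficients from the stored model. The sketch below asks for the odds ratio for modalc within racegender category 3; that category 3 corresponds to Black men is an assumption based on the order of the racegender labels in the stratified cs output.

```stata
* Refit the interaction model so its results are in memory
logistic cac modalc##racegender age i.smoke

* Joint test of the three interaction terms (as on the slide)
testparm modalc#racegender

* OR for modalc in racegender == 3: main effect combined with its interaction term
lincom 1.modalc + 1.modalc#3.racegender, or
```

On the odds-ratio scale this is the product of the two reported ORs (roughly 0.60 × 2.87), and lincom supplies the confidence interval that the product alone does not.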
Multivariable adjustment
logistic command
• Pros
  • Provides all ORs in the model
  • Accepted approach (mhodds rarely used by statisticians)
  • Can deal with continuous variables (like age)
  • Better estimation for large models?
• Cons
  • Interaction testing more cumbersome, less automatic
  • More assumptions
  • Harder to test for trends
Multivariable adjustment
• The format for linear regression and other types of regression is the same as for logistic regression, except for the initial command:
  regress outcomevar [predictorvar1 predictorvar2 predictorvar3…]
  ologit outcomevar [predictorvar1 predictorvar2 predictorvar3…]
  etc.
• Post-estimation generally works the same with all of them
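A small sketch of the shared syntax, using Stata's built-in auto dataset (an assumption made so the example runs anywhere; it has nothing to do with the CARDIA data).

```stata
* Sketch only: auto is a dataset shipped with Stata
sysuse auto, clear

* Linear regression: same layout as logistic, different command name
regress price mpg weight i.foreign
testparm mpg weight          // joint test of two terms after regress

* Ordinal logistic regression on repair record (ordered 1-5)
ologit rep78 mpg weight
testparm mpg weight          // the same post-estimation command works again
```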
Testing for trends
tabodds command
• Tests for a trend in the odds of a dichotomous outcome across increasing categories of an ordinal variable

. tabodds cac alccat

--------------------------------------------------------------------------
     alccat |      cases     controls       odds      [95% Conf. Interval]
------------+-------------------------------------------------------------
          0 |        110         1325    0.08302       0.06835     0.10084
         <1 |         90          933    0.09646       0.07770     0.11976
      1-1.9 |         46          295    0.15593       0.11429     0.21275
         2+ |         45          193    0.23316       0.16856     0.32252
--------------------------------------------------------------------------
Test of homogeneity (equal odds): chi2(3)  =    36.70
                                  Pr>chi2  =   0.0000

Score test for trend of odds:     chi2(1)  =    32.20
                                  Pr>chi2  =   0.0000
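A few variations on the trend test, sketched with the lecture's variable names. It assumes alccat is stored as increasing integer codes with value labels; the logistic line is an alternative way to get a trend test, not something shown on the slide.

```stata
* Odds ratios for each category relative to the lowest, plus the trend test
tabodds cac alccat, or

* Same comparison, adjusted for gender (Mantel-Haenszel)
tabodds cac alccat, adjust(male)

* Regression version: alccat entered without i. gives one OR per category step
logistic cac alccat male
```

In the logistic version, the p-value on alccat plays the role of the score test for trend, with the advantage that continuous covariates can be added.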