400 likes | 963 Views
Biostat 201: Winter 11. Lab Session 4. Assignment 4. What do we need to do?. Import data Recode variables Run a logistic regression Interaction terms Selection: stepwise, backward Statistics: deviance, mean deviance, HL fit, R-square, c Create an ROC curve
E N D
Biostat 201: Winter 11 Lab Session 4
What do we need to do? • Import data • Recode variables • Run a logistic regression • Interaction terms • Selection: stepwise, backward • Statistics: deviance, mean deviance, HL fit, R-square, c • Create an ROC curve • Create a sensitivity/specificity/accuracy table • Make a prediction table
Dataset used in this lab • The Evans county dataset from Kleinbaum’s “Logistic Regression: A Self Learning Text.” • The data are from a cohort study in which 609 white males are followed for 7 years, with coronary heart disease as the outcome of interest.
SAS: Recoding variables • We do this in the DATA step • We can do: mathematical transformations, program interactions, recode variables (dummy coding; effect coding; categorical to categorical; continuous to categorical; etc.) • We could create a new dataset: • datanew_dataset;setorig_dataset; <insert equations here>;run; • Or we could update the old dataset: • dataorig_dataset; setorig_dataset; <insert equations here>; run;
SAS: Recoding variables • If/Then statements are the most common way to do non-mathematical transformations • ifboolean_stmntthennewvar=###; • Example: • dataevans; set evans; if age <52 then agecat=0; if age>=52 then agecat=1; run; • Warning: When recoding variables, make sure every possibility is accounted for! • What if age is missing? • Missing values in SAS are considered as extremely small values.
SAS: Recoding variablesHypothetical Examples • Common boolean statements (examples): • ifage<40 thenagecat=0; • if25<bmi<=30 thenweightcat=0; • ifstate in (“CA”,“OR”,“WA”) thenregion=4; • if age in (50,51,52) thenagecat=3;
STATA: Recoding variablesHypothetical Examples • Recoding variables in STATA uses the “gen” command and “replace” command • SAS: ifage<40 thenagecat=0; • STATA: gen agecat=0 if age<40 • SAS: ifstate in (“CA”,“OR”) thenregion=4; • STATA: • gen region=4 if state==“CA” • replace region=4 if state==“OR”
STATA: Dummy Codin g • gen agecat=0 if age<52 • replace agecat=1 if age>=52 Another option (Note this only creates binary variables) • gen agecat=(age>=52) • Numeric missing values are represented by large positive values
SAS: Logistic regression • proc logisticdata=dataset;classz1 z2 …;modely (event=“#”)=z1 z2 … x1 x2 … /<insert options here>;run; quit; • (event=“#”): This specifies what is the event being modeled • There are many ways to specify similar models in proc logistic (e.g., the class notes specify another way that is also valid), so review the manual for all the details
SAS: Logistic regression • Some additional notes: • Just like in PROC GLM, interaction terms can be coded directly! • Example: • modely (event=“1”)= z1 z2 x1 z1*x1; • When using the CLASS statement to specify your categorical variables, know that you are letting SAS determine the reference group. Therefore, pay careful attention to this when interpreting your model
SAS: Logistic Regression • proclogistic data=evans; class cat (ref="0") agecat (ref="0") smk (ref="0") ecg (ref="0") hpt (ref="0") ch (ref="0")/param=ref; model chd (event="1")= cat agecat smk ecg hpt ch; run;
STATA: Logistic regression • This will give us the model with the odds ratios: • logisticyvarx1x2x3 … • This will give us the model with the coefficients: • logisticyvarx1x2x3 …, coef • Both models will yvar=1 use as the “event”
STATA: Logistic regressionEstimated coefficients • logistic chd cat agecat smk ecg hpt ch, coef
STATA: Logistic regression • logistic chd cat agecatsmkecghptch, or
SAS: Logistic regression • Some important MODEL options: • Variable selection method: • Stepwise: • selection=stepwiseslentry=#slstay=# • Backward: • selection=backwardslstay=# • Forcing variables in the model: • include=# • proclogistic data=evans; class cat (ref="0") agecat (ref="0") smk (ref="0") ecg (ref="0") hpt (ref="0") ch (ref="0")/param=ref; model chd (event="1")= cat agecat smk ecg hpt ch/selection=stepwise slstay=0.05 include=2; run;
STATA: Logistic regression • Some important MODEL options: • Variable selection method: • Stepwise: • stepwise, pe(#) pr(#): logistic … • Backward: • stepwise, pr(#): logistic … • Forcing variables in the model: • stepwise lockterm1, …: logistic … • stepwise, pe(0.1): logistic chd cat agecat smk ecg hpt ch
STATA: Logistic regression • stepwise, pe(.1) lockterm1: logistic chd cat agecat smk ecg hpt chstepwise, pe(.1) lockterm1: logistic chd (cat agecat) smk ecg hpt ch stepwise, pe(.1) lockterm1: logistic chd smk ecg hpt ch cat agecatstepwise, pe(.1) lockterm1: logistic chd (smk ecg) hpt ch cat agecat
SAS: Logistic regression • Some important MODEL options: • Additional statistics • Hosmer-Lemeshow fit statistic: • lackfit • Risk limit confidence intervals: • risklimits • R-square: • rsquare • Deviance: • aggregate scale=none
SAS: Logistic Regression • proclogistic data=evans; class cat (ref="0") agecat (ref="0") smk (ref="0") ecg (ref="0") hpt (ref="0") ch (ref="0")/param=ref; model chd (event="1")= cat agecatsmkecghptch/rsquarelackfitrisklimits aggregate scale=none; run;
SAS: Logistic Regression Large chi square values (small p-values) indicates lack of fit. G+D Deviance Mean deviance= deviance / (n-k) Where n = number of subjects and k = number of parameters estimated
STATA: Logistic regression • Some important MODEL options: • Additional statistics • Hosmer-Lemeshow fit statistic: • estat gof, group(10) • Risk limit confidence intervals: • Automatically given • R-square: • fitstat [see “Cox-Snell” and “Nagelkerke”] • Deviance: • fitstat [see “D(#)”]
STATA: Logistic regression Large chi square values (small p-values) indicates lack of fit.
STATA: Logistic regression NOTE: fitstat is NOT a built in command in STATA. It must be downloaded. Type: “findit spost9_ado” in the commands window and then follow the instructions to download
SAS: Creating an ROC curve • To create an ROC curve, we’ll need to get all the ROC statistics • SAS does this by creating a new dataset from the logistic regression model • We do this by adding the following to the MODEL options: • outroc=roc_dataset • This dataset will contain sensitivity, 1-specificity, etc., giving us info to create an ROC curve
SAS: Creating an ROC curve symbol1 i=join v=none ; procgplot data=roc_dataset; title 'ROC Curve'; plot _sensit_*_1mspec_=1 / vaxis=0 to 1 by .1 ; run; quit;
SAS: Getting the ROC valuePart of SAS output: Associated Statistics
SAS: Creating a sensitivity / specificity / accuracy table • Because this dataset contains sensitivity and 1-specificity information, we could create a table with the sensitivity, specificity, and unweighted accuracy information • DATA roc_dataset; SET roc_dataset;spec = 1 - _1MSPEC_;acc = (_SENSIT_ + spec)/2;RUN;
SAS: Creating a sensitivity / specificity table (Part 2) • proclogistic data=evans; class cat (ref="0") agecat (ref="0") smk (ref="0") ecg (ref="0") hpt (ref="0") ch (ref="0")/param=ref; model chd (event="1")= cat agecatsmkecghptch/rsquarelackfitrisklimits aggregate scale=none outroc=roc_datasetctablepprob = (.05 to .6 by .05); run;
SAS: Making a prediction table • To make a prediction table, we need to get probabilities that are predicted by our model • We do this by outputting a new dataset when running proc logistic • proc logistic data=dataset;modely (event=“#”)= x1 x2…;outputout=ptable_dataset pred=predp xbeta=logit;run; quit; • Making a prediction table requires additional processing of this newly created dataset. Please refer to the handout for additional information.
STATA: Creating an ROC curve • To create an ROC curve, we just need to run the the lroc command after running the logistic regression • This will also give us the area under the ROC curve • Syntax: • logisticyvarx1x2x3 … • lroc • lsens, genprob(var) gensens(var) genspec(var)
STATA: Creating an ROC curve • logistic chd cat agecat smk ecg hpt ch, or • lroc
STATA: sensitivity / specificity • lsens, genprob(prob) gensens(sens) genspec(spec)
STATA: Making a prediction table • In STATA, we could create a prediction table by iteratively running the estat classification command after running our model • estat classification, cutoff(#) • For cutoff, we specify a value between 0 and 1 • Syntax: • logisticyvarx1x2x3 … • estat classification, cutoff(0.1) • estat classification, cutoff(0.2) • estat classification, cutoff(0.3) • And so forth… • Based on this output, we could then create a classification table. Sensitivity and specificity are also given, so we could also make a sensitivity/specificity/accuracy table too.
STATA: Making a prediction table • estat classification
STATA: Making a prediction table • estat classification, cutoff(0.45)