1 / 38

Biostat 201: Winter 11

Biostat 201: Winter 11. Lab Session 4. Assignment 4. What do we need to do?. Import data Recode variables Run a logistic regression Interaction terms Selection: stepwise, backward Statistics: deviance, mean deviance, HL fit, R-square, c Create an ROC curve

kobe
Download Presentation

Biostat 201: Winter 11

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Biostat 201: Winter 11 Lab Session 4

  2. Assignment 4

  3. What do we need to do? • Import data • Recode variables • Run a logistic regression • Interaction terms • Selection: stepwise, backward • Statistics: deviance, mean deviance, HL fit, R-square, c • Create an ROC curve • Create a sensitivity/specificity/accuracy table • Make a prediction table

  4. Dataset used in this lab • The Evans county dataset from Kleinbaum’s “Logistic Regression: A Self Learning Text.” • The data are from a cohort study in which 609 white males are followed for 7 years, with coronary heart disease as the outcome of interest.

  5. SAS: Recoding variables • We do this in the DATA step • We can do: mathematical transformations, program interactions, recode variables (dummy coding; effect coding; categorical to categorical; continuous to categorical; etc.) • We could create a new dataset: • datanew_dataset;setorig_dataset; <insert equations here>;run; • Or we could update the old dataset: • dataorig_dataset; setorig_dataset; <insert equations here>; run;

  6. SAS: Recoding variables • If/Then statements are the most common way to do non-mathematical transformations • ifboolean_stmntthennewvar=###; • Example: • dataevans; set evans; if age <52 then agecat=0; if age>=52 then agecat=1; run; • Warning: When recoding variables, make sure every possibility is accounted for! • What if age is missing? • Missing values in SAS are considered as extremely small values.

  7. SAS: Recoding variablesHypothetical Examples • Common boolean statements (examples): • ifage<40 thenagecat=0; • if25<bmi<=30 thenweightcat=0; • ifstate in (“CA”,“OR”,“WA”) thenregion=4; • if age in (50,51,52) thenagecat=3;

  8. STATA: Recoding variablesHypothetical Examples • Recoding variables in STATA uses the “gen” command and “replace” command • SAS: ifage<40 thenagecat=0; • STATA: gen agecat=0 if age<40 • SAS: ifstate in (“CA”,“OR”) thenregion=4; • STATA: • gen region=4 if state==“CA” • replace region=4 if state==“OR”

  9. STATA: Dummy Codin g • gen agecat=0 if age<52 • replace agecat=1 if age>=52 Another option (Note this only creates binary variables) • gen agecat=(age>=52) • Numeric missing values are represented by large positive values

  10. SAS: Logistic regression • proc logisticdata=dataset;classz1 z2 …;modely (event=“#”)=z1 z2 … x1 x2 … /<insert options here>;run; quit; • (event=“#”): This specifies what is the event being modeled • There are many ways to specify similar models in proc logistic (e.g., the class notes specify another way that is also valid), so review the manual for all the details

  11. SAS: Logistic regression • Some additional notes: • Just like in PROC GLM, interaction terms can be coded directly! • Example: • modely (event=“1”)= z1 z2 x1 z1*x1; • When using the CLASS statement to specify your categorical variables, know that you are letting SAS determine the reference group. Therefore, pay careful attention to this when interpreting your model

  12. SAS: Logistic Regression • proclogistic data=evans; class cat (ref="0") agecat (ref="0") smk (ref="0") ecg (ref="0") hpt (ref="0") ch (ref="0")/param=ref; model chd (event="1")= cat agecat smk ecg hpt ch; run;

  13. SAS: Logistic Regression

  14. STATA: Logistic regression • This will give us the model with the odds ratios: • logisticyvarx1x2x3 … • This will give us the model with the coefficients: • logisticyvarx1x2x3 …, coef • Both models will yvar=1 use as the “event”

  15. STATA: Logistic regressionEstimated coefficients • logistic chd cat agecat smk ecg hpt ch, coef

  16. STATA: Logistic regression • logistic chd cat agecatsmkecghptch, or

  17. SAS: Logistic regression • Some important MODEL options: • Variable selection method: • Stepwise: • selection=stepwiseslentry=#slstay=# • Backward: • selection=backwardslstay=# • Forcing variables in the model: • include=# • proclogistic data=evans; class cat (ref="0") agecat (ref="0") smk (ref="0") ecg (ref="0") hpt (ref="0") ch (ref="0")/param=ref; model chd (event="1")= cat agecat smk ecg hpt ch/selection=stepwise slstay=0.05 include=2; run;

  18. STATA: Logistic regression • Some important MODEL options: • Variable selection method: • Stepwise: • stepwise, pe(#) pr(#): logistic … • Backward: • stepwise, pr(#): logistic … • Forcing variables in the model: • stepwise lockterm1, …: logistic … • stepwise, pe(0.1): logistic chd cat agecat smk ecg hpt ch

  19. STATA: Logistic regression • stepwise, pe(.1) lockterm1: logistic chd cat agecat smk ecg hpt chstepwise, pe(.1) lockterm1: logistic chd (cat agecat) smk ecg hpt ch stepwise, pe(.1) lockterm1: logistic chd smk ecg hpt ch cat agecatstepwise, pe(.1) lockterm1: logistic chd (smk ecg) hpt ch cat agecat

  20. SAS: Logistic regression • Some important MODEL options: • Additional statistics • Hosmer-Lemeshow fit statistic: • lackfit • Risk limit confidence intervals: • risklimits • R-square: • rsquare • Deviance: • aggregate scale=none

  21. SAS: Logistic Regression • proclogistic data=evans; class cat (ref="0") agecat (ref="0") smk (ref="0") ecg (ref="0") hpt (ref="0") ch (ref="0")/param=ref; model chd (event="1")= cat agecatsmkecghptch/rsquarelackfitrisklimits aggregate scale=none; run;

  22. SAS: Logistic Regression Large chi square values (small p-values) indicates lack of fit. G+D Deviance Mean deviance= deviance / (n-k) Where n = number of subjects and k = number of parameters estimated

  23. SAS: Logistic Regression

  24. STATA: Logistic regression • Some important MODEL options: • Additional statistics • Hosmer-Lemeshow fit statistic: • estat gof, group(10) • Risk limit confidence intervals: • Automatically given • R-square: • fitstat [see “Cox-Snell” and “Nagelkerke”] • Deviance: • fitstat [see “D(#)”]

  25. STATA: Logistic regression Large chi square values (small p-values) indicates lack of fit.

  26. STATA: Logistic regression NOTE: fitstat is NOT a built in command in STATA. It must be downloaded. Type: “findit spost9_ado” in the commands window and then follow the instructions to download

  27. SAS: Creating an ROC curve • To create an ROC curve, we’ll need to get all the ROC statistics • SAS does this by creating a new dataset from the logistic regression model • We do this by adding the following to the MODEL options: • outroc=roc_dataset • This dataset will contain sensitivity, 1-specificity, etc., giving us info to create an ROC curve

  28. SAS: Creating an ROC curve symbol1 i=join v=none ; procgplot data=roc_dataset; title 'ROC Curve'; plot _sensit_*_1mspec_=1 / vaxis=0 to 1 by .1 ; run; quit;

  29. SAS: Getting the ROC valuePart of SAS output: Associated Statistics

  30. SAS: Creating a sensitivity / specificity / accuracy table • Because this dataset contains sensitivity and 1-specificity information, we could create a table with the sensitivity, specificity, and unweighted accuracy information • DATA roc_dataset; SET roc_dataset;spec = 1 - _1MSPEC_;acc = (_SENSIT_ + spec)/2;RUN;

  31. SAS: Creating a sensitivity / specificity table (Part 2) • proclogistic data=evans; class cat (ref="0") agecat (ref="0") smk (ref="0") ecg (ref="0") hpt (ref="0") ch (ref="0")/param=ref; model chd (event="1")= cat agecatsmkecghptch/rsquarelackfitrisklimits aggregate scale=none outroc=roc_datasetctablepprob = (.05 to .6 by .05); run;

  32. SAS: Making a prediction table • To make a prediction table, we need to get probabilities that are predicted by our model • We do this by outputting a new dataset when running proc logistic • proc logistic data=dataset;modely (event=“#”)= x1 x2…;outputout=ptable_dataset pred=predp xbeta=logit;run; quit; • Making a prediction table requires additional processing of this newly created dataset. Please refer to the handout for additional information.

  33. STATA: Creating an ROC curve • To create an ROC curve, we just need to run the the lroc command after running the logistic regression • This will also give us the area under the ROC curve • Syntax: • logisticyvarx1x2x3 … • lroc • lsens, genprob(var) gensens(var) genspec(var)

  34. STATA: Creating an ROC curve • logistic chd cat agecat smk ecg hpt ch, or • lroc

  35. STATA: sensitivity / specificity • lsens, genprob(prob) gensens(sens) genspec(spec)

  36. STATA: Making a prediction table • In STATA, we could create a prediction table by iteratively running the estat classification command after running our model • estat classification, cutoff(#) • For cutoff, we specify a value between 0 and 1 • Syntax: • logisticyvarx1x2x3 … • estat classification, cutoff(0.1) • estat classification, cutoff(0.2) • estat classification, cutoff(0.3) • And so forth… • Based on this output, we could then create a classification table. Sensitivity and specificity are also given, so we could also make a sensitivity/specificity/accuracy table too.

  37. STATA: Making a prediction table • estat classification

  38. STATA: Making a prediction table • estat classification, cutoff(0.45)

More Related