Statistical Procedures in Logistic and Linear Regression

Explore logistic and linear regression in SAS, with a binary-outcome example and a continuous-outcome example, followed by an introduction to SAS macros. Compare maximum likelihood estimates, odds ratios, and multivariate versus univariate results to understand how the independent variables affect each outcome.

jgibbs

Presentation Transcript


  1. Lesson 11 - Topics
     • Statistical procedures: PROC LOGIST, REG
     • Multiple logistic and linear regression
     • Introduction to SAS macros
     Programs 21-22

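  2. Logistic Regression
     Model a binary factor (yes/no) as a function of one or more independent variables.
     TOMHS Example: Smoking as a function of age, gender, race, and education
     $\log\left(\frac{p}{1-p}\right) = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k$, where $p$ is the probability of the outcome (here, being a smoker).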
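
The model on this slide relates a probability to a linear predictor through the log-odds. As a minimal sketch of that relationship (in Python rather than SAS, so the arithmetic can be checked independently), the functions below convert between the two scales. The function names and the coefficients in the example are hypothetical and are not taken from the TOMHS fit.

```python
import math

def logit(p):
    """Log-odds of a probability p, for 0 < p < 1."""
    return math.log(p / (1.0 - p))

def inv_logit(eta):
    """Probability implied by a linear predictor eta = b0 + b1*x1 + ... + bk*xk."""
    return 1.0 / (1.0 + math.exp(-eta))

def predicted_probability(intercept, coefs, xs):
    """Apply the logistic model to one observation."""
    eta = intercept + sum(b * x for b, x in zip(coefs, xs))
    return inv_logit(eta)

# Hypothetical intercept, coefficients, and covariate values (illustration only).
p = predicted_probability(-1.0, [0.5, -0.25], [2.0, 4.0])
print(round(p, 3))
```

A linear predictor of 0 corresponds to a probability of 0.5, and each unit increase in the predictor multiplies the odds p/(1-p) by e.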

  3. DATA stat ;
         INFILE '~/SAS_Files/tomhsfull.data' ;
         INPUT @1   ptid    $10.
               @27  age     2.
               @30  sex     1.
               @32  race    1.
               @49  educ    1.
               @51  eversmk 1.
               @53  nowsmk  1.
               @180 energy  5. ;
         * Indicator variables coded 1 = yes, 0 = no ;
         if race = 2 then aa = 1; else aa = 0;
         if sex = 2 then women = 1; else women = 0;
         if educ in(1,2,3,4,5,6) then collgrad = 0; else
         if educ in(7,8,9) then collgrad = 1;
         * Current smoking status coded 1 = yes, 2 = no ;
         if eversmk = 2 then currsmk = 2; else currsmk = nowsmk;

  4. if eversmk = 2 then currsmk = 2; else currsmk = nowsmk;
     Var: eversmk   Did you ever smoke cigarettes? 1 = yes, 2 = no
     Var: nowsmk    Do you now smoke cigarettes? 1 = yes, 2 = no
     Note: The second question is answered only if the first question is answered yes, so participants who never smoked are assigned currsmk = 2 directly.
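
The skip pattern behind this recode can be restated outside SAS. The sketch below is a Python analog of the single IF/ELSE statement on the slide; the function name is hypothetical, and representing an unanswered question as None is an assumption made for illustration.

```python
def current_smoker(eversmk, nowsmk):
    """Return 1 (current smoker) or 2 (not a current smoker), using the slide's coding.

    eversmk: 1 = ever smoked, 2 = never smoked.
    nowsmk:  1 = smokes now, 2 = does not; None when the question was skipped.
    """
    if eversmk == 2:
        # Never smoked: the second question was skipped, so code as non-smoker.
        return 2
    return nowsmk

print(current_smoker(2, None))  # never smoked
print(current_smoker(1, 1))     # ever smoked and smokes now
print(current_smoker(1, 2))     # ever smoked but quit
```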

  5. PROC MEANS;
         VAR age women collgrad aa ;
         CLASS currsmk;
         RUN;

         currsmk   N Obs   Variable     N     Mean
         -------------------------------------------
         1            98   age         98    52.31
                           women       98     0.44
                           collgrad    98     0.23
                           aa          98     0.45
         2           801   age        801    55.08
                           women      801     0.38
                           collgrad   799     0.38
                           aa         801     0.17
         -------------------------------------------

  6. ODS SELECT ParameterEstimates OddsRatios ;
         PROC LOGIST;
         MODEL currsmk = age women collgrad aa ;
         RUN;

         Analysis of Maximum Likelihood Estimates

                                   Standard        Wald
         Parameter   DF  Estimate     Error  Chi-Square  Pr > ChiSq
         Intercept    1    1.7422    1.0235      2.8976      0.0887
         age          1   -0.0732    0.0189     15.0704      0.0001
         women        1   -0.2367    0.2407      0.9672      0.3254
         collgrad     1   -0.6866    0.2618      6.8805      0.0087
         aa           1    1.3394    0.2416     30.7354      <.0001

         Odds Ratio Estimates

                       Point        95% Wald
         Effect     Estimate   Confidence Limits
         age           0.929     0.896     0.964
         women         0.789     0.492     1.265
         collgrad      0.503     0.301     0.841
         aa            3.817     2.377     6.128

     OR = exp(estimate)
     OR (age) = exp(-0.07) = 0.93
     Note: With currsmk coded 1 = yes and 2 = no, the procedure by default models the probability of the lower ordered value, currsmk = 1 (current smoker).
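
The odds ratios and confidence limits in this output can be recomputed from the estimates and standard errors. The sketch below is a Python check rather than SAS code; it assumes the Wald limits are exp(estimate ± 1.96 × standard error), which reproduces the printed values to three decimals. The function name is hypothetical.

```python
import math

def odds_ratio_with_wald_ci(estimate, std_error, z=1.96):
    """Return (odds ratio, lower limit, upper limit) for one logistic coefficient."""
    point = math.exp(estimate)
    lower = math.exp(estimate - z * std_error)
    upper = math.exp(estimate + z * std_error)
    return point, lower, upper

# Estimates and standard errors copied from the output on the slide.
fit = {
    "age": (-0.0732, 0.0189),
    "women": (-0.2367, 0.2407),
    "collgrad": (-0.6866, 0.2618),
    "aa": (1.3394, 0.2416),
}

for name, (estimate, std_error) in fit.items():
    point, lower, upper = odds_ratio_with_wald_ci(estimate, std_error)
    print(f"{name:9s} {point:6.3f} {lower:6.3f} {upper:6.3f}")
```

An odds ratio below 1 (age, collgrad) means lower odds of current smoking per unit increase in the variable. An interval that contains 1 (women) matches a Wald test that is not significant at the 5% level.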

  7. Comparison of univariate versus multivariate results

         Multivariate

                                   Standard        Wald
         Parameter   DF  Estimate     Error  Chi-Square  Pr > ChiSq
         Intercept    1    1.7422    1.0235      2.8976      0.0887
         age          1   -0.0732    0.0189     15.0704      0.0001
         women        1   -0.2367    0.2407      0.9672      0.3254
         collgrad     1   -0.6866    0.2618      6.8805      0.0087
         aa           1    1.3394    0.2416     30.7354      <.0001

         Univariate (Separate regression runs)

                                   Standard        Wald
         Parameter   DF  Estimate     Error  Chi-Square  Pr > ChiSq
         Intercept    1    1.7422    1.0235      2.8976      0.0887
         age          1   -0.0736    0.0185     15.8221      <.0001
         women        1    0.2561    0.2162      1.4026      0.2363
         collgrad     1   -0.6945    0.2492      7.7635      0.0053
         aa           1    1.4091    0.2242     39.5071      <.0001

     Note: Women are more likely than men to be AA in TOMHS, and AA participants are more likely to be smokers, which helps explain why the estimate for women changes sign after adjustment.

  8. Linear Regression
     Model a continuous factor as a function of one or more independent variables.
     TOMHS Example: Energy (calorie) intake as a function of age, gender, race, and education

  9. ODS SELECT ParameterEstimates ;
         PROC REG;
         MODEL energy = age women collgrad aa ;
         RUN;

         The REG Procedure
         Model: MODEL1
         Dependent Variable: energy

         Parameter Estimates

                          Parameter     Standard
         Variable    DF    Estimate        Error   t Value   Pr > |t|
         Intercept    1  3574.78842    184.91689     19.33     <.0001
         age          1   -20.67969      3.25993     -6.34     <.0001
         women        1  -570.45804     44.34733    -12.86     <.0001
         collgrad     1  -109.19062     44.01230     -2.48     0.0133
         aa           1  -253.62159     54.07279     -4.69     <.0001

     Energy = 3575 - 21*age - 570*women - 109*collgrad - 254*aa
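
The fitted equation can be used to compute a predicted intake for a given participant. The sketch below (Python, not SAS) plugs the unrounded estimates from the output into the equation; the function name and the example participant are hypothetical.

```python
# Parameter estimates copied from the PROC REG output on the slide.
COEFFICIENTS = {
    "intercept": 3574.78842,
    "age": -20.67969,
    "women": -570.45804,
    "collgrad": -109.19062,
    "aa": -253.62159,
}

def predicted_energy(age, women, collgrad, aa):
    """Predicted energy intake (calories) from the fitted linear model."""
    return (COEFFICIENTS["intercept"]
            + COEFFICIENTS["age"] * age
            + COEFFICIENTS["women"] * women
            + COEFFICIENTS["collgrad"] * collgrad
            + COEFFICIENTS["aa"] * aa)

# Hypothetical participant: a 50-year-old woman, not a college graduate, not AA.
example = predicted_energy(age=50, women=1, collgrad=0, aa=0)
print(round(example, 1))
```

Each coefficient is the expected difference in intake for a one-unit change in that variable with the others held fixed. For example, the model predicts about 570 fewer calories for a woman than for a man of the same age, education, and race.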

  10. Multivariate Analysis

                           Parameter     Standard
          Variable    DF    Estimate        Error   t Value   Pr > |t|
          age          1   -20.67969      3.25993     -6.34     <.0001
          women        1  -570.45804     44.34733    -12.86     <.0001
          collgrad     1  -109.19062     44.01230     -2.48     0.0133
          aa           1  -253.62159     54.07279     -4.69     <.0001

      Univariate Analysis (Separate regression runs)

                           Parameter     Standard
          Variable    DF    Estimate        Error   t Value   Pr > |t|
          age          1   -17.1154       3.60184     -4.75     <.0001
          women        1  -595.40078     43.74189    -13.61     <.0001
          collgrad     1    41.21749     48.61549      0.85     0.3968
          aa           1  -388.19448     57.32940     -6.77     <.0001

      Note: Women are less likely to be college graduates and also have lower caloric intake, which helps explain why the collgrad estimate changes sign once the model adjusts for women.

  11. PROC MEANS;
          VAR energy;
          CLASS women aa collgrad;
          RUN;

          Analysis Variable : energy

          women   aa   collgrad   N Obs     N       Mean
          ------------------------------------------------
          0       0    0            277   276   2445.043
                       1            213   213   2338.319
                  1    0             42    42   2141.714
                       1             23    23   1992.261
          1       0    0            162   162   1795.938
                       1             71    71   1853.366
                  1    0             92    92   1694.196
                       1             20    20   1532.300
          ------------------------------------------------

  12. Macro Variables and Use

          LIBNAME t 'C:\SAS_Files';
          %let nut = kcalbl dcholbl calcbl sodbl;
          %let cat = clinic;
          DATA temp;
             SET t.tomhs (KEEP=ptid &nut &cat);
          RUN;
          PROC MEANS DATA=temp ;
             VAR &nut ;
             CLASS &cat;
             TITLE "PROC Means results for variables &nut by &cat";
          RUN;
          * Makes it easy to modify code;

  13. Macro Variables
      • Defined using the %LET statement
      • Referenced by using &macrovarname
      • SAS substitutes the value of macrovarname when it encounters &macrovarname
      • Useful for making a program easy to modify
      • Usually put near the top of the program

          %let macrovarname = characters ;

  14. Simple Macro to Shorten Code
      Suppose you want to compute the change from baseline in 4 variables at 3 time points. A macro can generate the repeated statements for you.
      Variables:
      dbp12, dbp24, dbp36 and dbpbl
      sbp12, sbp24, sbp36 and sbpbl
      chol12, chol24, chol36 and cholbl
      gluc12, gluc24, gluc36 and glucbl

          %macro change(v);
             dbpdif&v  = dbp&v  - dbpbl;
             sbpdif&v  = sbp&v  - sbpbl;
             choldif&v = chol&v - cholbl;
             glucdif&v = gluc&v - glucbl;
          %mend change;

          option mprint;  * Shows code generated in the log;
          data temp;
             set temp;
             %change(12);
             %change(24);
             %change(36);
          run;

  15. Simple Macro to Shorten Code
      Log from the calls to %macro change(v); with option mprint:

          36   %change(12);
          MPRINT(CHANGE):   dbpdif12 = dbp12 - dbpbl;
          MPRINT(CHANGE):   sbpdif12 = sbp12 - sbpbl;
          MPRINT(CHANGE):   choldif12 = chol12 - cholbl;
          MPRINT(CHANGE):   glucdif12 = gluc12 - glucbl;
          36   %change(24);
          MPRINT(CHANGE):   dbpdif24 = dbp24 - dbpbl;
          MPRINT(CHANGE):   sbpdif24 = sbp24 - sbpbl;
          MPRINT(CHANGE):   choldif24 = chol24 - cholbl;
          MPRINT(CHANGE):   glucdif24 = gluc24 - glucbl;
          37   %change(36);
          MPRINT(CHANGE):   dbpdif36 = dbp36 - dbpbl;
          MPRINT(CHANGE):   sbpdif36 = sbp36 - sbpbl;
          MPRINT(CHANGE):   choldif36 = chol36 - cholbl;
          MPRINT(CHANGE):   glucdif36 = gluc36 - glucbl;
          38   run;

      SAS substitutes the value of v everywhere there is an &v.
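
The substitution shown in the log is plain text replacement. As a rough analog only (Python string formatting, not the SAS macro processor), the sketch below generates the same statements for each value of v. The function name mirrors the macro name, and the list of measure prefixes is read off the slide.

```python
def change(v):
    """Return the statements the slide's %change macro generates for suffix v."""
    measures = ["dbp", "sbp", "chol", "gluc"]
    return [f"{m}dif{v} = {m}{v} - {m}bl;" for m in measures]

# One call per follow-up visit, as in the DATA step on the slide.
for visit in (12, 24, 36):
    for statement in change(visit):
        print(statement)
```

Like the macro, the function produces text rather than computed values; in SAS that text is then compiled as part of the DATA step.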

  16. Another Macro Example Goal of Macro named Summary: For a given dataset give summary statistics using PROC CONTENTS, MEANS and FREQ and (optionally) display the data using PROC PRINT. Instead of having to write the code each time, write a macro.

  17. Name of macro: summary

          %macro summary (dataset=, mvar=_numeric_, fvar=_character_, print=N, pvar=_all_);

      Parameters to macro (name = default):
      dataset: Name of dataset used
      mvar: List of variables to run for PROC MEANS (default is all numeric variables)
      fvar: List of variables to run for PROC FREQ (default is all character variables)
      print: If set to Y then run PROC PRINT (default is N)
      pvar: List of variables to run for PROC PRINT (default is all variables)
      Remember: A SAS macro generates SAS code when you call it.

  18. Macro summary: name of macro and parameters to macro

          %macro summary (dataset=, mvar=_numeric_, fvar=_character_, print=N, pvar=_all_);
             proc contents data=&dataset varnum; run;
             proc means data=&dataset; var &mvar; run;
             proc freq data=&dataset; tables &fvar; run;
             %if &print = Y %then %do;
                proc print data=&dataset; var &pvar; run;
             %end;
          %mend summary;

      The %if block generates the PROC PRINT code only if the macro variable print equals Y.
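
Keyword parameters with defaults and conditional code generation have close counterparts in most languages. The sketch below is a Python analog of the summary macro, not SAS: it returns the lines of SAS code the macro would generate. The parameter name print_ (with a trailing underscore) is an illustrative choice that avoids shadowing Python's built-in print.

```python
def summary(dataset, mvar="_numeric_", fvar="_character_", print_="N", pvar="_all_"):
    """Return the SAS statements the summary macro would generate, one per line."""
    lines = [
        f"proc contents data={dataset} varnum;", "run;",
        f"proc means data={dataset};", f"var {mvar};", "run;",
        f"proc freq data={dataset};", f"tables {fvar};", "run;",
    ]
    # Analog of %if &print = Y %then %do; ... %end;
    if print_ == "Y":
        lines += [f"proc print data={dataset};", f"var {pvar};", "run;"]
    return lines

# Only the dataset is given, so every other parameter takes its default.
for line in summary("tomhs"):
    print(line)
```

Calling summary("tomhs", print_="Y") appends the three PROC PRINT lines, and summary("tomhs", fvar="clinic sex") changes only the TABLES statement.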

  19. * This is the macro;

          proc contents data=&dataset varnum; run;
          proc means data=&dataset; var &mvar; run;
          proc freq data=&dataset; tables &fvar; run;
          %if &print = Y %then %do;
             proc print data=&dataset; var &pvar; run;
          %end;

      CALL TO MACRO:

          libname t 'C:/PH6420/data/';
          data tomhs; set t.tomhs; run;
          option mprint;
          * Call with only dataset given;
          %summary (dataset=tomhs);

      Code Generated:

          MPRINT(SUMMARY): proc contents data=tomhs varnum;
          MPRINT(SUMMARY): run;
          MPRINT(SUMMARY): proc means data=tomhs;
          MPRINT(SUMMARY): var _numeric_;
          MPRINT(SUMMARY): run;
          MPRINT(SUMMARY): proc freq data=tomhs;
          MPRINT(SUMMARY): tables _character_;
          MPRINT(SUMMARY): run;

  20. * This is the macro;

          proc contents data=&dataset varnum; run;
          proc means data=&dataset; var &mvar; run;
          proc freq data=&dataset; tables &fvar; run;
          %if &print = Y %then %do;
             proc print data=&dataset; var &pvar; run;
          %end;

      CALL TO MACRO:

          libname t 'C:/PH6420/data/';
          data tomhs; set t.tomhs; run;
          option mprint;
          %summary (dataset=tomhs, print=Y);

      Code Generated:

          MPRINT(SUMMARY): proc contents data=tomhs varnum;
          MPRINT(SUMMARY): run;
          MPRINT(SUMMARY): proc means data=tomhs;
          MPRINT(SUMMARY): var _numeric_;
          MPRINT(SUMMARY): run;
          MPRINT(SUMMARY): proc freq data=tomhs;
          MPRINT(SUMMARY): tables _character_;
          MPRINT(SUMMARY): run;
          MPRINT(SUMMARY): proc print data=tomhs;
          MPRINT(SUMMARY): var _all_;
          MPRINT(SUMMARY): run;

  21. * This is the macro;

          proc contents data=&dataset varnum; run;
          proc means data=&dataset; var &mvar; run;
          proc freq data=&dataset; tables &fvar; run;
          %if &print = Y %then %do;
             proc print data=&dataset; var &pvar; run;
          %end;

      CALL TO MACRO:

          libname t 'C:/PH6420/data/';
          data tomhs; set t.tomhs; run;
          option mprint;
          %summary (dataset=tomhs, fvar=clinic sex);

      Code Generated:

          MPRINT(SUMMARY): proc contents data=tomhs varnum;
          MPRINT(SUMMARY): run;
          MPRINT(SUMMARY): proc means data=tomhs;
          MPRINT(SUMMARY): var _numeric_;
          MPRINT(SUMMARY): run;
          MPRINT(SUMMARY): proc freq data=tomhs;
          MPRINT(SUMMARY): tables clinic sex;
          MPRINT(SUMMARY): run;

  22. PROC TABULATE table that the macro on the following slides generalizes

          ----------------------------------------------------------------------
          |                            |       Diastolic BP at 12-Months       |
          |                            |---------------------------------------|
          |                            |   N   | Mean  |  Std  |  Min  |  Max  |
          |----------------------------+-------+-------+-------+-------+-------|
          |Study Group (1-6)           |       |       |       |       |       |
          |1                           |     15|   77.8|    9.3|   68.0|   94.0|
          |2                           |     17|   81.7|    7.1|   72.0|  100.0|
          |3                           |     14|   78.1|    7.6|   67.0|   90.0|
          |4                           |     14|   77.7|    6.0|   66.0|   89.0|
          |5                           |     13|   79.6|    8.5|   66.0|   99.0|
          |6                           |     19|   79.6|    7.3|   64.0|   95.0|
          |All                         |     92|   79.2|    7.6|   64.0|  100.0|
          ----------------------------------------------------------------------

          proc tabulate data=_last_ noseps;
             class group;
             var dbp12;
             table (group all),
                   (dbp12)*(n*f=7.0 mean*f=7.1 std*f=7.1 min*f=7.1 max*f=7.1)/rts=30;
          run;

  23. MACRO BRKSPSS: Creates tabulate table for each var in dlist by group

          %macro brkspss (grp,dlist,data=_last_,dec=3,all=all);
             %do i = 1 %to 100;
                %* Take the i-th variable name from the list and stop when the list is exhausted;
                %let depvar = %scan(&dlist,&i);
                %if %length(&depvar) = 0 %then %goto done;
                proc tabulate data=&data noseps;
                   class &grp;
                   var &depvar;
                   table (&grp &all),
                         (&depvar)*(n*f=7.0 mean*f=7.&dec std*f=7.&dec min*f=7.&dec max*f=7.&dec)/rts=30;
                run;
             %end;
             %done:
          %mend brkspss;

          %brkspss(group,dbp12 sbp12 chol12);
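
The loop in this macro walks through a blank-separated list one word at a time and stops at the first empty result. The sketch below is a rough Python analog of that control flow, not SAS: scan here splits on whitespace only, whereas the real %scan function recognizes more delimiters, and both function names are illustrative.

```python
def scan(text, n):
    """Rough analog of %scan: the n-th blank-delimited word (1-based), or '' if none."""
    words = text.split()
    return words[n - 1] if 1 <= n <= len(words) else ""

def variables_to_tabulate(dlist, limit=100):
    """Collect the variables the loop would process, mirroring %do i = 1 %to 100."""
    found = []
    for i in range(1, limit + 1):
        depvar = scan(dlist, i)
        if len(depvar) == 0:
            # Analog of %goto done: the list is exhausted.
            break
        found.append(depvar)
    return found

print(variables_to_tabulate("dbp12 sbp12 chol12"))
```

The upper bound of 100 only caps the number of variables; the empty-word check is what ends the loop in practice.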

  24. MACRO BRKSPSS: Creates tabulate table for each var by group

          LIBNAME t '~/PH6420/2017/Data/';
          DATA stat;
             set t.tomhs;
          RUN;

          * Example calls;
          %brkspss(group,dbp12 sbp12 chol12);
          %brkspss(group,dbp12 sbp12 chol12, dec=1);  * Just 1-decimal;
          %brkspss(group,dbp12 sbp12 chol12, all=);   * No totals;

  25. Output from last call: First 2 variables.

          ----------------------------------------------------------------------
          |                            |       Diastolic BP at 12-Months       |
          |                            |---------------------------------------|
          |                            |   N   | Mean  |  Std  |  Min  |  Max  |
          |----------------------------+-------+-------+-------+-------+-------|
          |Study Group (1-6)           |       |       |       |       |       |
          |1                           |     15| 77.800|  9.314| 68.000| 94.000|
          |2                           |     17| 81.706|  7.078| 72.000|100.000|
          |3                           |     14| 78.071|  7.580| 67.000| 90.000|
          |4                           |     14| 77.714|  5.954| 66.000| 89.000|
          |5                           |     13| 79.615|  8.540| 66.000| 99.000|
          |6                           |     19| 79.579|  7.313| 64.000| 95.000|
          ----------------------------------------------------------------------

          ----------------------------------------------------------------------
          |                            |       Systolic BP at 12-Months        |
          |                            |---------------------------------------|
          |                            |   N   | Mean  |  Std  |  Min  |  Max  |
          |----------------------------+-------+-------+-------+-------+-------|
          |Study Group (1-6)           |       |       |       |       |       |
          |1                           |     15|120.200| 12.537| 93.000|141.000|
          |2                           |     17|124.118| 11.280|108.000|142.000|
          |3                           |     14|117.429|  9.436|104.000|135.000|
          |4                           |     14|127.571| 11.876|112.000|149.000|
          |5                           |     13|123.154| 18.348| 94.000|158.000|
          |6                           |     19|129.895| 12.987|105.000|154.000|
          ----------------------------------------------------------------------

  26. Where to put macro?
      • At the beginning of the program, before you call it:

          %macro brkspss(parameters);
             … macro code
          %mend brkspss;
          %brkspss (group, dbp12 sbp12, data=tomhs);

      • Save it as a separate SAS file and %include that file at the top of the program:

          %include '/folderpath/brkspss.sas';
          %brkspss(group, dbp12 sbp12, data=tomhs);
