SHRUG, Feb 2013: Networking exercise

SHRUG, Feb 2013: Networking exercise Many Ways to Solve a SAS Problem

Data sets work.cohort work.sesquart

What do you consider? • How easy was the code to read and understand? • How short was the code? • In the code samples that follow, are there new techniques that you’ll use? • Is there new syntax you learned? • Does the formatting of the code help legibility? • Do you see code that could create more user-friendly output?

Solution 1 Code & Results – David Moses, Ron Dewar proc rank data=COHORT out=COHORT_QUARTILE groups=4; var INCOME; ranks INCOME_QUARTILE; run; work.cohort_quartile

Solution 2: Code – Art Tabachneck data _for_format; set SESQUART; fmtname="income"; type="N"; do label=1 to 4; if label eq 1 then do; start=0; end=Pctl_25-.0000000001; end; else if label eq 2 then do; start=Pctl_25; end=Pctl_50-.0000000001; end; else if label eq 3 then do; start=Pctl_50; end=Pctl_75-.0000000001; end; else do; start=Pctl_75; end=99999999999; end; output; end; run; proc format cntlin = _for_format; run ; data want; set cohort; quartile=put(income,income.); run;

Solution 2 - Results work._for_format work.want

Solution 3: Code – Anonymous data inc_quart; * this appends the income cutpoints to the cohort dataset; if _n_ = 1 then do; set SESQUART (keep = Pctl_25 Pctl_50 Pctl_75); end; else; set COHORT ; * assign cutpoints to an array, so income can be compard in a loop; array q(*) Pctl_25 Pctl_50 Pctl_75 ; income_quartile = .; * who knows? income may be missing; if income ^= . then do; quart = 1; do i = 1 to 3; if income > q(i) then income_quartile = i+1; end; end; drop i Pctl_25 Pctl_50 Pctl_75 ; run;

Solution 3 - Results work.inc_quart

Solution 4: Code – Peter Ye proc sqlnoprint; select Pctl_25,Pctl_50,Pctl_75 into:q25, :q50, :q75 from sesquart; quit; data cohort; set cohort; select; when(missing(income)) quantile = .; when(income < &q25) quantile = 25; when(income < &q50) quantile = 50; when(income < &q75) quantile = 75; otherwise quantile = 100; end; run;

Solution 4 - Results work.cohort

Solution 5 Code: Anonymous (x2) * create formats for income; proc format; value incfmt .=‘Missing’ low-29721='At or below 25th percentile' 29722-36775='Above 25th percentile to median' 36776-44924='Above median to 75th percentile' other='Above 75th percentile‘ ; run; * put income into quartiles; proc sql feedback; create table incquartiles as select a.* , put(income,incfmt.) as incquart length=35 from work.cohort as a ; quit;

Solution 5 - Results work.incquartiles

Solution 6: Code – Anonymous data a; set work.sesquart; x=1; *x=1 added to this record; data b; set work.cohort; x=1; *x=1 added to each record; data c; merge a b; by x; *25th, 50th & 75th percentiles added to each record; q1=(y>pctl_75) +(y>pctl_50) +(y>pctl_25) +(income>=0); *quartile calculated by method 1; q2=(y>=pctl_75)+(y>=pctl_50)+(y>=pctl_25)+(income>=0); *quartile calculated by method 2; q =(q1+q2)/2; * 0 signifies invalid (missing or negative) income *1.5 signifies income can be classified as either 1st or 2nd quartile; *2.5 signifies income can be classified as either 2nd or 3rd quartile; *3.5 signifies income can be classified as either 3rd or 4th quartile; drop pctl_25 pctl_50 pctl_75 q1 q2 x; *drop variables that are no longer required; /* Dataset c contains the original cohort data with quartile (q) added to each record. q= 0 signifies an invalid (missing or negative) income. q=1.5 signifies ties in the value of income. Client could be classified as either 1st or 2nd cohort. q=2.5 signifies ties in the value of income. Client could be classified as either 2nd or 3rd cohort. q=3.5 signifies ties in the value of income. Client could be classified as either 3rd or 4th cohort. */ RUN;

Solution 6 - Results work.c NOTE: Variable y is uninitialized. NOTE: There were 1 observations read from the data set WORK.A. NOTE: There were 10 observations read from the data set WORK.B. NOTE: The data set WORK.C has 10 observations and 6 variables. NOTE: DATA statement used (Total process time): real time 0.07 seconds cpu time 0.01 seconds

Solution 7 – Code: Anonymous * create formats for income; proc format; value incfmt .=‘Missing’ low-29721='At or below 25th percentile' 29722-36775='Above 25th percentile to median' 36776-44924='Above median to 75th percentile' other='Above 75th percentile' ; run; * create a new data set with income categorized into quartiles; data quartiles_inc; set cohort; if _n_=1 then set sesquart; attrib incquart length=$35 label='Income quartile'; * categorize income based on observation; incquart=put(income,incfmt.); run;

Solution 7 - Results work.quartiles_inc

Summary of Techniques

SHRUG, Feb 2013: Networking exercise