190 likes | 308 Views
SHRUG, Feb 2013: Networking exercise. Many Ways to Solve a SAS Problem. Data sets. work.cohort. work.sesquart. What do you consider?. How easy was the code to read and understand? How short was the code? In the code samples that follow, are there new techniques that you’ll use?
E N D
SHRUG, Feb 2013: Networking exercise Many Ways to Solve a SAS Problem
Data sets work.cohort work.sesquart
What do you consider? • How easy was the code to read and understand? • How short was the code? • In the code samples that follow, are there new techniques that you’ll use? • Is there new syntax you learned? • Does the formatting of the code help legibility? • Do you see code that could create more user-friendly output?
Solution 1 Code & Results – David Moses, Ron Dewar proc rank data=COHORT out=COHORT_QUARTILE groups=4; var INCOME; ranks INCOME_QUARTILE; run; work.cohort_quartile
Solution 2: Code – Art Tabachneck data _for_format; set SESQUART; fmtname="income"; type="N"; do label=1 to 4; if label eq 1 then do; start=0; end=Pctl_25-.0000000001; end; else if label eq 2 then do; start=Pctl_25; end=Pctl_50-.0000000001; end; else if label eq 3 then do; start=Pctl_50; end=Pctl_75-.0000000001; end; else do; start=Pctl_75; end=99999999999; end; output; end; run; proc format cntlin = _for_format; run ; data want; set cohort; quartile=put(income,income.); run;
Solution 2 - Results work._for_format work.want
Solution 3: Code – Anonymous data inc_quart; * this appends the income cutpoints to the cohort dataset; if _n_ = 1 then do; set SESQUART (keep = Pctl_25 Pctl_50 Pctl_75); end; else; set COHORT ; * assign cutpoints to an array, so income can be compard in a loop; array q(*) Pctl_25 Pctl_50 Pctl_75 ; income_quartile = .; * who knows? income may be missing; if income ^= . then do; quart = 1; do i = 1 to 3; if income > q(i) then income_quartile = i+1; end; end; drop i Pctl_25 Pctl_50 Pctl_75 ; run;
Solution 3 - Results work.inc_quart
Solution 4: Code – Peter Ye proc sqlnoprint; select Pctl_25,Pctl_50,Pctl_75 into:q25, :q50, :q75 from sesquart; quit; data cohort; set cohort; select; when(missing(income)) quantile = .; when(income < &q25) quantile = 25; when(income < &q50) quantile = 50; when(income < &q75) quantile = 75; otherwise quantile = 100; end; run;
Solution 4 - Results work.cohort
Solution 5 Code: Anonymous (x2) * create formats for income; proc format; value incfmt .=‘Missing’ low-29721='At or below 25th percentile' 29722-36775='Above 25th percentile to median' 36776-44924='Above median to 75th percentile' other='Above 75th percentile‘ ; run; * put income into quartiles; proc sql feedback; create table incquartiles as select a.* , put(income,incfmt.) as incquart length=35 from work.cohort as a ; quit;
Solution 5 - Results work.incquartiles
Solution 6: Code – Anonymous data a; set work.sesquart; x=1; *x=1 added to this record; data b; set work.cohort; x=1; *x=1 added to each record; data c; merge a b; by x; *25th, 50th & 75th percentiles added to each record; q1=(y>pctl_75) +(y>pctl_50) +(y>pctl_25) +(income>=0); *quartile calculated by method 1; q2=(y>=pctl_75)+(y>=pctl_50)+(y>=pctl_25)+(income>=0); *quartile calculated by method 2; q =(q1+q2)/2; * 0 signifies invalid (missing or negative) income *1.5 signifies income can be classified as either 1st or 2nd quartile; *2.5 signifies income can be classified as either 2nd or 3rd quartile; *3.5 signifies income can be classified as either 3rd or 4th quartile; drop pctl_25 pctl_50 pctl_75 q1 q2 x; *drop variables that are no longer required; /* Dataset c contains the original cohort data with quartile (q) added to each record. q= 0 signifies an invalid (missing or negative) income. q=1.5 signifies ties in the value of income. Client could be classified as either 1st or 2nd cohort. q=2.5 signifies ties in the value of income. Client could be classified as either 2nd or 3rd cohort. q=3.5 signifies ties in the value of income. Client could be classified as either 3rd or 4th cohort. */ RUN;
Solution 6 - Results work.c NOTE: Variable y is uninitialized. NOTE: There were 1 observations read from the data set WORK.A. NOTE: There were 10 observations read from the data set WORK.B. NOTE: The data set WORK.C has 10 observations and 6 variables. NOTE: DATA statement used (Total process time): real time 0.07 seconds cpu time 0.01 seconds
Solution 7 – Code: Anonymous * create formats for income; proc format; value incfmt .=‘Missing’ low-29721='At or below 25th percentile' 29722-36775='Above 25th percentile to median' 36776-44924='Above median to 75th percentile' other='Above 75th percentile' ; run; * create a new data set with income categorized into quartiles; data quartiles_inc; set cohort; if _n_=1 then set sesquart; attrib incquart length=$35 label='Income quartile'; * categorize income based on observation; incquart=put(income,incfmt.); run;
Solution 7 - Results work.quartiles_inc