260 likes | 284 Views
Learn how to create SAS datasets from procedure outputs, generate reports using data step and PROC TABULATE, and use PROC RANK to divide observations into categories based on variable values.
E N D
Lesson 8 - Topics • Creating SAS datasets from procedures • Creating reports using data-step and PROC TABULATE • Using PROC RANK • Programs 14-15 in course notes • LSB 4:11;13-17;5:3
Making SAS Datasets From Procedures • Output from SAS PROCs can be put into SAS datasets: • To do further processing of the information from the output • To reformat output to make a report
Ways to Put Output into SAS Datasets • Using OUTPUT statement available in many procedures • Using ODS OUTPUT statement – any output table can be put into a SAS dataset
Report We Want to Generate Quartiles of Weight Change by Clinical Center Clinic N P25 P50 P75 A 15 -19.0 -12.0 -9.0 B 25 -11.0 -8.0 -2.8 C 35 -20.8 -14.0 -6.5 D 17 -12.8 -6.5 -1.5 Total 92 -17.7 -9.9 -5.2
Program 14 LIBNAME class ‘C:\SAS_Files'; * Will use SAS dataset version of TOMHS data; DATA wt; SET class.tomhs (KEEP=ptid clinic wtbl wt12 ); wtchg = wt12 - wtbl; RUN;
* Create report by clinic using OUTPUT; PROCMEANSDATA = wt NOPRINT; CLASS clinic; VAR wtchg ; OUTPUTOUT=summary N = n Q1 = p25 MEDIAN = p50 Q3 = P75 ; Dataset summary will have one observation for each clinic and the total. Name of new dataset Statistic name = variable name
PROCPRINTDATA = summary; RUN; Obs clinic _TYPE_ _FREQ_ n p25 p50 p75 1 0 100 92 -17.65 -9.9 -5.15 2 A 1 18 15 -19.00 -12.0 -9.00 3 B 1 29 25 -11.00 -8.0 -2.80 4 C 1 36 35 -20.80 -14.0 -6.50 5 D 1 17 17 -12.80 -6.5 -1.50
* Put total row at the bottom; PROC SORT; BY DESCENDING_type_ clinic; PROC PRINT ; RUN; Obs clinic _TYPE_ _FREQ_ n p25 p50 p75 1 A 1 18 15 -19.00 -12.0 -9.00 2 B 1 29 25 -11.00 -8.0 -2.80 3 C 1 36 35 -20.80 -14.0 -6.50 4 D 1 17 17 -12.80 -6.5 -1.50 5 0 100 92 -17.65 -9.9 -5.15
* Create final report; DATA summary; LENGTH clinic $5.; SET summary; if missing(clinic) then clinic = ‘Total’; DROP _type_ _freq_; RUN; PROC PRINT NOOBS ; FORMAT p25 p50 p75 6.1; RUN; clinic n p25 p50 p75 A 15 -19.0 -12.0 -9.0 B 25 -11.0 -8.0 -2.8 C 35 -20.8 -14.0 -6.5 D 17 -12.8 -6.5 -1.5 Total 92 -17.7 -9.9 -5.2
Using ODS to Send Output to a SAS Dataset Syntax: ODSOUTPUT output-table = new-data-set; * Output quantile table to a dataset; ODSOUTPUT quantiles = qwt; PROCUNIVARIATEDATA = wt ; VAR wtbl wt12 ; RUN; ODSOUTPUT CLOSE ; PROCPRINTDATA=qwt; RUN;
Display of Output Dataset Obs Varname Quantile Estimate 1 wtbl 100% Max 279.30 2 wtbl 99% 274.15 3 wtbl 95% 246.40 4 wtbl 90% 237.40 5 wtbl 75% Q3 215.15 6 wtbl 50% Median 192.65 7 wtbl 25% Q1 165.90 8 wtbl 10% 141.50 9 wtbl 5% 137.40 10 wtbl 1% 130.25 11 wtbl 0% Min 128.50 12 wt12 100% Max 271.50 13 wt12 99% 271.50 14 wt12 95% 239.00 15 wt12 90% 227.00 16 wt12 75% Q3 202.50 17 wt12 50% Median 180.00 18 wt12 25% Q1 153.50 19 wt12 10% 133.00 20 wt12 5% 130.00 21 wt12 1% 123.00 22 wt12 0% Min 123.00 Would like to put side-by-side
DATA wtbl wt12 ; SET qwt; if varname = 'wtbl' then output wtbl; else if varname = 'wt12' then output wt12; RUN; PROC DATASETS ; MODIFY wtbl; RENAME estimate = wtbl; MODIFY wt12; RENAME estimate = wt12; RUN; DATA all; MERGE wtbl wt12; DROP varname; RUN; PROC PRINT; Separate the data into 2 datasets PROC DATASETS used for changing variable names Put 2 datasets side-by-side Note: no BY statement, OK here
Obs Quantile wtbl wt12 1 100% Max 279.30 271.50 2 99% 274.15 271.50 3 95% 246.40 239.00 4 90% 237.40 227.00 5 75% Q3 215.15 202.50 6 50% Median 192.65 180.00 7 25% Q1 165.90 153.50 8 10% 141.50 133.00 9 5% 137.40 130.00 10 1% 130.25 123.00 11 0% Min 128.50 123.00
PROC RANK • Used to divide observations into equal size categories based on values of a variable • Creates a new variable containing the categories • New variable is added to the dataset or to a new dataset • Example: Divide weight change into 5 equal categories (Quinitiles)
PROC RANK SYNTAX PROCRANKDATA = dataset OUT = outdataset GROUPS = # of categories VARvarname; RANKSnewvarname; Most of the time you can set OUT to be the same dataset specified in DATA. PROC RANK writes no output
PROGRAM 15 LIBNAME class ‘C:\SAS_Files'; DATA wtchol; SET class.tomhs (KEEP=ptid wtbl wt12 cholbl chol12); wtchg = wt12 - wtbl; cholchg = chol12 - cholbl; RUN; *This PROC will add a new variable to dataset which is the tertile of weight change. The new variable will be 0,1,or 2; PROCRANKDATA = wtchol GROUPS=3OUT = wtchol; VAR wtchg; RANKS twtchg; Name of new variable
PARTIAL LOG 8 DATA wtchol; 9 SET class.tomhsp (KEEP=ptid clinic sex wtbl wt12 cholbl chol12); 10 wtchg = wt12 - wtbl; 11 cholchg = chol12 - cholbl; 12 RUN; NOTE: There were 100 observations read from the data set CLASS.TOMHSP. NOTE: The data set WORK.WTCHOL has 100 observations and 9 variables. PROC RANK DATA = wtchol GROUPS=3 OUT = wtchol; 20 VAR wtchg; RANKS twtchg; 21 RUN; NOTE: The data set WORK.WTCHOL has 100 observations and 10 variables.
PROCFREQDATA = wtchol; TABLES twtchg; RUN; OUTPUT: Rank for Variable wtchg Cumulative Cumulative twtchg Frequency Percent Frequency Percent ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ 0 31 33.70 31 33.70 1 30 32.61 61 66.30 2 31 33.70 92 100.00 Frequency Missing = 8
PROCPRINTDATA = wtchol (obs=20); VAR ptid wtchg twtchg; TITLE'Partial Listing of Datset wtchol with new variable added'; RUN; Partial Listing of Datset wtchol with new variable added Obs PTID wtchg twtchg 1 A00083 -12.00 1 2 A00301 . . 3 A00312 -9.50 1 4 A00354 -21.00 0 5 A00400 . . 6 A00504 -9.25 1 7 A00608 . . 8 A00720 -18.50 0 9 A00762 -5.25 2 10 A00811 -6.75 1
PROCMEANSN MEAN MIN MAXMAXDEC=2; VAR cholchg wtchg; CLASS twtchg; TITLE'Mean Cholesterol Change by Tertile of Weight Change'; RUN;
Mean Cholesterol Change by Tertile of Weight Change The MEANS Procedure Rank for Variable N wtchg Obs Variable N Mean Minimum Maximum -------------------------------------------------------------------------- 0 31 cholchg 30 -13.43 -55.00 47.00 wtchg 31 -22.51 -36.50 -14.30 1 30 cholchg 30 -4.70 -37.00 26.00 wtchg 30 -10.21 -14.00 -6.80 2 31 cholchg 31 -0.74 -52.00 45.00 wtchg 31 -1.82 -6.50 13.00 -------------------------------------------------------------------------- Could graph this data in an x-y plot (3 points) Cutpoints for tertiles
TABLE GENERATION: PROC TABULATE (dbp12 sbp12)*(N MEAN*f=8.1) GROUP ALL
PROCTABULATEDATA=class.tomhs FORMAT=8.0; CLASS group; VAR sbp12 dbp12; TABLES group ALL='Total', (dbp12 sbp12)*(N MEAN*f=8.1)/RTS=20; LABEL dbp12 = 'Diastolic BP'; LABEL sbp12 = 'Systolic BP'; LABEL group = 'RX Group'; FORMAT group fgroup.; TITLE'Average Blood Pressure at 12-Months'; RUN; Same as PROC MEANS
Closer Look At TABLES Statement TABLES group ALL='Total', (dbp12 sbp12)*(N MEAN*f=8.1)/RTS=20; Statement before comma indicates row information to display Statement after comma indicates column information to display A * indicates to crosstabulate data A space indicates to concatenate data Words: For each group and the total display the N and mean of diastolic and systolic BP
(sex=' ')*(N ROWPCTN*f=10.1) CLINIC ALL
PROCTABULATEDATA=class.tomhsp FORMAT=8.; CLASS clinic sex; TABLE (clinic ALL='Total'), (sex=' ')*(N ROWPCTN*f=10.1)/RTS=15; FORMAT sex sex. clinic $clinic.; LABEL clinic = 'Clinical Center'; KEYLABEL ROWPCTN = 'Percent'; TITLE'N and Percent Men and Women Enrolled by Center'; RUN; ODS HTML FILE = ‘mytable.html’;