Lesson 6 - Topics
• Formatting Output
• Working with Dates
• Programs 7-9 in course notes
• Reading: LSB:3:8-9; 4:1,5-7; 5:1-4

Annotating SAS Output
• TITLE statements - label procedure output
• LABEL statements - label names of variables
• FORMAT statements - label values of variables
Standard Output

                     The FREQ Procedure

                                    Cumulative    Cumulative
    clinic    Frequency   Percent    Frequency      Percent
    -----------------------------------------------------------
    A               18     18.00           18        18.00
    B               29     29.00           47        47.00
    C               36     36.00           83        83.00
    D               17     17.00          100       100.00

Annotated Output

                Number of Patients by Clinic

                     The FREQ Procedure

                       Clinical Center

                                         Cumulative    Cumulative
    clinic         Frequency   Percent    Frequency      Percent
    ----------------------------------------------------------------
    Birmingham           18     18.00           18        18.00
    Chicago              29     29.00           47        47.00
    Minneapolis          36     36.00           83        83.00
    Pittsburgh           17     17.00          100       100.00

• TITLE produces the heading "Number of Patients by Clinic"
• LABEL for clinic produces "Clinical Center"
• FORMAT for clinic produces the city names in place of A-D

Standard Output

                     The FREQ Procedure

                                    Cumulative    Cumulative
    sebl_6    Frequency   Percent    Frequency      Percent
    -----------------------------------------------------------
    1               70     70.00           70        70.00
    2               23     23.00           93        93.00
    3                6      6.00           99        99.00
    4                1      1.00          100       100.00

Annotated Output

                     The FREQ Procedure

                  Patient Report Headaches

                                      Cumulative    Cumulative
    sebl_6      Frequency   Percent    Frequency      Percent
    -------------------------------------------------------------
    None              70     70.00           70        70.00
    Mild              23     23.00           93        93.00
    Moderate           6      6.00           99        99.00
    Severe             1      1.00          100       100.00

• LABEL for sebl_6 produces "Patient Report Headaches"
• FORMAT for sebl_6 produces None/Mild/Moderate/Severe in place of 1-4
TITLE Statements

    PROC FREQ DATA=tdata;
      TABLES clinic group sex educ sebl_1 sebl_6;
      TITLE  'Distribution of Selected Variables';
      TITLE2 'on the TOMHS Dataset';
    RUN;

• TITLE statements can go anywhere in the program. It is good practice to put them under the PROC they describe.
• Titles can be changed at any time.
• TITLEn 'text' is the general syntax, where n is the title line number.
Label Statements

    LABEL clinic = 'Clinical Center';
    LABEL group  = 'Drug Treatment Group';
    LABEL educ   = 'Highest Education Attained';
    LABEL sebl_1 = 'Patient Report Drowsiness';
    LABEL sebl_6 = 'Patient Report Headaches';

LABEL statements can go anywhere in the DATA step or under a procedure (but not in between!).
Format Statements

    FORMAT brthdate mmddyy10. ;
    FORMAT group groupF. ;
    FORMAT fever headache seF. ;
    FORMAT clinic $clinicF. ;

• A FORMAT statement tells SAS to display the values of the variable according to the format.
• FORMAT statements can go anywhere in the DATA step or in a procedure.
• There are built-in formats (e.g. dates) and user-defined formats.
• A format can apply to more than one variable.
• Format names end with a period (.) when they are used.
• Character formats begin with a $.
How to Make User-Defined Formats

    PROC FORMAT;
      VALUE groupF  1 = 'Beta Blocker'
                    2 = 'Calcium Channel Blocker'
                    3 = 'Diuretic'
                    4 = 'Alpha Blocker'
                    5 = 'ACE Inhibitor'
                    6 = 'Placebo';
      VALUE genderF 1 = 'Men' 2 = 'Women';
      VALUE seF     1 = 'None' 2 = 'Mild' 3 = 'Moderate' 4 = 'Severe';

The word after VALUE (groupF, genderF, seF) is the name of the format. The format name does NOT have to be the name of a variable on the dataset. It cannot end in a number.

    PROC FORMAT;
      VALUE $clinicF 'A' = 'Birmingham'
                     'B' = 'Chicago'
                     'C' = 'Minneapolis'
                     'D' = 'Pittsburgh';

Don't confuse the format with the variable(s) to be formatted! From PROC FORMAT alone, SAS does not know which variables you plan to format with a given format. You need to apply the format to the variable using a FORMAT statement.
LOG FILE

         PROC FORMAT;
    7    VALUE groupF 1 = 'Beta Blocker' 2 = 'Calcium Channel Blocker'
    8                 3 = 'Diuretic' 4 = 'Alpha Blocker'
    9                 5 = 'ACE Inhibitor' 6 = 'Placebo';
    NOTE: Format GROUPF has been output.
    10
    11   VALUE genderF 1 = 'Men' 2 = 'Women' ;
    NOTE: Format GENDERF has been output.
    12
    13   VALUE educF 1 = '8th grade or less' 2 = 'Trade school - no HS'
    14               3 = 'Some high school' 4 = 'HS graduate'
    15               5 = 'Trade school after HS' 6 = 'Some College'
    16               7 = 'Bachelor degree' 8 = 'Some post grad'
    17               9 = 'Graduate degree' ;
    NOTE: Format EDUCF has been output.
    18
    19   VALUE seF 1 = 'None' 2 = 'Mild' 3 = 'Moderate' 4 = 'Severe';
    NOTE: Format SEF has been output.
    20
    21   VALUE smokeF 1 = 'Smoker' 2 = 'Non-smoker';
    NOTE: Format SMOKEF has been output.
    22
    23   VALUE $clinicF 'A' = 'Birmingham' 'B' = 'Chicago'
    24                  'C' = 'Minneapolis' 'D' = 'Pittsburgh' ;
    NOTE: Format $CLINICF has been output.
    25
    26   run;
    * Formats defined but not applied;
    PROC FREQ;
      TABLES clinic sebl_6;
    RUN;

    ==========================================

    * Applying the formats ;
    PROC FREQ;
      TABLES clinic sebl_6;
      FORMAT clinic $clinicF. sebl_6 seF. ;
    RUN;
Program 8

    PROC FORMAT;
    ...

    DATA tdata ;
      INFILE 'C:\SAS_Files\tomhs.data' ;
      INPUT @  1 ptid    $10.
            @ 12 clinic  $1.
            @ 25 group    1.
            @ 30 sex      1.
            @ 49 educ     1.
            @ 51 eversmk  2.
            @230 alcbl    1.
            @236 sebl_1   1.
            @246 sebl_6   1. ;
      LABEL clinic  = 'Clinical Center';
      LABEL group   = 'Drug Treatment Group';
      LABEL educ    = 'Highest Education Attained';
      LABEL sebl_1  = 'Patient Report Drowsiness';
      LABEL sebl_6  = 'Patient Report Headaches';
      LABEL alcbl   = 'Alcoholic Drinks Per Week';
      LABEL eversmk = 'Ever Smoke Cigarettes';
    RUN;

    PROC FREQ DATA=tdata;
      TABLES clinic sebl_6;
      FORMAT clinic $clinicF. sebl_6 seF. ;
    RUN;
Label in Wrong Spot

    DATA tdata ;
      INFILE 'C:\SAS_Files\tomhs.data' ;
      INPUT @ 12 clinic $1. ;
    RUN;

    LABEL clinic = 'Clinical Center';

    PROC FREQ DATA=tdata;
      TABLES clinic sebl_6;
    RUN;

The LABEL statement sits between the DATA step and the PROC, so it belongs to neither. This would cause an error:

    ERROR 180-322: Statement is not valid or it is used out of proper order.
    PROC MEANS DATA=tdata N MEAN STD;
      VAR alcbl;
      CLASS eversmk;
      FORMAT eversmk smokeF. ;
      TITLE 'PROC MEANS With Variable and Value Labels';
    RUN;

                    The MEANS Procedure

      Analysis Variable : alcbl Alcoholic Drinks Per Week

    Ever Smoke       N
    Cigarettes     Obs     N          Mean       Std Dev
    ------------------------------------------------------
    Smoker          48    47     5.3829787     6.4892995
    Non-smoker      52    52     3.5384615     4.6292401
    ------------------------------------------------------
Items to Remember
• Formats need to be defined before you use them (PROC FORMAT).
• Formats are applied by using the FORMAT statement.
• LABEL and FORMAT statements in the DATA step apply to all subsequent PROCs.
• LABEL and FORMAT statements under a PROC apply only to that PROC.
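The last two points can be seen in a small sketch. The dataset name demo and its three data rows are made up for illustration; the format seF and the labels follow the definitions used earlier in the lesson.

```sas
PROC FORMAT;
  VALUE seF 1 = 'None' 2 = 'Mild' 3 = 'Moderate' 4 = 'Severe';
RUN;

DATA demo;                                    /* hypothetical dataset */
  INPUT sebl_1 sebl_6;
  LABEL  sebl_6 = 'Patient Report Headaches'; /* stored with the dataset */
  FORMAT sebl_6 seF.;                         /* stored with the dataset */
  DATALINES;
1 1
2 1
1 3
;
RUN;

/* sebl_6 is labeled and formatted from the DATA step;
   sebl_1 is labeled and formatted for this PROC only */
PROC FREQ DATA=demo;
  TABLES sebl_1 sebl_6;
  LABEL  sebl_1 = 'Patient Report Drowsiness';
  FORMAT sebl_1 seF.;
RUN;

/* sebl_1 is back to its raw name and values; sebl_6 keeps its label and format */
PROC FREQ DATA=demo;
  TABLES sebl_1 sebl_6;
RUN;
```

Comparing the two PROC FREQ listings should show sebl_1 annotated only in the first one, while sebl_6 is annotated in both.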
Creating HTML Tables for Web or Word/Excel

    ODS HTML FILE='educ.html';

    PROC FREQ DATA=tdata ;
      TABLES educ;
      FORMAT educ educF.;
      TITLE 'HTML Output From PROC FREQ';
    RUN;

    ODS HTML CLOSE;

The output will go to the HTML file. It can then be viewed on the web or inserted into Word. ODS can also create several other types of files, such as RTF and PDF.

Note: The RUN statement is needed here, and the ODS HTML CLOSE statement must come after RUN.
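For the other file types mentioned, the same open/close pattern applies. The following is a minimal sketch, assuming the tdata dataset from Program 8 exists; the file names educ.rtf and educ.pdf are made up for illustration.

```sas
/* RTF output, e.g. for opening in Word */
ODS RTF FILE='educ.rtf';
PROC FREQ DATA=tdata;
  TABLES educ;
  TITLE 'RTF Output From PROC FREQ';
RUN;
ODS RTF CLOSE;

/* PDF output */
ODS PDF FILE='educ.pdf';
PROC FREQ DATA=tdata;
  TABLES educ;
  TITLE 'PDF Output From PROC FREQ';
RUN;
ODS PDF CLOSE;
```

As with HTML, each destination is closed only after the RUN statement of the procedure whose output it should capture.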
Working With Dates: Dates Come in Many Ways
• 10/18/04
• 18/10/04
• 10/18/2004
• 18OCT2004
• 101804
• October 18, 2004

You need to know how to read in dates and then work with them.
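As a sketch of how several of these layouts can be read, each one is paired below with a built-in date informat. The dataset name datein and the variable names d1-d5 are made up for illustration, and the written-out form (October 18, 2004) is left out. Two-digit years depend on the YEARCUTOFF system option; the sketch assumes a setting under which 04 is read as 2004.

```sas
DATA datein;                    /* hypothetical dataset */
  INPUT @1  d1 mmddyy8.         /* 10/18/04   */
        @10 d2 ddmmyy8.         /* 18/10/04   */
        @19 d3 mmddyy10.        /* 10/18/2004 */
        @30 d4 date9.           /* 18OCT2004  */
        @40 d5 mmddyy6. ;       /* 101804     */
  FORMAT d1-d5 date9.;          /* display all five the same way */
  DATALINES;
10/18/04 18/10/04 10/18/2004 18OCT2004 101804
;
RUN;

PROC PRINT DATA=datein;
RUN;
```

If the informats match the layouts as intended, all five variables should hold the same date and print identically.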
What do you want to do with dates?
• Display them
• Compare two dates: find the number of days between 2 dates

    ndays = date2 - date1;

Will this work?

Problem: dates written as month/day/year do not subtract well. What if:

    date2 =   03/02/2003
    date1 =   08/02/2002
             ============
             -05/00/0001
    DATA dates;
      INFILE DATALINES;
      INPUT @1 brthdate mmddyy10.;
      DATALINES;
    03/03/1971
    02/14/1956
    01/01/1960
    ;

    PROC PRINT;
      VAR brthdate;

    PROC PRINT;
      VAR brthdate;
      FORMAT brthdate mmddyy10.;

    ------------------------------------------------------

    Obs    brthdate
      1        4079
      2       -1417
      3           0     (Jan 1, 1960)

    Obs      brthdate
      1    03/03/1971
      2    02/14/1956
      3    01/01/1960
When you read in a variable with a date informat
• SAS makes the variable numeric.
• SAS assigns the numeric value as the number of days relative to January 1, 1960.
• This makes it easy to subtract two dates to get the number of days between the dates.

    dayselapsed = date2 - date1;
    FORMAT date1 date2 mmddyy10.;

• Note: Once read in, SAS treats the variable as it does any numeric variable.
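Applied to the two dates from the earlier subtraction puzzle (08/02/2002 and 03/02/2003), the idea can be sketched as follows. The dates are built with the MDY function here rather than read from a file, which is a choice made for this illustration.

```sas
DATA _NULL_;
  date1 = MDY(8, 2, 2002);      /* 08/02/2002 as a SAS date value */
  date2 = MDY(3, 2, 2003);      /* 03/02/2003 as a SAS date value */
  dayselapsed = date2 - date1;  /* plain numeric subtraction */
  PUT date1= mmddyy10. date2= mmddyy10. dayselapsed=;
RUN;
```

The log line should report dayselapsed=212, the number of days from August 2, 2002 to March 2, 2003.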
    * Program 9 ;
    DATA age;
      INFILE 'C:\SAS_Files\tomhs.data' ;
      INPUT @14 randdate mmddyy10.
            @34 brthdate mmddyy10.
            @74 date12   mmddyy10. ;

      * Age at randomization in days, in years, and in whole years;
      agedays = randdate - brthdate ;
      ageyrs  = (randdate - brthdate)/365.25;
      ageint  = INT( (randdate - brthdate)/365.25);

      * Can also use YRDIF function;
      ageyrsX = YRDIF(brthdate,randdate,'Actual');

      agetoday = (TODAY() - brthdate)/365.25 ;
      ageendst = (MDY(02,28,1992) - brthdate)/365.25;

      * Was the 12-month visit within 30 days of one year after randomization?;
      daysv12 = date12 - randdate;
      IF ABS(daysv12 - 365) = . THEN window12 = .;
      ELSE IF ABS(daysv12 - 365) < 31 THEN window12 = 1;
      ELSE IF ABS(daysv12 - 365) >= 31 THEN window12 = 2;

      yrrand = YEAR(randdate);
    RUN;
    PROC PRINT DATA=age (OBS=10);
      VAR brthdate randdate agedays ageyrs ageyrsX ageint agetoday;
      TITLE 'Printing Dates Without a Date Format';
    RUN;

    PROC PRINT DATA=age (OBS=10);
      VAR brthdate randdate agedays ageyrs ageyrsX ageint agetoday;
      FORMAT brthdate mmddyy10. randdate mmddyy10.;
      TITLE 'Printing Dates With a Date Format';
    RUN;
Printing Dates Without a Date Format

    Obs   brthdate   randdate   agedays    ageyrs   ageyrsX   ageint   agetoday
      1      -8589      10175     18764   51.3730   51.3739       51    69.0678
      2      -6880      10239     17119   46.8693   46.8711       46    64.3888
      3     -12572      10002     22574   61.8042   61.8055       61    79.9726
      4      -9592      10175     19767   54.1191   54.1205       54    71.8138
      5     -12996      10280     23276   63.7262   63.7268       63    81.1335

The brthdate values are negative because all of these birth dates are before 1960.

Printing Dates With a Date Format

    Obs     brthdate     randdate
      1   06/26/1936   11/10/1987
      2   03/01/1941   01/13/1988
      3   07/31/1925   05/21/1987
      4   09/27/1933   11/10/1987
      5   06/02/1924   02/23/1988

Section 3.9 of LSB lists several date formats and informats.
    PROC PRINT DATA=age (OBS=20);
      VAR randdate date12 daysv12 window12;
      FORMAT randdate date12 mmddyy8.;
      TITLE 'Printing Days From Randomization to 1st Year Visit';
    RUN;

    PROC FREQ DATA=age;
      TABLES yrrand ;
      TITLE 'Frequency Distribution of Year Randomized';
    RUN;
    Obs   randdate     date12   daysv12   window12
      1   11/10/87   11/25/88       381          1
      2   01/13/88   01/09/89       362          1
      3   05/21/87          .         .          .
      4   11/10/87   11/30/88       386          1
      5   02/23/88   02/13/89       356          1
      6   11/12/87   11/02/88       356          1
      7   12/05/86   12/03/87       363          1
      8   06/12/87   06/16/88       370          1
      9   01/21/88   01/09/89       354          1
     10   04/16/87   04/04/88       354          1
     11   08/12/87   08/10/88       364          1
     12   04/16/87   05/02/88       382          1
     13   02/02/88   02/08/89       372          1
     14   11/04/86   11/30/87       391          1
     15   05/27/87   06/08/88       378          1
     16   03/29/88   07/13/89       471          2

Frequency Distribution of Year Randomized

                     The FREQ Procedure

                                    Cumulative    Cumulative
    yrrand    Frequency   Percent    Frequency      Percent
    -----------------------------------------------------------
    1986             9      9.00            9         9.00
    1987            65     65.00           74        74.00
    1988            26     26.00          100       100.00