790 likes | 1.8k Views
Introduction to the SAS ® System’s PROC FORMAT . Ben Cochran The Bedford Group. A SAS Institute Alliance Partner bedfordgroup@nc.rr.com www.bedford-group.com. Introduction . A format is a set of instructions to the SAS system about how to WRITE or DISPLAY a stored value.
E N D
Introduction to the SAS® System’s PROC FORMAT Ben Cochran The Bedford Group A SAS Institute Alliance Partner bedfordgroup@nc.rr.com www.bedford-group.com
Introduction A format is a set of instructions to the SAS system about how to WRITE or DISPLAY a stored value. For example, the number 9876 can be written several ways: There are dozens of default formats that ship with the SAS system. Or, you can create your own with PROC FORMAT. more 2
Introduction Use PROC FORMAT to: • create tables that store coded values and the definitions of the codes • reference these user-created formats when a table lookup operation is . needed. PROC FORMAT can be used to create: • value formats descriptive labels for numeric or character values • picture formats numeric templates such as 999-99-9999 for SSNs • informats directions to SAS on how to READ values User defined FORMATs and INFORMATS: • are stored in SAS catalogs • can be temporary or permanent more 3
Introduction • creates user - defined formats • can create user - defined informats • does not generate output PROC FORMAT: Typical Syntax of the FORMAT Procedure : proc format options; value name options range1 = ‘label1’ range2=‘label2’ ; run ; Note: Formats can be used on a PUT or FORMAT statement, or with the PUT function. Examples 4
Value Statement • single numbers - valueanswer1 =‘ Yes ’ 2=‘ No ’; • range of numbers - value gfmt low – 10 = ‘ Group 1 ’ . 11 – 20 =‘ Group 2 ’. 21 – high = ‘ Group 3 ’; • several values - value$state‘APEX’,’CARY’,’RALEIGH’=‘NC’.‘ATLANTA’,’SAVANNAH’= ‘GA’; • other -value sexfmt 1 = ‘ FEMALE ’ . 2 = ‘ MALE ’. other = ‘ ?‘ ; VALUE statement examples : example 5
The VALUE Statement Create a format for monthly values : print step 6
The VALUE Statement Generate a report to show the unformatted values of MONTH. Notice the unformatted values of MONTH. formatted 7
The VALUE Statement Recall the program and format the values of MONTH. Notice the formatted values of MONTH. 8
Examine SASHELP.PM proc freq data=sasuser.pm; run; Partial PROC FREQ output… 9
Examining SASHELP.PM Partial PROC FREQ output… Note: There is also a variable REGION that has 2 values: NORTH AMERICA, and OTHER. Not all variables are appropriate to be processed with the FREQ procedure. 10
PROC FREQ proc freq data = sasuser.pm ; tables region * code / nofreq; run ; Create a TWO WAY report from CODE and REGION. Notice the values of CODE and how they are formatted in this report. 11
PROC FREQ Task: Use PROC FORMAT to help reformat the report so that the values of CODE ‘ fit ’ into the report. This is an example of formatting character values. proc format ; value $cdfmt ‘SECOND DAY’= ‘2nd DAY ’ ‘THIRD DAY’ = ‘3rd DAY ’ ‘NEXT WEEK’ = ‘Next Week’ ; proc freq data=sasuser.pm; format code $cdfmt. ; tables region * code / nofreq ; run ; more 12
PROC FREQ Formatting Numeric Values Task: Use PROC FORMAT to format the values of a numeric variable. proc format; value incfmt LOW - 1000 = ‘Small ’ 1001 - 5000 = ‘Medium’ 5001 - HIGH = ‘Large ’ ; proc freq data=sasuser.pm; tables region * income / norow nocol nopercent ; format incomeincfmt. ; run ; 13
FUZZ Factor Demonstrate the use of the FUZZ option on the VALUE statement. 14
The PICTURE Statement You can also create formats with the PICTURE statement. The format (picture) names: • are valid SAS names (begin with a letter or underscore, 8 characters or . less ) • are unique (cannot be the same as a SAS format ) . Pictures: • are valid for numeric variables only • are a sequence of characters in quotes that specify how a number is to . be formatted – like a template • are created with 3 types of characters • 0 : defines positions for digits (leading zeros not printed) • 1 – 9 : defines positions for digits includes leading zeros • non – numeric : message characters (printed after numeric digits) Example: picture combin low – high = ’99-99-99’ ; more 15
The PICTURE Statement Demonstrate the PICTURE statement. more Why isn’t there a ‘$’ for the negative values of MONEY ? 16
The PICTURE Statement Enhance the program to add a PREFIX to negative values. more 17
The INVALUE Statement Create an INFORMAT with the INVALUE statement: more 18
Creating Permanent Formats You can store a format in a permanent catalog so you won’t have to re- create a format every time a program is run. The LIBRARY= option in the PROC FORMAT statement specifies the storage location. The typical form of a PROC FORMAT statement is: proc format library =libref < .catalog > ; If the LIBRARY = option: • is missing, formats are stored in the WORK.FORMATS catalog • specifies only a libref, formats are stored in libref.FORMATS catalog • specifies libref.catalog, formats are stored in that catalog. Write access to the libref data library is required to create formats. Without the LIBRARY = option, the formats are stored temporarily. more 19
Creating Permanent Formats The typical form of a PROC FORMAT step is: proc format library = libref < .catalog > ; value$char- format‘value1’ = ‘formatted - value – 1’ ‘value2’ = ‘formatted – value –2’ ‘value3’ = ‘formatted – value –3’; value num-format value = ‘formatted – value –1’ value = ‘formatted – value– 2’ value= ‘formatted – value– 3’; run; Character formats: • have names that are valid SAS names, except they begin with a ‘$’ and can . NOT end in a number • have quoted values • can only be associated with a character variable Numericformats: • have names that are valid SAS names but can NOT end in a number • can only be associated with a numeric variable more 20
Creating Permanent Formats Task: Create both a Character and a Numeric format and store them permanently in the SAS_3 library: Notice the use of the keywords: other, low, and high. • other - a catch-all category • low - the lowest value of the variable with which this format will be associated • high - the highest value of the variable with which this format will be associated more 22
Creating Permanent Formats Looking at the log verifies the storage location as SAS_3.FORMATS. These two formats ( $reg_fmt and amt_fmt) are stored in the FORMATS catalog of the SAS_3 library. The formats cannot be edited, so save the source code that is used to create the formats. more 23
Using Permanent Formats Formats are referenced in: • FORMAT statements • PUT statements • PUT functions When a user defined format is referenced, SAS: • loads the format into memory from the catalog entry • performs a binary search on values in the table to execute a lookup • returns a single result for each lookup operation To help SAS find your user defined format quickly, use the FMTSEARCH= option. The typical form of the FMTSEARCH= option is: options fmtsearch=(item-1, item-2, item-3… item-n); item is either libref or libref.catalog example 24
Using Permanent Formats Task: Use the FMTSEARCH option to tell SAS where to search for the user defined formats that are to be used in the PROC FORMAT step. Notice: • the OPTIONS statement • the FORMAT statement • the options on the TABLE statement What kind of variable is STORE? How many values does it have? What kind of variable is AMOUNT? How many values does it have? output 25
Using Permanent Formats PROC FREQ Output... PUT 26
Using Permanent Formats Using the PUT Function You can use the PUT function to return the formatted value of a variable for use in an expression of assignment statement. The typical form of the PUT function: . . . put (argument, format) . . . The PUT funtion: • always returns a character string • writes the value defined ( argument ) using the format specified • aligns the result depending on the type ( $ or num.) of format • requires that argument and format agree in type • can be used in a WHERE statement or subsetting IF to produce a subset . based on the formatted value. output 27
Using Permanent Formats Using the PUT Function Illustrate how the PUT function works in an assignment statement. store region $ 4. $ 13. 850 Western region = put ( store, $regfmt.); The length of the variable REGION is determined by the default length of the character string returned by the format $REGFMT. In many cases, expecially if a permanent format exist, the PUT function requires less coding than IF – THEN / ELSE statements. example 28
Using Permanent Formats Task: Use the PUT function to subset data, as well as creating a new variable. Note: In this example, a variable to store the values of REGION was NOT created. The logic in the WHERE statement is represented in the following two statements: region = put(store, $reg_fmt); where region= ‘Midwest’; In many cases, especially if a permanent format exist, the PUT function requires less coding than IF – THEN / ELSE statements. output 29
Using Permanent Formats Task: Print the first 22 observations of the MIDWEST data set. example 30
Creating Formats From SAS Data Sets You can create a format from a SAS data set. This is referred to as a CONTROL DATA SET, and it is read in to a format using the CNTLIN = option in PROC FORMAT. The typical form of the CNTLIN option is: proc format library = libref.catalog CNTLIN=SAS data set; run; The CNTLIN data set: • must contain the variables: FMTNAME, START, and LABEL • must contain the variable TYPE for character formats, unless the value for . FMTNAME begins with a ‘$’. • does not require a TYPE variable for a numeric format • can be used to create new formats, as well as edit existing ones • must be sorted by FMTNAME if multiple formats are specified. example 31
Creating Formats From SAS Data Sets The STORES data set will be used as a CNTLIN data set to create a permanent format in the SAS_3.FORMATS catalog. Notice the names of the variables. example 32
Creating Formats From SAS Data Sets Manipulate the STORES data set to make it suitable as a CNTLIN data set. Then write the PROC FORMAT step to create the permanent format. data stores; set sas_3.stores(rename=(store=start store_name=label)); fmtname = ‘strfmt’; type=‘C’; run; proc format library = sas_3.formats CNTLIN=stores ; run; Notice the use of the RENAME= option on the SET statement. Also note the name of the format. log 33
Creating Formats From SAS Data Sets An examination of the log reveals that the $STRFMT format was created and is stored permanently in the SAS_3.FORMATS catalog. log 34
Documenting Formats To obtain more information about formats, use the FMTLIB option in the PROC FORMAT statement. proc format library = sas_3 FMTLIB; run; log 35
Documenting Formats SAS Output continued…. By default, documentation on all formats is displayed. To control which format gets displayed, use a SELECT statement with the format name. edit 36
Maintaining Formats One way to maintain permanent formats is to use Control Data Sets with PROC FORMAT. FMT_ DATA Formats Catalog proc format library = sas_3.formats CNTLOUT = fmt_data; run; EDITING PROCESS Formats Catalog FMT_ DATA proc format library = sas_3.formats CNTLIN = fmt_data; run; • Use the CNTLOUT option to create an output dataset that has info on all formats, • Use PROC FSEDIT, or VIEWTABLE to add, delete or change information • Use the CNTLIN option to re-create the formats from the modified data set. example 37
Maintaining Formats The Atlanta Eastside store has decided to move to Miami. Although their location has changed, their store id number remains the same: 200. Task: Change the $STRFMT format so that store 200 has a location of Miami. proc format library = sas_3.formats CNTLOUT = fmt_data; select $strfmt; run; proc print data = fmt_data(obs=7); run; example 38
Maintaining Formats The first 7 observations and first 16 variables of the FMT_DATA data set. The objective is to change the value of ‘Atlanta Eastside’ to ‘Miami’ in observation 4. Partial PROC PRINT output PROC FSVIEW will be used to make these changes, but first, examine PROC CONTENTS output on the next page…. more 39
Maintaining Formats PROC CONTENTS output… more 40
Maintaining Formats To edit the data set, submit the following statements: proc fsview data = fmt_data; run; Next, issue the ‘ Modify Member ‘ command to open the window in Edit mode. The ‘E’ indicates we are in Edit mode. Next, move the cursor over to the LABEL column of the 4th row and type ‘ Miami ‘ over ‘ Atlanta Eastside ‘. Then, ‘end’ out of the window to save the changes. more 41
Maintaining Formats Next, submit the following statements: procformatlibrary = sas_3 cntlin=fmt_data; select $strfmt; run; The following report appears in the Output window… notice line 4… more 42
Advantages of Formats Advantages of using formats for table lookups: • formats can be used in PROC steps, frequently eliminating the need to . preprocess data in a DATA step. • values to be looked up can be discrete values, one of a list of values, or a range . of values. • DATA step lookups using the PUT function with a format require less CPU time . than a comparable MERGE step. • by default, a binary search is used to locate the value to be looked up. Disadvantages of using formats for table lookups: • each format can return only one value as the result of a lookup operation. • the values of only one variable are used to perform the lookup operation. • since the entire format must be loaded into memory, the size of the format is . limited by amount of available memory. • only one format can be applied to each variable using the FORMAT statement. 5.4 43
V8 Enhancements The new enhancement to PROC FORMAT is the MULTILABLE option on the VALUE statement. This option allows multiple values for a given range or overlapping ranges. What this means, is that a variable’s value can be in two or more different ranges. The SASUSER.HOUSES data set is used to illustrate the new features of PROC FORMAT. Notice the number of observations. Notice the values of PRICE. 44
V8 Enhancements Notice the placement of the MULTILABEL option on the VALUE statement in PROC FORMAT. Notice the use of the MLF option on the CLASS statement in PROC TABULATE. output 45
V8 Enhancements How many observations in the SASUSER.HOUSES dataset? Add up the columns and compare that amount to the FREQUENCY cell. What is going on? PRINT 46
V8 Enhancements Task: Illustrate how the multilabel optionworks. Create a new data set named houses with a new variable called PRICE2.Format the PRICE variable and leave PRICE2 unformatted. Use PROC PRINT to view the results. MEANS 47
Version 8 Enhancements Date formats can now have users specified separators. The following are used to specify the separator : • B separates with a blank • C “ “ “ colon • D “ “ “ dash • P “ ““ period • S “ “ “ slash (default) • N indicates NO separator. • ex.mmddyyd10. 12-31-1999 more 48
The End Any Questions?????????? more 49