470 likes | 805 Views
This is the Modern World: Simple, Overlooked SAS® Enhancements. Bruce Gilsen Federal Reserve Board June 2009. Introduction. Smaller, less dramatic enhancements fall thru cracks at Federal Reserve (SAS consultant for 24 years) SAS-L Conferences Users still use more cumbersome methods
E N D
This is the Modern World: Simple, Overlooked SAS® Enhancements Bruce Gilsen Federal Reserve Board June 2009
Introduction • Smaller, less dramatic enhancements fall thru cracks at • Federal Reserve (SAS consultant for 24 years) • SAS-L • Conferences • Users still use more cumbersome methods • Enhancements from V92, V9, V8, V7, even V6! • Impetus: yymmddn8. in earlier paper
Underutilized enhancements • Write date values in form yyyymmdd • Increment date values w/INTNX function • Transport files: PROC CPORT/CIMPORT vs PROC COPY w/EXPORT engine • Count # times character or substring occurs in a character string • Concatenate character strings • Sort by numeric part of character values • Retrieve DB2 data on z/OS mainframes
1. Write date values in form yyyymmdd • Lots of informats, formats to read/write date values • 9.1.3: 22 date informats, 46 date formats • Most common input form: yyyymmdd • YYMMDD8. informat reads dates in form yyyymmdd • Most common output form (overwhelmingly): yyyymmdd • YYMMDD8. format writes dates as yy-mm-dd • YYMMDD10. format writes dates as yyyy-mm-dd
Writing dates as yyyymmdd, pre-V8 • Some date output • data one; • date1 = "18oct2008"d; * 10/18/2008 date value; • put date1 yymmdd8.; * writes 08-10-18 ; • put date1 yymmdd10.; * writes 2008-10-18 ; • Before Version 8 • PUT: character string of form yyyy-mm-dd • COMPRESS: remove dashes • datetemp = compress(put(date1,yymmdd10.),'-'); • put datetemp $8. ; * writes 20081018 ;
Writing dates as yyyymmdd, V8 • New format, YYMMDDxw • x: optional separator • w: number of bytes • x, one of: B blank N none C colon P period D dash S slash • date1 = "18oct2008"d; * 10/18/2008 date value; • put date1 yymmddn8.; * writes 20081018 ;
YYMMDDxw format, examples • data _null_ ; • date1 = "18oct2008"d; * 10/18/08 • date value ; • put date1 yymmddb8.; * 08 10 18 ; • put date1 yymmddc8.; * 08:10:18 ; • put date1 yymmddd10.; * 2008-10-18 ; • put date1 yymmddn8.; * 20081018 ; • put date1 yymmddp8.; * 08.10.18 ; • put date1 yymmdds10.; * 2008/10/18 ; • put date1 yymmddn6.; * 081018 ;
2. Increment date values with INTNX • INTNX returns SAS ® date value incremented by number of intervals (days, months, quarters, years,…) • SAS date value: integer, number of days before or after January 1, 1960 • December 31, 1959 -1 • January 1, 1960 0 • January 2, 1960 1 • ... • October 18, 2008 16,727
INTNX Syntax (informal, incomplete) INTNX(interval, start-from, increment <,alignment>) • interval: unit (years, months, days,...) start-from is incremented by • start-from: SAS date value to increment • increment: # of intervals start-from is incremented by • alignment: where start-from aligned after incrementing. Values: BEGINNING, MIDDLE, END, SAMEDAY (v9). Default: BEGINNING.
Before Version 9: 2 INTNX issues • Alignment BEGINNING (start of period) • Leap year handling
INTNX issue 1: alignment to start of period • Increment by 2 days, months, years • Unexpected result for months, years • data one; • date1 = ‘18oct2008'd; • dateplus2day = intnx(‘day',date1,2); • dateplus2month = intnx(‘month',date1,2); • dateplus2year = intnx(‘year',date1,2); • run; • Variable Description Expect Actual • dateplus2day 10/18/08+2 days 10/20/08 10/20/08 • dateplus2month 10/18/08+2 months 12/18/08 12/1/08 • dateplus2year 10/18/08+2 years 10/18/10 1/1/10
dateplus2day = intnx(‘day',date1,2); • date1 = ‘18oct2008'd; • = 17,823, date value for 10/18/2008 • Increment by 2 days • First, increment by 2 days • Then, align to start of 10/20/2008 (beginning of interval, DAY), no effect • DATEPLUS2DAY = 17,825, date value for 10/20/2008 as expected
dateplus2month = intnx(‘month',date1,2); • date1 = ‘18oct2008'd; • = 17,823, date value for 10/18/2008 • Increment by 2 months • First, increment by 2 months • Then, align to start of 12/2008 (beginning of interval, MONTH) • DATEPLUS2MONTH = 17,867, date value for 12/1/2008
dateplus2year = intnx(‘year',date1,2); • date1 = ‘18oct2008'd; • = 17,823, date value for 10/18/2008 • Increment by 2 years • First, increment by 2 years • Then, align to start of 2010 (beginning of interval, YEAR) • DATEPLUS2MONTH = 18,263, date value for 1/1/2010
INTNX issue 2: leap years • Example of shifting by years • offset = 2; * number of years to increment; • if day(date1) = 29 and month(date1) = 2 and mod(offset,4) ne 0 and mod(offset, 400) ne 0 • then date1 = mdy(2, 28, year(date1)+offset); • else date1 = mdy(month(date1), day(date1), • year(date1)+offset) ; • Need more generalized code for all frequencies
Solution: SAMEDAY • New alignment value in V9 • Preserves date value’s alignment in interval after it’s incremented • No effect when interval = DAY • Use for all intervals except DAY
Solution: SAMEDAY • dateplus2day = intnx(‘day’, date1, 2, “sameday”) • 2 days after 10/18/2008 • Result: 17,825, SAS date value for 10/20/2008 • dateplus2month = intnx(‘month’, date1, 2, “sameday”) • 2 months after 10/18/2008 • Result: 17,884, SAS date value for 12/18/2008 • dateplus2year = intnx(‘year’, date1, 2, “sameday”) • 2 years after 10/18/2008 • Result: 18,553, SAS date value for 10/18/2010
Some interesting dates • date2 = intnx(‘year’, ’29feb2000’d, 1, “sameday”) • 1 year after 2/29/2000 • Result: 15,034, SAS date value for 2/28/2001 • date3 = intnx(‘year’, ’29feb2000’d, 4, “sameday”) • 4 years after 2/29/2000 • Result: 16,130, SAS date value for 2/29/2004 • 2000, 2004 = leap years, 2001 not leap year
Some interesting dates • date4 = intnx(‘month’, ’31mar2003’d, -1, “sameday”) • 1 month before 3/31/2003 • Result: 15,764, SAS date value for 2/28/2003 • date5 = intnx(‘month’, ’31mar2004’d, -1, “sameday”) • 1 month before 3/31/2004 • Result: 16,130, SAS date value for 2/29/2004 • 2000, 2004 = leap years, 2003 not leap year
SAMEDAY until 9.2: single, non-shifted date intervals only • Following intervals might return wrong answer, no error or warning • multiple (e.g., month2 = two-month interval) • shifted (e.g., month.4 = month interval starting on April 1) • time • datetime • Interval = DAY, WEEK, WEEKDAY, TENDAY, SEMIMONTH, MONTH, QTR, SEMIYEAR, YEAR okay • Fixed in 9.2 • http://support.sas.com/techsup/unotes/SN/016/016184.html
3. Transport files: PROC CPORT/CIMPORT vs PROC COPY w/EXPORT engine • Transport file • File organized in machine-independent format • SAS can read on any operating system • Use to copy SAS data sets from one operating system to another • Starting w/ Version 6, two types of transport files • XPORT-style transport files • CPORT-style transport files
XPORT-style transport files: 3 steps • 1. On original host, create transport file with data set(s) • libname inn 'directory_with_data_set(s)' ; • libname trans xport 'transport_file_to_create'; • proc copy in=inn out=trans ; run ; • 2. Copy transport file to destination host with ftp command or another method • 3. On destination host, transport file ---> SAS data set(s) • libname trans xport 'transport_file' ; • libname sasdata1 ‘directory_for_data_sets‘; • proc copy in=trans out=sasdata1 ; run ;
CPORT-style transport files: 3 steps • 1. On original host, create transport file with data set(s) • libname sasdata1 'directory_with_SAS_data_set(s)'; • filename trans 'transport_file_to_create'; • proc cport library=sasdata1 file=trans ; run ; • 2. Copy transport file to destination host with ftp command or another method • 3. On destination host, transport file ---> SAS data set(s) • filename trans 'transport_file' ; • libname sasdata1 ‘directory_for_data_sets' ; • proc cimport infile=trans library=sasdata1 ; run ;
Version 7 changes and transport files • Version 7 changes • Variable name maximum length: 8 32 • Character variable maximum length: 200 32768 • Variable names display based on “first reference” • Transport files • XPORT-style created in Version 6 format • CPORT-style understands Version 7 changes • Always use CPORT-style
XPORT-style transport files, Version 7 and beyond • options validvarname=v6; • Documented in V8, undocumented but works in V9 • Empty data set if any character variable length more than 200, error in log • Variable names longer than 8 characters truncated, unique • Variable names converted to upper case • Otherwise • Empty data set if any character variable length more than 200 or variable name longer than 8 characters, error in log • Variable names converted to upper case
4. Count number of times a character or substring occurs in a character string • Typical objective, count number of times • "ab" occurs in "ab1aba344444abb555ab4“ (4 times) • "a” or “b" occur in "ab1aba344444abb555ab4“ (10 times) • Pre-Version 9: iterative methods (loops) • Version 9 • COUNT: # times substring occurs in a character string • COUNTC: # times 1 or more characters occur in a character string
Count occurrences of substring "ab" • in "ab1aba344444abb555ab4" • data one; • char1 = "ab1aba344444abb555ab4"; * character variable to check; • * Before Version 9 ; • num_ab = 0; * counter for number • of occurrences ; • loc=index(char1,"ab"); * find 1st occurrence ; • do while (loc ne 0); • num_ab+1; * count occurrences of "ab" ; • char1 = substr(char1,loc+2); * resume search after "ab" ; • loc = index(char1, "ab"); * find next occurrence ; • end; • * Or, a single statement in Version 9 ; • num_ab = count(char1,"ab"); * Version 9 ; • run;
Count occurrences of characters "a" • or "b" in "ab1aba344444abb555ab4" • data one; • char1 = "ab1aba344444abb555ab4"; * character variable to check; • * Before Version 9 ; • num_a_or_b = 0; * counter for number • of occurrences ; • loc=indexc(char1,"ab"); * find 1st occurrence ; • do while (loc ne 0); • num_a_or_b+1; * count occurrences • of "a” or “b“ ; • char1 = substr(char1,loc+1); * resume search • after "a” or “b" ; • loc = indexc(char1, "ab"); * find next occurrence ; • end; • * Or, a single statement in Version 9 ; • num_a_or_b = countc(char1,"ab"); * Version 9 ; • run;
Countw: Version 9.2 • Counts number of words in a character string • Can also replace iterative programming
5. Concatenate character strings • Pre-Version 9 • || to concatenate • TRIM removes trailing blanks • TRIM(LEFT) removes leading and trailing blanks • Version 9, new functions concatenate character strings • CAT does not remove leading or trailing blanks • CATT removes trailing blanks • CATS removes leading and trailing blanks • CATX removes leading and trailing blanks, inserts delimiter (separator) between each argument • Note difference in length of result
data one; • length char1 char2 char3 $8.; • char1="xxxxxxxx"; • char2=" yyy"; * 1 leading blank ; • char3="zzzz"; • * Leading and trailing blanks included; • cat1 = char1 || char2 || char3; • cat1 = cat(char1, char2, char3); * V9 ; • * Remove trailing blanks ; • cat2 = trim(char1) || trim(char2) || trim(char3); • cat2 = catt(char1, char2, char3); * V9 ; • cat1=xxxxxxxx yyy zzzz • cat2=xxxxxxxx yyyzzzz
data one; • length char1 char2 char3 $8.; • char1="xxxxxxxx"; • char2=" yyy"; * 1 leading blank ; • char3="zzzz"; • * Remove leading and trailing blanks ; • cat3 = trim(left(char1)) || • trim(left(char2)) || • trim(left(char3)); • cat3 = cats(char1, char2, char3); • cat3=xxxxxxxxyyyzzzz
Important difference: length of result • data one; • length char1 char2 char3 $8.; • char1="xxxxxxxx"; • char2=" yyy"; * 1 leading blank ; • char3="zzzz"; • cat1 = char1 || char2 || char3; • cat1 = cat(char1, char2, char3); • etc. • If length of result (e.g., CAT1) not defined • ||: sum of length of contributing variables: 8+8+8=24 • CAT, CATT, CATS, CATX: 200
data one; • length char1 char2 char3 $8.; • char1="xxxxxxxx"; • char2=" yyy"; * 1 leading blank ; • char3="zzzz"; • aa = '123'; • * Delimiter: value of AA, • can vary by observation ; • char_catx1 = catx(aa, char1, char2); • * Delimiter: comma ; • char_catx2 = catx(',', char1, char2, char3); • * Delimiter: '123' ; • char_catx3 = catx('123', char1, char2, char3); • * Delimiter: single blank ; • char_catx4 = catx(' ', char1, char2, char3); • run; • char_catx1=xxxxxxxx123yyy • char_catx2=xxxxxxxx,yyy,zzzz • char_catx3=xxxxxxxx123yyy123zzzz • char_catx4=xxxxxxxx yyy zzzz
6. Sort by the numeric part of character values • Variable BANK: • bank2 bank100 bank1 bank10 • Conventional PROC SORT • left to right character by character comparison • "1" in "bank100" less than "2" in "bank2“ • Wanted: sort integer values w/in text by numeric value • Conventional sort order Desired sort order • bank1 bank1 • bank10 bank2 • bank100 bank10 • bank2 bank100
Before Version 9.2: one method • Make new data set with variable w/numeric part of BANK • Sort by new variable • This example assumes numeric part of value • always starts at same location (5th character) • at most three digits • More general case slightly more complicated • data two; • set one; • banktemp = input(substr(bank,5),3.); run; • proc sort data=two out=three; • by banktemp; run;
Version 9.2: linguistic collation • SORTSEQ=LINGUISTIC + NUMERIC_COLLATION sorts integer values w/in text by numeric part of value • Don’t create extra data set • proc sort data=one out=two • sortseq=linguistic (numeric_collation=on); • by bank; run; • Sort order for BANK: • bank1 • bank2 • bank10 • bank100
Numeric collation not just at end of character values • ADDRESS values • East 14th Street • East 2nd Street • Conventional sort: “1” less than “2” • Sortseq=linguistic (numeric_collation=on): 2 less than 14 • Conventional sort order Linguistic collation order • East 14th Street East 2nd Street • East 2nd Street East 14th Street
7. Retrieve DB2 data on z/OS mainframes • Old methods • PROC DB2EXT • PROC ACCESS • Modern methods • SAS/ACCESS LIBNAME statement • Pass-Through Facility
PROC DB2EXT • Since Version 6 (1990), undocumented but works • In Version 9, message • NOTE: Beginning with SAS version 10, DB2EXT will no longer be available. • “Grandfathered” to Version 6 • Long DB2 column names truncated to 8 characters • Algorithm ensures unique variable names
PROC ACCESS • Introduced in Version 6 (1990) • Use access and view descriptors • Documented and supported, not recommended • PROC CV2VIEW: convert access and view descriptors to SQL views • “Grandfathered” to Version 6 • Long DB2 column names truncated to 8 characters • Algorithm ensures unique variable names
SAS/ACCESS LIBNAME statement • Introduced in Version 7 • Assign libref directly to DB2 database • Easiest method • Long DB2 column names processed directly
Pass-through facility • Introduced in Version 6 • Retrieve DB2 data by sending DBMS-specific SQL statements to DB2 • DBMS = Database Management System • Long DB2 column names • Processed directly unless options validvarname=v6; • options validvarname=v6; • Long name truncated to 8 characters, with unique names as per PROC DB2EXT, PROC ACCESS • Documented in V8, undocumented but works in V9
Retrieving data: 4 methods • options db2ssid=dsn; /* 1. PROC DB2EXT */ • run; • proc db2ext out=new1; • select cntry_nm • from nicua.cuv_country_nm; • run; • proc access dbms=db2; /* 2. PROC ACCESS */ • create work.new1.access; • table = nicua.cuv_country_nm; • ssid = dsn; • create work.new1.view; • select cntry_nm; • run;
Retrieving data: 4 methods • /* 3. SAS/ACCESS to DB2 LIBNAME statement */ • libname in1 db2 ssid=dsn authid=nicua; • data new1; • set in1.cuv_country_nm (keep = cntry_nm); • run; • proc sql; /* 4. Pass_Through Facility */ • connect to db2 (ssid=dsn); • create table new1 • as select * from connection to db2 • (select cntry_nm • from nicua.cuv_country_nm); • disconnect from db2; • quit;
Notes • SAS/ACCESS LIBNAME statement vs Pass-Through Facility • Both recommended • SAS/ACCESS LIBNAME statement easiest • Pass-Through Facility can be more efficient, especially for complex queries • Compared in SAS/ACCESS manual • If convert from PROC DB2EXT or PROC ACCESS • DB2 column names could be longer • In subsequent code: RENAME or change code
For more information, please contact • Bruce Gilsen • Federal Reserve Board, mail stop 157 • Washington, DC 20551 • phone: 202-452-2494 • e-mail: bruce.gilsen@frb.gov • SAS is a registered trademark or trademark of SAS • Institute Inc. in the USA and other countries. ® • indicates USA registration. • Other brand and product names are registered • trademarks or trademarks of their respective • companies.