1 / 47

This is the Modern World: Simple, Overlooked SAS® Enhancements

This is the Modern World: Simple, Overlooked SAS® Enhancements. Bruce Gilsen Federal Reserve Board June 2009. Introduction. Smaller, less dramatic enhancements fall thru cracks at Federal Reserve (SAS consultant for 24 years) SAS-L Conferences Users still use more cumbersome methods

jamal
Download Presentation

This is the Modern World: Simple, Overlooked SAS® Enhancements

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. This is the Modern World: Simple, Overlooked SAS® Enhancements Bruce Gilsen Federal Reserve Board June 2009

  2. Introduction • Smaller, less dramatic enhancements fall thru cracks at • Federal Reserve (SAS consultant for 24 years) • SAS-L • Conferences • Users still use more cumbersome methods • Enhancements from V92, V9, V8, V7, even V6! • Impetus: yymmddn8. in earlier paper

  3. Underutilized enhancements • Write date values in form yyyymmdd • Increment date values w/INTNX function • Transport files: PROC CPORT/CIMPORT vs PROC COPY w/EXPORT engine • Count # times character or substring occurs in a character string • Concatenate character strings • Sort by numeric part of character values • Retrieve DB2 data on z/OS mainframes

  4. 1. Write date values in form yyyymmdd • Lots of informats, formats to read/write date values • 9.1.3: 22 date informats, 46 date formats • Most common input form: yyyymmdd • YYMMDD8. informat reads dates in form yyyymmdd • Most common output form (overwhelmingly): yyyymmdd • YYMMDD8. format writes dates as yy-mm-dd • YYMMDD10. format writes dates as yyyy-mm-dd

  5. Writing dates as yyyymmdd, pre-V8 • Some date output • data one; • date1 = "18oct2008"d; * 10/18/2008 date value; • put date1 yymmdd8.; * writes 08-10-18 ; • put date1 yymmdd10.; * writes 2008-10-18 ; • Before Version 8 • PUT: character string of form yyyy-mm-dd • COMPRESS: remove dashes • datetemp = compress(put(date1,yymmdd10.),'-'); • put datetemp $8. ; * writes 20081018 ;

  6. Writing dates as yyyymmdd, V8 • New format, YYMMDDxw • x: optional separator • w: number of bytes • x, one of: B blank N none C colon P period D dash S slash • date1 = "18oct2008"d; * 10/18/2008 date value; • put date1 yymmddn8.; * writes 20081018 ;

  7. YYMMDDxw format, examples • data _null_ ; • date1 = "18oct2008"d; * 10/18/08 • date value ; • put date1 yymmddb8.; * 08 10 18 ; • put date1 yymmddc8.; * 08:10:18 ; • put date1 yymmddd10.; * 2008-10-18 ; • put date1 yymmddn8.; * 20081018 ; • put date1 yymmddp8.; * 08.10.18 ; • put date1 yymmdds10.; * 2008/10/18 ; • put date1 yymmddn6.; * 081018 ;

  8. 2. Increment date values with INTNX • INTNX returns SAS ® date value incremented by number of intervals (days, months, quarters, years,…) • SAS date value: integer, number of days before or after January 1, 1960 • December 31, 1959 -1 • January 1, 1960 0 • January 2, 1960 1 • ... • October 18, 2008 16,727

  9. INTNX Syntax (informal, incomplete) INTNX(interval, start-from, increment <,alignment>) • interval: unit (years, months, days,...) start-from is incremented by • start-from: SAS date value to increment • increment: # of intervals start-from is incremented by • alignment: where start-from aligned after incrementing. Values: BEGINNING, MIDDLE, END, SAMEDAY (v9). Default: BEGINNING.

  10. Before Version 9: 2 INTNX issues • Alignment BEGINNING (start of period) • Leap year handling

  11. INTNX issue 1: alignment to start of period • Increment by 2 days, months, years • Unexpected result for months, years • data one; • date1 = ‘18oct2008'd; • dateplus2day = intnx(‘day',date1,2); • dateplus2month = intnx(‘month',date1,2); • dateplus2year = intnx(‘year',date1,2); • run; • Variable Description Expect Actual • dateplus2day 10/18/08+2 days 10/20/08 10/20/08 • dateplus2month 10/18/08+2 months 12/18/08 12/1/08 • dateplus2year 10/18/08+2 years 10/18/10 1/1/10

  12. dateplus2day = intnx(‘day',date1,2); • date1 = ‘18oct2008'd; • = 17,823, date value for 10/18/2008 • Increment by 2 days • First, increment by 2 days • Then, align to start of 10/20/2008 (beginning of interval, DAY), no effect • DATEPLUS2DAY = 17,825, date value for 10/20/2008 as expected

  13. dateplus2month = intnx(‘month',date1,2); • date1 = ‘18oct2008'd; • = 17,823, date value for 10/18/2008 • Increment by 2 months • First, increment by 2 months • Then, align to start of 12/2008 (beginning of interval, MONTH) • DATEPLUS2MONTH = 17,867, date value for 12/1/2008

  14. dateplus2year = intnx(‘year',date1,2); • date1 = ‘18oct2008'd; • = 17,823, date value for 10/18/2008 • Increment by 2 years • First, increment by 2 years • Then, align to start of 2010 (beginning of interval, YEAR) • DATEPLUS2MONTH = 18,263, date value for 1/1/2010

  15. INTNX issue 2: leap years • Example of shifting by years • offset = 2; * number of years to increment; • if day(date1) = 29 and month(date1) = 2 and mod(offset,4) ne 0 and mod(offset, 400) ne 0 • then date1 = mdy(2, 28, year(date1)+offset); • else date1 = mdy(month(date1), day(date1), • year(date1)+offset) ; • Need more generalized code for all frequencies

  16. Solution: SAMEDAY • New alignment value in V9 • Preserves date value’s alignment in interval after it’s incremented • No effect when interval = DAY • Use for all intervals except DAY

  17. Solution: SAMEDAY • dateplus2day = intnx(‘day’, date1, 2, “sameday”) • 2 days after 10/18/2008 • Result: 17,825, SAS date value for 10/20/2008 • dateplus2month = intnx(‘month’, date1, 2, “sameday”) • 2 months after 10/18/2008 • Result: 17,884, SAS date value for 12/18/2008 • dateplus2year = intnx(‘year’, date1, 2, “sameday”) • 2 years after 10/18/2008 • Result: 18,553, SAS date value for 10/18/2010

  18. Some interesting dates • date2 = intnx(‘year’, ’29feb2000’d, 1, “sameday”) • 1 year after 2/29/2000 • Result: 15,034, SAS date value for 2/28/2001 • date3 = intnx(‘year’, ’29feb2000’d, 4, “sameday”) • 4 years after 2/29/2000 • Result: 16,130, SAS date value for 2/29/2004 • 2000, 2004 = leap years, 2001 not leap year

  19. Some interesting dates • date4 = intnx(‘month’, ’31mar2003’d, -1, “sameday”) • 1 month before 3/31/2003 • Result: 15,764, SAS date value for 2/28/2003 • date5 = intnx(‘month’, ’31mar2004’d, -1, “sameday”) • 1 month before 3/31/2004 • Result: 16,130, SAS date value for 2/29/2004 • 2000, 2004 = leap years, 2003 not leap year

  20. SAMEDAY until 9.2: single, non-shifted date intervals only • Following intervals might return wrong answer, no error or warning • multiple (e.g., month2 = two-month interval) • shifted (e.g., month.4 = month interval starting on April 1) • time • datetime • Interval = DAY, WEEK, WEEKDAY, TENDAY, SEMIMONTH, MONTH, QTR, SEMIYEAR, YEAR okay • Fixed in 9.2 • http://support.sas.com/techsup/unotes/SN/016/016184.html

  21. 3. Transport files: PROC CPORT/CIMPORT vs PROC COPY w/EXPORT engine • Transport file • File organized in machine-independent format • SAS can read on any operating system • Use to copy SAS data sets from one operating system to another • Starting w/ Version 6, two types of transport files • XPORT-style transport files • CPORT-style transport files

  22. XPORT-style transport files: 3 steps • 1. On original host, create transport file with data set(s) • libname inn 'directory_with_data_set(s)' ; • libname trans xport 'transport_file_to_create'; • proc copy in=inn out=trans ; run ; • 2. Copy transport file to destination host with ftp command or another method • 3. On destination host, transport file ---> SAS data set(s) • libname trans xport 'transport_file' ; • libname sasdata1 ‘directory_for_data_sets‘; • proc copy in=trans out=sasdata1 ; run ;

  23. CPORT-style transport files: 3 steps • 1. On original host, create transport file with data set(s) • libname sasdata1 'directory_with_SAS_data_set(s)'; • filename trans 'transport_file_to_create'; • proc cport library=sasdata1 file=trans ; run ; • 2. Copy transport file to destination host with ftp command or another method • 3. On destination host, transport file ---> SAS data set(s) • filename trans 'transport_file' ; • libname sasdata1 ‘directory_for_data_sets' ; • proc cimport infile=trans library=sasdata1 ; run ;

  24. Version 7 changes and transport files • Version 7 changes • Variable name maximum length: 8  32 • Character variable maximum length: 200  32768 • Variable names display based on “first reference” • Transport files • XPORT-style created in Version 6 format • CPORT-style understands Version 7 changes • Always use CPORT-style

  25. XPORT-style transport files, Version 7 and beyond • options validvarname=v6; • Documented in V8, undocumented but works in V9 • Empty data set if any character variable length more than 200, error in log • Variable names longer than 8 characters truncated, unique • Variable names converted to upper case • Otherwise • Empty data set if any character variable length more than 200 or variable name longer than 8 characters, error in log • Variable names converted to upper case

  26. 4. Count number of times a character or substring occurs in a character string • Typical objective, count number of times • "ab" occurs in "ab1aba344444abb555ab4“ (4 times) • "a” or “b" occur in "ab1aba344444abb555ab4“ (10 times) • Pre-Version 9: iterative methods (loops) • Version 9 • COUNT: # times substring occurs in a character string • COUNTC: # times 1 or more characters occur in a character string

  27. Count occurrences of substring "ab" • in "ab1aba344444abb555ab4" • data one; • char1 = "ab1aba344444abb555ab4"; * character variable to check; • * Before Version 9 ; • num_ab = 0; * counter for number • of occurrences ; • loc=index(char1,"ab"); * find 1st occurrence ; • do while (loc ne 0); • num_ab+1; * count occurrences of "ab" ; • char1 = substr(char1,loc+2); * resume search after "ab" ; • loc = index(char1, "ab"); * find next occurrence ; • end; • * Or, a single statement in Version 9 ; • num_ab = count(char1,"ab"); * Version 9 ; • run;

  28. Count occurrences of characters "a" • or "b" in "ab1aba344444abb555ab4" • data one; • char1 = "ab1aba344444abb555ab4"; * character variable to check; • * Before Version 9 ; • num_a_or_b = 0; * counter for number • of occurrences ; • loc=indexc(char1,"ab"); * find 1st occurrence ; • do while (loc ne 0); • num_a_or_b+1; * count occurrences • of "a” or “b“ ; • char1 = substr(char1,loc+1); * resume search • after "a” or “b" ; • loc = indexc(char1, "ab"); * find next occurrence ; • end; • * Or, a single statement in Version 9 ; • num_a_or_b = countc(char1,"ab"); * Version 9 ; • run;

  29. Countw: Version 9.2 • Counts number of words in a character string • Can also replace iterative programming

  30. 5. Concatenate character strings • Pre-Version 9 • || to concatenate • TRIM removes trailing blanks • TRIM(LEFT) removes leading and trailing blanks • Version 9, new functions concatenate character strings • CAT does not remove leading or trailing blanks • CATT removes trailing blanks • CATS removes leading and trailing blanks • CATX removes leading and trailing blanks, inserts delimiter (separator) between each argument • Note difference in length of result

  31. data one; • length char1 char2 char3 $8.; • char1="xxxxxxxx"; • char2=" yyy"; * 1 leading blank ; • char3="zzzz"; • * Leading and trailing blanks included; • cat1 = char1 || char2 || char3; • cat1 = cat(char1, char2, char3); * V9 ; • * Remove trailing blanks ; • cat2 = trim(char1) || trim(char2) || trim(char3); • cat2 = catt(char1, char2, char3); * V9 ; • cat1=xxxxxxxx yyy zzzz • cat2=xxxxxxxx yyyzzzz

  32. data one; • length char1 char2 char3 $8.; • char1="xxxxxxxx"; • char2=" yyy"; * 1 leading blank ; • char3="zzzz"; • * Remove leading and trailing blanks ; • cat3 = trim(left(char1)) || • trim(left(char2)) || • trim(left(char3)); • cat3 = cats(char1, char2, char3); • cat3=xxxxxxxxyyyzzzz

  33. Important difference: length of result • data one; • length char1 char2 char3 $8.; • char1="xxxxxxxx"; • char2=" yyy"; * 1 leading blank ; • char3="zzzz"; • cat1 = char1 || char2 || char3; • cat1 = cat(char1, char2, char3); • etc. • If length of result (e.g., CAT1) not defined • ||: sum of length of contributing variables: 8+8+8=24 • CAT, CATT, CATS, CATX: 200

  34. data one; • length char1 char2 char3 $8.; • char1="xxxxxxxx"; • char2=" yyy"; * 1 leading blank ; • char3="zzzz"; • aa = '123'; • * Delimiter: value of AA, • can vary by observation ; • char_catx1 = catx(aa, char1, char2); • * Delimiter: comma ; • char_catx2 = catx(',', char1, char2, char3); • * Delimiter: '123' ; • char_catx3 = catx('123', char1, char2, char3); • * Delimiter: single blank ; • char_catx4 = catx(' ', char1, char2, char3); • run; • char_catx1=xxxxxxxx123yyy • char_catx2=xxxxxxxx,yyy,zzzz • char_catx3=xxxxxxxx123yyy123zzzz • char_catx4=xxxxxxxx yyy zzzz

  35. 6. Sort by the numeric part of character values • Variable BANK: • bank2 bank100 bank1 bank10 • Conventional PROC SORT • left to right character by character comparison • "1" in "bank100" less than "2" in "bank2“ • Wanted: sort integer values w/in text by numeric value • Conventional sort order Desired sort order • bank1 bank1 • bank10 bank2 • bank100 bank10 • bank2 bank100

  36. Before Version 9.2: one method • Make new data set with variable w/numeric part of BANK • Sort by new variable • This example assumes numeric part of value • always starts at same location (5th character) • at most three digits • More general case slightly more complicated • data two; • set one; • banktemp = input(substr(bank,5),3.); run; • proc sort data=two out=three; • by banktemp; run;

  37. Version 9.2: linguistic collation • SORTSEQ=LINGUISTIC + NUMERIC_COLLATION sorts integer values w/in text by numeric part of value • Don’t create extra data set • proc sort data=one out=two • sortseq=linguistic (numeric_collation=on); • by bank; run; • Sort order for BANK: • bank1 • bank2 • bank10 • bank100

  38. Numeric collation not just at end of character values • ADDRESS values • East 14th Street • East 2nd Street • Conventional sort: “1” less than “2” • Sortseq=linguistic (numeric_collation=on): 2 less than 14 • Conventional sort order Linguistic collation order • East 14th Street East 2nd Street • East 2nd Street East 14th Street

  39. 7. Retrieve DB2 data on z/OS mainframes • Old methods • PROC DB2EXT • PROC ACCESS • Modern methods • SAS/ACCESS LIBNAME statement • Pass-Through Facility

  40. PROC DB2EXT • Since Version 6 (1990), undocumented but works • In Version 9, message • NOTE: Beginning with SAS version 10, DB2EXT will no longer be available. • “Grandfathered” to Version 6 • Long DB2 column names truncated to 8 characters • Algorithm ensures unique variable names

  41. PROC ACCESS • Introduced in Version 6 (1990) • Use access and view descriptors • Documented and supported, not recommended • PROC CV2VIEW: convert access and view descriptors to SQL views • “Grandfathered” to Version 6 • Long DB2 column names truncated to 8 characters • Algorithm ensures unique variable names

  42. SAS/ACCESS LIBNAME statement • Introduced in Version 7 • Assign libref directly to DB2 database • Easiest method • Long DB2 column names processed directly

  43. Pass-through facility • Introduced in Version 6 • Retrieve DB2 data by sending DBMS-specific SQL statements to DB2 • DBMS = Database Management System • Long DB2 column names • Processed directly unless options validvarname=v6; • options validvarname=v6; • Long name truncated to 8 characters, with unique names as per PROC DB2EXT, PROC ACCESS • Documented in V8, undocumented but works in V9

  44. Retrieving data: 4 methods • options db2ssid=dsn; /* 1. PROC DB2EXT */ • run; • proc db2ext out=new1; • select cntry_nm • from nicua.cuv_country_nm; • run; • proc access dbms=db2; /* 2. PROC ACCESS */ • create work.new1.access; • table = nicua.cuv_country_nm; • ssid = dsn; • create work.new1.view; • select cntry_nm; • run;

  45. Retrieving data: 4 methods • /* 3. SAS/ACCESS to DB2 LIBNAME statement */ • libname in1 db2 ssid=dsn authid=nicua; • data new1; • set in1.cuv_country_nm (keep = cntry_nm); • run; • proc sql; /* 4. Pass_Through Facility */ • connect to db2 (ssid=dsn); • create table new1 • as select * from connection to db2 • (select cntry_nm • from nicua.cuv_country_nm); • disconnect from db2; • quit;

  46. Notes • SAS/ACCESS LIBNAME statement vs Pass-Through Facility • Both recommended • SAS/ACCESS LIBNAME statement easiest • Pass-Through Facility can be more efficient, especially for complex queries • Compared in SAS/ACCESS manual • If convert from PROC DB2EXT or PROC ACCESS • DB2 column names could be longer • In subsequent code: RENAME or change code

  47. For more information, please contact • Bruce Gilsen • Federal Reserve Board, mail stop 157 • Washington, DC 20551 • phone: 202-452-2494 • e-mail: bruce.gilsen@frb.gov • SAS is a registered trademark or trademark of SAS • Institute Inc. in the USA and other countries. ® • indicates USA registration. • Other brand and product names are registered • trademarks or trademarks of their respective • companies.

More Related