310 likes | 469 Views
SAS Tips & Tricks. Brought to you by your fellow MASUG members. Jun Tang, Data Analyst, MS. UT Health Science center Missing Data in Multivariate Analysis. Missing Data in Multivariate Analysis.
E N D
SAS Tips & Tricks Brought to you by your fellow MASUG members
Jun Tang, Data Analyst, MS • UT Health Science center • Missing Data in Multivariate Analysis
Missing Data in Multivariate Analysis In multivariate analysis, we encountered this: 76 of 306 observations in data set LIB.FILENAME omitted due to missing values. It’s 25% of the sample. Sample size is drastically reduced.
Missing Data in Multivariate Analysis • To solve this problem, we consider two situations: • Do not use Multivariate Analysis, in stead, use Univariate Test. This will not affect the sample size of those non-missing value variables. • Example: MANOVA change to TTEST. • procanova; • class AAMC; • model ratevar1 ratevar2 ratevar3 …=AAMC; • manova h=AAMC/printe printh; • run; • Change to: • procttest data=lib.filename; • class AAMC; • VAR ratevar1 ratevar2 ratevar3 …; • run;
Missing Data in Multivariate Analysis • To replace the missing data : • Mean Estimation; • Hot-Deck imputation; • Predict missing values from Regression; • Multiple Imputation (which is still in experimental form in Release 8.1 of the SAS system). • SAS procedures for doing MI: • a. PROC MI creates Multiple Imputed data sets; • b. PROC MIANALYZE combines results after analysis. • Example: • proc MI data=lib.filename <seed=37851> <mu0=50 10 180> out=outmi; • VAR ratevar1 ratevar2 ratevar3…; • run; Refs: 1. Kim Chantala and C.Suchindran, Multiple Imputation for missing data; 2. www.ats.ucla.edu/stat/sas/v8/miv802.pdf, chapter 9, the MI procedure;
Shelly Lensing, Sr. Biostatistician IIDeqing Pei, Statistical Analyst II • St. Jude Children's Research Hospital • Missing Readings Without a Missing Record
Missing Readings Without a Missing Record • This is a problem that just came up: • We needed to group each set of readings (3h, 7h, 23h, 44h) for each ID to indicate group 1, 2, etc. over time. • Some readings were missing, but there was no missing record
Missing Readings Without a Missing Record data one; input id sample $ 10-18 mtx_con @37 hdmtx_date mmddyy8.; sampnum=compress(sample,'MTXHR')+0; format hdmtx_date mmddyy8.; datalines; *** example data; 11111 MTX 3HR 81.29 11/20/95 11111 MTX 7HR 22.07 11/20/95 11111 MTX 23HR 2.24 11/21/95 11111 MTX 44HR 1.49 11/22/95 11111 MTX 3HR 92.16 01/19/96 11111 MTX 44HR 0.18 01/21/96 11111 MTX 7HR 24.27 02/20/96 11111 MTX 23HR 3.34 02/21/96 22222 MTX 3HR 127.22 12/17/94 22222 MTX 7HR 29.73 12/17/94 22222 MTX 23HR 0.65 12/18/94 22222 MTX 7HR 45.00 12/28/94 22222 MTX 23HR 1.88 12/29/94 22222 MTX 3HR 132.48 02/06/95 22222 MTX 7HR 37.96 02/06/95 22222 MTX 23HR 1.53 02/07/95 22222 MTX 44HR 0.13 02/08/95 22222 MTX 7HR 85.71 02/09/95 22222 MTX 44HR 30.01 02/11/95 22222 MTX 44HR 35.01 02/13/95 33333 MTX 3HR 65.02 02/13/95 33333 MTX 23HR 34.13 02/14/95 33333 MTX 23HR 24.13 02/15/95 33333 MTX 3HR 74.23 02/16/95 33333 MTX 44HR 24.06 02/17/95 ; run;
Missing Readings Without a Missing Record proc sort data=one; *** make sure data are in order; by id hdmtx_date sampnum; run; data two; set one; by id; lagsamp=lag(sampnum); * Uses logic of when hour-based measurements should occur relative to each other; * to find first record of each set; if sampnum=3 then first=1; if sampnum=7 and lagsamp ne 3 then first=1; if sampnum=23 and lagsamp not in(7,3) then first=1; if sampnum=44 and lagsamp not in(23,7,3) then first=1; if first.id then do; * restarts group number with each new id; group=0; end; retain group; if first=1 then group+1; * increments group number for each grouping; run; proc print; run;
Vaibhav Gardé , Senior Marketing Analyst • FedEx Services - Marketing, Planning and Analysis Group • Interactive Mode • Beats the Batch Mode in most cases
I like to use the SAS Interactive Mode on UNIX. It is almost as convenient to use as Windows, and my code gets neatly color coded! This makes it easy to catch minor error in syntax – like forgetting a quote, semi-colon, etc.
In the interactive mode, it is quite easy to see the exact location, size, modification date, etc. of all your tables.
You can even right click to open (view) Oracle tables, SAS datasets to see the records contained therein. Much easier than the usual UNIX SQL options! (Can also delete, copy, save, etc.)
You can see a continuously visible log file (again color coded) to monitor the progress/success of your program. In case there is an error, it shows up in RED. There are other benefits of using the Interactive mode, but I have highlighted just these few. Hope you find it worthwhile to check it out!
Alan Teal, Chief, Technical Support Branch • Defense Contract Audit Agency • Macro to Document Date/Time of Processing
Macro to Document Date/Time of Processing • Purpose: It is a good practice to document the date and time when a SAS job was run, especially if the same job is often run but with different data sets. • The following macro writes the date and time to the SAS log. • It can also be used to determine how long each Data step or PROC processed, in order to evaluate which steps might be made more efficient.
Macro to Document Date/Time of Processing SAS Log Shown Below: 1 * TheTimeIs.SAS Writes the date and time to the SAS Log to document processing.; 2 * Created 03-25-2003, Alan Teal; 3 4 %MACRO Tim; 5 DATA _Null_; 6 TheTimeIs=Datetime(); 7 PUT TheTimeIs= DateTime20.; 8 %MEND Tim; 9 10 %Tim 11 12 RUN; TheTimeIs=25MAR2003:12:36:47 NOTE: DATA statement used: real time 0.32 seconds When the %Tim macro is called TheTimeIs (date/time) is written to the SAS Log, as shown above.
The Unknown SAS Instructor • I just have one comment to make…
I just have one comment to make… /* Here’s an easy way to comment out a chunk of code: */ %macro skip; data new; set old; . . . run; %mend skip; /* And the best part is, you can use it as many times as you want within the same program! */
Stephanie Thompson, Sr. Merchandising Analyst • AutoZone • Extra rsubmits
Extra rsubmits • Ever forget to remove an extra rsubmit from your code and it crashes? • Keep them there and use them at will with…...
Extra rsubmits rsubmit; procsql; createtable payprelim_vp as (SELECT VP, sum(purchase_cost) as p_cost, sum(inventory_cost) as i_cost FROM payprelim_t group by vp); *rsubmit; data q.payables_vp; set payprelim_vp; payables = p_cost / i_cost; run; procprintdata = q.payables_vp; format purchase_cost dollar12. inventory_cost dollar12. payables percent6.2; run; Higlight from after the ‘*’ to the end and submit.
Walter Smith, Marketing Project Manager • WorldWide Forecasting, FedEx • Dates and Fixing Capitalization
Creating SAS Dates From Character Data • Often date data is stored in files or tables as character data and is more useful when converted to SAS date values (number of days since Jan 1, 1960)
Sample Code *---------- create SAS dates ----------; data sasdates; set chardates; yy = input( substr( yyyymm , 1 , 4 ) , 4. ); mm = input( substr( yyyymm , 5 ) , 2. ); sasdate = mdy( mm , 1 , yy ); date9 = sasdate; mmddyy = sasdate; attrib sasdate label='Numeric Date' format=5.; attrib date9 label='Displayed As Date9' format=date9.; attrib mmddyy label='Displayed As mmddyy' format=mmddyy10.; keep yyyymm sasdate date9 mmddyy; tt = 1; put 'tt=' tt 'tt=' tt date9.; run; title 'Character & SAS Dates'; proc print; run; *-- read in some dates as character ---; data chardates; infile cards; input yyyymm $6. ; cards; 200206 200207 200208 200303 run;
The Result Character & SAS Dates Obs yyyymm sasdate date9 mmddyy 1 200206 15492 01JUN2002 06/01/2002 2 200207 15522 01JUL2002 07/01/2002 3 200208 15553 01AUG2002 08/01/2002 4 200303 15765 01MAR2003 03/01/2003
SAS Dates in Macro Variables • Sometimes it is convenient to have date values in the macro environment • Additional manipulations can be done – the macro function %sysfunc() gives access to nearly all the datastep functions!
Sample Code & Results *--- put min & max into macro vars ----; proc sql noprint; select put(min(sasdate),5.), max(sasdate) into :mindate, :maxdate from sasdates; quit; %put >> mindate="&mindate" maxdate="&maxdate"; %let nextmonth = %sysfunc( intnx( month , &maxdate , 1 ) , date9. ); %let sas_date_const = %unquote(%bquote(')&nextmonth%bquote('d)); %put >> nextmonth=&nextmonth sas_date_const=&sas_date_const; >> mindate="15492" maxdate=" 15765“ >> nextmonth=01APR2003 sas_date_const='01APR2003'd
Capitalizing Proper Names • Often names of people, places and things in data is in mixed or case, or worse, IN ALL CAPITAL LETTERS. • Names are easier to read when each word of the name has only the initial letter capitalized.
Sample Code *---- change names to initial caps ----; data newnames; set names; oldname = fullname; length newname $30; fullname = lowcase( fullname ); n_wrds = length( compbl( fullname ) ) - length( compress( fullname , ' ' ) ) +1; do i=1 to n_wrds; wrd = scan( fullname , i , ' ' ); newwrd = upcase(substr(wrd,1,1))||substr(wrd,2); newname = trim(newname)||' '||newwrd; end; keep oldname newname; run; title 'Names With Initial Capitals'; proc print; run; *--------- read in some data ----------; data names; infile cards; input fullname $30. ; cards; Chris Geith FaRoUK HiJaZi unusual case SOUTH KOREA TARIQ AZIZ Walter j SMITH los angeles CA state abbrev run;
Results Names With Initial Capitals Obs oldname newname 1 Chris Geith Chris Geith 2 FaRoUK HiJaZi unusual case Farouk Hijazi Unusual Case 3 SOUTH KOREA South Korea 4 TARIQ AZIZ Tariq Aziz 5 Walter j SMITH Walter J Smith 6 los angeles CA state abbrev Los Angeles Ca State Abbrev