
SAS Essentials

Learn the processes, internals, and defaults of the DATA step in SAS programming for optimal utilization of this powerful tool.


Presentation Transcript


  1. SAS Essentials: How SAS Thinks • Neil.Howard@amgen.com

  2. “The DATA step is your most powerful programming tool. So understand and use it well.” Socrates

  3. Objectives • understand DATA step: • processes • internals • defaults

  4. Processes • compilation of DATA step source code • execution of the resultant machine code

  5. Compile and execute phases of: • INPUT (non-SAS data) • SET

  6. Compile Time Activities • syntax scan • source code translation to machine language • definition of input and output files

  7. Compile Time Activities: creation of • input buffer • LPDV (logical program data vector) • data set descriptor information

  8. Creation of LPDV • variables are added in the order the compiler sees them, during parsing and interpretation of source statements
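
The ordering rule on this slide can be checked with a small step. This is a minimal sketch, not taken from the original deck: the data set name order_demo and the variables x, y, z are hypothetical, chosen only to show that the first statement to mention a name fixes its position.

```sas
/* Sketch: the LPDV order follows the order in which the compiler meets each name. */
data order_demo;        /* order_demo is a hypothetical data set name */
  length z 8;           /* z is seen first, so it becomes variable #1 */
  x = 1;                /* x is seen second */
  y = 2;                /* y is seen third  */
  z = x + y;
run;

/* VARNUM lists variables by position in the data set rather than alphabetically */
proc contents data=order_demo varnum;
run;
```

PROC CONTENTS with VARNUM should list z ahead of x and y, even though z is assigned last.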

  9. Compile Time Statements • location critical: BY, WHERE, ARRAY, ATTRIB, FORMAT, INFORMAT, LENGTH • location irrelevant: DROP, KEEP, LABEL, RENAME, RETAIN
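
A short illustration of the two groups, assuming a hypothetical data set placement_demo with made-up variables grp and n. It sketches one location-critical statement (LENGTH) and one location-irrelevant statement (KEEP); it is not an example from the original slides.

```sas
data placement_demo;         /* hypothetical data set name */
  keep grp;                  /* location irrelevant: KEEP works even before grp is defined */
  grp = 'placebo';           /* first reference: grp becomes character with length 7 */
  length grp $ 10;           /* location critical: too late, SAS logs a warning and grp keeps length 7 */
  n = 1;                     /* n is not in the KEEP list, so it is not written out */
run;

proc contents data=placement_demo;
run;
```

Moving the LENGTH statement above the assignment is what gives grp a length of 10.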

  10. Retained Variables • all SAS special variables (_N_, _ERROR_) • all variables in a RETAIN statement • all variables from SET, MERGE, or UPDATE • accumulator variables in SUM statement(s)

  11. Variables Not Retained • variables from the INPUT statement • user-defined variables (other than SUM statement accumulators)
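
The difference between retained and non-retained variables shows up when the same accumulation is written both ways. A minimal sketch, with hypothetical names retain_demo, x, total, and running:

```sas
data retain_demo;          /* hypothetical data set name */
  input x;
  total + x;               /* sum statement: total starts at 0 and is retained across iterations */
  running = running + x;   /* plain assignment: running is reset to missing at the top of each iteration */
  put _all_;               /* write the LPDV to the log */
datalines;
1
2
3
;
run;
```

In the log, total accumulates to 1, 3, 6, while running stays missing in every iteration because missing plus a number is missing.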

  12. Type and Length of Variables • determined at compile time, by the first reference the compiler sees in the DATA step • Numerics: length is 8 during DATA step processing; a shorter length is an output property, applied only when the value is written to the data set
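
The point about numeric length being an output property can be sketched as follows. The names num_len, third, same_in_step, and same_after_write are hypothetical; the example assumes a host where 3 is a valid numeric length.

```sas
data num_len;                          /* hypothetical data set name */
  length third 3;                      /* storage length used only when the observation is written */
  third = 1/3;
  same_in_step = (third = 1/3);        /* compared inside the step, where third still occupies 8 bytes */
run;

data _null_;
  set num_len;
  same_after_write = (third = 1/3);    /* compared after third was stored in 3 bytes and read back */
  put same_in_step= same_after_write=;
run;
```

The first comparison is true because the LPDV holds the full value; the second is expected to be false because the stored value was truncated on output.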

  13. INPUT statement reading non-SAS data

  14. Compile Loop and LPDV
      data a ;
        put _all_ ;   *write LPDV to LOG;
        input idnum
              diagdate : mmddyy8.
              sex $
              rx_grp $ 10. ;
        time = intck ('year', diagdate, today() ) ;
        put _all_ ;   *write LPDV to LOG;
      cards ;
      1 09-09-52 F placebo
      2 11-15-64 M 300 mg.
      3 04-07-48 F 600 mg.
      run;

  15. Input buffer and logical program data vector; building descriptor portion of SAS data set
      Variable   Type      Length
      idnum      numeric   8
      diagdate   numeric   8
      sex        char      8
      rx_grp     char      10
      time       numeric   8

  16. Logical program data vector with DKR* flags (*Drop/keep/rename)
      Variable   Type      Length   DKR
      idnum      numeric   8        keep
      diagdate   numeric   8        keep
      sex        char      8        keep
      rx_grp     char      10       keep
      time       numeric   8        keep
      _N_                           drop
      _ERROR_                       drop

  17. Execution of a DATA Step

  18. Execution of a DATA Step (flowchart) • initialization of LPDV • read input file • end of file? Y: go to next step; N: process statements in step • termination: implied output • _N_ + 1, then back to initialization of LPDV

  19. DATA Step Execution • implied read/write loop, stopped by: • no more data to read • explicit STOP • no input data • some execution time errors
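
Two of the stopping conditions listed above can be sketched directly. The data set name first_two and the variable id are hypothetical, used only for illustration.

```sas
/* No input source: the step body executes exactly once (_N_ = 1). */
data _null_;
  put 'no input data: ' _n_=;
run;

/* Explicit STOP: leave the implied loop before the input is exhausted. */
data first_two;            /* hypothetical data set name */
  input id;
  if _n_ > 2 then stop;    /* STOP ends the step; the current observation is not written */
datalines;
1
2
3
4
;
run;
```

The first step writes one line to the log; the second creates first_two with two observations, because STOP fires on the third iteration before the implied output.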

  20. Execution Time Activities • execute initialize-to-missing (ITM) • read from input source • modify data using user-controlled statements • supply values of variables to LPDV • output observation to SAS data set

  21. Initialization • _N_ set to loop count • _ERROR_ set to 0 • user variables set to missing

  22. Execution Loop - raw data
      data a ;
        put _all_ ;   *write LPDV to LOG;
        input idnum
              diagdate : mmddyy8.
              sex $
              rx_grp $ 10. ;
        time = intck ('year', diagdate, today() ) ;
        put _all_ ;   *write LPDV to LOG;
      cards ;
      1 09-09-52 F placebo
      2 11-15-64 M 300 mg.
      3 04-07-48 F 600 mg.
      run;
      proc contents; run;
      proc print; run;

  23. LPDV (over all executions of DATA step)
      IDNUM   DIAGDATE   SEX   RX_GRP    TIME   _N_
      .       .                          .      1
      1       -2670      F     placebo   48     1
      .       .                          .      2
      2       1780       M     300 mg.   36     2
      .       .                          .      3
      3       -4286      F     600 mg.   52     3
      .       .                          .      4

  24. SAS log
      2    data a ;
      3    put _all_ ; *write LPDV to LOG;
      4    input idnum
      5    diagdate: mmddyy8.
      6    sex $
      7    rx_grp $ 10. ;
      8    time = intck ('year', diagdate, today() ) ;
      9    put _all_; *write LPDV to LOG;
      10   cards ;
      IDNUM=. DIAGDATE=. SEX= RX_GRP= TIME=. _ERROR_=0 _N_=1
      IDNUM=1 DIAGDATE=-2670 SEX=F RX_GRP=placebo TIME=49 _ERROR_=0 _N_=1
      IDNUM=. DIAGDATE=. SEX= RX_GRP= TIME=. _ERROR_=0 _N_=2
      IDNUM=2 DIAGDATE=1780 SEX=M RX_GRP=300 mg. TIME=37 _ERROR_=0 _N_=2
      IDNUM=. DIAGDATE=. SEX= RX_GRP= TIME=. _ERROR_=0 _N_=3
      IDNUM=3 DIAGDATE=-4286 SEX=F RX_GRP=600 mg. TIME=53 _ERROR_=0 _N_=3
      IDNUM=. DIAGDATE=. SEX= RX_GRP= TIME=. _ERROR_=0 _N_=4
      NOTE: The data set WORK.A has 3 observations and 5 variables.
      NOTE: The DATA statement used 0.59 seconds.
      14   run;
      15
      16   proc contents; run;
      NOTE: The PROCEDURE CONTENTS used 0.39 seconds.

  25. PROC CONTENTS output
      Data Set Name: WORK.A                              Observations:         3
      Member Type:   DATA                                Variables:            5
      Engine:        V612                                Indexes:              0
      Created:       11:18 Saturday, January 20, 2001    Observation Length:   42
      Last Modified: 11:18 Saturday, January 20, 2001    Deleted Observations: 0
      Protection:                                        Compressed:           NO
      Data Set Type:                                     Sorted:               NO
      Label:
      -----Engine/Host Dependent Information-----
      Data Set Page Size:       8192
      Number of Data Set Pages: 1
      File Format:              607
      First Data Page:          1
      Max Obs per Page:         194
      Obs in First Data Page:   3
      -----Alphabetic List of Variables and Attributes-----
      #   Variable   Type   Len   Pos
      -----------------------------------
      5   TIME       Num    8     34
      2   DIAGDATE   Num    8     8
      1   IDNUM      Num    8     0
      4   RX_GRP     Char   10    24
      3   SEX        Char   8     16

  26. PROC PRINT
      IDNUM   DIAGDATE   SEX   RX_GRP    TIME
      1       -2670      F     placebo   48
      2       1780       M     300 mg.   36
      3       -4286      F     600 mg.   52

  27. SET statement reading existing SAS data

  28. DATA Step Compile • no input buffer • compiler reads descriptor portion of input SAS data set to build the LPDV • returns same variables/attributes, including new variables

  29. SET • determines which SAS data set is to be read • identifies the next observation to be read • copies variable values to the LPDV

  30. Execution Loop - SAS data
      data sas_a ;
        put _all_ ;
        set a ;
        tot_rec + 1 ;
        put _all_ ;
      run;

  31. Building LPDV from descriptor portion of old SAS data set; building descriptor portion of new SAS data set
      Variable   Type      Length
      idnum      numeric   8
      diagdate   numeric   8
      sex        char      8
      rx_grp     char      10
      time       numeric   8
      tot_rec    numeric   8

  32. LPDV (over all executions of DATA step)
      IDNUM   DIAGDATE   SEX   RX_GRP    TIME   TOT_REC   _N_
      .       .                          .      0         1
      1       -2670      F     placebo   48     1         1
      1       -2670      F     placebo   48     1         2
      2       1780       M     300 mg.   36     2         2
      2       1780       M     300 mg.   36     2         3
      3       -4286      F     600 mg.   52     3         3
      3       -4286      F     600 mg.   52     3         4

  33. LOG
      idnum=. diagdate=. sex= rx_grp= time=. tot_rec=0 _ERROR_=0 _N_=1
      idnum=1 diagdate=-2670 sex=F rx_grp=placebo time=48 tot_rec=1 _ERROR_=0 _N_=1
      idnum=1 diagdate=-2670 sex=F rx_grp=placebo time=48 tot_rec=1 _ERROR_=0 _N_=2
      idnum=2 diagdate=1780 sex=M rx_grp=300 mg. time=36 tot_rec=2 _ERROR_=0 _N_=2
      idnum=2 diagdate=1780 sex=M rx_grp=300 mg. time=36 tot_rec=2 _ERROR_=0 _N_=3
      idnum=3 diagdate=-4286 sex=F rx_grp=600 mg. time=52 tot_rec=3 _ERROR_=0 _N_=3
      idnum=3 diagdate=-4286 sex=F rx_grp=600 mg. time=52 tot_rec=3 _ERROR_=0 _N_=4

  34. PROC PRINT
      IDNUM   DIAGDATE   SEX   RX_GRP    TIME   TOT_REC
      1       -2670      F     placebo   48     1
      2       1780       M     300 mg.   36     2
      3       -4286      F     600 mg.   52     3

  35. Logic of a MERGE • compile • execute

  36. Input data sets for the merge
      data left;
        input ID X Y ;
      cards;
      1 88 99
      2 66 77
      3 44 55
      ;
      data right;
        input ID A $ B $ ;
      cards;
      1 A14 B32
      3 A53 B11
      ;

  37. Match-merge by ID
      proc sort data=left; by ID; run;
      proc sort data=right; by ID; run;
      data both;
        merge left (in=inleft) right (in=inright);
        by ID ;
      run;

  38. Logical program data vector, first iteration: MATCH
      ID   X    Y    A     B     INLEFT   INRIGHT   _N_   _ERROR_
      1    88   99   A14   B32   1        1         1     0

  39. Logical program data vector, second iteration: NO MATCH
      ID   X    Y    A     B     INLEFT   INRIGHT   _N_   _ERROR_
      2    66   77               1        0         2     0

  40. Logical program data vector, third iteration: MATCH
      ID   X    Y    A     B     INLEFT   INRIGHT   _N_   _ERROR_
      3    44   55   A53   B11   1        1         3     0
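
The IN= flags traced in the three iterations above are temporary variables, so they are not written to the output data set, but they can drive logic inside the step. A minimal sketch that routes each observation by match status; the output names matched, left_only, and right_only are hypothetical, and the input data sets are re-created here (with IDs 1, 2, 3 in left and 1, 3 in right, as the iterations show) so the block runs on its own.

```sas
data left;
  input ID X Y;
datalines;
1 88 99
2 66 77
3 44 55
;
run;

data right;
  input ID A $ B $;
datalines;
1 A14 B32
3 A53 B11
;
run;

/* Both inputs are already in ID order, so no PROC SORT is needed here. */
data matched left_only right_only;   /* hypothetical output data set names */
  merge left (in=inleft) right (in=inright);
  by ID;
  if inleft and inright then output matched;   /* ID found in both data sets */
  else if inleft then output left_only;        /* ID found only in left */
  else output right_only;                      /* ID found only in right */
run;
```

With this data, IDs 1 and 3 go to matched, ID 2 goes to left_only, and right_only is empty.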

  41. Let’s try this again...
      data left;
        input ID X Y ;
      cards;
      1 88 99
      2 66 77
      3 44 55
      ;
      data right;
        input ID A $ B $ ;
      cards;
      1 A14 B32
      3 A53 B11
      ;

  42. One-to-one merge (BY statement commented out)
      proc sort data=left; by ID; run;
      proc sort data=right; by ID; run;
      data both;
        merge left (in=inleft) right (in=inright);
        ***** by ID (one-on-one merge);
      run;

  43. Logical program data vector, first iteration: 1:1 “MATCH”
      ID   X    Y    A     B     _N_   _ERROR_
      1    88   99   A14   B32   1     0
      ID = 1 is OVERWRITTEN: the value came from data set “right”

  44. Logical program data vector, second iteration: 1:1 “MATCH”
      ID   X    Y    A     B     _N_   _ERROR_
      2    66   77   A53   B11   2     0
      ID is OVERWRITTEN with 3: the value came from data set “right”

  45. Logical program data vector, third iteration: 1:1 “NO MATCH”
      ID   X    Y    A     B     _N_   _ERROR_
      3    44   55               3     0
      A and B are MISSING: no values from “right”

  46. Output SAS data set
      ID   X    Y    A     B
      1    88   99   A14   B32
      3    66   77   A53   B11
      3    44   55
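
An accidental one-to-one merge like the one above produces no message by default. One way to guard against it, not covered in the original slides, is the MERGENOBY system option. A sketch, assuming the data sets left and right from the earlier slides already exist:

```sas
options mergenoby=error;   /* other values: NOWARN (the default) and WARN */

data both;
  merge left right;        /* no BY statement: with MERGENOBY=ERROR, SAS logs an error instead of merging silently */
run;

options mergenoby=nowarn;  /* restore the default behavior */
```

Setting MERGENOBY=WARN instead lets the step run but flags the missing BY statement in the log.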

  47. DATA Step Conclusions • Understanding internals and default activities allows you to: • make informed coding decisions • write flexible and efficient code • debug and test effectively • interpret results readily

  48. Remember • We have discussed DEFAULTS • As soon as you add options, statements, features, etc., the default actions change; TEST them! • You can use these same tools to track what’s happening.
