590 likes | 862 Views
SAS ® PROC SQL or Vanilla Flavor. Cecilia Mauldin January 31 2007. What is Special in SAS PROC SQL?. Things you already know from data step programming Some differences between data step and proc sql syntax Order of execution How good is my code?. DATA dogs ; LENGTH name breed $20
E N D
SAS® PROC SQL or Vanilla Flavor Cecilia Mauldin January 31 2007
What is Special in SAS PROC SQL? • Things you already know from data step programming • Some differences between data step and proc sql syntax • Order of execution • How good is my code?
DATA dogs; LENGTHname breed $20 dob $10; INPUTname breed dob; CARDS; Nala Airdale 11jan1990 Nina AMBR 22apr1997 Jadzia Pointer 15feb1999 Rocco MMBNR 08may1980 Bruja Rottweiler 15aug2000 Chacal Rottweiler 29jan1990 Chata Boxer UNK Tyson Rottweiler 15mar2000 ; RUN; DATA parents; LENGTHparent name $20; INPUT parent name; CARDS; Ina Nala Ina Chata Ina Bruja Ina Fidonga Chavo Rocco Cecilia Nina Cecilia Jadzia ; RUN; The Data
What is Special in SAS PROC SQL? • Things you already know from data step programming • Some differences between data step and Proc SQL syntax • Order of execution • How good is my code?
Things You Already Know from Data Step Programming • Our old friends: where, keep, drop and rename • In data step • How do we use them? • Why do we use them? • In SAS PROC SQL • How do we use them? • Why do we use them?
Things You Already Know from Data Step Programming DATA new_data (KEEP= RENAME=( ) DROP= WHERE=( )); SET old_data (KEEP= RENAME=() DROP= WHERE=( )); SAS data steps … ; RUN; It is written from the PDV when specified on an output data set. Is applied when the dataset OLD_DATA is read into the PDV.
Data Step: Why Do We Use Our Old Friends? DATA new_data (KEEP= RENAME= ( ) DROP= WHERE=( )); SET old_data (KEEP= RENAME=() DROP= WHERE=( )); SAS data steps … ; RUN; No need to keep observations and/or variables that are not used in the output data set. No need to carry observations or variables that are not going to be used in the PDV.
Proc SQL: How Do We Use Them? PROC SQL SQL_OPTIONS; CREATE TABLE example (KEEP= …. RENAME=( ….) DROP= WHERE=( ….)) AS SELECT * FROM data_set (KEEP= var1 var2…. RENAME=(var1 = newvar1 var2 = newvar2 ….) DROP= WHERE=(….)); QUIT;
Why Do We Use Them? • Typing convenience • Right, left and full joins
Typing Convenience • Your dataset has 50 variables, you need to use 45 • Do you want to type 45 variables? • Do you carry 5 variables that you don’t need?
Typing Convenience cont’d SELECT var1, var2, ∙∙∙, var45 FROM data_set; SELECT * FROM data_set (DROP=var46 … var50);
SELECT * FROM dogs (DROP=dob RENAME=(name=firstname)) ; Typing Convenience cont’d First name Breed Nala Airdale Nina AMBR Jadzia Pointer Rocco MMBNR Bruja Rottweiler Chacal Rottweiler Chata Boxer Tyson Rottweiler SELECT name AS firstname, breed FROM dogs ;
Joins • Avoid ‘warning’ messages when creating a data set because two variables in two different datasets have the same name • Limit the records that come in a join
Avoid WARNING messages when 2 variables have the same name Joins cont’d CREATE TABLE final AS SELECT a.*, b.* FROM data_set1 a,data_set2 b WHERE a.var1=b.var1 AND a.var2=b.var2 AND a.var3=b.var3 ;
Avoid WARNING messages when 2 variables have the same name Joins cont’d CREATE TABLE family AS SELECT a.*, b.* FROM dogs a, parent b WHERE a.name=b.name ; WARNING: Variable Name already exists on file WORK.FAMILY. NOTE: Table WORK.FAMILY created, with 6 rows and 4 columns.
Avoid WARNING messages when 2 variables have the same name Joins cont’d CREATE TABLE final (DROP=VAR1B VAR2B VAR3B) AS SELECT a.*, b.* FROM data_set1A, data_set2 (RENAME=(var1=var1b var2=var2b var3=var3b)) B WHERE a.var1=b.var1b AND a.var2=b.var2b AND a.var3=b.var3b;
Joins cont’d CREATE TABLE family (DROP=nickname) AS SELECT a.*, b.* FROM dogs a, parents (RENAME=(name=nickname)) b WHERE a.name=b.nickname ; Name breed DOB Parent Nala Airdale 11JAN1990 Ina Nina AMBR 22APR2001 Cecilia Jadzia Pointer 15FEB1999 Cecilia Rocco MMBNR 08MAY2001 Chavo Bruja Rottweiler 15AUG2000 Ina Chata Boxer . Ina
Joins cont’d SELECT a.*, b.* FROM dogs a LEFT JOIN parents b ON a.name=b.name AND a.breed='Rottweiler' ; Do we get a warning? Name Breed DOB Parent name Bruja Rottweiler 15AUG2000 Ina Bruja Chacal Rottweiler 29JAN1990 Chata Boxer . Jadzia Pointer 15FEB1999 Nala Airdale 11JAN1990 Nina AMBR 22APR2001 Rocco MMBNR 08MAY2001 Tyson Rottweiler 15MAR2000
Limit records Joins cont’d CREATE TABLE final AS SELECT a.*, b.* FROM data_set1 (WHERE=(var1=’keep’)) a LEFT JOIN data_set2 b ON a.var1=B.var1b AND a.var2=B.var2b AND a.var3=B.var3b ;
Joinscont’d SELECT a.*, b.* FROM dogs (WHERE=(breed='Rottweiler'))a LEFT JOIN parents b ON a.name=b.name ; Name breed DOB Parent name Bruja Rottweiler 15AUG2000 Ina Bruja Chacal Rottweiler 29JAN1990 Tyson Rottweiler 15MAR2000
What is special in SAS PROC SQL? • Some differences between data step and SQL syntax • Things you already know from data step programming • Order of execution • How good is my code?
Some Differences Between Data Step and SQL Syntax • Reserved word • Format modifiers • Descending sorting
Reserved Word • CASE • Why to use it • Is there something similar in data step? • How to use it • What can be the problem • How to avoid a problem
CASE: Why to Use It • There is no IF or SELECT statements in SQL • Is similar to the SELECT clause in data step
CASE: Why to Use It DATA final; SET parents; SELECT (name); WHEN (‘Nina') education =‘Novice'; WHEN (‘Nala') education =‘None'; OTHERWISE education =‘There is hope’; END;
SELECT var1, var2, CASE [vark] WHEN (“value1”) THEN [‘constant value’ | function] WHEN (‘value3’) THEN [‘other constant’ | function] … ELSE [‘last value’ | function ] END AS newvar FROM data_set ; CASE: How to Use It
SELECT name, CASE name WHEN ('Nala') THEN 'None' WHEN ('Nina') THEN 'Novice' ELSE 'There is hope' END AS education LENGTH = 15 FORMAT $15., parent FROM parent ; CASE: How to Use It cont’d Name Education Parent Nala None Ina Chata There is hope Ina Bruja There is hope Ina Fidonga There is hope Ina Rocco There is hope Chavo Nina Novice Cecilia Jadzia There is hope Cecilia
CASE: How to Use It cont’d SELECT name,breed, CASE WHEN INDEX (breed,'MB') THEN PUT(breed,$UKC.) ELSE breed END AS lbreed label "Long Breed" FROM dogs; Name Breed Long breed Bruja Rottweiler Rottweiler Chacal Rottweiler Rottweiler Chata Boxer Boxer Jadzia Pointer Pointer Nala Airdale Airdale Nina AMBR American Mixed Breed Registry Rocco MMBNR Mexican Mixed Breed Non-Registered Tyson Rottweiler Rottweiler
DATA new; LENGTH case $20; SET parents; IF parent ='Ina' THEN case='None'; ELSE IF parent ='Chavo' THEN case='Spoiled'; ELSE case='Work'; RUN; PROC PRINT DATA=new; RUN; CASE: What Can Be the Problem Obs Case Parent Name 1 None Ina Nala 2 None Ina Chata 3 None Ina Bruja 4 None Ina Fidonga 5 Spoiled Chavo Rocco 6 Work Cecilia Nina 7 Work Cecilia Jadzia
CASE: What Can Be the Problem cont’d SELECT * FROM new WHERE parent='Ina'; Case Parent Name None Ina Nala None Ina Chata None Ina Bruja None Ina Fidonga
SELECT * FROM new WHERE case='None'; CASE: What Can Be the Problem cont’d select * from new WHERE case='None'; + 22 + 76 +ERROR 22-322: Syntax error, expecting one of the following: + a name, a quoted string, a numeric constant, + a datetime constant, a missing value, (, +, -, + BTRIM, CALCULATED, CASE, EXISTS, INPUT, LOWER, + NOT, PUT, SELECT, SUBSTRING, TRANSLATE, UPPER, + USER, WHEN, ^, ~. +ERROR 76-322: Syntax error, statement will be ignored.
CASE: How to Avoid a Problem SELECT * FROM new (RENAME=(case=education)) WHERE education='None';
Some Differences Between Data Step and SQL Syntax • Reserved word • Format modifiers • Descending sorting
Format Modifiers • What are format modifiers? • Format modifiers in data step • Format modifiers in proc SQL
Format Modifiers in Data Step DATA ndogs; SET dogs; idate=INPUT(dob,date9.); IF _ERROR_=1 THEN PUT _ALL_; RUN; NOTE: Invalid argument to function INPUT at line 196 column 9. name=Chata breed=Boxer dob=UNK idate=. _ERROR_=1 _N_=3 NOTE: Mathematical operations could not be performed at the following places. The results of the operations have been set to missing values. Each place is given by: (Number of times) at (Line):(Column). 1 at 196:9
Format Modifiers in Data Stepcont’d DATA ndogs; SET dogs; idate=INPUT(dob, ?? date9.); IF _ERROR_=1 THEN PUT _ALL_; RUN; NOTE: There were 8 observations read from the data set WORK.DOGS. NOTE: The data set WORK.NDOGS has 8 observations and 4 variables.
Format Modifiers in PROC SQL CREATE TABLE ndogs AS SELECT name, dob, INPUT(dob, date9.) AS idate FORMAT mmddyy8. FROM dogs; +ERROR: Invalid day value NOTE: Table WORK.NDOGS created, with 8 rows and 3 columns.
Format Modifiers in PROC SQLcont’d CREATE TABLE ndogs AS SELECT name, dob, INPUT(dob,? date9.) AS idate FORMAT mddyy8. FROM dogs; NOTE: Table WORK.NDOGS created, with 8 rows and 3 columns.
Some Differences Between Data Step and SQL Syntax • Reserved word • Format modifiers • Descending sorting
Descending Sorting • Data step and PROCS PROC SORT DATA = ndogs; BY DESCENDING idate; RUN; • PROC SQL SELECT * FROM ndogs ORDER BY idate descending;
Descending Sorting cont’d • A little lagniappe (pronounced nańap) SELECT name, dob, INPUT( dob,? date9.) AS idate FORMAT date9. FROM dogs ORDER BY 3 DESC ; Name DOB Idate Bruja 15Aug2000 15AUG2000 Tyson 15Mar2000 15MAR2000 Jadzia 15Feb1999 15FEB1999 Nina 22APR1997 22APR1997 Chacal 29Jan1990 29JAN1990 Nala 11jan1990 11JAN1990 Rocco 08may1980 08MAY1980 Chata UNK .
What is Special in SAS PROC SQL? • The program data vector (PDV) • Some differences between data step and SQL syntax • Order of execution • How good is my code?
SELECT VAR1, VAR2, … FROM data_set [ FULL | LEFT | RIGHT | INNER ] JOIN TABLE2 ON condition 1 AND condition 2 WHERE condition k AND condition l GROUP BY VAR1, VAR…. HAVING condition p AND condition q ORDER BY var1, var2… ; The Order of Execution
PROC SQL NUMBER; SELECT a.breed, INPUT(a.dob,? date9.) AS ndob format date9., COALESE(a.name, b.name) AS name, b.parent FROM dogs a FULL JOIN parents b ON a.name=b.name; The Order of Execution cont’d Breed NDOB Name Parent 1 Rottweiler 15AUG2000 Bruja Ina 2 Rottweiler 29JAN1990 Chacal 3 Boxer Chata Ina 4 . Fidonga Ina 5 Pointer 15FEB1999 Jadzia Cecilia 6 Airdale 11JAN1990 Nala Ina 7 AMBR 22APR1997 Nina Cecilia 8 MMBNR 08MAY1980 Rocco Chavo
SELECT a.breed, INPUT( a.dob, ? date9.) AS ndob FORMAT date9., COALESCE (a.name, b.name) AS name, b.parent FROM dogs a FULL JOIN parents b ON a.name=b.name AND breed='Rottweiler'; The Order of Execution cont’d Breed NDOB Name Parent 1 Rottweiler 15AUG2000 Bruja Ina 2 Rottweiler 29JAN1990 Chacal 3 Boxer Chata Ina 4 . Fidonga Ina 5 Pointer 15FEB1999 Jadzia Cecilia 6 Airdale 11JAN1990 Nala Ina 7 AMBR 22APR1997 Nina Cecilia 8 MMBNR 08MAY1980 Rocco Chavo
SELECTa.breed,b.parent, COALESCE ( a.name, b.name) FROMdogs aFULL JOINparent b ON a.name=b.name WHERE breed='Rottweiler'; The Order of Execution cont’d Breed Parent Rottweiler Ina Bruja Rottweiler Chacal Rottweiler Tyson SELECTa.breed, b.parent, COALESCE(b.name, a.name) FROMdogs a FULL JOINparent b ON a.breed='Rottweiler' WHERE a.name=b.name; Breed Parent Rottweiler Ina Bruja
The Order of Execution cont’d SELECT a.breed, INPUT(a.dob, ? date9.) AS ndob format date9., COALESCE(a.name, b.name) ASname, b.parent FROM dogs aFULL JOINparent b ONa.name=b.name WHEREbreed='Rottweiler' HAVINGndobGT'01jan2000'd; Breed NDOB Name Parent Rottweiler 15AUG2000 Bruja Ina Rottweiler 15MAR2000 Tyson
How Good is My Code? • Does the options used in data step make my code faster? • Is speed important?
How Good is My Code? cont’d • Do data step options make my code faster? data all; merge dogs parent; by name; do i=1 to 100; do j=1 to 100; output; end; end; run; NOTE: There were 8 observations read from the data set WORK.DOGS. NOTE: There were 7 observations read from the data set WORK.PARENT. NOTE: The data set WORK.ALL has 891000 observations and 6 variables.
How Good is My Code? cont’d • Does the statements used to control the size of my input data set make my code faster? proc sql stimer; NOTE: SQL statement used: real time 0.01 seconds cpu time 0.02 seconds