360 likes | 672 Views
Module 5: SAS/SQL. 15 July 2009. Introduction - What is PROC SQL. Proc SQL is the SAS implementation of SQL Proc SQL is a powerful SAS procedure that combines the functionality of the SAS data step with the SQL language Proc SQL can sort, subset, merge and summarize data – all at once
E N D
Module 5: SAS/SQL 15 July 2009
Introduction - What is PROC SQL • Proc SQL is the SAS implementation of SQL • Proc SQL is a powerful SAS procedure that combines the functionality of the SAS data step with the SQL language • Proc SQL can sort, subset, merge and summarize data – all at once • Proc SQL can combine standard SQL functions with virtually all SAS functions • Proc SQL can work remotely with RDBMS such as Oracle
PROC SQL – What can do? • To perform a query – Using SELECT statement. • To save queried result into SAS dataset – Using CREATE TABLE statement • To save the query itself – Using CREATE VIEW statement • To sort dataset • To merge more than one datasets in a number of ways • To import dataset from Oracle Clinical to SAS • To enter new records into a SAS dataset • To modify/ edit the SAS dataset
PROC SQL - Why • The Advantage of using SQL • Combined functionality • Faster for smaller tables • SQL code is more portable for non-SAS applications • Not require presorting • Not require common variable names to join on. (need same type , length)
Performing Query – SELECT Statement • It is used to perform a query. It does not create any dataset. • The simplest SQL code, need 3 statements • By default, it will print the resultant query, use NOPRINT option to suppress this feature • Begin with PROC SQL, end with QUIT; not RUN; • Need at least one SELECT… FROM statement
Performing Query – SELECT Statement To select all the variables use ‘*’ after SELECT statement PROC SQL; SELECT * FROM VITALS; QUIT;
Performing Query – SELECT Statement To select only particular variable(s) write down the variable names after SELECT statement. Variable names should be separated by commas. PROCSQL; SELECT Patient, pulse FROM VITALS; QUIT;
Performing Query – SELECT Statement To select only distinct observations and to delete duplicate observations. PROCSQL; SELECT DISTINCT Patient FROM VITALS; QUIT;
Ordering/Sorting Query Results PROCSQL ; SELECT * FROM Vitals ORDER BY date; QUIT; Sorting by Date • SELECT * means we select all variables from dataset VITALS • Put ORDER BY after FROM.
Subsetting: - Character searching in WHERE Print observations with name containing ‘J’. PROCSQL; SELECT * FROM vitals WHERE Name CONTAINS 'J'; QUIT; • Always put WHERE after FROM • CONTAINS in WHERE statement only for character variables
Subsetting- Character searching in WHERE Print observations with name containing ‘o’ in between. PROCSQL; SELECT * FROM vitals WHERE Name LIKE ‘%o%'; QUIT; • LIKE in WHERE statement only for character variables
Creating New Data • In SELECT, the results of a query are converted to an output object (printing). • Query results can also be stored as data. • The CREATE TABLE statement creates a table with the results of a query. • The CREATE VIEW statement stores the query itself as a view. Either way, the data identified in the query can beused in later SQL statements or in other SAS steps.
Creating New Data - Create Table PROCSQL; CREATE TABLE bp AS SELECT patient, date, pulse FROM Vitals WHERE temp>98.5; QUIT; CREATE TABLE … AS… Statement Creates a New table from an existing table. PROCSQL; CREATE TABLE bp ASSELECT * FROM Vitals WHERE temp>98.5; QUIT; These statements will copy all the variables to the new dataset
Creating New Data - Create Table We can also assign different variable name, Label, Length, and format name PROCSQL; CREATE TABLE bp AS SELECT patient AS Patient LABEL='Subject number' LENGTH =5, date AS Date LABEL='Date of Expt' FORMAT=WORDDATE8., pulse, temp FROM Vitals WHERE temp>98.5; QUIT;
Creating New Data - Create View PROC SQL; CREATEVIEW bp AS SELECT patient, date, pulse, temp FROM Vitals; WHERE temp>98.5 QUIT; • First step-creating a view,no output is produced. • When a table is created, the query is executed and the resulting data is stored in a file. When a view is created, the query itself is stored in the file. The data is not accessed at all in the process of creating a view.
Source variable New Variable Case Logic - reassigning/recategorize PROCSQL; CREATE TABLE BP AS SELECT Patient, Pulse, CASE Patient WHEN101THEN 'Male' WHEN102THEN 'Female' WHEN103THEN 'Female' ELSE 'Male' ENDAS Gender FROM Vitals; QUIT; • The order of each statement is important • CASE …END AS should in between SELECT and FROM • Use WHEN … THEN ELSE… to redefine variables • New variable GENDER is created from PATIENT.
Combining Datasets: Joins InnerJoin Full Join If a and b; If a or b; Right Join Left Join If b; If a;
Combining Datasets: Joins Dataset: Dosing
Combining Datasets: Joins Dataset: Vitals
Join Tables (Merge datasets) - Inner Join: Using WHERE PROCSQL; CREATETABLE new AS SELECT dosing.patient, dosing.date, dosing.med, vitals.pulse, vitals.temp FROM dosing, vitals WHERE dosing.patient=vitals.patient AND dosing.date=vitals.date; QUIT; • No prior sorting required – one advantage over DATA MERGE • Use comma (,) to separate two datasets in FROM • Without WHERE, all possible combinations of rows from each tables is produced, all columns are included
Join Tables (Merge datasets) - Left Joins using ON PROCSQL; CREATETABLE new1 AS SELECT dosing.patient, dosing.date, dosing.med, vitals.pulse, vitals.temp FROM dosing LEFT JOIN vitals ON dosing.patient=vitals.patient AND dosing.date=vitals.date; QUIT; Resultant dataset will contain all & only those observations which comes from DOSING dataset.
Join Tables (Merge datasets) - Right Joins using ON PROCSQL; CREATE TABLE new1 AS SELECT dosing.patient, dosing.date, dosing.med, vitals.pulse, vitals.temp FROM dosing RIGHT JOIN vitals ON dosing.patient=vitals.patient AND dosing.date=vitals.date; QUIT; Resultant dataset will contain all & only those observations which comes from VITALS dataset.
Join Tables (Merge datasets) - Full Joins using ON PROCSQL; CREATE TABLE new1 AS SELECT dosing.patient, dosing.date, dosing.med, vitals.pulse, vitals.temp FROM dosing FULL JOIN vitals ON dosing.patient=vitals.patient AND dosing.date=vitals.date; QUIT; Resultant dataset will contain all observation if they come from at least one of the datasets.
SQL Functions • PROC SQL supports almost all the functions available to the SAS DATA step that can be used in a proc sql select statement • Common Functions: • COUNT • DISTINCT • MAX • MIN • SUM • AVG • VAR • STD • STDERR • NMISS • RANGE • SUBSTR • LENGTH • UPPER • LOWER • CONCAT • ROUND • MOD
PROC SQL functions PROCSQL; SELECT avg(Age) AS mean, std(Age) AS sd, min(Age) AS min, max(Age) AS max, count(Age) AS count, N (Age) AS Count FROM sashelp.class; quit;
PROC SQL functions PROCSQL; SELECT sex, avg(Age) AS mean, std(Age) AS sd, min(Age) AS min, max(Age) AS max, count(Age) AS count, N (Age) AS Count FROM sashelp.class; GROUP BY Sex quit;
Editing Data – Deleting rows and Dropping columns /*Deleting rows*/ PROCSQL; DELETE FROM class WHERE age le 13; QUIT; /*Droping variables*/ PROCSQL; CREATE TABLE New (DROP=age) AS SELECT * FROM Class; QUIT; • Deleting columns can be done in SELECT or in DROP on created table
Importing data from Excel to SAS PROCIMPORT DATAFILE="c:\myfile.xls" OUT=sasuser.accounts DBMS=EXCEL2000 REPLACE; SHEET='Prices'; GETNAMES=yes; RUN;
Finale • PROC SQL is an additional tool with its own strengths and challenges • Many times it is just another way to do the same thing • BUT other times it might be much more efficient and may cut down the number of sorts, data steps & procedures or lines of code required.