290 likes | 416 Views
S Q L Chapter Two. Overview. Basic Structure Verifying Statements Specifying Columns Specifying Rows . Introduction SQL is a modular language that uses statements and clauses. . Basic structure of PROC SQL :. PROC SQL ; statement (select)
E N D
Overview • Basic Structure • Verifying Statements • Specifying Columns • Specifying Rows
IntroductionSQL is a modular language that uses statements and clauses.
Basic structure of PROC SQL: PROC SQL; statement (select) clauses (from, where, group by, having, order by); QUIT; Note: place semicolon at the end of the last clause only.
Statements select- specifies the columns to be selected Select statement has the following features: -selects data that meets certain conditions -groups data -specifies an order for the data -formats data -calculates new variables
Clauses from - specifies the tables to be queried where- subsets the data based on a condition - optional group by - classifies the data into groups - optional having- subsets groups of data based on a group condition order by- sorts row by the values of specific columns Note: the order of the clauses are significant.
Overview • Basic Structure • Verifying Statements • Specifying Columns • Specifying Rows
Verifying Statements Two functions that can be used to verify if your statement syntax are: validate- used to check the select statement syntax noexec - checks for invalid syntax in all types of SQL statements
proc sql;validateselect timemile, restpulse, maxpulsefrom project.fitnesswhere timemile gt 7; NOTE: PROC SQL statement has valid syntax. proc sql;validateselect timemile, restpulse, maxpulse,from project.fitnesswhere timemile gt 7; Syntax error, expecting one of the following: a quoted string, !, !!, &... Validate
NoExect proc sql noexec;select timemile, restpulse, maxpulsefrom project.fitnesswhere timemile gt 7; NOTE: Statement not executed due to NOEXEC option.
Features of validate: -tests syntax of query without executing the query -checks the validity of column name -prints error messages for invalid queries -is only used for select statements Features of noexec: -Checks for invalid syntax in all types of SQL statements Contrasting
Overview • Basic Structure • Verifying Statements • Specifying Columns • Specifying Rows
Specifying Columns Objectives -Displaying columns directly from a table -Displaying columns calculated from other columns -Calculating columns using a CASE expression
PROC SQL; SELECT* FROM VITALS; QUIT; PATIENT PULSE TEMP BPS BPD 101 72 98.5 130 88 101 75 98.6 133 92 101 74 98.5 136 90 102 81 99 141 93 102 77 98.7 144 97 102 78 98.7 142 93 103 77 98.3 137 79 103 77 98.5 133 74 103 78 98.6 140 80 103 75 99.2 147 89 104 72 98.8 128 83 104 69 99.1 131 86 Displaying data from a tableTo print all of a table columns in the order that they were stored, use an asterisk in the SELECT statement:
PROC SQL;CREATE TABLE TESTMED ASSELECT PATIENT,CASE ((PATIENT/2 = INT(PATIENT/2)) +(PATIENT = .))WHEN 1 THEN 'Med A' WHEN 0 THEN 'Med B' ELSE 'Error' END AS DOSEGRPLENGTH=5FROM VITALSORDER BY PATIENT;QUIT; PATIENT DOSEGRP 101 Med B 101 Med B 101 Med B 102 Med A 102 Med A 102 Med A 103 Med B 103 Med B 103 Med B 103 Med B 104 Med A 104 Med A 104 Med A Printing Specify ColumnsIf you do not want to print out all columns in a table in the order that they were stored, you can specify the columns to be printed in the order that you want them in the SELECT statement or CASEEXPRESSION in the select statement .
Calculate the proportion of Units form each country CODE: OUTPUT: Calculating ColumnsWe can calculate a new column by using data in an existing column and then naming the new column using the as function.
Calculated columns using SAS Dates Recallfrom previous chapters in our SAS book that dates are stored in a different format when run through SAS. We will then use these dates to calculate new columns.
Example:Calculate the range of dates in a Dailyprices dataset. CODE: OUTPUT:
CODE: OUTPUT: Creating new columnsThe use of CASE expression can be used to create a new column
proc sql;create tablestates asselect state_code, state_name from d2data.state;quit; State_ Obs Code State_Name 99 UT Utah 100 VT Vermont 101 VA Virginia 102 WA Washington 103 WV West Virginia 104 WI Wisconsin 105 WY Wyoming 106 N/A Creating a tableTo create and populate a table with the rows from an SQL query, use create table.
Overview • Basic Structure • Verifying Statements • Specifying Columns • Specifying Rows
Specifying Rows in a table Objectives -Selecting a subset of rows -Removing duplicate rows -Subsetting using where clauses, escape clauses, and calculated values
proc sql;title 'large orders'; select Product_ID, total_retail_price from d2data.order_item where total_retail_price > 1000; quit; Large orders Total Retail Price Product ID For This Product 240200100076 $1,796.00 240400200097 $1,250.40 240100400043 $1,064.00 240200200013 $1,266.00 240300100032 $1,200.20240300300070 $1,514.40 230100700009 $1,687.50 230100700008 $1,542.60 240300300090 $1,561.80 230100700009 $1,136.20 230100200025 $1,103.60 240200100173 $1,937.20 Selecting a subset of rows
Where clauseUse a where to specify a condition that data must fulfill before being selected.CODE: OUTPUT: Where clauses uses common comparisons (lt, gt, eq, etc) and logical operators (OR, Not, And, In, Is Null, ...).
Removing duplications Use distinct keyword to eliminate duplications. CODE (without DISTINCT): CODE (withDISTINCT): OUTPUT:
Escape ClauseThe escape clause allows you to designate a single character that will indicate how proc sql will interpret LIKE wildcards when SAS is searching within a character string. CODE: Example: Selectobservations from a string variable containing an underscore ('_'). OUTPUT:
Subsetting calculated values Since the where clause is evaluated before the select, it's possible for an error to show up since the columns used in the where clause must exist in the table or be derived from an existing column. There are two fixes for this, the first would be repeating the calculation in the where clause. The alternative method would be using CALCULATED keyword to refer to an already calculated column in the select.
proc sql;title 'Lack of profit'; select Product_ID, ((total_retail_price/quantity) - costprice_per_Unit) as profit from d2data.order_item where calculatedprofit < 3;quit;title; Lack of profit Product ID profit 230100500045 0.7 230100500068 0.9 240100100433 1.85 240700200004 2 240200100021 1.5 240100100031 2.4 240700200007 2.9 240100100232 1.9 230100500004 1.85 230100500004 1.85 240700100017 -1.41 Subsetting calculated values
Summary Basic Structure • PROC SQL; statement (select) clauses (from, where, group by, having, order by); QUIT; Verifying Statements • validate - used to check the select statement syntax • noexec- checks for invalid syntax in all types of SQL statements Specifying Columns • Displaying columns directly from a table • Displaying columns calculated from other columns • Calculating columns using a CASE expression SpecifyingRows • Selecting a subset of rows • Removing duplicate rows • Subsetting using where clauses, escape clauses, and calculated values