250 likes | 383 Views
Combining Data Sets with SQL. Advanced SAS Programming. Joins. SQL joins refer, in general, to a horizontal combination of tables: i.e.: some form of record matching. This makes SQL joins similar to merging in the data step. A very simple join…. Simple Join. proc sql ;
E N D
Combining Data Sets with SQL Advanced SAS Programming
Joins • SQL joins refer, in general, to a horizontal combination of tables: i.e.: some form of record matching. • This makes SQL joins similar to merging in the data step. • A very simple join…
Simple Join procsql; createtable join1 as selectgoals.*, performance.* from advsas.goals, advsas.performance ; quit; The * operator selects all columns Lists tables for selection
Log • What is the nature of the warning that we see? • Merge expects (requires) variables that we join on to have the same name, SQL has no such requirement.
Consider… procsql; createtable join2 as selectgoals.*, performance.month as month1, performance.sales from advsas.goals, advsas.performance ; quit; • Produces no warning, and 4 variables in the resulting data set. • Should also give some additional notion as to what process SQL undertook in joining these tables.
Cartesian Product • The default joining of multiple data sets listed in the from statement is a full Cartesian product—every record in one table is paired with every other. • Most of the time we want a subset of that product…
More Specific Joins • One way to subset the full product is with a where statement. procsql; createtable join3 as selectgoals.*, performance.sales from advsas.goals, advsas.performance where goals.month eq performance.month ; quit; Select no longer chooses month from performance Subset to those that match on month
Final Product • Result is similar to a data step merge on month…
Inner Join • This is also known as an inner join, which takes rows that match on the specified condition. procsql; createtable join4 as selectgoals.*, performance.sales from advsas.goals join advsas.performance on goals.month eq performance.month ; quit; Join is used as an operator (inner is default) On replaces where as the conditioning clause
Inner Joins vs. Data Step Merge • Inner joins and data step merges are not identical. Consider… procsql; createtable join as selectcrewlist.*, sched.flightnum from advsas.crewlist, advsas.sched where crewlist.empid eq sched.empid ; quit; versus data merge; merge advsas.crewlist advsas.sched; by empid; run;
Results • Inner join takes matches present across all tables, merge preserves all records. • We can get this behavior with merge by using the in= variable
Outer Joins • Outer joins permit records lacking matches into the final table. procsql; createtable joinfull as selectcrewlist.*, sched.flightnum from advsas.crewlist full join advsas.sched on crewlist.empid eq sched.empid ; quit; • Creates the same result as the previous merge.
Left and Right Join • The left join preserves all rows from the first table listed. • Hopefully the right join is self-explanatory… • Consider the following joins and their results.
procsql; createtable joinleft as selectcrewlist.*, sched.flightnum from advsas.crewlist left join advsas.sched on crewlist.empid eq sched.empid ; quit; procsql; createtable joinright as selectcrewlist.*, sched.flightnum from advsas.crewlist right join advsas.sched on crewlist.empid eq sched.empid ; quit; Left & Right Joins Left table had row with no match Right table did not
One-to-Many • SQL behavior is not affected by one-to-many matches. • Consider a merge of the accounts and transaction tables against the possible SQL joins.
One to Many Merge procsortdata=advsas.accounts out=acct; by account_no; run; procsortdata=advsas.transactions out=trans; by account_no; run; data one_many_merge; merge trans acct; by account_no; run;
Inner Join procsql; createtable one_many_inner as selecttransactions.*, fname, lname from advsas.transactions, advsas.accounts where transactions.account_no eq accounts.account_no ; quit;
Full Join procsql; createtable one_many_full as selecttransactions.*, fname, lname from advsas.transactions full join advsas.accounts on transactions.account_no eq accounts.account_no ; quit;
Left Join (transactions on left) procsql; createtable one_many_left as selecttransactions.*, fname, lname from advsas.transactions left join advsas.accounts on transactions.account_no eq accounts.account_no ; quit;
Right Join (transactions on left) procsql; createtable one_many_right as selecttransactions.*, fname, lname from advsas.transactions right join advsas.accounts on transactions.account_no eq accounts.account_no ; quit;
Many-to-Many • The data step does not handle many-to-many merges in what would generally be considered a “useful” manner. • Consider the following merge: data many_to_many; merge advsas.salesstaff advsas.clients; by district_no; run;
Result of many-to-many data step merge (check notes in log). When multiple records have same by value in both data sets, matches one-to-one then one-to-many (see PowerPoint on merge for processing details)
SQL • The SQL join: procsql; createtable many2many as selectsalesstaff.*, company from advsas.salesstaff, advsas.clients where salesstaff.district_no eq clients.district_no ; quit;
Result Cartesian product is formed for all records with a matching key value Is the result different for any of the outer joins?