750 likes | 793 Views
Chapter 4: Introduction to Lookup Techniques. Chapter 4: Introduction to Lookup Techniques. Objectives. Define table lookup. List table lookup techniques. Lookup Values. Table Lookups. lookup. Data Values. Lookup values for a table lookup can be stored in the following: array
E N D
Objectives • Define table lookup. • List table lookup techniques.
Lookup Values Table Lookups lookup Data Values • Lookup values for a table lookup can be stored in the following: • array • hash object • format • data set Lookup techniques include the following: • array subscript value • hash object key value • FORMAT statement, PUT function • MERGE, SET/SET, join
4.01 Multiple Choice Poll • Which of these is an example of a table lookup? • You have the data for January sales in one data set, February sales in a second data set, and March sales in a third. You need to create a report for the entire first quarter. • You want to send birthday cards to employees. The employees’ names and addresses are in one data set and their birthdates are in another. • You need to calculate the amount each customer owes for his purchases. The price per item and the number of items purchased are stored in the same data set.
4.01 Multiple Choice Poll – Correct Answer • Which of these is an example of a table lookup? • You have the data for January sales in one data set, February sales in a second data set, and March sales in a third. You need to create a report for the entire first quarter. • You want to send birthday cards to employees. The employees’ names and addresses are in one data set and their birthdates are in another. • You need to calculate the amount each customer owes for his purchases. The price per item and the number of items purchased are stored in the same data set.
Overview of Table Lookup Techniques • Arrays, hash objects, and formats provide an in-memory lookup table. • The DATA step MERGE statement, multiple SET statements in the DATA step, and SQL procedure joins use lookup values that are stored on disk.
Objectives • Describe arrays as a lookup technique. • Describe hash objects as a lookup technique. • Describe formats as a lookup technique.
4.02 Multiple Answer Poll • Which techniques do you currently use when you perform table lookups with a single data set? • Arrays • Hash object • Formats • None of the above
1 2 3 4 Overview of Arrays ... An array is similar to a numbered row of buckets.
1 2 3 4 Overview of Arrays ... • An array is similar to a numbered row of buckets. • SAS puts a value in a bucket based on the bucket number.
1 2 3 4 Overview of Arrays • An array is similar to a numbered row of buckets. • SAS puts a value in a bucket based on the bucket number. • A value is retrieved from a bucket based on the bucket number.
Overview of Arrays DATAdata-set-name; ARRAYarray-name{ subscript } <$><length> <array-elements> <(initial-value-list)>; < READ statement (s)> new-variable=array-name{subscript-value}; RUN; The ARRAY statement associates variables or initial values to be retrieved using the array name and a subscript value. The assignment statement retrieves values from the array based on the value of the subscript. General form of the ARRAY statement: The READ statement can be the SET, MERGE or INFILE/INPUT statement.
Overview of Arrays data country_info; array Cont_Name{91:96} $ 30 _temporary_ ('North America', ' ', 'Europe', 'Africa', 'Asia', 'Australia/Pacific'); set orion.country; Continent=Cont_Name{Continent_ID}; run; The assignment statement retrieves values from the array based on the value of the subscript. The ARRAY statement associates variables or initial values to be retrieved using the array name and a subscript value. p304d01
Setup for the Poll p304d01 data country_info; array Cont_Name{91:96} $ 30 _temporary_ ('North America', ' ', 'Europe', 'Africa', 'Asia', 'Australia/Pacific'); set orion.country; Continent=Cont_Name{Continent_ID}; run;
4.03 Multiple Choice Poll • In p304d01, how many elements are in the array Cont_name? • 0 • 5 • 6 • unknown
4.03 Multiple Choice Poll – Correct Answer • In p304d01, how many elements are in the array Cont_name? • 0 • 5 • 6 • unknown
Overview of a Hash Object Key Data Data ... A hash object is similar to rows of buckets that are identified by the value of a key.
Overview of a Hash Object Key Data Data ... • A hash object is similar to rows of buckets that are identified by the value of a key. • SAS puts value(s) in the data bucket(s) based on the value(s) in the key bucket.
Overview of a Hash Object Key Data Data • A hash object is similar to rows of buckets that are identified by the value of a key. • SAS puts value(s) in the data bucket(s) based on the value(s) in the key bucket. • Value(s) are retrieved from the data bucket(s) based on the value(s) in the key bucket.
Overview of Hash Objects The syntax within the DOgroup defines and canpopulate the hash object. DATAdata-set-name; < READ statement(s) > IF _N_=1 THEN DO; DECLARE HASH object-name(<attribute:value>); object-name.DEFINEKEY('key-name'); object-name.DEFINEDATA('data-name'); object-name.DEFINEDONE(); END; return-code=object-name.FIND(<key: value>); RUN; The FIND method retrieves the data value based on the key value. General form of the hash object: The READ statement can be the SET, MERGE, or INFILE/INPUT statement.
Overview of Hash Objects The syntax within the DO group defines and populates the hash object. data country_info; length Continent_Name $ 30; if _N_=1 then do; declare hash Cont_Name(dataset:'orion.continent'); Cont_Name.definekey('Continent_ID'); Cont_Name.definedata('Continent_Name'); Cont_Name.definedone(); end; set orion.country; rc=Cont_Name.find(key:Continent_ID); if rc=0; run; The FIND method retrieves the data value based on the key value. p304d02
Setup for the Poll p304d02 data country_info; length Continent_Name $ 30; if _N_=1 then do; declare hash Cont_Name(dataset:'orion.continent'); Cont_Name.definekey('Continent_ID'); Cont_Name.definedata('Continent_Name'); Cont_Name.definedone(); end; set orion.country; rc=Cont_Name.find(key:Continent_ID); if rc=0; run;
4.04 Multiple Choice Poll • In p304d02, how many times do the statements in the DO group execute? • only once • once for every observation in the data set orion.country • once for every observation in the data set orion.continent
4.04 Multiple Choice Poll – Correct Answer • In p304d02, how many times do the statements in the DO group execute? • only once • once for every observation in the data set orion.country • once for every observation in the data set orion.continent
Data Value Label Overview of a Format ... A format is similar to rows of buckets that are identified by the data value.
Data Value Label Overview of a Format ... • A format is similar to rows of buckets that are identified by the data value. • SAS puts data values and label values in the buckets when the format is used in a FORMAT statement, PUT function, or PUT statement.
Data Value Label Overview of a Format • A format is similar to rows of buckets that are identified by the data value. • SAS puts data values and label values in the buckets when the format is used in a FORMAT statement, PUT function, or PUT statement. • SAS uses a binary search on the data valuebucket in order to return the value in the label bucket.
Overview of a Format The FORMAT stepcompiles the formatand stores it on disk. PROC FORMAT; VALUE <$>fmtnamerange-1=label-1 . . . range-n=label-n; RUN; DATAdata-set-name; < READ statement(s)>; new-variable=PUT(variable,fmtname.); RUN; When the PUT function executes, the format is loaded into memory, and a binary search is used to retrieve the format value. General form of the user-defined format: The READ statement can be the SET, MERGE, or INFILE/INPUT statement.
Overview of a Format The FORMAT step compiles the format and stores it on disk. proc format; value Cont_Name 91='North America' 93='Europe' 94='Africa' 95='Asia' 96='Australia/Pacific'; run; data country_info; set orion.country; Continent=put(Continent_ID,Cont_Name.); run; When the PUT function executes, the format is loaded into memory, and a binary search is used to retrieve the format value. p304d03
Objectives • List methods for combining data horizontally. • Use multiple SET statements to combine data horizontally. • Compare methods for combining SAS data sets.
Combining Data Horizontally • DATA step techniques for combining data horizontally include using the following: • MERGE statement • multiple SET statements • UPDATE statement • MODIFY statement In addition, you can use the SQL procedure with an inner or outer join.
4.05 Multiple Answer Poll • Which techniques do you currently use when you perform table lookups with multiple data sets? • MERGE statement • Joins • Multiple SET statements • UPDATE statement • MODIFY statement • None of the above
Overview of Merges and Joins By Value(s) Data Data By Value(s) Data Data The DATA step MERGE and the SQL join operators are similar to multiple stacks of buckets that are referred to by the value of one or more common variables.
Continent_ID DATA Step MERGE Statement DATAdata-set-name; MERGESAS-data-sets; BYvariables; RUN; Continent_ID Continent_ID General form of the DATA step merge: Matches on equal values for like-named variables:
DATA Step MERGE Statement proc sort data=orion.country out=country; by Continent_ID; run; data country_info; merge country orion.continent; by Continent_ID; run; Matches on equal values for like-named variables p304d04
Setup for the Poll p304d04 proc sort data=orion.country out=country; by Continent_ID; run; data country_info; merge country orion.continent; by Continent_ID; run;
4.06 Multiple Choice Poll • In p304d04, if the data set country has seven observations and the data set orion.continent has five observations, what stops the execution of the DATA step? • end of file for work.country, the data set with the most observations • end of file for orion.continent, the last data set listed in the MERGE statement • end of file for the data set that contains the final value of the BY variable Continent_ID
4.06 Multiple Choice Poll – Correct Answer • In p304d04, if the data set country has seven observations and the data set orion.continent has five observations, what stops the execution of the DATA step? • end of file for work.country, the data set with the most observations • end of file for orion.continent, the last data set listed in the MERGE statement • end of file for the data set that contains the final value of the BY variable Continent_ID
The SQL Procedure You can use an SQL procedure inner or outer join to create a SAS data set. General form of the SQL procedure CREATE TABLE statement with an inner join: PROC SQL; CREATETABLESAS-data-setAS SELECTcolumn-1, column-2,… ,column-n FROM table-1, table-2,…,table-n WHERE joining criteria ORDER BY sorting criteria; QUIT; Performs an inner join based on the WHERE criteria
The SQL Procedure proc sql; create table country_info as select country.*, Continent_Name from orion.country, orion.continent where country.Continent_ID= continent.Continent_ID; order by country.Continent_ID; quit; Performs an inner join where the Continent_ID values from both data sets are equal p304d05