1.53k likes | 1.55k Views
Learn about the uses and creation of indexes in SAS, and how they provide direct access to specific observations, yielding faster access, sorted order, and table lookup operations.
E N D
Objectives • Define indexes. • List the uses of indexes. • Use the DATA step to create indexes. • Use PROC DATASETS to create and maintain indexes. • Use PROC SQL to create and maintain indexes.
3.01 Multiple Answer Poll • Do any of the data files that you use have indexes? • Yes, my SAS data sets have indexes. • Yes, I use data from an RDBMS (such as Oracle, Teradata, Sybase, or DB2) that has indexes. • No, none of the data that I use has indexes.
Using Indexes An index is an optional file that you can create for a SAS data file that does the following: • points to observations based on the values of one or more key index variables • provides direct access to specific observations An index locates an observation by value.
Simplified Index File The index file consists of entries that are organized in a tree structure and connected by pointers. Simplified Index
The Purpose of Indexes • Indexes can provide direct access to observations in SAS data sets to accomplish the following: • yield faster access to small subsets (WHERE) • return observations in sorted order (BY) • perform table lookup operations (SET with KEY=) • join observations (PROC SQL) • modify observations (MODIFY with KEY=)
Why Use an Index? data customer14958; set orion.sales_history; where Customer_ID=14958; run; How is data processed if the input data is not indexed?
Reading SAS Data Sets without an Index Input SAS Data ...
Buffers Reading SAS Data Sets without an Index Input SAS Data The number of buffersavailable affects the I/O. Datapages areloaded. ...
Buffers Reading SAS Data Sets without an Index Input SAS Data The WHERE statement selects observations by reading data sequentially. Datapages areloaded. ...
Buffers Reading SAS Data Sets without an Index Input SAS Data The WHERE statement selects observations by reading data sequentially. Datapages areloaded. PDV ...
Buffers Buffers Reading SAS Data Sets without an Index Input SAS Data The WHERE statementselects observations by reading data sequentially. Datapages areloaded. PDV ...
Input SAS Data Buffers OutputSASData Buffers Reading SAS Data Sets without an Index The WHERE statementselects observations by reading data sequentially. Datapages areloaded. PDV
Why Use an Index? data customer14958; set orion.sales_history; where Customer_ID=14958; run; How is data processed if the input data is indexed?
Reading SAS Data Sets with an Index Index Input SAS Data ...
Index Index Reading SAS Data Sets with an Index The index fileis checked. Input SAS Data ...
Buffers Reading SAS Data Sets with an Index The index fileis checked. Index Index Input SAS Data Onlynecessarypages areloaded. ...
Buffers Reading SAS Data Sets with an Index The index fileis checked. Index Index Input SAS Data The WHERE statementselects observations by using direct access. Onlynecessarypages areloaded. ...
Buffers Reading SAS Data Sets with an Index The index fileis checked. Index Index Input SAS Data The WHERE statementselects observations by using direct access. Onlynecessarypages areloaded. PDV ...
Buffers Buffers Reading SAS Data Sets with an Index The index fileis checked. Index Index Input SAS Data The WHERE statementselects observations by using direct access. Onlynecessarypages areloaded. PDV ...
Buffers Input SAS Data OutputSASData Buffers Reading SAS Data Sets with an Index The index fileis checked. Index Index The WHERE statementselects observations by using direct access. Onlynecessarypages areloaded. PDV
Number of Index Buffers (Self-Study) Index Buffer Index Input SAS Data Buffers PDV Buffers OutputSASData The buffer size of a SAS index is the unit of data transfer between the SAS storage device and main memory.
Controlling the Number of Index Buffers (Self-Study) IBUFNO=n | nK | nM | nG | nT You can use the IBUFNO= system option to control the number of index buffers that are simultaneously open in memory.
Buffers OutputSASData Buffers How Is the Index File Checked? The index fileis checked. Index Index Input SAS Data When an index is used,a binary search is doneon the index file. PDV
Using a Binary Search Simplified Index File where Customer_ID=14958; ...
Using a Binary Search Simplified Index File Is 14958 in the top half or the bottom half? where Customer_ID=14958; ...
Using a Binary Search Simplified Index File Is 14958 in the top half or the bottom half? where Customer_ID=14958; ...
Using a Binary Search Simplified Index File where Customer_ID=14958; ...
Using a Binary Search Simplified Index File where Customer_ID=14958;
3.02 Multiple Choice Poll • If a WHERE statement uses an index to retrieve a small subset of data, which of these resources is conserved? • I/O • Disk space • Memory • Programmer time
3.02 Multiple Choice Poll – Correct Answer • If a WHERE statement uses an index to retrieve a small subset of data, which of these resources is conserved? • I/O • Disk space • Memory • Programmer time
Business Scenario • The SAS data setorion.sales_history is often queried with a WHERE statement.
Business Scenario • You need to create three indexes on the most frequently used subsetting columns.
Customer_ID Order_ID Product_ID Product_Group Customer_ID Product_Group SaleID Creating an Index Key variables in orion.sales_history Indexes in the indexfile for orion.sales_history sales_history.sas7bdat sales_history.sas7bndx Directory-based Index File Naming Conventions
Index Terminology There are two types of indexes.
Index Terminology Index options include the following: The concatenation of the values for Order_ID andProduct_ID forms a unique identifier for a row of data.
Index Terminology • Index options include the following: • If there is a large number of missing values for the key variable(s), the NOMISS option can create a smaller index file. • An index created with the NOMISS option is not used for the following processing: • a BY statement • a WHERE expression satisfied by missing values NOMISS cannot be used when you create indexes using PROC SQL.
3.03 Multiple Answer Poll • On which of these indexed variables can you assign the UNIQUE option? • Customer_ID in an orders data set where a customer can place multiple orders • Order_Date in an orders data set • Employee_ID in a data set containing each individual employee and the family members’ names stored in variables Dependent1 –Dependent10 • Product_ID in a data set containing the product identifier and the product description
3.03 Multiple Answer Poll – Correct Answers • On which of these indexed variables can you assign the UNIQUE option? • Customer_ID in an orders data set where a customer can place multiple orders • Order_Date in an orders data set • Employee_ID in a data set containing each individual employee and the family members’ names stored in variables Dependent1 –Dependent10 • Product_ID in a data set containing the product identifier and the product description
Creating Indexes • To create indexes at the same time that you create a data set, use the INDEX= data set option on the output data set. • To create or delete indexes on existing data sets, use one of the following: • DATASETS procedure • SQL procedure
Creating Indexes • When you create the index, do the following: • designate the key variable(s) • specify the UNIQUE and/or the NOMISS index option if appropriate • select a valid SAS name for the index (composite index only) • A data set can have these index features: • multiple simple and composite indexes • character and numeric key variables
Creating an Index with the INDEX= Data Set Option options msglevel=i; data orion.sales_history(index= (Customer_ID Product_Group SaleID=(Order_ID Product_ID)/unique)); set orion.history; Value_Cost=CostPrice_Per_Unit*Quantity; Year_Month=mdy(Month_Num, 15, input(Year_ID,4.)); format Value_Cost dollar12. Year_Month monyy7.; label Value_Cost="Value Cost" Year_Month="Month/Year"; run; dataorion.sales_history; set orion.sales_history; run; p303d01 The following code would delete the indexes:
Creating an Index with the INDEX= Data Set Option SAS-data-file-name (INDEX =(index-specification-1</option> </option> …<index-specification-n</option> </option>>)); General form of the INDEX= data set option: For increased efficiency, use the INDEX= option to create indexes when you initially create a SAS data set.
11 options msglevel=i; • 12 data orion.sales_history(index= • 13 (Customer_ID Product_Group • 14 SaleID=(Order_ID • 15 Product_ID)/unique)); • set orion.sales_history; • 17 run; • NOTE: There were 1500 observations read from the data set ORION.SALES_HISTORY. • NOTE: The data set ORION.SALES_HISTORY has 1500 observations and 22 variables. • NOTE: Composite index SaleID has been defined. • NOTE: Simple index Product_Group has been defined. • NOTE: Simple index Customer_ID has been defined. Viewing Information about Indexes OPTIONS MSGLEVEL=N | I; To display information in the log concerning index creation or index usage, change the value of the MSGLEVEL= system option from its default value of N to I. General form of the MSGLEVEL= system option:
proc datasets library=orion nolist; modify sales_history; index delete Customer_ID Product_Group SaleID; quit; Managing Indexes with PROC DATASETS options msglevel=n; proc datasets library=orion nolist; modify sales_history; index create Customer_ID; index create Product_Group; index create SaleID=(Order_ID Product_ID)/unique; quit; p303d02 The following code would delete the indexes: