Indexing Techniques

Indexing Techniques Storage Technology: Topic 4

Indexes • An index on a collection of records speeds up selections on the search key fields. • Any subset of the fields of a record can be the search key for an index on the collection. • An index is a collection of index entries. • Retrieve all entries k* with key value k • Retrieve all entries k* between two key values • Retrieve entries in search key order

Alternatives for Data Entry k* in Index • Three alternatives: • Data record with search key value k • issue : how much data repetition? • Issue: is this simply a fancy file format? • <k, rid of data record with search key value k> • <k, list of rids of data records with search key k> • Our focus: alternative 2. • Examples of indexing techniques: B+ trees, hash-based structures

Index Classification: Clustering • Clustered vs. unclustered: If order of data records is the same as, or ``close to’’, order of data entries, then called clustered index. • At most one independent clustered index. • Cost of retrieving data through index varies greatly based on whether index is clustered or not! Why? • Usually, clustering desired for sorted access.

Clustered vs. Unclustered Index Title: es_f52.fig Creator: /s/transfig-3.1.1/exe/fig2dev Version 3.1 Patchlevel 1 CreationDate: Wed Oct 11 19:11:29 1995

Sparse Clustering • Dense vs. Sparse: If there is at least one data entry per search key value (in some data record), then dense. • Every sparse index is clustered! Title: l3_f1.fig Creator: /s/transfig-3.1.1/exe/fig2dev Version 3.1 Patchlevel 1 CreationDate: Wed Sep 6 17:49:58 1995

Primary/Secondary Indexes • Definition 1: Primary == Clustered • Definition 2: Primary == search key contains primary key of the relation • We will use Definition 2

Tree-Structured Indexing • Tree-structured indexing techniques support both range searches and equality searches • ``Find all students with gpa > 3.0’’ • If data is in sorted file, use binary search. • Simple idea: Create an `index’ file. • Can do binary search on (smaller) index file!

ISAM • Index file may still be quite large. But we can apply the idea repeatedly! • Leaf pages contain data entries.

Comments on ISAM • File creation: Leaf pages allocated sequentially, sorted by search key; then index pages allocated, then space for overflow pages. • Index entries: <search key value, page id>; they `direct’ search for data entries, which are in leaf pages. • Search: Start at root; use key comparisons to go to leaf. Cost log F N ; F = # entries/index pg, N = # leaf pgs • Insert: Find leaf data entry belongs to, and put it there. • Delete: Find and remove from leaf; if empty overflow page, de-allocate. • Static tree structure: inserts/deletes affect only leaf pages.

Example ISAM Tree • Each node can hold 2 entries; no need for `next-leaf-page’ pointers. (Why?)

After Inserting 23*, 48*, 41*, 42* ...

... Then Deleting 42*, 51*, 97* • Note that 51* appears in index levels, but not in leaf!

Indexing Techniques