Indexing By: Arnold Mesa
Indexing: You can think of an index to a file as being like the catalogue of a library.
There are two kinds of indices:
• Ordered Indices - based on a sorted ordering of the values.
• Hash Indices - based on a uniform distribution of values across a range of buckets. The bucket a value is assigned to is determined by a hash function.
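The difference between the two kinds can be sketched in a few lines of Python. This is only an illustration: the sample entries, the bucket count, and the toy hash function are assumptions, not part of the slides. The ordered index supports range queries because its keys are sorted; the hash index goes straight to one bucket.

```python
import bisect
from collections import defaultdict

# Hypothetical (search key, record pointer) entries.
entries = [("Hilton", 1), ("Westin", 4), ("Marriot", 6), ("The Ritz", 8)]

# Ordered index: entries kept sorted by key and searched with binary search.
ordered_index = sorted(entries)
ordered_keys = [key for key, _ in ordered_index]

def ordered_lookup(key):
    i = bisect.bisect_left(ordered_keys, key)
    if i < len(ordered_keys) and ordered_keys[i] == key:
        return ordered_index[i][1]
    return None

def ordered_range(lo, hi):
    """Keys k with lo <= k <= hi; cheap because the keys are sorted."""
    start = bisect.bisect_left(ordered_keys, lo)
    end = bisect.bisect_right(ordered_keys, hi)
    return ordered_keys[start:end]

# Hash index: a hash function assigns each key to one of NUM_BUCKETS buckets.
NUM_BUCKETS = 4  # assumed bucket count

def bucket_of(key):
    # Toy hash function (an assumption): sum of character codes mod bucket count.
    return sum(ord(c) for c in key) % NUM_BUCKETS

buckets = defaultdict(list)
for key, ptr in entries:
    buckets[bucket_of(key)].append((key, ptr))

def hash_lookup(key):
    # Only the one bucket chosen by the hash function is searched.
    for k, ptr in buckets[bucket_of(key)]:
        if k == key:
            return ptr
    return None
```

Both lookups return the same pointer for a key that exists; only the ordered index can answer a question like "all keys between I and N" without examining every entry.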
Key Concepts
• Access Types - types of access that are supported efficiently
• Access Time - time it takes to access a particular data item
• Insertion Time - time it takes to insert a data item
• Deletion Time - time it takes to delete a data item
• Space Overhead - additional space occupied by an index structure
There are two kinds of ordered indices:
• Dense Index - An index record appears for every search-key value in the file. The index record contains the search-key value and a pointer to the first data record with that value. The rest of the records with the same search-key value are stored sequentially after the first record.
• Sparse Index - An index record appears for only some of the search-key values, so there are fewer index records. As with the dense index, each index record contains a search-key value and a pointer to the first data record with that value.
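The two definitions above can be sketched as follows, using the nine records from the slides. The tuple layout (id, hotel, code), the function names, and the "every second value" rule for the sparse index are assumptions made for illustration; a position in the list stands in for a record pointer.

```python
# Records in file order; the meaning of the first and third fields is assumed.
DATA_FILE = [
    (234, "Hotel Sofitel", "A-212"),
    (321, "Hilton", "B-321"),
    (389, "Hilton", "C-002"),
    (396, "Hilton", "A-322"),
    (112, "Westin", "C-034"),
    (253, "Westin", "B-219"),
    (501, "Marriot", "B-069"),
    (532, "Marriot", "C-304"),
    (221, "The Ritz", "A-007"),
]

def build_dense_index(data_file):
    """One entry per distinct search-key value -> position of its first record."""
    index = {}
    for pos, (_, hotel, _) in enumerate(data_file):
        index.setdefault(hotel, pos)  # keep only the first occurrence
    return index

def build_sparse_index(data_file, every=2):
    """Keep an entry for only every `every`-th distinct search-key value."""
    dense_entries = list(build_dense_index(data_file).items())
    return dict(dense_entries[::every])
```

With `every=2`, the sparse index keeps Hotel Sofitel, Westin, and The Ritz, while the dense index has an entry for all five hotels.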
Dense Index (one entry per hotel): Hotel Sofitel, Hilton, Westin, Marriot, The Ritz
Data file:
234  Hotel Sofitel  A-212
321  Hilton         B-321
389  Hilton         C-002
396  Hilton         A-322
112  Westin         C-034
253  Westin         B-219
501  Marriot        B-069
532  Marriot        C-304
221  The Ritz       A-007
Sparse Index (entries for only some hotels, over the same nine-record data file): Hotel Sofitel, Westin, The Ritz
Suppose we want to find the Marriot #532 using the sparse index (Hotel Sofitel, Westin, The Ritz) over the same data file. Marriot has no index entry of its own, so we follow the pointer of the last index entry that precedes it in file order (Westin) and scan the file sequentially from there until we reach Marriot 532.
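A minimal sketch of a sparse-index search follows. It assumes the file is sorted on the search key, which the binary search relies on; the slides' file is grouped by hotel but not in alphabetical order, so the sketch sorts it first and its sparse index (Hilton, Marriot, Westin) differs from the one in the slides. The names `sparse_lookup`, `DATA_FILE`, and `SPARSE_INDEX` are hypothetical.

```python
import bisect

# Records from the slides as (id, hotel, code); the field meanings are assumed.
RECORDS = [
    (234, "Hotel Sofitel", "A-212"), (321, "Hilton", "B-321"),
    (389, "Hilton", "C-002"), (396, "Hilton", "A-322"),
    (112, "Westin", "C-034"), (253, "Westin", "B-219"),
    (501, "Marriot", "B-069"), (532, "Marriot", "C-304"),
    (221, "The Ritz", "A-007"),
]

# Assumption: the sequential file is sorted on the search key (hotel name).
DATA_FILE = sorted(RECORDS, key=lambda r: r[1])

# Sparse index: (search-key value, position of its first record) for every
# other distinct hotel in the sorted file.
SPARSE_INDEX = [("Hilton", 0), ("Marriot", 4), ("Westin", 7)]

def sparse_lookup(hotel, record_id):
    keys = [k for k, _ in SPARSE_INDEX]
    # Find the last index entry whose key is <= the key we are looking for.
    i = bisect.bisect_right(keys, hotel) - 1
    if i < 0:
        return None  # key sorts before the first record in the file
    pos = SPARSE_INDEX[i][1]
    # Scan sequentially until we pass the key we want.
    while pos < len(DATA_FILE) and DATA_FILE[pos][1] <= hotel:
        if DATA_FILE[pos][1] == hotel and DATA_FILE[pos][0] == record_id:
            return DATA_FILE[pos]
        pos += 1
    return None
```

Looking up The Ritz, which has no index entry here, starts at the Marriot entry and scans forward past both Marriot records before finding it.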
Efficiency Issues
• Even if we use a sparse index, the index itself may become too large for efficient processing
• If an index is small enough to be kept in main memory, the search time is low
• If the index is so large that it must be kept on disk, a search may require several disk block reads
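To get a feel for "several disk block reads", here is a rough back-of-the-envelope sketch. It assumes the index is searched by binary search over its blocks, which needs about log2(b) block reads for an index of b blocks. The entry count and block capacity in the usage note are made-up numbers, and the function names are hypothetical.

```python
def index_blocks(num_entries, entries_per_block):
    """Number of disk blocks the index occupies (ceiling division)."""
    return -(-num_entries // entries_per_block)

def worst_case_block_reads(num_blocks):
    """Worst-case probes of a binary search over num_blocks sorted blocks.

    Equals floor(log2(num_blocks)) + 1, computed with integer arithmetic.
    """
    return num_blocks.bit_length()
```

For example, an index of 1,000,000 entries at 100 entries per block occupies 10,000 blocks, and a binary search over it may read up to 14 of them.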
How to deal ...
• With a large index, we treat the index itself like a sequential file and construct a sparse index on the primary index.
[Figure: the same data file with a second, sparse index built on top of the primary index. Index entries shown: Hotel Sofitel, Hilton, Westin, Marriot, The Ritz]
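A two-level lookup along these lines might look like the sketch below. The inner index contents, the block size of two entries, and all names are assumptions for illustration; the inner index is assumed to be sorted on the search key.

```python
import bisect

# Hypothetical inner (primary) index: sorted (search key, record position) pairs.
INNER_INDEX = [
    ("Hilton", 0), ("Hotel Sofitel", 3), ("Marriot", 4),
    ("The Ritz", 6), ("Westin", 7),
]
ENTRIES_PER_BLOCK = 2  # assumed number of index entries per disk block

def split_into_blocks(entries, size):
    return [entries[i:i + size] for i in range(0, len(entries), size)]

INNER_BLOCKS = split_into_blocks(INNER_INDEX, ENTRIES_PER_BLOCK)

# Outer sparse index: the first key of each inner-index block.
OUTER_INDEX = [block[0][0] for block in INNER_BLOCKS]

def two_level_lookup(key):
    # Step 1: the outer index tells us which single inner block to read.
    b = bisect.bisect_right(OUTER_INDEX, key) - 1
    if b < 0:
        return None
    # Step 2: search only that inner block for the key.
    for k, pos in INNER_BLOCKS[b]:
        if k == key:
            return pos
    return None
```

If the outer index fits in main memory, each lookup reads just one inner-index block before going to the data file.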
Is this looking familiar?
• Remember B+-trees
• A B+-tree is said to be of order m, where m is a number of the designer's choosing.
• Each internal node (other than the root) has between ⌈m/2⌉ and m children.
• All data is stored at the leaf level.
• All leaves are at the same depth.
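The shape rules above can be checked mechanically. The sketch below is not a B+-tree implementation; it only validates fan-out and leaf depth on a toy tree in which an internal node is a Python list of children and a leaf is a tuple of keys. That representation and the function names are assumptions. It also uses the standard rule that a non-leaf root needs at least two children.

```python
import math

def child_bounds(m):
    """Minimum and maximum number of children of a non-root internal node."""
    return math.ceil(m / 2), m

def leaf_depths(node, m, is_root=True, depth=0):
    """Return the set of depths at which leaves occur.

    Raises ValueError if an internal node breaks the fan-out rule.
    """
    if isinstance(node, tuple):  # a leaf holds keys, not children
        return {depth}
    lo, hi = child_bounds(m)
    if is_root:
        lo = 2  # a non-leaf root may have as few as two children
    if not lo <= len(node) <= hi:
        raise ValueError(f"node at depth {depth} has {len(node)} children")
    depths = set()
    for child in node:
        depths |= leaf_depths(child, m, is_root=False, depth=depth + 1)
    return depths

def has_valid_shape(tree, m):
    """True if fan-out is within bounds and all leaves share one depth."""
    try:
        return len(leaf_depths(tree, m)) == 1
    except ValueError:
        return False
```

For m = 4, a root with two internal children that have two and three leaves respectively passes the check, while a tree with leaves at different depths does not.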