160 likes | 175 Views
Learn about various index structures that speed up record retrieval in file systems. Primary and secondary indexes, ordered and unordered indexes, B-trees, and more are discussed.
E N D
Index Structures for Files • Indexes speed up the retrieval of records under certain search conditions • Indexes called secondary access paths do not affect the placement of records • They provide fast access for searches based on the indexing field • Some types of indexes work only in conjunction with a certain file organization
Index Structures for Files • Single level ordered indexes allow us to search for a record by searching the index file using binary search • The index is typically defined on a single field of the file called the indexing field • There are several kinds of ordered indexes • A primary index is specified on the ordering key field of an ordered file of records
Index Structures for Files • If the ordering field is not unique, a clustering index can be used • We can specify a secondary index on any non-ordering field of the file • A primary index is an ordered file whose records have two fields • Key field • Pointer to a disk block
Index Structures for Files • There exists one index entry for each block in the data file • Note that this only works for ordered files. Why? • The first record in each block is called the anchor record of the block or the block anchor • A primary index is an example of a nondense index since we don’t have a pointer to every record in the data file
Index Structures for Files • Insertion of records can be handled with an unordered overflow file and periodic maintenance • Deletion of records can be handled with deletion markers and periodic maintenance • If the ordering field is not unique, we use a clustering index • It is common to reserve a block for each value of the ordering field
Index Structures for Files • We can also create a secondary index on any non-ordering field of the data file • Since the data file is not ordered on the index field, we must have a pointer to each record (not to blocks) • This is an example of a dense index • Larger than primary index file since it is dense
Index Structures for Files • Note that the index file is an ordered file • We can create a primary index on this file with block anchoring to speed up access to this file • Repeat as necessary • This leads to the idea of a multilevel index as illustrate in figure 5.6
Index Structures for Files • B-trees and B+ trees are specialized tree structures • A tree is formed of nodes • Each node in the tree has one parent node and zero or more child nodes - except for the root node which has no parents • A node that has no children is called a leaf node
Index Structures for Files • The level of the root node is zero and the level of every other node is one more than that of its parent • A subtree of a node consists of the node and all of its descendant nodes (its children, grandchildren, etc.) • Each node of a tree used as an indexing structure can contain data and several pointers
Index Structures for Files • A search tree is a specialized type of tree used to guide a search • A search tree of order p is a tree with at most p-1 search values and p pointers to sub-trees • <P1,K1,P2,K2,…,Pq-1,Kq-1,Pq> • Each value in the subtree pointed to by P1 is less than K1 and each value in the subtree pointed to by P2 is greater than K1 (true also for the other subtrees
Index Structures for Files • To find a value in a search tree search the root node, and if the value is not found, search in the appropriate subtree. If there is no subtree (we are at a leaf node) the value doesn’t exist in the tree • Insertion and deletion in a search tree will usually cause a search tree to be unbalanced
Index Structures for Files • Record deletion may leave some nodes nearly empty, wasting space • The B-tree, a constrained search tree, solves these two problems • A B-tree of order p has the following structure for internal nodes • <P1, <K1, Pr1>, P2, <K2, Pr2>, …,<Kq-1,Prq-1 >,Pq>
Index Structures for Files • Each Pi is a tree pointer • Each Pri is a data pointer • K1<K2<…<Kq-1 • The values in the subtree are between the values of the two neighboring key values • Each node has at most p tree pointers • Each node (except the root) has at least ceiling(p/2) tree pointers • A node with q tree pointers has q-1 search key field values
Index Structures for Files • If there are empty spaces in a node, insertion is simple • Otherwise, the node is split, and the middle value is promoted (moved up to the parent node). This in turn, may cause that node to be split • Deletion is (relatively!) easy if there are more than p/2 values after the deletion
Index Structures for Files • Replace the deleted value with the next value if the node is non-leaf, then delete the value • If there are not enough values in the node, combine with left or right neighbor and redistribute • If we don’t have extras in the right or left neighbor, combine with the parent
Index Structures for Files • B+ trees are a variation of B-trees • Each data value occurs in a leaf node along with a pointer to the associated record • The leaf nodes are chained together to provide fast sequential access to the data records