1 / 13

File organization and Indexing

Learn about indexing and hashing methods to speed up data retrieval, key concepts, types of indices, index file organization, and multi-level indexing for efficient disk access in file management systems.

danap
Download Presentation

File organization and Indexing

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. File organization and Indexing • Cost estimation • Basic Concepts • Ordered Indices Indexing and Hashing

  2. Estimating Costs • For simplicity we estimate the cost of an operation by counting the number of blocks that are read or written to disk. • We ignore the possibility of blocked access which could significantly lower the cost of I/O. • We assume that each relation is stored in a separate file with B blocks and R records per block. Indexing and Hashing

  3. Basic Concepts • Indexing is used to speed up access to desired data. • E.g. author catalog in library • A search key is an attribute or set of attributes used to look up records in a file. Unrelated to keys in the db schema. • An index file consists of records called index entries. • An index entry for key k may consist of • An actual data record (with search key value k) • A pair (k, rid) where rid is a pointer to the actual data record • A pair (k, bid) where bid is a pointer to a bucket of record pointers • Index files are typically much smaller than the original file if the actual data records are in a separate file. • If the index contains the data records, there is a single file with a special organization. Indexing and Hashing

  4. Types of Indices • The records in a file may be unordered or ordered sequentially by some search key. • A file whose records are unordered is called a heap file. • If an index contains the actual data records or the records are sorted by search key in a separate file, the index is called clustering (otherwise non-clustering). • In an ordered index, index entries are sorted on the search key value. Other index structures include trees and hash tables. • A primary index is an index on a set of fields that includes the primary key. Any other index is a secondary index. Indexing and Hashing

  5. Dense Index Files • Dense index – index record appears for every search-key value in the file. Indexing and Hashing

  6. Sparse Index Files • A clustering index may be sparse. • Index records for only some search-key values. • To locate a record with search-key value k we: • Find index record with largest search-key value < k • Search file sequentially starting at the record to which the index record points • Less space and less maintenance overhead for insertions and deletions. • Generally slower than dense index for locating records. • Good tradeoff: sparse index with an index entry for every block in file, corresponding to least search-key value in the block. Indexing and Hashing

  7. Example of Sparse Index Files Indexing and Hashing

  8. Multilevel Index • If an index does not fit in memory, access becomes expensive. • To reduce number of disk accesses to index records, treat the index kept on disk as a sequential file and construct a sparse index on it. • outer index – a sparse index on main index • inner index – the main index file • If even outer index is too large to fit in main memory, yet another level of index can be created, and so on. • Indices at all levels must be updated on insertion or deletion from the file. Indexing and Hashing

  9. Multilevel Index (Cont.) inner index outer index Data Block 0 Index Block 0   Data Block 1  Index Block 1     Indexing and Hashing

  10. Non-clustering Indices • Frequently, one wants to find all the records whose values in a certain field satisfy some condition, and the file is not ordered on the field. • Example 1: In the account database stored sequentially by account number, we may want to find all accounts in a particular branch. • Example 2: As above, but where we want to find all accounts with a specified balance or range of balances. • We can have a non-clustering index with an index record for each search-key value. The index record points to a bucket that contains pointers to all the actual records with that particular search-key value. Indexing and Hashing

  11. Secondary Index on balance field of account Indexing and Hashing

  12. Clustering and Non-clustering • Non-clustering indices have to be dense. • Indices offer substantial benefits when searching for records. • When a file is modified, every index on the file must be updated. Updating indices imposes overhead on database modification. • Sequential scan using clustering index is efficient, but a sequential scan using a non-clustering index is expensive – each record access may fetch a new block from disk. Indexing and Hashing

  13. File organization in detail • http://ecomputernotes.com/database-system/rdbms/types-of-file-organization • Note : read till Indexes Sequential Access Method (ISAM) only. • Refer book for Hashing and collision as I marked in the class. Indexing and Hashing

More Related