1 / 29

Chapter 7 Indexing

File Structures by Folk, Zoellick, and Ricarrdi. Chapter 7 Indexing. Chapter Objectives. Index files. Operations Required to Maintain an Index File. Primary keys. Secondary keys. Index. a tool for finding records in a file consists of: key field field on which the index is searched

stillman
Download Presentation

Chapter 7 Indexing

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. File Structures by Folk, Zoellick, and Ricarrdi Chapter 7Indexing

  2. Chapter Objectives • Index files. • Operations Required to Maintain an Index File. • Primary keys. • Secondary keys.

  3. Index • a tool for finding records in a file • consists of: • key field • field on which the index is searched • reference (address or RRN) field • tells where to find the data file record associated with a particular key.

  4. Examples of an Index • book index • usually at the end of the book • arranged alphabetically by topic • The index in a library (an on-line catalog) allows you to locate items by an author, by a title, or by a call number. • photo thumbnails • usually represents a link to the actual photo • much smaller file, can be loaded quickly • actual photo takes much longer to load • if index was actual photos, would take long to load

  5. Example: Index in Databases • University uses an index file to keep track of its courses. • The data file consists of the following fields in each record: • Department • Title • Professor • Student List • Room & Time

  6. Example: Primary key • Primary key could be • Department • not specific enough • Course Number • not unique • Professor • not unique • Room & Time • possible • classes aren’t identified this way • Department + Course Number • Obvious?

  7. Index file • used to provide rapid access to individual records in the data file via the keys • Example index file consists of the following fields: • key (e.g. CIS402) • reference (address) =address of the corresponding record in the data file

  8. Operations on an Indexed File • Create (when data file is created) • Load into memory (whole file, if possible and prudent) • Write updated file to permanent storage • Record(s) added to data file • Record(s) deleted from data file • Update record(s) in data file • Searches

  9. Creating Files of Data Load Index • Create files • index file • data record file • via • buffer I/O • an array. Writing Back Index File • Can be part of the close operation for the index file • close function in index object can write the buffer/array to the disk before closing file

  10. Record addition • Adding a new data record to the data file requires adding a new record to the index file • If the index file is sorted: • adding a new record may require rearranging the records in this file. • depends upon index file representation in memory • if sort necessary, easily done if the indices are in main memory

  11. Record deletion • Deletion of a data record requires deletion of the corresponding index record. • Can space in data file be reclaimed? • Difficult, as with index file organization all data records are pinned • a pinned data record is one that has a reference to its address in an index file • Other consequences • Resorting difficulty • Solution: Sort the file via the indices

  12. Record Updating • Two categories of updates: • modification of key value • re-ordering of the index file might be required • two possible situations • modifying key reorders file • see below • modification of non-key value • might still require reordering of records in the data file. (WHY?) • size of data record might increase, requiring moving it to space that can hold it • must reset index for that record

  13. Indexes too large for Memory • kept on the secondary storage • disadvantages • time consumption • searching the index file • requires disk accesses instead of just memory accesses • rearranging indexes • requires disk accesses

  14. Solutions to Index Files in 2ndary storage • If the index file is too large to be kept in main memory than the following alternative organizations should be considered: • a hashed organization (if access speed is very important) • probably won’t get to this • a tree structured organization, or a multilevel index such as a B-tree • get ready

  15. Pros of a simple index file • allows for use of binary search • sorting and maintaining an index is much easier than for a data file • true if index entries are much smaller than data records, • if data records are pinned, can rearrange keys without moving data records • apply them to multiple simple indexes...

  16. Indexing with Multiple Key Access • unique primary key often used as a search keyword. • Example primary key • CIS 402 • What if you’d like to include the prof in the search? • Two keys: Course & Prof • Could also be: • Course & Time • Location & Time (?)

  17. Secondary key • A secondary key is a key for which multiple records may exist in the data file. • Example: • Sorting an Excel sheet using two fields (e.g. name & section) • A professor teaches more than one class

  18. Secondary Index File • create for the possible secondary indexes. • secondary keys can be shared • primary keys were unique • Example: • Professor Spiegel 2ndary keys: • Primary keys containing this prof: • CIS 136 • CIS 235 • CIS 402 • Can access those courses via the secondary key • What if course has multiple sections?

  19. Record Addition • Adding a record to the data file likely requires adding a record to the secondary index file. • Costs are similar to the cost of adding a record in the primary index file. • records might have to be shifted • indexes may have to be rearranged

  20. Record Deletion • must remove all references to that record in the file system. • search for primary key in primary index file • remove index • search in secondary index file • for the primary key of the record to be deleted • remove index from the secondary index file. • what if secondary keys are maintained? • secondary key refers to primary key • primary key will have been deleted, and will not exist • if we consider this possiblility, don’t have to delete secondary key • pitfalls?

  21. Record Updating • There are three possible situations: • secondary key altered • may have to rearrange the secondary key index so it stays in sorted order • primary key altered • big impact on the primary key index • in the secondary key index only need to update the affected primary key field • confined to non-key fields • all updates that do not affect either the primary or secondary key fields do not affect the secondary key index, even if the update is substantial. • recall, can affect primary index, since that refers to location in data file

  22. Retrieving Data with Multiple Secondary Keys • Example: • All courses taught by Spiegel or Gordon • Requires two searches • searches produce a list of courses by providing primary keys. • Spiegel: CIS136, CIS235, CIS402 • Gordon: CIS425, CIS520, CIS243

  23. Boolean AND in searches • Example: • Search for courses: • taught by Spiegel • located in Lytle Hall • Courses found are in intersection of • courses taught by Spiegel • courses offered in Lytle Hall

  24. Boolean OR searches • Example: • Search for courses: • taught by Spiegel • located in Lytle Hall • Courses found are in union of • courses taught by Spiegel • courses offered in Lytle Hall

  25. Cons of the Current Secondary Index Structure • index file has to be rearranged every time a new record is added to the file. • for duplicate secondary keys, secondary key field is repeated for each entry.

  26. Improvements to the secondary index key structure • Solution 1 • Allow for multiple primary keys to be associated with a single secondary key by allocating a primary key list (STL vector is best; why?) for each secondary key entry. • Solves the problem of sorting each time when an new entry is added. • According to text: Suffers from internal fragmentation due to fixed nature of list, and the number of allocated entries in the array may prove too small. • STL (or Java) vector fixes this: How?

  27. Improvements to the secondary index key structure • Solution 2 • Create an inverted list of indexes. Have each secondary key point to a list of primary key references associated with it. • This method eliminates most of the problems associated with maintaining a secondary index file. • Which solution is better?

  28. Selective Index • contains keys for only a portion of the records in the data file. • fitting certain criteria • e.g. • Courses offered after 12 noon • states with a ‘G’ in their name

  29. Binding • takes place when a key is associated with a particular physical record in the data file. • takes place: • during the preparation of the data file and indexes • allows faster access • later on during program execution • secondary keys can be bound to primaries, or to actual addresses • advantage: quicker access • disadvantage: if record edited/deleted

More Related