Indexed Files
Part One - Simple Indexes
All of this material is stolen from Dr. Foster's CSCI325 Course Notes.
Non-Indexed Relative Files
• Usage
  • Direct file manipulation is required when the data will not fit into memory.
• Minor Problems:
  • Binary searching a file is a bit more difficult than binary searching an array.
  • Sorting a big file is difficult and slow.
• Major Problems:
  • Time: disk operations take a long time!
  • Deleting a record from the middle of a file is more difficult than deleting an element from the middle of an array.
  • Adding must be done at end-of-file.
Indexed Files
• An Indexed File is actually two separate, but related, binary files:
  • the Index File
  • the Data File
• The Index File contains information on how to find specific records in the Data File.
• Our primary objective is fast searching.
  • Adding records gets easier too.
Example
This is a simple indexed file. The index-to-data relationship is 1:1.
Can we do this with just one file?
Index File
• Key field
  • uses a unique identifier
  • same idea as in databases
• Arranged for fast searching
  • e.g., sorted by Key for binary searching
• Notice that the Index File is much smaller than the Data File. The Index File must fit into memory.
What if the Index will not fit into memory?
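As a rough illustration of "sorted by Key for binary searching", here is a minimal C++ sketch of an in-memory index. The `IndexEntry` layout, the 32-bit integer key, and the names `byKey` and `findRrn` are assumptions made for this example; the slides do not specify them.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical index entry: a unique key plus the relative record
// number (RRN) of the matching record in the data file.
struct IndexEntry {
    std::int32_t key;
    std::int32_t rrn;
};

// Orders entries by key; used both to sort and to search the index.
bool byKey(const IndexEntry& a, const IndexEntry& b) { return a.key < b.key; }

// Binary search of an index that is already sorted by key.
// Returns the RRN stored for `key`, or -1 if the key is not present.
std::int32_t findRrn(const std::vector<IndexEntry>& index, std::int32_t key) {
    assert(std::is_sorted(index.begin(), index.end(), byKey));
    IndexEntry probe{key, 0};
    auto it = std::lower_bound(index.begin(), index.end(), probe, byKey);
    if (it != index.end() && it->key == key) return it->rrn;
    return -1;
}
```

Each entry holds only a key and an RRN, which is why the index can be far smaller than the data file it describes.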
Retrieval Algorithm
1. Read the Index into an array in memory
2. Search the array for the Key
3. File Position = array[index].RRN * sizeof(data record)
4. SeekG (datafile, File Position)
5. Read record from datafile
Does step one need to happen for every search? Best search algorithm?
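The retrieval steps can be sketched in C++ roughly as follows. This is only one possible reading of the slide: the fixed-size `DataRecord` layout, the file paths, and the function names `loadIndex` and `fetchRecord` are hypothetical, and binary search is assumed as the search in step 2.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

struct IndexEntry { std::int32_t key; std::int32_t rrn; };   // assumed layout
struct DataRecord { std::int32_t key; char name[28]; };      // assumed fixed-size record

bool byKey(const IndexEntry& a, const IndexEntry& b) { return a.key < b.key; }

// Step 1: read the whole index file into an array in memory.
std::vector<IndexEntry> loadIndex(const std::string& indexPath) {
    std::vector<IndexEntry> index;
    std::ifstream in(indexPath, std::ios::binary);
    IndexEntry e;
    while (in.read(reinterpret_cast<char*>(&e), sizeof e)) index.push_back(e);
    return index;
}

// Steps 2-5: search the array, compute the file position, seek, read.
// Returns false if the key is not in the index or the read fails.
bool fetchRecord(const std::vector<IndexEntry>& index, std::ifstream& data,
                 std::int32_t key, DataRecord& out) {
    IndexEntry probe{key, 0};
    auto it = std::lower_bound(index.begin(), index.end(), probe, byKey);
    if (it == index.end() || it->key != key) return false;
    assert(it->rrn >= 0);
    // File Position = RRN * sizeof(data record)
    std::streamoff pos = static_cast<std::streamoff>(it->rrn) *
                         static_cast<std::streamoff>(sizeof(DataRecord));
    data.clear();      // reset any EOF state left by an earlier read
    data.seekg(pos);
    return static_cast<bool>(data.read(reinterpret_cast<char*>(&out), sizeof out));
}
```

In this sketch `loadIndex` is a separate function, so a program can call it once at startup and reuse the array for every later call to `fetchRecord`; each search then costs a single seek and read on the data file.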
Add Record
1. Write new record to end of data file
2. Add Key and RRN to end of index array
3. Sort the index array
4. Write index array to index file
How do you know the RRN of the new record? Does step 4 need to happen for every Add?
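A minimal C++ sketch of the add steps, under assumptions: the record layout and the names `recordCount`, `addRecord`, and `saveIndex` are invented for this example. It takes the new record's RRN to be the number of records already in the data file, which follows from records being fixed-size and always appended at end-of-file.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

struct IndexEntry { std::int32_t key; std::int32_t rrn; };   // assumed layout
struct DataRecord { std::int32_t key; char name[28]; };      // assumed fixed-size record

bool byKey(const IndexEntry& a, const IndexEntry& b) { return a.key < b.key; }

// Number of records currently in the data file (0 if it does not exist yet).
std::int32_t recordCount(const std::string& dataPath) {
    std::ifstream probe(dataPath, std::ios::binary | std::ios::ate);
    if (!probe) return 0;
    std::streamoff bytes = probe.tellg();
    return static_cast<std::int32_t>(bytes / static_cast<std::streamoff>(sizeof(DataRecord)));
}

// Steps 1-3: append the record, add (key, RRN) to the index array, re-sort.
// Returns the RRN given to the new record, or -1 if the write failed.
std::int32_t addRecord(const std::string& dataPath, const DataRecord& rec,
                       std::vector<IndexEntry>& index) {
    std::int32_t rrn = recordCount(dataPath);   // new record goes after the last one
    assert(rrn >= 0);
    std::ofstream data(dataPath, std::ios::binary | std::ios::app);
    data.write(reinterpret_cast<const char*>(&rec), sizeof rec);
    data.flush();
    if (!data) return -1;
    index.push_back(IndexEntry{rec.key, rrn});
    std::sort(index.begin(), index.end(), byKey);
    return rrn;
}

// Step 4: overwrite the index file with the in-memory index array.
bool saveIndex(const std::string& indexPath, const std::vector<IndexEntry>& index) {
    std::ofstream out(indexPath, std::ios::binary | std::ios::trunc);
    out.write(reinterpret_cast<const char*>(index.data()),
              static_cast<std::streamsize>(index.size() * sizeof(IndexEntry)));
    return static_cast<bool>(out);
}
```

Keeping `saveIndex` separate from `addRecord` is a design choice of this sketch: it lets a program add several records and write the index file once afterwards, at the risk of a stale index file if the program stops before saving.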
Delete Record
When?
• Locate the appropriate key in the index array
• Move all subsequent array elements up one space
• Mark record in Data File for deletion
• Clean up the Data File
  • create a new file with only non-deleted records
  • adjust RRNs in the Index Array
• Write new index array into the Index File
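The delete and clean-up steps might look like the following C++ sketch. The one-byte `deleted` flag inside `DataRecord` is an assumption (the slides do not say how a record is marked), as are the function names `deleteRecord` and `compactDataFile`. Writing the adjusted index back to the index file is left to the caller.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

struct IndexEntry { std::int32_t key; std::int32_t rrn; };             // assumed layout
struct DataRecord { std::int32_t key; char deleted; char name[27]; };  // assumed deletion flag

bool byKey(const IndexEntry& a, const IndexEntry& b) { return a.key < b.key; }

// Locate the key, mark its record as deleted in the data file, and remove
// the entry from the index array (erase shifts later elements up one space).
bool deleteRecord(std::vector<IndexEntry>& index, std::fstream& data, std::int32_t key) {
    IndexEntry probe{key, 0};
    auto it = std::lower_bound(index.begin(), index.end(), probe, byKey);
    if (it == index.end() || it->key != key) return false;
    std::streamoff pos = static_cast<std::streamoff>(it->rrn) *
                             static_cast<std::streamoff>(sizeof(DataRecord)) +
                         static_cast<std::streamoff>(offsetof(DataRecord, deleted));
    const char flag = 1;
    data.clear();
    data.seekp(pos);
    data.write(&flag, 1);
    data.flush();
    index.erase(it);
    return static_cast<bool>(data);
}

// Clean-up: copy only non-deleted records to a new file and adjust the
// RRNs in the index array to match each record's new position.
void compactDataFile(const std::string& oldPath, const std::string& newPath,
                     std::vector<IndexEntry>& index) {
    std::ifstream in(oldPath, std::ios::binary);
    std::ofstream out(newPath, std::ios::binary | std::ios::trunc);
    std::vector<std::int32_t> newRrn;   // newRrn[old RRN] = new RRN, or -1 if deleted
    std::int32_t next = 0;
    DataRecord rec;
    while (in.read(reinterpret_cast<char*>(&rec), sizeof rec)) {
        if (rec.deleted) { newRrn.push_back(-1); continue; }
        out.write(reinterpret_cast<const char*>(&rec), sizeof rec);
        newRrn.push_back(next++);
    }
    for (IndexEntry& e : index) {
        assert(e.rrn >= 0 && static_cast<std::size_t>(e.rrn) < newRrn.size());
        e.rrn = newRrn[static_cast<std::size_t>(e.rrn)];
    }
}
```

Splitting the work this way keeps an individual delete cheap (one small write) and leaves the expensive file rewrite to be run whenever the program chooses.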
Analysis - Indexed v. Non-Indexed
• Space
  • Indexed files use a big chunk of main memory for the index array
  • One more (small) file
• Time
  • Searching an array in memory is much faster than searching a file
  • It is not the comparisons that are slow, it is the disk operations
  • Deletion is time consuming, but it is a rare operation
Limitations?
• Adding Records
• Deleting Records
• Searching for Records