CSC 213 – Large Scale Programming Lecture 11: Indexed Files
Dictionaries in the Real World • Large databases often need to span many machines • Search terms are split across the machines • Work of updating & searching is split between machines • Database is far too large for any single machine • If you think about it, this is incredibly common • Where?
Splitting Keys From Values • In the real world, we often have many indices • Each index stores simple positions telling us where to find values • Values can be searched for in multiple ways
Index & Data Files • Split information into two (or more) files • Data file stores the data in fixed-size records • Index files contain search terms & data locations • Fixed-size records are usually used in the data file • Each record uses exactly that much space • Extra space is wasted if the value is smaller • But this limits data size; a record cannot get more space • Makes it far easier to reuse space & rebuild the index
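A minimal sketch of a fixed-size data file, assuming a hypothetical record layout of a 16-character name field followed by a 4-byte `int` value. It uses `java.io.RandomAccessFile`; the class and method names are illustrative only. Because every record is the same size, record *n* always starts at byte `n * RECORD_SIZE`, short names waste padding, and long names are cut off.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical fixed-size record layout: a 16-character name field
// (2 bytes per char) followed by a 4-byte int value.
final class FixedRecords {
    static final int NAME_CHARS = 16;
    static final int RECORD_SIZE = NAME_CHARS * 2 + Integer.BYTES;   // 36 bytes

    // Record n always starts at byte n * RECORD_SIZE.
    static long positionOf(long slot) {
        return slot * RECORD_SIZE;
    }

    // Names shorter than the field are padded with '\0' (the wasted space);
    // longer names are cut off, since a record can never grow.
    static void write(RandomAccessFile data, long slot, String name, int value)
            throws IOException {
        data.seek(positionOf(slot));
        for (int i = 0; i < NAME_CHARS; i++) {
            data.writeChar(i < name.length() ? name.charAt(i) : '\0');
        }
        data.writeInt(value);
    }

    static String readName(RandomAccessFile data, long slot) throws IOException {
        data.seek(positionOf(slot));
        StringBuilder name = new StringBuilder();
        for (int i = 0; i < NAME_CHARS; i++) {
            char c = data.readChar();
            if (c == '\0') break;            // padding starts here
            name.append(c);
        }
        return name.toString();
    }

    static int readValue(RandomAccessFile data, long slot) throws IOException {
        data.seek(positionOf(slot) + NAME_CHARS * 2);   // skip the name field
        return data.readInt();
    }
}
```

Since a slot's position is just a multiplication, a freed slot can be overwritten in place and the index rebuilt by scanning the file record by record.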
Index File Format • No standard format – depends on the type of data • Often variable-sized, but this is not a requirement • Each entry in the index file begins with the exact search term • Followed by the position of the matching data • As a result, index entries are often packed tightly together • Can read indexes at the start of program execution • Reasonably assumes the index file is smaller than the data file • Changes to the data file are written immediately, however • When the program starts, do NOT read the data file
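Since there is no standard format, the sketch below picks one possible layout as an assumption: entries packed back to back, each being the exact search term (written with `DataOutputStream.writeUTF`) followed by the record's byte position (`writeLong`). The whole index is loaded into a `HashMap` at start-up without touching the data file; the class name `IndexFile` is hypothetical.

```java
import java.io.*;
import java.util.HashMap;
import java.util.Map;

// Assumed index file layout: [term (UTF)] [position (long)] repeated, no separators.
final class IndexFile {
    // Writes every entry of the in-memory index (e.g. when the program finishes).
    static void save(Map<String, Long> index, File file) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)))) {
            for (Map.Entry<String, Long> e : index.entrySet()) {
                out.writeUTF(e.getKey());     // exact search term first
                out.writeLong(e.getValue());  // then where its record lives
            }
        }
    }

    // Reads the whole index into memory at program start; the data file is not read.
    static Map<String, Long> load(File file) throws IOException {
        Map<String, Long> index = new HashMap<>();
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            while (true) {
                String term;
                try {
                    term = in.readUTF();
                } catch (EOFException end) {
                    break;                    // no more entries
                }
                index.put(term, in.readLong());
            }
        }
        return index;
    }
}
```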
Indexed Files • Enables splitting search terms across computers • An alphabetical split makes searches faster across many servers (Figure: search terms divided alphabetically among servers A-C, D-E, F-H, I-P, Q-R, S-T, U-X, Y-Z)
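To make the alphabetical split concrete, here is a small sketch that routes a search term to the server owning its first letter, using the eight ranges from the figure. The server labels are hypothetical; `TreeMap.floorEntry` finds the last range that starts at or before the term's first letter.

```java
import java.util.Map;
import java.util.TreeMap;

final class AlphabeticalRouter {
    // First letter of each range -> (hypothetical) server responsible for it.
    private final TreeMap<Character, String> rangeStarts = new TreeMap<>();

    AlphabeticalRouter() {
        String[] ranges = {"A-C", "D-E", "F-H", "I-P", "Q-R", "S-T", "U-X", "Y-Z"};
        for (String range : ranges) {
            rangeStarts.put(range.charAt(0), "server " + range);
        }
    }

    // Only the owning server needs to search its share of the index.
    String serverFor(String term) {
        if (term.isEmpty()) {
            throw new IllegalArgumentException("empty search term");
        }
        char first = Character.toUpperCase(term.charAt(0));
        if (first < 'A' || first > 'Z') {
            throw new IllegalArgumentException("term must start with a letter: " + term);
        }
        // Last range starting at or before this letter owns the term.
        Map.Entry<Character, String> owner = rangeStarts.floorEntry(first);
        return owner.getValue();
    }
}
```

Usage: `new AlphabeticalRouter().serverFor("Ford")` returns `"server F-H"`.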
Indexed Files • Enables splitting search terms across computers • Create indexes for different types of searching (Figure: separate indexes for song name and song length)
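One way to picture several indexes over one data file: each index maps its own kind of search term to record positions. In this sketch (class and method names are hypothetical), names are assumed unique and use a `HashMap` for exact lookups, while lengths may repeat and use a `TreeMap` of position lists so range queries via `subMap` come back in length order.

```java
import java.util.*;

// Two in-memory indexes over the same (hypothetical) song data file.
final class SongIndexes {
    private final Map<String, Long> byName = new HashMap<>();
    // Many songs can share a length, so each length maps to a list of positions.
    private final TreeMap<Integer, List<Long>> byLength = new TreeMap<>();

    // Every index is updated to point at the same record position.
    void add(String name, int lengthSeconds, long position) {
        byName.put(name, position);
        byLength.computeIfAbsent(lengthSeconds, k -> new ArrayList<>()).add(position);
    }

    // Exact-match search by name; null if the song is not indexed.
    Long findByName(String name) {
        return byName.get(name);
    }

    // Range search: positions of every song lasting between min and max seconds.
    List<Long> findByLength(int min, int max) {
        List<Long> result = new ArrayList<>();
        for (List<Long> positions : byLength.subMap(min, true, max, true).values()) {
            result.addAll(positions);
        }
        return result;
    }
}
```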
How Does This Work? • Storing positions makes index files simple to use • Look in the index structure to find the position of the data in the file • With this position, seek directly to the specific record • Create an instance & initialize it by reading the data from the file
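The three lookup steps can be sketched as follows, assuming a hypothetical `Stock` record stored as a 16-character name plus an `int`. The index yields a byte position, `RandomAccessFile.seek` jumps straight there, and the object is built from that one record. A `writeTo` helper is included only so the sketch can be run end to end.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Map;

final class IndexedLookup {
    static final int NAME_CHARS = 16;   // assumed name field width

    // Hypothetical record type stored in the data file.
    static final class Stock {
        final String name;
        final int value;

        Stock(String name, int value) {
            this.name = name;
            this.value = value;
        }

        // Create an instance by reading one fixed-size record at `position`.
        static Stock readFrom(RandomAccessFile data, long position) throws IOException {
            data.seek(position);
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < NAME_CHARS; i++) {
                char c = data.readChar();       // read every char to reach the int
                if (c != '\0') sb.append(c);    // skip padding
            }
            return new Stock(sb.toString(), data.readInt());
        }

        // Write this record at `position`, padding the name with '\0'.
        void writeTo(RandomAccessFile data, long position) throws IOException {
            data.seek(position);
            for (int i = 0; i < NAME_CHARS; i++) {
                data.writeChar(i < name.length() ? name.charAt(i) : '\0');
            }
            data.writeInt(value);
        }
    }

    // Step 1: index gives the position. Step 2: seek. Step 3: build the object.
    static Stock find(Map<String, Long> index, RandomAccessFile data, String key)
            throws IOException {
        Long position = index.get(key);
        return position == null ? null : Stock.readFrom(data, position);
    }
}
```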
Starting with Indexed Files (Figure: data file with records IBM 106, AT & T 23, Ford 2; index entries IBM, T, F point to those records)
How Does This Work? • Adding new records takes only a few steps • Add space for the record by calling setLength on the data file • Update the index structure(s) to include the new record • Records in the data file are updated at each change
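A sketch of those steps with `RandomAccessFile.setLength`, again assuming the hypothetical 16-character-name-plus-`int` layout. Note that the `RandomAccessFile` documentation leaves the contents of a newly extended region undefined, so the sketch writes the record immediately after reserving its slot. Only the in-memory index changes here; saving the index file is left for later.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Map;

final class AddRecord {
    static final int NAME_CHARS = 16;                     // assumed name field width
    static final int RECORD_SIZE = NAME_CHARS * 2 + Integer.BYTES;

    // Returns the byte position of the newly added record.
    static long add(RandomAccessFile data, Map<String, Long> index,
                    String key, String name, int value) throws IOException {
        long position = data.length();            // new slot goes at the end
        data.setLength(position + RECORD_SIZE);   // reserve space; contents undefined
        index.put(key, position);                 // update the in-memory index
        data.seek(position);                      // write the record right away
        for (int i = 0; i < NAME_CHARS; i++) {
            data.writeChar(i < name.length() ? name.charAt(i) : '\0');
        }
        data.writeInt(value);
        return position;
    }
}
```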
Adding New Data To The Files (Figure: records IBM 106, AT & T 23, Ford 2 with index entries IBM, T, F; a new blank record holding 0 is added to the data file)
Adding New Data To The Files (Figure: the new record now holds Citibank -2, and a new index entry C points to it)
How Does This Work? • Removing records is even easier • To prevent a record from being used, remove its entries from the indexes • Do NOT update the index file(s) until the program completes • Mark the record in the data file with an impossible magic number
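Removal can be sketched as two small actions: drop the key from the in-memory index (the index file is rewritten only when the program completes) and stamp the record's value field with a number real data can never hold. The layout and the choice of `Integer.MIN_VALUE` as the magic marker are assumptions for illustration; a real data set would need a value outside its legal range.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Map;

final class RemoveRecord {
    static final int NAME_CHARS = 16;                // assumed name field width
    static final int DELETED = Integer.MIN_VALUE;    // hypothetical "impossible" value

    // Drops the key from the in-memory index and marks its record as dead.
    // The index file itself is left untouched until the program finishes.
    static boolean remove(RandomAccessFile data, Map<String, Long> index, String key)
            throws IOException {
        Long position = index.remove(key);
        if (position == null) {
            return false;                            // key was not indexed
        }
        data.seek(position + NAME_CHARS * 2);        // skip to the value field
        data.writeInt(DELETED);
        return true;
    }

    // Lets a later pass (rebuilding the index, reusing slots) spot dead records.
    static boolean isDeleted(RandomAccessFile data, long position) throws IOException {
        data.seek(position + NAME_CHARS * 2);
        return data.readInt() == DELETED;
    }
}
```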
Removing Data As We Go (Figure: records IBM 106, AT & T 23, Ford 2, Citibank -2 with index entries IBM, T, F, C)
Removing Data As We Go (Figure: the Ford record's value is overwritten with 0 and its index entry F is removed, shown as Ø; Citibank -2 and index entry C remain)
For Next Lecture • Weekly assignment still available online • Still due Wednesday at 5PM • Ask me questions if you have trouble on a problem • Read Section 9.1 in the textbook about the Map ADT • How do we look up data? • What other ADTs are out there? • How could they relate to today's lecture?