File Processing : Index and Hash. 2004, Spring, Pusan National University, Ki-Joune Li.
File Processing : Index and Hash
Spatiotemporal Database Laboratory, Pusan National University
2004, Spring
Ki-Joune Li

What is an index?
• Index in a book
  • Index : Keyword → Pages
  • Without an index
    • Exhaustive search : too expensive
• Index for a file or database
  • A function or mechanism
  • Index : Predicate → Blocks (block numbers on hard disk)
  • e.g. find student records where student.GPA > 4.0

Data Retrieval Time
• Data retrieval on disk : two phases
  • 1st phase : Search with a condition (predicate)
  • 2nd phase : Data access
• Data access time depends on
  • File structure
  • Disk placement
  • Clustering, etc.
[Figure: Search Condition → Search (1st phase) → { Block# } → Database on Disk (2nd phase)]

Blocking Factor
• Blocking factor (Bf)
  • Number of records in a block
• Blocking factor and number of disk accesses
  • N_D = N_record / Bf
• By maximizing the blocking factor, we reduce the number of disk accesses

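The relation N_D = N_record / Bf can be checked with a few lines of Python. This is a minimal sketch, not part of the original slides: the function names, the 4096-byte block and the 200-byte record are illustrative assumptions, and the division is rounded up because a partly filled block still costs one access.

```python
def blocking_factor(block_bytes: int, record_bytes: int) -> int:
    """Records that fit in one block (assumes fixed-length, unspanned records)."""
    if record_bytes <= 0 or block_bytes < record_bytes:
        raise ValueError("record must be positive and fit in a block")
    return block_bytes // record_bytes


def disk_accesses(n_records: int, bf: int) -> int:
    """N_D = N_record / Bf, rounded up to whole blocks."""
    if bf <= 0:
        raise ValueError("blocking factor must be positive")
    return (n_records + bf - 1) // bf


# Hypothetical numbers: 4096-byte blocks, 200-byte records, 10,000 records.
bf = blocking_factor(4096, 200)
print(bf, disk_accesses(10_000, bf))
```

Doubling Bf in this sketch halves the number of block accesses for a full scan, which is the point the slide makes.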
How to Accelerate Phase 1?
• Phase 1 can be accelerated by an index or by a hash
• Index vs. Hash
  • Index : a type of data structure
    • Needs additional data structures
  • Hash : a type of mechanism
    • May not need any additional data structure (not exactly true)

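The contrast between the two approaches can be sketched as follows. This is an illustration under assumptions, not the lecture's own code: `hash_block`, `SortedIndex` and the value of `N_BLOCKS` are hypothetical names and numbers. The hash version computes a block number directly from the key, while the index version has to keep an extra sorted table and search it.

```python
import bisect

N_BLOCKS = 8  # hypothetical number of blocks in the file


def hash_block(key: int, n_blocks: int = N_BLOCKS) -> int:
    """Hash: the block number is computed from the key, no table is stored."""
    return key % n_blocks


class SortedIndex:
    """Index: an additional data structure mapping keys to block numbers."""

    def __init__(self, entries):
        # entries: iterable of (key, block_number) pairs
        self._entries = sorted(entries)
        self._keys = [key for key, _ in self._entries]

    def lookup(self, key):
        """Binary search over the sorted keys; returns None if absent."""
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._entries[i][1]
        return None


index = SortedIndex([(30, 27), (14, 17), (40, 26)])
print(hash_block(30), index.lookup(40), index.lookup(31))
```

The sketch ignores collisions and overflow handling, which is one reason the "no additional data structure" claim for hashing is only approximately true.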
A Simple Idea on Index
• Mapping table from keywords to block numbers
  • Inverted file
• Why is an inverted file better than nothing?
• If the table is too large (to fit in main memory)
  • It has to be stored on disk
  • Disk access for index access

Keyword   Block#
Juliet
Romeo     B26
Hamlet    B22
…         …
Carmen    B212

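A mapping table of this kind can be sketched with a dictionary. The example below is a minimal illustration, assuming each block is described by the keywords it contains; the block contents are hypothetical and only reuse the keyword and block pairs shown in the table, and `build_inverted_file` and `find_blocks` are names chosen for this sketch.

```python
from collections import defaultdict


def build_inverted_file(blocks):
    """Invert a block -> keywords mapping into keyword -> block numbers."""
    index = defaultdict(list)
    for block_no, keywords in blocks.items():
        for keyword in keywords:
            index[keyword].append(block_no)
    return dict(index)


def find_blocks(index, keyword):
    """Phase 1 of retrieval: return the block numbers to read for a keyword."""
    return index.get(keyword, [])


# Hypothetical file contents, keyed by block number.
blocks = {"B22": ["Hamlet"], "B26": ["Romeo"], "B212": ["Carmen"]}
inverted = build_inverted_file(blocks)
print(find_blocks(inverted, "Romeo"), find_blocks(inverted, "Tosca"))
```

Here the whole table sits in memory. Once it no longer fits, each lookup itself costs disk accesses, which is what motivates organizing the table as a tree.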
Searching Algorithms and Index
• A good way to accelerate searching
  • Tree : O(log n)
• Reorganize the inverted file into a tree
  • Binary search tree : branching factor = 2
• Tree in memory space vs. in disk space
  • Memory space : number of comparisons
  • Disk space : number of block accesses
[Figure: binary search tree whose nodes hold (key, block#) pairs: (30, b27), (14, b17), (40, b26), (34, b17), (55, b26)]

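A binary search tree over (key, block#) pairs can be sketched as below. This is an illustrative sketch rather than the lecture's implementation: the `Node` class and the helper names are assumptions, and the keys are inserted in the order they appear in the figure. The search returns the number of nodes visited, which corresponds to comparisons in memory and to block accesses if every node lived in its own disk block.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int
    block: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root, key, block):
    """Insert (key, block) into an unbalanced binary search tree."""
    if root is None:
        return Node(key, block)
    if key < root.key:
        root.left = insert(root.left, key, block)
    elif key > root.key:
        root.right = insert(root.right, key, block)
    else:
        root.block = block  # existing key: overwrite its block number
    return root


def search(root, key):
    """Return (block, nodes_visited); block is None when the key is absent."""
    visited = 0
    node = root
    while node is not None:
        visited += 1
        if key == node.key:
            return node.block, visited
        node = node.left if key < node.key else node.right
    return None, visited


root = None
for k, b in [(30, "b27"), (14, "b17"), (40, "b26"), (34, "b17"), (55, "b26")]:
    root = insert(root, k, b)
print(search(root, 34))
```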
Paged Tree : m-way search tree
• How to determine m ?
  • One node : one disk page
  • e.g. when 1 disk page is 4 K bytes
    • 4 + 4m + 8(m-1) ≤ 4096, so m = 341 (the largest integer that fits)
  • Very fat tree
[Figure: m-way search tree nodes, each laid out as: number of delimiters, then (delimiter, block number) entries. Root: 34 | 57, b27 | 103, b28 | … | 343, b14. Children: 44 | 1, b29 | … | 54, b21 and 32 | 58, b17 | … | 96, b127]

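The page-size arithmetic can be reproduced in a few lines. The sketch below assumes one reading of the formula that the slide does not spell out: a 4-byte delimiter count, m child pointers of 4 bytes each, and m-1 delimiter entries of 8 bytes each. The function names and default sizes are illustrative.

```python
def node_bytes(m, count_bytes=4, pointer_bytes=4, entry_bytes=8):
    """Size of one node: count + m child pointers + (m - 1) delimiter entries."""
    return count_bytes + pointer_bytes * m + entry_bytes * (m - 1)


def max_fanout(page_bytes, count_bytes=4, pointer_bytes=4, entry_bytes=8):
    """Largest m such that node_bytes(m) <= page_bytes."""
    return (page_bytes - count_bytes + entry_bytes) // (pointer_bytes + entry_bytes)


def max_keys(m, height):
    """Keys held by a completely full m-way tree with the given height."""
    return m ** height - 1


m = max_fanout(4096)
print(m, node_bytes(m), max_keys(m, 3))
```

With a fanout in the hundreds, a tree only three levels deep can already hold tens of millions of keys, which is what "very fat tree" refers to.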
Problem of m-Way Search Tree
• m-way search tree
  • Search performance : determined by the height
  • Not balanced
    • Average : O(log n)
    • Worst case : n / Bf → O(n)
  • Height : determined by insertion order
    • e.g. insertion in ascending order
• How to make it balanced ?
  • Balanced m-way search tree : B-tree
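The effect of insertion order on height is easy to observe. As a simplification, the sketch below uses a binary tree (m = 2) instead of a general m-way tree. The same degeneration applies to an unbalanced m-way tree, except that each node absorbs up to m-1 keys before a new level is needed. The function name and the node representation are assumptions made for this example.

```python
import random


def tree_height(keys):
    """Height of an unbalanced binary search tree built in the given order."""
    root = None  # each node is a list: [key, left_child, right_child]
    height = 0
    for key in keys:
        if root is None:
            root = [key, None, None]
            height = 1
            continue
        node, depth = root, 1
        while True:
            if key == node[0]:
                break  # duplicate key: nothing to insert
            side = 1 if key < node[0] else 2
            if node[side] is None:
                node[side] = [key, None, None]
                height = max(height, depth + 1)
                break
            node = node[side]
            depth += 1
    return height


ascending = list(range(1, 1001))
shuffled = ascending[:]
random.Random(0).shuffle(shuffled)  # fixed seed so the run is repeatable

print(tree_height(ascending))  # degenerates into a chain, one level per key
print(tree_height(shuffled))   # far shallower for a random insertion order
```

A B-tree avoids the first case by splitting full nodes and growing at the root, so its height does not depend on the order in which keys arrive.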