1 / 24

Lecture 5: Record Storage and Primary File Organizations

Lecture 5: Record Storage and Primary File Organizations. Storage Devices Storage of Databases Operations on Files Primary vs. Secondary File Organizations Heap Files Sorted Files Hashing. Storage Devices. Computer Storage Medium (Hierarchy) Factors: cost, capacity, speed

ryanadan
Download Presentation

Lecture 5: Record Storage and Primary File Organizations

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Lecture 5: Record Storage and Primary File Organizations • Storage Devices • Storage of Databases • Operations on Files • Primary vs. Secondary File Organizations • Heap Files • Sorted Files • Hashing

  2. Storage Devices • Computer Storage Medium (Hierarchy) • Factors: cost, capacity, speed • Primary Storage – data processed directly by the CPU; main memory, cache memory • Secondary (on-line) Storage - data must first be copied into primary storage for processing; magnetic disks • Secondary (off-line) Storage - optical disks (direct access), magnetic tapes (sequential)

  3. Storage of Databases • Main Memory Databases • entire databases are kept in main memory • main memory is a volatile storage: requires a backup copy (on magnetic disk) • Most Databases • are stored permanently on magnetic disk • are too large to fit entirely in main memory • magnetic disk is less expensive

  4. File Records on Disk • Records • file as a sequence of records (fig5.7) • record type = field names + data types • Fixed-Length Records • records with the same size in a file • Variable-Length Records (with separators) • records of different sizes • caused by multi-valued fields, optional fields, or variable-length fields

  5. File Blocks on Disk • Disk Block (fig5.8) • unit of data transfer between disk & memory • records of a file are allocated to disk blocks • usually 512 to 4K bytes (K=1024) • Blocking Factor (bfr) • number of (fixed-length) records in a block • bfr = B/R (floor function) • B = block size, R = record size (in bytes)

  6. File Blocks on Disk • Spanned vs. Unspanned File Org. (fig5.8) • Unspanned: leaves the remaining space in each block unused • Spanned: utilizes the unused space • Contiguous vs. Linked Allocation • Contiguous: file blocks are allocated to consecutive disk blocks • Linked: each file block contains the pointer to the next block

  7. Operations on Files • Types of Operations • Retrieval: do not change data in the file (open/close a file, find/read records) • Update: change the files by insertion, deletion or modification of records • Record-at-a-time: operations are applied to a single record • Set-at-a-time: operations are applied to a set of records or to the whole file

  8. Operations on Files • File Open/Close Operations • Open: readies the file for access, allocates buffers to hold file blocks, sets the file pointer to the beginning of the file • Close: terminates access to the file • Set-at-a-time Operations • Find: searches for the first file record that satisfies a certain condition (selection condition), and makes it the current file record

  9. Operations on Files • FindNext: searches for the next file record (from the current record) and makes it the current file record • Read: reads the current file record • Insert: inserts a new record into the file and makes it the current file record • Delete: removes the current file record from the file by marking the record to indicate that it is no longer valid

  10. Operations on Files • Modify: changes the values of some fields of the current file record • Record-at-a-time Operations • FindAll: locates all the records satisfying a search condition • FindOrdered: retrieves all the records in a specific order • Reorganize: reorganizes the records after update operations

  11. Operations on Files • Operation Factors • Access Type: attribute value(=) or range(>) • Access Time: to find a particular record(s) • Insertion Time: to insert a new record (find the place to insert + index structure update) • Deletion Time: to delete a record (find the record(s) to delete + index structure update) • Space Overhead: additional space occupied by an index structure

  12. Primary vs. Secondary File Organizations • Primary File Organizations • Heap Files • Sorted Files • Hashing • Secondary File Organizations (Index) • Single-level or Multi-level Indexes • B-trees • B+-trees

  13. Heap Files • Files of Unordered Records • simplest and basic file organization • new records are inserted at the end of the file • Access: linear search requires searching through the file block by block (N/2 file blocks on average if the record exists, N file blocks if not), very inefficient (it takes O(N) time) • Insertion: very efficient (random order) • Deletion: must first find its block, inefficient

  14. Heap Files • Direct File • allows direct access by the position of a record in a file • applies only to fixed-length records, contiguous allocation, and unspanned blocks • file records: 0, 1, … , r-1 (i.e., 120) • records in each block (bfr): 0, 1, … , bfr-1 (15) • ith record of a file (43): block position = (i/bfr), record position in the block = (i mod bfr)

  15. Sorted Files • Files of Ordered Records • file records are kept sorted by the values of an ordering field (sequential file): fig5.9 • Access: binary search (on its ordering field) requires reading and searching log2 of the file blocks on the average (O(logN) time), improvement over linear search • Insertion: records must be inserted in the correct order, very inefficient

  16. Sorted Files • Files of Ordered Records (con’t) • Deletion: inefficient, less expensive with deletion marker and periodic reorganization • FindOrdered: reading the records in order of the ordering key values is extremely efficient • Overflow: temporary unordered file for new records to improve insertion efficiency, periodically merged with the main ordered file

  17. Hashing • Hash Functions • records in the file are unordered • determine the address (B) of a record based on the value of the hash field (K) in the record • h(K) -> B • ex) h(K) = K mod M (1, 2, … , M-1) • allow direct access to the target disk block • record search in the block: main memory

  18. Internal Hashing • Internal Hashing • hashing for an internal file • hash table as an array of records (fig5.10) • noninteger hash field value such as names can be transformed into an integer (ASCII) • Collision (of hash addresses) • occurs when two hash field values are mapped into the same hash address

  19. Collision Resolution • Open Addressing • checks the subsequent positions in order until an empty position is found • Chaining • extend the array with a number of overflow positions • use a linked list of overflow records for each hash address • overflow pointer refers to the position of the next record (fig5.10(b))

  20. Collision Resolution • Multiple Hashing • applies a second hash function if the first hash function results in a collision • uses open addressing or applies a third hash function if another collision results • Good Hashing Function • uniform and random distribution of records • hash table 70-90% full to minimize collisions with less unused locations

  21. External Hashing • Hashing Function • target address space is made of buckets (one disk block or a cluster of contiguous blocks) • maps a hash field value into a bucket number • bucket number is then converted to the corresponding disk block address (fig5.11) • collision is less severe with buckets because as many records as will fit in a bucket

  22. External Hashing • Bucket Overflow • when a bucket is filled to capacity • can be solved by chaining method: fig5.12 • a pointer is maintained in each bucket to a linked list of overflow records for the bucket • record pointers include both a block address and a relative record position within the block

  23. External Hashing • Static Hashing • very fast access to records by the hash field • a fixed number of buckets M is allocated • not suitable for dynamic files (grows and shrinks dynamically) • difficult to determine the number of buckets in advance • requires a dynamic hashing technique

  24. Dynamic Hashing • Extendible Hashing (fig5.13) • maintains a directory of 2d bucket addresses • uses first d bits of a hash value to determine a directory entry and then a bucket address • d = global depth, d’ = local depth of a bucket • directory expands and shrinks dynamically • bucket doubling (split) vs. halving (merge) • update directory and local depth appropriately

More Related