
Organizing files for performance


oihane

Presentation Transcript


  1. Organizing files for performance Chapter 6

  2. 6.1 Data compression • Advantages of reduced file size • Redundancy reduction: state code example • Repeating sequences: run length encoding • Variable length code • static (Morse code) • dynamic (Huffman code) • Irreversible compression (e.g., JPEG) • Unix routines (append .z to compressed files)
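
The run length encoding bullet can be illustrated with a minimal Python sketch. It is a simplification, not the scheme from the chapter: each run is stored as a (value, count) pair in a list rather than packed into bytes with a run marker, and the names `rle_encode` and `rle_decode` are hypothetical.

```python
def rle_encode(data):
    """Collapse each run of identical bytes into a (value, count) pair."""
    runs = []
    for b in data:
        if runs and runs[-1][0] == b:
            # Same byte as the previous run: extend that run by one.
            runs[-1] = (b, runs[-1][1] + 1)
        else:
            # A different byte starts a new run of length 1.
            runs.append((b, 1))
    return runs


def rle_decode(runs):
    """Expand (value, count) pairs back into the original bytes."""
    return b"".join(bytes([value]) * count for value, count in runs)
```

The scheme only pays off when runs are long; data with few repeats can grow, because every run costs a value plus a count.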

  3. 6.2 Reclaiming space • “Holes” arise when • variable length records are updated • fixed or variable length records are deleted • Compaction (for deleted records) • mark deleted records • allows undelete to be implemented • periodically run compaction program

  4. 6.2.2 Dynamic reclamation • Simple approach: search sequentially until space is found to insert a new record; drawback: very slow • Alternative uses linked list stack to allow immediate access to an empty slot, if available; stack may be kept in deleted record slots, with RRN of top in header record.
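
The linked list stack described on this slide might look like the sketch below. It models the file as an in-memory Python list, so `FixedFile`, `slots`, and `avail_top` are illustrative names rather than anything from the chapter; a real implementation would read and write the header record and slots on disk.

```python
class FixedFile:
    """In-memory stand-in for a fixed-length record file with an avail stack."""

    def __init__(self):
        self.slots = []      # slots[i] holds the record whose RRN is i
        self.avail_top = -1  # header field: RRN of the top free slot, -1 if none

    def delete(self, rrn):
        # Push: the freed slot stores the old top, then becomes the new top.
        self.slots[rrn] = ("*", self.avail_top)
        self.avail_top = rrn

    def insert(self, record):
        if self.avail_top == -1:
            # No free slot: append at the end of the file.
            self.slots.append(record)
            return len(self.slots) - 1
        # Pop: reuse the top free slot and follow its link to the next one.
        rrn = self.avail_top
        _, next_free = self.slots[rrn]
        self.avail_top = next_free
        self.slots[rrn] = record
        return rrn
```

Both operations touch only the header and one slot, which is what makes access to an empty slot immediate instead of requiring a sequential search.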

  5. 6.2.3 Variable length records • Same scheme (linked list stack) may be used, except byte offset rather than RRN must be used as link • Deleted records go on top of stack, but stack must be searched when adding records to find a space big enough to accommodate each new record

  6. 6.2.4 Fragmentation • Internal • fixed length records • “unsophisticated” variable length scheme • External: variable length records • smaller record is placed in a larger slot • leftover space is added to available list • Coalescing holes (good test question)
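
Coalescing holes can be sketched as follows, assuming each hole on the available list is described by a (byte offset, size) pair. The function name `coalesce` and this representation are assumptions made for illustration.

```python
def coalesce(holes):
    """Merge physically adjacent holes; each hole is an (offset, size) pair."""
    merged = []
    for offset, size in sorted(holes):
        if merged and merged[-1][0] + merged[-1][1] == offset:
            # This hole starts exactly where the previous one ends: join them.
            prev_offset, prev_size = merged[-1]
            merged[-1] = (prev_offset, prev_size + size)
        else:
            merged.append((offset, size))
    return merged
```

The sketch sorts by offset first because holes that are neighbors on disk are not necessarily neighbors on the available list, which is what makes coalescing awkward in practice.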

  7. 6.2.5 Placement strategies • First fit: first record slot that’s big enough • Best fit: sort slots in ascending order by size, then use first fit • Worst fit: sort in descending order • no need to search: just use first space if it’s big enough • leftover space may be enough for another record
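
The three strategies can be compared with a small sketch. It assumes the available list is a plain Python list of slot sizes and re-sorts on every call for clarity; an actual available list would be kept in sorted order instead. The name `pick_slot` is hypothetical.

```python
def pick_slot(avail, need, strategy="first"):
    """Return the index in `avail` of the chosen slot size, or None if none fits."""
    indexes = range(len(avail))
    if strategy == "first":
        order = indexes  # scan in list order
    elif strategy == "best":
        # Ascending by size: the first slot that fits is the tightest fit.
        order = sorted(indexes, key=lambda i: avail[i])
    elif strategy == "worst":
        # Descending by size: only the largest slot needs to be checked.
        order = sorted(indexes, key=lambda i: avail[i], reverse=True)[:1]
    else:
        raise ValueError(strategy)
    for i in order:
        if avail[i] >= need:
            return i
    return None
```

With slot sizes `[70, 30, 120, 45]` and a 40-byte record, first fit picks the 70-byte slot, best fit the 45-byte slot, and worst fit the 120-byte slot.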

  8. 6.3.2 Binary search • relational ops for search key • retrieval by RRN • object-oriented presentation of algorithm • implementation with templates • compilation with class definitions

  9. 6.3.3-4 Search performance • complexity for binary search is O(log₂ n), compared to O(n) for sequential search • records must be sorted on search key • disk sort is prohibitively expensive • “internal sort” allows direct accesses in memory
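
The logarithmic bound can be checked with a short sketch that counts probes. It searches an in-memory sorted list of keys; in the file setting each probe would be a read of one record by RRN. The name `binary_search` and the probe counter are illustrative additions.

```python
def binary_search(keys, target):
    """Return (index or -1, number of probes) for a sorted list of keys."""
    low, high, probes = 0, len(keys) - 1, 0
    while low <= high:
        mid = (low + high) // 2
        probes += 1  # one probe corresponds to one record read
        if keys[mid] == target:
            return mid, probes
        if keys[mid] < target:
            low = mid + 1   # target lies in the upper half
        else:
            high = mid - 1  # target lies in the lower half
    return -1, probes
```

For 1,000 sorted keys no search needs more than 10 probes, whereas a sequential search may have to examine all 1,000 records.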

  10. 6.3.5 Limitations • number of disk accesses for binary search is still significant for large files • keeping a file sorted can be less efficient than using sequential search; merge technique addresses this problem • internal sort is limited to small files that fit entirely in memory

  11. 6.4 Keysort • only keys are kept in memory • each key is kept with its RRN (keynode) • keynode array is sorted in memory • data file can be sorted by reading records in order of sorted keynodes and writing them to a new file • keynodes can be written as an index file
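
The keysort steps above can be sketched as follows, assuming the data file is modeled as a Python list indexed by RRN. The function name `keysort` and the `key_of` parameter are hypothetical; on disk, `records[rrn]` would be a seek followed by a read.

```python
def keysort(records, key_of):
    """Return (sorted keynodes, records in key order), sorting only the keynodes."""
    # Pass 1: build one keynode (key, RRN) per record.
    keynodes = [(key_of(rec), rrn) for rrn, rec in enumerate(records)]
    # Sort the small keynode array in memory; the records themselves do not move.
    keynodes.sort()
    # Pass 2: fetch each record by RRN in key order to build the new file.
    sorted_file = [records[rrn] for _, rrn in keynodes]
    return keynodes, sorted_file
```

For example, `keysort(["Smith|NY", "Adams|TX", "Jones|CA"], lambda r: r.split("|")[0])` yields the keynodes `[("Adams", 1), ("Jones", 2), ("Smith", 0)]`. Writing out just those keynodes, instead of performing the second pass, gives the index file mentioned in the last bullet.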

  12. 6.4.4 Pinned records • available list (of deleted record slots) • records whose physical locations are referenced in other records are pinned
