1 / 41

Chap6. Organizing Files for Performance

File structures by Folk, Zoellick and Ricarrdi. Chap6. Organizing Files for Performance. 서울대학교 컴퓨터공학부 객체지향시스템연구실 SNU-OOPSLA-LAB 교수 김 형 주. Chapter Objectives(1). Look at several approaches to data compression Look at storage compaction as a simple way of reusing space in a file

lamya
Download Presentation

Chap6. Organizing Files for Performance

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. File structures by Folk, Zoellick and Ricarrdi Chap6. Organizing Files for Performance 서울대학교 컴퓨터공학부 객체지향시스템연구실 SNU-OOPSLA-LAB 교수 김 형 주 SNU-OOPSLA Lab.

  2. Chapter Objectives(1) • Look at several approaches to data compression • Look at storage compaction as a simple way of reusing space in a file • Develop a procedure for deleting fixed-length records that allows vacated file space to be reused dynamically • Illustrate the use of linked lists and stacksto manage an avail list • Consider several approaches to the problem of deleting variable-length records • Introduce the concepts associated with the terms internal fragmentation and external fragmentation SNU-OOPSLA Lab.

  3. Chapter Objectives(2) • Outline some placement strategies associated with the reuse of space in a variable-length record file • Provide an introduction to the idea underlying a binary search • Undertake an examination of the limitations of binary searching • Develop a keysort procedure for sorting larger files; investigate the costs associated with keysort • Introduce the concept of a pinned record SNU-OOPSLA Lab.

  4. Contents 6.1 Data compression 6.2 Reclaiming space in files 6.3 Finding things quickly: An Introduction to internal sorting and binary searching 6.4 Keysorting SNU-OOPSLA Lab.

  5. 6.1 Data Compression Data Compression(1) • Reasons for data compression • less storage • transmitting faster, decreasing access time • processing faster sequentially SNU-OOPSLA Lab.

  6. 6.1 Data Compression Data Compression(2):Using a different notation • Fixed-Length fields are good candidates • Decrease the # of bits by finding a more compact notation ex) original state field notation is 16bits, but we can encode with 6bit notation because of the # of all states are 50 • Cons. • unreadable by human • cost in encoding time • decoding modules => increase the complexity of s/w => used for particular application SNU-OOPSLA Lab.

  7. 6.1 Data Compression Data Compression(3):Suppressing repeating sequences • Run-length encoding algorithm • read through pixels, copying pixel values to file in sequence, except the same pixel value occurs more than once in succession • when the same value occurs more than once in succession, substitute the following three bytes • special run-length code indicator((ex) ff) • pixel value repeated • the number of times that value is repeated • ex) 22 23 24 24 24 24 24 24 24 25 26 26 26 26 26 26 25 24 • 22 23 ff 24 07 25 ff 26 06 25 24 SNU-OOPSLA Lab.

  8. 컴퓨터내 image의 표현 pixel 화면 각 pixel당 빛의 세기 수치화 (digital) 전기 신호 (analog) SNU-OOPSLA Lab.

  9. 컴퓨터내image의 표현 화면 12 8 12 33 99 12 56 7 13 44 66 23 12 4 34 57 99 12 …... SNU-OOPSLA Lab.

  10. 컴퓨터내 color 영상의 표현 12 4 34 57 99 12 …... 화면 56 7 13 44 66 23 12 4 34 57 99 12 …... 12 8 12 33 99 12 56 7 13 44 66 23 12 4 34 57 99 12 …... ** 동영상 --- 초당 25 - 30 개의 정지화상을 교체 (video) (image) SNU-OOPSLA Lab.

  11. 6.1 Data Compression Data Compression(3):Suppressing repeating sequences • Run-length encoding (cont’d) • example of redundancy reduction • cons. • not guarantee any particular amount of space savings • under some circumstances, compressed image is larger than original image • Why? Can you prevent this? SNU-OOPSLA Lab.

  12. 6.1 Data Compression Data Compression(4):Assigning variable-length codes • Morse code: oldest & most common scheme of variable-length code • Some values occur more frequently than others • that value should take the least amount of space • Huffman coding • base on probability of occurrence • determine probabilities of each value occurring • build binary tree with search path for each value • more frequently occurring values are given shorter search paths in tree SNU-OOPSLA Lab.

  13. 6.1 Data Compression Data Compression(5):Assigning variable-length codes • Huffman coding • Letter: a b c d e f g • Prob: 0.4 0.1 0.1 0.1 0.1 0.1 0.1 • Code: 1 010 011 0000 0001 0010 0011 • ex) the string “abde” • 101000000001 SNU-OOPSLA Lab.

  14. 6.1 Data Compression Huffman Tree 0 a(1) 00 01 b(010) c(011) 000 001 d(0000) e(0001) g(0011) f(0010) SNU-OOPSLA Lab.

  15. 6.1 Data Compression Data Compression(6):Irreversible compression techniques • Some information can be sacrificed • Less common in data files • Shrinking raster image • 400-by-400 pixels to 100-by-100 pixels • 1 pixel for every 16 pixels • Speech compression • voice coding (the lost information is of no little or no value) SNU-OOPSLA Lab.

  16. 6.1 Data Compression Compression in UNIX • System V • pack & unpack use Huffman codes • after compress file, appends “.z” to end of packed file • Berkeley UNIX • compress & uncompress use Lempel-Ziv method • after compress file, appends “.Z” to end of compressed file SNU-OOPSLA Lab.

  17. 6. 2 Reclaiming Space in Files Record Deletion and Storage Compaction • Storage compaction • record deletion : just marks each deleted record • reclamation of all deleted records => pros : delete/undelete operation with little effort Ex) • Ames|123|OK|…...| • Morrison|9035|OK| • Brown|625|IA|…...| Delete second record • Ames|123|OK|…| • *|rrison|9035|OK| • Brown|625|IA|…| After compaction • Ames|123|OK|…| • Brown|625|IA|…| SNU-OOPSLA Lab.

  18. 6. 2 Reclaiming Space in Files Deleting Fixed-length Records for Reclaiming Space Dynamically(1) • Reuse the space from deleted records as soon as possible • deleted records must be marked in special way • we could find the deleted space • To make record reuse quickly, we need • a way to know immediately if there are empty slots in the file • a way to jump directly to one of those slots if they exist => Linked lists or Stacksfor avail list * avail list : a list that is made up of deleted records SNU-OOPSLA Lab.

  19. 6. 2 Reclaiming Space in Files Deleting Fixed-length Records for Reclaiming Space Dynamically(2) The Linked List Head pointer ptr -1 ptr ptr ptr The Stack Head pointer RRN 5 RRN 2 (a) (2) 2 -1 Head pointer RRN 3 PRN 5 RRN 2 (b) (3) 5 2 -1 after pushing record of RRN 3 SNU-OOPSLA Lab.

  20. 6. 2 Reclaiming Space in Files Deleting Fixed-length Records for Reclaiming Space Dynamically(3) • Linking and stacking deleted records • arranging and rearranging links are used to make one available record slot point to the next • second field of deleted record points to next record SNU-OOPSLA Lab.

  21. 6. 2 Reclaiming Space in Files • Sample file showing linked list of deleted records • List head(first available record) 5(delete 3, 5 ) 0 1 2 3 4 5 6 (a) Edwards... Betas... Wills... *-1 Masters.. *3 Chavez... • List head(first available record) 1 (delete 1) 0 1 2 3 4 5 6 (b) Edwards... *5 Wills... *-1 Masters.. *3 Chavez... • List head(first available record) -1(insert three new records) 0 1 2 3 4 5 6 (c) Edwards.. 1st new rec Wills... 3rd new rec Masters.. 2nd new rec Chavez... SNU-OOPSLA Lab.

  22. 6. 2 Reclaiming Space in Files Deleting Variable-length Records • Avail list of variable-length records • it has byte count of record at beginning of each record • use byte offset instead of RRN • Adding and removing records • in adding records, search through avail list for right size (=>big enough) SNU-OOPSLA Lab.

  23. 6. 2 Reclaiming Space in Files Removal of a record from an avail list with variable-length records Size 47 Size 38 Size 72 Size 68 -1 (a)Before removal Size 47 Size 38 Size 68 New Link -1 (b)After removal Size 72 Removed record SNU-OOPSLA Lab.

  24. 6. 2 Reclaiming Space in Files Storage Fragmentation • Internal fragmentation (in fixed-length record) • waste space within a record • in variable-length records, minimize wasted space by doing away with internal fragmentation • External fragmentation (in variable-length record) • unused space outside or between individual records • three possible solutions • storage compaction • coalescing the holes: a single, larger record slot • minimizing fragmentation by adopting placement strategy SNU-OOPSLA Lab.

  25. 6. 2 Reclaiming Space in Files Internal Fragmentationin Fixed-length Records Unused space -> Internal fragmentation Ames | John | 123 Maple | Stillwater | OK | 740751 |................................... Morrison | Sebastian | 9035 South Hillcrest | Forest Village | OK | 74820 | Brown | Martha | 625 Kimbark | Des Moines | IA | 50311 | ......................... 64-byte fixed-length records SNU-OOPSLA Lab.

  26. 6. 2 Reclaiming Space in Files External Fragmentationin Variable-length Records Record[1] Record[2] 40 Ames | Jone | 123 Maple | Stillwater | OK | 740751 | 64 Morrison | Sebastian | 9035 South Hillcrest | Forest Village | OK | 74820 | 45 Brown | Martha | 625 Kimb bark | Des Moines | IA | 50311 | record length Record[3] External fragmentation ex) Delete Record[2] and Insert New Record[i] : 12-byte unused space 52 Adams | Kits | 3301 Washington D.C | Forest Village | IA | 43563 | Record[i] SNU-OOPSLA Lab.

  27. 6. 2 Reclaiming Space in Files Placement Strategies • First-fit • select the first available record slot • suitable when lost space is due to internal fragmentation • Best-fit • select the available record slot closest in size • avail list in ascending order • suitable when lost space is due to internal fragmentation • Worst-fit • select the largest record slot • avail list in descending order • suitable when lost space is due to external fragmentation SNU-OOPSLA Lab.

  28. 6.3 Finding Things Quickly : An Introduction to Internal Sorting and Binary Searching Finding Things Quickly(1) • Goal: Minimize the number of disk accesses • Finding things in simple field and record files may have many seeks • Binary search algorithm for fixed-sized record int BinarySearch(FixedRecordFile &file, RecType &obj, KeyType &key) // binary search for key. { int low = 0; int high = file.NumRecs() - 1; while (low <= high){ int guess = (high - low)/2; file.ReadByRRN(obj, guess); if(obj.Key () == key) return 1; // record found if*obj.Key() < key) high = guess - 1; // search before guess else low = guess + 1; // search after guess } return 0; // loop ended without finding key } SNU-OOPSLA Lab.

  29. Classes and Methods for Binary Search Class KeyType {public int operator == (KeyType &); int operator < (KeyType &); }; class RecType {public: KeyType Key();}; class FixedRecordFile{public: int NumRecs(); int ReadByRRN (RecType & Record, int RRN); }; SNU-OOPSLA Lab.

  30. 6.3 Finding Things Quickly : An Introduction to Internal Sorting and Binary Searching Finding Things Quickly(2) • Binary search vs. Sequential search • binary search • O(log n) • list is sorted by key • sequential search • O(n) SNU-OOPSLA Lab.

  31. 6.3 Finding Things Quickly : An Introduction to Internal Sorting and Binary Searching Finding Things Quickly(3) • Sorting a disk file in RAM • read the entire file from disk to memory • use internal sort (=sort in memory) • UNIX sort utility uses internal sort • Limitations of binary search & internal sort • binary search requires more than one or two access c.f.) single access by RRN • keeping a file sorted is very expensive • an internal sort works only on small files SNU-OOPSLA Lab.

  32. 6.3 Finding Things Quickly : An Introduction to Internal Sorting and Binary Searching Internal Sort Read the entire file unsorted file sorted file unsorted file Sort in memory disk memory SNU-OOPSLA Lab.

  33. 6.4 Keysorting Key Sorting & Its Limitations • So called, “tag sort” : sorted thing is “key” only • Sorting procedure • Read only the keys into memory • Sort the keys • Rearrange the records in file by the sorted keys • Advantage • less RAM than internal sort • Disadvantages(=Limitations) • reading records in disk twice is required • a lot of seeking for records for constructing a new(sorted) file SNU-OOPSLA Lab.

  34. 6.4 Keysorting KEYNODES array Records KEY RRN HARRISON 1 2 Harrison|Susan|387 Eastern.... Kellog|Bill|17 Maple.... KELLOG HARRIS Harris|Margaret|4343 West.... 3 Conceptual view before sorting . . . . k BELL Bell|Robert|8912 Hill.... In RAM On secondary storage Records KEY RRN k 3 BELL Harrison|Susan|387 Eastern.... HARRIS Kellog|Bill|17 Maple.... Conceptual view after sorting keys in RAM Harris|Margaret|4343 West.... HARRISON 1 . . . . 2 Bell|Robert|8912 Hill.... KELLOG SNU-OOPSLA Lab.

  35. 6.4 Keysorting Pseudocode for keysort(1) • Program: keysort • open input file as IN_FILE • create output file as OUT_FILE • read header record from IN_FILE and write a copy to OUT_FILE • REC_COUNT := record count from header record • /* read in records; set up KEYNODES array */ • for i := 1 to REC_COUNT • read record from IN_FILE into BUFFER • extract canonical key and place it in KEYNODES[i].KEY • KEYNODES[i].KEY = i • (continued....) SNU-OOPSLA Lab.

  36. 6.4 Keysorting Pseudocode for keysort(2) • /* sort KEYNODES[].KEY, thereby ordering RRNs correspondingly */ • sort(KEYNODES, REC_COUNT) • /* read in records according to sorted order, and write them out in this order */ • for i := 1 to REC_COUNT • seek in IN_FILE to record with RRN of KEYNODES[I].RRN write BUFFER contents to OUT_FILE • close IN_FILE and OUT_FILE • end PROGRAM SNU-OOPSLA Lab.

  37. 6.4 Keysorting Two Solutions:why bother to write the file back? • Write out sorted KEYNODES[] array without writing records back in sorted order • KEYNODES[] array is used as index file SNU-OOPSLA Lab.

  38. 6.4 Keysorting Relationship between the index file and the data file Records KEY RRN k 3 BELL Harrison|Susan|387 Eastern.... HARRIS Kellog|Bill|17 Maple.... Harris|Margaret|4343 West.... HARRISON 1 . . . . 2 Bell|Robert|8912 Hill.... KELLOG Index file Original file SNU-OOPSLA Lab.

  39. 6.4 Keysorting Pinned records(1) • Records that are referenced to physical location of themselves by other records • Not free to alter physical location of records for avoiding dangling references • Pinned records make sorting more difficult and sometimes impossible • solution: use index file, while keeping actual data file in original order SNU-OOPSLA Lab.

  40. 6.4 Keysorting Pinned records(2) Record(i) dangling pointer Record (i+1) Pinned Record delete pinned record Pinned Record File with pinned records SNU-OOPSLA Lab.

  41. Let’s Review !!! 6.1 Data compression 6.2 Reclaiming space in files 6.3 Finding things quickly: An Introduction to internal sorting and binary searching 6.4 Keysorting SNU-OOPSLA Lab.

More Related