1 / 15

Records and Files

Records and Files. Storage Technology: Topic 3. Record Formats: Fixed Length. Information about field types stored in system catalogs. Direct access to i’th field. F1. F2. F3. F4. L1. L2. L3. L4. Base address (B). Address = B+L1+L2. 4. $. $. $. $.

thelma
Download Presentation

Records and Files

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Records and Files Storage Technology: Topic 3

  2. Record Formats: Fixed Length • Information about field types stored in systemcatalogs. • Direct access to i’th field. F1 F2 F3 F4 L1 L2 L3 L4 Base address (B) Address = B+L1+L2

  3. 4 $ $ $ $ Record Formats: Variable Length • Two alternative formats (# fields is fixed): F1 F2 F3 F4 Fields Delimited by Special Symbols Field Count F1 F2 F3 F4 Array of Field Offsets • Second offers direct access to i’th field, efficient storage • of nulls(special don’t know value); small directory overhead.

  4. Files of Records • Page or block is OK when doing I/O, but higher levels of DBMS operate on records, and files of records. • FILE: a collection of pages, each containing a collection of records. Must support: • insert/delete/modify record • read a particular record (specified using record id) • scan all records (possibly with some conditions on the records to be retrieved)

  5. Page Formats: Fixed Length Records Slot 1 Slot 1 Slot 2 Slot 2 • Record id = <page id, slot #>. In first alternative, moving records for free space management changes rid; may not be acceptable. Free Space . . . . . . Slot N Slot N Slot M N . . . 1 1 1 M 0 M ... 3 2 1 number of records number of slots PACKED UNPACKED, BITMAP

  6. Rid = (i,N) Page i Rid = (i,2) Rid = (i,1) Page Formats: Variable Length Records • Can move records on page without changing rid; so, attractive for fixed-length records too. 20 16 24 Pointer to start of free space N . . . 2 1 SLOT DIRECTORY

  7. Unordered (Heap) Files • Simplest file structure contains records in no particular order. • As file grows and shrinks, disk pages are allocated and de-allocated. • To support record level operations, we must: • keep track of the pages in a file • keep track of free space on pages • keep track of the records on a page • There are many alternatives for keeping track of this.

  8. Heap File Implemented as a List • Each page contains 2 `pointers’ plus data. Data Page Data Page Data Page Pages with Free Space Header Page Data Page Data Page Data Page Full Pages

  9. Heap File Using a Page Directory Data Page 1 Header Page • The entry for a page can include the number of free bytes on the page. • The directory is a collection of pages; linked list implementation is just one alternative. Data Page 2 Data Page N DIRECTORY

  10. Indexes • A Heap file allows us to retrieve records: • by specifying the rid, or • by scanning all records sequentially • Sometimes, we want to retrieve records by specifying the values in one or more fields, e.g., • Find all students in the “CS” department • Find all students with a gpa > 3 • Indexes enable us to answer value-based (associative) queries efficiently.

  11. Alternative File Organizations Many alternatives exist, each ideal for some situation , and not so good in others: • Heap files: Suitable when typical access is a file scan retrieving all records. • Sorted Files: Best if records must be retrieved in some order, or only a `range’ of records is needed. • Hashed Files: Good for equality selections. • File is a collection of buckets. Bucket = primary page plus zero or moreoverflow pages. • Hashing functionh: h(r) = bucket in which record r belongs. h looks at only some of the fields of r, called the search fields.

  12. Cost Model for Analysis Ignore CPU costs, for simplicity: • B: The number of data pages • R: Number of records per page • D: (Average) time to read or write disk page • Measuring number of page I/O’s ignores gains of sequential I/O; thus, even I/O cost is only approximated. • Average-case analysis; based on several simplistic assumptions. • Good enough to show the overall trends!

  13. Cost of Operations • Several assumptions underlie these (rough) estimates!

  14. Assumptions • Single record insert and delete. • Heap Files: • Equality selection on key; exactly one match. • Insert always at end of file. • Sorted Files: • Files compacted after deletions. • Selections on sort field(s). • Hashed Files: • No overflow buckets, 80% page occupancy.

  15. Summary • Variable length record format with field offset directory offers support for direct access to i’th field and null values. • Slotted page format supports variable length records and allows records to move on page. • File layer keeps track of pages in a file, and supports abstraction of a collection of records. • Linked list or directory data structure • Sorted and hashed files for query processing

More Related