Schema Versioning and File Organizations in Data Management

Learn about schema versioning techniques and file organizations for effective data management. Understand how to handle existing records when the schema changes, and explore different file organization strategies such as heap files and sorted files. Dive into the comparison and analysis of file organization methods for optimal data retrieval and manipulation.

Presentation Transcript


  1. CS222/CS122C: Principles of Data Management, UCI, Fall 2018, Notes #04: Schema Versioning and File Organizations. Instructor: Chen Li

  2. Schema Versioning • How to handle existing records when the schema is changed? • The schema versioning technique adds the version of the schema to each record • All versions of the schema are kept in the catalog • When the schema changes, create a new schema version • Records are interpreted based on their schema version and the current schema during a query • (Figure: Table and Catalog)

  3. Example • Create table R(A, B, C) • Insert 1 million records • All records are with schema version 1 • Add attribute D • Create new schema version 2 • Insert 10 records • The 10 records are with schema version 2 • Select * from R • Records with version 1 are padded with null for field D • Drop attribute D • Create new schema version 3 • Select * from R • Field D is truncated from records with version 2
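
The example above can be sketched in a few lines of Python. This is only an illustrative toy, not the course's implementation: the class `VersionedTable` and its method names are hypothetical, records live in an in-memory list, and fields are matched by attribute name (so dropping an attribute and later re-adding one with the same name is not handled correctly).

```python
from typing import Any, Dict, List, Tuple


class VersionedTable:
    """Toy table whose records carry the schema version they were written with."""

    def __init__(self, columns: List[str]) -> None:
        # The catalog keeps every schema version; version numbers start at 1.
        self.catalog: Dict[int, List[str]] = {1: list(columns)}
        self.current_version = 1
        # Each stored record is (schema_version, values); old records are never rewritten.
        self.records: List[Tuple[int, List[Any]]] = []

    def _new_version(self, columns: List[str]) -> None:
        self.current_version += 1
        self.catalog[self.current_version] = columns

    def add_attribute(self, name: str) -> None:
        current = self.catalog[self.current_version]
        if name in current:
            raise ValueError(f"attribute {name!r} already exists")
        self._new_version(current + [name])

    def drop_attribute(self, name: str) -> None:
        current = self.catalog[self.current_version]
        if name not in current:
            raise KeyError(name)
        self._new_version([c for c in current if c != name])

    def insert(self, *values: Any) -> None:
        if len(values) != len(self.catalog[self.current_version]):
            raise ValueError("wrong number of values for the current schema")
        self.records.append((self.current_version, list(values)))

    def select_all(self) -> List[Tuple[Any, ...]]:
        """Interpret each record using its own version and the current schema."""
        current = self.catalog[self.current_version]
        rows = []
        for version, values in self.records:
            stored = dict(zip(self.catalog[version], values))
            # Missing fields are padded with None (null); extra fields are truncated.
            rows.append(tuple(stored.get(col) for col in current))
        return rows


if __name__ == "__main__":
    r = VersionedTable(["A", "B", "C"])
    r.insert(1, 2, 3)            # written with schema version 1
    r.add_attribute("D")         # creates schema version 2
    r.insert(4, 5, 6, 7)         # written with schema version 2
    print(r.select_all())        # [(1, 2, 3, None), (4, 5, 6, 7)]
    r.drop_attribute("D")        # creates schema version 3
    print(r.select_all())        # [(1, 2, 3), (4, 5, 6)]
```

Note that neither schema change touches the stored records; the cost of the change is paid at query time, when each record is reinterpreted.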

  4. Next topic: File Organizations Many alternatives exist. Each one is ideal for some situations, but not so good in others: • Heap (randomly ordered) files: Suitable when typical access is a file scan retrieving all records, or when access comes through a variety of secondary indexes. • Sorted files: Best if records must be retrieved in some order, or only a 'range' of records is needed. • Indexes: Data structures to organize records via trees or hashing. • Like sorted files, they speed up searches for a subset of records, based on values in certain ("search key") fields. • Updates are much faster than in sorted files.
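
To make the contrast between the first two organizations concrete, here is a minimal in-memory sketch. The classes `HeapFile` and `SortedFile` are hypothetical names, Python lists stand in for disk pages, and no I/O is modeled; the point is only which operations need a full scan and which can use binary search.

```python
import bisect
from typing import Any, List, Tuple


class HeapFile:
    """Records kept in arrival order; inserts append at the end of the file."""

    def __init__(self) -> None:
        self.records: List[Tuple[int, Any]] = []

    def insert(self, key: int, payload: Any) -> None:
        self.records.append((key, payload))  # cheap: no ordering to maintain

    def range_search(self, lo: int, hi: int) -> List[Tuple[int, Any]]:
        # No ordering to exploit, so every record must be examined.
        return [rec for rec in self.records if lo <= rec[0] <= hi]


class SortedFile:
    """Records kept ordered by key; searches can use binary search."""

    def __init__(self) -> None:
        self.keys: List[int] = []
        self.payloads: List[Any] = []

    def insert(self, key: int, payload: Any) -> None:
        # Find the slot by binary search, then shift later records to make room.
        pos = bisect.bisect_right(self.keys, key)
        self.keys.insert(pos, key)
        self.payloads.insert(pos, payload)

    def range_search(self, lo: int, hi: int) -> List[Tuple[int, Any]]:
        # Binary search for both ends, then read only the matching slice.
        start = bisect.bisect_left(self.keys, lo)
        end = bisect.bisect_right(self.keys, hi)
        return list(zip(self.keys[start:end], self.payloads[start:end]))
```

The heap file returns range results in arrival order, while the sorted file returns them in key order and pays for that with the shifting done on every insert.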

  5. Cost Model We will ignore CPU costs, for simplicity, so: • B: The number of data pages • R: The number of records per page • D: The (average) time to read or write a disk page • Counting the number of page I/Os ignores the gains of prefetching a sequence of pages; thus, even the real I/O cost is only roughly approximated for now. • Average-case analysis, based on several simplistic assumptions. * Good enough to convey the overall trends!

  6. Comparison of File Organizations • Heap files (random order; insert at eof) • Sorted files, sorted on <age, sal>

  7. Operations to Compare • Scan: Fetch all records from disk • Equality search • Range selection • Insert a record • Delete a record

  8. Assumptions for Our Analysis • Heap Files: • Equality selection on key; exactly one match. • Sorted Files: • File compacted after a deletion (vs. a deleted bit).

  9. Cost of Operations * Several assumptions underlie these (rough) estimates!

  10. Cost of Operations * Several assumptions underlie these (rough) estimates!
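
The cost tables from these two slides did not survive in the transcript. As a stand-in, the sketch below encodes the estimates commonly given in database textbooks for this cost model (B pages, D time per page I/O) under the assumptions stated on the earlier slide. The formulas are an assumption on my part, not values recovered from the slides, and the function names `heap_costs` and `sorted_costs` are hypothetical. R does not appear because CPU costs are ignored.

```python
import math
from typing import Dict


def heap_costs(B: int, D: float) -> Dict[str, float]:
    """Rough I/O cost estimates for a heap file (equality search on a key)."""
    search = 0.5 * B * D           # exactly one match, found halfway on average
    return {
        "scan": B * D,             # read every page
        "equality": search,
        "range": B * D,            # matches can be anywhere, so scan everything
        "insert": 2 * D,           # read the last page and write it back
        "delete": search + D,      # find the record, then write its page back
    }


def sorted_costs(B: int, D: float, matching_pages: int = 1) -> Dict[str, float]:
    """Rough I/O cost estimates for a sorted file, compacted after deletions."""
    search = D * math.log2(B)      # binary search over the pages
    return {
        "scan": B * D,
        "equality": search,
        "range": search + matching_pages * D,
        "insert": search + B * D,  # read and rewrite the later half of the file
        "delete": search + B * D,  # same shifting cost, to compact the file
    }


if __name__ == "__main__":
    # Example values chosen only for illustration.
    print(heap_costs(B=1024, D=1.0))
    print(sorted_costs(B=1024, D=1.0, matching_pages=5))
```

With B = 1024 and D = 1, an equality search costs about 512 page I/Os on the heap file versus about 10 on the sorted file, while an insert costs 2 versus roughly 1034, which is the trade-off the comparison is meant to convey.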
