
Managing Files of Records

CS 3050, Spring 2007. 4/4/2007. Dr. Melanie Martin.


Presentation Transcript


  1. Managing Files of Records CS 3050, Spring 2007 4/4/2007 Dr Melanie Martin

  2. Assume:
     • We have a file
     • The file is made up of records
     • The records are made up of fields
     • We want to access a specific record

  3. Identifying the Record
     • RRN (relative record number)
       • Saw previously
       • Access fixed length records directly
         • Byte offset = RRN * size of record in bytes
       • Variable length
         • Use index
           • Fixed length records
           • At RRN j, index contains byte offset in data file
           • Adds an extra look-up
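
As a rough illustration of the byte-offset rule for fixed length records, here is a minimal Python sketch. The 32-byte record size, the 0-based RRN, the space padding, and the sample names are all assumptions made for the example, not part of the slides.

```python
import io

RECORD_SIZE = 32  # assumed fixed record length in bytes


def read_record(f, rrn, record_size=RECORD_SIZE):
    """Return the record at relative record number `rrn` (0-based)."""
    f.seek(rrn * record_size)  # byte offset = RRN * size of record in bytes
    data = f.read(record_size)
    if len(data) < record_size:
        raise IndexError(f"no record at RRN {rrn}")
    return data


# Build a small in-memory "file" of three space-padded records.
names = [b"Ames", b"Brown", b"Chen"]
data_file = io.BytesIO(b"".join(n.ljust(RECORD_SIZE, b" ") for n in names))

record = read_record(data_file, 2)
print(record.rstrip())  # b'Chen'
```

The same function works on a file opened with `open(path, "rb")`, since only `seek` and `read` are used.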

  4. Identifying the Record
     • Key
       • Field or set of fields
       • Canonical
         • Rule for exact format
         • All caps
         • Remove or add ‘-’ in SSN or phone #
       • Distinct (unique)
         • Required for primary key
         • ISBN, SSN, Phone #
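
One way to picture a canonical form is as a small function that every key passes through before it is stored or compared. The sketch below is only an illustration: the function names and the specific rules (collapse spaces, all caps, digits-only SSN) are assumptions chosen to match the bullets above.

```python
import re


def canonical_name(name):
    """Canonical form for a name key: trimmed, single spaces, all caps."""
    return " ".join(name.split()).upper()


def canonical_ssn(ssn):
    """Canonical form for an SSN key: nine digits, with any '-' removed."""
    digits = re.sub(r"\D", "", ssn)
    if len(digits) != 9:
        raise ValueError(f"expected 9 digits, got {len(digits)}")
    return digits


print(canonical_name("  ames,   mary "))  # AMES, MARY
print(canonical_ssn("123-45-6789"))       # 123456789
```

With a rule like this, "123-45-6789" and "123456789" compare equal as keys, which is what makes the key usable for exact-match lookups.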

  5. Identifying the Record
     • Keys come in two main flavors
       • Primary
         • Uniquely identifies a single record
         • Ex: your specific bank account
       • Secondary
         • Identifies a group of records
         • Ex: all bank customers in Turlock
         • Ex: all bank customers overdrawn
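
The difference between the two kinds of key can be sketched with in-memory dictionaries. The account records below are made-up sample data, and the field names are hypothetical; the point is only that a primary key maps to exactly one record while a secondary key maps to a group.

```python
from collections import defaultdict

# Hypothetical sample records; "account" plays the role of the primary key.
accounts = [
    {"account": "1001", "name": "AMES", "city": "TURLOCK", "balance": 250.0},
    {"account": "1002", "name": "BROWN", "city": "MODESTO", "balance": -40.0},
    {"account": "1003", "name": "CHEN", "city": "TURLOCK", "balance": -5.0},
]

# Primary key: each value identifies a single record.
by_account = {rec["account"]: rec for rec in accounts}

# Secondary key: each value identifies a group of records,
# stored here as a list of primary keys.
by_city = defaultdict(list)
for rec in accounts:
    by_city[rec["city"]].append(rec["account"])

# A secondary "key" can also be a condition, such as being overdrawn.
overdrawn = [rec["account"] for rec in accounts if rec["balance"] < 0]

print(by_account["1002"]["name"])  # BROWN
print(by_city["TURLOCK"])          # ['1001', '1003']
print(overdrawn)                   # ['1002', '1003']
```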

  6. Finding the Record
     • Two extremes
       • Direct access
       • Sequential search
     • Lots of algorithms in between, but we’ll start with the extremes

  7. Measuring Algorithm Performance
     • In general we’ll count reads (seeks)
     • “Big O”
       • Asymptotic upper bound - worst case
       • g(n) = O(f(n)) means there exist constants c and n0 such that g(n) <= c*f(n) for all n >= n0
       • I.e., to the right of n0, c*f(n) is an upper bound for g(n)
       • Draw Picture
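
Written out in standard notation, the definition above reads as follows. The worked example (3n + 5 with c = 4 and n0 = 5) is an illustration added here, not taken from the slides.

```latex
g(n) = O(f(n)) \iff \exists\, c > 0,\ \exists\, n_0 \ \text{such that}\ g(n) \le c \cdot f(n) \ \text{for all } n \ge n_0 .

% Example: g(n) = 3n + 5 is O(n). Take c = 4 and n_0 = 5:
3n + 5 \le 4n \iff 5 \le n .
```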

  8. Direct Access
     • Just go get the record we want
     • O(1)
       • No matter how large the file we can get the record in one seek
     • See previous discussion of using RRN for fixed length or index + RRN for variable length
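
For the variable length case, the index + RRN idea might look like the sketch below. The layout is an assumption made for illustration: each index entry is one 8-byte offset, and each data record carries a 2-byte length prefix. The function names are hypothetical.

```python
import io
import struct

OFFSET_FMT = "<Q"  # assumed index entry: one 8-byte little-endian offset
OFFSET_SIZE = struct.calcsize(OFFSET_FMT)
LENGTH_FMT = "<H"  # assumed 2-byte length prefix on each data record
LENGTH_SIZE = struct.calcsize(LENGTH_FMT)


def build_files(records):
    """Write variable length records plus a fixed length index of offsets."""
    data, index = io.BytesIO(), io.BytesIO()
    for rec in records:
        index.write(struct.pack(OFFSET_FMT, data.tell()))
        data.write(struct.pack(LENGTH_FMT, len(rec)) + rec)
    return data, index


def read_by_rrn(data, index, rrn):
    """Fetch record `rrn` (0-based) with one index look-up and one data seek."""
    index.seek(rrn * OFFSET_SIZE)  # index entries are fixed length
    entry = index.read(OFFSET_SIZE)
    if len(entry) < OFFSET_SIZE:
        raise IndexError(f"no record at RRN {rrn}")
    (offset,) = struct.unpack(OFFSET_FMT, entry)
    data.seek(offset)  # the extra look-up gave us the byte offset
    (length,) = struct.unpack(LENGTH_FMT, data.read(LENGTH_SIZE))
    return data.read(length)


data, index = build_files([b"Ames", b"Brownington", b"Chen"])
print(read_by_rrn(data, index, 1))  # b'Brownington'
```

The number of look-ups is two regardless of how many records the file holds, so the access is still O(1).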

  9. Sequential Access
     • Go through the records in the file sequentially until we find the one we’re looking for
       • RRN or Key
     • Read one record at a time from disk
     • O(n) where n is the number of records in the file
       • I.e., time is proportional to the number of records in the file (average and worst case)
     • BUT what if we use blocks and read 100 records at a time?
       • STILL proportional to number of records in the file (n/100 reads is still O(n))
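
The effect of blocking on the read count can be checked with a small experiment. This is a sketch under assumed conditions: 16-byte space-padded records whose whole content is the key, an in-memory file of 1000 records, and a hypothetical `sequential_search` function that counts calls to `read`.

```python
import io

RECORD_SIZE = 16  # assumed fixed record length in bytes


def sequential_search(f, key, records_per_read=1):
    """Scan from the start; return (rrn, reads), or (None, reads) if absent."""
    f.seek(0)
    reads = 0
    rrn = 0
    while True:
        block = f.read(RECORD_SIZE * records_per_read)
        if not block:  # end of file: key was not found
            return None, reads
        reads += 1
        for i in range(0, len(block), RECORD_SIZE):
            if block[i:i + RECORD_SIZE].rstrip() == key:
                return rrn, reads
            rrn += 1


# 1000 records with keys K0000 .. K0999, each padded to RECORD_SIZE.
n = 1000
f = io.BytesIO(b"".join(f"K{i:04d}".encode().ljust(RECORD_SIZE) for i in range(n)))

print(sequential_search(f, b"K0999"))                        # (999, 1000)
print(sequential_search(f, b"K0999", records_per_read=100))  # (999, 10)
```

Reading 100 records at a time cuts the number of reads by a constant factor of 100, but doubling the file still doubles the reads, so the search remains O(n).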

  10. Why would we ever do this?
     • Sequential search can be good when
       • There are few records
       • Rarely need to search
       • ASCII files where looking for patterns (grep)
       • Lots of records that will match a secondary key

  11. Pros and Cons
     • Sequential search
       + easy to program
       + only requires simple file structures
       - takes too long
     • Soon we will start looking at ways to get around this and get closer to direct access

  12. Some Miscellaneous Topics
     • Structure and length
       • Fixed length fields (think inventory example)
         • Make sure record size fits evenly into sectors
           • Ex: 512 byte sectors
           • 30 byte records -> increase to 32 bytes
           • Records never span sectors
       • More challenging with variable length fields (records)
         • Estimate longest possible field values (waste issues if too big, truncation/data loss if too small)
         • Averaging effect
           • Longest name unlikely to occur with longest address in mailing list
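
The 30 -> 32 byte example can be reproduced with a few lines of Python. The function name is hypothetical, and the rule it applies (pad up to the next size that divides the sector size evenly) is one reading of "fits evenly into sectors".

```python
SECTOR_SIZE = 512  # bytes per sector, as in the example above


def padded_record_size(size, sector_size=SECTOR_SIZE):
    """Smallest record size >= `size` that divides the sector size evenly."""
    if not 0 < size <= sector_size:
        raise ValueError("record must be between 1 byte and one sector")
    for candidate in range(size, sector_size + 1):
        if sector_size % candidate == 0:
            return candidate


size = padded_record_size(30)
print(size)                 # 32
print(SECTOR_SIZE // size)  # 16 records per sector, none spanning two sectors
```

At 30 bytes, 17 records would fit in a sector with 2 bytes left over, so the 18th record would straddle a sector boundary; padding to 32 bytes trades 2 bytes per record for the guarantee that no record spans sectors.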

  13. Some Miscellaneous Topics
     • Distinguishing data from unused space
       • Read length at beginning
       • Special delimiter at end
       • Count fields
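
Two of the three approaches above (a length at the beginning, a delimiter at the end) are sketched below for a fixed-size slot that holds a shorter payload. The 32-byte slot, the 1-byte length, the zero-byte padding, and the newline delimiter are assumptions for the example.

```python
SLOT_SIZE = 32  # assumed fixed slot size; payloads may be shorter


def pack_slot(payload, slot_size=SLOT_SIZE):
    """Store a 1-byte length, then the payload, then zero padding."""
    if len(payload) > slot_size - 1:
        raise ValueError("payload does not fit in slot")
    return bytes([len(payload)]) + payload.ljust(slot_size - 1, b"\x00")


def unpack_length_prefixed(slot):
    """Read the length at the beginning and return only that many bytes."""
    length = slot[0]
    return slot[1:1 + length]


def unpack_delimited(slot, delimiter=b"\n"):
    """Return the bytes before the delimiter that marks the end of the data."""
    end = slot.find(delimiter)
    if end == -1:
        raise ValueError("delimiter not found")
    return slot[:end]


slot = pack_slot(b"AMES|TURLOCK")
print(len(slot))                     # 32
print(unpack_length_prefixed(slot))  # b'AMES|TURLOCK'
print(unpack_delimited(b"AMES|TURLOCK\n" + b"\x00" * 19))  # b'AMES|TURLOCK'
```

The delimiter approach only works if the delimiter byte can never appear inside the data itself; the length prefix has no such restriction.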

  14. Some Miscellaneous Topics
     • Header records
       • Commonly used
       • At beginning of file
       • Might contain
         • # records
         • Length of records
         • Date and time of last update
         • Name of file
       • Need to be able to distinguish it from data
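
A header record with the fields listed above could be laid out as in the following sketch. The exact layout (a 4-byte tag, field widths, byte order), the tag value `HDR1`, and the function names are all assumptions for illustration; the tag is one simple way to distinguish the header from data.

```python
import io
import struct

# Assumed layout: 4-byte tag, record count, record length,
# last-update time (seconds since the epoch), 16-byte file name.
HEADER_FMT = "<4sIHQ16s"
HEADER_SIZE = struct.calcsize(HEADER_FMT)
MAGIC = b"HDR1"  # hypothetical tag marking this block as a header


def write_header(f, record_count, record_length, updated, name):
    f.seek(0)
    f.write(struct.pack(HEADER_FMT, MAGIC, record_count, record_length,
                        updated, name.encode("ascii")))


def read_header(f):
    f.seek(0)
    magic, count, length, updated, name = struct.unpack(
        HEADER_FMT, f.read(HEADER_SIZE))
    if magic != MAGIC:
        raise ValueError("file does not start with a header record")
    return {"record_count": count, "record_length": length,
            "updated": updated, "name": name.rstrip(b"\x00").decode("ascii")}


def record_offset(header, rrn):
    """Data records start after the header, so offsets shift by its size."""
    return HEADER_SIZE + rrn * header["record_length"]


f = io.BytesIO()
write_header(f, 2, 32, 1175644800, "accounts.dat")  # arbitrary timestamp
f.write(b"Ames".ljust(32) + b"Brown".ljust(32))

header = read_header(f)
f.seek(record_offset(header, 1))
print(header["record_count"], header["name"])     # 2 accounts.dat
print(f.read(header["record_length"]).rstrip())   # b'Brown'
```

Note that once a header is present, the byte offset of a record becomes header size + RRN * record length rather than RRN * record length alone.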

  15. Some Miscellaneous Topics
     • Metadata
       • Data that describes the primary data in the file
       • Ex: Astronomer with image data generated by telescopes
         • Mostly interested in the image
         • Need info about image
           • Where and when taken
           • Which telescope
           • Names of related files/images
           • Etc.
