Managing Files of Records CS 3050, Spring 2007 4/4/2007 Dr Melanie Martin
Assume: • We have a file • The file is made up of records • The records are made up of fields • We want to access a specific record
Identifying the Record • RRN (relative record number) • Saw previously • Fixed-length records: access directly • Byte offset = RRN * record size in bytes • Variable-length records: use an index • The index itself holds fixed-length records • At RRN j, the index entry gives the byte offset of record j in the data file • Adds an extra look-up
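Not on the original slide, just a minimal sketch in C of the byte-offset computation for fixed-length records; the data file name records.dat, the 64-byte REC_SIZE, and the function name read_by_rrn are all assumptions made for illustration.

#include <stdio.h>
#include <stdlib.h>

#define REC_SIZE 64   /* assumed fixed record length in bytes */

/* Read the record at relative record number rrn into buf.
   Byte offset = RRN * record size, as on the slide.
   Returns 0 on success, -1 on failure. */
int read_by_rrn(FILE *fp, long rrn, char buf[REC_SIZE])
{
    long offset = rrn * (long)REC_SIZE;      /* byte offset of the record */
    if (fseek(fp, offset, SEEK_SET) != 0)
        return -1;
    if (fread(buf, 1, REC_SIZE, fp) != REC_SIZE)
        return -1;                           /* short read: past end of file */
    return 0;
}

int main(void)
{
    FILE *fp = fopen("records.dat", "rb");   /* hypothetical data file */
    char buf[REC_SIZE];

    if (fp == NULL)
        return EXIT_FAILURE;
    if (read_by_rrn(fp, 5, buf) == 0)        /* fetch the record with RRN 5 */
        printf("first bytes: %.16s\n", buf);
    fclose(fp);
    return EXIT_SUCCESS;
}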
Identifying the Record • Key • Field or set of fields • Canonical • Rule for exact format • All caps • Remove or add ‘-’ in SSN or phone # • Distinct (unique) • Required for primary key • ISBN, SSN, Phone #
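As a small illustration of putting a key into canonical form (not from the slide itself): a sketch that applies the two rules the slide mentions, uppercasing letters and stripping '-' characters. The function name canonical_key and the buffer handling are assumptions.

#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Put a key into canonical form: uppercase all letters and drop '-'
   characters (e.g. for an SSN or phone number).
   Writes at most outsize-1 characters plus a terminating NUL. */
void canonical_key(const char *raw, char *out, size_t outsize)
{
    size_t j = 0;
    for (size_t i = 0; raw[i] != '\0' && j + 1 < outsize; i++) {
        if (raw[i] == '-')
            continue;                      /* strip dashes */
        out[j++] = (char)toupper((unsigned char)raw[i]);
    }
    out[j] = '\0';
}

int main(void)
{
    char key[32];
    canonical_key("555-12-3456", key, sizeof key);
    printf("%s\n", key);                   /* prints 555123456 */
    return 0;
}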
Identifying the Record • Keys come in two main flavors • Primary • Uniquely identifies a single record • Ex: your specific bank account • Secondary • Identifies a group of records • Ex: all bank customers in Turlock • Ex: all bank customers overdrawn
Finding the Record • Two extremes • Direct access • Sequential search • Lots of algorithms in between, but we’ll start with the extremes
Measuring Algorithm Performance • In general we'll count reads (seeks) • "Big O" • Asymptotic upper bound - worst case • g(n) = O(f(n)) means c*f(n) is an upper bound on g(n): there exist constants c > 0 and n0 such that to the right of n0 (i.e., for all n >= n0) g(n) stays below c*f(n) • Draw Picture
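The same definition written out in symbols (a standard statement, added here for reference):

g(n) = O(f(n)) \iff \exists\, c > 0,\ n_0 \ \text{such that}\ g(n) \le c \cdot f(n) \ \text{for all}\ n \ge n_0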
Direct Access • Just go get the record we want • O(1) • No matter how large the file is, we can get the record in one seek • See the previous discussion of using RRN for fixed-length records, or index + RRN for variable-length records
Sequential Access • Go through the records in the file sequentially until we find the one we're looking for • Match on RRN or key • Read one record at a time from disk • O(n), where n is the number of records in the file • I.e., time is proportional to the number of records in the file (average and worst case) • BUT what if we use blocks and read 100 records at a time? • STILL proportional to the number of records in the file (the constant factor shrinks, the growth rate does not)
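A sketch of the O(n) scan (not the course's own code): it reuses the 64-byte fixed-length records assumed above and further assumes the key occupies the first 9 bytes of each record; sequential_search and KEY_LEN are invented names.

#include <stdio.h>
#include <string.h>

#define REC_SIZE 64   /* assumed fixed record length */
#define KEY_LEN  9    /* assumed key: first 9 bytes of each record */

/* Scan the file from the start, one record at a time, until a record
   whose leading KEY_LEN bytes match `key` is found.
   Returns the RRN of the match, or -1 if no record matches. */
long sequential_search(FILE *fp, const char *key)
{
    char buf[REC_SIZE];
    long rrn = 0;

    rewind(fp);
    while (fread(buf, 1, REC_SIZE, fp) == REC_SIZE) {
        if (memcmp(buf, key, KEY_LEN) == 0)
            return rrn;
        rrn++;
    }
    return -1;   /* not found */
}

Called with an open FILE* and a canonicalized key, e.g. sequential_search(fp, "555123456"). Whether the loop pulls one record or a 100-record block per read, the work still grows with n, which is why the slide says the cost is STILL proportional to the number of records.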
Why would we ever do this? • Sequential search can be good when • There are few records • We rarely need to search • We're scanning ASCII files for patterns (e.g., grep) • Lots of records will match a secondary key
Pros and Cons • Sequential search • (+) easy to program • (+) only requires simple file structures • (-) takes too long • Soon we will start looking at ways to get around this and get closer to direct access
Some Miscellaneous Topics • Structure and length • Fixed-length fields (think of the inventory example) • Make sure the record size fits evenly into sectors • Ex: 512-byte sectors • 30-byte records -> increase to 32 bytes • Records never span sectors • More challenging with variable-length fields (and records) • Estimate the longest possible field values (wasted space if too big, truncation/data loss if too small) • Averaging effect • The longest name is unlikely to occur with the longest address in a mailing list
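A tiny helper, purely illustrative, for the 30 -> 32 byte example: round a raw record size up to the next size that divides the sector evenly, so a whole number of records fits per sector and no record spans a sector boundary. pad_record_size and its rounding rule are assumptions, not the textbook's method.

#include <stdio.h>

/* Round raw_size up to the smallest size that divides sector_size
   evenly.  E.g. 30-byte records with 512-byte sectors are padded
   to 32 bytes (16 records per sector). */
unsigned pad_record_size(unsigned raw_size, unsigned sector_size)
{
    unsigned size = raw_size;
    while (size <= sector_size && sector_size % size != 0)
        size++;
    return size;
}

int main(void)
{
    printf("%u\n", pad_record_size(30, 512));   /* prints 32 */
    return 0;
}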
Some Miscellaneous Topics • Distinguishing data from unused space • A length indicator at the beginning of the record • A special delimiter at the end of the data • A count of fields
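A sketch of the first option, a length indicator at the beginning of each record; the 2-byte prefix and the function names are assumptions made for this example.

#include <stdio.h>

/* Write one variable-length record: a 2-byte length prefix followed
   by `len` bytes of data.  The prefix tells a reader exactly how much
   of what follows is real data rather than unused space. */
int write_record(FILE *fp, const char *data, unsigned short len)
{
    if (fwrite(&len, sizeof len, 1, fp) != 1) return -1;
    if (fwrite(data, 1, len, fp) != len)      return -1;
    return 0;
}

/* Read the next record into buf (capacity bufsize).
   Returns the record length, or -1 at end of file / on error. */
long read_record(FILE *fp, char *buf, size_t bufsize)
{
    unsigned short len;
    if (fread(&len, sizeof len, 1, fp) != 1) return -1;
    if (len > bufsize)                       return -1;
    if (fread(buf, 1, len, fp) != len)       return -1;
    return (long)len;
}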
Some Miscellaneous Topics • Header records • Commonly used • At beginning of file • Might contain • # records • Length of records • Date and time of last update • Name of file • Need to be able to distinguish it from data
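One possible header layout, written as a fixed-size struct at byte 0 of the file; every field name and size here is invented for illustration (a real header would also need some way for a reader to distinguish it from the data records that follow).

#include <stdio.h>
#include <time.h>

/* A simple fixed-size header record stored at the start of the file.
   Data records then begin at offset sizeof(FileHeader). */
typedef struct {
    long   record_count;    /* # records currently in the file  */
    long   record_length;   /* length of each fixed-size record */
    time_t last_update;     /* date and time of last update     */
    char   file_name[32];   /* name of the file                 */
} FileHeader;

/* Rewrite the header in place at the start of the file. */
int write_header(FILE *fp, const FileHeader *hdr)
{
    if (fseek(fp, 0L, SEEK_SET) != 0)         return -1;
    if (fwrite(hdr, sizeof *hdr, 1, fp) != 1) return -1;
    return 0;
}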
Some Miscellaneous Topics • Metadata • Data that describes the primary data in the file • Ex: Astronomer with image data generated by telescopes • Mostly interested in the image • Need info about image • Where and when taken • Which telescope • Names of related files/images • Etc.