Learning Objectives

• Field and record organization
• Index files
• C++ code that deals with field and record organization
File Structure Fundamentals (D.H.)

Field and Record Organization
• A field is the basic unit of data organization. It is the smallest logically meaningful unit of information in a file.
• Each field contains a single data value, e.g. last name or balance.
• An aggregate of several different fields is called a record, e.g. last name, first name, and balance.

A Stream File
• Files are written to persistent storage (such as disks) as streams of bytes. Unless special action is taken to prevent it, a record of data such as:
    Mary Ames
    123 Maple
    Stillwater, OK 74075
  will be written to the disk (using the << operator) as:
    Mary Ames123 MapleStillwater, OK 74075

A Stream File - cont.
• A single line of text such as
    Mary Ames123 MapleStillwater, OK 74075
  makes it impossible to extract and display the correct information, because nothing marks where one field ends and the next begins.
• What is the last name in the line above? A program that splits the line at blanks would read Ames123 as the last name and MapleStillwater as the street address. This is obviously NOT correct.
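To see the problem concretely, here is a minimal C++ sketch that uses an in-memory std::ostringstream as a stand-in for a file on disk. The function names writeRecord and readTokens are hypothetical. It writes the three lines of the sample record with << and then reads the bytes back with >>, which splits input only at white space.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Writes the sample record with operator<< and nothing between the lines,
// so no byte marks where one field ends and the next begins.
std::string writeRecord() {
    std::ostringstream out;  // in-memory stand-in for an ofstream
    out << "Mary Ames" << "123 Maple" << "Stillwater, OK 74075";
    return out.str();        // "Mary Ames123 MapleStillwater, OK 74075"
}

// Reads the bytes back with operator>>, which stops only at white space.
std::vector<std::string> readTokens(const std::string& bytes) {
    std::istringstream in(bytes);
    std::vector<std::string> tokens;
    std::string word;
    while (in >> word) tokens.push_back(word);
    return tokens;           // "Mary", "Ames123", "MapleStillwater,", "OK", "74075"
}
```

The second token comes back as Ames123 and the third as MapleStillwater followed by the city's comma, which is exactly the misreading described above.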

Fields and Records
• Thus we need to organize the file in a way that keeps the information divided into fields.
• For example, the file above should be kept as a set of records with the following fields:
  • first name
  • last name
  • street address
  • city
  • state
  • ZIP code

Field Structures
• The four most common ways of structuring fields are:
  • force the fields into a predictable (fixed) length
  • begin each field with a length indicator
  • place a delimiter at the end of each field to separate the fields
  • use a “keyword=value” expression to identify each field and its contents

Fixing the Length of Fields
• This method relies on creating fields of a predictable, fixed size.
• E.g., one may define the following class:
    class Person {
    public:
        char last[11];
        char first[11];
        char address[16];
        char city[16];
        char state[3];
        char zip[10];
    };

Fixing the Length of Fields-cont.
• The size of each character array in the above example is fixed. If a data value requires less space than is reserved, then blank characters (' ') are added to fill the array.
• For example, a string such as “Mary” would have to be padded with six blanks (and terminated with \0) to fill the array char first[11].

Fixing the Length of Fields-cont.
• Disadvantages:
  • a lot of space is wasted by “padding” fields with blanks
  • data values may not fit into the field sizes:
    • e.g., Michalopoulos (13 characters) is too long to fit in the array char last[11]
• Thus the fixed-size field approach is inappropriate for data that inherently varies greatly in length, such as names or addresses.
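A minimal sketch of filling one fixed-length field, assuming C-style char arrays like those in the Person class; the helper name setFixedField is hypothetical. It pads with blanks and a final \0 as described, and it rejects a value that is too long instead of silently truncating it (truncation would be another possible policy).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Copies `value` into a field of N bytes: the text, then blanks up to
// N-1 characters, then '\0' in the last byte. Returns false and leaves
// the field untouched if the value needs more than N-1 characters.
template <std::size_t N>
bool setFixedField(char (&field)[N], const char* value) {
    std::size_t len = std::strlen(value);
    if (len > N - 1) return false;                // e.g. "Michalopoulos" in char[11]
    std::memcpy(field, value, len);               // the data itself
    std::memset(field + len, ' ', N - 1 - len);   // pad with blanks
    field[N - 1] = '\0';                          // terminator
    return true;
}
```

For char first[11], setFixedField(first, "Mary") leaves "Mary" followed by six blanks and \0, while setFixedField(last, "Michalopoulos") returns false.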

Beginning Each Field with a Length Indicator
• This method requires that each field's data be preceded by an indicator of its length (in bytes).
• E.g., 04Ames04Mary09123 Maple10StillWater02OK0574075
• One disadvantage of this method is its added complexity: numbers and strings must be extracted from the single string that represents the record.
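The sketch below encodes and decodes fields in the format of the example above. The two-digit decimal length is an assumption read off that example (it limits a field to 99 bytes); encodeFields and decodeFields are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Writes each field as a two-digit decimal length followed by its bytes.
std::string encodeFields(const std::vector<std::string>& fields) {
    std::string out;
    for (const std::string& f : fields) {
        if (f.size() > 99) throw std::length_error("field longer than 99 bytes");
        out += static_cast<char>('0' + f.size() / 10);  // tens digit
        out += static_cast<char>('0' + f.size() % 10);  // units digit
        out += f;
    }
    return out;
}

// Splits a record back into fields: read two digits, then that many bytes.
std::vector<std::string> decodeFields(const std::string& record) {
    std::vector<std::string> fields;
    std::size_t pos = 0;
    while (pos + 2 <= record.size()) {
        std::size_t len = std::stoul(record.substr(pos, 2));  // the length indicator
        pos += 2;
        if (pos + len > record.size()) throw std::runtime_error("truncated field");
        fields.push_back(record.substr(pos, len));
        pos += len;
    }
    return fields;
}
```

Calling encodeFields({"Ames", "Mary", "123 Maple", "StillWater", "OK", "74075"}) reproduces the example string exactly, and decodeFields recovers the six fields from it.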

Separating Fields with Delimiters
• The method of separating fields with a delimiter is often used. However, choosing the right delimiter is very important.
• In many cases white-space characters (blanks) are excellent delimiters because they provide a clean separation between fields when we list them on the console.
• However, white space would not work as a delimiter in our previous example. Why?
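A minimal sketch of writing and reading delimited fields with std::getline, assuming '|' as the delimiter (the same choice made in the extraction operator on the next slide). writeDelimited and readDelimited are hypothetical names. The writer refuses a field that contains the delimiter; passing ' ' as the delimiter with the field 123 Maple triggers that check.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Writes each field followed by `delim`; a field containing the delimiter
// could not be read back correctly, so it is rejected.
void writeDelimited(std::ostream& out, const std::vector<std::string>& fields, char delim) {
    for (const std::string& f : fields) {
        if (f.find(delim) != std::string::npos)
            throw std::invalid_argument("field contains the delimiter: " + f);
        out << f << delim;
    }
}

// Reads up to `count` fields, each ending at `delim`.
std::vector<std::string> readDelimited(std::istream& in, std::size_t count, char delim) {
    std::vector<std::string> fields;
    std::string f;
    while (fields.size() < count && std::getline(in, f, delim)) fields.push_back(f);
    return fields;
}
```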

Overloading a stream extraction operator to read a file with delimiters
    #include <cstring>
    #include <istream>
    using namespace std;

    istream & operator >> (istream & streamF, Person & p)
    { // read delimited fields; sizeof keeps each read inside its array
        streamF.getline(p.last, sizeof(p.last), '|');
        if (strlen(p.last) == 0) return streamF;    // no more records
        streamF.getline(p.first, sizeof(p.first), '|');
        streamF.getline(p.address, sizeof(p.address), '|');
        streamF.getline(p.city, sizeof(p.city), '|');
        streamF.getline(p.state, sizeof(p.state), '|');
        streamF.getline(p.zip, sizeof(p.zip), '|');
        return streamF;
    }

Using a “keyword = value” expression
• This method requires that each field's data be preceded by the field identifier (keyword).
• E.g., last=Amesfirst=Maryaddress=123 Maplecity=StillWaterstate=OKzip=74075
• It can be combined with the delimiter method to mark the field ends:
    last=Ames|first=Mary|address=123 Maple|city=StillWater|state=OK|zip=74075

Using a “keyword = value” expression-cont.
• Advantages:
  • each field provides information about itself
  • it is a good format for dealing with missing fields
• Disadvantages:
  • in some applications, a lot of space may be wasted on field keywords (up to 50%)
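A possible parser for the delimited keyword=value form, assuming '|' between fields and the first '=' in each field separating the keyword from the value; parseKeywordRecord is a hypothetical name. Because each field names itself, a missing field is simply absent from the resulting std::map, which is the advantage listed above.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Parses "keyword=value|keyword=value|..." into a keyword -> value map.
std::map<std::string, std::string> parseKeywordRecord(const std::string& record) {
    std::map<std::string, std::string> fields;
    std::istringstream in(record);
    std::string item;
    while (std::getline(in, item, '|')) {
        std::string::size_type eq = item.find('=');
        if (eq == std::string::npos) continue;   // not a keyword=value pair; skip it
        fields[item.substr(0, eq)] = item.substr(eq + 1);
    }
    return fields;
}
```

A record such as last=Ames|zip=74075 parses into a map with two entries; a lookup for first simply finds nothing.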

Record Structures
• Files may be viewed as collections of records, which are sets of fields.
• Some of the most often used methods for organizing the records of a file are:
  • require that the records be a predictable (fixed) number of bytes in length
  • require that the records be a predictable number of fields in length

Organizing the Records of a File
  • begin each record with a length indicator (a count of the bytes in the record)
  • use a second file to keep track of the beginning byte address of each record
  • place a delimiter at the end of each record to separate it from the next record

Fixed-Length Records
• This method is the counterpart of the method for organizing files with fixed-length fields.
• Fixing the sizes of the fields in a record produces a fixed-size record.

Fixed-Length Records-cont.
• E.g., the class
    class Person {
    public:
        char last[11];
        char first[11];
        char address[16];
        char city[16];
        char state[3];
        char zip[10];
    };
  produces a fixed-size record of 67 bytes (11 + 11 + 16 + 16 + 3 + 10).
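Fixed-length records make direct access simple: record n begins at byte n × 67. The sketch below reuses the slide's Person layout and std::istream::seekg; the function names readRecord and writeRecord are hypothetical, and the static_assert guards the assumption that the compiler adds no padding to a struct made only of char arrays.

```cpp
#include <cassert>
#include <cstring>
#include <ios>
#include <istream>
#include <ostream>
#include <sstream>

// Same layout as the Person class above: char arrays only, so no padding.
struct Person {
    char last[11];
    char first[11];
    char address[16];
    char city[16];
    char state[3];
    char zip[10];
};
static_assert(sizeof(Person) == 67, "expected a 67-byte record");

// Appends one record as its raw 67 bytes.
void writeRecord(std::ostream& out, const Person& p) {
    out.write(reinterpret_cast<const char*>(&p), sizeof(Person));
}

// Jumps straight to record n (byte n * 67) without reading the records before it.
bool readRecord(std::istream& in, long n, Person& p) {
    std::streamoff offset = static_cast<std::streamoff>(n) * static_cast<std::streamoff>(sizeof(Person));
    in.seekg(offset, std::ios::beg);
    return static_cast<bool>(in.read(reinterpret_cast<char*>(&p), sizeof(Person)));
}
```

With a real file, the stream would be opened in binary mode so that no bytes are translated on the way to or from the disk.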

Fixed-Length Records-cont.
• A fixed-length record structure, however, does NOT imply a fixed-length field structure.
• Fixed-length records are frequently used as “containers” to hold variable numbers of variable-length fields.
• Fixed-length record structures are among the most commonly used methods for organizing files.

Records with a Predictable Number of Fields
• This method specifies the number of fields in each record.
• Regardless of how the fields themselves are stored, this approach makes it relatively easy to find record boundaries: a record ends once the specified number of fields has been read.
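A sketch of finding record boundaries by counting fields, assuming six '|'-terminated fields per record and no separator between records; splitIntoRecords and kFieldsPerRecord are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

const std::size_t kFieldsPerRecord = 6;  // last, first, address, city, state, zip

// The input is one long run of '|'-terminated fields. Since every record has
// exactly six fields, every sixth field closes a record.
std::vector<std::vector<std::string>> splitIntoRecords(std::istream& in) {
    std::vector<std::vector<std::string>> records;
    std::vector<std::string> current;
    std::string field;
    while (std::getline(in, field, '|')) {
        current.push_back(field);
        if (current.size() == kFieldsPerRecord) {
            records.push_back(current);
            current.clear();
        }
    }
    return records;  // a trailing incomplete record is ignored
}
```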

Records with a Length Indicator
• This method requires that each record begin with a length indicator.
• This method is commonly used for handling variable-length records.

Index File to Keep Track of Record Addresses
• This method uses an index file (or an index block) to keep a byte offset for each record in the original data file.
• The byte offsets (record addresses) allow us to find the beginning of each successive record and to compute the length of each record.
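A minimal sketch of building an index while writing, using tellp() to capture each record's starting offset and seekg() to jump back to it. Here the index is kept in a std::vector rather than written to a second file, and it stores one extra entry (the end offset of the last record) so that every record's length is the difference of two neighboring offsets; both choices are assumptions. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <ios>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Writes the records and returns their starting byte offsets plus one end offset.
std::vector<std::streamoff> writeWithIndex(std::ostream& data,
                                           const std::vector<std::string>& records) {
    std::vector<std::streamoff> index;
    for (const std::string& r : records) {
        index.push_back(static_cast<std::streamoff>(data.tellp()));  // where this record begins
        data << r;
    }
    index.push_back(static_cast<std::streamoff>(data.tellp()));      // end of the last record
    return index;
}

// Fetches record k: seek to its offset; its length is the gap to the next offset.
std::string readByIndex(std::istream& data, const std::vector<std::streamoff>& index,
                        std::size_t k) {
    std::streamoff length = index[k + 1] - index[k];
    std::string record(static_cast<std::size_t>(length), '\0');
    data.seekg(index[k], std::ios::beg);
    data.read(&record[0], length);
    return record;
}
```

For the records Ames|Mary|, Mason|Alan| and Kellogg|Bill|, the index holds the offsets 0, 10, 21 and 34, so record 2 is read as the 13 bytes starting at offset 21.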

Questions
• What is the byte offset (or just offset) in a file?
• What C++ functions use the byte offset, and what is their purpose?

Records Separated with Delimiters
• This method is analogous to the use of delimiters to separate fields.
• As with fields, the delimiter must be well chosen, and it cannot appear in the data.
• A common delimiter is the end-of-line character '\n', since records are often listed directly on the console.
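A sketch of reading newline-delimited records whose fields end with '|': std::getline with its default '\n' delimiter finds each record, and a second std::getline pass splits that record into fields. readLineRecords is a hypothetical name, and skipping blank lines is an assumption.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// One record per line; fields within a record end with '|'.
std::vector<std::vector<std::string>> readLineRecords(std::istream& in) {
    std::vector<std::vector<std::string>> records;
    std::string line;
    while (std::getline(in, line)) {             // '\n' marks the record boundary
        if (line.empty()) continue;              // skip blank lines
        std::istringstream fieldStream(line);
        std::vector<std::string> fields;
        std::string field;
        while (std::getline(fieldStream, field, '|')) fields.push_back(field);
        records.push_back(fields);
    }
    return records;
}
```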

A Record Structure that Uses a Length Indicator
• Use a memory buffer to store the data that is going to be written to the disk.
• Write the size of the record at its beginning.
• Write the buffer contents after the size.

Sample C++ Code that Writes Records with a Length Indicator
    #include <cstring>
    #include <ostream>
    using namespace std;

    const int MaxBufferSize = 200;

    int writePerson (ostream & streamF, Person & p)
    {
        char buffer[MaxBufferSize];                 // build the record in memory
        strcpy(buffer, p.last);    strcat(buffer, "|");
        strcat(buffer, p.first);   strcat(buffer, "|");
        strcat(buffer, p.address); strcat(buffer, "|");
        strcat(buffer, p.city);    strcat(buffer, "|");
        strcat(buffer, p.state);   strcat(buffer, "|");
        strcat(buffer, p.zip);     strcat(buffer, "|");
        short length = static_cast<short>(strlen(buffer));
        // length indicator first, then the record itself
        streamF.write(reinterpret_cast<const char *>(&length), sizeof(length));
        streamF.write(buffer, length);
        return 1;
    }

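Reading the Variable-Length Records from the File
    #include <istream>
    #include <sstream>   // istringstream replaces the deprecated istrstream
    using namespace std;

    int readPerson (istream & streamF, Person & p)
    {
        short length;
        if (!streamF.read(reinterpret_cast<char *>(&length), sizeof(length)))
            return 0;                               // no more records
        char * buffer = new char[length + 1];
        streamF.read(buffer, length);
        buffer[length] = 0;                         // terminate the record string
        istringstream strbuff(buffer);              // keeps its own copy of the data
        delete [] buffer;
        strbuff >> p;                               // operator>> for delimited fields
        return 1;
    }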
