CIS 402: File Management Techniques Chapter 4. Fundamental File Structure Concepts & Managing Files of Records. Outline I: Fundamental File Structure Concepts. Stream Files Field Structures Reading a Stream of Fields Record Structures Record Structures that use a length indicator.
CIS 402: File Management Techniques • Chapter 4: Fundamental File Structure Concepts & Managing Files of Records
Outline I: Fundamental File Structure Concepts • Stream Files • Field Structures • Reading a Stream of Fields • Record Structures • Record Structures that use a length indicator
Field and Record Organization Overview • Goal: Make data persistent • Data written by one program can be read by another, elsewhere • The basic logical unit of data is the field • a single data value. • Fields are organized into aggregates, either: • many copies of a single field (an array) • a list of different fields (a record). • When a record is stored in memory, it is called an object • its fields as members. • There are many ways that objects can be represented as records in files.
Stream Files • In stream files, the information is written as a stream of bytes containing no added information • Here is the info for a State: • OHIO108471157264.9413303531180317COLUMBUS • (on the slide, alternating fields are set in bold) • In the file, it is just a bunch of bytes • Problem: There is no way to get the information back in the organized record format • Need an organization method • Project 1: Indexing
Field Structures
• There are many ways of adding structure to text or binary files to maintain the identity of fields:
• Force each field into a predictable length
• e.g. all strings exactly 14 characters, pop 8 (text), etc.
• Field widths: Name: 14, Pop: 8, Rank: 2, Density: 6, Area: 6, Rank: 2, Admission: 8 total, Order: 2, Capital: 14
• OHIO 10847115 7 264.9 4133035 3 1180317COLUMBUS
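A minimal sketch of fixed-length fields, assuming left-aligned text padded with blanks. The function names `packFixed` and `unpackFixed` are hypothetical and do not come from the course code; the slide's own example appears to right-align numeric fields, which this sketch does not attempt.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: force a value into exactly `width` characters.
// Values that are too long are truncated; short ones are blank-padded on the right.
std::string packFixed(const std::string& value, std::size_t width) {
    std::string field = value.substr(0, width);
    field.append(width - field.size(), ' ');
    return field;
}

// Hypothetical helper: extract the field that starts at `offset` and is
// `width` characters wide, then strip the trailing blank padding.
std::string unpackFixed(const std::string& record, std::size_t offset, std::size_t width) {
    std::string field = record.substr(offset, width);
    std::size_t last = field.find_last_not_of(' ');
    return last == std::string::npos ? std::string() : field.substr(0, last + 1);
}
```

Because every field has a known width, a reader can jump straight to any field by adding up the widths before it, e.g. `unpackFixed(record, 14, 8)` for the population in the layout above. The cost is wasted space for short values and truncation of long ones.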
Field Structures • Begin each field with a length indicator (length-based fields) • instead of value in index • can be done with a char (byte) if all fields are less than 256 chars ♫OHIO◘10847115☺7♣264.9♣41330☻35☺3☺1♦1803☻17◘COLUMBUS • Read the length, read the field; can skip fields by placing '\0' as length
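The "read the length, read the field" rule can be sketched as follows, assuming a one-byte length indicator and a `std::string` used as the byte buffer. `packLengthField` and `unpackLengthField` are hypothetical names chosen for illustration.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: append one field as <1-byte length><field bytes>.
// Returns false if the field cannot be described by a single length byte.
bool packLengthField(std::string& buffer, const std::string& value) {
    if (value.size() > 255) return false;
    buffer.push_back(static_cast<char>(static_cast<unsigned char>(value.size())));
    buffer.append(value);
    return true;
}

// Hypothetical helper: read the field that starts at `pos` and advance `pos`
// past it. Returns false when no complete field remains in the buffer.
bool unpackLengthField(const std::string& buffer, std::size_t& pos, std::string& value) {
    if (pos >= buffer.size()) return false;
    std::size_t len = static_cast<unsigned char>(buffer[pos]);
    if (pos + 1 + len > buffer.size()) return false;
    value = buffer.substr(pos + 1, len);
    pos += 1 + len;
    return true;
}
```

A missing field is simply a length byte of zero, which is what the slide means by placing '\0' as the length.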
Field Structures • Use delimiters • Each field separated by same delimiter • must account for all • can use consecutive delimiters for missing field • choice of delimiter must not interfere with data • For state, don’t use blank, since some states incorporate blanks in names/capitals
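A small sketch of reading delimited fields, assuming every field (including the last) is terminated by the delimiter and that '|' never occurs in the data. The function name `splitFields` is hypothetical.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: split a record into fields, each terminated by `delim`.
// Consecutive delimiters yield an empty string, which marks a missing field.
// Text after the last delimiter is treated as an incomplete field and dropped.
std::vector<std::string> splitFields(const std::string& record, char delim) {
    std::vector<std::string> fields;
    std::string current;
    for (char c : record) {
        if (c == delim) {
            fields.push_back(current);
            current.clear();
        } else {
            current.push_back(c);
        }
    }
    return fields;
}
```

With '|' as the delimiter, `splitFields("NEW YORK||ALBANY|", '|')` keeps the blank inside NEW YORK intact and reports the middle field as missing, which is why a blank would be a poor delimiter for state data.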
Field Structures • Use a “keyword = value” expression to identify each field and its content. • uses delimiters still, but also names each field by keyword • wastes a lot of space • allows for some records with missing fields
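One way to read "keyword = value" fields is sketched below, assuming '|' separates the expressions and '=' separates keyword from value. The keywords NAME and CAPITAL and the function `parseKeywordRecord` are illustrative assumptions, not part of the course code.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper: parse "KEY=value" expressions separated by `delim`
// into a map. Expressions without '=' are ignored; fields that are absent
// from the record are simply absent from the map.
std::map<std::string, std::string> parseKeywordRecord(const std::string& record, char delim) {
    std::map<std::string, std::string> fields;
    std::size_t start = 0;
    while (start < record.size()) {
        std::size_t end = record.find(delim, start);
        if (end == std::string::npos) end = record.size();
        std::string item = record.substr(start, end - start);
        std::size_t eq = item.find('=');
        if (eq != std::string::npos) {
            fields[item.substr(0, eq)] = item.substr(eq + 1);
        }
        start = end + 1;
    }
    return fields;
}
```

The record `NAME=OHIO|CAPITAL=COLUMBUS|` spends 13 of its 27 bytes on keywords and '=' signs, which illustrates the space cost noted above; in exchange, fields may appear in any order or be left out.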
Reading a Stream of Fields • Define a stream extraction operator to read fields • For a state, this would be 2 strings and numeric data • Adapts well to delimited storage • Not language dependent: the file can be written in one language in one place and read in another language elsewhere
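A stream extraction operator for delimited fields might look like the sketch below. `StateRecord` is a hypothetical, trimmed-down stand-in for the course's State class (one numeric field instead of several), and '|' is assumed as the delimiter.

```cpp
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical, reduced record: two strings and one numeric field.
struct StateRecord {
    std::string name;
    std::string capital;
    long population = 0;
};

// Read one record whose fields are stored as name|population|capital|.
// The stream is left in a failed state if a field is missing or the
// population is not a number.
std::istream& operator>>(std::istream& in, StateRecord& s) {
    std::string pop;
    if (std::getline(in, s.name, '|') &&
        std::getline(in, pop, '|') &&
        std::getline(in, s.capital, '|')) {
        try {
            s.population = std::stol(pop);
        } catch (const std::exception&) {
            in.setstate(std::ios::failbit);
        }
    }
    return in;
}
```

Usage follows the normal stream idiom: `while (file >> state) { ... }` reads records until the fields run out. Since the file holds only text and delimiters, a program in another language can read it with its own equivalent of `getline`.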
Record Structure I • A record can be defined as a set of fields that belong together when the file is viewed in terms of a higher level of organization. • Like the notion of a field, a record is another conceptual tool which need not exist in the file in any physical sense. • A record is an important logical building block of a file’s structure.
Record Structures II • Methods for organizing the records of a file include: • Requiring that records be a predictable number of bytes in length. • Requiring that records be a predictable number of fields in length. • Begin each record with a length indicator consisting of a count of the number of bytes that the record contains. • Delimited fields • String Stream buffers; all data written as chars • Use a second file to keep track of the beginning byte address for each record. Indexing!! • Is the second file necessary? • Placing a delimiter at the end of each record to separate it from the next record.
Buffer Classes • The notion of records that we implemented is lacking something: none of the variability in the length of records that was inherent in the initial stream file was conserved. • Implementation: Length at start, delimited fields • Writing the variable-length records to the file • Representing the record length • Reading the variable-length record from the file.
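The "length at start, delimited fields" scheme can be sketched with two free functions. This assumes the record length is stored as a 2-byte binary integer in the machine's native byte order; that is one possible representation, not necessarily the one used in the course code, and `writeRecord` and `readRecord` are hypothetical names.

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: write <2-byte length><record bytes>.
bool writeRecord(std::ostream& out, const std::string& record) {
    if (record.size() > UINT16_MAX) return false;   // length must fit in 2 bytes
    std::uint16_t len = static_cast<std::uint16_t>(record.size());
    out.write(reinterpret_cast<const char*>(&len), sizeof len);
    out.write(record.data(), len);
    return out.good();
}

// Hypothetical helper: read the length, then exactly that many bytes.
// Returns false at end of file or if the record is cut short.
bool readRecord(std::istream& in, std::string& record) {
    std::uint16_t len = 0;
    if (!in.read(reinterpret_cast<char*>(&len), sizeof len)) return false;
    record.resize(len);
    if (len > 0 && !in.read(&record[0], len)) return false;
    return true;
}
```

Each record keeps its natural length, and the reader never has to scan for a record delimiter. A binary length field is compact, but it ties the file to one byte order; writing the length as text digits would avoid that at the cost of a few bytes.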
Read-file using File Dump
• A file dump gives us the ability to look inside a file at the actual bytes that are stored
• Octal Dump: od -xc filename
• e.g. The number 40, stored as ASCII characters and as a short integer

                                      Decimal value   Hex value stored   ASCII
                                      of number       in bytes           character form
  (a) 40 stored as ASCII chars:       40              34 30              '4' '0'
  (b) 40 stored as a 2-byte integer:  40              00 28              '\0' '('
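The same comparison can be reproduced in a program. The sketch below uses a hypothetical helper, `hexDump`, that prints the bytes of any object in hex. Note that the byte order of the 2-byte integer depends on the machine: a little-endian CPU stores 40 as 28 00 rather than the 00 28 shown above.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: return the bytes at `data` as space-separated hex pairs.
std::string hexDump(const void* data, std::size_t size) {
    const unsigned char* bytes = static_cast<const unsigned char*>(data);
    std::string out;
    char pair[4];
    for (std::size_t i = 0; i < size; ++i) {
        std::snprintf(pair, sizeof pair, "%02x", static_cast<unsigned>(bytes[i]));
        if (i > 0) out += ' ';
        out += pair;
    }
    return out;
}

// Example calls (results depend on byte order for the integer case):
//   hexDump("40", 2)            -> the two ASCII characters '4' and '0'
//   std::int16_t n = 40;
//   hexDump(&n, sizeof n)       -> the two bytes of the binary integer
```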
Using Classes to Manipulate Buffers • Examples of three C++ classes that encapsulate the operations of buffer objects • Functions: Pack, Unpack, Read, Write • Output: pack into a buffer & write a buffer to a file • Input: read into a buffer from a file & unpack a buffer • Pack and Unpack each deal with only one field • DelimTextBuffer class for delimited fields • LengthTextBuffer class for length-based fields • FixedTextBuffer class for fixed-length fields • Appendix E: Full implementation
Buffer Class for Delimited Text Fields (1)
• Variable-length buffer
• Fields are represented as delimited text

class DelimTextBuffer
{ public:
    DelimTextBuffer (char Delim = '|', int maxBytes = 1000);
    int Read (istream & file);
    int Write (ostream & file) const;
    int Pack (const char * str, int size = -1);
    int Unpack (char * str);
  private:
    char Delim;      // delimiter character
    char * Buffer;   // character array to hold field values
    int BufferSize;  // current size of packed fields
    int MaxBytes;    // maximum # of characters in the buffer
    int NextByte;    // packing/unpacking position in buffer
};
Buffer Class for Delimited Text Fields (2)

int DelimTextBuffer::Unpack (char * str)
// extract the value of the next field of the buffer
{
    int len = -1;          // length of packed string
    int start = NextByte;  // first character to be unpacked
    for (int i = start; i < BufferSize; i++)
        if (Buffer[i] == Delim)
            { len = i - start; break; }
    if (len == -1) return FALSE;  // delimiter not found
    NextByte += len + 1;
    if (NextByte > BufferSize) return FALSE;
    strncpy (str, &Buffer[start], len);
    str[len] = 0;  // zero termination for string
    return TRUE;
}

• Unpack() extracts one field from a record in a buffer.
• Pack() copies the characters of its argument to the buffer and then adds the delimiter character.
Buffer Class for Delimited Text Fields (3)

int DelimTextBuffer::Pack (const char * str, int size)
// set the value of the next field of the buffer;
// if size = -1 (default) use strlen(str) as the length of the field
{
    short len;  // length of string to be packed
    if (size >= 0) len = size;
    else len = strlen (str);  // C-string length fn: # chars up to \0
    if (len > strlen (str))   // str is too short!
        return FALSE;
    int start = NextByte;     // first character to be packed
    NextByte += len + 1;
    if (NextByte > MaxBytes) return FALSE;
    memcpy (&Buffer[start], str, len);
    Buffer[start + len] = Delim;  // add delimiter
    BufferSize = NextByte;
    return TRUE;
}

• Pack() copies the characters of its argument to the buffer and then adds the delimiter character.
Buffer Class for Delimited Text Fields (4)
• Read method of DelimTextBuffer
• Clears the current buffer contents
• Extracts the record size
• Reads the proper number of bytes into the buffer
• Sets the buffer size

int DelimTextBuffer::Read (istream & stream)
{
    Clear ();
    stream.read ((char *)&BufferSize, sizeof(BufferSize));
    if (stream.fail ()) return FALSE;
    if (BufferSize > MaxBytes) return FALSE;  // buffer overflow
    stream.read (Buffer, BufferSize);
    return stream.good ();
}
Extending Class State with Buffer Operations

class State
{ public:
    // function …
    int Pack (DelimTextBuffer & buf) const;  // buffer operation Pack
    ...
  private:
    char name[15];
    char capital[15];  // longest possible is 14 + 1 for null terminator
    int pop, poprank, area, areaRank …
    float popDensity;
    ...
};

int State::Pack (DelimTextBuffer & buf) const
{ // pack the fields into a DelimTextBuffer
    int result;
    result = buf.Pack (name);
    result = result && buf.Pack (capital);
    …
    return result = result && buf.Pack (popDensity);
}

• buf.Pack deals with only one field (the next one)!
• it is used repeatedly
• each argument must be passed as a C-string
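Since Pack takes a C-string, numeric members such as pop and popDensity have to be converted to text before they are packed. A minimal sketch of that conversion follows; the helper name `fieldText` is hypothetical and is not part of the State class shown above.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: format an integer field as text.
std::string fieldText(long value) {
    char text[32];
    std::snprintf(text, sizeof text, "%ld", value);
    return text;
}

// Hypothetical helper: format a floating-point field with a fixed number
// of decimals. Very large values are truncated to fit the local array.
std::string fieldText(double value, int decimals) {
    char text[64];
    std::snprintf(text, sizeof text, "%.*f", decimals, value);
    return text;
}

// Assumed usage inside State::Pack (not compiled here):
//   result = result && buf.Pack(fieldText(pop).c_str());
//   result = result && buf.Pack(fieldText(popDensity, 1).c_str());
```

Unpacking reverses the step: unpack into a character array, then convert with a function such as `std::strtol` or `std::strtod`.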
Buffer Class for Delimited Text Fields (Reprise)

class DelimTextBuffer
{ public:
    DelimTextBuffer (char Delim = '|', int maxBytes = 1000);
    int Read (istream & file);
    int Write (ostream & file) const;
    int Pack (const char * str, int size = -1);
    int Unpack (char * str);
  private:
    char Delim;      // delimiter character
    char * Buffer;   // character array to hold field values
    int BufferSize;  // current size of packed fields
    int MaxBytes;    // maximum # of characters in the buffer
    int NextByte;    // packing/unpacking position in buffer
};
Buffer Classes for Length-Based Fields
• Almost the same as the delimited field class (compare with the previous page)
• The change is in the implementations of Pack and Unpack

class LengthTextBuffer
{ public:
    LengthTextBuffer (int maxBytes = 1000);
    int Read (istream & file);
    int Write (ostream & file) const;
    int Pack (const char * field, int size = -1);
    int Unpack (char * field);
  private:
    char * Buffer;   // character array to hold field values
    int BufferSize;  // size of packed fields
    int MaxBytes;    // maximum # of characters in the buffer
    int NextByte;    // packing/unpacking position in buffer
};
Buffer Classes for Fixed-length Fields

class FixedTextBuffer
{ public:
    FixedTextBuffer (int maxBytes = 1000);
    int AddField (int fieldSize);
    int Read (istream & file);
    int Write (ostream & file) const;
    int Pack (const char * field);
    int Unpack (char * field);
  private:
    char * Buffer;     // character array to hold field values
    int BufferSize;    // size of packed fields
    int MaxBytes;      // max # of chars in the buffer
    int NextByte;      // packing/unpacking position in buffer
    int * FieldSizes;  // array of field sizes
};
Inheritance in the C++ Stream Classes

class istream: virtual public ios { …
class ostream: virtual public ios { …
class iostream: public istream, public ostream { …
class ifstream: public fstreambase, public istream { …
class ofstream: public fstreambase, public ostream { …
class fstream: public fstreambase, public iostream { …

• Operations that work on base class objects also work on derived class objects
Class Hierarchy for Record Buffer Objects (1) • Inheritance allows multiple classes to share members and functions • [Figure: class hierarchy diagram, with annotations "array only", "read/write" and "pack/unpack"] • Appendix F: full implementation; on website (SourceCode)
Class Hierarchy for Record Buffer Objects: Top-level class

class IOBuffer
{ public:
    IOBuffer (int maxBytes = 1000);  // a MAX of maxBytes
    // pure virtual functions
    virtual int Read (istream &) = 0;         // read a buffer
    virtual int Write (ostream &) const = 0;  // write a buffer
    virtual int Pack (const void * field, int size = -1) = 0;
    virtual int Unpack (void * field, int maxbytes = -1) = 0;
  protected:  // common to all subclasses
    char * Buffer;   // character array to hold field values
    int BufferSize;  // sum of the sizes of packed fields
    int MaxBytes;    // MAX # of characters in the buffer
};
Class Hierarchy for Record Buffer Objects (not all shown)

2nd Level

class VariableLengthBuffer: public IOBuffer
{ public:
    VariableLengthBuffer (int MaxBytes = 1000);
    int Read (istream &);
    int Write (ostream &) const;
    int SizeOfBuffer () const;  // return current size of buffer
};

3rd Level

class DelimFieldBuffer: public VariableLengthBuffer
{ public:
    DelimFieldBuffer (char Delim = -1, int maxBytes = 1000);
    int Pack (const void *, int size = -1);
    int Unpack (void * field, int maxBytes = -1);
  protected:
    char Delim;
};
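To show why the hierarchy is useful, here is a reduced sketch in which code written against an abstract base class works with either field format. All names (`FieldBuffer`, `DelimFields`, `LengthFields`, `packNameAndCapital`) are hypothetical, `std::string` replaces the character array, and the Read/Write level is left out, so this is an illustration of the idea rather than the Appendix F implementation.

```cpp
#include <cstddef>
#include <string>

// Hypothetical abstract base: field packing is left to the subclasses.
class FieldBuffer {
public:
    virtual ~FieldBuffer() = default;
    virtual bool Pack(const std::string& field) = 0;
    virtual bool Unpack(std::string& field) = 0;
    const std::string& Bytes() const { return bytes_; }
protected:
    std::string bytes_;      // packed fields
    std::size_t next_ = 0;   // unpacking position
};

// Fields terminated by a delimiter character.
class DelimFields : public FieldBuffer {
public:
    explicit DelimFields(char delim = '|') : delim_(delim) {}
    bool Pack(const std::string& field) override {
        if (field.find(delim_) != std::string::npos) return false;  // would corrupt the record
        bytes_ += field;
        bytes_ += delim_;
        return true;
    }
    bool Unpack(std::string& field) override {
        std::size_t end = bytes_.find(delim_, next_);
        if (end == std::string::npos) return false;
        field = bytes_.substr(next_, end - next_);
        next_ = end + 1;
        return true;
    }
private:
    char delim_;
};

// Fields preceded by a one-byte length.
class LengthFields : public FieldBuffer {
public:
    bool Pack(const std::string& field) override {
        if (field.size() > 255) return false;
        bytes_ += static_cast<char>(static_cast<unsigned char>(field.size()));
        bytes_ += field;
        return true;
    }
    bool Unpack(std::string& field) override {
        if (next_ >= bytes_.size()) return false;
        std::size_t len = static_cast<unsigned char>(bytes_[next_]);
        if (next_ + 1 + len > bytes_.size()) return false;
        field = bytes_.substr(next_ + 1, len);
        next_ += 1 + len;
        return true;
    }
};

// Written once against the base class; works for any derived buffer.
bool packNameAndCapital(FieldBuffer& buf, const std::string& name, const std::string& capital) {
    return buf.Pack(name) && buf.Pack(capital);
}
```

The calling code never names a concrete buffer type, so switching a file from delimited to length-based fields means changing only the object that is constructed.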
Managing Fixed-Length, Fixed-Field Buffers

3rd Level

class FixedFieldBuffer: public FixedLengthBuffer
{ public:
    FixedFieldBuffer (int maxFields, int RecordSize = 1000);
    FixedFieldBuffer (int maxFields, int * fieldSize);
    int AddField (int fieldSize);  // define the next field
    int Pack (const void * field, int size = -1);
    int Unpack (void * field, int maxBytes = -1);
    int NumberOfFields () const;   // return # of defined fields
  protected:
    int * FieldSize;  // array to hold field sizes
    int MaxFields;    // MAX # of fields
    int NumFields;    // actual # of defined fields
};
Object-Oriented Class for Record Files
• Now, encapsulate all file operations
• BUT, each BufferFile has to have a header (Section 5.2) so info can be gathered by the BufferFile

class BufferFile  // file with buffers
{ public:
    BufferFile (IOBuffer &);  // create with a buffer
    int Open (char * fname, int MODE);    // open an existing file
    int Create (char * fname, int MODE);  // create a new file
    int Close ();
    int Rewind ();  // reset to the first data record
    // Input and Output operations
    int Read (int recaddr = -1);
    int Write (int recaddr = -1);
    int Append ();  // write the current buffer at the end of file
  protected:
    IOBuffer & Buffer;  // reference to the file's buffer
    fstream File;       // the C++ stream of the file
};

Usage:
    DelimFieldBuffer buffer;
    BufferFile file (buffer);
    file.Open (myfile);
    file.Read ();
    buffer.Unpack (myobject);