Chap 5. Managing Files of Records


Chapter Objectives
• Extend the file structure concepts of Chapter 4:
  • Search keys and canonical forms
  • Sequential search and direct access
  • File access and file organization
• Examine other kinds of file structures in terms of:
  • Abstract data models
  • Metadata
  • Object-oriented file access
  • Extensibility
• Examine issues of portability and standardization.

Contents
5.1 Record Access
5.2 More about Record Structures
5.3 Encapsulating Record I/O Ops in a Single Class
5.4 File Access and File Organization
5.5 Beyond Record Structures
5.6 Portability and Standardization

Record Access
• Record key
  • Canonical form: a standard form of a key
    • e.g. Ames, ames, and AMES denote the same key, so keys must be converted to one form before comparison
  • Distinct keys: uniquely identify a single record
    • Primary keys, secondary keys, candidate keys
    • Primary keys should be dataless (they should not carry data that may need updating)
    • Primary keys should be unchanging
    • Social security number: seemingly a good primary key
      • but some institutions assign 999-99-9999 to all non-registered aliens, so in practice it is not unique
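Converting a key to its canonical form can be sketched as a small normalization function. This is a minimal sketch that assumes the canonical form is "upper case with surrounding blanks removed"; the function name `CanonicalKey` is hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: convert a key to canonical form by stripping
// leading/trailing blanks and folding letters to upper case, so that
// "Ames", "ames", and " AMES " all become "AMES".
std::string CanonicalKey(const std::string& key) {
    std::string::size_type first = key.find_first_not_of(' ');
    if (first == std::string::npos) return "";      // key was all blanks
    std::string::size_type last = key.find_last_not_of(' ');
    std::string result = key.substr(first, last - first + 1);
    for (char& c : result)
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    return result;
}
```

Comparing `CanonicalKey(a) == CanonicalKey(b)` rather than `a == b` lets the file treat differently typed versions of the same key as one key.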

Sequential Search (1)
• O(n), n: the number of records
• Use record blocking
  • A block holds several records
  • fields < records < blocks
  • Still O(n), but blocking decreases the number of seeks
• e.g. 4000 records, each 512 bytes long
  • Unblocked (sector-sized buffers): 512-byte buffer ⇒ on average 2000 READ() calls
  • Blocked (16 records/block): 8K buffer ⇒ 4000 / 16 = 250 blocks, so on average 125 READ() calls
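The READ() counts above can be checked with a short calculation. This sketch assumes, as the slide does, that a successful sequential search reads about half of the file on average; `AverageReads` is a hypothetical name.

```cpp
#include <cassert>

// Average number of READ() calls for a successful sequential search,
// assuming about half of the file is read on average.
// recordsPerRead = 1 models unblocked access (one record per read);
// recordsPerRead = 16 models 8K blocks of 512-byte records.
long AverageReads(long numRecords, long recordsPerRead) {
    // total reads needed to scan the whole file (round up for a partial block)
    long totalReads = (numRecords + recordsPerRead - 1) / recordsPerRead;
    return totalReads / 2;
}
```

`AverageReads(4000, 1)` gives 2000 and `AverageReads(4000, 16)` gives 125: the search is still O(n), but blocking divides the number of reads (and seeks) by the blocking factor.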

Sequential Search (2)
• UNIX tools for sequential processing
  • cat, wc, grep
• When sequential search is useful
  • Searching for patterns in ASCII files
  • Processing files with few records
  • Searching for records with a certain secondary key value
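A sequential search in the spirit of `grep -c` reads every record (here, every line) and counts the ones that contain a pattern. This is a minimal sketch that assumes one record per line; `CountMatchingLines` is a hypothetical name.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Read the stream one line (record) at a time and count the lines that
// contain the pattern. Every record is examined, so the cost is O(n).
int CountMatchingLines(std::istream& in, const std::string& pattern) {
    int count = 0;
    std::string line;
    while (std::getline(in, line)) {
        if (line.find(pattern) != std::string::npos) ++count;
    }
    return count;
}
```

Because no index is involved, the same loop works for any secondary key value, which is why sequential search remains practical for small files or one-off queries.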

Direct Access
• O(1) operation
• RRN (Relative Record Number)
  • gives the relative position of the record in the file
  • For fixed-length records: byte offset = n × r, where n is the RRN value and r is the record size
• Class IOBuffer includes
  • direct read (DRead)
  • direct write (DWrite)
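The byte-offset formula translates directly into a single seek. This sketch assumes fixed-length records numbered from RRN 0 and no header at the start of the file; `ByteOffset` and `ReadByRRN` are hypothetical names.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Byte offset of a fixed-length record: n * r (n = RRN, r = record size).
long ByteOffset(long rrn, long recordSize) { return rrn * recordSize; }

// Read record rrn by seeking straight to its byte offset, without
// scanning the records before it. Returns "" if the record does not exist.
std::string ReadByRRN(std::istream& in, long rrn, long recordSize) {
    std::string record(static_cast<std::string::size_type>(recordSize), '\0');
    in.clear();                                   // reset any earlier failure
    in.seekg(ByteOffset(rrn, recordSize), std::ios::beg);
    if (!in.read(&record[0], recordSize)) return "";
    return record;
}
```

If the file begins with a header record, the offset becomes header size + n × r.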

Header Records
• General information about the file
  • e.g. date and time of the most recent update, count of the number of records
• A header record is often placed at the beginning of the file
• Header records are a widely used, important file design tool
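A header that stores the record count can be sketched as a fixed-size field at byte 0 that is rewritten whenever records are added. The layout (the count as 8 ASCII digits, so counts below 10^8) and the names `kHeaderSize`, `WriteCountHeader`, and `ReadCountHeader` are assumptions for illustration.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical header layout: the record count as 8 ASCII digits at byte 0.
// Data records start immediately after the header.
const int kHeaderSize = 8;

// Write (or overwrite) the header at the beginning of the stream.
bool WriteCountHeader(std::ostream& out, int recordCount) {
    char header[kHeaderSize + 1];                       // 8 digits + '\0'
    std::snprintf(header, sizeof header, "%08d", recordCount);
    out.seekp(0, std::ios::beg);                        // header lives at byte 0
    out.write(header, kHeaderSize);
    return static_cast<bool>(out);
}

// Read the header back; returns the record count, or -1 on failure.
int ReadCountHeader(std::istream& in) {
    char header[kHeaderSize + 1] = {0};
    in.seekg(0, std::ios::beg);
    if (!in.read(header, kHeaderSize)) return -1;
    return std::atoi(header);
}
```

With this header in place, record n of a fixed-length file starts at kHeaderSize + n × r.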

IO Buffer Class definition(1)

Abstract base class for file buffers:

```cpp
#include <iostream>
using namespace std;

class IOBuffer
{
  public:
    // ... constructor, Pack, Unpack, Clear, and other members omitted here
    virtual int Read (istream &) = 0;           // read a buffer from the stream
    virtual int Write (ostream &) const = 0;    // write a buffer to the stream
    // these are the direct access read and write operations
    virtual int DRead (istream &, int recref);          // read specified record
    virtual int DWrite (ostream &, int recref) const;   // write specified record
    // these header operations return the size of the header
    virtual int ReadHeader (istream &);
    virtual int WriteHeader (ostream &) const;
  protected:
    int Initialized;   // TRUE if buffer is initialized
    char * Buffer;     // character array to hold field values
    int BufferSize;    // number of bytes currently in the buffer
    int MaxBytes;      // maximum number of bytes the buffer can hold
};
```

IO Buffer Class definition(2)
• The full definition of the buffer class hierarchy:
  • WriteHeader adds a header to a file and returns the number of bytes in the header; ReadHeader reads the header and checks it for consistency
  • IOBuffer::WriteHeader writes the string "IOBuffer" at the beginning of the file
  • In the fixed-length buffer class, ReadHeader also reads the record size from the header and checks that its value is the same as that of the BufferSize member of the buffer object
  • DWrite/DRead operate using the byte address of the record as the record reference; DRead begins by seeking to the requested spot

ReadHeader, WriteHeader Member Functions

```cpp
#include <cstring>
#include <iostream>
using namespace std;

// headerStr is an array (not a pointer), so headerSize is a compile-time
// constant and str[headerSize + 1] below is a legal C++ array declaration.
static const char headerStr[] = "IOBuffer";
static const int headerSize = sizeof (headerStr) - 1;   // length without '\0'

int IOBuffer::ReadHeader (istream & stream)
{
    char str[headerSize + 1];
    stream.seekg (0, ios::beg);
    stream.read (str, headerSize);
    if (!stream.good ()) return -1;
    if (strncmp (str, headerStr, headerSize) == 0) return headerSize;
    else return -1;
}

int IOBuffer::WriteHeader (ostream & stream) const
{
    stream.seekp (0, ios::beg);
    stream.write (headerStr, headerSize);
    if (!stream.good ()) return -1;
    return headerSize;
}
```

WriteHeader Member Function – VariableLengthBuffer

```cpp
// headerStr and headerSize here are the VariableLengthBuffer source file's
// own file-scope constants, distinct from the IOBuffer ones.
int VariableLengthBuffer :: WriteHeader (ostream & stream) const
// write a buffer header to the beginning of the stream
// A header consists of the IOBuffer header, the header string, and a
// variable-sized record of length fields that describes the file records
{
    int result;
    // write the parent (IOBuffer) header
    result = IOBuffer::WriteHeader (stream);
    if (result == -1) return FALSE;   // IOBuffer::WriteHeader returns -1 on failure
    // write the header string
    stream.write (headerStr, headerSize);
    if (!stream.good ()) return FALSE;
    // write the record description (omitted here)
    return (int) stream.tellp ();
}
```

ReadHeader Member Function - VariableLengthBuffer

```cpp
int VariableLengthBuffer :: ReadHeader (istream & stream)
// read the header and check for consistency
{
    char str[headerSize + 1];
    int result;
    // read the IOBuffer header
    result = IOBuffer::ReadHeader (stream);
    if (result == -1) return FALSE;   // IOBuffer::ReadHeader returns -1 on failure
    // read the header string
    stream.read (str, headerSize);
    if (!stream.good ()) return FALSE;
    if (strncmp (str, headerStr, headerSize) != 0) return FALSE;
    // read and check the record description (omitted here)
    return (int) stream.tellg ();
}
```

Read Member Function - VariableLengthBuffer

```cpp
int VariableLengthBuffer :: Read (istream & stream)
// read one record: its length, stored as an unsigned short value,
// followed by that many bytes; returns the record's address or -1
{
    if (stream.eof ()) return -1;
    int recaddr = (int) stream.tellg ();
    Clear ();
    unsigned short bufferSize;
    stream.read ((char *) &bufferSize, sizeof (bufferSize));
    if (!stream.good ()) { stream.clear (); return -1; }
    BufferSize = bufferSize;
    if (BufferSize > MaxBytes) return -1;   // buffer overflow
    stream.read (Buffer, BufferSize);
    if (!stream.good ()) { stream.clear (); return -1; }
    return recaddr;
}
```

Write Member Function - VariableLengthBuffer

```cpp
int VariableLengthBuffer :: Write (ostream & stream) const
// write the length and buffer into the stream
{
    int recaddr = (int) stream.tellp ();
    unsigned short bufferSize;
    bufferSize = BufferSize;
    stream.write ((char *) &bufferSize, sizeof (bufferSize));
    if (!stream) return -1;
    stream.write (Buffer, BufferSize);
    if (!stream.good ()) return -1;
    return recaddr;
}
```

DRead, DWrite Member Functions

```cpp
int IOBuffer::DRead (istream & stream, int recref)
// read specified record
{
    stream.seekg (recref, ios::beg);
    if (stream.tellg () != recref) return -1;
    return Read (stream);
}

int IOBuffer::DWrite (ostream & stream, int recref) const
// write specified record
{
    stream.seekp (recref, ios::beg);
    if (stream.tellp () != recref) return -1;
    return Write (stream);
}
```
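The length-indicator format and the byte-address record reference used by Read, Write, and DRead can be tried out without the class hierarchy. This is a simplified sketch using free functions with hypothetical names (`WriteRecord`, `ReadRecord`, `DReadRecord`) and `std::string` in place of the book's `char` buffer; like the original, it stores the length in native byte order.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Write one record as an unsigned short length followed by the bytes.
// Returns the record's byte address (its record reference) or -1.
long WriteRecord(std::ostream& out, const std::string& data) {
    long recaddr = static_cast<long>(out.tellp());
    unsigned short len = static_cast<unsigned short>(data.size());
    out.write(reinterpret_cast<const char*>(&len), sizeof len);
    out.write(data.data(), len);
    return out ? recaddr : -1;
}

// Read the record at the current position; returns its address or -1.
long ReadRecord(std::istream& in, std::string& data) {
    long recaddr = static_cast<long>(in.tellg());
    unsigned short len = 0;
    if (!in.read(reinterpret_cast<char*>(&len), sizeof len)) { in.clear(); return -1; }
    data.assign(len, '\0');
    if (len > 0 && !in.read(&data[0], len)) { in.clear(); return -1; }
    return recaddr;
}

// Direct read: seek to the byte address, verify the seek, then read.
long DReadRecord(std::istream& in, long recref, std::string& data) {
    in.clear();
    in.seekg(recref, std::ios::beg);
    if (static_cast<long>(in.tellg()) != recref) return -1;
    return ReadRecord(in, data);
}
```

Saving the addresses returned by WriteRecord is what makes later direct reads possible; because the length is written in native byte order, such a file is not portable between machines with different byte orders.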

Encapsulating Record I/O Ops in a Single Class (1)
• Good design for making objects persistent
  • provide operations to read and write objects directly
• Write operation until now:
  • two operations: pack into a buffer + write the buffer to a file
• Class ‘RecordFile’
  • supports read and write operations that take an object of some class and read it from, or write it to, a file
  • the use of buffers is hidden inside the class
• Problem with defining class ‘RecordFile’:
  • how to support files for different object types without needing a different version of the class for each type

BufferFile Class Definition

```cpp
#include <fstream>
using namespace std;

class BufferFile   // file with buffers
{
  public:
    BufferFile (IOBuffer &);                // create with a buffer
    int Open (char * fname, int MODE);      // open an existing file
    int Create (char * fname, int MODE);    // create a new file
    int Close ();
    int Rewind ();                          // reset to the first data record
    // Input and Output operations
    int Read (int recaddr = -1);
    int Write (int recaddr = -1);
    int Append ();                          // write the current buffer at the end of file
  protected:
    IOBuffer & Buffer;                      // reference to the file's buffer
    fstream File;                           // the C++ stream of the file
};
```

Usage:

```cpp
DelimFieldBuffer buffer;
BufferFile file (buffer);
file.Open (myfile, ios::in);   // Open takes the file name and a mode
file.Read ();
buffer.Unpack (myobject);
```

Encapsulating Record I/O Ops in a Single Class (2)
• Class ‘RecordFile’
  • uses C++ template features to solve the problem
  • definition of the template class RecordFile:

```cpp
template <class RecType>
class RecordFile : public BufferFile
{
  public:
    int Read (RecType & record, int recaddr = -1);
    int Write (const RecType & record, int recaddr = -1);
    RecordFile (IOBuffer & buffer) : BufferFile (buffer) { }
};
```

```cpp
// template method bodies
// Default arguments appear only in the class declaration; repeating
// them in these out-of-class definitions would be a compile error.
template <class RecType>
int RecordFile<RecType>::Read (RecType & record, int recaddr)
{
    int readAddr, result;
    readAddr = BufferFile::Read (recaddr);
    if (readAddr == -1) return -1;   // BufferFile::Read returns -1 on failure
    result = record.Unpack (Buffer);
    if (!result) return -1;
    return readAddr;
}

template <class RecType>
int RecordFile<RecType>::Write (const RecType & record, int recaddr)
{
    int result;
    result = record.Pack (Buffer);
    if (!result) return -1;
    return BufferFile::Write (recaddr);
}
```
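The idea behind RecordFile, a template that hides the buffer and works for any record type that provides Pack and Unpack, can be shown in a small self-contained form. This is a simplified sketch, not the book's class: the `Person` type, the `SimpleRecordFile` name, and the one-record-per-line format are assumptions standing in for the IOBuffer hierarchy.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical record type: Pack writes "last|first|" into a text buffer,
// Unpack parses it back. Both return false on error.
struct Person {
    std::string last, first;
    bool Pack(std::string& buf) const { buf = last + '|' + first + '|'; return true; }
    bool Unpack(const std::string& buf) {
        std::string::size_type p = buf.find('|');
        if (p == std::string::npos) return false;
        std::string::size_type q = buf.find('|', p + 1);
        if (q == std::string::npos) return false;
        last = buf.substr(0, p);
        first = buf.substr(p + 1, q - p - 1);
        return true;
    }
};

// Minimal template in the spirit of RecordFile<RecType>: callers pass
// whole objects, and the packing buffer stays inside the class.
template <class RecType>
class SimpleRecordFile {
public:
    explicit SimpleRecordFile(std::iostream& s) : file(s) {}
    // Append a record; returns its byte address or -1.
    long Write(const RecType& record) {
        if (!record.Pack(buffer)) return -1;
        file.clear();
        file.seekp(0, std::ios::end);
        long addr = static_cast<long>(file.tellp());
        file << buffer << '\n';
        return file ? addr : -1;
    }
    // Read the record stored at byte address recaddr; returns recaddr or -1.
    long Read(RecType& record, long recaddr) {
        file.clear();
        file.seekg(recaddr, std::ios::beg);
        if (!std::getline(file, buffer)) return -1;
        return record.Unpack(buffer) ? recaddr : -1;
    }
private:
    std::iostream& file;
    std::string buffer;   // hidden buffer shared by Read and Write
};
```

Any other record type with the same Pack/Unpack members can be used as `SimpleRecordFile<OtherType>` without changing the class, which is the problem the template solves.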

File Access and File Organization
[Figure: file organization (variable-length records, fixed-length records) vs. file access (sequential access, direct access)]
• File access and file organization are different things: organization is how the records are laid out in the file, access is how a program retrieves them.
• Variable-length records
  • Sequential access is suitable
• Fixed-length records
  • Both direct access and sequential access are possible

Abstract Data Model
• Data objects such as documents, images, and sound
  • e.g. color raster images, FITS image files
• An abstract data model does not view data as it appears on a particular medium; it provides an application-oriented view
• Headers and self-describing files

Metadata
• Data that describe the primary data in a file
• A natural place to store metadata in a file is the header record
• Standard format
  • FITS (Flexible Image Transport System), adopted by the International Astronomical Union (see Figure 5.7)

Mixing Object Types in a File
• Each field is identified using "keyword = value"
• Index table with tags
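Reading "keyword = value" fields into a lookup table is what lets a program find fields by name and skip keywords it does not know. This is a simplified sketch: real FITS headers use fixed-length 80-character records and may carry comments after the value, which this code ignores. `Trim` and `ParseKeywordHeader` are hypothetical names.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Remove leading and trailing blanks.
static std::string Trim(const std::string& s) {
    std::string::size_type b = s.find_first_not_of(' ');
    if (b == std::string::npos) return "";
    std::string::size_type e = s.find_last_not_of(' ');
    return s.substr(b, e - b + 1);
}

// Parse "keyword = value" lines into a keyword -> value map.
// Lines without '=' (such as END) are ignored in this sketch.
std::map<std::string, std::string> ParseKeywordHeader(std::istream& in) {
    std::map<std::string, std::string> fields;
    std::string line;
    while (std::getline(in, line)) {
        std::string::size_type eq = line.find('=');
        if (eq == std::string::npos) continue;
        fields[Trim(line.substr(0, eq))] = Trim(line.substr(eq + 1));
    }
    return fields;
}
```

Because each value is found by its keyword rather than by its position, new keywords can be added to a header without breaking older readers.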

Object-oriented File Access
• Separate the translation to and from the physical format from the application (representation-independent file access)
[Figure: program find_star calls read_image("star1", image) to bring image star1 from disk into RAM, then processes the image; the disk holds star1 and star2.]

Extensibility
• An advantage of using tags to identify objects within files is that we do not have to know a priori what all of the objects will look like
• When we encounter a new type of object, we implement methods for reading and writing that object and add them
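Tag-based extensibility can be sketched as a registry that maps each tag to the method that reads that kind of object. This is a minimal sketch under stated assumptions: objects are represented as raw `std::string` payloads, and `ReaderRegistry` and its methods are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// A reader turns an object's raw bytes into a description of the object.
using Reader = std::function<std::string(const std::string&)>;

// Hypothetical registry: tag -> reading method. Supporting a new object
// type means registering one more reader; code that dispatches on tags
// does not change.
class ReaderRegistry {
public:
    void Register(const std::string& tag, Reader reader) { readers[tag] = reader; }
    // Returns false if no reader is registered for the tag.
    bool Read(const std::string& tag, const std::string& payload,
              std::string& result) const {
        auto it = readers.find(tag);
        if (it == readers.end()) return false;
        result = it->second(payload);
        return true;
    }
private:
    std::map<std::string, Reader> readers;
};
```

A file reader built on this registry can skip or report objects whose tags are not yet registered, and gains support for them as soon as a reader is added.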

Factors Affecting Portability
• Differences among operating systems
• Differences among languages
• Differences in machine architecture
• Differences among platforms
  • e.g. EBCDIC vs. ASCII text encoding

Achieving Portability (1)
• Standardization
  • Standard physical record format
    • extensible, simple
  • Standard binary encoding for data elements
    • IEEE, XDR
• File structure conversion
  • Number and text conversion
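A standard binary encoding fixes the byte order in the file regardless of the machine. XDR (RFC 4506) stores a 32-bit integer as four bytes, most significant byte first. This sketch shows that rule with hypothetical function names; real programs would normally use an existing XDR library.

```cpp
#include <cassert>
#include <cstdint>

// Encode a 32-bit unsigned integer in XDR (big-endian) byte order.
// Shifts work on values, not memory, so the result is the same on
// big-endian and little-endian hosts.
void EncodeXdrUint32(std::uint32_t value, unsigned char out[4]) {
    out[0] = static_cast<unsigned char>((value >> 24) & 0xFF);
    out[1] = static_cast<unsigned char>((value >> 16) & 0xFF);
    out[2] = static_cast<unsigned char>((value >> 8) & 0xFF);
    out[3] = static_cast<unsigned char>(value & 0xFF);
}

// Decode four XDR bytes back into a native 32-bit unsigned integer.
std::uint32_t DecodeXdrUint32(const unsigned char in[4]) {
    return (static_cast<std::uint32_t>(in[0]) << 24) |
           (static_cast<std::uint32_t>(in[1]) << 16) |
           (static_cast<std::uint32_t>(in[2]) << 8) |
            static_cast<std::uint32_t>(in[3]);
}
```

Writing `EncodeXdrUint32` output instead of the raw bytes of an `int` is what keeps a file readable on machines with a different architecture.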

Achieving Portability (2)
• File system differences
  • e.g. block size may be 512 bytes on a UNIX system but 2880 bytes on some non-UNIX systems
• UNIX and portability
  • UNIX supports portability by being available on a large number of platforms
  • UNIX provides a utility called dd
    • dd: facilitates data conversion (e.g. ASCII ↔ EBCDIC, byte swapping, reblocking)
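One conversion dd performs is `conv=swab`, which swaps every pair of input bytes; this converts 16-bit values between little-endian and big-endian order. This is a small sketch of that transformation; `SwapBytePairs` is a hypothetical name, and like dd it leaves a trailing odd byte unchanged.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Swap every pair of adjacent bytes, as `dd conv=swab` does.
// A trailing odd byte is left in place.
std::string SwapBytePairs(std::string data) {
    for (std::string::size_type i = 0; i + 1 < data.size(); i += 2)
        std::swap(data[i], data[i + 1]);
    return data;
}
```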

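Portability
• File sharing
  • Files are accessible from different computers and by different programs
• Portability and standardization
• Factors affecting portability
  • Example: two companies share a file
    • Company A: Sun computers, C programming; Company B: Turbo PASCAL programming on an IBM PC
  • Differences between operating systems
    • The ultimate physical format of a file can vary because of differences between operating systems
  • Differences between programming languages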

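Portability
• Achieving portability
  • Agree on a standard physical record format and follow it
    • Physical standard: a format represented physically the same way regardless of language, machine, or operating system
    • e.g. FITS
  • Agree on a standard binary encoding for data elements
    • Basic data elements: text, numbers
    • e.g. IEEE standard formats and XDR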

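Portability
• Conversion 1: direct conversion
• Conversion 2: conversion through an intermediate standard form
[Figure: (1) IBM, VAX, Cray, Sun 3, and IBM PC each convert directly to and from one another; (2) each machine converts only to and from the intermediate standard form, XDR.]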
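Counting conversion routines shows why the intermediate form scales better. Assuming one routine per direction of translation, with n machine formats direct conversion needs one routine for every ordered pair of machines, while an intermediate standard needs only one routine to and one from the standard per machine:

```latex
\text{direct: } n(n-1) \text{ routines} \qquad \text{via XDR: } 2n \text{ routines}
\\[4pt]
n = 5 \text{ (IBM, VAX, Cray, Sun 3, IBM PC)}: \quad 5 \cdot 4 = 20 \quad \text{vs.} \quad 2 \cdot 5 = 10
```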

Let’s Review !!!
5.1 Record Access
5.2 More about Record Structures
5.3 Encapsulating Record I/O Ops in a Single Class
5.4 File Access and File Organization
5.5 Beyond Record Structures
5.6 Portability and Standardization
