Chap 5. Managing Files of Records
Chapter Objectives
• Extend the file structure concepts of Chapter 4:
  • Search keys and canonical forms
  • Sequential search and direct access
  • File access and file organization
• Examine other kinds of file structures in terms of
  • Abstract data models
  • Metadata
  • Object-oriented file access
  • Extensibility
• Examine issues of portability and standardization
Contents
5.1 Record Access
5.2 More about Record Structures
5.3 Encapsulating Record I/O Ops in a Single Class
5.4 File Access and File Organization
5.5 Beyond Record Structures
5.6 Portability and Standardization
Record Access
• Record key
  • Canonical form: a standard form of a key
    • e.g. Ames, ames, and AMES must be converted to one form before they are compared
  • Distinct keys: uniquely identify a single record
  • Primary keys, secondary keys, candidate keys
  • Primary keys should be dataless (not updatable)
  • Primary keys should be unchanging
  • Social security number: a good primary key
    • but 999-99-9999 is used for all non-registered aliens, so it does not identify them uniquely
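The canonical-form rule above can be sketched in a few lines. This is only an illustration: the helper name CanonicalKey and the rule it applies (trim surrounding blanks, then upper-case) are assumptions, not something the slides prescribe.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not from the slides): reduce a key to one canonical
// form by dropping surrounding whitespace and upper-casing every letter.
std::string CanonicalKey(const std::string& raw)
{
    const std::string blanks = " \t\r\n";
    std::size_t first = raw.find_first_not_of(blanks);
    if (first == std::string::npos) return "";   // empty or all-blank key
    std::size_t last = raw.find_last_not_of(blanks);
    assert(first <= last);                       // holds whenever a non-blank exists
    std::string key = raw.substr(first, last - first + 1);
    for (char& c : key)
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    return key;
}
```

With such a rule, the search key typed by a user and the key stored in the record are both passed through CanonicalKey before they are compared, so Ames, ames, and AMES all match.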
Sequential Search (1)
• O(n), where n is the number of records
• Use record blocking
  • A block holds several records
  • fields < records < blocks
  • Still O(n), but blocking decreases the number of seeks
• e.g. 4000 records, each 512 bytes long
  • Unblocked (sector-sized buffers): 512-byte buffer => average 2000 READ() calls
  • Blocked (16 records per block): 8 KB buffer => average 125 READ() calls
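The arithmetic behind the two READ() counts can be checked with a small helper. The function name AverageReads is made up for this note, and it assumes that a successful sequential search examines half of the file on average.

```cpp
#include <cassert>

// Hypothetical helper: expected number of READ() calls for a sequential
// search that, on average, has to scan half of the file.
//   recordCount  : number of records in the file
//   recsPerBlock : records fetched by one READ() call (1 = unblocked)
long AverageReads(long recordCount, long recsPerBlock)
{
    assert(recordCount >= 0 && recsPerBlock > 0);
    // Number of blocks in the file, rounded up for a partly filled last block.
    long blocks = (recordCount + recsPerBlock - 1) / recsPerBlock;
    return blocks / 2;
}
```

For the slide's file, AverageReads(4000, 1) gives 2000 and AverageReads(4000, 16) gives 125, with the blocked buffer holding 16 x 512 = 8192 bytes.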
Sequential Search (2)
• UNIX tools for sequential processing
  • cat, wc, grep
• When sequential search is useful
  • Searching for patterns in ASCII files
  • Processing files with few records
  • Searching for records with a certain secondary key value
Direct Access
• O(1) operation
• RRN (Relative Record Number)
  • Gives the relative position of the record in the file
  • Byte offset = n × r
    • r: record size, n: RRN value
  • Applies to fixed-length records
• Class IOBuffer includes
  • direct read (DRead)
  • direct write (DWrite)

    int IOBuffer::DRead (istream & stream, int recref)
    // read specified record; recref is the byte address of the record
    {
        stream.seekg (recref, ios::beg);
        if (stream.tellg () != recref) return -1;  // seek did not reach the record
        return Read (stream);
    }

    int IOBuffer::DWrite (ostream & stream, int recref) const
    // write specified record; recref is the byte address of the record
    {
        stream.seekp (recref, ios::beg);
        if (stream.tellp () != recref) return -1;  // seek did not reach the record
        return Write (stream);
    }
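The RRN formula can be tried out in isolation. The sketch below is not part of the IOBuffer hierarchy: RECORD_SIZE, ByteOffset, and ReadByRRN are names invented here, and the record length of 8 bytes is an arbitrary assumption.

```cpp
#include <cassert>
#include <sstream>
#include <string>

const int RECORD_SIZE = 8;   // r: assumed fixed record length in bytes

// Byte offset of the record whose relative record number is rrn (n * r).
long ByteOffset(int rrn)
{
    assert(rrn >= 0);
    return static_cast<long>(rrn) * RECORD_SIZE;
}

// Read record number rrn from a stream of fixed-length records.
// Returns false if the seek fails or fewer than RECORD_SIZE bytes are left.
bool ReadByRRN(std::istream& stream, int rrn, std::string& record)
{
    stream.clear();                                // forget earlier eof/fail state
    stream.seekg(ByteOffset(rrn), std::ios::beg);  // jump straight to the record
    if (!stream) return false;
    char buffer[RECORD_SIZE];
    stream.read(buffer, RECORD_SIZE);
    if (stream.gcount() != RECORD_SIZE) return false;
    record.assign(buffer, RECORD_SIZE);
    return true;
}
```

The cost of ReadByRRN does not depend on rrn: one seek and one read, which is what makes direct access O(1).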
Choosing a record length and structure
• Record length is related to the size of the fields
• Access vs. fragmentation vs. implementation
• Fixed-length record
  • (a) With fixed-length fields
  • (b) With variable-length fields
  • The unused space is filled with null characters in C

(a) Fixed-length fields
    Ames   John  123 Maple    Stillwater  OK74075
    Mason  Alan  90 Eastgate  Ada         OK74820
(b) Variable-length fields, followed by unused space
    Ames|John|123 Maple|Stillwater|OK|74075| [unused space]
    Mason|Alan|90 Eastgate|Ada|OK|74820| [unused space]
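Layout (b) can be sketched as follows. PackFixedRecord is a hypothetical helper, not a method of the buffer classes in this chapter; it assumes '|' as the field delimiter and null characters as padding, as in the figure.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: build a fixed-length record that holds
// variable-length fields. Each field is followed by '|', and the unused
// space at the end of the record is filled with null characters.
// Returns false (leaving record untouched) if the fields do not fit.
bool PackFixedRecord(const std::vector<std::string>& fields,
                     std::size_t recordSize, std::string& record)
{
    assert(recordSize > 0);
    std::string packed;
    for (const std::string& field : fields) {
        packed += field;
        packed += '|';
    }
    if (packed.size() > recordSize) return false;  // record length too small
    packed.resize(recordSize, '\0');               // pad the unused space
    record = packed;
    return true;
}
```

A longer record length wastes more space per record (fragmentation), while a shorter one rejects more records; that trade-off is the "access vs. fragmentation" choice on the slide.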
Header Records
• General information about the file
  • date and time of the most recent update, count of the number of records
• The header record is often placed at the beginning of the file
• Header records are a widely used, important file design tool
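A header record can be as small as a file-type tag plus one number. The sketch below is an assumption-laden illustration: the tag "RECFILE", the 2-byte record-size field, and the free functions WriteHeader and ReadHeader are invented here and only mirror the idea of the chapter's header methods (return the header size, or -1 on a mismatch).

```cpp
#include <cassert>
#include <cstring>
#include <istream>
#include <ostream>
#include <sstream>

const char HEADER_TAG[] = "RECFILE";  // hypothetical file-type tag
const int  TAG_LENGTH   = 7;          // characters in the tag, no terminator

// Write the header at the current position (expected to be the start of a
// new file). Returns the header size in bytes, or -1 on failure.
int WriteHeader(std::ostream& stream, unsigned short recordSize)
{
    assert(recordSize > 0);
    stream.write(HEADER_TAG, TAG_LENGTH);
    // Store the record size as two bytes, high-order byte first.
    char size[2] = { static_cast<char>(recordSize >> 8),
                     static_cast<char>(recordSize & 0xFF) };
    stream.write(size, 2);
    return stream ? TAG_LENGTH + 2 : -1;
}

// Read the header from the start of the file and check it for consistency.
// Returns the header size, or -1 if the tag or the record size is wrong.
int ReadHeader(std::istream& stream, unsigned short expectedSize)
{
    char tag[TAG_LENGTH];
    char size[2];
    stream.clear();
    stream.seekg(0, std::ios::beg);
    stream.read(tag, TAG_LENGTH);
    stream.read(size, 2);
    if (!stream) return -1;                                   // file too short
    if (std::memcmp(tag, HEADER_TAG, TAG_LENGTH) != 0) return -1;
    unsigned short stored = static_cast<unsigned short>(
        (static_cast<unsigned char>(size[0]) << 8) |
         static_cast<unsigned char>(size[1]));
    return stored == expectedSize ? TAG_LENGTH + 2 : -1;
}
```

Because both functions return the header size, a caller can add it to every record offset so that record 0 starts right after the header.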
IO Buffer Class definition (1)
Abstract base class for file buffers

    class IOBuffer
    {
    public:
        virtual int Read (istream &) = 0;          // read a buffer from the stream
        virtual int Write (ostream &) const = 0;   // write a buffer to the stream

        // these are the direct access read and write operations
        virtual int DRead (istream &, int recref);         // read specified record
        virtual int DWrite (ostream &, int recref) const;  // write specified record

        // these header operations return the size of the header
        virtual int ReadHeader (istream &);
        virtual int WriteHeader (ostream &) const;

    protected:
        int Initialized;   // TRUE if buffer is initialized
        char * Buffer;     // character array to hold field values
    };
IO Buffer Class definition (2)
• The full definition of the buffer class hierarchy
  • Write method: adds a header to a file and returns the number of bytes in the header
  • Read method: reads the header and checks it for consistency
  • WriteHeader method: writes the string "IOBuffer" at the beginning of the file
  • ReadHeader method: reads the record size from the header and checks that its value is the same as that of the BufferSize member of the buffer object
  • DWrite/DRead methods: operate using the byte address of the record as the record reference. DRead begins by seeking to the requested spot.
Encapsulating Record I/O Ops in a Single Class (1)
• Good design for making objects persistent
  • provide operations to read and write objects directly
• Write operation until now:
  • two operations: pack the object into a buffer, then write the buffer to a file
• Class ‘RecordFile’
  • supports a write operation that takes an object of some class and writes it to a file
  • the use of buffers is hidden inside the class
• Problem with defining class ‘RecordFile’:
  • how to support files for different object types without needing a different version of the class for each type
Encapsulating Record I/O Ops in a Single Class (2)
• Class ‘RecordFile’
  • uses C++ template features to solve the problem
  • definition of the template class RecordFile:

    template <class RecType>
    class RecordFile : public BufferFile
    {
    public:
        int Read (RecType & record, int recaddr = -1);
        int Write (const RecType & record, int recaddr = -1);
        RecordFile (IOBuffer & buffer) : BufferFile (buffer) { }
    };
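    // template method bodies
    // (the default recaddr = -1 is given once, in the class declaration,
    //  and must not be repeated in these definitions)
    template <class RecType>
    int RecordFile<RecType>::Read (RecType & record, int recaddr)
    // read a buffer from the file, then unpack it into record
    {
        int readAddr, result;
        readAddr = BufferFile::Read (recaddr);
        // assumes BufferFile::Read reports failure with -1, as DRead does
        if (readAddr == -1) return -1;
        result = record.Unpack (Buffer);
        if (!result) return -1;
        return readAddr;
    }

    template <class RecType>
    int RecordFile<RecType>::Write (const RecType & record, int recaddr)
    // pack record into the buffer, then write the buffer to the file
    {
        int result;
        result = record.Pack (Buffer);
        if (!result) return -1;
        return BufferFile::Write (recaddr);
    }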
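The template idea can be tried without the full BufferFile hierarchy. What follows is a simplified stand-in, not the chapter's RecordFile: the class SimpleRecordFile, the one-record-per-text-line storage, and the Person type with its Pack and Unpack signatures are all assumptions made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <sstream>
#include <string>

// Simplified stand-in for RecordFile: each record is stored as one text
// line. Any RecType works as long as it supplies Pack and Unpack.
template <class RecType>
class SimpleRecordFile
{
public:
    explicit SimpleRecordFile(std::iostream& stream) : Stream(stream) {}

    // Append a record; returns its byte address, or -1 on failure.
    int Write(const RecType& record)
    {
        std::string buffer;
        if (!record.Pack(buffer)) return -1;
        Stream.clear();
        Stream.seekp(0, std::ios::end);
        int addr = static_cast<int>(Stream.tellp());
        Stream << buffer << '\n';
        return Stream ? addr : -1;
    }

    // Read the record stored at byte address recaddr; returns recaddr or -1.
    int Read(RecType& record, int recaddr)
    {
        assert(recaddr >= 0);
        std::string buffer;
        Stream.clear();
        Stream.seekg(recaddr, std::ios::beg);
        if (!std::getline(Stream, buffer)) return -1;
        return record.Unpack(buffer) ? recaddr : -1;
    }

private:
    std::iostream& Stream;   // the buffer handling stays hidden in here
};

// Example record type with '|'-delimited fields (hypothetical).
struct Person
{
    std::string LastName, FirstName;

    bool Pack(std::string& buffer) const
    {
        buffer = LastName + '|' + FirstName + '|';
        return true;
    }

    bool Unpack(const std::string& buffer)
    {
        std::size_t a = buffer.find('|');
        if (a == std::string::npos) return false;
        std::size_t b = buffer.find('|', a + 1);
        if (b == std::string::npos) return false;
        LastName  = buffer.substr(0, a);
        FirstName = buffer.substr(a + 1, b - a - 1);
        return true;
    }
};
```

A caller only sees objects going in and out, for example SimpleRecordFile<Person> file(stream); file.Write(person);. Supporting another object type needs no new file class, only a type with its own Pack and Unpack.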
File Access and File Organization
• There is a difference between file access and file organization.
  • File organization: variable-length records, fixed-length records
  • File access: sequential access, direct access
• Variable-length records
  • Sequential access is suitable
• Fixed-length records
  • Both direct access and sequential access are possible
Abstract Data Model
• Data objects such as documents, images, and sound
  • e.g. color raster images, FITS image files
• An abstract data model does not view data as it appears on a particular medium; it takes an application-oriented view
• Headers and self-describing files
Metadata
• Data that describe the primary data in a file
• A common place to store metadata in a file is the header record
• Standard format
  • FITS (Flexible Image Transport System), endorsed by the International Astronomical Union (see Figure 5.7)
Mixing Object Types in a File
• Each field is identified using “keyword = value”
• Index table with tags
Object-oriented file access
• Separate the translation to and from the physical format from the application (representation-independent file access)

    Program find_star
        :
        read_image(“star1”, image)
        process image
        :
    end find_star

[Figure: the variable image in RAM; the files star1 and star2 on disk]
Extensibility
• An advantage of using tags to identify objects within files is that we do not have to know a priori what all of the objects will look like
• When we encounter a new type of object, we implement methods for reading and writing that object and add them to the program
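One way to picture this is a table from tags to reading methods. The sketch below is hypothetical: the class TagRegistry, the Reader signature, and the tag names used with it are assumptions, not part of any format discussed in the chapter.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Hypothetical reader: turns the raw bytes of one tagged object into a
// printable description.
using Reader = std::function<std::string(const std::string& payload)>;

// Table that maps a tag found in the file to the method that reads it.
class TagRegistry
{
public:
    // Supporting a new object type means adding one entry here.
    void Register(const std::string& tag, Reader reader)
    {
        assert(reader != nullptr);
        Readers[tag] = reader;
    }

    // Returns false for an unknown tag so that the caller can skip the object.
    bool Read(const std::string& tag, const std::string& payload,
              std::string& result) const
    {
        auto it = Readers.find(tag);
        if (it == Readers.end()) return false;
        result = it->second(payload);
        return true;
    }

private:
    std::map<std::string, Reader> Readers;
};
```

A program built this way keeps reading files that contain object types it has never seen: unknown tags are skipped, and a later version adds support by registering one more reader.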
Factors Affecting Portability
• Differences among operating systems
• Differences among languages
• Differences in machine architecture
• Differences among platforms
  • e.g. EBCDIC and ASCII
Achieving Portability (1)
• Standardization
  • Standard physical record format
    • extensible, simple
  • Standard binary encoding for data elements
    • IEEE, XDR
• File structure conversion
  • Number and text conversion
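A standard binary encoding fixes the byte order in the file regardless of the machine that writes it. XDR stores a 32-bit integer in four bytes, most significant byte first; the helper names EncodeUint32 and DecodeUint32 below are invented for this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Store a 32-bit unsigned integer most significant byte first (big-endian),
// which is the order XDR uses. The result is the same on any host.
void EncodeUint32(std::uint32_t value, unsigned char out[4])
{
    assert(out != nullptr);
    out[0] = static_cast<unsigned char>((value >> 24) & 0xFF);
    out[1] = static_cast<unsigned char>((value >> 16) & 0xFF);
    out[2] = static_cast<unsigned char>((value >> 8) & 0xFF);
    out[3] = static_cast<unsigned char>(value & 0xFF);
}

// Rebuild the integer from its four big-endian bytes.
std::uint32_t DecodeUint32(const unsigned char in[4])
{
    assert(in != nullptr);
    return (static_cast<std::uint32_t>(in[0]) << 24) |
           (static_cast<std::uint32_t>(in[1]) << 16) |
           (static_cast<std::uint32_t>(in[2]) << 8)  |
            static_cast<std::uint32_t>(in[3]);
}
```

Shifting and masking, rather than copying the integer's memory, is what makes the file contents independent of the host's own byte order.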
Achieving Portability (2)
• File system differences
  • Block size is typically 512 bytes on UNIX systems
  • Other systems and formats may use a different block size, e.g. the 2880-byte blocks of FITS
• UNIX and portability
  • UNIX supports portability by being commonly available on a large number of platforms
  • UNIX provides a utility called dd
    • dd facilitates data conversion
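Portability
• File sharing
  • Files should be accessible from different computers and from different programs
  • Portability and standardization
• Factors affecting portability
  • Two companies share files
    • Company A: Sun computer, programming in C; Company B: IBM PC, programming in Turbo PASCAL
  • Differences among operating systems
    • The ultimate physical format of a file can vary with the differences among operating systems
  • Differences among programming languages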
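Portability
• Achieving portability
  • Agree on a standard physical record format and adhere to it
    • Physical standard: one that is represented physically the same way regardless of language, machine, or operating system
    • e.g. FITS
  • Agree on a standard binary encoding for data elements
    • Basic data elements: text, numbers
    • e.g. the IEEE standard formats and XDR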
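Portability
• Conversion 1: direct conversion
  • every machine converts directly to and from the format of every other machine
• Conversion 2: conversion through an intermediate standard form
  • every machine converts only to and from the standard form (XDR)
• With the 5 machines in the figure, direct conversion needs 5 × 4 = 20 one-way converters, while the intermediate form needs 2 × 5 = 10

[Figure: IBM, VAX, Cray, Sun 3, and IBM PC, connected pairwise for direct conversion and through XDR for the intermediate standard form]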
Let’s Review!
5.1 Record Access
5.2 More about Record Structures
5.3 Encapsulating Record I/O Ops in a Single Class
5.4 File Access and File Organization
5.5 Beyond Record Structures
5.6 Portability and Standardization