File Structures by Folk, Zoellick, and Riccardi
Chap 5. Managing Files of Records
Seoul National University, School of Computer Science and Engineering
Object-Oriented Systems Laboratory (SNU-OOPSLA-LAB)
Prof. Hyoung-Joo Kim
SNU-OOPSLA Lab

Chapter Objectives
• Extend the file structure concepts of Chapter 4:
    • Search keys and canonical forms
    • Sequential search and direct access
    • File access and file organization
• Examine other kinds of file structures in terms of:
    • Abstract data models
    • Metadata
    • Object-oriented file access
    • Extensibility
• Examine issues of portability and standardization

Contents
5.1 Record Access
5.2 More about Record Structures
5.3 Encapsulating Record I/O Ops in a Single Class
5.4 File Access and File Organization
5.5 Beyond Record Structures
5.6 Portability and Standardization

5.1 Record Access
• Record key
    • Canonical form: a standard form of a key
        • e.g. "Ames", "ames", and "AMES" must be converted to one form before comparison
    • Distinct keys: uniquely identify a single record
    • Primary keys, secondary keys, candidate keys
    • Primary keys should be dataless (not updatable)
    • Primary keys should be unchanging
    • Social Security number: a good primary key
        • but 999-99-9999 is used for all non-registered aliens

5.1 Record Access
Sequential Search (1)
• O(n), where n is the number of records
• Use record blocking
    • A block holds several records
    • fields < records < blocks
    • The search is still O(n), but blocking decreases the number of seeks
• e.g. 4000 records, each 512 bytes long
    • Unblocked (sector-sized buffers): 512-byte buffer => 2000 READ() calls on average
    • Blocked (16 records per block): 8 KB buffer => 125 READ() calls on average

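The arithmetic behind these two figures can be sketched as follows. The helper names `averageReads` and `bufferBytes` are hypothetical, and the sketch assumes a successful search that on average examines half of the file.

```cpp
#include <cstddef>

// Hypothetical helper: average number of READ() calls for a successful
// sequential search, assuming the target is equally likely to be anywhere
// in the file, so about half of the blocks are read.
std::size_t averageReads(std::size_t recordCount, std::size_t recordsPerBlock) {
    // Number of blocks in the file, rounding up for a partial last block.
    std::size_t blocks = (recordCount + recordsPerBlock - 1) / recordsPerBlock;
    return blocks / 2;
}

// Buffer size needed to hold one block.
std::size_t bufferBytes(std::size_t recordSize, std::size_t recordsPerBlock) {
    return recordSize * recordsPerBlock;
}
```

With 4000 records, `averageReads(4000, 1)` gives the unblocked figure and `averageReads(4000, 16)` the blocked one; `bufferBytes(512, 16)` is the 8 KB buffer.
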
5.1 Record Access
Sequential Search (2)
• UNIX tools for sequential processing
    • cat, wc, grep
• When sequential search is useful
    • Searching for patterns in ASCII files
    • Processing files with few records
    • Searching for records with a certain secondary key value

5.1 Record Access
Direct Access
• O(1) operation
• RRN (Relative Record Number)
    • Gives the position of a record relative to the beginning of the file
    • Byte offset = n × r
        • n: RRN value, r: record size
    • Applies to fixed-length records
• Class IOBuffer includes
    • direct read (DRead)
    • direct write (DWrite)

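A minimal sketch of the RRN calculation, assuming fixed-length records with no header in front of them. `byteOffset` and `readRecord` are hypothetical helpers for illustration, not part of the IOBuffer classes.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Byte offset of the record with relative record number `rrn`,
// for fixed-length records of `recordSize` bytes (no header assumed).
std::streamoff byteOffset(long rrn, long recordSize) {
    return static_cast<std::streamoff>(rrn) * recordSize;
}

// Hypothetical helper: seek straight to record `rrn` and read it.
// Returns an empty string if the record is not fully present.
std::string readRecord(std::istream& in, long rrn, long recordSize) {
    std::string record(static_cast<std::size_t>(recordSize), '\0');
    in.clear();                                   // reset any earlier EOF state
    in.seekg(byteOffset(rrn, recordSize), std::ios::beg);
    in.read(&record[0], recordSize);
    if (in.gcount() != recordSize) return std::string();
    return record;
}
```

The cost of `readRecord` does not depend on `rrn`, which is what makes direct access an O(1) operation.
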
5.2 More about Record Structures
Choosing a record length and structure
• Record length is related to the size of the fields
• Trade-offs: access vs. fragmentation vs. implementation
• Fixed-length records
    • (a) with fixed-length fields
    • (b) with variable-length fields
• The unused space is filled with null characters in C

(a) Fixed-length records with fixed-length fields
    Ames    John    123 Maple     Stillwater    OK74075
    Mason   Alan    90 Eastgate   Ada           OK74820

(b) Fixed-length records with variable-length fields
    Ames|John|123 Maple|Stillwater|OK|74075|    [unused space]
    Mason|Alan|90 Eastgate|Ada|OK|74820|        [unused space]

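Layout (b) can be sketched as below: delimited fields packed into a record of fixed length, with the remainder filled with null characters. `packRecord` and `unpackRecord` are hypothetical names, and the sketch assumes that field values contain neither '|' nor null characters.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: join fields with '|' delimiters and pad the result
// with null characters up to `recordLength` bytes.
// Returns an empty string if the fields do not fit.
std::string packRecord(const std::vector<std::string>& fields,
                       std::size_t recordLength) {
    std::string record;
    for (const std::string& f : fields) {
        record += f;
        record += '|';                    // each field ends with a delimiter
    }
    if (record.size() > recordLength) return std::string();
    record.resize(recordLength, '\0');    // unused space is filled with nulls
    return record;
}

// Inverse: split on '|' and stop at the first null (start of unused space).
std::vector<std::string> unpackRecord(const std::string& record) {
    std::vector<std::string> fields;
    std::string current;
    for (char c : record) {
        if (c == '\0') break;
        if (c == '|') { fields.push_back(current); current.clear(); }
        else current += c;
    }
    return fields;
}
```
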
5.2 More about Record Structures
Header Records
• Hold general information about the file
    • e.g. date and time of the most recent update, count of the number of records
• A header record is often placed at the beginning of the file
• Header records are a widely used, important file design tool
• Pascal does not naturally support header records (a file is a repeated collection of records of the same type)
    • Use variant records (depending on context)

5.2 More about Record Structures
IOBuffer class definition (1)
Abstract base class for file buffers

    class IOBuffer
    {
    public:
        virtual int Read(istream &) = 0;          // read a buffer from the stream
        virtual int Write(ostream &) const = 0;   // write a buffer to the stream
        // these are the direct access read and write operations
        virtual int DRead(istream &, int recref);         // read specified record
        virtual int DWrite(ostream &, int recref) const;  // write specified record
        // these header operations return the size of the header
        virtual int ReadHeader(istream &);
        virtual int WriteHeader(ostream &) const;
    protected:
        int Initialized;   // TRUE if buffer is initialized
        char *Buffer;      // character array to hold field values
    };

5.2 More about Record Structures
IOBuffer class definition (2)
• The full definition of the buffer class hierarchy
    • Write methods: add a header to a file and return the number of bytes in the header
    • Read methods: read the header and check it for consistency
    • WriteHeader method: writes the string "IOBuffer" at the beginning of the file
    • ReadHeader method: reads the record size from the header and checks that its value matches the BufferSize member of the buffer object
    • DWrite/DRead methods: operate using the byte address of the record as the record reference; DRead begins by seeking to the requested spot

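The header behaviour described above can be sketched with two free functions. The layout used here (an 8-byte tag followed by the record size as a raw int) is an assumption made for illustration; the actual buffer classes may lay the header out differently, and `writeHeader`/`readHeader` are hypothetical stand-ins for the member functions.

```cpp
#include <cstring>
#include <istream>
#include <ostream>
#include <sstream>

// Assumed header layout for this sketch: an 8-byte tag, then the record size.
const char kHeaderTag[] = "IOBuffer";   // 8 characters; the terminator is not written
const int kTagLength = 8;

// Writes the header at the beginning of the stream.
// Returns the header size in bytes, or -1 on failure.
int writeHeader(std::ostream& out, int recordSize) {
    out.seekp(0, std::ios::beg);
    out.write(kHeaderTag, kTagLength);
    out.write(reinterpret_cast<const char*>(&recordSize), sizeof recordSize);
    if (!out) return -1;
    return kTagLength + static_cast<int>(sizeof recordSize);
}

// Reads the header and checks it against the expected record size.
// Returns the header size in bytes, or -1 if the header is inconsistent.
int readHeader(std::istream& in, int expectedRecordSize) {
    char tag[kTagLength];
    int recordSize = 0;
    in.seekg(0, std::ios::beg);
    in.read(tag, kTagLength);
    in.read(reinterpret_cast<char*>(&recordSize), sizeof recordSize);
    if (!in) return -1;
    if (std::memcmp(tag, kHeaderTag, kTagLength) != 0) return -1;
    if (recordSize != expectedRecordSize) return -1;
    return kTagLength + static_cast<int>(sizeof recordSize);
}
```

Writing the record size as a raw int keeps the sketch short but ties the file to one machine's integer representation, which is the portability problem taken up in section 5.6.
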
5.3 Encapsulating Record I/O Operations in a Single Class
Encapsulating Record I/O Ops in a Single Class (1)
• Good design for making objects persistent
    • Provide operations to read and write objects directly
• Write operation until now:
    • Two operations: pack into a buffer + write the buffer to a file
• Class 'RecordFile'
    • Supports a write operation that takes an object of some class and writes it to a file
    • The use of buffers is hidden inside the class
• Problem with defining class 'RecordFile':
    • How to support files for different object types without needing a different version of the class for each

5.3 Encapsulating Record I/O Operations in a Single Class
Encapsulating Record I/O Ops in a Single Class (2)
• Class 'RecordFile'
    • Uses C++ template features to solve the problem
    • Definition of the template class RecordFile:

    template <class RecType>
    class RecordFile : public BufferFile
    {
    public:
        int Read(RecType& record, int recaddr = -1);
        int Write(const RecType& record, int recaddr = -1);
        RecordFile(IOBuffer& buffer) : BufferFile(buffer) { }
    };

    // template method bodies
    // (default arguments are given only in the class declaration)
    template <class RecType>
    int RecordFile<RecType>::Read(RecType& record, int recaddr)
    {
        int readAddr, result;
        readAddr = BufferFile::Read(recaddr);   // read the buffer from the file
        if (!readAddr) return -1;
        result = record.Unpack(Buffer);         // unpack the buffer into the object
        if (!result) return -1;
        return readAddr;
    }

    template <class RecType>
    int RecordFile<RecType>::Write(const RecType& record, int recaddr)
    {
        int result;
        result = record.Pack(Buffer);           // pack the object into the buffer
        if (!result) return -1;
        return BufferFile::Write(recaddr);      // write the buffer to the file
    }

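The pack-then-write and read-then-unpack pattern can be tried out in a self-contained miniature. Everything here (`TextBuffer`, `Person`, `MiniRecordFile`, one record per text line) is a hypothetical simplification for illustration, not the BufferFile or IOBuffer classes themselves.

```cpp
#include <sstream>
#include <string>

// Simplified stand-in for the buffer classes: holds one delimited text record.
struct TextBuffer {
    std::string data;
};

// Example record type; any class with matching Pack/Unpack would work.
struct Person {
    std::string last, first;
    int Pack(TextBuffer& b) const { b.data = last + "|" + first + "|"; return 1; }
    int Unpack(const TextBuffer& b) {
        std::size_t p1 = b.data.find('|');
        if (p1 == std::string::npos) return 0;
        std::size_t p2 = b.data.find('|', p1 + 1);
        if (p2 == std::string::npos) return 0;
        last = b.data.substr(0, p1);
        first = b.data.substr(p1 + 1, p2 - p1 - 1);
        return 1;
    }
};

// Hypothetical miniature of RecordFile: one record per line of a stream.
template <class RecType>
class MiniRecordFile {
public:
    explicit MiniRecordFile(std::iostream& s) : stream_(s) {}
    // Pack the object, then write the buffer. Returns 1 on success, -1 on failure.
    int Write(const RecType& record) {
        if (!record.Pack(buffer_)) return -1;
        stream_.clear();                       // recover from an earlier EOF
        stream_ << buffer_.data << '\n';
        return stream_ ? 1 : -1;
    }
    // Read the next buffer, then unpack it into the object.
    int Read(RecType& record) {
        if (!std::getline(stream_, buffer_.data)) return -1;
        return record.Unpack(buffer_) ? 1 : -1;
    }
private:
    std::iostream& stream_;
    TextBuffer buffer_;   // the buffer is hidden inside the class
};
```

The caller only sees `Write(person)` and `Read(person)`; supporting another record type means supplying another class with `Pack` and `Unpack`, not another version of the file class.
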
5.4 File Access and File Organization
File Access and File Organization
• There is a difference between file access and file organization
• Variable-length records
    • Sequential access is suitable
• Fixed-length records
    • Both direct access and sequential access are possible

File Organization          File Access
Variable-length records    Sequential access
Fixed-length records       Direct access

5.5 Beyond Record Structures
Abstract Data Model
• Data objects such as documents, images, and sound
    • e.g. color raster images, FITS image files
• An abstract data model does not view data as it appears on a particular medium
    -> application-oriented view
• Headers and self-describing files

5.5 Beyond Record Structures
Metadata
• Data that describe the primary data in a file
• A common place to store metadata in a file is the header record
• Standard format
    • FITS (Flexible Image Transport System), endorsed by the International Astronomical Union (see Figure 5.7)

5.5 Beyond Record Structures
Mixing object types in a file
• Each field is identified using "keyword = value"
• Index table with tags

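Reading "keyword = value" fields can be sketched as below. This is a simplified illustration with one pair per text line, not the actual FITS header format; `trim` and `parseHeader` are hypothetical helpers.

```cpp
#include <map>
#include <sstream>
#include <string>

// Remove leading and trailing spaces.
std::string trim(const std::string& s) {
    std::size_t b = s.find_first_not_of(' ');
    if (b == std::string::npos) return std::string();
    std::size_t e = s.find_last_not_of(' ');
    return s.substr(b, e - b + 1);
}

// Hypothetical parser: one "keyword = value" pair per line.
// Lines without '=' are skipped.
std::map<std::string, std::string> parseHeader(const std::string& text) {
    std::map<std::string, std::string> fields;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        std::size_t eq = line.find('=');
        if (eq == std::string::npos) continue;
        fields[trim(line.substr(0, eq))] = trim(line.substr(eq + 1));
    }
    return fields;
}
```

Because each value carries its own keyword, the reader does not depend on the order of the fields or on knowing every keyword in advance.
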
5.5 Beyond Record Structures
Object-oriented file access
• Separate the translation to and from the physical format from the application (representation-independent file access)

    Program find_star
        :
        read_image("star1", image)
        process image
        :
    end find_star

[Figure: image (in RAM); star1, star2 (on disk)]

5.5 Beyond Record Structures
Extensibility
• An advantage of using tags to identify objects within files is that we do not have to know a priori what all of the objects will look like
• When we encounter a new type of object, we implement methods for reading and writing that object and add them to the program

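One way to picture this is a table that maps each tag to the method that reads that kind of object. `ReaderRegistry` is a hypothetical class for illustration; the readers here simply return a description string.

```cpp
#include <functional>
#include <map>
#include <string>

// Hypothetical registry mapping a tag to the method that reads that object type.
class ReaderRegistry {
public:
    using Reader = std::function<std::string(const std::string& payload)>;

    // Supporting a new object type only requires registering one more reader.
    void add(const std::string& tag, Reader reader) { readers_[tag] = reader; }

    // Dispatch on the tag. Unknown tags are skipped rather than treated as
    // errors, so files containing newer object types can still be processed.
    std::string read(const std::string& tag, const std::string& payload) const {
        auto it = readers_.find(tag);
        if (it == readers_.end()) return "skipped unknown tag " + tag;
        return it->second(payload);
    }
private:
    std::map<std::string, Reader> readers_;
};
```

Adding a new object type is a call to `add`; existing readers and existing files are untouched.
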
5.6 Portability and Standardization
Factors Affecting Portability
• Differences among operating systems
• Differences among languages
• Differences in machine architecture
• Differences among platforms
    • e.g. EBCDIC vs. ASCII

5.6 Portability and Standardization
Achieving Portability (1)
• Standardization
    • Standard physical record format
        • extensible, simple
    • Standard binary encoding for data elements
        • IEEE, XDR
• File structure conversion
    • Number and text conversion

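A standard binary encoding fixes, among other things, the byte order of integers. The sketch below encodes a 32-bit unsigned integer most significant byte first, which is the order XDR uses; the function names are hypothetical and this covers only one small part of what XDR specifies.

```cpp
#include <array>
#include <cstdint>

// Encode a 32-bit unsigned integer as 4 bytes, most significant byte first,
// independent of the host machine's own byte order.
std::array<unsigned char, 4> encodeBigEndian(std::uint32_t value) {
    return {{
        static_cast<unsigned char>((value >> 24) & 0xFF),
        static_cast<unsigned char>((value >> 16) & 0xFF),
        static_cast<unsigned char>((value >> 8) & 0xFF),
        static_cast<unsigned char>(value & 0xFF)
    }};
}

// Decode the same 4-byte representation back into a host integer.
std::uint32_t decodeBigEndian(const std::array<unsigned char, 4>& bytes) {
    return (static_cast<std::uint32_t>(bytes[0]) << 24) |
           (static_cast<std::uint32_t>(bytes[1]) << 16) |
           (static_cast<std::uint32_t>(bytes[2]) << 8) |
            static_cast<std::uint32_t>(bytes[3]);
}
```

Writing the four bytes produced by `encodeBigEndian`, instead of the raw in-memory integer, yields the same file contents on little-endian and big-endian machines.
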
5.6 Portability and Standardization
Achieving Portability (2)
• File system differences
    • Block size is 512 bytes on typical UNIX systems
    • Block size is 2880 bytes on some non-UNIX systems
• UNIX and portability
    • UNIX supports portability by being commonly available on a large number of platforms
    • UNIX provides a utility called dd
        • dd: facilitates data conversion

Addition : Basic Files
Basic Files: Sequential File (1)
• Organized by appending records to the file in the order they arrive
• The first record is the oldest, the last record is the newest
• Records are either fixed or variable length
• Most useful for magnetic tape storage
• Sometimes called a "sequential-chronological" file

Addition : Basic Files
Basic Files: Sequential File (2)
• Sequential file with fixed-length records

SS#     Name    Age
3310    Jo      15
4906    Shin    27
4508    Kim     18
2412    Jang    33
2263    Lee     26
1701    Ko      27
3304    Nam     40
3401    Kim     31
5201    Han     25
2307    Ahn     13

Addition : Basic Files
Basic Files: Relative Files (1)
• Records can be accessed directly by using the record numbers 0, ..., N-1 as external keys
• Must use a random or pseudorandom access device such as a disk
• With fixed-length records, the file system can easily compute the sector address of the block containing a given record from its record number, especially when the file is stored contiguously (in successive sectors)

Addition : Basic Files
Basic Files: Relative Files (2)
• Assume the file system marks all record positions (slots) as empty when the file is created, and enforces the requirement that no record can be read before it has been written

Relative file
Record no.        SS#     Name    Age
(external key)
1                 3310    Jo      15
2                 4906    Shin    27
3                 4508    Kim     18
4                 2412    Jang    33
5                 2263    Lee     26
6                 1701    Ko      27
7                 3304    Nam     40
8                 3401    Kim     31
9                 5201    Han     25
10                2307    Ahn     13

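The empty-slot rule can be sketched with a small in-memory model. `RelativeFile` is a hypothetical class for illustration; it keeps the slots in memory rather than on disk and uses 1-based record numbers.

```cpp
#include <string>
#include <vector>

// Hypothetical in-memory model of a relative file with `slotCount` slots.
class RelativeFile {
public:
    // All slots are marked empty when the file is created.
    explicit RelativeFile(std::size_t slotCount)
        : records_(slotCount), written_(slotCount, false) {}

    // Store a record at the given record number (1-based in this sketch).
    bool write(std::size_t recordNumber, const std::string& record) {
        if (recordNumber < 1 || recordNumber > records_.size()) return false;
        records_[recordNumber - 1] = record;
        written_[recordNumber - 1] = true;
        return true;
    }

    // Reading fails for a slot that is out of range or still marked empty.
    bool read(std::size_t recordNumber, std::string& record) const {
        if (recordNumber < 1 || recordNumber > records_.size()) return false;
        if (!written_[recordNumber - 1]) return false;
        record = records_[recordNumber - 1];
        return true;
    }
private:
    std::vector<std::string> records_;
    std::vector<bool> written_;
};
```
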
Addition : Basic Files
Basic Files: Ordered Sequential File & Ordered Relative File

Ordered Relative File               Ordered Sequential File
No.   SS#     Name    Age           SS#     Name    Age
1     1701    Ko      27            1701    Ko      27
2     2263    Lee     26            2263    Lee     26
3     2307    Ahn     13            2307    Ahn     13
4     2412    Jang    33            2412    Jang    33
5     3304    Nam     40            3304    Nam     40
6     3310    Jo      15            3310    Jo      15
7     3401    Kim     31            3401    Kim     31
8     4508    Kim     18            4508    Kim     18
9     4906    Shin    27            4906    Shin    27
10    5201    Han     25            5201    Han     25

Let's Review
5.1 Record Access
5.2 More about Record Structures
5.3 Encapsulating Record I/O Ops in a Single Class
5.4 File Access and File Organization
5.5 Beyond Record Structures
5.6 Portability and Standardization