
By: m_el_ramly@yahoo Presenter: Dr. Mohammad El-Ramly. Many slides by others.

Cairo University FCI, 2014. File Organization & Processing. By: m_el_ramly@yahoo.ca. Presenter: Dr. Mohammad El-Ramly. Many slides by others. CS 215, Lecture 11: Rest of Ch. 4 & Ch. 5. Using Classes to Manipulate Buffers.


Presentation Transcript


  1. Cairo University FCI, 2014 • File Organization & Processing • By: m_el_ramly@yahoo.ca • Presenter: Dr. Mohammad El-Ramly • Many slides by others • CS 215, Lecture 11: Rest of Ch. 4 & Ch. 5

  2. Using Classes to Manipulate Buffers • Examples of three C++ classes that encapsulate the operation of a buffer object • Functions: Pack, Unpack, Read, Write • Output: pack fields into a buffer, then write the buffer to a file • Input: read a buffer from a file, then unpack its fields • Each Pack or Unpack call handles only one field • DelimTextBuffer class for delimited fields • LengthTextBuffer class for length-based fields • FixedTextBuffer class for fixed-length fields • Appendix E: full implementation (buggy)

  3. Buffer Class for Delimited Text Fields (1)
     class DelimTextBuffer
     {
     public:
        DelimTextBuffer(char delim = '|', int maxBytes = 1000);
        void Clear();                         // empty the buffer (called by Read)
        int Read(istream & file);
        int Write(ostream & file) const;
        int Pack(const char * str, int size = -1);
        int Unpack(char * str);
     private:
        char Delim;       // delimiter character
        char * Buffer;    // character array to hold field values
        int BufferSize;   // current size of packed fields
        int MaxBytes;     // maximum # of chars in the buffer
        int NextByte;     // packing/unpacking position in buffer
     };
     • Variable-length buffer • Fields are represented as delimited text

  4. Buffer Class for Delimited Text Fields (2)
     int DelimTextBuffer::Unpack(char * str)
     {
        start = NextByte
        search from start to the end of the packed data for the delimiter
        if no delimiter is found, return false
        copy the characters from start up to the delimiter into str
        set NextByte to the position just past the delimiter
        if NextByte is now past the end of the packed data, return false
        otherwise return true
     }
     Unpack() extracts one field from a record in a buffer.

  5. Buffer Class for Delimited Text Fields (3)
     int DelimTextBuffer::Unpack(char * str)
     // extract the value of the next field of the buffer into str
     {
        int len = -1;           // length of packed string
        int start = NextByte;   // first character to be unpacked
        for (int i = start; i < BufferSize; i++)
           if (Buffer[i] == Delim)
           { len = i - start; break; }
        if (len == -1) return false;        // delimiter not found
        NextByte += len + 1;                // skip the field and its delimiter
        if (NextByte > BufferSize) return false;
        strncpy(str, &Buffer[start], len);  // caller must supply len+1 bytes
        str[len] = 0;                       // zero termination for string
        return true;
     }
     Unpack() extracts one field from a record in a buffer.

  6. Buffer Class for Delimited Text Fields (4)
     int DelimTextBuffer::Pack(const char * str, int size)
     {
        if size is larger than the length of str, return false
        if the field plus its delimiter would overflow the buffer, return false
        otherwise copy str into the buffer starting at NextByte
        add the delimiter
        update NextByte (and BufferSize)
        return true
     }
     Pack() copies the characters of its argument to the buffer and then adds the delimiter character.

  7. Buffer Class for Delimited Text Fields (5)
     int DelimTextBuffer::Pack(const char * str, int size)
     // set the value of the next field of the buffer;
     // if size = -1 (default), use strlen(str) as the length of the field
     {
        int len;                           // length of string to be packed
        if (size >= 0) len = size;
        else len = (int) strlen(str);      // C-string length: # chars before '\0'
        if (len > (int) strlen(str))       // str is too short!
           return false;
        int start = NextByte;              // first character to be packed
        if (start + len + 1 > MaxBytes)    // check BEFORE changing NextByte
           return false;
        memcpy(&Buffer[start], str, len);
        Buffer[start + len] = Delim;       // add delimiter
        NextByte = start + len + 1;
        BufferSize = NextByte;
        return true;
     }
     • Pack() copies the characters of its argument to the buffer and then adds the delimiter character.

  8. Buffer Class for Delimited Text Fields (6)
     • Read method of DelimTextBuffer • Clears the current buffer contents • Extracts the record size • Reads the proper number of bytes into the buffer • Sets the buffer size
     int DelimTextBuffer::Read(istream & stream)
     {
        Clear();
        stream.read((char *) &BufferSize, sizeof(BufferSize));   // record size
        if (stream.fail()) return false;
        if (BufferSize < 0 || BufferSize > MaxBytes) return false; // would overflow buffer
        stream.read(Buffer, BufferSize);
        return stream.good();
     }

  9. Buffer Class for Delimited Text Fields (7)
     • Write method of DelimTextBuffer • Writes the size • Writes the buffer contents
     int DelimTextBuffer::Write(ostream & stream) const
     {
        stream.write((const char *) &BufferSize, sizeof(BufferSize)); // record size first
        stream.write(Buffer, BufferSize);                              // then the packed fields
        return stream.good();
     }

  10. CS215: File Structure and Processing, Chapter 5: Managing Files of Records

  11. Chapter Objectives • Extend the file structure concepts of Chapter 4: • Search keys and canonical forms • Sequential search and direct access • File access and file organization • Examine other kinds of file structures in terms of: • Abstract data models • Metadata • Object-oriented file access • Extensibility • Examine issues of portability and standardization.

  12. Record Access • Record key • Canonical form: a standard form of a key • e.g. Ames, ames, and AMES must all be converted to one form (such as AMES) before comparison • Distinct keys: uniquely identify a single record • Primary keys, secondary keys, candidate keys • Primary keys should be dataless (hold no data that may need updating) • Primary keys should be unchanging • Social security number: seemingly a good primary key • but not distinct if, e.g., 999-99-9999 is used for all non-registered aliens • Measurement of work: • Comparisons: occur in main memory • Disk accesses: the main bottleneck
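Before keys are stored or compared, each one is passed through the same conversion so that Ames, ames, and AMES all become the same key. The rule below (trim surrounding blanks, then convert to uppercase) is one possible canonical form, assumed for illustration; `canonicalKey` is a hypothetical name.

```cpp
#include <cctype>
#include <string>

// Hypothetical canonical-form rule: trim surrounding blanks, then uppercase.
std::string canonicalKey(const std::string& key) {
    std::string::size_type first = key.find_first_not_of(' ');
    if (first == std::string::npos) return "";          // key was all blanks
    std::string::size_type last = key.find_last_not_of(' ');
    std::string result = key.substr(first, last - first + 1);
    for (char& c : result)
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    return result;
}
```

Comparing canonicalKey(a) with canonicalKey(b), rather than a with b, makes the match independent of how the key was typed.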

  13. Sequential Search • Sequential search is the least efficient search; our main pursuit for the rest of the term is to present improved search methods • O(n), n: the number of records • Use record blocking to reduce work • A block holds several records • fields < records < blocks • Still O(n) comparisons, but blocking decreases the number of seeks and reads • Search is sequential within each block • e.g. 4000 records, 512 bytes each, sector size 512 bytes • Unblocked (sector-sized ½K buffer): on average 4000 / 2 = 2000 READ() calls • Blocked (16 records per block, 8K buffer): 4000 / 16 = 250 blocks, so on average 125 READ() calls • Performance can be improved further by keeping a block key (the key of the last record in each block) to skip blocks that cannot contain the target (assuming the file is sorted by key)

  14. Sequential Search: Best Uses • When is sequential search superior? • Repetitive hits • Searching for patterns in ASCII files • Searching records with a certain secondary key value • Small search set • Processing files with few records • Devices/media most hospitable to sequential access • e.g. tape

  15. Direct Access • Access a record without searching • O(1) operation • RRN (Relative Record Number) • Gives the relative position of the record • Finding record n is an O(n) process with variable-length records (the file must be read sequentially) • Easy with fixed-length records: RRN × sizeof(record) • View the file as a collection of records, not bytes; all byte information is internal • Byte offset = n × r • r: record size • n: RRN value

  16. Direct Access • Class IOBuffer includes • direct read (DRead) • direct write (DWrite) • take byte offset as argument, along with stream

  17. Choosing Record Length and Structure • Record length is related to the size of the fields • Trade-off: access vs. fragmentation vs. implementation • Fixed-length records, with • fixed-length fields, or • variable-length fields (e.g. delimited); the unused space is filled with the null character in C • [Figure: one record stored two ways: as fixed-length fields (column widths not recoverable), and as delimited fields in a fixed-length record padded with nulls: OHIO|10847115|7|264.9|41330|35|3|1|1803|17|COLUMBUS\0....\0]
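A fixed-length record holding delimited variable-length fields, with the unused space filled with null characters, can be built as below. This sketch follows the Pack() convention of a delimiter after every field; the name `packFixedRecord` and the sample record length are illustrative assumptions.

```cpp
#include <string>
#include <vector>

// Pack variable-length fields, each followed by delim, into a record of exactly recLen bytes.
// Unused space is filled with '\0'. Returns an empty string if the fields do not fit.
std::string packFixedRecord(const std::vector<std::string>& fields,
                            std::string::size_type recLen, char delim = '|') {
    std::string rec;
    for (const std::string& f : fields) {
        rec += f;
        rec += delim;
    }
    if (rec.size() > recLen) return "";   // fields would overflow the record
    rec.resize(recLen, '\0');             // pad with nulls up to the fixed length
    return rec;
}
```

Every record produced this way has the same length, so RRN-based direct access still works even though the fields inside vary in size; the cost is the wasted null padding (internal fragmentation).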

  18. Header Records • File as a self-describing object • General information about the file: • date and time of the most recent update • number of records • size of records and fields (fixed-length records & fields) • delimiter (variable-length fields) • Often placed at the beginning of the file

  19. IO Buffer Class definition: abstract base class for file buffers
     class IOBuffer
     {
     public:
        virtual int Read(istream &) = 0;          // read a buffer from the stream
        virtual int Write(ostream &) const = 0;   // write a buffer to the stream
        // these are the direct access read and write operations
        virtual int DRead(istream &, int recref);         // read specified record
        virtual int DWrite(ostream &, int recref) const;  // write specified record
        // these header operations return the size of the header
        virtual int ReadHeader(istream &);
        virtual int WriteHeader(ostream &) const;
     protected:
        int Initialized;   // TRUE if buffer is initialized
        char * Buffer;     // character array to hold field values
     };

  20. IO Buffer Class definition • Full definition of buffer class hierarchy • WriteHeader method: • writes the header string at the beginning of the file; possible strings: “Variable”, “Fixed” • returns the size of the header written • ReadHeader method: • reads the header id string, which must match the expected record type (variable or fixed length) • if the string matches that subclass's header string, returns the size of the header • any other string causes a return of -1 (the header doesn't match the buffer)

  21. IO Buffer Class definition • Full definition of buffer class hierarchy • DWrite/DRead methods: • operate using the byte address of the record as the record reference; the methods begin by seeking to the requested spot

  22. File Access and File Organization • [Figure: file organization (variable-length records, fixed-length records) vs. file access (sequential access, direct access)] • There is a difference between file access and file organization • Variable-length records: sequential access is suitable • Fixed-length records: both direct access and sequential access are possible

  23. Abstract Data Model • Data objects such as documents, images, and sound • An abstract data model does not view data as it appears on a particular medium • Application-oriented view • The application is shielded from details of storage on the medium • How to specify a file's content? • Headers and self-describing files • e.g. images: jpg files start ÿØÿà JFIF (bytes FF D8 FF E0, followed by the tag JFIF); gif files start GIF89a • e.g. sounds: mp3 frames start ÿûD EQ¹à (bytes FF FB ...); wav files start RIFF$P WAVEfmt (RIFF, a size field, then WAVE and fmt)
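A self-describing file lets a program identify its content from the first few bytes. The sketch below checks the signatures listed on the slide: JPEG begins with bytes FF D8 FF, GIF with "GIF89a" (or the older "GIF87a"), WAV with "RIFF" followed at offset 8 by "WAVE", and a bare MPEG audio frame with 11 set bits (which covers FF FB). MP3 files that begin with an ID3 tag are not handled. The name `sniffType` and the returned labels are illustrative.

```cpp
#include <string>

// Guess a file's type from its leading bytes (its "magic number").
std::string sniffType(const std::string& head) {
    auto byteAt = [&](std::string::size_type i) {
        return static_cast<unsigned char>(head[i]);
    };
    if (head.size() >= 3 && byteAt(0) == 0xFF && byteAt(1) == 0xD8 && byteAt(2) == 0xFF)
        return "jpeg";
    if (head.compare(0, 6, "GIF89a") == 0 || head.compare(0, 6, "GIF87a") == 0)
        return "gif";
    if (head.size() >= 12 && head.compare(0, 4, "RIFF") == 0 && head.compare(8, 4, "WAVE") == 0)
        return "wav";
    if (head.size() >= 2 && byteAt(0) == 0xFF && (byteAt(1) & 0xE0) == 0xE0)
        return "mp3";   // MPEG audio frame sync: first 11 bits are 1
    return "unknown";
}
```

The JPEG test runs before the MPEG test because both start with FF; D8 does not have its top three bits set, so a JPEG can never be mistaken for an MPEG frame.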

  24. Example: GIF • Graphics Interchange Format • Industry standard graphic format for on-screen viewing through the Internet and Web. Not meant to be used for printing. • The best format for all images except scanned photographic images (use JPEG for these). • GIF supports lossless compression.

  25. http://faculty.kutztown.edu/spiegel/CSc402/demo/States/DelimText/

  26. Metadata • Data that describe the primary data in a file • e.g. <meta> tags in HTML • Stored in the header record • Standard format • As shown on the next slide

  27. HTML: Metadata

  28. Metadata
     <!DOCTYPE html>
     <html>
     <head>
       <meta name="description" content="My Web tutorials">
       <meta name="keywords" content="HTML,CSS,XML,JavaScript">
       <meta name="author" content="Hege Refsnes">
       <meta charset="UTF-8">
     </head>
     <body>
       <p>All meta information goes in the head section...</p>
     </body>
     </html>

  29. Mixing Object Types in a File • Each field is identified using “keyword = value” • Index table with tags
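Fields tagged in the "keyword = value" style can be parsed into a table keyed by tag, so a program looks fields up by name instead of by position. The sketch assumes one field per line with `=` as the separator and skips lines without one; `parseTagged` is a hypothetical name.

```cpp
#include <map>
#include <sstream>
#include <string>

// Parse lines of the form "keyword = value" into a tag -> value table.
// Lines without '=' are skipped; blanks around keyword and value are trimmed.
std::map<std::string, std::string> parseTagged(const std::string& text) {
    auto trim = [](const std::string& s) {
        std::string::size_type b = s.find_first_not_of(" \t");
        if (b == std::string::npos) return std::string();
        std::string::size_type e = s.find_last_not_of(" \t");
        return s.substr(b, e - b + 1);
    };
    std::map<std::string, std::string> fields;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        std::string::size_type eq = line.find('=');
        if (eq == std::string::npos) continue;   // not a tagged field
        fields[trim(line.substr(0, eq))] = trim(line.substr(eq + 1));
    }
    return fields;
}
```

Because each value carries its own tag, fields can appear in any order and unknown tags can be ignored, which is what makes mixing object types in one file workable.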

  30. Object-Oriented File Access • Separate the translation to and from the physical format from the application (representation-independent file access) • Provide a function to handle access (OO style) • Encapsulate details • read_image() is independent of the image file type; the method determines the file type
     Program find_star
        :
        read_image("star1", image)
        process image
        :
     end find_star
     [Figure: image in RAM; files star1 and star2 on disk]
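Representation-independent access can be sketched with a small class hierarchy: the caller asks for an image, and a factory picks the reader that matches the file's actual format. All class and function names here are hypothetical, the sketch takes the file's bytes rather than a file name to stay self-contained, and the "decoding" only reports which reader ran.

```cpp
#include <memory>
#include <string>

// Common interface the application sees; the physical format stays hidden.
class ImageReader {
public:
    virtual ~ImageReader() = default;
    virtual std::string format() const = 0;
    // A real reader would decode pixels here; this sketch reports format and size.
    virtual std::string read(const std::string& bytes) const {
        return format() + ":" + std::to_string(bytes.size());
    }
};

class GifReader : public ImageReader {
public:
    std::string format() const override { return "gif"; }
};

class JpegReader : public ImageReader {
public:
    std::string format() const override { return "jpeg"; }
};

// Factory: choose the reader from the file's leading bytes.
std::unique_ptr<ImageReader> readerFor(const std::string& bytes) {
    if (bytes.compare(0, 3, "GIF") == 0) return std::make_unique<GifReader>();
    if (bytes.size() >= 2 && static_cast<unsigned char>(bytes[0]) == 0xFF &&
        static_cast<unsigned char>(bytes[1]) == 0xD8)
        return std::make_unique<JpegReader>();
    return nullptr;   // format not recognized
}

// read_image-style entry point: the caller never names a file format.
std::string readImage(const std::string& bytes) {
    std::unique_ptr<ImageReader> reader = readerFor(bytes);
    return reader ? reader->read(bytes) : "unsupported";
}
```

A program like find_star would call only readImage(); supporting a new format means adding a subclass and one line in the factory, without touching the caller.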

  31. Extensibility • Advantage of using tags • Identify objects within files • Do not require a priori knowledge of the types of objects • To add a new type of object: • implement methods for reading and writing it in the appropriate module (separation of concerns) • call the method
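With tags, adding an object type means registering one more read method; the dispatch code itself does not change. The sketch keys handlers by tag in a std::map; the `Registry` class and the tag names are hypothetical.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

// Maps an object tag (e.g. "IMAGE") to the method that reads that kind of object.
class Registry {
public:
    using Reader = std::function<std::string(const std::string&)>;

    void add(const std::string& tag, Reader r) { readers[tag] = std::move(r); }

    // Dispatch on the tag; unknown tags are reported rather than treated as errors.
    std::string read(const std::string& tag, const std::string& payload) const {
        auto it = readers.find(tag);
        return it == readers.end() ? "no reader for " + tag : it->second(payload);
    }

private:
    std::map<std::string, Reader> readers;
};
```

A program built before SOUND objects existed still runs on a file containing them (it simply reports no reader); registering a SOUND reader later makes those objects readable with no other change.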

  32. Factors Affecting Portability • Differences among operating systems • e.g. end of line is CR/LF in DOS but LF alone in UNIX • Differences among languages • the physical layout of files may be constrained by language limitations • Differences in machine architectures • byte order: e.g. the UNIX hton/ntoh functions (htonl(), ntohl(), ...) convert between host and network byte order • Differences across platforms • e.g. EBCDIC vs. ASCII character codes
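Byte-order differences disappear if a file format fixes one order and every program converts explicitly. The sketch writes and reads a 32-bit integer in big-endian (network) order using shifts, which produces the same bytes on any machine. It is an alternative to the POSIX htonl()/ntohl() calls, not a wrapper around them; the function names are hypothetical.

```cpp
#include <cstdint>

// Store v as 4 bytes, most significant byte first (big-endian / network order).
void putBigEndian32(std::uint32_t v, unsigned char out[4]) {
    out[0] = static_cast<unsigned char>(v >> 24);
    out[1] = static_cast<unsigned char>(v >> 16);
    out[2] = static_cast<unsigned char>(v >> 8);
    out[3] = static_cast<unsigned char>(v);
}

// Rebuild the value from 4 big-endian bytes, independent of the host's byte order.
std::uint32_t getBigEndian32(const unsigned char in[4]) {
    return (static_cast<std::uint32_t>(in[0]) << 24) |
           (static_cast<std::uint32_t>(in[1]) << 16) |
           (static_cast<std::uint32_t>(in[2]) << 8) |
           static_cast<std::uint32_t>(in[3]);
}
```

Writing 0x12345678 this way yields the bytes 12 34 56 78 on both little-endian and big-endian hosts, so a record-length prefix stored with it can be read back correctly on either kind of machine.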

  33. Achieving Portability • Standardization • Standard physical record format • extensible, simple • Standard binary encoding for data elements • e.g. IEEE (floating point), XDR (External Data Representation) • File structure conversion • Number and text conversion • Use established, well-known methods of conversion

  34. Achieving Portability • File system differences • Block size is 512 bytes on UNIX systems • Block size is 2880 bytes on many non-UNIX systems • UNIX and portability • UNIX supports portability by being commonly available on a large number of platforms • UNIX provides a utility called dd • dd: facilitates data conversion (e.g. of block sizes and character codes)
