File Processing BASIC CONCEPTS & TERMINOLOGY
File Processing • What is it? • Why Learn It?
Basic Terms • File, Record, Field • File • Collection of (homogeneous) records • Record • Element of a file (tuple) • Contains information about a single entity • Consists of fields of information about the entity
Basic Terms • Field • Attribute of an entity • May be of many "types" • string - “254 Niagara Street” • numeric value - 345 • logical value - True or False • structure (e.g. date) - 12/5/96
Files, Records, and Fields • [Figure: a file, with one record and one field labeled]
Example files: • consider records for: • real estate listings • student files • Books in library
Records • Record type • record definition or template • Record instance • a particular record with specific data.

ID    FIRST    LAST    BOX
3241  Alice    Smith   23-546
3755  William  Butler  62-819
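The distinction between a record type and a record instance can be sketched in C++. This is only an illustration: the names `StudentRecord` and `sampleFile` are assumptions made for this example, and the "file" is held in memory as a vector rather than stored on disk.

```cpp
#include <string>
#include <vector>

// Record type: the template that every record in the file follows.
struct StudentRecord {
    int id;             // numeric field
    std::string first;  // string field
    std::string last;   // string field
    std::string box;    // string field, e.g. "23-546"
};

// A file modeled as a homogeneous collection of records (in memory here).
std::vector<StudentRecord> sampleFile() {
    return {
        {3241, "Alice", "Smith", "23-546"},     // record instance
        {3755, "William", "Butler", "62-819"},  // record instance
    };
}
```

Each element returned by `sampleFile()` is one record instance, and `StudentRecord` plays the role of the record type.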
File access and file organization • file access - technique used to store and retrieve particular records in a file. • file organization - characteristics of how a file is structured or actually stored on the disk.
File access and file organization • Example: • Library (books are like records) • what is the file access? • what is the file organization? • file cabinet • what is the file access? • what is the file organization?
File organizations • Two major underlying file organizations • Direct files • work like an array of records in memory • the user selects a record directly by its position in the file. • Sequential files • must be accessed in the order the records appear in the file. • "tape" metaphor.
File access techniques • Two major underlying file access techniques • Direct access • can be used with direct files to randomly select records. • cannot be used with sequential file organization. • Sequential access • can be used with either direct or sequential organizations.
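The two access techniques can be sketched against a file of fixed-length records. This is a minimal sketch under stated assumptions: the `BoxRecord` layout and the function names are hypothetical, and records are written as raw bytes, which is only safe for simple types like this one read back on the same machine.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical fixed-length record: every record occupies the same number of bytes.
struct BoxRecord {
    int  id;
    char last[20];
};

// Write n records back to back, so the file is laid out like an array on disk.
bool writeRecords(const std::string& path, const BoxRecord* recs, std::size_t n) {
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out) return false;
    out.write(reinterpret_cast<const char*>(recs),
              static_cast<std::streamsize>(n * sizeof(BoxRecord)));
    return static_cast<bool>(out);
}

// Direct access: jump straight to record number `index` by computing its byte offset.
bool readDirect(const std::string& path, std::size_t index, BoxRecord& rec) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;
    in.seekg(static_cast<std::streamoff>(index * sizeof(BoxRecord)));
    in.read(reinterpret_cast<char*>(&rec), sizeof(BoxRecord));
    return static_cast<bool>(in);  // false if the position is past the end
}

// Sequential access: read records in file order until the wanted id turns up.
bool findSequential(const std::string& path, int id, BoxRecord& rec) {
    std::ifstream in(path, std::ios::binary);
    while (in.read(reinterpret_cast<char*>(&rec), sizeof(BoxRecord))) {
        if (rec.id == id) return true;
    }
    return false;
}
```

`readDirect` relies on being able to seek, which is why it needs a direct file organization. `findSequential` only ever reads forward, so the same loop would work on either organization.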
File organization & access techniques • These techniques are basic, since they are directly supported by hardware. • All other (more complex) techniques are built using direct & sequential. • Tree • Hashed • Indexed
Primary vs. Secondary storage • Primary • Semiconductor memory • fast, but small, temporary, expensive • memory, registers • Secondary • Disks, Tapes • slow, but large, permanent, inexpensive • orders of magnitude slower than memory
Motivation for File Structures • Primary Memory (RAM) • Small - 10^6 bytes to 10^8 bytes (1 - 100 megabytes) • Fast - 120 × 10^-9 seconds (120 nanoseconds) access time • Secondary Memory • Large - 10^9 bytes to 10^10 bytes (1 - 10 gigabytes) • Slow - 30 × 10^-3 seconds (30 milliseconds) access time
Motivation for File Structures • Consider a comparison of a bookshelf versus a library: • Bookshelf holds 100 books • Access time is 20 seconds • How many books does the library hold? • How long does it take to get to the library?
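One way to work the travel-time half of the analogy is to scale the bookshelf time by the same ratio that separates disk from RAM. The sketch below assumes the access times quoted for RAM and disk (120 nanoseconds and 30 milliseconds). The constant and function names are illustrative only.

```cpp
// Access times quoted in the slides.
constexpr double kRamAccessSeconds  = 120e-9;  // 120 nanoseconds
constexpr double kDiskAccessSeconds = 30e-3;   // 30 milliseconds

// How many times slower one disk access is than one RAM access.
constexpr double slowdownFactor() {
    return kDiskAccessSeconds / kRamAccessSeconds;
}

// Scale a "bookshelf" access time (in seconds) by the same factor; result in days.
constexpr double libraryTripDays(double bookshelfSeconds) {
    return bookshelfSeconds * slowdownFactor() / (60.0 * 60.0 * 24.0);
}
```

With these figures the disk is about 250,000 times slower than RAM, so a 20-second reach to the bookshelf corresponds to a library trip of roughly 58 days.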
Primary vs. Secondary storage • Tradeoffs - where should files and programs be kept? • primary - limited in size, loss occurs if power fails • secondary - much slower but nonvolatile
Primary vs. Secondary storage • Balance • Consider the top of your desk vs. a file cabinet • keep what is likely to be needed soon in primary memory • keep in secondary memory information which hasn't been accessed recently, and isn't likely to be needed soon. • Principles of temporal and spatial locality.
Virtual Memory • Some operating systems have built-in support to balance between primary and secondary storage. • They allow programs to be much larger than memory, and automatically "shuttle" information as needed between primary and secondary storage. • Keep recently accessed areas of code and data in primary memory • Off-load segments not used for a while to secondary storage (or never load them in the first place!) • Thus memory appears to be bigger than it actually is
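The effect of keeping recently accessed data in primary memory can be illustrated with a small simulation. This is a sketch only, assuming a least-recently-used eviction rule; real operating systems typically use approximations of it, and `countPageFaults` is a name invented for this example.

```cpp
#include <cstddef>
#include <list>
#include <unordered_map>
#include <vector>

// Count how many accesses miss primary memory when only `frames` pages fit
// and the least recently used page is evicted to make room.
std::size_t countPageFaults(const std::vector<int>& accesses, std::size_t frames) {
    if (frames == 0) return accesses.size();  // nothing can stay resident

    std::list<int> recency;  // front = most recently used page
    std::unordered_map<int, std::list<int>::iterator> where;
    std::size_t faults = 0;

    for (int page : accesses) {
        auto hit = where.find(page);
        if (hit != where.end()) {
            recency.erase(hit->second);       // resident: will be moved to the front
        } else {
            ++faults;                         // must be fetched from secondary storage
            if (recency.size() == frames) {
                where.erase(recency.back());  // evict the least recently used page
                recency.pop_back();
            }
        }
        recency.push_front(page);
        where[page] = recency.begin();
    }
    return faults;
}
```

An access pattern with good temporal locality (the same page touched several times in a row) produces far fewer faults than one that alternates between more pages than there are frames.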
C++ Objects and File Processing • One of our goals is to use C++ objects to represent our file components. • Consider the following C++ class:

class Person {
public:
    // data members
    char LastName[20], FirstName[20], Address[16];
    char City[16], State[3], ZipCode[10];
    // methods
    Person ();                // Default constructor
    Person (const char *);    // Construct from string
    Person (const Person&);   // Copy constructor
};
C++ Objects and File Processing • “public” members are accessible by users of the object • “private” members are accessible from within the object only.

Person p;                                // Automatic creation
Person * p1_ptr = new Person;            // Dynamic creation
Person * p2_ptr = new Person("Smith");   // Dynamic creation from a string
cout << p.LastName << "," << p.FirstName << endl;
strcpy(p1_ptr->FirstName, p.FirstName);  // char arrays cannot be assigned with =
Person p2(p);                            // Copy construction
C++ Objects and File Processing • Member functions

Person::Person ()
{   // set each field to an empty string
    LastName[0] = 0; FirstName[0] = 0; Address[0] = 0;
    City[0] = 0; State[0] = 0; ZipCode[0] = 0;
}

Person::Person (const char *LN)
{   // Make a Person with the last name set
    strcpy(LastName, LN);
    FirstName[0] = 0; Address[0] = 0;
    City[0] = 0; State[0] = 0; ZipCode[0] = 0;
}
C++ Objects and File Processing • Member functions

Person::Person (const Person& p)  // copy constructor
{   // Make a Person by copying every field of p
    strcpy(LastName, p.LastName);
    strcpy(FirstName, p.FirstName);
    strcpy(Address, p.Address);
    strcpy(City, p.City);
    strcpy(State, p.State);
    strcpy(ZipCode, p.ZipCode);
}
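For reference, the class and its three constructors can be collected into one self-contained sketch. It departs from the slides in one respect, which is an assumption of this example: a hypothetical helper, `copyField`, replaces `strcpy` so that an over-long string is truncated instead of overflowing the fixed-size field.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper (not in the slides): bounded copy that always terminates.
template <std::size_t N>
void copyField(char (&dest)[N], const char* src) {
    std::strncpy(dest, src, N - 1);
    dest[N - 1] = '\0';
}

class Person {
public:
    // data members
    char LastName[20], FirstName[20], Address[16];
    char City[16], State[3], ZipCode[10];

    // Default constructor: every field is an empty string.
    Person() { clear(); }

    // Construct from string: only the last name is set.
    Person(const char* lastName) {
        clear();
        copyField(LastName, lastName);
    }

    // Copy constructor: copy every field of p.
    Person(const Person& p) {
        copyField(LastName, p.LastName);
        copyField(FirstName, p.FirstName);
        copyField(Address, p.Address);
        copyField(City, p.City);
        copyField(State, p.State);
        copyField(ZipCode, p.ZipCode);
    }

private:
    // Set each field to an empty string.
    void clear() {
        LastName[0] = 0; FirstName[0] = 0; Address[0] = 0;
        City[0] = 0; State[0] = 0; ZipCode[0] = 0;
    }
};
```

Because every field has a fixed size, a `Person` always occupies the same number of bytes, which is the property that fixed-length records in a direct file depend on.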