File structure & Data processing. Syllabus
UNIT I. Introduction: File structure design. File processing operations: open, close, read, write, seek. Unix directory structure. Secondary storage devices: disks, tapes, CD-ROM. Buffer management. I/O in Unix.
UNIT II. File Structure Concepts: Field & record organization. Using classes to manipulate buffers. Record access. Record structures. File access & file organization. Abstract data models for file access. Metadata. Extensibility, portability & standardization.
UNIT III. Data compression. Reclaiming space in files. Introduction to internal sorting and binary searching. Keysorting. Indexing concepts. Object I/O. Multiple-key indexing. Inverted lists. Selective indexes. Binding.
Collected By CJS
UNIT IV. Cosequential processing: Object-oriented model and its application. Internal sorting: a second look. File merging: sorting of large files on disks. Sorting files on tapes. Sort-merge packages. Sorting and cosequential processing in Unix.
UNIT V. Multilevel indexing: Indexing using binary search trees. OOP-based B-trees. B-tree methods: search, insert and others. Deletion, merging & redistribution. B* trees. Virtual B-trees. Variable-length records & keys. Indexed sequential file access and prefix B+ trees.
UNIT VI. Hashing: Introduction, a simple hashing algorithm. Hashing functions and record distributions. Collision resolution. Buckets. Making deletions. Patterns of record access. External hashing: implementation, deletion, performance, alternative approaches.
Textbook: Michael J. Folk, Bill Zoellick, Greg Riccardi: File Structures: An Object-Oriented Approach with C++ (Addison-Wesley) (LPE).
References:
1. M. Loomis: Data Management & File Processing (PHI).
2. O. Hanson: Design of Computer Data Files (McGraw-Hill) (IE).
File Structure Design
• A file structure is a combination of representations for data in files and of operations for accessing the data. It allows applications to read, write and modify data.
• A good file structure design gives us access to all the capacity of the disk without making applications spend a lot of time waiting for the disk.
Goals of Research and Development in File Structures
• Get the information we need with one access to the disk.
• If that is impossible, use structures that find the target information in as few accesses as possible. Ex. binary search can find one record among 55,000 with at most 16 comparisons.
• Use file structures that group information, so we are likely to get everything we need in one trip to the disk. Ex. a client's name, address, phone number and account balance.
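The following C++ sketch shows why binary search needs so few comparisons. It counts the probes made while searching a sorted, in-memory array of 55,000 keys. The function name `binarySearch` and the probe counter are illustrative assumptions, not code from the textbook. On a file, each probe could cost one disk access.

```cpp
#include <cassert>
#include <vector>

// Binary search over sorted keys. Returns the index of `target`, or -1 if
// it is absent. `comparisons` receives the number of probes made: each probe
// compares `target` with one middle key and halves the range still in play.
int binarySearch(const std::vector<int>& keys, int target, int& comparisons) {
    int low = 0;
    int high = static_cast<int>(keys.size()) - 1;
    comparisons = 0;
    while (low <= high) {
        int mid = low + (high - low) / 2;   // avoids overflow of low + high
        ++comparisons;                      // one three-way comparison per probe
        if (keys[mid] == target) return mid;
        if (keys[mid] < target) low = mid + 1;
        else high = mid - 1;
    }
    return -1;
}
```

For n keys, at most ⌊log2 n⌋ + 1 probes are needed. For n = 55,000 that is 15 + 1 = 16.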
Making File Structures Usable in Applications Using C++

    class Person {
    public:
        // data members
        char lastname[11], firstname[11];
        char address[16], city[16], state[10];
        // method
        Person();   // default constructor
    };

    Person::Person() {
        // set each field to an empty string
        lastname[0] = 0; firstname[0] = 0; address[0] = 0;
        city[0] = 0; state[0] = 0;
    }
Physical File & Logical File
Physical file:
1. A file that actually exists on secondary storage.
2. It is the file as known by the computer's operating system.
Logical file:
1. The file as seen by the program.
2. Logical files let a program describe the operations to be performed on a file without knowing which physical file will be used.
Ex. associating a logical file called inp-file with the physical file myfile.doc:
• SELECT inp-file ASSIGN TO "myfile.doc"
This statement asks the operating system to find the physical file myfile.doc and then to make the hookup by assigning a logical file (a phone line) to it.
Opening Files
• Once we have a logical file identifier hooked up to a physical file or device, we declare what we intend to do with the file:
1. open an existing file, or
2. create a new file.
Opening Files
• Open an existing file
• Create a new file
Ex. fd = open(filename, flags [, pmode]);
fd (int): the file descriptor. If there is an error, this value is negative.
filename (char *): a character string containing the physical file name.
Flags (int): control the actions of open. The flag values are combined with bitwise OR (|).
• O_APPEND: append every write operation to the end of the file.
• O_CREAT: create the file if it does not exist. This has no effect if the file already exists.
• O_RDONLY: open the file for reading only.
• O_RDWR: open the file for reading and writing.
• O_WRONLY: open the file for writing only.
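pmode (int): required if O_CREAT is specified. It is the protection mode for the file. In Unix, pmode is a three-digit octal number, with one digit each for owner, group and world. Each digit is made of the bits r w e (r = read, w = write, e = execute).
Ex. pmode = 0751 = 111 101 001, so owner: r w e, group: r - e, world: - - e.
Ex. fd = open(filename, O_RDWR | O_CREAT, 0751);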
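The following is a minimal POSIX sketch of the open() call above, written in C++ for a Unix-like system. `permissionString` and `openForUpdate` are hypothetical helper names. The first decodes a pmode such as 0751 into its owner, group and world read/write/execute bits. The second opens a file with O_RDWR | O_CREAT and mode 0751. The permissions a new file actually receives are also filtered by the process umask.

```cpp
#include <cassert>
#include <fcntl.h>     // open() and the O_* flags
#include <sys/stat.h>  // mode_t
#include <unistd.h>    // close()
#include <string>

// Render the nine low permission bits of a pmode as e.g. "rwxr-x--x".
// The groups are owner (bits 8-6), group (bits 5-3) and world (bits 2-0).
std::string permissionString(mode_t pmode) {
    const char letters[] = "rwx";
    std::string result;
    for (int shift = 6; shift >= 0; shift -= 3) {             // owner, group, world
        for (int bit = 0; bit < 3; ++bit) {
            bool isSet = pmode & (1u << (shift + 2 - bit));  // r = 4, w = 2, x = 1
            result += isSet ? letters[bit] : '-';
        }
    }
    return result;
}

// Open `filename` for reading and writing, creating it with pmode 0751
// if it does not exist. Returns the file descriptor (negative on error).
int openForUpdate(const char* filename) {
    return open(filename, O_RDWR | O_CREAT, 0751);
}
```

For example, permissionString(0751) gives "rwxr-x--x". Close a descriptor returned by openForUpdate with close(fd) when you are done with it.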
Closing Files
• In terms of the telephone-line analogy, closing a file is like hanging up the phone.
• When we hang up the phone, the phone line is available for taking or placing another call.
• When we close a file, the logical file name or file descriptor is available for use with another file.
Closing Files
• Files are closed automatically by the operating system when a program terminates normally.
• So a close statement within a program is needed mainly to protect against data loss if the program is interrupted, and to free the logical file name for reuse.
• Closing a file ensures that the buffer for that file has been flushed, so everything written has actually been sent to the file.
Reading and Writing
• Reading and writing are the fundamental input/output operations.
• The exact read and write statements vary from language to language.
• Ex. Read(Source_file, Destination_addr, Size)
Source_file: where to read from. The Read call must know where it is to read from. The file must already have been opened, so that the connection between a logical file and a physical file (or device) exists.
Reading and Writing
Destination_addr: where to place the information, given as the first address of the memory block where we want to store it.
Size: how much information to bring in from the file, supplied as a byte count.
Write Function
• Write(Destination_file, Source_addr, Size)
Destination_file: the logical file name used for sending the data.
Source_addr: Write must know where to find the information it will send. This is the address of the first byte in memory.
Size: the number of bytes to be written.
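The following is a minimal POSIX sketch of Write and Read in C++, assuming a Unix-like system. The helper `writeThenRead` is hypothetical. It also uses O_TRUNC (discard any old contents) and lseek() (return to the start before reading), which the slides do not mention. The argument order of write() and read() matches the descriptions above: file, memory address, byte count.

```cpp
#include <cassert>
#include <cstring>
#include <fcntl.h>    // open(), O_* flags
#include <unistd.h>   // read(), write(), lseek(), close(), ssize_t

// Write `size` bytes from `source` to `filename`, then read them back into
// `destination`. Returns the number of bytes read back, or -1 on error.
ssize_t writeThenRead(const char* filename, const char* source,
                      char* destination, size_t size) {
    int fd = open(filename, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return -1;
    // Write(Destination_file, Source_addr, Size)
    ssize_t written = write(fd, source, size);
    if (written != static_cast<ssize_t>(size)) { close(fd); return -1; }
    lseek(fd, 0, SEEK_SET);                      // back to the first byte
    // Read(Source_file, Destination_addr, Size)
    ssize_t got = read(fd, destination, size);
    close(fd);
    return got;
}
```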
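Files with C & C++
• There are two different ways: 1. stdio.h (C streams) 2. iostream and fstream (iostream.h and fstream.h in pre-standard compilers), the C++ stream classes.
• file = fopen(filename, type);
file (FILE *): a pointer to the file descriptor.
filename (char *): the file name.
type (char *): controls the operation of the open:
"r": open an existing file for input.
"w": create a new file, or truncate an existing one, for output.
"a": create a new file, or append to an existing one, for output.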
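The following small C++ sketch uses the C stdio functions to exercise the three fopen() types. The function name `fopenModesDemo` and the file path used with it are illustrative assumptions.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Create a file with "w", extend it with "a", then read it back with "r".
// Returns the file's contents, or an empty string if any fopen() fails.
std::string fopenModesDemo(const char* filename) {
    std::FILE* file = std::fopen(filename, "w");   // create (or truncate) for output
    if (!file) return "";
    std::fputs("first line\n", file);
    std::fclose(file);

    file = std::fopen(filename, "a");              // append: writes go to the end
    if (!file) return "";
    std::fputs("second line\n", file);
    std::fclose(file);

    file = std::fopen(filename, "r");              // open the existing file for input
    if (!file) return "";
    std::string contents;
    int c;
    while ((c = std::fgetc(file)) != EOF) contents += static_cast<char>(c);
    std::fclose(file);
    return contents;
}
```

The function returns "first line\nsecond line\n": "w" created the file, "a" added to its end, and "r" read it back.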
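Program to Display the Contents of a File
• Program steps:
1. Display a prompt for the name of the input file.
2. Read the user's response from the keyboard.
3. Open the file for input.
4. While there are still characters to be read from the file, read a character and write it to the screen.
5. Close the input file.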
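    #include <iostream>
    #include <fstream>
    using namespace std;

    int main() {
        char ch;
        fstream file;                // declare an unattached fstream
        char filename[20];
        cout << "Enter the name of the file: " << flush;
        cin >> filename;
        file.open(filename, ios::in);
        file.unsetf(ios::skipws);    // include white space in the read
        while (1) {
            file >> ch;
            if (file.fail()) break;  // stop at end of file
            cout << ch;
        }
        file.close();
        return 0;
    }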
Detecting End of File
• if (file.fail()) break;
The member function fail() returns true (nonzero) if the previous operation on the stream failed, for example because it tried to read past the end of the file.
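The following variation on the file-listing program packages it as a reusable C++ function (`readAllChars` is a hypothetical name). It detects end of file with fail(), as above. It reads with get(), which, unlike >>, never skips white space.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Read a whole file one character at a time, stopping when a read fails.
// Returns an empty string if the file cannot be opened.
std::string readAllChars(const char* filename) {
    std::ifstream file(filename);
    std::string contents;
    char ch;
    while (true) {
        file.get(ch);
        if (file.fail()) break;   // set once a read runs past end of file (or on error)
        contents += ch;
    }
    return contents;
}
```

If the distinction matters, checking eof() after the loop tells end of file apart from other errors.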
Seeking
• So far we have read through a file sequentially, one byte after another, until we reach the end of the file. Every time a byte is read, the operating system moves the read/write pointer ahead.
• If the byte we need is ten thousand bytes away, we would rather jump there directly.
• The action of moving directly to a certain position in a file is often called seeking.
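Seeking
• Seek(Source_file, Offset)
Source_file: the logical file name in which the seek will occur.
Offset: the number of positions in the file the pointer is to be moved from the start of the file.
Seeking with C streams
status = fseek(file, byte_offset, origin);
• status (int): 0 if the seek succeeded, nonzero if it failed. The new position, a long integer, can then be obtained with ftell(file).
• file (FILE *): the file descriptor of the file to which the seek is applied.
• byte_offset (long): the number of bytes to move from origin.
• origin: where byte_offset is measured from:
0 (SEEK_SET): the beginning of the file.
1 (SEEK_CUR): the current position.
2 (SEEK_END): the end of the file.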
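The following minimal C++ sketch uses fseek() and ftell() through <cstdio>. The helper `charAfterSeek` is a hypothetical name. It seeks, records the new position with ftell(), and reads the byte stored there.

```cpp
#include <cassert>
#include <cstdio>

// Seek `offset` bytes from `origin` (SEEK_SET, SEEK_CUR or SEEK_END),
// then read one byte. Returns that byte, or EOF if the seek or read fails.
// `position` receives the offset reported by ftell() right after the seek.
int charAfterSeek(std::FILE* file, long offset, int origin, long& position) {
    if (std::fseek(file, offset, origin) != 0)   // fseek returns 0 on success
        return EOF;
    position = std::ftell(file);                 // new position as a long
    return std::fgetc(file);                     // reading moves the pointer one byte on
}
```

Take a file containing "0123456789". Seeking 3 from SEEK_SET reads '3'. Seeking 2 from SEEK_CUR then reads '6', because the pointer sat at 4 after the first read. Finally, seeking -1 from SEEK_END reads '9'.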
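Seeking with C++ Stream Classes
• file.seekg(byte_offset, origin)
origin: ios::beg (beginning of the file), ios::cur (current position) or ios::end (end of the file).
Ex. file.seekg(373, ios::beg); moves the get pointer to byte 373 of the file.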
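The following C++ sketch does the same with the stream classes. The hypothetical helper `readAt` calls seekg() with one of the three origins and then reads a few bytes. It calls clear() first, because a read that hit end of file leaves the stream failed, and seekg() does nothing while failbit is set.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Read up to `count` characters starting `offset` bytes from `origin`
// (std::ios::beg, std::ios::cur or std::ios::end).
std::string readAt(std::ifstream& file, std::streamoff offset,
                   std::ios::seekdir origin, std::size_t count) {
    file.clear();                       // reset EOF/fail state from earlier reads
    file.seekg(offset, origin);         // move the get pointer
    std::string result(count, '\0');
    file.read(&result[0], static_cast<std::streamsize>(count));
    result.resize(static_cast<std::size_t>(file.gcount()));  // keep only what was read
    return result;
}
```

Take a file containing "ABCDEFGHIJ". The calls readAt(in, 3, std::ios::beg, 2), readAt(in, 1, std::ios::cur, 2) and readAt(in, -2, std::ios::end, 5) return "DE", "GH" and "IJ" respectively. The last call stops early at end of file.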
Unix File System Commands
• cat filenames: print the contents of the named text files.
• tail filename: print the last ten lines of the text file.
• cp file1 file2: copy file1 to file2.
• mv file1 file2: move (rename) file1 to file2.
• chmod mode filename: change the protection mode on the named files.
• ls: list the contents of the directory.
• mkdir name: create a directory with the given name.
• rmdir name: remove the named directory.
Disks
• Disk drives belong to a class of devices known as direct access storage devices (DASDs), because they make it possible to access data directly.
• Magnetic tapes, by contrast, permit only serial access.
• Hard disks are the most common disks used in everyday file processing.
• Floppy disks are inexpensive, but they are slow and hold relatively little data.
• Floppies are good for backing up individual files.
Organization of Disks
• The information stored on a disk is stored on the surface of one or more platters.
• The information is stored in successive tracks on the surface of the disk.
• Each track is divided into a number of sectors.
• To read a byte, the O.S. finds the correct surface, track and sector, reads the entire sector into a buffer, and then finds the requested byte within that buffer.
Organization of Disks
• Disk drives typically have a number of platters.
• The tracks that are directly above and below one another form a cylinder. All the information on a cylinder can be accessed without moving the read/write arm.
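The following sketch shows how a byte offset translates into a disk address. It rests on stated assumptions: the geometry values are hypothetical, the file starts at cylinder 0, surface 0, sector 0, it is stored contiguously, and sectors are numbered from 0. The file fills every track of a cylinder before moving on, which lets the arm stay put. The names `Geometry`, `DiskAddress` and `locate` are illustrative.

```cpp
#include <cassert>

// Hypothetical disk geometry, for illustration only.
struct Geometry {
    long bytesPerSector;
    long sectorsPerTrack;
    long tracksPerCylinder;   // one track per recording surface
};

struct DiskAddress {
    long cylinder;
    long surface;             // which track of the cylinder
    long sector;              // sector within the track, counted from 0
    long byteInSector;        // position inside the sector buffer
};

// Locate byte `offset` of a contiguously stored file that fills all tracks
// of a cylinder before the access arm moves to the next cylinder.
DiskAddress locate(long offset, const Geometry& g) {
    long sectorNumber = offset / g.bytesPerSector;   // which sector holds the byte
    DiskAddress a;
    a.byteInSector = offset % g.bytesPerSector;
    a.sector   = sectorNumber % g.sectorsPerTrack;
    a.surface  = (sectorNumber / g.sectorsPerTrack) % g.tracksPerCylinder;
    a.cylinder = sectorNumber / (g.sectorsPerTrack * g.tracksPerCylinder);
    return a;
}
```

Assume 512-byte sectors, 63 sectors per track and 16 tracks per cylinder. Byte 516,796 then falls in cylinder 1, surface 0, sector 1, at byte 188 of that sector.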
Estimating Capacities and Space Needs
• Track capacity = number of sectors per track × bytes per sector
• Cylinder capacity = number of tracks per cylinder × track capacity
• Drive capacity = number of cylinders × cylinder capacity
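The following C++ sketch implements the capacity formulas, plus a space-needs estimate. The drive figures used with it (512 bytes per sector, 63 sectors per track, 16 tracks per cylinder, 4,092 cylinders) are hypothetical. `cylindersNeeded` assumes fixed-length records that do not span sectors.

```cpp
#include <cassert>

// Capacity formulas, in 64-bit integers so drive totals do not overflow.
long long trackCapacity(long long sectorsPerTrack, long long bytesPerSector) {
    return sectorsPerTrack * bytesPerSector;
}
long long cylinderCapacity(long long tracksPerCylinder, long long trackCap) {
    return tracksPerCylinder * trackCap;
}
long long driveCapacity(long long cylinders, long long cylinderCap) {
    return cylinders * cylinderCap;
}

// Cylinders occupied by `records` fixed-length records when as many whole
// records as fit are stored in each sector (records never span sectors).
long long cylindersNeeded(long long records, long long recordSize,
                          long long bytesPerSector, long long sectorsPerCylinder) {
    assert(recordSize <= bytesPerSector);
    long long perSector = bytesPerSector / recordSize;                // whole records per sector
    long long sectors = (records + perSector - 1) / perSector;        // round up
    return (sectors + sectorsPerCylinder - 1) / sectorsPerCylinder;   // round up
}
```

With those figures, a track holds 63 × 512 = 32,256 bytes and a cylinder holds 16 × 32,256 = 516,096 bytes. The whole drive holds 4,092 × 516,096 = 2,111,864,832 bytes. A file of 50,000 records of 256 bytes fits two records per sector, so it needs 25,000 sectors. That is 25 cylinders of 1,008 sectors each (24.8, rounded up).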
Organizing Tracks by Sector
• There are two basic ways to organize data on a disk:
1. by sector
2. by user-defined block.
Organizing Tracks by Sector
• The physical placement of sectors:
• The most practical logical organization of sectors on a track is that sectors are adjacent, fixed-sized segments of a track that happens to hold a file.
• Physically, however, this is not optimal. After reading the data, the disk controller needs some time to process the received information before it is ready to accept more.
• Consequently, if logically adjacent sectors were also physically adjacent, we could read only one sector per revolution.
• Traditional solution: interleave the sectors.
Interleaving the Sectors
• Interleaving leaves an interval of several physical sectors between logically successive sectors.
• Suppose our disk has 32 sectors per track and an interleaving factor of 5.
• It then takes five revolutions to read all thirty-two sectors of a track.
• That is a big improvement over 32 revolutions.
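The following C++ sketch models interleaving. It assumes the controller can accept the next sector once the following `factor - 1` sectors have passed under the head. It also assumes the track size and the interleaving factor (here 32 and 5) share no common divisor, so every physical slot is used exactly once. `interleave` and `slotTimesToReadAll` are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// Place n logical sectors on a track so that logically consecutive sectors
// are `factor` physical slots apart. Returns, for each physical slot,
// the logical sector stored there.
std::vector<int> interleave(int n, int factor) {
    std::vector<int> slots(n, -1);
    for (int logical = 0; logical < n; ++logical) {
        int physical = (logical * factor) % n;
        assert(slots[physical] == -1);   // fires if n and factor share a divisor
        slots[physical] = logical;
    }
    return slots;
}

// Sector-times needed to read logical sectors 0..n-1 in order, starting with
// the head at logical sector 0 and assuming the controller is ready whenever
// the next logical sector arrives. Each sector takes one sector-time to pass.
int slotTimesToReadAll(const std::vector<int>& slots) {
    int n = static_cast<int>(slots.size());
    std::vector<int> where(n);                           // logical sector -> physical slot
    for (int s = 0; s < n; ++s) where[slots[s]] = s;
    int time = 1;                                        // read logical sector 0
    for (int logical = 1; logical < n; ++logical)
        time += (where[logical] - where[logical - 1] + n) % n;  // rotate to it and read it
    return time;
}
```

With 32 sectors and a factor of 5, logical sector 1 sits in physical slot 5, sector 2 in slot 10, and so on. Reading all 32 in order takes 1 + 31 × 5 = 156 sector-times. That is just under five revolutions of 32 sector-times each.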
Organizing Tracks by Sector
• Clusters: a file can also be viewed as a series of clusters of sectors, where a cluster is a fixed number of contiguous sectors.
• Once a cluster has been found on a disk, all sectors in that cluster can be accessed without requiring an additional seek.
• The File Allocation Table (FAT) ties logical sectors to the physical clusters they belong to.
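The following simplified C++ sketch shows the kind of mapping a FAT maintains. It assumes each file's clusters are already available as an ordered list of starting physical sector numbers. A real FAT stores a chain of cluster numbers, which the file manager follows to build such a list. `physicalSector` is a hypothetical name.

```cpp
#include <cassert>
#include <vector>

// `clusters` lists, in file order, the first physical sector of each cluster
// the file occupies. Each cluster is `sectorsPerCluster` contiguous sectors,
// so a logical sector maps to one cluster plus an offset inside it.
long physicalSector(const std::vector<long>& clusters, int sectorsPerCluster,
                    long logicalSector) {
    long clusterIndex = logicalSector / sectorsPerCluster;   // which cluster of the file
    long offset = logicalSector % sectorsPerCluster;          // sector within that cluster
    assert(clusterIndex < static_cast<long>(clusters.size()));
    return clusters[clusterIndex] + offset;                   // no seek needed within a cluster
}
```

Suppose there are 8 sectors per cluster and the file's clusters start at sectors 800, 120 and 968. Logical sector 20 of the file then lies in the third cluster at offset 4, which is physical sector 972.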
Extents
• If there is a lot of free room on the disk, it may be possible to make a file consist entirely of contiguous clusters. We then say the file consists of one extent.
• If there is not enough contiguous space to contain the entire file, the file is divided into two or more noncontiguous parts, and each part is an extent.
• The important thing about extents is that as their number increases, the file becomes more spread out on the disk and the amount of seeking required to process it increases.
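The following sketch counts extents. Given the physical cluster numbers a file occupies, in file order, every break in contiguity starts a new extent. `countExtents` is a hypothetical name.

```cpp
#include <cassert>
#include <vector>

// Count the extents of a file from the physical cluster numbers it occupies,
// listed in file order. A new extent starts wherever the next cluster is not
// the physical neighbour of the previous one.
int countExtents(const std::vector<long>& clusterNumbers) {
    if (clusterNumbers.empty()) return 0;
    int extents = 1;
    for (std::size_t i = 1; i < clusterNumbers.size(); ++i)
        if (clusterNumbers[i] != clusterNumbers[i - 1] + 1) ++extents;
    return extents;
}
```

The cluster list {10, 11, 12, 13} is one extent. The list {10, 11, 40, 41, 42, 7} is three extents, so reading it sequentially means jumping to two further places on the disk.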
Fragmentation
• Generally, all sectors on a disk have the same number of bytes.
• Ex. the sector size is 512 bytes and every record in a file is 300 bytes.
• There are two ways to store the records:
1) store only one record per sector, or
2) allow records to span sectors, so that a record may begin in one sector and end in another.
Fragmentation
• The first option has the advantage that any record can be retrieved by retrieving just one sector,
• but it has the disadvantage that it might leave an enormous amount of unused space within each sector.
Advantages & Disadvantages
• The first option has the advantage that any record can be retrieved by reading just one sector. However, it leaves an enormous amount of unused space within each sector. This loss of space within a sector is called internal fragmentation.
• The second option loses no space, but some records require more than one sector to be accessed.
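The following sketch works through the two options for 512-byte sectors and 300-byte records. `wastePerSector` and `sectorsToRead` are hypothetical names. The second function assumes records are packed end to end from the first byte of sector 0.

```cpp
#include <cassert>

// Option 1: one record per sector. Unused bytes left in each sector
// (internal fragmentation).
long wastePerSector(long sectorSize, long recordSize) {
    assert(recordSize <= sectorSize);
    return sectorSize - recordSize;
}

// Option 2: records packed end to end and allowed to span sectors.
// Number of sectors that must be read to retrieve record `i` (0-based).
long sectorsToRead(long i, long sectorSize, long recordSize) {
    long first = (i * recordSize) / sectorSize;                    // sector with the first byte
    long last  = (i * recordSize + recordSize - 1) / sectorSize;   // sector with the last byte
    return last - first + 1;
}
```

Option 1 wastes 512 - 300 = 212 bytes, about 41% of every sector. With option 2, record 0 lies entirely in sector 0. Record 1 (bytes 300 to 599), however, straddles sectors 0 and 1, so it needs two sector reads.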
Use of clusters: when the number of bytes in a file is not an exact multiple of the cluster size, there will be internal fragmentation in the file's last cluster.
• Large clusters suit large files.
• Small clusters suit small files.
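The following sketch shows the internal fragmentation caused by clusters. A file always occupies whole clusters, so the unused tail of its last cluster is lost. `clustersFor` and `slackBytes` are hypothetical names, and the cluster sizes used with them are illustrative.

```cpp
#include <cassert>

// Number of whole clusters a file of `fileBytes` bytes occupies.
long long clustersFor(long long fileBytes, long long clusterBytes) {
    assert(clusterBytes > 0);
    return (fileBytes + clusterBytes - 1) / clusterBytes;   // round up
}

// Bytes allocated to the file but not used by it (the last cluster's tail).
long long slackBytes(long long fileBytes, long long clusterBytes) {
    return clustersFor(fileBytes, clusterBytes) * clusterBytes - fileBytes;
}
```

A 10,000-byte file in 4,096-byte clusters uses three clusters and wastes 2,288 bytes. With 32,768-byte clusters the same file wastes 22,768 bytes.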
Organizing Tracks by Block
• Sometimes disk tracks are not divided into sectors but into integral numbers of user-defined blocks.
• The amount of data transferred in a single I/O operation can then vary with the needs of the software designer.
• Blocks can normally be either fixed or variable in length, depending on the requirements of the file designer.
• Block organization does not present the sector-spanning and fragmentation problems of sectors, because blocks can vary in size to fit the logical organization of the data.
Organizing Tracks by Block
• The blocking factor indicates the number of records stored in each block of a file.
• Ex. for a file with 300-byte records, this method lets us define blocks whose size is a multiple of 300 bytes,
• so no space is lost to internal fragmentation.
• Each block is usually accompanied by one or more subblocks containing extra information about the data block:
• Count subblock: the number of bytes in the accompanying data block.
• Key subblock: the key of the last record in the data block.
Nondata Overhead
• Both blocks and sectors require that a certain amount of space on the disk be taken up by nondata overhead:
1. on sector-addressable disks
2. on block-organized disks.
On sector-addressable disks: preformatting stores, at the beginning of each sector, information such as the sector address, the track address and the sector's condition (whether it is usable or defective).
On block-organized disks: some of the nondata overhead, such as subblocks and interblock gaps, has to be provided with every block. So generally more nondata information is provided with blocks than with sectors.
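The following sketch shows how nondata overhead reduces usable track space on a block-organized disk. The figures used with it (20,000 bytes per track, 300 bytes of subblocks and interblock gap per block, 100-byte records) are hypothetical. `recordsPerTrack` is an illustrative name.

```cpp
#include <cassert>

// Records that fit on one track of a block-organized disk when every block
// carries a fixed amount of nondata overhead (subblocks + interblock gap).
long recordsPerTrack(long trackBytes, long overheadPerBlock,
                     long recordSize, long blockingFactor) {
    assert(blockingFactor > 0);
    long blockBytes = blockingFactor * recordSize + overheadPerBlock;  // data + overhead
    long blocksPerTrack = trackBytes / blockBytes;                     // whole blocks only
    return blocksPerTrack * blockingFactor;
}
```

With a blocking factor of 10, each block occupies 10 × 100 + 300 = 1,300 bytes, so 15 blocks (150 records) fit on a track. With a blocking factor of 60, a block occupies 6,300 bytes and only 3 fit, yet the track now holds 180 records. Larger blocks spend proportionally less space on overhead.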
The Cost of a Disk Access
• The total time needed to access a disk is made up of three factors:
1) seek time
2) rotational delay
3) transfer time
Seek Time
• Seek time is the time required to move the access arm to the correct cylinder.
• It depends on how far the arm has to move.
• Seeking is usually more costly in a multiuser environment than in a single-user one, where the disk is dedicated to one process.
• It is usually impossible to know exactly how many tracks will be traversed in every seek,
• so we use the average seek time.
• Today's hard disks have average seek times of less than 10 milliseconds.
Rotational Delay
• Rotational delay is the time it takes for the disk to rotate so that the sector we want is under the read/write head. On average, it is half a revolution.
• Hard disks rotate at roughly 5000 to 7200 rpm. Floppy disks rotate at about 360 rpm.
• Suppose that a file requires two or more tracks, that there are plenty of available tracks on one cylinder, and that the file is written to disk sequentially with one write call.
• When the first track is filled, the disk can immediately begin writing to the second track without any rotational delay.
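The following C++ sketch gives a rough access-time estimate that combines the three factors. It assumes the average rotational delay is half a revolution. It treats transfer time as the fraction of a revolution needed for the requested sectors to pass under the head. The parameter values used with it are hypothetical, and `accessTimeMs` is an illustrative name.

```cpp
#include <cassert>
#include <cmath>

// Estimated time (ms) to read `sectors` consecutive sectors from one track:
// average seek + average rotational delay + transfer time.
double accessTimeMs(double avgSeekMs, double rpm, int sectorsPerTrack, int sectors) {
    double revolutionMs = 60000.0 / rpm;                            // one full rotation
    double rotationalDelayMs = revolutionMs / 2.0;                  // on average, half a turn
    double transferMs = revolutionMs * sectors / sectorsPerTrack;   // fraction of a turn to read
    return avgSeekMs + rotationalDelayMs + transferMs;
}
```

Take an 8 ms average seek on a 6,000 rpm disk, which needs 10 ms per revolution. Reading one full 63-sector track then takes about 8 + 5 + 10 = 23 ms.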