Files – Chapter 2 Basic File Processing Operations
Outline • Physical versus Logical Files • Opening and Closing Files • Reading, Writing and Seeking • Special Characters in Files • The Unix Directory Structure • Physical Devices and Logical Files • Unix File System Commands
Physical versus Logical Files • Physical File: A collection of bytes stored on a disk or tape. • Logical File: A “channel” (like a telephone line) that hides the details of the file’s location and physical format from the program. • When a program wants to use a particular file, say “data”, the operating system must find the physical file called “data” and make the hookup by assigning a logical file to it. This logical file has a logical name, which is what the program uses internally.
Opening Files • Once we have a logical file identifier hooked up to a physical file or device, we need to declare what we intend to do with the file: • Open an existing file • Create a new file • Opening makes the file ready for use by the program • We are positioned at the beginning of the file and are ready to read or write.
Opening Files in UNIX/C • The UNIX system function open( ) is used to open an existing file or create a new file. fd = open(filename, flags, [pmode]); • fd: the file descriptor, i.e., the logical file name. The fd is a small non-negative integer. If the attempt to open the file fails, open( ) returns -1. • filename: the physical file name. The filename argument can be a pathname.
flags: an integer argument that controls the operation of the open function. Its value is formed by a bitwise OR of the following values, and must include exactly one of O_RDONLY, O_WRONLY, or O_RDWR: • O_APPEND: Append every write operation to the end of the file. • O_CREAT: Create the file if it does not already exist. • O_EXCL: Used together with O_CREAT, return an error if the file already exists. • O_RDONLY: Open a file for reading only. • O_RDWR: Open a file for reading and writing. • O_TRUNC: Truncate an existing file to a length of 0, destroying its contents. • O_WRONLY: Open a file for writing only. • and others, including some for synchronized I/O (e.g., O_SYNC).
Opening Files in UNIX/C (cont’d) • pmode: An integer argument that specifies the protection mode. • If O_CREAT is specified, pmode is required. • In UNIX, pmode is a three-digit octal number (written with a leading 0 in C, e.g., 0751) that indicates how the file can be used by the owner (1st digit), by members of the owner’s group (2nd digit), and by everyone else (3rd digit). Each digit encodes three permission bits: r (read), w (write), x (execute). • Example, pmode = 0751: owner 7 = 111 = r w x; group 5 = 101 = r - x; world 1 = 001 = - - x. • File protection is tied more to the operating system than to a specific language.
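Examples:
fd = open(filename, O_RDWR | O_CREAT, 0751);           /* create the file if it does not exist */
fd = open(filename, O_RDWR | O_CREAT | O_TRUNC, 0751); /* create it, or empty it if it exists */
fd = open(filename, O_RDWR | O_CREAT | O_EXCL, 0751);  /* create it; fail if it already exists */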
Closing Files • Makes the logical file name available for another physical file (like hanging up the telephone after a call). • Ensures that everything has been written to the file, since data is written to a buffer before it reaches the file. • When a program terminates normally, the operating system closes any files it left open; if the program is abnormally interrupted, data still sitting in buffers may be lost.
Reading • Read(Source_file, Destination_addr, Size) • Source_file = location the program reads from, i.e., its logical file name • Destination_addr = first address of the memory block where we want to store the data. • Size = how much information is being brought in from the file (byte count).
Writing • Write(Destination_file, Source_addr, Size) • Destination_file = the logical file name where the data will be written. • Source_addr = first address of the memory block where the data to be written is stored. • Size = the number of bytes to be written.
A program does not necessarily have to read through a file sequentially: it can jump to specific locations in the file, or to the end of the file in order to append to it. • The action of moving directly to a certain position in a file is often called seeking. • Seek(Source_file, Offset) • Source_file = the logical file name in which the seek will occur • Offset = the number of bytes from the start of the file to which the read/write pointer is to be moved.
The seek function in UNIX/C: lseek( ) pos = lseek(fd, byte_offset, origin) • pos: the value returned by lseek( ) (of type off_t, a long integer type), equal to the number of bytes from the beginning of the file to the file pointer after it has been moved; -1 on error. • fd: the file descriptor. • byte_offset: the number of bytes to move from some origin in the file. The byte_offset is a long integer and can be negative. • origin: a value that specifies the starting position from which byte_offset is measured. The values of origin: • SEEK_SET: from the beginning of the file; • SEEK_CUR: from the current position; • SEEK_END: from the end of the file.
C/C++ streams • In C/C++, a file (and other devices like the keyboard) is a stream of data. • There are two sets of I/O operations. • C streams in <stdio.h> • C++ stream classes in <iostream> and <fstream> (iostream.h and fstream.h in pre-standard compilers) • Comparison between UNIX/C operations and C/C++ streams • both support a complete set of file operations • UNIX/C • Available mainly on UNIX-like systems (Microsoft Visual C++ also provides them) • Fast • Low level • C/C++ Streams • Standard C/C++ features, available on almost all operating systems • Provide structured I/O
C Streams • Three standard streams: stdin, stdout, and stderr. • Opening a file FILE *fopen(const char *filename, const char *mode) // returns NULL on failure • Closing a file fclose(FILE *fp) • Reading a file fread(void *buf, size_t size, size_t num, FILE *fp) // read num items of size bytes into buf from fp; returns the number of items read fgetc(FILE *fp) // return the next character from fp, or EOF fgets(char *buf, int size, FILE *fp) // read a line, or at most size-1 characters, into buf from fp fscanf(FILE *fp, const char *format, …) // read and format data from fp
C Streams (Cont.) • Writing a file fwrite(const void *buf, size_t size, size_t num, FILE *fp) // write num items of size bytes from buf to fp fputc(int ch, FILE *fp) // write the character ch to fp fputs(const char *buf, FILE *fp) // write the string in buf to fp fprintf(FILE *fp, const char *format, …) // write formatted data to fp • Seeking in a file fseek(FILE *fp, long offset, int origin) // origin is SEEK_SET, SEEK_CUR or SEEK_END
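C++ handles file I/O by creating objects of the stream classes. • Standard stream objects: cin, cout, cerr, clog • Stream classes: in <iostream> (iostream.h in older compilers): ios, istream, ostream, iostream; in <fstream> (fstream.h in older compilers): ifstream, ofstream, fstream • Class hierarchy: istream and ostream derive from ios; iostream derives from both istream and ostream; ifstream derives from istream, ofstream from ostream, and fstream from iostream.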
Opening a file: constructor, or member function open • Closing a file: destructor, or member function close • Reading a file: overloaded extraction operator >>; many others: read, get, getline • Writing a file: overloaded insertion operator <<; many others: write, put • Seeking in a file: seekg sets the read (get) pointer; seekp sets the write (put) pointer
The LIST Program • A simple file processing program: LIST • Display a prompt for the name of the input file. • Read the user’s response from the keyboard into a variable called filename. • Open the file for input. • While there are still characters to be read from the input file, • read a character from the file and, • write the character to the terminal screen. • Close the input file.
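/* read characters from a file and write them to the terminal screen */
#include <stdio.h>
#include <string.h>   /* strcspn */
#include <fcntl.h>    /* open */
#include <unistd.h>   /* read, close */

int main(void)
{
    char c;
    int fd;                                                /* file descriptor */
    char filename[20];

    printf("Enter the name of the file: ");                /* step 1 */
    fflush(stdout);
    if (fgets(filename, sizeof filename, stdin) == NULL)   /* step 2 (gets is unsafe) */
        return 1;
    filename[strcspn(filename, "\n")] = '\0';              /* strip the newline */
    fd = open(filename, O_RDONLY);                         /* step 3 */
    if (fd < 0) { perror(filename); return 1; }
    while (read(fd, &c, 1) > 0)    /* step 4a: read returns 0 at EOF, -1 on error */
        putchar(c);                /* step 4b: write(STDOUT_FILENO, &c, 1) also works;
                                      write(stdout, ...) does not, since stdout is a FILE * */
    close(fd);                                             /* step 5 */
    return 0;
}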
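// listc.cpp
// program using C streams to read characters from a file
// and write them to the terminal screen
#include <stdio.h>
#include <string.h>   // strcspn

int main(void)
{
    char ch;
    FILE *file;                                           // file pointer (logical file)
    char filename[20];

    printf("Enter the name of the file: ");               // Step 1
    fflush(stdout);
    if (fgets(filename, sizeof filename, stdin) == NULL)  // Step 2
        return 1;
    filename[strcspn(filename, "\n")] = '\0';             // strip the newline
    file = fopen(filename, "r");                          // Step 3
    if (file == NULL) { perror(filename); return 1; }
    while (fread(&ch, 1, 1, file) != 0)                   // Step 4a
        fwrite(&ch, 1, 1, stdout);                        // Step 4b
    fclose(file);                                         // Step 5
    return 0;
}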
// listcpp.cpp
// list contents of file using C++ stream classes
#include <fstream>
#include <iostream>
#include <string>
using namespace std;

int main()
{
    char ch;
    fstream file;                          // declare fstream unattached
    string filename;

    cout << "Enter the name of the file: " // Step 1
         << flush;                         // force output
    cin >> filename;                       // Step 2
    file.open(filename, ios::in);          // Step 3 (std::string argument: C++11)
    if (!file) { cerr << "cannot open " << filename << endl; return 1; }
    file.unsetf(ios::skipws);              // include white space in read
    while (true) {
        file >> ch;                        // Step 4a
        if (file.fail()) break;
        cout << ch;                        // Step 4b
    }
    file.close();                          // Step 5
    return 0;
}
Detecting End-of-File • In UNIX/C • read returns 0 • Using C streams • fread returns 0 (fewer items than requested) • feof returns true (nonzero) • Using C++ stream classes • fail returns true • eof returns true
Special Characters in Files I • Sometimes, the operating system attempts to make “regular” users’ lives easier by automatically adding or deleting characters for them. • These modifications, however, make the lives of programmers building sophisticated file structures (YOU) more complicated!
Special Characters in Files II: Examples • Control-Z is added at the end of text files (MS-DOS) to signal end-of-file. • <Carriage-Return> + <Line-Feed> are added to the end of each line (again, MS-DOS). • <Carriage-Return> is removed and replaced by a character count on each line of text (VMS).
The Unix Directory Structure I • In any computer system, there are many files (hundreds or thousands). These files need to be organized using some method. In Unix, this is called the File System. • The Unix File System is a tree-structured organization of directories, with the root of the tree represented by the character “/”. • Each directory can contain regular files or other directories. • The file name stored in a Unix directory corresponds to its physical name.
The Unix Directory Structure II • Any file can be uniquely identified by its absolute pathname, e.g., /usr6/mydir/addr (see the next slide). • The directory you are in is called your current directory. • You can also refer to a file by its path relative to the current directory. • “.” stands for the current directory and “..” stands for the parent directory.
Physical Devices and Logical Files • Unix has a very general view of what a file is: a sequence of bytes, with no concern for where the bytes are stored or where they come from. • Magnetic disks or tapes can be thought of as files, and so can the keyboard and the console. • No matter what the physical form of a Unix file (real file or device), it is represented in the same way in Unix: by an integer, the file descriptor.
Stdout, Stdin, Stderr • Stdout --> Console • fwrite(&ch, 1, 1, stdout); • Stdin --> Keyboard • fread(&ch, 1, 1, stdin); • Stderr --> Standard Error (again, Console) • [When the compiler detects an error, the error message is written to this file]
I/O Redirection and Pipes • < filename [redirect stdin to “filename”] • > filename [redirect stdout to “filename”] • E.g., a.out < my-input > my-output • program1 | program2 [take any stdout output from program1 and use it in place of any stdin input to program2] • E.g., list | sort
Unix System Commands • cat filenames --> Print the contents of the named text files. • tail filename --> Print the last 10 lines of the text file. • cp file1 file2 --> Copy file1 to file2. • mv file1 file2 --> Move (rename) file1 to file2. • rm filenames --> Remove (delete) the named files. • chmod mode filename --> Change the protection mode on the named file. • ls --> List the contents of the directory. • mkdir name --> Create a directory with the given name. • rmdir name --> Remove the named directory.