HKUST Summer Programming Course 2008 C API ~ Interfacing C programming
Overview • Introduction • Streams • Output Functions • Input Functions • File-Related Functions • Memory Management Functions • Other aspects of C Programming
C API Introduction
Motivation • The system offers many functionalities to programmers, for example: • Creating and destroying processes. • Reading/writing files. • Opening network connections. • …. • Historically, these functionalities were implemented as C functions. Some library authors make C++ wrappers, but many programmers still like to use the C versions. • C functions may be faster than C++ functions. • There is some overhead in the wrappers.
C functions to cover in this lecture • Formatted I/O • Output: printf, fprintf, sprintf, snprintf • Input: scanf, fscanf, sscanf • File I/O • fopen, feof, fgetc, fputc, fread, fwrite, fclose, fseek, rewind, fflush • Memory Allocation • malloc, realloc, calloc, free, memcpy, memset
References • ALWAYS look up the manual page before using a function for the first time. • Important: learn how to read a manual page. • Most of the material here is extracted from: • http://www.cplusplus.com/ref/ • If anything is unclear, you can always refer to the aforementioned website.
C API Streams
Streams in C • [Diagram: our C program (file I/O code and other code) works with streams Stream1, Stream2, …, StreamM; each stream is attached to a file (file1, file2, …, fileN) managed by the OS.] • The screen and keyboard are emulated as files by the OS.
Streams in C • There are three standard I/O streams. • stdin – standard input (i.e. keyboard) (C++: cin) • stdout – standard output (i.e. screen) (C++: cout) • stderr – standard error (i.e. screen) (C++: cerr) • You have been using stdout whenever you use cout! • You can use cerr << "ERROR" << endl; to print to the standard error stream. • I/O operations are performed in C using streams. • Files are also accessed through streams; these streams can be created or destroyed whenever necessary. • In the C library, a stream is identified by a pointer of type FILE*. Underneath, on POSIX systems, the operating system identifies each open file by a number, called the file descriptor. • E.g. printf outputs to stdout, while fprintf can output to any stream. (We will cover the fprintf function soon.)
Output Streams in C • Besides standard output (stdout), there is another way to output data to the screen: the standard error stream (stderr). • With two different output streams (both directed to the screen by default), we can manage the display better. • Normal output (input prompts, messages, information): stdout • Error output (error messages): stderr • We can even separate the two types of output by redirection (next slide).
Capture Output in OS • It is possible to capture ALL the output generated by a program using the redirection operators of the shell (e.g. C-shell or Bash on Linux, or cmd.exe on Windows). • Log files are very useful in debugging! • Example (Linux, C-shell): • ./a.out > output.txt • Here, instead of being printed to the screen, the normal output is saved to a file named output.txt. • Data printed to the error stream is not redirected (it is still displayed on the screen). • (./a.out > output.txt) >& error.txt • This redirects the normal output to output.txt and the error output to error.txt.
Redirect Input Stream • Similar to output streams, we can redirect the content of a file to the standard input stream. • Example:
// program
#include <iostream>
using namespace std;
int main( ) {
    int x, y;
    double z;
    cin >> x >> y >> z;
    return 0;
}
// input.txt
100 90
80.2
// command line
./a.out < input.txt
C API Output Functions
Formatted I/O • The printf function provides a convenient way to output formatted data to stdout. int printf(const char * format [ , argument , ...]); • Print formatted data: prints the arguments formatted as the format argument specifies. • The format is a C-string that specifies what the output looks like. • The arguments are the data to substitute into the format template. • Recall that printf accepts any number of parameters, which is implemented with an ellipsis (variable argument list) in C. • It returns the number of characters written, or a negative number when an error occurs.
Formatted I/O • Example • printf("Hello\n"); • printf("Hello %d\n", 100); • printf("Hello %f %s\n", 2.5, "abc"); • …. • The % marks a place in the output where an argument is substituted. • %d -> integer • %f -> floating point number • %s -> C-string • %% -> the % sign • …. and many others (check the documentation).
Formatted I/O • Recall that the ellipsis list cannot check the types of the parameters. • A common pitfall in using printf is passing an argument whose type does not match its format specifier. • For example, • printf("Hello %f\n", 100); • This is undefined behavior. In practice it typically prints a garbage value, since printf tries to interpret the argument as a floating point number although an integer was passed.
Formatting strings • The same formatting can be written into a string using sprintf. int sprintf ( char * buffer, const char * format [ , argument , ...] ); • Print formatted data to a string: writes a sequence of arguments to the given buffer, formatted as the format argument specifies, instead of to stdout. • The first argument is a character array “buffer” into which the function writes the formatted string. • fprintf is a similar function, which can print to any stream, including file streams.
sprintf Example
#include <cstdio>
int main() {
    char* buffer = new char[30];   // buffer on the heap (C++ new)
    char buffer2[40];              // buffer on the stack
    sprintf(buffer, "Hello %s", "World");
    sprintf(buffer2, "Mario %s", "World");
    delete[] buffer;
    return 0;
}
The buffer – and the buffer overflow problem • The first argument of sprintf() is a buffer that stores the formatted output string. • The buffer should be allocated, either on the stack (local array) or on the heap (dynamic array). • What if it points to somewhere that cannot be written? • A runtime error “may” occur, because the write corrupts a memory location that may be used by another variable. • It is just like throwing rubbish on the street. If there is a policeman, you are caught; otherwise you are safe. • DON’T take the chance. • What if it points to somewhere allocated, but without sufficient space to store the formatted output? char buffer[5]; sprintf( buffer, "%s", "Hello World" ); • Heap -> may corrupt other objects. • Stack -> may corrupt the stack -> serious problem.
Buffer Overflow Attack • A buffer overflow attack is a typical method crackers use to break a program. • Usually, they feed the program an input string that is too long, so that it eventually corrupts the stack and makes the program do something “strange”. • Don’t try this if you are not an expert. • In the extreme case, crackers can execute arbitrary code on another machine. • Dangerous? Yes! You must be careful about every buffer and whether it can overflow. • The function snprintf helps to avoid buffer overflows.
snprintf int snprintf(char *s, size_t size, const char *template, ...) • The snprintf function is similar to sprintf, except that the size argument specifies the maximum number of characters to produce. The trailing null character is counted towards this limit, so you should allocate at least size characters for the string s. • As a reminder, size_t is an unsigned integer type.
C API Input Functions
scanf • We now turn from formatted output to formatted input. int scanf(const char * format [ , argument , ...]); • Read formatted data from stdin: reads data from the standard input (stdin) and stores it into the locations given by the argument(s). The location pointed to by each argument is filled with a value of the type requested in the format string. • There is NO pass-by-reference in C; passing a non-const pointer is the only way to let a function modify a caller's variable. • The number of type specifiers in the format string (not counting those marked with *) must equal the number of arguments passed.
scanf - pointers • Notice that, to use scanf, you must pass the address of the variable that stores the input. • Examples: • int x; scanf("%d", &x); • Input: 22 • x = 22 • float f; int y; scanf("%f=%d", &f, &y); • Input: 0.2=3 • f = 0.2, y = 3 • int x; char remain[1024]; scanf( "%d", &x ); scanf( "%1023s", remain ); • Input: 2.2 • x = 2, remain = .2 • (The width 1023 in %1023s keeps scanf from writing past the end of remain.)
fscanf, sscanf • fscanf – accepts an extra parameter that specifies which stream to read from. • sscanf – accepts input from a C-string. • You can use this to “parse” a string. • That is, to separate a string into components according to a format.
scanf returns … • Return Value. • The number of items successfully read. This count doesn't include any ignored fields marked with asterisks (*). • Technically, although it is seldom used, you can add an asterisk to a format specifier to read a value of that type and discard it. • int a, b; scanf("%d %*d %d", &a, &b); • Input: 1 2 3 • a = 1, b = 3 • returns 2 • If EOF is returned, the input ended or a read error occurred before the first assignment could be made. • Tips: • Always check the return value. • Output an understandable error message and ask the user to re-enter the input or quit the program. • For more robust parsing, use a more sophisticated parsing method instead of scanf.
C API File-Related Functions
File I/O: Overview • File I/O is a service provided by the operating system through a set of functions. • These functions must know which file the user is reading/writing. • That’s why all file-related functions (except fopen) require a parameter of type FILE*. This pointer is the handle that identifies the open file; it plays the role of a file descriptor (strictly speaking, the file descriptor is the operating system's own identifier for the open file, which the FILE structure wraps). • Other properties are also stored in the structure FILE (such as the access mode).
Various functions for File I/O • fopen – open a file, with various access modes. • feof – check whether the end of the file has been reached. • fgetc – read a character from the file. • fputc – write a character to the file. • fread – read a block of data from the file. • fwrite – write a block of data to the file. • fclose – close the file. • fseek – move the current read/write position. • rewind – move the current read/write position back to the beginning of the file. • fflush – flush the buffer and write to the file immediately.
Binary File / Text File • A text file is simply a human-readable file. • A binary file is not human-readable. • Comparison between text files and binary files: • Advantages of text files: • Human readable. • Compressible (since they inherently waste lots of space). • Advantages of binary files: • No precision problem when storing floating point numbers. • Usually smaller in size. • Hides some information (you can’t understand the file without knowing the file format). • Easier to read an arbitrary element in an array (e.g. if each integer is stored as 4 bytes, the 101st element starts at the 401st byte, i.e. at byte offset 400).
Typical File I/O Scenarios (1) • The first thing to consider is whether the file is a text file or a binary file. • The next thing to decide is whether you want to read, write or append to the file. • Now you can open the file using the fopen function. Remember to check whether the file was opened successfully (fopen returns NULL when it fails). • Then you need to decide whether to process the file by byte, by block, or by token. • For byte-level access, you may want to use fgetc or fputc (text file). • For blocks, you may want to use fread/fwrite (binary file). • For tokens, you can use fscanf (text file).
Typical File I/O Scenarios (2) • Whenever you read/write a file, the file position associated with the stream moves to the end of your last accessed location, and your next operation starts there. Sometimes you want to move the file position around in the file; then you use fseek/rewind. • You can also check whether you have reached the end of the file, using the function feof. • Last, but still very important, is to fclose the file. • Otherwise, the content might not be saved into the file (or sometimes others cannot delete/open the file).
More about fgetc() • Recall that the default behavior of cin >> is to skip whitespace (space, newline). • fgetc (like getc, and getchar, which reads from standard input) does not skip whitespace. • Hence you can use fgetc to count how many spaces there are in an input.
More about feof() • The concept of “end-of-file” is a little strange; it can be pictured this way. • Imagine that the file is followed by one extra character, called the “eof” character. • feof() returns true only after an attempt to read this character. Before that attempt, feof() returns false, even if the last character of the original file has already been read. • For example • while (!feof(file)) {cout << fgetc(file) << endl;} • This is NOT correct! After the last character is read, feof() still returns false, so the loop runs once more and outputs the end-of-file value (EOF). • The corrected version: while (true) {int v = fgetc(file); if (feof(file)) break; cout << v << endl;}
More about fflush() • By default, C I/O is buffered (that’s true for C++ I/O as well). • Buffered I/O means that the data to be output is stored in a piece of memory before it is actually written out. This takes advantage of block I/O performance. • Writing to memory (RAM) is much more efficient than writing to disk. • E.g. the DMA technique can be used to write a block of data to disk efficiently. • It is up to the library when to actually write out the data and when to wait. To force the library to write out the data, use fflush(). • When the file is closed using fclose(), the data is flushed.
C API Memory Management Functions
Memory Management • Memory is NOT an infinite asset, and it is shared across all programs. • Recall that you should deallocate memory once it is no longer needed. • The operating system has a module called the memory manager that ensures each process accesses its own memory and does not corrupt the memory of others. • Memory management is a computationally expensive activity. If memory allocation/deallocation happens too often, it will lead to: • Slower system performance. • Memory fragmentation, where a big piece of contiguous memory is hard to find.
Memory Fragmentation Illustrated • Suppose there are 100 bytes of memory. Each cell represents 10 bytes; a cell marked with “A” is allocated. • When another 20 bytes of memory are requested, the request cannot be fulfilled, even though 30 bytes in total are not allocated: the free cells are not adjacent, so no contiguous 20-byte piece exists.
Memory Related Functions • malloc – allocate a contiguous piece of memory of the specified size. • realloc – resize an allocated piece of memory; the contents are retained. • calloc – same as malloc, except that the memory is initialized to zero. • free – release the memory back to the memory manager. • The same function is used both for dynamic variables and dynamic arrays (this is different from delete and delete[] in C++). • memcpy – byte-wise copy from one memory location to another. • memset – quickly set each byte of a piece of memory to a value.
Operator sizeof • Many memory management functions require the size of a type. • It is bad practice to hard-code the size (say, hard-coding 4 for an integer). • The sizeof operator returns the size of a variable in bytes. • int x; sizeof(x) is 4 bytes (on a Win32 machine) • double x; sizeof(x) is 8 bytes (on a Win32 machine) • You can also apply sizeof to a type: • sizeof(int) • Always use sizeof, even if you know the size of the data type on your machine. • This produces more portable code.
More about malloc / free • To allocate a dynamic object, you can use • int* pInt = (int*) malloc( sizeof(int) ); • Obj* pObj = (Obj*) malloc( sizeof(Obj) ); • To allocate a dynamic array of length ELEMENT, you can use • int* pIntArray = (int*) malloc( sizeof(int)* ELEMENT ); • Obj* pObjArray = (Obj*) malloc( sizeof(Obj)* ELEMENT ); • If there is not enough space to allocate, malloc returns NULL. • Always check for this in your programs. • To deallocate memory, you can use • free(pInt); free(pObj); • free(pIntArray); free(pObjArray);
More about realloc • Why can realloc be faster than allocate-then-copy? • Because it tries to resize the original allocated piece in place to the specified new size, if there is sufficient free space around the original allocation. • This may improve the efficiency of the solution for the last question in the Midterm. • What happens if I use realloc but the current address does NOT allow expansion? • It falls back to allocate-then-copy (and releases the old piece), so the returned pointer may differ from the original one.
More about memset / memcpy • You can initialize an array to 0 by: memset( pIntArray, 0, sizeof(int)*ELEMENT ); • You can copy an array to another by: int* pIntArray2 = (int*) malloc( sizeof(int) * ELEMENT); memcpy( pIntArray2, pIntArray, sizeof(int)*ELEMENT ); • Again, this is usually more efficient than the solution for the last question in the Midterm, which uses a for-loop to copy the elements.
C API Other aspects of C Programming
Other aspects of C Programming • There are some more unusual conventions when writing a C program. • In older C (C89), all variables must be declared at the beginning of a scope. • Use a C compiler (gcc) instead of g++. • In a C++ program, use extern "C" in the function prototype to link functions compiled with a C compiler (more about this in the next slide). • You can also use a C++ compiler to compile most C programs (C++ is almost a superset of C).
Other aspects of C Programming • Assume func_c.obj, which defines a function add, is compiled with a C compiler. • And assume you cannot access the source code of that object file. • Now, if we want to use the function add in a C++ program:
// A C++ program, using C function
#include <iostream>
using namespace std;
extern "C" {
    int add( int, int );
}
int main( ) {
    int x = 10, y = 12;
    cout << add(x,y) << endl;
    return 0;
}