Characters and Strings: character and string definitions, algorithms, library functions
Character and String Processing
• A common programming task is the manipulation of text, usually referred to as string (or text) processing.
• Solutions typically require the ability to:
  • perform input and output of characters and strings
  • query what a single character is, or is not
  • determine if a character, a substring, or any of a set of characters is included, or not, in a string
  • determine the attributes of a character (e.g. upper versus lower case) or string (e.g. length)
  • convert between character string and machine representations of different data types
  • break large strings into smaller substrings (tokens) separated by delimiters
  • join substrings into larger strings (catenation)
Characters and Strings in C
• The concept of a string refers to a sequence of items.
• The sequence, or string, contains zero or more elements followed by a delimiter that denotes the end (termination) of the string.
• A string of characters, in computer science terms, usually refers to a vector, or list, of char values.
  • ASCII is a commonly used character encoding
  • Unicode is another
• In the C language, the special delimiter character '\0' (called the null character) always has the integer value 0, and the compiler appends it automatically to every string literal.
• Strings of bits (or other encoded symbols) provide abstraction possibilities for more general strings.
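To make the role of the delimiter concrete, here is a minimal sketch of how a string's length can be found by scanning for '\0'. The name my_strlen is hypothetical and chosen for illustration; the library function strlen() in <string.h> does the same job.

```c
#include <stddef.h>

/* Hypothetical helper: count the characters that come before the
   terminating '\0'. The delimiter itself is not counted. */
size_t my_strlen(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')   /* stop at the delimiter */
        n++;
    return n;
}
```

Note that the literal "Hello" has length 5 but occupies 6 bytes of storage, because the delimiter needs a byte of its own.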
Fundamentals
• Defining a string container
  • Example:
      #define STRLEN 256
      char strName [ STRLEN ] ;
  • Example:
      char strName [ ] ;   // size omitted: legal only with an initializer or as a function parameter
      char * strPtr ;      // a pointer only; it must be made to point at storage before use
• Initialization
  • Example:
      char strName1 [ ] = "My name is Bob!" ;
      const char * strStatic = "String that cannot be changed!" ;
      char strName2 [ ] = { 'H', 'e', 'l', 'l', 'o', '\0' } ;
  • Example:
      char strName [ 50 ] ;
      int k ;
      for( k=0; k<49; k++ ) strName[k] = '#' ;   // Fill with # symbols
      strName[49] = '\0' ;
• Consider a variation of the second example, using pointers:
      char strName [ 50 ] , * strPtr ;
      int k ;
      for( k=0, strPtr = strName; k<49; k++, strPtr++ ) *strPtr = '#' ;
      *strPtr = '\0' ;   // strPtr now points at strName[49]
[Figure: the string "Hello" stored as the sequence of characters H, e, l, l, o (the value of the string, which fixes the string length), followed by the delimiter '\0' (null character, terminator).]
Character Handling Library
• The C language standard supports the notion of the char data type, and the delimiter character code '\0'.
• We do not need to know the details of how character data is represented in bit form
• In programming and algorithm design it is useful to know and use a wide variety of functions that query or manipulate (transform) both individual characters and strings of characters
• We will discuss functions from four standard library headers
  • #include <ctype.h>
  • #include <stdlib.h> and #include <stdio.h>
  • #include <string.h>
• We start with the character function library, <ctype.h>
Character Handling Library <ctype.h>
• Begin with character query functions
• General prototype form:
    int isValueRange ( int c ) ;   // Returns nonzero (true) if a match, or 0
• ValueRange refers to a single value or a range of values
• Note that the input argument c has the data type int
  • Intuition would suggest c should be char type
  • The functions must also accept EOF, which is not a character value, so c is an int whose value must be either EOF or representable as an unsigned char. A char argument is promoted to int automatically; where char is signed and the data may be non-ASCII, cast it to unsigned char first.
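As an illustration of the query functions, the following sketch counts the decimal digits in a string with isdigit(). The name count_digits is hypothetical. The cast to unsigned char is the usual precaution for character codes that would be negative as a plain char.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: count the decimal digits ('0'..'9') in a string. */
size_t count_digits(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        /* Cast first: a negative value other than EOF is not a valid argument. */
        if (isdigit((unsigned char)*s))
            n++;
    }
    return n;
}
```

Because the query functions promise only zero or nonzero, the result is tested as a truth value and never compared against 1.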
Character Handling Library • Additional query functions provide information about the nature of the character data • Transformative functions modify the character data
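A transformative function can be applied to each character of a string in turn. The sketch below assumes the function meant here is toupper() and uses a hypothetical helper name, to_upper_in_place.

```c
#include <ctype.h>

/* Hypothetical helper: convert a writable string to upper case in place.
   Characters that are not lower-case letters are left unchanged. */
char *to_upper_in_place(char *s)
{
    for (char *p = s; *p != '\0'; p++)
        *p = (char)toupper((unsigned char)*p);
    return s;
}
```

The argument must be a modifiable array (for example char buf[] = "Hello";), not a string literal.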
Character Handling Library • And still more query functions for non-alphanumeric character data (eg. graphical, control signals, punctuation)
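The sketch below sorts the characters of a string into classes using ispunct(), isspace() and iscntrl(). The struct and function names are hypothetical, and the counts assume the default "C" locale.

```c
#include <ctype.h>

/* Hypothetical tally of character classes found in a string. */
struct char_classes {
    int punct;   /* punctuation, e.g. ',' or '!' */
    int space;   /* white space, e.g. ' ' or '\t' */
    int cntrl;   /* other control characters, e.g. '\a' */
    int other;   /* everything else (letters, digits, ...) */
};

struct char_classes classify(const char *s)
{
    struct char_classes c = {0, 0, 0, 0};
    for (; *s != '\0'; s++) {
        unsigned char ch = (unsigned char)*s;
        if (ispunct(ch))      c.punct++;
        else if (isspace(ch)) c.space++;   /* tested before iscntrl: '\t' is both */
        else if (iscntrl(ch)) c.cntrl++;
        else                  c.other++;
    }
    return c;
}
```

The order of the tests matters because the classes overlap: a tab is both white space and a control character.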
Example: Counting characters
• Problem: Determine the frequencies of occurrence for each alphabetic character (ignoring case) in a text file.
• Solution:
    #include <ctype.h>
    #include <stdio.h>
    int main ( ) {
      int N=0, K, C[26] ;
      double F[26] ;
      int Ch ;   // int, not char, so that EOF can be told apart from character data
      for( K=0; K<26; K++ ) { C[K]=0; F[K]=0.0; }
      for( Ch=getchar(); Ch != EOF; N++, Ch=getchar() ) {   // N counts every character read
        if( isalpha( Ch ) ) {
          K = toupper( Ch ) - 'A' ;   // assumes contiguous letter codes, as in ASCII
          C[K]++ ;
        }
      }
      for( K=0; K<26; K++ ) {
        if( N > 0 ) F[K] = C[K] * 1.0 / N ;   // guard against empty input
        printf( "Frequency of letter %c: %lf\n", (char) (K+'A'), F[K] ) ;
      }
      return 0 ;
    }
String Conversion Functions: <stdlib.h>
• The purpose of these functions is to convert a string (or an initial portion of it) to (1) an integer or (2) a floating point type
• General prototype form:
    resultType strtoOutputType ( const char * nPtr, char ** endPtr [, int base ] ) ;
• nPtr points at the input string (protected as constant)
• resultType refers to one of double, long int, or unsigned long int
• OutputType refers to one of d, l, or ul, giving strtod(), strtol() and strtoul()
• base refers to the base of the input string (0, or 2..36); it is used for integer conversions only
• endPtr is the address of a char * variable; on return, that variable points at the position within the input string where the valid numeric representation terminates
[Figure: the input string "-123.895 $bC" followed by '\0', with nPtr pointing at its start and endPtr pointing just past "-123.895".]
String Conversion Functions
• Example usage:
    double D ;
    const char * S = " -123.895 $bC" ;
    char * EP ;
    D = strtod( S, &EP ) ;
    if( EP != S ) printf( "Value converted is %lf\n", D ) ;
    else printf( "No value could be converted\n" ) ;
• Note that one can also determine the size of the initial substring used to determine the double value returned, namely:
    int NumChars ;
    NumChars = (int) ( EP - S ) ;   // pointer difference is already a count of chars; sizeof(char) is always 1
[Figure: the input string " -123.895 $bC" followed by '\0', with nPtr pointing at its start and endPtr pointing just past "-123.895".]
String Conversion Functions
    long int LI ;
    const char * S = " -1234.$bC" ;
    char * EP ;
    LI = strtol( S, &EP, 0 ) ;   // 0 base => base = 8, 10, 16
    if( EP != S ) printf( "Value converted is %ld\n", LI ) ;
    else printf( "No value could be converted\n" ) ;
[Figure: the input string " -1234.$bC" followed by '\0', with nPtr pointing at its start and endPtr pointing at the '.' that ends the integer.]
String Conversion Functions
    long int LI ;
    const char * S = " -Ab2$" ;
    char * EP ;
    LI = strtol( S, &EP, 13 ) ;   // base = 13
    if( EP != S ) printf( "Value converted is %ld\n", LI ) ;
    else printf( "No value could be converted\n" ) ;
    // The value printed is the negative of:
    //   A*13*13 + b*13 + 2 = 10*169 + 11*13 + 2 = 1690+143+2 = 1835 (base-10)
• The base argument value (for integer conversions only!) defines the base of the input string.
• For base=0, the base is inferred from the input itself: a leading 0x or 0X means base 16, a leading 0 means base 8, and anything else means base 10.
• The case base=1 is not used.
• For 2 <= base <= 36 the digit values lie in the range from 0 to (base-1): the characters 0-9 stand for the values 0..9 and the letters a-z (in either case) stand for the values 10..35.
String Conversion Functions
• The C standard utilities library <stdlib.h> also includes two additional conversion functions for long long int, both signed (strtoll) and unsigned (strtoull).
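The following sketch wraps the signed long long conversion, strtoll(), with the two checks a caller usually wants: whether any digits were converted at all, and whether the value was out of range. The helper name parse_long_long and its return convention are assumptions made for illustration.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: convert the leading part of s in the given base.
   Returns 1 and stores the value in *out on success.
   Returns 0 if nothing could be converted or the value is out of range. */
int parse_long_long(const char *s, int base, long long *out)
{
    char *end;
    errno = 0;                          /* strtoll reports overflow via errno */
    long long v = strtoll(s, &end, base);
    if (end == s)                       /* no digits were consumed */
        return 0;
    if (errno == ERANGE)                /* value does not fit in long long */
        return 0;
    *out = v;
    return 1;
}
```

With base 0, the input "0x1F" is read as hexadecimal 31; with base 13, the input "-Ab2$" converts to -1835 and the conversion stops at the '$'.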
Useful <stdio.h> Functions • The C standard input/output library contains useful functions • I/O of characters and strings • Conversion to and from character and internal data representations
Useful <stdio.h> Functions
    #include <stdio.h>
    int main () {
      int C ;   // must be int, not char, so that EOF can be told apart from character data
      while( (C = getchar() ) != EOF && C != '\n' ) putchar( C ) ;
      return 0 ;
    }
• CAUTION: When stdin is the keyboard, remember that pressing the Enter key to signal input generates a newline character ('\n') and this must be accounted for.
    #include <stdio.h>
    #define MAX 256
    int main () {
      char S [ MAX ], * sPtr ;
      while( (sPtr = fgets( S, MAX, stdin )) != NULL )
        fputs( S, stdout ) ;   // fgets() keeps the newline; puts() would print a second one
      return 0 ;
    }
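Useful <stdio.h> Functions
    #include <stdio.h>
    int main () {
      int A, Ch ; float X ;
      char S[100], M[200] ;
      char InFormat[] = "%d%f%99s" ;       // %99s cannot overflow S
      char OutFormat[] = "%d %f %s\n" ;    // blanks keep the output fields apart
      if( scanf( InFormat, &A, &X, S ) == 3 ) printf( OutFormat, A, X, S ) ;
      while( (Ch = getchar()) != '\n' && Ch != EOF ) ;   // discard the rest of the first input line
      if( fgets( M, sizeof M, stdin ) != NULL && sscanf( M, InFormat, &A, &X, S ) == 3 ) {
        snprintf( M, sizeof M, OutFormat, A, X, S ) ;    // bounded variant of sprintf()
        fputs( M, stdout ) ;
      }
      return 0 ;
    }
• The functions sprintf() and sscanf() convert between character (string) data and machine representations of data (according to different data types).
• These two functions work entirely on strings held in RAM; no I/O is involved!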
String Manipulation Functions • Two functions are provided to perform copying of one string into another string.
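Assuming the two copy functions meant here are strcpy() and strncpy(), the sketch below shows a bounded copy. strncpy() does not write a terminating '\0' when the source is too long, so the hypothetical helper copy_bounded adds one explicitly.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dst, which has room for dst_size
   chars. The result is always terminated and is truncated if src is
   too long to fit. */
char *copy_bounded(char *dst, size_t dst_size, const char *src)
{
    if (dst_size == 0)
        return dst;                     /* no room even for the delimiter */
    strncpy(dst, src, dst_size - 1);    /* copies at most dst_size-1 chars */
    dst[dst_size - 1] = '\0';           /* strncpy may omit the delimiter */
    return dst;
}
```

Plain strcpy() is appropriate only when the destination is known to be large enough for the source plus its delimiter.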
String Manipulation Functions • Joining together of two strings is called string catenation (also called concatenation). • For instance, one might combine various words and phrases to form sentences and paragraphs.
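A sketch of catenation using strncat(), which appends at most a given number of characters and then always writes the delimiter. The helper name append_bounded is hypothetical; it assumes dst already holds a terminated string shorter than dst_size.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: append src to the string already in dst without
   writing past dst_size chars in total. The result may be truncated. */
char *append_bounded(char *dst, size_t dst_size, const char *src)
{
    size_t used = strlen(dst);
    if (used + 1 < dst_size)            /* room for at least one more char */
        strncat(dst, src, dst_size - used - 1);
    return dst;
}
```

For example, starting from char s[12] = "My", appending " name" and then " is Bob!" leaves s holding "My name is ", because only four more characters fit before the delimiter.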
String Comparison Functions
• Comparison of two strings is based on the notion of lexical ordering.
• All characters are encoded (e.g. ASCII) and the numeric values of the characters define the possible orderings.
• String comparisons are based on both (a) character by character comparison, and (b) the relative length of each string: when one string is a prefix of the other, the shorter string orders first.
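A sketch of comparison with strcmp(), assuming that is one of the functions meant here. The standard promises only the sign of the result, so the hypothetical helper compare_sign reduces it to -1, 0 or +1.

```c
#include <string.h>

/* Hypothetical helper: -1 if a orders before b, 0 if equal, +1 if after. */
int compare_sign(const char *a, const char *b)
{
    int r = strcmp(a, b);       /* only the sign of r is meaningful */
    return (r > 0) - (r < 0);
}
```

Under ASCII, "Zebra" orders before "apple" because every upper-case letter has a smaller code than every lower-case letter, and "apple" orders before "apples" because it is a proper prefix.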
Strings - Search Functions • C provides functions for searching for various characters and substrings within a string • This is a huge advantage in text processing
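As a sketch of substring searching, the hypothetical helper below uses strstr() repeatedly to count non-overlapping occurrences of a word in a text. Related functions in <string.h> include strchr() and strrchr() for single characters and strcspn() for sets of characters.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count non-overlapping occurrences of word in text. */
size_t count_occurrences(const char *text, const char *word)
{
    size_t n = 0;
    size_t len = strlen(word);
    if (len == 0)
        return 0;                       /* an empty word matches everywhere */
    for (const char *p = strstr(text, word); p != NULL;
         p = strstr(p + len, word))     /* resume just past the last match */
        n++;
    return n;
}
```

Each search function returns a pointer into the string being searched (or NULL when nothing is found), which is why the loop can resume from the previous match.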
Strings - Search Functions
• Consider the problem of a string of text S1 that contains various words (substrings) separated by specially designated characters used as delimiters (and contained in a string S2). The objective is to extract the words from the text. This can be accomplished using the function strtok() repeatedly.
• Each identified substring in S1, delimited by a character in S2, is called a token. Thus, strtok() is called the string tokenizer function.
• strtok() overwrites the delimiter that ends each token in S1 with '\0', so S1 must be a modifiable array, not a string literal.
Strings - Search Functions
    #include <stdio.h>
    #include <string.h>
    int main () {
      int N = 0 ;
      char S[] = "This is a sentence with tokens separated by blanks." ;
      char * tokenPtr ;
      printf( "The following tokens were found in S.\n" ) ;
      tokenPtr = strtok( S, " " ) ;   // First time use S; ' ' is the only delimiter
      while( tokenPtr != NULL ) {
        N++ ;
        printf( "%s\n", tokenPtr ) ;
        tokenPtr = strtok( NULL, " " ) ;   // Use NULL in successive calls
      }
      printf( "Number of tokens found = %d\n", N ) ;
      return 0 ;
    }
Strings - Search Functions
    #include <stdio.h>
    #include <string.h>
    int main () {
      int N = 0 ;
      char S[] = "This is a sentence with tokens separated by various characters." ;
      char * tokenPtr ;
      const char * DelimList = " .,;:$" ;   // const: the delimiter list is a string literal
      printf( "The following tokens were found in S.\n" ) ;
      tokenPtr = strtok( S, DelimList ) ;   // First time use S; various delimiters
      while( tokenPtr != NULL ) {
        N++ ;
        printf( "%s\n", tokenPtr ) ;
        tokenPtr = strtok( NULL, DelimList ) ;   // Use NULL in successive calls
      }
      printf( "Number of tokens found = %d\n", N ) ;
      return 0 ;
    }
Memory Functions in <string.h>
• C also provides functions for dealing with blocks of data in RAM
• The blocks may be characters, or other data types, hence the functions typically return a void * pointer value.
• A void * pointer value can be assigned to any other pointer type, and vice versa.
• However, void * pointers cannot be dereferenced, thus the size of the block must be specified as an argument.
• None of the functions discussed perform checks for terminating null characters (delimiters).
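A sketch of a memory function applied to non-character data: the hypothetical helper rotate_left shifts an int array with memmove(), which (unlike memcpy()) is defined for overlapping source and destination blocks. The block size is given in bytes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: rotate the n elements of a one place to the left. */
void rotate_left(int *a, size_t n)
{
    if (n < 2)
        return;
    int first = a[0];
    /* Source and destination overlap, so memmove is required.
       The size argument is a byte count, not an element count. */
    memmove(a, a + 1, (n - 1) * sizeof a[0]);
    a[n - 1] = first;
}
```

No delimiter is involved anywhere: the function moves exactly the number of bytes it is told to, whatever they contain.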
Secure C programming
• C11 standard with Annex K
  • Addresses issues related to robustness of array based manipulation of character data (and other data containers)
  • Stack overflow detection
  • Array overflow detection
• Read more:
  • CERT guideline INT05-C
  • www.securecoding.cert.org
  • Additional online Appendices E-H for the textbook
  • www.pearsonhighered.com/deitel/
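Annex K is an optional part of C11 and many implementations do not provide it, so the sketch below does not call any Annex K function. It only illustrates the same idea (the destination size is passed in and a failure is reported instead of overflowing the array) using the standard snprintf(). The helper name copy_checked and its return convention are assumptions.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: copy src into dst (capacity dst_size chars).
   Returns 0 on success, -1 on a bad argument or if src did not fit.
   On truncation dst still holds a terminated prefix of src. */
int copy_checked(char *dst, size_t dst_size, const char *src)
{
    if (dst == NULL || src == NULL || dst_size == 0)
        return -1;
    /* snprintf never writes more than dst_size chars and returns the
       length the full result would have needed. */
    int needed = snprintf(dst, dst_size, "%s", src);
    if (needed < 0 || (size_t)needed >= dst_size)
        return -1;
    return 0;
}
```

The caller is expected to check the return value; ignoring it gives up most of the benefit of the bounds check.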
Summary Concepts of character and strings, query functions, transformation functions, search functions, generalization to abstract strings (memory functions).
Topic Summary
• Characters and Strings in the C language
• Multiple library sources
  • Query functions
  • Transformative functions
  • Conversion functions
  • Memory functions
• Reading – Chapter 8
  • Review Pointers as well, especially the const qualifier, and also the use of ** for modifying pointer values on return (through arguments) from functions.
• Reading – Chapter 9: Formatted Input and Output
  • This chapter is straightforward and is assigned for self-directed independent study and learning – it will be tested!
• Practice, practice, practice!