
Characters and Strings


arav

Presentation Transcript


  1. Characters and Strings
     • Character and string definitions, algorithms, library functions

  2. Character and String Processing
     • A common programming task involves manipulating text, usually referred to as string (or text) processing.
     • Solutions typically require the ability to:
        • perform input and output of characters and strings
        • query what a single character is, or is not
        • determine whether a character, a substring, or any of a set of characters is included in a string
        • determine the attributes of a character (e.g. upper versus lower case) or of a string (e.g. length)
        • convert between character strings and the machine representations of different data types
        • break large strings into smaller substrings (tokens) separated by delimiters
        • join substrings into larger strings (catenation)

  3. Characters and Strings in C
     • The concept of a string refers to a sequence of items.
     • The sequence, or string, may contain zero or more elements, followed by a delimiter that marks the end (termination) of the string.
     • In computer science terms, a string of characters usually refers to a vector, or list, of char values.
        • ASCII is the encoding most commonly used.
        • Unicode is another.
     • In C, the special delimiter character '\0' (called the null character) is recognized by the compiler and has the integer value 0.
     • Strings of bits (or of other encoded symbols) make it possible to abstract the idea to more general strings.

  4. Fundamentals
     • Defining a string container
        • Example:
              #define STRLEN 256
              char strName [ STRLEN ] ;
        • Example (empty brackets are valid only for a function parameter, where they mean char *, or when an initializer supplies the size):
              void f( char strName [ ] ) ;
              char * strPtr ;
     • Initialization
        • Example:
              char strName1 [ ] = "My name is Bob!" ;
              const char * strStatic = "String that cannot be changed!" ;
              char strName2 [ ] = { 'H', 'e', 'l', 'l', 'o', '\0' } ;
        • Example:
              char strName [ 50 ] ;
              int k ;
              for( k=0; k<49; k++ )
                  strName[k] = '#' ;    // Fill with # symbols
              strName[49] = '\0' ;
     • Consider a variation of the previous example, using pointers:
              char strName [ 50 ] , * strPtr ;
              int k ;
              for( k=0, strPtr = strName; k<49; k++, strPtr++ )
                  *strPtr = '#' ;
              *strPtr = '\0' ;
     [Figure: the string "Hello" stored as the characters H e l l o followed by \0. The characters form the sequence (value of the string), their count is the string length, and the final \0 is the delimiter (null character, terminator).]

  5. Character Handling Library
     • The C language standard supports the char data type and the delimiter character code '\0'.
     • We do not need to know the details of how character data is represented in bit form.
     • In programming and algorithm design it is useful to know, and use, a wide variety of functions that query or transform both individual characters and strings of characters.
     • We will discuss functions from four standard library headers:
        • #include <ctype.h>
        • #include <stdlib.h> and #include <stdio.h>
        • #include <string.h>
     • We start with the character handling library, <ctype.h>.

  6. Character Handling Library <ctype.h>
     • Begin with the character query functions.
     • General prototype form:
              int isValueRange ( int c ) ;   // Returns nonzero (true) if c matches, 0 (false) otherwise
        • ValueRange refers to a single value or a range of values (e.g. isdigit, isalpha, isupper).
     • Note that the input argument c has the data type int.
        • Intuition would suggest c should be of type char.
        • The argument must be either EOF or a value representable as unsigned char. An int can hold all of these, including EOF (the negative value returned by input functions such as getchar()). A char argument is converted to int automatically, but a plain char holding a non-ASCII code may be negative, so cast it to unsigned char first.

  7. Character Handling Library
     • Additional query functions provide information about the nature of the character data.
     • Transformative functions (tolower(), toupper()) return a converted copy of the character; the argument itself is not modified.

  8. Character Handling Library
     • Still more query functions classify non-alphanumeric character data (e.g. graphical, control and punctuation characters).

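  9. Example: Counting characters
     • Problem: Determine the frequency of occurrence of each alphabetic character (ignoring case) in a text file.
     • Solution (the text is read from standard input, so a file can be supplied by redirection):
              #include <ctype.h>
              #include <stdio.h>
              int main ( )
              {
                  int N=0, K, C[26] ;
                  double F[26] ;
                  int Ch ;    // int, not char, so that EOF can be detected reliably
                  for( K=0; K<26; K++ ) { C[K]=0; F[K]=0.0; }
                  for( Ch=getchar(); Ch != EOF; N++, Ch=getchar() )   // N counts every character read
                  {
                      if( isalpha( Ch ) )
                      {
                          K = toupper( Ch ) - 'A' ;   // assumes A..Z have contiguous codes, as in ASCII
                          C[K]++ ;
                      }
                  }   // the reading loop must close before the results are printed
                  for( K=0; K<26; K++ )
                  {
                      if( N > 0 ) F[K] = C[K] * 1.0 / N ;   // avoid dividing by zero on empty input
                      printf( "Frequency of letter %c: %f\n", (char) (K+'A'), F[K] ) ;
                  }
                  return 0 ;
              }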

  10. String Conversion Functions: <stdlib.h>
     • These functions convert a string (or its initial portion) to (1) an integer or (2) a floating point type.
     • General prototype form:
              resultType strtoOutputType ( const char * nPtr, char ** endPtr [, int base ] ) ;
        • nPtr points at the input string (protected as const).
        • resultType refers to one of double, long int, or unsigned long int.
        • OutputType refers to one of d, l, or ul (giving strtod, strtol and strtoul).
        • base (integer conversions only) gives the base of the input string: 0, or 2..36.
        • *endPtr is set to point at the position within the input string where the valid numeric representation ends (endPtr may be NULL if this position is not needed).
     [Figure: nPtr points at the start of the input string "-123.895$bC"; endPtr points at the '$', the first character after the converted number.]

  11. String Conversion Functions
     • Example usage:
              double D ;
              const char * S = " -123.895 $bC" ;
              char * EP ;
              D = strtod( S, &EP ) ;
              if( EP != S ) printf( "Value converted is %f\n", D ) ;
              else printf( "No value could be converted\n" ) ;
     • Note that one can also determine the number of characters consumed in producing the double value returned (including any leading white space that was skipped; 9 for this S), namely:
              int NumChars ;
              NumChars = (int) ( EP - S ) ;   // pointer difference counts chars; sizeof(char) is always 1
     [Figure: nPtr (S) points at the start of the input string; endPtr (EP) points just past "-123.895".]

  12. String Conversion Functions
              long int LI ;
              const char * S = " -1234.$bC" ;
              char * EP ;
              LI = strtol( S, &EP, 0 ) ;   // base 0: the prefix selects base 16 (0x or 0X), 8 (leading 0) or 10
              if( EP != S ) printf( "Value converted is %ld\n", LI ) ;
              else printf( "No value could be converted\n" ) ;
     [Figure: nPtr points at the start of the input string; endPtr points at the '.', where the integer "-1234" ends.]

  13. String Conversion Functions
              long int LI ;
              const char * S = " -Ab2$" ;
              char * EP ;
              LI = strtol( S, &EP, 13 ) ;   // base = 13
              if( EP != S ) printf( "Value converted is %ld\n", LI ) ;
              else printf( "No value could be converted\n" ) ;
              // Value output is the negative of:
              // A*13*13 + b*13 + 2 = 10*169 + 11*13 + 2 = 1690+143+2 = 1835 (base 10)
     • The base argument (for integer conversions only!) defines the base of the input string.
     • For base = 0, the input string digits may be in base 8, 10 or 16, as indicated by the prefix.
     • base = 1 is not allowed.
     • For 2 <= base <= 36, the characters interpretable as digits are those whose values lie from 0 to (base-1), where the letters a to z (or A to Z) stand for the values 10 to 35.

  14. String Conversion Functions
     • The C standard utilities library <stdlib.h> also includes two additional conversion functions for long long int, one signed and one unsigned: strtoll() and strtoull() (added in C99).

  15. Useful <stdio.h> Functions
     • The C standard input/output library contains useful functions for:
        • I/O of characters and strings
        • Conversion between character and internal data representations

  16. Useful <stdio.h> Functions
              #include <stdio.h>
              int main ()
              {
                  int C ;   // must be int, not char, so that EOF can be told apart from valid characters
                  while( (C = getchar() ) != EOF && C != '\n' )
                      putchar( C ) ;
                  return 0 ;
              }
     • CAUTION: When stdin is the keyboard, remember that pressing the Enter key to signal input generates a character ('\n'), and this must be accounted for.
              #include <stdio.h>
              #define MAX 256
              int main ()
              {
                  char S [ MAX ], * sPtr ;
                  while( (sPtr = fgets( S, MAX, stdin )) != NULL )
                      fputs( S, stdout ) ;   // fgets() keeps the '\n'; puts( S ) would add a second one
                  return 0 ;
              }

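  17. Useful <stdio.h> Functions
              #include <stdio.h>
              int main ()
              {
                  int A , Ch ;
                  float X ;
                  char S[100], M[200] ;                   // M is large enough for the sprintf() output below
                  const char * InFormat  = "%d%f%99s" ;   // width 99 keeps the %s field from overflowing S
                  const char * OutFormat = "%d %f %s\n" ; // spaces keep the printed fields separate
                  if( scanf( InFormat, &A, &X, S ) == 3 ) // read from stdin
                      printf( OutFormat, A, X, S ) ;      // write to stdout
                  while( (Ch = getchar()) != EOF && Ch != '\n' )
                      ;                                   // discard the rest of the line left by scanf()
                  if( fgets( M, sizeof M, stdin ) != NULL
                      && sscanf( M, InFormat, &A, &X, S ) == 3 )  // read from the string M
                  {
                      sprintf( M, OutFormat, A, X, S ) ;  // write into the string M
                      fputs( M, stdout ) ;
                  }
                  return 0 ;
              }
     • The functions sprintf() and sscanf() convert between character (string) data and the machine representations of data of different types.
     • All of this processing is done in RAM: no I/O is involved!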

  18. String Manipulation Functions
     • Two functions are provided to copy one string into another: strcpy() and strncpy().

  19. String Manipulation Functions
     • Joining two strings together is called string catenation (also called concatenation); the functions provided are strcat() and strncat().
     • For instance, one might combine various words and phrases to form sentences and paragraphs.

  20. String Comparison Functions
     • Comparison of two strings is based on the notion of lexical ordering.
     • All characters are encoded (e.g. ASCII), and the numeric values of the character codes define the possible orderings.
     • String comparisons (strcmp(), strncmp()) proceed character by character: the first pair of differing characters decides the order, and if one string is a prefix of the other, the shorter string comes first (its '\0' compares lower than any other character).

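  21. Strings - Search Functions
     • C provides functions for searching for characters and substrings within a string (e.g. strchr(), strrchr(), strstr(), strpbrk(), strspn(), strcspn(), strtok()).
     • These remove the need to write explicit search loops, a major advantage in text processing.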

  22. Strings - Search Functions

  23. Strings - Search Functions
     • Consider a string of text S1 that contains various words (substrings) separated by specially designated delimiter characters, which are listed in a string S2. The objective is to extract the words from the text, which can be done by calling the function strtok() repeatedly.
     • Each substring of S1 delimited by characters from S2 is called a token; thus strtok() is called the string tokenizer function.
     • strtok() modifies S1 by overwriting the delimiter that ends each token with '\0', so S1 must be a modifiable array, not a string literal.

  24. Strings - Search Functions
              #include <stdio.h>
              #include <string.h>
              int main ()
              {
                  int N = 0 ;
                  char S[] = "This is a sentence with tokens separated by blanks." ;
                  char * tokenPtr ;
                  printf( "The following tokens were found in S.\n" ) ;
                  tokenPtr = strtok( S, " " ) ;      // First call uses S; ' ' is the only delimiter
                  while( tokenPtr != NULL )
                  {
                      N++ ;
                      printf( "%s\n", tokenPtr ) ;
                      tokenPtr = strtok( NULL, " " ) ; // Use NULL in successive calls
                  }
                  printf( "Number of tokens found = %d\n", N ) ;
                  return 0 ;
              }

  25. Strings - Search Functions
              #include <stdio.h>
              #include <string.h>
              int main ()
              {
                  int N = 0 ;
                  char S[] = "This is a sentence with tokens separated by various characters." ;
                  char * tokenPtr ;
                  const char * DelimList = " .,;:$" ;
                  printf( "The following tokens were found in S.\n" ) ;
                  tokenPtr = strtok( S, DelimList ) ;      // First call uses S; several delimiters
                  while( tokenPtr != NULL )
                  {
                      N++ ;
                      printf( "%s\n", tokenPtr ) ;
                      tokenPtr = strtok( NULL, DelimList ) ; // Use NULL in successive calls
                  }
                  printf( "Number of tokens found = %d\n", N ) ;
                  return 0 ;
              }

  26. Memory Functions in <string.h>
     • C also provides functions for dealing with blocks of data in RAM.
     • The blocks may hold characters or data of other types, so the functions take, and typically return, void * pointer values.
     • A void * pointer value can be assigned to any other object pointer type, and vice versa.
     • However, a void * pointer cannot be dereferenced and carries no element size, so the size of the block (in bytes) must be specified as an argument.
     • None of these functions checks for terminating null characters (delimiters); they process exactly the number of bytes given.

  27. Memory Functions in <string.h>

  28. Other Functions in <string.h>

  29. Secure C programming
     • C11 standard with Annex K (optional bounds-checking interfaces)
     • Addresses issues related to the robustness of array-based manipulation of character data (and other data containers)
        • Stack overflow detection
        • Array overflow detection
     • Read more:
        • CERT guideline INT05-C
        • www.securecoding.cert.org
        • Additional online Appendices E-H for the textbook
        • www.pearsonhighered.com/deitel/

  30. Summary
     • Concepts of characters and strings, query functions, transformation functions, search functions, and generalization to abstract strings (memory functions).

  31. Topic Summary
     • Characters and strings in the C language
        • Multiple library sources
        • Query functions
        • Transformative functions
        • Conversion functions
        • Memory functions
     • Reading: Chapter 8
        • Review pointers as well, especially the const qualifier and the use of ** for modifying pointer values returned through function arguments.
     • Reading: Chapter 9, Formatted Input and Output
        • This chapter is straightforward and is assigned for self-directed independent study and learning; it will be tested! Practice, practice, practice!
