280 likes | 291 Views
Text Manipulation. Chapter 7. Text Terminology. Text in MATLAB can be represented using: character vectors (in single quotes) string arrays, introduced in R2016b (in double quotes) Many functions that manipulate text can be called using either a character vector or a string
E N D
Text Manipulation Chapter 7 Attaway MATLAB 5E
Text Terminology • Text in MATLAB can be represented using: • character vectors (in single quotes) • string arrays, introduced in R2016b (in double quotes) • Many functions that manipulate text can be called using either a character vector or a string • Additionally, new functions have been created for the string type • Prior to R2016b, the word “string” was used for what are now called character vectors
Characters • Both character vectors and string scalars are composed of groups of characters • Charactersinclude letters of the alphabet, digits, punctuation marks, white space, and control characters • Control characters are characters that cannot be printed, but accomplish a task (such as a backspace or tab) • White space characters include the space, tab, newline, and carriage return • Leading blanks are blank spaces at the beginning • Trailing blanks are blank spaces at the end • Individual characters in single quotation marks are the type char (so, the class is char) • Function newline returns a newline character
Character Vectors • Character vectors: • consist of any number of characters • are contained in single quotation marks • are displayed using single quotation marks • are of the type, or class, char • Examples: ‘x’, ‘’, ‘abc’, ‘hi there’ • Since these are vectors of characters, many built-in functions and operators that we’ve seen already work with character vectors as well as numbers – e.g., length to get the length of a character vector, or the transpose operator • You can also index into a character vector to get individual characters or to get subsets (called a substring for a character vector or a string)
String Scalars • A string scalar is a single string, and is used to store a group of characters such as a word • Strings can be created • using double quotes, e.g. “Awesome” • using the string function, e.g. string(‘Awesome’) • Strings are displayed using double quotes • The class of a string scalar is ‘string’
Dimensions of character vectors • A single character is a 1 x 1 scalar >> letter = 'x'; >> size(letter) ans = 1 1 • A character vector is 1 x n where n is the length >> myword = 'Hello'; >> size(myword) ans = 1 5
Dimensions of strings • A string is a 1 x 1 scalar >> mystr = "Awesome"; >> size(mystr) ans = 1 1 • A character vector is contained in a string; curly braces can be used to index: >> mystr(1) ans = "Awesome" >> mystr{1} ans = 'Awesome' >> mystr{1}(1) ans = 'A'
Length of strings • We have seen that the length function can be used to find the number of characters in a character vector • However, since a string is a 1 x 1 scalar, the length of a string is 1 • Use the strlength function for strings: >> length("Ciao") ans = 1 >> strlength("Ciao") ans = 4 • Even an empty string is a 1 x 1 scalar, so the length is 1 (not 0)
Groups of strings • Groups of strings can be stored in • string arrays • character matrices • cell arrays (in Chapter 8) • String arrays are very easy to use: >> greets = ["hi", "ciao"] greets = 1×2 string array "hi" "ciao" >> greets(1) ans = "hi"
Character matrices • Creating a column vector of character vectors actually creates a character matrix (a matrix in which every element is a single character) • There are 2 ways to do this: • Using [ ] and separating with semicolons • Using char • Since all rows in a matrix must have the same number of characters, shorter character vectors must be padded with blank spaces so that all are the same length; the built-in function char will do that automatically • So, string arrays are so much easier!!
Creating Character Matrices • Both [ ] and char can be used to create a matrix in which every row has a character vector: >> cmat = ['Hello';'Hi '; 'Ciao ']; >> cmat = char('Hello', 'Hi', 'Ciao’); • Both of these will create a matrix cmat: • Shorter strings are padded with blanks e.g. ‘Hi ’ • Again, string arrays are so much easier!
Operations on Character Vectors • Character vectors can be created using single quotes • The input function creates a character vector (don’t forget the second argument, ‘s’) >> cvword = input('Enter a word: ','s') Enter a word: science cvword = 'science' • The blanks function creates a character vector with n blank spaces >> bb = blanks(3) bb = ' '
Operations on Strings • Strings are created using double quotes or by passing a character vector to the string function • Without any arguments, the string function creates an empty string (which, don’t forget, is still a 1 x 1 scalar so the length is 1) • The plus function or operator can join, or concatenate, two strings together >> "abc" + "xyz" ans = "abcxyz"
Text Functions • Many functions can have either character vectors or strings as input arguments • Generally, if a character vector is passed to the function, a character vector will be returned, and if a string is passed, a string will be returned • For example, changing case using upper or lower: >> upper('abc') ans = 'ABC' >> lower("HELP") ans = "help"
Removing Characters • deblank removes trailing blanks • strtrim removes leading and trailing blanks • (Note: neither function removes blanks in the middle of strings) • strip removes leading and/or trailing characters; either whitespace characters by default or else a specified character • erase removes all occurrences of a substring within a string or character vector (so could be used to erase blanks in the middle of strings)
String Concatenation • There are several ways to concatenate, or join, strings (using +) and character vectors (using [ ]) >> "abc"+"d" ans = "abcd" >> ['abc' 'd'] ans = 'abcd' • The strcat function can be used; it will remove trailing blanks for character vectors but not for strings >> strcat('Hi ', 'everyone') ans = 'Hieveryone' >> strcat("Hi ", "everyone") ans = "Hi everyone”
The sprintf function • sprintf works just like fprintf, but instead of printing, it creates text– so it can be used to customize the format of text • So, sprintf can be used to create customized text to pass to other functions (e.g., title, input) >> maxran = randi([1, 50]); >> prompt = sprintf('Enter an integer from 1 to %d: ', maxran); >> mynum = input(prompt); Enter an integer from 1 to 46: 33 • Any time text is required as an input, sprintf can create customized text
String Comparisons • strcmp compares two strings or character vectors and returns logical 1 if they are identical or 0 if not (or not the same length) • Variations: • strncmp compares only the first n characters • strcmpi ignores case (upper or lower) • strncmpi compares n characters, ignoring case
The equality operator • To use the equality operator with character vectors, they must be the same length, and each element will be compared. For strings, however, it will simply return 1 or 0: >> 'cat' == 'car' ans = 1 1 0 >> 'cat' == 'mouse' Matrix dimensions must agree. >> "cat" == "car" ans = 0 >> "cat" == "mouse" ans = 0
Find and replace functions • Note that the word “string” here can mean string or character vector • strfind(string, substring): finds all occurrences of the substring within the string; returns a vector of the indices of the beginning of the strings, or an empty vector if the substring is not found • strrep(string, oldsubstring, newsubstring): finds all occurrences of the old substring within the string, and replaces with the new substring • the old and new substrings can be different lengths • count(string, substring):counts the number of occurrences of a substring within a string
The strtok function • The strtok function takes a string and breaks it into two pieces and returns both strings • It looks for a delimiter (by default a blank space) and returns a token which is the beginning of the string up to the delimiter, and also the rest of the string, including the delimiter • A second argument can be passed for the delimiter • So – no characters are lost; all characters from the original string are returned in the two output strings • Since the function returns two strings, the call to strtok should be in an assignment statement with two variables on the left to store the two strings
Examples of strtok >> mystring = 'Isle of Skye'; >> [first, rest] = strtok(mystring) first = Isle rest = of Skye >> length(rest) ans = 8 >> [f, r] = strtok(rest, 'y') f = of Sk r = ye
Functions of String Arrays • These functions work with string arrays, but not character vectors: • strings: preallocates a string array to empty strings • strjoin: concatenates strings in a string array together • strsplit: splits a string into string scalars • join: concatenates strings in corresponding elements of columns in a string array • Also, the plus function or operator can concatenate the same string to all string scalars in a string array
“is” Functions to determine type • ischar true if the input argument is a character vector • isstring true if the input argument is a string • isStringScalar true if the input argument is a string scalar
“is” and true/false Functions • functions for strings and character vectors: • isletter true if the input argument is a letter of the alphabet • isspace true if the input argument is a white space character • isstrprop determines whether the characters in a string are in a category specified by second argument, e.g. ‘alphanumeric’ • contains determines whether a substring is somewhere within a string • endsWith/startsWith determine whether text ends with (or starts with) a specified string
String/Number Functions • Converting from strings to numbers and vice versa: • int2str converts from an integer to a character vector storing the integer • num2str converts a real number to a character vector containing the number • string converts number(s) to strings • str2num (and str2double) converts from a string or character vector containing number(s) to a number array • Note: this is different from converting to/from ASCII equivalents
Common Pitfalls • Trying to use == to compare character vectors for equality, instead of the strcmp function (and its variations) • Confusing sprintf and fprintf. The syntax is the same, but sprintf creates a string whereas fprintf prints • Trying to create a vector of character vectors with varying lengths (use string arrays instead) • Forgetting that when using strtok, the second argument returned (the “rest” of the string) contains the delimiter.
Programming Style Guidelines • Make sure the correct string comparison function is used; for example, strcmpi if ignoring case is desired