
Chapter 7


janebrown

Presentation Transcript


  1. Text Manipulation Chapter 7 Attaway MATLAB 5E

  2. Text Terminology • Text in MATLAB can be represented using: • character vectors (in single quotes) • string arrays, introduced in R2016b (in double quotes) • Many functions that manipulate text can be called using either a character vector or a string • Additionally, new functions have been created for the string type • Prior to R2016b, the word “string” was used for what are now called character vectors

  3. Characters • Both character vectors and string scalars are composed of groups of characters • Characters include letters of the alphabet, digits, punctuation marks, white space, and control characters • Control characters are characters that cannot be printed, but accomplish a task (such as a backspace or tab) • White space characters include the space, tab, newline, and carriage return • Leading blanks are blank spaces at the beginning • Trailing blanks are blank spaces at the end • Individual characters in single quotation marks are the type char (so, the class is char) • Function newline returns a newline character

  4. Character Vectors • Character vectors: • consist of any number of characters • are contained in single quotation marks • are displayed using single quotation marks • are of the type, or class, char • Examples: 'x', '', 'abc', 'hi there' • Since these are vectors of characters, many built-in functions and operators that we’ve seen already work with character vectors as well as numbers – e.g., length to get the length of a character vector, or the transpose operator • You can also index into a character vector to get individual characters or to get subsets (called a substring for a character vector or a string)
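
The points on this slide can be tried out with a minimal sketch like the one below. The variable names are illustrative and do not come from the slides; the text 'hi there' is one of the slide's own examples.

```matlab
% Character vectors behave like row vectors whose elements are characters.
cv = 'hi there';      % example text from the slide
n = length(cv);       % number of characters, counting the space
first = cv(1);        % indexing returns a single character
sub = cv(4:8);        % a range of indices returns a substring
col = cv';            % the transpose turns the row into a column of characters
```

Here n is 8, first is 'h', sub is 'there', and col is an 8 x 1 character array.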

  5. String Scalars • A string scalar is a single string, and is used to store a group of characters such as a word • Strings can be created • using double quotes, e.g. "Awesome" • using the string function, e.g. string('Awesome') • Strings are displayed using double quotes • The class of a string scalar is string
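
As a small sketch of the two ways to create a string scalar (assuming R2016b or later, with illustrative variable names), the class function can confirm the type of each result:

```matlab
% Two ways to create the same string scalar.
s1 = "Awesome";            % double quotes create a string scalar
s2 = string('Awesome');    % the string function converts a character vector
c1 = class(s1);            % class name of a string scalar
c2 = class('Awesome');     % class name of a character vector, for contrast
same = isequal(s1, s2);    % both approaches give the same string
```

Here c1 is 'string', c2 is 'char', and same is logical 1.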

  6. Dimensions of character vectors
     • A single character is a 1 x 1 scalar
         >> letter = 'x';
         >> size(letter)
         ans =
              1     1
     • A character vector is 1 x n where n is the length
         >> myword = 'Hello';
         >> size(myword)
         ans =
              1     5

  7. Dimensions of strings
     • A string is a 1 x 1 scalar
         >> mystr = "Awesome";
         >> size(mystr)
         ans =
              1     1
     • A character vector is contained in a string; curly braces can be used to index:
         >> mystr(1)
         ans =
             "Awesome"
         >> mystr{1}
         ans =
             'Awesome'
         >> mystr{1}(1)
         ans =
             'A'

  8. Length of strings
     • We have seen that the length function can be used to find the number of characters in a character vector
     • However, since a string is a 1 x 1 scalar, the length of a string is 1
     • Use the strlength function for strings:
         >> length("Ciao")
         ans =
              1
         >> strlength("Ciao")
         ans =
              4
     • Even an empty string is a 1 x 1 scalar, so the length is 1 (not 0)

  9. Groups of strings
     • Groups of strings can be stored in
       • string arrays
       • character matrices
       • cell arrays (in Chapter 8)
     • String arrays are very easy to use:
         >> greets = ["hi", "ciao"]
         greets =
           1×2 string array
             "hi"    "ciao"
         >> greets(1)
         ans =
             "hi"

  10. Character matrices • Creating a column vector of character vectors actually creates a character matrix (a matrix in which every element is a single character) • There are 2 ways to do this: • Using [ ] and separating with semicolons • Using char • Since all rows in a matrix must have the same number of characters, shorter character vectors must be padded with blank spaces so that all are the same length; the built-in function char will do that automatically • So, string arrays are so much easier!!

  11. Creating Character Matrices
     • Both [ ] and char can be used to create a matrix in which every row has a character vector:
         >> cmat = ['Hello'; 'Hi   '; 'Ciao '];
         >> cmat = char('Hello', 'Hi', 'Ciao');
     • Both of these will create the same 3 x 5 character matrix cmat
     • Shorter character vectors are padded with blanks, e.g. 'Hi   '
     • Again, string arrays are so much easier!
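
The padding done by char can be inspected with a short sketch such as the following. The variable names other than cmat are illustrative, and the string array at the end is included only for contrast with the character matrix.

```matlab
% char pads shorter rows with trailing blanks so every row has equal length.
cmat = char('Hello', 'Hi', 'Ciao');
sz = size(cmat);                  % number of rows and columns
row2 = cmat(2, :);                % the second row, including its padding
trimmed = deblank(row2);          % remove the trailing blanks again
sarr = ["Hello", "Hi", "Ciao"];   % a string array needs no padding
lens = strlength(sarr);           % number of characters in each string
```

Here sz is [3 5], row2 is 'Hi' followed by three blanks, trimmed is 'Hi', and lens is [5 2 4].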

  12. Operations on Character Vectors
     • Character vectors can be created using single quotes
     • The input function creates a character vector (don’t forget the second argument, 's')
         >> cvword = input('Enter a word: ','s')
         Enter a word: science
         cvword =
             'science'
     • The blanks function creates a character vector with n blank spaces
         >> bb = blanks(3)
         bb =
             '   '

  13. Operations on Strings
     • Strings are created using double quotes or by passing a character vector to the string function
     • Without any arguments, the string function creates an empty string (which, don’t forget, is still a 1 x 1 scalar so the length is 1)
     • The plus function or operator can join, or concatenate, two strings together
         >> "abc" + "xyz"
         ans =
             "abcxyz"

  14. Text Functions
     • Many functions can have either character vectors or strings as input arguments
     • Generally, if a character vector is passed to the function, a character vector will be returned, and if a string is passed, a string will be returned
     • For example, changing case using upper or lower:
         >> upper('abc')
         ans =
             'ABC'
         >> lower("HELP")
         ans =
             "help"

  15. Removing Characters • deblank removes trailing blanks • strtrim removes leading and trailing blanks • (Note: neither function removes blanks in the middle of strings) • strip removes leading and/or trailing characters; either whitespace characters by default or else a specified character • erase removes all occurrences of a substring within a string or character vector (so could be used to erase blanks in the middle of strings)
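
The differences between these four functions are easiest to see side by side. The following is a minimal sketch with made-up inputs (none of these values come from the slides), assuming R2016b or later for strip and erase.

```matlab
% Compare the blank-removal functions on the same padded input.
padded = '   xyz   ';             % three leading and three trailing blanks
a = deblank(padded);              % trailing blanks removed only
b = strtrim(padded);              % leading and trailing blanks removed
c = strip("--xyz--", '-');        % strip a specified character from both ends
d = strip(padded, 'left');        % strip whitespace from the leading side only
e = erase('a b c', ' ');          % erase every blank, including those in the middle
```

Here a keeps its three leading blanks, b is 'xyz', c is "xyz", d keeps its three trailing blanks, and e is 'abc'.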

  16. String Concatenation
     • There are several ways to concatenate, or join, strings (using +) and character vectors (using [ ])
         >> "abc"+"d"
         ans =
             "abcd"
         >> ['abc' 'd']
         ans =
             'abcd'
     • The strcat function can be used; it will remove trailing blanks for character vectors but not for strings
         >> strcat('Hi ', 'everyone')
         ans =
             'Hieveryone'
         >> strcat("Hi ", "everyone")
         ans =
             "Hi everyone"

  17. The sprintf function
     • sprintf works just like fprintf, but instead of printing, it creates text, so it can be used to customize the format of text
     • So, sprintf can be used to create customized text to pass to other functions (e.g., title, input)
         >> maxran = randi([1, 50]);
         >> prompt = sprintf('Enter an integer from 1 to %d: ', maxran);
         >> mynum = input(prompt);
         Enter an integer from 1 to 46: 33
     • Any time text is required as an input, sprintf can create customized text

  18. String Comparisons • strcmp compares two strings or character vectors and returns logical 1 if they are identical or 0 if not (or not the same length) • Variations: • strncmp compares only the first n characters • strcmpi ignores case (upper or lower) • strncmpi compares n characters, ignoring case
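
A short sketch of the four comparison functions, using illustrative inputs that are not from the slides:

```matlab
% Each call returns logical 1 (true) or 0 (false).
t1 = strcmp('hello', 'hello');       % identical text
t2 = strcmp('hello', 'Hello');       % differs in case
t3 = strcmp('hello', 'help');        % different lengths: false, with no error
t4 = strncmp('hello', 'help', 3);    % only the first 3 characters are compared
t5 = strcmpi('hello', 'HELLO');      % case is ignored
t6 = strncmpi('hello', 'HELP', 3);   % first 3 characters, ignoring case
```

Here t1, t4, t5, and t6 are true, while t2 and t3 are false.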

  19. The equality operator
     • To use the equality operator with character vectors, they must be the same length, and each element will be compared. For strings, however, it will simply return 1 or 0:
         >> 'cat' == 'car'
         ans =
              1     1     0
         >> 'cat' == 'mouse'
         Matrix dimensions must agree.
         >> "cat" == "car"
         ans =
              0
         >> "cat" == "mouse"
         ans =
              0

  20. Find and replace functions • Note that the word “string” here can mean string or character vector • strfind(string, substring): finds all occurrences of the substring within the string; returns a vector of the starting indices of each occurrence, or an empty vector if the substring is not found • strrep(string, oldsubstring, newsubstring): finds all occurrences of the old substring within the string, and replaces them with the new substring • the old and new substrings can be different lengths • count(string, substring): counts the number of occurrences of a substring within a string
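
The three functions can be compared on one input. This is a minimal sketch with an invented character vector, assuming R2016b or later for count.

```matlab
% Find, replace, and count substrings in the same text.
txt = 'abcdeabcde';
idx = strfind(txt, 'cd');         % starting index of each occurrence
none = strfind(txt, 'xyz');       % empty when the substring is not found
rep = strrep(txt, 'abc', 'X');    % the replacement may be a different length
n = count(txt, 'e');              % number of occurrences
```

Here idx is [3 8], none is empty, rep is 'XdeXde', and n is 2.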

  21. The strtok function • The strtok function takes a string and breaks it into two pieces and returns both strings • It looks for a delimiter (by default a blank space) and returns a token which is the beginning of the string up to the delimiter, and also the rest of the string, including the delimiter • A second argument can be passed for the delimiter • So – no characters are lost; all characters from the original string are returned in the two output strings • Since the function returns two strings, the call to strtok should be in an assignment statement with two variables on the left to store the two strings

  22. Examples of strtok
         >> mystring = 'Isle of Skye';
         >> [first, rest] = strtok(mystring)
         first =
             'Isle'
         rest =
             ' of Skye'
         >> length(rest)
         ans =
              8
         >> [f, r] = strtok(rest, 'y')
         f =
             ' of Sk'
         r =
             'ye'

  23. Functions of String Arrays • These functions work with string arrays (strsplit also accepts a character vector, and strjoin and join also accept cell arrays of character vectors): • strings: preallocates a string array to empty strings • strjoin: concatenates strings in a string array together • strsplit: splits a string into string scalars • join: concatenates strings in corresponding elements of columns in a string array • Also, the plus function or operator can concatenate the same string to all string scalars in a string array
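
A minimal sketch of these functions on a small string array follows. The color names and variable names are invented for illustration, and R2016b or later is assumed.

```matlab
% Preallocate a string array, fill it, then join, split, and append.
sa = strings(1, 3);               % 1 x 3 array of empty strings
sa(1) = "red";
sa(2) = "green";
sa(3) = "blue";
joined = strjoin(sa, ', ');       % one piece of text with a chosen delimiter
parts = strsplit("a b c");        % split at blanks by default
j2 = join(sa);                    % join the elements, separated by a space
tagged = sa + "!";                % plus appends the same string to every element
```

Here joined holds the text red, green, blue; parts holds the three pieces a, b, and c; j2 is "red green blue"; and tagged is ["red!" "green!" "blue!"].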

  24. “is” Functions to determine type • ischar true if the input argument is a character vector • isstring true if the input argument is a string • isStringScalar true if the input argument is a string scalar
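
The three type checks can be contrasted with a sketch like this one. The variables are illustrative, and isStringScalar is assumed to be available (it was added in a release after R2016b).

```matlab
% Apply the type-checking functions to each kind of text.
cv = 'abc';                  % character vector
st = "abc";                  % string scalar
sa = ["abc", "de"];          % string array with two elements
r1 = ischar(cv);             % true: cv is a character vector
r2 = ischar(st);             % false: a string is not char
r3 = isstring(st);           % true
r4 = isstring(sa);           % true: a string array is also of type string
r5 = isStringScalar(st);     % true: exactly one string
r6 = isStringScalar(sa);     % false: more than one element
```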

  25. “is” and true/false Functions • functions for strings and character vectors: • isletter true if the input argument is a letter of the alphabet • isspace true if the input argument is a white space character • isstrprop determines whether the characters in a string are in a category specified by second argument, e.g. 'alphanum' for alphanumeric characters • contains determines whether a substring is somewhere within a string • endsWith/startsWith determine whether text ends with (or starts with) a specified string
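
As a rough illustration, the character-level functions return one logical value per character, while contains, startsWith, and endsWith return one value per piece of text. The inputs below are invented; 'alphanum' is the category name used by isstrprop for letters and digits.

```matlab
% Character-by-character tests on a short character vector.
txt = 'ab 12!';
L = isletter(txt);                      % true for 'a' and 'b'
S = isspace(txt);                       % true for the blank
A = isstrprop(txt, 'alphanum');         % true for letters and digits
% Whole-text tests.
c = contains("hello world", "lo w");    % substring found anywhere
s = startsWith('filename.m', 'file');   % text begins with 'file'
e = endsWith('filename.m', '.m');       % text ends with '.m'
```

Here L is [1 1 0 0 0 0], S is [0 0 1 0 0 0], A is [1 1 0 1 1 0], and c, s, and e are all true.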

  26. String/Number Functions • Converting from strings to numbers and vice versa: • int2str converts from an integer to a character vector storing the integer • num2str converts a real number to a character vector containing the number • string converts number(s) to strings • str2num (and str2double) converts from a string or character vector containing number(s) to a number array • Note: this is different from converting to/from ASCII equivalents
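
A minimal sketch of the conversions in both directions, with illustrative values. The last line shows the ASCII-style conversion mentioned in the note, for contrast.

```matlab
% Numbers to text.
a = int2str(42);            % integer to character vector
b = num2str(3.14);          % real number to character vector
c = string(3.14);           % number to string
% Text to numbers.
d = str2num('[1 2 3]');     % text containing several numbers to a numeric array
e = str2double('2.5');      % text containing one number to a double
% A different kind of conversion: the character code of 'a'.
f = double('a');
```

Here a is '42', b is '3.14', c is "3.14", d is [1 2 3], e is 2.5, and f is 97.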

  27. Common Pitfalls • Trying to use == to compare character vectors for equality, instead of the strcmp function (and its variations) • Confusing sprintf and fprintf. The syntax is the same, but sprintf creates text whereas fprintf prints it • Trying to create a vector of character vectors with varying lengths (use string arrays instead) • Forgetting that when using strtok, the second argument returned (the “rest” of the string) contains the delimiter.

  28. Programming Style Guidelines • Make sure the correct string comparison function is used; for example, strcmpi if ignoring case is desired
