Matlab Regular Expression

Matlab Regular Expression 2012.4.09

Let’s begin with ‘*’
• The asterisk ‘*’ is a wildcard that matches any sequence of characters when we search in programs such as Microsoft Office.
• ‘*.*’ matches all file names.
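A file-name wildcard is not itself a regular expression, but the two are related. As a minimal sketch (the file names are made up), MATLAB's regexptranslate function can turn a wildcard into an expression that regexp then tests against each name:

```matlab
% Hypothetical file names to test against the wildcard '*.*'
files = {'report.docx', 'notes.txt', 'README', 'archive.tar.gz'};

% regexptranslate('wildcard', ...) turns '*' into '.*' and '.' into '\.';
% ^ and $ anchor the pattern so it must cover the whole name
expr = ['^' regexptranslate('wildcard', '*.*') '$'];

% With a cell array input, regexp returns one cell of start indices per name;
% an empty cell means "no match"
isMatch = ~cellfun(@isempty, regexp(files, expr));
disp(files(isMatch))
```

The translated expression requires a literal dot, so a name without an extension, such as 'README', is not matched.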

Regular Expression -- regexp
• In computing, a regular expression provides a concise and flexible means to "match" (specify and recognize) strings of text, such as particular characters, words, or patterns of characters. -- From Wikipedia

Matlab Regular Expression
• regexp: match a regular expression (case sensitive)
• regexpi: match a regular expression (case insensitive)
• regexprep: replace parts of a string using a regular expression
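A short sketch contrasting the three functions on one made-up string:

```matlab
str = 'Cat cat CAT';

idxCase   = regexp(str, 'cat');    % case sensitive: only the lowercase word
idxNoCase = regexpi(str, 'cat');   % case insensitive: all three words

% regexprep replaces every match; 'ignorecase' makes it case insensitive
rep1 = regexprep(str, 'cat', 'dog');
rep2 = regexprep(str, 'cat', 'dog', 'ignorecase');

disp(idxCase)    % 5
disp(idxNoCase)  % 1 5 9
disp(rep1)       % Cat dog CAT
disp(rep2)       % dog dog dog
```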

Examples
str = 'bat cat can car COAT court cut ct CAT-scan';
regexpi(str, 'c[aeiou]+t')
ans = 5 17 28 35
• [aeiou]: matches any one of the characters listed inside the brackets;
• +: matches the preceding pattern element one or more times;
• default output: the starting index of each match.
Because regexpi ignores case, 'COAT' and 'CAT' match as well; 'court' and 'ct' do not.
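Asking for the 'match' output shows which words were found, and swapping + for * shows the difference between the two quantifiers (a sketch on the same string):

```matlab
str = 'bat cat can car COAT court cut ct CAT-scan';

% 'match' returns the matched text instead of the start indices
plusWords = regexpi(str, 'c[aeiou]+t', 'match');
% With * the vowel class may match zero times, so 'ct' is also found
starWords = regexpi(str, 'c[aeiou]*t', 'match');

disp(plusWords)  % 'cat' 'COAT' 'cut' 'CAT'
disp(starWords)  % 'cat' 'COAT' 'cut' 'ct' 'CAT'
```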

str = 'Madrid, Spain';
s1 = regexp(str, '[A-Z]');
s2 = regexp(str, '\s');
s1 = 1 9
s2 = 8
• [A-Z]: matches any one uppercase letter from A to Z;
• \s: matches any whitespace character (space, tab, newline, etc.).
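The same character classes can also return text or split a string. A sketch; the negated class [^...] (any character not listed) is an addition not used on the slide:

```matlab
str = 'Madrid, Spain';

caps  = regexp(str, '[A-Z]', 'match');     % the uppercase letters themselves
words = regexp(str, '\s', 'split');        % text on either side of the whitespace
rest  = regexp(str, '[^A-Z, ]+', 'match'); % runs of characters that are not
                                           % uppercase, comma, or space

disp(caps)   % 'M' 'S'
disp(words)  % 'Madrid,' 'Spain'
disp(rest)   % 'adrid' 'pain'
```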

str = 'regexp helps you relax';
[m s e] = regexp(str, '\w*x\w*', 'match', 'start', 'end')
m = 'regexp' 'relax'
s = 1 18
e = 6 22
• 'match', 'start', 'end': select which outputs are returned (the matched text and the starting and ending indices of each match), in the order the options are listed;
• \w: a word character, same as [a-zA-Z_0-9];
• *: matches the preceding pattern element zero or more times.
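The outputs come back in the order the options are listed. A sketch with the options reordered and a pattern for words that contain an 'e':

```matlab
str = 'regexp helps you relax';

% \w* on each side lets the 'e' sit anywhere inside the word;
% 'start' is listed first, so s is the first output
[s, m, e] = regexp(str, '\w*e\w*', 'start', 'match', 'end');

disp(m)  % 'regexp' 'helps' 'relax'
disp(s)  % 1 8 18
disp(e)  % 6 12 22
```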

Output Arguments

str = 'She sells sea shells by the seashore.';
[matchstr splitstr] = regexp(str, '[Ss]h.', 'match', 'split')
matchstr = 'She' 'she' 'sho'
splitstr = '' ' sells sea ' 'lls by the sea' 're.'
• 'split': returns the substrings between the matches, i.e., the parts of str delimited by the pattern. Because the first match starts at index 1, the first element of splitstr is empty.
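'split' is convenient for parsing delimited text. A minimal sketch with a made-up comma-separated string whose separators have irregular spacing:

```matlab
csv = 'alpha, beta ,gamma,  delta';

% Split on a comma with optional whitespace on either side;
% 'split' is listed first, so it is the first output
[parts, seps] = regexp(csv, '\s*,\s*', 'split', 'match');

disp(parts)                       % 'alpha' 'beta' 'gamma' 'delta'
disp(numel(parts) - numel(seps))  % 1: one more piece than separators
```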

Character Type Operators

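str = 'The rain in Spain falls mainly on the plain.';
expr = '..ain';
matchStr = regexp(str, expr, 'match')
matchStr = ' rain' 'Spain' ' main' 'plain'
[mat idx] = regexp(str, '[rpm]ain', 'match', 'start')
mat = 'rain' 'pain' 'main'
idx = 5 14 25
[mat ix1 ix2] = regexp(str, '\w*n\s', 'match', 'start', 'end')
mat = 'rain ' 'in ' 'Spain ' 'on '
ix1 = 5 10 13 32
ix2 = 9 12 18 34
• .: matches any single character, including a space, so ' rain' and ' main' include the preceding space;
• [rpm]: matches exactly one of r, p, or m, so 'pain' is found inside 'Spain' while 'plain' is skipped (the character before 'ain' is 'l');
• \w*n\s: a word ending in n followed by a whitespace character; 'mainly' and 'plain.' do not match because no whitespace follows their n.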
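A sketch of two more character-type distinctions on made-up strings: the unescaped '.' versus the literal '\.', and the digit class \d:

```matlab
% '.' matches any character; '\.' matches only a literal period
anyChar = regexp('a.b acb', 'a.b', 'match');    % 'a.b' 'acb'
literal = regexp('a.b acb', 'a\.b', 'match');   % 'a.b'

% \d matches one digit; + collects runs of digits into whole numbers
nums = regexp('Room 101, floor 3', '\d+', 'match');  % '101' '3'
vals = str2double(nums);                             % [101 3]
```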

Positional Operators
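Positional operators match a position in the string rather than a character. A minimal sketch of the two most common ones, ^ (start of the string) and $ (end of the string), on a made-up string:

```matlab
str = 'cat concat';

allHits = regexp(str, 'cat');      % 1 8: anywhere in the string
atStart = regexp(str, '^cat');     % 1: only at the start of the string
atEnd   = regexp(str, 'cat$');     % 8: only at the end of the string
noMatch = regexp(str, '^concat');  % empty: 'concat' is not at the start
```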

Quantifiers
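A sketch of common quantifiers on made-up strings: greedy * versus lazy *?, the optional ?, and the bounded {m,n}:

```matlab
html = '<b>bold</b>';
greedy = regexp(html, '<.*>', 'match');    % '<b>bold</b>': * takes as much as it can
lazy   = regexp(html, '<.*?>', 'match');   % '<b>' '</b>': *? takes as little as it can

% ? = zero or one time
spell = regexp('color colour colouur', 'colou?r', 'match');  % 'color' 'colour'

% {2,3} = between two and three times
nums = regexp('7 42 123', '\d{2,3}', 'match');               % '42' '123'
```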

Character Representation
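Special characters such as tab and newline are written in the expression as \t and \n, and metacharacters are escaped with a backslash to match literally. A sketch using sprintf to build a made-up string containing real tab and newline characters:

```matlab
% sprintf turns \t and \n into real tab and newline characters
txt = sprintf('name\tage\nAda\t36');

tabs  = regexp(txt, '\t');            % indices of the tab characters: 5 13
lines = regexp(txt, '\n', 'split');   % one cell per line (tabs kept inside)

% '$' and '.' are metacharacters, so they are escaped to match literally
price = regexp('Total: $4.50', '\$\d+\.\d+', 'match');   % '$4.50'
```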

Grouping Operators

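regexp('B5 A2 6F 63 R6 P4 B2 BC', '(?:[A-Z]\d\s?){2,}', 'match')
ans = 'B5 A2 ' 'R6 P4 B2 '
regexpi(pstr, '(let|tel)\w+', 'match')
ans = 'lets' 'telegram'
• (?:...): groups elements without capturing a token;
• {2,}: the group must occur at least two times in a row;
• (let|tel): matches either 'let' or 'tel'.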
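Capturing groups also produce tokens, which makes grouping useful for pulling a string apart. A sketch on the date from the title slide plus a made-up string:

```matlab
dateStr = '2012.4.09';

% Each capturing group (...) yields one token; 'once' returns a flat cell
parts = regexp(dateStr, '(\d+)\.(\d+)\.(\d+)', 'tokens', 'once');  % '2012' '4' '09'

% A non-capturing group (?:...) groups without producing a token
tok = regexp('abab-cd', '(?:ab)+-(\w+)', 'tokens', 'once');        % 'cd'
```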

Operators Used with Tokens

String Replacement

Introduction to Using Tokens
poestr = ['While I nodded, nearly napping, ' 'suddenly there came a tapping,'];
[mat tok ext] = regexp(poestr, '(\S)\1', 'match', 'tokens', 'tokenExtents');
mat = 'dd' 'pp' 'dd' 'pp'
The tokens returned in the cell array tok are: 'd', 'p', 'd', 'p'
The starting and ending indices of each token in the input string poestr (returned in ext) are: 11 11, 26 26, 35 35, 57 57
• (\S) captures one non-whitespace character as token 1, and \1 matches that same character again, so the expression finds doubled characters.
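Tokens are most useful when each match contains several parts. A sketch that extracts key/value pairs from a made-up settings string; the variable names are illustrative:

```matlab
settings = 'width=10; height=20; depth=5';

% Two tokens per match: the key and its numeric value.
% tok is a cell array with one inner cell {key, value} per match
tok = regexp(settings, '(\w+)=(\d+)', 'tokens');

% Collect the keys and values from the nested cell array
keyList = cellfun(@(t) t{1}, tok, 'UniformOutput', false);  % 'width' 'height' 'depth'
valList = cellfun(@(t) str2double(t{2}), tok);             % [10 20 5]
```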

Referencing Named Tokens
When referencing a named token within the expression, use the syntax \k<name> instead of the numeric \1, \2, etc.:
poestr = ['While I nodded, nearly napping, ' 'suddenly there came a tapping,'];
regexp(poestr, '(?<anychar>.)\k<anychar>', 'match')
ans = 'dd' 'pp' 'dd' 'pp'
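Named back-references make it easy to spot a repeated word. A minimal sketch on a made-up sentence; note that the pattern does not check word boundaries, so for example 'cat at' would also match 'at at':

```matlab
txt = 'the the cat sat sat down';
pat = '(?<word>\w+)\s+\k<word>';

dups = regexp(txt, pat, 'match');   % 'the the' 'sat sat'
n    = regexp(txt, pat, 'names');   % struct array with field 'word'
repeated = {n.word};                % 'the' 'sat'
```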

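Using Tokens in a Replacement String
regexprep('Norma Jean Baker', '(\w+\s\w+)\s(\w+)', '$2, $1')
ans = Baker, Norma Jean
cstr = 'Whose woods these are I think I know.';
s = regexprep(cstr, '(.)\1', '--', 'ignorecase')
s = 'Whose w--ds these are I think I know.'
ss = 'Hello <a href="world">world</a>. 2 < 5';
b = '<.*?>';
sr = regexprep(ss, b, '')
sr = 'Hello world. 2 < 5'
• $1, $2: refer to the first and second tokens inside the replacement string;
• *?: a lazy quantifier that matches as few characters as possible, so each tag is matched separately and the text 'world' between the tags is kept. The '<' in '2 < 5' is not followed by a '>', so it is not matched.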
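A few more regexprep sketches on made-up strings: reordering numbered tokens, collapsing whitespace, and the 'once' option, which replaces only the first match:

```matlab
% Reorder the parts of a year-month-day date with numbered tokens
us = regexprep('2012-04-09', '(\d+)-(\d+)-(\d+)', '$2/$3/$1');   % '04/09/2012'

% Collapse runs of whitespace into a single space
clean = regexprep('too    many   spaces', '\s+', ' ');           % 'too many spaces'

% 'once' replaces only the first match
first = regexprep('a-b-c', '-', '+', 'once');                    % 'a+b-c'
```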

Named tokens are denoted by the pattern (?<name>...). The 'names' output is a structure array whose fields correspond to the named tokens in EXPRESSION.
str = 'John Davis; Rogers, James';
pat = '(?<first>\w+)\s+(?<last>\w+)|(?<last>\w+),\s+(?<first>\w+)';
n = regexp(str, pat, 'names');
n(1).first = 'John'
n(1).last = 'Davis'
n(2).first = 'James'
n(2).last = 'Rogers'
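Because 'names' returns a structure array, a field can be collected across all matches. A sketch with a made-up list of names:

```matlab
people = 'Ada Lovelace; Alan Turing';
n = regexp(people, '(?<first>\w+)\s+(?<last>\w+)', 'names');

% n is a 1x2 struct array, so {n.last} gathers one field from every match
lastNames = {n.last};   % 'Lovelace' 'Turing'

% Build "Last, First" labels from each element of the struct array
labels = arrayfun(@(p) [p.last ', ' p.first], n, 'UniformOutput', false);
```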

When either STRING or EXPRESSION is a cell array of strings, REGEXP matches the other input against each element of the cell array:
str = {'Madrid, Spain' 'Romeo and Juliet' 'MATLAB is great'};
pat = '\s';
regexp(str, pat)
ans = {[8], [6 10], [7 10]}

When both STRING and EXPRESSION are cell arrays of strings, REGEXP matches the elements of STRING and EXPRESSION pairwise, in order; the two cell arrays must have the same number of elements:
str = {'Madrid, Spain' 'Romeo and Juliet' 'MATLAB is great'};
pat = {'\s', '\w+', '[A-Z]'};
regexp(str, pat)
ans = {[8], [1 7 11], [1 2 3 4 5 6]}
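The cell-array results combine naturally with cellfun. A sketch on the same strings that counts the whitespace in each one and uses 'once' to get each leading word:

```matlab
str = {'Madrid, Spain' 'Romeo and Juliet' 'MATLAB is great'};

idx    = regexp(str, '\s');        % one cell of indices per string
counts = cellfun(@numel, idx);     % whitespace characters per string: [1 2 2]

% With 'once', each cell holds a char vector instead of a nested cell
firstWord = regexp(str, '^\w+', 'match', 'once');   % 'Madrid' 'Romeo' 'MATLAB'
```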

Nonmatching Operators

Lookaround Operators
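Lookaround operators test what comes before or after the current position without including it in the match. A sketch of lookbehind (?<=...), lookahead (?=...) and negative lookahead (?!...) on made-up strings:

```matlab
% Lookbehind: digits preceded by '$', without including the '$'
amounts = regexp('price: $42, cost: $7', '(?<=\$)\d+', 'match');   % '42' '7'

% Lookahead: names followed by '.txt', without including the extension
names = regexp('file1.txt file2.csv', '\w+(?=\.txt)', 'match');     % 'file1'

% Negative lookahead: whole words not followed by a comma
% ((?!\w|,) also stops \w+ from ending in the middle of a word)
noComma = regexp('red, green blue', '\w+(?!\w|,)', 'match');        % 'green' 'blue'
```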

http://www.mathworks.com/help/techdoc/matlab_prog/f0-42649.html
http://www.mathworks.com/help/techdoc/ref/regexp.html
