Regular Expressions

Regular Expressions
• A regular expression is a pattern that describes a string or part of a string.
• When the pattern is compared against a string, it either matches or it doesn't. If it matches, the function returns information about the match.
• The return value depends on the specific function used and the arguments passed to it.

Regular Expressions:Basics
• Basics of the function
• REFind(reg_expression, string [, start] [, return_sub])
• Compares a regular expression to a string and, if it matches all or part of the string, returns the numeric position in the string where the match starts. If there is no match, it returns 0.
• The optional start position lets the search begin anywhere in the string.
• Another option returns subexpressions. We'll deal with that a little later.
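To make the return convention concrete, here is a minimal Python sketch. Python's re module stands in for ColdFusion's regex engine (the two agree on the simple patterns in this section, but are not identical), and re_find is a hypothetical helper, not part of either language. It mimics REFind: the 1-based start of the first match at or after start, or 0 when nothing matches.

```python
import re

def re_find(pattern, string, start=1, nocase=False):
    """Hypothetical REFind/REFindNoCase stand-in: 1-based position, 0 if no match."""
    flags = re.IGNORECASE if nocase else 0
    # search() takes a 0-based starting index, so convert from the 1-based start
    match = re.compile(pattern, flags).search(string, start - 1)
    return match.start() + 1 if match else 0

print(re_find("is", "This is a test"))           # 3: the "is" inside "This"
print(re_find("is", "This is a test", start=4))  # 6: skips ahead to the word "is"
print(re_find("xyz", "This is a test"))          # 0: no match
```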

Regular Expressions:Basics
• Any ASCII character that is not a special character matches itself.
• A matches A
• b matches b
• A does not match a unless the NoCase version of the function is used: REFindNoCase()
• REFindNoCase() is slightly slower, but only someone obsessive about run times will notice the difference. :)

Regular Expressions:Basics
• REFindNoCase('is', 'This is a test') = 3
• REFindNoCase('This', 'This is a test') = 1
• REFind('t', 'This is a test') = 11 (the case-sensitive search skips the capital T at position 1)
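The case-sensitivity difference can be checked with a small Python sketch. Here the hypothetical re_find helper models REFind when nocase is False and REFindNoCase when it is True, using Python's re.IGNORECASE flag as the stand-in for the NoCase variant.

```python
import re

def re_find(pattern, string, nocase=False):
    """Hypothetical stand-in for REFind (nocase=False) and REFindNoCase (nocase=True)."""
    flags = re.IGNORECASE if nocase else 0
    match = re.search(pattern, string, flags)
    return match.start() + 1 if match else 0

print(re_find("is", "This is a test", nocase=True))    # 3
print(re_find("This", "This is a test", nocase=True))  # 1
print(re_find("t", "This is a test"))                  # 11: capital T is skipped
print(re_find("t", "This is a test", nocase=True))     # 1
print(re_find("A", "a"))                               # 0: case-sensitive
```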

Regular Expressions:Special Characters
• A period (.) matches any single character.
• A pipe (|) means either what comes before it or what comes after it.
• A caret (^) at the beginning of a RegEx means that the regex will only match if it starts at the beginning of the comparison string.
• A dollar sign ($) at the end of a RegEx means that the regex will only match if it ends at the end of the comparison string.

Regular Expressions:Special Characters
• A backslash (\) escapes the next character if it is a special one, so \. matches a literal period.
• If the character after the backslash is not a special one, the pair may form an escape sequence such as \d.
• To match a literal backslash, escape it: \\

Regular Expressions:Special Characters
• REFindNoCase('i.', 'This is a test') = 3
• REFindNoCase('^is', 'This is a test') = 0
• REFindNoCase('^t', 'This is a test') = 1
• REFindNoCase('t$', 'This is a test') = 14
• REFindNoCase('th|te', 'This is a test') = 1
• REFind('th|te', 'This is a test') = 11
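The same six examples can be run through Python's re module as a sanity check. re_find is again a hypothetical helper that reports 1-based positions the way REFind does; the special characters ., |, ^ and $ behave the same way here as described above.

```python
import re

def re_find(pattern, string, nocase=False):
    """Hypothetical REFind/REFindNoCase stand-in: 1-based position, 0 if no match."""
    match = re.search(pattern, string, re.IGNORECASE if nocase else 0)
    return match.start() + 1 if match else 0

s = "This is a test"
print(re_find("i.", s, nocase=True))     # 3: "i" plus any one character
print(re_find("^is", s, nocase=True))    # 0: the string does not start with "is"
print(re_find("^t", s, nocase=True))     # 1
print(re_find("t$", s, nocase=True))     # 14: the final "t"
print(re_find("th|te", s, nocase=True))  # 1: "Th" matches the first alternative
print(re_find("th|te", s))               # 11: case-sensitive, only "te" matches
```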

Regular Expressions:Escape Sequences
• When certain non-special characters have a backslash (\) before them, they become special.
• \d means any digit
• REFindNoCase('\d', 'this is 4') = 9
• REFindNoCase('is \d', 'this is 4') = 6 (the earlier 'is ' at position 3 is followed by a letter, not a digit)
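A short Python sketch of escaping and escape sequences follows. The re_find helper is hypothetical and mimics REFindNoCase's 1-based result. Python needs raw strings (r"...") so that its own string escaping doesn't consume the backslashes before the regex engine sees them; ColdFusion string literals don't add that extra layer.

```python
import re

def re_find(pattern, string, nocase=False):
    """Hypothetical REFind/REFindNoCase stand-in: 1-based position, 0 if no match."""
    match = re.search(pattern, string, re.IGNORECASE if nocase else 0)
    return match.start() + 1 if match else 0

print(re_find(r"\d", "this is 4", nocase=True))     # 9: first digit
print(re_find(r"is \d", "this is 4", nocase=True))  # 6
print(re_find(r".", "v1.2"))       # 1: an unescaped period matches any character
print(re_find(r"\.", "v1.2"))      # 3: an escaped period matches only "."
print(re_find(r"\\", r"C:\temp"))  # 3: an escaped backslash matches "\"
```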

Regular Expressions:Sets
• A character set is a group of characters from which a single character is matched.
• [0123456789] matches any single digit
• Sets can use ranges of characters (think ASCII table)
• [0-9] matches any single digit
• A dash can be included in a set by placing it first (i.e. not in a range)
• [-aeiou] matches a dash or a vowel
• A caret (^) at the beginning of a set negates it (i.e. it matches any character NOT in the set)

Regular Expressions:Sets
• REFindNoCase('[AEIOU]', 'This is a test') = 3
• REFindNoCase('[0-9]', 'this is a test') = 0
• REFindNoCase('[0-9]', 'this is a 4th test') = 11
• REFindNoCase('[-0-9]', 'this-is a test') = 5
• REFindNoCase('[^0-9]', 'this is a test') = 1
• REFindNoCase('[^-]', 'this is a test') = 1
• REFindNoCase('[-^]', 'this-is a ^') = 5
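Python's set syntax follows the same rules for ranges, a leading dash and a leading caret, so the set examples can be replayed with the hypothetical re_find helper below (a stand-in for REFind/REFindNoCase, not a real API).

```python
import re

def re_find(pattern, string, nocase=False):
    """Hypothetical REFind/REFindNoCase stand-in: 1-based position, 0 if no match."""
    match = re.search(pattern, string, re.IGNORECASE if nocase else 0)
    return match.start() + 1 if match else 0

print(re_find("[AEIOU]", "This is a test", nocase=True))  # 3
print(re_find("[0-9]", "this is a test"))                 # 0
print(re_find("[0-9]", "this is a 4th test"))             # 11
print(re_find("[-0-9]", "this-is a test"))                # 5: the dash
print(re_find("[^0-9]", "this is a test"))                # 1
print(re_find("[^-]", "this is a test"))                  # 1
print(re_find("[-^]", "this-is a ^"))                     # 5: ^ is literal when not first
```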

Regular Expressions:Sets
• ColdFusion also includes a number of predefined sets.
• A predefined set is called using a special name surrounded by colons, e.g. :alpha:
• It is used within a set: REFindNoCase('[[:alpha:]]', '123abc') = 4
• It can be combined with other characters in a set: REFindNoCase('[123[:alpha:]]', '123abc') = 1
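Python's re module does not understand [:alpha:]-style names, so this sketch swaps in a different technique: a hypothetical expand_posix function rewrites each predefined set into plain ASCII ranges before searching. The translation table is an assumption made for illustration; ColdFusion's own classes may cover more characters (accented letters, for example).

```python
import re

# Hypothetical ASCII-only translation table (an approximation, not ColdFusion's own).
POSIX_CLASSES = {
    "alpha": "A-Za-z",
    "digit": "0-9",
    "alnum": "A-Za-z0-9",
    "upper": "A-Z",
    "lower": "a-z",
    "space": r" \t\r\n\f\v",
}

def expand_posix(pattern):
    """Rewrite each [:name:] inside a set as plain ranges (KeyError if unknown)."""
    return re.sub(r"\[:(\w+):\]", lambda m: POSIX_CLASSES[m.group(1)], pattern)

def re_find(pattern, string, nocase=False):
    """Hypothetical REFindNoCase stand-in that expands predefined sets first."""
    match = re.search(expand_posix(pattern), string, re.IGNORECASE if nocase else 0)
    return match.start() + 1 if match else 0

print(expand_posix("[[:alpha:]]"))                       # [A-Za-z]
print(re_find("[[:alpha:]]", "123abc", nocase=True))     # 4
print(re_find("[123[:alpha:]]", "123abc", nocase=True))  # 1
```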

Regular Expressions:Groups
• A group separates one portion of a regular expression from the rest.
• Groups are also known as subexpressions.
• Groups are written with parentheses.
• REFindNoCase('(this|that):', 'find this:') = 6
• More uses later
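In this Python sketch the group limits the alternation to "this" or "that", while the colon stays outside it. Python exposes the captured text directly through match.group(n); the ColdFusion route to the same information is the returned structure covered later. The 1-based conversion is only there to line up with the REFindNoCase result.

```python
import re

match = re.search(r"(this|that):", "find this:", re.IGNORECASE)
print(match.start() + 1)  # 6: 1-based start of the whole match, as REFindNoCase reports it
print(match.group(0))     # this:  (the entire match)
print(match.group(1))     # this   (the text captured by the group)
```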

Regular Expressions:Modifiers
• A modifier applies to the preceding character, set, or group and says how many times it can or must occur.
• + (one or more): REFindNoCase('ha+', 'hahaha') = 1
• * (zero or more): REFindNoCase('ha*', 'hhaha') = 1
• ? (zero or one): REFindNoCase('ha?', 'hahaha') = 1
• {n} (exactly n): REFindNoCase('ha{2}', 'hahaaha') = 3
• {n,m} (between n and m): REFindNoCase('ha{2,3}', 'hahaha') = 0 (no h is followed by two or more a's)
• {n,} (n or more): REFindNoCase('ha{3,}', 'hahaha') = 0
• On a group: REFindNoCase('(ha)+', 'hahaha') = 1
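Each modifier can be checked with Python's re module, whose quantifiers have the same meaning. The hypothetical re_find helper searches case-insensitively and returns a 1-based position like REFindNoCase; the last entry, (ha){3,}, is an extra illustration of a modifier applied to a whole group.

```python
import re

def re_find(pattern, string):
    """Hypothetical REFindNoCase stand-in: 1-based position, 0 if no match."""
    match = re.search(pattern, string, re.IGNORECASE)
    return match.start() + 1 if match else 0

examples = [
    ("ha+", "hahaha"),       # one or more a's
    ("ha*", "hhaha"),        # zero or more: the first lone "h" already matches
    ("ha?", "hahaha"),       # zero or one
    ("ha{2}", "hahaaha"),    # exactly two a's
    ("ha{2,3}", "hahaha"),   # two or three a's: never happens in this string
    ("ha{3,}", "hahaha"),    # three or more a's
    ("(ha)+", "hahaha"),     # the modifier applies to the whole group
    ("(ha){3,}", "hahaha"),  # the group repeated three or more times
]
for pattern, text in examples:
    print(pattern, text, re_find(pattern, text))  # 1, 1, 1, 3, 0, 0, 1, 1
```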

Regular Expressions:Modifiers
• Normal modifiers are greedy, i.e. they match as much as they can.
• Putting a question mark (?) after a modifier makes it lazy, i.e. it matches as little as possible.
• REFindNoCase('a+', 'baaaa', 1, 1) matches aaaa (pos[1] = 2, len[1] = 4)
• REFindNoCase('a+?', 'baaaa', 1, 1) matches a (pos[1] = 2, len[1] = 1)
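Python's re supports the same lazy suffix, so the greedy/lazy contrast can be reproduced directly. The sketch below reports each match as a 1-based position and a length, mirroring the pos and len values ColdFusion returns; that reporting format is the only ColdFusion-specific part.

```python
import re

text = "baaaa"
results = {pattern: re.search(pattern, text) for pattern in ("a+", "a+?")}
for pattern, match in results.items():
    # report the way ColdFusion's structure would: 1-based pos and len
    print(pattern, repr(match.group()), "pos", match.start() + 1, "len", len(match.group()))
# a+ 'aaaa' pos 2 len 4
# a+? 'a' pos 2 len 1
```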

Regular Expressions:Line Modifiers
• A line modifier changes how a regular expression is processed.
• (?i) performs a case-insensitive search: REFind('(?i)This', 'this') = 1
• (?x) ignores whitespace in the pattern (not in the string being searched), so 'is a' is read as 'isa': REFindNoCase('(?x)is a', 'this is a isa') = 11
• (?m) pays attention to lines: ^ and $ match at the start and end of each line, not just of the whole string. REFindNoCase('(?m)^line3', 'line1#chr(10)#Line2#chr(10)#line3') = 13
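Python accepts the same inline flags at the start of a pattern, so the three examples translate directly. The re_find helper is hypothetical (1-based, 0 on no match), and "\n" plays the role of chr(10). One difference worth knowing: in Python's (?x) mode a # also starts a comment.

```python
import re

def re_find(pattern, string, nocase=False):
    """Hypothetical REFind/REFindNoCase stand-in: 1-based position, 0 if no match."""
    match = re.search(pattern, string, re.IGNORECASE if nocase else 0)
    return match.start() + 1 if match else 0

print(re_find("(?i)This", "this"))                        # 1
print(re_find("(?x)is a", "this is a isa", nocase=True))  # 11: pattern read as "isa"
lines = "line1\nLine2\nline3"
print(re_find("(?m)^line3", lines, nocase=True))          # 13: start of the third line
print(re_find("^line3", lines, nocase=True))              # 0: without (?m), ^ is string start
```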

Regular Expressions:Returning Structures
• Rather than returning a number, a regular expression function can be set to return a structure.
• The structure contains two keys named pos and len.
• Each key holds an array: pos holds the start position of each match and len holds its length.
• The first element always describes the entire match; the remaining elements describe the subexpressions, in order.
• Use the Mid() function to extract the matched text: Mid(string, test.pos[n], test.len[n])

Regular Expressions:Returning Structures
• The start position must be specified and the 4th argument must be set to yes (1, true).
• string = 'this is a finder'
• test = REFindNoCase('f[aeiou]n.', string, 1, 1)
• Mid(string, test.pos[1], test.len[1]) = 'find'
• test = REFindNoCase('f([aeiou])n.', string, 1, 1)
• Mid(string, test.pos[1], test.len[1]) = 'find'
• Mid(string, test.pos[2], test.len[2]) = 'i'
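The structure can be modeled in Python to show how pos and len line up with the whole match and each subexpression. re_find_struct and mid are hypothetical helpers written for this sketch. The no-match result (a single 0 in each array) and the 0/0 entry for a subexpression that didn't participate are assumptions about ColdFusion's behavior. Python lists are 0-based, so test["pos"][0] corresponds to ColdFusion's test.pos[1].

```python
import re

def re_find_struct(pattern, string, start=1, nocase=False):
    """Hypothetical stand-in for REFind(regex, string, start, true)."""
    flags = re.IGNORECASE if nocase else 0
    match = re.compile(pattern, flags).search(string, start - 1)
    if match is None:
        return {"pos": [0], "len": [0]}  # assumed no-match convention
    result = {"pos": [], "len": []}
    # group 0 is the whole match; groups 1..n are the subexpressions
    for i in range(match.re.groups + 1):
        begin, end = match.span(i)
        if begin == -1:
            # subexpression did not take part in the match (assumed 0/0)
            result["pos"].append(0)
            result["len"].append(0)
        else:
            result["pos"].append(begin + 1)  # 1-based, like ColdFusion
            result["len"].append(end - begin)
    return result

def mid(string, start, count):
    """Python version of ColdFusion's 1-based Mid()."""
    return string[start - 1:start - 1 + count]

string = "this is a finder"
test = re_find_struct("f([aeiou])n.", string, nocase=True)
print(test)                                          # {'pos': [11, 12], 'len': [4, 1]}
print(mid(string, test["pos"][0], test["len"][0]))  # find
print(mid(string, test["pos"][1], test["len"][1]))  # i
```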

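Regular Expressions:Replacing
• REReplace(string, regex, replace [, scope])
• Replaces the regex match in the string with the replace value.
• Scope is 'one' (the default, replaces only the first match) or 'all' (replaces every match).
• REReplace('this is a test', 'is', 'XX') = 'thXX is a test'
• REReplace('this is a test', 'is', 'XX', 'all') = 'thXX XX a test'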
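Here is a Python sketch of the one/all scope. re_replace is a hypothetical helper: it maps scope 'one' to re.sub's count=1 and 'all' to count=0 (which means "replace every match"). The date example is an extra illustration of \1-style backreferences, which Python's replacement strings also accept.

```python
import re

def re_replace(string, pattern, replacement, scope="one", nocase=False):
    """Hypothetical stand-in for REReplace / REReplaceNoCase."""
    scope = scope.lower()
    if scope not in ("one", "all"):
        raise ValueError("scope must be 'one' or 'all'")
    # re.sub replaces every match when count is 0
    count = 0 if scope == "all" else 1
    flags = re.IGNORECASE if nocase else 0
    return re.sub(pattern, replacement, string, count=count, flags=flags)

print(re_replace("this is a test", "is", "XX"))         # thXX is a test
print(re_replace("this is a test", "is", "XX", "all"))  # thXX XX a test
print(re_replace("2024-01-15", r"(\d+)-(\d+)-(\d+)", r"\2/\3/\1"))  # 01/15/2024
```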

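Regular Expressions:Replacing
• New in MX is the ability to change the case of the replacement text using special escape codes: \u and \l change the case of the next character, while \U and \L change the case of everything up to \E.
• REReplaceNoCase('make upper', '(u.+)', '\u\1') = 'make Upper'
• REReplaceNoCase('make upper', '(u.+)', '\U\1') = 'make UPPER'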
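Python's re.sub has no equivalent case-conversion escapes, so this sketch interprets them itself. expand_template and re_replace_case are hypothetical helpers: the template is expanded per match, handling \0-\9 backreferences, \u and \l for the next character, and \U or \L until \E. Any other escaped character is kept literally, which is a simplifying assumption rather than ColdFusion's documented rule.

```python
import re

def expand_template(template, match):
    r"""Expand \0-\9 backreferences plus the case escapes \u, \l, \U, \L and \E."""
    out = []
    run = None        # "U" or "L" while a case run is active
    next_char = None  # "u" or "l" for the next character only

    def emit(text):
        nonlocal next_char
        for ch in text:
            if next_char == "u":
                ch, next_char = ch.upper(), None
            elif next_char == "l":
                ch, next_char = ch.lower(), None
            elif run == "U":
                ch = ch.upper()
            elif run == "L":
                ch = ch.lower()
            out.append(ch)

    i = 0
    while i < len(template):
        ch = template[i]
        if ch == "\\" and i + 1 < len(template):
            code = template[i + 1]
            if code in "0123456789":
                emit(match.group(int(code)) or "")
            elif code in ("u", "l"):
                next_char = code
            elif code in ("U", "L"):
                run = code
            elif code == "E":
                run = None
            else:
                emit(code)  # any other escaped character is kept literally
            i += 2
        else:
            emit(ch)
            i += 1
    return "".join(out)

def re_replace_case(string, pattern, template, scope="one", nocase=False):
    """Hypothetical REReplace stand-in whose template understands case escapes."""
    count = 0 if scope == "all" else 1
    flags = re.IGNORECASE if nocase else 0
    return re.sub(pattern, lambda m: expand_template(template, m), string,
                  count=count, flags=flags)

print(re_replace_case("make upper", "(u.+)", r"\u\1", nocase=True))  # make Upper
print(re_replace_case("make upper", "(u.+)", r"\U\1", nocase=True))  # make UPPER
```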
