Advanced Regular Expressions, or What’s Special about RegEx in MX
Your Presenter
• Michael Dinowitz
• Head of House of Fusion
• Publisher of Fusion Authority
• Founding member of Team Macromedia
• Doing this since June 95
• Called on for the black magic code
Disclaimer & Introduction
• If you don’t know the basics – get out
• No real changes from CF 5 or CFMX 6
Basic additions
• Greedy vs. lazy
    • + is one or more, and as many as it can
    • +? is one or more, but only as many as it needs
    • ++ is the same as greedy but does not allow backtracking (not in CFMX)
• Nested subexpressions
    • Ordered from the outside in
    • Then left to right
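The three quantifier styles and the group ordering can be tried out in a few lines. This is a minimal sketch in Java's java.util.regex rather than CFMX's own engine, so it only illustrates the general behavior; the class name and the two helper methods are made up for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsLazy {
    // Hypothetical helper: first match of regex in input, or null when nothing matches.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    // Hypothetical helper: text captured by group n of the first match, or null.
    static String group(String regex, String input, int n) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(n) : null;
    }

    public static void main(String[] args) {
        String html = "<b>one</b><b>two</b>";
        System.out.println(firstMatch("<.+>", html));   // greedy: runs on to the last '>'
        System.out.println(firstMatch("<.+?>", html));  // lazy: stops at the first '>'
        System.out.println(firstMatch("<.++>", html));  // possessive: .++ gives nothing back, so null

        // Groups are numbered by opening parenthesis: outer first, then left to right.
        String nested = "((a)(b(c)))";
        for (int i = 1; i <= 4; i++) {
            System.out.println(i + " = " + group(nested, "abc", i));
        }
    }
}
```

The possessive form fails here because `.++` swallows the final `>` and never hands it back, which is why it is a tool for speed and strictness rather than a drop-in for `+`.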
Character vs. POSIX classes
• Non-special characters become special
• Uses a backslash (\) to specify being special
• Shorter than POSIX classes
• Harder to ‘read’ for newbies
Basic Character Classes
• \b – word boundary
    • Any jump from alphanumeric to non-alphanumeric (or the reverse)
    • refindnocase('\bbig\b', 'big')
• \B – not a word boundary: any position between two characters of the same ‘type’
    • refindnocase('\B', 'big') = 2
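The two boundary examples can be reproduced outside ColdFusion. The sketch below uses Java's java.util.regex and a made-up helper, findNoCase, that imitates the REFindNoCase return convention (1-based position, 0 for no match); it is an approximation for illustration, not CFMX itself.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordBoundaries {
    // Hypothetical stand-in for REFindNoCase: 1-based position of the first match, 0 if none.
    static int findNoCase(String regex, String input) {
        Matcher m = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(input);
        return m.find() ? m.start() + 1 : 0;
    }

    public static void main(String[] args) {
        System.out.println(findNoCase("\\bbig\\b", "big"));        // whole word at the start
        System.out.println(findNoCase("\\bbig\\b", "a BIG deal")); // whole word in the middle
        System.out.println(findNoCase("\\bbig\\b", "bigger"));     // no boundary after "big"
        System.out.println(findNoCase("\\B", "big"));              // first spot between two letters
    }
}
```

Note that \b and \B match positions, not characters, which is why the \B search reports a position without consuming anything.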
More Character Classes
• \A – same as ^ (but not affected by (?m))
• \Z – same as $ (but not affected by (?m))
• \n – newline
• \r – carriage return
• \t – tab
• \d – any digit ([0-9])
• \D – any non-digit ([^0-9])
More Character Classes
• \w – any alphanumeric character ([[:alnum:]])
• \W – any non-alphanumeric character ([^[:alnum:]])
• \s – any whitespace character, including tab, space, newline, carriage return, and form feed ([ \t\n\r\f])
• \S – any non-whitespace character ([^ \t\n\r\f])
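A quick way to convince yourself of these equivalences is to count matches for the shorthand and for the spelled-out class. The sketch below does that in Java's java.util.regex (an assumption: CFMX's engine may differ in details); the class name and the count helper are invented for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ShorthandClasses {
    // Hypothetical helper: number of non-overlapping matches of regex in input.
    static int count(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String text = "tab\there 42\nline two";
        System.out.println(count("\\d", text) + " vs " + count("[0-9]", text));
        System.out.println(count("\\s", text) + " vs " + count("[ \\t\\n\\r\\f]", text));

        // Under (?m), ^ matches at every line start while \A still matches only once.
        String lines = "a\nb\nc";
        System.out.println(count("(?m)^\\w+", lines));
        System.out.println(count("(?m)\\A\\w+", lines));

        // In java.util.regex, \w also accepts the underscore.
        System.out.println(count("\\w", "_"));
    }
}
```

The last line is worth a second look: in Java the underscore counts as a \w character, so the [[:alnum:]] equivalence is only approximate there. Whether CFMX treats the underscore the same way is not checked here.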
Expression Modifiers
• At beginning of expression
• (?i) Causes expression to be case insensitive (same as NoCase version)
• (?m) Multi-line mode
    • ^ and $ match at the start and end of each line, not just of the entire string
    • Carriage return Chr(13) is not recognized as a line break
Expression Modifiers
• (?x) ignores all white space
    • Also allows usage of ## for comments (the # is doubled because it must be escaped in CFML)
    • ## will comment to end of line
    reFind("(?x) one          ## first option
            |two              ## second option
            |three\ point\ five  ## note escaped spaces
            ", "three point five")
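The three inline modifiers behave much the same in Java's java.util.regex, so they can be sketched there. Assumptions to keep in mind: in Java the comment marker is a single # (no CFML doubling), Java's (?m) also treats a carriage return as a line break unlike the behavior described above, and the class and helper names are invented.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class InlineModifiers {
    // Hypothetical helper: first match of regex in input, or null when nothing matches.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String lines = "one\ntwo\nthree";

        System.out.println(firstMatch("(?i)TWO", lines));   // case-insensitive match
        System.out.println(firstMatch("^two$", lines));     // null: ^ and $ see the whole string
        System.out.println(firstMatch("(?m)^two$", lines)); // matches the middle line

        // (?x): whitespace is ignored and # starts a comment that runs to end of line.
        String verbose =
              "(?x) one               # first option\n"
            + "| two                  # second option\n"
            + "| three\\ point\\ five # note escaped spaces\n";
        System.out.println(firstMatch(verbose, "three point five"));
    }
}
```

Laying a pattern out this way costs nothing at match time and makes long alternations far easier to maintain.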
Group Modifiers
• Affects only the group it’s in
• Must be at beginning of group
• (?##) comment
    • Must escape # (doubled in CFML)
• (?:) does not add group to return collection
• (?=) positive lookahead
• (?!) negative lookahead
Positive Lookahead
• Tests if the text in the parentheses exists
• Does not save the text into return collection
• Does not ‘consume text’
• <a(?=.+href).+?href="([^"]+).+?>
Negative Lookahead
• Tests if the text in the parentheses does not exist
• Does not save the text into return collection
• Does not ‘consume text’
• (<a(?!.+?target) [^>]+>)
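Both lookahead patterns from these slides can be run as they stand. The sketch below feeds them to Java's java.util.regex with sample HTML invented for the example; the class and helper names are hypothetical, and CFMX's engine is assumed, not verified, to behave the same way on these inputs.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Lookahead {
    // Hypothetical helper: group 1 of the first match, or null when nothing matches.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Positive lookahead: only continue if an href appears somewhere after "<a".
        String hrefOf = "<a(?=.+href).+?href=\"([^\"]+).+?>";
        System.out.println(firstGroup(hrefOf, "<a class=\"nav\" href=\"page.cfm\" id=\"home\">"));
        System.out.println(firstGroup(hrefOf, "<a name=\"top\">")); // no href, so null

        // Negative lookahead: only continue if no target follows "<a".
        String noTarget = "(<a(?!.+?target) [^>]+>)";
        String links = "<a href=\"a.cfm\" target=\"_blank\">A</a> <a href=\"b.cfm\">B</a>";
        System.out.println(firstGroup(noTarget, links)); // skips the first link
    }
}
```

One caveat with the negative pattern: `.+?` can look past the closing `>`, so a `target` in a later tag on the same line also blocks the match. Using `[^>]+?` inside the lookahead would confine the test to a single tag.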
Replace conversion
• Used in REReplace()/REReplaceNoCase()
• Either converts the ‘next’ character or a specific section of characters
• \u – converts the next character to uppercase
• \l – converts the next character to lowercase
• \U…\E – converts block to uppercase
• \L…\E – converts block to lowercase
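Java's java.util.regex has no \u or \U…\E in replacement strings, so the sketch below only emulates the effect with a replacement function (this needs Java 9 or later). It is an illustration of what the conversions do, not the CFMX syntax; the class and method names are invented.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaseConversion {
    // Emulates a \u\1 replacement: uppercase only the captured first letter of each word.
    static String capitalizeWords(String s) {
        return Pattern.compile("\\b(\\w)").matcher(s)
                .replaceAll(m -> Matcher.quoteReplacement(m.group(1).toUpperCase(Locale.ROOT)));
    }

    // Emulates a \U\1\E replacement: uppercase the whole captured block inside <b>...</b>.
    static String shoutBold(String s) {
        return Pattern.compile("<b>(.*?)</b>").matcher(s)
                .replaceAll(m -> Matcher.quoteReplacement(m.group(1).toUpperCase(Locale.ROOT)));
    }

    public static void main(String[] args) {
        System.out.println(capitalizeWords("hello world"));
        System.out.println(shoutBold("say <b>loud</b> now"));
    }
}
```

The single-character forms (\u, \l) suit title-casing; the block forms (\U…\E, \L…\E) suit normalizing a whole captured value.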
Not Supported
• Positive lookbehinds
• Negative lookbehinds
• Other features
• All accessible through the Java RegEx engine
• Massimo has a CFC pre-built to do this
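For a sense of what the missing lookbehinds offer, here is a minimal sketch written directly against Java's java.util.regex. The sample text, class name, and helper are made up for illustration, and this is not the CFC mentioned above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Lookbehind {
    // Hypothetical helper: first match of regex in input, or null when nothing matches.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String text = "cost $45 or 30";

        // Positive lookbehind: digits that are immediately preceded by a dollar sign.
        System.out.println(firstMatch("(?<=\\$)\\d+", text));

        // Negative lookbehind: a whole number that is not preceded by a dollar sign.
        System.out.println(firstMatch("(?<!\\$)\\b\\d+", text));
    }
}
```

Like lookaheads, lookbehinds test a condition without consuming text, so the dollar sign never becomes part of the match.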
Resources
• Chapters in most CFMX books
• CF-RegEx mailing list
• This presentation
• Books:
    • Mastering Regular Expressions, 2nd Edition
    • Teach Yourself Regular Expressions in 10 Minutes
    • Java Regular Expressions: Taming the java.util.regex Engine