Basic Regular Expressions
By Michael Dinowitz
04/10/2005
Definitions
• String - Any collection of 0 or more characters.
  • Example: “This is a String”
• SubString - A segment of a String.
  • Example: “is a”
• Case Sensitivity - Whether a comparison distinguishes between upper and lower case characters.
Simple Task
• Find the word “Name” inside a string:
• <CFSET String="My name is Michael Dinowitz">
• <CFOUTPUT>
• Position=#Find('Name', String)#
• </CFOUTPUT>
• Position=0
Simple Task
• Find the word “name” inside a string (Find() is case sensitive, so the case must match):
• <CFSET String="My name is Michael Dinowitz">
• <CFOUTPUT>
• Position=#Find('name', String)#
• </CFOUTPUT>
• Position=4
Simple Task
• Find the word “Name” inside a string:
• <CFSET String="My name is Michael Dinowitz">
• <CFOUTPUT>
• Position=#FindNoCase('Name', String)#
• </CFOUTPUT>
• Position=4
Simple Task
• Find the word “Name” inside a string using Regular Expressions:
• <CFSET String="My name is Michael Dinowitz">
• <CFOUTPUT>
• Position=#REFindNoCase('Name', String)#
• </CFOUTPUT>
• Position=4
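The four searches can be compared side by side. This is a minimal sketch that assumes the same sample string; the variable name `str` is only illustrative, and REFind() (the case sensitive regular expression function) is included for completeness.

```cfml
<!--- Same search term, four functions; only the NoCase versions ignore case. --->
<cfset str = "My name is Michael Dinowitz">
<cfoutput>
Find=#Find("Name", str)#<br>
FindNoCase=#FindNoCase("Name", str)#<br>
REFind=#REFind("Name", str)#<br>
REFindNoCase=#REFindNoCase("Name", str)#
</cfoutput>
<!--- Output: Find=0, FindNoCase=4, REFind=0, REFindNoCase=4 --->
```

Note the argument order: all four functions take the text (or pattern) to look for first and the string to search second.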
Intro to Regular Expressions
• Referred to as RegEx
• Matches patterns of characters
• Used in many languages (ColdFusion, Perl, JavaScript, etc.)
• Uses a small syntax library to do ‘dynamic’ matches
• Can be used for Search and/or Replace actions
• Slightly slower than the similar Find() and Replace() functions
• Has both a case sensitive and a case insensitive version of each function:
  • REFind()
  • REFindNoCase()
  • REReplace()
  • REReplaceNoCase()
RegEx Basics
• Rule 1: A character matches itself as long as it is not a control character (a character with a special meaning in RegEx).
• Example:
  • A=“A”
  • A=“a” (case insensitive)
• <CFSET String="My name is Michael Dinowitz">
• <CFOUTPUT>
• Position=#REFindNoCase('n', String)#
• </CFOUTPUT>
• Position=4
RegEx Basics
• Rule 1a: A search will return the first successful match. To get a different match, set the start position (the optional third argument of the function).
• <CFSET String="My name is Michael Dinowitz">
• <CFOUTPUT>
• Position1=#REFindNoCase('M', String)#
• Position2=#REFindNoCase('M', String, 2)#
• </CFOUTPUT>
• Position1=1
• Position2=6 (the search is case insensitive, so the “m” in “name” matches before the “M” in “Michael” at position 12)
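Moving the start position forward after each hit is one way to collect every match, not just the first. The loop below is a sketch of that idea; the variable names `str`, `pos` and `positions` are illustrative and not part of the original slides.

```cfml
<!--- Collect the position of every "m" (any case) by restarting
      the search one character after the previous match. --->
<cfset str = "My name is Michael Dinowitz">
<cfset positions = "">
<cfset pos = REFindNoCase("m", str, 1)>
<cfloop condition="pos GT 0">
  <cfset positions = ListAppend(positions, pos)>
  <cfset pos = REFindNoCase("m", str, pos + 1)>
</cfloop>
<cfoutput>Positions=#positions#</cfoutput>
<!--- Output: Positions=1,6,12 --->
```

The loop ends when the function returns 0, which is how the find functions report that there is no (further) match.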
RegEx Basics
• Rule 2: A collection of non-control characters matches another collection of non-control characters.
  • AA=“AA”
  • AA!=“Aa” (case sensitive)
  • AA=“Aa” (case insensitive)
  • A A=“A A” (notice the space)
• <CFSET String="My name is Michael Dinowitz">
• <CFOUTPUT>
• Position=#REFindNoCase('y n', String)#
• </CFOUTPUT>
• Position=2
RegEx Basics
• Rule 3: A period (.) is a control character that matches ANY single character.
• Example:
  • . = “A”
  • A. = “Ac”
  • A.A = “A A”
• <CFSET String="My name is Michael Dinowitz">
• <CFOUTPUT>
• Position=#REFindNoCase('N.me', String)#
• </CFOUTPUT>
• Position=4
RegEx Basics
• Rule 4: A control character can be ‘escaped’ by placing a backslash (\) before it. This causes the control character to match the literal text version of itself.
• Example:
  • . = “.” (an unescaped period matches any character, including a period)
  • \. = “.” (an escaped period matches only a period)
  • A\.A = “A.A”
• <CFSET String="My name is Michael Dinowitz.">
• <CFOUTPUT>
• Position1=#REFindNoCase('tz\.', String)#
• </CFOUTPUT>
• Position1=26
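The difference between an escaped and an unescaped period shows up when the string contains something other than a period at that spot. A short sketch, using made-up sample strings:

```cfml
<!--- Unescaped: the period matches any character, so "3x14" is found. --->
<!--- Escaped: only a literal period will do. --->
<cfoutput>
Unescaped=#REFind("3.14", "3x14")#<br>
Escaped=#REFind("3\.14", "3x14")#<br>
EscapedHit=#REFind("3\.14", "pi is 3.14")#
</cfoutput>
<!--- Output: Unescaped=1, Escaped=0, EscapedHit=7 --->
```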
RegEx Anchoring
• Rule 5a: Using the caret (^) makes sure the text you’re searching for is at the start of the string.
• Example:
  • ^A = “A”
  • ^M != “AM”
• <CFSET String="My name is Michael Dinowitz.">
• <CFOUTPUT>
• Position1=#REFindNoCase('^My', String)#
• Position2=#REFindNoCase('^is', String)#
• </CFOUTPUT>
• Position1=1
• Position2=0
RegEx Anchoring
• Rule 5b: Using the dollar sign ($) makes sure the text you’re searching for is at the end of the string.
• Example:
  • A$ = “A”
  • M$ = “MAM” (the second M is the one matched)
• <CFSET String="My name is Michael Dinowitz.">
• <CFOUTPUT>
• Position1=#REFindNoCase('\.$', String)#
• </CFOUTPUT>
• Position1=28
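Using both anchors in one pattern requires the whole string to match, which is a common way to validate a value. A minimal sketch with illustrative sample strings:

```cfml
<!--- ^ and $ together: the pattern must cover the string from start to end. --->
<cfoutput>
Exact=#REFind("^Michael$", "Michael")#<br>
AtEndOnly=#REFind("^Michael$", "My name is Michael")#<br>
AtStartOnly=#REFind("^Michael$", "Michael Dinowitz")#
</cfoutput>
<!--- Output: Exact=1, AtEndOnly=0, AtStartOnly=0 --->
```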
RegEx Ranges
• Rule 6: When looking for one of a group of characters, place them inside square brackets ([]).
• Example:
  • ‘[abc]’ will match either a, b, or c.
  • ‘[.+$^]’ will match either a period (.), a plus (+), a dollar sign ($) or a caret (^). Note that most special characters are treated as plain text within square brackets; the exceptions are a caret in the first position and the dash (see Rules 7a and 7b).
• <CFSET String="My name is Michael Dinowitz.">
• <CFOUTPUT>
• Position1=#REFindNoCase('M[aeiou]', String)#
• </CFOUTPUT>
• Position1=6
RegEx Ranges
• Rule 7a: A caret (^), when used within square brackets ([]), has the effect of saying ‘NOT these characters’. It must be the first character inside the brackets for this to work.
• Example:
  • ‘[^abc]’ will match ANY character other than a, b, or c.
• <CFSET String="My name is Michael Dinowitz.">
• <CFOUTPUT>
• Position1=#REFindNoCase('M[^aeiou]', String)#
• </CFOUTPUT>
• Position1=1
RegEx Ranges
• Rule 7b: A dash (-), when used within square brackets ([]), has the effect of saying ‘all characters from the first character through the last’.
• Example:
  • ‘[a-e]’ will match ANY single character from a through e (a, b, c, d or e).
• <CFSET String="My name is Michael Dinowitz.">
• <CFOUTPUT>
• Position1=#REFindNoCase('M[a-m]', String)#
• </CFOUTPUT>
• Position1=6
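Ranges can sit next to each other in a pattern, and a negated set can contain a range. The sketch below uses the case sensitive REFind() so that upper and lower case ranges stay distinct; the sample strings are made up for illustration.

```cfml
<cfoutput>
<!--- An upper case letter followed by a lower case letter: the "Mi" in "Michael". --->
Capitalized=#REFind("[A-Z][a-z]", "my name is Michael")#<br>
<!--- The first character that is neither a lower case letter nor a space: the "4". --->
NotLowerOrSpace=#REFind("[^a-z ]", "room 42b")#
</cfoutput>
<!--- Output: Capitalized=12, NotLowerOrSpace=6 --->
```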
RegEx Ranges
• Rule 8: ColdFusion has a series of pre-built character ranges. These are referenced as [[:range name:]].
• Example:
  • [[:digit:]] - same as [0-9] (all digits)
  • [[:alpha:]] - same as [A-Za-z] (all letters of both cases)
• <CFSET String="My name is Michael Dinowitz.">
• <CFOUTPUT>
• Position1=#REFindNoCase('[[:space:]]', String)#
• </CFOUTPUT>
• Position1=3
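A few more of the pre-built ranges in action. This is a sketch with illustrative sample strings; [[:punct:]] (punctuation) is a range name the slides do not mention but that ColdFusion also provides.

```cfml
<cfoutput>
<!--- First digit in the string. --->
Digit=#REFind("[[:digit:]]", "Order 66 shipped")#<br>
<!--- First letter in the string. --->
Alpha=#REFind("[[:alpha:]]", "42 is the answer")#<br>
<!--- First punctuation character: the closing period. --->
Punct=#REFind("[[:punct:]]", "My name is Michael Dinowitz.")#
</cfoutput>
<!--- Output: Digit=7, Alpha=4, Punct=28 --->
```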
RegEx Multipliers
• Any character or character class can be assigned a multiplier that defines how many times the character or class may appear. These multipliers can say that a character must exist, is optional, may exist for a certain minimum or maximum number of times, etc.
• Multiplier characters include:
  • Plus (+) - One or more
  • Asterisk (*) - 0 or more
  • Question Mark (?) - 0 or 1 (may or may not exist, but only once)
  • Curly Brackets ({}) - A specific range of occurrences
RegEx Multipliers
• The Plus (+) multiplier specifies that the character or character group must exist but can exist more than once.
• Example:
  • A+ - A followed by any number of additional A’s
  • [[:digit:]]+ - A digit (0-9) followed by any number of additional digits
• <CFSET String="Mississippi is a hard word.">
• <CFOUTPUT>
• Position1=#REFindNoCase('is+i', String)#
• </CFOUTPUT>
• Position1=2
RegEx Multipliers
• The Asterisk (*) multiplier specifies that the character or character group may or may not exist, and can exist more than once (i.e. 0 or more).
• Example:
  • A* - Either no A or an A followed by any number of additional A’s
  • [[:digit:]]* - Either no digit (0-9) or a digit followed by any number of additional digits
• <CFSET String="Mississippi is a hard word.">
• <CFOUTPUT>
• Position1=#REFindNoCase('si*s', String)#
• </CFOUTPUT>
• Position1=3
RegEx Multipliers
• The Question mark (?) multiplier specifies that the character or character group may or may not exist, but only once.
• Example:
  • A? - Either one A or no A
  • [[:digit:]]? - One or no digits (0-9)
• <CFSET String="Mississippi is a hard word.">
• <CFOUTPUT>
• Position1=#REFindNoCase('p?i', String)#
• </CFOUTPUT>
• Position1=2
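An optional character is handy for matching spelling variants with a single pattern. A minimal sketch; the sample sentences are illustrative:

```cfml
<!--- The "u" is optional, so both spellings match at the same position. --->
<cfoutput>
American=#REFindNoCase("colou?r", "What color is it?")#<br>
British=#REFindNoCase("colou?r", "What colour is it?")#
</cfoutput>
<!--- Output: American=6, British=6 --->
```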
RegEx Multipliers
• Curly brackets ({}) can be used to specify a minimum and maximum number of times for a character to appear. The format is {min,max} (no space after the comma).
• Example:
  • A{2,4} - 2 A’s or more, but no more than 4.
  • [[:digit:]]{1,6} - 1 digit (0-9) or more, but no more than 6.
• <CFSET String="Mississippi is a hard word.">
• <CFOUTPUT>
• Position1=#REFindNoCase('s{2,3}', String)#
• </CFOUTPUT>
• Position1=3
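Combined with the anchors from Rules 5a and 5b, a {min,max} multiplier can check the length of an entire value. The sketch below assumes a made-up requirement of a code that is 3 to 5 digits long:

```cfml
<!--- The whole string must consist of 3 to 5 digits and nothing else. --->
<cfset pattern = "^[[:digit:]]{3,5}$">
<cfoutput>
TooShort=#REFind(pattern, "12")#<br>
JustRight=#REFind(pattern, "1234")#<br>
TooLong=#REFind(pattern, "123456")#
</cfoutput>
<!--- Output: TooShort=0, JustRight=1, TooLong=0 --->
```

Without the anchors, “123456” would match, because the pattern would be satisfied by the first five digits.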
RegEx SubExpressions
• SubExpressions are a way of grouping characters together. This allows us to reference the entire group at once. To group characters together, place them within parentheses ().
• Example:
  • (Name) = name
  • (Name)+ = name, namename, or basically one or more names.
• <CFSET String="Mississippi is a hard word.">
• <CFOUTPUT>
• Position1=#REFindNoCase('(iss)+', String)#
• </CFOUTPUT>
• Position1=2
RegEx SubExpressions
• An additional special character that is usable within a subExpression is the pipe (|). It means ‘match either the text on one side of the pipe or the text on the other’; more than two alternatives can be listed.
• Example:
  • (Na|me) = na or me
  • (Name|Date) = name or date
• <CFSET String="Mississippi is a hard word.">
• <CFOUTPUT>
• Position1=#REFindNoCase('(hard|word)', String)#
• </CFOUTPUT>
• Position1=18
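Placing the pipe inside a subExpression limits the choice to that group, so the text around the parentheses is shared by both alternatives. A short sketch with illustrative sample strings:

```cfml
<!--- "gr", then either "a" or "e", then "y". --->
<cfoutput>
Grey=#REFindNoCase("gr(a|e)y", "The grey cat")#<br>
Gray=#REFindNoCase("gr(a|e)y", "A gray day")#
</cfoutput>
<!--- Output: Grey=5, Gray=3 --->
```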
RegEx SubExpressions
• SubExpressions allow us to do something else that’s special: back referencing. This is the ability to reference the text matched by one or more groups directly. It is done by using the backslash (\) followed by a number that specifies which subexpression we want.
• Example:
  • (name)\1 = namename
  • (Name|Date)\1 = namename or datedate
• <CFSET String="Mississippi is is a hard word.">
• <CFOUTPUT>
• Position1=#REFindNoCase('(is )\1', String)#
• </CFOUTPUT>
• Position1=13
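The find functions can also report where each subexpression matched. The slides do not cover this, so treat the following as a sketch: it relies on the optional fourth argument (returnsubexpressions), which makes the function return a structure holding `pos` and `len` arrays instead of a single position. The variable names `str` and `match` are illustrative.

```cfml
<cfset str = "Mississippi is is a hard word.">
<!--- Fourth argument true: return a structure with pos and len arrays. --->
<cfset match = REFindNoCase("(is )\1", str, 1, true)>
<cfoutput>
<!--- Element 1 describes the whole match. --->
WholePos=#match.pos[1]#, WholeLen=#match.len[1]#<br>
<!--- Element 2 describes the first subexpression. --->
GroupPos=#match.pos[2]#, GroupLen=#match.len[2]#<br>
Text=[#Mid(str, match.pos[1], match.len[1])#]
</cfoutput>
<!--- Output: WholePos=13, WholeLen=6, GroupPos=13, GroupLen=3, Text=[is is ] --->
```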
REReplace
• The REReplace() and REReplaceNoCase() functions use everything you’ve learned about searching and allow you to ‘work’ with the search results, i.e. replace them with something. By default only the first match is replaced; pass ‘all’ as the fourth argument to replace every match.
• Example:
• <CFSET String="Mississippi is a hard word.">
• <CFOUTPUT>
• Result1=#REReplaceNoCase(String, 'iss', 'emm')#
• Result2=#REReplaceNoCase(String, 'iss', 'emm', 'all')#
• </CFOUTPUT>
• Result1=Memmissippi is a hard word.
• Result2=Memmemmippi is a hard word.
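Back references can also be used in the replacement text, which makes it possible to rearrange what was matched. A minimal sketch, assuming a made-up “Last, First” input format; the variable name `fullName` is illustrative.

```cfml
<!--- Capture the two words around the comma, then write them back swapped. --->
<cfset fullName = "Dinowitz, Michael">
<cfoutput>
Result=#REReplace(fullName, "([[:alpha:]]+), ([[:alpha:]]+)", "\2 \1")#
</cfoutput>
<!--- Output: Result=Michael Dinowitz --->
```

Here \1 is the text matched by the first pair of parentheses and \2 the text matched by the second.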