Basic Regular Expressions

By Michael Dinowitz 04/10/2005

Presentation Transcript


  1. Basic Regular Expressions By Michael Dinowitz 04/10/2005

  2. Definitions • String - Any sequence of 0 or more characters. • Example: • "This is a String" • SubString - A contiguous segment of a String • Example: • "is a" • Case Sensitivity - Whether a comparison treats the upper-case and lower-case versions of a character as different.

  3. Simple Task • Find the word "Name" inside a string: • <CFSET String="My name is Michael Dinowitz"> • <CFOUTPUT> • Position=#Find('Name', String)# • </CFOUTPUT> • Position=0 (Find() is case sensitive, so "Name" does not match "name")

  4. Simple Task • Find the word "name" inside a string: • <CFSET String="My name is Michael Dinowitz"> • <CFOUTPUT> • Position=#Find('name', String)# • </CFOUTPUT> • Position=4

  5. Simple Task • Find the word "Name" inside a string, ignoring case: • <CFSET String="My name is Michael Dinowitz"> • <CFOUTPUT> • Position=#FindNoCase('Name', String)# • </CFOUTPUT> • Position=4

  6. Simple Task • Find the word "Name" inside a string using Regular Expressions: • <CFSET String="My name is Michael Dinowitz"> • <CFOUTPUT> • Position=#REFindNoCase('Name', String)# • </CFOUTPUT> • Position=4
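The four lookups on slides 3 to 6 can be put side by side. This is a minimal sketch, assuming a ColdFusion-compatible CFML engine; the variable name Str is ours and is not from the slides.

```cfml
<!--- Compare the plain and the regular expression find functions on one string. --->
<CFSET Str = "My name is Michael Dinowitz">
<CFOUTPUT>
Find=#Find("Name", Str)#<br>                 <!--- 0: case sensitive, "Name" is not present --->
FindNoCase=#FindNoCase("Name", Str)#<br>     <!--- 4 --->
REFind=#REFind("Name", Str)#<br>             <!--- 0: the RegEx version is case sensitive too --->
REFindNoCase=#REFindNoCase("Name", Str)#     <!--- 4 --->
</CFOUTPUT>
```

All four functions return a 1-based position, and 0 when nothing is found.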

  7. Intro to Regular Expressions • Referred to as RegEx • Matches patterns of characters • Used in many languages (ColdFusion, Perl, JavaScript, etc.) • Uses a small syntax of special characters to do 'dynamic' matches • Can be used for Search and/or Replace actions • Slightly slower than the similar Find() and Replace() functions • Each operation has both a case-sensitive and a case-insensitive function • REFind() • REFindNoCase() • REReplace() • REReplaceNoCase()

  8. RegEx Basics • Rule 1: A character matches itself as long as it is not a control character. • Example: • A="A" • A="a" (non-case sensitive) • <CFSET String="My name is Michael Dinowitz"> • <CFOUTPUT> • Position=#REFindNoCase('n', String)# • </CFOUTPUT> • Position=4

  9. RegEx Basics • Rule 1a: A search returns the first successful match. To get a later match, set the start position (the optional third argument of the function). • <CFSET String="My name is Michael Dinowitz"> • <CFOUTPUT> • Position1=#REFindNoCase('M', String)# • Position2=#REFindNoCase('M', String, 2)# • </CFOUTPUT> • Position1=1 • Position2=6 (the "m" in "name" matches because the search ignores case; the case-sensitive REFind('M', String, 2) returns 12)
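Rule 1a can be extended to collect every match by moving the start position forward after each hit. The loop below is a sketch only; the variable names Str, Start, Pos and Positions are ours.

```cfml
<!--- List every position at which "M" or "m" occurs, using the start argument. --->
<CFSET Str = "My name is Michael Dinowitz">
<CFSET Start = 1>
<CFSET Positions = "">
<CFLOOP condition="Start LTE Len(Str)">
    <CFSET Pos = REFindNoCase("M", Str, Start)>
    <!--- A result of 0 means there are no further matches. --->
    <CFIF Pos EQ 0><CFBREAK></CFIF>
    <CFSET Positions = ListAppend(Positions, Pos)>
    <!--- Resume the search one character after the match just found. --->
    <CFSET Start = Pos + 1>
</CFLOOP>
<CFOUTPUT>Positions=#Positions#</CFOUTPUT>
<!--- Positions=1,6,12 --->
```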

  10. RegEx Basics • Rule 2: A sequence of non-control characters matches the same sequence of characters in the string. • AA="AA" • AA!="Aa" (case sensitive) • AA="Aa" (non-case sensitive) • A A="A A" (notice the space) • <CFSET String="My name is Michael Dinowitz"> • <CFOUTPUT> • Position=#REFindNoCase('y n', String)# • </CFOUTPUT> • Position=2

  11. RegEx Basics • Rule 3: A period (.) is a control character that matches ANY single character. • Example: • . = "A" • A. = "Ac" • A.A="A A" • <CFSET String="My name is Michael Dinowitz"> • <CFOUTPUT> • Position=#REFindNoCase('N.me', String)# • </CFOUTPUT> • Position=4

  12. RegEx Basics • Rule 4: A control character can be 'escaped' by placing a backslash (\) before it. The control character then matches only itself, as literal text. • Example: • . = "." (and any other character) • \. = "." (only a period) • A\.A = "A.A" • <CFSET String="My name is Michael Dinowitz."> • <CFOUTPUT> • Position1=#REFindNoCase('tz\.', String)# • </CFOUTPUT> • Position1=26
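The difference between a bare period and an escaped one shows up clearly when both are run against the same string. A small sketch, with the variable name Str being ours:

```cfml
<CFSET Str = "My name is Michael Dinowitz.">
<CFOUTPUT>
Unescaped=#REFind(".", Str)#<br>   <!--- 1: a bare period matches the first character, "M" --->
Escaped=#REFind("\.", Str)#        <!--- 28: only the literal period at the end matches --->
</CFOUTPUT>
```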

  13. RegEx Anchoring • Rule 5a: Using the caret (^) makes sure the text you're searching for is at the start of the string. • Example: • ^A = "A" • ^M != "AM" • <CFSET String="My name is Michael Dinowitz."> • <CFOUTPUT> • Position1=#REFindNoCase('^My', String)# • Position2=#REFindNoCase('^is', String)# • </CFOUTPUT> • Position1=1 • Position2=0

  14. RegEx Anchoring • Rule 5b: Using the dollar sign ($) makes sure the text you're searching for is at the end of the string. • Example: • A$ = "A" • M$ = "MAM" (the second M is the one matched) • <CFSET String="My name is Michael Dinowitz."> • <CFOUTPUT> • Position1=#REFindNoCase('\.$', String)# • </CFOUTPUT> • Position1=28
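Using both anchors in one pattern means the pattern has to account for the whole string, which is a common way to test an exact value. A brief sketch, with Str being our own variable name:

```cfml
<CFSET Str = "My name is Michael Dinowitz.">
<CFOUTPUT>
StartOnly=#REFind("^My", Str)#<br>    <!--- 1: the string starts with "My" --->
WholeLong=#REFind("^My$", Str)#<br>   <!--- 0: the string is longer than "My" --->
WholeShort=#REFind("^My$", "My")#     <!--- 1: the string is exactly "My" --->
</CFOUTPUT>
```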

  15. RegEx Ranges • Rule 6: When looking for one of a group of characters, place them inside square brackets ([]). • Example: • '[abc]' will match either a, b, or c. • '[.+$^]' will match either a period (.), a plus (+), a dollar sign ($) or a caret (^). Note that most special characters lose their special meaning within square brackets, so they do not need to be escaped there. • <CFSET String="My name is Michael Dinowitz."> • <CFOUTPUT> • Position1=#REFindNoCase('M[aeiou]', String)# • </CFOUTPUT> • Position1=6

  16. RegEx Ranges • Rule 7a: A caret (^), when used within square brackets ([]), has the effect of saying 'NOT these characters'. It must be the first character inside the brackets for this to work. • Example: • '[^abc]' will match ANY character other than a, b, or c. • <CFSET String="My name is Michael Dinowitz."> • <CFOUTPUT> • Position1=#REFindNoCase('M[^aeiou]', String)# • </CFOUTPUT> • Position1=1

  17. RegEx Ranges • Rule 7b: A dash (-), when used within square brackets ([]), has the effect of saying 'all characters from the first character through the last'. • Example: • '[a-e]' will match ANY character from a through e. • <CFSET String="My name is Michael Dinowitz."> • <CFOUTPUT> • Position1=#REFindNoCase('M[a-m]', String)# • </CFOUTPUT> • Position1=6
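Rules 6, 7a and 7b can be combined in a single pattern. The sketch below uses the case-sensitive REFind() so that the upper-case and lower-case ranges stay distinct; Str is our own variable name.

```cfml
<CFSET Str = "My name is Michael Dinowitz.">
<CFOUTPUT>
<!--- An upper-case letter followed by a lower-case one, searching from position 2. --->
Capital=#REFind("[A-Z][a-z]", Str, 2)#<br>   <!--- 12: the "Mi" in "Michael" --->
<!--- The first character that is neither a letter nor a space. --->
Other=#REFind("[^A-Za-z ]", Str)#            <!--- 28: the closing period --->
</CFOUTPUT>
```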

  18. RegEx Ranges • Rule 8: ColdFusion has a series of pre-built character ranges. These are referenced as [[:range name:]]. • Example: • [[:digit:]] - same as [0-9] (all digits) • [[:alpha:]] - same as [A-Za-z] (all letters of both cases) • <CFSET String="My name is Michael Dinowitz."> • <CFOUTPUT> • Position1=#REFindNoCase('[[:space:]]', String)# • </CFOUTPUT> • Position1=3
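The pre-built ranges read more clearly on a string that mixes letters, digits and punctuation. The sample string below is hypothetical and chosen only for illustration; Str is our own variable name.

```cfml
<!--- Hypothetical sample text, not from the slides. --->
<CFSET Str = "Order 66 shipped.">
<CFOUTPUT>
Digit=#REFind("[[:digit:]]", Str)#<br>   <!--- 7: the first "6" --->
Space=#REFind("[[:space:]]", Str)#<br>   <!--- 6: the space after "Order" --->
Punct=#REFind("[[:punct:]]", Str)#       <!--- 17: the closing period --->
</CFOUTPUT>
```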

  19. RegEx Character Classes

  20. RegEx Multipliers • Any character or character class can be given a multiplier that defines how many times the character or class may occur. These multipliers can say that a character must exist, is optional, may exist for a certain minimum or maximum number of times, etc. • Multiplier characters include: • Plus (+) One or more • Asterisk (*) 0 or more • Question Mark (?) 0 or 1 • Curly Brackets ({}) A specific range of occurrences

  21. RegEx Multipliers • The Plus (+) multiplier specifies that the character or character group must exist but can exist more than once. • Example: • A+ - A followed by any number of additional A's • [[:digit:]]+ - A digit (0-9) followed by any number of additional digits • <CFSET String="Mississippi is a hard word."> • <CFOUTPUT> • Position1=#REFindNoCase('is+i', String)# • </CFOUTPUT> • Position1=2

  22. RegEx Multipliers • The Asterisk (*) multiplier specifies that the character or character group may or may not exist, and can exist more than once (i.e. 0 or more). • Example: • A* - Either no A or an A followed by any number of additional A's • [[:digit:]]* - Either no digit (0-9) or a digit followed by any number of additional digits • <CFSET String="Mississippi is a hard word."> • <CFOUTPUT> • Position1=#REFindNoCase('si*s', String)# • </CFOUTPUT> • Position1=3

  23. RegEx Multipliers • The Question mark (?) multiplier specifies that the character or character group may or may not exist, but at most once. • Example: • A? - Either one A or no A • [[:digit:]]? - One digit (0-9) or none • <CFSET String="Mississippi is a hard word."> • <CFOUTPUT> • Position1=#REFindNoCase('p?i', String)# • </CFOUTPUT> • Position1=2

  24. RegEx Multipliers • Curly brackets ({}) can be used to specify a minimum and maximum number of times a character may appear. The format is {min,max} • Example: • A{2,4} - 2 As or more but no more than 4. • [[:digit:]]{1,6} - 1 digit (0-9) or more, but no more than 6. • <CFSET String="Mississippi is a hard word."> • <CFOUTPUT> • Position1=#REFindNoCase('s{2,3}', String)# • </CFOUTPUT> • Position1=3
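A position alone does not show how much text a multiplier consumed. The sketch below passes true as the optional fourth argument of REFindNoCase(), which makes it return a structure holding pos and len arrays instead of a number; the variable names Str and Match are ours.

```cfml
<CFSET Str = "Mississippi is a hard word.">
<!--- Ask for the match position and length rather than the position only. --->
<CFSET Match = REFindNoCase("is+i", Str, 1, true)>
<!--- pos[1] is 0 when nothing matched, so guard before calling Mid(). --->
<CFIF Match.pos[1] GT 0>
    <CFOUTPUT>
    Position=#Match.pos[1]#<br>                        <!--- 2 --->
    Length=#Match.len[1]#<br>                          <!--- 4 --->
    Text=#Mid(Str, Match.pos[1], Match.len[1])#        <!--- issi --->
    </CFOUTPUT>
</CFIF>
```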

  25. RegEx SubExpressions • SubExpressions are a way of grouping characters together. This allows us to reference the entire group at once. To group characters together, place them within parentheses (). • Example: • (Name) = name • (Name)+ = name, namename, or basically one or more names. • <CFSET String="Mississippi is a hard word."> • <CFOUTPUT> • Position1=#REFindNoCase('(iss)+', String)# • </CFOUTPUT> • Position1=2

  26. RegEx SubExpressions • An additional special character that is usable within a subExpression is the pipe (|). This means either the first group of text or the second (or any further alternative). • Example: • (Na|me) = na or me • (Name|Date) = name or date • <CFSET String="Mississippi is a hard word."> • <CFOUTPUT> • Position1=#REFindNoCase('(hard|word)', String)# • </CFOUTPUT> • Position1=18

  27. RegEx SubExpressions • SubExpressions allow us to do something else that's special: back referencing. This is the ability to refer back to the text that an earlier group matched. This is done by using the backslash (\) followed by a number that specifies which subexpression we want. • Example: • (name)\1 = namename • (Name|Date)\1 = namename or datedate • <CFSET String="Mississippi is is a hard word."> • <CFOUTPUT> • Position1=#REFindNoCase('(is )\1', String)# • </CFOUTPUT> • Position1=13
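When true is passed as the optional fourth argument, REFindNoCase() also reports where each subexpression matched: entry 1 of the pos and len arrays describes the whole match and each later entry describes one parenthesized group. A sketch, with the variable names Str, Match and i being ours:

```cfml
<CFSET Str = "Mississippi is is a hard word.">
<CFSET Match = REFindNoCase("(is )\1", Str, 1, true)>
<CFOUTPUT>
<CFLOOP index="i" from="1" to="#ArrayLen(Match.pos)#">
    Entry #i#: position #Match.pos[i]#, length #Match.len[i]#<br>
</CFLOOP>
</CFOUTPUT>
<!--- Entry 1 is the whole match "is is " (position 13, length 6). --->
<!--- Entry 2 is the group (is ) on its own (position 13, length 3). --->
```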

  28. REReplace • The REReplace() and REReplaceNoCase() functions use everything you've learned about searching and allow you to 'work' with the search results, i.e. replace them with something. By default only the first match is replaced; pass 'all' as the fourth argument to replace every match. • Example: • <CFSET String="Mississippi is a hard word."> • <CFOUTPUT> • Result1=#REReplaceNoCase(String, 'iss', 'emm')# • Result2=#REReplaceNoCase(String, 'iss', 'emm', 'all')# • </CFOUTPUT> • Result1=Memmissippi is a hard word. • Result2=Memmemmippi is a hard word.
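Back references also work in the replacement text, which makes it possible to keep part of what was matched. The sketch below collapses a doubled word into a single copy; the variable names Str and Fixed are ours.

```cfml
<CFSET Str = "Mississippi is is a hard word.">
<!--- (is )\1 matches "is is "; the replacement \1 keeps one copy of the group. --->
<CFSET Fixed = REReplaceNoCase(Str, "(is )\1", "\1", "all")>
<CFOUTPUT>Result=#Fixed#</CFOUTPUT>
<!--- Result=Mississippi is a hard word. --->
```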