Regular Expressions
Dr. Ralph D. Westfall
May, 2011
RegEx Object
• what does it do?
  • finds patterns in text
    • specific strings (all characters identified)
    • strings with "wild cards" (any character)
    • strings with certain characters, not others
• how could it be used in VB.NET?
  • validating inputs on an input form, e.g., email addresses, telephone #s, etc.
  • searching the content of a large database
Regular Expressions
• based on the idea of wild cards
  • e.g., dir s*.doc in DOS/Windows finds any .doc file whose name starts with s, followed by any other character(s)
• can find much more complicated patterns
  • e.g., any of a specified list of characters rather than just one or all possibilities
  • e.g., any character except ones in a list
Regular Expression Object
• put the Imports statement at the top of the code, just below Option Strict On
    Imports System.Text.RegularExpressions
• create a regular expression object
    Dim reg As Regex
    reg = New Regex("[pattern]") '[pattern] is a placeholder for the actual pattern text
• need to create a Regex with a pattern (can't add one later)
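The steps on this slide can be sketched as a small console program. This is only an illustration: the module name, the function name MakeZipRegex, and the five-digit pattern are made up for the example and are not from the slides.

```vbnet
Option Strict On
Imports System
Imports System.Text.RegularExpressions

Module RegexCreationDemo
    ' Hypothetical helper: builds a Regex whose pattern is five digits in a row.
    Function MakeZipRegex() As Regex
        Dim reg As Regex
        reg = New Regex("[0-9]{5}")   ' the pattern is supplied when the object is created
        Return reg
    End Function

    Sub Main()
        Dim reg As Regex = MakeZipRegex()
        Console.WriteLine(reg.ToString())        ' ToString returns the pattern text
        Console.WriteLine(reg.IsMatch("90702"))  ' True: five digits were found
    End Sub
End Module
```

Because the pattern cannot be changed afterward, a different pattern means creating a second Regex object.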
VB.NET RegEx Special Characters
• .       ba. finds bat, bal, bac
  • period matches any single character (except a newline)
• ?       bottles? finds bottle and bottles
  • 0 or 1 of what is before the ?
• [0-9]   finds 0, 1, … or 9
  • any 1 character in the specified range
• +       to+ finds to and too
  • 1 or more of what is before the +
• {#}     b{2} finds any bb
  • # = exact number of repetitions of what is before the {
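Each special character above can be tried out with a short console sketch. The helper FirstMatch and the sample sentences are assumptions made for this example; only the patterns come from the slide.

```vbnet
Option Strict On
Imports System
Imports System.Text.RegularExpressions

Module SpecialCharactersDemo
    ' Hypothetical helper: returns the first match of pattern in text, or "" if none.
    Function FirstMatch(pattern As String, text As String) As String
        Dim reg As New Regex(pattern)
        Dim mach As Match = reg.Match(text)
        If mach.Success Then Return mach.Value
        Return ""
    End Function

    Sub Main()
        Console.WriteLine(FirstMatch("ba.", "a bat flew"))       ' bat    (. = any one character)
        Console.WriteLine(FirstMatch("bottles?", "one bottle"))  ' bottle (? = 0 or 1 s)
        Console.WriteLine(FirstMatch("[0-9]", "room 7"))         ' 7      (one digit in the range)
        Console.WriteLine(FirstMatch("to+", "too late"))         ' too    (+ = 1 or more o)
        Console.WriteLine(FirstMatch("b{2}", "rabbit"))          ' bb     ({2} = exactly two b)
    End Sub
End Module
```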
RegEx Special Characters - 2
• | OR operator, e.g., a|b finds a or b
• \ escape character is used to
  • revert special characters to literal values, e.g., \. (backslash followed by period) becomes an actual period
  • change regular characters to operators
    • \b finds a word boundary (the position at the start or end of a word); inside [ ] or in a replacement string it stands for a backspace
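A minimal sketch of the OR operator and the two uses of the backslash follows. The helper CountMatches and the sample strings are invented for the illustration. Note that VB.NET strings do not treat the backslash specially, so the patterns can be typed as-is.

```vbnet
Option Strict On
Imports System
Imports System.Text.RegularExpressions

Module EscapeDemo
    ' Hypothetical helper: counts how many times pattern is found in text.
    Function CountMatches(pattern As String, text As String) As Integer
        Dim reg As New Regex(pattern)
        Return reg.Matches(text).Count
    End Function

    Sub Main()
        Console.WriteLine(CountMatches("a|b", "cab"))   ' 2: the a and the b
        Console.WriteLine(CountMatches(".", "a.b"))     ' 3: unescaped period matches every character
        Console.WriteLine(CountMatches("\.", "a.b"))    ' 1: escaped period matches only the period
        ' \b requires a word boundary on each side, so the cat inside concatenate is skipped
        Console.WriteLine(CountMatches("\bcat\b", "cat concatenate cat."))   ' 2
    End Sub
End Module
```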
Combining Operators
• can group character sequences
  • [a-z]{3} finds any 3 lower case letters in a row (use \b[a-z]{3}\b to find only 3-letter lower case words)
  • [0-9]{4}( |-)? finds any 4 #s
    • followed by space or dash or neither
    • (could use in credit card validation)
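The combined patterns can be checked with a sketch like the one below. The helper AllMatches and the sample strings are assumptions for the example. It also shows that [a-z]{3} by itself picks up 3-letter runs inside longer words, while adding \b on each side limits it to whole 3-letter words.

```vbnet
Option Strict On
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module CombiningDemo
    ' Hypothetical helper: returns every match of pattern in text, joined by commas.
    Function AllMatches(pattern As String, text As String) As String
        Dim reg As New Regex(pattern)
        Dim parts As New List(Of String)
        For Each mach As Match In reg.Matches(text)
            parts.Add(mach.Value)
        Next
        Return String.Join(",", parts.ToArray())
    End Function

    Sub Main()
        Console.WriteLine(AllMatches("[a-z]{3}", "the cat sleeps"))        ' the,cat,sle,eps
        Console.WriteLine(AllMatches("\b[a-z]{3}\b", "the cat sleeps"))    ' the,cat
        Console.WriteLine(AllMatches("[0-9]{4}( |-)?", "1234-5678 9012"))  ' 1234-,5678 ,9012
    End Sub
End Module
```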
Regular Expression Methods
• reg.IsMatch([string to search])
  • returns True if the pattern is found in the string
• reg.Match([string to search])
  • returns a Match object for the first place the pattern is found (use ToString to get the matching text; reg.Matches returns all of the matches)
• reg.Replace([string to search], [replacement string])
  • returns a copy of the string with every match of the pattern replaced by the replacement string
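The three methods can be compared side by side in a short sketch. The pattern, the sample text, and the helper names are made up for the illustration. NextMatch, which is not on the slide, is one way to step from the first match to the next one.

```vbnet
Option Strict On
Imports System
Imports System.Text.RegularExpressions

Module MethodsDemo
    ' Hypothetical helper: replaces every run of digits in text with a # sign.
    Function MaskDigits(text As String) As String
        Dim reg As New Regex("[0-9]+")
        Return reg.Replace(text, "#")
    End Function

    ' Hypothetical helper: returns the first run of digits in text, or "" if none.
    Function FirstNumber(text As String) As String
        Dim reg As New Regex("[0-9]+")
        Return reg.Match(text).ToString()
    End Function

    Sub Main()
        Dim reg As New Regex("[0-9]+")
        Dim text As String = "order 66 of 1200"

        Console.WriteLine(reg.IsMatch(text))              ' True
        Dim mach As Match = reg.Match(text)
        Console.WriteLine(mach.ToString())                ' 66   (first match only)
        Console.WriteLine(mach.NextMatch().ToString())    ' 1200 (the following match)
        Console.WriteLine(MaskDigits(text))               ' order # of #
    End Sub
End Module
```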
Regular Expression Match Object
• can tell if a match was found
• can also return the matching string
  • usually don't need the whole string, just want the matching part
    mach = reg.Match([string to search]) 'returns Match object for the first place the pattern is found
    If mach.Success Then strZip = mach.ToString
Using Regular Expressions
    Dim strCard As String = TextBox1.Text
    Dim mach As Match
    Dim reg As New Regex("[0-9]{4}( |-)?" & _
        "[0-9]{4}( |-)?[0-9]{4}( |-)?[0-9]{4}") 'shorten?
    mach = reg.Match(strCard)
    If mach.Success Then
        MsgBox("Card is OK")
        strCard = mach.ToString.Replace("-"c, " "c)
        TextBox1.Text = strCard
    End If
    ' finds pattern even inside other characters
    ' e.g., XYZ 1234 5678 9012 3456 ABC
    ' .Replace("-"c, " "c) changes dashes to spaces
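One possible answer to the 'shorten? comment is to put the repeated "4 digits plus optional separator" piece in a group and repeat the group 3 times. The sketch below is a console version that assumes no form or TextBox; the function name NormalizeCard is hypothetical.

```vbnet
Option Strict On
Imports System
Imports System.Text.RegularExpressions

Module CardDemo
    ' Hypothetical helper: finds a 16-digit card number in text and returns it
    ' with dashes changed to spaces, or "" if no card number is found.
    Function NormalizeCard(text As String) As String
        ' ([0-9]{4}( |-)?){3} = three groups of 4 digits, each optionally followed
        ' by a space or dash; then the final 4 digits
        Dim reg As New Regex("([0-9]{4}( |-)?){3}[0-9]{4}")
        Dim mach As Match = reg.Match(text)
        If Not mach.Success Then Return ""
        Return mach.ToString().Replace("-"c, " "c)
    End Function

    Sub Main()
        Console.WriteLine(NormalizeCard("XYZ 1234-5678-9012-3456 ABC"))   ' 1234 5678 9012 3456
        Console.WriteLine(NormalizeCard("1234-5678") = "")                ' True: too few digits
    End Sub
End Module
```

The grouped pattern accepts the same strings as the longer one on the slide; which form is easier to read is a matter of taste.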
Warning
• a regular expression may say a String is OK even if there are other characters around it, so need to extract the match from the original String (other processing?), e.g.,
    Dim strCard As String
    Dim ok As Boolean
    strCard = "XYZ1234-4323-9876-6543XYZ"
    ok = reg.IsMatch(strCard) 'credit card pattern
    If ok Then
        MsgBox(strCard & " is OK")
    End If
Extracting Matches from Strings
• can use .Match() function to separate the matching part from other characters around it, e.g.,
    Dim strCard As String
    strCard = "XYZ1234-4323-9876-6543XYZ"
    If reg.IsMatch(strCard) Then
        MsgBox(reg.Match(strCard).ToString & " is OK")
    End If
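The warning and the extraction step can be seen together in a console sketch. The function names are invented for the example. The second function uses the ^ and $ anchors, which are not covered in these slides; it is shown only as one alternative, for cases where the whole string must be a card number and nothing else.

```vbnet
Option Strict On
Imports System
Imports System.Text.RegularExpressions

Module ExtractMatchDemo
    ' Card pattern from the slides: four groups of 4 digits with optional space or dash.
    Public Const CardPattern As String = "[0-9]{4}( |-)?[0-9]{4}( |-)?[0-9]{4}( |-)?[0-9]{4}"

    ' Hypothetical helper: returns only the matching part of text, or "" if none.
    Function ExtractCard(text As String) As String
        Dim reg As New Regex(CardPattern)
        Dim mach As Match = reg.Match(text)
        If mach.Success Then Return mach.ToString()
        Return ""
    End Function

    ' Hypothetical alternative (not from the slides): ^ and $ tie the pattern to the
    ' start and end of the string, so surrounding characters cause a failure.
    Function IsExactlyCard(text As String) As Boolean
        Dim reg As New Regex("^" & CardPattern & "$")
        Return reg.IsMatch(text)
    End Function

    Sub Main()
        Dim strCard As String = "XYZ1234-4323-9876-6543XYZ"
        Console.WriteLine(New Regex(CardPattern).IsMatch(strCard))   ' True, despite the XYZ
        Console.WriteLine(ExtractCard(strCard))                      ' 1234-4323-9876-6543
        Console.WriteLine(IsExactlyCard(strCard))                    ' False
        Console.WriteLine(IsExactlyCard("1234-4323-9876-6543"))      ' True
    End Sub
End Module
```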
More Regular Expressions Info
• Regular Expressions in .Net
• JavaScript Regular Expressions Tester
• A Better .NET Regular Expression Tester
• Tee-Shirts, etc. (language alert)
RegEx Exercises
• explain the following e-mail address pattern: [a-z]+@[a-z]+\.com
• extend it to handle following endings:
  • com, edu, org, net, gov, mil, int (must appear at least once)
• modify it to allow numbers, dashes [-] and periods [.] after the 1st character
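One way to experiment with these exercises is a small console harness that reports whether a pattern is found in each of several test addresses. The sketch below uses only the starting pattern from this slide; the module name, the function name Check, and the sample addresses are made up, and the extended patterns are left for the exercise.

```vbnet
Option Strict On
Imports System
Imports System.Text.RegularExpressions

Module EmailExerciseHarness
    ' Hypothetical helper: True if pattern is found anywhere in candidate.
    Function Check(pattern As String, candidate As String) As Boolean
        Dim reg As New Regex(pattern)
        Return reg.IsMatch(candidate)
    End Function

    Sub Main()
        Dim pattern As String = "[a-z]+@[a-z]+\.com"
        Dim samples() As String = {"joe@example.com", "joe@example.org",
                                   "JOE@example.com", "joe@example.community"}
        For Each sample As String In samples
            Console.WriteLine(sample & " -> " & Check(pattern, sample).ToString())
        Next
        ' joe@example.com       -> True
        ' joe@example.org       -> False (only .com is handled so far)
        ' JOE@example.com       -> False (no lower case letter directly before the @)
        ' joe@example.community -> True  (pattern is found inside the longer text)
    End Sub
End Module
```

Swapping a new pattern into the pattern variable and adding more samples gives a quick check of each modification.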
RegEx Exercises - 2
• create patterns to validate
  • zip codes both as 5-digit and Zip + 4
    • e.g., 90702 or 90702-7934
  • phone #s (international, long distance, and local)
  • Social Security #s 123-45-6789 (or spaces)
  • Cal Poly student ID numbers
  • names (including middle initial, von, de, etc.)
  • course #s (CIS, EBZ, CS; 1xx-4xx, etc.)
RegEx Exercises - 3
• test text or file samples against regular expressions at Regular Expression Library's tester page and report back, using regular expressions
  • from previous pages in this PowerPoint
  • others that you make up
• search for examples from Regular Expression Library
RegEx Exercises - 4
• use Regular Expression Library's tester page to Load a Data Source from a URL and find data in the web page using a regular expression that you specify
• and/or find free (trial) software that will do the same thing