Learn to use regular expressions in various scenarios such as single and multiple string occurrences, character classes, sequences matching, and variable patterns. Explore the convenience of character classes, negated classes, escape characters, and grouping techniques.
Using regular expressions • Search for a single occurrence of a specific string. • Search for all occurrences of a string. • Approximate string matching.
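The first two uses can be sketched briefly. This example assumes Python's `re` module rather than Perl, and the sample text is made up for illustration.

```python
import re

# Hypothetical sample text.
text = "Joey Ramone, Dee Dee Ramone, and Johnny Ramone"

# Single occurrence: re.search returns the first match, or None.
first = re.search(r"Ramone", text)
first_position = first.start() if first else None

# All occurrences: re.findall returns every non-overlapping match.
all_matches = re.findall(r"Ramone", text)
```

Here `first_position` is the index where the first "Ramone" starts, and `all_matches` holds one entry per occurrence.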
Forming RegExps • Strings • Variables • Patterns
Strings and Variables • /Joey Ramone/ - match a specific string. • /$name/, where $name = "Joey Ramone" - match the string stored in a variable. • /Joey $name/ - match a pattern defined by a mixture of strings and variables.
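A rough Python equivalent of building a pattern from a variable is shown below. Python has no pattern interpolation, so the pattern is assembled as a string. The call to `re.escape` is an addition that is not in the slides: it makes the variable match literally, whereas Perl's /$name/ treats the variable's contents as a pattern.

```python
import re

name = "Joey Ramone"            # mirrors $name in the slide
text = "Hey ho, Joey Ramone!"   # hypothetical sample text

# Counterpart of /$name/: the pattern comes from the variable.
matches_name = re.search(re.escape(name), text) is not None

# Counterpart of /Joey $name/ with a different, hypothetical value.
surname = "Ramone"
matches_mixed = re.search("Joey " + re.escape(surname), text) is not None
```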
Character classes • abc – match “abc” • . – match any single character (e.g. /a.b/ matches “axb”). • [abc] – match “a” or “b” or “c” • [0123456789] – match “0” or “1” or …or “9” • [0-9] – same as previous • [a-z] – match “a” or “b” or …or “z” • [A-Z] – same as previous, but for uppercase letters • [] – match any single occurrence of any of the characters found within. • [0-9a-zA-Z-] – match any alphanumeric or the minus sign
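The classes above can be tried out in a few lines. This is a small sketch using Python's `re` module (an assumption; the slides use Perl notation), with made-up input strings.

```python
import re

# "." matches any single character (by default, anything except a newline).
dot_hits = re.findall(r"a.b", "a.b axb a-b ab")

# [abc] matches exactly one of the listed characters.
set_hits = re.findall(r"[abc]", "cabbage")

# Ranges: a single digit, and a single alphanumeric-or-minus character.
digit_hits = re.findall(r"[0-9]", "Route 66")
alnum_minus_hits = re.findall(r"[0-9a-zA-Z-]", "a-1 b_2")
```

Note that "ab" is not matched by `a.b`, because the dot must consume one character, and that the underscore in "b_2" is not in the last class.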
Negated character classes • [^0-9] – match any single character that is not a numeric digit • [^aeiouAEIOU] – match any single character that is not a vowel • Works only for single characters • We’ll discuss matching negated strings of characters later.
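A quick illustration of negated classes, again sketched in Python's `re` module with made-up inputs:

```python
import re

# [^0-9] matches one character that is not a digit.
non_digits = re.findall(r"[^0-9]", "a1b2")

# [^aeiouAEIOU] matches one character that is not a vowel.
non_vowels = re.findall(r"[^aeiouAEIOU]", "Ramone")
```

Each match is a single character, which is the "works only for single characters" point above.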
Escape characters • \ - use the backslash to match any special character as the character itself. • /\$name/ - match the literal string “$name”. • /a\.b/ - match the literal string “a.b” rather than “a” followed by any character, followed by “b”.
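The effect of the backslash can be checked with a short sketch (Python's `re` module assumed; raw strings keep the backslashes intact):

```python
import re

# \$ matches a literal dollar sign.
found_literal_name = re.search(r"\$name", "print $name") is not None

# Escaped dot: only a real "." between "a" and "b" matches.
escaped_dot = re.findall(r"a\.b", "a.b axb")

# Unescaped dot: any character between "a" and "b" matches.
unescaped_dot = re.findall(r"a.b", "a.b axb")
```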
Convenience character classes • \d (a digit) - [0-9] • \D (digits, not!) - [^0-9] • \w (word char) - [a-zA-Z0-9_] • \W (words, not!) - [^a-zA-Z0-9_] • \s (space char) - [ \r\t\n\f] • \S (space, not!) - [^ \r\t\n\f]
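The shorthand classes behave as sketched below. This assumes Python's `re` module; `re.ASCII` is passed so that \d and \w stay limited to the ASCII ranges listed above (without it, Python also accepts other Unicode digits and letters). The sample string is made up.

```python
import re

sample = "Room 101, floor_2\t(east)"

digits = re.findall(r"\d", sample, re.ASCII)        # single digits
words = re.findall(r"\w+", sample, re.ASCII)        # runs of word characters
spaces = re.findall(r"\s", sample)                  # whitespace characters
non_word_count = len(re.findall(r"\W", sample, re.ASCII))
```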
Sequences • + - one or more of preceding pattern • /[a-zA-Z]+/ (match a string of alpha characters such as a name). • ? (match zero or one instance of preceding pattern). • /[a-zA-Z]+-?[a-zA-Z]+/ (Now we can match hyphenated names).
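A small sketch of the hyphenated-name pattern, using Python's `re` module (an assumption) and `fullmatch` so the whole string has to fit the pattern:

```python
import re

hyphenated = re.compile(r"[a-zA-Z]+-?[a-zA-Z]+")

plain_ok = hyphenated.fullmatch("Ramone") is not None
hyphen_ok = hyphenated.fullmatch("Starkey-Smith") is not None

# Side effect of having two "+" parts: at least two letters are required.
single_letter_ok = hyphenated.fullmatch("A") is not None

# "?" allows at most one hyphen between the two parts.
double_hyphen_ok = hyphenated.fullmatch("Mary--Ann") is not None
```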
Sequences • * (match zero or more of preceding pattern) • Example – list of names: • George Harrison • Paul McCartney • Richard "Ringo" Starkey • John Winston Lennon • /[a-zA-Z]+ [a-zA-Z]+/ (match first and last name) • /[a-zA-Z]+ [a-zA-Z"]* ?[a-zA-Z]+/ (match first name, middle name, if it exists, and last name; the second space is optional so that two-part names still match)
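The name list can be checked against both patterns. This is a sketch in Python's `re` module (an assumption). The middle-name pattern below makes the second space optional, which is one possible way to let two-part names match as well; the straight double quote in the class covers the nickname.

```python
import re

names = [
    "George Harrison",
    "Paul McCartney",
    'Richard "Ringo" Starkey',
    "John Winston Lennon",
]

first_last = re.compile(r"[a-zA-Z]+ [a-zA-Z]+")
with_middle = re.compile(r'[a-zA-Z]+ [a-zA-Z"]* ?[a-zA-Z]+')

# fullmatch requires the whole name to fit the pattern.
two_part = [n for n in names if first_last.fullmatch(n)]
any_form = [n for n in names if with_middle.fullmatch(n)]
```

`two_part` keeps only the names made of exactly two words, while `any_form` keeps all four.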
Sequences • {k} – match exactly k instances of preceding pattern. • Example: floating point numbers to 2 decimal places • /[0-9]+\.[0-9]{2}/ • {k,j} – match at least k instances of preceding pattern, but no more than j. • Example: floating point numbers that may or may not have a decimal component. • /[0-9]+\.?[0-9]{0,2}/
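The two counted patterns can be compared on a few inputs. A minimal sketch, assuming Python's `re` module and using `fullmatch` so that partial matches do not count:

```python
import re

two_places = re.compile(r"[0-9]+\.[0-9]{2}")
optional_fraction = re.compile(r"[0-9]+\.?[0-9]{0,2}")

def fits(pattern, text):
    """Return True when the whole text fits the compiled pattern."""
    return pattern.fullmatch(text) is not None

exactly_two = [s for s in ["3.14", "3.1", "3.141"] if fits(two_places, s)]
up_to_two = [s for s in ["42", "42.", "42.5", "42.125"] if fits(optional_fraction, s)]
```

Note that the second pattern also accepts "42." with a trailing dot, since the dot and the digits after it are optional independently.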
Grouping • /(John|Paul|George|Ringo)/ – matches any one of “John”, “Paul”, “George”, or “Ringo” • /((John|Paul|George|Ringo) )+/ • Matches the Beatles' names listed in any order, each followed by a space. • John Paul George Ringo • Paul George John Ringo • Ringo Paul George John • Actually, this will also match: • Paul Paul Paul Paul Paul Paul Paul Paul Paul • Be careful about what assumptions you make.
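The caveat about repeated names is easy to reproduce. This sketch assumes Python's `re` module; the inputs end with a space because the group requires one after every name.

```python
import re

one_beatle = re.compile(r"John|Paul|George|Ringo")
any_order = re.compile(r"((John|Paul|George|Ringo) )+")

# Alternation picks out each listed name.
found = one_beatle.findall("Ringo met Paul")

# The repeated group accepts the names in any order...
shuffled_ok = any_order.fullmatch("Ringo Paul George John ") is not None

# ...but it does not check that each name appears only once.
repeated_ok = any_order.fullmatch("Paul Paul Paul ") is not None
```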
Problem • Write a regular expression that will match a Social Security number. • Format: 555-55-5555
A solution • /[0-9]{3}-[0-9]{2}-[0-9]{4}/
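A quick check of this pattern, sketched with Python's `re` module (an assumption). It also shows that an unanchored search can find the pattern inside a longer run of digits, which may or may not be what is wanted.

```python
import re

ssn = re.compile(r"[0-9]{3}-[0-9]{2}-[0-9]{4}")

# Finds the number inside surrounding text.
hit = ssn.search("SSN: 555-55-5555")
found_ssn = hit.group() if hit else None

# search() also succeeds inside an over-long string...
inside_longer = ssn.search("5555-55-55555") is not None

# ...while fullmatch() requires the whole string to fit.
whole_string = ssn.fullmatch("5555-55-55555") is not None
```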
Problem • Write a regular expression that will match a phone number. • Formats • 319-337-3663 • 319.337.3663
A solution • /[0-9]{3}[\.-][0-9]{3}[\.-][0-9]{4}/
Add another format • 3193373663
A solution • /[0-9]{3}[\.-]?[0-9]{3}[\.-]?[0-9]{4}/
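The three phone formats can be tested against this pattern. A minimal sketch, assuming Python's `re` module:

```python
import re

phone = re.compile(r"[0-9]{3}[\.-]?[0-9]{3}[\.-]?[0-9]{4}")

candidates = [
    "319-337-3663",
    "319.337.3663",
    "3193373663",
    "319-337.3663",   # mixed separators
    "319 337 3663",   # spaces are not in the separator class
]

accepted = [c for c in candidates if phone.fullmatch(c)]
```

The pattern accepts mixed separators as well, because each separator is chosen independently of the other.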
Problem • Write a regular expression that will match an email address. • Legal characters for names are: • Letters, numbers, “-”, and “_” • Legal characters for domain names are: • Letters only • Assume form: username@machine.domain.suffix
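A solution • /[a-z0-9_-]+\@[a-z]+(\.[a-z]+){2}/ • More general version: /[a-z0-9_-]+\@[a-z]+(\.[a-z]+)+/ • The hyphen is placed last in the class so it cannot be read as part of a range. • Both patterns match lowercase letters only.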
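Both email patterns can be compared on a few made-up addresses. This sketch assumes Python's `re` module; the hyphen is written last in the character class, which matches the same characters, and the "@" needs no backslash in Python (Perl needs it because "@" introduces an array).

```python
import re

strict = re.compile(r"[a-z0-9_-]+@[a-z]+(\.[a-z]+){2}")
general = re.compile(r"[a-z0-9_-]+@[a-z]+(\.[a-z]+)+")

def fits(pattern, address):
    """Return True when the whole address fits the compiled pattern."""
    return pattern.fullmatch(address) is not None

three_part = "joey_r-1@cs.example.edu"      # username@machine.domain.suffix
four_part = "joey@mail.cs.example.edu"      # one extra dotted component

results = {
    "strict_three": fits(strict, three_part),
    "strict_four": fits(strict, four_part),
    "general_four": fits(general, four_part),
    "general_no_dot": fits(general, "joey@localhost"),
    "general_uppercase": fits(general, "Joey@cs.example.edu"),
}
```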
Problem • Write a regular expression that will match an HTML anchor start tag. • Assume anchor tag is of the form: • <a href="some url">some anchor text</a>
A solution • /<a href="[^"]+">/ • Actually, quotes are not required • So it should be: • /<a href="?[^">]+"?>/ • How would we assign the url to a variable?
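A solution • ($url) = ($htmlText =~ m/<a href="?([^">]+)"?>/); • The parentheses around [^">]+ capture the url; in list context the match returns the captured text.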
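The same idea in Python, as a sketch: a capture group around the url part, read back with `group(1)`. The `re` module is assumed in place of Perl, and the HTML snippets and `example.com` addresses are made up.

```python
import re

anchor = re.compile(r'<a href="?([^">]+)"?>')

def extract_url(html_text):
    """Return the href value of the first anchor start tag, or None."""
    match = anchor.search(html_text)
    return match.group(1) if match else None

quoted_url = extract_url('<p><a href="http://example.com/ramones">The Ramones</a></p>')
unquoted_url = extract_url("<a href=http://example.com>home</a>")
missing_url = extract_url("<p>no links here</p>")
```

Like the pattern on the slide, this only handles anchors written exactly as `<a href=...>`; tags with extra attributes or different spacing would need a more tolerant pattern.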
Take Away • There is almost always a pattern that will match what you want it to match. • The best way to learn is to simply jump in and start writing your own patterns. • If you have a question about how to construct one, feel free to ask me. • One typically learns Perl by asking people with more experience.