Using regular expressions • Search for a single occurrence of a specific string. • Search for all occurrences of a string. • Approximate string matching.
Forming RegExps • Strings • Variables • Patterns
Strings and Variables • /Joey Ramone/ - match a specific string. • /$name/, where $name = “Joey Ramone” - match the string stored in a variable. • /Joey $name/ - match a pattern built from a mixture of strings and variables.
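A minimal Perl sketch of the three forms above. The sample sentence and the variable names `$text`, `$name`, and `$surname` are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample text, for illustration only.
my $text    = "Tonight: Joey Ramone and Dee Dee Ramone";
my $name    = "Joey Ramone";
my $surname = "Ramone";

# A literal string in the pattern.
my $literal  = $text =~ /Joey Ramone/   ? 1 : 0;

# The variable is interpolated into the pattern before matching.
my $from_var = $text =~ /$name/         ? 1 : 0;

# Literal text and a variable mixed in one pattern.
my $mixed    = $text =~ /Joey $surname/ ? 1 : 0;

print "literal=$literal from_var=$from_var mixed=$mixed\n";
```

If the variable may contain characters such as `.` or `+`, writing the pattern as `/\Q$name\E/` makes Perl treat its contents literally.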
Character classes • abc – match “abc” • . – match any single character (e.g. a.b matches “a”, then any one character, then “b”). • [abc] – match “a” or “b” or “c” • [0123456789] – match “0” or “1” or …or “9” • [0-9] – same as previous • [a-z] – match “a” or “b” or …or “z” • [A-Z] – same as previous, but with uppercase letters • [] – match any single occurrence of any of the characters found within. • [0-9a-zA-Z-] – match any alphanumeric or the minus sign
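The classes above can be tried one at a time. This is a small sketch with made-up test strings; each variable is 1 when the pattern matches and 0 when it does not.

```perl
use strict;
use warnings;

my $dot    = "axb" =~ /a.b/          ? 1 : 0;  # "." stands in for "x"
my $set    = "b"   =~ /[abc]/        ? 1 : 0;  # "b" is one of a, b, c
my $digit  = "7"   =~ /[0-9]/        ? 1 : 0;  # "7" is in the range 0-9
my $letter = "x"   =~ /[0-9]/        ? 1 : 0;  # "x" is not a digit
my $hyphen = "-"   =~ /[0-9a-zA-Z-]/ ? 1 : 0;  # the trailing "-" is literal

print "dot=$dot set=$set digit=$digit letter=$letter hyphen=$hyphen\n";
```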
Negated character classes • [^0-9] – match any single character that is not a numeric digit • [^aeiouAEIOU] – match any single character that is not a vowel • Works only for single characters • We’ll discuss matching negated strings of characters later.
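A short sketch of negated classes, using made-up strings. In list context the `/g` flag returns every match, which makes it easy to see which single characters a negated class accepts.

```perl
use strict;
use warnings;

# Collect every character of "Ramone" that is not a vowel.
my @non_vowels = "Ramone" =~ /[^aeiouAEIOU]/g;

# A negated class still has to match one character somewhere.
my $all_digits  = "12345" =~ /[^0-9]/ ? 1 : 0;  # no non-digit present
my $with_letter = "123a5" =~ /[^0-9]/ ? 1 : 0;  # "a" is a non-digit

print join("", @non_vowels), " $all_digits $with_letter\n";
```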
Escape characters • \ - use the backslash to match any special character as the character itself. • /\$name/ - match the literal string “$name”. • /a\.b/ - match the literal string “a.b” rather than “a” followed by any character, followed by “b”.
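The effect of the backslash can be checked directly. In this sketch the sample strings are invented; note that the text containing `$name` is single-quoted so that Perl does not interpolate it.

```perl
use strict;
use warnings;

my $text = 'Hello, $name';  # single quotes keep "$name" as literal text

my $literal_dollar  = $text =~ /\$name/ ? 1 : 0;  # matches the characters "$name"
my $plain_dot_axb   = "axb" =~ /a.b/    ? 1 : 0;  # "." accepts the "x"
my $escaped_dot_axb = "axb" =~ /a\.b/   ? 1 : 0;  # "\." only accepts a real "."
my $escaped_dot_ab  = "a.b" =~ /a\.b/   ? 1 : 0;

print "$literal_dollar $plain_dot_axb $escaped_dot_axb $escaped_dot_ab\n";
```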
Convenience character classes • \d (a digit) - [0-9] • \D (digits, not!) - [^0-9] • \w (word char) - [a-zA-Z0-9_] • \W (words, not!) - [^a-zA-Z0-9_] • \s (space char) - [ \r\t\n\f] • \S (space, not!) - [^ \r\t\n\f]
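As an illustration, the shorthand classes can be used to count kinds of characters in a line. The sample line is made up, and the sketch assumes plain ASCII text.

```perl
use strict;
use warnings;

# Hypothetical sample line: words, a number, a colon, a tab, and spaces.
my $line = "Track 12:\tBlitzkrieg Bop";

# In list context, /g returns every single-character match.
my @digits     = $line =~ /\d/g;
my @word_chars = $line =~ /\w/g;
my @spaces     = $line =~ /\s/g;
my @non_word   = $line =~ /\W/g;

printf "%d digits, %d word characters, %d whitespace, %d non-word\n",
    scalar @digits, scalar @word_chars, scalar @spaces, scalar @non_word;
```

Every character is either `\w` or `\W`, so the second and fourth counts add up to the length of the line.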
Sequences • + - match one or more of the preceding pattern • /[a-zA-Z]+/ (match a string of alpha characters such as a name). • ? - match zero or one instance of the preceding pattern. • /[a-zA-Z]+-?[a-zA-Z]+/ (Now we can match hyphenated names).
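To see what `+` and `?` change, the two patterns above can be run over the same text. This sketch uses an invented sample name and collects all matches with `/g` in list context.

```perl
use strict;
use warnings;

my $text = "Daniel Day-Lewis";

# Letters only: the hyphen splits the surname into two matches.
my @plain = $text =~ /[a-zA-Z]+/g;

# With an optional hyphen, the hyphenated surname is one match.
my @hyphenated = $text =~ /[a-zA-Z]+-?[a-zA-Z]+/g;

print join(" | ", @plain), "\n";
print join(" | ", @hyphenated), "\n";
```

Because the second pattern has two `[a-zA-Z]+` parts, it needs at least two letters and will not match a one-letter word.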
Sequences • * (match zero or more of preceding pattern) • Example – list of names: • George Harrison • Paul McCartney • Richard "Ringo" Starkey • John Winston Lennon • /[a-zA-Z]+ [a-zA-Z]+/ (match first and last name) • /[a-zA-Z]+ [a-zA-Z"]* ?[a-zA-Z]+/ (match first name, middle name, if it exists, and last name; the second space is optional so that two-part names still match)
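The sketch below runs a three-part pattern over the four names. It assumes straight double quotes in the data and makes the second space optional (` ?`) so that names without a middle part still match in full. `$&` is Perl's variable holding the text of the last successful match.

```perl
use strict;
use warnings;

my @names = (
    'George Harrison',
    'Paul McCartney',
    'Richard "Ringo" Starkey',
    'John Winston Lennon',
);

my @matched;
for my $name (@names) {
    # First name, optional middle part (letters or quotes), last name.
    if ($name =~ /[a-zA-Z]+ [a-zA-Z"]* ?[a-zA-Z]+/) {
        push @matched, $&;  # the exact text the pattern matched
    }
}

print "$_\n" for @matched;
```

Run against the same list, the two-part pattern `/[a-zA-Z]+ [a-zA-Z]+/` finds only "John Winston" in the last name and nothing at all in the quoted nickname line.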
Sequences • {k} – match exactly k instances of preceding pattern. • Example: floating point numbers to 2 decimal places • /[0-9]+\.[0-9]{2}/ • {k,j} – match at least k instances of preceding pattern, but no more than j. • Example: floating point numbers that may or may not have a decimal component. • /[0-9]+\.?[0-9]{0,2}/
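A quick sketch of both counted forms on invented input. It also shows a consequence worth knowing: the pattern stops after two decimal places, so extra digits are left over and can be matched again as a separate number.

```perl
use strict;
use warnings;

# Exactly two digits after the decimal point.
my @prices = "Total: 19.99 USD" =~ /[0-9]+\.[0-9]{2}/g;

# Optional decimal point with zero to two digits after it.
my @numbers = "42 and 3.14159" =~ /[0-9]+\.?[0-9]{0,2}/g;

print join(" | ", @prices), "\n";
print join(" | ", @numbers), "\n";
```

Here "3.14159" is reported as "3.14" followed by a separate match "159".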
Grouping • /(John|Paul|George|Ringo)/ – matches any one of “John”, “Paul”, “George”, or “Ringo” • /((John|Paul|George|Ringo) )+/ • Matches the Beatles' names listed in any order, each followed by a space. • John Paul George Ringo • Paul George John Ringo • Ringo Paul George John • Actually, this will also match: • Paul Paul Paul Paul Paul Paul Paul Paul Paul • Be careful about what assumptions you make.
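The grouped pattern can be checked against a few invented lines. The sketch records what was actually matched (`$&`), which shows both the repeated-name case and the effect of the space inside the group.

```perl
use strict;
use warnings;

my $beatles = qr/((John|Paul|George|Ringo) )+/;

my @lines = (
    "John Paul George Ringo ",  # trailing space present
    "Paul Paul Paul ",          # repeats are accepted too
    "Ringo Paul George John",   # no space after the last name
);

my @matched;
for my $line (@lines) {
    push @matched, ($line =~ $beatles ? $& : "");
}

print "[$_]\n" for @matched;
```

In the third line the final "John" is left out of the match because no space follows it.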
Problem • Write a regular expression that will match a Social Security number. • Format: 555-55-5555
A solution • /[0-9]{3}-[0-9]{2}-[0-9]{4}/
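A small test harness for the solution above, with made-up candidate strings. The last candidate shows that the pattern, as written, also succeeds when the number is embedded in a longer run of characters.

```perl
use strict;
use warnings;

my @candidates = (
    "555-55-5555",     # correct format
    "55-555-5555",     # digits grouped incorrectly
    "5555555555",      # no hyphens
    "x555-55-55559",   # valid number embedded in extra characters
);

# 1 if the pattern matches anywhere in the string, 0 otherwise.
my @is_ssn = map { /[0-9]{3}-[0-9]{2}-[0-9]{4}/ ? 1 : 0 } @candidates;

print "@is_ssn\n";
```

If the whole string must be a Social Security number and nothing else, one option is to anchor the pattern with `^` at the start and `$` at the end.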
Problem • Write a regular expression that will match a phone number. • Formats • 319-337-3663 • 319.337.3663
A solution • /[0-9]{3}[\.-][0-9]{3}[\.-][0-9]{4}/
Add another format • 3193373663
A solution • /[0-9]{3}[\.-]?[0-9]{3}[\.-]?[0-9]{4}/
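The final phone pattern can be exercised against all three formats. The extra test strings here are invented to probe the edges: a space-separated number is rejected, while mixed separators are accepted.

```perl
use strict;
use warnings;

my $phone = qr/[0-9]{3}[\.-]?[0-9]{3}[\.-]?[0-9]{4}/;

my @candidates = (
    "319-337-3663",
    "319.337.3663",
    "3193373663",
    "319 337 3663",   # spaces are not an allowed separator
    "319-337.3663",   # mixed separators still match
);

my @is_phone = map { $_ =~ $phone ? 1 : 0 } @candidates;

print "@is_phone\n";
```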
Problem • Write a regular expression that will match an email address. • Legal characters for names are: • Letters, numbers, “-”, and “_” • Legal characters for domain names are: • Letters only • Assume form: username@machine.domain.suffix
A solution • /[a-zA-Z0-9_-]+\@[a-zA-Z]+(\.[a-zA-Z]+){2}/ • More general version: /[a-zA-Z0-9_-]+\@[a-zA-Z]+(\.[a-zA-Z]+)+/
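A sketch comparing the fixed and the more general email patterns on made-up addresses. The character classes here are written with the hyphen last and with both cases of letters, which is one way to cover the "letters, numbers, -, and _" rule from the problem statement.

```perl
use strict;
use warnings;

# Exactly two dot-separated parts after the machine name.
my $fixed   = qr/[a-zA-Z0-9_-]+\@[a-zA-Z]+(\.[a-zA-Z]+){2}/;

# One or more dot-separated parts.
my $general = qr/[a-zA-Z0-9_-]+\@[a-zA-Z]+(\.[a-zA-Z]+)+/;

# Single quotes keep Perl from treating "@..." as an array.
my @addresses = (
    'joey@mail.example.com',
    'dee_dee@example.com',
    'not-an-address',
);

my @fixed_ok   = map { $_ =~ $fixed   ? 1 : 0 } @addresses;
my @general_ok = map { $_ =~ $general ? 1 : 0 } @addresses;

print "fixed:   @fixed_ok\n";
print "general: @general_ok\n";
```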
Problem • Write a regular expression that will match an HTML anchor start tag. • Assume anchor tag is of the form: • <a href="some url">some anchor text</a>
A solution • /<a href="[^"]+">/ • Actually, quotes are not required • So it should be: • /<a href="?[^">]+"?>/ • How would we assign the url to a variable?
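A solution • ($url) = ($htmlText =~ m/<a href="?([^">]+)"?>/); • The parentheses around [^">]+ capture the url; in list context the match returns the captured text.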
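For the match to hand back the URL, the pattern needs parentheses around the part to capture and a `+` so the class can match more than one character. A minimal sketch, with invented HTML snippets and example.com URLs:

```perl
use strict;
use warnings;

# Hypothetical HTML fragments, for illustration only.
my $htmlText = '<p>See <a href="http://example.com/ramones">the band</a>.</p>';
my $unquoted = '<a href=http://example.com/x>link</a>';

# In list context, a match returns the text captured by the parentheses.
my ($url)  = ($htmlText =~ m/<a href="?([^">]+)"?>/);
my ($url2) = ($unquoted =~ m/<a href="?([^">]+)"?>/);

# With /g, every anchor in the text is captured in one pass.
my $page = '<a href="http://example.com/a">A</a> <a href="http://example.com/b">B</a>';
my @urls = $page =~ m/<a href="?([^">]+)"?>/g;

print "$url\n$url2\n@urls\n";
```

The `/g` form covers the "search for all occurrences" case from the opening slide.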
Take Away • There is almost always a pattern that will match what you want it to match. • The best way to learn is to simply jump in and start writing your own patterns. • If you have a question about how to construct one, feel free to ask me. • One typically learns Perl by asking people with more experience.