Using regular expressions

shay

Presentation Transcript


  1. Using regular expressions • Search for a single occurrence of a specific string. • Search for all occurrences of a string. • Approximate string matching.

  2. Forming RegExps • Strings • Variables • Patterns

  3. Strings and Variables • /Joey Ramone/ - match a specific string. • /$name/, where $name = “Joey Ramone” - match the string stored in a variable. • /Joey $last/, where $last = “Ramone” - match a pattern built from a mixture of literal strings and variables.
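
The three forms on this slide can be tried side by side. This is a minimal sketch: the sample text and the variable names `$text` and `$last` are made up for illustration, and the `\Q...\E` line is an addition that is not on the slide (it makes Perl treat the interpolated value as plain text rather than as a pattern).

```perl
use strict;
use warnings;

my $text = "Hey ho, Joey Ramone!";   # hypothetical sample text
my $name = "Joey Ramone";
my $last = "Ramone";

# Each flag is 1 if the pattern is found anywhere in $text, else 0.
my $literal_hit  = $text =~ /Joey Ramone/ ? 1 : 0;   # fixed string
my $variable_hit = $text =~ /$name/       ? 1 : 0;   # variable interpolated into the pattern
my $mixed_hit    = $text =~ /Joey $last/  ? 1 : 0;   # literal text plus a variable
my $quoted_hit   = $text =~ /\Q$name\E/   ? 1 : 0;   # value treated as plain text

print "$literal_hit $variable_hit $mixed_hit $quoted_hit\n";   # prints "1 1 1 1"
```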

  4. Character classes • abc – match “abc” • . – match any single character except a newline (e.g. a.b matches “axb”). • [abc] – match “a” or “b” or “c” • [0123456789] – match “0” or “1” or … or “9” • [0-9] – same as previous • [a-z] – match “a” or “b” or … or “z” • [A-Z] – same as previous, but for uppercase letters • [...] – in general, match exactly one character out of those listed between the brackets. • [0-9a-zA-Z-] – match any single alphanumeric character or the hyphen (a hyphen placed last in the class is taken literally)
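
A short sketch of how these classes behave, assuming a handful of made-up sample strings. The anchors `^` and `$` in the last pattern are not covered on the slide; they are used here only to force the class to account for the whole one-character string.

```perl
use strict;
use warnings;

my @samples = ("a.b", "axb", "ab", "R2-D2", "42");   # hypothetical inputs

my @dot_hits   = grep { /a.b/   } @samples;   # "a", any one character, "b"
my @digit_hits = grep { /[0-9]/ } @samples;   # contains at least one digit
my @upper_hits = grep { /[A-Z]/ } @samples;   # contains at least one capital

# Test single characters against the alphanumeric-or-hyphen class.
my @chars      = ("a", "Z", "7", "-", "_", ".");
my @class_hits = grep { /^[0-9a-zA-Z-]$/ } @chars;

print "@dot_hits | @digit_hits | @upper_hits | @class_hits\n";
# prints "a.b axb | R2-D2 42 | R2-D2 | a Z 7 -"
```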

  5. Negated character classes • [^0-9] – match any single character that is not a numeric digit • [^aeiouAEIOU] – match any single character that is not a vowel • Works only for single characters • We’ll discuss matching negated strings of characters later.
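
To see that a negated class consumes one character at a time, the sketch below collects every single-character match. It relies on the `/g` modifier in list context, which is not introduced in the slides; the sample words are arbitrary.

```perl
use strict;
use warnings;

my $word = "Ramones";

# In list context, /g returns every match; each match is one character.
my @non_vowels = $word   =~ /[^aeiouAEIOU]/g;   # R, m, n, s
my @non_digits = "R2-D2" =~ /[^0-9]/g;          # R, -, D

print join("", @non_vowels), " ", join("", @non_digits), "\n";   # prints "Rmns R-D"
```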

  6. Escape characters • \ - use the backslash to match any special character as the character itself. • /\$name/ - match the literal string “$name”. • /a\.b/ - match the literal string “a.b” rather than “a” followed by any character, followed by “b”.
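
The effect of the backslash is easiest to see by running the escaped and unescaped patterns against the same input. A minimal sketch, with made-up sample text:

```perl
use strict;
use warnings;

my $name = "Joey";
my $text = 'The variable $name is set';   # single quotes: "$name" stays literal

my $escaped      = $text =~ /\$name/ ? 1 : 0;   # looks for the characters "$name"
my $interpolated = $text =~ /$name/  ? 1 : 0;   # looks for "Joey", which is absent
my $dot_escaped  = "axb" =~ /a\.b/   ? 1 : 0;   # needs a real "." between a and b
my $dot_plain    = "axb" =~ /a.b/    ? 1 : 0;   # "." stands for any character

print "$escaped $interpolated $dot_escaped $dot_plain\n";   # prints "1 0 0 1"
```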

  7. Convenience character classes (the bracketed forms are approximate equivalents for ASCII text) • \d (a digit) - [0-9] • \D (not a digit) - [^0-9] • \w (word char) - [a-zA-Z0-9_] • \W (not a word char) - [^a-zA-Z0-9_] • \s (whitespace char) - [ \r\t\n\f] • \S (not whitespace) - [^ \r\t\n\f]
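
One way to get a feel for these shorthands is to count how many characters of a line each one matches. The sketch below uses the `() = ... /g` counting idiom, which is not from the slides, on an invented sample line.

```perl
use strict;
use warnings;

my $line = "Track 07:\tBlitzkrieg Bop (2:12)";   # hypothetical input with a tab

# Assigning the list of matches to () and then to a scalar yields the count.
my $digit_count   = () = $line =~ /\d/g;   # 0 7 2 1 2
my $space_count   = () = $line =~ /\s/g;   # three spaces and one tab
my $nonword_count = () = $line =~ /\W/g;   # whitespace plus : ( : )

print "$digit_count $space_count $nonword_count\n";   # prints "5 4 8"
```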

  8. Sequences • + - match one or more of the preceding pattern • /[a-zA-Z]+/ (match a string of alphabetic characters, such as a name). • ? - match zero or one instance of the preceding pattern. • /[a-zA-Z]+-?[a-zA-Z]+/ (now we can match hyphenated names; note that this pattern needs at least two letters).
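
A quick check of the hyphenated-name pattern against a few invented inputs. The `^` and `$` anchors are an addition here, used so that the whole string has to fit the pattern rather than just a piece of it.

```perl
use strict;
use warnings;

my @names = ("Ringo", "Jean-Luc", "X", "Mary--Ann");   # hypothetical inputs

# Letters, at most one hyphen, then letters again.
my @matched = grep { /^[a-zA-Z]+-?[a-zA-Z]+$/ } @names;

print "@matched\n";   # prints "Ringo Jean-Luc"
```

"X" fails because each `+` needs at least one letter of its own, and "Mary--Ann" fails because `?` allows at most one hyphen.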

  9. Sequences • * - match zero or more of the preceding pattern • Example – list of names: George Harrison, Paul McCartney, Richard “Ringo” Starkey, John Winston Lennon • /[a-zA-Z]+ [a-zA-Z]+/ (match first and last name) • /[a-zA-Z]+ [a-zA-Z"]* ?[a-zA-Z]+/ (match first name, middle name if it exists, and last name; the second space is made optional with ? so that two-part names still match)
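
The role of the space around the optional middle name can be checked directly. This sketch compares a pattern in which both spaces are mandatory with a variant whose second space is optional; the variable names and the `^`/`$` anchors are assumptions made for the illustration.

```perl
use strict;
use warnings;

my @beatles = (
    'George Harrison',
    'Paul McCartney',
    'Richard "Ringo" Starkey',
    'John Winston Lennon',
);

# Both spaces required: only three-part names can match.
my $two_spaces     = qr/^[a-zA-Z]+ [a-zA-Z"]* [a-zA-Z]+$/;
# Second space optional: two-part names match as well.
my $optional_space = qr/^[a-zA-Z]+ [a-zA-Z"]* ?[a-zA-Z]+$/;

my @strict_hits = grep { $_ =~ $two_spaces }     @beatles;
my @loose_hits  = grep { $_ =~ $optional_space } @beatles;

print scalar(@strict_hits), " ", scalar(@loose_hits), "\n";   # prints "2 4"
```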

  10. Sequences • {k} – match exactly k instances of the preceding pattern. • Example: decimal numbers with two decimal places • /[0-9]+\.[0-9]{2}/ • {k,j} – match at least k instances of the preceding pattern, but no more than j. • Example: numbers that may or may not have a decimal component. • /[0-9]+\.?[0-9]{0,2}/
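
The two counted patterns can be compared on a small set of invented values. The anchors `^` and `$` are added so that extra digits are rejected instead of being silently ignored.

```perl
use strict;
use warnings;

my @values = ("3.14", "42", "2.5", "10.", "1.234", "abc");   # hypothetical inputs

my @two_places = grep { /^[0-9]+\.[0-9]{2}$/     } @values;   # exactly two decimals
my @optional   = grep { /^[0-9]+\.?[0-9]{0,2}$/  } @values;   # zero to two decimals

print "@two_places | @optional\n";   # prints "3.14 | 3.14 42 2.5 10."
```

Note that "10." is accepted by the second pattern, because the dot and the digits after it are optional independently of each other.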

  11. Grouping • /(John|Paul|George|Ringo)/ – matches any one of “John”, “Paul”, “George”, or “Ringo” • /((John|Paul|George|Ringo) )+/ – matches the Beatles' names listed in any order, each followed by a space: John Paul George Ringo; Paul George John Ringo; Ringo Paul George John • Actually, this will also match: Paul Paul Paul Paul Paul Paul Paul Paul Paul • Be careful about what assumptions you make.
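
The "be careful" point shows up as soon as the grouped pattern is run on real input. In this sketch every test line ends with a space, because the pattern expects one after each name; the inputs and the anchors are assumptions for the illustration.

```perl
use strict;
use warnings;

# One or more names from the list, each followed by a single space.
my $any_order = qr/^((John|Paul|George|Ringo) )+$/;

my @lines = (
    "John Paul George Ringo ",
    "Ringo Paul George John ",
    "Paul Paul Paul ",          # repeats are allowed by the pattern
    "John Yoko ",               # contains a name outside the list
);

my @hits = grep { $_ =~ $any_order } @lines;

print scalar(@hits), "\n";   # prints "3"
```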

  12. Problem • Write a regular expression that will match a Social Security number. • Format: 555-55-5555

  13. A solution • /[0-9]{3}-[0-9]{2}-[0-9]{4}/
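
A sketch of the solution in use. Without anchors the pattern also succeeds when the number is buried inside a longer run of digits, so an anchored variant (an addition, not from the slide) is shown next to it. The inputs are made up.

```perl
use strict;
use warnings;

my $ssn          = qr/[0-9]{3}-[0-9]{2}-[0-9]{4}/;     # pattern from the slide
my $ssn_anchored = qr/^[0-9]{3}-[0-9]{2}-[0-9]{4}$/;   # whole string must fit

my @inputs = ("555-55-5555", "5555-55-55555", "555-555-555");

my @loose = grep { $_ =~ $ssn }          @inputs;   # matches inside longer strings too
my @exact = grep { $_ =~ $ssn_anchored } @inputs;

print "@loose | @exact\n";   # prints "555-55-5555 5555-55-55555 | 555-55-5555"
```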

  14. Problem • Write a regular expression that will match a phone number. • Formats • 319-337-3663 • 319.337.3663

  15. A solution • /[0-9]{3}[\.-][0-9]{3}[\.-][0-9]{4}/

  16. Add another format • 3193373663

  17. A solution • /[0-9]{3}[\.-]?[0-9]{3}[\.-]?[0-9]{4}/
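
The final phone pattern can be tried against all three formats at once. As a sketch with invented inputs, it also shows a side effect worth knowing about: because each separator is chosen independently, mixed separators are accepted too. The anchors are an addition.

```perl
use strict;
use warnings;

my $phone = qr/^[0-9]{3}[\.-]?[0-9]{3}[\.-]?[0-9]{4}$/;

my @candidates = (
    "319-337-3663",
    "319.337.3663",
    "3193373663",
    "319-337.3663",   # mixed separators still match
    "319 337 3663",   # spaces are not in the separator class
    "319-337-366",    # one digit short
);

my @ok = grep { $_ =~ $phone } @candidates;

print scalar(@ok), "\n";   # prints "4"
```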

  18. Problem • Write a regular expression that will match an email address. • Legal characters for names are: • Letters, numbers, “-”, and “_” • Legal characters for domain names are: • Letters only • Assume form: username@machine.domain.suffix

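  19. A solution • /[a-zA-Z0-9_-]+\@[a-zA-Z]+(\.[a-zA-Z]+){2}/ (the hyphen goes last in the class so it is taken literally, and \@ keeps Perl from reading @ as the start of an array name) • More general version: /[a-zA-Z0-9_-]+\@[a-zA-Z]+(\.[a-zA-Z]+)+/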
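
The difference between the fixed-depth and the general version is visible on a few sample addresses. This is a sketch under the slide's simplified rules, not a validator for real-world email addresses; the addresses are invented, and the anchors and upper-case letters in the classes are assumptions.

```perl
use strict;
use warnings;

# Exactly machine.domain.suffix after the @.
my $fixed   = qr/^[a-zA-Z0-9_-]+\@[a-zA-Z]+(\.[a-zA-Z]+){2}$/;
# One or more dotted parts after the machine name.
my $general = qr/^[a-zA-Z0-9_-]+\@[a-zA-Z]+(\.[a-zA-Z]+)+$/;

# Single quotes keep Perl from interpolating the @ signs.
my @addresses = (
    'joey_r@mail.example.com',
    'dee-dee@example.org',
    'no.dots@here',
    'johnny@cs.uni.example.edu',
);

my @fixed_hits   = grep { $_ =~ $fixed }   @addresses;
my @general_hits = grep { $_ =~ $general } @addresses;

print scalar(@fixed_hits), " ", scalar(@general_hits), "\n";   # prints "1 3"
```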

  20. Problem • Write a regular expression that will match an HTML anchor start tag. • Assume the anchor tag is of the form: • <a href="some url">some anchor text</a>

  21. A solution • /<a href="[^"]+">/ • Actually, the quotes are not required, so it should be: • /<a href="?[^">]+"?>/ • How would we assign the URL to a variable?

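  22. A solution • ($url) = ($htmlText =~ m/<a href="?([^">]+)"?>/); • The parentheses around [^">]+ capture the URL; in list context the match returns the captured text.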
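
A runnable sketch of the URL extraction, assuming a capturing group around the URL part of the pattern. The HTML snippets and the variable names `$bare` and `$no_capture` are made up; the last line shows what a list-context match returns when the pattern has no capturing group.

```perl
use strict;
use warnings;

my $htmlText = '<p>See <a href="http://example.com/ramones">the band page</a>.</p>';

# In list context a successful match returns the captured groups.
my ($url)  = $htmlText =~ m/<a href="?([^">]+)"?>/;

# The same pattern also handles an unquoted attribute value.
my ($bare) = '<a href=page.html>x</a>' =~ m/<a href="?([^">]+)"?>/;

# Without a capturing group, a successful list-context match returns (1).
my ($no_capture) = $htmlText =~ m/<a href="?[^">]+"?>/;

print "$url $bare $no_capture\n";   # prints "http://example.com/ramones page.html 1"
```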

  23. Take Away • There is almost always a pattern that will match what you want it to match. • The best way to learn is to simply jump in and start writing your own patterns. • If you have a question about how to construct one, feel free to ask me. • One typically learns Perl by asking people with more experience.
