
CIT 383: Administrative Scripting



Presentation Transcript


  1. Regular Expressions. CIT 383: Administrative Scripting

  2. Topics
     • Creating Regexp objects
     • Regular expression syntax
     • Pattern matching
     • Substitution

  3. Regular Expressions
     Used to match patterns against strings.
     • UNIX commands: egrep, awk, sed
     • Ruby provides an expanded regexp syntax.
     Applications of regular expressions
     • Find every login failure in a log file.
     • Find every address you received email from.
     • Find every IP address in a file.
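
The applications listed above can be sketched in a few lines of Ruby. The log lines and their format are hypothetical, invented for this illustration, and the IP pattern is deliberately loose (it does not check that each group is in the range 0-255). The (?:...) form is a non-capturing group, which is not covered in the slides.

```ruby
# Hypothetical log lines; the format is an assumption made for this sketch.
log = <<~LOG
  Jan 12 10:01:22 host sshd[411]: Failed password for root from 10.0.0.5 port 22
  Jan 12 10:01:30 host sshd[412]: Accepted password for alice from 192.168.1.20 port 22
  Jan 12 10:02:01 host sshd[413]: Failed password for bob from 10.0.0.7 port 22
LOG

# Find every login failure: keep only the lines that match the pattern.
failures = log.lines.grep(/Failed password/)

# Find every IP address: four dot-separated groups of 1-3 digits.
ip_re = /\b\d{1,3}(?:\.\d{1,3}){3}\b/
ips = log.scan(ip_re)

puts failures.size
puts ips.inspect
```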

  4. Creating a Regexp object
     Three methods
       re = Regexp.new('^\s*[a-z]')
       re = /^\s*[a-z]/
       re = %r|^\s*[a-z]|
     Modifiers
     • i: ignore case when matching text
     • m: multiline match, allow . to match \n
     • x: extended syntax with comments + whitespace
     • o: perform #{} interpolations only once
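
A minimal sketch of the three creation forms and of the i, m, and x modifiers. The variable names and sample strings are chosen only for illustration.

```ruby
# Three ways to write the same pattern.
re1 = Regexp.new('^\s*[a-z]')
re2 = /^\s*[a-z]/
re3 = %r|^\s*[a-z]|
same_source = re1.source == re2.source && re2.source == re3.source

# i: ignore case.
case_insensitive = /ruby/i =~ "RUBY"     # index of the match, or nil

# m: let . match a newline.
dot_plain = /a.b/ =~ "a\nb"              # no match without m
dot_multi = /a.b/m =~ "a\nb"             # matches with m

# x: whitespace and comments inside the pattern are ignored.
phone = /
  \d{3}   # exchange
  -       # literal hyphen
  \d{4}   # line number
/x
phone_pos = phone =~ "555-1234"
```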

  5. Pattern Syntax
     Characters match themselves, except the metacharacters
       . | ( ) [ ] { } + \ ^ $ * ?
     Use \ to escape a metacharacter, e.g. \| will match a literal |
     The . metacharacter matches any character except a newline (unless the m modifier is used).
     Anchors require the pattern to match at a particular position
     • ^ matches the beginning of a line
     • $ matches the end of a line
     • \A matches the beginning of a string
     • \Z matches the end of a string, or just before a final newline (\z matches only at the very end)
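
The difference between line anchors and string anchors shows up only on multi-line input. The following is a small sketch using a made-up two-line string.

```ruby
text = "first line\nsecond line\n"

line_starts  = text.scan(/^\w+/)    # ^ matches at the start of every line
string_start = text.scan(/\A\w+/)   # \A matches only at the start of the string
line_ends    = text.scan(/\w+$/)    # $ matches at the end of every line

upper_z = text =~ /line\Z/          # \Z tolerates one trailing newline
lower_z = text =~ /line\z/          # \z does not

pipe_pos = "a|b" =~ /\|/            # an escaped | is a literal character
```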

  6. Regexp Escape Sequences
     Similar to double-quoted strings
     • \t is tab
     • \n is newline
     • etc.
     Word boundaries
     • /red/ matches “red”, “bred”, “reddened”
     • /\bred\b/ matches only “red”
     • \B matches non-word boundaries
     • /\brub\B/ matches “ruby” but not “rub”
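
The word-boundary claims above can be checked with Array#grep, which keeps the elements that match a pattern. The word lists are the ones from the slide plus "rubber", added here for illustration.

```ruby
words = %w[red bred reddened]

unbounded = words.grep(/red/)        # matches anywhere inside a word
bounded   = words.grep(/\bred\b/)    # the whole word must be "red"

# \B requires that "rub" is followed by another word character.
rub_prefix = %w[rub ruby rubber].grep(/\brub\B/)

# Escape sequences work as in double-quoted strings.
fields = "a\tb\tc".split(/\t/)
```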

  7. Character Classes
     Set of characters between brackets
     • [aeiou] will match any vowel
     • [0123456789] will match any digit
     • Most special characters aren’t special inside brackets; ^, -, ], and \ still are
     Additional syntax
     • [A-Z] is a range including all capital letters
     • [A-Za-z0-9] combines ranges to match any alphanumeric
     • [^A-Z] is a negated class: anything but capital letters
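
A short sketch of character classes using String#scan, which returns every match. The sample strings are arbitrary.

```ruby
vowels   = "administrative".scan(/[aeiou]/)
digits   = "CIT 383".scan(/[0123456789]/)
capitals = "CIT 383".scan(/[A-Z]/)
others   = "CIT 383".scan(/[^A-Z]/)      # negated class: everything except capitals
alnums   = "a-b_c 1".scan(/[A-Za-z0-9]/)

# +, ?, and . are literal characters inside brackets.
literals = "1+1=2?".scan(/[+?.]/)
```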

  8. Special Character Classes
     Abbreviations
     • \d is [0-9]
     • \D is [^0-9]
     • \s is [ \t\r\n\f]
     • \S is [^ \t\r\n\f]
     • \w is [A-Za-z0-9_]
     • \W is [^A-Za-z0-9_]
     POSIX Classes (written inside a bracket expression, e.g. [[:digit:]]; ASCII equivalents shown)
     • [:alnum:] is [A-Za-z0-9]
     • [:alpha:] is [A-Za-z]
     • [:digit:] is [0-9]
     • [:xdigit:] is [0-9A-Fa-f]
     • [:lower:] is [a-z]
     • [:upper:] is [A-Z]
     • [:space:] is [ \t\r\n\f]
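
The abbreviations are most useful for pulling fields out of text. This sketch uses an invented log-like string; note that a POSIX class has to sit inside an outer pair of brackets.

```ruby
s = "user_42 logged in\tat 09:15"

numbers = s.scan(/\d+/)      # runs of digits
words   = s.scan(/\w+/)      # runs of letters, digits, and underscores
fields  = s.split(/\s+/)     # split on any run of whitespace

hex   = "0x1F".scan(/[[:xdigit:]]+/)   # x is not a hex digit
upper = "CIT 383"[/[[:upper:]]+/]
```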

  9. Alternation
     Vertical bar matches the pattern before or after it
       pattern1|pattern2
     Precedence
     • red|blue matches either “red” or “blue”
     • red ball|blue sky matches “red ball” or “blue sky”; alternation binds most loosely, so the pattern does not mean “red”, then “ball” or “blue”, then “sky”
     Use parentheses to group in an expression
     • red (ball|blue) sky matches “red ball sky” or “red blue sky”
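
Because =~ searches for the pattern anywhere in the string, the precedence rule is easiest to see when the pattern is anchored with \A and \z so that it must cover the whole string. This is a sketch with illustrative variable names.

```ruby
whole   = /\A(red ball|blue sky)\z/      # either complete phrase
grouped = /\Ared (ball|blue) sky\z/      # "red", then one alternative, then "sky"

whole_ok      = whole =~ "red ball"
whole_mixed   = whole =~ "red blue sky"
grouped_ball  = grouped =~ "red ball sky"
grouped_blue  = grouped =~ "red blue sky"
grouped_short = grouped =~ "red ball"

# Without anchors the pattern still finds "blue sky" inside a longer string.
unanchored = /red ball|blue sky/ =~ "red blue sky"
```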

  10. Repetition
      Repetition operators are greedy, matching as many occurrences as possible.
      • re* matches zero or more occurrences of re
      • re+ matches one or more occurrences of re
      • re? matches zero or one occurrence of re
      • re{n} matches exactly n occurrences of re
      • re{n,} matches n or more occurrences of re
      • re{n,m} matches at least n and at most m occurrences of re
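
Each repetition operator can be tried against the same small word list. The patterns are anchored with \A and \z so that the count of "a" characters is the only thing that varies; the word list is made up for this sketch.

```ruby
words = %w[ct cat caat caaat]

star     = words.grep(/\Aca*t\z/)      # zero or more a's
plus     = words.grep(/\Aca+t\z/)      # one or more a's
optional = words.grep(/\Aca?t\z/)      # zero or one a
exactly  = words.grep(/\Aca{2}t\z/)    # exactly two a's
at_least = words.grep(/\Aca{2,}t\z/)   # two or more a's
between  = words.grep(/\Aca{1,2}t\z/)  # one or two a's

# Greedy: a+ takes all three a's, not just the first.
greedy = "caaat"[/a+/]
```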

  11. Additional features
      Backreferences
      • Regular expressions remember what each set of () matched
      • /([Rr])uby&\1ails/ will match “Ruby&Rails” or “ruby&rails”, but not “Ruby&rails”
      • /(\w+) \1/ will match a repeated word
      Greedy and non-greedy matching (against “<ruby>perl>”)
      • <.*> is greedy, will match “<ruby>perl>”
      • <.*?> is non-greedy, will match “<ruby>”
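
A brief sketch of backreferences and of greedy versus non-greedy matching, using the patterns from the slide. The sentence "it was was fine" is an invented test string.

```ruby
pair = /([Rr])uby&\1ails/
both_upper = pair =~ "Ruby&Rails"
both_lower = pair =~ "ruby&rails"
mixed_case = pair =~ "Ruby&rails"     # \1 must repeat the captured "R"

# \1 repeats whatever the first group matched, so this finds a doubled word.
doubled = "it was was fine"[/(\w+) \1/]

tag = "<ruby>perl>"
greedy     = tag[/<.*>/]     # runs to the last >
non_greedy = tag[/<.*?>/]    # stops at the first >
```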

  12. Pattern Matching
      Pattern matching uses the =~ operator, which returns the index of the match or nil
        re = /[Rr]uby|[Pp]ython/
        re =~ "Ruby is better than PHP."
      After a successful match, details can be retrieved:
        data = Regexp.last_match
      • data.string: the string that was compared
      • data.to_s: the part of the string that matched
      • data.pre_match: portion of string before match
      • data.post_match: portion of string after match
      • data[1]: what first set of () matched
      • data[2]: what second set of () matched
      • data.captures: what all sets of parentheses matched
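
The slide's pattern has no parentheses, so data[1] and data[2] would be nil for it. The sketch below uses a different, hypothetical pattern with two groups so that every MatchData accessor listed above returns something.

```ruby
# Hypothetical pattern with two capture groups, chosen for illustration.
re = /(\w+) is better than (\w+)/
position = re =~ "I think Ruby is better than PHP."

data = Regexp.last_match     # MatchData for the most recent successful match

whole    = data.to_s         # the text that matched
before   = data.pre_match    # everything before the match
after    = data.post_match   # everything after the match
first    = data[1]
second   = data[2]
captures = data.captures
```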

  13. Pattern Matching Methods
      Slicing
        "ruby123"[/\d+/]                 # "123"
        "ruby123"[/([a-z]+)(\d+)/, 1]    # "ruby"
        "ruby123"[/([a-z]+)(\d+)/, 2]    # "123"
        r = "ruby123"
        r.slice(/\d+/)                   # "123"
        r.slice!(/\d+/)                  # "123", r is now "ruby"
      Splitting
        s = "one, two, three"
        s.split                          # ["one,", "two,", "three"]
        s.split(', ')                    # ["one", "two", "three"]
        s.split(/\s*,\s*/)               # ["one", "two", "three"]

  14. Substitutions
      The String class provides regexp substitutions
      • sub(re, str): return a new string in which the first substring matching re is replaced by str
      • sub!(re, str): replace the first substring matching re with str, in place
      • gsub(re, str): return a new string in which all substrings matching re are replaced by str
      • gsub!(re, str): replace all substrings matching re with str, in place
      The ! forms return nil when nothing was replaced.
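
The difference between the four methods is whether one or all matches are replaced and whether the receiver is modified. A minimal sketch with an invented string:

```ruby
line = "rails and rails"

first_only = line.sub(/rails/, "Rails")    # new string, first match replaced
every      = line.gsub(/rails/, "Rails")   # new string, all matches replaced
original   = line.dup                      # line itself is untouched so far

copy = line.dup
copy.sub!(/rails/, "Rails")                # modifies copy in place

no_match = copy.gsub!(/python/, "Python")  # nothing to replace
```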

  15. Substitution Examples
      Remove Ruby-style comments
        line.sub!(/#.*$/, "")
      Remove all non-digits
        line.gsub!(/\D/, "")
      Capitalize specified words
        line.gsub!(/\brails\b/, 'Rails')
      Change "John Smith" to "Smith, John"
        name.sub!(/(\w+)\s+(\w+)/, '\2, \1')
      Flip UNIX slashes to Windows slashes
        path.gsub!(%r|/|, '\\')

  16. References
      • Michael Fitzgerald, Learning Ruby, O’Reilly, 2008.
      • David Flanagan and Yukihiro Matsumoto, The Ruby Programming Language, O’Reilly, 2008.
      • Hal Fulton, The Ruby Way, 2nd edition, Addison-Wesley, 2007.
      • Robert C. Martin, Clean Code, Prentice Hall, 2008.
      • Dave Thomas with Chad Fowler and Andy Hunt, Programming Ruby, 2nd edition, Pragmatic Programmers, 2005.
