Regular Expressions
CIT 383: Administrative Scripting

Topics
• Creating Regexp objects
• Regular expression syntax
• Pattern matching
• Substitution

Regular Expressions
Regular expressions are used to match patterns against strings.
• UNIX commands: egrep, awk, sed
• Ruby provides an expanded regexp syntax.
Applications of regular expressions
• Find every login failure in a log file.
• Find every address you received email from.
• Find every IP address in a file.
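
As a rough illustration of two of the applications above, the sketch below pulls login failures and IP addresses out of a few log lines. The log text and its message format are made up for the example, the IPv4 pattern is deliberately loose (it does not check that each number is 0-255), and `(?:...)` is a non-capturing group, which these slides do not otherwise cover.

```ruby
# Hypothetical log text; the message format is invented for illustration.
log = <<~LOG
  Jan 12 03:14:07 host sshd[811]: Failed password for root from 192.168.1.20 port 4122
  Jan 12 03:15:10 host sshd[812]: Accepted password for alice from 10.0.0.5 port 5022
  Jan 12 03:16:45 host sshd[813]: Failed password for bob from 192.168.1.21 port 4130
LOG

# Keep only the lines that record a login failure.
failures = log.lines.grep(/Failed password/)

# Loose IPv4 pattern: four dot-separated groups of 1-3 digits.
ip_re = /\b\d{1,3}(?:\.\d{1,3}){3}\b/
ips = log.scan(ip_re)

puts failures.size
puts ips.inspect
```

`String#scan` returns every non-overlapping match, which makes it a natural fit for "find every ..." tasks.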

Creating a Regexp object
Three ways to create one
  re = Regexp.new('^\s*[a-z]')
  re = /^\s*[a-z]/
  re = %r|^\s*[a-z]|
Modifiers
• i: ignore case when matching text
• m: multiline match; allow . to match \n
• x: extended syntax; whitespace and comments in the pattern are ignored
• o: perform #{} interpolations only once
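
A minimal sketch of the three creation forms and the i, m, and x modifiers. The sample patterns and test strings are chosen only for illustration.

```ruby
# Three ways to build the same pattern.
re1 = Regexp.new('^\s*[a-z]')
re2 = /^\s*[a-z]/
re3 = %r|^\s*[a-z]|

# All three share one pattern source.
sources = [re1, re2, re3].map(&:source).uniq

# Modifiers follow the closing delimiter.
ci = /ruby/i      # i: ignore case
ml = /a.b/m       # m: . also matches a newline
ex = /
  \d{3}  # three digits
  -      # a literal hyphen
  \d{4}  # four digits
/x                # x: whitespace and comments inside the pattern are ignored

puts sources.inspect
puts ci.match?("RUBY")
puts ml.match?("a\nb")
puts ex.match?("555-1234")
```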

Pattern Syntax
Characters match themselves, except the metacharacters
  . | ( ) [ ] { } + \ ^ $ * ?
• Use \ to escape a metacharacter, e.g. \| matches a literal |
• The . metacharacter matches any character except a newline (any character at all with the m modifier).
Anchors require the match to occur at the start or end of a line or string
• ^ matches the beginning of a line
• $ matches the end of a line
• \A matches the beginning of a string
• \Z matches the end of a string, or just before a final newline
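
The difference between line anchors and string anchors is easiest to see on a multi-line string. The sketch below uses a made-up two-line string; `\z` (lowercase, strict end of string) is a real Ruby anchor that the slide does not list and is included only for contrast with `\Z`.

```ruby
text = "first line\nsecond line\n"

line_start   = (text =~ /^second/)   # ^ matches after any newline
string_start = (text =~ /\Asecond/)  # \A matches only at the very start
line_end     = (text =~ /line$/)     # $ matches before any newline
string_end   = (text =~ /line\Z/)    # \Z tolerates one trailing newline
strict_end   = (text =~ /line\z/)    # \z does not

# Escaping a metacharacter makes it literal.
pipe_at = ("a|b" =~ /\|/)

puts [line_start, string_start, line_end, string_end, strict_end, pipe_at].inspect
```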

Regexp Escape Sequences
Similar to double-quoted strings
• \t is tab
• \n is newline
• etc.
Word boundaries
• /red/ matches "red", "bred", "reddened"
• /\bred\b/ matches only "red"
• \B matches non-word boundaries
• /\brub\B/ matches "ruby" but not "rub"
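
The word-boundary examples above can be checked directly. This is a small sketch using the same words as the slide.

```ruby
words = %w[red bred reddened]

anywhere   = words.grep(/red/)      # "red" may appear anywhere in the word
whole_word = words.grep(/\bred\b/)  # "red" must stand alone as a word

# \B requires that there is NO word boundary at that position.
ruby_hit = ("ruby" =~ /\brub\B/)
rub_hit  = ("rub"  =~ /\brub\B/)

# Escape sequences such as \t work inside a pattern.
tab_at = ("a\tb" =~ /\t/)

puts anywhere.inspect
puts whole_word.inspect
```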

Character Classes
A set of characters between brackets matches any one character in the set
• [aeiou] matches any vowel
• [0123456789] matches any digit
• Most special characters aren't special inside []; the exceptions are ^, -, ], and \
Additional syntax
• [A-Z] is a range including all capital letters
• [A-Za-z0-9] combines ranges to cover all alphanumerics
• [^A-Z] is a negated class: it matches any character except a capital letter
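
A short sketch of character classes in use. The sample strings are invented; `String#[]` with a regexp returns the first matching substring.

```ruby
first_vowel = "rhythm and blues"[/[aeiou]/]
first_digit = "Room 42"[/[0-9]/]
not_capital = "Ruby"[/[^A-Z]/]          # first character that is not A-Z
alnum_run   = "user_42!"[/[A-Za-z0-9]+/] # stops at the underscore

# Inside brackets, . is an ordinary character.
literal_dots = "3.14".scan(/[.]/)
any_chars    = "3.14".scan(/./)

puts [first_vowel, first_digit, not_capital, alnum_run].inspect
```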

Special Character Classes
Abbreviations
• \d is [0-9]
• \D is [^0-9]
• \s is [ \t\r\n\f]
• \S is [^ \t\r\n\f]
• \w is [A-Za-z0-9_]
• \W is [^A-Za-z0-9_]
POSIX classes (written inside a bracket expression, e.g. [[:digit:]])
• [:alnum:] is [A-Za-z0-9]
• [:alpha:] is [A-Za-z]
• [:digit:] is [0-9]
• [:xdigit:] is [0-9A-Fa-f]
• [:lower:] is [a-z]
• [:upper:] is [A-Z]
• [:space:] is [ \t\r\n\f]
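
A brief sketch of the abbreviations and POSIX classes on ASCII input. The sample strings are made up. Note that a POSIX class needs its own pair of brackets inside the enclosing character class.

```ruby
line = "user42\tlogged in at 09:15"

numbers    = line.scan(/\d+/)    # runs of digits
fields     = line.split(/\s+/)   # split on any run of whitespace
first_word = line[/\w+/]         # letters, digits, and underscore
first_gap  = line[/\W/]          # first non-word character (the tab)

# POSIX classes go inside a bracket expression: [[:name:]]
posix_numbers = line.scan(/[[:digit:]]+/)
hex_digits    = '#ff00cc'[/[[:xdigit:]]+/]
lower_run     = "Ruby"[/[[:lower:]]+/]

puts numbers.inspect
puts fields.inspect
```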

Alternation
The vertical bar matches the pattern before it or the pattern after it
  pattern1|pattern2
Precedence: alternation binds more loosely than concatenation
• red|blue matches either "red" or "blue"
• red ball|blue sky matches "red ball" or "blue sky", but does not match the whole of "red blue sky" or "red ball sky"
Use parentheses to group part of an expression
• red (ball|blue) sky matches "red ball sky" or "red blue sky"
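
The precedence rule is easier to see when the pattern is anchored to the whole string. The sketch below adds `\A` and `\z` anchors and a non-capturing group `(?:...)` for that purpose; these additions are assumptions of the example, not part of the slide's patterns. Without anchors, `red ball|blue sky` still finds the "blue sky" portion inside "red blue sky".

```ruby
# Alternation splits the whole pattern: "red ball" OR "blue sky".
either  = /\A(?:red ball|blue sky)\z/
# Parentheses limit the alternation to one word.
grouped = /\Ared (ball|blue) sky\z/

either_hits  = ["red ball", "blue sky", "red blue sky", "red ball sky"].grep(either)
grouped_hits = ["red ball", "blue sky", "red blue sky", "red ball sky"].grep(grouped)

# Unanchored, the second alternative matches part of the string.
partial_at = ("red blue sky" =~ /red ball|blue sky/)

puts either_hits.inspect
puts grouped_hits.inspect
```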

Repetition
Repetition operators are greedy: they match as many occurrences as possible.
• re* matches zero or more occurrences of re
• re+ matches one or more occurrences of re
• re? matches zero or one occurrence of re
• re{n} matches exactly n occurrences of re
• re{n,} matches n or more occurrences of re
• re{n,m} matches at least n and at most m occurrences of re
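
A small sketch exercising each repetition operator. The input strings are invented for the example.

```ruby
star_zero = "ct"[/ca*t/]      # * accepts zero occurrences
star_many = "caaat"[/ca*t/]
plus_zero = "ct"[/ca+t/]      # + needs at least one, so no match here
optional  = "color colour".scan(/colou?r/)

exactly   = "2024-01-15"[/\d{4}/]
at_least  = "a1 b22 c333".scan(/\d{2,}/)
between   = "a1 b22 c333 d4444".scan(/\b[a-z]\d{2,3}\b/)

# Greedy: {2,3} takes three when three are available.
greedy = "aaaa"[/a{2,3}/]

puts [star_zero, star_many, plus_zero, exactly, greedy].inspect
```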

Additional features
Backreferences
• Regular expressions remember what each set of () matched
• /([Rr])uby&\1ails/ matches "Ruby&Rails" and "ruby&rails"
• /(\w+) \1/ matches a repeated word
Greedy and non-greedy matching, against the string "<ruby>perl>"
• <.*> is greedy and matches "<ruby>perl>"
• <.*?> is non-greedy and matches "<ruby>"
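
The backreference and greediness examples can be run as written. In the repeated-word pattern below, the `\b` anchors are an addition of this sketch: without them `(\w+) \1` can also match across word fragments, such as the "is is" inside "this is".

```ruby
re = /([Rr])uby&\1ails/
same_case  = ["Ruby&Rails", "ruby&rails"].all? { |s| re.match?(s) }
mixed_case = re.match?("Ruby&rails")   # \1 must repeat the captured letter

# Repeated word, with \b added so only whole words count.
doubled = "it was the the best"[/\b(\w+) \1\b/]

tag    = "<ruby>perl>"
greedy = tag[/<.*>/]    # runs to the last >
lazy   = tag[/<.*?>/]   # stops at the first >

puts [same_case, mixed_case, doubled, greedy, lazy].inspect
```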

Pattern Matching
Pattern matching uses the =~ operator, which returns the index of the match or nil
  re = /[Rr]uby|[Pp]ython/
  re =~ "Ruby is better than PHP."
After a successful match, details can be retrieved
  data = Regexp.last_match
• data.string: the string that was compared
• data.to_s: the part of the string that matched
• data.pre_match: portion of the string before the match
• data.post_match: portion of the string after the match
• data[1]: what the first set of () matched
• data[2]: what the second set of () matched
• data.captures: array of what every set of parentheses matched
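
A sketch of the match details listed above. The pattern is extended with two capture groups and the sentence is given a prefix, both assumptions of this example, so that `pre_match`, `data[1]`, and `data[2]` have something to show.

```ruby
text = "I think Ruby is better than PHP."
re   = /([Rr]uby|[Pp]ython) is (\w+)/

index = (re =~ text)          # position where the match starts, or nil
data  = Regexp.last_match     # MatchData for the most recent match

puts index
puts data.string              # the whole string that was compared
puts data.to_s                # the matched portion
puts data.pre_match.inspect
puts data.post_match.inspect
puts data[1]
puts data[2]
puts data.captures.inspect
```

`String#match` and `Regexp#match` return the same kind of MatchData object directly, which avoids relying on the most recent match.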

Pattern Matching Methods
Slicing
  "ruby123"[/\d+/]                  # "123"
  "ruby123"[/([a-z]+)(\d+)/, 1]     # "ruby"
  "ruby123"[/([a-z]+)(\d+)/, 2]     # "123"
  r = "ruby123"
  r.slice(/\d+/)                    # "123"
  r.slice!(/\d+/)                   # "123", r = "ruby"
Splitting
  s = "one, two, three"
  s.split                           # ["one,", "two,", "three"]
  s.split(', ')                     # ["one", "two", "three"]
  s.split(/\s*,\s*/)                # ["one", "two", "three"]
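
The slicing and splitting calls above, gathered into one runnable sketch. The `messy` string is an extra input invented here to show why the regexp form of `split` is more tolerant than a fixed separator.

```ruby
r = "ruby123".dup                 # dup so slice! can modify it

digits  = r[/\d+/]
letters = r[/([a-z]+)(\d+)/, 1]   # first capture group
copy    = r.slice(/\d+/)          # same as r[/\d+/], r is unchanged
removed = r.slice!(/\d+/)         # removes the match from r

s = "one, two, three"
by_space  = s.split               # default: split on whitespace
by_string = s.split(', ')
by_regexp = s.split(/\s*,\s*/)

# Irregular spacing around the commas.
messy = "one ,two,   three".split(/\s*,\s*/)

puts r
puts by_space.inspect
puts messy.inspect
```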

Substitutions
The String class provides regular expression substitutions
• sub(re, str): return a new string in which the first substring matching re is replaced by str
• sub!(re, str): replace the first substring matching re with str, modifying the string in place
• gsub(re, str): return a new string in which all substrings matching re are replaced by str
• gsub!(re, str): replace all substrings matching re with str, modifying the string in place
• The ! forms return nil when nothing was replaced
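
A minimal sketch contrasting the four methods on one made-up string.

```ruby
line = "rails on rails"

first_only = line.sub(/rails/, "Rails")    # new string, first match replaced
every_one  = line.gsub(/rails/, "Rails")   # new string, all matches replaced
# line itself is untouched by sub and gsub.

copy = line.dup
copy.gsub!(/rails/, "Rails")               # modifies copy in place

# The ! forms return nil when there was nothing to replace.
no_change = copy.sub!(/python/, "Python")

puts first_only
puts every_one
puts line
puts copy
puts no_change.inspect
```

Because the `!` forms can return nil, chaining further calls onto their result is risky; the non-bang forms always return a string.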

Substitution Examples
Remove Ruby-style comments
  line.sub!(/#.*$/, "")
Remove all non-digits
  line.gsub!(/\D/, "")
Capitalize specified words
  line.gsub!(/\brails\b/, 'Rails')
Change "John Smith" to "Smith, John"
  name.sub!(/(\w+)\s+(\w+)/, '\2, \1')
Flip UNIX slashes to Windows slashes
  path.gsub!(%r|/|, '\\')
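
The examples above, run against sample inputs invented for this sketch. For the slash example the block form of `gsub!` is used instead of a replacement string: this is a substitution made here, chosen because a block's return value is inserted literally, which sidesteps the special handling of backslashes in replacement strings.

```ruby
line = "count = 42  # the answer".dup
line.sub!(/#.*$/, "")                    # strip the comment

phone = "(859) 555-0100".dup
phone.gsub!(/\D/, "")                    # keep digits only

text = "ruby on rails".dup
text.gsub!(/\brails\b/, 'Rails')

name = "John Smith".dup
name.sub!(/(\w+)\s+(\w+)/, '\2, \1')     # \1 and \2 refer to the groups

path = "/usr/local/bin".dup
path.gsub!(%r|/|) { '\\' }               # block form: result used literally

puts line.inspect
puts phone
puts text
puts name
puts path
```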