RE review (Perl syntax)

  1. RE review (Perl syntax)
     • single-character disjunction: [aeiou]
     • ranges: [0-9]
     • negation: [^aeiou]
     • conjunction: /cat/
     • matching zero or one: /cats?/
     • Kleene * (zero or more) and + (one or more): /[ab]+/ matches ‘a’, ‘b’, ‘aa’, ‘ab’, ‘ba’, ‘bb’, etc.
     • wildcard: /c.t/ matches “cat”, “cbt”, “cct”, …
     • anchors: ^ (start of line), $ (end of line), \b (word boundary), \B (not a word boundary)
     /projects/CSE467/Resources/Code/Perl
     CSE 467/567
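The constructs in the review can be tried out side by side. This is a minimal sketch: first_match is a hypothetical helper, and the sample inputs are illustrative choices rather than examples from the slides. Each pattern is applied to one string and the leftmost matched text is printed.

```perl
use strict;
use warnings;

# Return the text that $re matches in $s (leftmost match), or undef if none.
sub first_match {
    my ($re, $s) = @_;
    return $s =~ /($re)/ ? $1 : undef;
}

# Each row: [ construct from the review, pattern, sample input ].
# The sample inputs are illustrative, not taken from the slides.
my @examples = (
    [ 'disjunction [aeiou]',    qr/[aeiou]/,  'rhythm and blues' ],
    [ 'range [0-9]',            qr/[0-9]/,    'room 101' ],
    [ 'negation [^aeiou]',      qr/[^aeiou]/, 'aeiou!' ],
    [ 'conjunction /cat/',      qr/cat/,      'concatenate' ],
    [ 'zero or one /cats?/',    qr/cats?/,    'two cats' ],
    [ 'one or more /[ab]+/',    qr/[ab]+/,    'xxabbay' ],
    [ 'zero or more /x[ab]*/',  qr/x[ab]*/,   'xy' ],
    [ 'wildcard /c.t/',         qr/c.t/,      'the cbt' ],
    [ 'start anchor /^the/',    qr/^the/,     'the end' ],
    [ 'end anchor /end$/',      qr/end$/,     'the end' ],
    [ 'boundary /\bcat\b/',     qr/\bcat\b/,  'concatenate' ],
    [ 'non-boundary /\Bcat\B/', qr/\Bcat\B/,  'concatenate' ],
);

for my $ex (@examples) {
    my ($name, $re, $input) = @$ex;
    my $m = first_match($re, $input);
    printf "%-24s %-20s => %s\n", $name, "'$input'",
        defined $m ? "'$m'" : 'no match';
}
```

Note that the quantifiers are greedy: /[ab]+/ takes all of ‘abba’ in ‘xxabbay’, and /cats?/ takes the ‘s’ when it is there.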

  2. Conjunction
     • Two regular expressions are conjoined by juxtaposition, i.e., by placing the expressions side by side (concatenation).
     • Examples:
         /a/ matches ‘a’
         /m/ matches ‘m’
         /am/ matches ‘am’ but not ‘a’ or ‘m’ alone
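A quick check of the conjunction example, as a sketch: matches_am is a hypothetical helper name, and the test strings are illustrative. It shows that /am/ requires an ‘a’ immediately followed by an ‘m’.

```perl
use strict;
use warnings;

# /am/ is /a/ juxtaposed with /m/: an 'a' immediately followed by an 'm'.
sub matches_am {
    my ($s) = @_;
    return $s =~ /am/ ? 1 : 0;
}

for my $s ('a', 'm', 'ma', 'a m', 'am', 'ham') {
    printf "%-6s %s\n", "'$s'", matches_am($s) ? 'matches /am/' : 'no match';
}
```

‘ham’ also matches because the pattern is not anchored; /^am$/ would require the whole string to be exactly ‘am’.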

  3. Disjunction
     • We have already seen disjunction of single characters using the square-bracket notation.
     • General disjunction is expressed using the vertical bar (|), also called the pipe symbol.
     • This form of disjunction matches any one of several alternative patterns, not just single characters as the [ ] form does. Ex: /cat|dog/ matches ‘cat’ or ‘dog’.
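The difference between the two kinds of disjunction can be seen by filtering a small word list. This is a sketch with an illustrative word list. The patterns are anchored with ^ and $ so that each one must match a whole word, and the parentheses around cat|dog keep both anchors applying to both alternatives.

```perl
use strict;
use warnings;

my @words = qw(cat bat rat dog cow catdog);

# [bcr]at: the brackets choose one *character*, so this keeps cat, bat, rat.
my @by_class = grep { /^[bcr]at$/ } @words;

# cat|dog: | chooses between whole *patterns*.
my @by_alt = grep { /^(cat|dog)$/ } @words;

# [catdog] is still a single-character disjunction: one of c, a, t, d, o, g.
my @one_char = grep { /^[catdog]$/ } qw(c a dog);

print "[bcr]at  : @by_class\n";
print "cat|dog  : @by_alt\n";
print "[catdog] : @one_char\n";
```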

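  4. Grouping
     • Parentheses, ‘(’ and ‘)’, are used to group subpatterns of a larger pattern.
     • Ex: /[Gg](ee|oo)se/ matches ‘geese’, ‘Geese’, ‘goose’ and ‘Goose’
     • Because | has the lowest precedence, the alternatives must go inside the parentheses: /[Gg](ee)|(oo)se/ would instead match ‘Gee’/‘gee’ or ‘oose’.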
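The precedence of | can be checked directly by printing what each version of the geese/goose pattern actually matches. This is a sketch: matched is a hypothetical helper, and the test words are illustrative. Because | binds more loosely than juxtaposition, /[Gg](ee)|(oo)se/ is read as "[Gg]ee" or "oose".

```perl
use strict;
use warnings;

# Return the text matched by $re in $s, or 'no match'.
sub matched {
    my ($re, $s) = @_;
    return $s =~ /($re)/ ? $1 : 'no match';
}

my $ungrouped = qr/[Gg](ee)|(oo)se/;  # parsed as  [Gg]ee  OR  oose
my $grouped   = qr/[Gg](ee|oo)se/;    # [Gg], then ee or oo, then se

for my $s ('geese', 'Goose', 'Gee whiz', 'moose') {
    printf "%-9s ungrouped: %-9s grouped: %s\n",
        $s, matched($ungrouped, $s), matched($grouped, $s);
}
```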

  5. Replacement
     • In addition to matching, we can replace the matched text when a match is found.
     • Example: to replace the British spelling colour with the American spelling color, we can write:
       s/colour/color/
     • This replaces only the first occurrence; s/colour/color/g replaces every occurrence.
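A minimal sketch of the substitution, with and without the /g modifier. The helper names americanize_first and americanize_all are hypothetical. Each helper works on a copy, so the caller's string is left unchanged.

```perl
use strict;
use warnings;

# s/colour/color/ replaces only the first occurrence in the copy.
sub americanize_first { my ($t) = @_; $t =~ s/colour/color/;  return $t; }

# The /g modifier replaces every occurrence.
sub americanize_all   { my ($t) = @_; $t =~ s/colour/color/g; return $t; }

my $text = 'The colour of the colours';
print americanize_first($text), "\n";   # The color of the colours
print americanize_all($text), "\n";     # The color of the colors
```

The pattern is case-sensitive, so ‘Colour’ at the start of a sentence would be left alone.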

  6. Registers – saving matches
     • To save the text matched by part of a pattern so that it can be reused later, Perl provides registers: each pair of parentheses saves what it matched.
     • Inside a pattern the registers are named \#, where # is the number of the register (\1, \2, …, counted by opening parenthesis); after a successful match the same text is available as $1, $2, ….
     • Ex. DE DO DO DO DE DA DA DA
           IS ALL I WANT TO SAY TO YOU
       /(D[AEO].)*/ will match the first line
       /(D[AEO])(.D[AEO])\2\2\s\1(.D[AEO])\3\3/ matches it more specifically
       This pattern also matches strings like DA DE DE DE DA DO DO DO
     • \s matches a whitespace character
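A minimal check of the register example follows. The pattern is written with no spaces between the groups and backreferences, because a space in a Perl pattern is literal. Groups 2 and 3 begin with . and so already capture the space in front of each syllable. The third test string and the helper remove_doubled_word are illustrative additions, not from the slides.

```perl
use strict;
use warnings;

# Groups 1-3 save what they match; \1, \2, \3 require that same text again.
my $refrain = qr/(D[AEO])(.D[AEO])\2\2\s\1(.D[AEO])\3\3/;

for my $s ('DE DO DO DO DE DA DA DA',
           'DA DE DE DE DA DO DO DO',
           'DE DO DA DO DE DA DA DA') {
    if ($s =~ $refrain) {
        # After a successful match the registers are available as $1, $2, $3.
        print "match:    '$s'  (\$1='$1' \$2='$2' \$3='$3')\n";
    } else {
        print "no match: '$s'\n";
    }
}

# Registers can also be reused in a replacement, e.g. to collapse a doubled word.
sub remove_doubled_word {
    my ($text) = @_;
    $text =~ s/\b(\w+) \1\b/$1/g;
    return $text;
}

print remove_doubled_word('it was the the best'), "\n";   # it was the best
```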

  7. For more information
     • Perl regular expressions tutorial (perlretut)
       http://perldoc.perl.org/perlretut.html
     • Perl regular expressions reference page (perlre)
       http://perldoc.perl.org/perlre.html

  8. Eliza
     • Published by Weizenbaum in 1966
     • Modelled a Rogerian therapist
     • Had no intelligence: it worked by pattern matching and replacement
     • Convinced some people that it really understood!
     • Demo at http://chayden.net/eliza/Eliza.shtml
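To make "pattern matching and replacement" concrete, here is a toy responder in the Eliza style. The rules, responses and helper names (swap_pronouns, respond) are hypothetical illustrations, not Weizenbaum's original script. Each rule saves part of the input in a register, swaps pronouns in that text with a substitution, and splices it into a canned reply.

```perl
use strict;
use warnings;

# Swap first-person words for second-person ones in a saved fragment.
sub swap_pronouns {
    my ($text) = @_;
    my %swap = (i => 'you', me => 'you', my => 'your', am => 'are');
    $text =~ s/\b(i|me|my|am)\b/$swap{lc $1}/gie;   # /e: evaluate the hash lookup
    return $text;
}

# Hypothetical toy rules: [pattern, response builder]. Each builder receives
# the text saved by the pattern's registers, after pronoun swapping.
my @rules = (
    [ qr/\bI am (.*)/i,         sub { "Why do you say you are $_[0]?" } ],
    [ qr/\bI feel (.*)/i,       sub { "How long have you felt $_[0]?" } ],
    [ qr/\b(mother|father)\b/i, sub { 'Tell me more about your family.' } ],
);

sub respond {
    my ($input) = @_;
    $input =~ s/[.!?]+$//;                 # drop trailing punctuation
    for my $rule (@rules) {
        my ($re, $reply) = @$rule;
        if (my @saved = $input =~ $re) {   # list of saved registers on success
            return $reply->(map { swap_pronouns($_) } @saved);
        }
    }
    return 'Please go on.';                # default when no rule fires
}

print respond('I am sad about my job.'), "\n";
print respond('My mother cooks for me.'), "\n";
print respond('The weather is nice.'), "\n";
```

Rules are tried in order and the first match wins, so more specific patterns should come first.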

  9. Wordcount program
     • The Unix word count program (wc) counts lines, words and characters
     • Determining counts and probabilities of words has many applications:
         • augmentative communication
         • context-sensitive spelling error correction
         • speech recognition
         • handwriting recognition
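The three wc counts can be approximated with regular expressions. This is a sketch under stated assumptions: wc_counts is a hypothetical helper, a "line" is a newline character (as wc counts it), a "word" is a maximal run of non-whitespace, and characters are counted with length.

```perl
use strict;
use warnings;

# Return (lines, words, characters) for a string, roughly as `wc` reports them.
sub wc_counts {
    my ($text) = @_;
    my $lines = () = $text =~ /\n/g;    # count newline characters
    my $words = () = $text =~ /\S+/g;   # count runs of non-whitespace
    return ($lines, $words, length $text);
}

my ($l, $w, $c) = wc_counts("the cat sat\non the mat\n");
print "$l $w $c\n";   # 2 6 23
```

The `() =` idiom runs the match in list context and then counts the returned matches.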

  10. Counting words in a corpus (preview)
    #!/usr/bin/perl
    # FROM Perl BOOK, PAGE 39
    use strict;
    use warnings;

    $/ = "";    # Enable paragraph mode: each read returns one paragraph.
    # Multi-line matching, if needed, is requested per pattern with /m
    # (the old $* variable is no longer supported by Perl).

    # Now read each paragraph and split into words. Record each
    # instance of a word in the %wordcount hash (associative array).
    my %wordcount;
    my $total = 0;    # number of word tokens seen
    while (<>) {
        s/-\n//g;       # Dehyphenate words hyphenated across lines.
        s/<s>//g;       # Remove <s> markers.
        tr/A-Z/a-z/;    # Canonicalize to lowercase.
        my @words = split(/\W*\s+\W*/, $_);
        foreach my $word (@words) {
            next if $word eq '';    # Skip empty fields (e.g. from leading whitespace).
            $wordcount{$word}++;    # Increment the entry.
            $total++;
        }
    }

    # Now print out all the entries in the %wordcount hash.
    foreach my $word (sort keys %wordcount) {
        printf "(%8.6f%%) %20s occurs %3d time(s)\n",
            100 * $wordcount{$word} / $total, $word, $wordcount{$word};
    }
    printf "Total number of words is %d.\n", $total;
    printf "Total number of distinct words is %d.\n", scalar keys %wordcount;
