
Regular Expressions: grep








  1. Regular Expressions: grep LING 5200 Computational Corpus Linguistics Martha Palmer

  2. Homework 2 • Bytes • Read path names • ~ not necessary in home directory • Display results of commands if they’re just a few lines. BASED on Kevin Cohen’s LING 5200

  3. Switches • -c print only a count of matching lines (like piping the output to wc -l) • -i ignore the case of the letters in the pattern • -n prefix each matching line with its line number • -v show lines that do NOT match the pattern • grep -i lemma README.english • grep -ic lemma README.english • grep -in lemma README.english
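A minimal sketch of the four switches, assuming a three-line sample typed in with printf in place of README.english (whose contents are not reproduced here). Only standard POSIX grep options are used.

```shell
# Made-up stand-in for README.english.
sample='The lemma file lists each lemma once.
Word forms are stored separately.
LEMMA headings are upper-case here.'

printf '%s\n' "$sample" | grep -i lemma    # lines 1 and 3 (case ignored)
printf '%s\n' "$sample" | grep -ic lemma   # 2: only the count of matching lines
printf '%s\n' "$sample" | grep -in lemma   # the same lines, prefixed "1:" and "3:"
printf '%s\n' "$sample" | grep -v lemma    # lines 2 and 3: without -i, "LEMMA" is not "lemma"
```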

  4. The Chomsky Grammar Hierarchy • Regular grammars, e.g. aabbbb: S → aS | bS | nil • Context-free grammars, e.g. aaabbb: S → aSb | nil • Context-sensitive grammars, e.g. aaabbbccc: xSy → xby • Transformational grammars: Turing Machines
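To make the first two levels concrete, here is a hedged sketch with made-up helper names (regular_lang, anbn_lang). The regular grammar S → aS | bS | nil produces every string of a's and b's, so a single regular expression describes it. The context-free language a^n b^n (aaabbb) needs a counter that no regular expression has, so the sketch borrows one from awk.

```shell
# Hypothetical helper: every string of a's and b's (whole line, -x).
regular_lang() { grep -Ex '(a|b)*'; }

# Hypothetical helper: a*b* alone would accept aabbbb, because it cannot
# insist that the counts match; awk's gsub() returns how many a's it
# replaced, and the line is kept only if a's make up exactly half of it.
anbn_lang() {
  awk '/^a*b*$/ { n = gsub(/a/, "a"); if (2 * n == length($0)) print }'
}

printf 'aabbbb\naaabbb\nabab\nabc\n' | regular_lang   # aabbbb, aaabbb, abab
printf 'aabbbb\naaabbb\nab\n' | anbn_lang              # aaabbb, ab
```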

  5. Movement What did John give to Mary? *Where did John give to Mary? John gave cookies to Mary. John gave <what> to Mary.

  6. Nested Dependencies and Crossing Dependencies • John, Mary and Bill ate peaches, pears and apples, respectively. (CS: crossing) • The dog chased the cat that bit the mouse that ran. (CF) • The mouse the cat the dog chased bit ran. (CF: nested)

  7. Most parsers are Turing Machines • To give a more natural and comprehensible treatment of movement • For a more efficient treatment of features • Not because of respectively – most parsers can’t handle it.

  8. • b*c matches the first character in the string cabbbcde • b*cd matches the third through seventh characters in the string cabbbcdebbbbbbcdbc
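You can watch where these matches fall with two options that GNU grep provides (an assumption worth checking on other systems, since neither is POSIX): -o prints only the matched text, and -b prefixes each match with its 0-based byte offset. Because -o keeps scanning after the first match, it also reports later matches on the same line.

```shell
# GNU grep: -o = print only the match, -b = its 0-based byte offset.
printf 'cabbbcde\n' | grep -ob 'b*c'
# 0:c        first match: b* matches nothing, c is the first character
# 2:bbbc     a later match on the same line
printf 'cabbbcdebbbbbbcdbc\n' | grep -ob 'b*cd'
# 2:bbbcd    offsets 2-6, i.e. the third through seventh characters
# 8:bbbbbbcd
```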

  9. Character classes: ranges • All upper-case letters, all lower-case letters, all letters, any digit from 0 to 9… • [A-Z] • [a-z] • [A-Za-z] • [0-9] • Practice!
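A small practice sketch on made-up lines. Setting LC_ALL=C is a precaution: ranges are interpreted according to the locale, and in some locales [a-z] can take in characters you would not expect, while in the C locale it means exactly the ASCII letters a through z.

```shell
sample='Treebank
wsj
0564
Section-23'
printf '%s\n' "$sample" | LC_ALL=C grep '[A-Z]'          # Treebank, Section-23
printf '%s\n' "$sample" | LC_ALL=C grep -x '[a-z]*'      # wsj: lower-case only
printf '%s\n' "$sample" | LC_ALL=C grep -x '[0-9]*'      # 0564: digits only
printf '%s\n' "$sample" | LC_ALL=C grep -vx '[A-Za-z]*'  # 0564, Section-23: not all letters
```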

  10. Character classes: complements • Any character that's not a vowel • [^aeiouAEIOU] • Right after the opening bracket, ^ means "not"
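A sketch of the complement class on made-up words, one per line. With -x the whole line must match, so repeating [^aeiouAEIOU] keeps only words with no vowel letter at all. The second command uses -o (available in GNU and BSD grep, not POSIX) to show the runs of non-vowels inside one word.

```shell
printf 'rhythm\nCat\nNTH\nAeon\n' | grep -x '[^aeiouAEIOU]*'
# rhythm
# NTH
printf 'strength\n' | grep -o '[^aeiouAEIOU][^aeiouAEIOU]*'
# str
# ngth
```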

  11. Anchors • Any line that begins with… • Any line that ends with… • ^T line that begins with T • VBZ$ line that ends with VBZ
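A sketch on made-up "word TAG" lines (the layout is an assumption, not a real corpus format). The last command drops the anchor to show what it was doing: unanchored VBZ also matches a line where the tag text appears at the start.

```shell
sample='The DT
dog NN
runs VBZ
Tom NNP
sleeps VBZ
VBZ marks a verb'
printf '%s\n' "$sample" | grep '^T'       # The DT, Tom NNP
printf '%s\n' "$sample" | grep 'VBZ$'     # runs VBZ, sleeps VBZ
printf '%s\n' "$sample" | grep -c 'VBZ'   # 3: unanchored, it also hits the last line
```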

  12. Quantifiers • One or more… • Zero or more… • One or zero… • a+ one or more “a's” • a* zero or more “a's” • a? one “a”, or nothing • And more…
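A sketch of the three quantifiers on made-up words. + and ? belong to extended regular expressions, so they need egrep or grep -E; * works in both. In POSIX basic syntax (plain grep) + is an ordinary character, which the last line shows.

```shell
words='color
colour
colouur
baa
b'
printf '%s\n' "$words" | grep -Ex 'colou?r'   # color, colour: u is optional
printf '%s\n' "$words" | grep -Ex 'ba+'       # baa: one or more a's
printf '%s\n' "$words" | grep -Ex 'ba*'       # baa, b: zero or more a's
printf 'a+b\nab\n' | grep 'a+b'              # a+b: in plain grep, + is literal
```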

  13. grep/egrep • egrep lets you write x+ instead of xx* • (xxx|yyy) matches either xxx or yyy • In a shell filename pattern (not in egrep), ? matches any single character
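A sketch comparing basic syntax (grep) with extended syntax (grep -E, the same language egrep accepts; GNU grep 3.8 and later print a warning that the egrep command is obsolescent). The \| form in the last line is a GNU extension to basic syntax, not POSIX.

```shell
printf 'ba\nbaa\nb\n' | grep -x 'baa*'     # ba, baa
printf 'ba\nbaa\nb\n' | grep -Ex 'ba+'     # ba, baa: x+ means the same as xx*
printf 'push\npull\npat\n' | grep -E '(push|pull)'   # push, pull
printf 'push\npull\npat\n' | grep 'push\|pull'       # push, pull (GNU only)
```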

  14. Searching the treebank • cat ??/* | egrep -i '(push|pull)[a-z]*'
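A sketch that builds a throwaway stand-in for the treebank layout, assuming section directories with two-character names (00, 01) that hold text files. The directory and file names are made up; the slide does not give the real corpus path. ??/* expands to every file inside every two-character directory, so the notes directory is skipped.

```shell
tmp=$(mktemp -d)
mkdir "$tmp/00" "$tmp/01" "$tmp/notes"
printf 'They pushed the bill.\nNothing here.\n' > "$tmp/00/wsj_0001"
printf 'Investors PULLED out.\n' > "$tmp/01/wsj_0101"
printf 'push and pull\n' > "$tmp/notes/todo"   # "notes" is not two characters long

# The cd happens inside $(...), so the current directory is left unchanged.
hits=$(cd "$tmp" && cat ??/* | grep -Ei '(push|pull)[a-z]*')
printf '%s\n' "$hits"
# They pushed the bill.
# Investors PULLED out.

# With -o the trailing [a-z]* matters: it extends the match to the whole word.
printf '%s\n' "$hits" | grep -Eio '(push|pull)[a-z]*'
# pushed
# PULLED
rm -rf "$tmp"
```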

  15. grep/egrep • grep '^[^a-z]*epl' README.english • grep ' epl' README.english • egrep '^[^a-z]*(epl|epw)' README.english • egrep ' (epl|epw)' README.english • Nice when you have tokenized strings…
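A sketch of how the anchored and space-delimited forms differ, run on four invented lines standing in for README.english (its real contents are not shown here). The anchored form finds lines where the name is the first lower-case text; the space form finds it anywhere after a space but misses it at the very start of a line.

```shell
LC_ALL=C; export LC_ALL   # plain ASCII meaning for [^a-z]
sample='epl    English phonology, lemmas
  epw  English phonology, wordforms
See epl.cd for details
 the epw files are large'
printf '%s\n' "$sample" | grep '^[^a-z]*epl'            # line 1 only
printf '%s\n' "$sample" | grep ' epl'                   # line 3 only
printf '%s\n' "$sample" | grep -E '^[^a-z]*(epl|epw)'   # lines 1 and 2
printf '%s\n' "$sample" | grep -E ' (epl|epw)'          # lines 2, 3 and 4
```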

  16. More grepping • But when you don’t… • /corpora/celex/english/epw/epw.cd • Find all capitalized words • grep '^[0-9][0-9]*.[A-Z]' epw.cd | wc -l
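A sketch on invented lines in an assumed epw.cd-like layout: an ID number, a one-character field separator (a backslash is assumed here), then the word form. The unescaped . matches any single character, so it steps over whatever separator follows the ID. grep -c gives the same count as piping to wc -l.

```shell
LC_ALL=C; export LC_ALL   # plain ASCII meaning for [A-Z]
# Single quotes keep the backslashes literal.
sample='1\a\1
2\Aachen\1
3\Aaron\1
10\abbey\2'
printf '%s\n' "$sample" | grep '^[0-9][0-9]*.[A-Z]'      # the Aachen and Aaron lines
printf '%s\n' "$sample" | grep -c '^[0-9][0-9]*.[A-Z]'   # 2
```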

  17. Exercises – pick a directory • How many 5-letter words? • head -10 wsj_0564 | grep -i ' [a-z][a-z][a-z][a-z][a-z] ' | wc • grep -i ' [a-z][a-z][a-z][a-z][a-z] ' * | wc • Note: these count lines that contain a 5-letter word, not the words themselves
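A sketch of counting words rather than lines, on two invented tokenized lines. The slide's pattern counts lines, and because it needs a space on both sides it also misses a word at the start or end of a line (Three below). Putting one token per line with tr and matching whole lines with -x counts each word once.

```shell
LC_ALL=C; export LC_ALL   # plain ASCII meaning for [a-z]
text='The stock market fell today .
Three large firms moved .'
# Lines containing a space-delimited 5-letter word: 2
printf '%s\n' "$text" | grep -ci ' [a-z][a-z][a-z][a-z][a-z] '
# 5-letter words (stock today Three large firms moved): 6
printf '%s\n' "$text" | tr -s ' ' '\n' | grep -cix '[a-z][a-z][a-z][a-z][a-z]'
```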

  18. Lab (cont.) • Are there any words with no vowels? • grep -i ' [^aeiou][^aeiou]* ' wsj_0564 | wc • grep -i ' [^aeiouy][^aeiouy.]* ' wsj_0564 | wc • grep -i ' [^aeiouy"][^aeiouy."]* ' wsj_0564 • 80%?
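Each refinement on the slide excludes one more kind of non-word (periods, quotes), and a token like 80% has no vowels either. A hedged alternative sketch, on an invented sentence: tr -cs turns every run of non-letters into one newline, so digits, % and punctuation disappear and only alphabetic tokens remain to be tested.

```shell
LC_ALL=C; export LC_ALL   # plain ASCII meaning for A-Za-z
text='Mr. Smith saw 80% of the rhythm .
Psst , try it'
# One alphabetic token per line; then drop tokens that contain a vowel
# (y counted as a vowel, as on the slide) and any empty lines.
printf '%s\n' "$text" | tr -cs 'A-Za-z' '\n' |
  grep -vi -e '[aeiouy]' -e '^$'
# Mr
# Psst
```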

  19. Lab (cont.) • Find “1-syllable” words (words with exactly one vowel): grep -i ' [^aeiouy]*[aeiouy][^aeiouy]* ' • Find “2-syllable” words (words with exactly two vowels) • Delete words ending with a silent “e” from the “2-syllable” list
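One possible sketch of all three steps, on an invented word list with one word per line. It counts vowel letters, as the slide does, so it is only a rough proxy for syllables (rain has two vowel letters but one syllable), and "ends in e" is a rough stand-in for a silent e. The shell variables V and C are just shorthand introduced here.

```shell
LC_ALL=C; export LC_ALL
words='cat
jump
make
robot
home
basket'
V='[aeiouy]'; C='[^aeiouy]'   # vowel / non-vowel classes
# Exactly one vowel letter: cat, jump
printf '%s\n' "$words" | grep -ix "$C*$V$C*"
# Exactly two vowel letters: make, robot, home, basket
printf '%s\n' "$words" | grep -ix "$C*$V$C*$V$C*"
# ...minus words ending in e: robot, basket
printf '%s\n' "$words" | grep -ix "$C*$V$C*$V$C*" | grep -iv 'e$'
```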

  20. Emacs • emacs -nw (run inside the terminal, without a separate window) • Control x, control c – exit • Control x, control s – save • Control x, control v – visit • Apropos
