More Xkwic and Tgrep

More Xkwic and Tgrep. LING 5200 Computational Corpus Linguistics Martha Palmer March 2, 2006. Resources – Laura is bugging me to make a CU Corpora page…. Like this http://www.stanford.edu/dept/linguistics/corpora/cas-home.html

Presentation Transcript


  1. More Xkwic and Tgrep LING 5200 Computational Corpus Linguistics Martha Palmer March 2, 2006

  2. Resources – Laura is bugging me to make a CU Corpora page… • Like this http://www.stanford.edu/dept/linguistics/corpora/cas-home.html • TGREP http://www.stanford.edu/dept/linguistics/corpora/cas-tut-tgrep.html BASED on Kevin Cohen’s LING 5200

  3. Searching with pos tags and ! • [word = "[tT]he" & !( pos = "DT" ) ]; wsj • [ !(word = "water" | pos = "NN")]; • [ !(word = "water") & !( pos = "NN")]; • [ word != "water" & pos != "NN" ];
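
The last three queries on this slide should select exactly the same tokens, because by De Morgan's law `!(A | B)` is equivalent to `!A & !B`. The Python sketch below checks this on a made-up corpus of (word, pos) pairs. It models each CQP token condition as a predicate and is not CQP itself; the helper `full` encodes an assumption that a CQP regex has to match the whole attribute value, not just part of it.

```python
from __future__ import annotations

import re

# Toy corpus of (word, pos) tokens, made up for illustration.
# "the/NNP" is a deliberately odd tag so that the first query finds something.
tokens = [
    ("The", "DT"), ("water", "NN"), ("is", "VBZ"), ("cold", "JJ"),
    ("water", "VB"), ("the", "NNP"), ("rain", "NN"),
]


def full(pattern: str, value: str) -> bool:
    """Assumed CQP behaviour: the regex has to match the entire attribute value."""
    return re.fullmatch(pattern, value) is not None


# One predicate per query on the slide; w = word, p = pos.
queries = {
    # [word = "[tT]he" & !(pos = "DT")]
    "the_not_DT": lambda w, p: full("[tT]he", w) and not full("DT", p),
    # [!(word = "water" | pos = "NN")]
    "not_or": lambda w, p: not (full("water", w) or full("NN", p)),
    # [!(word = "water") & !(pos = "NN")]
    "and_of_nots": lambda w, p: (not full("water", w)) and (not full("NN", p)),
    # [word != "water" & pos != "NN"]  (literal strings, so != is plain inequality)
    "not_equal": lambda w, p: w != "water" and p != "NN",
}


def run(name: str) -> list[int]:
    """Corpus positions of the tokens that satisfy the named query."""
    return [i for i, (w, p) in enumerate(tokens) if queries[name](w, p)]


for name in queries:
    print(name, run(name))
```

On this toy corpus the three negated queries all return positions 0, 2, 3 and 5, while the first query finds only the oddly tagged "the" at position 5.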

  4. Operator precedence • The precedence of the (logical) operators is defined by the following list: if operator x is listed before operator y, x has precedence over (binds more tightly than) y. Operators are evaluated from left to right. • =, !=, !, &, | • Thus [ ! word = "water" & ! pos = "NN" ]; is read as • [ !(word = "water") & !( pos = "NN")];
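
Python's own operators happen to follow the same ordering as the list on this slide (comparisons, then `not`, then `and`, then `or`), so a short Python sketch can illustrate the disambiguation. The word and tag values are made up, and `misread` is a hypothetical wrong grouping included only for contrast.

```python
from itertools import product

# Made-up attribute values to test every combination against.
words = ["water", "rain"]
tags = ["NN", "VB"]


def bare(w: str, p: str) -> bool:
    # [ ! word = "water" & ! pos = "NN" ] written without parentheses;
    # Python parses this as (not (w == "water")) and (not (p == "NN")).
    return not w == "water" and not p == "NN"


def explicit(w: str, p: str) -> bool:
    # [ !(word = "water") & !(pos = "NN") ]
    return (not (w == "water")) and (not (p == "NN"))


def misread(w: str, p: str) -> bool:
    # A wrong grouping, !(word = "water" & !(pos = "NN")), for contrast only.
    return not (w == "water" and not (p == "NN"))


pairs = list(product(words, tags))
same = all(bare(w, p) == explicit(w, p) for w, p in pairs)
differs = [(w, p) for w, p in pairs if bare(w, p) != misread(w, p)]
print(same, differs)
```

The unparenthesised form agrees with the explicit one for every word/tag pair, while the wrong grouping disagrees on both NN combinations.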

  5. Searching sequences with | and ? • "Bill" [pos = "NP"]; • [pos = "NP"] [pos = "NP"] [pos = "NP"]; • ([pos = "NP"] [pos = "NP"]) | ([pos = "NP"] "of" [pos = "NP"]); • ([pos = "NP"] "of"? [pos = "NP"]); Note: First match applies
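
One way to see how `|` and `?` work over token sequences is to render each token as `word/POS` and emulate the queries with Python regular expressions. This is a sketch on a made-up corpus, not how CQP is implemented: Python's regex engine tries alternatives left to right and applies `?` greedily, and CQP's own choice among competing matches is not modelled here.

```python
from __future__ import annotations

import re

# Toy tagged corpus (made up); each token is rendered as "word/POS " so that
# a token-sequence query can be emulated with an ordinary regular expression.
tokens = [("Bank", "NP"), ("of", "IN"), ("England", "NP"), ("and", "CC"),
          ("Bill", "NP"), ("Clinton", "NP"), ("spoke", "VBD")]
text = "".join(f"{w}/{p} " for w, p in tokens)

NP = r"[^ /]+/NP "      # one token tagged NP:    [pos = "NP"]
OF = r"of/[^ /]+ "      # the word "of", any tag:  "of"
START = r"(?<!\S)"      # only start a match at a token boundary

# ([pos = "NP"] [pos = "NP"]) | ([pos = "NP"] "of" [pos = "NP"])
alternation = re.compile(START + f"(?:{NP}{NP}|{NP}{OF}{NP})")
# [pos = "NP"] "of"? [pos = "NP"]
optional = re.compile(START + f"{NP}(?:{OF})?{NP}")


def matches(rx: re.Pattern) -> list[list[str]]:
    """Non-overlapping matches, each split back into its tokens."""
    return [m.group().split() for m in rx.finditer(text)]


print(matches(alternation))
print(matches(optional))
```

Both formulations return "Bank of England" and "Bill Clinton" on this input.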

  6. Corpus Position: wild cards and contexts • "give" []* "up"; • "give" []{0,5} "up"; • "give" []* "up" within 7; • "Clinton" expand to 5; • "Clinton" expand left to 5; • "Clinton" expand right to 5;
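
The difference between `[]*` and `[]{0,5}` is easiest to see on a concrete token list. In the hypothetical Python sketch below, `give_up` pairs each "give" with the nearest following "up" (a simplifying assumption), and `expand` mimics widening a match by a number of tokens. Assuming `within 7` limits the whole match to 7 tokens, `"give" []* "up" within 7` behaves like the `{0,5}` version: "give", at most 5 tokens, then "up".

```python
from __future__ import annotations

# Toy corpus (made up for illustration).
tokens = ("we will give the plan up but never give in to "
          "the critics who gave it up").split()


def give_up(toks: list[str], max_gap: int) -> list[tuple[int, int]]:
    """Sketch of "give" []{0,max_gap} "up": for each "give", report the
    nearest following "up" with at most max_gap tokens in between."""
    hits = []
    for i, w in enumerate(toks):
        if w != "give":
            continue
        # j - i - 1 intervening tokens must be <= max_gap
        for j in range(i + 1, min(len(toks), i + max_gap + 2)):
            if toks[j] == "up":
                hits.append((i, j))
                break
    return hits


def expand(toks: list[str], start: int, end: int, left: int = 0, right: int = 0) -> list[str]:
    """Sketch of "expand left/right to N": widen a match by N tokens."""
    return toks[max(0, start - left): end + right + 1]


print(give_up(tokens, 5))            # like "give" []{0,5} "up"
print(give_up(tokens, len(tokens)))  # like unrestricted "give" []* "up"
print(expand(tokens, 2, 5, left=2))  # like "expand left to 2"
```

With a gap limit of 5 only "give the plan up" is found; with no limit, the second "give" is also paired with the distant "up" after "gave it", which is exactly the kind of spurious match the limit prevents.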

  7. Assignments and Intersect • Q1 = "rain"; • Q2 = [pos="NN"]; • intersect Q1 Q2; • Q1 = [pos = "JJ"] [pos = "NN"]; • Q2 = "acid" "rain"; • intersect Q1 Q2; • [word = "acid" & pos = "JJ"] [word = "rain" & pos = "NN"];
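
Here is a Python sketch of what `intersect` does, under the assumption that it keeps the matches that occur with identical start and end positions in both query results. The corpus and the helpers `find`, `word` and `pos` are made up for illustration.

```python
from __future__ import annotations

# Toy (word, pos) corpus, made up for illustration.
tokens = [("acid", "JJ"), ("rain", "NN"), ("and", "CC"), ("acid", "NN"),
          ("rain", "NN"), ("will", "MD"), ("rain", "VB")]


def find(conds) -> set[tuple[int, int]]:
    """All (start, end) ranges where token start+k satisfies conds[k]."""
    n = len(conds)
    return {(i, i + n - 1) for i in range(len(tokens) - n + 1)
            if all(cond(*tokens[i + k]) for k, cond in enumerate(conds))}


def word(x):  # "x", i.e. [word = "x"]
    return lambda w, p: w == x


def pos(x):   # [pos = "x"]
    return lambda w, p: p == x


q1 = find([word("rain")])                # Q1 = "rain";
q2 = find([pos("NN")])                   # Q2 = [pos="NN"];
print(sorted(q1 & q2))                   # intersect Q1 Q2;

q1 = find([pos("JJ"), pos("NN")])        # Q1 = [pos="JJ"] [pos="NN"];
q2 = find([word("acid"), word("rain")])  # Q2 = "acid" "rain";
both = find([lambda w, p: w == "acid" and p == "JJ",
             lambda w, p: w == "rain" and p == "NN"])
print(sorted(q1 & q2), sorted(both))
```

Intersecting the "JJ NN" and "acid rain" results gives the same single match as the combined query on the last line of the slide; the noun-tagged "acid rain" at positions 3 and 4 is excluded.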

  8. Structural restrictions • "give" []* "up" within s; • ("gain" []* "profit") | ("profit" []* "gain") within 3 s; • ("gain" []* "profit") | ("profit" []* "gain") within article; • "Clinton" expand left to 2 s;
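
Structural restrictions rely on sentence boundaries recorded in the corpus. The Python sketch below is a hypothetical model: each token carries its sentence number, and a match may span at most `max_sents` sentences, which is our assumed reading of `within N s`. Pairing each "give" with the nearest "up" is again a simplification.

```python
# Toy corpus split into sentences (made up); each token keeps its sentence number.
sentences = [
    "they give the plan".split(),
    "up was the answer".split(),
    "we give nothing up".split(),
]
tokens = [(w, s) for s, sent in enumerate(sentences) for w in sent]


def give_up_within(toks, max_sents: int):
    """Sketch of "give" []* "up" within N s: the match may span at most
    max_sents sentences (N = 1 corresponds to plain "within s")."""
    hits = []
    for i, (w, s) in enumerate(toks):
        if w != "give":
            continue
        for j in range(i + 1, len(toks)):
            w2, s2 = toks[j]
            if s2 - s + 1 > max_sents:  # would cross too many sentence boundaries
                break
            if w2 == "up":
                hits.append((i, j))
                break
    return hits


print(give_up_within(tokens, 1))  # "within s"
print(give_up_within(tokens, 2))  # "within 2 s"
```

Within one sentence only the last "give ... up" is found; allowing two sentences also accepts the pair that straddles the first sentence boundary.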

  9. Defining structural restrictions • Nounphrase = [pos = "DT"] [pos = "JJ"] [pos = "NN"]; • Nounphrase; • [pos = "JJ"] • Go back to select

  10. For fun • <s> [pos = "V.*"][pos = "PN.*"] </s> • <s> []* [pos = "V.*"][pos = "PN.*"] </s> • ( [pos = "V.*"] [pos = "PN.*"]) within s • Not a question, not beginning of sentence…
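
`<s>` and `</s>` anchor a pattern to the start and end of a sentence. The Python sketch below compares the three queries on tag sequences. The tags assume a tagset whose verb tags start with V and pronoun tags with PN (such as the BNC's C5 tags); the sentences are made up and punctuation is left out. In a real corpus a final full stop would typically sit between the pronoun and `</s>` and block the first two queries.

```python
from __future__ import annotations

import re

# Made-up tag sequences, one list per sentence (punctuation omitted).
sentences = [
    ["VVB", "PNP"],                 # "help me"
    ["PNP", "VVD", "PNP"],          # "I saw him"
    ["PNP", "VVD", "PNP", "AV0"],   # "I saw him yesterday"
    ["AT0", "NN1", "VVD"],          # "the dog barked"
]


def ok(pattern: str, tag: str) -> bool:
    """Whole-tag regex match, as assumed for the earlier CQP sketches."""
    return re.fullmatch(pattern, tag) is not None


def verb_pronoun(tags: list[str]) -> list[int]:
    """Positions i where tags[i] matches V.* and tags[i+1] matches PN.*."""
    return [i for i in range(len(tags) - 1)
            if ok("V.*", tags[i]) and ok("PN.*", tags[i + 1])]


def whole(tags: list[str]) -> bool:     # <s> [pos="V.*"] [pos="PN.*"] </s>
    return len(tags) == 2 and verb_pronoun(tags) == [0]


def final(tags: list[str]) -> bool:     # <s> []* [pos="V.*"] [pos="PN.*"] </s>
    return len(tags) - 2 in verb_pronoun(tags)


def anywhere(tags: list[str]) -> bool:  # ([pos="V.*"] [pos="PN.*"]) within s
    return bool(verb_pronoun(tags))


for tags in sentences:
    print(tags, whole(tags), final(tags), anywhere(tags))
```

Each query is strictly looser than the one before: only "help me" fills a whole sentence, "I saw him" also ends in verb plus pronoun, and "I saw him yesterday" matches only the unanchored query.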

  11. less is more • less <filename> • cat ??/* | less • Commands inside less • SPACE – next screenful • b – previous screenful • /<reg exp pattern> – search forward for pattern • ?<reg exp pattern> – search backward for pattern • q – quit

  12. Searching for a word • tgrep Halloween – what happens? • Why don’t you have to specify a file? The default corpus is set by the TGREP_CORPUS environment variable in .cshrc: babel> grep tgrep .cshrc prints # tgrep stuff, #setenv TGREP_CORPUS /corpora/treebank2/tbl_075/tgrepabl/brwn_cmb.crp (commented out) and setenv TGREP_CORPUS /corpora/treebank2/tgrepabl/wsj_mrg.crp • Count results: tgrep research | wc -l • cat ??/* | grep Halloween | wc -l
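
Because `wc -l` counts lines, `grep Halloween | wc -l` reports how many lines mention the word, not how many times it occurs. Here is a small Python sketch of the difference, using made-up lines in place of the corpus files; `grep_count` and `occurrence_count` are hypothetical helper names.

```python
import re


def grep_count(lines, pattern):
    """What `grep PATTERN | wc -l` reports: the number of lines with a match."""
    rx = re.compile(pattern)
    return sum(1 for line in lines if rx.search(line))


def occurrence_count(lines, pattern):
    """The number of (non-overlapping) matches, which can be larger."""
    rx = re.compile(pattern)
    return sum(len(rx.findall(line)) for line in lines)


# Made-up text lines standing in for the corpus files.
lines = [
    "Halloween sales rose , and Halloween costumes sold out .",
    "No holiday was mentioned here .",
    "Retailers expect a strong Halloween .",
]

print(grep_count(lines, "Halloween"), occurrence_count(lines, "Halloween"))
```

Here the line count is 2 although the word occurs 3 times, so treat `grep | wc -l` figures as a lower bound on token frequency.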

  13. Tgrep Switches • -a Match on all patterns in a sentence • -w Return the whole sentence • -n Put the entire string on one line • -t Print only the terminals

  14. Viewing it in sentential context • tgrep -wn Halloween | more • tgrep -wn research | more (20,865 hits) • Can also use less

  15. Viewing it in sentential context • tgrep -wn research | more

  16. Searching by POS • tgrep NNS | more • Another way to do your sanity check

  17. See more data? • tgrep NNS | grep . | more (grep . drops blank lines, so more matches fit on each screen)

  18. Sentential context (again) • tgrep -wn NNS | more

  19. Searching by syntactic constituent • tgrep NP | more

  20. Single-line outputs • tgrep -n NP | more

  21. Viewing tree-like output • tgrep -w NP | head -20

  22. Searching for relations between nodes • tgrep 'NP < CC' | head -16

  23. tgrep -g (whole language) • A < B – A immediately dominates B • A > B – A is immediately dominated by B • A << B – A dominates B • A >> B – A is dominated by B • A . B – A immediately precedes B • A .. B – A precedes B • A <<, B – B is a leftmost descendant of A • A <<' B – B is a rightmost descendant of A
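
To make the relations on this slide concrete, here is a Python sketch that implements each operator over a small hand-built tree. It is not tgrep's code, and several choices are assumptions worth checking against the tgrep documentation: the tree is made up, labels are matched exactly, precedence is modelled over the span of terminals each node covers, and `<<,` / `<<'` follow the chain of first or last children.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False: nodes compare by identity
class Node:
    label: str
    children: list[Node] = field(default_factory=list)
    parent: Node | None = field(default=None, repr=False)
    start: int = 0  # index of the first terminal the node covers
    end: int = 0    # index of the last terminal the node covers


def build(spec, parent=None, counter=None):
    """Build a tree from nested tuples; plain strings are terminals (words)."""
    counter = [0] if counter is None else counter
    if isinstance(spec, str):
        node = Node(spec, parent=parent, start=counter[0], end=counter[0])
        counter[0] += 1
        return node
    label, *kids = spec
    node = Node(label, parent=parent)
    node.children = [build(k, node, counter) for k in kids]
    node.start, node.end = node.children[0].start, node.children[-1].end
    return node


def descendants(n):
    for c in n.children:
        yield c
        yield from descendants(c)


def edge_path(n, idx):
    """Nodes reached by repeatedly taking the first (idx=0) or last (idx=-1) child."""
    path = []
    while n.children:
        n = n.children[idx]
        path.append(n)
    return path


RELATIONS = {
    "<": lambda a, b: b.parent is a,                            # A immediately dominates B
    ">": lambda a, b: a.parent is b,                            # A is immediately dominated by B
    "<<": lambda a, b: any(d is b for d in descendants(a)),     # A dominates B
    ">>": lambda a, b: any(d is a for d in descendants(b)),     # A is dominated by B
    ".": lambda a, b: a.end + 1 == b.start,                     # A immediately precedes B
    "..": lambda a, b: a.end < b.start,                         # A precedes B
    "<<,": lambda a, b: any(d is b for d in edge_path(a, 0)),   # B is a leftmost descendant of A
    "<<'": lambda a, b: any(d is b for d in edge_path(a, -1)),  # B is a rightmost descendant of A
}


def search(root, a_label, op, b_label):
    """All (A span, B span) pairs where A op B holds; labels must match exactly."""
    everything = [root, *descendants(root)]
    return [((a.start, a.end), (b.start, b.end))
            for a in everything if a.label == a_label
            for b in everything if b.label == b_label and RELATIONS[op](a, b)]


tree = build(("S", ("NP", ("DT", "the"), ("NN", "dog")),
                   ("VP", ("VBD", "saw"), ("NP", ("PRP", "it")))))
print(search(tree, "S", "<", "NP"))   # only the subject NP
print(search(tree, "S", "<<", "NP"))  # subject and object NP
```

The contrast between the two printed searches is the difference between `<` and `<<`: S immediately dominates only the subject NP, but dominates both NPs.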

  24. Alternation • node names can be ORed, e.g. • tgrep 'Clinton|Gore' | head

  25. Character classes • Regular expressions • tgrep '/[Cc]hild/' | egrep . | head

  26. Working towards that weird example… • tgrep '/[Pp]resident/' | head

  27. Combining alternation and a regular expression • tgrep 'Clinton|Gore|/[Pp]resident/' | head
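
The node-name syntax from the last few slides (exact names, `|` alternation, and a regular expression written between slashes) can be sketched as a small Python matcher. `node_matcher` is a hypothetical helper; it assumes that a `/regex/` may match anywhere inside a label, so `/[Pp]resident/` also catches "presidential".

```python
import re


def node_matcher(spec: str):
    """Predicate for a tgrep-style node name such as 'Clinton|Gore|/[Pp]resident/'.
    Sketch only: alternatives are split on '|', so a '|' inside a /regex/ is not
    supported, and a /regex/ is searched for anywhere in the label (unanchored)."""
    tests = []
    for alt in spec.split("|"):
        if len(alt) >= 2 and alt.startswith("/") and alt.endswith("/"):
            rx = re.compile(alt[1:-1])
            tests.append(lambda lab, rx=rx: rx.search(lab) is not None)
        else:
            tests.append(lambda lab, name=alt: lab == name)
    return lambda lab: any(test(lab) for test in tests)


# Made-up node labels to filter.
labels = ["Clinton", "Gore", "Bush", "president", "Presidents",
          "presidential", "children", "Child", "NP"]

for spec in ["Clinton|Gore", "/[Cc]hild/", "Clinton|Gore|/[Pp]resident/"]:
    match = node_matcher(spec)
    print(spec, [lab for lab in labels if match(lab)])
```

Without the opening slash, `[Pp]resident/` would be treated as a literal name and match nothing, which is why both slashes matter in the combined pattern.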

  28. Searching for a transitive verb • tgrep -w 'VP << like < NP << DT' | more
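
As we understand tgrep's convention, every relation in `'VP << like < NP << DT'` is tested against the first node (the VP), so the DT need not be inside the NP; parentheses, as in the `kick` queries on slide 29, attach a relation to another node. The Python sketch below contrasts the two readings on made-up trees; `loose`, `strict` and the tuple encoding of trees are hypothetical.

```python
# Trees as nested tuples: (label, child, ...); plain strings are words.
def label(t):
    return t if isinstance(t, str) else t[0]


def children(t):
    return [] if isinstance(t, str) else list(t[1:])


def subtrees(t):
    """t itself and every node below it."""
    yield t
    for c in children(t):
        yield from subtrees(c)


def descendants(t):
    """Every node strictly below t."""
    for c in children(t):
        yield from subtrees(c)


def has_desc(t, lab):
    return any(label(d) == lab for d in descendants(t))


def loose(tree):
    """'VP << like < NP << DT': all three relations are tested against the VP."""
    return [vp for vp in subtrees(tree) if label(vp) == "VP"
            and has_desc(vp, "like")
            and any(label(c) == "NP" for c in children(vp))
            and has_desc(vp, "DT")]


def strict(tree):
    """'VP << like < (NP << DT)': the DT must sit inside that NP child."""
    return [vp for vp in subtrees(tree) if label(vp) == "VP"
            and has_desc(vp, "like")
            and any(label(c) == "NP" and has_desc(c, "DT") for c in children(vp))]


t1 = ("S", ("NP", ("PRP", "I")),
      ("VP", ("VBP", "like"), ("NP", ("DT", "the"), ("NN", "idea"))))
t2 = ("S", ("NP", ("PRP", "they")), ("VP", ("VBP", "like"), ("NP", ("PRP", "it"))))
t3 = ("S", ("NP", ("PRP", "I")),
      ("VP", ("VBP", "like"), ("NP", ("PRP", "it")),
       ("PP", ("IN", "at"), ("NP", ("DT", "the"), ("NN", "beach")))))

for t in (t1, t2, t3):
    print(len(loose(t)), len(strict(t)))
```

"I like it at the beach" satisfies the loose reading only because the DT sits inside the PP, which shows why the query on the slide can return some sentences that are not "like + determiner-initial object".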

  29. Verbs + Particles • tgrep -w 'VP << kick' > kick • tgrep 'VP << /kick.*/ <2 PRT' kick • tgrep 'VP <1 VB <2 PRT' kick • tgrep -nw 'VP <1 /VB.*/ <2 PRT' kick • tgrep 'VP <1 (VB < kick) <2 PRT' kick • tgrep 'VP <1 (/VB.*/ < kick) <2 PRT' kick
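
The ordinal relations `<1` and `<2` (first and second child) can be sketched in Python over made-up VP trees. `verb_particle` is a hypothetical helper that emulates `'VP <1 (VERB < WORD) <2 PRT'`; as in the earlier sketches, a `/.../` name is assumed to be an unanchored regex and a bare name an exact match.

```python
import re


def label(t):
    return t if isinstance(t, str) else t[0]


def children(t):
    return [] if isinstance(t, str) else list(t[1:])


def subtrees(t):
    yield t
    for c in children(t):
        yield from subtrees(c)


def name_ok(pattern, lab):
    """Exact node name, or /regex/ searched anywhere in the label (assumed)."""
    if len(pattern) >= 2 and pattern[0] == pattern[-1] == "/":
        return re.search(pattern[1:-1], lab) is not None
    return lab == pattern


def verb_particle(corpus, verb="/VB.*/", word="kick"):
    """Sketch of 'VP <1 (VERB < WORD) <2 PRT': the VP's first child is the verb,
    which immediately dominates the word, and its second child is a PRT."""
    hits = []
    for tree in corpus:
        for vp in subtrees(tree):
            kids = children(vp)
            if label(vp) != "VP" or len(kids) < 2:
                continue
            first, second = kids[0], kids[1]
            if (name_ok(verb, label(first))
                    and any(name_ok(word, label(c)) for c in children(first))
                    and label(second) == "PRT"):
                hits.append(vp)
    return hits


# Made-up VPs: "kick off the season", "kicked off the game", "kick the ball".
corpus = [
    ("VP", ("VBP", "kick"), ("PRT", ("RP", "off")), ("NP", ("DT", "the"), ("NN", "season"))),
    ("VP", ("VBD", "kicked"), ("PRT", ("RP", "off")), ("NP", ("DT", "the"), ("NN", "game"))),
    ("VP", ("VB", "kick"), ("NP", ("DT", "the"), ("NN", "ball"))),
]

print(len(verb_particle(corpus)))                 # 'VP <1 (/VB.*/ < kick) <2 PRT'
print(len(verb_particle(corpus, verb="VB")))      # 'VP <1 (VB < kick) <2 PRT'
print(len(verb_particle(corpus, word="/kick.*/")))  # variant with /kick.*/ as the word
```

Only the VBP tree matches the slide's last query; requiring the tag to be exactly VB removes it, and loosening the word to `/kick.*/` adds "kicked".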
