LING 581: Advanced Computational Linguistics Lecture Notes January 23rd
Administrivia • Let us assume: • Installed Penn Treebank v3 • Downloaded and installed tregex under MacOSX or Linux (possibly inside VirtualBox)
Trees in the Penn Treebank • Notation: LISP S-expression • S-EXP = (LABEL S-EXP … S-EXP) or S-EXP = (LABEL WORD)
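The recursive S-EXP definition above maps directly onto a small recursive reader. The following is a minimal sketch, not part of tregex or the Treebank tools: the function name `parse_sexp` and the nested-list output format are assumptions made for illustration, and the sketch assumes every bracket carries a label (real .mrg files may wrap each tree in an extra unlabeled bracket, which is not handled here).

```python
import re

def parse_sexp(text):
    """Parse one Penn Treebank style S-expression into nested lists.

    (LABEL WORD) becomes ["LABEL", "WORD"];
    (LABEL S-EXP ... S-EXP) becomes ["LABEL", child, ..., child].
    """
    # Tokens are "(", ")", or any run of non-space, non-bracket characters.
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def read():
        nonlocal pos
        assert tokens[pos] == "(", "an S-EXP must start with '('"
        pos += 1
        node = [tokens[pos]]              # the LABEL
        pos += 1
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                node.append(read())       # nested S-EXP
            else:
                node.append(tokens[pos])  # a WORD
                pos += 1
        pos += 1                          # consume ")"
        return node

    return read()

tree = parse_sexp("(S (NP-SBJ (NNP John)) (VP (VBD slept)))")
print(tree)
```

Nested lists keep the example short; a real tool would more likely use a node class with parent links so that relations such as "sister of" are cheap to compute.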
tregex • tregex is a tgrep2-style utility for matching patterns in trees • run-tregex-gui.command (on MacOSX) • select the PTB directory, e.g. TREEBANK_3/parsed/mrg/wsj/ • Browse Trees
tregex • Search • (NP-SBJ << NNP): NP-SBJ dominates an NNP • (NP-SBJ < NNP): NP-SBJ immediately dominates an NNP
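To make the difference between the two searches concrete, here is a minimal sketch of "dominates" (<<) versus "immediately dominates" (<) on a hand-built toy tree. The tree, the nested-list representation, and all function names are assumptions for illustration; this is not how tregex itself is implemented.

```python
# Toy tree as nested lists: [LABEL, child, ...]; leaves are plain strings.
tree = ["S",
        ["NP-SBJ",
         ["NP", ["NNP", "John"]],
         ["PP", ["IN", "from"], ["NP", ["NNP", "Tucson"]]]],
        ["VP", ["VBD", "slept"]]]

def children(node):
    """Child nodes only (words are skipped)."""
    return [c for c in node[1:] if isinstance(c, list)]

def descendants(node):
    """All nodes strictly below node, depth first."""
    for child in children(node):
        yield child
        yield from descendants(child)

def subtrees(node):
    yield node
    yield from descendants(node)

def immediately_dominates(node, label):   # tregex: node < label
    return any(c[0] == label for c in children(node))

def dominates(node, label):               # tregex: node << label
    return any(d[0] == label for d in descendants(node))

subjects = [n for n in subtrees(tree) if n[0] == "NP-SBJ"]
n_dominates = len([n for n in subjects if dominates(n, "NNP")])
n_immediate = len([n for n in subjects if immediately_dominates(n, "NNP")])
print(n_dominates, n_immediate)
```

In this toy tree the subject contains proper nouns only further down, so the << search finds it and the < search does not.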
Penn Tagset Recap • Part-of-speech (POS) tags • http://www.americannationalcorpus.org/OANC/penn.html
Penn Tagset Recap • Syntactic tagset: • (from The Penn Treebank: An overview, Taylor, Marcus & Santorini)
tregex: relations Help
tregex: labels • Help • /regex/ with anchors: ^, $ • __ matches any node • @NP matches NP, NP-SBJ etc. • S < NP < VP means S < VP AND S < NP (every relation in a chain applies to the first node) • Note: node grouping () vs. relation grouping []
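The effect of the anchors and of @ can be tried out on a handful of labels with ordinary Python regular expressions. This is only a sketch: `basic_category` is a hypothetical helper that approximates @ by cutting the label at the first "-" or "=", and unanchored matching via `re.search` is assumed to mirror how a /regex/ label behaves.

```python
import re

labels = ["NP", "NP-SBJ", "NP-TMP", "NNP", "WHNP-1", "-NONE-", "VP"]

def basic_category(label):
    """Approximate '@': drop function tags and indices after '-' or '='."""
    if label.startswith("-"):      # keep special labels such as -NONE- whole
        return label
    return re.split(r"[-=]", label)[0]

unanchored = [l for l in labels if re.search(r"NP", l)]      # like /NP/
prefix     = [l for l in labels if re.search(r"^NP", l)]     # like /^NP/
exact      = [l for l in labels if re.search(r"^NP$", l)]    # like /^NP$/
at_np      = [l for l in labels if basic_category(l) == "NP"]  # like @NP

print(unanchored)
print(prefix)
print(exact)
print(at_np)
```

Without anchors the pattern also picks up NNP and WHNP-1, which is usually not what is wanted when searching for noun phrases.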
tregex: operators • Help • 1. NP < NN | < NNS • 2. NP > S & $++ VP (& redundant) • 3. NP [ < NN | < NNS ] & > S (note: square brackets) • 4. VP < VV | < NP $ NP equiv. to VP [ < VV | [ < NP & $ NP ] ] • 5. NP !< NNP • 6. NP < !NNP|NNS
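The grouping in the operator examples follows the familiar rule that AND binds tighter than OR, which happens to be the same rule Python uses for `and` and `or`. The sketch below restates two patterns as Boolean functions; the flag names and the ungrouped variant `NP < NN | < NNS & > S` are hypothetical, introduced only to show why the square brackets matter.

```python
def matches_grouped(has_nn, has_nns, under_s):
    # NP [ < NN | < NNS ] & > S
    return (has_nn or has_nns) and under_s

def matches_ungrouped(has_nn, has_nns, under_s):
    # NP < NN | < NNS & > S   (hypothetical variant; & binds tighter than |)
    return has_nn or (has_nns and under_s)

# An NP that has an NN child but is not directly under S.
case = dict(has_nn=True, has_nns=False, under_s=False)
grouped_result = matches_grouped(**case)
ungrouped_result = matches_ungrouped(**case)
print(grouped_result, ungrouped_result)
```

The two readings disagree exactly when the NP has an NN child but is not under S, so bracket placement changes the frequency counts a search reports.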
tregex • Help • @NP matches NP, NP-SBJ, NP-PRD, NP-TMP etc. • Matches: 432,777 anywhere…
tregex: names • Help • Similar to backreferences in Perl regexes • (@NP <, (@NP $+ (/,/ $+ (@NP $+ /,/=comma))) <- =comma)
tregex: names • Both occurrences of =comma refer to the same node • Pattern: (@NP <, (@NP $+ (/,/ $+ (@NP $+ /,/=comma))) <- =comma) • Key: <, first child; $+ immediate left sister; <- last child
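The point of naming the node is that the second =comma must be the very same node as the first, not merely another comma. A minimal sketch of that check on toy trees follows; the trees and helper names are assumptions for illustration, /,/ is approximated by an exact "," label, and @ is approximated by cutting the label at the first "-".

```python
def basic(label):
    """Approximate '@': strip function tags after the first '-'."""
    return label if label.startswith("-") else label.split("-")[0]

def matches(node):
    """(@NP <, (@NP $+ (/,/ $+ (@NP $+ /,/=comma))) <- =comma)"""
    kids = [c for c in node[1:] if isinstance(c, list)]
    if basic(node[0]) != "NP" or len(kids) < 4:
        return False
    first, c1, second, comma = kids[0], kids[1], kids[2], kids[3]
    # "<," anchors the chain at the first child; "$+" walks right one sister at a time.
    chain = (basic(first[0]) == "NP" and c1[0] == ","
             and basic(second[0]) == "NP" and comma[0] == ",")
    # "<- =comma": the last child must be the same node named =comma.
    return chain and kids[-1] is comma

appositive = ["NP",
              ["NP", ["NNP", "Smith"]], [",", ","],
              ["NP", ["DT", "the"], ["NN", "chairman"]], [",", ","]]
longer = appositive + [["PP", ["IN", "of"], ["NP", ["NNP", "Acme"]]]]

print(matches(appositive), matches(longer))
```

The identity test (`is`) is what plays the role of the backreference: in the longer tree the comma chain still matches, but the named comma is no longer the last child.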
tregex: links • Help ADJP=cat <, ~cat <- ~cat
tregex: variable groups • Help @SBAR < /^WH.*-([0-9]+)$/#1%index << (__=empty < (/^-NONE-/ < /^\*T\*-([0-9]+)$/#1%index))
tregex: variable groups • Different results from: • @SBAR < /^WH.*-([0-9]+)$/#1%index << (@NP < (/^-NONE-/ < /^\*T\*-([0-9]+)$/#1%index))
tregex: variable groups Example: WHADVP also possible (not just WHNP)
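The #1%index notation ties capture group 1 of two different label regexes to one variable, so both must capture the same string. The sketch below reproduces just that string-level check in Python using the two regexes from the patterns above; the function name `coindexed` and the sample labels are assumptions for illustration, and the tree-structure part of the pattern is left out.

```python
import re

WH = re.compile(r"^WH.*-([0-9]+)$")        # e.g. WHNP-1, WHADVP-2
TRACE = re.compile(r"^\*T\*-([0-9]+)$")    # e.g. *T*-1

def coindexed(wh_label, trace_word):
    """True when both regexes match and their first groups are equal."""
    wh, tr = WH.search(wh_label), TRACE.search(trace_word)
    return bool(wh and tr and wh.group(1) == tr.group(1))

results = [
    coindexed("WHNP-1", "*T*-1"),     # same index
    coindexed("WHADVP-2", "*T*-2"),   # WHADVP is matched by ^WH.* as well
    coindexed("WHNP-1", "*T*-2"),     # indices differ
    coindexed("WHNP", "*T*-1"),       # WH label carries no index
]
print(results)
```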
Treebank Guides Tagging Guide Arpa94 paper Parse Guide
Treebank Guides • Part-of-speech (POS) Tagging Guide, tagguid1.pdf (34 pages) • tagguid2.pdf: addendum, see POS tag ‘TO’
Treebank Guides • Parsing guide 1, prsguid1.pdf (318 pages) • prsguid2.pdf: addendum for the Switchboard corpus
Homework Exercise • Report your regex search expression and frequency counts for NPs that have various classes of relative clauses attached • (prsguid1.pdf, section 4.2.2, pg. 63) • Criteria: • Relative clauses are adjoined to the head noun phrase • The relative pronoun is (1) given the appropriate WH-label, (2) put inside the SBAR level, and (3) coindexed with a *T* in the position of a gap
Homework Exercise • Report and document your search string and frequency counts for different categories: • Tensed relatives: subject relative clauses vs. non-subject relative clauses; that-relatives vs. wh-word relatives vs. zero relatives • Infinitival relatives: subject relative clauses vs. non-subject relative clauses • Submit a snapshot of a tree from the WSJ for each of the categories
Homework Exercise • Put everything in one PDF file • Submit by email before class next time • Be prepared to come up and explain your searches to the class