LING 581: Advanced Computational Linguistics • Lecture Notes, January 19th
Administrivia • New room: Shantz 338 • (I have asked Jennifer Columbus to investigate a refund; however, I'm told it may not happen) • [Images: Marshall 480; Shantz 338]
Penn Treebank • Availability • Source: Linguistic Data Consortium (LDC) • The University of Arizona is a (fee-paying) member of this consortium • Resources are made available to the community through the main library • URL: http://sabio.library.arizona.edu/search/X
Penn Treebank (V3) • Call Record
Penn Treebank • Tagging Guide • Arpa94 paper • Parse Guide
Penn Treebank • Sections 00-24
tregex • Tregex is a Tgrep2-style utility for matching patterns in trees • Written in Java • Launch with the run-tregex-gui.command shell script • -mx flag: the 300m default memory size may need to be increased, depending on the platform
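If the shell script is unavailable, or the heap size needs changing, the GUI can also be started by hand. Below is a minimal Python sketch that assembles such a `java` command line with a chosen `-mx` value.

The jar name `stanford-tregex.jar` and the main class `edu.stanford.nlp.trees.tregex.gui.TregexGUI` are assumptions about the Stanford distribution, not details from these notes. Check them against your own download.

```python
import shlex

# Assumptions (not from the lecture notes): jar file name and GUI main class
# as shipped in the Stanford Tregex download; verify against your copy.
TREGEX_JAR = "stanford-tregex.jar"
GUI_CLASS = "edu.stanford.nlp.trees.tregex.gui.TregexGUI"


def tregex_gui_command(memory="300m", jar=TREGEX_JAR):
    """Return the argv list that starts the Tregex GUI with a given JVM heap.

    -mx sets the JVM's maximum heap size. 300m mirrors the default in the
    notes; raise it (e.g. "1g") if your platform needs more memory.
    """
    return ["java", f"-mx{memory}", "-cp", jar, GUI_CLASS]


if __name__ == "__main__":
    # Print a shell-ready command; subprocess.run(tregex_gui_command("1g")) would launch it.
    print(shlex.join(tregex_gui_command("1g")))
```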
tregex • Select the PTB directory: TREEBANK_3/parsed/mrg/wsj/ • Browse • Deselect any unwanted files
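The same file selection can be scripted, for example to work with only some of sections 00-24. Here is a minimal Python sketch. It assumes the usual LDC layout, with two-digit section directories under TREEBANK_3/parsed/mrg/wsj/ (e.g. 00/wsj_0001.mrg). The function name `select_mrg_files` is hypothetical.

```python
from __future__ import annotations

from pathlib import Path

# Assumed LDC layout, e.g. TREEBANK_3/parsed/mrg/wsj/00/wsj_0001.mrg,
# with two-digit section directories 00 through 24.
WSJ_ROOT = Path("TREEBANK_3/parsed/mrg/wsj")


def select_mrg_files(root: Path, sections: set[str] | None = None) -> list[Path]:
    """List .mrg files under root, optionally keeping only some sections.

    sections=None keeps every file; otherwise a file survives only if its
    section directory name (e.g. "02") is in `sections`. This is the scripted
    analogue of deselecting files in the tregex file panel.
    """
    files = sorted(root.glob("*/*.mrg"))
    if sections is None:
        return files
    return [f for f in files if f.parent.name in sections]


if __name__ == "__main__":
    # Example: sections 02-21, the conventional parser-training split.
    train = select_mrg_files(WSJ_ROOT, {f"{n:02d}" for n in range(2, 22)})
    print(len(train), "files")
```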
tregex • Search
tregex • Help
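The help screens summarize tregex's relation operators. The sketch below makes the five relations used in the patterns that follow concrete: `<`, `<<`, `<,`, `<-` and `$+`. It is a minimal Python illustration, not tregex's own implementation, and works over Penn Treebank bracket notation parsed into nested lists. The list encoding `[label, child, ...]`, the parser and all function names are hypothetical choices; only the meaning of each relation follows the tregex documentation.

```python
from __future__ import annotations

import re


def parse(s: str) -> list:
    """Parse bracketed text into nested lists [label, child, ...]; words become [word]."""
    stack = [["ROOT"]]
    for tok in re.findall(r"\(|\)|[^\s()]+", s):
        if tok == "(":
            stack.append([])
        elif tok == ")":
            node = stack.pop()
            if not node or not isinstance(node[0], str):
                node.insert(0, "")  # unlabeled wrapper "( ... )" as in .mrg files
            stack[-1].append(node)
        elif not stack[-1]:
            stack[-1].append(tok)    # first token after "(" is the label
        else:
            stack[-1].append([tok])  # a word becomes a childless leaf node
    return stack[0][1]


def label(n: list) -> str:
    return n[0]


def kids(n: list) -> list:
    return n[1:]


def nodes(n: list):
    """Yield n and all its descendants in pre-order."""
    yield n
    for c in kids(n):
        yield from nodes(c)


# Tregex relations, each read as "A rel B"; nodes are compared by identity.
def imm_dominates(a, b):          # A < B
    return any(c is b for c in kids(a))


def dominates(a, b):              # A << B
    return any(c is b or dominates(c, b) for c in kids(a))


def has_first_child(a, b):        # A <, B : B is the first child of A
    cs = kids(a)
    return bool(cs) and cs[0] is b


def has_last_child(a, b):         # A <- B : B is the last child of A
    cs = kids(a)
    return bool(cs) and cs[-1] is b


def imm_left_sister(root, a, b):  # A $+ B : A is the immediate left sister of B
    """Sisterhood needs the parent, so scan every node under root for an adjacent pair."""
    return any(x is a and y is b
               for p in nodes(root)
               for x, y in zip(kids(p), kids(p)[1:]))
```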
tregex • Pattern: • (@NP <, (@NP $+ (/,/ $+ (@NP $+ /,/=comma))) <- =comma)
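Read left to right, this pattern asks for an `@NP` whose first child (`<,`) is an `@NP`. That NP is followed by a chain of immediate right sisters (`$+`): a comma, an `@NP`, and a second comma. The second comma, named `=comma`, must also be the outer NP's last child (`<-`). So the outer NP has exactly four children, `NP , NP ,`, which is the appositive shape.

Below is a minimal Python sketch of the same check. It assumes a nested-list tree encoding and approximates tregex's `@` basic category by cutting the label at the first `-` or `=`. Helper names are hypothetical.

```python
from __future__ import annotations

import re


def parse(s: str) -> list:
    """Parse bracketed text into nested lists [label, child, ...]; words become [word]."""
    stack = [["ROOT"]]
    for tok in re.findall(r"\(|\)|[^\s()]+", s):
        if tok == "(":
            stack.append([])
        elif tok == ")":
            node = stack.pop()
            if not node or not isinstance(node[0], str):
                node.insert(0, "")  # unlabeled wrapper "( ... )" as in .mrg files
            stack[-1].append(node)
        elif not stack[-1]:
            stack[-1].append(tok)    # first token after "(" is the label
        else:
            stack[-1].append([tok])  # a word becomes a childless leaf node
    return stack[0][1]


def label(n: list) -> str:
    return n[0]


def kids(n: list) -> list:
    return n[1:]


def nodes(n: list):
    """Yield n and all its descendants in pre-order."""
    yield n
    for c in kids(n):
        yield from nodes(c)


def basic(lab: str) -> str:
    """Rough stand-in for tregex's @ basic category: NP-SBJ-1 -> NP; -NONE- kept whole."""
    return lab if lab.startswith("-") else re.split(r"[-=]", lab)[0]


def appositive_nps(root: list) -> list[list]:
    """Nodes matching (@NP <, (@NP $+ (/,/ $+ (@NP $+ /,/=comma))) <- =comma).

    The sister chain starts at the first child and ends at the last child,
    so the outer @NP must have exactly four children: @NP, comma, @NP, comma.
    """
    hits = []
    for n in nodes(root):
        cs = [label(c) for c in kids(n)]
        if (basic(label(n)) == "NP" and len(cs) == 4
                and basic(cs[0]) == "NP" and "," in cs[1]   # /,/ is a regex: label contains ","
                and basic(cs[2]) == "NP" and "," in cs[3]):
            hits.append(n)
    return hits
```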
tregex • Different results from: • @SBAR < /^WH.*-([0-9]+)$/#1%index << (@NP < (/^-NONE-/ < /^\*T\*-([0-9]+)$/#1%index))
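This pattern links two indices. `#1%index` stores regex group 1 of the WH label in the variable `index`, and its second use requires group 1 of the `*T*` trace to take the same value. Both relations hang off the SBAR: it immediately dominates (`<`) the co-indexed WH phrase, and it dominates (`<<`) an `@NP` that immediately dominates a `-NONE-` node over the trace.

Below is a minimal Python sketch of the same logic. As before, it assumes a nested-list encoding and a simplified `@` basic category. Function names are hypothetical, and this is an illustration rather than how tregex evaluates patterns.

```python
from __future__ import annotations

import re


def parse(s: str) -> list:
    """Parse bracketed text into nested lists [label, child, ...]; words become [word]."""
    stack = [["ROOT"]]
    for tok in re.findall(r"\(|\)|[^\s()]+", s):
        if tok == "(":
            stack.append([])
        elif tok == ")":
            node = stack.pop()
            if not node or not isinstance(node[0], str):
                node.insert(0, "")  # unlabeled wrapper "( ... )" as in .mrg files
            stack[-1].append(node)
        elif not stack[-1]:
            stack[-1].append(tok)    # first token after "(" is the label
        else:
            stack[-1].append([tok])  # a word becomes a childless leaf node
    return stack[0][1]


def label(n: list) -> str:
    return n[0]


def kids(n: list) -> list:
    return n[1:]


def nodes(n: list):
    """Yield n and all its descendants in pre-order."""
    yield n
    for c in kids(n):
        yield from nodes(c)


def basic(lab: str) -> str:
    """Rough stand-in for tregex's @ basic category: SBAR-ADV -> SBAR; -NONE- kept whole."""
    return lab if lab.startswith("-") else re.split(r"[-=]", lab)[0]


WH = re.compile(r"^WH.*-([0-9]+)$")       # same regexes as in the tregex pattern
TRACE = re.compile(r"^\*T\*-([0-9]+)$")


def wh_trace_sbars(root: list) -> list[tuple[list, list[str]]]:
    """Return (SBAR node, shared indices) for SBARs with a co-indexed WH phrase and *T* trace."""
    hits = []
    for sbar in nodes(root):
        if basic(label(sbar)) != "SBAR":
            continue
        # "<": indices on WH phrases that are immediate children of the SBAR
        wh_ids = {m.group(1) for c in kids(sbar) if (m := WH.match(label(c)))}
        # "<<": indices on *T* traces under an @NP anywhere strictly below the SBAR
        trace_ids = set()
        for child in kids(sbar):
            for np in nodes(child):
                if basic(label(np)) != "NP":
                    continue
                for none in kids(np):
                    if label(none).startswith("-NONE-"):
                        for leaf in kids(none):
                            if (m := TRACE.match(label(leaf))):
                                trace_ids.add(m.group(1))
        shared = sorted(wh_ids & trace_ids)  # %index: both captured groups must agree
        if shared:
            hits.append((sbar, shared))
    return hits
```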
tregex • Example: WHADVP is also possible (not just WHNP)
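The WH part of the pattern is only a regular expression, `^WH.*-([0-9]+)$`. It therefore accepts any indexed label that begins with WH, so WHADVP matches as well as WHNP. A quick Python check of that regex follows; the wrapper function `wh_index` is hypothetical.

```python
from __future__ import annotations

import re

# The WH regex from the tregex pattern, unchanged.
WH_INDEXED = re.compile(r"^WH.*-([0-9]+)$")


def wh_index(label: str) -> str | None:
    """Return the co-index captured by group 1, or None if the label does not match."""
    m = WH_INDEXED.match(label)
    return m.group(1) if m else None


for lab in ["WHNP-1", "WHADVP-2", "WHNP", "WDT"]:
    print(lab, wh_index(lab))
```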
Ungraded Homework Exercise • Search for NP trace relative clauses, as defined below • Be ready to compare your search pattern and the number of matches found next time in class