Why NLP Needs Theoretical Syntax (It in Fact Already Uses It)


Presentation Transcript


1. Why NLP Needs Theoretical Syntax (It in Fact Already Uses It)
Owen Rambow
Center for Computational Learning Systems, Columbia University, New York City
rambow@ccls.columbia.edu

2. Key Issue: Representation
• Aravind Joshi to statisticians (adapted): “You know how to count, but we tell you what to count”
• Linguistic representations are not naturally occurring!
• They are devised by linguists
• Example: English Penn Treebank (see the sketch below)
  • Beatrice Santorini (thesis: historical syntax of Yiddish)
  • Lots of linguistic theory went into the PTB
  • The PTB annotation manual is a comprehensive descriptive grammar of English
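To make the slide’s point concrete, here is a minimal sketch (not from the talk; it assumes only an NLTK installation and uses an invented toy bracketing) of how theory-laden a PTB analysis is: the category inventory, the NP-SBJ function tag, and the tree shape are all annotator decisions.

    # A sketch using NLTK; the toy bracketing is invented, not from the PTB.
    from nltk import Tree

    # Every label here (S, NP-SBJ, VP, the POS tags) reflects a design
    # decision by the PTB annotators -- the representation is not raw data.
    ptb = ("(S (NP-SBJ (DT The) (NN treebank)) "
           "(VP (VBZ encodes) (NP (JJ linguistic) (NN theory))))")
    tree = Tree.fromstring(ptb)

    tree.pretty_print()               # the analysis the annotators chose
    for rule in tree.productions():   # the CFG rules implicit in that choice
        print(rule)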

3. What Sort of Representations for Syntax?
• Syntax: links between text and meaning
• Text consists of words -> lexical models
  • Lexicalized formalisms
  • Note: bi- and monolexical versions of CFG
• Need to link to meaning (for example, PropBank)
  • Extended domain of locality to locate predicate-argument structure
  • Note: importance of dashtags etc. in PTB II (see the sketch below)
• Tree Adjoining Grammar! (but CCG is also cool, and LFG has its own appeal)
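As a hedged illustration of the dashtag point (the helper and toy sentence below are hypothetical), PTB-II function tags such as -SBJ are exactly what lets a program start recovering predicate-argument structure from the bracketing:

    # Why PTB-II dashtags matter: the -SBJ function tag distinguishes the
    # subject NP from other NPs. Helper and toy tree are illustration only.
    from nltk import Tree

    def function_tagged_phrases(tree):
        """Yield (function_tag, phrase) pairs for nodes labeled like NP-SBJ."""
        for subtree in tree.subtrees():
            category, _, tag = subtree.label().partition("-")
            if tag:
                yield tag, " ".join(subtree.leaves())

    t = Tree.fromstring(
        "(S (NP-SBJ (NNS linguists)) (VP (VBD devised) (NP (DT the) (NN treebank))))"
    )
    print(list(function_tagged_phrases(t)))   # [('SBJ', 'linguists')]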

4. Why isn’t everyone using TAG?
• The PTB is not annotated with a TAG
• Need to do linguistic interpretation on the PTB to extract a TAG (Chen 2001, Xia 2001)
• This is not surprising: all linguistic representations need to be interpreted (Rambow 2010)
• Extraction of a (P)CFG is simple and requires little interpretation
• Extraction of a bilexical (P)CFG is not: it requires head percolation, which is interpretation (see the sketch below)
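The head-percolation step can be sketched as follows, in the spirit of Magerman/Collins-style head rules; the tiny head table is invented for illustration and is nothing like a complete rule set.

    # Head percolation: to lexicalize a CFG rule, pick which child carries
    # the head word. HEAD_TABLE below is a toy stand-in for real head rules.
    from nltk import Tree

    HEAD_TABLE = {              # parent category -> preferred head children
        "S":  ["VP"],
        "VP": ["VBZ", "VBD", "VB", "VP"],
        "NP": ["NN", "NNS", "NP"],
    }

    def head_word(tree):
        """Percolate the head word up from the leaves."""
        if isinstance(tree, str):                     # bare token
            return tree
        if len(tree) == 1 and isinstance(tree[0], str):
            return tree[0]                            # preterminal
        for cat in HEAD_TABLE.get(tree.label(), []):
            for child in tree:
                if isinstance(child, Tree) and child.label() == cat:
                    return head_word(child)
        return head_word(tree[-1])                    # fallback: rightmost child

    t = Tree.fromstring("(S (NP (NNS parsers)) (VP (VBZ need) (NP (NNS heads))))")
    print(head_word(t))   # 'need' -- the clause is headed by its verb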

5. Why isn’t everyone using TAG Parsers?
• Unclear how well they are performing
  • Phrase-structure (Parseval) evaluation is irrelevant
• MICA parser (Bangalore et al. 2009):
  • high 80s on a linguistically motivated predicate-argument dependency structure (see the evaluation sketch below)
  • MALT does slightly better on the same representation
  • But MICA output comes fully interpreted; MALT output does not
• Once we have a good syntactic pred-arg structure, tasks like semantic role labeling (PropBank) are easier
  • 95% on arguments given gold pred-arg structure (Chen and Rambow 2002)
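For concreteness, here is a minimal sketch of the kind of dependency evaluation the slide has in mind instead of Parseval: score each token’s predicted (head, relation) pair against gold. The toy data and relation labels are invented.

    # Toy predicate-argument dependency evaluation: token -> (head, relation).
    # Data and labels are invented; real scoring would read parser output.
    gold = {1: (2, "ARG0"), 2: (0, "ROOT"), 3: (2, "ARG1")}
    pred = {1: (2, "ARG0"), 2: (0, "ROOT"), 3: (1, "ARG1")}

    uas = sum(pred[t][0] == head for t, (head, _) in gold.items()) / len(gold)
    las = sum(pred[t] == g for t, g in gold.items()) / len(gold)
    print(f"unlabeled attachment: {uas:.2f}, labeled: {las:.2f}")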

6. What Have We Learned About TAG Parsing?
• Large TAG grammars are not easy to manage computationally (MICA: 5,000 trees, 1,200 used in parsing; see the sketch below)
• Small TAG grammars lose too much information
• Need to investigate:
  • Dynamic creation of TAG grammars (trees created in response to need) (note: LTAG-spinal, Shen 2006)
  • “Bushes”: underspecified trees
  • Metagrammars (Kinyon 2003)
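One way to picture the size problem (everything below is invented, with template names loosely echoing XTAG-style conventions): extracted elementary-tree templates tend to be Zipf-distributed, so a few templates cover most parse decisions while thousands occur rarely.

    # Invented counts to illustrate why a 5,000-tree extracted grammar can
    # have roughly 1,200 trees doing nearly all of the work in parsing.
    from collections import Counter

    template_uses = Counter({
        "alpha_nx0Vnx1": 90_000,   # transitive clause tree (hypothetical name)
        "alpha_nx0V":    40_000,   # intransitive clause tree
        "beta_An":       35_000,   # adjective auxiliary tree
        "alpha_rare_1":  3,        # ...plus a very long tail of rare templates
        "alpha_rare_2":  1,
    })

    total = sum(template_uses.values())
    top3 = sum(count for _, count in template_uses.most_common(3))
    print(f"top 3 of {len(template_uses)} templates cover {top3 / total:.1%} of uses")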

7. What about All Those Other Languages?
• Can’t do treebanks for 3,000 languages
• Need to understand cross-linguistic variation and use that understanding in computational models
• Cross-linguistic variation: theoretical syntax
• Models: NLP
• Link: metagrammars for TAG (see the sketch below)
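To gesture at what the metagrammar link could look like in code (the parameter set, word-order table, and notation are all invented for illustration, in the spirit of Kinyon 2003): describe cross-linguistic variation with a few parameters and generate elementary-tree skeletons from them, rather than hand-writing a tree inventory per language.

    # A loose metagrammar sketch: derive transitive-clause skeletons from a
    # word-order parameter instead of listing trees per language by hand.
    WORD_ORDERS = {
        "SVO": ["NP0", "V", "NP1"],
        "SOV": ["NP0", "NP1", "V"],
        "VSO": ["V", "NP0", "NP1"],
    }

    def transitive_skeleton(order):
        """Return a bracketed clause skeleton for one word-order setting."""
        return "(S " + " ".join(WORD_ORDERS[order]) + ")"

    for language, order in [("English", "SVO"), ("Hindi", "SOV"), ("Irish", "VSO")]:
        print(f"{language}: {transitive_skeleton(order)}")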

8. Summary
• Treebanks already encode insights from theoretical syntax
• They require interpretation for non-trivial models
• Applications other than Parseval require richer representations (and richer evaluations)
• But English is probably not the right language to argue for the need for richer syntactic knowledge
• The real coming bottleneck: NLP for 3,000 languages
