Why NLP Needs Theoretical Syntax (It in Fact Already Uses It)
Owen Rambow
Center for Computational Learning Systems
Columbia University, New York City
rambow@ccls.columbia.edu
Key Issue: Representation
• Aravind Joshi to statisticians (adapted): “You know how to count, but we tell you what to count”
• Linguistic representations are not naturally occurring!
  • They are devised by linguists
• Example: the English Penn Treebank (PTB), illustrated below
  • Beatrice Santorini (thesis: historical syntax of Yiddish)
  • Lots of linguistic theory went into the PTB
  • The PTB annotation manual is a comprehensive descriptive grammar of English
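To see how much devising is involved, here is a minimal sketch in Python (using NLTK's Tree class; the sentence and its bracketing are made up for illustration, not taken from the PTB):

```python
# A minimal sketch using NLTK's Tree class. The bracketing below is
# merely PTB-II style (note the -SBJ function tag), not an actual
# treebank sentence.
from nltk import Tree

ptb_style = "(S (NP-SBJ (NNP Santorini)) (VP (VBD wrote) (NP (DT the) (NN manual))))"
tree = Tree.fromstring(ptb_style)

tree.pretty_print()   # draw the devised structure
print(tree.leaves())  # the words are the only naturally occurring part:
                      # ['Santorini', 'wrote', 'the', 'manual']
```

Everything above the leaves (the node labels, the attachments, the -SBJ tag) is theory put there by annotators.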
What Sort of Representations for Syntax?
• Syntax: links between text and meaning
• Text consists of words -> lexical models
  • Lexicalized formalisms
  • Note: bi- and monolexical versions of CFG
• Need to link to meaning (for example, PropBank)
  • Extended domain of locality to locate predicate-argument structure
  • Note: importance of dash tags etc. in PTB-II (see the sketch below)
• Tree Adjoining Grammar! (but CCG is also cool, and LFG has its own appeal)
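The point about dash tags can be made concrete. A hedged sketch: even a crude reading of the -SBJ function tag recovers a predicate-argument link. The tree and the extract_subjects helper are invented for illustration; PropBank-style labeling is far richer:

```python
# A hedged sketch: recover crude predicate-argument links from the
# PTB-II -SBJ function (dash) tag. Illustrative only; real
# PropBank-style role labeling is far richer.
from nltk import Tree

def extract_subjects(tree):
    """Return (verb, subject head) pairs using the -SBJ dash tag."""
    pairs = []
    for s in tree.subtrees(lambda t: t.label() == "S"):
        kids = [c for c in s if isinstance(c, Tree)]
        subj = next((c for c in kids if c.label().startswith("NP-SBJ")), None)
        vp = next((c for c in kids if c.label().startswith("VP")), None)
        if subj is not None and vp is not None:
            verb = next(t for t in vp.subtrees() if t.label().startswith("VB"))
            pairs.append((verb[0], subj.leaves()[-1]))
    return pairs

t = Tree.fromstring("(S (NP-SBJ (NNP Kim)) (VP (VBD slept)))")
print(extract_subjects(t))  # [('slept', 'Kim')]
```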
Why Isn’t Everyone Using TAG?
• The PTB is not annotated with a TAG
• Need to do linguistic interpretation on the PTB to extract a TAG (Chen 2001, Xia 2001)
• This is not surprising: all linguistic representations need to be interpreted (Rambow 2010)
• Extraction of a (P)CFG is simple and requires little interpretation
• Extraction of a bilexical (P)CFG is not: it requires head percolation, which is interpretation (sketched below)
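Head percolation is itself a small but theory-laden piece of code, which is exactly the point: every table entry encodes a linguistic decision. A minimal sketch, loosely in the style of the Magerman/Collins head rules (the HEAD_RULES table below has a few illustrative entries, not the full rule set):

```python
# A minimal sketch of head percolation, loosely in the style of the
# Magerman/Collins head rules. HEAD_RULES is illustrative, not the
# full rule set; every entry is a piece of linguistic interpretation.
from nltk import Tree

HEAD_RULES = {
    "S":  ("right", ["VP", "S"]),                 # clauses headed by VP
    "VP": ("left",  ["VBD", "VBZ", "VB", "VP"]),  # VPs headed by the verb
    "NP": ("right", ["NN", "NNS", "NNP", "NP"]),  # NPs headed by the noun
}

def lexical_head(tree):
    """Percolate lexical heads bottom-up; return the head word."""
    if isinstance(tree[0], str):          # preterminal, e.g. (NN dog)
        return tree[0]
    direction, priorities = HEAD_RULES.get(tree.label(), ("left", []))
    kids = list(tree) if direction == "left" else list(reversed(tree))
    for cat in priorities:                # try categories in priority order
        for child in kids:
            if child.label().startswith(cat):
                return lexical_head(child)
    return lexical_head(kids[0])          # fallback: edge-most child

t = Tree.fromstring("(S (NP (NNP Kim)) (VP (VBD saw) (NP (DT a) (NN dog))))")
print(lexical_head(t))  # -> 'saw'
```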
Why Isn’t Everyone Using TAG Parsers?
• Unclear how well they perform
  • Phrase-structure (Parseval) evaluation is irrelevant
• MICA parser (Bangalore et al. 2009):
  • High 80s on a linguistically motivated predicate-argument dependency structure (attachment-score evaluation is sketched below)
  • MALT does slightly better on the same representation
  • But MICA’s output comes fully interpreted; MALT’s does not
• Once we have a good syntactic pred-arg structure, tasks like semantic role labeling (PropBank) become easier
  • 95% on arguments given a gold pred-arg structure (Chen and Rambow 2002)
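For readers unused to dependency evaluation, here is what scores like those above measure, in sketch form: unlabeled attachment over head indices. The one-head-index-per-token format is an assumption of this sketch, and this is not the evaluation code actually used for MICA or MALT:

```python
# A minimal sketch of unlabeled attachment score (UAS): the fraction of
# tokens whose predicted head matches the gold head. Illustrative only;
# not the evaluation code used for MICA or MALT.

def attachment_score(gold_heads, pred_heads):
    assert len(gold_heads) == len(pred_heads)
    correct = sum(g == p for g, p in zip(gold_heads, pred_heads))
    return correct / len(gold_heads)

# "Kim saw a dog": head index per token, 1-based, 0 = root.
gold = [2, 0, 4, 2]          # Kim->saw, saw->root, a->dog, dog->saw
pred = [2, 0, 2, 2]          # parser wrongly attached "a" to "saw"
print(attachment_score(gold, pred))  # -> 0.75
```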
What Have We Learned About TAG Parsing?
• A large TAG grammar is not easy to manage computationally (MICA: 5,000 trees, 1,200 used in parsing)
• Small TAG grammars lose too much information
• Need to investigate:
  • Dynamic creation of TAG grammars (trees created in response to need) (note: LTAG-spinal, Shen 2006)
  • “Bushes”: underspecified trees
  • Metagrammars (Kinyon 2003)
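For orientation, a toy sketch of the objects a TAG parser manipulates: elementary trees combined by substitution. Adjunction, the operation that gives TAG its extra expressive power (and its computational cost), is omitted, and all names here are invented:

```python
# A toy sketch of TAG elementary trees and substitution. Adjunction is
# omitted for brevity; all names are invented for illustration.
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)
    subst: bool = False                  # substitution site, e.g. NP↓

def substitute(tree, label, initial):
    """Plug an initial tree into the first matching substitution site."""
    for i, child in enumerate(tree.children):
        if child.subst and child.label == label:
            tree.children[i] = initial
            return True
        if substitute(child, label, initial):
            return True
    return False

def leaves(n):
    return [n.label] if not n.children else [w for c in n.children for w in leaves(c)]

# Elementary tree for a transitive verb: (S NP↓ (VP (V saw) NP↓))
saw = Node("S", [Node("NP", subst=True),
                 Node("VP", [Node("V", [Node("saw")]),
                             Node("NP", subst=True)])])
substitute(saw, "NP", Node("NP", [Node("Kim")]))
substitute(saw, "NP", Node("NP", [Node("a"), Node("dog")]))
print(leaves(saw))  # -> ['Kim', 'saw', 'a', 'dog']
```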
What About All Those Other Languages?
• Can’t do treebanks for 3,000 languages
• Need to understand cross-linguistic variation and use that understanding in computational models
• Cross-linguistic variation: theoretical syntax
• Models: NLP
• Link: metagrammars for TAG (toy sketch below)
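The metagrammar idea fits in a few lines: instead of writing thousands of elementary trees by hand, generate them by crossing abstract dimensions. A toy sketch with two invented dimensions, subcategorization and a word-order parameter; real metagrammar systems such as Kinyon's are far richer:

```python
# A toy metagrammar sketch: generate TAG-style tree templates by
# crossing abstract dimensions rather than writing each tree by hand.
# Both dimensions and the templates are invented for illustration.
from itertools import product

SUBCAT = {"intransitive": ["NP↓"], "transitive": ["NP↓", "NP↓"]}
WORD_ORDER = ["SVO", "SOV"]              # a toy cross-linguistic parameter

def template(subcat, order):
    subj, *objs = SUBCAT[subcat]
    seq = [subj, "V"] + objs if order == "SVO" else [subj] + objs + ["V"]
    return f"(S {' '.join(seq)})"

for subcat, order in product(SUBCAT, WORD_ORDER):
    print(f"{subcat:12s} {order} -> {template(subcat, order)}")
# e.g. transitive SOV -> (S NP↓ NP↓ V)
```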
Summary
• Treebanks already encode insights from theoretical syntax
• They require interpretation for non-trivial models
• Applications other than Parseval require richer representations (and richer evaluations)
• But English is probably not the right language for arguing the need for richer syntactic knowledge
• The real coming bottleneck: NLP for 3,000 languages