
Portability, Parallelism and Efficiency in Parsing

Dan Bikel, University of Pennsylvania. March 11th, 2002.


Presentation Transcript


  1. Portability, Parallelism and Efficiency in Parsing
     Dan Bikel, University of Pennsylvania
     March 11th, 2002

  2. Parsing: Where are we now?
     • Pounding away at Penn Treebank, §23
     • Collins (1999): LR 88.0, LP 88.3
     • Charniak (2000): LR 89.6, LP 89.5
     • Collins (2000): LR 89.6, LP 89.9
     • Henderson & Brill (1999) on §22: LR 90.1, LP 92.4
     • Room to grow: new domains, better performance
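The LR and LP figures on this slide are labeled recall and labeled precision. As a rough illustration of how such scores are computed, here is a minimal sketch that assumes each constituent is represented as a hypothetical `(label, start, end)` tuple. The function name and the representation are assumptions made for this example, and standard evaluation tools apply further conventions (for instance around punctuation) that are not modeled here.

```python
from collections import Counter


def labeled_recall_precision(gold, parsed):
    """Return (recall, precision) over labeled constituents.

    gold, parsed: lists of (label, start, end) tuples (an assumed,
    simplified representation of the brackets in a tree).
    """
    gold_counts = Counter(gold)
    parsed_counts = Counter(parsed)
    # Multiset intersection: a bracket counts as matched at most as
    # many times as it occurs in both the gold and the proposed tree.
    matched = sum((gold_counts & parsed_counts).values())
    recall = matched / len(gold) if gold else 0.0
    precision = matched / len(parsed) if parsed else 0.0
    return recall, precision


gold = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("NP", 3, 5)]
parsed = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("PP", 3, 5)]
recall, precision = labeled_recall_precision(gold, parsed)
```

In this toy case three of the four brackets match on each side, so both scores come out to 0.75.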

  3. The Right Architecture for Parallel Parsing
     [Diagram. Component labels: Switchboard; Object server; DecoderServer 1 … DecoderServer N; ModelCollection (M); CKY Client 1, CKY Client 2, … CKY Client N; Language package]

  4. Architecture for Parallel Parsing II
     • Highly parallel, multi-threaded
     • New cluster about to come on-line; poised to take advantage
     • Fully fault-tolerant
     • Significant flexibility: layers of abstraction
     • Optimized for speed
     • Highly portable for new domains, including new languages
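The slides name the components (a switchboard, decoder servers, CKY clients) but do not spell out how they talk to each other. The sketch below is one possible reading, not the engine's actual design: a `Switchboard` hands sentences to client threads, each client asks a `DecoderServer` for a parse, and a sentence is put back on the queue if its decoder fails. All class names, method names, and the simulated failure are assumptions made for illustration.

```python
import queue
import threading


class Switchboard:
    """Hands out sentences and collects parses (hypothetical API)."""

    def __init__(self, sentences):
        self._pending = queue.Queue()
        for idx, sentence in enumerate(sentences):
            self._pending.put((idx, sentence))
        self._results = {}
        self._lock = threading.Lock()

    def next_sentence(self):
        try:
            return self._pending.get_nowait()
        except queue.Empty:
            return None

    def report(self, idx, parse):
        with self._lock:
            self._results[idx] = parse

    def requeue(self, idx, sentence):
        # Fault tolerance in this sketch: a failed sentence goes back
        # on the queue so that it is tried again.
        self._pending.put((idx, sentence))

    def results(self):
        with self._lock:
            return [self._results[i] for i in sorted(self._results)]


class DecoderServer:
    """Stand-in decoder; optionally fails once to simulate a crash."""

    def __init__(self, fail_first=False):
        self._fail_next = fail_first
        self._lock = threading.Lock()

    def decode(self, sentence):
        with self._lock:
            if self._fail_next:
                self._fail_next = False
                raise ConnectionError("simulated decoder failure")
        return "(X " + sentence + ")"  # placeholder, not a real parse


def client(switchboard, decoder):
    # Keep pulling work until the switchboard has nothing left.
    while True:
        item = switchboard.next_sentence()
        if item is None:
            return
        idx, sentence = item
        try:
            switchboard.report(idx, decoder.decode(sentence))
        except ConnectionError:
            switchboard.requeue(idx, sentence)


def parse_all(sentences, decoders, n_clients):
    switchboard = Switchboard(sentences)
    threads = [
        threading.Thread(
            target=client,
            args=(switchboard, decoders[i % len(decoders)]),
        )
        for i in range(n_clients)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return switchboard.results()


parses = parse_all(
    ["a b", "c d", "e"],
    [DecoderServer(fail_first=True), DecoderServer()],
    n_clients=3,
)
```

Because a client that requeues a sentence keeps looping, every sentence ends up parsed here even when one decoder fails once. A decoder that fails permanently would need a retry limit or a failover to another server, which this sketch leaves out.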

  5. Layer of Abstraction: Probability Structure
     [Diagram comparing the Collins and BBN probability structures. Node labels: P(t_h, w_h); H(t_h, w_h); L; M_i(t_i, w_i); M_{i-1}(t_{i-1}, w_{i-1})]

  6. Plug-’n’-play Probability Models
     • New engine capable of implementing a wide variety of models, including Collins, BBN
     • Have meticulously replicated Collins’ model and performance
     • Cleaned up probabilistic “oddities”
     • Code is thoroughly documented
     • Will release to public
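One way to picture a plug-and-play probability model is to separate what an event is conditioned on from the code that counts and estimates. The sketch below assumes a hypothetical `ProbabilityStructure` interface with two toy implementations. The class names, the event fields, and the unsmoothed relative-frequency estimate are all illustrative assumptions; neither structure is meant to reproduce the Collins or BBN model.

```python
from collections import Counter


class ProbabilityStructure:
    """Maps a modifier event to its conditioning context (hypothetical)."""

    def context(self, event):
        raise NotImplementedError


class HeadConditioned(ProbabilityStructure):
    # Toy structure: condition the modifier on the parent and head only.
    def context(self, event):
        return (event["parent"], event["head"])


class PreviousModifierConditioned(ProbabilityStructure):
    # Toy structure: also condition on the previously generated modifier.
    def context(self, event):
        return (event["parent"], event["head"], event["prev_modifier"])


class Model:
    """Relative-frequency estimator that works with any structure."""

    def __init__(self, structure):
        self.structure = structure
        self.joint = Counter()
        self.contexts = Counter()

    def observe(self, event):
        ctx = self.structure.context(event)
        self.joint[(event["modifier"], ctx)] += 1
        self.contexts[ctx] += 1

    def prob(self, event):
        ctx = self.structure.context(event)
        if self.contexts[ctx] == 0:
            return 0.0
        return self.joint[(event["modifier"], ctx)] / self.contexts[ctx]


def make_event(prev_modifier, modifier):
    return {"parent": "VP", "head": "VBD",
            "prev_modifier": prev_modifier, "modifier": modifier}


events = [
    make_event("START", "NP"),
    make_event("NP", "PP"),
    make_event("START", "NP"),
    make_event("NP", "STOP"),
]

head_model = Model(HeadConditioned())
prev_model = Model(PreviousModifierConditioned())
for e in events:
    head_model.observe(e)
    prev_model.observe(e)
```

The same `Model` code serves both structures; only the `context` method changes, which is the kind of swap the slide's "plug-’n’-play" wording suggests.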

  7. Fast Portability to New Data Sets
     • Parsers operate over augmented tree space, T+
     • Generative models define joint probability P(S,T,T+)
     • Chiang & Bikel (2002, in submission) provide
       • New, portable syntax for augmenting tree nodes
       • Method for reestimating parser models in the augmented space such that P(S,T) is maximized
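The relationship between the joint probability and the quantity being maximized can be written out explicitly. The following is a sketch under two assumptions not stated on the slide: that the augmentations T+ are treated as unobserved, so that P(S,T) is obtained by summing them out, and that the objective is taken over a treebank of n sentence/tree pairs with model parameters written as θ.

```latex
% Marginal probability of an observed sentence/tree pair, summing
% over the unobserved augmentations T^+ (assumed reading of the slide).
P(S, T) = \sum_{T^+} P(S, T, T^+)

% Reestimation objective over a treebank of n pairs (S_i, T_i);
% \theta and n are notation introduced for this sketch.
\hat{\theta} = \arg\max_{\theta} \prod_{i=1}^{n} \sum_{T^+} P_{\theta}(S_i, T_i, T^+)
```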

  8. Rapid Portability to New Languages with High Accuracy
     • Bikel & Chiang (2000) described porting two parsing models developed for English to Chinese
       • BBN: LR 69.0, LP 74.8 (≤ 40 words)
       • Chiang: LR 76.8, LP 77.8 (≤ 40 words)
     • New engine designed from ground up for multi-lingual processing: language package
     • Original design goal for new parsing engine: develop new language packages in 1–2 weeks
     • Developed Chinese language package for new engine in one and a half days
     • Compared to other known Chinese parsers on the CTB, recall is equivalent and precision is significantly superior
       • LR 77.0, LP 81.6 (≤ 40 words)

  9. What’s in store…
     • Incorporating richer lexical information into parsing/language processing, specifically…
     • Incorporating word sense information into a parsing model, building on both
       • previous work extending BBN parsing model to include word sense
       • recent work with David Chiang, viewing word sense as yet another component of “hidden” data in a Treebank

  10. FIN
