Can One Language Bootstrap the Other: A Case Study on Event Extraction
Zheng Chen, Heng Ji
BLENDER: Cross-lingual Cross-document IE Lab
Department of Computer Science, The Graduate Center and Queens College, The City University of New York
June 2009
"Up-to-date" Related Work in This Workshop
• Carlson et al.: present the CBL (Coupled Bootstrap Learner) algorithm and show that simultaneously learning a coupled collection of classifiers yields more accurate extractions than training the classifiers individually
• Plank: evaluates different variations of self-training for the parse selection problem
• Poveda et al.: present a bootstrapping algorithm for learning IE patterns for the recognition of time expressions
• …
ACE Event Extraction: Terminology
• Event: a specific occurrence involving participants
• Event trigger: the word that most clearly expresses an event's occurrence
• Event argument: an entity or a temporal expression that plays a certain role (e.g., Time-Within, Place) in an event
• Event mention: a sentence containing a distinguished trigger and its associated arguments
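To make the terminology concrete, here is a minimal sketch (not the paper's code) of these notions as Python data structures, populated with the toy example from the next slide; all names are illustrative.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Argument:
    text: str   # surface string of the entity or temporal expression
    role: str   # e.g., "Time-Within", "Place", "Person"

@dataclass
class EventMention:
    sentence: str     # sentence containing the event mention
    trigger: str      # word that most clearly expresses the occurrence
    event_type: str   # ACE event type, e.g., "Life.Marry"
    arguments: List[Argument] = field(default_factory=list)

# "Mike got married in 2008." -> trigger "married", type Life.Marry,
# with "Mike" as Person and "2008" as Time-Within.
mention = EventMention(
    sentence="Mike got married in 2008.",
    trigger="married",
    event_type="Life.Marry",
    arguments=[Argument("Mike", "Person"), Argument("2008", "Time-Within")],
)
```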
Event Extraction: A Toy Example
Example: "Mike got married in 2008."
Event Extraction: A Pipeline
• Pre-processing
• Trigger labeling
  • Trigger identification: identify a word or a phrase as the event trigger
  • Trigger classification: assign an event type to the identified trigger
• Argument labeling
  • Argument identification: identify whether an entity or temporal expression in the same sentence is an argument associated with the trigger
  • Argument classification: assign a role to each identified argument
• Post-processing
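A minimal sketch of how the four classification stages compose, reusing the EventMention/Argument structures sketched earlier. The `classifiers` object stands in for the paper's four MaxEnt models; its method names are illustrative, not a real API.

```python
def extract_events(sentence, entities, classifiers):
    mentions = []
    for token in sentence.split():                    # candidate triggers
        # Stage 1: trigger identification
        if not classifiers.identify_trigger(token, sentence):
            continue
        # Stage 2: trigger classification
        event_type = classifiers.classify_trigger(token, sentence)
        arguments = []
        for entity in entities:                       # same-sentence candidates
            # Stage 3: argument identification
            if classifiers.identify_argument(entity, token, sentence):
                # Stage 4: argument classification
                role = classifiers.classify_argument(entity, token, sentence)
                arguments.append(Argument(entity, role))
        mentions.append(EventMention(sentence, token, event_type, arguments))
    return mentions
```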
Two Monolingual Event Extraction Systems (Grishman et al., 2005; Chen and Ji, 2009)
• Employ the same pipeline framework
• Use Maximum Entropy based classifiers
• Share some language-independent features (POS tagging, parsing, …)
• Differ in some language-specific features (character-based features in Chinese, use of a synonym dictionary, …)
[Table 1: Performance of event extraction on perfect entities and human annotators]
Bootstrapping Event Extraction
• Both systems rely on expensive human-labeled data and thus suffer from data scarcity (labeling is much more expensive than for other NLP tasks because of the extra tagging of entities and temporal expressions)
Questions:
• Can a monolingual system benefit from bootstrapping techniques given a relatively small set of training data?
• Can a monolingual system (in our case, the Chinese event extraction system) benefit from the other, resource-rich monolingual system (the English system)?
Traditional Bootstrapping Algorithms
[Diagram: classifiers 1..n are trained on labeled samples, test a constant-size pool selected at random from the unlabeled samples, and feed the labeled samples with high confidence back into training]
• Self-training: n = 1; trust yourself and teach yourself
• Co-training: n = 2 (Blum and Mitchell, 1998)
  • you have a partner working toward the same goal
  • each works independently
  • the two can get along and improve together
• Two assumptions:
  • the two views are individually sufficient for classification
  • the two views are conditionally independent given the class
New Approach: Bootstrapping across Languages
• Baseline: Monolingual Self-training
• Cross-lingual Bootstrapping
  • Cross-lingual Co-training
  • Cross-lingual Semi-co-training
Monolingual Self-training
[Diagram: the event extraction system is trained on labeled monolingual samples, performs event extraction on a constant-size pool selected at random from the unlabeled monolingual samples, and adds its high-confidence labeled samples back to the training set]
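A minimal sketch of the self-training loop in the diagram above. Here train() and label() stand in for the paper's MaxEnt event extraction system; label() is assumed to return event predictions that each carry a confidence score, and the 0.9 threshold is likewise an assumption (the pool size of 20 matches the experiments later in the talk).

```python
import random

def self_train(labeled, unlabeled, pool_size=20, rounds=10, threshold=0.9):
    for _ in range(rounds):
        system = train(labeled)                     # retrain on current data
        pool = random.sample(unlabeled, min(pool_size, len(unlabeled)))
        for sentence in pool:                       # constant-size pool
            events = [e for e in label(system, sentence)
                      if e["confidence"] >= threshold]
            if events:                              # trust yourself...
                labeled.append((sentence, events))  # ...and teach yourself
                unlabeled.remove(sentence)
    return train(labeled)
```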
Cross-lingual Co-training
• Intuition:
  • The same event has different "views" described in different languages, because the lexical units, grammar, and sentence construction differ from one language to the other
  • The sufficiency assumption is satisfied
  • The independence assumption is arguable
• Differences from co-training:
  • The two systems in cross-lingual co-training are not initially trained from the same labeled data
  • In the bootstrapping phase, each system labels only the half of the parallel corpus (bitexts) in its own language
  • Cross-lingual projection
Cross-lingual Co-training
[Diagram: system A is trained on labeled samples in language A, system B on labeled samples in language B; both perform event extraction on a constant-size bilingual pool selected at random from the unlabeled bitexts; each system's high-confidence samples are projected into the other language and added to the other system's training data]
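A sketch of this loop, reusing the assumed train()/label() helpers from the self-training sketch. The project() function (sketched under the projection slide below) maps event labels across the word alignment of a bitext pair and returns None when the trigger has no aligned word; the alignment is assumed to be a dict from A-side token indices to B-side token indices.

```python
import random

def co_train(labeled_a, labeled_b, bitexts,
             pool_size=20, rounds=10, threshold=0.9):
    for _ in range(rounds):
        system_a, system_b = train(labeled_a), train(labeled_b)
        pool = random.sample(bitexts, min(pool_size, len(bitexts)))
        for sent_a, sent_b, alignment in pool:
            # each system labels only its own half of the bitext...
            events_a = [e for e in label(system_a, sent_a)
                        if e["confidence"] >= threshold]
            events_b = [e for e in label(system_b, sent_b)
                        if e["confidence"] >= threshold]
            # ...and teaches the other side via cross-lingual projection
            inverse = {b: a for a, b in alignment.items()}   # B -> A direction
            proj_b = [p for e in events_a if (p := project(e, alignment))]
            proj_a = [p for e in events_b if (p := project(e, inverse))]
            if proj_b:
                labeled_b.append((sent_b, proj_b))
            if proj_a:
                labeled_a.append((sent_a, proj_a))
            bitexts.remove((sent_a, sent_b, alignment))
    return train(labeled_a), train(labeled_b)
```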
Cross-lingual Semi-co-training
• Tries to bootstrap only one system with help from the other, fine-tuned system
• Helpful when one language is a resource-rich language
• Two variations, differing in whether the target system's own labeling results are combined with the samples projected from the other system (cf. the experiments below)
[Diagram: the two variations of the semi-co-training loop over a constant-size bilingual pool with cross-lingual projection]
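A sketch of semi-co-training for our setting: only the Chinese system is retrained, from labels produced by a fixed, fine-tuned English system. The combine_labels flag switches between the two variations; combine() follows the rules given on the combination slide further below, and the other helpers are assumed as in the earlier sketches.

```python
import random

def semi_co_train(english_system, labeled_zh, bitexts,
                  pool_size=20, rounds=10, threshold=0.9,
                  combine_labels=True):
    for _ in range(rounds):
        system_zh = train(labeled_zh)
        pool = random.sample(bitexts, min(pool_size, len(bitexts)))
        for sent_en, sent_zh, alignment in pool:
            events_en = [e for e in label(english_system, sent_en)
                         if e["confidence"] >= threshold]
            projected = [p for e in events_en
                         if (p := project(e, alignment))]
            if combine_labels:  # variation that merges both systems' labels
                projected = combine(projected, label(system_zh, sent_zh))
            if projected:
                labeled_zh.append((sent_zh, projected))
            bitexts.remove((sent_en, sent_zh, alignment))
    return train(labeled_zh)
```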
Cross-lingual Projection
• A key operation in the cross-lingual co-training algorithm
• In our case, the triggers and the arguments are projected from one language into the other according to the word alignment information provided by the bitexts
[Example figure: an event mention before and after projection]
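A minimal sketch of the projection step. A labeled event is assumed to be a dict with token-index fields, and the alignment a dict mapping source token indices to target token indices (the paper's bitexts carry manual alignments; this representation is illustrative).

```python
def project(event, alignment):
    """Map trigger and argument positions from source to target language."""
    trigger_idx = alignment.get(event["trigger_idx"])
    if trigger_idx is None:
        return None  # unaligned trigger: the projection fails, drop the event
    projected_args = [
        {"idx": alignment[arg["idx"]], "role": arg["role"]}
        for arg in event["arguments"]
        if arg["idx"] in alignment     # keep only aligned arguments
    ]
    # the event type, argument roles, and confidence carry over unchanged
    return {"trigger_idx": trigger_idx,
            "event_type": event["event_type"],
            "arguments": projected_args,
            "confidence": event["confidence"]}
```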
Experiments and Results: Data
• ACE 2005 corpus
  • 560 English documents
  • 633 Chinese documents
• LDC Chinese Treebank English Parallel corpus
  • 159 bitexts with manual alignment
Experiments: Cross-lingual Semi-co-training on Bitexts
• Monolingual self-training for the Chinese system on the bitexts
  • The bitexts do not provide ground-truth entities and temporal expressions; these are instead tagged by an IE system
  • Seed training size for the Chinese system: 100
  • Pool size: 20
• Cross-lingual semi-co-training on the bitexts, using only the English labeling results to retrain the Chinese system
  • Set up a fine-tuned English event extraction system trained on a relatively large training set (500 documents)
  • Seed training size for the Chinese system: 100
  • Pool size: 20
Experiments: Cross-lingual Semi-co-training on Bitexts
• Cross-lingual semi-co-training on the bitexts, combining the Chinese and English labeling results to retrain the Chinese system
Combine the results based on the following rules (see the sketch below):
• If an event labeled by the English system is not labeled by the Chinese system, add the event to the Chinese system
• If an event labeled by the Chinese system is not labeled by the English system, keep the event in the Chinese system
• If both systems label the same event but with different event types or arguments, select the labeling with the higher confidence
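A sketch of these three rules. Events are keyed by their (projected) trigger position and each carries a confidence score, as in the earlier sketches; the data shapes are illustrative, not the paper's actual format.

```python
def combine(english_events, chinese_events):
    merged = {e["trigger_idx"]: e for e in chinese_events}  # rule 2: keep Chinese-only events
    for event in english_events:
        key = event["trigger_idx"]
        if key not in merged:
            merged[key] = event   # rule 1: add English-only events
        elif event["confidence"] > merged[key]["confidence"]:
            merged[key] = event   # rule 3: on conflict, higher confidence wins
    return list(merged.values())
```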
Experiment Results
[Figure: Self-training and Semi-co-training (English-labeled & Combined-labeled) learning curves for Trigger Labeling]
[Figure: Self-training and Semi-co-training (English-labeled & Combined-labeled) learning curves for Argument Labeling]
Analysis
• Self-training: a small gain of 0.4% over the baseline for trigger labeling and a loss of 0.1% below the baseline for argument labeling. The downward trend of the self-training curve indicates that entity extraction errors do have a counteracting impact on argument labeling.
• Trust-English method: a gain of 1.7% for trigger labeling and 0.7% for argument labeling.
• Combination method: a gain of 3.1% for trigger labeling and 2.1% for argument labeling; it outperforms the trust-English method.
Related Work
• Using bitexts or translation techniques as feedback to improve entity extraction:
  • Huang and Vogel (2002): improve a named entity translation dictionary and name tagging simultaneously
  • Ji and Grishman (2007): joint inference between entity extraction and entity translation
  • Zitouni and Florian (2008): use translation as additional features to improve source-language mention detection
What's New
• A new case study on event extraction
• Combine cross-lingual projection with bootstrapping methods to avoid the knowledge engineering of inference rules or features
Conclusions
• Formalized a new cross-lingual bootstrapping algorithm and demonstrated its effectiveness on the challenging task of event extraction
• Demonstrated that bitexts can be used effectively for applications beyond machine translation
Future Work
• Conduct experiments on cross-lingual co-training and investigate whether the systems on both sides can benefit from each other
• This paper used a corpus with manual alignment; in the future we intend to investigate the effect of automatic alignment errors
Thank you! Questions and Comments?
References
• David Ahn. 2006. The Stages of Event Extraction. Proc. COLING/ACL 2006 Workshop on Annotating and Reasoning about Time and Events. Sydney, Australia.
• Rie Ando and Tong Zhang. 2005. A High-Performance Semi-Supervised Learning Method for Text Chunking. Proc. ACL 2005. pp. 1-8. Ann Arbor, USA.
• David Bean and Ellen Riloff. 2004. Unsupervised Learning of Contextual Role Knowledge for Coreference Resolution. Proc. HLT-NAACL 2004. pp. 297-304. Boston, USA.
• Avrim Blum and Tom Mitchell. 1998. Combining Labeled and Unlabeled Data with Co-training. Proc. of the Workshop on Computational Learning Theory. Morgan Kaufmann Publishers.
• Zheng Chen and Heng Ji. 2009. Language Specific Issue and Feature Exploration in Chinese Event Extraction. Proc. HLT-NAACL 2009 Student Research Workshop. Boulder, CO, USA.
• Michael Collins and Yoram Singer. 1999. Unsupervised Models for Named Entity Classification. Proc. EMNLP/VLC-99.
• Ralph Grishman, David Westbrook and Adam Meyers. 2005. NYU's English ACE 2005 System Description. Proc. ACE 2005 Evaluation Workshop. Washington, DC, USA.
• Fei Huang and Stephan Vogel. 2002. Improved Named Entity Translation and Bilingual Named Entity Extraction. Proc. ICMI 2002. Pittsburgh, PA, USA.
• Heng Ji and Ralph Grishman. 2006. Data Selection in Semi-supervised Learning for Name Tagging. Proc. ACL 2006 Workshop on Information Extraction Beyond the Document. pp. 48-55. Sydney, Australia.
• Heng Ji and Ralph Grishman. 2007. Collaborative Entity Extraction and Translation. Proc. International Conference on Recent Advances in Natural Language Processing (RANLP 2007). Borovets, Bulgaria.
• Winston Lin, Roman Yangarber and Ralph Grishman. 2003. Bootstrapped Learning of Semantic Classes from Positive and Negative Examples. Proc. ICML 2003 Workshop on The Continuum from Labeled to Unlabeled Data. Washington, DC, USA.
• Rada Mihalcea. 2004. Co-training and Self-training for Word Sense Disambiguation. Proc. CoNLL 2004.
• Scott Miller, Jethran Guinness and Alex Zamanian. 2004. Name Tagging with Word Clusters and Discriminative Training. Proc. HLT-NAACL 2004. pp. 337-342. Boston, USA.
• Imed Zitouni and Radu Florian. 2008. Mention Detection Crossing the Language Barrier. Proc. EMNLP 2008. Honolulu, Hawaii, USA.