A Classifier-based Deterministic Parser for Chinese -- Mengqiu Wang Advisor: Prof. Teruko Mitamura Joint work with Kenji Sagae
Outline of the talk • Background • Deterministic parsing model • Classifier and feature selection • POS tagging • Experiment and results • Discussion and future work • Conclusion
Background • Constituency parsing is one of the most fundamental tasks in NLP. • State-of-the-art Chinese constituency parsers achieve precision and recall in the low 80% range when using automatically generated POS tags. • Most of the parsing literature reports only accuracy; efficiency is typically ignored. • In practice, however, parsers are often deemed too slow for many NLP applications (e.g. IR).
Deterministic Parsing Model • Originally developed in [Sagae and Lavie 2005] • Input • By convention, deterministic parsing assumes that input sentences (Chinese in our case) are already segmented and POS tagged. (We perform our own SVM-based POS tagging.) • Main data structures • A queue, which stores the input word-POS pairs • A stack, which holds partial parse trees • Trees are lexicalized. We used the same head-finding rules as [Bikel 2004]. • The parser performs binary shift-reduce actions based on classifier decisions. • Example …
Deterministic Parsing Model Cont. • Input sentence: 布朗/NR (Brown/Proper Noun) 访问/VV (Visits/Verb) 上海/NR (Shanghai/Proper Noun) • In the states below, stack items are listed bottom to top (the rightmost item is the top of the stack), and Θ denotes an empty stack or queue.
• Initial parser state: Stack: Θ Queue: 布朗/NR 访问/VV 上海/NR
• Action 1: Shift • Parser state: Stack: (NR 布朗) Queue: 访问/VV 上海/NR
• Action 2: Reduce the first item on the stack to an NP node, with node (NR 布朗) as the head • Parser state: Stack: (NP (NR 布朗)) Queue: 访问/VV 上海/NR
• Action 3: Shift • Parser state: Stack: (NP (NR 布朗)), (VV 访问) Queue: 上海/NR
• Action 4: Shift • Parser state: Stack: (NP (NR 布朗)), (VV 访问), (NR 上海) Queue: Θ
• Action 5: Reduce the first item on the stack to an NP node, with node (NR 上海) as the head • Parser state: Stack: (NP (NR 布朗)), (VV 访问), (NP (NR 上海)) Queue: Θ
• Action 6: Reduce the first two items on the stack to a VP node, with node (VV 访问) as the head • Parser state: Stack: (NP (NR 布朗)), (VP (VV 访问) (NP (NR 上海))) Queue: Θ
• Action 7: Reduce the first two items on the stack to an IP node, taking the head node of the VP subtree, (VV 访问), as the head • Parser state: Stack: (IP (NP (NR 布朗)) (VP (VV 访问) (NP (NR 上海)))) Queue: Θ
• Parsing terminates when the queue is empty and the stack contains only one item. • Final parse tree: (IP (NP (NR 布朗)) (VP (VV 访问) (NP (NR 上海)))), headed by (VV 访问)
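The seven actions of the worked example can be replayed in a few lines of Python. This is only a minimal sketch: the `Node` class, the action names `shift`, `reduce1` and `reduce2`, and the `"left"`/`"right"` head argument are illustrative assumptions, not the parser's actual interface. In the real parser each action is chosen by a classifier and heads come from head-finding rules; here both are hard-coded.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    label: str                      # POS tag or nonterminal label
    head: Tuple[str, str]           # (POS, word) of the lexical head
    children: List["Node"] = field(default_factory=list)

    def bracketed(self) -> str:
        if not self.children:       # leaf: a word-POS pair from the queue
            return f"({self.label} {self.head[1]})"
        inner = " ".join(child.bracketed() for child in self.children)
        return f"({self.label} {inner})"


def parse(tagged_words, actions):
    """Replay a fixed action sequence (hypothetical stand-in for a classifier)."""
    queue = deque(Node(pos, (pos, word)) for word, pos in tagged_words)
    stack = []
    for action in actions:
        kind = action[0]
        if kind == "shift":         # move the first queue item onto the stack
            stack.append(queue.popleft())
        elif kind == "reduce1":     # unary reduce of the top stack item
            child = stack.pop()
            stack.append(Node(action[1], child.head, [child]))
        elif kind == "reduce2":     # binary reduce of the top two stack items
            right, left = stack.pop(), stack.pop()
            head = left.head if action[2] == "left" else right.head
            stack.append(Node(action[1], head, [left, right]))
        else:
            raise ValueError(f"unknown action: {action!r}")
    if queue or len(stack) != 1:
        raise ValueError("parse did not terminate in a single tree")
    return stack[0]


SENTENCE = [("布朗", "NR"), ("访问", "VV"), ("上海", "NR")]
ACTIONS = [
    ("shift",),
    ("reduce1", "NP"),
    ("shift",),
    ("shift",),
    ("reduce1", "NP"),
    ("reduce2", "VP", "left"),      # head is (VV 访问), the left item
    ("reduce2", "IP", "right"),     # head is the head of the VP subtree
]

tree = parse(SENTENCE, ACTIONS)
print(tree.bracketed())
```

Each action touches only the top of the stack or the front of the queue, which is what keeps this style of parsing fast.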
Classifiers • Classification is the most important part of deterministic parsing. • We experimented with four different classifiers: • SVM: finds a hyperplane that gives the maximum soft margin, minimizing the expected risk. • Maximum entropy: estimates the parameters of the distribution with maximum entropy among those satisfying constraints that force the model to account for the training data. • Decision tree: we used C4.5. • Memory-based learning: a kNN classifier; a lazy learner with short training time, ideal for prototyping.
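To make the "lazy learner" idea concrete, here is a minimal kNN sketch in plain Python. It is not the toolkit used in the experiments: the overlap (mismatch-count) distance, the three-feature layout (label of the top stack item, label of the second stack item, POS of the first queue item) and the toy training examples are all assumptions made for illustration.

```python
from collections import Counter


def overlap_distance(a, b):
    """Number of feature positions on which two instances differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def knn_predict(memory, query, k=1):
    """Majority vote among the k stored instances nearest to the query."""
    ranked = sorted(memory, key=lambda example: overlap_distance(example[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# "Training" is just storing (features, action) pairs: hence short training time.
# Features (hypothetical): (label of top stack item, label of second item, POS of first queue item)
MEMORY = [
    (("NR", "<none>", "VV"), "reduce-NP"),
    (("NP", "<none>", "VV"), "shift"),
    (("VV", "NP", "NR"), "shift"),
    (("NR", "VV", "<none>"), "reduce-NP"),
    (("NP", "VV", "<none>"), "reduce-VP"),
]

print(knn_predict(MEMORY, ("NR", "<none>", "VV")))       # seen state
print(knn_predict(MEMORY, ("NR", "NP", "VV"), k=1))      # unseen state
```

All the work happens at prediction time, when the query is compared against every stored instance. That is convenient for prototyping but grows with the size of the training set.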
Features • The features we used are distributionally derived or linguistically motivated. • Each feature carries information about the context of a particular parse state. • We denote the top item on the stack as S(1), the second item (from the top) as S(2), and so on. Similarly, we denote the first item on the queue as Q(1), the second as Q(2), and so on.
Features • A Boolean feature indicating whether a closing punctuation mark is expected. • A Boolean feature indicating whether the queue is empty. • A Boolean feature indicating whether a comma separates S(1) and S(2). • The last action given by the classifier, and the number of words in S(1) and S(2). • The headword and its POS for S(1), S(2), S(3) and S(4), and the word and POS of Q(1), Q(2), Q(3) and Q(4). • The nonterminal label of the root of S(1) and S(2), and the number of punctuation marks in S(1) and S(2). • Rhythmic features and the linear distance between the headwords of S(1) and S(2). • The number of words found so far to be dependents of the headwords of S(1) and S(2). • The nonterminal label, POS and headword of the immediate left and right child of the root of S(1) and S(2). • The most recently found word-POS pair to the left of the headword of S(1) and S(2). • The most recently found word-POS pair to the right of the headword of S(1) and S(2).
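A few of the features listed above can be read off a parse state as sketched below. This covers only a subset (queue-empty flag, last action, headword/POS of S(1) to S(4), word/POS of Q(1) to Q(4), root label and word count of S(1) and S(2)). The dictionary layout of stack items, the feature names and the `<none>` padding value are assumptions made for illustration.

```python
NONE = "<none>"  # padding for positions beyond the stack or queue (an assumption)


def extract_features(stack, queue, last_action):
    """stack: list of dicts, last element is S(1); queue: list of (word, POS), first is Q(1)."""
    feats = {
        "queue_empty": not queue,
        "last_action": last_action or NONE,
    }
    for i in range(1, 5):
        item = stack[-i] if len(stack) >= i else None
        feats[f"S{i}_head_word"] = item["head_word"] if item else NONE
        feats[f"S{i}_head_pos"] = item["head_pos"] if item else NONE
        token = queue[i - 1] if len(queue) >= i else None
        feats[f"Q{i}_word"] = token[0] if token else NONE
        feats[f"Q{i}_pos"] = token[1] if token else NONE
    for i in (1, 2):
        item = stack[-i] if len(stack) >= i else None
        feats[f"S{i}_label"] = item["label"] if item else NONE
        feats[f"S{i}_n_words"] = item["n_words"] if item else 0
    return feats


# Parser state after Action 3 of the worked example: NP(布朗) and VV 访问 on the stack.
stack = [
    {"label": "NP", "head_word": "布朗", "head_pos": "NR", "n_words": 1},
    {"label": "VV", "head_word": "访问", "head_pos": "VV", "n_words": 1},
]
queue = [("上海", "NR")]
features = extract_features(stack, queue, last_action="shift")
print(features["S1_head_word"], features["S2_label"], features["Q1_pos"])
```

Padding missing positions with a fixed symbol gives every state the same set of feature names, which most classifiers require.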
POS tagging • In our model, POS tagging is treated as a separate problem and is done prior to parsing. • However, we care about the performance of the parser in realistic situations, with automatically generated POS tags. • We implemented a simple 2-pass POS tagging model based on SVM, which achieved 92.5% accuracy.
Experiments • Standard data collection • Training set: sections 1-270 of the Penn Chinese Treebank (3484 sentences, 84873 words). • Development set: sections 301-326 • Testing set: sections 271-300 • Total: 99629 words, about 1/10 of the size of the English Penn Treebank. • Standard corpus preparation • Empty nodes were removed. • Functional labels of nonterminal nodes were removed, e.g. NP-Subj -> NP. • For scoring we used the evalb program (http://nlp.cs.nyu.edu/evalb). Labeled recall, labeled precision and F1 (harmonic mean) are reported.
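The reported measures can be illustrated with a simplified bracket scorer. This is a sketch of the definitions only, not a reimplementation of evalb, which applies further normalization controlled by its parameter file. The `(label, start, end)` span representation and the example brackets are assumptions.

```python
from collections import Counter


def labeled_prf(gold_brackets, predicted_brackets):
    """Labeled precision, recall and F1 over (label, start, end) constituents."""
    gold, pred = Counter(gold_brackets), Counter(predicted_brackets)
    matched = sum((gold & pred).values())   # multiset intersection
    precision = matched / sum(pred.values()) if pred else 0.0
    recall = matched / sum(gold.values()) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


# Gold brackets for the example sentence (word positions 0..3, preterminals excluded).
gold = [("IP", 0, 3), ("NP", 0, 1), ("VP", 1, 3), ("NP", 2, 3)]
# A hypothetical parser output with one mislabeled and one spurious bracket.
pred = [("IP", 0, 3), ("NP", 0, 1), ("VP", 1, 3), ("VP", 2, 3), ("NP", 1, 2)]

lp, lr, f1 = labeled_prf(gold, pred)
print(f"LP={lp:.3f} LR={lr:.3f} F1={f1:.3f}")
```

A bracket counts as correct only if its label and both span boundaries match, so the mislabeled `("VP", 2, 3)` costs both precision and recall.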
Results • Comparison of classifiers on development set using gold-standard POS
Classifier Ensemble • Using stacked-classifier techniques, we improved performance on the development set to 90.3% LR and 90.5% LP, a 3.4% improvement in LR and a 2.6% improvement in LP over the SVM model.
Comparison with related work • Results on test set using automatically generated POS tags
Comparison with related work cont. • Comparison of parsing speed
Discussion and future work • Among the classifiers, SVM has high accuracy but low speed; DTree has lower accuracy but great speed; Maxent sits between the two in both accuracy and speed. • It is desirable to bring the two ends of the spectrum closer, i.e. to increase the accuracy of the DTree classifier and to lower the computational cost of SVM classification. • Action items • Apply boosting and related ensemble techniques (AdaBoost, random forests, bagging, etc.) to DTree. (A preliminary attempt didn't yield better performance; this calls for further investigation.) • Feature selection (especially on lexical items) to reduce the computational cost of classification • Re-implement the parser in C++ (avoid invoking external processes and expensive I/O)
Conclusion • We implemented a classifier-based deterministic constituency parser for Chinese. • We achieved results comparable to the state of the art in Chinese parsing. • Very fast parsing becomes possible for speed-critical applications, with some tradeoff in accuracy. • Advances in machine learning techniques can be applied directly to the parsing problem, which opens up many opportunities for further improvement.
References
• Daniel M. Bikel and David Chiang. 2000. Two statistical parsing models applied to the Chinese Treebank. In Proceedings of the Second Chinese Language Processing Workshop.
• Daniel M. Bikel. 2004. On the Parameter Space of Generative Lexicalized Statistical Parsing Models. Ph.D. thesis, University of Pennsylvania.
• David Chiang and Daniel M. Bikel. 2002. Recovering latent information in treebanks. In Proceedings of the 19th International Conference on Computational Linguistics.
• Michael John Collins. 1999. Head-driven Statistical Models for Natural Language Parsing. Ph.D. thesis, University of Pennsylvania.
• Walter Daelemans, Jakub Zavrel, Ko van der Sloot, and Antal van den Bosch. 2004. Timbl: Tilburg memory based learner, version 5.1, reference guide. Technical Report 04-02, ILK Research Group, Tilburg University.
• Pascale Fung, Grace Ngai, Yongsheng Yang, and Benfeng Chen. 2004. A maximum-entropy Chinese parser augmented by transformation-based learning. ACM Transactions on Asian Language Information Processing, 3(2):159–168.
• Mary Hearne and Andy Way. 2004. Data-oriented parsing and the Penn Chinese Treebank. In Proceedings of the First International Joint Conference on Natural Language Processing.
• Zhengping Jiang. 2004. Statistical Chinese parsing. Honours thesis, National University of Singapore.
• Zhang Le. 2004. Maximum Entropy Modeling Toolkit for Python and C++. Reference Manual.
• Roger Levy and Christopher D. Manning. 2003. Is it harder to parse Chinese, or the Chinese Treebank? In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics.
• Xiaoqiang Luo. 2003. A maximum entropy Chinese character-based parser. In Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing.
• David M. Magerman. 1994. Natural Language Parsing as Statistical Pattern Recognition. Ph.D. thesis, Stanford University.
• Kenji Sagae and Alon Lavie. 2005. A classifier-based parser with linear run-time complexity. In Proceedings of the Ninth International Workshop on Parsing Technology.
• Deyi Xiong, Shuanglong Li, Qun Liu, Shouxun Lin, and Yueliang Qian. 2005. Parsing the Penn Chinese Treebank with semantic knowledge. In International Joint Conference on Natural Language Processing 2005.
Thank you! Questions?