Extracting Why Text Segment from Web Based on Grammar-gram Iulia Nagy, Master student, 2010-02-27
Summary • Introduction • Related work • Rule Based Methods • Machine Learning Approach • “Bag of Function Words” method • Method outline • Adaptation of “Bag of Function Words” to English • Experiments and Evaluation • Conclusion and Remarks
Problem • Tremendous growth of the Internet • Information is hard to find
Solution • Create a QA system: a system capable of giving an exact answer to an exact question, detecting the answer in arbitrary corpora • Purpose: obtain usable information rapidly
Purpose of our research • Create a why-QA system with an automatically built classifier • Classifier: use a model presented in the Japanese literature, created with machine learning based on the Bag of Grammar approach (the purpose of this paper)
Related work Two main trends • Rule-based methods • Machine learning methods
Rule-based methods in why-QA Suzan Verberne's approach • Improve performance by re-ranking Method: • weight the score assigned to a QA pair by QAP with a number of syntactic features
Machine learning methods Higashinaka and Isozaki's approach • Acquire causal expressions from the Japanese EDR dictionary Method: • train a ranker based on clause structures extracted from EDR
Machine learning methods Tanaka's approach • Build a why-classifier with function words as features Method: • Bag of function words
Bag of function words • Machine learning approach to automatically build a domain-independent why-classifier based on function words • Conditions to obtain domain independence • Function words: the word class fulfilling those conditions
Bag of function words Method – same baseline for Japanese and English • Extract function words from the training segments Ts 1 … Ts n • Create the feature space from the extracted function words (because, at, after, in, under, which, that, why, to, therefore, …) • Create feature vectors Fv 1 … Fv n by mapping each segment onto the feature space using tf-idf • Train a classifier on the vectors with LogitBoost over weak learners
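The mapping step can be sketched in a few lines of scikit-learn; the function-word list below reuses the slide's examples, and the two segments are made up for illustration:

```python
# Sketch of the "bag of function words" mapping: restrict the tf-idf
# vocabulary to function words only, so each text segment becomes a
# tf-idf weighted vector over that fixed feature space.
from sklearn.feature_extraction.text import TfidfVectorizer

FUNCTION_WORDS = ["because", "at", "after", "in", "under",
                  "which", "that", "why", "to", "therefore"]

segments = [
    "The match was cancelled because of heavy rain.",
    "The museum is located in the city centre.",
]

vec = TfidfVectorizer(vocabulary=FUNCTION_WORDS, lowercase=True)
X = vec.fit_transform(segments)
print(X.shape)  # one row per segment, one column per function word
```

In the actual method the feature set is extracted from the corpus rather than fixed in advance; fixing `vocabulary` here just makes the mapping step visible.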
Adaptation to English • Differences • Adjustments • Identify eligible function words in English
Experiment • Data • Processing • Label all words with POS and extract function words • Calculate tf-idf for each function word • Map features from the feature set into feature vectors
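The extraction step can be approximated without any NLP dependencies; the paper POS-tags the corpus and keeps closed-class words, whereas this sketch stands in for the tagger with a small hand-picked lexicon (both the lexicon and the tokenisation are assumptions):

```python
# Dependency-free stand-in for "label all words with POS and extract
# function words": a real run would use a POS tagger and keep
# closed-class tags; here a fixed lexicon plays that role.
FUNCTION_WORDS = {"because", "at", "after", "in", "under", "which",
                  "that", "why", "to", "therefore", "the", "of", "and"}

def extract_function_words(segment: str) -> list[str]:
    # Naive tokenisation: lowercase, strip basic punctuation, split.
    tokens = segment.lower().replace(".", " ").replace(",", " ").split()
    return [t for t in tokens if t in FUNCTION_WORDS]

print(extract_function_words("The bridge collapsed because the cables corroded."))
```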
Experiment • Classifier • Used LogitBoost (Weka) with decision stumps • Created 5 classifiers (50, 100, 150, 200, 250 iterations) • Evaluation • 10-fold cross validation • Models trained on 9 folds and tested on 1 • Measured precision, recall and F-measure
Results – why text segments (WTS) • [Chart: results plotted against the number of boosting iterations]
Results – non-why text segments (NWTS) • [Chart: results plotted against the number of boosting iterations]
Conclusion • The method is effective on English for both types of text segments • Results • 321 of 432 instances correctly classified (74.3%) • 76.1% precision and 70.6% recall on WTS • 72.6% precision and 77.9% recall on NWTS
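The reported precision and recall figures determine the F-measures; a quick check of the arithmetic (the precision/recall values are the slide's, the helper function is just the standard F1 formula):

```python
# F-measure (harmonic mean of precision and recall) for the
# reported per-class results.
def f_measure(precision: float, recall: float) -> float:
    return 2 * precision * recall / (precision + recall)

print(round(f_measure(0.761, 0.706), 3))  # F1 on why text segments
print(round(f_measure(0.726, 0.779), 3))  # F1 on non-why text segments
```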
Future work • Experiment with a larger dataset (> 5000 instances) • Use the Yahoo! Answers database to extract the dataset • Interest • Include causative constructions in the analysis
Questions and remarks Thank you for your attention!