Institute of Computing Technology, Chinese Academy of Sciences (中国科学院计算技术研究所) Statistical Learning Methods in Natural Language Processing Hang Li Microsoft Research Asia Nov. 29, 2002
Talk Outline • MDL Principle • Lexical Knowledge Acquisition Using MDL Principle • Text Mining Using MDL Principle • Information Extraction based on Active Learning
Statistical Learning and Prediction [Diagram: Learning System, Prediction System]
Elements of Statistical Estimation • Model • Strategy (Criterion) • Algorithm
Example of Model Estimation • Data • Model 1: Bernoulli Model • Model 2: Mixture Model • Question: What Criterion Should We Employ?
Shannon’s Information Theory Data Distribution Code Length Average Probability Distribution = Compact Code
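The link between a probability distribution and a compact code can be written out as below. This is the standard information-theoretic statement, given as a reminder under the assumption that it is what the slide summarizes; the slide's own formulas were not recoverable.

```latex
% Ideal code length (in bits) of an outcome x under a distribution P
L(x) = -\log_2 P(x)

% The average code length under P is the entropy of P
\mathbb{E}_{x \sim P}\,[L(x)] = -\sum_{x} P(x) \log_2 P(x) = H(P)
```

A distribution that assigns high probability to the observed data therefore corresponds to a short code for that data.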
Minimum Description Length Principle • MDL Principle: Selecting Model with Minimum Code Length • Minimum Code Length:
Example of Description Length Data Bernoulli
MDL: Trade-off Relationship [Figure: curves Lm, Ld, and Lm+Ld; axes L and M]
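The trade-off between Lm and Ld can be illustrated with a minimal Python sketch for binary data. It assumes a common two-part approximation that is not taken from the slides: a fitted Bernoulli parameter costs 0.5·log2(n) bits (Lm), a fixed parameter costs nothing, and Ld is the negative log-likelihood in bits. The function names are hypothetical.

```python
import math

def bernoulli_data_bits(ones, n, theta):
    """Ld: code length in bits of n binary symbols under Bernoulli(theta)."""
    zeros = n - ones
    bits = 0.0
    if ones:
        bits -= ones * math.log2(theta)
    if zeros:
        bits -= zeros * math.log2(1.0 - theta)
    return bits

def description_length(ones, n, theta=None):
    """Return (Lm, Ld, Lm + Ld).

    theta=None  -> fit theta by maximum likelihood and pay 0.5*log2(n) bits
                   for it (assumed approximation of the model cost).
    theta given -> fixed model with no free parameter, so Lm = 0.
    """
    if theta is None:
        theta = ones / n
        lm = 0.5 * math.log2(n)
    else:
        lm = 0.0
    ld = bernoulli_data_bits(ones, n, theta)
    return lm, ld, lm + ld

# Balanced data: the extra parameter does not pay for itself.
balanced_fitted = description_length(50, 100)
balanced_fixed = description_length(50, 100, theta=0.5)

# Skewed data: the fitted model shortens Ld by far more than Lm costs.
skewed_fitted = description_length(90, 100)
skewed_fixed = description_length(90, 100, theta=0.5)
```

Under these assumptions MDL prefers the fixed fair-coin model for the balanced sample and the fitted model for the skewed one, which is the trade-off in miniature.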
Lexical Knowledge Acquisition • Data: (fly arg1 bird), (fly arg1 swallow), (fly arg1 bee), (fly arg1 bird) • Learning System → Knowledge → Prediction System • Query: fly arg1 crow ?
Problems • Model ? • Criterion (Strategy) ? ← MDL • Algorithm ?
Case Slot Model Word-based Model Class-based Model
Example Partition [Figure: alternative partitions of the nouns crow, swallow, bug, bird, eagle, bee, insect into classes]
Example Thesaurus • ANIMAL • BIRD: swallow, crow, eagle, bird • INSECT: bug, bee, insect
Tree Cut [Figure: a cut through the example thesaurus (ANIMAL; BIRD, INSECT; swallow, crow, eagle, bird, bug, bee, insect)]
Efficient Algorithm Dynamic Programming
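A bottom-up recursion of this kind might look like the following Python sketch, written in the spirit of the tree-cut search described in Li and Abe (1998) but not a transcription of it. The assumptions are: each class on the cut costs 0.5·log2(N) bits of model description, probability mass is spread uniformly over the words inside a class, and the counts in `COUNTS` are hypothetical illustration data. For each node, the cost of using the node as a single class is compared with the summed cost of the best cuts of its subtrees.

```python
import math

# Toy thesaurus from the slides: internal node -> children.
THESAURUS = {
    "ANIMAL": ["BIRD", "INSECT"],
    "BIRD": ["swallow", "crow", "eagle", "bird"],
    "INSECT": ["bug", "bee", "insect"],
}

# Hypothetical counts of nouns seen as arg1 of "fly" (not from the talk).
COUNTS = {"swallow": 4, "crow": 4, "eagle": 4, "bird": 4}

def leaves(node):
    """All words (leaf nodes) dominated by `node`."""
    kids = THESAURUS.get(node)
    if not kids:
        return [node]
    return [w for k in kids for w in leaves(k)]

def class_cost(node, counts, total):
    """Bits to describe `node` as one class plus the data falling under it."""
    words = leaves(node)
    freq = sum(counts.get(w, 0) for w in words)
    param_bits = 0.5 * math.log2(total)   # assumed per-class model cost
    if freq == 0:
        return param_bits
    # Class probability freq/total, shared uniformly by the words in the class.
    data_bits = -freq * math.log2(freq / (total * len(words)))
    return param_bits + data_bits

def find_mdl_cut(node, counts, total):
    """Return (cut, cost): the cheapest cut in the subtree rooted at `node`."""
    own = class_cost(node, counts, total)
    kids = THESAURUS.get(node)
    if not kids:
        return [node], own
    cut, cost = [], 0.0
    for k in kids:
        sub_cut, sub_cost = find_mdl_cut(k, counts, total)
        cut += sub_cut
        cost += sub_cost
    # Keep the node itself if generalizing is no more expensive.
    if own <= cost:
        return [node], own
    return cut, cost

total = sum(COUNTS.values())
best_cut, best_cost = find_mdl_cut("ANIMAL", COUNTS, total)
```

With these toy counts the search generalizes the four bird words to BIRD and the unseen insect words to INSECT, while keeping the two classes apart because all observed mass sits under BIRD.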
Experimental Results • Penn Tree Bank Data • Object of Eat [Figure: tree cut under TOP with nodes <entity>, <abstraction>, <life_form>, <object>, <quantity>, <time>, <plant>, <animal>, <substance>, <artifact>, <solid>, <fluid>, <food>; class probabilities shown: 0.11, 0.10, 0.08, 0.39]
Demo Lexical Knowledge Acquisition
Text Mining • Questionnaire Data: Car Brand Images • Closed Answer • Open Answer • How to mine Open Answers?
Rule Analysis • Accurate extraction of image characteristics for individual car types • Characteristics of Car A: Comfortable, Less luxurious, Easy to drive, expensive • Characteristics of Car B: Not reliable, Fast & Stylish
Using MDL [Figure: ten answers labeled T N N T N T N N T T, encoded as 1001010011; splitting on whether word w occurs gives 10111 for w=1 and 01000 for w=0]
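The effect of such a split on code length can be sketched in Python as follows. The bit strings are the ones on the slide (T coded as 1, N as 0); the code-length formula, empirical entropy plus 0.5·log2(n) bits for the fitted parameter, is an assumed stand-in, since the exact criterion used in the talk cannot be recovered from the slide. Function names are hypothetical.

```python
import math

def code_length(bits):
    """Approximate bits needed to encode a 0/1 string with a fitted Bernoulli model."""
    n = len(bits)
    if n == 0:
        return 0.0
    ones = bits.count("1")
    data = 0.0
    for c in (ones, n - ones):
        if c:
            data -= c * math.log2(c / n)
    return data + 0.5 * math.log2(n)   # assumed parameter cost

def split_gain(whole, parts):
    """Bits saved by encoding the parts separately instead of the whole string."""
    return code_length(whole) - sum(code_length(p) for p in parts)

# Slide example: all answers vs. answers grouped by presence of word w.
gain = split_gain("1001010011", ["10111", "01000"])
```

A positive gain (roughly 2.1 bits here under the assumed formula) means that knowing whether w occurs makes the labels cheaper to describe, so w is a useful condition for a rule.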
Ex. 1 Rules for Car A
Condition                 Score    Freq./Total Freq.
‘for ordinary people’     4.459    7/7
‘X, LTD’                  4.523    6/6
‘simplicity’              3.017    4/6
‘traditional’             3.030    4/4
‘Japan’                   3.061    3/4
‘common-people’           3.093    3/3
‘middle-class’            3.126    3/3
‘earnest’                 3.159    2/2
‘class&common people’     3.194    2/2
‘general’                 1.919    4/8
Ex. 2 Rules for Car B
Condition                 Score    Freq./Total Freq.
‘outdoor’                 10.325   5/5
‘Z, LTD’                  8.132    4/4
‘mobility’                6.527    8/14
‘fast’                    5.694    3/3
‘run’                     5.057    2/2
‘work’                    3.341    2/2
‘road’                    3.380    2/2
‘enjoyableness’           3.420    2/2
‘boring’                  3.438    3/4
‘sporty’                  1.891    2/3
Graphical User Interface • Rule Analysis (Car A) • Search Function • Displaying Data
Information Extraction • One Setting: Information Extraction = Classification • Our Goal: • Help user to build task-specific information extraction system • Minimize user efforts in data annotation • Solution: Active Perceptron
Active Perceptron [Diagram components: Text Data, Active Learning, Feature Extraction, Perceptron with Margin, Extraction Model] Perceptron with Margin Using Active Learning?
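One way to combine a perceptron with margin and active learning is sketched below in Python. The slides do not state the query strategy, so the sketch assumes uncertainty sampling (label the pooled example closest to the current decision boundary); the toy pool, the `oracle` function standing in for the human annotator, and all function names are hypothetical.

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train_margin_perceptron(examples, margin=1.0, max_epochs=100):
    """Perceptron that updates whenever y * (w . x) is not above the margin."""
    w = [0.0] * len(examples[0][0])
    for _ in range(max_epochs):
        updated = False
        for x, y in examples:
            if y * dot(w, x) <= margin:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                updated = True
        if not updated:          # every example clears the margin
            break
    return w

def active_learn(pool, oracle, budget, margin=1.0):
    """Repeatedly label the most uncertain pooled example and retrain."""
    pool = list(pool)
    labeled = []
    w = [0.0] * len(pool[0])
    for _ in range(min(budget, len(pool))):
        # Assumed strategy: smallest |w . x| means closest to the boundary.
        i = min(range(len(pool)), key=lambda j: abs(dot(w, pool[j])))
        x = pool.pop(i)
        labeled.append((x, oracle(x)))   # ask the annotator for a label
        w = train_margin_perceptron(labeled, margin)
    return w, labeled

# Toy pool: two features plus a constant bias feature.
POOL = [(2, 0, 1), (3, 1, 1), (0, 2, 1), (1, 3, 1), (4, 1, 1), (1, 4, 1)]

def oracle(x):
    """Hypothetical annotator: positive when the first feature is larger."""
    return 1 if x[0] > x[1] else -1

w, labeled = active_learn(POOL, oracle, budget=len(POOL))
```

In practice the budget would be much smaller than the pool, which is the point of the approach: annotation effort is spent only on the examples the current model is least sure about.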
Related Work • Perceptron with Margin (Krauth and Mezard) • Performance: comparable with SVM in text classification • Easy to implement • Much faster than SVM • Active Learning • Parsing (Tang et al.) • FSA learning (Angluin)
Case Study: Site Question Answering • Site Question Answering: Dr. What • Answering “what is X” questions at microsoft.com • Information Extraction • Extract definitions from web pages
Case Study: Dr. What • About 3300 paragraphs from MS Search of several keywords • Active learning (Active Perceptron): annotation of about 400 paragraphs • Extraction Model • 150000 web pages downloaded from microsoft.com • Definitions of about 10000 terms
Demo Active Perceptron and Dr. What
Dr. What • Performance of Perceptron with Margin • 3300 examples, 2640 for training, 660 for testing • Precision: 70.03% • Recall: 38.09% • Performance of Active Learning • Reach the optimal performance with annotation of 400 examples • Performance of Dr. What • Human evaluation on the definitions of 2800 terms extracted • Top 1 Precision: 76.97% • Top 3 Precision: 77.78%
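For reference, the reported precision and recall of the perceptron with margin can be combined into a single F1 score. The figure below is derived here from the two numbers on the slide; it is not reported in the talk.

```latex
% F1 as the harmonic mean of precision P and recall R
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 0.7003 \times 0.3809}{0.7003 + 0.3809}
    \approx 0.493
```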
Summary of Talk • Lexical Knowledge Acquisition Using MDL • Text Mining Using MDL • Information Extraction based on Active Learning
References • Hang Li and Naoki Abe, Generalizing Case Frames Using a Thesaurus and the MDL Principle, Computational Linguistics 24(2), 217-244 (1998). • Hang Li and Kenji Yamanishi, Mining from Open Answers in Questionnaire Data, Proc. of ACM-KDD’01, 443-449 (2001).