Statistical Learning Methods in Natural Language Processing

Institute of Computing Technology, Chinese Academy of Sciences Statistical Learning Methods in Natural Language Processing Hang Li Microsoft Research Asia Nov. 29, 2002

Talk Outline • MDL Principle • Lexical Knowledge Acquisition Using MDL Principle • Text Mining Using MDL Principle • Information Extraction based on Active Learning

1. MDL Principle

Statistical Learning and Prediction Learning System Prediction System

Elements of Statistical Estimation • Model • Strategy (Criterion) • Algorithm

Example of Model Estimation Data Model1 Bernoulli Model Model2 Mixture Model Question: What Criterion Should We Employ?

Shannon’s Information Theory Data Distribution Code Length Average Probability Distribution = Compact Code
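The relation between probability and code length on this slide can be illustrated with a small sketch. It assumes the standard Shannon code length of -log2 P(x) bits for an outcome x; the toy distribution and the function names are hypothetical and not taken from the talk.

```python
import math

def code_length(prob):
    """Ideal Shannon code length, in bits, of an outcome with probability prob."""
    return -math.log2(prob)

def average_code_length(dist):
    """Expected code length under dist, i.e. the entropy of the distribution."""
    return sum(p * code_length(p) for p in dist.values() if p > 0)

# Hypothetical toy distribution: likely outcomes get short codes.
dist = {"a": 0.5, "b": 0.25, "c": 0.25}
for symbol, p in dist.items():
    print(symbol, code_length(p), "bits")
print("average:", average_code_length(dist), "bits")
```

A distribution that fits the data well assigns short codes to frequent outcomes, which is the sense in which a probability distribution acts as a compact code.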

Minimum Description Length Principle • MDL Principle: Selecting Model with Minimum Code Length • Minimum Code Length:

Example of Description Length Data Bernoulli

MDL: Trade-off Relationship [Figure: description length L plotted against the model M, with curves for Lm, Ld, and Lm + Ld]
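The trade-off between model length Lm and data length Ld can be sketched for a Bernoulli model as follows. This is a minimal illustration, assuming the common two-part approximation in which each free parameter costs (1/2) log2 n bits; the exact formula on the original slides is not recoverable, and the function names and data are hypothetical.

```python
import math

def fair_coin_length(bits):
    """Model with no free parameter: Lm = 0, Ld = 1 bit per observation."""
    return 0.0, float(len(bits))

def bernoulli_length(bits):
    """Model with one fitted parameter: Lm = (1/2) log2 n, Ld = -log2 likelihood."""
    n, ones = len(bits), sum(bits)
    data_bits = 0.0
    for count in (ones, n - ones):
        if count:
            data_bits -= count * math.log2(count / n)
    model_bits = 0.5 * math.log2(n)
    return model_bits, data_bits

def select_model(bits):
    """Return the name of the model with the smaller total Lm + Ld."""
    totals = {
        "fair coin": sum(fair_coin_length(bits)),
        "bernoulli": sum(bernoulli_length(bits)),
    }
    return min(totals, key=totals.get)

balanced = [1, 0, 0, 1, 0, 1, 0, 0, 1, 1]   # hypothetical data
skewed = [1] * 9 + [0]                       # hypothetical data
print(select_model(balanced), select_model(skewed))
```

On balanced data the extra parameter does not pay for itself, so the simpler model has the shorter total length; on skewed data the fitted parameter shortens Ld by more than it adds to Lm.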

2. Lexical Knowledge Acquisition

Lexical Knowledge Acquisition fly arg1 bird fly arg1 swallow fly arg1 bee fly arg1 bird Knowledge Learning System Prediction System fly arg1 crow ?

Problems • Model ? • Criterion (Strategy) ? ← MDL • Algorithm ?

Case Slot Model Word-based Model Class-based Model

Example Partition crow crow swallow swallow bug bug bird bird eagle eagle bee bee insect insect crow swallow bug bird eagle crow crow swallow bee swallow bug insect bird bug bird eagle eagle bee bee insect insect

Example Thesaurus ANIMAL INSECT BIRD swallow crow bug bee insect eagle bird

Tree Cut ANIMAL INSECT BIRD swallow crow bug bee insect eagle bird

Efficient Algorithm Dynamic Programming
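A bottom-up search for the tree cut with minimum description length can be sketched as below. This is a simplified illustration in the spirit of the cited Li and Abe (1998) paper, not a reproduction of it. The assumptions are: each class in a cut costs (1/2) log2 N parameter bits, the probability of a class is shared uniformly among its words, and the counts are hypothetical. The thesaurus is the toy one from the slides.

```python
import math

# Toy thesaurus from the slides: internal node -> children.
THESAURUS = {
    "ANIMAL": ["BIRD", "INSECT"],
    "BIRD": ["swallow", "crow", "eagle", "bird"],
    "INSECT": ["bug", "bee", "insect"],
}

def leaves(node):
    """Return all words (leaf nodes) dominated by node."""
    if node not in THESAURUS:
        return [node]
    return [w for child in THESAURUS[node] for w in leaves(child)]

def description_length(cut, counts, total):
    """Parameter bits plus data bits for the classes in cut (assumed formula)."""
    param_bits = len(cut) / 2 * math.log2(total)
    data_bits = 0.0
    for cls in cut:
        words = leaves(cls)
        freq = sum(counts.get(w, 0) for w in words)
        if freq:
            # Class probability f/N, divided uniformly among the class's words.
            p_word = freq / total / len(words)
            data_bits -= freq * math.log2(p_word)
    return param_bits + data_bits

def find_mdl_cut(node, counts, total):
    """Choose between the node as one class and the best cuts of its children."""
    if node not in THESAURUS:
        return [node]
    merged = [node]
    split = [c for child in THESAURUS[node]
             for c in find_mdl_cut(child, counts, total)]
    if description_length(merged, counts, total) <= description_length(split, counts, total):
        return merged
    return split

# Hypothetical counts for the arg1 slot of "fly".
counts = {"swallow": 4, "crow": 4, "eagle": 4, "bird": 4}
cut = find_mdl_cut("ANIMAL", counts, sum(counts.values()))
print(cut)
```

With these counts the birds are evenly distributed and no insect is observed, so generalizing to BIRD and INSECT is cheaper than keeping individual words and cheaper than collapsing everything into ANIMAL. Each subtree is solved once and its result reused by its parent, which is what keeps the search efficient.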

Experimental Results TOP <entity> <abstraction> 0.11 0.10 <life_form> <object> <quantity> <time> 0.08 <plant> <animal> <substance> <artifact> • Penn Tree Bank Data • Object of Eat 0.39 <solid> <fluid> <food>

Demo Lexical Knowledge Acquisition

3. Text Mining

Text Mining Questionnaire Data: Car Brand Images Closed Answer Open Answer How to mine Open Answers ?

Rule Analysis Accurate extraction of image characteristics for individual car types Characteristics of Car A Comfortable Less luxury Easy to drive expensive Characteristics Of Car B Not reliable Fast & Stylish

Using MDL w w w w w T N N T N T N N T T T 1001010011 w=1 w=0 T 10111 T 01000
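The split shown on this slide can be read as a code-length comparison: the label sequence 1001010011 is encoded either as one block or as two blocks, one for answers containing the word w and one for the rest. The sketch below assumes a two-part Bernoulli code with a (1/2) log2 n parameter cost per block; the criterion actually used in the cited work may differ, and the function names are hypothetical.

```python
import math

def code_length(labels):
    """Two-part code length, in bits, of a string of '0'/'1' labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    ones = labels.count("1")
    bits = 0.5 * math.log2(n)          # assumed cost of the one fitted parameter
    for count in (ones, n - ones):
        if count:
            bits -= count * math.log2(count / n)
    return bits

def split_gain(with_word, without_word):
    """Bits saved by encoding the two groups separately instead of together."""
    together = code_length(with_word + without_word)
    separate = code_length(with_word) + code_length(without_word)
    return together - separate

# Label groups from the slide: w=1 gives 10111, w=0 gives 01000.
gain = split_gain("10111", "01000")
print(round(gain, 2), "bits saved")
```

A positive gain means the word w helps to describe the target labels compactly, which makes it a candidate condition for a rule.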

Ex.1 Rules for Car A
Condition                  Score    Freq./Total Freq.
`for ordinary people’      4.459    7/7
`X, LTD’                   4.523    6/6
`simplicity’               3.017    4/6
`traditional’              3.030    4/4
`Japan’                    3.061    3/4
`common-people’            3.093    3/3
`middle-class’             3.126    3/3
`earnest’                  3.159    2/2
`class&common people’      3.194    2/2
`general’                  1.919    4/8

Ex.2 Rules for Car B
Condition                  Score    Freq./Total Freq.
`outdoor’                  10.325   5/5
`Z, LTD’                   8.132    4/4
`mobility’                 6.527    8/14
`fast’                     5.694    3/3
`run’                      5.057    2/2
`work’                     3.341    2/2
`road’                     3.380    2/2
`enjoyableness’            3.420    2/2
`boring’                   3.438    3/4
`sporty’                   1.891    2/3

Graphical User Interface Rule Analysis Car A Search Function Displaying Data

4. Information Extraction

Information Extraction • One Setting: Information Extraction = Classification • Our Goal: • Help user to build task-specific information extraction system • Minimize user efforts in data annotation • Solution: Active Perceptron

Active Perceptron Text Data Active Learning Feature Extraction Perceptron with Margin Extraction Model Perceptron with Margin Using Active Learning ?

Related Work • Perceptron with Margin (Krauth and Mezard) • Performance: comparable with SVM in text classification • Easy to implement • Much faster than SVM • Active Learning • Parsing (Tang et al.) • FSA learning (Angluin)
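A perceptron with margin combined with active learning can be sketched as follows. The slides do not state how examples are selected for annotation, so the sketch assumes uncertainty sampling (pick the unlabeled example closest to the current hyperplane). All function names, the margin value, and the toy data are hypothetical.

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train_margin_perceptron(examples, margin=1.0, lr=1.0, epochs=100):
    """examples: list of (features, label) pairs with label in {-1, +1}."""
    w = [0.0] * len(examples[0][0])
    for _ in range(epochs):
        updated = False
        for x, y in examples:
            # Update not only on mistakes but whenever the margin is too small.
            if y * dot(w, x) <= margin:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                updated = True
        if not updated:
            break
    return w

def most_uncertain(w, pool):
    """Index of the unlabeled example closest to the hyperplane (assumed strategy)."""
    return min(range(len(pool)), key=lambda i: abs(dot(w, pool[i])))

def active_learning(seed, pool, oracle, rounds):
    """Repeatedly ask the oracle (the annotator) to label the most uncertain example."""
    labeled, pool = list(seed), list(pool)
    w = train_margin_perceptron(labeled)
    for _ in range(rounds):
        if not pool:
            break
        x = pool.pop(most_uncertain(w, pool))
        labeled.append((x, oracle(x)))
        w = train_margin_perceptron(labeled)
    return w, labeled

# Hypothetical toy data: the last feature is a constant bias term.
seed = [([2.0, 1.0], 1), ([-2.0, 1.0], -1)]
pool = [[5.0, 1.0], [0.5, 1.0], [-4.0, 1.0]]
oracle = lambda x: 1 if x[0] > 0 else -1
w, labeled = active_learning(seed, pool, oracle, rounds=1)
print(w, labeled[-1])
```

In this toy run the annotator is asked about the example nearest the decision boundary first, which is the intended effect: labels are requested where the current model is least certain, so fewer annotations are needed.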

Case Study: Site Question Answering • Site Question Answering: Dr. What • Answering “what is X” question at microsoft.com • Information Extraction • Extract definitions from web pages

Case Study: Dr. What Active Perceptron Active learning Annotation about 400 paragraphs About 3300 paragraphs from MS Search of several keywords Extraction Model Definitions of about 10000 terms 150000 web pages downloaded from microsoft.com

Demo Active Perceptron and Dr. What

Dr. What
• Performance of Perceptron with Margin
  • 3300 examples, 2640 for training, 660 for testing
  • Precision: 70.03%
  • Recall: 38.09%
• Performance of Active Learning
  • Reach the optimal performance with annotation of 400 examples
• Performance of Dr. What
  • Human evaluation on the definitions of 2800 terms extracted
  • Top 1 Precision: 76.97%
  • Top 3 Precision: 77.78%

Summary of Talk • Lexical Knowledge Acquisition Using MDL • Text Mining Using MDL • Information Extraction based on Active Learning

References • Hang Li and Naoki Abe, Generalizing Case Frames Using a Thesaurus and the MDL Principle, Computational Linguistics 24(2), 217-244 (1998). • Hang Li and Kenji Yamanishi, Mining from Open Answers in Questionnaire Data, Proc. of ACM-KDD’01, 43-449, (2001).
