Chunking

Chunking • Pierre Bourreau • Cristina España i Bonet • LSI-UPC • PLN-PTM

Plan • Introduction • Methods • HMM • SVM • CRF • Global analysis • Conclusion

Introduction • What is chunking? • Identifying groups of contiguous words. • Ex: • He is the person you read about. • [NP He] [VP is] [NP the person] [NP you] [VP read] [PP about]. • First step to full parsing

Introduction • Chunking task in CoNLL • Based on a previous POS tagging • Chunks: • B/I/O-Chunk • Chunk types and number of occurrences (over 106,978 chunks): • ADJP (adjective phrase): 2,060 • ADVP (adverb phrase): 4,227 • CONJP (conjunction phrase): 56 • INTJ (interjection): 31 • LST (list marker): 10 • NP (noun phrase): 55,081 • PP (prepositional phrase): 21,281 • PRT (particles): 556 • SBAR (subordinated clause): 2,207 • UCP (unlike coordinated phrase): 2 • VP (verb phrase): 21,467

Introduction • Corpus • Wall Street Journal (WSJ) • Training set: • Four sections: 15-18 • 211,727 tokens • Test set: • One section: 20 • 47,377 tokens

Evaluation • Output file format • Word POS Real-Chunk Processed-Chunk • Ex: • Boeing NNP B-NP I-NP • 's POS B-NP B-NP • 747 CD I-NP I-NP • jetliners NNS I-NP I-NP • . . O O • Evaluation script: precision, recall, F1 score • Fβ = (1+β²)*recall*precision/(recall+β²*precision) where β=1
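The scoring described above can be sketched as follows. This is a minimal illustration, not the official evaluation script: it assumes a chunk counts as correct only when both its boundaries and its type match, and it scores a single tag sequence rather than a whole file. The function names are hypothetical.

```python
def extract_chunks(tags):
    """Return the set of (start, end, type) spans encoded by B-/I-/O tags."""
    chunks = set()
    start, ctype = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes the last chunk
        prefix, _, label = tag.partition("-")
        # An open chunk ends at O, at any B-, or at an I- of a different type.
        if start is not None and (prefix != "I" or label != ctype):
            chunks.add((start, i, ctype))
            start, ctype = None, None
        # A chunk starts at B-, or at an I- when no chunk is open.
        if prefix in ("B", "I") and start is None:
            start, ctype = i, label
    return chunks


def f_beta(precision, recall, beta=1.0):
    """F-score; with beta=1 this is the harmonic mean of precision and recall."""
    b2 = beta * beta
    denominator = b2 * precision + recall
    if denominator == 0:
        return 0.0
    return (1 + b2) * precision * recall / denominator


def evaluate(gold_tags, predicted_tags):
    """Chunk-level (precision, recall, F1) for one tag sequence."""
    gold = extract_chunks(gold_tags)
    predicted = extract_chunks(predicted_tags)
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall, f_beta(precision, recall)


# Toy example (not from the corpus): one of two chunks is recovered.
gold = ["B-NP", "I-NP", "B-VP", "O"]
pred = ["B-NP", "I-NP", "B-PP", "O"]
print(evaluate(gold, pred))
```

Scoring whole chunks rather than individual tags is why a single wrong boundary tag costs both precision and recall.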

Hidden Markov Models (HMM) • A bit of theory... • Find the most probable tag sequence for a sentence (I), given a vocabulary and the set of possible tags • Bayes' theorem • All the states?! • First order: bigrams • Second order: trigrams
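The formulas on this slide did not survive in the transcript. The standard HMM tagging derivation they most likely showed is sketched below; the symbols T (tag sequence), w_i (i-th input symbol) and t_i (i-th tag) are notation assumed here, not taken from the slide.

```latex
\begin{align}
\hat{T} &= \arg\max_{T} P(T \mid I)
         = \arg\max_{T} \frac{P(I \mid T)\,P(T)}{P(I)}
         = \arg\max_{T} P(I \mid T)\,P(T) \\
\text{first order (bigrams):}\quad
\hat{T} &\approx \arg\max_{t_1 \dots t_n} \prod_{i=1}^{n} P(w_i \mid t_i)\,P(t_i \mid t_{i-1}) \\
\text{second order (trigrams):}\quad
\hat{T} &\approx \arg\max_{t_1 \dots t_n} \prod_{i=1}^{n} P(w_i \mid t_i)\,P(t_i \mid t_{i-1}, t_{i-2})
\end{align}
```

P(I) is constant for a given sentence, which is why it drops out of the maximization.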

HMM: Chunking • Common Setting • Input sentence: words • Input/output tags: POS • Chunking • Input sentence: POS • Input/output tags: Chunks → Tagger • Problem! • Small vocabulary

HMM: Chunking • Solution: Specialization • Input sentence: POS • Input/output tags: POS + Chunks • Improvement • Input sentence: Special words + POS • Input/output tags: Special words + POS + Chunks

HMM: Chunking • In practice: modification of the input data (WSJ train and test)
Original (word POS chunk)    →  Specialized (input symbol, output tag)
Chancellor NNP O             →  NNP          NNP·O
of IN B-PP                   →  of·IN        of·IN·B-PP
the DT B-NP                  →  the·DT       the·DT·B-NP
Exchequer NNP I-NP           →  NNP          NNP·I-NP
Nigel NNP B-NP               →  NNP          NNP·B-NP
Lawson NNP I-NP              →  NNP          NNP·I-NP
's POS B-NP                  →  POS          POS·B-NP
restated VBN I-NP            →  VBN          VBN·I-NP
commitment NN I-NP           →  NN           NN·I-NP
to TO B-PP                   →  to·TO        to·TO·B-PP
a DT B-NP                    →  a·DT         a·DT·B-NP
firm NN I-NP                 →  NN           NN·I-NP
monetary JJ I-NP             →  JJ           JJ·I-NP
policy NN I-NP               →  NN           NN·I-NP
has VBZ B-VP                 →  has·VBZ      has·VBZ·B-VP
helped VBN I-VP              →  helped·VBN   helped·VBN·I-VP
to TO I-VP                   →  to·TO        to·TO·I-VP
prevent VB I-VP              →  VB           VB·I-VP
a DT B-NP                    →  a·DT         a·DT·B-NP
freefall NN I-NP             →  NN           NN·I-NP
in IN B-PP                   →  in·IN        in·IN·B-PP
sterling NN B-NP             →  NN           NN·B-NP
over IN B-PP                 →  over·IN      over·IN·B-PP
the DT B-NP                  →  the·DT       the·DT·B-NP
past JJ I-NP                 →  past·JJ      past·JJ·I-NP
week NN I-NP                 →  NN           NN·I-NP
. . O                        →  .            .·O
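The transformation shown on this slide can be sketched in a few lines. The set `SPECIAL_WORDS` below is a hypothetical subset read off the example itself; the actual 409-word list is not given in the slides. The function names are also illustrative.

```python
# Hypothetical subset of the specialization list, inferred from the slide's example.
SPECIAL_WORDS = {"of", "the", "to", "a", "has", "helped", "in", "over", "past"}


def specialize(tokens, special_words):
    """Map (word, POS, chunk) triples to (input symbol, output tag) pairs."""
    pairs = []
    for word, pos, chunk in tokens:
        # Keep the word itself only when it is in the specialization list.
        symbol = f"{word}·{pos}" if word in special_words else pos
        pairs.append((symbol, f"{symbol}·{chunk}"))
    return pairs


def chunk_of(output_tag):
    """Recover the plain chunk tag from a specialized output tag."""
    return output_tag.rsplit("·", 1)[-1]


sentence = [
    ("Chancellor", "NNP", "O"),
    ("of", "IN", "B-PP"),
    ("the", "DT", "B-NP"),
    ("Exchequer", "NNP", "I-NP"),
]
for symbol, tag in specialize(sentence, SPECIAL_WORDS):
    print(symbol, tag, chunk_of(tag))
```

After tagging, `chunk_of` strips the added information again so that the output can be scored against the original chunk tags.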

HMM: Results • Tool: • TnT Tagger (Thorsten Brants) • Implements the Viterbi algorithm for second-order Markov models • Supports unigram, bigram and trigram models
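To illustrate the decoding step, here is a minimal Viterbi sketch. It is deliberately simplified to a first-order (bigram) model with no smoothing, so it is not what TnT does internally; the function name and all probabilities in the toy example are made up for illustration.

```python
import math

NEG_INF = float("-inf")


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Most probable state sequence under a first-order HMM (log-space)."""
    if not observations:
        return []
    # best[i][s]: best log-score of any path that ends in state s at position i
    best = [{s: log_start.get(s, NEG_INF) + log_emit[s].get(observations[0], NEG_INF)
             for s in states}]
    back = [{}]
    for i in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            # Pick the predecessor that maximizes score-so-far plus transition.
            prev, score = max(
                ((p, best[i - 1][p] + log_trans[p].get(s, NEG_INF)) for p in states),
                key=lambda item: item[1],
            )
            best[i][s] = score + log_emit[s].get(observations[i], NEG_INF)
            back[i][s] = prev
    # Follow the back-pointers from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for i in range(len(observations) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


def logs(probs):
    """Convert a dict of probabilities to log-probabilities."""
    return {key: math.log(value) for key, value in probs.items()}


# Toy model: POS tags are the observations, chunk tags are the hidden states.
states = ["B-NP", "I-NP", "O"]
start = logs({"B-NP": 0.6, "O": 0.4})
trans = {
    "B-NP": logs({"B-NP": 0.1, "I-NP": 0.7, "O": 0.2}),
    "I-NP": logs({"B-NP": 0.2, "I-NP": 0.4, "O": 0.4}),
    "O": logs({"B-NP": 0.6, "O": 0.4}),
}
emit = {
    "B-NP": logs({"DT": 0.6, "NN": 0.4}),
    "I-NP": logs({"DT": 0.1, "NN": 0.9}),
    "O": logs({"DT": 0.1, ".": 0.9}),
}
print(viterbi(["DT", "NN", "."], states, start, trans, emit))
```

A second-order version keeps pairs of states in the table instead of single states, which is where the cost of trigram models comes from.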

HMM: Results • Configuration 1: • No special words, no POS • Trigrams • Default parameters • Results: • Far from the best scores reported for the task (F1 ≈ 94%)

HMM: Results • Trying to improve… • Configuration 2: • Lexical specialization (409 words, F. Pla) • Trigrams • Configuration 3: • Lexical specialization (409 words, F. Pla) • Bigrams (does it make any difference?)

HMM: Results

HMM: Results • Comments: • Adding specialization information improves the total F1 by 7 points. • That is much more than the improvement from using trigrams instead of bigrams (~1%). • As before, NP and PP are the best-identified chunks. • Impressive improvement for PRT and SBAR (but few occurrences).

HMM: Results • Importance of the training set size: • Test: • Divide the training set into 7 parts (~17000 tokens/part). • Calculate the results adding one part each time. • Conclusion: • Performance improves with the set size (see plot). • Limit? • Molina & Pla got an F1 of 93.26% with 18 sections of WSJ as training set.
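The incremental test above amounts to training on growing prefixes of the corpus. A minimal sketch of the splitting step follows; the function name is hypothetical, and training and scoring are left out because they depend on the tagger used.

```python
def cumulative_subsets(sentences, parts=7):
    """Yield growing prefixes of the training data, one more part each time."""
    total = len(sentences)
    for k in range(1, parts + 1):
        # Integer boundaries, so the last prefix is always the full set.
        yield sentences[: (k * total) // parts]


# Toy data: 15 placeholder sentences.
sizes = [len(subset) for subset in cumulative_subsets(list(range(15)))]
print(sizes)
```

Each prefix would then be used to train a model that is scored on the same fixed test section.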

HMM: Results

Support Vector Machines (SVM) • A bit of theory… • Objective: • Maximize the minimum margin • Allow misclassifications • Controlled by the C parameter
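The objective summarized above is usually written as the soft-margin problem below. This is the textbook formulation, given here for reference; the symbols (w, b, slack variables ξ_i, training pairs (x_i, y_i) with y_i ∈ {−1, +1}) are standard notation rather than taken from the slide.

```latex
\begin{align}
\min_{w,\,b,\,\xi}\quad & \tfrac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} \xi_i \\
\text{subject to}\quad & y_i\,(w \cdot x_i + b) \ge 1 - \xi_i, \qquad \xi_i \ge 0, \quad i = 1,\dots,n
\end{align}
```

Minimizing the norm of w maximizes the margin, and a larger C penalizes misclassified or margin-violating examples more heavily.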

SVM • Tool: • SVMTool (Jesús Giménez & Lluís Màrquez) • Uses SVMLight (Thorsten Joachims) for learning. • Sequential tagger → chunking • No need to change the input data • Binarizes the problem to apply SVMs
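SVMs are binary classifiers, so a multi-class tagging problem has to be binarized. One common scheme is one-vs-rest, sketched below as an assumption; the slide does not say which scheme the tool uses, and the function names and scores are illustrative.

```python
def binarize(labels):
    """Turn multi-class labels into one +1/-1 label vector per class (one-vs-rest)."""
    classes = sorted(set(labels))
    return {c: [1 if y == c else -1 for y in labels] for c in classes}


def predict(scores):
    """Pick the class whose binary classifier gives the highest score."""
    return max(scores, key=scores.get)


chunk_tags = ["B-NP", "I-NP", "O", "B-NP"]
for tag, targets in binarize(chunk_tags).items():
    print(tag, targets)

# Made-up decision values from three binary classifiers for one token.
print(predict({"B-NP": -0.3, "I-NP": 1.2, "O": 0.1}))
```

One binary classifier is trained per class on its +1/-1 vector, and at tagging time the class with the highest decision value wins.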

SVM • Features (model 0):

SVM • Results • Default parameters, vary C and/or direction (LR/LRL) • Very small variations with this configuration

SVM • Best results: • F1 > 90% for the three main chunks. • Modest values for the others. • Main difference with HMM in PP.

Conditional Random Fields (CRF) • A bit of theory… • Idea based on an extension of HMM and Maximum-Entropy Models. • We don't consider a chain but a graph G=(V,E) • Conditioned on X, the observation sequence variable • Each node represents a value Y_v of Y (output label)

Conditional Random Fields • P(y|x) (Lafferty et al.) where y is a label sequence, and x an observation sequence • tj is a transition feature function (of the previous and current labels and the observation sequence). • sj is a state feature function (of the current label and the observation sequence). • The weight factors are set during training.
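The formula itself is missing from the transcript. For a linear-chain CRF, the usual form consistent with the definitions above is sketched below; the weights λ_j and μ_j and the normalizer Z(x) are standard notation assumed here, and the general-graph version sums over edges and vertices instead of positions.

```latex
\begin{align}
P(y \mid x) &= \frac{1}{Z(x)} \exp\Bigl( \sum_{i} \sum_{j} \lambda_j\, t_j(y_{i-1}, y_i, x, i)
              + \sum_{i} \sum_{j} \mu_j\, s_j(y_i, x, i) \Bigr) \\
Z(x) &= \sum_{y'} \exp\Bigl( \sum_{i} \sum_{j} \lambda_j\, t_j(y'_{i-1}, y'_i, x, i)
              + \sum_{i} \sum_{j} \mu_j\, s_j(y'_i, x, i) \Bigr)
\end{align}
```

Z(x) normalizes over all possible label sequences, which is what distinguishes the model from a chain of locally normalized classifiers.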

Conditional Random Fields (CRF) • CRF++ 0.45 • Developed by Taku Kudo (2nd at CoNLL-2000 with an SVM combination) • Parameters: • Features being used: • We can use words and POS tags • We proposed three alternatives: • Using binary combinations of words+POS on a frame of size 2 • Using the above + a 3-ary combination of POS • Using only POS on a frame of size 3 • Unigrams or bigrams • The score is computed either for the current token alone (unigrams) or for the pair of consecutive tokens (bigrams).
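CRF++ reads its features from a template file in which `%x[row,col]` refers to a column of the input at a relative position, `U` lines define unigram features, and a lone `B` line adds features over pairs of output labels. The sketch below generates such a template; the exact templates used in these experiments are not given in the slides, so the window and pairing choices here are only one plausible reading, and the function name is hypothetical.

```python
def crfpp_template(columns, window, pairs=True, bigram=True):
    """Build CRF++ feature-template lines.

    columns: input column indices to use (0 = word, 1 = POS in CoNLL-2000 files)
    window:  use positions -window..+window around the current token
    pairs:   also combine each position with the next one (binary combinations)
    bigram:  append the "B" line, which adds features over pairs of output labels
    """
    lines = []
    n = 0
    for col in columns:
        # Single-position features for every offset in the window.
        for offset in range(-window, window + 1):
            lines.append(f"U{n:02d}:%x[{offset},{col}]")
            n += 1
        if pairs:
            # Combinations of two adjacent positions.
            for offset in range(-window, window):
                lines.append(f"U{n:02d}:%x[{offset},{col}]/%x[{offset + 1},{col}]")
                n += 1
    if bigram:
        lines.append("B")
    return lines


# POS-only template on a window of one token to each side, with label bigrams.
print("\n".join(crfpp_template(columns=[1], window=1)))
```

Dropping the `B` line gives the unigram setting, in which each label is scored without reference to the previous one.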

Conditional Random Fields (CRF) • Results

Conditional Random Fields (CRF) • Analysis: • Bigrams with maximum features -> 93.81% global F1 • The global F1 score does not depend much on the feature window, but on the bigram/unigram selection: tagging pairs of tokens gives more power than tagging single tokens • Occurrences: LST->0, INTJ->1, CONJP->9 => Identical results • PRT is the only chunk type that depends more on the feature window and works better for size 2 windows. Preposition tagging relies on bigger windows (ex: out, around, in, …) • Much the same, to a lesser degree, for SBAR. (ex: than, rather, …)

Conditional Random Fields (CRF) • How to improve results: • Molina & Pla’s method? Should improve efficiency in SBAR and CONJP • Mixing the different methods?

Global Analysis

Global Analysis • CRF outperforms HMM and SVM • HMM performs better than SVM, thanks to context. Particularly evident for SBAR and PRT. • HMM outperforms CRF for CONJP! • HMM: uses 3-grams -> better for expressions like “as well as”, “rather than” • HMM improvement with Pla’s method

Global Analysis • CRF results are close to the best CoNLL-2000 results: • A finer analysis, per POS, is needed

Global Analysis • Combining the three methods:

Global Analysis • Combining does not help for PRT, where the difference between HMM and CRF was big! • Helps… just a bit on SBAR • Global results are better for CRF alone: 93.81>93.57
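The slides do not spell out how the three outputs were combined. A simple per-token majority vote is sketched below as an assumption; the function name, the tie-breaking rule (fall back to one designated system) and the toy tags are all illustrative.

```python
from collections import Counter


def vote(*tag_sequences, fallback=0):
    """Per-token majority vote; without a majority, trust the system at `fallback`."""
    combined = []
    for tags in zip(*tag_sequences):
        (top, count), = Counter(tags).most_common(1)
        # A tag wins only if more than half of the systems agree on it.
        combined.append(top if count > len(tags) / 2 else tags[fallback])
    return combined


# Made-up outputs of three taggers for the same three tokens.
crf = ["B-NP", "I-NP", "B-PP"]
hmm = ["B-NP", "I-NP", "O"]
svm = ["B-NP", "B-NP", "B-VP"]
print(vote(crf, hmm, svm))  # CRF is listed first, so it breaks ties
```

Voting token by token can produce tag sequences that no single system proposed, and it cannot fix errors that all systems share.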

Conclusion • Without lexicalization, SVM performs a lot better than HMM • With lexical specialization, HMM performs better than SVM… and is a lot faster! • Only 3 experiments for voting: too few. Taggers make mistakes for the same POS tags.

Conclusion • At a certain stage, it is hard to improve results. • CRF proves to be efficient without any specific modification -> how can we improve it? => CRF with 3-grams… but probably really slow. • Finer comparisons with the CoNLL results?
