Asma Naseer. Chunking: Shallow Parsing
Introduction • Shallow Parsing or Partial Parsing • First proposed by Steven Abney (1991) • Breaking text up into small pieces • Each piece is parsed separately [1]
Introduction (continue . . . ) • Words are not arranged flatly in a sentence but are grouped into smaller units called phrases • The girl was playing in the street • اس نے احمد کو کتاب دی (He gave the book to Ahmad)
Introduction (continue . . . ) • Chunks are non-recursive (a chunk does not contain a phrase of the same category as itself) • NP → D? AdjP? AdjP? N • The big red balloon • [NP [D The] [AdjP [Adj big]] [AdjP [Adj red]] [N balloon]] [1]
Introduction (continue . . . ) • Each phrase is dominated by a head h • A man proud of his son. / A proud man • The root of the chunk has h as its s-head (semantic head) • The head of a noun phrase is usually a noun or pronoun [1]
Chunk Tagging • IOB (Inside, Outside, Begin) tagging • Noun-phrase tags: B-NP, I-NP, O • Verb-phrase tags: B-VP, I-VP, O • قائد اعظم محمد علی جناح نے قوم سے خطاب کیا (Quaid-e-Azam Muhammad Ali Jinnah addressed the nation) • [قائد اعظم B-NP] [محمد I-NP] [علی I-NP] [جناح I-NP] [نے O] [قوم B-NP] [سے O] [خطاب B-NP] [کیا O]
Chunk Tagging (continue . . .) • Variants of the tagging scheme • IOBE • IOB • IO
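To make the IOB labelling concrete, here is a minimal Python sketch (not from the original slides); the sentence, the chunk boundaries and the to_iob helper are illustrative only.

```python
def to_iob(chunks):
    """Convert (chunk_type, words) groups into per-word IOB tags.
    chunk_type is None for words outside any chunk."""
    tagged = []
    for chunk_type, words in chunks:
        for i, word in enumerate(words):
            if chunk_type is None:
                tagged.append((word, "O"))
            elif i == 0:
                tagged.append((word, "B-" + chunk_type))
            else:
                tagged.append((word, "I-" + chunk_type))
    return tagged

# "The girl was playing in the street"
sentence = [("NP", ["The", "girl"]), ("VP", ["was", "playing"]),
            ("PP", ["in"]), ("NP", ["the", "street"])]
print(to_iob(sentence))
# [('The', 'B-NP'), ('girl', 'I-NP'), ('was', 'B-VP'), ('playing', 'I-VP'),
#  ('in', 'B-PP'), ('the', 'B-NP'), ('street', 'I-NP')]
```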
Research Work • Rule Based Vs Statistical Based Chunking [2] • Use of Support Vector Learning for Chunk Identification [5] • A Context Based Maximum Likelihood Approach to Chunking [6] • Chunking with Maximum Entropy Models [7] • Single-Classifier Memory-Based Phrase Chunking [8] • Hybrid Text Chunking [9] • Shallow Parsing as POS Tagging [3]
Rule Based Vs Statistical Based Chunking • Two techniques are used • Regular expression rules: a shallow parse based on hand-written regular expressions • N-gram statistical tagger (machine-based chunking): NLTK (Natural Language Toolkit), based on the TnT tagger (Trigrams'n'Tags) • Basic idea: reuse a POS tagger for chunking
Rule Based Vs Statistical Based Chunking (continue… ) • Regular expression rules • The regular expressions must be developed manually • N-gram statistical tagger • Can be trained on gold-standard chunked data
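As an illustration of the rule-based side, the sketch below uses NLTK's RegexpParser with a single hand-written NP rule mirroring the D? AdjP? AdjP? N pattern from the introduction; the grammar and the example sentence are toy stand-ins, not the rule set evaluated in [2].

```python
import nltk

# One manually written regular-expression rule for NP chunks:
# optional determiner, any number of adjectives, one or more nouns.
grammar = r"NP: {<DT>?<JJ>*<NN.*>+}"
chunker = nltk.RegexpParser(grammar)

tagged = [("The", "DT"), ("big", "JJ"), ("red", "JJ"),
          ("balloon", "NN"), ("deflated", "VBD")]
print(chunker.parse(tagged))
# (S (NP The/DT big/JJ red/JJ balloon/NN) deflated/VBD)
```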
Rule Based Vs Statistical Based Chunking (continue… ) • The focus is on verb- and noun-phrase chunking • Noun phrases • A noun or pronoun is the head • May also contain • Determiners, i.e. articles, demonstratives, numerals, possessives and quantifiers • Adjectives • Complements (adpositional phrases, relative clauses) • Verb phrases • A verb is the head • Often one or two complements • Any number of adjuncts
Rule Based Vs Statistical Based Chunking (continue… ) • Training NLTK on chunk data • Starts with an empty rule set • 1. Define or refine a rule • 2. Run the chunker on the training data • 3. Compare the results with the previous run • Repeat steps 1-3 until performance no longer improves significantly (a rough sketch of this loop follows below) • Issue: the corpus contains 211,727 phrases in total; only a 1,000-phrase subset was used
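The refine-run-compare loop might look roughly like the following sketch; the sequence of grammar versions and the gold-standard sentences are supplied by the analyst, and the NLTK evaluation call is my assumption about how each run would be scored, not code from [2].

```python
import nltk

def refine_rules(grammar_versions, gold_sents, min_gain=0.001):
    """Run successively refined regular-expression grammars over
    gold-standard chunked sentences, keeping the best grammar and
    stopping once F-measure no longer improves significantly."""
    best_f, best_grammar = 0.0, None
    for grammar in grammar_versions:                     # step 1: a (re)defined rule set
        chunker = nltk.RegexpParser(grammar)
        f = chunker.evaluate(gold_sents).f_measure()     # steps 2-3: run and compare
        if f - best_f < min_gain:
            break                                        # performance no longer improves
        best_f, best_grammar = f, grammar
    return best_grammar, best_f
```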
Rule Based Vs Statistical Based Chunking (continue… ) • Training TnT on chunk data • Chunking is treated as statistical tagging • Two steps • Parameter generation: create model parameters from the training corpus • Tagging: tag each word with a chunk label
Rule Based Vs Statistical Based Chunking (continue… ) • Data set • WSJ: the Wall Street Journal, a New York newspaper covering US and international business and financial news • Training: sections 15-18 • Testing: section 20 • Both tagged with POS and IOB chunk tags • Special characters are treated like any other POS; punctuation is tagged as O
Rule Based Vs Statistical Based Chunking (continue… ) • Results • Precision: P = |reference ∩ test| / |test| • Recall: R = |reference ∩ test| / |reference| • F-measure: Fα = 1 / (α/P + (1−α)/R), with α = 0.5 • which gives F = (2 · P · R) / (P + R)
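The same measures in a short sketch; representing each chunk as a (start, end, type) triple is an assumption made for illustration.

```python
def chunk_scores(reference, test):
    """Precision, recall and F-measure over sets of chunks,
    mirroring the formulas above."""
    correct = len(reference & test)
    p = correct / len(test)
    r = correct / len(reference)
    f = 2 * p * r / (p + r)
    return p, r, f

ref = {(0, 1, "NP"), (2, 3, "VP"), (5, 6, "NP")}
hyp = {(0, 1, "NP"), (2, 3, "VP"), (4, 6, "NP")}
print(chunk_scores(ref, hyp))   # (0.667, 0.667, 0.667)
```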
Rule Based Vs Statistical Based Chunking (continue… ) • Results • NLTK • TnT
Use of Support Vector Learning for Chunk Identification • SVMs (large-margin classifiers) • Introduced by Vapnik (1995) • Two-class pattern recognition problem • Good generalization performance • High accuracy in text categorization without overfitting (Joachims, 1998; Taira and Haruno, 1999)
Use of Support Vector Learning for Chunk Identification ( continue… ) • Training data: (x1, y1), …, (xl, yl), with xi ∈ Rn and yi ∈ {+1, −1} • xi is the i-th sample, represented by an n-dimensional vector • yi is the class label (positive or negative) of the i-th sample • In SVM • Positive and negative examples are separated by a hyperplane • SVM finds the optimal hyperplane
Use of Support Vector Learning for Chunk Identification ( continue… ) • Two possible hyperplanes
Use of Support Vector Learning for Chunk Identification ( continue… ) • Chunks in the CoNLL-2000 shared task are IOB-tagged • Each chunk type has an I- and a B- variant, e.g. I-NP and B-NP • 22 such chunk tags occur in CoNLL-2000 • The chunking problem becomes classification over these 22 tags • SVM is a binary classifier, so it is extended to k classes • One class vs. all others, or • Pairwise classification: k · (k−1) / 2 classifiers, i.e. 22 · 21 / 2 = 231 classifiers • The majority vote decides the final class (a toy sketch follows below)
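A toy sketch of the pairwise setup using scikit-learn rather than the SVM implementation of [5]; the data below is synthetic and only shows the call pattern and the 231 pairwise decision scores.

```python
import numpy as np
from sklearn.svm import SVC

X = np.random.default_rng(0).integers(0, 2, size=(220, 50)).astype(float)  # binary features
y = np.arange(220) % 22                       # 22 chunk-tag classes, 10 samples each

# SVC trains one binary SVM per pair of classes (one-vs-one) and lets
# the majority vote over those classifiers decide the final class.
clf = SVC(kernel="linear", decision_function_shape="ovo")
clf.fit(X, y)
print(clf.decision_function(X[:1]).shape)     # (1, 231): one score per classifier pair
print(clf.predict(X[:1]))                     # final class chosen by majority vote
```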
Use of Support Vector Learning for Chunk Identification ( continue… ) • The feature vector consists of • Words: w • POS tags: t • Chunk tags: c • To identify the chunk tag ci at the i-th word, use • wj, tj (j = i−2, i−1, i, i+1, i+2) • cj (j = i−2, i−1) • All features are expanded to binary values, either 0 or 1 • The total dimensionality of the feature vector becomes 92,837
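A sketch of this feature template; the dictionary-of-binary-indicators representation and the example sentence are illustrative, and the 92,837 dimensions only arise after expanding the template over the whole training set.

```python
def chunk_features(words, tags, chunks, i):
    """Binary features for position i: words and POS tags in a +/-2
    window plus the two previous chunk tags, as described above.
    Each distinct key becomes one 0/1 dimension of the feature vector."""
    feats = {}
    for j in range(i - 2, i + 3):
        if 0 <= j < len(words):
            feats["w[%+d]=%s" % (j - i, words[j])] = 1
            feats["t[%+d]=%s" % (j - i, tags[j])] = 1
    for j in (i - 2, i - 1):
        if j >= 0:
            feats["c[%+d]=%s" % (j - i, chunks[j])] = 1
    return feats

words = ["He", "reckons", "the", "current", "deficit"]
tags = ["PRP", "VBZ", "DT", "JJ", "NN"]
chunks = ["B-NP", "B-VP", "B-NP", "I-NP", "I-NP"]
print(chunk_features(words, tags, chunks, 3))
```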
Use of Support Vector Learning for Chunk Identification ( continue… ) Results • Training the 231 classifiers took about one day • PC/Linux, Celeron 500 MHz, 512 MB RAM • Chunk types: ADJP, ADVP, CONJP, INTJ, LST, NP, PP, PRT, SBAR, VP • Precision = 93.45% • Recall = 93.51% • Fβ=1 = 93.48%
A Context Based Maximum Likelihood Approach to Chunking Training • Based on POS tags • Construct symmetric n-contexts from the training corpus • 1-context: the most common chunk label for each single tag [t0] • 3-context: the tag together with the tags before and after it [t−1, t0, t+1] • 5-context: [t−2, t−1, t0, t+1, t+2] • 7-context: [t−3, t−2, t−1, t0, t+1, t+2, t+3]
A Context Based Maximum Likelihood Approach to Chunking (continue . . .) Training • For each context, find the most frequent chunk label • CC → [O CC] • PRP CC RP → [B-NP CC] • To save storage space, an n-context is stored only if it differs from its nearest lower-order context
A Context Based Maximum Likelihood Approach to Chunking (continue . . .) Testing • Construct the maximum context for each tag • Look it up in the database of most likely patterns • If the largest context is not found, the context is reduced step by step • The only rule for chunk labelling is to look up [t−3, t−2, t−1, t0, t+1, t+2, t+3], …, [t0] until the context is found (see the sketch below)
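A toy reconstruction of the training and look-up steps; the '#' padding symbol and the omission of the storage-saving pruning step are my own simplifications of the approach in [6].

```python
from collections import Counter, defaultdict

def train_contexts(sents):
    """sents: list of [(pos_tag, chunk_label), ...].  For every symmetric
    POS context of width 1, 3, 5 and 7, record the most frequent label."""
    counts = defaultdict(Counter)
    for sent in sents:
        tags = [t for t, _ in sent]
        for i, (_, label) in enumerate(sent):
            for n in (1, 3, 5, 7):
                k = n // 2
                ctx = tuple(tags[j] if 0 <= j < len(tags) else "#"
                            for j in range(i - k, i + k + 1))
                counts[ctx][label] += 1
    return {ctx: c.most_common(1)[0][0] for ctx, c in counts.items()}

def chunk_label(table, tags, i):
    """Back off from the 7-context down to the 1-context until a match is found."""
    for n in (7, 5, 3, 1):
        k = n // 2
        ctx = tuple(tags[j] if 0 <= j < len(tags) else "#"
                    for j in range(i - k, i + k + 1))
        if ctx in table:
            return table[ctx]
    return "O"   # fall-back for a completely unseen tag
```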
A Context Based Maximum Likelihood Approach to Chunking (continue . . .) Results • The best results are achieved for 5-context • ADJP, ADVP, CONJP, INTJ, LST, NP, PP, PRT, SBAR, VP • Precision = 86.24% • Recall = 88.25% • Fβ=1 = 87.23%
Chunking with Maximum Entropy Models • Maximum entropy models are exponential models • Collect as much information as possible: frequencies of events relevant to the process • The MaxEnt model has the form P(w|h) = (1 / Z(h)) · exp(Σi λi fi(h, w)) • fi(h, w) is a binary-valued feature describing an event • λi describes how important fi is • Z(h) is a normalization factor
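A minimal numerical sketch of this model form; the feature names, weights and label set below are invented for illustration and do not come from the trained WSJ model in [7].

```python
import math

def maxent_prob(weights, active_features, labels):
    """P(w|h) = exp(sum_i lambda_i * f_i(h, w)) / Z(h) for binary features.
    active_features(label) returns the names of the features that fire
    for the event (h, label); weights maps feature name -> lambda_i."""
    scores = {w: math.exp(sum(weights.get(f, 0.0) for f in active_features(w)))
              for w in labels}
    Z = sum(scores.values())                   # normalization factor Z(h)
    return {w: s / Z for w, s in scores.items()}

weights = {"tag=DT&label=B-NP": 1.2, "tag=DT&label=O": -0.5}
labels = ["B-NP", "I-NP", "O"]
print(maxent_prob(weights, lambda w: ["tag=DT&label=" + w], labels))
# B-NP receives the largest share of the probability mass
```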
Chunking with Maximum Entropy Models (continue . . .) Attributes Used • Information in the WSJ corpus • Current word • POS tag of the current word • Surrounding words • POS tags of surrounding words • Context • Left context: 3 words • Right context: 2 words • Additional information • Chunk tags of the previous 2 words
Chunking with Maximum Entropy Models (continue . . .) Results • Tagging accuracy = (# of correctly tagged words) / (total # of words) = 95.5% • Recall = (# of correctly proposed base NPs) / (# of correct base NPs) = 91.86% • Precision = (# of correctly proposed base NPs) / (# of proposed base NPs) = 92.08% • Fβ=1 = ((β² + 1) · Precision · Recall) / (β² · Precision + Recall) = 91.97%
Hybrid Text Chunking • Context-based lexicon and HMM-based chunker • Statistics were first used for chunking by Church (1988) • Corpus frequencies were used • Non-recursive noun phrases were identified • Skut & Brants (1998) modified Church's approach and used a Viterbi tagger
Hybrid Text Chunking (continue . . .) • Error-driven HMM-based text chunker • Memory is reduced by keeping only positive lexical entries • HMM-based text chunker with a context-dependent lexicon • Given G1n = g1, g2, …, gn, find the optimal tag sequence T1n = t1, t2, …, tn • Maximize log P(T1n | G1n) • log P(T1n | G1n) = log P(T1n) + log [ P(T1n, G1n) / ( P(T1n) · P(G1n) ) ]
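Finding the tag sequence that maximizes log P(T1n | G1n) is a dynamic-programming search; the sketch below is a generic Viterbi decoder over chunk tags, not the error-driven, context-dependent-lexicon model of [9], and its score tables are assumed to be supplied by whatever model is used.

```python
def viterbi(obs, states, log_init, log_trans, log_emit):
    """Return the state (chunk-tag) sequence with the highest summed log score.
    log_init[s], log_trans[prev][s] and log_emit[s][o] are log-probabilities."""
    UNSEEN = -1e9                                    # score for an unseen emission
    V = [{s: log_init[s] + log_emit[s].get(obs[0], UNSEEN) for s in states}]
    back = []
    for o in obs[1:]:
        col, ptr = {}, {}
        for s in states:
            prev = max(states, key=lambda p: V[-1][p] + log_trans[p][s])
            col[s] = V[-1][prev] + log_trans[prev][s] + log_emit[s].get(o, UNSEEN)
            ptr[s] = prev
        V.append(col)
        back.append(ptr)
    path = [max(states, key=lambda s: V[-1][s])]     # best final state
    for ptr in reversed(back):                       # follow back-pointers
        path.append(ptr[path[-1]])
    return list(reversed(path))
```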
Shallow Parsing as POS Tagging • CoNLL-2000 data: used for training and testing • Ratnaparkhi's maximum-entropy-based POS tagger • No change in its internal operation • The information supplied for training is increased
Shallow Parsing as POS Tagging (continue . . .) Shallow Parsing vs. POS Tagging • Shallow parsing requires more of the surrounding POS/lexical syntactic environment • Training configurations • Words: w1 w2 w3 • POS tags: t1 t2 t3 • Chunk types: c1 c2 c3 • Suffixes or prefixes
Shallow Parsing as POS Tagging (continue . . .) • The amount of information is gradually increased • Word only: w1 • Tag only: t1 • Word, tag and chunk label: (w1 t1 c1) • The current chunk label is accessed through another model with configurations of words and tags (w1 t1) • To deal with sparseness • t1, t2 • c1 • c2 (last two letters) • w1 (first two letters)
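A rough illustration of reusing an off-the-shelf tagger for chunking, built on NLTK's bundled CoNLL-2000 data and simple n-gram taggers instead of Ratnaparkhi's maximum-entropy tagger; encoding each token as word_POS to approximate the (w1 t1) configuration is my own simplification, and the corpus must first be fetched with nltk.download('conll2000').

```python
import nltk
from nltk.corpus import conll2000

def to_tagger_format(chunked_sent):
    # Encode each token as "word_POS" and treat the chunk label as the
    # "tag" to be predicted, so an ordinary POS tagger can learn chunking.
    return [(word + "_" + pos, chunk)
            for word, pos, chunk in nltk.chunk.tree2conlltags(chunked_sent)]

train = [to_tagger_format(s) for s in conll2000.chunked_sents("train.txt")]
test = [to_tagger_format(s) for s in conll2000.chunked_sents("test.txt")]

# Bigram tagger backed off to a unigram tagger, then to a default "O" tag.
tagger = nltk.BigramTagger(
    train, backoff=nltk.UnigramTagger(train, backoff=nltk.DefaultTagger("O")))
print(tagger.accuracy(test))   # per-token chunk-tag accuracy ("evaluate" in older NLTK)
```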
Shallow Parsing as POS Tagging (continue . . .) • (w1 t1 c1)
Shallow Parsing as POS Tagging (continue . . .) • Sparseness Handling
Shallow Parsing as POS Tagging (continue . . .) • Overall Results
Shallow Parsing as POS Tagging (continue . . .) Error Analysis • Three groups of errors • Difficult syntactic constructs • Punctuation • Ditransitive vs. transitive VPs • Adjective vs. adverbial phrases • Mistakes made by the annotators in the training or testing data • Noise • POS errors • Odd annotation decisions • Errors peculiar to the approach • The exponential distribution assigns non-zero probability to all events • The tagger may assign illegal chunk labels (e.g. I-NP where the word is not inside an NP)
Shallow Parsing as POS Tagging (continue . . .) Comments • PPs are easy to identify • ADJPs and ADVPs are hard to identify correctly (more syntactic information is required) • Performance on NPs can be further improved • Performance using w1 alone or t1 alone is almost the same; using both features improves performance
References • [1] Philip Brooks, “A Simple Chunk Parser”, May 8, 2003. • [2] Igor Boehm, “Rule Based vs. Statistical Chunking of CoNLL Data Set”. • [3] Miles Osborne, “Shallow Parsing as POS Tagging”. • [4] Hans van Halteren, “Chunking with WPDV Models”. • [5] Taku Kudoh and Yuji Matsumoto, “Use of Support Vector Learning for Chunk Identification”, in Proceedings of CoNLL-2000 and LLL-2000, pages 142-144, Portugal, 2000. • [6] Christer Johansson, “A Context Sensitive Maximum Likelihood Approach to Chunking”. • [7] Rob Koeling, “Chunking with Maximum Entropy Models”. • [8] Jorn Veenstra and Antal van den Bosch, “Single-Classifier Memory-Based Phrase Chunking”. • [9] GuoDong Zhou, Jian Su and TongGuan Tey, “Hybrid Text Chunking”.