CSA2050: Introduction to Computational Linguistics

Part-of-Speech (POS) Tagging II
Transformation-Based Tagging
Brill (1995)

Three Approaches to Tagging
• Rule-Based Tagger: ENGTWOL Tagger (Voutilainen 1995)
• Stochastic Tagger: HMM-based Tagger
• Transformation-Based Tagger: Brill Tagger (Brill 1995)
CSA3050: Tagging III and Chunking

Transformation-Based Tagging
• A combination of rule-based and stochastic tagging methodologies:
  • like rule-based tagging, because rules are used to specify tags in a certain environment;
  • like stochastic tagging, because machine learning is used.
• Uses Transformation-Based Learning (TBL)
• Input:
  • tagged corpus
  • dictionary (with most frequent tags)
CLINT Lecture IV

Transformation-Based Tagging
Basic Process:
• Set the most probable tag for each word as a start value, e.g. tag every occurrence of “race” with NN, since P(NN|race) = .98 and P(VB|race) = .02
• The set of possible transformations is limited
  • by using a fixed number of rule templates containing slots, and
  • by allowing a fixed number of fillers to fill the slots

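Transformation-Based Error-Driven Learning
[Diagram, after Brill (1996): unannotated text passes through the initial-state annotator to give annotated text; the learner compares the annotated text with the TRUTH and outputs transformation rules, which are used to retag the text.]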

TBL Requirements
• Initial State Annotator
• List of allowable transformations
• Scoring function
• Search strategy

Initial State Annotation
• Input
  • Corpus
  • Dictionary
  • Frequency counts for each entry
• Output
  • Corpus tagged with most frequent tags
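The initial-state annotator can be sketched in Python as follows, assuming the tagged corpus is a list of (word, tag) pairs. The function names are hypothetical, and tagging words that are missing from the dictionary as NN is an assumption (the simplest of the unknown-word strategies discussed later). The toy counts reproduce the P(NN|race) = .98, P(VB|race) = .02 example.

```python
from collections import Counter, defaultdict

def build_lexicon(tagged_corpus):
    """Return (most frequent tag per word, P(tag|word) per word)."""
    counts = defaultdict(Counter)
    for word, tag in tagged_corpus:
        counts[word][tag] += 1
    most_frequent, probabilities = {}, {}
    for word, tag_counts in counts.items():
        total = sum(tag_counts.values())
        probabilities[word] = {t: c / total for t, c in tag_counts.items()}
        # most_common(1) returns [(tag, count)] for the highest count
        most_frequent[word] = tag_counts.most_common(1)[0][0]
    return most_frequent, probabilities

def initial_state(words, most_frequent, default_tag="NN"):
    """Tag each word with its most frequent tag; unseen words get default_tag (assumption)."""
    return [most_frequent.get(w, default_tag) for w in words]

# Toy counts mirroring P(NN|race) = .98 and P(VB|race) = .02
corpus = [("race", "NN")] * 98 + [("race", "VB")] * 2 + [("to", "TO")] * 5
lexicon, probs = build_lexicon(corpus)
print(probs["race"])                                   # {'NN': 0.98, 'VB': 0.02}
print(initial_state(["to", "race", "zorp"], lexicon))  # ['TO', 'NN', 'NN']
```

In practice the dictionary and counts would come from a training corpus separate from the text being tagged; here both come from the same toy data.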

Transformations
Each transformation comprises
• A source tag
• A target tag
• A triggering environment
Example
• Source tag: NN
• Target tag: VB
• Triggering environment: previous tag is TO
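A transformation can be represented as a small data structure. In this sketch (the class and helper names are hypothetical) the triggering environment is modelled as a predicate over the tag sequence, the word sequence and a position, and the NN → VB / previous-tag-TO example is encoded directly.

```python
from dataclasses import dataclass
from typing import Callable, List

# Triggering environment: predicate over (current tags, words, position i)
Trigger = Callable[[List[str], List[str], int], bool]

@dataclass(frozen=True)
class Transformation:
    source: str        # tag to be changed
    target: str        # tag to change it to
    trigger: Trigger   # triggering environment
    description: str = ""

    def apply(self, words: List[str], tags: List[str]) -> List[str]:
        """Return a new tag list. Triggers are checked against the tags as
        they were before this pass (simultaneous application)."""
        old = list(tags)
        return [
            self.target if t == self.source and self.trigger(old, words, i) else t
            for i, t in enumerate(old)
        ]

def previous_tag_is(z: str) -> Trigger:
    """Build the trigger 'previous tag is z'."""
    return lambda tags, words, i: i > 0 and tags[i - 1] == z

# Example from the slide: NN -> VB when the previous tag is TO
nn_to_vb = Transformation("NN", "VB", previous_tag_is("TO"), "NN VB previous tag is TO")

print(nn_to_vb.apply(["expected", "to", "race", "tomorrow"],
                     ["VBN", "TO", "NN", "NN"]))  # ['VBN', 'TO', 'VB', 'NN']
```

Checking triggers against the old tags means one change cannot trigger another within the same pass; this is a choice made for the sketch, and a left-to-right application that sees earlier changes is an alternative.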

More Examples
Source tag | Target tag | Triggering environment
NN | VB | previous tag is TO
VBP | VB | one of the three previous tags is MD
JJR | RBR | next tag is JJ
VBP | VB | one of the two previous words is n’t
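The triggering environments in the table can be written as predicates over a window around position i. This is a sketch with hypothetical helper names; it uses the ASCII token n't, as in Penn Treebank tokenization.

```python
from typing import List

# Hypothetical predicates for the triggering environments in the table.
# Each inspects the current tags and words around position i.

def previous_tag_is(tags: List[str], words: List[str], i: int, z: str) -> bool:
    return i >= 1 and tags[i - 1] == z

def one_of_previous_tags_is(tags: List[str], words: List[str], i: int, z: str, n: int) -> bool:
    return z in tags[max(0, i - n):i]

def next_tag_is(tags: List[str], words: List[str], i: int, z: str) -> bool:
    return i + 1 < len(tags) and tags[i + 1] == z

def one_of_previous_words_is(tags: List[str], words: List[str], i: int, w: str, n: int) -> bool:
    return w in words[max(0, i - n):i]

# (source tag, target tag, triggering environment), one per table row
RULES = [
    ("NN", "VB", lambda t, w, i: previous_tag_is(t, w, i, "TO")),
    ("VBP", "VB", lambda t, w, i: one_of_previous_tags_is(t, w, i, "MD", 3)),
    ("JJR", "RBR", lambda t, w, i: next_tag_is(t, w, i, "JJ")),
    ("VBP", "VB", lambda t, w, i: one_of_previous_words_is(t, w, i, "n't", 2)),
]

def fires(rule, tags: List[str], words: List[str], i: int) -> bool:
    """A rule fires at i if the current tag is its source tag and the trigger holds."""
    source, _target, trigger = rule
    return tags[i] == source and trigger(tags, words, i)

print(fires(RULES[3], ["PRP", "VBP", "RB", "VBP"], ["I", "do", "n't", "know"], 3))  # True
print(fires(RULES[2], ["JJR", "JJ"], ["more", "likely"], 0))                       # True
```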

Schema
Rule templates (triggering environments): nine schemas, numbered 1–9, each marking with * the positions in the window t_{i-3}, t_{i-2}, t_{i-1}, t_i, t_{i+1}, t_{i+2}, t_{i+3} that make up its triggering environment.

Set of Possible Transformations
The set of possible transformations is enumerated by allowing
• every possible tag or word
• in every possible slot
• in every possible schema
This set can get quite large.

Rule Types and Instances: Brill’s Templates
• Each rule begins with “change tag a to tag b”
• The variables a, b, z, w range over POS tags
• All possible variable substitutions are considered
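The size of the search space follows directly from substituting every tag for a, b, z and w. The sketch below uses a toy five-tag tagset and four illustrative templates (both assumptions, not Brill’s exact template set). It enumerates every instantiation with a ≠ b and also counts them in closed form, which for a template with k free variables is |T|(|T| − 1)|T|^k.

```python
from itertools import product

# Toy tagset (an assumption); real tagsets are much larger.
TAGS = ["NN", "VB", "TO", "DT", "MD"]

# Illustrative templates (hypothetical, not Brill's exact set), each with the
# number of free tag variables (z, w) in its triggering environment.
TEMPLATES = {
    "previous tag is z": 1,
    "next tag is z": 1,
    "one of the two previous tags is z": 1,
    "previous tag is z and next tag is w": 2,
}

def instantiate(tags, templates):
    """Yield every rule 'change a to b when <template>' with a != b."""
    for name, n_vars in templates.items():
        for a, b in product(tags, repeat=2):
            if a == b:
                continue  # changing a tag to itself is never useful
            for fillers in product(tags, repeat=n_vars):
                yield a, b, name, fillers

def count_instances(n_tags, templates):
    """Closed form: |T| * (|T| - 1) * |T|**k, summed over templates."""
    return sum(n_tags * (n_tags - 1) * n_tags ** k for k in templates.values())

print(len(list(instantiate(TAGS, TEMPLATES))))  # 800
print(count_instances(45, TEMPLATES))            # 4276800
```

Even these four templates yield over four million candidate rules for a 45-tag set, which is why learning is slow.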

Scoring Function
For a given tagging state of the corpus, for a given transformation, and for every word position in the corpus:
• If the rule applies and changes an incorrect tag into the correct one, increment the score by 1
• If the rule applies and changes a correct tag into an incorrect one, decrement the score by 1
The score is thus the net reduction in tagging errors.
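The scoring function can be sketched as below (function and argument names are hypothetical). A rule applies only where the current tag equals its source tag and the trigger holds; replacing one wrong tag by another wrong tag leaves the error count unchanged, so it scores 0. The example shows NN → VB after TO helping on “to race” and “to win” but hurting on “to school”.

```python
from typing import Callable, List

# Triggering environment: predicate over (tags, words, position)
Trigger = Callable[[List[str], List[str], int], bool]

def score(source: str, target: str, trigger: Trigger,
          words: List[str], tags: List[str], gold: List[str]) -> int:
    """Net number of tagging errors removed by one application of the rule."""
    s = 0
    for i, current in enumerate(tags):
        if current != source or not trigger(tags, words, i):
            continue             # the rule does not apply here
        if target == gold[i]:
            s += 1               # wrong tag corrected
        elif current == gold[i]:
            s -= 1               # correct tag spoiled
        # otherwise: one wrong tag replaced by another, no change in errors
    return s

def prev_to(tags: List[str], words: List[str], i: int) -> bool:
    return i > 0 and tags[i - 1] == "TO"

words = ["to", "race", "to", "win", "to", "school"]
tags  = ["TO", "NN",   "TO", "NN",  "TO", "NN"]
gold  = ["TO", "VB",   "TO", "VB",  "TO", "NN"]
print(score("NN", "VB", prev_to, words, tags, gold))  # 1  (+1 +1 -1)
```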

The Basic Algorithm
• Label every word with its most likely tag
• Repeat the following until a stopping condition is reached:
  • Examine every possible transformation, selecting the one that results in the most improved tagging
  • Retag the data according to this rule
  • Append this rule to the output list
• Return the output list
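Putting the pieces together, the basic algorithm is a greedy loop. This is a minimal, self-contained sketch under several assumptions: a single hypothetical template (change a to b if the previous tag is z), simultaneous rule application, and a stopping condition of “no rule scores at least `threshold`”. Brill’s tagger uses a much richer template set.

```python
from collections import Counter, defaultdict
from itertools import product

def most_frequent_tags(tagged):
    """Initial-state dictionary: each word's most frequent tag."""
    counts = defaultdict(Counter)
    for word, tag in tagged:
        counts[word][tag] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}

def candidate_rules(tagset):
    """Hypothetical single template: change a to b if the previous tag is z."""
    return [(a, b, z) for a, b, z in product(tagset, repeat=3) if a != b]

def applies(rule, tags, i):
    a, _b, z = rule
    return tags[i] == a and i > 0 and tags[i - 1] == z

def apply_rule(rule, tags):
    # Triggers are evaluated on the old tag list (simultaneous application).
    return [rule[1] if applies(rule, tags, i) else t for i, t in enumerate(tags)]

def score(rule, tags, gold):
    """Errors fixed minus errors introduced by applying the rule once."""
    s = 0
    for i in range(len(tags)):
        if applies(rule, tags, i):
            if rule[1] == gold[i]:
                s += 1
            elif tags[i] == gold[i]:
                s -= 1
    return s

def learn(words, gold, lexicon, threshold=1, default="NN"):
    """Greedy TBL: repeatedly pick and apply the best-scoring rule."""
    tags = [lexicon.get(w, default) for w in words]  # label with most likely tag
    candidates = candidate_rules(sorted(set(gold) | set(tags)))
    learned = []
    while True:
        best = max(candidates, key=lambda r: score(r, tags, gold))
        if score(best, tags, gold) < threshold:
            break                      # stopping condition reached
        tags = apply_rule(best, tags)  # retag the data
        learned.append(best)           # append rule to the output list
    return learned, tags

train = [("to", "TO"), ("race", "VB"), ("the", "DT"), ("race", "NN"),
         ("the", "DT"), ("race", "NN"), ("to", "TO"), ("win", "VB")]
words = [w for w, _ in train]
gold = [t for _, t in train]
rules, final = learn(words, gold, most_frequent_tags(train))
print(rules)          # [('NN', 'VB', 'TO')]
print(final == gold)  # True
```

Because each accepted rule removes at least `threshold` errors, the loop terminates as long as `threshold` is at least 1.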

Examples of learned rules

TBL: Remarks
• Execution speed: the TBL tagger is slower than the HMM approach.
• Learning speed is slow: Brill’s implementation took over a day to train on 600k tokens.
BUT …
• It learns a small number of simple, non-stochastic rules
• It can be made to work faster with Finite State Transducers

Tagging Unknown Words
• New words are added to (newspaper) language at a rate of 20+ per month
• Plus many proper names …
• Unknown words increase error rates by 1-2%
• Methods
  • Assume the unknowns are nouns.
  • Assume the unknowns have a probability distribution similar to that of words occurring once in the training set.
  • Use morphological information, e.g. words ending with -ed tend to be tagged VBN.
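The three methods can be combined into a simple guesser. In this sketch the names are hypothetical: `hapax_tag_distribution` estimates the tag distribution of words seen exactly once in training, and `guess_tag` tries the -ed suffix rule, then the most likely hapax tag, then falls back to NN. The ordering is an assumption; a stochastic tagger would typically use the whole hapax distribution for an unknown word rather than only its most likely tag.

```python
from collections import Counter

def hapax_tag_distribution(tagged):
    """Tag distribution over words that occur exactly once in training."""
    word_freq = Counter(w for w, _ in tagged)
    hapax_tags = Counter(t for w, t in tagged if word_freq[w] == 1)
    total = sum(hapax_tags.values())
    return {t: c / total for t, c in hapax_tags.items()}

def guess_tag(word, hapax_dist=None):
    """Heuristic guesser for unknown words (a sketch, ordering assumed)."""
    if word.endswith("ed"):
        return "VBN"  # morphological cue from the slide
    if hapax_dist:
        return max(hapax_dist, key=hapax_dist.get)  # most likely hapax tag
    return "NN"       # assume unknowns are nouns

train = [("the", "DT"), ("the", "DT"), ("zebra", "NN"), ("quickly", "RB"), ("gnu", "NN")]
dist = hapax_tag_distribution(train)
print(guess_tag("blorfed", dist), guess_tag("blorf", dist), guess_tag("blorf"))
```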

Evaluation
• The result is compared with a manually coded “Gold Standard”.
• Typically, accuracy reaches 95-97%.
• This may be compared with the result for a baseline tagger (one that uses no context).
• Important: 100% accuracy is impossible even for human annotators.
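Accuracy against the gold standard is the proportion of tokens whose predicted tag matches the gold tag. A sketch, with toy tag sequences made up for illustration:

```python
from typing import List

def accuracy(predicted: List[str], gold: List[str]) -> float:
    """Fraction of tokens whose predicted tag equals the gold-standard tag."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must be aligned token by token")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)

gold     = ["TO", "VB", "DT", "JJ", "NN"]
baseline = ["TO", "NN", "DT", "NN", "NN"]  # e.g. most frequent tag, no context
tagger   = ["TO", "VB", "DT", "NN", "NN"]  # a tagger that uses context
print(accuracy(baseline, gold), accuracy(tagger, gold))  # 0.6 0.8
```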

A word of caution
• 95% accuracy: every 20th token wrong
• 96% accuracy: every 25th token wrong
• an improvement of 25% from 95% to 96% ???
• 97% accuracy: every 33rd token wrong
• 98% accuracy: every 50th token wrong
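The arithmetic behind these figures: at accuracy a, one token in 1/(1 − a) is wrong. Going from 95% to 96% stretches the gap between errors from 20 to 25 tokens (25% longer), while the error rate itself falls from 5% to 4%, a 20% relative reduction and a single percentage point in absolute terms. A short sketch computing both:

```python
def tokens_per_error(accuracy: float) -> float:
    """Average number of tokens per tagging error."""
    return 1 / (1 - accuracy)

def relative_error_reduction(acc_old: float, acc_new: float) -> float:
    """Fraction of the old errors removed by the new tagger."""
    err_old, err_new = 1 - acc_old, 1 - acc_new
    return (err_old - err_new) / err_old

for acc in (0.95, 0.96, 0.97, 0.98):
    print(f"{acc:.0%}: one error every {tokens_per_error(acc):.1f} tokens")
print(f"95% -> 96%: {relative_error_reduction(0.95, 0.96):.0%} fewer errors")
```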

How much training data is needed?
• When working with the STTS (50 tags), we observed
  • a strong increase in accuracy when training on 10,000, 20,000, …, 50,000 tokens,
  • a slight increase in accuracy when training on up to 100,000 tokens,
  • hardly any increase thereafter.

Summary
• Tagging decisions are conditioned on a wider range of events than in the HMM models mentioned earlier. For example, left and right context can be used simultaneously.
• Learning and tagging are simple, intuitive and understandable.
• Transformation-based learning has also been applied to sentence parsing.

The Three Approaches Compared
• Rule-Based
  • Hand-crafted rules
  • It takes too long to come up with good rules
  • Portability problems
• Stochastic
  • Find the sequence with the highest probability (Viterbi algorithm)
  • Result of training not accessible to humans
  • Large volume of intermediate results
• Transformation-Based
  • Rules are learned
  • Small number of rules
  • Rules can be inspected and modified by humans
