
PoS-Tagging theory and terminology



  1. School of Computing FACULTY OF ENGINEERING PoS-Tagging theory and terminology COMP3310 Natural Language Processing Eric Atwell, Language Research Group (with thanks to Katja Markert, Marti Hearst, and other contributors)

  2. Reminder: PoS-tagging programs • Models behind some example PoS-tagging methods in NLTK: • Hand-coded • Statistical taggers • Brill (transformation-based) tagger • NB you don’t have to use NLTK – it is just used here for illustration
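A minimal sketch of the first two tagger families in NLTK (a hand-coded regular-expression tagger and a statistical unigram tagger). It assumes NLTK is installed with the Brown corpus data (nltk.download('brown')); the corpus and category choice are illustrative only. The Brill tagger is sketched further below in these notes.

import nltk
from nltk.corpus import brown

train_set = brown.tagged_sents(categories='news')

# Hand-coded: a few regular-expression rules, tried in order, last one as default
regexp_tagger = nltk.RegexpTagger([
    (r'.*ing$', 'VBG'),   # progressive verbs
    (r'.*ed$', 'VBD'),    # past-tense verbs
    (r'.*s$', 'NNS'),     # plural nouns
    (r'.*', 'NN'),        # everything else: common noun
])

# Statistical: a unigram tagger learns each word's most frequent tag from the corpus
unigram_tagger = nltk.UnigramTagger(train_set)

sentence = 'The bear is on the move'.split()
print(regexp_tagger.tag(sentence))
print(unigram_tagger.tag(sentence))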

  3. Training and Testing of Machine Learning Algorithms • Algorithms that “learn” from data see a set of examples and try to generalize from them. • Training set: • The examples the algorithm is trained on • Test set: • Also called held-out data or unseen data • Use this for evaluating your algorithm • Must be kept separate from the training set • Otherwise, you cheated! • “Gold standard” evaluation set • A test set that a community has agreed on and uses as a common benchmark. DO NOT USE IN TRAINING OR TESTING
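A sketch of the train/test split and evaluation, again assuming NLTK's Brown corpus; the 90/10 split and the 'news' category are arbitrary choices made for illustration.

import nltk
from nltk.corpus import brown

tagged = brown.tagged_sents(categories='news')
split = int(len(tagged) * 0.9)
train_set, test_set = tagged[:split], tagged[split:]   # test set is held out, never trained on

tagger = nltk.UnigramTagger(train_set)
print(tagger.evaluate(test_set))   # accuracy on unseen data; named accuracy() in newer NLTK versions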

  4. PoS word classes in English • Word classes, also called syntactic categories or grammatical categories or Parts of Speech • Closed class type: classes with fixed and few members, function words e.g. prepositions • Open class type: large classes with many members and many new additions, content words e.g. nouns • 8 major word classes: nouns, verbs, adjectives, adverbs, prepositions, determiners, conjunctions, pronouns • These apply in English, and also in most (perhaps all) natural languages

  5. What properties define “noun”? • Semantic properties: refer to people, places and things • Distributional properties: ability to occur next to determiners, possessives, adjectives (specific locations) • Morphological properties: most occur in singular and plural • These are properties of a word TYPE, • eg “man” is a noun (usually) • Sometimes a given TOKEN may not meet all these criteria … • The men are happy … the man is happy … • They man the lifeboat (?)

  6. Subcategories • Noun • Proper noun v common noun • Singular v plural • Count v mass (often not covered in PoS-tagsets) • Some tag-sets have other subcategories, e.g. NNP = common noun with word-initial capital (e.g. Englishman) • A PoS-tagset often encodes morphological categories like person, number, gender, tense, case . . .

  7. Verb: action or process • VB present/infinitive teach, eat • VBZ 3rd-person-singular present (s-form) teaches, eats • VBG progressive (ing-form) teaching, eating • VBD past / VBN past participle taught, ate/eaten • Intransitive he died, transitive she killed him, … • (transitivity usually not marked in PoS-tags) • Auxiliaries: modal verbs e.g. can, must, may • Have, be, do can be auxiliary or main verbs • e.g. I have a present v I have given you a present

  8. Adjective: quality or property (of a thing: noun phrase) • English is simple: • JJ big, JJR comparative bigger, JJT superlative biggest • More features in other languages, eg • Agreement (number, gender) with noun • Before a noun v after “be”

  9. Adverb: quality or property of verb or adjective (or other functions…) • A hodge-podge (!) • General adverb often ends –ly slowly, happily (but NOT early) • Place adverb home, downhill • Time adverb now, tomorrow • Degree adverbs very, extremely, somewhat

  10. Function words • Preposition e.g. in of on for over with (to) • Determiner e.g. this that, article the a • Conjunction e.g. and or but because that • Pronoun e.g. personal pronouns • I we (1st person), • you (2nd person), • he she it they (3rd person) • Possessive pronouns my, your, our, their • WH-pronouns what who whoever • Others: negatives (not), interjections (oh), existential there, …

  11. Parts of “multi word expressions” • Particle – like preposition but “part of” a phrasal verb • I looked up her address v I looked up her skirt • I looked her address up v *I looked her skirt up • Big problem for PoS-tagging: common, and ambiguous • Other multi-word idioms: ditto tags

  12. Bigram Markov Model tagger • Naive Method • 1. Get all possible tag sequences of the sentence • 2. Compute the probability of each tag sequence given the sentence, using word-tag and tag-bigram probabilities • 3. Take the tag sequence with the maximum probability • Problem: This method has exponential complexity! • Solution: Viterbi Algorithm (not discussed in this module)
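A toy sketch of the naive method on the example sentence used below. The tagset follows the slides, but the lexicon entries and all probability values are invented purely for illustration.

from itertools import product

# p(word | tag): word-tag (emission) probabilities -- all numbers are made up
p_word = {('the', 'AT'): 0.6, ('bear', 'NN'): 0.002, ('bear', 'VB'): 0.001,
          ('is', 'BEZ'): 1.0, ('on', 'IN'): 0.1,
          ('move', 'NN'): 0.001, ('move', 'VB'): 0.003}
# p(tag_i | tag_{i-1}): tag-bigram (transition) probabilities; PERIOD marks the sentence start
p_tag = {('PERIOD', 'AT'): 0.3, ('AT', 'NN'): 0.5, ('AT', 'VB'): 0.01,
         ('NN', 'BEZ'): 0.1, ('VB', 'BEZ'): 0.05, ('BEZ', 'IN'): 0.2, ('IN', 'AT'): 0.4}
possible_tags = {'the': ['AT'], 'bear': ['NN', 'VB'], 'is': ['BEZ'],
                 'on': ['IN'], 'move': ['NN', 'VB']}

def sequence_prob(words, tags):
    prob, prev = 1.0, 'PERIOD'
    for w, t in zip(words, tags):
        prob *= p_tag.get((prev, t), 0.0) * p_word.get((w, t), 0.0)
        prev = t
    return prob

words = ['the', 'bear', 'is', 'on', 'the', 'move']
# Step 1: enumerate every tag sequence (exponential in sentence length, hence Viterbi);
# Steps 2-3: score each sequence and keep the most probable one.
best = max(product(*(possible_tags[w] for w in words)),
           key=lambda tags: sequence_prob(words, tags))
print(best)   # ('AT', 'NN', 'BEZ', 'IN', 'AT', 'NN')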

  13. N-gram tagger • Uses the preceding N-1 predicted tags • Also uses the unigram estimate for the current word
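A short sketch with NLTK's BigramTagger (Brown corpus assumed): it conditions on the previous predicted tag plus the current word. Trained without backoff, it returns None whenever a (previous tag, word) context was never seen in training, which is what the backoff combination on slide 23 addresses.

import nltk
from nltk.corpus import brown

train_set = brown.tagged_sents(categories='news')
bigram_tagger = nltk.BigramTagger(train_set)    # no backoff: unseen contexts come back as None
print(bigram_tagger.tag('The bear is on the move'.split()))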

  14. Example • p(AT NN BEZ IN AT NN | The bear is on the move) = p(the|AT) p(AT|PERIOD) × p(bear|NN) p(NN|AT) × . . . × p(move|NN) p(NN|AT) • p(AT NN BEZ IN AT VB | The bear is on the move) = p(the|AT) p(AT|PERIOD) × p(bear|NN) p(NN|AT) × . . . × p(move|VB) p(VB|AT)

  15. Bigram tagger: problems • Unknown words in new input • Parameter estimation: need a tagged training text; what if this is a different genre/dialect/language-type from the new input? • Tokenization of training text and new input: contractions (isn’t), multi-word tokens (New York) • Crude assumptions: • only very short-distance dependencies • tags are not conditioned on previous words • Unintuitive
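One common workaround for the unknown-word problem, sketched here as one option rather than the module's prescribed method: guess the tag from the word's suffix with NLTK's AffixTagger, falling back to a default tag when even the suffix is unknown (Brown corpus assumed).

import nltk
from nltk.corpus import brown

train_set = brown.tagged_sents(categories='news')
guesser = nltk.AffixTagger(train_set, affix_length=-3,         # learn tags for 3-letter suffixes
                           backoff=nltk.DefaultTagger('NN'))   # last resort: common noun
print(guesser.tag(['blurfing']))   # made-up word, tagged from its -ing suffix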

  16. Transformation-based tagging • Markov model tagging: small range of regularities only • TB tagging first used by Brill, 1995 • Encodes more complex interdependencies between words • and tags • by learning intuitive rules from a training corpus • exploits linguistic knowledge; rules can be tuned manually

  17. Transformation Templates • Templates specify general, admissible transformations: • Change Tag1 to Tag2 if • The preceding (following) word is tagged Tag3 • The word two before (after) is tagged Tag3 • One of the two preceding (following) words is tagged Tag3 • One of the three preceding (following) words is tagged Tag3 • The preceding word is tagged Tag3 • and the following word is tagged Tag4 • The preceding (following) word is tagged Tag3 • and the word two before (after) is tagged Tag4
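For reference, a sketch of how a few of these templates can be written with NLTK's Brill-tagger feature classes (Template and Pos, where positions are offsets from the word being retagged); the comments map each line back to the slide's wording.

from nltk.tag.brill import Template, Pos

templates = [
    Template(Pos([-1])),            # change Tag1 to Tag2 if the preceding word is tagged Tag3
    Template(Pos([1])),             # ... if the following word is tagged Tag3
    Template(Pos([-2])),            # ... if the word two before is tagged Tag3
    Template(Pos([-2, -1])),        # ... if one of the two preceding words is tagged Tag3
    Template(Pos([-3, -2, -1])),    # ... if one of the three preceding words is tagged Tag3
    Template(Pos([-1]), Pos([1])),  # ... if the preceding word is tagged Tag3 and the following word Tag4
]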

  18. Machine Learning Algorithm • Learns rules from a tagged training corpus by specialising (instantiating) the templates • 1. Pretend you do not know the correct tag sequence in your training corpus • 2. Tag each word in the training corpus with its most frequent tag, e.g. move => VB • 3. Consider all possible transformations and apply the one that improves the tagging most (greedy search), e.g. Change VB to NN if the preceding word is tagged AT • 4. Retag the whole corpus applying that rule • 5. Go back to 3 and repeat until no significant improvement is reached • 6. Output all the rules you learnt, in order!
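A sketch of steps 2-6 with NLTK's Brill trainer, assuming the Brown corpus as training data; the unigram tagger with an NN default plays the role of "tag each word with its most frequent tag", and brill24() supplies a standard set of transformation templates.

import nltk
from nltk.corpus import brown
from nltk.tag.brill import brill24
from nltk.tag.brill_trainer import BrillTaggerTrainer

train_set = brown.tagged_sents(categories='news')
initial = nltk.UnigramTagger(train_set, backoff=nltk.DefaultTagger('NN'))   # step 2
trainer = BrillTaggerTrainer(initial, brill24(), trace=0)
brill_tagger = trainer.train(train_set, max_rules=10)                       # steps 3-5: greedy search
for rule in brill_tagger.rules():                                           # step 6: rules in learnt order
    print(rule)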

  19. Example: 1st cycle • First approximation: Initialise with most frequent tag (lexical information) • The/AT • bear/VB • is/BEZ • on/IN • the/AT • move/VB • to/TO • race/NN • there/RN

  20. Change VB to NN if previous tag is AT • Try all possible transformations, choose the most useful one and apply it: • The/AT • bear/NN • is/BEZ • on/IN • the/AT • move/NN • to/TO • race/NN • there/RN

  21. Change NN to VB if previous tag is TO • Try all possible transformations, choose the most useful one and apply it: • The/AT • bear/NN • is/BEZ • on/IN • the/AT • move/NN • to/TO • race/VB • there/RN

  22. Final set of learnt rules • Brill rules corresponding to syntagmatic patterns • 1. Change VB to NN if previous tag is AT • 2. Change NN to VB if previous tag is TO • Can now be applied to an untagged corpus! • uses pre-encoded linguistic knowledge explicitly • uses wider context + following context • can be expanded to word-driven templates • can be expanded to morphology-driven templates (for unknown words) • learnt rules are intuitive, easy to understand

  23. Combining taggers • Can be combined via backoff: if first tagger finds no tag (None) then try another tagger • This really only makes sense with N-gram taggers: • If trigram tagger finds no tag, backoff to bigram tagger, • if bigram tagger fails then backoff to unigram tagger • Better: combine tagger results by a voting system • Combinatory Hybrid Elementary Analysis of Text • (combines results of morphological analysers / taggers)
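A sketch of the trigram → bigram → unigram → default backoff chain in NLTK (Brown corpus and a 90/10 train/test split assumed, as in the earlier sketch); the voting-based combination mentioned above is not shown here.

import nltk
from nltk.corpus import brown

tagged = brown.tagged_sents(categories='news')
split = int(len(tagged) * 0.9)
train_set, test_set = tagged[:split], tagged[split:]

t0 = nltk.DefaultTagger('NN')                      # last resort
t1 = nltk.UnigramTagger(train_set, backoff=t0)
t2 = nltk.BigramTagger(train_set, backoff=t1)
t3 = nltk.TrigramTagger(train_set, backoff=t2)     # tried first; backs off whenever it finds no tag
print(t3.evaluate(test_set))                       # accuracy() in newer NLTK versions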
