
POS Tagging and Context-Free Grammars

This text provides an overview of part-of-speech tagging and context-free grammars, covering topics such as word classes, syntactic constituents, and the identification of a word's part of speech. It also explores different approaches to part-of-speech tagging, including hand-written rules, statistical methods, and hybrid systems such as transformation-based learning.

wburton




Presentation Transcript


  1. POS Tagging and Context-Free Grammars CS 4705

  2. From Words to Syntactic Structures • Words → morphological structures: cows, cowed, reformulation • Ngrams → statistical co-occurrences: The man/unicorn said, The man the • POS → word classes: DET N V, DET N DET • Syntactic Constituents → word relationships: S → NP VP, S → S conj S

  3. POS • Review: Words of the same class behave similarly • Are these words nouns or adjectives? • a blue seat a child seat • a very blue seat *a very child seat • this seat is blue *this seat is child • What are the word classes? • How do we identify a word’s POS?

  4. How many word classes are there? • A basic set: • Open class/content words: Noun, Verb, Adjective, Adverb • Closed class/function words: Preposition, Pronoun, Determiner, Auxiliary and copular verbs, Particle, Conjunction

  5. Nouns • Words that describe people, places, things, events, abstractions, activities, … Hector, mirror, New York, cat, love, government • Can take the possessive: Hector’s, the cat’s • Can (usually) occur as plurals: governments, many New Yorks • Can occur with determiners: the cat, Homer’s Hector • Subclasses: • Proper Nouns: Hector, New York • Common Nouns: cat, dog, football • Mass vs count nouns: enumerable or not (cat, sand)

  6. Verbs • Refer to actions, events, conditions, processes Go, kicked, think, manage, trying • Tense: when did the action,… take place? • Present: I kick (simple), I am kicking (progressive), I have kicked (perfect), I have been kicking (perfect progressive) • Past: I kicked, I was kicking, I had kicked, I had been kicking • Future: I will kick, I will be kicking, I will have kicked, I will have been kicking • Aspect: the nature of the action,… -- simple/indefinite, complete, continuing

  7. Adjectives • Describe properties or qualities Pretty, red, careful, cat-like, wishful, silly

  8. Adverbs • Modify verbs or adverbs or … • Directional or locative: here, upward • Degree modifiers: very, too • Manner: slowly • Temporals: today, now • Are they adverbs or nouns?

  9. Prepositions and Particles • Prepositions: indicate spatial or temporal relations To Boston, From Boston In, for, with, toward, into, by • Particles: act like prepositions or adverbs but behave like semantic units with their verbs • Test: can you move the prep/part and what follows to the front of the sentence? Prep: We ran up the hill. Up the hill we ran. Part: We ran up the bill. *Up the bill we ran.

  10. Some particles with their verbs: Run into (*Into Bill we ran) Find out (*Out the truth we found) Turn on (*On the light we turned) Throw up (*Up his dinner he threw)

  11. Determiners • Articles: the cat, a cat, an idiot • Possessive nouns/pronouns: her cat, Sally’s cat • Numbers: five cats • Indefinite pronouns: each cat, some cats • Demonstrative pronouns: that cat, those cats

  12. Conjunctions • Coordinate: and, but • Subordinate/complementizers: …that the war is over, …because I love you, …unless you change your ways

  13. Pronouns • Personal: I, he,... • Possessive: my, his,… • Indefinite: someone, everyone, anybody, nothing • Interrogative or wh: who, whom,... • And many more…

  14. Auxiliary Verbs • Indicate features of a main verb, such as tense and aspect Be (copula), have, do, can/will/shall/may (modal) He is silent, She has done that, We can help

  15. And more… • Interjections/discourse markers • Existential there : There is a unicorn in the garden • Greetings, politeness terms

  16. Part-of-Speech Tagging • It’s useful to know the POS of words in a sentence • Time/N flies/V like/Prep an/DET arrow/N • Fruit/N flies/N like/V a/DET banana/N

  17. POS can disambiguate • Some words have only one POS tag: is, Mary, very, smallest • Others have a single most likely tag: a, dog • Many are more ambiguous: likes, bass • But luckily… tags tend to co-occur regularly with other tags (e.g. DET N is more likely than N DET) • We can learn POS ngram probabilities P(tn|tn-1) from a tagged corpus just as we learn word ngram probabilities
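The tag-bigram idea on slide 17 can be sketched with simple counting. This is a minimal illustration, not the course's implementation: the toy corpus is just the two tagged sentences from slide 16, the `<s>` start marker and the function names are assumptions, and the estimates are unsmoothed relative frequencies.

```python
from collections import Counter

# Toy tagged corpus: the two example sentences from slide 16.
TAGGED = [
    "Time/N flies/V like/Prep an/DET arrow/N",
    "Fruit/N flies/N like/V a/DET banana/N",
]

def read_tags(line):
    """Return the tag sequence of a line written as word/TAG tokens."""
    return [token.rsplit("/", 1)[1] for token in line.split()]

def bigram_tag_probs(corpus):
    """Estimate P(tag | previous tag) by relative frequency (no smoothing)."""
    bigrams = Counter()
    contexts = Counter()
    for line in corpus:
        tags = ["<s>"] + read_tags(line)  # "<s>" is an assumed start-of-sentence marker
        for prev, cur in zip(tags, tags[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    return {(prev, cur): n / contexts[prev] for (prev, cur), n in bigrams.items()}

probs = bigram_tag_probs(TAGGED)
print(probs[("DET", "N")])          # DET is always followed by N in this corpus
print(probs.get(("N", "DET"), 0))   # N DET never occurs here
```

On a corpus this small the numbers mean little; the point is only that DET N receives probability mass while N DET receives none, which is the regularity the slide appeals to.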

  18. Approaches to POS Tagging • Hand-written rules • Statistical approaches (e.g. HMM-based taggers) • Hybrid systems (e.g. Brill’s TBL: transformation-based learning)

  19. Brill Tagging: TBL • Start with simple rules… learn better ones from tagged corpus • Init: Start with a (hand) tagged corpus and remove the tags from a copy • Tag each word in the copy with most likely POS (obtained from the original or another tagged corpus) • Select a transformation that most improves tagging accuracy (compared to original) • Re-tag the whole corpus applying just this transformation and put it on the list of transformations

  20. Brill Tagging: TBL (continued) • Compare the new tags of the copy to the original • Again, select the transformation that most improves the accuracy of the (better) tags on the copy compared to the original • Iterate until performance doesn’t improve (no transformation improves tagging accuracy) • Result: tagging procedure (set of transformations) which can be applied to new, untagged text

  21. Transformations Change tag a to tag b when….

  22. An Example Time flies like an arrow. 1) Tag every word with most likely tag and score Time/N flies/V like/V an/DET arrow/N 2) For each template, try every instantiation and apply to tagged corpus and score e.g. Change V to N when the preceding word is tagged V Time/N flies/V like/N an/DET arrow/N e.g. Change V to Prep when the preceding word is tagged V Time/N flies/V like/Prep an/DET arrow/N

  23. An Example (continued) 3) Select the transformation rule that most improves the overall accuracy of POS assignments on the training corpus 4) Add the new rule to the tagging procedure list 5) Iterate from (2) until no transformation improves score • Result: ordered list of transformation rules which can be applied sequentially to new, untagged data (after initializing with most common tag)
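The training loop described on slides 19 to 23 can be sketched as follows. This is a simplified illustration under several assumptions, not Brill's actual tagger: it uses a single template ("change tag a to tag b when the preceding word is tagged z"), reads the rule condition from the tags as they stood before the rule was applied, tags unknown words as N, and runs on a three-sentence toy corpus invented for the example. All function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import product

def most_likely_tags(gold):
    """Map each word to its most frequent tag in the hand-tagged corpus."""
    counts = defaultdict(Counter)
    for sentence in gold:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}

def apply_rule(rule, tags):
    """Change tag `old` to `new` wherever the preceding tag is `prev`."""
    old, new, prev = rule
    out = list(tags)
    for i in range(1, len(tags)):
        if tags[i] == old and tags[i - 1] == prev:  # condition read from the input tags
            out[i] = new
    return out

def count_errors(tagged, truth):
    """Number of positions where the current tags differ from the hand tags."""
    return sum(a != b for s, t in zip(tagged, truth) for a, b in zip(s, t))

def learn(gold, tagset):
    """Greedily collect transformations until none reduces the error count."""
    lexicon = most_likely_tags(gold)
    truth = [[tag for _, tag in s] for s in gold]
    current = [[lexicon[word] for word, _ in s] for s in gold]  # untagged copy, re-tagged
    rules = []
    while True:
        best_rule, best_errors = None, count_errors(current, truth)
        for old, new, prev in product(tagset, repeat=3):  # every template instantiation
            if old == new:
                continue
            candidate = [apply_rule((old, new, prev), s) for s in current]
            errors = count_errors(candidate, truth)
            if errors < best_errors:
                best_rule, best_errors = (old, new, prev), errors
        if best_rule is None:
            return lexicon, rules
        current = [apply_rule(best_rule, s) for s in current]
        rules.append(best_rule)

def tag(words, lexicon, rules, default="N"):
    """Initialize with most likely tags, then apply the learned rules in order."""
    tags = [lexicon.get(word, default) for word in words]  # default for unknown words is an assumption
    for rule in rules:
        tags = apply_rule(rule, tags)
    return tags

# Toy hand-tagged corpus (invented for illustration).
GOLD = [
    [("time", "N"), ("flies", "V"), ("like", "Prep"), ("an", "DET"), ("arrow", "N")],
    [("cats", "N"), ("like", "V"), ("fish", "N")],
    [("dogs", "N"), ("like", "V"), ("bones", "N")],
]
TAGSET = ["N", "V", "Prep", "DET"]

lexicon, rules = learn(GOLD, TAGSET)
print(rules)
print(tag("time flies like an arrow".split(), lexicon, rules))
```

In this toy corpus "like" is most often a verb, so the initial tagging gets the first sentence wrong, and the one rule that repairs it is the slide's own example: change V to Prep when the preceding word is tagged V. The loop stops as soon as no instantiation lowers the error count, so it always terminates.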

  24. Basic Constituents and Rewrite Rules • S → NP VP • NP → DET NOM • NP → PropN • NOM → N | NOM • DET → a | an | the • PropN → George | Morocco • N → cat | box • VP → V NP • VP → V • V → exploded

  25. More Constituents and Rules • VP → V PP • PP → Prep NP • Prep → at | over | under | in | by
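To see what the rewrite rules on slides 24 and 25 accept, here is a small recognizer sketch. It is a naive top-down search with memoization, written for illustration only and not a parsing algorithm from the course. One assumption: the self-referential NOM alternative is left out, since as written it derives no additional strings and would make a naive top-down search loop. The function names are hypothetical.

```python
from functools import lru_cache

# Rules from slides 24-25; each right-hand side is a tuple of symbols.
GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("DET", "NOM"), ("PropN",)],
    "NOM": [("N",)],  # assumption: the NOM -> NOM alternative is omitted
    "DET": [("a",), ("an",), ("the",)],
    "PropN": [("George",), ("Morocco",)],
    "N": [("cat",), ("box",)],
    "VP": [("V", "NP"), ("V",), ("V", "PP")],
    "PP": [("Prep", "NP")],
    "Prep": [("at",), ("over",), ("under",), ("in",), ("by",)],
    "V": [("exploded",)],
}

@lru_cache(maxsize=None)
def derives(symbol, tokens):
    """True if `symbol` can be rewritten into exactly the tuple `tokens`."""
    if symbol not in GRAMMAR:  # terminal: must match a single word
        return tokens == (symbol,)
    return any(matches(rhs, tokens) for rhs in GRAMMAR[symbol])

def matches(rhs, tokens):
    """True if the symbol sequence `rhs` can cover `tokens` from left to right."""
    if not rhs:
        return not tokens
    head, rest = rhs[0], rhs[1:]
    # The head takes at least one word and leaves at least one per remaining symbol.
    for i in range(1, len(tokens) - len(rest) + 1):
        if derives(head, tokens[:i]) and matches(rest, tokens[i:]):
            return True
    return False

def recognize(sentence):
    return derives("S", tuple(sentence.split()))

print(recognize("the cat exploded"))
print(recognize("George exploded in the box"))
print(recognize("cat the exploded"))
```

The second sentence is only accepted because of the PP rules added on slide 25. Trying every split point is exponential in general and is used here only because the grammar and sentences are tiny.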

  26. How to write a grammar • Scenario: You are a lowly programmer in IT at a major financial institution in NYC. Your boss tells you the department needs to port data from an old database in which the person name field was not divided into multiple fields (title, firstname, middle name, surname, suffix) to a new modern database • Your task: Separate these names into their proper fields for the new database • What do you do?

  27. Solutions • Go through the old database names one at a time and type them into the new db • Create a script with regular expressions to search for names with different components and write each out into a standard set of fields • Build an FST to process the names and output field-separated components • Write a Context Free Grammar to parse the names into their constituents

  28. A Name Grammar • Name → Title Firstname Middlename Surname Honorific • Name → Title Firstname Middlename Surname • Name → Firstname Middlename Surname Honorific • Name → Firstname Middlename Surname • Name → Title Firstname MiddleInitial Surname Honorific • …….
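The trouble with this flat grammar is that every combination of optional parts needs its own rule. A rough sketch of that growth, assuming the only choices are an optional Title, an optional Honorific, and Middlename versus MiddleInitial (the slide's elided rules may cover more cases):

```python
from itertools import product

def flat_name_rules():
    """Enumerate one flat Name rule per combination of optional parts."""
    rules = []
    for has_title, middle, has_honorific in product(
        [True, False], ["Middlename", "MiddleInitial"], [True, False]
    ):
        rhs = (
            (["Title"] if has_title else [])
            + ["Firstname", middle, "Surname"]
            + (["Honorific"] if has_honorific else [])
        )
        rules.append("Name → " + " ".join(rhs))
    return rules

for rule in flat_name_rules():
    print(rule)
```

Under these assumptions there are already 2 × 2 × 2 = 8 rules, and each further optional field doubles the count.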

  29. A Better Name Grammar • Name → Title BaseName Suffix • Name → Title BaseName • Name → BaseName Suffix • BaseName → Firstname Middle Surname • Middle → Middlename • Middle → MiddleInitial • Title → Mr. | Mrs. | Ms. | Miss | Dr. | Gen. | … • Suffix → Jr. | Sr. | Esq. | DDS | … • …….
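As a sketch of how the factored grammar maps onto the database fields from slide 26, the hand-written parser below follows only the rules listed on this slide. The assumptions are significant: the Title and Suffix sets contain just the entries shown (the "…" is not filled in), any other token is taken to be a name word, a MiddleInitial is a capital letter followed by a period, and names matching none of the three listed Name rules return `None`. The function and field names are hypothetical.

```python
import re

# Only the entries listed on the slide; the elided ones are not guessed.
TITLES = {"Mr.", "Mrs.", "Ms.", "Miss", "Dr.", "Gen."}
SUFFIXES = {"Jr.", "Sr.", "Esq.", "DDS"}

def parse_name(text):
    """Split a name into fields following the listed Name and BaseName rules."""
    tokens = text.split()
    title = tokens.pop(0) if tokens and tokens[0] in TITLES else None
    suffix = tokens.pop() if tokens and tokens[-1] in SUFFIXES else None
    # Each listed Name rule has a Title, a Suffix, or both.
    if title is None and suffix is None:
        return None
    # BaseName -> Firstname Middle Surname: exactly three tokens remain.
    if len(tokens) != 3:
        return None
    firstname, middle, surname = tokens
    # Middle -> Middlename | MiddleInitial (assumed shape: capital letter plus period)
    middle_kind = "MiddleInitial" if re.fullmatch(r"[A-Z]\.", middle) else "Middlename"
    return {
        "title": title,
        "firstname": firstname,
        "middle": middle,
        "middle_kind": middle_kind,
        "surname": surname,
        "suffix": suffix,
    }

print(parse_name("Dr. Martin Luther King Jr."))
print(parse_name("Mr. John Q. Public"))
print(parse_name("John Smith"))
```

Real name data would need the rules behind the slide's "……." (names without a middle part, multi-word surnames, and so on); this sketch deliberately rejects those rather than guessing.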

  30. Next Class • How do we use CFGs for parsing • Read Chapter 11
