
Introduction to Syntax, with Part-of-Speech Tagging


Presentation Transcript


  1. Introduction to Syntax, with Part-of-Speech Tagging Owen Rambow September 17 & 19

  2. Admin Stuff • These slides are available at http://www.cs.columbia.edu/~rambow/teaching.html • For Eliza in the homework, you can use a tagger or chunker if you want; details at http://www.cs.columbia.edu/~ani/cs4705.html • Special office hours (Ani): today after class, tomorrow at 10am in CEPSR 721

  3. Statistical POS Tagging • Want to choose the most likely string of tags T, given the string of words W • W = w_1, w_2, …, w_n • T = t_1, t_2, …, t_n • I.e., want argmax_T p(T | W) • Problem: sparse data

  4. Statistical POS Tagging (ctd) • p(T|W) = p(T,W) / p(W) = p(W|T) p(T) / p(W) • argmax_T p(T|W) = argmax_T p(W|T) p(T) / p(W) = argmax_T p(W|T) p(T), since p(W) is constant in T and can be dropped

  5. Statistical POS Tagging (ctd) • p(T) = p(t_1, t_2, …, t_n) = p(t_n | t_1, …, t_{n-1}) p(t_1, …, t_{n-1}) = p(t_n | t_1, …, t_{n-1}) p(t_{n-1} | t_1, …, t_{n-2}) p(t_1, …, t_{n-2}) = ∏_i p(t_i | t_1, …, t_{i-1}) ≈ ∏_i p(t_i | t_{i-2}, t_{i-1}) (the trigram, in general n-gram, approximation)
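(Below is a minimal Python sketch of the trigram approximation, assuming a hypothetical probability table p_trans keyed by tag trigrams and '<s>' padding for the first two positions; both conventions are this sketch's, not the slides'.)

    import math

    def log_p_tags(tags, p_trans):
        """p(T) under the trigram approximation: prod_i p(t_i | t_{i-2}, t_{i-1}).
        p_trans maps (t_{i-2}, t_{i-1}, t_i) to a probability; '<s>' pads
        the left context for the first two positions."""
        padded = ['<s>', '<s>'] + list(tags)
        return sum(math.log(p_trans[(t2, t1, t)])
                   for t2, t1, t in zip(padded, padded[1:], padded[2:]))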

  6. Statistical POS Tagging (ctd) • p(W|T) = p(w_1, w_2, …, w_n | t_1, t_2, …, t_n) = ∏_i p(w_i | w_1, …, w_{i-1}, t_1, …, t_n) ≈ ∏_i p(w_i | t_i)
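(A parallel sketch for this emission term, assuming a table p_emit keyed by (word, tag) pairs, again a convention of the sketch:)

    import math

    def log_p_words_given_tags(words, tags, p_emit):
        """p(W|T) under the independence approximation: prod_i p(w_i | t_i)."""
        return sum(math.log(p_emit[(w, t)]) for w, t in zip(words, tags))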

  7. Statistical POS Tagging (ctd) • argmax_T p(T|W) = argmax_T p(W|T) p(T) ≈ argmax_T ∏_i p(w_i | t_i) p(t_i | t_{i-2}, t_{i-1}) • Relatively easy to get data for parameter estimation (next slide) • But: need smoothing for unseen words • Easy to determine the argmax (Viterbi algorithm, linear time in sentence length; see the sketch below)
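(One way to implement this search in Python, a minimal sketch using the same assumed p_emit and p_trans dictionaries as above; zero-probability entries are simply skipped here, where a real tagger would rely on smoothing.)

    import math

    def viterbi_trigram(words, tagset, p_emit, p_trans):
        """Tag `words` with argmax_T prod_i p(w_i|t_i) p(t_i|t_{i-2},t_{i-1}).
        States are tag pairs, so time is linear in len(words);
        '<s>' pads the left context."""
        # best[(t1, t2)] = (log-prob of best sequence ending in t1, t2,
        #                   previous state for backtracking)
        best = {('<s>', '<s>'): (0.0, None)}
        history = []
        for w in words:
            nxt = {}
            for (t1, t2), (score, _) in best.items():
                for t in tagset:
                    e = p_emit.get((w, t), 0.0)
                    q = p_trans.get((t1, t2, t), 0.0)
                    if e == 0.0 or q == 0.0:
                        continue  # a real tagger smooths instead of skipping
                    s = score + math.log(e) + math.log(q)
                    if (t2, t) not in nxt or s > nxt[(t2, t)][0]:
                        nxt[(t2, t)] = (s, (t1, t2))
            history.append(nxt)
            best = nxt
        # backtrack from the best final state
        state = max(best, key=lambda st: best[st][0])
        tags = []
        for table in reversed(history):
            tags.append(state[1])
            state = table[state][1]
        return list(reversed(tags))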

  8. Probability Estimation for Trigram POS Tagging • Maximum-likelihood estimation: • p'(w_i | t_i) = c(w_i, t_i) / c(t_i) • p'(t_i | t_{i-2}, t_{i-1}) = c(t_{i-2}, t_{i-1}, t_i) / c(t_{i-2}, t_{i-1})
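(A minimal counting sketch of these estimates, assuming the corpus is a list of sentences, each a list of (word, tag) pairs; it produces the p_emit and p_trans tables assumed in the earlier sketches.)

    from collections import Counter

    def estimate(tagged_sents):
        """Maximum-likelihood estimates from a POS-tagged corpus."""
        emit, tag_count = Counter(), Counter()
        tri, bi = Counter(), Counter()
        for sent in tagged_sents:
            tags = ['<s>', '<s>'] + [t for _, t in sent]
            for w, t in sent:
                emit[(w, t)] += 1   # c(w_i, t_i)
                tag_count[t] += 1   # c(t_i)
            for t2, t1, t in zip(tags, tags[1:], tags[2:]):
                tri[(t2, t1, t)] += 1   # c(t_{i-2}, t_{i-1}, t_i)
                bi[(t2, t1)] += 1       # c(t_{i-2}, t_{i-1})
        p_emit = {(w, t): c / tag_count[t] for (w, t), c in emit.items()}
        p_trans = {(t2, t1, t): c / bi[(t2, t1)]
                   for (t2, t1, t), c in tri.items()}
        return p_emit, p_trans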

  9. Statistical POS Tagging • Method common to many tasks in speech & NLP • “Noisy Channel Model”, Hidden Markov Model

  10. Back to Syntax • Phrase-structure tree for "the boy likes a girl", as a bracketing: (((the/Det) boy/N) likes/V ((a/Det) girl/N)) • [Tree diagram: S dominates NP, likes, NP; each NP dominates a DetP (the / a) and a noun (boy / girl)] • Nonterminal symbols = constituents; terminal symbols = words

  11. Phrase Structure and Dependency Structure • [Diagrams: the phrase-structure tree for "the boy likes a girl" shown next to the corresponding dependency tree, in which likes/V dominates boy/N and girl/N, and these in turn dominate the/Det and a/Det]

  12. Types of Dependency • [Dependency tree for "the very small boy sometimes likes a girl": likes/V has Subj boy/N, Obj girl/N, and Adj(unct) sometimes/Adv; boy/N has Fw the/Det and Adj small/Adj; small/Adj has Adj very/Adv; girl/N has Fw a/Det]

  13. Grammatical Relations • Types of relations between words • Arguments: subject, object, indirect object, prepositional object • Adjuncts: temporal, locative, causal, manner, … • Function Words

  14. Subcategorization • List of the arguments of a word (typically a verb), with features about their realization (POS, perhaps case, verb form, etc.) • Listed in canonical order: Subject-Object-IndObj • Example: • like: N-N, N-V(to-inf) • see: N, N-N, N-N-V(inf) • Note: J&M discuss subcategorization only within the VP • (One possible encoding of such a lexicon is sketched below)
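(A hypothetical Python encoding of the slide's frames; the dictionary structure and the example glosses are this sketch's choices.)

    # Each verb maps to its subcategorization frames, each frame a tuple of
    # argument realizations in canonical Subject-Object-IndObj order.
    SUBCAT = {
        'like': [('N', 'N'),             # "she likes him"
                 ('N', 'V(to-inf)')],    # "she likes to swim"
        'see':  [('N',),                 # "she sees"
                 ('N', 'N'),             # "she sees him"
                 ('N', 'N', 'V(inf)')],  # "she sees him swim"
    }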

  15. Where is the VP? • [Diagrams: two phrase-structure trees for "the boy likes a girl": one flat (S → NP likes NP) and one with a VP constituent (S → NP VP, VP → likes NP)]

  16. Where is the VP? • Existence of VP is a linguistic (empirical) claim, not a methodological claim • Semantic evidence??? • Syntactic evidence • VP-fronting (and quickly clean the carpet he did!) • VP-ellipsis (He cleaned the carpets quickly, and so did she) • Can have adjuncts before and after VP, but not inside it (He often eats beans, *He eats often beans) • Note: in all right-branching structures, the issue is different again

  17. Penn Treebank, Again • Syntactically annotated corpus (phrase structure) • The PTB is not naturally occurring data! • Represents a particular linguistic theory (but a fairly "vanilla" one) • Particularities: • Very indirect representation of grammatical relations (hence the need for head percolation tables) • Completely flat structure inside NP (brown bag lunch, pink-and-yellow child seat) • Has flat Ss, flat VPs

  18. Context-Free Grammars • Defined in formal language theory (comp sci) • Terminals, nonterminals, start symbol, rules • String-rewriting system • Start with start symbol, rewrite using rules, done when only terminals left

  19. CFG: Example • Rules: • S → NP VP • VP → V NP • NP → Det N | AdjP NP • AdjP → Adj | Adv AdjP • N → boy | girl • V → sees | likes • Adj → big | small • Adv → very • Det → a | the • Example string derived by this grammar: the very small boy likes a girl (a rewriting sketch follows)
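(A minimal Python sketch of the rewriting process, with the slide's rules as a table; random.choice picks among expansions, so each run derives a different string.)

    import random

    # The slide's grammar as a table: nonterminal -> list of expansions.
    RULES = {
        'S':    [['NP', 'VP']],
        'VP':   [['V', 'NP']],
        'NP':   [['Det', 'N'], ['AdjP', 'NP']],
        'AdjP': [['Adj'], ['Adv', 'AdjP']],
        'N':    [['boy'], ['girl']],
        'V':    [['sees'], ['likes']],
        'Adj':  [['big'], ['small']],
        'Adv':  [['very']],
        'Det':  [['a'], ['the']],
    }

    def derive(symbol='S'):
        """Rewrite until only terminals remain (a random derivation).
        The recursive NP and AdjP rules can occasionally run long."""
        if symbol not in RULES:  # terminal symbol: a word
            return [symbol]
        expansion = random.choice(RULES[symbol])
        return [word for s in expansion for word in derive(s)]

    print(' '.join(derive()))  # e.g. "the very small boy likes a girl"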

  20. Derivations of CFGs • String rewriting system: we derive a string (=derived structure) • But derivation history represented by phrase-structure tree (=derivation structure)!

  21. Grammar Equivalence and Normal Form • Can have different grammars that generate the same set of strings (weak equivalence) • Can have different grammars that also assign the same set of derivation trees (strong equivalence) • Example: X → a X | a and X → X a | a generate the same strings but assign different trees, so they are weakly but not strongly equivalent

  22. Nobody Uses CFGs Only (Except Intro NLP Courses) • All major syntactic theories (Chomsky, LFG, HPSG, TAG-based theories) represent both phrase structure and dependency, in one way or another • All successful parsers currently use statistics about phrase structure and about dependency

  23. Massive Ambiguity of Syntax • For an ordinary sentence and a wide-coverage grammar, there are thousands of derivations! • Example: • The large head master told the man that he gave money and shares in a letter on Wednesday • (A toy parse-counting sketch follows)
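(To see ambiguity grow concretely, here is a toy Python parse counter over a hypothetical CNF grammar with the classic PP-attachment ambiguity; the grammar and sentence are this sketch's, not the slide's.)

    from collections import defaultdict

    # Hypothetical CNF grammar: both VP -> VP PP and NP -> NP PP,
    # the classic source of PP-attachment ambiguity.
    BINARY = [('S', 'NP', 'VP'), ('VP', 'V', 'NP'), ('VP', 'VP', 'PP'),
              ('NP', 'NP', 'PP'), ('PP', 'P', 'NP'), ('NP', 'Det', 'N')]
    LEXICON = {'saw': 'V', 'the': 'Det', 'man': 'N', 'hill': 'N',
               'telescope': 'N', 'on': 'P', 'with': 'P', 'I': 'NP'}

    def count_parses(words):
        """CKY-style dynamic program counting derivations per span."""
        n = len(words)
        chart = defaultdict(int)  # (start, end, symbol) -> number of parses
        for i, w in enumerate(words):
            chart[(i, i + 1, LEXICON[w])] = 1
        for width in range(2, n + 1):
            for i in range(n - width + 1):
                k = i + width
                for j in range(i + 1, k):
                    for parent, left, right in BINARY:
                        chart[(i, k, parent)] += (chart[(i, j, left)]
                                                  * chart[(j, k, right)])
        return chart[(0, n, 'S')]

    sent = 'I saw the man on the hill with the telescope'.split()
    print(count_parses(sent))  # several parses from PP attachment alone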

  24. Some Syntactic Constructions: Wh-Movement

  25. Control

  26. Raising
