Introduction to Syntax, with Part-of-Speech Tagging Owen Rambow September 17 & 19
Admin Stuff • These slides available at • http://www.cs.columbia.edu/~rambow/teaching.html • For Eliza in homework, you can use a tagger or chunker, if you want – details at: • http://www.cs.columbia.edu/~ani/cs4705.html • Special office hours (Ani): today after class, tomorrow at 10am in CEPSR 721
Statistical POS Tagging • Want to choose most likely string of tags (T), given the string of words (W) • W = w1, w2, …, wn • T = t1, t2, …, tn • I.e., want argmaxT p(T | W) • Problem: sparse data
Statistical POS Tagging (ctd) • p(T|W) = p(T,W) / p(W) = p(W|T) p(T) / p(W) • argmaxT p(T|W) = argmaxT p(W|T) p(T) / p(W) = argmaxT p(W|T) p(T)
Statistical POS Tagging (ctd)
p(T) = p(t1, t2, …, tn-1, tn)
= p(tn | t1, …, tn-1) p(t1, …, tn-1)
= p(tn | t1, …, tn-1) p(tn-1 | t1, …, tn-2) p(t1, …, tn-2)
= ∏i p(ti | t1, …, ti-1)
≈ ∏i p(ti | ti-2, ti-1) (the trigram, or n-gram, approximation)
Statistical POS Tagging (ctd)
p(W|T) = p(w1, w2, …, wn | t1, t2, …, tn)
= ∏i p(wi | w1, …, wi-1, t1, t2, …, tn)
≈ ∏i p(wi | ti)
Statistical POS Tagging (ctd)
argmaxT p(T|W) = argmaxT p(W|T) p(T) ≈ argmaxT ∏i p(wi | ti) p(ti | ti-2, ti-1)
• Relatively easy to get data for parameter estimation (next slide)
• But: need smoothing for unseen words
• Easy to determine the argmax (Viterbi algorithm, time linear in sentence length; see the sketch below)
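To make the argmax search concrete, here is a minimal Viterbi sketch in Python. It is a sketch under simplifying assumptions: it uses a bigram transition model p(ti | ti-1) rather than the trigram model above (the same dynamic program applies with tag-pair states), it takes pre-estimated probability dicts, and all names (p_emit, p_trans, the "<s>" start symbol) are illustrative, not from the slides or any library.

```python
# A minimal bigram-HMM Viterbi sketch, assuming pre-estimated probability
# dicts. The slides use a trigram model p(ti | ti-2, ti-1); the same dynamic
# program works there with tag-pair states.

def viterbi(words, tags, p_emit, p_trans):
    """Most likely tag sequence for `words`.

    p_emit[(w, t)]    ~ p(w | t)   emission probability
    p_trans[(t1, t2)] ~ p(t2 | t1) transition probability; "<s>" marks the start
    """
    # best[i][t] = probability of the best tag sequence for words[:i+1] ending in t
    best = [{t: p_trans.get(("<s>", t), 0.0) * p_emit.get((words[0], t), 0.0)
             for t in tags}]
    back = [{}]  # backpointers to recover the argmax sequence

    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            # maximize over the previous tag: one step of the linear-time DP
            prev, score = max(
                ((tp, best[i - 1][tp] * p_trans.get((tp, t), 0.0)) for tp in tags),
                key=lambda x: x[1])
            best[i][t] = score * p_emit.get((words[i], t), 0.0)
            back[i][t] = prev

    # read off the best final tag, then follow backpointers right to left
    t = max(best[-1], key=best[-1].get)
    seq = [t]
    for i in range(len(words) - 1, 0, -1):
        t = back[i][t]
        seq.append(t)
    return list(reversed(seq))
```

With unsmoothed estimates many paths get probability zero; real implementations sum log probabilities instead of multiplying, but the O(n · |tags|²) structure, linear in sentence length, is the same.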
Probability Estimation for Trigram POS Tagging
Maximum-Likelihood Estimation
• p'(wi | ti) = c(wi, ti) / c(ti)
• p'(ti | ti-2, ti-1) = c(ti-2, ti-1, ti) / c(ti-2, ti-1)
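As a sketch of where these counts come from, the following collects them from a tagged corpus and returns the two MLE tables of the slide. The corpus format (a list of sentences, each a list of (word, tag) pairs) and the "<s>" padding for the first two positions are assumptions for illustration.

```python
from collections import Counter

def estimate(corpus):
    """corpus: list of sentences, each a list of (word, tag) pairs."""
    c_wt, c_t = Counter(), Counter()    # c(wi, ti) and c(ti)
    c_tri, c_bi = Counter(), Counter()  # c(ti-2, ti-1, ti) and c(ti-2, ti-1)

    for sent in corpus:
        tags = ["<s>", "<s>"] + [t for _, t in sent]  # pad with start symbols
        for w, t in sent:
            c_wt[(w, t)] += 1
            c_t[t] += 1
        for t2, t1, t0 in zip(tags, tags[1:], tags[2:]):
            c_tri[(t2, t1, t0)] += 1
            c_bi[(t2, t1)] += 1

    # unsmoothed MLE: unseen words and tag trigrams get probability zero,
    # which is exactly why the slides call for smoothing
    p_emit = {(w, t): n / c_t[t] for (w, t), n in c_wt.items()}
    p_trans = {tri: n / c_bi[tri[:2]] for tri, n in c_tri.items()}
    return p_emit, p_trans
```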
Statistical POS Tagging • Method common to many tasks in speech & NLP • “Noisy Channel Model”, Hidden Markov Model
Back to Syntax
• (((the/Det) boy/N) likes/V ((a/Det) girl/N))
• [Figure: the corresponding phrase-structure tree, with S over NP, likes, NP; each NP = DetP + N; DetP = the, a]
• Nonterminal symbols = constituents; terminal symbols = words
Phrase Structure and Dependency Structure
• [Figure: two analyses of "the boy likes a girl": the phrase-structure tree (S over the two NPs and likes, with a DetP inside each NP) and the dependency tree (likes/V dominating boy/N and girl/N, which dominate the/Det and a/Det)]
Types of Dependency
• [Figure: dependency tree for "the very small boy sometimes likes a girl": likes/V has dependents boy/N (Subj), girl/N (Obj), and sometimes/Adv (Adjunct); boy/N has the/Det (Fw) and small/Adj (Adjunct); girl/N has a/Det (Fw); small/Adj has very/Adv (Adjunct)]
Grammatical Relations • Types of relations between words • Arguments: subject, object, indirect object, prepositional object • Adjuncts: temporal, locative, causal, manner, … • Function Words
Subcategorization
• List of arguments of a word (typically, a verb), with features about their realization (POS, perhaps case, verb form, etc.)
• In canonical order Subject-Object-IndObj
• Example (see the sketch below):
• like: N-N, N-V(to-inf)
• see: N, N-N, N-N-V(inf)
• Note: J&M discuss subcategorization only within the VP
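As an illustration only, such a lexicon can be written down as a simple mapping; the layout below and the example sentences in the comments are assumptions, not a standard resource format.

```python
# A sketch of the slide's subcategorization frames as a Python mapping, in
# canonical Subject-Object-IndObj order. The example sentences in the
# comments are illustrative, not from the slides.

SUBCAT = {
    "like": [("N", "N"),             # e.g. "the boy likes a girl"
             ("N", "V(to-inf)")],    # e.g. "the boy likes to sleep"
    "see":  [("N",),                 # e.g. "the boy sees"
             ("N", "N"),             # e.g. "the boy sees a girl"
             ("N", "N", "V(inf)")],  # e.g. "the boy sees a girl leave"
}
```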
Where is the VP?
• [Figure: two phrase-structure trees for "the boy likes a girl": one with a VP node (S over NP and VP; VP over likes and NP) and one flat (S directly over NP, likes, NP)]
Where is the VP? • Existence of the VP is a linguistic (empirical) claim, not a methodological claim • Semantic evidence??? • Syntactic evidence • VP-fronting (and quickly clean the carpet he did!) • VP-ellipsis (He cleaned the carpets quickly, and so did she) • Can have adjuncts before and after the VP, but not inside it (He often eats beans, *He eats often beans) • Note: in all right-branching structures, the issue is different again
Penn Treebank, Again • Syntactically annotated corpus (phrase structure) • PTB is not naturally occurring data! • Represents a particular linguistic theory (but a fairly “vanilla” one) • Particularities • Very indirect representation of grammatical relations (need for head percolation tables; see the sketch below) • Completely flat structure in NP (brown bag lunch, pink-and-yellow child seat) • Has flat Ss, flat VPs
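For illustration, a head percolation table is roughly a map from a nonterminal to a search direction plus a priority list of child labels, used to recover head words, and thus dependencies, from PTB phrase structure. The minimal sketch below is in the spirit of the Magerman/Collins tables, but the specific rules are assumptions, not the actual PTB tooling.

```python
# A tiny head percolation sketch: each nonterminal gets a search direction
# and a priority list of child labels. The rules are an illustrative subset.

HEAD_RULES = {
    "S":  ("right", ["VP", "S"]),
    "VP": ("left",  ["VBD", "VBZ", "VBP", "VB", "VP"]),
    "NP": ("right", ["NN", "NNS", "NNP", "NP"]),
}

def head_child(label, children):
    """Pick the head child of a constituent from its children's labels."""
    direction, priorities = HEAD_RULES[label]
    order = children if direction == "left" else list(reversed(children))
    for wanted in priorities:
        for child in order:
            if child == wanted:
                return child
    return order[0]  # fallback: first child in the search direction

# e.g. head_child("NP", ["DT", "JJ", "NN"]) -> "NN"
```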
Context-Free Grammars • Defined in formal language theory (comp sci) • Terminals, nonterminals, start symbol, rules • String-rewriting system • Start with start symbol, rewrite using rules, done when only terminals left
CFG: Example
• Rules
• S → NP VP
• VP → V NP
• NP → DetP NP | AdjP NP | N
• AdjP → Adj | Adv AdjP
• DetP → a | the
• N → boy | girl
• V → sees | likes
• Adj → big | small
• Adv → very
• Example string: the very small boy likes a girl (derived step by step in the sketch below)
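As a sketch of "start with the start symbol, rewrite using rules" from the previous slide, the following encodes these rules and replays one leftmost derivation of the example string; the encoding and the leftmost_step helper are illustrative, and this is a derivation, not a parser.

```python
# The example grammar as a string-rewriting system. Each nonterminal maps to
# its list of alternative right-hand sides, in the order given on the slide.

RULES = {
    "S":    [["NP", "VP"]],
    "VP":   [["V", "NP"]],
    "NP":   [["DetP", "NP"], ["AdjP", "NP"], ["N"]],
    "AdjP": [["Adj"], ["Adv", "AdjP"]],
    "DetP": [["a"], ["the"]],
    "N":    [["boy"], ["girl"]],
    "V":    [["sees"], ["likes"]],
    "Adj":  [["big"], ["small"]],
    "Adv":  [["very"]],
}

def leftmost_step(form, choice):
    """Rewrite the leftmost nonterminal in `form` using alternative `choice`."""
    for i, sym in enumerate(form):
        if sym in RULES:
            return form[:i] + RULES[sym][choice] + form[i + 1:]
    return form  # only terminals left: the derivation is finished

# One leftmost derivation of the example string, printing each sentential form.
form = ["S"]
for choice in [0, 0, 1, 1, 1, 0, 0, 1, 2, 0, 0, 1, 0, 0, 2, 1]:
    form = leftmost_step(form, choice)
    print(" ".join(form))
# final line: "the very small boy likes a girl"
```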
Derivations of CFGs • String-rewriting system: we derive a string (= the derived structure) • But the derivation history is represented by a phrase-structure tree (= the derivation structure)!
Grammar Equivalence and Normal Form • Can have different grammars that generate the same set of strings (weak equivalence) • Can have different grammars that have the same set of derivation trees (strong equivalence)
Nobody Uses CFGs Only (Except Intro NLP Courses) • All major syntactic theories (Chomsky, LFG, HPSG, TAG-based theories) represent both phrase structure and dependency, in one way or another • All successful parsers currently use statistics about phrase structure and about dependency
Massive Ambiguity of Syntax • For a standard sentence and a wide-coverage grammar, there are thousands of derivations! • Example: • The large head master told the man that he gave money and shares in a letter on Wednesday