CS 730: Text Mining for Social Media & Collaboratively Generated Content Lecture 3: Parsing and Chunking
“Big picture” of the course • Language models (word, n-gram, …) • Classification and sequence models • WSD, Part-of-speech tagging • Syntactic parsing and tagging • Next week: semantics • Next->Next week: Info Extraction + Text Mining Intro • Fall break • Part II: Social media (research papers start) CS730: Text Mining for Social Media, F2010
Today’s Lecture Plan • Phrase Chunking • Syntactic Parsing • Machine Learning-based Syntactic Parsing CS730: Text Mining for Social Media, F2010
Phrase Chunking CS730: Text Mining for Social Media, F2010
Why Chunking/Parsing? CS730: Text Mining for Social Media, F2010
Phrase Structure (continued) CS730: Text Mining for Social Media, F2010
Types of Phrases • Phrases: classify by part of speech of the main word or by syntactic role • subject and predicate; noun phrase and verb phrase: in "The young cats drink milk," "The young cats" is a noun phrase and the subject; "drink milk" is a verb phrase and the predicate; the main word is the head of the phrase: "cats" in "the young cats" • Verb complements and modifiers • types of complements: noun phrases, adjective phrases, prepositional phrases, particles. noun phrase: I served a brownie. adjective phrase: I remained very rich. prepositional phrase: I looked at Fred. particle: He looked up the number. • clauses; clausal complements: I dreamt that I won a million brownies. • tenses: simple past, present, future; progressive, perfect. simple present: John bakes cookies. present progressive: John is baking cookies. present perfect: John has baked cookies. • active vs. passive. active: Bernie ate the banana. passive: The banana was eaten by Bernie. CS730: Text Mining for Social Media, F2010
Noun Phrase Structure • Left modifiers: • determiner, quantifier, adjective, noun: the five shiny tin cans • Right modifiers: prepositional phrases and apposition • prepositional phrase: the man in the moon • apposition: Scott, the Arctic explorer • Relative clauses: the man who ate the popcorn; the popcorn which the man ate; the man who is eating the popcorn; the tourist who was eaten by a lion • Reduced relative clauses: the man eating the popcorn; the man eaten by a lion CS730: Text Mining for Social Media, F2010
Attachment Ambiguities CS730: Text Mining for Social Media, F2010
Preliminaries: Constraint Grammars/CFGs CS730: Text Mining for Social Media, F2010
CFG (applying rewrite rules) CS730: Text Mining for Social Media, F2010
Preliminaries: CFG (continued) CS730: Text Mining for Social Media, F2010
Parsing CS730: Text Mining for Social Media, F2010
Human parsing CS730: Text Mining for Social Media, F2010
Chunking (from Abney 1994) • “I begin with an intuition: when I read a sentence, I read it a chunk at a time.” • Breaks up something like this: • [I begin] [with an intuition]: [when I read] [a sentence], [I read it] [a chunk] [at a time] • Chunks correspond to prosodic patterns. • Strongest stresses in the sentence fall one to a chunk • Pauses are most likely to fall between chunks • Typical chunk consists of a single content word surrounded by a constellation of function words, matching a fixed template. • A simple context-free grammar is often adequate to describe the structure of chunks. CS730: Text Mining for Social Media, F2010
Chunking (continued) • Text chunking subsumes a range of tasks. • The simplest is finding noun groups or base NPs: • non-recursive noun phrases up to the head (for English). • More ambitious systems may add additional chunk types, such as verb groups. • Seek a complete partitioning of the sentence into chunks of different types: [NP He ] [VP reckons ] [NP the current account deficit ] [VP will narrow ] [PP to ] [NP only $1.8 billion ] [PP in ] [NP September ] . (Steve Abney, Parsing by Chunks) • The chunks are non-recursive structures which can be handled by finite-state methods • Why do text chunking? • Full parsing is expensive, and is not very robust. • Partial parsing is much faster, more robust, and sufficient for many applications (IE, QA). • Can also serve as a possible first step for full parsing CS730: Text Mining for Social Media, F2010
Chunking: Rule-based • Quite high performance on NP chunking can be obtained with a small number of regular expressions • With a larger rule set, using Constraint Grammar rules, Voutilainen reports recall of 98%+ with precision of 95-98% for noun chunks. • Atro Voutilainen, NPtool, a Detector of English Noun Phrases, WVLC 93. CS730: Text Mining for Social Media, F2010
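To make the regular-expression approach concrete, here is a minimal sketch of a pattern-based baseNP chunker using NLTK's RegexpParser. The single tag pattern and the example sentence are illustrative assumptions; they are far simpler than Voutilainen's Constraint Grammar rules.

```python
# A minimal sketch of regex-based NP chunking over POS-tagged text (assumes NLTK is installed).
# The single tag pattern below is an illustrative assumption, not Voutilainen's rule set.
import nltk

# One rule: an optional determiner, any number of adjectives, then one or more nouns.
grammar = "NP: {<DT>?<JJ.*>*<NN.*>+}"
chunker = nltk.RegexpParser(grammar)

tagged = [("the", "DT"), ("young", "JJ"), ("cats", "NNS"),
          ("drink", "VBP"), ("milk", "NN")]

print(chunker.parse(tagged))
# (S (NP the/DT young/JJ cats/NNS) drink/VBP (NP milk/NN))
```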
Why can chunking be difficult? • Two major sources of error (and these are also error sources for simple finite-state patterns for baseNP): participles and conjunction. • Whether a participle is part of a noun phrase will depend on the particular choice of words: He enjoys writing letters. He sells writing paper. • and sometimes is genuinely ambiguous: He enjoys baking potatoes. He has broken bottles in the basement. • The rules for conjoined NPs are complicated by the bracketing rules of the Penn Tree Bank. • Conjoined prenominal nouns are generally treated as part of a single baseNP: "brick and mortar university" (with "brick and mortar" modifying "university"). • Conjoined heads with shared modifiers are also to be treated as a single baseNP: "ripe apples and bananas"; • If the modifier is not shared, there are two baseNPs: "ripe apples and cinnamon". • Modifier sharing, however, is hard for people to judge and is not consistently annotated CS730: Text Mining for Social Media, F2010
Transformation-Based Learning for Chunking Ramshaw & Marcus, Text Chunking using Transformation-Based Learning, WVLC 1995 • Adapted the TBL method from Brill's POS tagger. One-level NP chunking restated as a word tagging task. • Used 3 tags: • I (inside a baseNP) • O (outside a baseNP) • B (the start of a baseNP which immediately follows another baseNP) • Initial tags assigned based on the most likely tag for a given part-of-speech. • The contexts for TBL rules: • words, part-of-speech assignments, and prior IOB tags. CS730: Text Mining for Social Media, F2010
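As an illustration of the IOB encoding itself (not Ramshaw & Marcus's learner), the sketch below takes a sequence of I/O/B labels and reads the baseNP spans back out; the example words and tags are assumed for demonstration.

```python
# Sketch: the I/O/B encoding from Ramshaw & Marcus, and recovering baseNP spans from it.
# The example sentence and its tags are illustrative assumptions, not taken from their data.

def iob_to_spans(tags):
    """Return (start, end) word spans of baseNPs from a list of I/O/B tags."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag == "O":
            if start is not None:
                spans.append((start, i))
            start = None
        elif tag == "B" or (tag == "I" and start is None):
            if start is not None:          # "B" closes the previous NP and opens a new one
                spans.append((start, i))
            start = i
    if start is not None:
        spans.append((start, len(tags)))
    return spans

words = ["He", "reckons", "the", "current", "account", "deficit", "will", "narrow"]
tags  = ["I",  "O",       "I",   "I",       "I",       "I",       "O",    "O"]
print(iob_to_spans(tags))   # [(0, 1), (2, 6)] -> "He", "the current account deficit"
```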
TBL-based Chunking (2) Ramshaw & Marcus, Text Chunking using Transformation-Based Learning, WVLC 1995 • Results can be scored based on the correct assignment of tags, or on recall and precision of complete baseNPs. • The latter is normally used as the metric, since it corresponds to the actual objective -- different tag sets can be used as an intermediate representation. • Obtained about 92% recall and precision with their system for baseNPs, using 200K words of training. • Without lexical information: 90.5% recall and precision. CS730: Text Mining for Social Media, F2010
Chunking: Classification-based • Classification task: • NP or not NP? • Using classifiers for Chunking • The best performance on the base NP and chunking tasks was obtained using a Support Vector Machine method. • Kudo and Matsumoto obtained an accuracy of 94.22% with the small data set of Ramshaw and Marcus, and 95.77% by training on almost the entire Penn Treebank. (Taku Kudo and Yuji Matsumoto, Chunking with Support Vector Machines, Proc. NAACL 2001) CS730: Text Mining for Social Media, F2010
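A hedged sketch of the classification view follows: each word gets an I/O/B label predicted from simple word, POS, and context features, with scikit-learn's LinearSVC standing in for Kudo and Matsumoto's kernel SVM setup. The toy data and feature set are assumptions for illustration only.

```python
# Sketch of classification-based chunking: predict an I/O/B tag per word from local features.
# Uses scikit-learn's LinearSVC as a stand-in; Kudo & Matsumoto's system differs in detail
# (polynomial kernels, pairwise combination of decisions), so treat this only as an illustration.
from sklearn.feature_extraction import DictVectorizer
from sklearn.svm import LinearSVC

# Toy training data: (word, POS, IOB) triples for two short "sentences" (assumed).
train = [
    [("He", "PRP", "I"), ("reckons", "VBZ", "O"), ("the", "DT", "I"),
     ("deficit", "NN", "I"), ("will", "MD", "O"), ("narrow", "VB", "O")],
    [("She", "PRP", "I"), ("sells", "VBZ", "O"), ("writing", "NN", "I"),
     ("paper", "NN", "I")],
]

def features(sent, i):
    word, pos, _ = sent[i]
    prev_pos = sent[i - 1][1] if i > 0 else "BOS"
    next_pos = sent[i + 1][1] if i + 1 < len(sent) else "EOS"
    return {"word": word.lower(), "pos": pos, "prev_pos": prev_pos, "next_pos": next_pos}

X = [features(s, i) for s in train for i in range(len(s))]
y = [tok[2] for s in train for tok in s]

vec = DictVectorizer()
clf = LinearSVC().fit(vec.fit_transform(X), y)

test = [("He", "PRP", "?"), ("sells", "VBZ", "?"), ("paper", "NN", "?")]
print(clf.predict(vec.transform([features(test, i) for i in range(len(test))])))
```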
Hand-tuning vs. Machine Learning • BaseNP chunking is a task for which people (with some linguistics training) can write quite good rules quickly. • This raises the practical question of whether we should be using machine learning at all. • If there is already a large relevant resource, it makes sense to learn from it. • However, if we have to develop a chunker for a new language, is it cheaper to annotate some data or to write the rules directly? • Ngai and Yarowsky addressed this question. • They also considered selecting the data to be annotated. • Traditional training is based on sequential text annotation ... we just annotate a series of documents in sequence. • Can we do better? • Ngai, G. and D. Yarowsky, Rule Writing or Annotation: Cost-efficient Resource Usage for Base Noun Phrase Chunking. ACL 2000 CS730: Text Mining for Social Media, F2010
Active Learning Ngai & Yarowsky, ACL 2000 • Instead of annotating training examples sequentially, choose good examples • Usually, choose examples “on the boundary” – i.e., those for which the classifier has low confidence • Very often allows training to converge much faster than sequential/batch learning. • Drawback: requires a user in the loop. CS730: Text Mining for Social Media, F2010
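A minimal sketch of the boundary-selection idea, assuming a scikit-learn-style classifier with a decision_function method; this is generic uncertainty sampling, not Ngai and Yarowsky's exact procedure.

```python
# Sketch of uncertainty (boundary) sampling for active learning: at each round, ask the
# annotator about the unlabeled examples the current classifier is least confident about.
# Assumes a scikit-learn-style classifier with decision_function(); illustrative only.
import numpy as np

def pick_queries(classifier, unlabeled_X, batch_size=10):
    """Return indices of the batch_size unlabeled examples closest to the decision boundary."""
    margins = np.abs(classifier.decision_function(unlabeled_X))
    if margins.ndim > 1:                      # multi-class: use the smallest per-class margin
        margins = margins.min(axis=1)
    return np.argsort(margins)[:batch_size]   # smallest margin = lowest confidence
```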
Active Learning (continued) Ngai & Yarowsky, ACL 2000 CS730: Text Mining for Social Media, F2010
Rule Writing vs. Active Learning Ngai & Yarowsky, ACL 2000 CS730: Text Mining for Social Media, F2010
Rule Writing vs. Annotation Learning Ngai & Yarowsky, ACL 2000 • Annotation: • Can continue indefinitely • Can combine efforts of multiple annotators • More consistent results • Accuracy can be improved by better learning algorithms • Rule writing: • Must keep in mind rule interactions • Difficult to combine rules from different experts • Requires more skills • Accuracy limited by the set of rules (will never improve beyond it) CS730: Text Mining for Social Media, F2010
The parsing problem • (diagram) Test sentences and a Grammar feed a PARSER; its output parses are compared against the correct test trees by a scorer, which reports accuracy. • Recent parsers are quite accurate (Eisner, Collins, Charniak, etc.) CS730: Text Mining for Social Media, F2010
Applications of parsing (1/2) • Machine translation (Alshawi 1996, Wu 1997, ...): tree operations mapping English to Chinese • Speech synthesis from parses (Prevost 1996): The government plans to raise income tax. / The government plans to raise income tax the imagination. • Speech recognition using parsing (Chelba et al 1998): Put the file in the folder. / Put the file and the folder. CS730: Text Mining for Social Media, F2010
Applications of parsing (2/2) • Indexing for information retrieval (Woods 1997): ... washing a car with a hose ... indexed under vehicle maintenance • Information extraction (Hobbs 1996): turning the NY Times archive into a database that can answer queries • Grammar checking (Microsoft) CS730: Text Mining for Social Media, F2010
Parsing for the Turing Test • Most linguistic properties are defined over trees. • One needs to parse to see subtle distinctions. E.g.: Sara dislikes criticism of her. (her ≠ Sara) Sara dislikes criticism of her by anyone. (her ≠ Sara) Sara dislikes anyone’s criticism of her. (her = Sara or her ≠ Sara) CS730: Text Mining for Social Media, F2010
What makes a good grammar • Conjunctions must match • I ate a hamburger and on the stove. • I ate a cold hot dog and well burned. • I ate the hot dog slowly and a hamburger. CS730: Text Mining for Social Media, F2010
Vanilla CFG not sufficient for NL • Number agreement • a men • DET selection • a apple • Tense, mood, etc. agreement • For now, let’s see what it would take to parse English with a vanilla CFG CS730: Text Mining for Social Media, F2010
Parsing re-defined CS730: Text Mining for Social Media, F2010
Revised CFG CS730: Text Mining for Social Media, F2010
In: cats scratch people with claws CS730: Text Mining for Social Media, F2010
Soundness and Completeness in Parsing CS730: Text Mining for Social Media, F2010
Top-Down Parsing • Top-down parsing is goal directed • A top-down parser starts with a list of constituents to be built. • The top-down parser rewrites the goals in the goal list by matching one against the LHS of the grammar rules, and expanding it with the RHS, attempting to match the sentence to be derived. • If a goal can be rewritten in several ways, then there is a choice of which rule to apply (search problem) • Can use depth-first or breadth-first search, and goal ordering. CS730: Text Mining for Social Media, F2010
Simple Top-down parsing algorithm • Start with initial state ((S) 1) and no backup states. 1. Select the current state: take the first state off the possibilities list and call it C. If the possibilities list is empty, the algorithm fails (that is, no successful parse is possible). 2. If C consists of an empty symbol list and the word position is at the end of the sentence, the algorithm succeeds. 3. Otherwise, generate the next possible states: 3.1 If the first symbol on the symbol list of C is a lexical symbol, and the next word in the sentence can be in that class, create a new state by removing the first symbol from the symbol list and updating the word position, and add it to the possibilities list. 3.2 Otherwise, if the first symbol on the symbol list of C is a non-terminal, generate a new state for each rule in the grammar that can rewrite that nonterminal symbol and add them all to the possibilities list. CS730: Text Mining for Social Media, F2010
Top-down as search • For a depth-first strategy, the possibilities list is a stack. In other words, step 1 always takes the first element off the list, and step 3 always puts the new states on the front of the list, yielding a last-in first-out (LIFO) strategy. • In contrast, in a breadth-first strategy the possibilities list is manipulated as a queue. Step 3 adds the new states onto the end of the list, rather than the beginning, yielding a first-in first-out (FIFO) strategy. CS730: Text Mining for Social Media, F2010
Top-down example • Grammar: same CFG as before • Lexicon: • cried: V • dogs: N, V • the: ART • Input: The/1 dogs/2 cried/3 • A typical parse state: • ((N VP) 2) • Parser needs to find N followed by VP, starting at position 2 CS730: Text Mining for Social Media, F2010
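Putting the algorithm and this example together, here is a minimal depth-first top-down recognizer sketch. The tiny grammar (S -> NP VP, NP -> ART N, VP -> V | V NP) is an assumption standing in for the "same CFG as before", and positions are 0-indexed rather than the slide's 1-indexing.

```python
# A minimal sketch of the depth-first top-down recognizer described above, run on
# "the dogs cried". The toy grammar is an assumption; the lexicon is from this slide.
GRAMMAR = {                      # non-terminal -> list of right-hand sides (assumed toy CFG)
    "S":  [["NP", "VP"]],
    "NP": [["ART", "N"]],
    "VP": [["V"], ["V", "NP"]],
}
LEXICON = {"the": {"ART"}, "dogs": {"N", "V"}, "cried": {"V"}}

def top_down_parse(words):
    # A state is (symbols still to be found, next word position), 0-indexed here.
    possibilities = [(("S",), 0)]                        # start state; used as a stack -> depth-first
    while possibilities:
        symbols, pos = possibilities.pop(0)              # step 1: take the first state, call it C
        if not symbols:
            if pos == len(words):                        # step 2: empty symbol list at end of sentence
                return True
            continue                                     # dead end: symbols used up too early
        first, rest = symbols[0], symbols[1:]
        if first in GRAMMAR:                             # step 3.2: expand a non-terminal
            expansions = [(tuple(rhs) + rest, pos) for rhs in GRAMMAR[first]]
            possibilities = expansions + possibilities   # push onto the front -> LIFO
        elif pos < len(words) and first in LEXICON.get(words[pos], set()):
            possibilities = [(rest, pos + 1)] + possibilities   # step 3.1: match a lexical symbol
    return False                                         # possibilities list empty -> no parse

print(top_down_parse(["the", "dogs", "cried"]))          # True
```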
Parsing “The dogs cried” CS730: Text Mining for Social Media, F2010
Problems with Top-down • Left recursive rules • A top-down parser will do badly if there are many different rules for the same LHS. Consider if there are 600 rules for S, 599 of which start with NP, but one of which starts with V, and the sentence starts with V. • Useless work: expands things that are possible top-down but not there • Top-down parsers do well if there is useful grammar-driven control: search is directed by the grammar • Top-down is hopeless for rewriting parts of speech (preterminals) with words (terminals). In practice that is always done bottom-up as lexical lookup. • Repeated work: anywhere there is common substructure CS730: Text Mining for Social Media, F2010
Bottom-up Parsing • Bottom-up parsing is data directed • The initial goal list of a bottom-up parser is the string to be parsed. If a sequence in the goal list matches the RHS of a rule, then this sequence may be replaced by the LHS of the rule. • Parsing is finished when the goal list contains just the start category. • If the RHS of several rules match the goal list, then there is a choice of which rule to apply (search problem) • Can use depth-first or breadth-first search, and goal ordering. • The standard presentation is as shift-reduce parsing. CS730: Text Mining for Social Media, F2010
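For comparison with the top-down sketch, here is a minimal shift-reduce recognizer over a simplified version of the same assumed toy grammar. It reduces greedily and never backtracks, so it illustrates the shift and reduce moves rather than a complete bottom-up search.

```python
# Minimal shift-reduce (bottom-up) recognizer sketch on an assumed toy grammar. It reduces
# greedily and does not backtrack, so it only illustrates the shift/reduce moves.
RULES = [("NP", ["ART", "N"]), ("VP", ["V"]), ("S", ["NP", "VP"])]
LEXICON = {"the": "ART", "dogs": "N", "cried": "V"}      # one POS tag per word, for simplicity

def shift_reduce(words):
    stack, buffer = [], [LEXICON[w] for w in words]      # start from the words' tags (data-directed)
    while buffer or len(stack) > 1:
        # Reduce whenever the top of the stack matches some rule's right-hand side ...
        for lhs, rhs in RULES:
            if stack[-len(rhs):] == rhs:
                stack = stack[:-len(rhs)] + [lhs]
                break
        else:
            if not buffer:                               # nothing to shift and nothing reduced
                return False
            stack.append(buffer.pop(0))                  # ... otherwise shift the next symbol
    return stack == ["S"]                                # done when only the start category remains

print(shift_reduce(["the", "dogs", "cried"]))            # True
```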
Problems with Bottom-up Parsing • Unable to deal with empty categories: termination problem, unless rewriting empties as constituents is somehow restricted (but then it’s generally incomplete) • Useless work: locally possible, but globally impossible. • Inefficient when there is great lexical ambiguity (grammar-driven control might help here) • Conversely, it is data-directed: it attempts to parse the words that are there. • Repeated work: anywhere there is common substructure • Both TD (LL) and BU (LR) parsers can (and frequently do) do work exponential in the sentence length on NLP problems. CS730: Text Mining for Social Media, F2010
Dynamic Programming for Parsing • Systematically fill in tables of solutions to sub-problems. • Store subtrees for each of the various constituents in the input as they are discovered • Cocke-Kasami-Younger (CKY) algorithm, Earley's algorithm, and chart parsing. CS730: Text Mining for Social Media, F2010
CKY algorithm (BU), recognizer version • Input: string of n words • Output: yes/no (since it’s only a recognizer) • Data structure: n x n table • rows labeled 0 to n-1 • columns labeled 1 to n • cell [i,j] lists constituents found between i and j CS730: Text Mining for Social Media, F2010
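A sketch of the CKY recognizer with exactly this table layout (cell [i][j] holds the constituents found between positions i and j). CKY requires a grammar in Chomsky Normal Form; the toy CNF grammar and lexicon below are assumptions for illustration.

```python
# Sketch of the CKY recognizer using the table layout above: cell [i][j] holds the
# constituents found between positions i and j. Requires a CNF grammar; this toy
# CNF grammar and lexicon are assumptions for illustration.
UNARY  = {"the": {"ART"}, "dogs": {"N"}, "cried": {"V", "VP"}}   # word -> preterminals
BINARY = {("ART", "N"): {"NP"}, ("NP", "VP"): {"S"}}             # (B, C) -> {A : A -> B C}

def cky_recognize(words):
    n = len(words)
    table = [[set() for _ in range(n + 1)] for _ in range(n)]    # rows 0..n-1, cols 1..n
    for i, w in enumerate(words):                                # length-1 spans from the lexicon
        table[i][i + 1] = set(UNARY.get(w, set()))
    for span in range(2, n + 1):                                 # longer spans, shortest first
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):                            # split point between i and j
                for b in table[i][k]:
                    for c in table[k][j]:
                        table[i][j] |= BINARY.get((b, c), set())
    return "S" in table[0][n]                                    # yes/no: does S span the whole input?

print(cky_recognize(["the", "dogs", "cried"]))                   # True
```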
Miniature Grammar CS730: Text Mining for Social Media, F2010
CKY Example CS730: Text Mining for Social Media, F2010
CKY Algorithm CS730: Text Mining for Social Media, F2010