520 likes | 673 Views
EECS 595 / LING 541 / SI 661. Natural Language Processing. Fall 2005 Lecture Notes #7. Natural Language Generation. What is NLG?. Mapping meaning to text Stages: Content selection Lexical selection Sentence structure: aggregation, referring expressions Discourse structure.
E N D
EECS 595 / LING 541 / SI 661 Natural Language Processing Fall 2005 Lecture Notes #7
What is NLG? • Mapping meaning to text • Stages: • Content selection • Lexical selection • Sentence structure: aggregation, referring expressions • Discourse structure
Systemic grammars • Language is viewed as a resource for expressing meaning in context (Halliday, 1985) • Layers: mood, transitivity, theme
Example (:process save-1:actor system-1:goal document-1:speechact assertion:tense future ) Input is underspecified
The Functional Unification Formalism (FUF) • Based on Kay’s (83) formalism • partial information, declarative, uniform, compact • same framework used for all stages: syntactic realization, lexicalization, and text planning
Functional analysis • Functional vs. structured analysis • “John eats an apple” • actor (John), affected (apple), process (eat) • NP VP NP • suitable for generation
Partial vs. complete specification action = eat • Voice: An apple is eaten by John • Tense: John ate an apple • Mode: Did John ear an apple? • Modality: John must eat an apple • prolog: p(X,b,c) actor = John object = apple
Unification • Target sentence • input FD • grammar • unification process • linearization process
Sample input ((cat s) (prot ((n ((lex john))))) (verb ((v ((lex like))))) (goal ((n ((lex mary))))))
Sample grammar ((alt top (((cat s) (prot ((cat np))) (goal ((cat np))) (verb ((cat vp) (number {prot number}))) (pattern (prot verb goal))) ((cat np) (n ((cat noun) (number {^ ^ number}))) (alt (((proper yes) (pattern (n))) ((proper no) (pattern (det n)) (det ((cat article) (lex “the”))))))) ((cat vp) (pattern (v)) (v ((cat verb))))((cat noun)) ((cat verb)) ((cat article)))))
Sample output ((cat s) (goal ((cat np) (n ((cat noun) (lex mary) (number {goal number}))) (pattern (n)) (proper yes))) (pattern (prot verb goal)) (prot ((cat np) (n ((cat noun) (lex john) (number {verb number}))) (number {verb number}) (pattern (n)) (proper yes))) (verb ((cat vp) (pattern (v)) (v ((cat verb) (lex like))))))
Comparison with Prolog • Similarities: • both have unification at the core • Prolog program = FUF grammar • Prolog query = FUF input • Differences: • Prolog: first order term unification • FUF: arbitrarily rooted directed graphs are unified
The SURGE grammar • Syntactic realization front-end • variable level of abstraction • 5600 branches and 1600 alts Lexicalized FD Syntactic FD LinearizerMorphology Lexicalchooser SURGE Text
Systems developed using FUF/SURGE • COMET • MAGIC • ZEDDOC • PLANDOC • FLOWDOC • SUMMONS
CFUF • Fast implementation by Mark Kharitonov (C++) • Up to 100 times faster than Lisp/FUF • Speedup higher for larger inputs
References • Cole, Mariani, Uszkoreit, Zaenen, Zue (eds.) Survey of the State of the Art in Human Language Technology, 1995 • Elhadad, Using Argumentation to Control Lexical Choice: A Functional Unification Implementation, 1993 • Elhadad, FUF: the Universal Unifier, User Manual, 1993 • Elhadad and Robin, SURGE: a Comprehensive Plug-in Syntactic Realization Component for Text Generation, 1999 • Kharitonov, CFUF: A Fast Interpreter for the Functional Unification Formalism, 1999 • Radev, Language Reuse and Regeneration: Generating Natural Language Summaries from Multiple On-Line Sources, Department of Computer Science, Columbia University, October 1998
Path notation • You can view a FD as a tree • To specify features, you can use a path • {feature feature … feature} value • e.g. {prot number} • You can also use relative paths • {^ number} value => the feature number for the current node • {^ ^ number} value => the feature number for the node above the current node
Sample grammar ((alt top (((cat s) (prot ((cat np))) (goal ((cat np))) (verb ((cat vp) (number {prot number}))) (pattern (prot verb goal))) ((cat np) (n ((cat noun) (number {^ ^ number}))) (alt (((proper yes) (pattern (n))) ((proper no) (pattern (det n)) (det ((cat article) (lex “the”))))))) ((cat vp) (pattern (v)) (v ((cat verb))))((cat noun)) ((cat verb)) ((cat article)))))
The problem • Discourse • Monologue and Dialogue (dialog) • Human-computer interaction • Example: John went to Bill’s car dealership to check out an Acura Integra. He looked at it for about half an hour. • Example: I’d like to get from Boston to San Francisco, on either December 5th or December 6th. It’s okay if it stops in another city along the way.
Information extraction and discourse analysis • Example: First Union Corp. is continuing to wrestle with severe problems unleashed by a botched merger and a troubled business strategy. According to industry insiders at Paine Webber, their president, John R. Georgius, is planning to retire by the end of the year. • Problems with summarization and generation
Reference resolution • The process of reference (associating “John” with “he”). • Referring expressions and referents. • Needed: discourse models • Problem: many types of reference!
Example (from Webber 91) • According to John, Bob bought Sue an Integra, and Sue bough Fred a legend. • But that turned out to be a lie. - referent is a speech act. • But that was false. - proposition • That struck me as a funny way to describe the situation. - manner of description • That caused Sue to become rather poor. - event • That caused them both to become rather poor. - combination of several events.
Reference phenomena • Indefinite noun phrases: I saw an Acura Integra today. • Definite noun phrases: The Integra was white. • Pronouns: It was white. • Demonstratives: this Acura. • Inferrables: I almost bought an Acura Integra today, but a door had a dent and the engine seemed noisy. • Mix the flour, butter, and water. Kneed the dough until smooth and shiny.
Constraints on coreference • Number agreement: John has an Acura. It is red. • Person and case agreement: (*) John and Mary have Acuras. We love them (where We=John and Mary) • Gender agreement: John has an Acura. He/it/she is attractive. • Syntactic constraints: • John bought himself a new Acura. • John bought him a new Acura. • John told Bill to buy him a new Acura. • John told Bill to buy himself a new Acura • He told Bill to buy John a new Acura.
Preferences in pronoun interpretation • Recency: John has an Integra. Bill has a Legend. Mary likes to drive it. • Grammatical role: John went to the Acura dealership with Bill. He bought an Integra. • (?) John and Bill went to the Acura dealership. He bought an Integra. • Repeated mention: John needed a car to go to his new job. He decided that he wanted something sporty. Bill went to the Acura dealership with him. He bought an Integra.
Preferences in pronoun interpretation • Parallelism: Mary went with Sue to the Acura dealership. Sally went with her to the Mazda dealership. • ??? Mary went with Sue to the Acura dealership. Sally told her not to buy anything. • Verb semantics: John telephoned Bill. He lost his pamphlet on Acuras. John criticized Bill. He lost his pamphlet on Acuras.
An algorithm for pronoun resolution • Two steps: discourse model update and pronoun resolution. • Salience values are introduced when a noun phrase that evokes a new entity is encountered. • Salience factors: set empirically.
Lappin and Leass (cont’d) • Recency: weights are cut in half after each sentence is processed. • Examples: • An Acura Integra is parked in the lot. (subject) • There is an Acura Integra parked in the lot. (existential predicate nominal) • John parked an Acura Integra in the lot. (object) • John gave Susan an Acura Integra. (indirect object) • In his Acura Integra, John showed Susan his new CD player. (demarcated adverbial PP)
Algorithm • Collect the potential referents (up to four sentences back). • Remove potential referents that do not agree in number or gender with the pronoun. • Remove potential referents that do not pass intrasentential syntactic coreference constraints. • Compute the total salience value of the referent by adding any applicable values for role parallelism (+35) or cataphora (-175). • Select the referent with the highest salience value. In case of a tie, select the closest referent in terms of string position.
Example • John saw a beautiful Acura Integra at the dealership last week. He showed it to Bill. He bought it.
Observations • Lappin & Leass - tested on computer manuals - 86% accuracy on unseen data. • Centering (Grosz, Josh, Weinstein): additional concept of a “center” – at any time in discourse, an entity is centered. • Backwards looking center; forward looking centers (a set). • Centering has not been automatically tested on actual data.
Discourse structure • (*) Bill went to see his mother. The trunk is what makes the bonsai, it gives it both its grace and power. • Coherence principle: • John hid Bill’s car keys. He was drunk • ?? John hid Bill’s car keys. He likes spinach • Rhetorical Structure Theory (Mann, Matthiessen, and Thompson)
Example (from MMT) 1) Title: Bouquets in a basket - with living flowers 2) There is a gardening revolution going on. 3) People are planting flower baskets with living plants, 4) mixing many types in one container for a full summer of floral beauty. 5) To create your own "Victorian" bouquet of flowers, 6) choose varying shapes, sizes and forms, besides a variety of complementary colors. 7) Plants that grow tall should be surrounded by smaller ones and filled with others that tumble over the side of a hanging basket. 8) Leaf textures and colors will also be important. 9) There is the silver-white foliage of dusty miller, the feathery threads of lotus vine floating down from above, the deep greens, or chartreuse, even the widely varied foliage colors of the coleus. Christian Science Monitor, April, 1983