SEGUE: a Hybrid Case-Based Surface Natural Language Generator Shimei Pan and James Shaw IBM T.J. Watson Research Center
Overview of the Talk • Motivation • Video demonstration • System overview • The hybrid algorithm • Phase 1: Case-based retrieval • Phase 2: Rule-based adaptation • Phase 3: Learning • Evaluation • Related Work • Conclusion
Motivation • Small training corpus to enable reuse • High accuracy for conversational systems • Extensible: easy to increase coverage • Variety in output • Efficient in execution
SEGUE (Spoken English Generation Using Examples) • Hybrid • Case-based retrieval • for extensibility, variety, and speed • Rule-based adaptation • for reuse and high accuracy
SEGUE Overview [Figure: the input target semantic graph (a House node with numBedroom, numBathroom, style, and yearBuilt) is matched against the sentence corpus; the retrieved sentence is adapted by substitution, deletion, and insertion, and the result, “This 3 bedroom, 2 bathroom colonial house was built in 1890.”, is passed to TTS.]
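Multi-level Adaptation • Input: No. bedrooms: 3; No. baths: 2; Style: Colonial; Year: 1890 • Exact match: the sentence corpus contains “This new home was sold for 500K.” and “This 3 bedroom, 2 bathroom colonial home was built in 1890.”; the second sentence is output unchanged • Substitution: the corpus contains “This new home was sold for 500K.” and “This 2 bedroom, 1 bathroom ranch home was built in 1990.”; substituting the input values into the second sentence yields the output • Rule-based deletion and insertion: the corpus contains “This apartment was built in 1983.” and “This 2 bedroom, 1 bathroom colonial home was sold for 500k.”; rule-based deletion and insertion adapt these into the output • Output: This 3 bedroom, 2 bathroom colonial home was built in 1890.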
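The three adaptation levels can be told apart by comparing the content of the input with that of a corpus sentence. The following Python sketch is a hypothetical simplification, not SEGUE's implementation: it assumes each SemGraph can be flattened into a dictionary of attribute values for a single house, and the attribute names (including salePrice) are illustrative.

```python
# Hypothetical simplification: a SemGraph flattened into {attribute: value}
# for one house. This is not SEGUE's actual data structure.

def adaptation_level(target: dict, case: dict) -> str:
    """Return the adaptation level needed to turn a corpus case into the target."""
    if target == case:
        return "exact match"          # reuse the corpus sentence unchanged
    if target.keys() == case.keys():
        return "substitution"         # same attributes, some values differ
    return "deletion/insertion"       # attributes must be removed or added


target = {"numBedroom": 3, "numBathroom": 2, "style": "colonial", "yearBuilt": 1890}
ranch = {"numBedroom": 2, "numBathroom": 1, "style": "ranch", "yearBuilt": 1990}
sold = {"numBedroom": 2, "numBathroom": 1, "style": "colonial", "salePrice": "500k"}

print(adaptation_level(target, dict(target)))  # exact match
print(adaptation_level(target, ranch))         # substitution
print(adaptation_level(target, sold))          # deletion/insertion
```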
Phase One: Adaptation-Guided Retrieval • Given a new SemGraph, identify a ranked list of similar examples • Compute a similarity measure • All-pairs comparison of propositions between SemGraph_target and SemGraph_corpus • Create a list of adaptation operators • Operator costs: Substitute $ • Delete $$ • Insert $$$
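A minimal sketch of the all-pairs comparison, assuming propositions are encoded as (entity, attribute, value) triples and that the $ < $$ < $$$ ordering maps to illustrative costs of 1, 2, and 3. The function names and cost values are hypothetical; the slides give only the relative ordering.

```python
# Assumed costs reflecting the slide's ordering $ < $$ < $$$;
# SEGUE's actual cost values are not stated.
OPERATOR_COST = {"substitute": 1, "delete": 2, "insert": 3}


def adaptation_operators(target, corpus):
    """All-pairs comparison of (entity, attribute, value) propositions.

    Returns the operators that would turn the corpus SemGraph into the target.
    """
    ops = []
    matched = [False] * len(corpus)
    for t in target:
        for i, c in enumerate(corpus):                 # compare every target/corpus pair
            if not matched[i] and t[:2] == c[:2]:      # same entity and attribute
                matched[i] = True
                if t[2] != c[2]:
                    ops.append(("substitute", t))
                break
        else:                                          # no corpus proposition matched t
            ops.append(("insert", t))
    ops.extend(("delete", c) for i, c in enumerate(corpus) if not matched[i])
    return ops


def adaptation_cost(ops):
    return sum(OPERATOR_COST[name] for name, _ in ops)


target = [("h1", "numBedroom", 3), ("h1", "numBathroom", 2),
          ("h1", "style", "colonial"), ("h1", "yearBuilt", 1890)]
ranch = [("h1", "numBedroom", 2), ("h1", "numBathroom", 1),
         ("h1", "style", "ranch"), ("h1", "yearBuilt", 1990)]
apartment = [("h1", "yearBuilt", 1983)]

# Rank candidate cases by total adaptation cost (cheapest first).
ranked = sorted([apartment, ranch],
                key=lambda case: adaptation_cost(adaptation_operators(target, case)))
print(ranked[0] is ranch)  # True: 4 substitutions (cost 4) vs 3 insertions + 1 substitution (cost 10)
```

Under these assumed costs, a case that differs only in values outranks one that needs several insertions, which matches the intent of preferring cheap operators.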
Ranking Retrieved Cases • SemGraph features: feature similarity • Speech act, theme/rheme, etc., e.g., “Can you tell me about this colonial house?” ≠ “The style of this house is colonial.” • Adaptability (adaptation-guided retrieval) • Operator cost, which also captures semantic similarity • Sentence structure, e.g., SemGraph_target: The {1995} house is a {Colonial}. SemGraph_corpus: The {1995} {Colonial} house is {in Ardsley}. Deleting {in Ardsley} yields the ReaTree *The 1995 Colonial house.
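One way to combine these ranking criteria is a lexicographic sort key. The sketch below assumes, hypothetically, that speech-act agreement dominates, then whether the case's main predicate would have to be deleted (as in the Ardsley example), then operator cost. The Case fields, the cost weights, and the Tudor sentence are illustrative and not taken from SEGUE.

```python
from dataclasses import dataclass


@dataclass
class Case:
    sentence: str
    speech_act: str        # e.g. "assertion" or "question"
    props: dict            # attribute -> value (hypothetical flattening)
    main_predicate: str    # attribute realized as the main predicate (assumed annotation)


def operator_cost(target: dict, props: dict) -> int:
    """Substitute = 1, delete = 2, insert = 3 (assumed relative costs)."""
    subs = sum(1 for a in target.keys() & props.keys() if target[a] != props[a])
    deletes = len(props.keys() - target.keys())
    inserts = len(target.keys() - props.keys())
    return subs + 2 * deletes + 3 * inserts


def rank_key(target: dict, target_act: str, case: Case):
    # Assumed lexicographic order: features, then structure, then cost.
    return (
        case.speech_act != target_act,       # speech acts should agree
        case.main_predicate not in target,   # deleting the main predicate breaks the sentence
        operator_cost(target, case.props),
    )


target = {"yearBuilt": 1995, "style": "Colonial"}   # "The 1995 house is a Colonial."
cases = [
    Case("Can you tell me about this colonial house?", "question",
         {"style": "Colonial"}, "style"),
    Case("The 1995 Colonial house is in Ardsley.", "assertion",
         {"yearBuilt": 1995, "style": "Colonial", "city": "Ardsley"}, "city"),
    Case("The 2001 house is a Tudor.", "assertion",
         {"yearBuilt": 2001, "style": "Tudor"}, "style"),
]
for case in sorted(cases, key=lambda c: rank_key(target, "assertion", c)):
    print(case.sentence)
```

Both assertions cost 2 here, so the structural criterion is what prefers the Tudor sentence over the Ardsley one, whose main predicate would be deleted.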
Phase Two: Rule-based Adaptation • Adaptation Operators • Substitute • Delete • Insert
Substitute • Correct minor differences between SemGraph_target and SemGraph_corpus • Examples: 2 houses → 1 house; golf course → park
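Substitution can be pictured as a recursive walk over the realization tree that overwrites the words linked to changed propositions. The sketch assumes a hypothetical Node type whose words record the proposition they realize. It also uses a deliberately crude number-agreement rule so that 2 houses becomes 1 house; a real generator would need a proper morphology component.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical ReaTree node: a head word plus pre- and postmodifiers."""
    word: str
    prop: Optional[str] = None                 # proposition this word realizes, if any
    pre: list = field(default_factory=list)
    post: list = field(default_factory=list)


def substitute(node: Node, values: dict) -> None:
    """Overwrite, in place, every word whose proposition has a new value."""
    if node.prop in values:
        node.word = str(values[node.prop])
    for child in node.pre + node.post:
        substitute(child, values)


def realize(node: Node) -> str:
    head = node.word
    # Crude agreement rule (an assumption): a numeric premodifier other than 1 pluralizes the head.
    counts = [m.word for m in node.pre if m.word.isdigit()]
    if counts and counts[0] != "1":
        head += "s"
    return " ".join([realize(m) for m in node.pre] + [head] + [realize(m) for m in node.post])


tree = Node("house", pre=[Node("2", prop="count")],
            post=[Node("near", post=[Node("golf course", prop="amenity", pre=[Node("a")])])])
print(realize(tree))                               # 2 houses near a golf course
substitute(tree, {"count": 1, "amenity": "park"})
print(realize(tree))                               # 1 house near a park
```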
Delete • Remove propositions that do not exist in SemGraph_target • A reverse-aggregation process • Hypotactic • Remove modifying phrase structures by recursive traversal • Paratactic • Delete or shift the phrase structure and conjunctor (A, B, and C → A and B) • Adaptation-guided retrieval ensures the soundness of the main sentence structure
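Paratactic deletion can be sketched as filtering a coordinated list and re-realizing it, so that the commas and the conjunctor shift onto whichever conjuncts survive. The names delete_paratactic and realize_conjunction are hypothetical, and the labels pA, pB, and pC stand in for real SemGraph propositions.

```python
def realize_conjunction(phrases):
    """Realize phrases as 'A', 'A and B', or 'A, B, and C'."""
    if len(phrases) <= 1:
        return "".join(phrases)
    if len(phrases) == 2:
        return f"{phrases[0]} and {phrases[1]}"
    return ", ".join(phrases[:-1]) + ", and " + phrases[-1]


def delete_paratactic(conjuncts, target_props):
    """Drop conjuncts whose proposition is absent from the target, then
    re-realize so the commas and the conjunctor move to the survivors."""
    kept = [phrase for prop, phrase in conjuncts if prop in target_props]
    return realize_conjunction(kept)


conjuncts = [("pA", "A"), ("pB", "B"), ("pC", "C")]
print(realize_conjunction([p for _, p in conjuncts]))  # A, B, and C
print(delete_paratactic(conjuncts, {"pA", "pB"}))      # A and B
```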
Insert • Insert propositions that exist in SemGraph_target but not in SemGraph_corpus • Incorporate phrases from various instances • Two types of aggregation operators • Paratactic • Hypotactic • Paratactic operators are applied first because they have more restrictive preconditions
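The ordering rule can be expressed as a dispatcher that tries paratactic operators before hypotactic ones and applies the first whose precondition holds. Everything here is an illustrative assumption. The tree is reduced to a dictionary from attributes to phrases. The paratactic precondition simply checks that the attribute is already realized. The hypotactic operator is a permissive fallback.

```python
from typing import Callable, NamedTuple


class Operator(NamedTuple):
    name: str
    kind: str                                   # "paratactic" or "hypotactic"
    precondition: Callable[[dict, tuple], bool]
    apply: Callable[[dict, tuple], None]


def insert_proposition(tree: dict, prop: tuple, operators: list):
    """Try paratactic operators first; apply the first whose precondition holds."""
    ordered = sorted(operators, key=lambda op: op.kind != "paratactic")  # stable sort
    for op in ordered:
        if op.precondition(tree, prop):
            op.apply(tree, prop)
            return op.name
    return None


# Hypothetical operators over a tree mapping attribute -> list of phrases.
conjunction = Operator("conjunction", "paratactic",
                       lambda tree, prop: prop[0] in tree,     # restrictive
                       lambda tree, prop: tree[prop[0]].append(prop[1]))
modifier = Operator("modifier", "hypotactic",
                    lambda tree, prop: True,                   # permissive fallback
                    lambda tree, prop: tree.setdefault(prop[0], []).append(prop[1]))

tree = {"schoolDistrict": ["Lakeland School District"]}
ops = [modifier, conjunction]    # listed hypotactic-first on purpose
print(insert_proposition(tree, ("schoolDistrict", "Panas School District"), ops))  # conjunction
print(insert_proposition(tree, ("lotSize", "0.2 acre"), ops))                      # modifier
```

Trying the restrictive operator first keeps the permissive fallback from claiming propositions that could have been conjoined.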
Insert (Paratactic) • SEGUE currently supports • Quantification “3 houses are Colonials. 1 house is a Tudor.” • Simple Conjunction “The names of the school districts are Lakeland School District and Panas School District.”
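A minimal sketch of simple conjunction, assuming a hypothetical equative clause whose subject and verb are re-inflected when a second value is conjoined. The EquativeClause type and its naive plural-by-adding-s rule are simplifications chosen to reproduce the school-district example, not SEGUE's mechanism.

```python
from dataclasses import dataclass, field


def conjoin(items):
    if len(items) == 1:
        return items[0]
    if len(items) == 2:
        return f"{items[0]} and {items[1]}"
    return ", ".join(items[:-1]) + ", and " + items[-1]


@dataclass
class EquativeClause:
    """Hypothetical clause 'The <head> of the <owner> is <values>'."""
    attribute: str
    head: str
    owner: str
    values: list = field(default_factory=list)

    def realize(self) -> str:
        plural = len(self.values) > 1
        s = "s" if plural else ""              # naive pluralization (assumption)
        verb = "are" if plural else "is"
        return f"The {self.head}{s} of the {self.owner}{s} {verb} {conjoin(self.values)}."


def insert_conjunct(clause: EquativeClause, prop: tuple) -> bool:
    """Simple conjunction applies only when the new proposition shares the clause's attribute."""
    attribute, value = prop
    if attribute != clause.attribute:
        return False
    clause.values.append(value)
    return True


clause = EquativeClause("schoolDistrict", "name", "school district", ["Lakeland School District"])
print(clause.realize())  # The name of the school district is Lakeland School District.
insert_conjunct(clause, ("schoolDistrict", "Panas School District"))
print(clause.realize())  # The names of the school districts are Lakeland School District and Panas School District.
```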
Insert (Hypotactic) • Hypotactic aggregation in SEGUE: a two-step procedure • Extract all the phrases expressing the new proposition • Remember the heads they modify • Note whether each is a premodifier or a postmodifier • Attach the substituted phrase to the head constituent being modified • Aggregation in traditional systems • Transform the new proposition into a modifying constituent through a complex lexical process • Attach the transformed phrase to the head constituent being modified (the same as in SEGUE)
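The two-step SEGUE procedure can be sketched with a flat noun-phrase structure. Step 1 searches another case for the phrase expressing the new proposition and records its head and pre/post position. Step 2 attaches that phrase to a matching head in the target. The NP type and function names are hypothetical, and any value substitution inside the extracted phrase is omitted for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class NP:
    """Hypothetical noun phrase: (proposition, phrase) pairs around a head noun."""
    head: str
    pre: list = field(default_factory=list)
    post: list = field(default_factory=list)

    def realize(self) -> str:
        return " ".join([p for _, p in self.pre] + [self.head] + [p for _, p in self.post])


def extract_modifier(source: NP, prop: str):
    """Step 1: find the phrase expressing prop; remember its head and position."""
    for position in ("pre", "post"):
        for p, phrase in getattr(source, position):
            if p == prop:
                return source.head, position, phrase
    return None


def attach_modifier(target: NP, prop: str, extracted) -> bool:
    """Step 2: attach the phrase to a target head of the same kind, in the same position."""
    if extracted is None:
        return False
    head, position, phrase = extracted
    if head != target.head:
        return False
    getattr(target, position).append((prop, phrase))
    return True


target = NP("house", pre=[("style", "Colonial")])
source = NP("house", pre=[("numBedroom", "3-bedroom")],
            post=[("lotSize", "with 0.2 acre of land")])
found = extract_modifier(source, "lotSize")
attach_modifier(target, "lotSize", found)
print(target.realize())  # Colonial house with 0.2 acre of land
```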
Phase Three: Learning • Non-trivial SemGraph_target graphs and their adapted ReaTrees are added to a temporary repository • After manual verification, the “learned” cases are added to the repository • SEGUE learns from past experience and becomes faster and more accurate over time • Fewer chances to make mistakes, because SEGUE does not always start from scratch for complex sentences
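The learning loop can be sketched as a repository with a temporary holding area. The class below is hypothetical. Its criterion for a non-trivial adaptation (at least one deletion or insertion) is an assumption, since the slides do not define the term, and ReaTrees are represented by placeholder strings.

```python
class CaseRepository:
    """Hypothetical case store with a temporary area for unverified results."""

    def __init__(self):
        self.cases = []     # verified (semgraph, reatree) pairs used for retrieval
        self.pending = []   # adapted results awaiting manual verification

    def record(self, semgraph, reatree, operators_applied):
        # Assumed "non-trivial" criterion: at least one deletion or insertion was needed.
        if any(op in ("delete", "insert") for op in operators_applied):
            self.pending.append((semgraph, reatree))

    def verify(self, index, approved):
        """Manual verification: approved cases become retrievable; rejected ones are dropped."""
        entry = self.pending.pop(index)
        if approved:
            self.cases.append(entry)


repo = CaseRepository()
repo.record(frozenset({("style", "Colonial")}), "tree-A", ["substitute"])
repo.record(frozenset({("style", "Colonial"), ("lotSize", "0.2 acre")}),
            "tree-B", ["substitute", "insert"])
print(len(repo.pending))  # 1
repo.verify(0, approved=True)
print(len(repo.cases))    # 1
```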
Evaluation • Corpus • Assertive sentences related to houses • Houses have 20 main attributes, e.g., asking price, property tax, city location, school district • 100 sentences, generated from SemGraphs randomly selected among 21,699 synthesized SemGraphs containing 1 to 5 propositions • Two judges manually evaluated each sentence
Evaluation (2) • Result • Major grammatical/pragmatic errors involved: • Multiple sentences being generated • The lack of a referring expression module, e.g., “I found 3 Colonial houses with 0.2 acre of land. The house has an asphalt roof.” • Baseline: direct template matching, with 210 sentences in the training corpus
Evaluation (3) • Major grammatical/pragmatic errors • Sentences that are too long, e.g., ?The house in a city with population less than 4000 in a school district with over 95 percent of seniors attending college has 3 bedrooms and 3 bathrooms. • Incorrect referring expressions, e.g., *I found 3 houses with 3 bathrooms in cities with population less than 4000. The house has a crawlspace. • Inadequate adaptation operators, e.g., ?A Colonial 1995 house; ?I found 2 2-bedroom houses
Related Work • Rule-based NLG • Robin93, Lavoie97, Shaw98 • Statistical NLG • Langkilde&Knight98 • Bangalore&Rambow00 • Ratnaparkhi00 • Instance-based NLG • Varges&Mellish01 • Example-based Machine Translation (EBMT) • Brown99, Somers99, Somers01
Significant Distinctions from Other Corpus-based Approaches • The required corpus is much smaller, but more richly annotated • Ranking is performed first, not last • The adaptation operations always guarantee grammatical correctness • SEGUE does not always generate from scratch
Conclusion • A case-based NL generator with high accuracy that requires only a small annotated corpus • Rule-based adaptation guarantees that generated sentences are grammatically correct (within a single sentence) • First to incorporate adaptation-guided retrieval into a case-based NLG system • Handles paraphrasing and idioms naturally • Becomes faster and more accurate as solutions accumulate