Prosody in Generation
Natural Language Generation (NLG) • Typical NLG system does • Text planning transforms communicative goal into sequence or structure of elementary goals • Sentence planning chooses linguistic resources to achieve those goals • Realization produces surface output
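A schematic sketch of this three-stage pipeline in Python; the stage functions, goal strings, and plan structure are placeholders for illustration, not any particular system's API.

```python
def text_planner(goal: str) -> list[str]:
    """Text planning: communicative goal -> sequence of elementary goals."""
    return ["confirm(destination=Boston)"]

def sentence_planner(elementary_goal: str) -> dict:
    """Sentence planning: choose linguistic resources for one goal."""
    return {"template": "You want to go to {place}.", "place": "Boston"}

def realizer(plan: dict) -> str:
    """Realization: produce the surface string from the sentence plan."""
    return plan["template"].format(place=plan["place"])

for g in text_planner("book_trip(Boston)"):
    print(realizer(sentence_planner(g)))   # -> "You want to go to Boston."
```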
Research Directions in NLG • Past focus • Hand-crafted rules inspired by small corpora • Very little evaluation • Monologue text generation • New directions • Large-scale corpus-based learning of system components • Evaluation important but how to do it still unclear • Spoken monologue and dialogue
Overview • Spoken NLG in Dialogue Systems • Text-to-Speech (TTS) vs. Concept-to-Speech (CTS) • Current Approaches to CTS • Hand-built systems • Corpus-based systems • NLG Evaluation • Open Questions
Importance of NLG in Dialogue Systems • Conveying information intonationally for conciseness and naturalness • System turns in dialogue systems can be shorter S: Did you say you want to go to Boston? S: (You want to go to) Boston H-H% • Not providing misinformation through misleading prosody S: (You want to go to) Boston L-L%
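A toy sketch of the idea: let the dialogue act drive the phrase-final boundary tone, so a confirmation request gets the questioning H-H% and a settled statement gets L-L%. The function names and act labels ("confirm", "assert") are illustrative, not from any of the systems discussed.

```python
def boundary_tone(dialogue_act: str) -> str:
    """Map a dialogue act to a ToBI phrase-final boundary tone."""
    if dialogue_act == "confirm":   # system asks the user to verify
        return "H-H%"
    return "L-L%"                   # default: settled, declarative contour

def render_turn(text: str, dialogue_act: str) -> str:
    return f"{text} {boundary_tone(dialogue_act)}"

print(render_turn("(You want to go to) Boston", "confirm"))  # ... H-H%
print(render_turn("(You want to go to) Boston", "assert"))   # ... L-L%
```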
• Silverman et al ‘93: Mimicking human prosody improves transcription accuracy in a reverse telephone directory task • Sanderman & Collier ‘97: Subjects were quicker to respond to ‘appropriately phrased’ ambiguous responses to questions in a monitoring task Q: How did I reserve a room? vs. Which facility did the hotel have? A: I reserved a room L-H% in the hotel with the fax. A: I reserved a room in the hotel L-H% with the fax.
Overview • Spoken NLG in Dialogue Systems • Text-to-Speech (TTS) vs. Concept-to-Speech (CTS) • Current Approaches to CTS • Hand-built systems • Corpus-based systems • NLG Evaluation • Open Questions
Prosodic Generation for TTS • Default prosodic assignment from simple text analysis • Hand-built rule-based system: hard to modify and adapt to new domains • Corpus-based approaches (Sproat et al ’92) • Train prosodic variation on large labeled corpora using machine learning techniques • Accent and phrasing decisions • Associate prosodic labels with simple features of transcripts
# of words in phrase • distance from beginning or end of phrase • orthography: punctuation, paragraphing • part of speech, constituent information • Apply learned rules to new text (a minimal sketch appears below) • Incremental improvements continue: • Adding higher-accuracy parsing (Koehn et al ‘00) • Collins ‘99 parser • More sophisticated learning algorithms (Schapire & Singer ‘00) • Better representations: tree-based? • Rules always impoverished • How to define a Gold Standard?
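A minimal sketch of the corpus-based approach, in the spirit of Sproat et al ‘92: train an off-the-shelf decision tree to predict accent placement from exactly the kind of simple transcript features listed above. The feature encoding and the toy training data are invented for illustration.

```python
from sklearn.tree import DecisionTreeClassifier

# One row per word: [words_in_phrase, dist_from_start, dist_from_end,
#                    is_content_word (1/0), follows_punctuation (1/0)]
X = [
    [5, 0, 4, 0, 1],   # "the"    -> unaccented
    [5, 1, 3, 1, 0],   # "train"  -> accented
    [5, 2, 2, 1, 0],   # "leaves" -> accented
    [5, 3, 1, 0, 0],   # "at"     -> unaccented
    [5, 4, 0, 1, 0],   # "noon"   -> accented
]
y = [0, 1, 1, 0, 1]    # 1 = pitch accent, 0 = no accent

clf = DecisionTreeClassifier(max_depth=3).fit(X, y)

# Apply the learned rules to a new word token:
print(clf.predict([[4, 3, 0, 1, 0]]))  # phrase-final content word -> [1]
```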
Spoken NLG • Decisions in Text-to-Speech (TTS) depend on syntax, information status, topic structure, … information explicitly available to NLG • Concept-to-Speech (CTS) systems should be able to specify “better” prosody: the system knows what it wants to say and can specify how • But … generating prosody for CTS isn’t so easy
Overview • Spoken NLG in Dialogue Systems • Text-to-Speech (TTS) vs. Concept-to-Speech (CTS) • Current Approaches to CTS • Hand-built systems • Corpus-based systems • NLG Evaluation • Open Questions
Relying upon Prior Research • MIMIC CTS (Nakatani & Chu-Carroll ‘00) • Use domain attribute/value distinction to drive phrasing and accent: critical information focussed Movie: October Sky Theatre: Hoboken Theatre Town: Hoboken • Attribute names and values always accented • Values set off by phrase boundaries • Information status conveyed by varying accent type (Pierrehumbert & Hirschberg ‘90) • Old (given) L* • Inferrable (by MIMIC, e.g. theatre name from town) L*+H
Key (to formulating valid query) L+H* • New H* • Marking Dialogue Acts • NotifyFailure: U: Where is “The Corrupter” playing in Cranford? S: “The Corrupter” [L+H*] is not [L+H*] playing in Cranford [L*+H]. • Other rules for logical connectives, clarification and confirmation subdialogues • Contrastive accent for semantic parallelism (Rooth ‘92, Pulman ‘97) used in GoalGetter and OVIS (Theune ‘99) The cat eats fish. The dog eats meat.
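A sketch of the MIMIC-style mapping from information status to accent type (following Pierrehumbert & Hirschberg ‘90, as summarized above); the function and status names are illustrative, not MIMIC’s actual interface.

```python
# Accent type by information status (Pierrehumbert & Hirschberg '90,
# as used in MIMIC per the slides above).
ACCENT_BY_STATUS = {
    "given":      "L*",     # old information
    "inferrable": "L*+H",   # e.g. theatre name inferred from town
    "key":        "L+H*",   # needed to formulate a valid query
    "new":        "H*",
}

def accent_word(word: str, status: str) -> str:
    return f"{word} [{ACCENT_BY_STATUS[status]}]"

# The NotifyFailure example: key items get L+H*, inferrable town L*+H.
print(accent_word("The Corrupter", "key"), "is",
      accent_word("not", "key"), "playing in",
      accent_word("Cranford", "inferrable"))
```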
But … many counterexamples • Association of prosody with many syntactic, semantic, and pragmatic concepts still an open question • Prosody generation from (past) observed regularities and assumptions: • Information can be ‘chunked’ usefully by phrasing for easier user understanding • But in many different ways • Information status can be conveyed by accent: • Contrastive information is accented? S: You want to go to L+H* Nijmegen, L+H* not Eindhoven.
Given information is deaccented? Speaker/hearer givenness U: I want to go to Nijmegen. S: You want to go to H* Nijmegen? • Intonational contours can convey speech acts, speaker beliefs: • Continuation rise can maintain the floor? S: I am going to get you the train information [L-H%]. • Backchanneling can be produced appropriately? S: Okay. Okay? Okaaay… Mhmm..
Wh and yes-no questions can be signaled appropriately? S: Where do you want to go. S: What is your passport number? • Discourse/topic structure can be signaled by varying pitch range, pausal duration, rate?
Overview • Spoken NLG in Dialogue Systems • Text-to-Speech (TTS) vs. Concept-to-Speech (CTS) • Current Approaches to CTS • Hand-built systems • Corpus-based systems • NLG Evaluation • Open Questions
MAGIC • Multimedia system for presenting cardiac patient data • Developed at Columbia by McKeown and colleagues in conjunction with Columbia Presbyterian Medical Center to automate post-operative status reporting for bypass patients • Uses mostly traditional NLG hand-developed components • Generate text, then annotate prosodically • Corpus-trained prosodic assignment component • Corpus: written and oral patient reports • 50 min multi-speaker, spontaneous + 11 min single-speaker, read • 1.24M-word text corpus of discharge summaries
Transcribed, ToBI labeled • Generator features labeled/extracted: • syntactic function • p.o.s. • semantic category • semantic ‘informativeness’ (rarity in corpus) • semantic constituent boundary location and length • salience • given/new • focus • theme/rheme • ‘importance’ • ‘unexpectedness’
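A sketch of the kind of per-constituent feature record such a corpus-trained component might consume; the field names paraphrase the list above and are not MAGIC’s actual schema.

```python
from dataclasses import dataclass

@dataclass
class CTSFeatures:
    syntactic_function: str   # e.g. "subject", "object"
    pos: str                  # part of speech
    semantic_category: str
    informativeness: float    # semantic 'informativeness' (rarity in corpus)
    boundary_dist: int        # semantic constituent boundary location
    salience: float
    given: bool               # given vs. new
    in_focus: bool
    theme: bool               # theme vs. rheme
    importance: float
    unexpectedness: float
```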
Very hard to label features • Results: new features to specify TTS prosody • Of CTS-specific features only semantic informativeness (likelihood of occurring in a corpus) useful so far (Pan & McKeown ‘99) • Looking at context, word collocation for accent placement helps predict accent (Pan & Hirschberg ‘00) RED CELL (less predictable) vs. BLOOD cell (more) Most predictable words are accented less frequently (40-46%) and least predictable more (73-80%) Unigram+bigram model predicts accent status w/ 77% (+/- .51) accuracy
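A toy sketch of the predictability idea from Pan & Hirschberg ‘00: estimate P(word | previous word) from bigram counts and accent the less predictable words. Counts and threshold are invented; the 77% figure above comes from a trained unigram+bigram model, not this toy.

```python
from collections import Counter

bigrams = Counter({("red", "cell"): 2, ("blood", "cell"): 40})
unigrams = Counter({"red": 5, "blood": 50, "cell": 60})

def predictability(prev: str, word: str) -> float:
    """P(word | prev) from raw bigram counts (no smoothing)."""
    return bigrams[(prev, word)] / unigrams[prev] if unigrams[prev] else 0.0

def accent(prev: str, word: str, threshold: float = 0.5) -> bool:
    # Less predictable words are accented more often: RED CELL vs. BLOOD cell
    return predictability(prev, word) < threshold

print(accent("red", "cell"))    # True:  unpredictable -> accent "CELL"
print(accent("blood", "cell"))  # False: predictable   -> deaccent "cell"
```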
Stochastic, Corpus-based NLG • Generate from a corpus rather than a hand-built system • For an MT task, Langkilde & Knight ‘98 over-generate from a traditional hand-built grammar • Output composed into lattice • Linear (bigram) language model chooses best path • But … • no guarantee of grammaticality • How to evaluate/improve results? • How to incorporate prosody into this kind of generation model?
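A minimal sketch of over-generate-and-rank: candidate realizations form a lattice and a bigram language model picks the best path. The lattice and LM scores are toy stand-ins for what Langkilde & Knight ‘98 derive from a hand-built grammar and a trained bigram model.

```python
import math

# lattice[state] = list of (edge label, next state); FINAL ends a path.
lattice = {
    0: [("poachers", 1), ("the poachers", 1)],
    1: [("now control", 2), ("control now", 2)],
    2: [("the underground trade", 3)],
}
FINAL = 3

def lm_score(prev: str, word: str) -> float:
    good = {("<s>", "poachers"), ("poachers", "now control")}
    return math.log(0.8) if (prev, word) in good else math.log(0.1)

def best_path(state: int, prev: str) -> tuple[float, list[str]]:
    if state == FINAL:
        return 0.0, []
    best = (-math.inf, [])      # exhaustive; a real system would use Viterbi
    for word, nxt in lattice[state]:
        score, tail = best_path(nxt, word)
        score += lm_score(prev, word)
        if score > best[0]:
            best = (score, [word] + tail)
    return best

print(" ".join(best_path(0, "<s>")[1]))
# -> "poachers now control the underground trade"
```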
FERGUS (Bangalore & Rambow ‘00) • Corpus-based learning to refine syntactic, lexical and prosodic choice • Domain is DARPA Communicator task (air travel information) • Uses stochastic tree model + linear LM + XTAG (hand-crafted) grammar • Trained on WSJ dependency trees tagged with p.o.s., morphological information, syntactic SuperTags (grammatical function, subcat frame, arg realization), WordNet sense tags and prosodic labels (accent and boundary)
Input: • Dependency tree of lexemes • Any feature can be specified, e.g. syntactic, prosodic [Figure: dependency tree rooted at control, with dependents poachers <L+H*>, now, and trade; trade dominates the and underground]
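A sketch of what such an input might look like as a data structure: a dependency tree of lexemes where any node can carry an extra constraint such as a forced accent. Class and field names are illustrative, not FERGUS’s actual format.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class DepNode:
    lexeme: str
    accent: Optional[str] = None              # e.g. "L+H*" forces an accent
    children: List["DepNode"] = field(default_factory=list)

# "now poachers control the underground trade", with L+H* on "poachers":
tree = DepNode("control", children=[
    DepNode("poachers", accent="L+H*"),
    DepNode("now"),
    DepNode("trade", children=[DepNode("the"), DepNode("underground")]),
])
```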
Tree Chooser: • Selects syntactic/prosodic properties for input nodes based on match with features of mothers and daughters in corpus [Figure: the same dependency tree, with properties chosen for each node]
Unraveler: • Produces lattice of all syntactically possible linearizations of tree using XTAG grammar [Figure: word lattice from start state s over now, poachers, control, the, underground, trade]
Linear Precedence Chooser: • Finds most likely lattice traversal, using trigram language model Now [H*] poachers [L+H*] [L-] control the underground trade [H*] [L-L%]. • Many ways to implement each step • How to choose which works ‘best’? • How to evaluate output?
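A sketch of the Linear Precedence step: score each syntactically licensed linearization with a trigram LM and keep the most likely one. The scoring table here is a toy stand-in for a model trained on the WSJ corpus.

```python
import math

def trigram_logprob(w1: str, w2: str, w3: str) -> float:
    seen = {("<s>", "now", "poachers"), ("now", "poachers", "control")}
    return math.log(0.6) if (w1, w2, w3) in seen else math.log(0.05)

def sentence_score(words: list[str]) -> float:
    padded = ["<s>", "<s>"] + words
    return sum(trigram_logprob(*padded[i:i + 3]) for i in range(len(words)))

candidates = [                       # linearizations licensed by the grammar
    "now poachers control the underground trade".split(),
    "poachers now the underground trade control".split(),
]
print(" ".join(max(candidates, key=sentence_score)))
```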
Overview • Spoken NLG in Dialogue Systems • Text-to-Speech (TTS) vs. Concept-to-Speech (CTS) • Current Approaches to CTS • Hand-built systems • Corpus-based systems • NLG Evaluation • Open Questions
Evaluating NLG • How to judge success/progress in NLG is an open question • Qualitative measures: preference • Quantitative measures: • task performance measures: speed, accuracy • automatic comparison to a reference corpus (e.g. string edit-distance and variants, tree-similarity-based metrics) • Not always a single “best” solution • Critical for stochastic systems to combine qualitative judgments with quantitative measures (Walker et al ’97)
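A sketch of one such quantitative measure: word-level string edit distance to a reference sentence, normalized into an accuracy score (the "string metrics" family referred to above).

```python
def edit_distance(a: list[str], b: list[str]) -> int:
    """Classic dynamic-programming Levenshtein distance over word tokens."""
    d = [[i + j if i * j == 0 else 0 for j in range(len(b) + 1)]
         for i in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + (a[i - 1] != b[j - 1]))
    return d[len(a)][len(b)]

def string_accuracy(hyp: str, ref: str) -> float:
    h, r = hyp.split(), ref.split()
    return 1.0 - edit_distance(h, r) / max(len(r), 1)

print(string_accuracy("now poachers control the underground trade",
                      "poachers now control the underground trade"))  # ~0.67
```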
Qualitative Validation of Quantitative Metrics • Subjects judged understandability and quality • Candidates proposed by 4 evaluation metrics to minimize distance from Gold Standard (Bangalore, Rambow & Whittaker ‘00) • Tree-based metrics correlate significantly with understandability and quality judgments -- string metrics do not • New objective metrics learned • Understandability accuracy = (1.31 * simple tree accuracy - .10 * substitutions - .44) / .87 • Quality accuracy = (1.02 * simple tree accuracy - .08 * substitutions - .35) / .67
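Applying the learned metrics is then simple arithmetic; a sketch using the coefficients quoted above, with invented input values.

```python
def understandability_acc(simple_tree_acc: float, substitutions: float) -> float:
    return (1.31 * simple_tree_acc - 0.10 * substitutions - 0.44) / 0.87

def quality_acc(simple_tree_acc: float, substitutions: float) -> float:
    return (1.02 * simple_tree_acc - 0.08 * substitutions - 0.35) / 0.67

print(understandability_acc(0.9, 0.1))  # ~0.84 for an invented candidate
print(quality_acc(0.9, 0.1))            # ~0.84
```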
Overview • Spoken NLG in Dialogue Systems • Text-to-Speech (TTS) vs. Concept-to-Speech (CTS) • Current Approaches to CTS • Hand-built systems • Corpus-based systems • NLG Evaluation • Open Questions
More Open Questions for Spoken NLG • How much to model the human original? • Planning for appropriate intonational variation is important even in recorded prompts • Timing and backchanneling • What kind of output is most comprehensible? • What kind of output elicits the most easily understood user response? (Gustafson et al ’97, Clark & Brennan ‘99) • Implementing variations in dialogue strategy • Implicit confirmation • Mixed initiative