
Share and Share Alike: Resources for Language Generation


Presentation Transcript


  1. Share and Share Alike: Resources for Language Generation Prof. Marilyn Walker, University of Sheffield NSF, 20 April 2007

  2. What type of resource is needed for generation? • What type of scientific problem is generation? • An essential difference between language generation and language interpretation problems (parsing, WSD, relation extraction, coreference) is that there is no single right answer for language generation; • Language Productivity Assumption: An optimal generation resource will represent multiple outputs for each input, with a human-generated quality metric associated with each output

  3. Dialogue vs. generation? • Dialogue is like generation in that there is no single right answer for how to do a task in dialogue; • Information gathering and information presentation in dialogue systems are generation problems; • DARPA evaluations for dialogue systems used a fixed domain, "TRAVEL PLANNING"; • First: the ATIS evaluations compared dialogue system behaviour against human behaviour in a corpus of human-wizard dialogues (Hirschman 2000); • This left no room for mixed initiative, different dialogue strategies, divergence of context, or user modeling;

  4. Dialogue vs. generation? • Second: define the context, then evaluate the system's response to a user utterance in that particular context; • This is much more like generation: the context is defined and the system's 'communicative goal' is defined; • Form: how is 'the same response' defined? Some forms of identical content may be better than others; • Content: depends on user models and definitions of context; the dialogue system should also be able to decide on its communicative goal.

  5. Dialogue vs. generation? • Third: the Communicator evaluation: given a user task (NYC to LHR, Continental, April 22nd, 2007), collect metrics (time to completion, ASR error, utterance output quality, concept understanding, user satisfaction); • The corpus was semi-automatically labelled with dialogue acts (quality/strategy metrics) for system utterances (8 or more different instantiations, from different systems, for particular communicative goals); • Try to understand which metrics are contributors to user satisfaction (PARADISE; see the regression sketch below); • User utterances were labelled subsequently and used in RL experiments comparing dialogue strategies; • It was hard to compare particular scientific techniques for particular modules in systems; plug and play never worked
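
PARADISE models user satisfaction as a linear combination of normalized task-success and cost metrics, so that each metric's fitted weight estimates its contribution to satisfaction. A minimal sketch of that style of analysis, assuming entirely hypothetical dialogue data and illustrative metric names:

```python
import numpy as np

# Rows = dialogues; columns = [task_success, time_to_completion_s,
# asr_error_rate] -- illustrative metrics and made-up values only.
metrics = np.array([
    [0.9, 120.0, 0.05],
    [0.6, 300.0, 0.20],
    [0.8, 180.0, 0.10],
    [0.4, 420.0, 0.35],
])
satisfaction = np.array([4.5, 2.5, 4.0, 1.5])  # e.g. mean survey score

# Z-score each metric so the fitted weights are comparable across metrics.
z = (metrics - metrics.mean(axis=0)) / metrics.std(axis=0)

# Least-squares fit with an intercept column; each weight estimates the
# metric's contribution to user satisfaction.
X = np.hstack([np.ones((len(z), 1)), z])
weights, *_ = np.linalg.lstsq(X, satisfaction, rcond=None)
print(dict(zip(["intercept", "task_success", "time", "asr_error"], weights)))
```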

  6. Dialogue vs. generation: Conclusions? • Just having a fixed task (TRAVEL) by itself does not necessarily lead to scientific progress; • We want to compare particular scientific techniques for particular modules in systems; • Plug and play is the only way to do this; • BUT: it is very hard to define, for a whole community, what the interfaces between modules should be

  7. Position • What type of resources would be useful for scientific advancement in language generation? • Almost anything! • "If you build it they will come" / "If it's useful, people will use it" • Can we leverage what we already have in our own research groups, share it, and make it better?

  8. What is needed to incentivize data sharing? • Many different domains/problems/modules => NEED LOTS OF DIFFERENT RESOURCES; • Resources are costly (the developing group is not 'finished' yet) => FINANCIAL INCENTIVE; SCIENTIFIC INCENTIVE; CITATION INCENTIVE; • It costs too much to support resource preparation, maintenance, distribution and re-use => NSF/LDC FINANCIAL/SUPPORT • NOTE: MANY LDC RESOURCES ARE "FOUND DATA" (not explicitly commissioned)

  9. A proposal for one shared resource

  10. Information presentation of one or more database entities • Natural language interfaces / SDS (McKeown 85, McCoy 89, cooperative response literature, Carenini & Moore 01, Polifroni et al. 03, CoGenTex w/ active buyers website, Walker et al. 04, Demberg & Moore 06, etc.) • Different communicative goals: Summarize, Recommend, Compare, Describe (DB entities) • Representation not controversial: attributes and values for DB entities, relations between entity and attribute (see the sketch below) • Application not dependent on NLU
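
As a concrete illustration of that uncontroversial representation, here is a minimal sketch of a DB entity as attribute-value pairs; the restaurant and its attributes come from the Recommend example on slide 13, while the class name is purely illustrative:

```python
from dataclasses import dataclass, field

@dataclass
class DBEntity:
    """A database entity as a flat set of attribute-value pairs."""
    name: str
    attributes: dict[str, str] = field(default_factory=dict)

# Attribute values taken from the Recommend content plan on slide 13.
babbo = DBEntity("Babbo", {
    "foodquality": "superb",
    "decor": "excellent",
    "service": "excellent",
})
```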

  11. What type of resource is needed for generation? • What type of scientific problem is generation? • An essential difference between language generation and language interpretation problems (parsing, WSD, relation extraction, coreference) is that there is no single right answer for language generation; • Language Productivity Assumption: An optimal generation resource will represent multiple outputs for each input, with a human-generated quality metric associated with each output

  12. We could make available a resource of: • INPUT-1: Speech Act, SET of DB entities • SUMMARIZE(SET); DESCRIBE(ENTITY); RECOMMEND(ENTITY, SET); COMPARE(SET) • INPUT-2: user model, discourse/dialogue context, style parameters, etc. • OUTPUT-1: a set of alternative outputs, possibly with TTS markup • OUTPUT-2: human-generated ratings or rankings for the outputs, oriented to the criteria specified by INPUT-2 (see the sketch below)
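
A minimal sketch of what one record in such a resource might look like; only the INPUT/OUTPUT structure comes from the slide, and all field and type names are hypothetical:

```python
from dataclasses import dataclass, field
from enum import Enum

class SpeechAct(Enum):
    SUMMARIZE = "summarize"  # SUMMARIZE(SET)
    DESCRIBE = "describe"    # DESCRIBE(ENTITY)
    RECOMMEND = "recommend"  # RECOMMEND(ENTITY, SET)
    COMPARE = "compare"      # COMPARE(SET)

@dataclass
class ResourceItem:
    # INPUT-1: a speech act over a set of DB entities
    act: SpeechAct
    entities: list[dict]  # attribute-value pairs per entity
    # INPUT-2: user model, discourse/dialogue context, style parameters
    user_model: dict = field(default_factory=dict)
    context: dict = field(default_factory=dict)
    style: dict = field(default_factory=dict)
    # OUTPUT-1: alternative realizations, optionally with TTS markup
    outputs: list[str] = field(default_factory=list)
    # OUTPUT-2: one human rating per output, oriented to INPUT-2 criteria
    ratings: list[float] = field(default_factory=list)
```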

  13. A Content Plan for a Recommend • strategy: recommend • relations: justify(nuc:1, sat:2); justify(nuc:1, sat:3); justify(nuc:1, sat:4) • content: 1. assert(best(Babbo)) 2. assert(has-att(Babbo, foodquality(superb))) 3. assert(has-att(Babbo, decor(excellent))) 4. assert(has-att(Babbo, service(excellent))) (see the sketch below)
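
The plan above can be stored as plain data; a sketch follows, using illustrative field names rather than any particular system's actual plan format:

```python
# The Recommend content plan from this slide as a simple structure.
plan = {
    "strategy": "recommend",
    "content": {
        1: "assert(best(Babbo))",
        2: "assert(has-att(Babbo, foodquality(superb)))",
        3: "assert(has-att(Babbo, decor(excellent)))",
        4: "assert(has-att(Babbo, service(excellent)))",
    },
    # Each justify relation links the nucleus claim (1) to one satellite.
    "relations": [("justify", 1, 2), ("justify", 1, 3), ("justify", 1, 4)],
}
```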

  14. Human Feedback for Ranking • The ratings can represent any metric associated with the possible response, e.g. coherence, information quality, social appropriateness, personality. • Informational coherence: • SPARKY, a generator for MATCH • SPOT, a generator for AT&T COMMUNICATOR • Users are shown response variants and then told: • For each variant, please rate to what extent you agree with this statement: • "The utterance is easy to understand, well-formed and appropriate to the dialogue context." (See the ranking sketch below.)
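
A minimal sketch of how such agreement ratings could be turned into a ranking over response variants; the scores and variant names are hypothetical:

```python
from statistics import mean

# One 1-5 agreement score per human judge for each response variant.
ratings = {
    "variant-a": [5, 4, 5],
    "variant-b": [3, 3, 2],
    "variant-c": [4, 5, 4],
}

# Rank variants by mean rating, best first.
ranking = sorted(ratings, key=lambda v: mean(ratings[v]), reverse=True)
print(ranking)  # ['variant-a', 'variant-c', 'variant-b']
```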

  15. Examples: Learned Rules applied to test fold

  16. Individual Differences (Sentence Planning Preferences)

  17. Human Feedback for Ranking (2) • Ten-Item Personality Inventory (TIPI) questionnaire (Gosling et al. 2003) • PERSONAGE • Users are shown response variants and then told: • For each variant, rate on a scale of 1 to 7 whether: • The speaker is quiet, reserved; • The speaker is enthusiastic (see the scoring sketch below)
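
Under the standard TIPI convention, each trait score is the mean of one positively keyed item and one reverse-scored item on a 1-7 scale, where the reverse score is 8 minus the rating. A sketch for the two extraversion items on this slide (the function name is illustrative):

```python
def extraversion(enthusiastic: int, quiet_reserved: int) -> float:
    """TIPI-style extraversion: mean of the positively keyed item
    ("enthusiastic") and the reverse-scored item ("quiet, reserved")."""
    return (enthusiastic + (8 - quiet_reserved)) / 2

# A rater who agrees the speaker is enthusiastic (6) and disagrees
# that the speaker is quiet, reserved (2):
print(extraversion(enthusiastic=6, quiet_reserved=2))  # 6.0
```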

  18. Personality judgments: 'Recommend Le Marais'

  19. What else is out there? • COCONUT corpus: referring expression generation, but add alternatives and ratings? • Boston Directions Corpus (NSF-funded, early 1990s) • Communicator corpus (8 different system outputs for dialogue contexts that can be characterized) • Tools: HALogen, Penman, FUF/SURGE, RealPro • A library of text plans, content plans, sentence planners?
