Explore the pragmatic influences on sentence planning and surface realization in generation tasks, proposing a Wiki for evaluation. Utilize shared resources, tasks, and framework to improve comparative evaluation and address contextual challenges.
Pragmatic Influences on Sentence Planning and Surface Realization: Implications for Evaluation • Amanda Stent, Stony Brook University
Things I’ve Noticed • Different basic approaches • Text-to-text vs. KR-to-text • Corpus linguistics vs. systems engineering vs. empirical research • Evaluation vs. experimentation
Three Possibilities • Shared evaluation resources • Data • Metrics • Tools • Shared evaluation task(s) • Shouldn’t be unnecessarily limiting • Shared evaluation framework • Encompasses/organizes both tasks and resources
Evaluation Framework • Three dimensions • Discourse type • Summaries, explanations… • Application • Tutoring, Q/A, dialog… • Generation task • Proposal is for a Wiki on generation evaluation • McKeown, Walker, Green, Viethen, Gatt • McKeown, Walker, Reiter, Rus et al., Byron et al. • Cf. Mellish and Scott, Paris et al.
Uses of Framework • Facilitate discussion • What is generation? • Where do certain generation tasks take place? • Set up shared workspace • Wiki • Focus choice of shared tasks • Choose initial shared tasks from discourse/application/task triples where there is already data and/or multiple implementations
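The selection rule on this slide (pick triples that already have data and/or multiple implementations) can be sketched in a few lines. This is only an illustration, not part of the original proposal: the `FrameworkEntry` class, its field names, and the sample entries are all hypothetical, and the data and implementation counts are invented placeholders rather than a survey of real systems.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameworkEntry:
    """One discourse/application/task triple (hypothetical structure)."""
    discourse_type: str
    application: str
    generation_task: str
    has_data: bool = False      # is shared data already available?
    implementations: int = 0    # number of known implementations


def candidate_shared_tasks(entries):
    """Keep triples with existing data and/or more than one implementation."""
    return [e for e in entries if e.has_data or e.implementations > 1]


# Placeholder entries for illustration only; the flags and counts are made up.
entries = [
    FrameworkEntry("summary", "Q/A", "surface realization", has_data=True, implementations=3),
    FrameworkEntry("explanation", "tutoring", "sentence planning", has_data=False, implementations=1),
    FrameworkEntry("explanation", "dialog", "RE generation", has_data=False, implementations=2),
]

selected = candidate_shared_tasks(entries)
for e in selected:
    print(e.discourse_type, "/", e.application, "/", e.generation_task)
```

Entries that fail the filter are not discarded in spirit: they are exactly the triples a shared framework would flag as understudied.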
Generation in Context • Context is used in several generation tasks/applications • User modeling – content selection, sentence planning • Topic/focus – RE generation, sentence planning, surface realization • Style – surface realization, multimedia generation • KR is a huge issue • CALO • Rudnicky, Di Fabbrizio et al.
Generation in Context • Existing evaluation metrics measure (to some extent) fluency and adequacy (Stent et al. 05) • Context affects both fluency and adequacy • But existing automatic metrics (for surface realization) do not take context into account • Cf. human evaluation methods like those used in (Walker et al. 02)
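To make the point about context-blind metrics concrete, here is a minimal sketch of an n-gram overlap score. It is a deliberately simplified stand-in, not any specific published metric and not the metrics studied in (Stent et al. 05); the function names are hypothetical. The two sentences are the parallelism pair used in the examples in these slides. Note that the scoring function takes only a candidate and a reference, so whatever sentence precedes them in the discourse cannot influence the score.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_precision(candidate, reference, n=2):
    """Fraction of candidate n-grams that also occur in the reference.

    Simplified illustration: whitespace tokenization, lowercasing,
    clipped counts, no brevity penalty, and no access to context.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total


reference = "The June increase compared with a rise of 10.5% in May from a year earlier"
candidate = "The June increase compared with a rise in May of 10.5% from a year earlier"

score = ngram_precision(candidate, reference)
print(f"bigram precision: {score:.3f}")
```

Because the preceding sentence ("... rose 13.2% in June from a year earlier") never enters the computation, the score cannot reward the variant that parallels it or penalize the one that does not.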
Generation in Context -- Examples • Parallelism/Awkwardness • Italy’s industrial wholesale sales index rose 13.2% in June from a year earlier • The June increase compared with a rise of 10.5% in May from a year earlier • The June increase compared with a rise in May of 10.5% from a year earlier (Zhong and Stent)
Generation in Context -- Examples • Unwanted implications • He didn’t start it but Mohandas Gandhi certainly provided a recognizable beginning to non-violent civil disobedience as we know it today • The mahatma instigated several campaigns of passive resistance against the British government in India • The mahatma instigated several campaigns against the British government in India of passive resistance (Zhong and Stent) • An explosion was reported in a shopping center in central Israel Monday, and paramedics said there were many casualties • A detonation was described in a shopping eye in central Israel Monday, and paramedics said there were many injured parties. (Barzilay and Lee)
Generation in Context -- Examples • Unwarranted assertions, loss of meaning • Three Israeli soldiers were killed when a Palestinian suicide bomber blew himself up at a West Bank Jewish settlement Sunday, while two Palestinian men were killed in a gunbattle with Israeli troops to the north in Nablus. • Police spokeswoman said 3 people were killed by bomber at a West Bank Jewish settlement attack on Sunday. (Barzilay and Lee)
Generation in Context -- Examples • Unhelpful use of discourse cues • Chanpen Thai has the best overall quality among the selected restaurants because it is a Thai restaurant…. (Walker et al.) • Poor choice of referring expression • Sally gave John the box. She gave the doll to Sam, and she gave the box to him. (constructed based on examples seen in dialog systems)
Generation in Context -- Examples • ‘Gold standard’ / human data inadequate/disfluent/awkward • The Pad-Thai was soooo not good and so kinda concluded that the other entree's we ordered wouldn't be well executed (Restaurant reviews on Yelp.com) • a colleague of mine at work got some information over the computer network called internet (Switchboard) • a person lives across the street from me brought her home from work because a coworker of hers had this dog appear on its front doorstep (Switchboard)
What I Want: I • Shared resources • Facilitate comparative evaluation • The more KR info the better • Shared framework • Identify understudied areas • Shared tasks • Divide the load of human evaluation • Facilitate comparative evaluation
What I Want: II • Context • Represent local and global context • Info about representations less important than shared representations • Sign me up for the virtual worlds! • Users available • Text-to-text and DB-to-text • One-off and interactive • (Partially) solve KR, context issues • CONTROL for studies