NLG STEC Workshop April 20-21, 2007 Arlington, VA

Nancy Green, Univ. of North Carolina Greensboro, USA

NLG Pipeline Model & STEC
• Pro-STEC assumptions:
  • (All/most/worth-funding) NLG can be decomposed into well-defined, independent STEC modules such that improving each one will advance NLG
  • Input/output representation for STEC is non-controversial

NLG ‘Pipeline’ = Tip of Iceberg
[Figure: iceberg diagram. The NLG pipeline is the visible tip; beneath it lie Media/Presentation-related KR&R, Discourse KR&R, Domain Communication KR&R, and User Model KR&R]
• Who will pay for NLG research outside of the classical pipeline? It is essential empirical research and a major cost, but I am afraid it would fall outside of the STEC funding model.

Example NLG System KR&R
GenIE generates letters to genetics clinic patients. Its goal is to justify medical experts’ conclusions such that all arguments are comprehensible to a lay person.
• Discourse: argumentation
• Domain Communication: conceptual causal model underlying expert-lay communication (not the domain model)
• User Model: model of appraisal
• Media/Presentation: how presentation affects argument comprehension

Lesson from GenIE
• NLG pipeline = global control + sentence planning/realization
  • Can use existing surface realizers, a standard domain ontology, and lexical resources
• Main cost has been the KR&R modules, mainly empirical work:
  • Goal: find non-domain-specific principles/guidelines to optimize a lay audience’s comprehension of arguments
  • Corpus studies: very useful but not sufficient
  • Controlled studies: necessary, and we cannot afford to wait for other disciplines (HCI, learning sciences, etc.) to do them for us

GenIE Corpus Studies
• Intercoder reliability of the content annotation scheme: used to justify the domain communication model
• Argumentation schemes (non-domain-specific, both normative and affective)
• Stylistic (lexical/syntactic) features of author perspective
• Argument presentation features (order, cue words, explicitness)

GenIE Controlled Studies
• How multimedia layout and cross-media cue words affect comprehension
• How argument presentation (explicit vs. implied claim, cue words) affects recognition of argument components (Claim vs. Data) and of the dependence of the final claim on intermediate claims


STEC Input/Output Problem
• Different input representations are needed for different types of output; e.g., compare the requirements for:
  • Fixed-format text (the original scope of NLG)
  • Task-appropriate, user-friendly text format (e.g., line length, paragraphing, headings, font)
  • Text and (reported or quoted) dialogue in a story
  • Dialogue spoken by an animated, emoting conversational agent
  • Integrated text and images or data graphics
  • Text referring to physical or visual properties of the presentation (‘The red line in Fig. 2 shows sales in 2002.’)

Big Challenges
Empirical research to test computation-oriented, general theories, principles, and guidelines to answer:
• What makes a “text” (i.e., including spoken dialogue, MMPs, etc.)
  • Coherent? In story dialogue, believable?
  • User-friendly? Task-appropriate?
  • Comprehensible? Pedagogically effective?
  • Entertaining (suspenseful, funny, etc.)?

Ex. Challenges (cont.)
• How does the channel change the answer?
  • E.g., HCI research: we cannot assume that findings for paper apply to the computer screen
• How does length change the answer?
  • E.g., learning sciences: a 300-word summary vs. a 3-page science argument for middle school
• How do individual differences matter?
  • E.g., cognitive impairments, affect

Conclusions
• Need some NLG research with a massively interdisciplinary view: cognitive science, communication studies, etc.
• Need some NLG research motivated by the search for answers to general questions such as those above
• Will the STEC approach effectively kill the above kind of NLG research?
