Introduction to Natural Language Generation
Yael Netzer, Department of Computer Science, Ben Gurion University
Outline
• Introduction – what is NLG
• Traditional architecture of an NLG system
• Statistical methods in NLG
• FUF/SURGE
• An example in Hebrew – the noun phrase
• A statistical method for generation
What is Natural Language Generation (NLG)?
NLG is the process of constructing natural language outputs from non-linguistic inputs. [VanLinden]
NLG is mapping some communicative goal to some surface utterance that satisfies the goal. [Reiter & Dale]
Aspects of NLG: theoretical and practical interests
• Theoretical: modeling various depths of human language representation and production.
• Practical: engineering human/computer interfaces (the computer as an author or an authoring aid).
Example systems:
• NLG as an author:
• Weather reports (FoG)
• Stock market descriptions
• Museum artifact descriptions (ILEX)
• “Personal” letters to customers (AlethGen)
• NLG as an author aid
• Integrated (partial) NLG uses:
• NLG in augmentative and alternative communication
• Summarization (integrating ‘cut and paste’ techniques with generation)
• Machine Translation (generation from an interlingua)
Inputs of NLG systems
Formally, a system can be defined as a four-tuple {k, c, u, d}:
• k – knowledge source (tables of numbers, a knowledge representation language); domain dependent, with no generalizations across domains.
• c – communicative goal: what a given execution of the system is intended to achieve (conveying the appropriate information).
NLG input spec., cont.
• u – user model: a characterization of the hearer or intended audience for whom the text is to be generated.
• d – discourse history: previous interactions between the user and the NLG system; it controls anaphoric forms and prevents repetitions.
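As a rough illustration only, the four-tuple could be represented as a plain data structure; the class and field names below are hypothetical and not taken from any particular system.

from dataclasses import dataclass, field
from typing import Any

@dataclass
class NLGInput:
    """Hypothetical container for the {k, c, u, d} input of an NLG system."""
    knowledge: Any                                          # k: domain data (tables, KR expressions)
    goal: str                                               # c: communicative goal of this run
    user_model: dict = field(default_factory=dict)          # u: characterization of the audience
    discourse_history: list = field(default_factory=list)   # d: previous interactions

# Example: report today's weather to a lay reader, with no prior dialogue.
spec = NLGInput(
    knowledge={"temp_c": 31, "sky": "clear"},
    goal="inform(weather(today))",
    user_model={"expertise": "layperson"},
    discourse_history=[],
)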
The output of an NLG system
Any text conveying the communicative goal: it can be a single word like "yes" in a dialogue, or a text of many paragraphs in other cases.
The output should also fit the medium: web pages with hyperlinks, a voice stream, etc.
Main (Pipeline) Architecture
• Content determination – what information should be included in the text?
• Document structuring – how to organize the text.
• Lexicalisation – choosing particular words or phrases.
• Aggregation – composing chunks of information into sentences.
• Referring expression generation – what properties should be used to refer to an entity?
• Surface realization – mapping the underlying content onto grammatically correct sentences that express the desired meaning.
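A minimal sketch of the pipeline as a chain of function calls; every stage below is a trivial stub standing in for the modules listed above, so the names and signatures are assumptions, not any system's actual API.

# Each stage is a placeholder so the skeleton runs end to end; a real system
# replaces every stub with domain- and language-specific logic.
def determine_content(knowledge, goal):   return [knowledge]          # data -> messages
def structure_document(messages):         return messages             # messages -> document plan
def aggregate(plan):                      return plan                 # merge related messages
def lexicalise(plan, user_model):         return plan                 # choose words and phrases
def generate_references(plan, history):   return plan                 # choose referring forms
def realise(sentence_plan):               return str(sentence_plan)   # plan -> surface string

def generate(knowledge, goal, user_model, history):
    """Run the classic NLG pipeline stages in order."""
    messages = determine_content(knowledge, goal)
    plan = structure_document(messages)
    plan = aggregate(plan)
    plan = lexicalise(plan, user_model)
    plan = generate_references(plan, history)
    return [realise(sentence) for sentence in plan]

print(generate({"temp_c": 31}, "inform(weather)", {}, []))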
Content Determination
• The process of deciding what to say.
• No general rules – domain specific.
• What is important: what should always be included, what counts as exceptional information, etc.
• Practically, it constructs a set of messages from the underlying data (entities, concepts, and relations).
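A toy content-determination rule for a hypothetical weather domain, just to make the idea concrete; the thresholds and message types are invented.

# Hypothetical content determination for a weather report: keep only what the
# domain deems noteworthy and package it as messages (concepts and relations).
def determine_content(day_record, monthly_average):
    messages = []
    diff = day_record["temp_c"] - monthly_average["temp_c"]
    if abs(diff) >= 3:                              # exceptional: always report big anomalies
        messages.append({"type": "temperature_anomaly",
                         "direction": "above" if diff > 0 else "below",
                         "amount_c": abs(diff)})
    if day_record["rain_mm"] == 0:                  # domain rule: dry days are always mentioned
        messages.append({"type": "no_rain"})
    return messages

print(determine_content({"temp_c": 35, "rain_mm": 0}, {"temp_c": 30}))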
Document Structuring
Imposing ordering and structure over the information:
• conceptual grouping
• rhetorical relationships
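A sketch of the same idea in code, under the assumption of a fixed ordering schema and a single rhetorical relation; both are invented for illustration.

# Hypothetical document structuring: order messages by a fixed schema and attach
# a rhetorical relation ("elaboration") to every non-initial message.
SCHEMA_ORDER = ["temperature_anomaly", "no_rain"]

def structure_document(messages):
    ordered = sorted(messages, key=lambda m: SCHEMA_ORDER.index(m["type"]))
    plan = [{"nucleus": ordered[0], "relation": None}]
    plan += [{"nucleus": m, "relation": "elaboration"} for m in ordered[1:]]
    return plan

msgs = [{"type": "no_rain"},
        {"type": "temperature_anomaly", "direction": "above", "amount_c": 5}]
print(structure_document(msgs))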
Lexical choice
The lexical chooser:
• determines the particular words to be used to express concepts and relations.
• trades off complexity of coding vs. a richer language.
• chooses content words: information is mapped from the conceptual vocabulary.
• should supply a variety of words, consider the user model (e.g., a precise vs. a general description of a weather phenomenon), and account for pragmatic considerations (formal vs. casual style).
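A toy lexical chooser keyed on the concept and the user model; the lexicon entries are invented examples in the spirit of the weather case above.

# Hypothetical lexical choice: the same concept maps to different content words
# depending on the user model (expert vs. layperson).
LEXICON = {
    ("precipitation_light", "expert"):    "drizzle",
    ("precipitation_light", "layperson"): "light rain",
    ("wind_strong", "expert"):            "gale-force winds",
    ("wind_strong", "layperson"):         "very strong winds",
}

def choose_lexeme(concept, user_model):
    expertise = user_model.get("expertise", "layperson")
    # Fall back to a generic wording when the lexicon has no specific entry.
    return LEXICON.get((concept, expertise), concept.replace("_", " "))

print(choose_lexeme("precipitation_light", {"expertise": "expert"}))      # drizzle
print(choose_lexeme("precipitation_light", {"expertise": "layperson"}))   # light rain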
Aggregation
Aggregation can be performed at various stages:
• In the planner: combining similar data.
• In lexicalization: aggregating several concepts into one lexical element.
• Aggregation of sentences, e.g.:
The month was cooler than average. The month was drier than average.
→ The month was cooler and drier than average.
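A sketch of the sentence-level case from the example above; the message format is invented.

# Hypothetical aggregation of two comparison messages that share a topic and a
# standard of comparison, e.g. "cooler than average" + "drier than average"
# -> "cooler and drier than average".
def aggregate_comparisons(messages):
    if (len(messages) == 2
            and all(m["type"] == "comparison" for m in messages)
            and messages[0]["topic"] == messages[1]["topic"]
            and messages[0]["standard"] == messages[1]["standard"]):
        merged = {"type": "comparison",
                  "topic": messages[0]["topic"],
                  "standard": messages[0]["standard"],
                  "predicates": [messages[0]["predicate"], messages[1]["predicate"]]}
        return [merged]
    return messages

msgs = [{"type": "comparison", "topic": "the month", "predicate": "cooler", "standard": "average"},
        {"type": "comparison", "topic": "the month", "predicate": "drier",  "standard": "average"}]
print(aggregate_comparisons(msgs))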
Referring Expression Generation
• An entity can be referred to in many ways: initial vs. subsequent mention, distinguishing descriptions, definite descriptions, pronouns.
• Proper names:
באר שבע
באר שבע בית הנגב
• Definite descriptions:
The train that leaves at 10am
The next train
• Pronouns:
it
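A toy decision rule for choosing among these forms based on the discourse history; the entity record and its fields are invented.

# Hypothetical referring-expression choice: pronoun if the entity was mentioned
# last, a short definite description if it is already in the discourse,
# otherwise the full description on first mention.
def refer(entity, history):
    if history and history[-1] == entity["id"]:
        return entity["pronoun"]              # e.g. "it"
    if entity["id"] in history:
        return entity["short_description"]    # e.g. "the next train"
    return entity["long_description"]         # e.g. "the train that leaves at 10am"

train = {"id": "train10",
         "pronoun": "it",
         "short_description": "the next train",
         "long_description": "the train that leaves at 10am"}
print(refer(train, []))                # first mention
print(refer(train, ["train10"]))       # just mentioned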
Syntactic Realizer
Syntax and morphology.
• The most general, domain-independent component (but definitely language dependent).
• Various usage scenarios.
• The input to syntactic realization is not directly observable (unlike the text a parser receives).
• Input for syntactic realizers in NLG:
• What knowledge is needed to prepare the input?
• Who supplies this knowledge?
• Can we find a common abstraction across languages and applications?
Possible techniques for realizers
• Bi-directional grammar specification
• Grammar specifications tuned for generation
• Templates
• Corpus statistics
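As a point of comparison for the grammar-based approaches discussed next, here is what the template technique amounts to; the template text is invented and continues the weather example.

# A minimal template realizer: the syntax is frozen in the template string and
# only the slots vary, which is why templates trade flexibility for simplicity.
import string

TEMPLATE = string.Template("The month was $predicates than $standard.")

def realise_with_template(message):
    return TEMPLATE.substitute(predicates=" and ".join(message["predicates"]),
                               standard=message["standard"])

print(realise_with_template({"predicates": ["cooler", "drier"], "standard": "average"}))
# -> The month was cooler and drier than average.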
A note on bi-directional grammars
• Realization is, in some respects, easier than parsing: no need to handle the full range of syntax a human might use, no need to resolve ambiguities, no need to recover from ill-formed input.
• A bi-directional grammar is, in theory, an elegant approach.
• However, most NLG systems use a generation-oriented grammar.
Why not bi-directional?
• The output of an NLU parser is very different from the input to an NLG realizer.
• It is not obvious that lexicalization is part of realization.
• In practice, it is not easy to engineer large bi-directional grammars.
• And more: generation is a process of making choices, including the choice to use ‘canned text’ when needed.
Syntactic Realizer
• This work concerns syntactic realizers – the grammar.
• Input to the grammar: a lexicalized representation of a phrase, at various levels of abstraction.
• Output of the grammar: a grammatical string representing the information in the input as accurately as possible.
The input question: what does the input to the syntactic realizer look like?
[Diagram: the application, a content planner and lexicon, and a knowledge base feed the syntactic realizer.]
FUF/SURGE – Implementation
• The grammar is written in FUF – the Functional Unification Formalism [Elhadad].
• An FD is a list of (att val) pairs, where val = atom | fd | path.
• The grammar is a meta-FD: disjunction with ALT; control with NONE, GIVEN, ANY.
• All components of the generation process can be implemented with this formalism.
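FUF itself is a Lisp-based formalism; the sketch below only mimics its core idea in Python, representing an FD as a nested dict and unifying two FDs structurally. Paths, ALT disjunctions, and the NONE/GIVEN/ANY controls are deliberately omitted.

# Toy FD unification: an FD is a nested dict of attribute-value pairs; values
# are atoms or embedded FDs (paths are not modelled here).
FAIL = object()

def unify(fd1, fd2):
    if fd1 == fd2:
        return fd1
    if isinstance(fd1, dict) and isinstance(fd2, dict):
        result = dict(fd1)
        for att, val in fd2.items():
            if att in result:
                merged = unify(result[att], val)
                if merged is FAIL:
                    return FAIL
                result[att] = merged
            else:
                result[att] = val
        return result
    return FAIL   # two distinct atoms (or an atom vs. an FD) do not unify

input_fd   = {"cat": "clause", "process": {"lex": "kiss"}}
grammar_fd = {"cat": "clause", "process": {"voice": "active"}}
print(unify(input_fd, grammar_fd))
# -> {'cat': 'clause', 'process': {'lex': 'kiss', 'voice': 'active'}}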
Requirements for a syntactic realizer
• Mapping the thematic structure onto syntactic roles.
• Control of syntactic paraphrasing and alternations.
• Provision of defaults for syntactic features.
• Propagation of agreement features.
• Selection of closed-class words.
• Imposition of linear precedence constraints.
• Inflection of open-class words.
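Two of these requirements, agreement propagation and open-class inflection, illustrated with a toy English example; this is ad-hoc morphology for illustration only, not SURGE's actual mechanism.

# Toy illustration: copy agreement features from subject to verb, then inflect
# the (open-class) verb; closed-class words and precedence are not handled here.
def propagate_agreement(clause):
    clause["verb"]["number"] = clause["subject"]["number"]
    clause["verb"]["person"] = clause["subject"]["person"]
    return clause

def inflect_verb(verb):
    lex = verb["lex"]
    if verb["tense"] == "past":
        return lex + "d" if lex.endswith("e") else lex + "ed"
    if verb["number"] == "singular" and verb["person"] == 3:
        return lex + "s"
    return lex

clause = {"subject": {"lex": "girl", "number": "singular", "person": 3},
          "verb": {"lex": "kiss", "tense": "present"}}
print(inflect_verb(propagate_agreement(clause)["verb"]))   # -> "kisses"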
SURGE [Elhadad & Robin 96]
• Based on Functional Grammar, HPSG, and descriptive studies of language.
• The input to the grammar is a lexicalized representation of a phrase (a clause, an NP, an AP).
• Keeping syntactic information in the input minimal isolates the earlier stages of the process from purely syntactic knowledge, gives the grammar its paraphrasing power, and is also useful for multilingual applications.
Input for SURGE in general
• Each constituent has the feature cat, which determines which part of the grammar it will be unified with.
• The representation of a clause is mostly semantic: a process (in SFL terms) and its participants. Paraphrasing can be done using a single feature, such as focus.
• The input for an NP uses mostly syntactic features.
• Different paraphrases require different inputs.
An Example
The girl was kissed by John. / John kissed the girl.
((cat clause)
 (tense past)
 (process ((type material)
           (agentless no)
           (lex "kiss")))
 (participants ((agent ((cat proper) (lex "John")))
                (affected ((cat common) (lex "girl"))))))
Adding (focus {partic affected}) to the top-level FD selects the passive paraphrase.