Syntactic Aggregation in Bengali Text Generation

Sumit Das, Anupam Basu, Sudeshna Sarkar. Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur, India.


  1. Syntactic Aggregation in Bengali Text Generation
     Sumit Das, Anupam Basu, Sudeshna Sarkar
     Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur, India

  2. Overview
     • Introduction / Motivation
       • Role of aggregation in NLG
       • Language dependency of text aggregation
     • Our Work
       • Identification of prevalent syntactic aggregation constructs in Bengali through corpus analysis
       • Rule-based approach to perform syntactic aggregation
       • Evaluation of the proposed approach
     • Summary / Conclusions

  3. Introduction: Text Aggregation
     • Combines coherent simple text spans, removing repeated entities
     • Improves fluency, conciseness, and coherence
     • Preserves meaning
     • Example
       • Jack went up the hill. Jill went up the hill.
       • Jack and Jill went up the hill.

  4. Text Aggregation
     • There is no general consensus on the types of text aggregation.
     • Existing theories propose the following categories
       • Interpretive
       • Referential
       • Syntactic
       • Lexical
     • Performed either in the sentence planner or in the surface realizer, depending on application requirements

  5. Motivation
     • Syntactic aggregation is the most common form of text aggregation observed in real discourse
     • Simple linguistic components are combined in accordance with linguistic rules
     • The process is language dependent, so linguistic knowledge of the target language is required, e.g., preferred word ordering and special verb form usage

  6. Corpus Analysis
     • Narrative compound sentences were used to identify syntactic aggregation constructs in Bengali
     • The prevalent constructs are
       • Conjunction reduction
       • Right node raising (RNR)
       • Coordinating one constituent
       • Non-finite verb generation
     • Any combination of these constructs is allowed

  7. Theoretical Framework
     • Our work is grounded in the Rhetorical Structure Theory (RST) framework (Mann and Thompson, 1988)
     • RST uniformly captures semantic, intentional and textual features of a given text
     • Among the 23 RST relations discussed in the original theory, we consider the following
       • Conjunction
       • Disjunction
       • Contrast
       • Sequence
       • Parallel

  8. Semantic Representation
     • Elementary discourse units, e.g., sentences, are represented by recursive frame-based structures
     • Each frame captures the higher-level syntactic and functional information of a sentence
     • This information is represented as a set of attribute-value pairs
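A recursive frame of this kind could be sketched as nested attribute-value pairs. The attribute names used here (`verb`, `tense`, `polarity`, `args`, `role`, `head`) and the role label `ki` (what) are assumptions made for illustration; the slides do not list the actual attribute inventory.

```python
# Hypothetical clause frame for "rAma bhAta khAbe" (Ram will eat rice).
# All attribute names are illustrative, not taken from the slides.
clause = {
    "verb": "khA",             # verb root (assumed representation)
    "tense": "future",
    "polarity": "positive",
    "args": [                  # each argument is itself a frame (recursion)
        {"role": "ke", "head": "rAma"},    # who
        {"role": "ki", "head": "bhAta"},   # what (assumed role label)
    ],
}


def attribute_paths(frame, prefix=""):
    """Flatten a recursive frame into a list of (path, value) pairs."""
    pairs = []
    for key, value in frame.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            pairs.extend(attribute_paths(value, path + "."))
        elif isinstance(value, list):
            # Lists are assumed to hold sub-frames only.
            for i, item in enumerate(value):
                pairs.extend(attribute_paths(item, f"{path}[{i}]."))
        else:
            pairs.append((path, value))
    return pairs
```

Flattening is only a convenience for inspecting a frame; later steps would more naturally work on the nested form.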

  9. Problem Specification
     • Input: two simple clauses in their semantic representation, the rhetorical relation, and the discourse marker realizing the relation
     • Output: the surface form of a fluent, concise and coherent compound sentence

  10. Our Approach
      • Step 1: Ordering arguments in the constituent clauses
      • Step 2: Repeating entity identification
      • Step 3: Ordering constituent clauses
      • Step 4: Superfluous word identification and non-finite verb generation
      • Step 5: Correct surface-form generation

  11. Ordering arguments in the constituent clauses
      • The arg frames in the clause frames are ordered using a total order among the arg roles
      • The total order is derived from the Bengali compound sentences used in the corpus analysis, together with transitivity rules
      • Example
        • Ami AgAmIkAla bAbAra sAthe skule yAba.
        • Roles, in order: ke (who), kakhana (when), kAra sAthe (with whom), kothAYa (where)
      • The total order among the roles is
        • ke < kakhana < kAra sAthe < kothAYa
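The argument-ordering step can be sketched as a sort keyed on role rank. In this sketch the total order is rebuilt from pairwise precedences with a topological sort, which is one way to apply transitivity; the slides do not say how the authors derived it. The `observed` pairs and the `role`/`head` frame layout are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical pairwise precedences, as might be read off corpus sentences.
observed = [("ke", "kakhana"), ("kakhana", "kAra sAthe"), ("kAra sAthe", "kothAYa")]

# Build a predecessor graph; transitivity then yields the full order.
graph = {}
for before, after in observed:
    graph.setdefault(before, set())
    graph.setdefault(after, set()).add(before)

ROLE_ORDER = list(TopologicalSorter(graph).static_order())
RANK = {role: i for i, role in enumerate(ROLE_ORDER)}


def order_args(args):
    """Sort argument frames by the rank of their role."""
    return sorted(args, key=lambda arg: RANK[arg["role"]])


# Arguments of the slide's example, deliberately shuffled.
args = [
    {"role": "kothAYa", "head": "skule"},
    {"role": "ke", "head": "Ami"},
    {"role": "kAra sAthe", "head": "bAbAra sAthe"},
    {"role": "kakhana", "head": "AgAmIkAla"},
]
words = " ".join(arg["head"] for arg in order_args(args)) + " yAba"
print(words)
```

With a chain of precedences like this one the topological order is unique; with sparser observations several orders would be valid and a tie-breaking rule would be needed.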

  12. Repeating entity identification
      • Entities that are present in both input simple clauses with the same syntactic and semantic role are marked as REPEATING
      • [Figure: case-frame representation of "rAma eba.n shyAma bhAta khAbe"]
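As a minimal sketch of the marking step, two clauses can be compared argument by argument, flagging those that share both role and head. The frame layout and the role label `ki` (what) are assumptions; the actual case frames appear only in a figure that the transcript does not reproduce.

```python
def mark_repeating(first, second):
    """Flag arguments that occur in both clauses with the same role and head.

    Adds a boolean "repeating" attribute to every argument frame and
    returns the set of shared (role, head) pairs.
    """
    shared = (
        {(a["role"], a["head"]) for a in first["args"]}
        & {(a["role"], a["head"]) for a in second["args"]}
    )
    for clause in (first, second):
        for arg in clause["args"]:
            arg["repeating"] = (arg["role"], arg["head"]) in shared
    return shared


# "rAma bhAta khAbe" and "shyAma bhAta khAbe" in a hypothetical frame layout.
ram = {"verb": "khAbe", "args": [{"role": "ke", "head": "rAma"},
                                 {"role": "ki", "head": "bhAta"}]}
shyam = {"verb": "khAbe", "args": [{"role": "ke", "head": "shyAma"},
                                   {"role": "ki", "head": "bhAta"}]}
shared = mark_repeating(ram, shyam)
```

Here only `bhAta` is flagged; the two subjects differ, so they are candidates for coordination rather than deletion.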

  13. Ordering constituent clauses
      • Constituent clauses are reordered according to rules based on their chronological order and polarity
      • This increases the fluency and coherence of the generated compound sentences
      • Example
        • Ami bA.Di yAba. rAma skule gechhe. (before ordering)
        • (I shall go home. Ram has gone to school.)
        • rAma skule gechhe eba.n Ami bA.Di yAba. (after ordering)
        • (Ram has gone to school and I shall go home.)
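The chronological part of this step might look like the sketch below. The slides do not spell out the rules, so the tense ranking is an assumption chosen to reproduce the example, and polarity is left out entirely. The `tense` and `text` attributes are hypothetical.

```python
# Assumed ranking: earlier events first. Not taken from the slides.
TENSE_RANK = {"past": 0, "present perfect": 1, "present": 2, "future": 3}


def order_clauses(first, second):
    """Put the chronologically earlier clause first; ties keep input order."""
    return sorted([first, second], key=lambda clause: TENSE_RANK[clause["tense"]])


home = {"text": "Ami bA.Di yAba", "tense": "future"}
school = {"text": "rAma skule gechhe", "tense": "present perfect"}

ordered = order_clauses(home, school)
compound = " eba.n ".join(clause["text"] for clause in ordered)
print(compound)
```

Because Python's sort is stable, two clauses with the same tense stay in their input order, which seems a reasonable default when no rule applies.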

  14. Superfluous word identification and non-finite verb generation
      • Superfluous words are identified using the following two methods
        • Forward deletion: rAma gatakAla khAbAra kheYechhila eba.n rAma gatakAla skule giYechhila (Ram ate food yesterday and Ram went to school yesterday).
        • Backward deletion: rAma bhAta khAbe eba.n shyAma bhAta khAbe (Ram will eat rice and Shyam will eat rice).
      • For the Sequence and Parallel relations, the verb of the first clause is changed to its non-finite form after forward deletion
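The two deletion methods can be illustrated at the word level, under the usual reading that forward deletion removes repeated material from the second clause and backward deletion removes it from the first. This sketch simplifies heavily: it works on surface words rather than frames, treats only a shared prefix or suffix as superfluous, and does not generate non-finite verb forms. The function names are hypothetical.

```python
def common_prefix_len(a, b):
    """Number of leading items that two lists share."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n


def aggregate(first, second, marker="eba.n"):
    """Join two clauses, dropping shared leading and trailing words."""
    a, b = first.split(), second.split()

    # Forward deletion: drop the shared leading words from the second clause.
    prefix = common_prefix_len(a, b)
    b = b[prefix:]

    # Backward deletion: drop the shared trailing words from the first clause,
    # but never cut into its shared leading words.
    suffix = min(common_prefix_len(a[::-1], b[::-1]), len(a) - prefix)
    a = a[:len(a) - suffix]

    return " ".join(a + [marker] + b)


forward = aggregate("rAma gatakAla khAbAra kheYechhila",
                    "rAma gatakAla skule giYechhila")
backward = aggregate("rAma bhAta khAbe", "shyAma bhAta khAbe")
print(forward)
print(backward)
```

On the slide's examples this drops the second `rAma gatakAla` in the first pair and the first `bhAta khAbe` in the second pair. Identical clauses and repeated material in the middle of a clause are not handled.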

  15. Correct surface-form generation
      • Superfluous words are deleted from the surface form at this stage
      • In the case of subject coordination and RNR, the correct form of the common verb is generated
      • Example
        • Ami kAla skule yAba. tumi kAla skule yAbe.
        • (I shall go to school tomorrow. You will go to school tomorrow.)
        • Ami Ara tumi kAla skule yAba.
        • (You and I will go to school tomorrow.)
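Choosing the verb form for a coordinated subject could be sketched as follows. The resolution rule (first person outranks second, which outranks third) is an assumption that matches the slide's example, not a rule stated in the slides. The lookup table holds only the two forms attested in that example, and every name here is hypothetical.

```python
# Future forms of "to go" attested in the slide example, keyed by person.
FUTURE_FORMS = {1: "yAba", 2: "yAbe"}


def resolve_person(persons):
    """Assumed rule: the lowest person number among the conjuncts wins."""
    return min(persons)


def coordinate_subjects(subject_a, subject_b, rest, marker="Ara"):
    """Build a clause with a coordinated subject and an agreeing verb.

    Each subject is a (word, person) pair; `rest` holds the shared
    non-subject words that precede the verb.
    """
    person = resolve_person([subject_a[1], subject_b[1]])
    return f"{subject_a[0]} {marker} {subject_b[0]} {rest} {FUTURE_FORMS[person]}"


sentence = coordinate_subjects(("Ami", 1), ("tumi", 2), "kAla skule")
print(sentence)
```

A real realizer would need a morphological generator covering all persons, tenses and honorific levels rather than a two-entry table.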

  16. Evaluation
      • We developed a system that performs syntactic aggregation of two simple clauses following the steps described
      • Evaluating the system validates our approach
      • Because sufficient gold-standard data is not available, automatic evaluation techniques were not used
      • We performed a user-based evaluation

  17. Evaluation
      • Evaluation is based on the following two criteria:
        • Well-formedness
        • Faithfulness
      • 250 test sentences
      • Each output sentence was shown to 3 human experts
      • The experts were asked to score the outputs on a scale of 1 to 5

  18. Results: Well-formedness
      • [Chart not included in the transcript]

  19. Results: Faithfulness
      • [Chart not included in the transcript]

  20. Conclusions
      • Our approach uses rules to generate aggregated and elliptic sentences in Bengali from clause-sized semantic representations
      • The current system produces paratactic constructions and uses ellipsis to omit repeated entities
      • It handles all the syntactic aggregation constructs identified during the corpus analysis

  21. Future Scope
      • Anaphoric pronoun generation to preserve meaning and increase fluency
      • The discourse marker is currently supplied as input; the system can be extended to select an appropriate discourse marker
      • The system can also be extended to generate multi-sentential textual output
