
Cascading XSL filters for content selection in multilingual document generation

G. Barrutieta, J. Abaitua & J. Díaz (DELi). COLING 2002 W8: NLP XML, Sept. 1st, 2002.


Presentation Transcript


  1. Cascading XSL filters for content selection in multilingual document generation G. Barrutieta, J. Abaitua & J. Díaz (DELi) COLING 2002 W8: NLP XML Sept. 1st, 2002

  2. Introduction – System overview

  3. Introduction – Corpus
     • Multilingual parallel corpus or master document
     • Gross-grained RST to represent the gross-grained discourse structure
     • An XML DTD to represent the gross-grained RST digitally:
         • Text: data, placed between tags
         • Discourse structure: metadata, encoded as XML tags
     • Gross-grained RST provides the framework for an isomorphic multilingual corpus.

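  4. Introduction: Multilingual parallel corpus with gross-grained RST in XML (EN / ES / EU: English, Spanish and Basque versions)

  5. Introduction: Multilingual parallel corpus with gross-grained RST in XML (EN / ES / EU: English, Spanish and Basque versions)

  6. Introduction: Multilingual parallel corpus with gross-grained RST in XML (EN / ES / EU: English, Spanish and Basque versions)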

  7. Introduction – User Aspects

  8. CSA – Parallel selection

  9. CSA – Horizontal filtering

  10. CSA – Vertical filtering

  11. CSA – Vertical filtering: Level of expertise
      If level_expertise = “null” or level_expertise = “basic” Then no relation-satellite is discarded;
      If level_expertise = “medium” or level_expertise = “high” Then discard the example, exercise, background and preparation relation-satellites;
      Rationale for the rule: a user with a null or basic level of expertise in the selected subject will need all the available information to understand the text. By contrast, a user with a medium or high level of expertise will not require examples, exercises, background, preparation or similar relation-satellites.
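The level-of-expertise rule can be sketched in JavaScript as follows. This is a minimal illustration, not the authors' implementation: it assumes a hypothetical document model in which the text is a flat array of segments, a nucleus has `relation === null`, and each relation-satellite carries its relation name. The function names `expertiseDiscards` and `applyFilter` are also hypothetical.

```javascript
// Sketch only: documents are modelled as a flat array of segments.
// Assumption: a nucleus has relation === null; a relation-satellite carries
// its RST relation name (e.g. "example"). All names here are hypothetical.
function expertiseDiscards(levelExpertise) {
  if (levelExpertise === "medium" || levelExpertise === "high") {
    return new Set(["example", "exercise", "background", "preparation"]);
  }
  // "null" or "basic" (and, in this sketch, any other value): discard nothing
  return new Set();
}

function applyFilter(segments, discard) {
  // Nuclei are always kept; a satellite is kept unless its relation is discarded.
  return segments.filter((s) => s.relation === null || !discard.has(s.relation));
}

const doc = [
  { relation: null, text: "Core explanation." },
  { relation: "example", text: "A worked example." },
  { relation: "elaboration", text: "Extra detail." },
];

const expertView = applyFilter(doc, expertiseDiscards("high"));
console.log(expertView.map((s) => s.text)); // [ 'Core explanation.', 'Extra detail.' ]
```

Only the example satellite disappears for a "high" user; the elaboration is untouched because this rule does not target it.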

  12. CSA – Vertical filtering: Reason to read
      If reason_to_read = “to get an idea” Then discard the exercise and elaboration relation-satellites (all types of elaboration: textual elaboration, link elaboration and image elaboration);
      If reason_to_read = “to get deep into it” Then no relation-satellite is discarded;
      Rationale: a user wishing to broaden their knowledge of the selected subject will need the additional information. Conversely, a user who just wants to get an idea does not need exercises, elaborations or similar relation-satellites, which often demand a more active role from the user.
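Because this rule removes every kind of elaboration, it is convenient to express it as a predicate rather than a fixed list. The sketch below assumes, purely for illustration, that the three elaboration subtypes are encoded as the hypothetical relation names `elaboration-textual`, `elaboration-link` and `elaboration-image`; the segment model (nuclei have `relation === null`) is likewise an assumption.

```javascript
// Sketch of the "reason to read" filter.
// Assumption: elaboration subtypes use hypothetical relation names
// "elaboration-textual", "elaboration-link" and "elaboration-image".
function reasonToReadDiscards(reasonToRead) {
  if (reasonToRead === "to get an idea") {
    // exercise plus every kind of elaboration
    return (relation) => relation === "exercise" || relation.startsWith("elaboration");
  }
  // "to get deep into it": nothing is discarded
  return () => false;
}

function applyFilter(segments, isDiscarded) {
  // Nuclei (relation === null) always survive.
  return segments.filter((s) => s.relation === null || !isDiscarded(s.relation));
}

const doc = [
  { relation: null, text: "Core explanation." },
  { relation: "elaboration-image", text: "Diagram." },
  { relation: "exercise", text: "Try it yourself." },
  { relation: "background", text: "Some history." },
];

const skim = applyFilter(doc, reasonToReadDiscards("to get an idea"));
console.log(skim.map((s) => s.relation)); // [ null, 'background' ]
```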

  13. CSA – Vertical filtering: Professional background
      If job_studies = “not related subject” Then no relation-satellite is discarded;
      If job_studies = “related subject” Then discard the background and preparation relation-satellites;
      Rationale: a user whose professional background is not related to the subject will need all the additional supporting text to understand its meaning. Conversely, if the user's background is related to the selected subject, we may assume that background, preparation and similar relation-satellites are unnecessary.
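One way to keep such rules easy to change is to store each one as data: a table from answer to the relation-satellites it discards. The sketch below does this for the professional-background rule. The table layout, `discardsFor` and the segment model (nuclei have `relation === null`) are hypothetical illustrations, not taken from the original system.

```javascript
// Sketch: the "professional background" rule stored as data.
// Assumption: each answer maps to the relation-satellites it discards,
// so supporting a new answer means editing one table entry.
const jobStudiesRule = {
  "not related subject": [],
  "related subject": ["background", "preparation"],
};

function discardsFor(rule, value) {
  if (!Object.prototype.hasOwnProperty.call(rule, value)) {
    throw new Error(`unknown value: ${value}`);
  }
  return new Set(rule[value]);
}

function applyFilter(segments, discard) {
  // Nuclei (relation === null) are never discarded.
  return segments.filter((s) => s.relation === null || !discard.has(s.relation));
}

const doc = [
  { relation: "preparation", text: "What you need before starting." },
  { relation: null, text: "Core explanation." },
  { relation: "background", text: "Historical context." },
];

for (const value of Object.keys(jobStudiesRule)) {
  const kept = applyFilter(doc, discardsFor(jobStudiesRule, value));
  console.log(`${value}: ${kept.length} segment(s) kept`);
}
// not related subject: 3 segment(s) kept
// related subject: 1 segment(s) kept
```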

  14. CSA – Vertical filtering: Opinion or motivation
      If opinion_motivation = “against” or opinion_motivation = “without an opinion or motivation” Then no relation-satellite is discarded;
      If opinion_motivation = “in favour” Then discard the motivate, antithesis, concession and justify relation-satellites;
      Rationale: a motivated or favourable user does not need further motivation. The motivate, antithesis, concession, justify and similar relation-satellites are therefore disregarded, since their role is to bring the user round in favour of the course material.
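In the opinion-or-motivation rule, two answers share one outcome, which maps naturally onto fall-through `case` labels. The sketch below is hypothetical: `opinionDiscards`, the `PERSUASIVE` set and the segment model (nuclei have `relation === null`) are assumptions made for illustration.

```javascript
// Sketch of the "opinion or motivation" filter (segment model is hypothetical):
// nuclei have relation === null, satellites carry their relation name.
const PERSUASIVE = new Set(["motivate", "antithesis", "concession", "justify"]);

function opinionDiscards(opinionMotivation) {
  switch (opinionMotivation) {
    case "against":
    case "without an opinion or motivation":
      return new Set(); // the user may still need persuading
    case "in favour":
      return PERSUASIVE; // already convinced: drop persuasive satellites
    default:
      throw new Error(`unknown opinion_motivation value: ${opinionMotivation}`);
  }
}

function applyFilter(segments, discard) {
  // Nuclei are always kept; satellites are kept unless their relation is discarded.
  return segments.filter((s) => s.relation === null || !discard.has(s.relation));
}

const doc = [
  { relation: "motivate", text: "Why this course matters." },
  { relation: null, text: "Core explanation." },
  { relation: "concession", text: "Admittedly, it takes effort." },
  { relation: "example", text: "A worked example." },
];

console.log(applyFilter(doc, opinionDiscards("in favour")).map((s) => s.relation));
// [ null, 'example' ]
```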

  15. CSA – Vertical filtering: Time available
      If time_available = “a little bit of time” Then discard all the relation-satellites;
      If time_available = “quite some time” Then discard the exercise relation-satellites;
      If time_available = “enough time” Then no relation-satellite is discarded;
      Rationale: time availability is a crucial user aspect. If the user is in a rush or has little time, the system has to provide only the most elementary information, so only nuclei are generated. If the user has a bit more time, but not much, exercises are not offered, since they are usually time-consuming and require active participation from the user. Finally, if the user has plenty of time, all the additional information is delivered.
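Unlike the other rules, the "a little bit of time" case is defined by segment role (keep nuclei only) rather than by a list of relations, so a predicate over whole segments fits it best. This is a sketch under the same hypothetical segment model as above (nuclei have `relation === null`); `timeFilter` is an illustrative name.

```javascript
// Sketch of the "time available" filter. The first case works on segment
// roles rather than relation names: only nuclei survive.
// Segment model (relation === null for nuclei) is a hypothetical illustration.
function timeFilter(timeAvailable) {
  switch (timeAvailable) {
    case "a little bit of time":
      return (s) => s.relation === null; // nuclei only
    case "quite some time":
      return (s) => s.relation !== "exercise"; // everything except exercises
    case "enough time":
      return () => true; // nothing is discarded
    default:
      throw new Error(`unknown time_available value: ${timeAvailable}`);
  }
}

const doc = [
  { relation: null, text: "Core explanation." },
  { relation: "exercise", text: "Try it yourself." },
  { relation: "example", text: "A worked example." },
];

for (const t of ["a little bit of time", "quite some time", "enough time"]) {
  console.log(`${t}: ${doc.filter(timeFilter(t)).length} segment(s)`);
}
// a little bit of time: 1 segment(s)
// quite some time: 2 segment(s)
// enough time: 3 segment(s)
```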

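  16. CSA – Vertical filtering: Comments
      • The order in which the filters are applied is irrelevant: each filter acts independently upon certain parts of the text, discarding relation-satellites according to their relation type only, so the final document contains exactly the segments that no filter discards, whatever the order.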
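The order-independence claim can be checked mechanically: if every filter only removes satellites based on their own relation type, the surviving segments are those kept by all filters, and that intersection does not depend on order. The sketch below applies three of the rules above in all six possible orders and compares the results. The segment model and the choice of rule values are hypothetical.

```javascript
// Each filter is modelled as "drop the satellites whose relation is in this set".
// Segment model (relation === null for nuclei) is a hypothetical illustration.
const filters = [
  new Set(["example", "exercise", "background", "preparation"]), // expertise: high
  new Set(["background", "preparation"]), // job_studies: related subject
  new Set(["motivate", "antithesis", "concession", "justify"]), // opinion: in favour
];

function applyFilter(segments, discard) {
  return segments.filter((s) => s.relation === null || !discard.has(s.relation));
}

// All orderings of an array (n! of them).
function permutations(items) {
  if (items.length <= 1) return [items];
  return items.flatMap((item, i) =>
    permutations([...items.slice(0, i), ...items.slice(i + 1)]).map((rest) => [item, ...rest]),
  );
}

const doc = [
  { relation: null, text: "N1" },
  { relation: "example", text: "S1" },
  { relation: "justify", text: "S2" },
  { relation: "elaboration", text: "S3" },
  { relation: null, text: "N2" },
];

// Run the cascade once per ordering and record the surviving segments.
const results = permutations(filters).map((order) =>
  order.reduce(applyFilter, doc).map((s) => s.text).join(","),
);
console.log(results.length, new Set(results).size); // 6 1
```

Six orderings yield a single distinct result (`N1,S3,N2`), which is what the comment on the slide predicts.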

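  17. Implementation
      JavaScript implementation (each pass applies one filter; sResult is fed to the next):
          // Assumes MSXML under Internet Explorer, which provides loadXML, load and transformNode
          var objData = new ActiveXObject("Msxml2.DOMDocument");
          var objStyle = new ActiveXObject("Msxml2.DOMDocument");
          objData.async = false;                       // load synchronously
          objStyle.async = false;
          objData.loadXML(sResult);                    // parse the current (partially filtered) document
          objStyle.load(sXSL1);                        // load the next filter stylesheet
          sResult = objData.transformNode(objStyle);   // filtered output becomes the next input
      XSL implementation:
          <!-- keep BACKGROUND: copy the element and process its children -->
          <xsl:template match="BACKGROUND">
            <xsl:copy>
              <xsl:apply-templates/>
            </xsl:copy>
          </xsl:template>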
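The template on the slide copies BACKGROUND satellites through. A filter that discards satellites can instead pair the standard XSLT identity template with an empty template for each satellite to drop: an element-name pattern such as `BACKGROUND` has a higher default priority (0) than `node()` (-0.5), so the matched satellite and everything inside it produce no output. The helper below is a hypothetical sketch that builds such a stylesheet from a list of relation names, assuming (as the slide suggests) that satellite elements are named after their relation in upper case.

```javascript
// Hypothetical helper: build one XSLT 1.0 filter from the relation-satellites
// it should discard. Assumes satellite elements are named after their
// relation in upper case (e.g. BACKGROUND), as in the slide's template.
function buildFilterStylesheet(discardedRelations) {
  const dropTemplates = discardedRelations
    .map((r) => `  <xsl:template match="${r.toUpperCase()}"/>`)
    .join("\n");
  return [
    '<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">',
    "  <!-- identity template: copy every node and attribute unchanged -->",
    '  <xsl:template match="@*|node()">',
    '    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>',
    "  </xsl:template>",
    "  <!-- empty templates: matched satellites produce no output -->",
    dropTemplates,
    "</xsl:stylesheet>",
  ].join("\n");
}

console.log(buildFilterStylesheet(["background", "preparation"]));
```

Each generated stylesheet could then be loaded as `sXSL1` in the slide's `transformNode` loop, with the output of one filter becoming the input of the next.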

  18. Experimentation
      • The main objective of the experiments is to validate the hypotheses expressed in the filtering rules, as well as the actual filtering mechanism of the CSA, by letting people judge the generated documents.

  19. Demo

  20. Demo

  21. Conclusions
      • Increase the size of the corpus:
          • As long as this is done following the same DTD and RST model, the algorithm will not have to change at all.
      • Augment the user model:
          • A new user aspect requires only a new filter.
          • New values for an existing user aspect require a change in the corresponding filter.
      • Therefore, none of these modifications increases the complexity of the system, and they are not difficult to implement.

  22. This research work has been partly supported by the Basque Government.
      • Questions • Comments • Further information • Suggestions
      Thank you for your attention.
