
Cognitive Corpus-Based LSP Lexicography: Research and Implementation Issues

This case study focuses on the Multilingual Glossary on Risk Management and discusses the research and implementation challenges in cognitive corpus-based LSP lexicography. It explores the motivations, methods, and benefits of creating terminologies for risk communication, with a focus on increasing the transparency and consistency of risk discourse.

Presentation Transcript


  1. Cognitive corpus-based LSP lexicography: research and implementation issues. A case study on the Multilingual Glossary on Risk Management. Gerhard Budin, University of Vienna and Austrian Academy of Sciences, 8 April 2011

  2. Our empirical research landscape

  3. The Making of… a Multilingual Glossary on Risk Management

  4. Motivations and Methods: Terminologies for Risk Communication • The role of LSP lexicography in domain communication • Increasing the “transparency” of terms • Helping negotiate a common understanding of terms in intra-, inter- and transdisciplinary as well as transcultural discourse • Helping increase the consistency of risk discourse (written and spoken) and improve understanding in target audiences • Reducing unnecessary synonyms, disambiguating polysemes, and helping separate homonyms • Helping create risk terminologies in many languages • Supporting knowledge sharing and knowledge transfer in cooperative work environments • Supporting cross-cultural discourse (e.g. translation and parallel texts)
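Reducing unnecessary synonyms and disambiguating polysemes can be supported by simple consistency checks over a termbase. The following Python sketch is illustrative only: it assumes a hypothetical termbase stored as (concept ID, language, term) triples, and the concept IDs and term assignments are invented. It merely flags candidates for a terminologist to review; it cannot tell polysemy from homonymy.

```python
from collections import defaultdict

# Hypothetical termbase sample: (concept_id, language, term) triples.
# Concept IDs and term assignments are invented for illustration.
ENTRIES = [
    ("C2-001", "en", "flood"),
    ("C2-001", "en", "inundation"),
    ("C2-001", "de", "Hochwasser"),
    ("C2-002", "en", "backwater"),
    ("C2-002", "de", "Rückstau"),
    ("A-010", "en", "exposure"),
    ("B-004", "en", "exposure"),
]


def find_synonyms(entries):
    """Map (concept, language) to its terms wherever a concept has more than one term in a language."""
    terms_by_concept = defaultdict(set)
    for concept, lang, term in entries:
        terms_by_concept[(concept, lang)].add(term)
    return {key: sorted(terms) for key, terms in terms_by_concept.items() if len(terms) > 1}


def find_shared_terms(entries):
    """Map (language, term) to its concepts wherever one term designates more than one concept.

    Such terms are candidates for polysemy or homonymy; a terminologist has to decide which.
    """
    concepts_by_term = defaultdict(set)
    for concept, lang, term in entries:
        concepts_by_term[(lang, term)].add(concept)
    return {key: sorted(cs) for key, cs in concepts_by_term.items() if len(cs) > 1}


print(find_synonyms(ENTRIES))      # {('C2-001', 'en'): ['flood', 'inundation']}
print(find_shared_terms(ENTRIES))  # {('en', 'exposure'): ['A-010', 'B-004']}
```

Grouping by concept finds synonym sets, and grouping by term finds terms shared across concepts; both are only starting points for the intellectual work of deciding which synonyms to deprecate.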

  5. The Domains of Risk Management • Multidisciplinary, diverse, and fragmented - or • Transdisciplinary, overlapping, converging, integrated, and complementary • The need for mediating between different approaches, cultures, and discourses: • Technological, engineering, research, science • Administration, legislation, monitoring • Social, sociological, political, cultural • Domain approaches (financial, ecological, chemical, safety, geographical, planning and forecast, health, etc.)

  6. WIN Project (FP6 2004-2009): WP “Human Language Interoperability” • Objectives • WP 2200 is designed to support international risk management and risk communication processes (within the WIN project and beyond) • Achieved results (with ongoing work) • Large parallel corpora collection with risk-related texts and lexical resources (fr, en, de, es, ro, fi, hu, ru) • Multilingual index with conceptual structure • Bibliography and codes of sources • Risk Ontology • Multilingual online terminology database

  7. Integrative R&D Approach • A combination of theoretical approaches and their methods, aimed at results tailored to the needs of the project consortium and the cooperation partners • Quantitative (computational) and qualitative (intellectual) methods of corpus analysis • Lexicographical and terminographical methods (word/text-oriented and concept/knowledge-oriented) • Text linguistics and translation studies • Cross-cultural comparative approach and knowledge system approach, multi-domain communication • Knowledge engineering, computational semantics/Web 2.0 (ontologies, frame semantics, etc.) • Cognitive science approach (media pedagogy – eLearning, specific learner support, interactive approach (mental lexicon), usability engineering)

  8. Motivation and Convergence of Research Interests and Contexts • Interest in cognitive science research applied to terminology management, ontology engineering, translation technologies, E-Learning systems design and implementation • Research Cluster 1 “Translation – Cognition – Technologies” at the Center for Translation Studies, University of Vienna • Interdisciplinary Research Platform on Cognitive Science – Cluster on Cognitive Linguistics • Research Priority 1 Lexicology, Terminology, and Parallel Corpora at the Institute for Corpus Linguistics and Text Technology at the Austrian Academy of Sciences

  9. Research contexts in several projects • Previous and ongoing projects • Dynamont • Methodology for Creating Dynamic Ontologies, BMVIT, national research programme “Semantic Systems” – multi-dimensional ontology modelling • WIN (Wide Area Information Network on Risk Management) – MGRM, Multilingual Glossary on Risk Management • IP (Integrated Project) in FP6, 2004-2008, focus on creating a multilingual terminology and ontology of risk management – risk ontology for natural hazards • Montific – Multilingual ONTology for Internal Financial Control, an LLP project (Leonardo da Vinci II) • Building a “learning ontology” for an eLearning environment • STABILITY AND ADAPTATION OF CLASSIFICATION SYSTEMS IN A CROSS-CULTURAL PERSPECTIVE – European Science Foundation: COST A 31 project • Cognitive linguistics – how “classifiers” are embodied in language incl. ontologies • TES4IP – Terminology Services for the Intellectual Property Domain (Bridge project funded by FFG, Austrian Research Agency) • Term extraction, multi-word term recognition, named entity recognition, legal vocabularies and legal ontologies • -> Ongoing study • Cognitive Ontologies • Designing, Generating and Using Domain Ontologies

  10. Ontology Engineering and Cognitive Science • Cognitive aspects have been of interest in a variety of ontology engineering approaches • Barry Smith • Epistemological focus combined with work on domain ontologies (mainly bio-medical) • Criticizing the epistemological foundations of terminology theory in elaborating his foundational theory of ontology • Aldo Gangemi • DOLCE: Descriptive Ontology for Linguistic and Cognitive Engineering • Foundational theory of ontology • Many projects, also on tools and on domain ontologies • But many others (Guarino, Sheth, Obrst, Noy, et al.) have also done research on these aspects • Some criticism that the focus in ontology evaluation is on syntactic evaluation for computational uses (only) – the classical scenario

  11. “Cognitive Ontologies” • Conceptual clarification: • Ontologies of cognitive processes • In neuroscience research, similar to other bio-medical ontologies (cognitive atlas, neuropsychiatric phenomena, ontology of cognitive objects, etc.) • Ontologies with a focus on their cognitive aspects • DOLCE and other cognitive-oriented approaches • Constructivist epistemology for ontology building, concerning the relation to “reality” • Increasing convergence of these two concepts

  12. Our own research • Our previous and ongoing projects have been focusing on cognitive adequacy of domain ontologies and their use in knowledge acquisition in learning situations • Terminology studies as a contribution from this perspective (related research by Nistrup Madsen/Erdman Thomsen 2005, 2009, etc.) • Using DOLCE design patterns for multi-dimensional conceptual modeling for ontology building • the DYNAMONT project • From domain corpora to terminologies and from there to domain ontologies • for eLearning scenarios – the MONTIFIC project • For domain experts – the WIN/MULTH/MGRM project

  13. Moving up (and down) the Ontology Spectrum • The challenge: from linguistic-cultural diversity of discourse and free-form lexical structures to a unified, formalized, axiomatized ontology – and back, to support human understanding and social processes such as collaborative learning • The method: an integrative, multi-level modelling approach specifying the steps in a process-oriented workflow framework (with variable, combinable steps depending on concrete needs) for • Gradual semantic enrichment • Gradual semantic formalization • Multi- and cross-lingual referencing/alignment for text management • Constant interaction between full texts and lex-term resources • The technology: a multi-component workbench (i.e. Dynamont-WB incl. ProTerm as a central element), using XML, RDF, OWL, SKOS, WordNet + GlobalWordnet, MLIF (containing TBX, TMX, XLIFF, LMF, TMF, etc.), FrameNet, etc. • The advantage: full exploitation of all types of language resources (LR) and knowledge organization systems (KOS), providing a framework not only for their semantic enrichment and formalization as ontologies but also for ontology-based multilingual authoring, text generation and translation
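One step in gradual semantic formalization is exporting terminological entries to SKOS, one of the formats listed in the technology bullet above. The following Python sketch shows one way this could look; it is not the Dynamont workbench's converter. It assumes a hypothetical namespace (http://example.org/risk#) and invented concept IDs, and it writes only skos:prefLabel, skos:altLabel, skos:definition and skos:broader statements as Turtle text.

```python
from dataclasses import dataclass, field

SKOS_NS = "http://www.w3.org/2004/02/skos/core#"
BASE_NS = "http://example.org/risk#"  # hypothetical namespace, not an official MGRM URI


@dataclass
class Concept:
    concept_id: str                                   # hypothetical id, e.g. theme code + number
    pref_labels: dict                                 # language -> preferred term
    alt_labels: dict = field(default_factory=dict)    # language -> list of synonyms
    definitions: dict = field(default_factory=dict)   # language -> definition text
    broader: list = field(default_factory=list)       # ids of broader concepts


def literal(text, lang):
    """Return a language-tagged Turtle string literal, escaping backslashes, quotes and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"@{lang}'


def to_turtle(concept):
    """Serialize one concept as a SKOS description in Turtle syntax."""
    statements = [f"skos:prefLabel {literal(t, lang)}"
                  for lang, t in sorted(concept.pref_labels.items())]
    for lang, terms in sorted(concept.alt_labels.items()):
        statements += [f"skos:altLabel {literal(t, lang)}" for t in terms]
    for lang, text in sorted(concept.definitions.items()):
        statements.append(f"skos:definition {literal(text, lang)}")
    statements += [f"skos:broader risk:{b}" for b in concept.broader]
    lines = [f"@prefix skos: <{SKOS_NS}> .", f"@prefix risk: <{BASE_NS}> .", ""]
    lines.append(f"risk:{concept.concept_id} a skos:Concept ;")
    lines += [f"    {s} ;" for s in statements]
    lines[-1] = lines[-1][:-1] + "."  # the last statement ends with '.' instead of ';'
    return "\n".join(lines) + "\n"


# Hypothetical example: backwater as a narrower concept of a flood concept C2-001.
backwater = Concept("C2-002", {"en": "backwater", "de": "Rückstau"}, broader=["C2-001"])
print(to_turtle(backwater))
```

Plain SKOS is a deliberately light first formalization step: it captures multilingual labels and broader/narrower links, while richer axioms would be added later in OWL.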

  14. The global risk communication scenario • Several projects since 1994 covering the following activities: • Thesaurus building • Creating multilingual terminology databases • Creating multilingual text corpora • Lexicographical glossary • Semantic enrichment (e.g. conceptual links, frame semantics) • Collection and analysis of relevant knowledge organization systems • Annotation of resources • Mark-up of resources (TBX, etc.) • Ontology building • Communication design

  15. From texts and terminologies to ontologies – and back to texts • Using the Risk scenario • Termbase • Export XML • Domain Models – meta-models -> patterns • Text corpus • Term extraction – comparative testing of ProTerm, MultiTerm Extract, MultiCorpora • Aligning with termbase • Convert to RDF • Ontology import -> editor • Mappings (GMT, XML, RDF, OWL, UML, comma-delimited, RDB, for different kinds of lex-term resources, FN->OWL, etc.) • The MULTH-WIN Project as an example of methods integration
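The term extraction and termbase alignment steps can be illustrated with a deliberately simple frequency baseline. This Python sketch does not reproduce how ProTerm, MultiTerm Extract or MultiCorpora work: it assumes a toy English stopword list, counts two- and three-word sequences that neither begin nor end with a stopword, and splits frequent candidates into those already in a hypothetical termbase and new ones for review. The corpus sentences and termbase entries are invented.

```python
import re
from collections import Counter

# Toy English stopword list (an assumption); real extractors use POS patterns instead.
STOPWORDS = {"a", "an", "and", "are", "be", "by", "can", "for", "from",
             "in", "is", "of", "on", "or", "the", "to", "was", "with"}


def candidates(text, min_n=2, max_n=3):
    """Count n-grams (min_n..max_n words) that neither begin nor end with a stopword.

    Sentence boundaries are ignored in this sketch.
    """
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    counts = Counter()
    for n in range(min_n, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            counts[" ".join(gram)] += 1
    return counts


def align(counts, termbase, min_freq=2):
    """Split candidates seen at least min_freq times into known termbase entries and new ones."""
    frequent = {term for term, freq in counts.items() if freq >= min_freq}
    return sorted(frequent & termbase), sorted(frequent - termbase)


corpus = ("Flood risk assessment combines hazard maps and vulnerability data. "
          "A flood risk assessment for the river basin uses hazard maps. "
          "Backwater effects can increase flood risk.")
termbase = {"flood risk", "vulnerability", "backwater"}  # hypothetical termbase entries

known, new = align(candidates(corpus), termbase)
print(known)  # ['flood risk']
print(new)    # ['flood risk assessment', 'hazard maps', 'risk assessment']
```

On this toy corpus, "flood risk" is already in the termbase, while "flood risk assessment", "hazard maps" and "risk assessment" surface as new multi-word candidates. Production extractors add lemmatization, part-of-speech patterns and association measures on top of raw frequency.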

  16. Terminological frame semantics • INTERVENTION (ACTOR(S), ACTIVITIES/PHASES): • RISK DETECTING (PRE-EVENT) • - R-ASSESSMENT • - R-PERCEPTION (X is risk) • - EXPERIENCE (statistics, case studies) • - OBSERVATION (monitoring) • - METHOD • - SATELLITE • - PROGNOSES • - R-ANALYSIS • - R-FEATURES • - SITUATION/CONTEXT (danger/hazard) • - SIMULATION (course of events) • - PROBABILISTIC METHODS (safety) • - RELIABILITY • - R-IDENTIFICATION (DAMAGE) • - R-SOURCE • - DAMAGE CAUSE • - VULNERABILITY (DAMAGE TARGET) • - SUSCEPTIBILITY (capacity/people) • Rothkegel

  17. Terminological frame semantics • I. Pre-event: B. Public awareness and planning • II. In-event: C. Events and response • afflux/Hochwasser durch Aufstau: BE [[TYPE=flood], [PLACE=], [TIME=]], HAVE [CAUSE [[ORIGIN=], [NIEDERSCHLAG [TYPE=]], [STAU [TYPE=Aufstau]]], DAMAGE [TARGET=, SOURCE=, DEGREE=]], HAPPEN [STATES=, PROCESSES=]] • backwater/Rückstau: BE [[TYPE=flood], [PLACE=], [TIME=]], HAVE [CAUSE [[ORIGIN=], [NIEDERSCHLAG [TYPE=]], [STAU [TYPE=Rückstau]]], DAMAGE [TARGET=, SOURCE=, DEGREE=]], HAPPEN [STATES=, PROCESSES=]] • Rothkegel
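The two frames above can be read as slot-filler structures that share one template and differ only in their fillers. The following Python sketch is one possible encoding, not Rothkegel's own formalism: the flood-event template is transcribed as nested dictionaries (None marks an empty slot; NIEDERSCHLAG glosses as precipitation, STAU as damming, i.e. water backing up), and the helper names instantiate and diff are hypothetical. Comparing the two instantiated frames shows that afflux and backwater differ only in the type of damming.

```python
import copy

# Flood-event frame template transcribed from the slide; None marks an empty slot.
# NIEDERSCHLAG = precipitation; STAU = damming (water backing up).
FLOOD_EVENT = {
    "BE": {"TYPE": "flood", "PLACE": None, "TIME": None},
    "HAVE": {
        "CAUSE": {"ORIGIN": None, "NIEDERSCHLAG": {"TYPE": None}, "STAU": {"TYPE": None}},
        "DAMAGE": {"TARGET": None, "SOURCE": None, "DEGREE": None},
    },
    "HAPPEN": {"STATES": None, "PROCESSES": None},
}


def instantiate(template, fillers):
    """Return a filled copy of the template; fillers map slot paths (tuples) to values."""
    frame = copy.deepcopy(template)  # the template itself stays unchanged
    for path, value in fillers.items():
        node = frame
        for key in path[:-1]:
            node = node[key]  # raises KeyError for an unknown intermediate slot
        if path[-1] not in node:
            raise KeyError("unknown slot " + "/".join(path))
        node[path[-1]] = value
    return frame


def diff(a, b, prefix=()):
    """Yield (slot path, value in a, value in b) for each slot where two frames differ."""
    for key, value in a.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            yield from diff(value, b[key], path)
        elif value != b[key]:
            yield "/".join(path), value, b[key]


STAU_TYPE = ("HAVE", "CAUSE", "STAU", "TYPE")
afflux = instantiate(FLOOD_EVENT, {STAU_TYPE: "Aufstau"})
backwater = instantiate(FLOOD_EVENT, {STAU_TYPE: "Rückstau"})
print(list(diff(afflux, backwater)))  # [('HAVE/CAUSE/STAU/TYPE', 'Aufstau', 'Rückstau')]
```

Keeping one shared template makes the distinguishing feature of related concepts explicit, which is the kind of structural semantic information a glossary definition can then be checked against.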

  18. Ordnance Survey

  19. Dynamont architecture, tools and workflows

  20. The Glossary • The paper version of the glossary is used not only by risk managers and civil engineers but also by teachers, students, translators, journalists, etc. • Generally, the purpose of such multilingual conceptual glossaries is to improve domain communication and to facilitate mutual understanding across linguistic boundaries. • The concepts of risk management and their definitions presented in this glossary were carefully selected from a large body of technical literature and authentic text corpora in the respective languages. • These sources are referenced in the bibliography. • The multilingual glossary presented here covers 8 languages: English and French as the main pivot languages, as well as German, Spanish, Romanian, Finnish, Hungarian, and Russian. • It comprises about 230 central concepts of risk management with about 400 definitions and about 1400 terms representing these concepts in each language (including synonyms and hyperonyms), and it indicates the conceptual relations between the entries.

  21. The Glossary • The following themes are used as the macro-structure of the glossary: • A. Risk assessment and technology assessment • B. Public perception of risk, planning, preparation and alarm • C0. Risk events, equipment and operations, general terms • C1. Fire – events, equipment and operations • C2. Floods – events, equipment and operations • C3. Oil spills – events, equipment and operations • Each glossary entry follows the same micro-structure with the following information elements: • A conceptual number combined with a theme from the macro-structure • The equivalent terms in the 8 languages, accompanied by grammatical information • The definitions of the concept in each language, including multiple definitions that may differ from each other, accompanied by the textual source of the definition and including structural semantic information on the concept • Related terms and expressions.
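The micro-structure described above maps naturally onto a record type, which can then be exported to a TBX-style XML structure for exchange. This Python sketch rests on several assumptions: the field names, the conceptual number "C2-002", the source code "SRC-01" and the placeholder definition are hypothetical; mapping the macro-structure theme to a subjectField descrip is a choice made here, not the project's documented mapping; the output is simplified and not validated against any TBX schema; and related terms are kept in the record but not exported.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # serialized as xml:lang


@dataclass
class Term:
    text: str
    pos: str = ""      # grammatical information: part of speech
    gender: str = ""   # grammatical information: gender, where the language has it


@dataclass
class Definition:
    text: str
    source: str        # code of the source text in the bibliography


@dataclass
class GlossaryEntry:
    number: str                                       # conceptual number
    theme: str                                        # macro-structure theme, e.g. "C2"
    terms: dict = field(default_factory=dict)         # language -> list of Term
    definitions: dict = field(default_factory=dict)   # language -> list of Definition
    related: list = field(default_factory=list)       # related terms (not exported below)


def to_tbx_like(entry):
    """Build a simplified, unvalidated TBX-style termEntry element for one glossary entry."""
    term_entry = ET.Element("termEntry", id=entry.number)
    ET.SubElement(term_entry, "descrip", type="subjectField").text = entry.theme
    for lang in sorted(set(entry.terms) | set(entry.definitions)):
        lang_set = ET.SubElement(term_entry, "langSet", {XML_LANG: lang})
        for d in entry.definitions.get(lang, []):
            grp = ET.SubElement(lang_set, "descripGrp")
            ET.SubElement(grp, "descrip", type="definition").text = d.text
            ET.SubElement(grp, "admin", type="source").text = d.source
        for t in entry.terms.get(lang, []):
            tig = ET.SubElement(lang_set, "tig")
            ET.SubElement(tig, "term").text = t.text
            if t.pos:
                ET.SubElement(tig, "termNote", type="partOfSpeech").text = t.pos
            if t.gender:
                ET.SubElement(tig, "termNote", type="grammaticalGender").text = t.gender
    return term_entry


entry = GlossaryEntry(
    number="C2-002",  # hypothetical conceptual number
    theme="C2",
    terms={"en": [Term("backwater", pos="noun")],
           "de": [Term("Rückstau", pos="noun", gender="masculine")]},
    definitions={"en": [Definition("(placeholder definition)", "SRC-01")]},  # hypothetical
)
print(ET.tostring(to_tbx_like(entry), encoding="unicode"))
```

Grouping everything per language in a langSet mirrors the glossary's own layout, where each language column carries its terms, grammatical notes and sourced definitions for the same numbered concept.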

  22. Research issues • Experimental settings • User studies, user modelling • Data modelling • Corpus-analysis • Multilingual – multi-domain – cross-cultural • Knowledge dynamics - Dynamic knowledge representations • Cognitive studies

  23. Conclusions and Outlook I • Online terminology database is continuously used • 8-language Glossary Version produced in February 2011 • Next steps in 2011: • Work in progress! • Database to be extended from 5 to 8 languages • Full text corpora to be extended • Promotion of the glossary in different user communities • Term extraction, research • Extension into more languages • More scientific publications

  24. Conclusions and outlook II • Research perspectives • Further research in • Cognitive ontologies • User modelling, usability of terminological databases and LSP dictionaries • Corpus-linguistic research – semantic annotation modelling • Multilingual, multi-domain, cross-cultural issues

  25. Selected References • Budin, G. (2010). Socio-terminology and computational terminology – toward an integrated, corpus-based research approach. In: De Cillia, Rudolf et al. (eds.). Discourse, Politics, Identity. Tübingen: Stauffenburg Verlag, 21-31. • Budin, G. (2007). Semantic Systems supporting Cross-Disciplinary Environmental Communication. In: Hryniewicz, O.; Studzinski, J.; Szediw, A. (eds.). Environmental Informatics and Systems Research. Vol. 2: Workshop and application papers. EnviroInfo 2007. Aachen, 23-30. • CEDIM, Center for Disaster Management and Risk Reduction Technology, c/o University of Karlsruhe (2005). Glossar: Begriffe und Definitionen aus den Risikowissenschaften. • Gangemi DOLCE • Greciano, G. (2001). L'harmonisation de la terminologie en Sciences du Risque. In: Proceedings of Security Conference, Montpellier XII. Council of Europe-FER. Strasbourg, France. • Greciano, G. (2001). Les sciences du risque: convergences interculturelles. In: Proceedings of Risk Conference, Strasbourg X. Council of Europe-FER. Strasbourg, France. • Greciano, G. (2001). Pour un glossaire combinatoire plurilingue du Risque. In: Proceedings of Risk Conference, Mèze V. Council of Europe-FER. Strasbourg, France. • Massué, J.P. (2001). "Mobilisation de la Communauté scientifique au service de l'amélioration de la gestion des risques". Mèze, FER-EUR-OPA. Strasbourg. • Nistrup Madsen/Erdman Thomsen 2005, 2009

  26. Acknowledgements • GLOSSAIRE MULTILINGUE DE LA GESTION DU RISQUE (Multilingual Glossary of Risk Management): French / German / English / Spanish / Romanian / Finnish / Hungarian / Russian, edited by Gertrud Gréciano, Gerhard Budin, Danielle Candel and John Humbley, with the support of the Commission of the European Union, the Universities of Strasbourg, Vienna and Helsinki, the Alsace Region, the Délégation générale à la langue française et aux langues de France, and the Austrian Academy of Sciences. • Authors: Gertrud Gréciano (Strasbourg), Gerhard Budin (Vienna), Annely Rothkegel (Chemnitz), Ulrike Hass (Essen) • Translators: Cornelia Cujba (Iasi), Attila Frigyer (Budapest), Luis Gonzalez (Caracas-Paris), Csilla Höfler-Bornemisza (Vienna), Annikki Liimatainen (Helsinki), Alexei Milko (Strasbourg-Moscow) • Scientific and technical cooperation: Steffi Baumann (Chemnitz), Aban Budin (Vienna), Christian Burghard (Chemnitz), Dimitrij Dobrovolskij (Moscow-Vienna), Eva Haas (Munich-Ispra), Natalia Jonkova (Moscow), Andra Moga (Iasi-Vienna), Maren Runte (Essen), Julia Steuber (Essen), Virginie Tombeux (Paris), Elena Volgina (Moscow)

  27. Thank you for your attention • Gerhard Budin, Center for Translation Studies, University of Vienna; Institute of Corpus Linguistics and Text Technology, Austrian Academy of Sciences • gerhard.budin@univie.ac.at • http://mgrm.univie.ac.at
