CLARIN and FLaReNet: new European Initiatives for Language Resources and Language Technologies

Presentation Transcript


  1. CLARIN and FLaReNet: new European Initiatives for Language Resources and Language Technologies Nicoletta Calzolari Istituto di Linguistica Computazionale del CNR, Pisa, Italy glottolo@ilc.cnr.it Boulder, March 2008

  2. Today, many vitality & success signs… for LRs • In Spoken, Written, Multimodal areas … … in new emerging areas • Statistical approaches… • Different dimensions & layers: Content (Ontologies), Emotion, Time, … • For Evaluation • For Training • … • LREC (> 900 submissions); many LRs at COLING and even at ACL!! • ELRA (self-sustaining) & LDC • LRE (new Journal: N. Ide & NC) • ISO-TC37-SC4/WG4 (International Standards for LRs) • AFNLP… • ESFRI - CLARIN (also political & strategic role) • New calls or initiatives in EU, US, ASIA, on LRs, interoperability, cooperation, …

  3. BUT … an important point: In the ’90s • There was a global vision of the field & its main components: • Standards • Creation of LRs • Distribution Then: • Automatic acquisition … towards the Infrastructure of LRs & LT ELRA LDC While today: • There is an ever increasing set of initiatives for new LRs, basic robust technologies, models??, algorithms, • We have a LR community culture • BUT sort of scattered, opportunistic, not much coherence

  4. Today … The wealth of data & of basic technologies is such that: • We should reflect again on the field as a whole & ask if • Standards • Creation of LRs • Automatic acquisition • Distribution are still “the” important components, or how they have changed/must change • Content interoperability • Collaborative creation & Management • Dynamic LRs • Sharing could be at the basis of a new Paradigm for LRs & LT & of a new Infrastructure ?? … Which new challenges towards a new & more mature infrastructure of LRs & LTs??

  5. ISO LMF – Lexical Markup Framework Builds also on EAGLES/ISLE Structural skeleton, with the basic hierarchy of information in a lexical entry + various extensions; LMF specs comply with UML modeling principles; an XML DTD allows implementation NEDO Asian Lang. NICT Language-Grid Service Ontology The field is mature from Monica Monachini

  6. XML based Abstract Lexicon Interchange Format Mapping exercise Major best practices: • OLIF • PAROLE/SIMPLE • LC-Star • WordNet - EuroWordNet • FrameNet • BDef formal database of lexicographic definitions derived from Explanatory Dictionary of Contemporary French • … • …others on the way… Entries from existing lexicons have been mapped to LMF to prove that the model is able to represent many best practices and achieve unification from Monica Monachini

  7. Lexical WEB & Content Interoperability  ‘Standards’ • As a critical step for semantic mark-up in the SemWeb NomLex WordNets WordNets ComLex WordNets with intelligent agents SIMPLE LMF Lex_x FrameNet Lex_y Standards for Interoperability Enough??

  8. Need of tools to make this vision operational & concrete New prototype “LeXFlow”: (http://xmlgroup.iit.cnr.it:98/MILE/lexflow/demo.xhtml) • web-based collaborative environment for semi-automatic management/integration of lexical resources • enabling interoperability of distributed lexical resources • accessed by different types of agents • From Language Resources • To Language Services

  9. Architecture for cooperative integration of lexicons [Diagram labels: Agent Role1, Agent Role2, Agent Role3, Agent Role4; Coordination; Web service Interface; Simple-Wordnet Relation Calculator; Application; MultiWordnet Relation Calculator; Web service Interface; Italian Simple; Italian Wordnet; Chinese Wordnet; ILI Mapper; Relation Mapper; Data]

  10. [Diagram labels] Italian wordnet: parte, tratto N#12348 • iperonimia/HYP • A new proposed mero relation • passaggio, strada, via N#1290 • meronimy/MPT • curvatura, svolta, curva N#20944 • iponimia/HPO • carreggiata N#21225 • Synonym • Derived. ILI: ILI1.5-3001757-n road, route ILI1.6-3243979-n • ILI1.5-5691718-n stretch ILI1.6-??? • ILI1.5-2857000-n passage ILI1.6-3092396-n • ILI1.5-3002522-n roadway ILI1.6-3245327-n • ILI1.5-8488101-n bend, crook, turn ILI1.6-9992072-n • Synonym • Reinforcement & validity. Chinese wordnet: tong_dao (通道) N#03092396 • 上位(泛稱)詞_為/HYP • che_dao (車道) N#3245327 • dao_lu, dao, lu (道路, 道, 路) N#03243979 • 下位(特指)詞_為/HPO • wan (彎) N#9992072 • 部件_部份詞_為/MPT
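A minimal Python sketch of the lookup this slide illustrates, not the LeXFlow implementation: an ILI 1.5 identifier is mapped to its ILI 1.6 counterpart, which is then used to find a Chinese synset. The identifiers and words are taken from the slide. The function name `chinese_equivalent`, the dictionary layout, and the reading of the Chinese `N#` numbers as ILI 1.6 offsets (leading zeros dropped, `-n` appended) are assumptions made for illustration.

```python
from typing import Optional

# ILI 1.5 -> ILI 1.6 identifiers as listed on the slide.
# None stands for the entry the slide shows as "ILI1.6-???".
ILI15_TO_ILI16 = {
    "3001757-n": "3243979-n",  # road, route
    "5691718-n": None,         # stretch
    "2857000-n": "3092396-n",  # passage
    "3002522-n": "3245327-n",  # roadway
    "8488101-n": "9992072-n",  # bend, crook, turn
}

# Chinese synsets from the slide, keyed by ILI 1.6 offset
# (assumption: N#03243979 is the same concept as ILI1.6-3243979-n).
CHINESE_SYNSETS = {
    "3092396-n": "tong_dao (通道)",
    "3245327-n": "che_dao (車道)",
    "3243979-n": "dao_lu, dao, lu (道路, 道, 路)",
    "9992072-n": "wan (彎)",
}


def chinese_equivalent(ili15_id: str) -> Optional[str]:
    """Return the Chinese synset reachable from an ILI 1.5 id, if any."""
    ili16_id = ILI15_TO_ILI16.get(ili15_id)
    if ili16_id is None:
        # Unknown id, or no ILI 1.6 counterpart on the slide.
        return None
    return CHINESE_SYNSETS.get(ili16_id)
```

In this sketch, "stretch" has no ILI 1.6 counterpart, so the lookup returns `None`. That is the kind of gap the slide's "A new proposed mero relation" label appears to address.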

  11. LexFlow • Architecture for making distributed wordnets interoperable • It lends itself to different applications in LR processing: • Enrichment of existing lexical resources • Creation of new resources • Validation of existing resources • Can provide a platform for cooperative & collective creation & management of LRs, by providing a web-based environment for the collaboration & interaction of distributed agents and resources • Prototype of a web application supporting the GlobalWordNet Grid initiative, i.e. a shared multi-lingual knowledge base for cross-lingual processing based on distributed resources over the Grid New project: KYOTO

  12. Some steps for a “new generation” of LRs • From huge efforts in building static, large-scale, general-purpose LRs • To non-static LRs rapidly built on-demand, tailored to specific user needs • From closed, locally developed and centralized resources • To LRs residing over distributed places, accessible on the web, choreographed by agents acting over them • From Language Resources • To Language Services

  13. UIMA at ILC • Create an infrastructure to allow: • Distributed access to resources • Creation of shared resources • Use of methods to access NLP technologies • Integrate available software via Web Services • Standardise resources to be accessed from other research centers

  14. Distributed Language Services A long-term scenario implying • content interoperability standards, • supra-national cooperation and • development of architectures enabling accessibility • Create new resources on the basis of existing ones • Exchange and integrate information across repositories • Compose new services on demand • Collaborative & collective/social development and validation, cross-resource integration and exchange of information Language Grid Wiki

  15. Many dimensions around the notion of language finally • We need to put together • technical, • organisational, • strategic, • economic, • political issues of LRs Two new European Infrastructural & Networking Initiatives Multilingualism Political issues e.g. a commonly agreed list of minimal requirements for “national” LRs: BLARK Need of bodies for a broad research agenda & strategic actions for LT&LRs (W/S /MM) based on all the dimensions Interdisciplinarity & Multidisciplinarity • Cultural issues • Language … and cultural identity • Language … and the Humanities • Economic, • social issues • Applications • Services Technical issues

  16. Which Communities? Technologies exist, but the infrastructure that puts them together and sustains them is still missing for • Humanities • Social Sciences • Digital Libraries • Cultural Heritage • … • Language Resources • Language Technologies • Standardisation core Enabling infrastructure CLARIN ResInfra FLaReNet Network Multilinguality on • Grid • Semantic Web • Ontologists • ICT • … Focus on cooperation • Many application domains (eculture, egovernment, ehealth, …) for

  17. ESFRI Research Infrastructures CLARIN Common Language Resources and Technologies Infrastructure for the Humanities & Social Sciences Large-scale pan-European collaborative effort (31+ countries) • Make LRs & LTs available & readily usable to scholars of humanities & social sciences (& all disciplines) • Need to overcome the present fragmented situation by harmonising structural and terminological differences • Basis is a Grid-type infrastructure and Semantic Web technology • The benefits of computer enhanced language processing become available only when a critical mass of coordinated effort is invested in building an enabling infrastructure, which can provide services in the form of provision of tools & resources as well as training & counseling across a wide span of domains • The infrastructure will be based on a number of resource, service and expertise centres

  18. CLARIN Mission • Create a comprehensive and free to use distributed archive of LRs & LTs covering not only the languages of all member states, but also other languages studied and used in Europe • Through the fact that the tools & resources will be interoperable across languages & domains, contribute to preserving and supporting multilingual & multicultural European heritage • An operational open infrastructure of web services will introduce a new paradigm of distributed collaborative development • Allow many contributors to add all kinds of new services based on existing ones, thus ensuring reusability and allowing scaling up to suit individual needs

  19. How can we tackle these challenges? • J. Taylor: “eScience is about global collaboration in key areas of science and the next generation of infrastructures that will enable it” • Need to build new types of platforms • to allow researchers to combine existing resources easily into new ones to tackle the big challenges • to increase the productivity of all interested researchers, since currently too much time is wasted on preparatory work from P. Wittenburg

  20. CLARIN establishes such a new generation of extended infrastructure • Thus CLARIN is not about creating and building new language resources and technology, but • making them available and accessible • as services • in a stable and persistent infrastructure to allow tackling the great challenges • CLARIN: http://www.clarin.eu • Grid Project: http://www.mpi.nl/dam-lr • ISO TC37/SC4: http://www.tc37sc4.org • Standards Project: http://lirics.loria.fr/ eScience Vision from P. Wittenburg

  21. We still have a long path … & also a “new project” in an e-Contentplus Call for a: • “Thematic Network on Language Resources”: FLaReNet • To provide common recommendations (to the EC) for future actions • To give priorities • Need of ‘visions’ In a global context, in cooperation with CLARIN & also with non-EU members

  22. Which Communities? LRs & LTs exist, but a global vision, policy and strategy is still missing for • Humanities • Social Sciences • Digital Libraries • Cultural Heritage • … • Language Resources • Language Technologies • Standardisation • Ontologists • Content core CLARIN ResInf EU Forum FLaReNet Network Multilinguality Focus on cooperation for • EC • Funding agencies • … • Many application domains (eculture, egovernment, ehealth, intelligence, domotics, content industry, …) for

  23. FLaReNet Fostering Language Resources Network A European forum • to facilitate interaction among LR stakeholders The Network structure considers that LRs present various dimensions and must be approached from many perspectives: • technical, but also • organisational • economic • legal • political Addresses also • multicultural and multilingual aspects, essential when facing access and use of digital content in today’s Europe

  24. Organised in Thematic Working Groups A layered structure, with leading experts & groups (national and European institutions, SMEs, large companies) for all relevant LR areas (about 40 partners) • in collaboration with CLARIN • to ensure coherence of LR-related efforts in Europe FLaReNet will • consolidate existing knowledge, presenting it analytically and visibly • contribute to structuring the area of LRs of the future by discussing new strategies to: • convert existing and experimental technologies related to LRs into useful economic and societal benefits • integrate so far partial solutions into broader infrastructures • consolidate areas mature enough for recommendation of best practices • anticipate the needs of new types of LRs

  25. Thematic Areas • The Chart for the area of LRs in its different dimensions • Methods and models for LR building, reuse, interlinking and maintenance • Harmonisation of formats and standards • Definition of evaluation protocols and evaluation procedures • Methods for the automatic construction and processing of LRs To build together: • Evolving RoadMap • Blueprint of actions and infrastructures

  26. Objectives & expected results The largest Network of LR and HLT players, with diverse approaches, efforts and technologies • Enable progress toward community consensus • Give an extended picture of LRs & recast its definition in the light of recent scientific, methodological, technological, social developments • Consolidate methods & approaches, common practices, frameworks and architectures • A “roadmap” identifying areas where consensus has been achieved or is emerging vs. areas where additional discussion and testing is required, together with an indication of priorities • Recommendations in the form of a plan of coherent actions for the EU and national organizations • A European model for the LRs of the next years Ambitious!

  27. Outcomes of FLaReNet The outcomes will be of a directive nature • to help the EC and national funding agencies identify priority areas of LRs of major interest for the public that need public funding to develop or improve A blueprint of actions will constitute input to policy development both at EU and national level • for identifying new language policies that support linguistic diversity in Europe • in combination with strengthening the language product market, e.g. for new products & innovative services, especially for less technologically advanced languages

  28. These Initiatives, … together • Call for international cooperation also outside Europe and will be relevant for • setting up a global worldwide Forum of Language Resources and Language Technologies
