Overview of the CLARIN project • Eesti keeleressursside keskus (Center of Estonian Language Resources) • Cooperation • The future
A bit of history • Paris 2006 • Genoa 2006 • Budapest 2007 • Lund 2007 • Nijmegen 2008
ESFRI: European Strategy Forum on Research Infrastructures • ESFRI is a strategic instrument to develop the scientific integration of Europe and to strengthen its international outreach. • The competitive and open access to high quality Research Infrastructures supports and benchmarks the quality of the activities of European scientists, and attracts the best researchers from around the world.
European Roadmap for Research Infrastructures, Brussels, 19 October 2006 • The ESFRI roadmap identifies 35 large-scale infrastructure projects at various stages of development for the next 10 to 20 years.
CLARIN • The CLARIN project is a large-scale pan-European collaborative effort to create and coordinate language resources and technology and to make them available and readily usable. • www.clarin.eu
Expertise and Standards • CLARIN will make extensive use of the expertise that has developed in the European LRT community over the last decades. • CLARIN will rely on a number of standards that have already been released and will push for new standards where this seems necessary.
Standardization Initiatives • linguistic terminology: EAGLES, TEI, ISLE, ISO TC37/SC4 etc • generic schemas: ISO TC37/SC4 etc • knowledge representation: W3C, ISO TC37/SC4 etc • grids, registries and generic APIs: W3C, GGF, OASIS etc • metadata: Dublin Core, IMDI/ISLE, OLAC, METS, MPEG7, ISO 11179 etc • corpus construction: BLARK
Grid and Digital Library Initiatives • Grid/Federation Technology: GGF, DEISA, EGEE, EUGridPMA, TERENA etc • DL Initiatives: Internet2, OAI etc • European RI Projects: DAM-LR, LIRICS, Kalmar Union etc
Integration and Dissemination Projects • Resource Integration: TELRI, INTERA, ECHO, LTWorld, TDS etc • Dissemination: ELSNET, LREC, (E)ACL, ENABLER, LTRC etc
Existing LRT Associations • ELRA • ELDA • TELRI • LDC
Executive Board • Steven Krauwer (UU) - coordinator of CLARIN and chairperson of the EB • Peter Wittenburg (MPG) - leading work package 2 • Tamás Váradi (HASRIL) - leading work package 3 • Martin Wynne (OTA) - as the Humanities Liaison Officer • Erhard Hinrichs (UTU) - leading work package 5 • Dan Cristea (UAIC) - leading work package 6 • Kimmo Koskenniemi (UHEL) - leading work package 7 • Bente Maegaard (UCPH) - leading work package 8
Work Package 2 - Technical infrastructure • CLARIN is devoted to establishing an integrated and interoperable research infrastructure for the language resources and technology (LRT) domain. • The goal is to make language resources and technology much more accessible to all researchers working with language material, in particular in the humanities and social sciences. • Building such an eScience-enabling infrastructure requires investment at various layers; an important one is the technical infrastructure.
Work Package 2 - Technical infrastructure • Working Groups • Working Group 1 - Requirements for LRT centres • Working Group 2 - Requirements for the LRT federation • Working Group 3 - LRT federation pilot • Working Group 4 - Specification of the registry infrastructure
WG 2.1: Requirements for LRT centres. Within CLARIN we have to carry out the following steps to build up a first network of service centres: • determine the types of centres we will need and their "business models" • define a few initial services we expect from CLARIN centres • define requirements for LRT centres • launch an open call for participation in the LRT service centre network prototype • analyze the repository/archive systems and make suggestions for changes/adaptations • select centres for participation
WG 2.2: Requirements for the LRT federation. • define the special requirements of the LRT providers • talk intensively with all national federations and with TERENA about their practices for establishing trust and about the schemas used • define a suitable architecture for the LRT Federation • define the set of attributes and their usage • define the rules of the LRT federation (in collaboration with WP7) • define criteria for the participation of centres as LRT service providers • define criteria for the necessary national support • ask for applications to participate in the prototype network of centres • investigate the situation of each centre in detail (national support, local expertise, local repository/archiving system architecture, etc), select suitable centres and make priority lists • run training courses to build up local expertise • install the necessary components, with Shibboleth at the core, and suitably adapt the authentication and authorization integration components • design and implement methods for delegated authentication for web applications and portals • reach agreements with national federations
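The item "define the set of attributes and their usage" can be illustrated with a small sketch of an attribute-based access decision at an LRT service provider. This is only an illustration under stated assumptions: the attribute keys `affiliation` and `entitlement`, the `Resource` class and the `is_access_allowed` function are hypothetical names (loosely inspired by the kind of attributes an identity provider may release) and are not part of Shibboleth or of any CLARIN specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Resource:
    """A language resource with hypothetical access requirements."""
    name: str
    required_affiliations: frozenset           # empty set means "no restriction"
    required_entitlement: Optional[str] = None  # None means "no licence needed"


def is_access_allowed(user_attributes: dict, resource: Resource) -> bool:
    """Decide access from attributes released by the user's home organisation.

    `user_attributes` is an assumed, simplified mapping such as
    {"affiliation": ["member"], "entitlement": ["urn:example:licence:academic"]}.
    """
    affiliations = set(user_attributes.get("affiliation", []))
    # If the resource restricts affiliations, the user needs at least one of them.
    if resource.required_affiliations and not (
        affiliations & resource.required_affiliations
    ):
        return False
    # If the resource requires a licence entitlement, the user must hold it.
    if resource.required_entitlement is not None:
        return resource.required_entitlement in user_attributes.get("entitlement", [])
    return True


corpus = Resource(
    name="Demo corpus",
    required_affiliations=frozenset({"member"}),
    required_entitlement="urn:example:licence:academic",
)
user = {"affiliation": ["member"], "entitlement": ["urn:example:licence:academic"]}
print(is_access_allowed(user, corpus))
```

In a real federation the identity provider would authenticate the user and release the attributes; the service provider would only make the final decision, roughly as sketched here.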
WG 2.3: LRT federation pilot. • Implementation of the requirements as specified in working group 2.
WG 2.4: Specification of the registry infrastructure. • we need to analyse the experiences with the current metadata systems and refine the suggestions for overcoming their shortcomings • we need to analyse the Web Services requirements for registries and the experiences gained with various proposals such as UDDI and ebXML • a widely accepted reference taxonomy of resources and tools needs to be worked out by WP5 • we need to arrive at a generic ODD-based component model built on an agreed core plus suggested extensions for various resource and tool types (similar to LMF) • a standardized component schema needs to be created for the XML output as well • a requirements specification document for the registry infrastructure (portals, repositories, tools, etc) needs to be worked out
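The idea of a component model with "an agreed core and suggested extensions" per resource type can be sketched as follows. This is a minimal illustration, not the ODD-based model itself: the field names in `CORE_FIELDS` and `EXTENSIONS`, the element names in the XML output, and the functions `missing_fields` and `to_xml` are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical agreed core: fields every registry record must carry.
CORE_FIELDS = ("name", "resource_type", "language")

# Hypothetical extensions: extra fields suggested per resource or tool type.
EXTENSIONS = {
    "corpus": ("size_tokens", "annotation"),
    "lexicon": ("entries",),
    "tool": ("input_format", "output_format"),
}


def missing_fields(record: dict) -> list:
    """Return the core and extension fields that the record lacks."""
    required = list(CORE_FIELDS)
    required += EXTENSIONS.get(record.get("resource_type"), ())
    return [name for name in required if name not in record]


def to_xml(record: dict) -> str:
    """Serialise a complete record as XML with separate core and extension parts."""
    missing = missing_fields(record)
    if missing:
        raise ValueError("record is missing fields: " + ", ".join(missing))
    root = ET.Element("resource")
    core = ET.SubElement(root, "core")
    for name in CORE_FIELDS:
        ET.SubElement(core, name).text = str(record[name])
    extension = ET.SubElement(root, "extension", type=record["resource_type"])
    for name in EXTENSIONS.get(record["resource_type"], ()):
        ET.SubElement(extension, name).text = str(record[name])
    return ET.tostring(root, encoding="unicode")


record = {
    "name": "Demo corpus",
    "resource_type": "corpus",
    "language": "et",
    "size_tokens": 1000,
    "annotation": "pos",
}
print(to_xml(record))
```

Keeping the core separate from the type-specific extensions means a registry can index every record by the core fields alone, while specialised portals read the extension part they understand.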
WP3 (Humanities overview) • The sole purpose of inviting Humanities projects to collaborate with CLARIN in the preparatory phase is to enable us to assess the technological, methodological, organizational etc. requirements involved in serving the Humanities in the later phases of CLARIN. • We are committed to collaborating with Humanities projects on a prototype scale as the best means of identifying needs and removing any potential obstacle to future synergies between the two fields.
WG3.1 Scoping and Impact Study • The aim of this working group is to identify, mobilize and bring together a critical mass of producers and users around the infrastructure. • The initial scoping study will identify actual and potential users of language tools and resources across the heterogeneous fields that constitute the humanities.
WG3.2 Overview of relevant Humanities projects and professional associations • The aim of this working group is to make an in-depth survey of past and existing Humanities projects and establish contact with leading professional associations in the Humanities that are potential partners in employing language technology in their research. • The overall aim is to have a clear understanding of the research concerns and methods of the Humanities field so that CLARIN could make a maximal impact on the field.
WG3.3 Call for Humanities Projects • This working group is tasked with compiling a call for Humanities projects that CLARIN will assist. • It will work out evaluation and decision criteria that reflect national wishes, the relevance of the proposed projects for testing the infrastructure, concepts and standards, and the capability to demonstrate the potential of the infrastructure to other humanities disciplines.
WP5 (Language resources and technology overview) • WP5 deals with specifying and implementing standards for language resources of all kinds, including e.g. corpora, lexica, grammars and tools for processing them. • This is a prerequisite for achieving interoperability between linguistic resources and tools. • Both will be made available through web services, and workflows integrating several resources, tools, and services will be defined.
WP5 (Language resources and technology overview) • WP5, Working Group 1, Tools • WP5, Working Group 2, Lexical Resources • WP5, Working Group 3, Corpora
Working Group 5.1, Tools The aims of this working group are: • To take stock of basic language processing tools (tokenizing, morphological analysis, part-of-speech tagging, parsing, named-entity recognition). • To take stock of language processing platforms or middleware (UIMA, GATE, CLARK etc.). • To create a taxonomy of these tools. • To investigate the input and output formats / interfaces of these tools. • To investigate other features of the tools (e.g. language / domain dependence, resources needed). • To outline steps towards the integration of the tools into the infrastructure. • To outline criteria for the quality assessment of tools.
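Why the input and output formats of tools matter for integration can be shown with a small sketch: if each tool in an inventory declares its formats, a proposed processing chain can be checked for mismatches before it is run. The `Tool` class, the format labels and the `first_mismatch` function are hypothetical names chosen for this illustration; they do not describe any existing CLARIN inventory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    """An inventory entry for a language processing tool (illustrative fields)."""
    name: str
    category: str        # e.g. "tokenizer", "pos-tagger", "parser"
    input_format: str    # hypothetical format label
    output_format: str   # hypothetical format label


def first_mismatch(chain):
    """Return the names of the first adjacent pair of tools whose formats
    do not fit together, or None if the whole chain is compatible."""
    for producer, consumer in zip(chain, chain[1:]):
        if producer.output_format != consumer.input_format:
            return (producer.name, consumer.name)
    return None


tokenizer = Tool("demo-tokenizer", "tokenizer", "plain-text", "token-list")
tagger = Tool("demo-tagger", "pos-tagger", "token-list", "tagged-tokens")
parser = Tool("demo-parser", "parser", "tagged-tokens", "parse-trees")

print(first_mismatch([tokenizer, tagger, parser]))  # compatible chain
print(first_mismatch([tokenizer, parser]))          # tagger step is missing
```

A real inventory would need richer format descriptions than a single label, but even this simple check shows how a taxonomy with declared interfaces supports building workflows from independent tools.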
Working Group 5.2, Lexical Resources The aims of this working group are: • To take stock of lexical resources (monolingual / bilingual, form based / content based / multimedia, terminological data etc.). • To investigate existing standards, adapt them and make suggestions for changes. • To create a taxonomy of these resources. • To investigate the encoding format of these resources. • To investigate other features of the resources (e.g. coverage, data types). • To outline steps towards the integration of these resources into the infrastructure. • To outline criteria for the quality assessment of these resources.
Working Group 5.3, Corpora The aims of this working group are: • To take stock of corpus resources (monolingual / bilingual (aligned), domain specific / general, annotated etc.). • To investigate existing standards, adapt them and make suggestions for changes. • To create a taxonomy of these resources. • To investigate the encoding format of these resources. • To investigate other features of the resources (e.g. coverage). • To outline steps towards the integration of these resources into the infrastructure. • To outline criteria for the quality assessment of these resources.
WP6 (Dissemination) WP6 will be concerned with the following main activities: • to co-ordinate the posting of information inside the consortium during the project's life. • to co-ordinate the wide dissemination of information gathered by the project. This activity will be concerned, • firstly, with organizing a public website area where information acquired by the project and of interest to people outside the project's consortium will be displayed. • Secondly, a newsletter appearing electronically 4 times per year and other promotional materials (brochures, leaflets, posters etc.) will be designed, built and widely disseminated. • to accomplish preparatory work for introducing infrastructure services able to promote appropriate linguistic digital technologies to researchers in the humanities and social sciences, to help them work more efficiently and to facilitate new types of research.
WP6 (Dissemination) Working Groups • Working Group 6.1: Planning and Dissemination • Working Group 6.2: Website and Newsletter • Working Group 6.3: Referral Help Desk and Registry of Expertise
WP7 (Intellectual property rights and business models) • This work package deals with the legal issues of CLARIN, including licensing, authorization and authentication, which are necessary for the proper handling and use of language resources.
WP7 (Intellectual property rights and business models) Working groups of WP7 • Groups to be formed immediately • Working group 7.2A: Licensing and authorization of materials • Working group 7.4: Trust relations • Working group 7.3: ELDA/ELRA coordination • Groups to be formed later on • Working group 7.2B: Software licensing • Working group 7.2C: IPR legislation • Topics not yet assigned to any group • Business models • Ethical issues
WP8 (Construction and exploitation agreement) • The main objective of WP8 is the preparation of a ready-to-sign agreement between the participating countries whereby they commit themselves to the joint construction and exploitation of the CLARIN Infrastructure. • This agreement document is called the CLARIN Construction and Exploitation Agreement (CCEA). To reach consensus on such an agreement, a wide variety of organizational and financial topics will have to be addressed.
WP8 (Construction and exploitation agreement) • Working Group 8.1 Governance and Management • The working group will first make an inventory of known problems and known best (or current) practice solutions with respect to the governance and management of international infrastructures, as well as a list of requirements that follow from the way we see the construction and exploitation of CLARIN. It will then make proposals for the governance and management of CLARIN after the preparatory phase.
Links to other sites relevant to CLARIN • Association for Computational Linguistics, http://www.aclweb.org/ • Digital Research Infrastructure for the Arts and Humanities (DARIAH), http://www.dariah.eu/ • Distributed Access Management for Language Resources (DAM-LR), http://www.mpi.nl/DAM-LR/ • European Chapter of the ACL (EACL), http://www.eacl.org/ • Evaluations and Language resources Distribution Agency – ELDA, http://www.elda.org/ • Linguistic Infrastructure for Interoperable Resources and Systems, Project No.22236 - LIRICS Programme e-content, http://lirics.loria.fr/ • Northern European Association for Language Technology, http://omilia.uio.no/nealt/ • Text Encoding Initiative - TEI, http://www.tei-c.org/
Project supported by EKKT, 2008-2010 • Partners • Activities in 2008 • Eesti keeleressursside keskus (Center of Estonian Language Resources)
WP1 (General Management) • National steering committee • HTM (the Estonian Ministry of Education and Research) should appoint a new representative to CLARIN's Strategic Coordination Board.
WP2 (Technical infrastructure) • WG 2.1 - Requirements for LRT centres • WG 2.2 - Requirements for the LRT federation • WG 2.3 - LRT federation pilot • WG 2.4 - Specification of the registry infrastructure
WP3 (Humanities overview) • WG 3.1 - Scoping and Impact Study • WG 3.2 - Overview of relevant Humanities projects and professional associations • WG 3.3 - Call for Humanities Projects
WP5 (Language resources and technology overview) • WG 5.1 - Tools • WG 5.2 - Lexical Resources • WG 5.3 - Corpora
WP6 (Dissemination) • WG 6.1 - Planning and Dissemination • WG 6.2 - Website and Newsletter • WG 6.3 - Referral Help Desk and Registry of Expertise
WP7 (Intellectual property rights and business models) • WG 7.2A - Licensing and authorization of materials • WG 7.4 - Trust relations • WG 7.3 - ELDA/ELRA coordination
WP8 (Construction and exploitation agreement) • WG 8.1 - Governance and Management