CLARIN: Goals and Structure of the Project
Steven Krauwer, CLARIN Coordinator
Utrecht institute of Linguistics UiL-OTS (NL)

Overview
- Problem & Mission
- Some why-questions
- Some who-questions
- Overall plan
- Technical dimension
- Language dimension
- User dimension
- Governance and legal dimension
- What CLARIN is NOT about
- How we work
- Funding
- Structure
- To conclude
CLARIN - Riga 03-11-2008

The problem
- Much data in digital archives is language based
- Only known to insiders
- Archives mostly unconnected
- Every archive has its own standards for storage and access
- Normally only simple retrieval of files (text, audio or video documents)
- Social sciences and humanities researchers are not language or speech technologists
- They are often not aware of the potential benefits of using language and speech technology
- Available tools are hard to use for non-specialists

The CLARIN Mission
- What: Create an infrastructure that makes language resources and technology (LRT) available to scholars of all disciplines, especially social sciences and humanities (SSH)
- How:
  - Unite existing digital archives into a federation of archives with unified web access
  - Provide existing language and speech technology tools as web services operating on language data in archives

Why a European infrastructure?
- too much fragmentation
- lack of coordination across countries
- lack of visibility
- lack of interoperability
- lack of sustainability
- expertise exists but not in all countries
- language independent tools can be shared
- language dependent tools can often be ported
- most countries not able to bear the cost

Why now?
- Exponential growth of digital data
- Increasing maturity of language and speech technology: high speed, large volumes, new research questions
- Growing interest at EU level in research infrastructures (RI)
- RI Roadmap published in 2006 by ESFRI includes 35 accepted proposals for RIs
- CLARIN is one of them
- all of them will get funding for a 1-3 year preparatory phase

Who we are and where we come from
- The CLARIN consortium now has 32 partners from 22 EU and associated countries (and more on the waiting list)
- The CLARIN community has 142 members in 32 countries (Oct 2008)
- CLARIN is based on 4 earlier initiatives with many participants: LangWeb, EARL, TELRI and (later) DAM-LR

Who else do we need?
- Both our membership and our consortium are quite unbalanced:
  - Speech & multimodality underrepresented
  - Humanities other than linguistics underrepresented
  - Social sciences underrepresented
  - Some countries still missing
- There is no money to extend the consortium but we have to fill these gaps

Overall plan for CLARIN
- Preparatory phase, 2008-2010: put everything in place
- Construction phase, 2011-2015: build and populate with tools and resources
- Exploitation phase, 2016-…: CLARIN in full service
- Budget for the preparatory phase: 4.1 M€ from EC, ??? from countries
- Estimated budget until 2020: ca 200 M€

4-dimensional approach in the preparatory phase
- First 3 years dedicated to the design:
  - The technical dimension
  - The language dimension
  - The user dimension
  - The governance and legal dimension

Technical
- Technical specification of the infrastructure
- Construction of a prototype
- Validation on a rich variety of languages (>20), resources and services
- Federation of existing archives
- Based on existing resources and tools
- Strong focus on interoperability standards
- Conversion of existing resources
- Encapsulation of existing tools

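As an illustration only, the ideas of federating existing archives and encapsulating existing tools can be sketched in a few lines of Python. Nothing below comes from the CLARIN specification: the `Record` schema, the two adapter classes, `federated_search` and `as_service` are all hypothetical names chosen for this sketch. Each archive keeps its own storage format, and a thin adapter maps it onto one shared interface.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Protocol


@dataclass(frozen=True)
class Record:
    """Common metadata record shared by the federation (hypothetical schema)."""
    archive: str
    identifier: str
    language: str
    title: str


class Archive(Protocol):
    """Uniform interface that every adapter exposes to the federation."""
    def search(self, language: str) -> Iterable[Record]: ...


class DictArchive:
    """Adapter for a hypothetical archive that stores items as dictionaries."""
    def __init__(self, name: str, items: list[dict]):
        self.name, self.items = name, items

    def search(self, language: str) -> Iterable[Record]:
        for item in self.items:
            if item["lang"] == language:
                yield Record(self.name, item["id"], item["lang"], item["title"])


class TupleArchive:
    """Adapter for a hypothetical archive that stores (id, title, language) tuples."""
    def __init__(self, name: str, rows: list[tuple[str, str, str]]):
        self.name, self.rows = name, rows

    def search(self, language: str) -> Iterable[Record]:
        for identifier, title, lang in self.rows:
            if lang == language:
                yield Record(self.name, identifier, lang, title)


def federated_search(archives: Iterable[Archive], language: str) -> list[Record]:
    """Query every archive through the same interface and merge the results."""
    return [record for archive in archives for record in archive.search(language)]


def as_service(tool: Callable[[str], list[str]]) -> Callable[[Record], dict]:
    """Encapsulate an existing tool so that it operates on federation records."""
    def service(record: Record) -> dict:
        return {"source": record.identifier, "output": tool(record.title)}
    return service


# Two archives with different internal formats, queried in one call.
archives = [
    DictArchive("archive_a", [{"id": "a1", "lang": "lv", "title": "Riga speech corpus"}]),
    TupleArchive("archive_b", [("b7", "Dutch news texts", "nl"),
                               ("b8", "Latvian folk songs", "lv")]),
]
hits = federated_search(archives, "lv")

# An existing tool (here simply str.split as a stand-in tokenizer) wrapped as a service.
tokenize = as_service(str.split)
results = [tokenize(hit) for hit in hits]
```

The design choice shown is that existing archives and tools stay as they are; only a small adapter or wrapper is written per archive or tool, which is one way to read "conversion" and "encapsulation" on this slide.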
Languages
- Cover all languages spoken or studied in participating countries
- Representational and descriptive standards should be adequate and validated for all languages
- Same minimal coverage of basic resources and tools for all languages
- BLARK (Basic Language Resources Toolkit) to be defined and implemented (funds from other sources needed)

Language activities
- Survey of resources and tools, including encoding and annotation, data quality indicators, taxonomies and ontologies
- agreeing on common standards
- Focus on integration of tools, interoperability, usage scenarios
- creating missing essential resources
- validating specifications and prototype

User
- Users are SSH scholars (including linguists, translation experts)
- Do WE know what they need?
- Do THEY know what they need?
- Actions:
  - analyze past and ongoing SSH projects
  - user consultation
  - launch typical example projects to show potential
  - expertise centers
  - awareness actions

Legal
- IPR issues: aim at open source, but IPR for existing and future non-open resources must be accommodated
- federation of archives requires authentication, authorization and trust between archives
- aim at limited number of template license agreements for most common cases
- respect national legislation
- address ethical issues

Governance and Funding
- Agree on e.g.:
  - Who is going to pay for the construction and exploitation of the infrastructure
  - How will it be managed
  - How will it be coordinated with national policies
- Actions:
  - Analyse best practice in funding and management of transnational projects
  - Prepare agreement between (now) 22 countries about long term joint funding of CLARIN

What CLARIN is NOT about
- building the infrastructure: we are just preparing it
- creating new resources: at this stage we want to use what is there and adapt it if necessary
- creating new applications: except maybe some essential tools or demonstrators
- focusing on the big languages: we find all languages equally important
- strengthening European industry: our target audience is SSH researchers, but we don't want to exclude anyone

How we work (1)
Work packages:
- WP1: Management and coordination
- WP2: Designing the infrastructure and building the prototype
- WP3: Humanities overview
- WP5: Language resources and technology overview
- WP6: Dissemination
- WP7: IPR and business models
- WP8: Construction and exploitation agreement

How we work (2)
[Diagram of the work packages: WP2 Infrastructure Prototype; WP3 Humanities Projects; WP5 LRT Exploration; WP7 IPR, A&A, licensing; WP8 Org & Legal Framework]

How we work (3)
- Most tasks executed in Working Groups
- WGs consist of project partners & other experts (CLARIN is open!)
- Some WGs do work (e.g. build prototype), others create consensus
- Participation by others essential, as e.g. standards cannot be imposed by a small group
- Unfortunately no EC funding available for WG participation: only reward is influence!

Funding & what to use it for
- From EC: 4.1 M€, used for generic, language independent tasks
- From countries: ??? M€, to be used for preparing CLARIN at the national level in every country:
  - build and organize local national CLARIN community
  - support for participation in working groups (e.g. travel)
  - validation tasks for own language(s)
  - creation or adaptation of essential resources
  - pilots and demonstrators & humanities projects
  - (co-)organisation of local or international events
  - preparing for future role (expertise centers, repositories)

Structure
- Executive Board, consisting of the 7 WP leaders plus a special representative to liaise with the humanities community (among others through the DARIAH sister project)
- Boards: Scientific Board, Strategic Coordination Board, International Advisory Board
- Meetings (virtual or face to face): Consortium meetings, Member meetings, Working group meetings

More info
- CLARIN Website: http://www.clarin.eu
- CLARIN Office: clarin@clarin.eu
- CLARIN Newsletter: http://www.clarin.eu/newsletter
- CLARIN Members: http://www.clarin.eu/members