
CLARIN: Common Language Resources and Technology Infrastructure for the Social Sciences and Humanities

Steven Krauwer, Utrecht Institute of Linguistics UiL-OTS (NL), INFuture, Zagreb, Nov 7 2007


Presentation Transcript


  1. CLARIN: Common Language Resources and Technology Infrastructure for the Social Sciences and Humanities. Steven Krauwer, Utrecht Institute of Linguistics UiL-OTS (NL). INFuture, Zagreb, Nov 7 2007

  2. Overview
     - Problem & mission
     - Some why-questions
     - Approach
     - How we work and who we are
     - Why this talk
     - Summing up
     INFuture 2007, Zagreb

  3. The problem
     - Much data in digital archives is language-based
     - Many archives are known only to local insiders and are mostly unconnected
     - Every archive has its own standards for storage and access; normally only simple retrieval of files (text, audio or video documents) is possible
     - Social sciences and humanities researchers are often not aware of the potential benefits of using language and speech technology tools, and these tools are hard to use for non-specialists

  4. The CLARIN mission
     - What: create an infrastructure that makes language resources and technology (LRT) available to scholars of all disciplines, especially the social sciences and humanities (SSH)
     - How:
       - unite existing digital archives into a federation of connected archives with unified web access
       - provide language and speech technology tools as web services operating on language data in the archives

  5. Why a European infrastructure?
     - Too much fragmentation
     - Lack of coordination
     - Lack of visibility
     - Lack of interoperability
     - Lack of sustainability
     - Expertise exists, but not in all countries
     - Language-independent tools can be shared
     - Language-dependent tools can often be ported
     - Most countries are not able to bear the cost

  6. Why now?
     - Exponential growth of digital data
     - Maturity of language and speech technology, which:
       - allows for high-speed processing
       - allows for large volumes
       - allows for new research questions
     - Growing interest at EU level in research infrastructures (RIs) for the European Research Area (ERA)
     - The ESFRI RI Roadmap, published in 2006, includes 34 proposals for RIs; all of them will get EC funding for a 1-3 year preparatory phase

  7. Overall plan for CLARIN
     - Preparatory phase 2008-2010: put everything in place to get started for real
       - build a prototype
       - budget in the preparatory phase: 4.1 M€ from the EC, ??? M€ from participating countries
     - Construction phase 2011-2015: build and populate with tools and resources
     - Exploitation phase 2016 - …: CLARIN in full service
     - Overall budget 2008-2020: ca. 200 M€

  8. 4-dimensional approach for the preparatory phase
     - The technical dimension
     - The language dimension
     - The user dimension
     - The governance and legal dimension

  9. Technical
     - Technical specification of the infrastructure
     - Construction of a prototype
     - Validation on a rich variety of languages (>20), resources and services
     - Based on existing resources and tools (i.e. not a digitization or tool creation project)
     - Strong focus on interoperability standards
     - Conversion of existing resources
     - Encapsulation of existing tools

  10. Strong sustainable centers

  11. Languages
     - Intention to cover all languages spoken or studied in the participating countries
     - Representational and descriptive standards should be adequate and validated for all languages
     - The same minimal coverage of basic resources and tools for all languages is to be defined (and implemented if additional funds are available)

  12. Language activities
     - Survey of resources and tools, including:
       - encoding and annotation
       - data quality indicators
     - Agreeing on taxonomies and ontologies
     - Agreeing on common standards
     - Focus on integration of tools, interoperability and usage scenarios
     - If possible, creation of missing essential resources
     - Validating specifications and prototype

  13. User
     - Users are SSH scholars
     - Do WE know what they need?
     - Do THEY know what they need?
     - Actions:
       - analyze past and ongoing SSH projects
       - user consultation
       - launch typical example projects to show potential
       - create expertise centers
       - awareness actions

  14. Governance, funding and legal issues
     - Agree on, e.g.:
       - who is going to pay for the construction and exploitation of the infrastructure
       - how the costs will be shared
       - how it will be managed
       - how it will be coordinated with national policies
     - Actions:
       - analyse best practice in funding and management of transnational projects
       - prepare an agreement between (now) 22 countries about long-term joint funding of CLARIN
       - set up an IPR framework

  15. How we work
     - Most tasks are executed in Working Groups (WGs)
     - WGs consist of project partners and other experts (CLARIN is open for contributions by others!)
     - Some WGs do work (e.g. build the prototype), others create consensus
     - Participation by others is essential, as e.g. standards cannot be imposed by a small group
     - Unfortunately no funding is available for WG participation by others: only influence!

  16. Who we are
     - The CLARIN consortium has 32 partners from 22 EU and associated countries, including Croatia (FFZG)
     - The CLARIN community has 92 members in 32 countries (Nov 07)
     - Leading partners are:
       - Utrecht University (Steven Krauwer, coordinator)
       - Max Planck Institute Nijmegen (Peter Wittenburg)
       - Hungarian Academy of Sciences (Tamas Varadi)

  17. National vs EC funding
     - EC funds, managed by the consortium, will pay for:
       - generic tasks (e.g. research, prototyping, coordination, dissemination)
       - participation by a single national coordination point in every country (in HR: FFZG Zagreb)
     - National funds, to be managed nationally, will pay for:
       - participation by other sites in the country
       - taking care of their own language and priorities (standards and validation, adaptation of tools and resources)
       - carrying out example humanities projects (hopefully)
       - participating in Working Groups

  18. Why this talk?
     - Invitation to join CLARIN:
       - We need user involvement
       - We need archives willing to join the federation
       - We need experts for our centers of expertise
       - We need example humanities projects for the preparatory phase

  19. Summing up (1)
     - CLARIN is about to embark on its 3-year Preparatory Phase project, aimed at designing and building an LRT infrastructure for the SSH
     - It can only work with support from the whole SSH community, both inside and outside the EU
     - Please join us if you feel you can and want to contribute. We don’t pay you, but we don’t charge you either: it’s free!
     - Contact: http://www.clarin.eu, steven.krauwer@let.uu.nl or your national contact point

  20. Summing up (2)
     - One day any SSH scholar should be able to ask without any difficulty:
       - “List all uses of enthusiasm in 19th century English novels written by women”
       - “Find all video clips of Tony Blair on BBC in 2007”
       - “Summarize Le Monde of October 7th 2007 – in Croatian”
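The mission on slide 4 (a federation of existing archives behind unified web access, with language technology tools offered as services over the archived data) and the kind of query imagined on slide 20 can be illustrated with a minimal sketch. It assumes a hypothetical common metadata record and one adapter per archive; the archive contents, field names, and the `federated_search` and `tokenize_title` functions are all invented for illustration and do not describe CLARIN's actual design.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Record:
    """Hypothetical common metadata record shared by all federated archives."""
    archive: str
    doc_id: str
    language: str
    media: str  # "text", "audio" or "video"
    title: str


# Hypothetical member archives, each keeping its own local field names,
# echoing slide 3: every archive has its own storage conventions.
ARCHIVE_A = [
    {"id": "a1", "lang": "hr", "type": "text", "name": "Newspaper issue 1901"},
    {"id": "a2", "lang": "hr", "type": "audio", "name": "Dialect interview"},
]
ARCHIVE_B = [
    {"key": "b7", "language_code": "en", "kind": "video", "label": "TV news clip 2007"},
]


def adapt_a(raw: dict) -> Record:
    """Map archive A's local format onto the common record."""
    return Record("A", raw["id"], raw["lang"], raw["type"], raw["name"])


def adapt_b(raw: dict) -> Record:
    """Map archive B's local format onto the common record."""
    return Record("B", raw["key"], raw["language_code"], raw["kind"], raw["label"])


# The federation: each member contributes its records plus an adapter.
FEDERATION: Dict[str, Tuple[List[dict], Callable[[dict], Record]]] = {
    "A": (ARCHIVE_A, adapt_a),
    "B": (ARCHIVE_B, adapt_b),
}


def federated_search(predicate: Callable[[Record], bool]) -> List[Record]:
    """Unified access point: run one query against every member archive."""
    hits: List[Record] = []
    for raw_records, adapt in FEDERATION.values():
        for raw in raw_records:
            record = adapt(raw)
            if predicate(record):
                hits.append(record)
    return hits


def tokenize_title(record: Record) -> List[str]:
    """Toy stand-in for a language technology tool offered as a service."""
    return record.title.lower().split()


if __name__ == "__main__":
    videos = federated_search(lambda r: r.media == "video")
    print([r.doc_id for r in videos])  # ['b7']
    print(tokenize_title(videos[0]))   # ['tv', 'news', 'clip', '2007']
```

Keeping each archive's local format and adding a thin adapter mirrors the slide's point that existing archives are united rather than replaced; a real infrastructure would expose the same idea over the web rather than as in-process Python calls.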
