This presentation discusses the practices and systems of different Dutch CLARIN centers and their preferences for technical solutions and data formats. It also highlights the goals and projects of CLARIN.NL in building and supporting relevant infrastructure services.
Populating the Infrastructure using Standards
Daan Broeder, CLARIN NL EB, TLA - MPI for Psycholinguistics
CLARIN Coordinators Meeting, June 29-30, Budapest
CLARIN NL Context
• Four Dutch CLARIN centers, each with their own interests and traditions
  • DANS, Dutch Academy data archiving service
  • INL, Dutch Institute for Lexicography
  • Meertens Institute, Dutch dialects and language variation
  • MPI for Psycholinguistics, endangered languages, acquisition corpora
• Different cross-center relations
  • Organizational relations
  • Past and existing project cooperation
• These can all lead to different preferences for technical solutions, interoperability approaches and data formats
• All have production environments that need to deliver services, so they tend to be conservative with changes
• New technology needs to be understood first, and usually parallel systems are created
• General adaptations for CLARIN requirements can only be introduced slowly
• Although the centers made commitments, resources are limited
CLARIN NL Goals
• Build and support relevant central infrastructure services
• Guide harmonizing the relevant practices and systems at the centers by long-term funded projects
  • Accept and deliver CLARIN metadata (CMDI) for LRT resources
  • Use PIDs to identify resources
  • Federated identity management as an AAI solution
  • Use CLARIN recommended formats…
• Connect these to the Dutch LRT research world
  • Offering access to resources and technology
  • Offering infrastructure services, e.g. a catalog of LRs
  • Run LT services as standardized web services
• Therefore:
  • infrastructure projects for and by the centers
  • small short-term projects cross-linking research groups with CLARIN centers
Infrastructure Projects
• Creating and testing CLARIN metadata components
  • Two major Dutch language resource centers testing CMDI for their resources
• Infrastructure Integration Project
  • Building & maintaining registries:
    • ISOcat, RELcat
    • CMDI Component Registry, ARBIL metadata editor
  • Planning and supporting the AAI for the CLARIN centers and user organizations
  • For format & tag set standards we look to CLARIN EU documentation, but ...
    • Archivable format + installed base = ok
    • We should be reluctant to adopt new formats
• Search and Development
  • Federated content search for the CLARIN centers
    • In cooperation with the CLARIN EU EDC initiative
    • We find that we have to extend the SRU/CQL standard
  • CLAVAS, CLARIN Vocabulary Service
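For readers unfamiliar with SRU/CQL, the federated content search bullet above can be pictured as sending the same query to each center's endpoint. The following is a minimal sketch of a plain SRU searchRetrieve request, assuming SRU version 1.2 and a hypothetical endpoint URL (`https://example.org/sru`); it does not show the extensions the slide says are needed, and it is not taken from any CLARIN implementation.

```python
from urllib.parse import urlencode, urlparse, parse_qs


def build_sru_url(base_url, cql_query, max_records=10, version="1.2"):
    """Build a standard SRU searchRetrieve URL for a CQL query."""
    params = {
        "operation": "searchRetrieve",  # standard SRU operation name
        "version": version,             # assumption: SRU 1.2
        "query": cql_query,             # the CQL query string
        "maximumRecords": str(max_records),
    }
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint and query, for illustration only.
url = build_sru_url("https://example.org/sru", 'dialect and "language variation"')
print(url)

# Round trip: decoding the URL recovers the original CQL query.
decoded = parse_qs(urlparse(url).query)
print(decoded["query"][0])
```

A federated search client would, under the same assumptions, call `build_sru_url` once per center endpoint and merge the responses.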
CLARIN standards info
• CLARIN EU website: the CLARIN EU FAQ has a few standard recommendations and a CLARIN Standardization Action Plan. There was some criticism of the ‘too theoretical’ content of this document.
• CLARIN short guide: http://www.clarin.eu/files/standards-CLARIN-ShortGuide.pdf (the references in this document are out of date)
• The CLARIN EU standardization action plan (http://www.clarin.eu/node/2841) also has a list of recommended standards and best practices and points to open issues and the CLARIN position.
• CLARIN official documents: there is a document with a very large enumeration of LR&T standards and best practices, but it contains no specific recommendations: http://www-sk.let.uu.nl/u/D5C-3.pdf
• CLARIN NL Helpdesk has a FAQ with a standards section that references known CLARIN docs: http://trac.clarin.nl/trac/wiki/WikiStart#Formatsandstandards
CLARIN Standards for LRT v6
Standards for LRT V6-3.pdf (http://www.clarin.eu/system/files/Standards%20for%20LRT-v6.pdf): Marc Kemps-Snijders, Núria Bel, Peter Wittenburg, Daan Broeder, Dieter van Uytvanck (CLARIN), Laurent Romary (ISO TC37, TEI), Erhard Hinrichs (CLARIN) and Gerhard Budin (Flarenet), January 2009
• Each known standard or best-practice guideline is rated on a few criteria:
  • Standard indicates whether it is a standard (++), a best practice in the field (+) or simply known (0)
  • State indicates its state: proven (++), ready (+) or in progress (0)
  • Pivot indicates whether the guideline is meant as a pivot mechanism
  • Advice indicates whether its usage in CLARIN should be obligatory (++), recommended (+) or whether CLARIN is neutral (0)
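The four rating criteria above map naturally onto a small record type. The following is a minimal sketch of how one entry in such a list could be modelled; the class name `RegistryEntry`, the field names, the boolean encoding of the pivot criterion, and the sample entry `ExampleFormat` with its ratings are all assumptions for illustration and are not taken from the "Standards for LRT" document.

```python
from dataclasses import dataclass

# Rating scales as listed on the slide.
STANDARD = {"++": "standard", "+": "best practice", "0": "known"}
STATE = {"++": "proven", "+": "ready", "0": "in progress"}
ADVICE = {"++": "obligatory", "+": "recommended", "0": "neutral"}


@dataclass(frozen=True)
class RegistryEntry:
    """One rated standard or guideline (hypothetical structure)."""
    name: str
    standard: str  # "++", "+" or "0"
    state: str     # "++", "+" or "0"
    pivot: bool    # assumption: pivot is a yes/no property
    advice: str    # "++", "+" or "0"

    def __post_init__(self):
        # Reject ratings outside the three-level scales.
        checks = (("standard", STANDARD), ("state", STATE), ("advice", ADVICE))
        for field_name, scale in checks:
            value = getattr(self, field_name)
            if value not in scale:
                raise ValueError(f"{field_name} must be one of {sorted(scale)}, got {value!r}")

    def describe(self):
        """Spell out the symbolic ratings in words."""
        pivot = "yes" if self.pivot else "no"
        return (f"{self.name}: {STANDARD[self.standard]}, {STATE[self.state]}, "
                f"pivot={pivot}, CLARIN advice: {ADVICE[self.advice]}")


# Made-up entry; the ratings are not taken from the actual document.
entry = RegistryEntry("ExampleFormat", standard="+", state="++", pivot=False, advice="+")
print(entry.describe())
```

Validating the ratings at construction time keeps a registry built from such entries consistent with the three-level scales.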
Recommendations
• Create a CLARIN EU standards registry in the form used in the “Standards for LRT” doc
• Set up a governance structure
  • With adequate representation of
    • the national CLARIN partners
    • kindred organizations & projects such as DARIAH, Flarenet, ISO TC37
  • But with emphasis on practicality
• Create additional documentation, such as recipe books, to support further uptake and application
Thank you for your attention
CLARIN has received funding from the European Community's Seventh Framework Programme under grant agreement n° 212230