Populating the Infrastructure using Standards
Daan Broeder, CLARIN NL EB, TLA - MPI for Psycholinguistics
CLARIN Coordinators Meeting, June 29-30, Budapest
CLARIN NL Context
4 Dutch CLARIN centers, each with their own interests and traditions
• DANS, Dutch Academy data archiving service
• INL, Dutch Institute for Lexicography
• Meertens Institute, Dutch dialects and language variation
• MPI for Psycholinguistics, endangered languages, acquisition corpora
• Different cross-center relations
  • Organizational relations
  • Past and existing project cooperation
• These can all lead to different preferences for technical solutions, interoperability approaches and data formats
• All have production environments that need to deliver services, so they tend to be conservative with changes
  • New technology needs to be understood first, and usually parallel systems are created
  • General adaptations for CLARIN requirements can only be introduced slowly
• Although centers made commitments, resources are limited.
CLARIN NL Goals
• Build and support relevant central infrastructure services
• Guide harmonizing the relevant practices and systems at the centers by long-term funded projects
  • Accept and deliver CLARIN metadata (CMDI) for LRT resources
  • Use PIDs to identify resources
  • Federated Identity management as an AAI solution
  • Use CLARIN recommended formats…
• Connect these to the Dutch LRT research world
  • Offering access to resources and technology
  • Offering infrastructure services: e.g. catalog of LRs
  • Run LT services as standardized web-services
• Therefore:
  • infrastructure projects for and by the centers
  • small short-term projects cross-linking research groups with CLARIN centers
Infrastructure Projects
• Creating and testing CLARIN metadata components
  • Two major Dutch Language Resource centers testing CMDI for their resources
• Infrastructure Integration Project
  • Building & maintaining registries:
    • ISO-Cat, REL-Cat
    • CMDI Component registry, ARBIL metadata editor
  • Planning and supporting the AAI for the CLARIN centers and user organizations
  • For format & tag set standards we look to CLARIN EU documentation, but ..
    • Archivable format + installed base = ok
    • Should be reluctant to adopt new formats
• Search and Development
  • Federated content search for the CLARIN centers
    • In cooperation with the CLARIN EU EDC initiative
    • We find we have to extend the SRU/CQL standard
  • CLAVAS, CLARIN Vocabulary Service
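As a rough illustration of what a federated content search over SRU/CQL involves, the sketch below builds one standard SRU `searchRetrieve` request per center from a single CQL query. The endpoint URLs are hypothetical placeholders, not real CLARIN center addresses, and the sketch uses only the basic SRU 1.2 request parameters; the extensions to SRU/CQL mentioned on this slide are not shown.

```python
from urllib.parse import urlencode

# Hypothetical endpoints, for illustration only (not real center addresses).
ENDPOINTS = [
    "https://center-a.example.org/sru",
    "https://center-b.example.org/sru",
]


def build_sru_url(endpoint: str, cql_query: str, max_records: int = 10) -> str:
    """Build an SRU 1.2 searchRetrieve URL carrying a CQL query."""
    params = {
        "operation": "searchRetrieve",   # standard SRU operation name
        "version": "1.2",                # SRU protocol version
        "query": cql_query,              # the CQL query string
        "maximumRecords": str(max_records),
    }
    return f"{endpoint}?{urlencode(params)}"


def federate(cql_query: str) -> list[str]:
    """Fan the same CQL query out to every endpoint (URLs only, no network I/O)."""
    return [build_sru_url(endpoint, cql_query) for endpoint in ENDPOINTS]


urls = federate('"language variation"')
for url in urls:
    print(url)
```

The sketch stops at URL construction on purpose: sending the requests and merging the XML responses depends on each center's setup.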
CLARIN standards info
• CLARIN EU website. The CLARIN EU FAQ has a few standard recommendations and a CLARIN Standardization Action Plan. There was some criticism about the ‘too theoretical’ content of this document.
• CLARIN short guide: http://www.clarin.eu/files/standards-CLARIN-ShortGuide.pdf. The references in this document are out of date.
• The CLARIN EU standardization action plan, http://www.clarin.eu/node/2841, also has a list of recommended standards and best practices, and points to open issues and the CLARIN position.
• CLARIN official documents: there is a document with a very large enumeration of LR&T standards and best practices, but it contains no specific recommendations: http://www-sk.let.uu.nl/u/D5C-3.pdf
• CLARIN NL Helpdesk has a FAQ with a standards section that references known CLARIN docs: http://trac.clarin.nl/trac/wiki/WikiStart#Formatsandstandards
CLARIN Standards for LRT v6
Standards for LRT V6-3.pdf (http://www.clarin.eu/system/files/Standards%20for%20LRT-v6.pdf): Marc Kemps-Snijders, Núria Bel, Peter Wittenburg, Daan Broeder, Dieter van Uytvanck (CLARIN), Laurent Romary (ISO-TC37, TEI), Erhard Hinrichs (CLARIN) and Gerhard Budin (Flarenet) – January 2009
• Each known standard or best-practice guideline is rated on a few criteria:
  • Standard: whether it is a standard (++), a best practice in the field (+) or simply known (0)
  • State: proven (++), ready (+) or in progress (0)
  • Pivot: whether the guideline is meant as a pivot mechanism
  • Advice: whether usage in CLARIN should be obligatory (++), recommended (+) or whether CLARIN is neutral (0)
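The rating scheme above could be captured as a small record type, which is roughly what a registry entry would need to hold. This is a minimal sketch under assumptions: the class and field names are invented here, the pivot criterion is modelled as a plain yes/no value, and the example entry and its ratings are made up and not taken from the Standards for LRT document.

```python
from dataclasses import dataclass

# Rating symbols and their meanings, as listed for each criterion.
STANDARD_LEVEL = {"++": "standard", "+": "best practice", "0": "known"}
STATE = {"++": "proven", "+": "ready", "0": "in progress"}
USAGE_ADVICE = {"++": "obligatory", "+": "recommended", "0": "neutral"}


@dataclass(frozen=True)
class StandardEntry:
    """One rated standard or guideline (hypothetical structure)."""
    name: str
    standard: str   # "++", "+" or "0"
    state: str      # "++", "+" or "0"
    pivot: bool     # assumption: pivot is a simple yes/no flag
    advice: str     # "++", "+" or "0"

    def __post_init__(self):
        # Reject any symbol outside the three-step scale.
        for field_name, scale in (
            ("standard", STANDARD_LEVEL),
            ("state", STATE),
            ("advice", USAGE_ADVICE),
        ):
            value = getattr(self, field_name)
            if value not in scale:
                raise ValueError(f"{field_name} must be '++', '+' or '0', got {value!r}")

    def describe(self) -> str:
        pivot = "pivot" if self.pivot else "not a pivot"
        return (
            f"{self.name}: {STANDARD_LEVEL[self.standard]}, {STATE[self.state]}, "
            f"{pivot}, {USAGE_ADVICE[self.advice]}"
        )


# Made-up entry for illustration; the ratings are not from the source document.
example = StandardEntry("ExampleFormat", standard="+", state="++", pivot=False, advice="+")
print(example.describe())
```

Keeping the symbols as the stored values means entries can be copied straight from the rated list, while `describe()` spells them out for readers.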
Recommendations
• Create a CLARIN EU standards registry in the form used in the “Standards for LRT” document
• Set up a governance structure
  • With adequate representation of:
    • National CLARIN partners
    • Kindred organizations & projects such as DARIAH, Flarenet, ISO-TC37
  • But with emphasis on practicality
• Create additional documentation, such as recipe books, to support further uptake and application.
Thank you for your attention
CLARIN has received funding from the European Community's Seventh Framework Programme under grant agreement n° 212230