„Metadata“: The DRIVER experience and the OpenAIRE direction Metadata-Workshop | Nijmegen | 7/8-SEP-2010 | Wolfram Horstmann
The metadata scope of this talk • Metadata is a multifaceted thing and you can do many beautiful things with it… • Focus in DRIVER and OpenAIRE • Metadata for research publications, but also administrative data, authority files, terminologies etc. • Format: Simple DC but also DIDL, OAI-ORE, RDF… • Protocol: OAI-PMH but also feeds, syncs… • Function: aggregation & search but also deploy, mine…
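As a rough illustration of the OAI-PMH plus Simple DC combination named above, the following Python sketch builds a ListRecords request and reads the Dublin Core fields from a response. The base URL, the sample record, and the function names are made up for illustration; only the verb, the metadataPrefix, and the XML namespaces come from the OAI-PMH and Dublin Core specifications. Resumption tokens and error responses are left out.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical, hand-written response used only to exercise the parser.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repo.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example title</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
          <dc:creator>Roe, Richard</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
""".strip()


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL for one repository."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Return the header identifier and Simple DC fields of each record."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        identifier = rec.findtext(
            "oai:header/oai:identifier", default="", namespaces=NS
        )
        fields = {}
        # Simple DC elements are repeatable, so every field is a list.
        for el in rec.iterfind("oai:metadata/oai_dc:dc/*", NS):
            name = el.tag.split("}", 1)[1]  # drop the "{namespace}" part
            fields.setdefault(name, []).append((el.text or "").strip())
        records.append({"identifier": identifier, "dc": fields})
    return records
```

Calling `parse_records(SAMPLE_RESPONSE)` yields one record whose `dc` dictionary holds a `title` list and a two-item `creator` list. A real harvester would fetch `list_records_url(...)` over HTTP and keep following resumption tokens.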
A coarse genealogy [Timeline figure, 2007 to 2011, with the milestones Portal, D-NET v1.0 and D-NET v1.2]
The Beginnings & Essentials • Since 2004, originally a service for researchers at Bielefeld University for finding documents in repositories distributed across the globe • In the meantime used world-wide • Indexing >25 million docs from >1500 sources • Simple, pragmatic, informal and independent; minimal effort but high reliability and value • Mostly OAI-PMH > Synergies with DRIVER • Now work on thesauri, mining, syncing etc.
Lessons learnt • OAI-PMH/Simple DC allows an effective search engine with immediate added value • Many years of operation show that even simple, distributed approaches require a lot of care and patience • Heterogeneity of distributed resources introduces ambiguity and requires service-side effort • Over 1000 profiles and processing pipelines for sources • Negative effects attenuated by display for humans • „users know what they see, when they see it“ • Main drawbacks • Local data quality • Missing sharing and re-use between service providers • „Repository Infrastructure“ needed
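The "profiles and processing pipelines for sources" can be pictured as per-repository mapping rules applied by the service provider after harvesting. The sketch below is only an assumption about what such a profile might contain; the class, the field names and the mapping tables are hypothetical and do not describe the actual pipelines.

```python
from dataclasses import dataclass, field


@dataclass
class SourceProfile:
    """Hypothetical per-repository rules applied on the service side."""
    source_id: str
    language_map: dict = field(default_factory=dict)  # local value -> common code
    type_map: dict = field(default_factory=dict)      # local label -> common type
    default_language: str = "und"                     # used when no language is given


def normalise(record, profile):
    """Return a copy of a Simple DC record with source quirks ironed out.

    `record` maps DC element names to lists of strings.
    """
    out = dict(record)
    langs = record.get("language") or [profile.default_language]
    out["language"] = [
        profile.language_map.get(v.strip().lower(), v.strip().lower())
        for v in langs
    ]
    out["type"] = [
        profile.type_map.get(v.strip().lower(), v.strip())
        for v in record.get("type", [])
    ]
    out["source"] = profile.source_id  # remember where the record came from
    return out


# Example: one repository that labels languages and types in its own way.
profile = SourceProfile(
    source_id="repo-a",
    language_map={"ger": "de", "deutsch": "de"},
    type_map={"doctoralthesis": "doctoral thesis"},
)
raw = {"title": ["X"], "language": ["Deutsch "], "type": ["DoctoralThesis"]}
clean = normalise(raw, profile)
```

With more than 1000 sources, each repository would need its own `SourceProfile`, which is one way to read the remark that simple approaches still require a lot of care.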
DRIVER Objectives: Infrastructure! • Organisational structures for repositories • e.g. the „Confederation“ • Improving quality and standards in local repositories • e.g. guidelines and validation procedures • Building a distributed infrastructure for metadata • e.g. service and function sharing • Target Groups • Repository Managers • Service Providers • Information System Executives
What infrastructures are: DRIVER terms • Not an infrastructure • Single repository • Single application for search and retrieval (e.g. BASE) • Only local operation • Backwards causation on repositories is missing • Maybe an infrastructure • Distributed repository landscape as a whole • As a capacity for emergent properties, e.g. quality and quantity incentive for data population • Nurturing development of service providers • Definitely an infrastructure • Many service providers in one organisational and technical context (e.g. run-time environment) • Enabling re-use and remix of data and services
The DRIVER approach was incremental • Start with publication metadata • Existing distributed system, somehow connected • Considerable homogeneity and formats: OAI-PMH • Extend geographical coverage • From 5 countries, to 10, to 27, to ??? • Extend towards other contents • From publication metadata to enhanced publications, i.e. representations of „texts + data“ • Learn about subject specificity • Data bring in disciplinary requirements
The DRIVER Initiative • DRIVER-I 6/2006 – 11/2007 • Organisational Models and Technical Test-Bed • DRIVER-II 12/2007 – 11/2009 • Running Organisation and Production Infrastructure • DRIVER-Confederation and Technical Service 2010ff • Organisation and Technical Deployment
Some Results: Guidelines • Build on knowledge from past & current IR projects (EU) • 26 actively involved contributors (experts and repository managers) from 8 countries. • Practical answers on how to: • Improve full-text access • Standardize metadata quality • Create a reliable infrastructure for permanent identification, resolution, traceability and storage • Resolve semantic and classification issues
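To make "standardize metadata quality" concrete, a validation step can check each harvested record against a small rule set. The rules below (required elements, date pattern, http identifier) are illustrative assumptions chosen for this sketch and are not a rendering of the DRIVER Guidelines themselves.

```python
import re

# Illustrative rule set only; not taken from the DRIVER Guidelines.
REQUIRED = ("title", "creator", "date", "identifier")
DATE_RE = re.compile(r"^\d{4}(-\d{2}(-\d{2})?)?$")  # YYYY, YYYY-MM or YYYY-MM-DD


def validate(record):
    """Return a list of human-readable problems found in a Simple DC record.

    `record` maps DC element names to lists of strings; an empty list
    means the record passed every check in this sketch.
    """
    problems = []
    for name in REQUIRED:
        if not any(v.strip() for v in record.get(name, [])):
            problems.append(f"missing {name}")
    for value in record.get("date", []):
        if not DATE_RE.match(value.strip()):
            problems.append(f"date not in YYYY[-MM[-DD]] form: {value!r}")
    identifiers = record.get("identifier", [])
    if identifiers and not any(
        v.startswith(("http://", "https://")) for v in identifiers
    ):
        problems.append("no resolvable (http/https) identifier")
    return problems


good = {
    "title": ["T"],
    "creator": ["Doe, Jane"],
    "date": ["2010-09-07"],
    "identifier": ["https://repo.example.org/handle/1"],
}
bad = {"title": ["T"], "date": ["Sept. 2010"], "identifier": ["local-42"]}
```

Here `validate(good)` returns an empty list, while `validate(bad)` reports the missing creator, the free-text date and the non-resolvable identifier. Reporting problems back to repository managers is the kind of feedback loop the guidelines aim for.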
Some Results: Service-Oriented Architecture • 9 hosting nodes • 25+ functionality typologies (services) • 36 service instances • + other applications: Spain, Slovenia, EFG …
Some Results: Runtime-System & Hosting [Architecture diagram. Labels: End users, National portals, Advanced User Interfaces, Project Applications, Functionality Layer, Data Layer, EU Open Access Repositories, Enabling Layer, Administrators]
Some Results: Software • Meant for large service providers only!
Lessons learnt • Distributed data infrastructure requires links between organisational and technical concepts • Data specialists, computer scientists, service providers • Guidelines / content policies as a „glue“ • In distributed data provision, quality and access measures are the most ‚expensive‘ tasks • Infrastructure AND data focus very demanding • Distributed service operation (not data provision) can be solved but asks novel questions (SLAs) • Infrastructure is there, applications are next…
Metadata aspects in DRIVER • OAI-PMH/SimpleDC corroborated • Necessity for other extensions shown • Administrative (CRIS): ‚project‘, ‚funder‘ • Subject-specific: NLM, PACS etc. • Authority files: institutions, journals, authors… • Enhanced Publications = Text + Data • Aggregation-Encoding: DIDL, OAI-ORE • Introduce preservation-challenges • Necessity for different Service-Typology
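The idea "Enhanced Publications = Text + Data" with an aggregation encoding can be sketched as a small data structure that emits OAI-ORE style statements. The class and the example URIs are hypothetical; only the `describes` and `aggregates` terms are taken from the ORE vocabulary, and the output is a simplified list of triples rather than a conformant resource map serialisation.

```python
from dataclasses import dataclass, field

ORE = "http://www.openarchives.org/ore/terms/"  # OAI-ORE vocabulary namespace


@dataclass
class EnhancedPublication:
    """Hypothetical container for a text plus its related datasets."""
    aggregation_uri: str
    text_uri: str
    dataset_uris: list = field(default_factory=list)

    def triples(self, resource_map_uri):
        """Yield (subject, predicate, object) statements for the aggregation."""
        # The resource map describes the aggregation ...
        yield (resource_map_uri, ORE + "describes", self.aggregation_uri)
        # ... and the aggregation groups the text with its datasets.
        for part in [self.text_uri, *self.dataset_uris]:
            yield (self.aggregation_uri, ORE + "aggregates", part)


ep = EnhancedPublication(
    aggregation_uri="https://example.org/agg/1",
    text_uri="https://example.org/paper.pdf",
    dataset_uris=["https://example.org/data.csv"],
)
statements = list(ep.triples("https://example.org/rem/1"))
```

Each aggregated part keeps its own URI, which hints at why preservation becomes harder: text and data may live in different systems with different lifetimes.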
Primer
OpenAIRE Assignment • OpenAIRE: Open Access Infrastructure for Research in Europe • Objective: Support the Open Access Pilot of the EC & ERC (practical implementation of „clause 39“) - European Helpdesk: National Nodes - Repository Infrastructure: Deposit-Multiplexer - Research on Metadata, Impact & Disciplines
OpenAIRE - factsheet Open Access Infrastructure for Research in Europe • Programme: FP7 – Research Infrastructures • Starting date: December 1, 2009 • Duration: 36 months • Budget: 4.1 Million • 38 partners covering all European member-states • To be reached at www.openaire.eu
European Helpdesk • Promote FP7-pilot and ERC OA guidelines • National Open Access Liaison Offices (27 countries) • Provide OA “toolkits” for • Researchers • Institutions • Set up 24/7 portal for deposit and search of OA publications • Liaison with • Other European OA initiatives • Publishers • CRIS systems
Liaison Offices • Region 1 North (DTU): Denmark (Danish Technical University), Finland (University of Helsinki), Sweden (National Library of Sweden) • Region 2 South (UMINHO): Cyprus (University of Cyprus), Greece (National Documentation Center), Italy (CASPAR), Malta (Malta Council for Science & Technology), Portugal (University of Minho), Spain (Spanish Foundation for Science & Technology) • Region 3 East (eIFL): Bulgaria (Bulgarian Academy of Sciences), Czech Republic (Technical University of Ostrava), Estonia (University of Tartu), Hungary (HUNOR), Latvia (University of Latvia), Lithuania (Kaunas Technical University), Poland (ICM – University of Warsaw), Romania (Kosson), Slovakia (University Library of Bratislava), Slovenia (University of Ljubljana) • Region 4 West (UGENT): Austria (University of Vienna), Belgium (University of Gent), France (Couperin), Germany (University of Konstanz), Ireland (Trinity College), Netherlands (Utrecht University), UK (SHERPA)
Supporting Repository Infrastructure • OpenAIRE portal built on D-NET • Access to scientific publications • Search, browse • Visualization tools • Deposition of articles • Set up repository for homeless researchers (INVENIO) • Multiplexer for OA publications in existing repositories • Provide monitoring tools for • Document/depositing statistics • Usage statistics from repository infrastructure • Interoperation with other infrastructures
OpenAIRE system in a nutshell [Figure: OpenAIRE overall overview, showing functionalities and domains served]
Explorative activities JRA • Interoperability for usage statistics / metrics and administrative research information systems (CRIS/CERIF) • Explore the requirements, practices, incentives, workflows, data models, and technologies to deposit, access, and otherwise manipulate research datasets • Work with four (4) scientific communities • Health (Life Sciences) • Environment • Information & Communication Science • Socio-economic Sciences and Humanities
Metadata directions foreseeable • Repository compliance even more important than in DRIVER • Interface to administrative systems essential • E.g. EC project database • Authority files for authors, journals etc. • Exchange with others: arXiv, PubMed etc. • Data extensions will introduce new worlds
Conclusions • Metadata allow and require serious international infrastructure in research • Even very simple approaches unfold complexity in distributed systems • „Division of labour“ necessary • Keep an eye on trade-offs between specialized expertise and organisational overhead • Suggested approach: Simple and integrative rather than complex and integrated