This work is licensed under a Creative Commons Licence Attribution-ShareAlike 3.0. http://creativecommons.org/licenses/by-sa/3.0/. eCrystals Federation: Open Repositories for Data-driven Science Dr Liz Lyon, UKOLN, University of Bath, UK Dr Simon Coles, University of Southampton, UK
Chemical Informatics Workshop, Manchester, March 2008
Themes
• Context: Institutional data repositories, crystallography exemplar
• Scale: Repository federations
• Longevity: Digital curation and preservation
• Integration: Semantic challenges
eBank Project – building the eCrystals Data Repository
• Started Sept 2003
• Scholarly knowledge cycle context
• UKOLN-led interdisciplinary team
• ePrints platform @ Southampton
• Institutional Repository exemplar
• Embedded in workflow
http://ecrystals.chem.soton.ac.uk
Scaling Up Report
Phase 3 findings:
• Data policy should reflect lab practice & institutional model
• Diverse lab practice
• LIMS proprietary formats
• Data quality criteria/validation
• “Prior publication” problem
• We need automated assignment of terms for data discovery
• No discipline preservation model
nλ = 2d sin θ (Bragg's law)
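As a small worked illustration of the relation on this slide, the sketch below solves nλ = 2d sin θ for the lattice spacing d. The function name `d_spacing` and the Mo Kα wavelength used as sample input are assumptions chosen for illustration; neither comes from the slides.

```python
import math


def d_spacing(wavelength_angstrom, theta_degrees, order=1):
    """Solve n*lambda = 2*d*sin(theta) for d (same length unit as the wavelength)."""
    theta = math.radians(theta_degrees)
    return order * wavelength_angstrom / (2.0 * math.sin(theta))


# Illustrative input only: Mo K-alpha radiation is roughly 0.7107 angstrom.
MO_K_ALPHA = 0.7107

if __name__ == "__main__":
    print(f"d = {d_spacing(MO_K_ALPHA, 10.0):.3f} angstrom at theta = 10 degrees")
```

A quick sanity check: with λ = 1 Å and θ = 30°, sin θ = 0.5, so d = 1 Å for the first-order reflection.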
eCrystals Repository ePrints.org v3.0
Repository Foundations
Learned society + subject repository support
• Using simple Dublin Core
  • Crystal structure
  • Title (Systematic IUPAC Name)
  • Authors
  • Affiliation
  • Creation Date
• Additional chemical information through Qualified Dublin Core
  • Empirical formula
  • International Chemical Identifier (InChI)
  • Compound Class & Keywords
• Specifies which ‘datasets’ are present in an entry
• Application Profile http://www.ukoln.ac.uk/projects/ebank-uk/schemas/
• DOI links http://dx.doi.org/10.1594/ecrystals.chem.soton.ac.uk/145
• Rights & Citation http://ecrystals.chem.soton.ac.uk/rights.html
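To make the metadata layout above concrete, here is a minimal sketch of how such a record might be assembled. Only the simple Dublin Core namespace is real; the `ex:` namespace, its element names, the helper `build_record`, and every field value are hypothetical placeholders. The actual eBank-UK application profile (linked above) defines its own terms, which are not reproduced here.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"      # simple Dublin Core namespace
EX_NS = "http://example.org/ecrystals-sketch/"  # hypothetical, NOT the eBank profile


def build_record(dc_fields, chem_fields):
    """Build one flat XML record from simple DC fields plus extra chemical fields."""
    ET.register_namespace("dc", DC_NS)
    ET.register_namespace("ex", EX_NS)
    record = ET.Element("record")
    for name, value in dc_fields.items():
        ET.SubElement(record, f"{{{DC_NS}}}{name}").text = value
    for name, value in chem_fields.items():
        ET.SubElement(record, f"{{{EX_NS}}}{name}").text = value
    return record


# All values below are invented placeholders, not a real eCrystals entry.
record = build_record(
    {
        "type": "Crystal structure",
        "title": "benzene",
        "creator": "A. Researcher",
        "date": "2008-03-01",
    },
    {
        "affiliation": "Example University",
        "empiricalFormula": "C6H6",
        "inchi": "InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H",
        "compoundClass": "Organic",
    },
)

if __name__ == "__main__":
    print(ET.tostring(record, encoding="unicode"))
```

Keeping the core descriptive fields in plain Dublin Core and the chemistry-specific fields in a separate namespace mirrors the split between simple and Qualified Dublin Core described on the slide.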
Federation interoperability & linking services
• Roll-out in 2 phases led by University of Southampton
• Establish Federation policies, application profile, mappings
• Bi-directional links with derived articles in “publisher repositories”, IUCr, Royal Society of Chemistry (RSC), Chemistry Central: scholarly knowledge cycle
• StOReLink project - Test linking options: StORe middleware and CLADDIER
• OAI-ORE Testbed eChemistry project
Laboratory practice & workflow
• X-ray diffractometers
• Community standard CIF
• Mixed lab practice – central service facility versus single “staff crystallographer” in department
• Achieve end-to-end workflow
• Challenge of instrument manufacturers with proprietary formats
• “Repository Lite” for smaller lab operations?
eBank-UK Phase 3 Curation & Preservation Study: Sustainability issues
http://www.ukoln.ac.uk/projects/ebank-uk/curation/
Examined four main areas:
• Audit and certification (TRAC, DRAMBORA, NESTOR, ISO International repository audit and certification BOF Group)
• The Open Archival Information System (OAIS) and Representation Information (RI)
• eBank-UK application profile and preservation metadata
• ePrints.org repository platform
Recommendations:
• Self-assessment using DRAMBORA
• Consider Representation Information in wider context
• Develop preservation strategy
• Capture preservation metadata - PREMIS
Semantic issues
• Crystallographic schema underpins CIF (Crystallographic Information Framework), but is limited to data parameters, e.g. _cell_length_a
• IUCr Acta Cryst 1992: limited set of keywords describing methods, properties & applications, compounds, attributes
• No established crystallography dictionary or controlled vocabulary to give chemistry context
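The point about CIF being limited to data parameters can be seen in a short sketch that reads simple `_name value` items from a CIF-like text. This is not a full CIF parser (it ignores `loop_` blocks, quoted strings and multi-line values), the sample values are invented, and the helper names `parse_simple_items` and `numeric_value` are hypothetical.

```python
import re

# Invented sample data in CIF style; numbers in parentheses are standard uncertainties.
SAMPLE_CIF = """\
data_example
_cell_length_a    10.234(2)
_cell_length_b    11.500(3)
_cell_length_c    7.8910(10)
_cell_angle_beta  95.12(1)
"""


def parse_simple_items(cif_text):
    """Collect single-line '_name value' pairs; everything else is skipped."""
    items = {}
    for line in cif_text.splitlines():
        line = line.strip()
        if not line.startswith("_"):
            continue
        parts = line.split(None, 1)
        if len(parts) == 2:
            items[parts[0]] = parts[1].strip()
    return items


def numeric_value(raw):
    """Drop a trailing standard uncertainty such as '(2)' and return a float."""
    return float(re.sub(r"\(\d+\)$", "", raw))


items = parse_simple_items(SAMPLE_CIF)

if __name__ == "__main__":
    print(numeric_value(items["_cell_length_a"]))
```

Every item recovered this way is a measured or refined parameter; nothing in it says what the compound is used for or which chemical class it belongs to, which is the gap a vocabulary would need to fill.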
What do we want to do?
• Support depositors’ keyword/term assignment
• Facilitate and improve automated indexing
• Support advanced search / browse
• Allow metadata validation & enhancement
• Apply across a heterogeneous Federation
• Cross search, cross browse functionality
• Link data to all associated digital objects
• Develop domain semantics / vocabulary
• Use domain-specific authority files
• Mine to “discover” rather than “find”
• Achieve full inter-disciplinary integration
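One way to picture keyword assignment and metadata validation against an authority file is the sketch below. It maps depositor-supplied keywords onto preferred terms and reports the ones it cannot place. The vocabulary, its variants, and all function names are invented for illustration; no such authority file is described in the slides.

```python
# Hypothetical authority file: preferred term -> known variants (all invented).
VOCABULARY = {
    "single-crystal X-ray diffraction": {"scxrd", "single crystal xrd"},
    "hydrogen bonding": {"h-bond", "h-bonding", "hydrogen bond"},
}


def normalise(term):
    """Lower-case and collapse whitespace so trivial variants compare equal."""
    return " ".join(term.lower().split())


def build_lookup(vocabulary):
    """Map every normalised variant (and preferred term) to its preferred term."""
    lookup = {}
    for preferred, variants in vocabulary.items():
        lookup[normalise(preferred)] = preferred
        for variant in variants:
            lookup[normalise(variant)] = preferred
    return lookup


def assign_terms(keywords, lookup):
    """Return (preferred terms without duplicates, keywords with no match)."""
    matched, unmatched = [], []
    for keyword in keywords:
        preferred = lookup.get(normalise(keyword))
        if preferred is None:
            unmatched.append(keyword)
        elif preferred not in matched:
            matched.append(preferred)
    return matched, unmatched


lookup = build_lookup(VOCABULARY)
matched, unmatched = assign_terms(
    ["SCXRD", "H-bond", "Hydrogen  Bonding", "polymorph"], lookup
)

if __name__ == "__main__":
    print("matched:", matched)
    print("needs review:", unmatched)
```

The unmatched list is where curation effort would go: each entry is either a candidate new term or a variant of an existing one.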
Some (semantic) issues…
• How are terms assigned?
• Informal tags and/or structured KOS?
• How is a vocabulary curated and maintained?
• Can a vocabulary be transformed into an ontology (in the Semantic Web sense)?
• Disambiguation, acronyms, IUPAC names
• Persistent identification for data citation
• Granularity of data citation
• Data (and metadata) quality, provenance, validation
• Embedding within complex workflows
• Use collaborative social approaches?
• Community adoption: becomes part of the culture
Questions?
Slides will be available at:
http://wiki.ecrystals.chem.soton.ac.uk/index.php
http://www.ukoln.ac.uk/ukoln/staff/e.j.lyon/presentations.html