Federation

This work is licensed under a Creative Commons Licence Attribution-ShareAlike 3.0. http://creativecommons.org/licenses/by-sa/3.0/. eCrystals Federation: Open Repositories for Data-driven Science Dr Liz Lyon, UKOLN, University of Bath, UK Dr Simon Coles, University of Southampton, UK


Presentation Transcript


  1. This work is licensed under a Creative Commons Licence Attribution-ShareAlike 3.0: http://creativecommons.org/licenses/by-sa/3.0/
     eCrystals Federation: Open Repositories for Data-driven Science
     Dr Liz Lyon, UKOLN, University of Bath, UK
     Dr Simon Coles, University of Southampton, UK
     Chemical Informatics Workshop, Manchester, March 2008

  2. Themes
     • Context: Institutional data repositories, crystallography exemplar
     • Scale: Repository federations
     • Longevity: Digital curation and preservation
     • Integration: Semantic challenges

  3. eBank Project – building the eCrystals Data Repository
     • Started Sept 2003
     • Scholarly knowledge cycle context
     • UKOLN-led interdisciplinary team
     • ePrints platform @ Southampton
     • Institutional Repository exemplar
     • Embedded in workflow
     • http://ecrystals.chem.soton.ac.uk

  4. Scaling Up Report
     Phase 3 findings:
     • Data policy should reflect lab practice & institutional model
     • Diverse lab practice
     • LIMS proprietary formats
     • Data quality criteria/validation
     • “Prior publication” problem
     • We need automated assignment of terms for data discovery
     • No discipline preservation model

  5. The nλ = 2 d sinθ
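The equation on this slide is Bragg's law, which relates the X-ray wavelength λ, the spacing d between lattice planes, the diffraction angle θ and the reflection order n. As a rough illustration (not part of the original slides), the sketch below solves it for d. The function name and the input values are assumptions chosen for the example; 0.71073 Å is the commonly quoted Mo Kα wavelength.

```python
import math


def d_spacing(wavelength_angstrom, theta_deg, order=1):
    """Solve Bragg's law, n * lambda = 2 * d * sin(theta), for the spacing d."""
    sin_theta = math.sin(math.radians(theta_deg))
    if sin_theta <= 0:
        raise ValueError("theta must give a positive sin(theta)")
    return order * wavelength_angstrom / (2.0 * sin_theta)


# Illustrative input only: Mo K-alpha radiation and a Bragg angle of 10 degrees.
print(round(d_spacing(0.71073, 10.0), 4), "angstrom")
```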

  6. eCrystals Repository ePrints.org v3.0

  7. Repository Foundations
     Learned society + subject repository support
     • Using simple Dublin Core
        • Crystal structure
        • Title (Systematic IUPAC Name)
        • Authors
        • Affiliation
        • Creation Date
     • Additional chemical information through Qualified Dublin Core
        • Empirical formula
        • International Chemical Identifier (InChI)
        • Compound Class & Keywords
     • Specifies which ‘datasets’ are present in an entry
     • Application Profile: http://www.ukoln.ac.uk/projects/ebank-uk/schemas/
     • DOI links: http://dx.doi.org/10.1594/ecrystals.chem.soton.ac.uk/145
     • Rights & Citation: http://ecrystals.chem.soton.ac.uk/rights.html
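To make the metadata layering on this slide concrete, here is a minimal sketch of how an entry might be serialised: simple Dublin Core elements for the general fields, plus extra elements for the chemical information. It is not the eBank-UK application profile. The `ex:` namespace, the element names `empiricalFormula` and `inchi`, the `record` wrapper and all field values are hypothetical placeholders; only the `dc:` namespace URI is the standard one.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"      # standard simple Dublin Core namespace
EX_NS = "http://example.org/ecrystals-sketch/"  # hypothetical, NOT the eBank profile namespace


def build_record(dc_fields, chem_fields):
    """Build a metadata record from repeatable DC fields and extra chemistry fields."""
    ET.register_namespace("dc", DC_NS)
    ET.register_namespace("ex", EX_NS)
    record = ET.Element("record")
    for name, values in dc_fields.items():
        for value in values:  # Dublin Core elements may repeat, e.g. several creators
            ET.SubElement(record, f"{{{DC_NS}}}{name}").text = value
    for name, value in chem_fields.items():
        ET.SubElement(record, f"{{{EX_NS}}}{name}").text = value
    return record


# Placeholder values for illustration; they do not describe a real eCrystals entry.
record = build_record(
    {
        "type": ["Crystal structure"],
        "title": ["benzene"],
        "creator": ["A. Researcher", "B. Researcher"],
        "date": ["2008-03-01"],
        "subject": ["aromatic hydrocarbon"],
    },
    {
        "empiricalFormula": "C6H6",
        "inchi": "InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H",
    },
)
print(ET.tostring(record, encoding="unicode"))
```

Keeping the general fields in plain Dublin Core means a generic harvester can still read title, creator and date, while a chemistry-aware service can pick up the additional elements.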

  8. Federation interoperability & linking services
     • Roll-out in 2 phases led by University of Southampton
     • Establish Federation policies, application profile, mappings
     • Bi-directional links with derived articles in “publisher repositories”, IUCr, Royal Society of Chemistry (RSC), Chemistry Central: scholarly knowledge cycle
     • StOReLink project - Test linking options: StORe middleware and CLADDIER
     • OAI-ORE Testbed eChemistry project

  9. Laboratory practice & workflow
     • X-ray diffractometers
     • Community standard CIF
     • Mixed lab practice – central service facility versus single “staff crystallographer” in department
     • Achieve end-to-end workflow
     • Challenge of instrument manufacturers with proprietary formats
     • “Repository Lite” for smaller lab operations?

  10. eBank-UK Phase 3 Curation & Preservation Study: Sustainability issues
      http://www.ukoln.ac.uk/projects/ebank-uk/curation/
      Examined four main areas:
      • Audit and certification (TRAC, DRAMBORA, NESTOR, ISO International repository audit and certification BOF Group)
      • The Open Archival Information System (OAIS) and Representation Information (RI)
      • eBank-UK application profile and preservation metadata
      • ePrints.org repository platform
      Recommendations:
      • Self-assessment using DRAMBORA
      • Consider Representation Information in wider context
      • Develop preservation strategy
      • Capture preservation metadata - PREMIS

  11. Semantic issues
      Crystallographic schema underpins CIF (Crystallographic Information Framework), but is limited to data parameters, e.g. _cell_length_a
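The kind of data parameter meant here can be illustrated with a short sketch. In CIF files, data names are written with a leading underscore (`_cell_length_a`), and numeric values may carry a standard uncertainty in parentheses. The reader below is a deliberately simple, line-based assumption for illustration: it handles only `name value` pairs on one line and is not a full CIF parser (no loops, no multi-line values). The sample text is made up.

```python
import re

# A numeric CIF value with an optional standard uncertainty, e.g. 10.123(4)
_NUMERIC = re.compile(r"^([-+]?\d*\.?\d+(?:[eE][-+]?\d+)?)(?:\((\d+)\))?$")


def read_cell_parameters(cif_text):
    """Collect numeric _cell_* items from simple 'name value' lines."""
    cell = {}
    for line in cif_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0].startswith("_cell_"):
            match = _NUMERIC.match(parts[1])
            if match:
                cell[parts[0]] = float(match.group(1))  # uncertainty is dropped here
    return cell


# Made-up fragment for illustration, not taken from a real eCrystals entry.
sample = """data_example
_cell_length_a 10.123(4)
_cell_length_b 7.5
_cell_angle_alpha 90
"""
print(read_cell_parameters(sample))
```

The names say what each number is, but nothing about the chemistry of the compound, which is the limitation the slide points to.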

  12. IUCr Acta Cryst 1992
      • Limited set of keywords describing methods, properties & applications, compounds, attributes
      • No established crystallography dictionary or controlled vocabulary to give chemistry context

  13. What do we want to do?
      • Support depositors’ keyword/term assignment
      • Facilitate and improve automated indexing
      • Support advanced search / browse
      • Allow metadata validation & enhancement
      • Apply across a heterogeneous Federation
      • Cross search, cross browse functionality
      • Link data to all associated digital objects
      • Develop domain semantics / vocabulary
      • Use domain-specific authority files
      • Mine to “discover” rather than “find”
      • Achieve full inter-disciplinary integration

  14. Some (semantic) issues…
      • How are terms assigned?
      • Informal tags and/or structured KOS?
      • How is a vocabulary curated and maintained?
      • Can a vocabulary be transformed into a (Semantic Web related understanding) ontology?
      • Disambiguation, acronyms, IUPAC names
      • Persistent identification for data citation
      • Granularity of data citation
      • Data (and metadata) quality, provenance, validation
      • Embedding within complex workflows
      • Use collaborative social approaches?
      • Community adoption: becomes part of the culture

  15. This work is licensed under a Creative Commons Licence Attribution-ShareAlike 3.0: http://creativecommons.org/licenses/by-sa/3.0/
      Questions?
      Slides will be available at:
      http://wiki.ecrystals.chem.soton.ac.uk/index.php
      http://www.ukoln.ac.uk/ukoln/staff/e.j.lyon/presentations.html
