180 likes | 192 Views
This presentation discusses the importance of making data openly available in the field of chemistry research. It covers topics such as data creation, metadata, data analysis, validation, and publication. The speaker also discusses the establishment of a common ground and the role of metadata schemas in data curation. The presentation concludes with future directions for open data in chemistry research.
E N D
Making Data Openly Available • Simon Coles CombeDay 2005
Data Overload! CombeDay 2005
Simulation Video Analysis StructuresDatabase Diffractometer Propertiese-Lab X-Raye-Lab Grid Middleware CombeChem: eScience testbed Properties CombeDay 2005
Chemistry Publications Ideas and interpretations Hooks into the literature Raw data! Results & derived data CombeDay 2005
Presentation services: subject, media-specific, data, commercial portals Searching , harvesting, embedding Resource discovery, linking, embedding Resource discovery, linking, embedding Data creation / capture / gathering: laboratory experiments, Grids, fieldwork, surveys, media Data analysis, transformation, mining, modelling Learning object creation, re-use Aggregator services: eBank UK Harvestingmetadata Learning & Teaching workflows Research & e-Science workflows Repositories : institutional, e-prints, subject, data, learning objects Institutional presentation services: portals, Learning Management Systems, u/g, p/g courses, modules Deposit / self-archiving Deposit / self-archiving Validation Validation Publication Resource discovery, linking, embedding Validation Linking Peer-reviewed publications: journals, conference proceedings Quality assurance bodies Data curation: databases & databanks CombeDay 2005
Establishing common ground… • Understand the data creation process • Terminology and definitions • Data • Metadata • Datafile • Dataset • Data holding • Different views • Digital library researchers, computer scientists, chemists • Generic vs specific • Modeller vs practitioner • Aim for a common ontology • Modelling the domain • Creating a metadata schema CombeDay 2005
RAW DATA DERIVED DATA RESULTS DATA Crystallography workflow • Initialisation: mount new sample on diffractometer & set up data collection • Collection: collect data • Processing: process and correct images • Solution: solve structures • Refinement: refine structure • CIF: produce CIF (Crystallographic Information File format) • Report: generate Crystal Structure Report CombeDay 2005
Deposition into the archive CombeDay 2005
An Archive entry ecrystals.chem.soton.ac.uk CombeDay 2005
Access to the underlying data CombeDay 2005
Some metadata issues • Using simple and qualified Dublin Core • Additional chemical information in schema for harvesting e.g. empirical formula • Schema contains International Chemical Identifier (InChI) • Specifies which ‘parts’ of a dataset are present • Links to eprints (and other published literature) derived from the data • Using vocabularies specific to crystallography • Engaging the broader scientific community to ensure different schemas are compliant and standards can emerge CombeDay 2005
Dataset Data flow in eBank Dataset Dataset dcterms:references Harvesting OAI-PMH oai_dc Crystal structure (data holding) ePrint UK aggregator service Linking Harvesting OAI-PMH ebank_dc ebank_dc record (XML) Deposit dc:type=“CrystalStructure” and/or “Collection” eBank UK aggregator service Institutional repository dc:identifier Crystal structure report (HTML) dcterms:isReferencedBy Harvesting OAI-PMH oai_dc Eprint “jump-off” page (HTML) dc:identifier Eprint manifestation (e.g. PDF) Eprint oai_dc record (XML) Subject service dc:type=“Eprint” and/or ”Text” Linking CombeDay 2005 Model input Andy Powell, UKOLN.
Harvesting: OAIster CombeDay 2005
Linking and aggregating CombeDay 2005
Embedded in a science portal CombeDay 2005
Current situation • Version 2.0 eBank metadata schema • Pilot institutional e-data repository for harvesting (raw, derived, results data) using EPrints software • Exports records as ebank_dc and oai_dc • Validation of schema & discussion with International Union of Crystallography for final developments and wider deployment • Pilot eBank UK aggregator service • Developing search interface Version 1.0 • Testing with PSIgate physical sciences portal – embedding eBank UK CombeDay 2005
What’s next? • Progress towards generic metadata schemas • Validation against other schema (CCLRC Model) • Eprints.org software: allow for more generic scientific data and schemas? • Metadata enhancement: keywords based on knowledge of keywords in related publications? • Investigate identifiers: International Chemical Identifier • Explore context sensitive linking • Full embedding into chemical and crystallographic research and publishing • e-Learning embedding and pedagogic evaluation • Feasibility study in related domains CombeDay 2005