
Open Data for Chemistry Research

This presentation discusses the importance of making data openly available in the field of chemistry research. It covers topics such as data creation, metadata, data analysis, validation, and publication. The speaker also discusses the establishment of a common ground and the role of metadata schemas in data curation. The presentation concludes with future directions for open data in chemistry research.


Presentation Transcript


  1. Making Data Openly Available • Simon Coles • CombeDay 2005

  2. Data Overload!

  3. CombeChem: eScience testbed • Diagram labels: Simulation, Video, Analysis, Structures Database, Diffractometer, Properties e-Lab, X-Ray e-Lab, Grid Middleware, Properties

  4. Chemistry Publications • Ideas and interpretations • Hooks into the literature • Raw data! • Results & derived data

  5. [No text on slide]

  6. Diagram labels • Presentation services: subject, media-specific, data, commercial portals • Institutional presentation services: portals, Learning Management Systems, u/g, p/g courses, modules • Aggregator services: eBank UK • Repositories: institutional, e-prints, subject, data, learning objects • Data creation / capture / gathering: laboratory experiments, Grids, fieldwork, surveys, media • Data analysis, transformation, mining, modelling • Learning object creation, re-use • Research & e-Science workflows • Learning & Teaching workflows • Peer-reviewed publications: journals, conference proceedings • Quality assurance bodies • Data curation: databases & databanks • Connecting arrows: Searching, harvesting, embedding; Resource discovery, linking, embedding; Harvesting metadata; Deposit / self-archiving; Validation; Publication; Linking

  7. Establishing common ground… • Understand the data creation process • Terminology and definitions • Data • Metadata • Datafile • Dataset • Data holding • Different views • Digital library researchers, computer scientists, chemists • Generic vs specific • Modeller vs practitioner • Aim for a common ontology • Modelling the domain • Creating a metadata schema

  8. Crystallography workflow (raw data, derived data, results data) • Initialisation: mount new sample on diffractometer & set up data collection • Collection: collect data • Processing: process and correct images • Solution: solve structures • Refinement: refine structure • CIF: produce CIF (Crystallographic Information File format) • Report: generate Crystal Structure Report
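
The seven workflow stages on this slide can be written down as an ordered pipeline. The following is a minimal sketch, not the eBank implementation: the `Stage` class and the `stages_in` helper are hypothetical names, and the assignment of each stage to raw, derived, or results data is an assumption, since the transcript lists the three categories without saying where the boundaries fall.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    action: str
    category: str  # "raw", "derived" or "results" (ASSUMED grouping, not from the slide)


# Stage names and actions follow the slide; categories are an assumption.
WORKFLOW = [
    Stage("Initialisation", "mount new sample on diffractometer and set up data collection", "raw"),
    Stage("Collection", "collect data", "raw"),
    Stage("Processing", "process and correct images", "derived"),
    Stage("Solution", "solve structures", "derived"),
    Stage("Refinement", "refine structure", "derived"),
    Stage("CIF", "produce CIF (Crystallographic Information File format)", "results"),
    Stage("Report", "generate Crystal Structure Report", "results"),
]


def stages_in(category):
    """Return the stages tagged with the given data category, in workflow order."""
    return [stage for stage in WORKFLOW if stage.category == category]


for stage in WORKFLOW:
    print(f"{stage.category:>8}  {stage.name}: {stage.action}")
```

Keeping the category on each stage means an archive entry could report which parts of a dataset are present simply by listing the stages that have produced files.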

  9. Deposition into the archive

  10. An Archive entry: ecrystals.chem.soton.ac.uk

  11. Access to the underlying data

  12. Some metadata issues • Using simple and qualified Dublin Core • Additional chemical information in schema for harvesting e.g. empirical formula • Schema contains International Chemical Identifier (InChI) • Specifies which ‘parts’ of a dataset are present • Links to eprints (and other published literature) derived from the data • Using vocabularies specific to crystallography • Engaging the broader scientific community to ensure different schemas are compliant and standards can emerge
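
To make the metadata points above concrete, here is a rough sketch of a record that combines Dublin Core elements with extra chemical fields. It is illustrative only and does not reproduce the actual eBank schema: the `ebank` namespace URI, the `record`, `empiricalFormula` and `inchi` element names, the `build_record` function, and both example URLs are hypothetical. The `dc` and `dcterms` namespace URIs are the standard Dublin Core ones, and the InChI shown is that of benzene.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"   # standard simple Dublin Core namespace
DCTERMS = "http://purl.org/dc/terms/"     # standard qualified Dublin Core namespace
EBANK = "http://example.org/ebank/"       # HYPOTHETICAL namespace for chemical extras

ET.register_namespace("dc", DC)
ET.register_namespace("dcterms", DCTERMS)
ET.register_namespace("ebank", EBANK)


def build_record(identifier, formula, inchi, eprint_url):
    """Build a Dublin Core style record with chemistry-specific additions."""
    record = ET.Element(f"{{{EBANK}}}record")  # hypothetical wrapper element

    def add(namespace, tag, text):
        element = ET.SubElement(record, f"{{{namespace}}}{tag}")
        element.text = text

    add(DC, "identifier", identifier)
    add(DC, "type", "CrystalStructure")
    add(EBANK, "empiricalFormula", formula)     # hypothetical element name
    add(EBANK, "inchi", inchi)                  # hypothetical element name
    add(DCTERMS, "isReferencedBy", eprint_url)  # link to an eprint derived from the data
    return record


record = build_record(
    identifier="https://repository.example.org/1",  # hypothetical URL
    formula="C6H6",
    inchi="InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H",
    eprint_url="https://eprints.example.org/42",    # hypothetical URL
)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Carrying the formula and InChI in the harvested record lets an aggregator support chemical searches without opening the underlying data files.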

  13. Data flow in eBank (model input: Andy Powell, UKOLN) • Institutional repository: Deposit; Crystal structure (data holding) made up of Datasets; Crystal structure report (HTML), dc:identifier; ebank_dc record (XML), dc:type=“CrystalStructure” and/or “Collection” • Eprint: Eprint “jump-off” page (HTML), dc:identifier; Eprint manifestation (e.g. PDF); Eprint oai_dc record (XML), dc:type=“Eprint” and/or “Text” • Links between data and eprint: dcterms:references, dcterms:isReferencedBy • eBank UK aggregator service: Harvesting via OAI-PMH (ebank_dc, oai_dc); Linking • ePrint UK aggregator service: Harvesting via OAI-PMH (oai_dc) • Subject service: Linking
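
The harvesting step in this data flow uses OAI-PMH, in which an aggregator issues HTTP requests such as `ListRecords` with a `metadataPrefix` and parses the XML response. The sketch below shows the general shape under stated assumptions: the base URL, the sample response, and the function names `list_records_url` and `parse_records` are made up for illustration, and no network request is made. The `verb` and `metadataPrefix` parameters and the OAI-PMH and Dublin Core namespace URIs are the standard ones.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI = "http://www.openarchives.org/OAI/2.0/"
DC = "http://purl.org/dc/elements/1.1/"

BASE_URL = "https://repository.example.org/oai"  # HYPOTHETICAL repository endpoint


def list_records_url(base_url, metadata_prefix):
    """Build an OAI-PMH ListRecords request URL for one metadata format."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def parse_records(response_xml):
    """Extract the OAI identifier and dc:type values from a ListRecords response."""
    root = ET.fromstring(response_xml.strip())
    records = []
    for rec in root.iter(f"{{{OAI}}}record"):
        identifier = rec.find(f"{{{OAI}}}header/{{{OAI}}}identifier")
        types = [element.text for element in rec.iter(f"{{{DC}}}type")]
        records.append({"identifier": identifier.text, "types": types})
    return records


# Made-up response standing in for what a repository might return.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:identifier>https://repository.example.org/1</dc:identifier>
          <dc:type>CrystalStructure</dc:type>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""

url = list_records_url(BASE_URL, "oai_dc")
harvested = parse_records(SAMPLE_RESPONSE)
print(url)
print(harvested)
```

Requesting `ebank_dc` instead of `oai_dc` would only change the `metadataPrefix` argument; a real harvester would also need to handle resumption tokens and error responses, which this sketch omits.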

  14. Harvesting: OAIster

  15. Linking and aggregating

  16. Embedded in a science portal

  17. Current situation • Version 2.0 eBank metadata schema • Pilot institutional e-data repository for harvesting (raw, derived, results data) using EPrints software • Exports records as ebank_dc and oai_dc • Validation of schema & discussion with International Union of Crystallography for final developments and wider deployment • Pilot eBank UK aggregator service • Developing search interface Version 1.0 • Testing with PSIgate physical sciences portal – embedding eBank UK

  18. What’s next? • Progress towards generic metadata schemas • Validation against other schema (CCLRC Model) • Eprints.org software: allow for more generic scientific data and schemas? • Metadata enhancement: keywords based on knowledge of keywords in related publications? • Investigate identifiers: International Chemical Identifier • Explore context sensitive linking • Full embedding into chemical and crystallographic research and publishing • e-Learning embedding and pedagogic evaluation • Feasibility study in related domains
