Enhancing access to research data: the challenge of crystallography. Rachel Heery, Monica Duke, Michael Day (UKOLN, University of Bath); Leslie Carr, Simon Coles (University of Southampton). JCDL 2005, June 7-11, Denver. www.bath.ac.uk.

Presentation Transcript


  1. Enhancing access to research data: the challenge of crystallography. Rachel Heery, Monica Duke, Michael Day (UKOLN, University of Bath); Leslie Carr, Simon Coles (University of Southampton). JCDL 2005, June 7-11, Denver. www.bath.ac.uk. A centre of expertise in digital information management

  2. Enhancing access to research data: overview • Crystallography as an exemplar • Impact of digital technologies on scientific research process • Need new modes of data curation • eBank project: applying digital library techniques to support data curation • Next steps

  3. Changes in scientific research process • Increasing data volumes from eScience / Grid-enabled / cyber-infrastructure applications, “big science” • Changing research methods: high throughput technologies, automation, ‘smart labs’ • Potential for re-use of data, new inter-disciplinary research • Different types of data (observational, experimental, computational) with different stewardship requirements

  4. Data Overload! EPSRC National Crystallography Service How do we disseminate? The data deluge: crystallography

  5. Data overload & the publication bottleneck 2,000,000 25,000,000 300,000

  6. Current Publishing Process • Journal articles: aims, ideas, context, conclusions – only most significant data • Raw & underlying data required by peers not readily available

  7. Context: existing data repositories • National data archives: UK Data Archive, Arts and Humanities Data Service, US National Archives and Records Administration (NARA), Atlas Datastore • Discipline specific archives: GenBank, Protein Data Bank • Crystallography archives: Cambridge Crystallographic Data Centre (Cambridge Structural Database), Indiana University Molecular Structure Center (Crystal Data Server, Reciprocal Net), FIZ Karlsruhe (Inorganic crystals), Toth Information Systems (CHRYSTMET) • Journals require deposit of data to support articles • Typically deposit of summary data … partial coverage

  8. RAW DATA DERIVED DATA RESULTS DATA Crystallography workflow • Initialisation: mount new sample on diffractometer & set up data collection • Collection: collect data • Processing: process and correct images • Solution: solve structures • Refinement: refine structure • CIF: produce CIF (Crystallographic Information File) • Validation: chemical & crystallographic checks
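
The seven stages on this slide form a fixed sequence, which a repository could use to record how far an experiment has progressed. A minimal Python sketch, assuming each stage simply follows the previous one; the names `STAGES`, `next_stage` and `completed_stages` are illustrative and are not taken from the eBank software:

```python
from typing import List, Optional

# Stage names as listed on the slide, in workflow order.
STAGES = [
    "Initialisation",
    "Collection",
    "Processing",
    "Solution",
    "Refinement",
    "CIF",
    "Validation",
]


def next_stage(current: str) -> Optional[str]:
    """Return the stage that follows `current`, or None after the last stage."""
    index = STAGES.index(current)  # raises ValueError for an unknown stage
    if index + 1 < len(STAGES):
        return STAGES[index + 1]
    return None


def completed_stages(current: str) -> List[str]:
    """Return every stage up to and including `current`."""
    return STAGES[: STAGES.index(current) + 1]
```

For example, `next_stage("Refinement")` gives `"CIF"`, and `next_stage("Validation")` gives `None` because validation is the final step.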

  9. eBank UK project overview • JISC funded in 2003, now in Phase 2 to 2006 • Joint effort between crystallographers, computer scientists, digital library researchers • Investigating contribution of existing digital library technologies to enable ‘publication at source’ • Partners have interest in dissemination of chemistry research data, open access, OAI, institutional repositories http://www.ukoln.ac.uk/projects/ebank-uk/

  10. eBank project team University of Bath, UKOLN • Michael Day, Monica Duke, Rachel Heery, Liz Lyon, Traugott Koch University of Southampton, School of Chemistry • Simon Coles, Jeremy Frey, Mike Hursthouse University of Southampton, School of Electronics and Computer Science • Leslie Carr, Chris Gutteridge University of Manchester, PSIgate • John Blunden-Ellis

  11. eBank phase one: achievements • Gathered requirements from crystallographers • Established pilot institutional repository for crystallography data at Southampton with web interface • Developed a demonstrator aggregator service at UKOLN (CCDC exploring aggregation service) • Developed appropriate schema • Demonstrated a search interface as an embedded service at PSIgate portal • Demonstrated an added value service linking research data to papers (one-off)

  12. Institutional repositories…publication at source • Institution establishes repository(s) • Institution pro-actively supports deposit process • OAI provides basis for interoperability • Potential for added value services • And/Or ….international subject based archives?

  13. Crystallography good fit…. • Crystallography has well defined data creation workflow • Tradition of sharing using standard file format • Crystallography Information File (CIF) • What about other chemistry sub-disciplines? other scientific disciplines?
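
Part of what makes CIF easy to share is that it is plain text: a simple data item is a tag beginning with an underscore, followed by its value. The Python sketch below reads only such single-line tag/value pairs and ignores `loop_` tables and multi-line text fields, so it is an illustration rather than a full CIF parser. The sample values are made up for the example, and `read_simple_items` is a hypothetical helper name.

```python
import shlex

# Illustrative CIF fragment; the values are invented for this example.
SAMPLE_CIF = """\
data_example
_chemical_formula_sum 'C6 H6'
_cell_length_a 7.440
_symmetry_space_group_name_H-M 'P b c a'
"""


def read_simple_items(cif_text: str) -> dict:
    """Collect single-line `_tag value` pairs from a CIF string."""
    items = {}
    for line in cif_text.splitlines():
        line = line.strip()
        if not line.startswith("_"):
            continue  # skips data_ headers, loop_ keywords, comments, blanks
        parts = shlex.split(line)  # keeps quoted values together
        if len(parts) == 2:  # a tag alone (e.g. inside loop_) is ignored
            tag, value = parts
            items[tag] = value
    return items
```

Calling `read_simple_items(SAMPLE_CIF)` yields a dictionary keyed by tag, from which a repository could pull fields such as the formula for its metadata record.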

  14. Data Flow in eBank UK (diagram labels) • Components: Data files, Metadata, Institutional repository, eBank aggregator • Steps: Create, Submit, Store/link, Present (HTML), Harvest (XML, OAI-PMH), Index and Search, Present (HTML)

  15. Southampton digital repository http://ecrystals.chem.soton.ac.uk

  16. Access to ALL underlying data

  17. OAI-PMH: harvesting and aggregating eBank aggregator at UKOLN http://eprints-uk.rdn.ac.uk/ebank-demo/ Demonstrating potential for linking between data and journal article
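
Harvesting with OAI-PMH comes down to an HTTP request carrying a `verb` and a `metadataPrefix`, followed by parsing the XML that comes back. The Python sketch below builds a `ListRecords` request URL and extracts titles from a response. The base URL, the record identifier and the sample response are assumptions made for illustration; no real eBank endpoint is contacted, and resumption tokens and error handling are left out.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH ListRecords request URL."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def titles_by_identifier(response_xml: str) -> dict:
    """Map each record's OAI identifier to its list of dc:title values."""
    root = ET.fromstring(response_xml)
    result = {}
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
        result[identifier] = [t.text for t in record.iterfind(".//dc:title", NS)]
    return result


# Hypothetical response, shaped like an oai_dc ListRecords reply.
SAMPLE_RESPONSE = """\
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.org:1</identifier>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example crystal structure</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""
```

An aggregator would fetch the URL from `list_records_url(...)` for each repository it knows about and index the parsed records; passing a different `metadata_prefix` (such as the `ebank_dc` format named on slide 20) requests a richer record from repositories that support it.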

  18. Embedded search service at PSIgate PSIgate subject gateway: service provider

  19. Schema for records made available for harvesting • Data holding (collection of files associated with experiment) • Qualified Dublin Core data elements plus additional chemical properties • Empirical formula • International Chemical Identifier (InChI) • Compound Class • Individual data files • Separate records for stage status of each file • Description set wrapped into one XML record using METS • Research metadata/data as a complex object
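
To make the shape of such a record concrete, the Python sketch below models a data holding with a few Dublin Core elements plus the three chemical properties listed above, and serialises it to XML. It is not the eBank schema: the `SKETCH_NS` namespace and the element names `dataHolding`, `empiricalFormula`, `inchi` and `compoundClass` are hypothetical, and the METS wrapping is omitted.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

DC_NS = "http://purl.org/dc/elements/1.1/"
SKETCH_NS = "http://example.org/ebank-sketch/"  # hypothetical namespace


@dataclass
class DataHolding:
    """Metadata for the collection of files from one experiment."""
    identifier: str
    title: str
    empirical_formula: str
    inchi: str
    compound_class: str


def to_xml(holding: DataHolding) -> ET.Element:
    """Serialise a data holding as DC elements plus chemical properties."""
    root = ET.Element(f"{{{SKETCH_NS}}}dataHolding")
    ET.SubElement(root, f"{{{DC_NS}}}identifier").text = holding.identifier
    ET.SubElement(root, f"{{{DC_NS}}}title").text = holding.title
    ET.SubElement(root, f"{{{DC_NS}}}type").text = "CrystalStructure"
    # Chemical properties: element names here are illustrative only.
    ET.SubElement(root, f"{{{SKETCH_NS}}}empiricalFormula").text = holding.empirical_formula
    ET.SubElement(root, f"{{{SKETCH_NS}}}inchi").text = holding.inchi
    ET.SubElement(root, f"{{{SKETCH_NS}}}compoundClass").text = holding.compound_class
    return root


# Example values; the identifier is a placeholder URL.
benzene = DataHolding(
    identifier="http://repository.example.org/structure/1",
    title="Benzene",
    empirical_formula="C6 H6",
    inchi="InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H",
    compound_class="organic",
)
record = to_xml(benzene)
```

`ET.tostring(record, encoding="unicode")` turns the element into a string that could be exposed to harvesters; carrying the InChI in the record is what lets an aggregator match the same compound across repositories.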

  20. eBank data model (diagram labels; model input: Andy Powell, UKOLN) • Objects: Crystal structure (data holding), Datasets, Crystal structure report (HTML), ebank_dc record (XML), Eprint, Eprint “jump-off” page (HTML), Eprint manifestation (e.g. PDF), Eprint oai_dc record (XML) • Properties: dc:identifier, dc:type=“CrystalStructure”, dc:type=“Eprint” and/or “Text”, dcterms:references, dcterms:isReferencedBy • Actions: Deposit, Harvesting via OAI-PMH (oai_dc; ebank_dc; oai_dc, ebank_dc), Linking • Services: Institutional repositories, eBank UK aggregator service, ePrint UK aggregator service, Other aggregators and services
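
Slide 20 names `dcterms:references` and `dcterms:isReferencedBy` as the properties that link crystal structures to eprints, matched on `dc:identifier`. The transcript does not preserve which record carries which property, so the Python sketch below assumes the eprint record holds `references` and the crystal structure record holds `isReferencedBy`, and accepts a link asserted from either side. The record dictionaries, identifiers and the `linked_eprints` function are all hypothetical.

```python
def linked_eprints(records: list) -> dict:
    """Map each crystal structure identifier to the eprints linked to it."""
    eprint_ids = {r["identifier"] for r in records if r["type"] == "Eprint"}
    links = {}
    for record in records:
        if record["type"] != "CrystalStructure":
            continue
        # Links asserted on the data side (assumed: isReferencedBy).
        targets = set(record.get("isReferencedBy", []))
        # Links asserted on the eprint side (assumed: references).
        for other in records:
            if other["type"] == "Eprint" and record["identifier"] in other.get("references", []):
                targets.add(other["identifier"])
        # Keep only targets that were actually harvested as eprints.
        links[record["identifier"]] = sorted(targets & eprint_ids)
    return links


# Hypothetical harvested records, reduced to the fields used for linking.
RECORDS = [
    {
        "identifier": "http://repository.example.org/structure/1",
        "type": "CrystalStructure",
        "isReferencedBy": ["http://eprints.example.org/10"],
    },
    {"identifier": "http://eprints.example.org/10", "type": "Eprint"},
    {
        "identifier": "http://eprints.example.org/11",
        "type": "Eprint",
        "references": ["http://repository.example.org/structure/1"],
    },
]
```

With these records, `linked_eprints(RECORDS)` associates the structure with both eprints: one through the link on the data record and one through the link on the eprint record.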

  21. Creating the metadata • Potential to embed ‘deposit and disseminate’ into workflow of chemist in automated way

  22. Setup via GUI BruNo Unmount Sample Tray BruNo Mount PreScans Diffraction No Yes Unit Cell Success No Yes Strategy Data Collection Data Process System Y Data Collection

  23. eBank phase two work areas • Sub-disciplines of chemistry and physical sciences • Pursue generic data model • Use of identifiers for citing datasets • Subject approach to discovering research data • Access to research data in teaching and learning context • Liaise with other digital repository initiatives

  24. For the future… • Who provides added value services? • Authority files, automated subject indexing, annotation, data mining, visualisation • What are the preservation issues? • UK Digital Curation Centre http://www.dcc.ac.uk • National Science Board Draft report on long-lived data collections http://www.nsf.gov/nsb/meetings/2005/LLDDC_draftreport.pdf • How to manage complex object descriptions within OAI? • Digital curation of research data presents new roles for scientists, computer scientists, data managers… ‘data scientists’

  25. Thank you. Comments, questions? http://www.ukoln.ac.uk/projects/ebank-uk/ Acknowledgement to all project partners for their contributions to this presentation.
