Curation of Chemistry Data from the Laboratory to Publication

Curation of Chemistry Data from the Laboratory to Publication. Jeremy Frey & Simon Coles, School of Chemistry, University of Southampton. The CombeChem Project. End-to-end linking of data and information. Laboratory to publication and back again.

Presentation Transcript


  1. Curation of Chemistry Data from the Laboratory to Publication. Jeremy Frey & Simon Coles, School of Chemistry, University of Southampton

  2. The CombeChem Project • End-to-end linking of data and information • Laboratory to publication and back again • Very long data chains can be involved, e.g. from a chemistry lab to mouse genetic expression • The exponential world of combinatorial synthesis and high-throughput analysis meets the exponentially growing power of computing • “Automation, Semantics & the Grid” Data Curation Workshop

  3. Smart Laboratory: not just one laboratory but many co-laboratories working together. Diagram labels: Smart HCI, Goal, Knowledge, Literature, Report, Plan & COSHH, Information Integration, Digital Model, Analysis, Synthesis, Smart Storage, Smart Dissemination

  4. Problems with ‘Small Laboratory’ Working Practice • “Data from experiments conducted as recently as six months ago might be suddenly deemed important, but those researchers may never find those numbers – or if they did might not know what those numbers meant” • “Lost in some research assistant’s computer, the data are often irretrievable or an undecipherable string of digits” • “To vet experiments, correct errors, or find new breakthroughs, scientists desperately need better ways to store and retrieve research data” • “Data from Big Science is … easier to handle, understand and archive. Small Science is horribly heterogeneous and far more vast. In time Small Science will generate 2-3 times more data than Big Science.” Source: ‘Lost in a Sea of Science Data’, S. Carlson, The Chronicle of Higher Education (23/06/2006)

  5. The concept of Publication@Source • Trace all the way back from publication to the original data – provenance • The data is the key - DataGrid • Start as you mean to go on – ELNs are a necessity • Curation of subsequently produced data
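
The idea of tracing back from a publication to the original data can be sketched as a walk along "derived from" links. This is only an illustration: the identifiers, the `DERIVED_FROM` mapping and the `trace_provenance` function are hypothetical and are not taken from the CombeChem software.

```python
from typing import Dict, List, Optional

# Hypothetical provenance links: each item points to the item it was derived from.
DERIVED_FROM: Dict[str, Optional[str]] = {
    "publication:paper-1": "analysis:fit-7",
    "analysis:fit-7": "processed:spectrum-3",
    "processed:spectrum-3": "raw:instrument-run-42",
    "raw:instrument-run-42": None,  # original laboratory data, nothing upstream
}


def trace_provenance(item: str, links: Dict[str, Optional[str]]) -> List[str]:
    """Follow derived-from links from an item back to the original data."""
    chain: List[str] = []
    seen = set()
    current: Optional[str] = item
    while current is not None:
        if current in seen:
            # A cycle would mean the provenance record is inconsistent.
            raise ValueError(f"provenance cycle at {current}")
        seen.add(current)
        chain.append(current)
        current = links.get(current)  # None (or a missing key) ends the chain
    return chain


chain = trace_provenance("publication:paper-1", DERIVED_FROM)
print(" <- ".join(chain))
```

The chain only reaches the raw data if every intermediate step was recorded when it happened, which is the point of starting with an electronic lab notebook.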

  6. Observations are never collected on note pads, filter paper or other temporary paper for later transfer into a notebook. If you are caught using the “scrap of paper” technique, your improperly recorded data may be confiscated by your TA.

  7. Lab books are a big block to publication@source: if it’s not digital, it is more difficult to share. Only some equipment is networked. This is where it all starts: The Lab & The Lab Book. Need a usable digital lab book. Design by analogy to help Chemists and Computer Scientists work together.

  8. COSHH: leverage off things we already have to do

  9. PLAN, Process, Record

  10. Data Curation Workshop

  11. getRecord(): There is a potential containment problem in pulling back partial RDF graphs from the triple store. This was solved by using multiple triple stores, but boundaries are a major issue for the future.
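
The containment problem can be illustrated with a toy in-memory triple list: when a record is pulled back as a partial graph, something has to decide where the record stops. The sketch below uses a depth limit as that boundary. It is an assumption made for illustration only; `STORE`, `get_record` and all identifiers are hypothetical and do not reflect how the project's getRecord() was implemented.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical triples; the identifiers are illustrative only.
STORE: List[Triple] = [
    ("exp:1", "hasSample", "sample:A"),
    ("exp:1", "performedBy", "person:jf"),
    ("sample:A", "hasCompound", "compound:X"),
    ("compound:X", "hasProperty", "prop:mp"),
    ("person:jf", "memberOf", "group:chem"),
]


def get_record(store: List[Triple], subject: str, max_depth: int) -> List[Triple]:
    """Return the partial graph reachable from `subject` within `max_depth` hops."""
    result: List[Triple] = []
    visited: Set[str] = set()
    frontier: Set[str] = {subject}
    for _ in range(max_depth + 1):
        next_frontier: Set[str] = set()
        for s, p, o in store:
            if s in frontier:
                result.append((s, p, o))
                next_frontier.add(o)  # the object may itself be a subject
        visited |= frontier
        frontier = next_frontier - visited  # never expand a node twice
        if not frontier:
            break
    return result


for depth in range(3):
    print(depth, len(get_record(STORE, "exp:1", depth)))
```

A small depth returns a record that may omit context a reader needs, while a large depth drags in neighbouring records, which is one way to see why the boundaries of a record are hard to fix in advance.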

  12. Architecture (diagram labels): SURIG, Data stores, “Client” Libraries, SOAP, Planner0, Semantic Data, PHP, Jena, Viewer0, Institutional archives and metadata publication, Bench Applications, Weights & Measures, Java, Other services

  13. The Analytical Laboratory • Capture information from places you would not want to put your eyes • Capture environmental data automatically • Capture people and movements • Provide this information in real time as well as for the laboratory record

  14. Pub-Sub systems provide the flexible & extensible approach to distribution. Diagram labels: Data Source, Data Source, BLOG, Message Broker, Translator Service, Mobile phone, Web Client, Archive Client, PDA
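
A minimal in-process sketch of the publish-subscribe pattern named on this slide: data sources publish to a topic, and any number of clients (an archive, an alerting client) subscribe without the source knowing about them. The `MessageBroker` class, the topic name and the 30 °C threshold are assumptions made for illustration, not the project's actual broker or settings.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Dict, List

Message = Dict[str, Any]
Handler = Callable[[Message], None]


class MessageBroker:
    """Toy broker: routes each published message to every subscriber of its topic."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: Message) -> int:
        """Deliver the message and return how many handlers received it."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)


archive: List[Message] = []  # stands in for an archive client
alerts: List[Message] = []   # stands in for a phone or PDA client


def alert_if_hot(message: Message) -> None:
    # Hypothetical threshold, chosen only for the example.
    if message["celsius"] > 30:
        alerts.append(message)


broker = MessageBroker()
broker.subscribe("lab/temperature", archive.append)
broker.subscribe("lab/temperature", alert_if_hot)

broker.publish("lab/temperature", {"sensor": "room", "celsius": 21.5})
broker.publish("lab/temperature", {"sensor": "laser", "celsius": 34.0})
print(len(archive), len(alerts))
```

Adding a new kind of client is a single `subscribe` call and needs no change to the data sources, which is the extensibility the slide refers to.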

  15. Temperature – room, laser; Air Conditioning failed; Door & interlock; Motion Sensors

  16. Databases - Our experience • What do you do when the actual users keep changing their minds? • Is a traditional relational database suitable? • Danger of reinforcing scientific bias against relational databases for laboratory data. • RDF & Triple stores were again the solution

  17. RDF/RDFS: High-level schema for chemical properties

  18. Data Curation Workshop

  19. Triple Stores - The Heart of the Semantic Web. Scaling - 3Store response. Memory leak in testing program!

  20. The Semantic Web! Scaling the triple stores. Moved from… • A model of harvesting data from multiple sources into one scalable store, to • A model of distributed RDF sources, caching what is needed for the task at hand into multiple fit-for-purpose stores
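
The second model can be sketched as follows: rather than loading everything into one store, a small task-specific store is built from the distributed sources and cached for reuse. The `SOURCES` data, the `TaskStoreCache` class and the choice of selecting by predicate are hypothetical and serve only to illustrate the idea.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical distributed RDF sources, each holding its own triples.
SOURCES: Dict[str, List[Triple]] = {
    "crystallography": [
        ("compound:X", "hasUnitCell", "cell:1"),
        ("compound:X", "hasSpaceGroup", "P21/c"),
    ],
    "synthesis": [
        ("compound:X", "madeInExperiment", "exp:1"),
        ("exp:1", "hasYield", "72%"),
    ],
}


class TaskStoreCache:
    """Builds and caches small stores holding only the triples a task needs."""

    def __init__(self, sources: Dict[str, List[Triple]]) -> None:
        self._sources = sources
        self._stores: Dict[FrozenSet[str], List[Triple]] = {}
        self.fetches = 0  # how many times the sources were actually scanned

    def store_for(self, predicates: Iterable[str]) -> List[Triple]:
        key = frozenset(predicates)
        if key not in self._stores:
            self.fetches += 1
            self._stores[key] = [
                triple
                for triples in self._sources.values()
                for triple in triples
                if triple[1] in key
            ]
        return self._stores[key]


cache = TaskStoreCache(SOURCES)
task_store = cache.store_for(["hasUnitCell", "hasYield"])
print(len(task_store), cache.fetches)
```

Each task pays only for the triples it asks for, at the cost of having to decide when a cached store has gone stale, which this sketch does not address.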

  21. Experiments on the Grid: The NCS Service (diagram label: HTTPS)

  22. ADS: Binary raw data archived in Atlas Datastore. x300 £’s

  23. A Data-Rich Subject – the Crystallography Problem: 1.5,000,000; 30,000,000; 450,000

  24. The eCrystals Digital Repository: http://ecrystals.chem.soton.ac.uk

  25. Access to the underlying data

  26. The eCrystals ‘Global’ Model. Diagram labels: Data analysis, transformation, mining, modelling; Presentation services / portals; Data discovery, linking, citation; Publishers: peer-review journals, conference proceedings, etc; Aggregator services; Publication; Laboratory repository; Deposit; Validation; Institutional data repositories; Search, harvest; Validation; Preservation and curation; Deposit

  27. Laboratory Repositories and Information Management

  28. Need for a data archive in the laboratory. Not just the published spectra!

  29. The R4L Repository: Create new compound; Add experiment data and metadata; Deposit; Search / Browse

  30. Several groups making and analysing; the library. Administrative Domains: transfer or share the data. Diagram labels: National Archive, Research Group, Researcher, International Database, Research Group, Institution

  31. Paper organized using RDF. SVG “active” graphics. Link to data, follow links back to the raw data archive R4L. Link to simulation, full simulation data archived in BioSimGrid.

  32. Summary: • Making sure other people can find, understand and re-use your data easily and with confidence (even when there is a huge amount of it!) • Make use of Plans to inform the digital context - metadata in advance • Have concern for the “End-to-End life cycle” of chemistry information from the start. • Understanding Usability and Human Computer Interaction is vital for adoption
