Smart Storage for Physical Properties

Kieron Taylor, with Jeremy Frey and Jonathan Essex

Presentation Transcript


  1. Or: How on Earth do we Store this Stuff? Smart Storage for Physical Properties. Kieron Taylor, with Jeremy Frey and Jonathan Essex

  2. What makes up chemical data?
     • Numbers: big, small, precise and vague
     • Circumstances: How hot? What pressure?
     • Assumptions:
        • This is pretty pure, let's say it's pure
        • Standard conditions? More or less
        • That peak on the spectrum isn't important
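
The three ingredients above (numbers, circumstances, assumptions) can be sketched as a single record type. This is only an illustrative sketch: the class name `Measurement` and all of its field names are assumptions made here, not part of the original system. The example value is the cyclohexanone melting point quoted later in the talk.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Measurement:
    """One reported value plus the context needed to interpret it.

    Hypothetical structure for illustration only.
    """
    prop: str                      # e.g. "melting point"
    value: float                   # the number itself
    units: str                     # e.g. "C" or "mg/L"
    error: Optional[float] = None  # None means no uncertainty was reported
    conditions: dict = field(default_factory=dict)   # circumstances: temperature, pressure, ...
    assumptions: list = field(default_factory=list)  # free-text caveats

    def is_vague(self) -> bool:
        """Treat a value with no stated error as vague."""
        return self.error is None


# Melting point of cyclohexanone as quoted in the talk: -31 +/- 0.1 C.
mp = Measurement(
    prop="melting point",
    value=-31.0,
    units="C",
    error=0.1,
    assumptions=["sample treated as pure"],
)
```

Keeping conditions and assumptions as open-ended containers reflects the point of the slide: the number alone is not the data.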

  3. Using the Data: QSPR
     Take lots of data → Magical statistics occur → Validate results → Predictive model
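
The QSPR workflow on this slide can be sketched with the simplest possible "magical statistics": a one-descriptor least-squares line, fitted on part of the data and validated on the rest. Everything here is an assumption made for illustration. The data are synthetic (generated from y = 2x + 1, not real chemistry), and real QSPR models use many descriptors and more robust validation.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(model, xs, ys):
    """Root-mean-square error of the fitted line on the given points."""
    slope, intercept = model
    squared = [(slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)]
    return (sum(squared) / len(xs)) ** 0.5


# Synthetic, noise-free data from y = 2x + 1 (hypothetical descriptor and property).
descriptor = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
prop = [2.0 * x + 1.0 for x in descriptor]

# 1. Take the data and hold some back for validation.
train_x, train_y = descriptor[:4], prop[:4]
test_x, test_y = descriptor[4:], prop[4:]

# 2. Fit on the training set only.
model = fit_line(train_x, train_y)

# 3. Validate on the held-out points.
validation_error = rmse(model, test_x, test_y)


# 4. The validated model can now predict unseen values.
def predict(x):
    return model[0] * x + model[1]
```

With real data the validation error would not be zero, and its size is what tells you whether the model can be trusted.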

  4. So What is Real Data like? Bad. Take the commercial Physprop Database: can we handle these melting points?

  5. Let's Make a Database
     • One data source is not enough
     • Good(?) data isn't free
     • Different sources have varied styles of content
     • Most database software is not suited to data mining
     • We cannot simply plumb these varied sources for data; we must reconcile them to make sensible statistics

  6. Relational Design
     For one molecule: Cyclohexanone

     Property       Value   Units
     Solubility     2500    mg/L
     Melting point  -31     C
     Boiling point  155.4   C

     Property       Value   Error   Units  Source
     Solubility     2500    +/-50   mg/L   Physprop
                    2650    +/-60   mg/L   Our lab
     Melting point  -31     +/-0.1  C      Detherm
     Boiling point  155.4   +/-0.5  C      Merck Index

     Property       Value   Error   Units  Source       Method      Author
     Solubility     2500    +/-50   mg/L   Physprop     Laboratory  ...
                    2650    +/-60   mg/L   Southampton  Simulation  Me
     Melting point  -31     +/-0.1  C      Detherm      Laboratory  ...
     Boiling point  155.4   +/-0.5  C      Merck Index  Laboratory  ...

     Property       Value   Error   Units  Source       Method        Author  Note
     Solubility     2500    +/-50   mg/L   Physprop     Laboratory    ...
                    2650    +/-60   mg/L   Southampton  Simulation    Me      Superseded
                    2599    +/-25   mg/L   Southampton  Simulation B  Me
     Melting point  -31     +/-0.1  C      Detherm      Laboratory    ...
     Boiling point  155.4   +/-0.5  C      Merck Index  Laboratory    ...     Decomposing

     Arbitrary numbers of points are hard to store in relational databases.
     We're not done yet: we still have to account for multiple experimental conditions, statements of validity and molecules.
     Provenance = senary relational model?
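
To make the relational difficulty concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table and column names are assumptions chosen for illustration, not the schema used by the authors. The three solubility rows are the ones from the slide. The point is that every new kind of provenance needs another column, and open-ended experimental conditions need a second table and a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE measurement (
    id       INTEGER PRIMARY KEY,
    molecule TEXT NOT NULL,
    property TEXT NOT NULL,
    value    REAL NOT NULL,
    error    REAL,
    units    TEXT NOT NULL,
    source   TEXT,
    method   TEXT,
    author   TEXT,
    note     TEXT
);
-- An arbitrary number of experimental conditions does not fit in fixed
-- columns, so each condition has to become a row in a separate table.
CREATE TABLE measurement_condition (
    measurement_id INTEGER NOT NULL REFERENCES measurement(id),
    name           TEXT NOT NULL,
    value          REAL NOT NULL,
    units          TEXT NOT NULL
);
""")

# Solubility rows for cyclohexanone, as listed on the slide.
rows = [
    ("Cyclohexanone", "Solubility", 2500.0, 50.0, "mg/L", "Physprop", "Laboratory", None, None),
    ("Cyclohexanone", "Solubility", 2650.0, 60.0, "mg/L", "Southampton", "Simulation", "Me", "Superseded"),
    ("Cyclohexanone", "Solubility", 2599.0, 25.0, "mg/L", "Southampton", "Simulation B", "Me", None),
]
conn.executemany(
    "INSERT INTO measurement "
    "(molecule, property, value, error, units, source, method, author, note) "
    "VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
    rows,
)

# Values that have not been marked as superseded.
live_values = conn.execute(
    "SELECT value, error, source FROM measurement "
    "WHERE property = 'Solubility' "
    "AND (note IS NULL OR note <> 'Superseded') "
    "ORDER BY value"
).fetchall()
```

This works, but the schema has to be redesigned each time a new kind of statement about a data point is needed.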

  7. RDF Triplestore is the Solution
     • RDF describes trees and networks of entities
     • Data of this complexity lends itself well to a tree representation
     • RDF trees enable additional clever things
     • Triplestores provide persistent RDF models
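
The tree idea can be sketched without any RDF library by holding (subject, predicate, object) triples in a plain Python list. This is not the 3store API or a real triplestore. The `ex:` predicate names and the node label `_:m1` are hypothetical, chosen only to show how one solubility value from the talk hangs off a molecule as a small tree.

```python
from collections import defaultdict

# Each statement is a (subject, predicate, object) triple.
# All identifiers ("ex:...", "_:m1") are hypothetical names for illustration.
triples = [
    ("ex:cyclohexanone", "ex:hasMeasurement", "_:m1"),
    ("_:m1", "ex:property", "ex:Solubility"),
    ("_:m1", "ex:value", 2500.0),
    ("_:m1", "ex:error", 50.0),
    ("_:m1", "ex:units", "mg/L"),
    ("_:m1", "ex:source", "Physprop"),
    ("_:m1", "ex:method", "Laboratory"),
]


def build_index(store):
    """Group (predicate, object) pairs by subject for quick lookup."""
    index = defaultdict(list)
    for subject, predicate, obj in store:
        index[subject].append((predicate, obj))
    return index


def tree_lines(node, index, depth=0):
    """Walk outgoing edges from node and return an indented tree as text lines.

    Assumes the data form a tree (no cycles), as in this example.
    """
    lines = []
    for predicate, obj in index.get(node, []):
        lines.append("  " * depth + f"{predicate} -> {obj}")
        if obj in index:  # the object is itself a subject, so descend
            lines.extend(tree_lines(obj, index, depth + 1))
    return lines


lines = tree_lines("ex:cyclohexanone", build_index(triples))
```

Recording a new kind of fact about `_:m1` means appending one more triple; no schema has to change, which is the contrast with the relational design.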

  8. What can we do with this?
     • Store almost any chemical data as normal
     • Track the where, when and how of each and every data point
     • Filter values by whether they are real or simulated, old or new, from a particular source, or produced by a particular person
     • Bolt on RDF schemas such as FOAF and our units system
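
Filtering by provenance can be sketched as pattern matching over triples. The following is a plain-Python illustration, not a real triplestore query language or API. The `ex:` predicates and `_:m` node labels are hypothetical, and the values are the three cyclohexanone solubilities quoted in the talk.

```python
def match(store, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Hypothetical identifiers; values and provenance taken from the talk.
store = [
    ("_:m1", "ex:value", 2500.0),
    ("_:m1", "ex:source", "Physprop"),
    ("_:m1", "ex:method", "Laboratory"),
    ("_:m2", "ex:value", 2650.0),
    ("_:m2", "ex:source", "Southampton"),
    ("_:m2", "ex:method", "Simulation"),
    ("_:m3", "ex:value", 2599.0),
    ("_:m3", "ex:source", "Southampton"),
    ("_:m3", "ex:method", "Simulation B"),
]


def values_where(store, predicate, wanted):
    """Values of every measurement node whose `predicate` equals `wanted`."""
    nodes = [s for s, _, _ in match(store, p=predicate, o=wanted)]
    return [o for node in nodes for _, _, o in match(store, s=node, p="ex:value")]


lab_values = values_where(store, "ex:method", "Laboratory")
soton_values = values_where(store, "ex:source", "Southampton")
```

The same `values_where` call serves any provenance predicate, so filtering by person or date would need more triples rather than more code.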

  9. What have we done with this? http://green.chem.soton.ac.uk/triangle/query.html

  10. Thanks to:
     • AKT and Steve Harris for 3store
     • Rob Gledhill for web tech and discussion
     • Perl for s/ / /g
