
3. Chemical Data and Data Bases


Presentation Transcript


  1. 3. Chemical Data and Data Bases

  2. Datasets and Databases
     • Many small datasets are available
     • Several commercial databases of compounds and reactions (e.g. CAS)
     • Large but not comprehensive public databases of compounds are just starting to become available
     • As of today, there is no large public database of reactions

  3. Data: Small Datasets (examples)
     • Mutag (Mutagenicity)
       • About 200 compounds (125 positive / 63 negative), mutagenicity in Salmonella
     • PTC (Predictive Toxicity Challenge)
       • A few hundred compounds, carcinogenicity in female mice, male mice, female rats, and male rats (FM, MM, FR, MR)
     • NCI (Anti-cancer activity)
       • 70,000 compounds screened for ability to inhibit growth in 60 human tumor cell lines
     • Alkanes (Boiling points)
       • All 150 non-cyclic alkanes (C_nH_{2n+2}) with n < 11 and their boiling points (range [-164, 174] °C)
     • Benzodiazepines (QSAR)
       • 79 1,4-benzodiazepin-2-ones, affinity towards the GABA_A receptor
     • Solubility (Delaney and XLogP)
       • 1440 compounds (Delaney); 1991 compounds (XLogP)
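The alkane figures on this slide can be checked with a little arithmetic. The sketch below is not part of the original presentation: it assumes the standard textbook counts of constitutional isomers for acyclic alkanes with 1 to 10 carbons (stereoisomers not counted) and rounded atomic weights, and the helper names `alkane_formula` and `alkane_weight` are made up for illustration.

```python
# Constitutional isomer counts of acyclic alkanes C_nH_{2n+2} for n = 1..10.
# Assumption: standard textbook values, stereoisomers not counted.
ISOMER_COUNTS = [1, 1, 1, 2, 3, 5, 9, 18, 35, 75]

# Rounded standard atomic weights in g/mol.
ATOMIC_WEIGHT = {"C": 12.011, "H": 1.008}


def alkane_formula(n: int) -> str:
    """Return the molecular formula of an acyclic alkane with n carbons."""
    carbon = "C" if n == 1 else f"C{n}"
    return f"{carbon}H{2 * n + 2}"


def alkane_weight(n: int) -> float:
    """Return the molecular weight (g/mol) of C_nH_{2n+2}."""
    return n * ATOMIC_WEIGHT["C"] + (2 * n + 2) * ATOMIC_WEIGHT["H"]


# Total number of distinct acyclic alkanes with n < 11.
total_isomers = sum(ISOMER_COUNTS)

print("alkanes with n < 11:", total_isomers)
for n in (1, 4, 10):
    print(alkane_formula(n), round(alkane_weight(n), 3))
```

Summing the per-n isomer counts reproduces the dataset size quoted on the slide, which is a quick sanity check that the dataset is meant to be exhaustive for n < 11.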

  4. Large Databases
     • Private/Commercial
       • Example: ACS Chemical Registry (CAS) [tens of millions of compounds]
         • Expensive and cannot be “mined”
       • Cambridge Structural DB (CSD) [crystallographic structures, ~350K]
     • More recent trends
       • Example: eMolecules (formerly Chmoogle)
         • Free search engine but cannot be “mined”

  5. CAS CHEMICAL REGISTRY

  6. GROWTH of CAS CHEMICAL REGISTRY SYSTEM

  7. Large “Public” Databases
     • ZINC (UCSF)
     • ChemBank (Harvard)
     • PubChem (NIH)
     • ChemDB (UCI): http://cdb.ics.uci.edu
       J. Chen, S. J. Swamidass, Y. Dou, J. Bruand, and P. Baldi. ChemDB: A Public Database of Small Molecules and Related Chemoinformatics Resources. Bioinformatics, 21, 4133-4139 (2005).

  8. Example of Large Public DB: ChemDB
     • ~5M unique compounds
     • Commercially available compounds
     • PostgreSQL/Oracle
     • Annotation (experimental, computational)
     • Searchable
     • Web interface
     • Similarity, in silico reactions, …
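To make the idea of a searchable, annotated compound database concrete, here is a minimal sketch. It is an illustration only: the table and column names are hypothetical and do not reflect ChemDB's actual schema, and Python's built-in `sqlite3` stands in for the PostgreSQL/Oracle back end mentioned on the slide. The three compounds and their values are ordinary textbook figures chosen for the example.

```python
import sqlite3

# Hypothetical schema for illustration; not ChemDB's actual tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE compound (
    id         INTEGER PRIMARY KEY,
    smiles     TEXT NOT NULL UNIQUE,  -- assumes SMILES are canonicalized upstream
    mol_weight REAL NOT NULL          -- g/mol
);
CREATE TABLE annotation (
    compound_id INTEGER NOT NULL REFERENCES compound(id),
    kind        TEXT NOT NULL CHECK (kind IN ('experimental', 'computational')),
    name        TEXT NOT NULL,
    value       REAL NOT NULL
);
""")

# Ethanol, benzene, aspirin with rounded molecular weights.
conn.executemany(
    "INSERT INTO compound (id, smiles, mol_weight) VALUES (?, ?, ?)",
    [(1, "CCO", 46.07), (2, "c1ccccc1", 78.11), (3, "CC(=O)Oc1ccccc1C(=O)O", 180.16)],
)
# One experimental and one computational annotation type.
conn.executemany(
    "INSERT INTO annotation (compound_id, kind, name, value) VALUES (?, ?, ?, ?)",
    [
        (1, "experimental", "boiling_point_C", 78.4),
        (2, "experimental", "boiling_point_C", 80.1),
        (1, "computational", "heavy_atom_count", 3),
        (2, "computational", "heavy_atom_count", 6),
        (3, "computational", "heavy_atom_count", 13),
    ],
)


def find_by_weight(lo: float, hi: float) -> list:
    """Return SMILES of compounds with lo <= mol_weight <= hi, lightest first."""
    rows = conn.execute(
        "SELECT smiles FROM compound WHERE mol_weight BETWEEN ? AND ? "
        "ORDER BY mol_weight",
        (lo, hi),
    ).fetchall()
    return [row[0] for row in rows]


def annotations_for(smiles: str, kind: str) -> dict:
    """Return {annotation name: value} of the given kind for one compound."""
    rows = conn.execute(
        "SELECT a.name, a.value FROM annotation a "
        "JOIN compound c ON c.id = a.compound_id "
        "WHERE c.smiles = ? AND a.kind = ?",
        (smiles, kind),
    ).fetchall()
    return dict(rows)


print(find_by_weight(50, 200))
print(annotations_for("CCO", "experimental"))
```

Keeping annotations in a separate table, tagged by kind, lets experimental and computational values accumulate per compound without changing the compound table; this is one possible design, not necessarily the one ChemDB uses.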

  9. Example of Statistics

  10. Molecular Weight/Solubility


  16. [Diagram; labels: R, M, ChemDB, RChemDB, Filters, Experiments, NM]

  17. Chemo/Bio Informatics: Two Key Ingredients
      1. Data
      2. Similarity Measures
      Bioinformatics analogy and differences:
      • Data (GenBank, Swiss-Prot, PDB)
      • Similarity (BLAST)
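The slide does not name a particular similarity measure. As an illustration, the sketch below uses the Tanimoto (Jaccard) coefficient on sets of fingerprint features, which is one commonly used choice in chemoinformatics. The feature indices and compound names are toy values invented for the example, and treating two empty fingerprints as identical is a convention chosen here, not something stated in the source.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto coefficient |a & b| / |a | b| between two feature sets."""
    if not a and not b:
        return 1.0  # convention chosen here: two empty fingerprints are identical
    return len(a & b) / len(a | b)


def rank_by_similarity(query: set, database: dict) -> list:
    """Return (name, score) pairs sorted from most to least similar to query."""
    scored = [(name, tanimoto(query, fp)) for name, fp in database.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy fingerprints: each set holds the indices of features present in a molecule.
toy_database = {
    "A": {1, 2, 3, 4},
    "B": {1, 2, 5},
    "C": {7, 8},
}
query_fp = {1, 2, 3, 4}

for name, score in rank_by_similarity(query_fp, toy_database):
    print(name, round(score, 2))
```

Ranking a database by similarity to a query molecule plays a role loosely analogous to a BLAST search over sequences: given data and a similarity measure, retrieval reduces to scoring and sorting.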
