Explore small datasets like Mutag and PTC, large databases like ACS Chemical Registry, and public resources like Zinc and PubChem in the field of chemical informatics.
Datasets and Databases
• Many small datasets are available
• Several commercial databases of compounds and reactions (e.g., CAS)
• Large, but not comprehensive, public databases of compounds are just starting to become available
• As of today, there is no large public database of reactions
Data: Small Datasets (examples)
• Mutag (mutagenicity)
  • 188 compounds (125 mutagenic / 63 non-mutagenic), mutagenicity in Salmonella
• PTC (Predictive Toxicology Challenge)
  • A few hundred compounds, carcinogenicity in female mice (FM), male mice (MM), female rats (FR), and male rats (MR)
• NCI (anti-cancer activity)
  • 70,000 compounds screened for their ability to inhibit growth of 60 human tumor cell lines
• Alkanes (boiling points)
  • All 150 acyclic alkanes (CnH2n+2) with n < 11 and their boiling points (range [-164, 174] °C)
• Benzodiazepines (QSAR)
  • 79 1,4-benzodiazepin-2-ones and their affinity for the GABA-A receptor
• Solubility (Delaney and XLogP)
  • 1440 compounds (Delaney); 1991 compounds (XLogP)
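The count of 150 acyclic alkanes with n < 11 can be checked by enumeration. The sketch below assumes that an alkane is identified by its hydrogen-suppressed carbon skeleton, i.e., a tree in which every carbon has at most 4 carbon neighbours, and it counts constitutional isomers only (stereoisomers are not counted separately). Every such tree with n carbons arises from one with n - 1 carbons by bonding a new carbon to any atom that still has free valence, so the sketch grows trees level by level and deduplicates them with an AHU-style canonical string. The function names are illustrative, not taken from any dataset tooling.

```python
from typing import Dict, List

Adjacency = List[List[int]]  # adj[v] lists the carbons bonded to carbon v


def rooted_code(adj: Adjacency, v: int, parent: int) -> str:
    """AHU-style canonical string of the subtree rooted at v."""
    children = sorted(rooted_code(adj, c, v) for c in adj[v] if c != parent)
    return "(" + "".join(children) + ")"


def canonical(adj: Adjacency) -> str:
    """Canonical form of an unrooted tree: the smallest rooted code over all roots."""
    return min(rooted_code(adj, root, -1) for root in range(len(adj)))


def count_alkane_skeletons(max_carbons: int) -> Dict[int, int]:
    """Count acyclic carbon skeletons (trees with max degree 4) for n = 1..max_carbons."""
    level: Dict[str, Adjacency] = {canonical([[]]): [[]]}  # n = 1: methane
    counts = {1: 1}
    for n in range(2, max_carbons + 1):
        next_level: Dict[str, Adjacency] = {}
        for adj in level.values():
            for v in range(len(adj)):
                if len(adj[v]) < 4:  # carbon valence: at most 4 C-C bonds
                    new_adj = [list(nbrs) for nbrs in adj] + [[v]]
                    new_adj[v].append(len(adj))  # bond v to the new carbon
                    # keep one representative per isomorphism class
                    next_level.setdefault(canonical(new_adj), new_adj)
        level = next_level
        counts[n] = len(level)
    return counts


if __name__ == "__main__":
    counts = count_alkane_skeletons(10)
    print(counts)
    print(sum(counts.values()))  # 150
```

Taking the minimum rooted code over all roots is slower than rooting at the tree's center, but for at most 10 carbons it is instantaneous and simpler to get right.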
Large Databases
• Private/commercial
  • Example: ACS Chemical Registry (CAS) [tens of millions of compounds]
  • Expensive and cannot be “mined”
  • Cambridge Structural Database (CSD) [crystallographic structures, ~350K]
• More recent trends
  • Example: eMolecules (formerly Chmoogle)
  • Free search engine, but cannot be “mined”
Large “Public” Databases
• Zinc (UCSF)
• ChemBank (Harvard)
• PubChem (NIH)
• ChemDB (UCI): http://cdb.ics.uci.edu
J. Chen, S. J. Swamidass, Y. Dou, J. Bruand, and P. Baldi. ChemDB: A Public Database of Small Molecules and Related Chemoinformatics Resources. Bioinformatics, 21, 4133-4139 (2005).
Example of Large Public DB: ChemDB
• ~5M unique compounds
• Commercially available compounds
• PostgreSQL/Oracle
• Annotation (experimental, computational)
• Searchable
• Web interface
• Similarity, in silico reactions, …
[Figure: diagram relating ChemDB, RChemDB, Filters, and Experiments]
Chemo/Bio Informatics
Two key ingredients:
1. Data
2. Similarity measures
Bioinformatics analogy and differences:
• Data (GenBank, Swiss-Prot, PDB)
• Similarity (BLAST)
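The slide names similarity measures as the second ingredient without fixing one. A common choice in chemoinformatics, assumed here rather than stated on the slide, is the Tanimoto coefficient on binary fingerprints, T(A, B) = |A ∩ B| / |A ∪ B|. The sketch below represents a fingerprint as the set of indices of its "on" bits (a toy, hypothetical encoding; real fingerprints are computed from molecular structure) and ranks a tiny in-memory database against a query by linear scan, loosely playing the role that a BLAST query plays against GenBank. The names `tanimoto`, `top_k`, and the `mol_*` entries are illustrative only.

```python
from typing import Dict, FrozenSet, List, Tuple

Fingerprint = FrozenSet[int]  # indices of the bits set to 1


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto similarity |A & B| / |A | B| of two binary fingerprints."""
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two all-zero fingerprints
    return len(a & b) / union


def top_k(query: Fingerprint, db: Dict[str, Fingerprint], k: int) -> List[Tuple[str, float]]:
    """Rank database entries by Tanimoto similarity to the query (linear scan)."""
    scored = [(name, tanimoto(query, fp)) for name, fp in db.items()]
    scored.sort(key=lambda item: (-item[1], item[0]))  # most similar first, ties by name
    return scored[:k]


if __name__ == "__main__":
    # Toy fingerprints: the bit indices are made up for illustration.
    db = {
        "mol_A": frozenset({1, 4, 7, 9}),
        "mol_B": frozenset({1, 4, 8}),
        "mol_C": frozenset({2, 3, 5}),
    }
    query = frozenset({1, 4, 7})
    print(top_k(query, db, k=2))  # [('mol_A', 0.75), ('mol_B', 0.5)]
```

A linear scan is fine for a toy example; at the scale of millions of compounds, a database would typically need indexing or pruning to answer such queries quickly.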