Bridging Bioinformatics and Chem(o)informatics Gary Wiggins School of Informatics Indiana University wiggins@indiana.edu Yan He (SLIS MLS Student) Meredith Saba (SLIS MLS Student)
Provocative Thought “While much bioscience is published with the knowledge that machines will be expected to understand at least part of it, almost all chemistry is published purely for humans to read.” • Murray-Rust et al. Org. Biomol. Chem. 2004, 2, 3201.
Overview of the Talk • Review of ACS CINF 2004 Papers • Review of Relevant Articles • Public Chemistry Databases and Data Repositories with Bioinformatics Info/Links • Overview of Web Services • NIH-funded Projects Underway or Planned at Indiana University
“The Bigger Picture — Linking Bioinformatics to Cheminformatics” • American Chemical Society Division of Chemical Information (CINF) Symposium, Anaheim, Spring 2004 • All-day session with 16 papers • http://www.acscinf.org/new/docs/meetings/227nm/227cinfabstracts.htm
Problems from ACS CINF 2004 • Both technical and people factors hinder knowledge exchange between biology and chemistry. (Lipinski) • People Problems per Chris Lipinski • Metadata capture is complicated by people issues, particularly those between chemists and biologists. • Discipline-based disconnects occur distressingly often and are frequently overlooked as a cause of lost productivity.
Interdisciplinary Collaborations: Biology and Chemistry • [What’s] “... important for these collaborations is, not only do you have to accept the other guy’s paradigm or at least live with it; you have to be willing to accept the other guy’s foibles or your perception of the other guy’s foibles (and recognize the opposite of this). We each have our own approaches to how we do science, and it’s just different cultures.” --Thom Kauffman interview in ACS LiveWire, March 2006, 7.3. http://pubs.acs.org/4librarians/livewire/2006/7.3/profile.html
Some Questions from the ACS CINF 2004 Symposium • “Find all proteins related to protein A (i.e., within a given path length of A) in a protein interaction graph, and retrieve related assay results and compound structures.” • “Find all pathways where compound X inhibits or slows a reaction, and retrieve Gene Ontology classifications for all proteins involved in the reaction.”
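The first question is, at its core, a bounded graph search followed by a lookup. A minimal sketch, assuming the interaction graph fits in memory as an adjacency dictionary; the protein names, the assay table, and the SMILES strings below are hypothetical toy data, not taken from any of the databases named in this talk:

```python
from collections import deque


def proteins_within(graph, start, max_path_length):
    """Breadth-first search: map each protein within
    max_path_length edges of start to its distance from start."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if distances[current] == max_path_length:
            continue  # do not expand beyond the allowed path length
        for neighbor in graph.get(current, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[current] + 1
                queue.append(neighbor)
    del distances[start]  # the start protein is not "related to" itself
    return distances


# Hypothetical protein interaction graph (undirected, stored both ways).
interactions = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A"],
    "D": ["B", "E"],
    "E": ["D"],
}

# Hypothetical assay results keyed by protein: (compound SMILES, assay name).
assays = {
    "B": [("CCO", "binding assay 1")],
    "D": [("c1ccccc1", "inhibition assay 7")],
    "E": [("CC(=O)O", "binding assay 2")],
}

related = proteins_within(interactions, "A", 2)
results = {protein: assays.get(protein, []) for protein in related}
```

With a path length of 2, protein E (three edges from A) is excluded, and protein C is returned with an empty assay list. In practice the graph, assay results, and structures live in separate systems, which is exactly the integration problem the symposium questions were meant to expose.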
Problems from ACS CINF 2004 • Commercial vs. public data • Batch mode data processing possible in biology, but primitive in chemistry • Primary HTS data has a very high noise factor • Data format standardization problem • Chemoinformatics and bioinformatics use completely different data formats and analysis tools • Chemical and protein sequence information has been largely analyzed separately
Solutions from ACS CINF 2004 • Linking biological and chemical information in computational approaches to predict biological activity, ADME profiles, and adverse drug reactions (ADR) • Energetics of binding for more accurate and sensitive chemical representation of DNA-protein interactions • A discovery informatics platform that facilitates archival, sharing, integration, and exploration of synthetic methods and biological activity data
Solutions from ACS CINF 2004 • Data pipelining approach makes it possible to apply bioinformatics and chemoinformatics data and analyses together. • Visualizations are the best way for people to understand data.
Solutions from ACS CINF 2004 • Cabinet (Chemical And Biological Information NETwork, formerly Fedora) servers include • Metabolic pathway network chart (Empath) • Protein-Ligand Association Network (Planet) • Enzyme Commission Codebook (EC Book) • Traditional Chinese Medicines (TCM) • World Drug Index (WDI), and others. • Built on the Daylight HTTP toolkit • http://www.metaphorics.com/products/cabinet.html
Overview of the Talk • Review of ACS CINF 2004 Papers • Review of Relevant Articles • Public Chemistry Databases and Data Repositories with Bioinformatics Info/Links • Overview of Web Services • NIH-funded Projects Underway or Planned at Indiana University
What is Chemoinformatics? (Brown) • “…the essence of chemoinformatics is integration and focus rather than its components, which are independent disciplines.” • Supporting disciplines: • Chemical information • Computational chemistry • Chemometrics
Toolkits as Integrators (Brown) • Companies such as Daylight, Advanced Visual Systems, OpenEye, and SciTegic provide integration systems for: • Statistical methods • Text mining • Computational chemistry • Visualization
GeneGo’s MetaDrug Product • Toxicogenomics platform for the prediction of human drug metabolism and toxicity of novel compounds • Enables the visualization of pre-clinical and clinical high-throughput data in the context of the complete biological system • Integrates chemical, biological, and protein function data • http://www.genego.com/
BioWisdom • Examination of vast amounts of available information using its Sofia KnowledgeScan methodology • SRS data integration platform • http://www.biowisdom.com/
Lessons from Hip Hop (Salamone) • Mashup technique • Bring together disparate informatics, biological, chemical, and imaging information when conducting research • Example of an integration tool: iSpecies.org • A search for a species returns a page with NCBI genomics information, Yahoo images of the species, and articles culled from Google Scholar
iSpecies.org Search • For Mus musculus
Chemogenomics and Chemoproteomics (Gagna) • Chemogenomics (def.)—The description of all potential drugs that can be used against all possible target sites, OR the actions of target-specific chemical ligands and how they are used to globally examine genes • Chemoproteomics (def.)—Uses chemistry to characterize protein structure and functions • They are “. . . a form of chemical biology brought up to date in the area of genome and proteome analysis.”
New Interdisciplinary Journals • ACS Chemical Biology (ACS) • ChemBioChem; A European Journal of Chemical Biology (Wiley/VCH) • Chemical Biology and Drug Design (Blackwell) • JBIC; Journal of Biological and Inorganic Chemistry (Springer) • Journal of Biochemical and Molecular Toxicology (Wiley) • Molecular Biosystems (RSC) • Nature Chemical Biology (Nature Publishing) • Organic & Biomolecular Chemistry (RSC)
Open Source Software (Geldenhuys) • Log P calculator from Interactive Analysis • http://www.logp.com • University of Utah’s Computational Science and Engineering Online • Can submit jobs for molecular mechanics, quantum chemical calculations, and biomolecular interfaces for viewing PDB files • http://www.cse-online.net • Virtual Computational Chemistry Laboratory • http://www.vcclab.org
The Blue Obelisk (Guha) • Several open chemistry and chemoinformatics projects that have pooled forces to enhance interoperability • Maintain: • Chemoinformatics Algorithms Dictionary • Data Repository for standardized data for chemical properties and other facts (e.g., mass) • http://www.blueobelisk.org/
BlueObelisk.org • Working collaboratively on projects such as: • Chemistry Development Kit (CDK) • JChemPaint • Jmol • JUMBO • NMRShiftDB • Octet • Open Babel • QSAR • World Wide Molecular Matrix (WWMM)
Barriers to the Use of Open Source Software • Unix command line • Problem: Lack of known standards and datasets of compounds for validation, e.g., in docking programs
Lessons from the Human Genome Project (Austin) • Keys to success in the HGP were: • Comprehensiveness • Commitment to open access to the sequence as a research tool without encumbrance • Proposed tools for a “genome functionation toolbox”: • Whole-genome transcriptome and proteome characterization • Development of small inhibitory RNAs (siRNAs) and knockout mice for every gene • Small molecules and the druggable genome
ChEBI, Chemical Entities of Biological Interest • Dictionary of molecular entities focused on small chemical compounds • Features an ontological classification, showing the relationships between molecular entities or classes of entities and their parents and/or children
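The parent/child relationships in an ontological classification can be pictured as a directed graph of "is a" links. A minimal sketch, assuming a toy hierarchy; the entries below are illustrative only and are not actual ChEBI records or identifiers:

```python
def ancestors(parents, term):
    """Collect every class reachable from term by following 'is a' links."""
    seen = []
    stack = list(parents.get(term, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.append(node)
            stack.extend(parents.get(node, ()))
    return seen


def children(parents, term):
    """List the entities whose direct parent is term."""
    return sorted(child for child, ps in parents.items() if term in ps)


# Hypothetical 'is a' hierarchy: entity -> list of direct parents.
is_a = {
    "ethanol": ["primary alcohol"],
    "methanol": ["primary alcohol"],
    "primary alcohol": ["alcohol"],
    "alcohol": ["molecular entity"],
}

ethanol_classes = ancestors(is_a, "ethanol")
primary_alcohols = children(is_a, "primary alcohol")
```

Walking upward answers "what kind of thing is this compound?", while walking downward answers "which compounds belong to this class?".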
The IUPAC International Chemical Identifier (InChI) • Open source, non-proprietary, public-domain identifier for chemicals • String of characters that uniquely represents a molecular substance • Independent of the way the chemical structure is drawn • Enables reliable structure recognition and easy linking of diverse data compilations • Accepts as input MOLfiles (or SDfiles) and CML files • Download the program to your computer at: • http://www.iupac.org/inchi/license.html
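An InChI string is built from slash-separated layers, which is part of what makes it easy to use as a linking key. A minimal sketch of splitting one into its layers, assuming the common case where the first layer is the molecular formula and every later layer starts with a one-letter prefix; the function name is hypothetical, and the example uses the later "standard InChI" prefix `1S`, which postdates this talk:

```python
def split_inchi(inchi):
    """Split an InChI string into its version and layers.

    Simplified: a layer starting with a lowercase letter is stored under
    that letter; any other layer is treated as the formula. Repeated
    prefixes (possible in some InChIs) would overwrite earlier ones.
    """
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string")
    version, *layers = inchi[len(prefix):].split("/")
    parsed = {"version": version}
    for layer in layers:
        if not layer:
            continue
        if layer[0].islower():
            parsed[layer[0]] = layer[1:]
        else:
            parsed["formula"] = layer
    return parsed


# Ethanol: formula layer, connectivity layer (c), hydrogen layer (h).
ethanol = split_inchi("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3")
```

Because the identifier is generated from the structure itself rather than from a drawing, two databases that compute it for the same compound should arrive at the same string and can be joined on it.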
The Elsevier MDL/NIH Link via PubChem and DiscoveryGate • Cross-indexes PubChem to the Compound Index hosted on Elsevier MDL’s DiscoveryGate platform • MDL added 5 million structures from PubChem to their index, resulting in over 14 million unique chemical structures • Links go both ways • Can move from biological data in PubChem to bioactivity, chemical sourcing, synthetic methodology, and EHS data in DiscoveryGate sources
Elsevier MDL’s xPharm • Comprehensive set of records linking: • Agents (compounds) (2300) • Targets (600) • Disorders (450) • Principles that govern their interactions (180) • Answers questions such as: • What targets are associated with control of blood pressure? • What adverse effects are associated with monoamine oxidase inhibitors?
Text Datamining (Banville) • “In the pharmaceutical field, it is ideally the marriage of biological and chemical information that needs to be the ultimate focus of text data mining applications.” • Problems: • Lack of universal publication standards for identifying each unique chemical entity • Selective indexing policies of A&I services • Need to understand how chemical structures link to biological processes
Chemical Datamining Software • SureChem • http://surechem.reeltwo.com/ • CLiDE • Recognizes structures, reactions, and text • http://www.simbiosys.ca/clide/ • OSCAR • “OSCAR1” to check experimental data • http://www.ch.cam.ac.uk/magnus/checker.html • http://www.rsc.org/Publishing/ReSourCe/AuthorGuidelines/AuthoringTools/ExperimentalDataChecker/ • CSR (Chemical Structure Reconstruction) • http://www.scai.fraunhofer.de/uploads/media/MZ-ERCIM05_04.pdf • MDL DocSearch—combines MDL’s Isentris platform and EMC’s Documentum
Overview of the Talk • Review of ACS CINF 2004 Papers • Review of Relevant Articles • Public Chemistry Databases and Data Repositories with Bioinformatics Info/Links • Overview of Web Services • NIH-funded Projects Underway or Planned at Indiana University
Themes from SwissProt’s 20th Anniversary Conference, “In silico Analysis of Proteins” • Knowledgebases, databases and other information resources for proteins • Sequence searches and alignments • Protein sequence analysis • Protein structure prediction, analysis and visualization • Proteomics data analysis
Chemoinformatics Databases (Jónsdóttir) • Lists databases relevant to drug discovery and development, including: • General databases • DBs for screening compounds • DBs for medicinal agents • DBs with ADMET properties • DBs with physico-chemical properties • Curiously does not mention Chemical Abstracts
Databases with Protein and Ligand Information (Jónsdóttir) • Protein Data Bank • Target Registration Database • Relibase—uses structural info to analyze protein-ligand interactions; Relibase+ for protein-protein interaction searching • Cambridge Structural Database • KEGG LIGAND DB for enzyme reactions • http://www.genome.ad.jp/ligand
Other Databases with Protein and Ligand Information • SitesBase--a database of known ligand binding sites within the PDB • http://www.bioinformatics.leeds.ac.uk/sb/main.html • Binding MOAD • http://www.bindingmoad.org/ • sc-PDB (Kellenberger) • http://bioinfo-pharma.u-strasbg.fr:8080/scPDB/index.jsp
sc-PDB http://bioinfo-pharma.u-strasbg.fr:8080/scPDB/index.jsp
Other Databases with Protein-Protein Interaction Data (Jónsdóttir) • YPD, Yeast Proteome Database (for proteins from S. cerevisiae) • http://www.biobase.de/pages/index.php?id=139 • Human Protein Reference Database • http://www.hprd.org/ • BIND, Biomolecular Interaction Network Database (ceased as of 11/16/2005?) • http://www.bind.ca/Action
International Molecular Exchange (IMEx) Consortium • http://imex.sourceforge.net/ • BIND (http://www.blueprint.org), The Blueprint Initiative Asia Pte. Ltd, Singapore, and The Blueprint Initiative North America, Toronto, Canada • DIP (http://dip.doe-mbi.ucla.edu), UCLA-DOE Institute for Genomics & Proteomics • IntAct (http://www.ebi.ac.uk/intact), EMBL–European Bioinformatics Institute, Hinxton, UK • MINT (http://mint.bio.uniroma2.it/mint/), University of Rome “Tor Vergata”, Rome, Italy • MPact (http://mips.gsf.de/genre/proj/mpact), MIPS / Institute for Bioinformatics, Munich, Germany
Protein Sites from IU I533 Students and others • LigandDepot—integrated source for small molecules • http://ligand-depot.rutgers.edu/index.html • PSIPRED Protein Structure Prediction Server • http://bioinf.cs.ucl.ac.uk/psipred/ • DSSP--a database of secondary structure assignments (and much more) for all protein entries in the PDB • http://swift.cmbi.ru.nl/gv/dssp/ • Dr. Predrag Radivojac’s I690 class on Structural Bioinformatics • http://www.informatics.indiana.edu/predrag/2006springi690/2006springi690.htm
Protein Secondary Structure Prediction • Methods • Neural Network • Rule Based • Other Machine Learning • Homology Based
Protein Secondary Structure Prediction Software • PredictProtein • http://www.predictprotein.org/ • Chou-Fasman • http://fasta.bioch.virginia.edu/fasta_www/chofas.htm • NN Predict • http://www.cmpharm.ucsf.edu/~nomi/nnpredict.html