
Bridging Bioinformatics and Chem(o)informatics

Gary Wiggins, School of Informatics, Indiana University, wiggins@indiana.edu. Yan He (SLIS MLS Student), Meredith Saba (SLIS MLS Student).


Presentation Transcript


  1. Bridging Bioinformatics and Chem(o)informatics Gary Wiggins School of Informatics Indiana University wiggins@indiana.edu Yan He (SLIS MLS Student) Meredith Saba (SLIS MLS Student)

  2. Provocative Thought “While much bioscience is published with the knowledge that machines will be expected to understand at least part of it, almost all chemistry is published purely for humans to read.” • Murray-Rust et al. Org. Biomol. Chem. 2004, 2, 3201.

  3. Overview of the Talk • Review of ACS CINF 2004 Papers • Review of Relevant Articles • Public Chemistry Databases and Data Repositories with Bioinformatics Info/Links • Overview of Web Services • NIH-funded Projects Underway or Planned at Indiana University

  4. “The Bigger Picture — Linking Bioinformatics to Cheminformatics” • American Chemical Society Division of Chemical Information (CINF) Symposium, Anaheim, Spring 2004 • All-day session with 16 papers • http://www.acscinf.org/new/docs/meetings/227nm/227cinfabstracts.htm

  5. Problems from ACS CINF 2004 • Both technical and people factors hinder knowledge exchange between biology and chemistry. (Lipinski) • People Problems per Chris Lipinski • Metadata capture is complicated by people issues, particularly those between chemists and biologists. • Discipline-based disconnects occur distressingly often and are frequently overlooked as a cause of lost productivity.

  6. Interdisciplinary Collaborations: Biology and Chemistry • [What’s] “... important for these collaborations is, not only do you have to accept the other guy’s paradigm or at least live with it; you have to be willing to accept the other guy’s foibles or your perception of the other guy’s foibles (and recognize the opposite of this). We each have our own approaches to how we do science, and it’s just different cultures.” --Thom Kauffman interview in ACS LiveWire, March 2006, 7.3. http://pubs.acs.org/4librarians/livewire/2006/7.3/profile.html

  7. Some Questions from the ACS CINF 2004 Symposium • “Find all proteins related to protein A (i.e., within a given path length of A) in a protein interaction graph, and retrieve related assay results and compound structures.” • “Find all pathways where compound X inhibits or slows a reaction, and retrieve Gene Ontology classifications for all proteins involved in the reaction.”
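
The first question on the slide above can be read as a breadth-first search followed by a table lookup. The following is a minimal sketch under that reading, not a method from the symposium; the graph, the assay table, and the function names are all hypothetical.

```python
from collections import deque

# Hypothetical protein interaction graph (undirected adjacency lists).
INTERACTIONS = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A"],
    "D": ["B", "E"],
    "E": ["D"],
}

# Hypothetical assay table: protein -> list of (compound, result) pairs.
ASSAYS = {
    "B": [("compound-1", "active")],
    "D": [("compound-2", "inactive")],
    "E": [("compound-3", "active")],
}


def proteins_within(graph, start, max_path_length):
    """Return {protein: distance} for proteins within max_path_length edges of start."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        # Do not expand nodes that already sit at the distance limit.
        if distances[current] == max_path_length:
            continue
        for neighbor in graph.get(current, []):
            if neighbor not in distances:
                distances[neighbor] = distances[current] + 1
                queue.append(neighbor)
    del distances[start]  # the query protein is not "related to" itself
    return distances


def related_assays(graph, assays, start, max_path_length):
    """Look up assay results for every protein found by the graph search."""
    related = proteins_within(graph, start, max_path_length)
    return {protein: assays[protein] for protein in sorted(related) if protein in assays}
```

Calling `related_assays(INTERACTIONS, ASSAYS, "A", 2)` on this toy data returns the assay rows for proteins B and D only, since E lies three edges from A. In practice the graph and the assay table would live in separate biology and chemistry resources, which is the integration problem the symposium describes.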

  8. Problems from ACS CINF 2004 • Commercial vs. public data • Batch mode data processing possible in biology, but primitive in chemistry • Primary HTS data has a very high noise factor • Data format standardization problem • Chemoinformatics and bioinformatics use completely different data formats and analysis tools • Chemical and protein sequence information has been largely analyzed separately

  9. Solutions from ACS CINF 2004 • Linking biological and chemical information in computational approaches to predict biological activity, ADME profiles, and adverse drug reactions (ADR) • Energetics of binding for more accurate and sensitive chemical representation of DNA-protein interactions • A discovery informatics platform that facilitates archival, sharing, integration, and exploration of synthetic methods and biological activity data

  10. Solutions from ACS CINF 2004 • Data pipelining approach makes it possible to apply bioinformatics and chemoinformatics data and analyses together. • Visualizations are the best way for people to understand data.

  11. Solutions from ACS CINF 2004 • Cabinet (Chemical And Biological Information NETwork, formerly Fedora) servers include • Metabolic pathway network chart (Empath) • Protein-Ligand Association Network (Planet) • Enzyme Commission Codebook (EC Book) • Traditional Chinese Medicines (TCM) • World Drug Index (WDI), and others. • Built on the Daylight HTTP toolkit • http://www.metaphorics.com/products/cabinet.html

  12. Overview of the Talk • Review of ACS CINF 2004 Papers • Review of Relevant Articles • Public Chemistry Databases and Data Repositories with Bioinformatics Info/Links • Overview of Web Services • NIH-funded Projects Underway or Planned at Indiana University

  13. What is Chemoinformatics? (Brown) • “…the essence of chemoinformatics is integration and focus rather than its components, which are independent disciplines.” • Supporting disciplines: • Chemical information • Computational chemistry • Chemometrics

  14. Chemoinformatics and Disease

  15. Toolkits as Integrators (Brown) • Companies such as Daylight, Advanced Visual Systems, OpenEye, and SciTegic provide integration systems for: • Statistical methods • Text mining • Computational chemistry • Visualization

  16. GeneGo’s MetaDrug Product • Toxicogenomics platform for the prediction of human drug metabolism and toxicity of novel compounds • Enables the visualization of pre-clinical and clinical high-throughput data in the context of the complete biological system • Integrates chemical, biological, and protein function data • http://www.genego.com/

  17. BioWisdom • Examination of vast amounts of available information using its Sofia KnowledgeScan methodology • SRS data integration platform • http://www.biowisdom.com/

  18. Lessons from Hip Hop (Salamone) • Mashup technique • Bring together disparate informatics, biological, chemical, and imaging information when conducting research • Example of an integration tool: iSpecies.org • A search for a species returns a page with NCBI genomics information, Yahoo images of the species, and articles culled from Google Scholar

  19. iSpecies.org Search • For Mus musculus

  20. Chemogenomics and Chemoproteomics (Gagna) • Chemogenomics (def.)—The description of all potential drugs that can be used against all possible target sites, OR the actions of target-specific chemical ligands and how they are used to globally examine genes • Chemoproteomics (def.)—Uses chemistry to characterize protein structure and functions • They are “. . . a form of chemical biology brought up to date in the area of genome and proteome analysis.”

  21. New Interdisciplinary Journals • ACS Chemical Biology (ACS) • ChemBioChem; A European Journal of Chemical Biology (Wiley/VCH) • Chemical Biology and Drug Design (Blackwell) • JBIC; Journal of Biological Inorganic Chemistry (Springer) • Journal of Biochemical and Molecular Toxicology (Wiley) • Molecular BioSystems (RSC) • Nature Chemical Biology (Nature Publishing) • Organic & Biomolecular Chemistry (RSC)

  22. Open Source Software (Geldenhuys) • Log P calculator from Interactive Analysis • http://www.logp.com • University of Utah’s Computational Science and Engineering Online • Can submit jobs for molecular mechanics, quantum chemical calculations, and biomolecular interfaces for viewing PDB files • http://www.cse-online.net • Virtual Computational Chemistry Laboratory • http://www.vcclab.org

  23. The Blue Obelisk (Guha) • Several open chemistry and chemoinformatics projects that have pooled forces to enhance interoperability • Maintain: • Chemoinformatics Algorithms Dictionary • Data Repository for standardized data for chemical properties and other facts (e.g., mass) • http://www.blueobelisk.org/

  24. BlueObelisk.org • Working collaboratively on projects such as: • Chemistry Development Kit (CDK) • JChemPaint • Jmol • JUMBO • NMRShiftDB • Octet • Open Babel • QSAR • World Wide Molecular Matrix (WWMM)

  25. Barriers to the Use of Open Source Software • Unix command line • Problem: Lack of known standards and datasets of compounds for validation, e.g., in docking programs

  26. Lessons from the Human Genome Project (Austin) • Keys to success in the HGP were: • Comprehensiveness • Commitment to open access to the sequence as a research tool without encumbrance • Proposed tools for a “genome functionation toolbox”: • Whole-genome transcriptome and proteome characterization • Development of small inhibitory RNAs (siRNAs) and knockout mice for every gene • Small molecules and the druggable genome

  27. ChemDB http://cdb.ics.uci.edu/CHEM/Web/

  28. ChEBI, Chemical Entities of Biological Interest • Dictionary of molecular entities focused on small chemical compounds • Features an ontological classification, showing the relationships between molecular entities or classes of entities and their parents and/or children

  29. Vioxx Entry in ChEBI

  30. The IUPAC International Chemical Identifier (InChI) • Open source, non-proprietary, public-domain identifier for chemicals • String of characters that uniquely represents a molecular substance • Independent of the way the chemical structure is drawn • Enables reliable structure recognition and easy linking of diverse data compilations • Accepts as input MOLfiles (or SDfiles) and CML files • Download the program to your computer at: • http://www.iupac.org/inchi/license.html
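
Because an InChI is a plain string, "easy linking of diverse data compilations" can be as simple as matching keys. The sketch below assumes two compilations that each already store an InChI per record. The InChI strings are those of ethanol, methane, and water, written with the `1S` prefix of the later "standard InChI" form; the attached names, assay counts, and the `link_by_inchi` helper are hypothetical.

```python
# Hypothetical records from two independent compilations, each keyed by InChI.
CHEMISTRY_RECORDS = {
    "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3": {"name": "ethanol"},
    "InChI=1S/CH4/h1H4": {"name": "methane"},
}
BIOLOGY_RECORDS = {
    "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3": {"assay_count": 3},  # invented value
    "InChI=1S/H2O/h1H2": {"assay_count": 1},  # invented value
}


def link_by_inchi(left, right):
    """Merge records from both compilations whose InChI strings match exactly."""
    shared = left.keys() & right.keys()
    return {inchi: {**left[inchi], **right[inchi]} for inchi in sorted(shared)}


linked = link_by_inchi(CHEMISTRY_RECORDS, BIOLOGY_RECORDS)
```

Only ethanol appears in both toy compilations, so `linked` holds a single merged record. Exact string matching works here only because the identifier does not depend on how the structure was drawn; generating the InChI itself still requires the IUPAC software or a toolkit that wraps it.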

  31. Generation of InChI for Vioxx with wInChI

  32. Vioxx Entry in PubChem Compounds Found with InChI

  33. Vioxx Bioassay Data in PubChem

  34. Vioxx PubChem Link to External Sources of Information

  35. The Elsevier MDL/NIH Link via PubChem and DiscoveryGate • Cross-indexes PubChem to the Compound Index hosted on Elsevier MDL’s DiscoveryGate platform • MDL added 5 million structures from PubChem to their index, resulting in over 14 million unique chemical structures • Links go both ways • Can move from biological data in PubChem to bioactivity, chemical sourcing, synthetic methodology, and EHS data in DiscoveryGate sources

  36. Elsevier MDL’s xPharm • Comprehensive set of records linking: • Agents (compounds) (2300) • Targets (600) • Disorders (450) • Principles that govern their interactions (180) • Answers questions such as: • What targets are associated with control of blood pressure? • What adverse effects are associated with monoamine oxidase inhibitors?

  37. Text Datamining (Banville) • “In the pharmaceutical field, it is ideally the marriage of biological and chemical information that needs to be the ultimate focus of text data mining applications.” • Problems: • Lack of universal publication standards for identifying each unique chemical entity • Selective indexing policies of A&I services • Need to understand how chemical structures link to biological processes

  38. Chemical Datamining Software • SureChem • http://surechem.reeltwo.com/ • CLiDE • Recognizes structures, reactions, and text • http://www.simbiosys.ca/clide/ • OSCAR • “OSCAR1” to check experimental data • http://www.ch.cam.ac.uk/magnus/checker.html • http://www.rsc.org/Publishing/ReSourCe/AuthorGuidelines/AuthoringTools/ExperimentalDataChecker/ • CSR (Chemical Structure Reconstruction) • http://www.scai.fraunhofer.de/uploads/media/MZ-ERCIM05_04.pdf • MDL DocSearch—combines MDL’s Isentris platform and EMC’s Documentum

  39. Overview of the Talk • Review of ACS CINF 2004 Papers • Review of Relevant Articles • Public Chemistry Databases and Data Repositories with Bioinformatics Info/Links • Overview of Web Services • NIH-funded Projects Underway or Planned at Indiana University

  40. Themes from SwissProt’s 20th Anniversary Conference, “In silico Analysis of Proteins” • Knowledgebases, databases and other information resources for proteins • Sequence searches and alignments • Protein sequence analysis • Protein structure prediction, analysis and visualization • Proteomics data analysis

  41. Chemoinformatics Databases (Jónsdóttir) • Lists databases relevant to drug discovery and development, including: • General databases • DBs for screening compounds • DBs for medicinal agents • DBs with ADMET properties • DBs with physico-chemical properties • Curiously does not mention Chemical Abstracts

  42. Databases with Protein and Ligand Information (Jónsdóttir) • Protein Data Bank • Target Registration Database • Relibase—uses structural info to analyze protein-ligand interactions; Relibase+ for protein-protein interaction searching • Cambridge Structural Database • KEGG LIGAND DB for enzyme reactions • http://www.genome.ad.jp/ligand

  43. Other Databases with Protein and Ligand Information • SitesBase--a database of known ligand binding sites within the PDB • http://www.bioinformatics.leeds.ac.uk/sb/main.html • Binding MOAD • http://www.bindingmoad.org/ • sc-PDB (Kellenberger) • http://bioinfo-pharma.u-strasbg.fr:8080/scPDB/index.jsp

  44. sc-PDB http://bioinfo-pharma.u-strasbg.fr:8080/scPDB/index.jsp

  45. Isatin Search on sc-PDB

  46. Other Databases with Protein-Protein Interaction Data (Jónsdóttir) • YPD, Yeast Proteome Database (for proteins from S. cerevisiae) • http://www.biobase.de/pages/index.php?id=139 • Human Protein Reference Database • http://www.hprd.org/ • BIND, Biomolecular Interaction Network Database (ceased as of 11/16/2005?) • http://www.bind.ca/Action

  47. International Molecular Exchange (IMEx) Consortium http://imex.sourceforge.net/ • BIND (http://www.blueprint.org), The Blueprint Initiative Asia Pte. Ltd, Singapore, and The Blueprint Initiative North America, Toronto, Canada • DIP (http://dip.doe-mbi.ucla.edu), UCLA-DOE Institute for Genomics & Proteomics • IntAct (http://www.ebi.ac.uk/intact), EMBL–European Bioinformatics Institute, Hinxton, UK • MINT (http://mint.bio.uniroma2.it/mint/), University of Rome “Tor Vergata”, Rome, Italy • MPact (http://mips.gsf.de/genre/proj/mpact), MIPS / Institute for Bioinformatics, Munich, Germany

  48. Protein Sites from IU I533 Students and others • LigandDepot—integrated source for small molecules • http://ligand-depot.rutgers.edu/index.html • PSIPRED Protein Structure Prediction Server • http://bioinf.cs.ucl.ac.uk/psipred/ • DSSP--a database of secondary structure assignments (and much more) for all protein entries in the PDB • http://swift.cmbi.ru.nl/gv/dssp/ • Dr. Predrag Radivojac’s I690 class on Structural Bioinformatics • http://www.informatics.indiana.edu/predrag/2006springi690/2006springi690.htm

  49. Protein Secondary Structure Prediction • Methods • Neural Network • Rule Based • Other Machine Learning • Homology Based
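
As a rough illustration of the "Rule Based" category, the sketch below labels residues as helix when a sliding window's average propensity exceeds a threshold. This is a simplified toy in the spirit of propensity-based methods, not the Chou-Fasman algorithm itself; the propensity values, window size, threshold, and function name are all assumptions chosen for the example.

```python
# Illustrative helix propensities only; NOT published Chou-Fasman parameters.
HELIX_PROPENSITY = {
    "A": 1.4, "E": 1.5, "L": 1.2, "M": 1.4,
    "G": 0.6, "P": 0.6, "S": 0.8,
}
DEFAULT_PROPENSITY = 1.0  # used for residues missing from the toy table


def predict_helix(sequence, window=5, threshold=1.1):
    """Label a residue 'H' if any window covering it averages above threshold, else '-'."""
    scores = [HELIX_PROPENSITY.get(residue, DEFAULT_PROPENSITY) for residue in sequence]
    labels = ["-"] * len(sequence)
    # Slide a fixed-size window along the sequence; sequences shorter than
    # the window produce no windows and stay unlabeled.
    for start in range(len(sequence) - window + 1):
        average = sum(scores[start:start + window]) / window
        if average > threshold:
            for index in range(start, start + window):
                labels[index] = "H"
    return "".join(labels)
```

A call such as `predict_helix("AEELLGPGSG")` marks the alanine/glutamate/leucine-rich start of the sequence as helix and leaves the glycine/proline-rich tail unlabeled. Neural network and other machine learning methods replace the fixed table and threshold with parameters learned from known structures.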

  50. Protein Secondary Structure Prediction Software • PredictProtein • http://www.predictprotein.org/ • Chou-Fasman • http://fasta.bioch.virginia.edu/fasta_www/chofas.htm • NN Predict • http://www.cmpharm.ucsf.edu/~nomi/nnpredict.html
