Course Review • Name one important thing that you learnt from this course that you feel will be important to your research career • Name one aspect you were hoping to learn that you did not Pharm 201 Lecture 19, 2011
Some Thoughts on the Future of Biological Data with Emphasis on Structural Bioinformatics • Philip E. Bourne • Dept. of Pharmacology, University of California San Diego • pbourne@ucsd.edu
Agenda • What is structural genomics and what is its impact? • Unsolved problems in structural bioinformatics • New challenges related to structural bioinformatics • The bigger picture • The final
Structural Genomics: A Broad Working Definition • Structural genomics is the process of high-throughput determination of the three-dimensional structures of biological macromolecules
SG - What is the Goal? • The goal of the Human Genome Project was clear-cut; the goal of structural genomics is not • Phase I: • Provision of enough structural templates to facilitate homology modeling of most proteins • Structures of all proteins in a complete proteome • Structural elucidation of a complete biological pathway • Structural elucidation of a complete disease
Example Goals (Phase I) • “The hyperthermophilic bacterium Thermotoga maritima has been the target of choice for pipeline development and genome-wide fold coverage.” 1257 • “The SGPP consortium will determine and analyze the three-dimensional structures of a large number of proteins from major global pathogenic protozoa, Leishmania major, Trypanosoma brucei, Trypanosoma cruzi and Plasmodium falciparum.” 70 (Structural Genomics of Pathogenic Protozoa) • “It is aimed at determining structures of proteins and protein complexes directly relevant to human health and diseases.” 117
Growth in the Number of New Topologies per Year According to CATH • SG Had Very Little Direct Impact on New Folds and Hence on Homology Modeling • [Figure: New Folds and Total Folds per year (CATH topologies), data from Nov. 2011] http://www.rcsb.org/pdb/statistics/contentGrowthChart.do?content=fold-cath
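The chart plots two related series: the cumulative number of known folds and the number that are new in each year. A minimal sketch of how the second follows from the first, assuming you have cumulative year-end totals (for example, exported from the RCSB PDB statistics page above); the function name and the toy numbers are hypothetical and are not CATH data.

```python
def new_per_year(cumulative: dict[int, int]) -> dict[int, int]:
    """Turn cumulative totals keyed by year (e.g. CATH topologies known by the
    end of each year) into the number first seen in each year."""
    years = sorted(cumulative)
    return {
        year: cumulative[year] - cumulative[prev]
        # pair each year with the one before it; the first year has no baseline
        for prev, year in zip(years, years[1:])
    }


# Toy numbers for illustration only; these are NOT CATH or PDB statistics.
print(new_per_year({2008: 100, 2009: 104, 2010: 105}))  # {2009: 4, 2010: 1}
```

A flat "New Folds" curve alongside a still-rising "Total Folds" curve is exactly what this difference produces once few genuinely new topologies are being deposited. If a year is missing from the input, its new folds are folded into the following year's difference.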
SG - What is the Goal? – Phase II
SG – Phase III – PSI:Biology • The third phase of the Protein Structure Initiative (PSI) is called PSI:Biology and is intended to reflect an emphasis on the biological relevance of the work http://en.wikipedia.org/wiki/Protein_Structure_Initiative
Implications of Phase III SG • Fewer single domains, more complex structures • More protein-protein complexes • More protein-ligand complexes • More membrane proteins • Better models • More hybrid structures • More molecular machines
SG Accounts for 14% of Structures • From RCSB PDB, Nov 2011
Agenda • What is structural genomics and what is its impact? • Unsolved problems in structural bioinformatics • New challenges related to structural bioinformatics • The bigger picture • The final
Crude Estimators of What We Know and How We Might Get Better: Basics • Data accessibility (60%) • Domain definitions (80%) • Structure comparison (80%) • Disorder predictors (70%) • Structure classification (80%) • Need more computer-accessible information on function, etc. • Need fresh approaches • Need a better understanding of the role of protein disorder, period • More quantitative approaches
Crude Estimators of What We Know and How We Might Get Better • Basic knowledge of macromolecular structure (50%) • PPIs and protein-ligand interactions, ligand view (30%) • Integrated view of structure as part of a biological continuum of data and associated knowledge (30%) • Structure prediction from sequence (40%) • Missing temporal view, alternative views • Missing robust rules for molecular recognition • Need better quantification • Need more structures
Crude Estimators of What We Know and How We Might Get Better • Inferring function from structure (40%) • Macromolecular assemblies (40%) • Docking (30%) • Rational drug discovery (10%) • Evolution (10%) • A combination of improvements • Hybrid methods • Better scoring, flexible docking, allostery • Polypharmacology, network pharmacology • Accurate proteome coverage
Example of What Could Be Done in Evolution: Structural Domains and the Tree of Life http://itol.embl.de/ • Natalie Dawson, unpublished
Example of What Could Be Done in Evolution: Structural Domains and the Tree of Life
Example: Structural Mapping and Subsequent Insights from All Biochemical Pathways
Example: Better Understanding of Drug-Receptor Interactions • Tykerb: breast cancer • Gleevec: leukemia, GI cancers • Nexavar: kidney and liver cancer • Staurosporine: natural product (alkaloid) with many uses, e.g., antifungal, antihypertensive • Collins and Workman (2006) Nature Chemical Biology 2:689-700
Agenda • What is structural genomics and what is its impact? • Unsolved problems in structural bioinformatics • New challenges related to structural bioinformatics • The bigger picture • The final
New Challenges • Effective use of structural information in systems biology, e.g., structural PPIs • Bridging the biological scales in an easily understood way • New ways of visualizing, and hence thinking about, proteins • Protein design/engineering
Agenda • What is structural genomics and what is its impact? • Unsolved problems in structural bioinformatics • New challenges related to structural bioinformatics • The bigger picture • The final
The Bigger Picture: Numbers • On the Future of Genomic Data, Science, 11 February 2011, vol. 331, no. 6018, pp. 728-729
The Bigger Picture: Accuracy • Functional Misannotation • PLoS Comput Biol 2009 5(12): e1000605
The Bigger Picture – Data Culture • Data are not available • Data are undervalued • Data are stovepiped • There is a long tail of data that are lost • Institutional repositories are roach motels • Data repositories will go like journals
Beyond Data: What is Wrong Today?
What is Wrong Today? • Formal science communication: • Occurs too slowly • Reaches too few people • Costs too much • Ignores the data • Is very hard to reproduce • Is stuck in the era of the printing press – we need to move Beyond the PDF and use the power of the medium https://sites.google.com/site/beyondthepdf/ http://www.force11.org
The Research Enterprise • Methods • Data • Literature
The Current Reality http://www.flickr.com/photos/51282757@N05/5585299226/lightbox/
[Figure: a spectrum running from Data to Knowledge. Resource types: Database, Knowledgebase, Wikis, Datapacks, Journals. Content: Data Only; Annotation; Data + Annotation; Data + Some Annotation; Data + Some Annotation + Some Integration. Examples labeled: PLoS, iStructure.]
My Dream: The Knowledge and Data Cycle • User reads a paper (one view of the info) • Clicks on a figure, which can be analyzed • Clicking the figure gives a composite database + journal view • This takes you to yet more papers or databases • Steps: 0. Full text of PLoS papers stored in a database; 1. A link brings up figures from the paper; 2. Clicking the paper figure retrieves data from the PDB, which is analyzed; 3. A composite view of journal and database content results; 4. The composite view has links to pertinent blocks of literature text and back to the PDB
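The cycle is essentially a linking problem: papers point to figures, figures point to PDB entries, and PDB entries point back into the text and on to other papers. A minimal Python sketch of that data model, assuming each figure has already been annotated with the PDB IDs it depicts; the class and function names are hypothetical, and actually retrieving and analyzing the PDB data (step 2) is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Figure:
    figure_id: str
    pdb_ids: list[str]  # PDB entries the figure depicts (assumed pre-annotated)


@dataclass
class Paper:
    doi: str
    figures: list[Figure]
    # PDB ID -> passages of this paper's full text that discuss that entry
    text_blocks: dict[str, list[str]] = field(default_factory=dict)


@dataclass
class CompositeView:
    figure_id: str
    pdb_ids: list[str]
    literature_links: dict[str, list[str]]  # step 4: links back into the text
    related_papers: dict[str, list[str]]    # further papers showing the same entries


def composite_view(paper: Paper, figure_id: str, corpus: list[Paper]) -> CompositeView:
    """Hypothetical steps 1-4: figure -> PDB entries -> composite journal + database view."""
    # Raises StopIteration if the paper has no figure with this ID.
    figure = next(f for f in paper.figures if f.figure_id == figure_id)
    literature_links = {pid: paper.text_blocks.get(pid, []) for pid in figure.pdb_ids}
    related_papers = {
        pid: [
            other.doi
            for other in corpus  # step 0: the full-text corpus acts as the database
            if other.doi != paper.doi and any(pid in f.pdb_ids for f in other.figures)
        ]
        for pid in figure.pdb_ids
    }
    return CompositeView(figure_id, list(figure.pdb_ids), literature_links, related_papers)
```

Keeping the links as plain mappings makes the cycle symmetric: any PDB ID reached from one paper's figure can be used as the starting point for the next hop into the literature.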
It Goes Beyond Data • It's hard and embarrassing to reproduce your own work • We have a working prototype using Wings • I can feel the potential productivity gains • My students are more doubtful • It's been a lot of fun and will enable us to improve our processes regardless of the workflow system itself
Yes, the Workflow is Real
Problems with Publishing Workflows • Workflows are not linear • Workflow:paper is not 1:1 • Confidentiality • Peer review • Infrastructure • Community acceptance • Reward system • No publisher seems willing to touch them
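The points that workflows are not linear and that workflows and papers do not map one-to-one can be made concrete by treating a workflow as a directed acyclic graph of steps, with each paper drawing on some subset of its outputs. A minimal sketch using Python's standard-library `graphlib` (3.9+) as a stand-in; it is not how Wings or any other workflow system represents workflows, and the step and paper names are hypothetical. It computes which steps must be rerun to reproduce a given paper's figures and tables, in a valid order.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each step maps to the set of steps it depends on.
# Two independent branches merge, so the workflow is a DAG, not a line.
workflow = {
    "fetch_structures": set(),
    "fetch_sequences": set(),
    "align": {"fetch_sequences"},
    "compare_structures": {"fetch_structures"},
    "merge_results": {"align", "compare_structures"},
    "figure_2": {"merge_results"},
    "table_1": {"compare_structures"},
}

# Hypothetical papers sharing outputs of the same workflow (not 1:1).
papers = {
    "paper_A": ["figure_2", "table_1"],
    "paper_B": ["table_1"],
}


def steps_needed(graph: dict[str, set[str]], target: str) -> set[str]:
    """Return the target plus every step it transitively depends on."""
    needed, stack = set(), [target]
    while stack:
        step = stack.pop()
        if step not in needed:
            needed.add(step)
            stack.extend(graph[step])
    return needed


def execution_order(graph: dict[str, set[str]], targets: list[str]) -> list[str]:
    """A dependency-respecting order for rerunning only the steps behind `targets`."""
    needed = set().union(*(steps_needed(graph, t) for t in targets))
    subgraph = {s: graph[s] & needed for s in needed}
    return list(TopologicalSorter(subgraph).static_order())


print(execution_order(workflow, papers["paper_B"]))
# ['fetch_structures', 'compare_structures', 'table_1']
```

Reproducing `paper_B` touches only three of the seven steps, while `paper_A` needs all of them; a publishing model that expects one linear workflow per paper has no natural place for either fact.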
Agenda • What is structural genomics and what is its impact? • Unsolved problems in structural bioinformatics • New challenges related to structural bioinformatics • The bigger picture • The final
The Final • Prepare a mini-grant research proposal with the following ingredients: • Background and Significance • Preliminary Results • Proposed Research and Methods • Expected Outcomes • The theme is any aspect of the course where you would like to contribute new research ideas and potential outcomes
The Final • Points (50) will be awarded for: • B&S (Background and Significance): literature coverage, justification of the originality and potential importance of the contribution (20) • Pre Res (Preliminary Results): anything you can actually accomplish to support the proposal, e.g., pseudocode, computations using existing tools, etc. (15) • Proposed Research: the credibility and rigor of what you propose (10) • Expected Outcomes (5) • There is no length requirement, but I would anticipate ~10 single-spaced pages in 12 pt type to do the topic justice • This should not relate to one of your previous assignments • Feel free to email me to discuss ideas before starting