Modeling Protein Function MED260 Philip E. Bourne Department of Pharmacology, UCSD pbourne@ucsd.edu http://www.sdsc.edu/pb Slides on-line at: http://www.sdsc.edu/pb/edu/med260/med260.ppt MED260 Modeling Protein Function - October 11, 2006
Agenda • Why model protein function? • Where does it fit as a technique in modern medical research? • The data deluge as a motivator • The extent of what can be modeled • Ontologies – establishing order from chaos • Examples of what can be learnt • Accuracy – a word of caution
Why Model Protein Function? • The rate of discovery of new proteins far outpaces our ability to characterize them functionally • Functional discovery of new proteins has implications for: • Drug discovery • Biomarker identification • Understanding of biological processes • Identification of disease states and treatment regimens
Where does it fit as a technique in modern medical research? [Figure: Scientific Research & Discovery across levels of biological scale (Organisms, Organs, Cells, Macromolecules/Biopolymers, Atoms & Molecules), with columns for example units (e.g., heart, neuron, structure, sequence, protease inhibitor), representative disciplines (anatomy, physiology, cell biology, proteomics, genomics, medicinal chemistry) and representative technologies (e.g., MRI, migratory sensors, ventricular modeling, electron microscopy, X-ray crystallography, protein docking)]
Where does it fit as a technique in modern medical research? [Figure: the Scientific Research & Discovery diagram of biological scale, from Organisms down to Atoms & Molecules, annotated with Translational Medicine]
The Ability to Model Protein Function Influences, and Can Be Influenced by, Any Level of Biological Complexity. Examples: • Genome: a rapid increase in sequenced genomes provides new raw material • Proteome: a large increase in the number of 3D structures highlights new functions • Interactome: identification of a binding partner points to a new function • Metabolome: isolation of a protein within a metabolic pathway • Cell: localization points to function • Organ: gene expression in heart tissue points to function • Organism: physiological differences observed between species can be related to protein functions
[Figure: the Scientific Research & Discovery diagram of biological scale, with the callout "We will focus here" at the Macromolecules/Biopolymers level (structure and sequence)]
At All Levels We Are Being Driven By Data [Figure: workflow Biological Experiment → Data → Information → Knowledge → Discovery (Collect, Characterize, Compare, Model, Infer), above a plot of biological complexity, data and computing power against year (1990 to 2005). Milestones shown include sequencing technology, ESTs, gene chips, the E. coli, yeast, C. elegans and human genomes, virus structure, ribosome assembly, genetic circuits, neuronal modeling, the E. coli metabolic pathway, cellular models, cardiac modeling, brain mapping and virtual communities]
Metagenomics: A First Look • A new type of genomics • New data (and lots of it) and new types of data • 17M new (predicted) proteins! • 4-5x growth in just a few months, and much more coming • New challenges and exacerbation of old challenges
Metagenomics: First Results • More than 99.5% of DNA in every environment studied represents unknown organisms • Culturable organisms are the exception, not the rule • Most genes are distant homologs of known genes, but there are thousands of new families • Everything we touch turns out to be a gold mine • Environments studied: water (ocean, lakes), soil, human body (gut, oral cavity, human microbiome)
Metagenomics: New Discoveries [Figure: environmental (red) vs. currently known (blue) protein tyrosine phosphatases (PTPases), with labels 1 to 4 and "Higher eukaryotes"]
The Good News and the Bad News • Good news • Data pointing towards function are growing at near-exponential rates • IT can handle it on a per-dollar basis • Bad news • Data are growing at near-exponential rates • Quality is highly variable • Accurate functional annotation is sparse
Genomes (2004) • We all know about the human genome; what is not so well known is: • 191 completed microbial genomes • 44 archaea • 727 bacteria • 785 eukaryotes (complete or in progress) • Viroids ….
Proteome • We are reasonably good, but not perfect, at finding proteins in genomes with intergenic regions (e.g., alternative initiation codons) • Regulatory elements present a different set of challenges • We are not so good at assigning functions to those proteins • Moreover, the devil is in the details
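Alternative initiation codons are one concrete reason gene finding is imperfect. The sketch below is a minimal forward-strand ORF scanner; the function name find_orfs, the min_codons parameter and the toy sequence are hypothetical, not from the slides. It assumes the standard stop codons TAA, TAG and TGA, and shows how also accepting GTG and TTG, which many bacteria and archaea use as start codons, changes the predicted protein start.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, start_codons=frozenset({"ATG"}), min_codons=3):
    """Return (start, end, frame) for forward-strand ORFs (0-based, end exclusive).

    An ORF opens at the first start codon in a frame and closes at the next
    in-frame stop codon; the stop codon is included in the span.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon in start_codons:
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


toy = "GTGAAACCCATGGGGTTTTAA"  # hypothetical sequence
print(find_orfs(toy))                                   # [(9, 21, 0)]
print(find_orfs(toy, start_codons={"ATG", "GTG", "TTG"}))  # [(0, 21, 0)]
```

With ATG only, the ORF starts at position 9; accepting GTG moves the start to position 0 and lengthens the predicted protein. Real gene finders also scan the reverse complement and weigh signals such as ribosome-binding sites, which this sketch ignores.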
Estimated Functional Roles (by % of Proteins) of the Proteome in a Complex Organism
Functional Nomenclature Needs to Be Consistent for Orderly Progress: Enter EC and GO • EC (Enzyme Commission) numbers classify all enzymes: http://www.chem.qmul.ac.uk/iubmb/enzyme/ • The Gene Ontology Consortium characterizes gene products by molecular function, biological process and cellular component (location): http://www.geneontology.org/
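EC numbers are hierarchical: four dot-separated levels (class, subclass, sub-subclass, serial number). The sketch below, using hypothetical helper names parse_ec and ec_class_name, splits an EC number and names its top-level class. It does not query the IUBMB site; the class table is hard-coded. EC 2.7.11.11 is the entry for cAMP-dependent protein kinase (PKA), the enzyme discussed later in these slides.

```python
# Top-level EC classes. Class 7 (Translocases) was added in 2018,
# after these slides were written.
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
    7: "Translocases",
}


def parse_ec(ec: str) -> tuple[str, ...]:
    """Split e.g. 'EC 2.7.11.11' into (class, subclass, sub-subclass, serial).

    Unassigned levels are written '-', as in '3.4.-.-'. Only digits and '-'
    are accepted; preliminary numbers containing letters are rejected.
    """
    levels = ec.strip().removeprefix("EC").strip().split(".")
    if len(levels) != 4 or not all(p == "-" or p.isdigit() for p in levels):
        raise ValueError(f"not a four-level EC number: {ec!r}")
    return tuple(levels)


def ec_class_name(ec: str) -> str:
    """Name of the top-level enzyme class for an EC number."""
    top = parse_ec(ec)[0]
    if top == "-":
        return "unassigned"
    return EC_CLASSES.get(int(top), "unknown")


print(parse_ec("EC 2.7.11.11"))       # ('2', '7', '11', '11')
print(ec_class_name("EC 2.7.11.11"))  # Transferases
```

Keeping the levels as strings (rather than integers) lets partial numbers such as 3.4.-.- pass through unchanged.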
Functional Coverage of the Human Genome: 40% covered http://function.rcsb.org:8080/pdb/function_distribution/index.html
Step 1. Learn What You Can from the Protein Sequence • Find it • Pay attention to the quality of the functional annotation: errors are transitive • Understand its 1-D structure: domain organization, signatures and fingerprints
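Sequence signatures are often written as PROSITE patterns. The sketch below translates a simple PROSITE pattern into a Python regular expression and scans a sequence with it; the helper names prosite_to_regex and find_motif are hypothetical, and only the basic syntax (x, single residues, [alternatives], {exclusions}, repeat counts, and the < and > terminal anchors) is handled. The example uses the P-loop (Walker A) ATP/GTP-binding motif, PROSITE PS00017, on the N-terminal 22 residues of human H-Ras.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):  # N-terminal anchor
            prefix, element = "^", element[1:]
        if element.endswith(">"):    # C-terminal anchor
            suffix, element = "$", element[:-1]
        m = re.fullmatch(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+(?:,\d+)?)\))?", element)
        if m is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        residue, count = m.groups()
        if residue == "x":
            residue = "."                           # any residue
        elif residue.startswith("{"):
            residue = "[^" + residue[1:-1] + "]"    # any residue except these
        if count:
            residue += "{" + count + "}"            # x(4) -> .{4}
        parts.append(prefix + residue + suffix)
    return "".join(parts)


def find_motif(sequence: str, pattern: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched text) for non-overlapping matches."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group()) for m in regex.finditer(sequence)]


P_LOOP = "[AG]-x(4)-G-K-[ST]."   # PROSITE PS00017
ras_nterm = "MTEYKLVVVGAGGVGKSALTIQ"
print(prosite_to_regex(P_LOOP))       # [AG].{4}GK[ST]
print(find_motif(ras_nterm, P_LOOP))  # [(10, 'GAGGVGKS')]
```

Note that finditer reports non-overlapping matches only; a scanner that must report overlapping hits would need a lookahead-based pattern.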
Step 2. Is There a 3D Structure? If So, What Can You Learn from That? • Find it • Understand it • Characterize it • Understand its function(s): functions follow a power law at the fold level, so some folds are promiscuous (many functions) while others are solitary or of unknown function
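A quick way to check a power-law claim is to fit a straight line on log-log axes: if N(k) folds have k distinct functions and N(k) ≈ C·k^(-α), then log N = log C - α·log k. The sketch below (hypothetical function fit_power_law) does an ordinary least-squares fit on synthetic data generated from an exact power law; the numbers are not real fold statistics. For real, noisy heavy-tailed data, maximum-likelihood fitting is generally preferred over log-log regression.

```python
import math


def fit_power_law(ks, counts):
    """Fit counts ~ C * k**(-alpha) by least squares on log-log axes.

    Returns (alpha, C). Points with a non-positive k or count are skipped
    because their logarithm is undefined.
    """
    pts = [(math.log(k), math.log(n)) for k, n in zip(ks, counts) if k > 0 and n > 0]
    if len(pts) < 2:
        raise ValueError("need at least two positive data points")
    mean_x = sum(x for x, _ in pts) / len(pts)
    mean_y = sum(y for _, y in pts) / len(pts)
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    if sxx == 0:
        raise ValueError("need at least two distinct k values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -slope, math.exp(intercept)


# Synthetic data following an exact power law (C = 400, alpha = 2),
# NOT real fold statistics: k = functions per fold, counts = number of folds.
ks = [1, 2, 4, 8, 16]
counts = [400 * k ** -2 for k in ks]
alpha, c = fit_power_law(ks, counts)
print(round(alpha, 3), round(c, 1))  # 2.0 400.0
```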
(a) myoglobin (b) hemoglobin (c) lysozyme (d) transfer RNA (e) antibodies (f) viruses (g) actin (h) the nucleosome (i) myosin (j) ribosome Courtesy of David Goodsell, TSRI
First, Why Bother with Structure? An Example: Protein Kinase A • This “molecular scene” for cAMP-dependent protein kinase depicts years of collective knowledge. Beyond the basics, only the atomic coordinates are captured by the PDB; functional annotation requires the literature.
What Did That Picture Tell Us? • Two domains with associated functions: ATP binding and substrate binding • Conserved residues and their spatial locations reveal details of ATP and substrate binding and of the mechanism of the phosphotransfer reaction • So is structure the answer to functional modeling?
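Relating conserved residues to a binding site usually starts with a distance query: which residues have atoms close to the bound ligand? The sketch below is a brute-force version; the Atom type, the function residues_near_ligand and the 4.0 Å default cutoff are assumptions chosen for illustration, and the coordinates and residue numbers are toy values, not taken from any PKA structure.

```python
import math
from typing import NamedTuple


class Atom(NamedTuple):
    chain: str
    res_name: str
    res_seq: int
    x: float
    y: float
    z: float


def residues_near_ligand(protein_atoms, ligand_atoms, cutoff=4.0):
    """Residues with any atom within `cutoff` angstroms of any ligand atom.

    Brute force, O(N * M); fine for a sketch, slow for large structures.
    """
    near = set()
    for atom in protein_atoms:
        for lig in ligand_atoms:
            if math.dist((atom.x, atom.y, atom.z), (lig.x, lig.y, lig.z)) <= cutoff:
                near.add((atom.chain, atom.res_seq, atom.res_name))
                break  # one contact is enough for this atom
    return sorted(near)


# Hypothetical toy coordinates (angstroms).
protein = [
    Atom("A", "LYS", 10, 0.0, 0.0, 3.0),
    Atom("A", "LYS", 10, 0.0, 0.0, 4.5),   # second atom of the same residue
    Atom("A", "GLU", 20, 10.0, 0.0, 0.0),
    Atom("A", "ASP", 30, 0.0, 3.5, 0.0),
]
ligand = [Atom("A", "ATP", 401, 0.0, 0.0, 0.0)]
print(residues_near_ligand(protein, ligand))  # [('A', 10, 'LYS'), ('A', 30, 'ASP')]
```

Cross-referencing this list with a sequence alignment would then show which contact residues are conserved.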
Question: So is structure the answer to functional modeling? Answer: Partly. The number of unique protein sequences still outnumbers the number of unique structures by a ratio of 100:1. • Enter Structural Genomics • Enter Structure Prediction
The Structural Genomics Pipeline (X-ray Crystallography): Basic Steps • Target Selection • Crystallomics (Isolation, Expression, Purification, Crystallization) • Data Collection • Structure Solution • Structure Refinement • Functional Annotation • Publish
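Because targets can drop out at any stage of the pipeline, progress is often summarized as a funnel: how many targets reached at least each stage. The sketch below encodes the stages from the slide as an ordered enum and computes that funnel; the names Stage and funnel and the three targets are hypothetical.

```python
from collections import Counter
from enum import IntEnum


class Stage(IntEnum):
    """Pipeline stages in the order shown on the slide."""
    TARGET_SELECTION = 1
    ISOLATION = 2
    EXPRESSION = 3
    PURIFICATION = 4
    CRYSTALLIZATION = 5
    DATA_COLLECTION = 6
    STRUCTURE_SOLUTION = 7
    STRUCTURE_REFINEMENT = 8
    FUNCTIONAL_ANNOTATION = 9
    PUBLISH = 10


def funnel(last_stage_reached):
    """Map each stage to the number of targets that reached at least that stage."""
    counts = Counter(last_stage_reached.values())
    reached, running = {}, 0
    # Walk from the last stage backwards, accumulating targets.
    for stage in sorted(Stage, reverse=True):
        running += counts[stage]
        reached[stage] = running
    return {stage: reached[stage] for stage in Stage}


# Hypothetical targets and the furthest stage each one reached.
targets = {
    "target_1": Stage.PUBLISH,
    "target_2": Stage.EXPRESSION,
    "target_3": Stage.CRYSTALLIZATION,
}
for stage, n in funnel(targets).items():
    print(f"{stage.name:<22} {n}")
```

Using IntEnum keeps the stage order explicit, so "reached at least this stage" is a simple comparison rather than a lookup in a separate list.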
Structural Genomics Will Give Us... • Good news • More structures (definitely) • New folds (some, but fewer than anticipated) • New understanding of specific diseases and pathways (maybe) • Representatives from each major protein family (maybe) • Bad news • Many new structures that are functionally unclassified (definitely)
What About Structure Prediction? • Current rule: we will be able to predict a structure when we know all the structures
Why Is Structure Prediction So Hard? [Figure: percent sequence identity vs. alignment length for 1000 randomly chosen, structurally similar PDB polypeptide chains (z > 4.5), marking the Twilight Zone and the Midnight Zone]
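The y-axis of that plot is percent sequence identity, which is simple to compute from a pairwise alignment but has several competing definitions. The sketch below assumes identity is counted over gap-free aligned columns; the function names and the fixed 35% and 20% cut-offs are illustrative assumptions, since the plot shows that where the twilight zone begins actually depends on alignment length.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Identical positions as a percentage of gap-free aligned columns.

    Other definitions exist (e.g. dividing by the shorter sequence length);
    this one is an assumption chosen for simplicity.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not columns:
        return 0.0
    identical = sum(a == b for a, b in columns)
    return 100.0 * identical / len(columns)


def identity_zone(pid: float, twilight: float = 35.0, midnight: float = 20.0) -> str:
    """Bin a percent identity using fixed, illustrative cut-offs."""
    if pid >= twilight:
        return "safe zone"
    if pid >= midnight:
        return "twilight zone"
    return "midnight zone"


pid = percent_identity("AC-DEFGHIK", "ACGQWYGALR")  # hypothetical alignment
print(round(pid, 1), identity_zone(pid))            # 33.3 twilight zone
```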
Approaches to Structure Prediction • Homology modeling • Threading (aka fold recognition) • Ab initio • How well do we do? – see CASP • Consensus servers • Eva - http://cubic.bioc.columbia.edu/eva/ • LiveBench - http://bioinfo.pl/meta/
Step 3. What Can Be Got from Structure When You Have It? From Structural Bioinformatics, eds. Bourne and Weissig, p. 394, Wiley 2002
Specific Example • Mj0577: putative ATP molecular switch. Mj0577 is an open reading frame (ORF) of previously unknown function from Methanococcus jannaschii. Its structure was determined at 1.7 Å (Figure 7a) (Zarembinski et al., 1998). The structure contains a bound ATP molecule, picked up from the E. coli host. The presence of bound ATP led to the proposal that Mj0577 is either an ATPase or an ATP-binding molecular switch. Further experimental work showed that Mj0577 cannot hydrolyse ATP by itself and can do so only in the presence of M. jannaschii crude cell extract. It is therefore more likely to act as a molecular switch, in a process analogous to Ras-GTP hydrolysis in the presence of a GTPase-activating protein. From Structural Bioinformatics, eds. Bourne and Weissig, p. 402, Wiley 2002
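Bound ligands such as the ATP in Mj0577 appear in PDB-format files as HETATM records, so a first pass over a new structure can simply list its hetero groups. The sketch below relies on the fixed PDB columns (residue name in columns 18-20, chain ID in column 22, residue number in columns 23-26); the function list_ligands and the truncated records are hypothetical and are not taken from the actual Mj0577 entry.

```python
def list_ligands(pdb_lines, ignore=frozenset({"HOH"})):
    """Return sorted unique (res_name, chain, res_seq) for HETATM groups.

    Uses the fixed PDB columns: residue name 18-20, chain ID 22,
    residue sequence number 23-26 (1-based, inclusive).
    """
    ligands = set()
    for line in pdb_lines:
        if not line.startswith("HETATM"):
            continue
        res_name = line[17:20].strip()
        if res_name in ignore:
            continue  # skip waters (and anything else listed in `ignore`)
        ligands.add((res_name, line[21], int(line[22:26])))
    return sorted(ligands)


# Hypothetical, truncated PDB-format records (coordinates omitted).
pdb_lines = [
    "ATOM      1  CA  GLY A   1",
    "HETATM    2  PA  ATP A 301",
    "HETATM    3  PB  ATP A 301",
    "HETATM    4 MG    MG A 302",
    "HETATM    5  O   HOH A 401",
]
print(list_ligands(pdb_lines))  # [('ATP', 'A', 301), ('MG', 'A', 302)]
```

Modified residues in the polymer (for example selenomethionine, MSE, common in structural genomics targets) are also written as HETATM, so a real pipeline would add them to `ignore` or check them separately.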
Step 4. Proteins Do Not Function in Isolation But are Part of Complex Interaction Networks http://www.genome.jp/kegg/
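One simple way to exploit interaction networks is "guilt by association": rank candidate functions for an unannotated protein by how many of its direct partners carry each annotation (neighbor counting). The sketch below is a minimal version; it does not use KEGG or any database API, and the protein names, interactions and annotations are hypothetical.

```python
from collections import Counter, defaultdict


def build_graph(interactions):
    """Undirected interaction graph as an adjacency mapping."""
    graph = defaultdict(set)
    for a, b in interactions:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def predict_by_neighbors(protein, graph, annotations):
    """Rank candidate functions by how many direct partners carry each one."""
    votes = Counter()
    for partner in graph.get(protein, ()):
        votes.update(annotations.get(partner, ()))
    return votes.most_common()


# Hypothetical toy network: P1 is unannotated.
interactions = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3")]
annotations = {
    "P2": {"protein kinase activity"},
    "P3": {"protein kinase activity", "ATP binding"},
    "P4": {"transporter activity"},
}
graph = build_graph(interactions)
print(predict_by_neighbors("P1", graph, annotations)[0])  # ('protein kinase activity', 2)
```

Majority voting over immediate neighbors ignores edge confidence and indirect partners, so it inherits any annotation errors on those partners, the transitivity problem raised on the next slide.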
Accuracy - A Word of Caution • Errors are transitive • Proteins A and B are observed to have similar functions through sequence homology • Proteins B and C are observed to have similar functions through sequence homology • Is protein A related to protein C? • Up to 30% of current annotation may be wrong
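The transitivity problem can be made concrete with simple arithmetic: if each homology-based annotation transfer were independently correct with probability p, a chain of n transfers would be correct with probability p^n. The sketch below assumes independence, and its 0.9 per-step accuracy is a hypothetical value, not derived from the 30% figure on the slide; the function name chain_reliability is also hypothetical.

```python
def chain_reliability(per_step_accuracy: float, steps: int) -> float:
    """Chance an annotation is still correct after `steps` transfers,
    assuming each transfer is independently correct with the given probability."""
    if not 0.0 <= per_step_accuracy <= 1.0:
        raise ValueError("per_step_accuracy must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step_accuracy ** steps


# A -> B -> C is two homology-based transfers.
for steps in range(1, 4):
    print(steps, round(chain_reliability(0.9, steps), 3))
# 1 0.9
# 2 0.81
# 3 0.729
```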
Questions?
Demo of Steps 1-4 • Step 1. Learn What You Can from the Protein Sequence • Step 2. Is there a 3D Structure? If So, What Can You Learn from That? • Step 3. What Can Be Got from Structure When You Have it? • Step 4. Proteins Do Not Function in Isolation But are Part of Complex Interaction Networks