Protein Evolution, Co-evolution and Interaction Networks (Day 2)

Protein Evolution, Co-evolution and Interaction Networks (Day 2). Matteo Pellegrini Rosetta Inpharmatics. Identifying The Components of Cellular Pathways and Protein Complexes using Co-evolution. Proteins are Components of Molecular Machines.

Presentation Transcript


  1. Protein Evolution, Co-evolution and Interaction Networks (Day 2). Matteo Pellegrini, Rosetta Inpharmatics

  2. Identifying The Components of Cellular Pathways and Protein Complexes using Co-evolution

  3. Proteins are Components of Molecular Machines. Hartwell LH, Hopfield JJ, Leibler S, Murray AW. From molecular to modular cell biology. Nature. 1999 Dec 2;402(6761 Suppl):C47-52.

  4. Techniques to Study Protein Interactions (figure)

  5. Bacterial Diversity
     • 150 fully sequenced genomes in GenBank
     • 30,000 species represented in GenBank
     • Sea may support 2,000,000*
     • Soil may support 4,000,000*
     *T.P. Curtis, W.T. Sloan, and J.W. Scannell. 2002. Estimating prokaryotic diversity and its limits. Proc Natl Acad Sci USA 99: 10494-10499.

  6. The Study of the Co-Evolution of Non-Homologous Proteins
     • Because selection generally acts to maintain or delete entire complexes and pathways, pairs of proteins that are part of these will appear to co-evolve across bacteria
     • By studying the co-evolution of non-homologous proteins across these bacteria, we attempt to reconstruct the components of complexes and pathways

  7. Methods to Infer Co-evolution (method: basis)
     • Phylogenetic Profile: pairs of genes that are always present or absent together in genomes
     • Rosetta Stone: pairs of proteins that are fused in some organism
     • Gene Neighbor: pairs of genes that are coded nearby in multiple organisms
     • Gene Cluster: gene proximity within a genome

  8. Phylogenetic Profile. Pellegrini M, Marcotte EM, Thompson MJ, Eisenberg D, Yeates TO. Assigning protein functions by comparative genome analysis: protein phylogenetic profiles. Proc Natl Acad Sci U S A. 96(8):4285-8, 1999

  9. Flagellar Proteins Phylogenetic Profiles

  10. Hypergeometric Function (figure; symbols shown: m, n, k, N)
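The slide names only the symbols, so the following is a minimal sketch of how a hypergeometric test could score a phylogenetic profile pair. The assignment of symbols is an assumption, not taken from the slide: N is the number of genomes, n the number containing protein A, m the number containing protein B, and k the number containing both. The function name `hypergeom_tail` is hypothetical.

```python
from math import comb


def hypergeom_tail(N, n, m, k):
    """Probability of seeing k or more shared genomes by chance.

    Assumed meaning of the symbols (not stated on the slide):
      N: total number of genomes
      n: genomes that contain protein A
      m: genomes that contain protein B
      k: genomes that contain both A and B
    """
    total = comb(N, m)  # ways to place protein B in m of the N genomes
    upper = min(n, m)   # the overlap cannot exceed either count
    # Sum the hypergeometric probabilities for overlaps k, k+1, ..., upper.
    tail = sum(comb(n, i) * comb(N - n, m - i) for i in range(k, upper + 1))
    return tail / total
```

For example, with 10 genomes and two proteins that each occur in 4 of them, an overlap of all 4 has probability 1/210 under this null model; a small value like this suggests the two proteins are gained and lost together.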

  11. Gene Neighbor Method Pellegrini M, Thompson MJ, Fierro J, Bowers P, A Computational Method to Assign Microbial Genes to Pathways. Journal of Cellular Biochemistry Suppl 37:106-9, 2001

  12. Linking Dihydrofolate reductase and Thymidylate synthase

  13. Gene Neighbor Probability

  14. Rosetta Stone Method Identifies Protein Fusions Monomeric proteins that are found fused in another organism are likely to be functionally related and physically interacting. Marcotte EM, Pellegrini M, Ng HL, Rice DW, Yeates TO, Eisenberg D, Detecting protein function and protein-protein interactions from genome sequences. Science 285(5428):751-3, 1999

  15. Rosetta Stone Probability (figure labels: Protein i, Protein j, K Rosetta Stone fusion proteins, "has m homologs", "has n homologs")

  16. Gene Cluster (figure labels: genomic DNA, genes)

  17. Tryptophan Operon (figure: genes yciG, trpA, trpB, trpC, trpD, trpE, trpL, yciV, labeled with P<0.01, P=0.09, P<0.01, P<0.01, P=0.53, P=0.67, P=0.91). Here, a p-value threshold of 0.1 captures all but one of the genes for this operon.

  18. Combining Inferences of Co-evolution from Previous Methods. We use a Bayesian approach to combine the probabilities from the previous four methods to arrive at a single probability that two proteins co-evolve [equation not preserved in the transcript], where positive pairs are proteins with a common pathway annotation and negative pairs are proteins with different annotations. Jansen R, Yu H, Greenbaum D, Kluger Y, Krogan NJ, Chung S, Emili A, Snyder M, Greenblatt JF, Gerstein M. A Bayesian networks approach for predicting protein-protein interactions from genomic data. Science. 2003 Oct 17;302(5644):449-53.
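The combining formula itself is not in the transcript. As an illustration only, here is a naive Bayes sketch in the spirit of the cited Jansen et al. approach: each method contributes a likelihood ratio estimated from positive and negative pairs, and the ratios multiply the prior odds. It assumes the methods are conditionally independent, which the slide does not state. The function names and the counts in the usage note are hypothetical.

```python
def likelihood_ratio(pos_in_bin, total_pos, neg_in_bin, total_neg):
    """Likelihood ratio for one method at one score bin.

    pos_in_bin / total_pos: fraction of positive pairs (common pathway
    annotation) whose score falls in the bin.
    neg_in_bin / total_neg: same fraction for negative pairs.
    """
    return (pos_in_bin / total_pos) / (neg_in_bin / total_neg)


def combine_evidence(likelihood_ratios, prior):
    """Posterior probability that a pair co-evolves.

    Assumes the methods are conditionally independent (naive Bayes),
    so posterior odds = prior odds * product of likelihood ratios.
    """
    odds = prior / (1.0 - prior)
    for ratio in likelihood_ratios:
        odds *= ratio
    return odds / (1.0 + odds)
```

With a prior of 0.2 and a single method whose likelihood ratio is 4, the prior odds of 0.25 become posterior odds of 1, giving a probability of 0.5. Methods with a ratio near 1 leave the estimate essentially unchanged.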

  19. True and False Interactions are derived from Pathway Classification Schemes. Pathway categorization scheme:
     • Information Storage and Processing: Translation, ribosomal structure and biogenesis; Transcription; DNA replication, recombination and repair
     • Cellular Processes: Cell division and chromosome partitioning; Posttranslational modification, protein turnover, chaperones; Cell envelope biogenesis, outer membrane; Cell motility and secretion; Inorganic ion transport and metabolism; Signal transduction mechanisms
     • Metabolism: Energy production and conversion; Carbohydrate transport and metabolism; Amino acid transport and metabolism; Nucleotide transport and metabolism; Coenzyme metabolism; Lipid metabolism; Secondary metabolites biosynthesis, transport and catabolism

  20. Networks of Co-evolving Proteins. We can generate networks of co-evolution by selecting only pairs of proteins whose probability of co-evolution is above a threshold.
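The thresholding step can be sketched as follows. This is an illustrative implementation, not the authors' code: `pair_probs` is a hypothetical dictionary mapping protein pairs to their combined co-evolution probability, and connected components stand in for the groups one might inspect in the resulting network.

```python
from collections import defaultdict


def build_network(pair_probs, threshold):
    """Keep only pairs whose co-evolution probability exceeds the threshold."""
    network = defaultdict(set)
    for (a, b), prob in pair_probs.items():
        if prob > threshold:
            network[a].add(b)  # store the link in both directions
            network[b].add(a)
    return network


def connected_components(network):
    """Group proteins that are linked directly or through other proteins."""
    seen, components = set(), []
    for start in network:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # depth-first traversal from the start protein
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(network[node] - component)
        seen |= component
        components.append(component)
    return components
```

Raising the threshold removes weaker links, so the network splits into smaller and more confident groups; lowering it merges groups at the cost of more false links.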

  21. Bacterial Flagella Network Using Combined Methods

  22. Alternative Representations of Network. Strong M, Graeber TG, Beeby M, Pellegrini M, Thompson MJ, Yeates TO, Eisenberg D. Inference and Visualization of Protein Networks in Mycobacterium tuberculosis Based on Hierarchical Clustering of Whole Genome Functional Linkage Maps. Submitted to Nucleic Acids Research

  23. Hierarchical Clustering Reveals Modular Evolution

  24. Clusters are Enriched for Pathways and Complexes

  25. Examples of Clusters that Contain Components of Biochemical Pathways

  26. Cluster Reveals Additional ORFs Involved in Lipopolysaccharide Biosynthesis

  27. Clusters are also Enriched for Subunits of Protein Complexes. True positive interactions are between subunits of known complexes, and false positive ones are between subunits of different complexes. For high-confidence links, we recover one third of the true interactions and only one thousandth of the false positive ones.

  28. Clusters Containing Subunits of Protein Complexes (figure panels: cytochrome c oxidase, which controls the last step of food oxidation; ATP synthase)

  29. Identification of an Uncharacterized Protein Complex in Pseudomonas aeruginosa

  30. Parallel Pathways and Protein Complexes. Clustered maps of co-evolving genes may be used not only to identify groups of proteins that are part of a complex or pathway but also to identify duplicated complexes and pathways.* *Li H, Pellegrini M, Eisenberg D. Discovering parallel pathways and protein complexes from genome sequences. In preparation.

  31. Schematic of Pathway Duplication Identification

  32. Nitrogenases in Rhodopseudomonas palustris. Reaction: N2 + 8 e- + 8 H+ + 16 ATP → 16 ADP + 16 Pi + 2 NH3 + H2. Components: iron protein (NifH) and Mo-Fe protein (α2β2), with NifD as the α subunit and NifK as the β subunit.

  33. Co-evolution Network of Nitrogenases (figure panels: full network; distinct sub-networks)

  34. Predicting Protein Functions in Yeast. Xiaoqun Joyce Duan, Matteo Pellegrini, and David Eisenberg. Discovering Biological Modules and Function from Various Genome-scale Protein Networks. Submitted to PLoS Biology.

  35. Guilt By Association: Predicting MIPS Categories for Yeast Genes (figure: an un-annotated protein P is connected by functional links to annotated proteins; each candidate function Fj receives a score S(P, Fj))
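The slide does not define how S(P, Fj) is computed. One simple possibility, shown here purely as an assumption, is to sum the weights of the functional links from P to neighbors annotated with Fj and predict the highest-scoring function. The data layout and the names `function_scores` and `predict_function` are hypothetical.

```python
from collections import defaultdict


def function_scores(protein, links, annotations):
    """Score each candidate function for one protein.

    links: {protein: {neighbor: link_weight}} (hypothetical layout)
    annotations: {protein: set of function labels}
    The score for a function is the summed weight of links to
    neighbors carrying that function (an assumed scoring rule).
    """
    scores = defaultdict(float)
    for neighbor, weight in links.get(protein, {}).items():
        for function in annotations.get(neighbor, ()):
            scores[function] += weight
    return dict(scores)


def predict_function(protein, links, annotations):
    """Return the best-scoring function, or None if no neighbor is annotated."""
    scores = function_scores(protein, links, annotations)
    return max(scores, key=scores.get) if scores else None
```

A protein whose strongest links lead to neighbors sharing one annotation inherits that annotation, which is the intuition behind guilt by association.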

  36. Combining Methods to Improve Function Prediction. Derive a score for protein P from each network separately (S1 from Network 1, S2 from Network 2), then compute p(P,F), the probability that protein P has function F, from S1 and S2.

  37. Using Bayesian Formalism to Combine Methods

  38. Benchmarking Combined Method Accuracy (figure; labels: CS, Recovery; axes: Accuracy, Coverage)

  39. MIPS Category Accuracy and Coverage (figure: accuracy and coverage for the methods CS, GN, PP, RS, TFBP across the MIPS categories below)
     • 67 Transport facilitation
     • 67.04 ion transporter
     • 67.04.01 cation transporters
     • 67.04.01.01 heavy metal transporter
     • 67.04.01.07 other cation transport
     • 67.04.07 anion transporters
     • 67.07 C-compound transporters

  40. Identifying the Function of a Histone Deacetylase Complex. Proteins shown: YDR155C, YIL112W, YOL068C, YMR273C, YGL194C, YKR029C, YCR033W, YBR103W. Of these, three are un-annotated (empty circles), two share “transcription control” (dark gray circles), and three are annotated with various other functions (light gray circles). Our combined scoring algorithm infers the function “transcriptional control” for seven of the eight proteins (dark gray circles).

  41. Conclusions
     • Protein modules appear to co-evolve across bacterial species
     • Modules are enriched for proteins that participate in the same pathway or complex
     • We can identify and reconstruct duplicated complexes and pathways
     • Co-evolution may be used to identify functions of yeast proteins

  42. PROLINKS Database. We have constructed a database that contains co-evolution links between the genes of 83 (soon to be 150) fully sequenced genomes. The Prolinks database may be accessed through the Proteome Navigator web browser interface at: dip.doe-mbi.ucla.edu/pronav. Peter M Bowers, Matteo Pellegrini, Mike J. Thompson, Joe Fierro, Todd O. Yeates, David Eisenberg. PROLINKS: A Database of Protein Functional Linkages Derived from Co-evolution. Genome Biology, in press

  43. Proteome Navigator Access Page

  44. Proteome Navigator Network Page

  45. Future Directions
     • Distinguish pathways from complexes
     • Combine multiple data types
     • Compute statistics for the co-evolution of three or more genes
     • Account for phylogenies

  46. Acknowledgements: Michael Thompson, Peter Bowers, Michael Strong, Huiying Li, Todd Yeates, Joseph Fierro, David Eisenberg, Joyce Duan, Edward Marcotte
