470 likes | 681 Views
Protein Evolution, Co-evolution and Interaction Networks (Day 2). Matteo Pellegrini Rosetta Inpharmatics. Identifying The Components of Cellular Pathways and Protein Complexes using Co-evolution. Proteins are Components of Molecular Machines.
E N D
Protein Evolution, Co-evolution and Interaction Networks(Day 2) Matteo Pellegrini Rosetta Inpharmatics
Identifying The Components of Cellular Pathways and Protein Complexes using Co-evolution
Proteins are Components of Molecular Machines Hartwell LH, Hopfield JJ, Leibler S, Murray AW. From molecular to modular cell biology. Nature. 1999 Dec 2;402(6761 Suppl):C47-52.
Techniques to Study Protein Interactions Protein Interactions
Bacterial Diversity • 150 fully sequenced genomes in Genbank • 30,000 species represented in Genbank • Sea may support 2,000,000* • Soil may support 4,000,000* *T.P. Curtis, W.T. Sloan, and J.W. Scannell. 2002. Estimating prokaryotic diversity and its limits Proc Natl Acad Sci USA 99: 10494-10499.
The Study of the Co-Evolution of Non-Homologous Proteins • Because selection generally acts to maintain or delete entire complexes and pathways, pairs of proteins that are part of these will appear to co-evolve across bacteria • By studying the co-evolution of non-homologous proteins across these bacteria we attempt to reconstruct the components of complexes and pathways
Methods to Infer Co-evolution Method Basis Phylogenetic Profile Pairs of genes that are always present or absent together in genomes Rosetta Stone Pairs of proteins that are fused in some organism Gene Neighbor Pairs of genes that are coded nearby in multiple organisms Gene Cluster Gene proximity within genome
Phylogenetic Profile Pellegrini M, Marcotte EM, Thompson MJ, Eisenberg D, Yeates TO, Assigning protein functions by comparative genome analysis: protein phylogenetic profiles. Proc Natl Acad Sci U S A. 96(8):4285-8,. 1999
Hypergeometric Function m n k N
Gene Neighbor Method Pellegrini M, Thompson MJ, Fierro J, Bowers P, A Computational Method to Assign Microbial Genes to Pathways. Journal of Cellular Biochemistry Suppl 37:106-9, 2001
Rosetta Stone Method Identifies Protein Fusions Monomeric proteins that are found fused in another organism are likely to be functionally related and physically interacting. Marcotte EM, Pellegrini M, Ng HL, Rice DW, Yeates TO, Eisenberg D, Detecting protein function and protein-protein interactions from genome sequences. Science 285(5428):751-3, 1999
Rosetta Stone Probability Protein i Protein j K Rosetta Stone fusion proteins has m homologs has n homologs
Gene Cluster genomic DNA genes
Tryptophan Operon P<0.01 P=0.09 P<0.01 P<0.01 P=0.53 P=0.67 P=0.91 yciG trpA trpB trpC trpD trpE trpL yciV Here, a p-value threshold of 0.1 captures all but one of the genes for this operon.
Combining Inferences of Co-evolution from Previous Methods We use a Bayesian approach to combine the probabilities from the previous four methods to arrive at a single probability that two proteins co-evolve: Where positive pairs are proteins with common pathway annotation and negative pairs are proteins with different annotation Jansen R, Yu H, Greenbaum D, Kluger Y, Krogan NJ, Chung S, Emili A, Snyder M, Greenblatt JF, Gerstein M. A Bayesian networks approach for predicting protein-protein interactions from genomic data. Science. 2003 Oct 17;302(5644):449-53.
True and False Interactions are derived from Pathway Classification Schemes Information Storage and Processing Translation, ribosomal structure and biogenesis Transcription DNA replication, recombination and repair Cellular processes Cell division and chromosome partitioning Posttranslational modification, protein turnover, chaperones Cell envelope biogenesis, outer membrane Cell motility and secretion Inorganic ion transport and metabolism Signal transduction mechanisms Metabolism Energy production and conversion Carbohydrate transport and metabolism Amino acid transport and metabolism Nucleotide transport and metabolism Coenzyme metabolism Lipid metabolism Secondary metabolites biosynthesis, transport and catabolism Pathway categorization scheme:
Networks of Co-evolving Proteins We can generate networks of co-evolution by selecting only pairs of proteins whose probability of co-evolution is above a threshold
Alternative Representations of Network Strong M, Graeber TG, Beeby M, Pelligrini M, Thompson MJ, Yeates TO, Eisenberg D. Inference and Visualization of Protein Networks in Mycobacterium tuberculosis Based on Hierarchical Clustering of Whole Genome Functional Linkage Maps. Submitted to Nucleic Acids Research
Examples of Clusters that Contain Components of Biochemical Pathways
Cluster Reeveals Additional ORFs Involved in Lipopolysaccharide Biosynthesis
Clusters are also Enriched for Subunits of Protein Complexes True positive interactions are between subunits of known complexes and false positive ones are between subunits of different complexes. For high confidence links, we recover one third of true interactions and only one thousandth of the false positive ones
Clusters Containing Subunits of Protein Complexes Cytochrome c oxidase controls the last step of food oxidation ATP Synthase
Identification of an Uncharacterized Protein Complex in Pseudomonas Auruginosa
Parallel Pathways and Protein Complexes Clustered Maps of co-evolving genes may be used not only to identify groups of proteins that are part of a complex or pathway but also to identify duplicated complexes and pathways* *Li H, Pellegrini M, Eisenberg D. Discovering parallel pathways and protein complexes from genome sequences. In preparation.
Nitrogenases in Rhodopseudomonas palustris N2 + 8e + 8H+ + 16ATP = 16ADP + 16Pi + 2NH3 + H2 Iron protein (NifH) Mo-Fe protein (α2β2) NifD – α subunit NifK – β subunit
Co-evolution Network of Nitrogenases Full network Distinct sub-networks
Predicting Protein Functions In Yeast Xiaoqun Joyce Duan, Matteo Pellegrini, and David Eisenberg. Discovering Biological Modules and Function from Various Genome-scale Protein Networks Submitted to PLOS Biology.
P Function Functional Link Un-annotated protein Annotated Protein Score S(P, Fj) Guilt By Association: Predicting MIPS Categories for Yeast Genes
Derive score from each network separately Compute p(P,F) from S1, S2: probability protein P has function F Combining Methods to Improve Function Prediction P P Network 2 Network 1 P P S1 S2
Benchmarking Combined Method Accuracy CS Recovery Accuracy Accuracy Coverage Coverage
MIPS Category Accuracy and Coverage Accuracy CS GN PP RS TFBP Methods Coverage CS GN PP RS TFBP Methods 67.07 C-compound transporters 67.04.07 anion transporters 67.04.01 cation transporters 67.04.01.07 other cation transport 67.04.01.01 heavy methal transporter 67.04 ion transporter 67 Transport facilitation
Identifying the function of a Histone Deacetylase Complex YDR155C YDR155C YIL112W YIL112W YOL068C YOL068C YMR273C YMR273C YGL194C YGL194C YKR029C YKR029C YCR033W YCR033W YBR103W YBR103W 3 un-annotated (empty circles) two share “transcription control” (dark gray circles) 3 annotated with other various functions (light gray circles) Our combined scoring algorithm infers function “transcriptional control” to seven out of the eight proteins (dark gray circles)
Conclusions • Protein modules appear to co-evolve across bacterial species • Modules are enriched for proteins that participate in the same pathway or complex • We can identify and reconstruct duplicated complexes and pathways • Co-evolution may be used to identify functions of yeast proteins
PROLINKS Database We have constructed a database that contains co-evolution links between the genes of 83 (soon to be 150) fully sequenced genomes The Prolinks database may be accessed through the Proteome Navigator web browser interface at: dip.doe-mbi.ucla.edu/pronav Peter M Bowers, Matteo Pellegrini, Mike J. Thompson, Joe Fierro, Todd O. Yeates, David Eisenberg. PROLINKS: A Database of Protein Functional Linkages Derived from Co-evolution, Genome Biology, in press
Future Directions • Distinguish pathways from complexes • Combine multiple data types • Compute statistics for 3 or more genes to co-evolve • Account for Phylogenies
Acknowledgements Michael Thompson Peter Bowers Michael Strong Huiying Li Todd Yeates Joseph Fierro David Eisenberg Joyce duan Edward Marcotte