FunCoup: reconstructing protein networks in the worm and other animals. Andrey Alexeyenko, Erik Sonnhammer, Stockholm Bioinformatics Center
[Figure: worm proteins A and B (?), with high-throughput evidence from mouse, fly, human, and yeast linked via orthologs.] FunCoup is a data integration framework to discover functional coupling in eukaryotic proteomes with data from model organisms.
FunCoup • Each piece of data is evaluated • Data FROM many eukaryotes (7) • Practical maximum of data sources (>60) • Predicted networks FOR a number of eukaryotes (8) • Organism-specific efficient and robust Bayesian frameworks • Orthology-based information transfer and phylogenetic profiling • Networks predicted for different types of functional coupling (metabolic, signaling etc.)
C. elegans’ benefit from the model-species data integration: 6841. [Figure: Li & Vidal’s set, 5535 pairs; IntAct (Oct. 2007), 4517 pairs; other C. elegans data; data from 6 eukaryotes; 36000 predicted C. elegans pairs.]
Data sources in FunCoup: • Species: • H. sapiens • M. musculus • R. norvegicus • D. melanogaster • C. elegans • S. cerevisiae • A. thaliana • Types: • Protein-protein interactions • Protein domain associations • Protein-DNA interactions • mRNA expression • Protein expression • miRNA targeting • Sub-cellular co-localization • Phylogenetic profiling
Multilateral data transfer [Figure: FunCoup exchanging data among human, mouse, rat, Ciona, fly, worm, yeast, and Arabidopsis.] Data from the same species is an important but not indispensable component of the framework. Hence, a network can be constructed for an organism with no experimental data sets at all.
InParanoid [Figure: proteome A vs. proteome B; reciprocally best hits ~ seed orthologs; inparalogs.] Remm M, Storm CEV, Sonnhammer ELL. Automatic clustering of orthologs and in-paralogs from pairwise species comparisons. Journal of Molecular Biology (14 December 2001) 314(5):1041-1052.
How does orthology work? [Figure: log overlap between KEGG pathways and complexes (Gavin et al., 2006).]
Comparing networks [Figure: rat, human, and mouse networks.]
Conclusions FunCoup: • is a flexible, exhaustive, and robust framework to infer confident functional links • enables practical web access to candidate interactions in both small and global-scale network contexts • is open to improvements in data quality and coverage http://FunCoup.sbc.su.se
Acknowledgements: • Carsten Daub • Kristoffer Forslund • Anna Henricson • Olof Karlberg • Martin Klammer • Mats Lindskog • Kevin O’Brien • Tomas Ohlson • Sanjit Rupra • Gabriel Östlund • Sean Hooper • All previous interaction network developers
Talk outline • Other network resources • Why FunCoup • Orthology and InParanoid • Implementation • Applications and future development
FunCoup is a naïve Bayesian network (NBN). Bayesian inference: P(C|E) = P(C) · P(E|C) / P(E), where E is the evidence "genes A and B are co-expressed" and C is the class "genes A and B are functionally coupled" (A<->B).
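As a minimal sketch of the Bayes' rule step above, assuming a binary class C and a single binary piece of evidence E, P(E) can be expanded by total probability. The function name and the numbers in the usage line are hypothetical illustrations, not FunCoup values.

```python
def posterior(prior_c: float, p_e_given_c: float, p_e_given_not_c: float) -> float:
    """Return P(C|E) via Bayes' rule.

    P(E) is expanded by total probability:
    P(E) = P(C) * P(E|C) + (1 - P(C)) * P(E|not C).
    """
    p_e = prior_c * p_e_given_c + (1.0 - prior_c) * p_e_given_not_c
    return prior_c * p_e_given_c / p_e


# Hypothetical numbers: 1% prior coupling, co-expression seen in 60% of
# coupled pairs and in 10% of uncoupled pairs.
print(round(posterior(0.01, 0.6, 0.1), 4))  # prints 0.0571
```

Even fairly strong evidence leaves the posterior low when the prior is small, which is one reason absolute probabilities are hard to pin down.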
Problem: In situations with multiple inparalogs, how should alternative evidence be handled? Solution: Treat ALL inparalogs equally and choose the BEST value.
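A minimal sketch of "treat all inparalogs equally, choose the best value", assuming each query protein maps to a list of inparalogs in the data-source species and that "best" means the largest score (true for, e.g., a correlation, but an assumption in general). The function name and the protein identifiers are hypothetical.

```python
from itertools import product
from typing import Dict, FrozenSet, Iterable, Optional


def best_inparalog_evidence(
    inparalogs_a: Iterable[str],
    inparalogs_b: Iterable[str],
    evidence: Dict[FrozenSet[str], float],
) -> Optional[float]:
    """Return the best (here: largest) evidence value over all inparalog pairs.

    inparalogs_a / inparalogs_b: cluster members mapped to query proteins A and B
    in the species that supplied the data.
    evidence: hypothetical per-pair scores, keyed by unordered protein pair.
    Returns None if no pair has any evidence.
    """
    values = [
        evidence[pair]
        for pair in (frozenset((x, y)) for x, y in product(inparalogs_a, inparalogs_b))
        if pair in evidence
    ]
    return max(values) if values else None


evidence = {
    frozenset({"fly_a1", "fly_b1"}): 0.31,
    frozenset({"fly_a2", "fly_b1"}): 0.78,
}
print(best_inparalog_evidence(["fly_a1", "fly_a2"], ["fly_b1"], evidence))  # prints 0.78
```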
Problem: Absolute probabilities of FC are intractable, and the full Bayesian network is not feasible. Solution: A naïve Bayesian network. Calculate a belief change instead (likelihood ratios, LR), assuming NO dependency between data. [Figure: a full network with pairwise dependencies between evidence nodes A, B, C, D (P(A|C), P(C|A), P(D|C), P(C|D), P(B|A), P(A|B), P(A|D), P(D|A), P(B|C), P(C|B), P(B|D), P(D|B)) feeding A<->B, versus the naïve network in which each evidence node contributes an independent LR, P(E|+) / P(E|-), to A<->B.]
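Under the independence assumption above, the joint likelihood ratio factorizes into a product of per-evidence LRs, so log LRs simply add. A minimal sketch follows; the function names are hypothetical, and the optional conversion to a probability needs an assumed prior, which is exactly what the belief-change formulation avoids.

```python
import math
from typing import Iterable


def combined_log_lr(lrs: Iterable[float]) -> float:
    """Naive-Bayes belief change: the sum of log likelihood ratios.

    If the evidence types are independent given the class,
    P(E1..En|+) / P(E1..En|-) = product of P(Ei|+) / P(Ei|-),
    so the logs add.
    """
    return sum(math.log(lr) for lr in lrs)


def posterior_probability(prior: float, lrs: Iterable[float]) -> float:
    """Optional: turn the belief change into a probability under an ASSUMED prior."""
    posterior_odds = prior / (1.0 - prior) * math.exp(combined_log_lr(lrs))
    return posterior_odds / (1.0 + posterior_odds)


# Two hypothetical evidence types with LR 2 and LR 3 give a joint LR of 6.
print(round(math.exp(combined_log_lr([2.0, 3.0])), 6))  # prints 6.0
```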
Problem: How to establish optimal bridges between species? Solution: Via groups of orthologs that emerged from speciation. [Figure: gene evolution versus functional link.]
Homologs [Figure: proteome A vs. proteome B.] Homologs: proteins with similar sequences and, thus, a common origin.
An InParanoid cluster of orthologs and inparalogs
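A simplified sketch of how an InParanoid-style cluster grows from a seed ortholog pair: a protein from species A is kept as an inparalog if it is more similar to the seed in A than that seed is to its ortholog in B. This omits further rules of the real program (such as score cutoffs and inparalog confidence values); the score table and identifiers are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical similarity scores (e.g. bit scores), keyed by protein pair.
Scores = Dict[Tuple[str, str], float]


def pair_score(scores: Scores, x: str, y: str) -> float:
    """Look up a symmetric score; unknown pairs score 0."""
    return scores.get((x, y), scores.get((y, x), 0.0))


def inparalog_cluster(seed_a: str, seed_b: str, proteome_a: List[str], scores: Scores) -> List[str]:
    """Seed ortholog plus its inparalogs in species A.

    A protein from species A joins if it scores higher against seed_a
    than seed_a scores against its ortholog seed_b in species B.
    """
    cutoff = pair_score(scores, seed_a, seed_b)
    inparalogs = [p for p in proteome_a if p != seed_a and pair_score(scores, p, seed_a) > cutoff]
    return [seed_a] + inparalogs


scores = {("a1", "b1"): 100.0, ("a2", "a1"): 150.0, ("a1", "a3"): 50.0}
print(inparalog_cluster("a1", "b1", ["a1", "a2", "a3"], scores))  # prints ['a1', 'a2']
```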
Problem: Some LRs are weak and arise from non-representative sampling. Solution: Enforce a confidence check (χ2 test) and remove insignificant nodes. [Figure: LRs, P(E|+) / P(E|-), feeding A<->B, each passed through a χ2 test.]
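One way to sketch such a confidence check, assuming each evidence node is summarized as a 2x2 table of evidence present/absent against positive/negative training pairs, is a Pearson χ2 test with one degree of freedom. The table layout, function names, and the 0.05 threshold are assumptions for illustration; the χ2 approximation is also unreliable when expected counts are small.

```python
# Critical value of the chi-square distribution with 1 degree of freedom
# at alpha = 0.05.
CHI2_CRIT_DF1_005 = 3.841


def chi2_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic (no continuity correction) for a 2x2 table.

                    positive  negative
    evidence seen       a         b
    evidence absent     c         d
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a zero margin carries no information about the class
    return n * (a * d - b * c) ** 2 / denom


def keep_node(a: int, b: int, c: int, d: int, critical: float = CHI2_CRIT_DF1_005) -> bool:
    """Keep an evidence node only if its association with the class is significant."""
    return chi2_2x2(a, b, c, d) > critical


print(chi2_2x2(30, 10, 70, 90), keep_node(30, 10, 70, 90))  # prints 12.5 True
```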
Reciprocally best hits [Figure: proteome A vs. proteome B.]
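A minimal sketch of finding reciprocally best hits from two all-vs-all similarity searches, assuming the results are already parsed into dictionaries of scores. The data layout and identifiers are hypothetical, and ties are broken by taking the first maximal hit.

```python
from typing import Dict, List, Tuple

# Hypothetical all-vs-all search results: query -> {subject: score}.
Hits = Dict[str, Dict[str, float]]


def best_hits(hits: Hits) -> Dict[str, str]:
    """Best-scoring subject for every query with at least one hit (ties: first seen)."""
    return {query: max(subjects, key=subjects.__getitem__) for query, subjects in hits.items() if subjects}


def reciprocal_best_hits(a_vs_b: Hits, b_vs_a: Hits) -> List[Tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit in B and a is b's best hit in A."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


a_vs_b = {"a1": {"b1": 90.0, "b2": 40.0}, "a2": {"b1": 80.0}}
b_vs_a = {"b1": {"a1": 95.0, "a2": 70.0}, "b2": {"a2": 30.0}}
print(reciprocal_best_hits(a_vs_b, b_vs_a))  # prints [('a1', 'b1')]
```

Here a2's best hit is b1, but b1 prefers a1, so only (a1, b1) is reciprocal.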
Problem: Definitions and notions of FC vary (A<>B, A|B, A||B). Solution: Multinet. Decide which types of FC are needed (provide them as positive training sets) and perform the previous steps customized for each type. [Figure: separate sets of LRs, P(E|+) / P(E|-), feeding the link types A<>B and A|B.]
Multinet presents several link types in parallel. [Figure: proteins of the Parkinson’s disease pathway (KEGG #05020); link types: physical protein-protein interaction, “signaling” link, metabolic “non-signaling” link.]
FunCoup’s web interface http://FunCoup.sbc.su.se Hooper S., Bork P. Medusa: a simple tool for interaction graph analysis. Bioinformatics. 2005 Dec 15;21(24):4432-3. Epub 2005 Sep 27.
Reconstructing the “regulatory blueprint”* in C. intestinalis. *Imai KS, Levine M, Satoh N, Satou Y (2006) Regulatory blueprint for a chordate embryo. Science 312:1183-7. [Figure: proteins of the “Regulatory Blueprint for a Chordate Embryo” [*]; 18 links mentioned in [*] AND found by FunCoup; links found by FunCoup (about 140). The remaining 202 links from [*] that FunCoup did not find are not shown.]
Alexeyenko A, Lindberg J, Pérez-Bercoff Å, Sonnhammer ELL. Overview and comparison of ortholog databases. Drug Discovery Today: Technologies (2006) 3(2):137-143. [Figure: orthologs, inparalogs, and a functional link across C. elegans, D. melanogaster, human, and S. cerevisiae.]
Problem: The regions of a feature's distribution that are informative of FC may vary. Solution: Find them individually for each data set and FC class, accounting for the joint “feature-class” distribution. [Figure: positive (+) and negative (-) training pairs plotted along Pearson r, from -1 to 1.]
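A minimal sketch of turning a continuous feature such as Pearson r into per-region likelihood ratios, assuming the informative regions are represented as bins with given boundaries. Here the bin edges are chosen by hand; in the approach described above they would be found per data set and FC class. The function name, pseudocount, and sample values are hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence


def binned_lrs(
    positives: Sequence[float],
    negatives: Sequence[float],
    edges: Sequence[float],
    pseudocount: float = 1.0,
) -> List[float]:
    """Per-bin likelihood ratio P(bin|+) / P(bin|-) for one feature.

    edges: sorted interior bin boundaries, giving len(edges) + 1 bins.
    Bin i holds values v with edges[i-1] <= v < edges[i].
    A pseudocount keeps empty bins from producing zero or infinite LRs.
    """
    n_bins = len(edges) + 1
    pos_counts = [pseudocount] * n_bins
    neg_counts = [pseudocount] * n_bins
    for v in positives:
        pos_counts[bisect_right(edges, v)] += 1
    for v in negatives:
        neg_counts[bisect_right(edges, v)] += 1
    pos_total, neg_total = sum(pos_counts), sum(neg_counts)
    return [(p / pos_total) / (n / neg_total) for p, n in zip(pos_counts, neg_counts)]


# Hypothetical Pearson r values for positive and negative training pairs.
lrs = binned_lrs([0.6, 0.7, 0.8, 0.1], [-0.5, -0.2, 0.1, 0.6], edges=[0.0, 0.5])
print([round(x, 3) for x in lrs])  # prints [0.333, 1.0, 2.0]
```

In this toy data, negative correlations lower the belief in coupling, while r >= 0.5 raises it.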
Validation. Jack-knife procedure: • Take the “positive” and “negative” sets • Split each randomly 50:50 • Use the first halves to train the algorithm and the second halves to test its performance • Repeat a number of times. Analysis of variance (ANOVA): • Introduce features A, B, C into the FunCoup workflow (e.g., using PCA, selecting BN nodes by relevance, different ways of using ortholog data) • Run FunCoup with all combinations of absence/presence of A, B, C to produce a balanced, orthogonal ANOVA design with replicates • Study the effects of A, B, C and their interactions (AxB, BxC, ..., AxBxC) to see whether they significantly influence performance; in a balanced, orthogonal design each effect is estimated independently of the others.
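The two validation steps can be sketched as follows, assuming the training and scoring of FunCoup itself happen elsewhere. The repeated 50:50 split mirrors the procedure listed above; the full factorial design enumerates every presence/absence combination of the hypothetical factors, and the main effect is shown only as a difference of means (a full ANOVA F-test is not included). All names are illustrative.

```python
import itertools
import random
from typing import Dict, Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def half_split(items: Sequence[T], rng: random.Random) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the items and cut it in half (training, test)."""
    shuffled = list(items)
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


def repeated_splits(
    positives: Sequence[T], negatives: Sequence[T], n_repeats: int, seed: int = 0
) -> Iterator[Tuple[Tuple[List[T], List[T]], Tuple[List[T], List[T]]]]:
    """Yield ((pos_train, neg_train), (pos_test, neg_test)) n_repeats times."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        pos_train, pos_test = half_split(positives, rng)
        neg_train, neg_test = half_split(negatives, rng)
        yield (pos_train, neg_train), (pos_test, neg_test)


def full_factorial(factors: Sequence[str], replicates: int) -> List[Dict[str, bool]]:
    """Every absence/presence combination of the factors, each repeated `replicates` times."""
    runs = []
    for levels in itertools.product([False, True], repeat=len(factors)):
        setting = dict(zip(factors, levels))
        runs.extend(dict(setting) for _ in range(replicates))
    return runs


def main_effect(runs: List[Dict[str, bool]], scores: Sequence[float], factor: str) -> float:
    """Mean performance with the factor present minus mean with it absent."""
    on = [s for r, s in zip(runs, scores) if r[factor]]
    off = [s for r, s in zip(runs, scores) if not r[factor]]
    return sum(on) / len(on) - sum(off) / len(off)


runs = full_factorial(["A", "B", "C"], replicates=2)
print(len(runs))  # prints 16
```

Because every combination appears equally often, each factor is present in exactly half of the runs, which is what makes the design balanced.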