TORQUE: Topology-Free Querying of Protein Interaction Networks. Sharon Bruckner (1), Falk Hüffner (1), Richard M. Karp (2), Ron Shamir (1), and Roded Sharan (1). (1) School of Computer Science, Tel Aviv University. (2) Int. Computer Science Institute, Berkeley, CA. To appear in RECOMB 09.
The problem
Input:
• Graph G = (V, E), |V| = n, |E| = m
• Color set C = {1, 2, ..., k}
• A function c: V → C assigning each vertex v the color c(v).
The problem
We seek: Is there a connected subgraph of G that has exactly one vertex of each color? Call such a subgraph "colorful".
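The definition can be made concrete with a small checker. This is only an illustrative sketch: the function name `is_colorful_connected`, the adjacency-dictionary representation, and the example graph are assumptions, not part of the talk. It tests whether a given vertex set induces a connected subgraph with exactly one vertex of each color in {1, ..., k}.

```python
from collections import deque

def is_colorful_connected(vertices, adj, color, k):
    """Return True if `vertices` is connected in the graph `adj` and
    contains exactly one vertex of each color 1..k (hypothetical helper)."""
    vertices = set(vertices)
    if not vertices or len(vertices) != k:
        return False
    # Exactly k vertices covering all k colors means one vertex per color.
    if {color[v] for v in vertices} != set(range(1, k + 1)):
        return False
    # Breadth-first search restricted to the chosen vertices.
    start = next(iter(vertices))
    seen = {start}
    queue = deque([start])
    while queue:
        x = queue.popleft()
        for y in adj.get(x, ()):
            if y in vertices and y not in seen:
                seen.add(y)
                queue.append(y)
    return seen == vertices

# Made-up example: a path a - b - c plus an isolated vertex d.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"], "d": []}
color = {"a": 1, "b": 2, "c": 3, "d": 3}
```

Checking a candidate set is easy; the hard part of the problem is finding such a set among the many possible ones.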
But why?
• Our graph = a protein-protein interaction network of some species.
• Our colors = a set of proteins from another species that constitute a complex. Each network vertex is given the color of the protein in that set most similar to it.
• What is the meaning of a match?
  • It hints at an evolutionarily conserved region.
  • We may infer the functionality of the matched subgraph from that of the complex.
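The coloring step described above can be sketched as follows. Everything named here is a hypothetical illustration: the function `color_network`, the nested-dictionary input, the score values, and in particular the `threshold` cutoff are assumptions, since the slides do not say how similarity is measured or thresholded.

```python
def color_network(similarity, threshold):
    """Assign each network protein the query protein most similar to it.

    similarity: {network_protein: {query_protein: score}} (assumed format).
    Proteins whose best score is below `threshold` stay uncolored.
    """
    coloring = {}
    for v, scores in similarity.items():
        if not scores:
            continue  # no similarity information for this protein
        best_query = max(scores, key=scores.get)
        if scores[best_query] >= threshold:
            coloring[v] = best_query
    return coloring

# Made-up scores for three network proteins and two query proteins.
similarity = {
    "p1": {"q1": 0.9, "q2": 0.2},
    "p2": {"q1": 0.1, "q2": 0.05},
    "p3": {"q2": 0.7},
}
coloring = color_network(similarity, threshold=0.5)
```

Here the query proteins themselves play the role of the colors, so k equals the size of the query complex.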
ABOUT THE PROBLEM
• NP-complete
• Hard even when the graph is a tree with max degree 3 (by reduction from 3SAT, [FFHV07])
• But! We know the number of colors k is relatively small.
• Solution: a fixed-parameter algorithm!
A problem is fixed-parameter tractable with respect to a parameter k if an instance of size n can be solved in time f(k) · n^O(1), where f is an arbitrary function (see e.g. [N06]).
Defining the basic algorithm
• Every connected subgraph has a spanning tree.
• Every colorful connected subgraph will have a colorful spanning tree.
• Instead of looking for a colorful subgraph, look for a colorful tree.
Input: A graph where each vertex is colored by one of k colors. Output: Is there a colorful tree?
Input: A graph where each vertex is colored by one of k colors. Output: What is the highest scoring colorful tree?
Dynamic Programming Algorithm
IDEA: Instead of looking at all n^k possible subgraphs, look only at all 2^k color sets.
• Row for each vertex
• Column for each subset of colors, in increasing size.
[Figure: the DP table, with vertices as rows. The entry in row v3, column S3 is the score of the best tree rooted in v3 that is colored exactly by S3.]
Dynamic Programming Algorithm
• The last column contains, for every vertex v, the score of the highest scoring tree
  • rooted in v
  • colored by all the colors of the query!
• Running time: O(3^k m).
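The slides do not spell out the recurrence, so the following is a sketch under assumptions rather than the authors' implementation. It assumes the score of a tree is the sum of its edge weights, numbers colors 0..k-1 so that color sets fit in bitmasks, and uses one standard recurrence: a tree rooted in v colored by S splits into a tree rooted in v colored by S1 (containing c(v)) and a tree rooted in a neighbor u colored by S2 = S \ S1. The name `best_colorful_tree` and the input format are hypothetical.

```python
def best_colorful_tree(n, edges, color, k):
    """Score of the best colorful tree, or -inf if none exists.

    n: number of vertices (0..n-1); edges: list of (u, v, weight);
    color[v] in 0..k-1. All names and formats are assumptions.
    """
    NEG = float("-inf")
    full = (1 << k) - 1
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    # B[v][S]: best score of a tree rooted in v colored exactly by S.
    B = [[NEG] * (full + 1) for _ in range(n)]
    for v in range(n):
        B[v][1 << color[v]] = 0.0  # single-vertex tree
    # Proper submasks are numerically smaller, so increasing order is safe.
    for S in range(1, full + 1):
        for v in range(n):
            cv = 1 << color[v]
            if not S & cv or S == cv:
                continue
            best = NEG
            S1 = (S - 1) & S  # enumerate proper non-empty submasks of S
            while S1:
                if S1 & cv and B[v][S1] > NEG:
                    S2 = S ^ S1
                    for u, w in adj[v]:
                        if S2 & (1 << color[u]) and B[u][S2] > NEG:
                            best = max(best, B[v][S1] + B[u][S2] + w)
                S1 = (S1 - 1) & S
            B[v][S] = best
    return max(B[v][full] for v in range(n))

# Made-up example: path 0 - 1 - 2 - 3 with colors 0, 1, 2, 1.
example_edges = [(0, 1, 1.0), (1, 2, 2.0), (2, 3, 5.0)]
example_score = best_colorful_tree(4, example_edges, [0, 1, 2, 1], 3)
```

Enumerating every pair (S, S1) with S1 a subset of S touches 3^k pairs in total, and each pair does work proportional to the number of edges, which is where a bound of the form O(3^k m) comes from.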
Example
[Figure: computing B(v, S) on a small graph with vertices u, v, w; the color set S was shown graphically.]
Allowing deletions: matching with fewer colors
• Simply look at all columns with color sets of size at least k - num_dels, where num_dels is the number of allowed deletions.
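As a rough illustration of this column scan, the sketch below assumes a finished table stored as `B[v][S]`, where S is a bitmask over the k colors and unreachable entries hold negative infinity. The storage layout and the name `best_with_deletions` are assumptions made for the example.

```python
def best_with_deletions(B, k, num_dels):
    """Best score over all vertices and all color sets of size >= k - num_dels.

    B[v][S] is assumed to hold the best score of a tree rooted in v and
    colored exactly by the bitmask S, or -inf if no such tree exists.
    """
    best = float("-inf")
    for row in B:
        for S, score in enumerate(row):
            if bin(S).count("1") >= k - num_dels:  # number of colors in S
                best = max(best, score)
    return best

# Made-up table for k = 2 and two vertices.
NEG = float("-inf")
table = [[NEG, 0.0, NEG, NEG],
         [NEG, NEG, 0.0, NEG]]
```

With this table no tree uses both colors, so a query with zero deletions fails, while allowing one deletion accepts the single-vertex trees.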
Allowing Insertions: Special non-colored vertices or arbitrary vertices
Allowing non-colored insertions
• For j insertions, we would expect:
  • Running time: O(3^(k+j) m).
• Actually,
  • Running time: O(3^k m j).
• Simply make j copies of each column, and answer the question: B(v, S, j') = What is the highest scoring tree, rooted in v, colored by S, using exactly j' insertions?
Formula & Example
[Figure: example graph on vertices a, b, c, d, e, f, g.]
Running time: O(3^k m · ins), where ins is the number of insertions.
Details
• For every vertex v and color subset S, the algorithm will accurately find the best tree of those having the minimal number of insertions.
• Once B(v, S, j) < ∞ for some j, the value for j+i will never be computed!
• Cannot guarantee that B(v, S, j+i) will have exactly j+i insertions.
[Figure: example with vertices u and v.]
Experiments
• We applied our method to query complexes within:
  • yeast (5430 proteins, 39936 interactions)
  • fly (6650 proteins, 21275 interactions)
  • human (7915 proteins, 28972 interactions)
• Queries:
  • yeast, fly, human
  • bovine, mouse, and rat.
Implementation comments
We color the graph according to the similarity between the network and query proteins. In practice, in some problem instances the number of colors was not significantly smaller than the graph size. This is a result of data reduction in the cases where many network vertices were not sufficiently similar to any query vertex. Therefore, the dynamic programming algorithm is supplemented by an ILP algorithm and some heuristics to handle these instances.
Comparison with other methods
• Most previous work tested queries with a known topology.
• We compare our results with those of QNet ([DSGRBS08]), designed to tackle topology-based queries.
• QNet is also based on dynamic programming and color coding.
Summary
The colorful connected subgraph problem is motivated by the PPI network querying problem. A fixed-parameter dynamic programming algorithm, allowing insertions, deletions, and multiple colors per vertex, along with an ILP formulation and heuristics, obtains good results.
Thanks: The ACGT group (Igor, Ofer, Chaim, Seagull, Guy…), Nir Yosef. Israel Science Foundation, Edmond J. Safra Bioinformatics Program, Tel Aviv Univ.
References
• [FFHV07] M. R. Fellows, G. Fertin, D. Hermelin, and S. Vialette. Borderlines for finding connected motifs in vertex-colored graphs. In Proc. ICALP'07, volume 4596, pages 340–351. Springer-Verlag, 2007.
• [N06] R. Niedermeier. Invitation to Fixed-Parameter Algorithms. Number 31 in Oxford Lecture Series in Mathematics and Its Applications. Oxford University Press, 2006.
• [BFKN08] N. Betzler, M. R. Fellows, C. Komusiewicz, and R. Niedermeier. Parameterized algorithms and hardness results for some graph motif problems. In Proc. 19th CPM, volume 5029 of LNCS, pages 31–43. Springer, 2008.
• [AYZ95] N. Alon, R. Yuster, and U. Zwick. Color coding. Journal of the ACM, 42:844–856, 1995.
• [DSGRBS08] B. Dost, T. Shlomi, N. Gupta, E. Ruppin, V. Bafna, and R. Sharan. QNet: A tool for querying protein interaction networks. Journal of Computational Biology, 15(7):913–925, 2008.