
Network Querying Algorithms




  1. Network Querying Algorithms Roded Sharan Tel-Aviv University

  2. Protein Interactions • Crucial to cell function. • Measured by high-throughput technologies, e.g. yeast two-hybrid and co-immunoprecipitation. • Systematic data available for several species.

  3. Network Querying Problem • Sequence comparison allows transferring information from a well-studied genome to another genome. • Species A: well studied; protein interaction subnetworks defined by extensive experimentation. • Species B: less studied; little knowledge of subnetworks; protein interaction network mapped using high-throughput technologies. • Can we use the knowledge of A to discover corresponding subnetworks in B (if such exist)?

  4. Isomorphic Alignment [Figure: a query Q from species A aligned to a subnetwork of species B that is isomorphic to Q; every query protein is matched to a homologous protein.]

  5. Homeomorphic Alignment [Figure: a query Q from species A aligned to a subnetwork of species B that is homeomorphic to Q, with six matches, one deletion and one insertion.] Match of homologous proteins and deletion/insertion of degree-2 nodes.

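  6. Score of an Alignment • Score = sequence similarity scores for matches + interaction reliability scores + penalties for deletions and insertions:
  $\text{Score} = \sum_{\text{matches }(q_i, v_i)} h(q_i, v_i) + \sum_{\text{path edges }(v_j, v_{j+1})} w(v_j, v_{j+1}) + \sum_{\text{deletions}} \text{pen}_{\text{del}} + \sum_{\text{insertions}} \text{pen}_{\text{ins}}$
  [Figure: matches (q1, v1) to (q6, v6) scored by h, network edges such as (v1, v2) scored by w, plus one deletion penalty and one insertion penalty.]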
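A small sketch of this scoring function, assuming an alignment is given as a list of (query node, network vertex) pairs in path order, with None marking a deleted query node or an inserted network vertex. The penalties are taken as constants (typically negative), and h and w are supplied by the caller; the function name is hypothetical.

```python
def alignment_score(alignment, h, w, del_pen, ins_pen):
    """Score of a path alignment: matches + path edges + indel penalties.

    alignment: (query_node, network_vertex) pairs in path order; a pair
    (q, None) is a deleted query node, (None, v) an inserted network vertex.
    """
    score, path = 0.0, []
    for q, v in alignment:
        if q is not None and v is not None:
            score += h(q, v)      # sequence similarity of a matched pair
        elif v is None:
            score += del_pen      # query node with no network partner
        else:
            score += ins_pen      # network vertex with no query partner
        if v is not None:
            path.append(v)
    # Interaction reliability of consecutive vertices on the network path.
    score += sum(w(a, b) for a, b in zip(path, path[1:]))
    return score
```

With h = 1 for every match, w = 0.5 for every edge and both penalties at -1, an alignment of four matches' worth of path (three matches, one deletion, one insertion, four network vertices) scores 3 - 1 - 1 + 1.5 = 2.5.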

  7. Network Querying Problem • Given a query graph Q and a network G, find the subnetwork of G that is: • homeomorphic to Q; • aligned with Q with maximal score.

  8. Complexity • The network querying problem is NP-complete, by reduction from subgraph isomorphism. • The naïve algorithm has O(n^k) complexity, where n = size of the PPI network and k = size of the query. • Intractable for realistic values of n and k (n ≈ 5000, k ≈ 10). • Complexity can be reduced by: • constraining the network [Pinter et al., Bioinformatics’05]; • constraining the query (fixed-parameter algorithms); • allowing vertex repetitions.
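To make the O(n^k) baseline concrete, here is a hypothetical brute-force path query: it tries every ordered k-tuple of network vertices, keeps those whose consecutive vertices are adjacent, and scores matches with h and edges with w. Indels are ignored for simplicity, so this is only a sketch for tiny inputs.

```python
from itertools import permutations


def naive_path_query(query, adj, h, w):
    """Exhaustive path query (no indels): try every ordered k-tuple of vertices.

    adj maps each network vertex to a set of neighbours; h and w are the
    similarity and reliability scores. Returns (best score, best path).
    """
    best, best_path = float("-inf"), None
    for cand in permutations(adj, len(query)):  # n!/(n-k)! <= n^k tuples
        edges = list(zip(cand, cand[1:]))
        if all(b in adj[a] for a, b in edges):  # consecutive vertices adjacent
            s = sum(h(q, v) for q, v in zip(query, cand))
            s += sum(w(a, b) for a, b in edges)
            if s > best:
                best, best_path = s, cand
    return best, best_path
```

With n ≈ 5000 and k = 10 the loop would face roughly 5000^10 ≈ 10^37 tuples, which is the intractability the slide refers to.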

  9. Path Querying

  10. The Path Query Problem [Figure: a query pathway aligned to a target pathway; A/A′, C/C′ and D/D′ are matched, B is deleted and E is inserted.]

  11. PathBLAST [Kelley et al., PNAS’03] • p(v): sequence similarity. • q(e): interaction reliability.

  12. Alignment-Based Approach Pros: • Conceptually simple. • Extensible to general queries (using any network alignment program). Cons: • No general treatment of indels. • Protein repetitions.

  13. DP-Based Approach • Use dynamic programming (à la sequence alignment): W(i,j) is the maximal score of a partial alignment of query nodes {1…i} that ends at vertex j of the network; each step is a match, an insertion or a deletion. • But this may introduce protein repetitions along the path. Shlomi et al., BMC Bioinformatics ’06; Yang & Sze, JCB’07
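A minimal sketch of such a DP, under assumptions the slide does not fix: insertions are capped by a hypothetical max_ins parameter, each deletion or insertion costs a constant penalty, and the state also records how many insertions were used. As the slide warns, nothing in the recurrence stops the same network vertex from being visited twice.

```python
from collections import defaultdict


def dp_path_query(query, adj, h, w, del_pen=-1.0, ins_pen=-1.0, max_ins=1):
    """Best score of aligning a query path to a network path (sketch).

    query: query proteins q_1..q_k in path order.
    adj:   dict mapping each network vertex to its neighbours.
    h(q, v), w(u, v): similarity and interaction-reliability scores.
    del_pen, ins_pen: (negative) penalty per deleted query node / inserted
    network vertex; max_ins bounds the number of insertions.
    Nothing prevents a vertex from appearing twice on the path.
    """
    k = len(query)
    NEG = float("-inf")
    # W[(i, v, a, matched)]: best partial score after query nodes 1..i have
    # been matched or deleted, with the network path ending at v, a
    # insertions used, and `matched` telling whether v is matched to q_i.
    W = defaultdict(lambda: NEG)
    for v in adj:
        for j in range(1, k + 1):  # start at q_j, deleting q_1..q_{j-1}
            key = (j, v, 0, True)
            W[key] = max(W[key], h(query[j - 1], v) + (j - 1) * del_pen)
    # Each move raises i (match) or a (insertion), so states are processed
    # in increasing order of i + a, after all of their predecessors.
    for s in range(1, k + max_ins + 1):
        for (i, u, a, _), score in list(W.items()):
            if i + a != s:
                continue
            for v in adj[u]:
                step = score + w(u, v)
                # Match: v aligned to q_j, deleting q_{i+1}..q_{j-1}.
                for j in range(i + 1, k + 1):
                    key = (j, v, a, True)
                    W[key] = max(W[key], step + h(query[j - 1], v) + (j - i - 1) * del_pen)
                # Insertion: v is not aligned to any query node.
                if a < max_ins:
                    key = (i, v, a + 1, False)
                    W[key] = max(W[key], step + ins_pen)
    # The path must end at a matched vertex; trailing query nodes are deleted.
    return max(score + (k - i) * del_pen
               for (i, _, _, matched), score in W.items() if matched)
```

For instance, the query A, B, A on a network with the single edge (a, b) is matched as a, b, a, reusing vertex a; color coding is one way to rule such repetitions out.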

  14. Color Coding [AYZ’95] • Problem: Given a graph G=(V,E) and a parameter k, find a simple path of length k in G. • Algorithm: Randomly color the vertices with k colors, and find a colorful path (all vertices with distinct colors). • Complexity: • A colorful path is found by DP in O(k·m·2^k), where m = |E|. • Probability of success (the path is colorful): k!/k^k ≥ e^{-k}, so about e^k random colorings give a constant success probability. • Overall: m·2^{O(k)}.
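A minimal Python sketch of the color-coding idea for the path problem above, assuming "a path of length k" means a path on k vertices. The DP keeps, for every vertex, the color sets of colorful paths ending there; a colorful path is automatically simple. The function names are hypothetical, and the sketch answers the decision question only (it does not reconstruct the path).

```python
import math
import random


def has_colorful_path(adj, colors, k):
    """True if some path on k vertices has k distinct colors (DP over color sets)."""
    # reach[v]: bitmasks of color sets of colorful paths that end at v.
    reach = {v: {1 << colors[v]} for v in adj}
    for _ in range(k - 1):  # extend every path by one vertex
        nxt = {v: set() for v in adj}
        for u, masks in reach.items():
            for mask in masks:
                for v in adj[u]:
                    bit = 1 << colors[v]
                    if not mask & bit:  # unused color, so v is not yet on the path
                        nxt[v].add(mask | bit)
        reach = nxt
    return any(reach.values())


def has_simple_path(adj, k, seed=0):
    """Randomized test for a simple path on k vertices (one-sided error)."""
    rng = random.Random(seed)
    # Each coloring succeeds with prob. >= k!/k^k >= e^-k, so ceil(3 e^k)
    # colorings miss an existing path with prob. at most e^-3.
    for _ in range(math.ceil(3 * math.e ** k)):
        colors = {v: rng.randrange(k) for v in adj}
        if has_colorful_path(adj, colors, k):
            return True
    return False
```

A False answer can be wrong (with small probability), but a True answer never is, because every colorful path it finds is simple.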

  15. Network Querying with Color Coding [Shlomi et al., BMC Bioinformatics ’06] [Figure: randomly color the network graph; run the DP algorithm on the query and the colored network to find a high-scoring subnetwork; repeat N times.]
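The slide leaves the number of repetitions N open. Under the standard color-coding argument (assumed here, not stated on the slide), a fixed k-vertex solution is colorful with probability p = k!/k^k, so N independent colorings all miss it with probability (1 - p)^N. The hypothetical helper below picks the smallest N that pushes this below a chosen failure rate eps.

```python
import math


def trials_needed(k, eps):
    """Smallest N with (1 - p)^N <= eps, where p = k!/k^k is the chance that
    one uniform random k-coloring makes a fixed k-vertex solution colorful."""
    p = math.factorial(k) / k ** k
    if p >= 1.0:  # k = 1: every coloring works
        return 1
    return math.ceil(math.log(eps) / math.log1p(-p))
```

For example, k = 2 and eps = 0.01 give N = 7, while k = 10 and eps = 0.01 give roughly 1.3 × 10^4 colorings.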

  16. Yeast & Fly PPI Networks • S. cerevisiae: 4,726 proteins, 15,166 interactions. • D. melanogaster: 7,028 proteins, 22,837 interactions.

  17. Yeast-Fly Queries • Applied QPath to 271 yeast queries spanning the yeast network. • 63% of queries were matched, most requiring protein indels.

  18. The Scoring Module • Functional enrichment of a matched path correlates with: • its interaction reliabilities; • its sequence similarities; • its numbers of protein insertions and deletions (negatively). Goal: score matched pathways by their probability of being functionally enriched. Method: logistic regression on path attributes: PPI reliabilities, sequence similarities, #insertions, #deletions.
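The slide names the method but not its details. The sketch below assumes four per-path features (mean interaction reliability, mean sequence similarity, number of insertions, number of deletions) and fits plain logistic regression by batch gradient ascent; the feature choice, learning rate, epoch count and function names are illustrative assumptions, not the authors' setup.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient ascent on the log-likelihood; returns (bias, weights).

    X: one feature vector per matched path, e.g. [mean reliability,
    mean similarity, #insertions, #deletions]; y: 1 if enriched, else 0.
    """
    n = len(X)
    b, w = 0.0, [0.0] * len(X[0])
    for _ in range(epochs):
        gb, gw = 0.0, [0.0] * len(w)
        for x, t in zip(X, y):
            err = t - sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))
            gb += err
            for j, xj in enumerate(x):
                gw[j] += err * xj
        b += lr * gb / n
        w = [wj + lr * g / n for wj, g in zip(w, gw)]
    return b, w


def enrichment_probability(b, w, features):
    """Predicted probability that a matched path is functionally enriched."""
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, features)))
```

The fitted weights can then rank candidate matches: paths with reliable interactions, similar sequences and few indels receive higher probabilities.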

  19. Best Matches • 171 best matches were identified. • 51% of them were functionally enriched. • Best matches were significantly more functionally enriched and expression-coherent than arbitrary pathways (p < 10^-4).

  20. Queries with Known Pathways • MAP kinase (yeast) • Ubiquitin ligation • Hedgehog

  21. Function Conservation • 69 best matches had an enriched function in both species. • 64% preserved their function, significantly more than the random expectation (31%). • In comparison, best sequence matches preserve their function in only 40% of the cases. • Pathway homology can be used to predict function!

  22. Fly Conserved Pathway Map • Predicted annotations were significantly prevalent. • Map exhibits modularity (cc=0.26).

  23. Querying for Trees & General Graphs

  24. QNet: Tree Queries [Dost et al., RECOMB’07] • Query has k nodes. [Figure: a tree query and the network.]

  25. QNet: Tree Queries • Query has k nodes. • Randomly color the network with k distinct colors. • Suppose the optimal subnetwork is “colorful” (all of its vertices colored with distinct colors). • Use the colors to remember the visited nodes.

  26. Finding colorful trees [Figure: a query tree (q1 to q7) mapped onto colorful network vertices.]
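A simplified sketch of the colorful-tree DP, assuming an exact (indel-free) match: for each query node q and network vertex v, it stores the best score of the subtree rooted at q, with q mapped to v, for every set of used colors. Distinct colors guarantee distinct network vertices. QNet itself also handles insertions and deletions, which this sketch omits; all names are hypothetical.

```python
def colorful_tree_match(children, root, adj, colors, h, w):
    """Best score of a colorful, indel-free match of a rooted query tree.

    children: dict mapping each query node to its list of child nodes.
    adj:      dict mapping each network vertex to a set of neighbours.
    colors:   dict mapping each network vertex to a color in 0..k-1.
    Returns None if no colorful match exists under this coloring.
    """
    def table(q):
        # table(q)[v][mask] = best score for the subtree rooted at q with
        # q mapped to v, using exactly the colors in bit-set `mask`.
        child_tables = [table(c) for c in children.get(q, [])]
        result = {}
        for v in adj:
            partial = {1 << colors[v]: h(q, v)}
            for ct in child_tables:
                merged = {}
                for u in adj[v]:  # the child is mapped to a neighbour u of v
                    for cmask, cscore in ct.get(u, {}).items():
                        for pmask, pscore in partial.items():
                            if pmask & cmask:  # a color reused: not colorful
                                continue
                            s = pscore + cscore + w(v, u)
                            m = pmask | cmask
                            if s > merged.get(m, float("-inf")):
                                merged[m] = s
                partial = merged
                if not partial:  # some child cannot be placed below v
                    break
            if partial:
                result[v] = partial
        return result

    scores = [s for masks in table(root).values() for s in masks.values()]
    return max(scores) if scores else None
```

As in the path case, this DP would be rerun on many random colorings and the best score kept.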

  27. Querying General Graphs • We have also extended the algorithm to general graphs. • Idea: • Map the original query graph into a tree via a tree decomposition. • Solve the querying problem on this tree using DP.

  28. Querying General Graphs • Map the original query into a tree using a tree decomposition: each node of the tree T is a set of vertices of the graph G. [Figure: a graph G with vertices u, v, z and its tree decomposition T.]

  29. Querying General Graphs • Width(T) = size of its largest node - 1. • Tree-width(G) = minimum width among all possible tree decompositions of G.
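The definitions above can be checked mechanically. This hypothetical helper verifies the three standard tree-decomposition conditions (every vertex covered, every edge inside some bag, the bags containing a vertex connected in T) and returns the width.

```python
from collections import deque


def decomposition_width(vertices, edges, bags, tree_edges):
    """Width of a tree decomposition; raises ValueError if it is invalid.

    vertices, edges: the graph G.
    bags:       dict mapping each node of T to a set of vertices of G.
    tree_edges: edges of T (assumed to form a tree on the keys of bags).
    """
    for x in vertices:
        if not any(x in bag for bag in bags.values()):
            raise ValueError(f"vertex {x} is in no bag")
    for u, v in edges:
        if not any(u in bag and v in bag for bag in bags.values()):
            raise ValueError(f"edge ({u}, {v}) is in no bag")
    nbrs = {t: set() for t in bags}
    for s, t in tree_edges:
        nbrs[s].add(t)
        nbrs[t].add(s)
    for x in vertices:
        # The bags containing x must induce a connected subtree of T (BFS).
        holding = {t for t, bag in bags.items() if x in bag}
        start = next(iter(holding))
        seen, queue = {start}, deque([start])
        while queue:
            for s in nbrs[queue.popleft()]:
                if s in holding and s not in seen:
                    seen.add(s)
                    queue.append(s)
        if seen != holding:
            raise ValueError(f"bags containing {x} are not connected in T")
    return max(len(bag) for bag in bags.values()) - 1
```

For the 4-cycle a, b, c, d, the bags {a, b, c} and {a, c, d} joined by one tree edge form a valid decomposition of width 2, which is also the tree-width of the cycle.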

  30. Querying General Graphs • Original query has k nodes and tree-width t. • Randomly color the network with k distinct colors. [Figure: the query (q1 to q8) and its tree decomposition T.]

  31. Querying General Graphs • Original query has k nodes and tree-width t. • Randomly color the network with k distinct colors. [Figure: the nodes of T mapped to network vertices v1 to v8, annotated with O(n^{t+1}).]

  32. Running time • n = size of network, k = size of query. • Tree queries: m·2^{O(k)}, where m = number of network interactions. • Tractable for realistic values of n and k, e.g. n ≈ 5000, k = 9 => 11 seconds. • Bounded-tree-width queries (t = tree-width): n^{t+1}·2^{O(k)}.

  33. A Tree-Based Heuristic • Extract several spanning trees from the original query.

  34. A Tree-Based Heuristic • Extract several spanning trees from the original query. • Query each spanning tree in the network.

  37. A Tree-Based Heuristic • Extract several spanning trees from the original query. • Query each spanning tree in the network. • Merge the matching trees to obtain matching graph.
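The slides do not say how the spanning trees are chosen. One simple option, assumed here, is Kruskal's algorithm on a randomly shuffled edge list; the merge step is modeled as the union of the network edges matched by the individual tree queries. All function names are hypothetical.

```python
import random


def random_spanning_tree(nodes, edges, rng):
    """Kruskal's algorithm on a shuffled edge list (query assumed connected)."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    order = list(edges)
    rng.shuffle(order)
    tree = []
    for u, v in order:
        ru, rv = find(u), find(v)
        if ru != rv:  # edge joins two components: keep it
            parent[ru] = rv
            tree.append((u, v))
    return tree


def spanning_trees(nodes, edges, count, seed=0):
    """Up to `count` distinct random spanning trees of the query graph."""
    rng = random.Random(seed)
    seen, trees = set(), []
    for _ in range(count):
        tree = random_spanning_tree(nodes, edges, rng)
        key = frozenset(frozenset(e) for e in tree)
        if key not in seen:
            seen.add(key)
            trees.append(tree)
    return trees


def merge_matches(matched_trees):
    """Union of the (undirected) network edges matched by each tree query."""
    return {frozenset(e) for tree in matched_trees for e in tree}
```

Each tree would be queried with a tree algorithm such as the colorful-tree DP, and the matched edge lists passed to merge_matches.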

  38. Test 1: Importance of Topology • Motivation: Is sequence similarity enough to find the corresponding subnetwork? • Queries: • Random tree queries from the yeast DIP network [Salwinski, 2004]. • Topology perturbed (≤ 2 insertions/deletions). • Network: • Yeast PPI. • Protein sequences mutated (50-70 percent). • How distant is the result from the original extracted tree?

  39. Test 1: Importance of Topology [Figure: average distance vs. #ins + #del, for BLAST and for QNet.] • Distance = #missing proteins + #extra proteins. • QNet outperforms sequence-based searches.
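The distance used in Test 1 equals the size of the symmetric difference between the protein set of the original tree and that of the match. A one-function sketch (hypothetical name):

```python
def match_distance(true_proteins, matched_proteins):
    """#missing proteins + #extra proteins (size of the symmetric difference)."""
    truth, found = set(true_proteins), set(matched_proteins)
    missing = truth - found  # in the original tree but not in the match
    extra = found - truth    # in the match but not in the original tree
    return len(missing) + len(extra)
```

For example, an original tree {a, b, c} matched as {b, c, d, e} has one missing and two extra proteins, a distance of 3.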

  40. Test 2: Cross-species Comparison of MAPK Pathways [Figure: query from human; match in fly.] • Motivation: finding conserved pathways. • Query: human MAPK pathway involved in cell proliferation and differentiation. • Network: fly PPI network (~7K proteins, ~20K interactions). • Match: a known fly MAPK pathway involved in dorsal pattern formation.

  41. Test 3: Cross-species Comparison of Protein Complexes • Motivation: conserved protein complexes between yeast and fly. • Queries: • Hand-curated yeast MIPS complexes. • Project onto yeast DIP network. • Extract several spanning trees.

  42. Test 3: Cross-species Comparison of Protein Complexes • Motivation: conserved protein complexes between yeast and fly. • Queries: • Hand-curated yeast MIPS complexes. • Project onto yeast DIP network. • Extract several spanning trees. • Network: fly DIP network. • Match: consensus matching graph for each query complex.

  43. Test 3: Cross-species Comparison of Protein Complexes [Figure: the yeast Cdc28p complex and its match in fly.] Result: • ~40 of the queries resulted in a match with more than one protein. • 72% of the consensus matches were functionally enriched. • In comparison, 17% of random trees extracted from the network are functionally enriched.

  44. Conclusions • Fixed-parameter algorithms for querying paths and trees. • Definition of a match: homeomorphism. • General queries: • Yang & Sze, JCB’07: branch-and-bound. • Alignment-based approaches.

  45. Acknowledgments Danny Segal, TAU Trey Ideker, UCSD Richard Karp, ICSI Eytan Ruppin Tomer Shlomi Vineet Bafna, UCSD Banu Dost Nitin Gupta
