SubMAP Aligning metabolic pathways with subnetwork mappings Ferhat Ay, Tamer Kahveci RECOMB 2010 Bioinformatics Lab. University of Florida
What is Pathway Alignment?
[Figure: Pathway1 (reactions R1 to R8) aligned with Pathway2 (reactions R1 to R7)]
• Global Alignment is GI-Complete (as hard as graph isomorphism)
• Local Alignment is NP-Complete
Existing Methods
• Heymans et al., Bioinformatics (2003) – Undirected, Hierarchical Enzyme Similarity
• Clemente et al., Genome Informatics (2005) – Gene Ontology Similarity of Enzymes
• Pinter et al., Bioinformatics (2005) – Directed, Only Multi-Source Trees
• Singh et al., RECOMB (2007) – PPI Networks, Sequence Similarity
• Dost et al., RECOMB (2007) – QNET, Color Coding, Tree queries of size at most 9
• Cheng et al., Bioinformatics (2009) – MetNetAligner, Allows insertions & deletions
PROBLEMS
• Restriction to one-to-one Mappings
• Similarity of Biological Functions
• Topology Restrictions
Different Paths, Same Function
[Figure: alternative paths in E. coli and A. thaliana]
One-to-many (Subnetwork) Mappings
• Biologically Relevant
• Frequently Observed in Nature
• Characterize Similarity
CHALLENGES
• Exponential Number of Subnetworks
• Defining Similarity Between Subnetworks
• Overlapping Mappings (Consistency)
Outline
• Enumerating Subnetworks
• Homological Similarity
  – One-to-one Mapping
  – Subnetwork Mappings
• Topological Similarity
  – One-to-one Mapping
  – Subnetwork Mappings
• Combining Homology & Topology
• Extracting Mappings
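Enumeration of Subnetworks
[Figure: example pathway with reactions 1 to 6]
• R1 (subnetworks of size 1): {1}, {2}, {3}, {4}, {5}, {6}
• R2 (subnetworks of size 2): {1,3}, {2,3}, {3,4}, {3,5}, {5,6}
• R3 (subnetworks of size 3): {1,2,3}, {1,3,4}, {1,3,5}, {2,3,4}, {2,3,5}, {3,4,5}, {3,5,6}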
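The enumeration on this slide can be sketched as growing connected reaction sets one reaction at a time. This is only an illustrative sketch, not the SubMAP implementation: the function name `enumerate_subnetworks` is hypothetical, the edge list is inferred from the size-2 subnetworks listed on the slide, and connectivity is treated as undirected.

```python
def enumerate_subnetworks(edges, k):
    """Return {size: set of frozensets} of connected reaction sets up to size k."""
    # Build an undirected adjacency map (assumption: direction is ignored here).
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    levels = {1: {frozenset([v]) for v in adj}}
    for size in range(2, k + 1):
        grown = set()
        # Every connected set of this size extends a smaller connected set
        # by one reaction adjacent to it.
        for sub in levels[size - 1]:
            for v in sub:
                for nb in adj[v] - sub:
                    grown.add(sub | {nb})
        levels[size] = grown
    return levels


# Edges inferred from the size-2 list on the slide.
edges = [(1, 3), (2, 3), (3, 4), (3, 5), (5, 6)]
levels = enumerate_subnetworks(edges, 3)
```

With these edges, `levels[3]` contains the seven size-3 subnetworks listed on the slide. The number of sets grows quickly with k, which is the "exponential number of subnetworks" challenge named earlier in the deck.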
Homological Similarities (1-to-1)
Enzyme Similarity (SimE)
• Hierarchical Enzyme Similarity – Webb EC. (2002)
• Information-Content Enzyme Similarity – Pinter et al. (2005)
• Gene Ontology Similarity of Enzymes – Clemente et al. (2003)
Compound Similarity (SimC)
• Identity Score for compounds
• SIMCOMP Compound Similarity – Hattori et al. (2003)
Reaction Similarity (SimR)
• Defined in terms of SimE and SimC
[Figure: example compounds L-Aspartate and L-Lysine]
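Homological Similarities (Subnetworks)
• Subnetwork 1 (s1): input compounds, enzymes, output compounds
• Subnetwork 2 (s2): input compounds, enzymes, output compounds
$\mathrm{Sim}(s_1, s_2) = w_1\,\mathrm{MWBM}(\text{inputs}) + w_2\,\mathrm{MWBM}(\text{enzymes}) + w_3\,\mathrm{MWBM}(\text{outputs})$
(MWBM: maximum weight bipartite matching)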
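A minimal sketch of the weighted sum above, assuming small subnetworks so that the matching can be found by brute force. The names `mwbm` and `subnetwork_similarity`, the identity score, the example weights, and the lack of any normalization are all assumptions made for illustration; the slide does not specify them.

```python
from itertools import permutations


def mwbm(a, b, sim):
    """Best total similarity of a one-to-one matching between lists a and b."""
    if len(a) > len(b):
        # Match the shorter list into the longer one; keep argument order for sim.
        return mwbm(b, a, lambda x, y: sim(y, x))
    best = 0.0
    for perm in permutations(b, len(a)):
        best = max(best, sum(sim(x, y) for x, y in zip(a, perm)))
    return best


def subnetwork_similarity(s1, s2, sim_c, sim_e, w=(0.25, 0.5, 0.25)):
    """Sim(s1, s2) = w1*MWBM(inputs) + w2*MWBM(enzymes) + w3*MWBM(outputs)."""
    return (w[0] * mwbm(s1["inputs"], s2["inputs"], sim_c)
            + w[1] * mwbm(s1["enzymes"], s2["enzymes"], sim_e)
            + w[2] * mwbm(s1["outputs"], s2["outputs"], sim_c))


def identity(x, y):
    # Placeholder score: 1 for identical labels, 0 otherwise.
    return 1.0 if x == y else 0.0


# Hypothetical subnetworks, made up for this example.
s1 = {"inputs": ["A"], "enzymes": ["1.1.1.1", "2.2.2.2"], "outputs": ["C"]}
s2 = {"inputs": ["A"], "enzymes": ["1.1.1.1"], "outputs": ["D"]}
score = subnetwork_similarity(s1, s2, identity, identity)
```

Here the inputs and one enzyme match while the outputs do not, so the score is 0.25 + 0.5 + 0 = 0.75 under the assumed weights.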
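Topological Similarities (1-to-1)
• Pathway 1: |R| = 4, BN(R3) = {R1, R2}, FN(R3) = {R4}
• Pathway 2: |R'| = 4, BN(R3') = {R1'}, FN(R3') = {R4', R5'}
• BN: backward neighbors, FN: forward neighbors
$A[R_3, R_3'][R_2, R_1'] = \frac{1}{|BN(R_3)|\,|BN(R_3')| + |FN(R_3)|\,|FN(R_3')|} = \frac{1}{2 \cdot 1 + 1 \cdot 2} = \frac{1}{4}$
Size of A: $(|R|\,|R'|) \times (|R|\,|R'|) = 16 \times 16$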
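The support matrix can be sketched as a sparse dictionary keyed by reaction pairs. This is an illustration under assumptions: the edge lists below are inferred from the neighbor sets on the slide, the function names are hypothetical, and the denominator follows the worked example (neighbor counts of the pair being supported).

```python
def neighbors(nodes, edges):
    """Backward (BN) and forward (FN) neighbor sets for a directed graph."""
    bn = {n: set() for n in nodes}
    fn = {n: set() for n in nodes}
    for u, v in edges:
        fn[u].add(v)
        bn[v].add(u)
    return bn, fn


def support_matrix(nodes1, edges1, nodes2, edges2):
    """A[(i, j)][(u, v)] = 1 / (|BN(i)||BN(j)| + |FN(i)||FN(j)|) for neighbor pairs."""
    bn1, fn1 = neighbors(nodes1, edges1)
    bn2, fn2 = neighbors(nodes2, edges2)
    A = {}
    for i in nodes1:
        for j in nodes2:
            denom = len(bn1[i]) * len(bn2[j]) + len(fn1[i]) * len(fn2[j])
            row = {}
            # Pairs of backward neighbors and pairs of forward neighbors
            # support the pair (i, j); all other entries are implicitly 0.
            for u in bn1[i]:
                for v in bn2[j]:
                    row[(u, v)] = 1.0 / denom
            for u in fn1[i]:
                for v in fn2[j]:
                    row[(u, v)] = 1.0 / denom
            A[(i, j)] = row
    return A


# Edge lists inferred from BN/FN on the slide (an assumption).
p1_nodes, p1_edges = ["R1", "R2", "R3", "R4"], [("R1", "R3"), ("R2", "R3"), ("R3", "R4")]
p2_nodes, p2_edges = ["R1", "R3", "R4", "R5"], [("R1", "R3"), ("R3", "R4"), ("R3", "R5")]
A = support_matrix(p1_nodes, p1_edges, p2_nodes, p2_edges)
```

`A[("R3", "R3")][("R2", "R1")]` reproduces the value 1/4 from the slide, and the dictionary has 16 rows, matching the 16 x 16 size.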
Topological Similarities (Subnetworks)
[Figure: subnetworks S_i of Pathway 1 and S'_j of Pathway 2 with their backward & forward neighbors]
Support matrix entry for the pair {S_i, S'_j}:
$\frac{1}{|FN(S_i)|\,|FN(S'_j)| + |BN(S_i)|\,|BN(S'_j)|}$
Combining Homology & Topology
$H_{k+1} = \alpha A H_k + (1-\alpha) H_0$
Focus on the R3 – R3 matching:
• Iteration 0: Only pairwise similarity of R3 and R3
• Iteration 1: Support of aligned first-degree neighbors added
• Iteration 2: Support of aligned second-degree neighbors added
• Iteration 3: Support of aligned third-degree neighbors added
[Figure: two pathways with reactions R1 to R8, highlighting the neighborhood of R3 in each]
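Unrolling the recurrence shows why each iteration brings in neighbors one step further away. The derivation below uses only the symbols from the slide and assumes no renormalization between iterations.

```latex
\begin{aligned}
H_1 &= \alpha A H_0 + (1-\alpha) H_0 \\
H_2 &= \alpha A H_1 + (1-\alpha) H_0
     = \alpha^2 A^2 H_0 + \alpha(1-\alpha) A H_0 + (1-\alpha) H_0 \\
H_k &= \alpha^k A^k H_0 + (1-\alpha) \sum_{i=0}^{k-1} \alpha^i A^i H_0
\end{aligned}
```

The term $A^i H_0$ carries the pairwise similarity of neighbor pairs that are i steps away, so iteration k adds the support of k-th-degree neighbors, damped by $\alpha^k$.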
Combining Homology & Topology
$H_{k+1} = \alpha A H_k + (1-\alpha) H_0$
Initial similarity matrix (flattened into the H_0 vector):
0.5 1.0 0.4 0.3
0.3 0.5 0.8 0.8
0.1 1.0 0.2 0.9
Power method iterations update the H_k vector.
Final similarity matrix:
0.6 0.9 0.5 0.5
0.2 0.3 0.6 0.9
0.2 1.0 0.4 0.6
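The iteration can be sketched in a few lines of plain Python. This is a sketch under assumptions, not the authors' code: the function name `combine`, the value of alpha, the stopping tolerance, and the tiny 2 x 2 example are all made up for illustration, and no normalization step is applied.

```python
def combine(A, h0, alpha=0.5, max_iters=100, tol=1e-9):
    """Iterate H_{k+1} = alpha * A * H_k + (1 - alpha) * H_0 until it stabilizes."""
    h = list(h0)
    for _ in range(max_iters):
        # Matrix-vector product A * H_k with A given as a list of rows.
        ah = [sum(a * x for a, x in zip(row, h)) for row in A]
        new_h = [alpha * s + (1 - alpha) * b for s, b in zip(ah, h0)]
        delta = max(abs(x - y) for x, y in zip(new_h, h))
        h = new_h
        if delta < tol:
            break
    return h


# Toy example: two reaction pairs that support each other.
A_toy = [[0.0, 1.0],
         [1.0, 0.0]]
h0_toy = [1.0, 0.0]
h_final = combine(A_toy, h0_toy, alpha=0.5)
```

In the toy example the second pair starts with no homological similarity but ends near 1/3, because it is topologically supported by the first pair, which settles near 2/3.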
Extracting Mappings (1-to-1)
Similarity matrix (rows: reactions of one pathway, columns: reactions of the other):
      R1    R2    R3    R4
R1    0.8   0     0.4   0
R2    0.3   1.0   0     0.5
R3    0     0     0.6   0.9
Maximum Weight Bipartite Matching
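For a matrix this small, maximum weight bipartite matching can be illustrated by trying every assignment. The sketch below is a brute-force stand-in, not the algorithm used by SubMAP; the function name is hypothetical, and the matrix layout (3 rows by 4 columns) is an assumption about how the slide's values are arranged.

```python
from itertools import permutations


def max_weight_matching(sim):
    """Brute-force matching of every row to a distinct column (rows <= cols)."""
    rows, cols = len(sim), len(sim[0])
    assert rows <= cols, "transpose the matrix if it has more rows than columns"
    best_total, best_pairs = -1.0, []
    for perm in permutations(range(cols), rows):
        total = sum(sim[r][c] for r, c in enumerate(perm))
        if total > best_total:
            best_total, best_pairs = total, list(enumerate(perm))
    return best_total, best_pairs


# Values transcribed from the slide; row/column orientation is assumed.
sim = [
    [0.8, 0.0, 0.4, 0.0],
    [0.3, 1.0, 0.0, 0.5],
    [0.0, 0.0, 0.6, 0.9],
]
total, pairs = max_weight_matching(sim)
```

Under that layout the best matching pairs row R1 with column R1, R2 with R2, and R3 with R4, for a total of about 2.7. For larger matrices a polynomial-time assignment solver would replace the permutation loop.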
Extracting Mappings (Subnetworks)
• Mapping {1,2} → {1} conflicts with {1} → {2} and with {3} → {1}
• Mapping {4} → {3,4,5} conflicts with {4,5} → {5}
Maximum bipartite matching will fail!
How to Handle Conflicts?
Label   Subnetwork from pathway 1   Subnetwork from pathway 2   Weight
a       1,2                         1                           0.7
b       1                           2                           0.6
c       3                           1                           0.4
d       4                           3,4,5                       0.9
e       4,5                         5                           0.8
[Figure: conflict graph over the mappings a to e]
Find the set of non-conflicting mappings that maximizes the sum of similarity scores.
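The conflict graph can be derived directly from the table: two mappings conflict when they reuse a reaction from either pathway. A minimal sketch, with the hypothetical function name `conflict_edges` and the table values copied from the slide:

```python
from itertools import combinations

# label: (reactions from pathway 1, reactions from pathway 2, weight)
mappings = {
    "a": ({1, 2}, {1}, 0.7),
    "b": ({1}, {2}, 0.6),
    "c": ({3}, {1}, 0.4),
    "d": ({4}, {3, 4, 5}, 0.9),
    "e": ({4, 5}, {5}, 0.8),
}


def conflict_edges(mappings):
    """Return the set of label pairs whose mappings share a reaction."""
    edges = set()
    for x, y in combinations(sorted(mappings), 2):
        p1x, p2x, _ = mappings[x]
        p1y, p2y, _ = mappings[y]
        # A shared reaction on either side makes the two mappings inconsistent.
        if p1x & p1y or p2x & p2y:
            edges.add((x, y))
    return edges


edges = conflict_edges(mappings)
```

For this table the conflicts are a-b (reaction 1 of pathway 1), a-c (reaction 1 of pathway 2), and d-e.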
Maximum Weight Independent Set Problem
• Given an undirected vertex-weighted graph, find a vertex-induced subgraph
  – That maximizes the sum of the vertex weights (maximum weight)
  – That has no edges (independent set)
• NP-Hard – Karp, 1972
• Hard to approximate – Håstad, 1996 (There is no PTAS unless P=NP)
[Figure: example graph with an independent set of total weight 9 + 8 + 2 + 0 = 19]
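On very small conflict graphs the problem can be solved exactly by checking every vertex subset, which is useful as a reference when judging a heuristic. The sketch below is exponential in the number of vertices and is only an illustration; the function name is hypothetical, and the weights and conflict edges are taken from the deck's five-mapping example (labels a to e).

```python
from itertools import combinations


def exact_mwis(weights, edges):
    """Exhaustively find the independent set with the largest total weight."""
    verts = sorted(weights)
    best_set, best_weight = set(), 0.0
    for r in range(1, len(verts) + 1):
        for cand in combinations(verts, r):
            chosen = set(cand)
            # Skip candidates that contain both endpoints of some edge.
            if any(u in chosen and v in chosen for u, v in edges):
                continue
            w = sum(weights[v] for v in cand)
            if w > best_weight:
                best_set, best_weight = chosen, w
    return best_set, best_weight


weights = {"a": 0.7, "b": 0.6, "c": 0.4, "d": 0.9, "e": 0.8}
edges = [("a", "b"), ("a", "c"), ("d", "e")]
best_set, best_weight = exact_mwis(weights, edges)
```

For this example the optimum is the set {b, c, d} with total weight of about 1.9.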
Finding the Best Alignment is NP-Hard
MWIS problem in bounded-degree graphs with max degree k+1
≤ (reduces to)
Metabolic pathway alignment with subnetworks of size at most k
How do we find the mappings?
Choose the vertex v of the conflict graph that maximizes
$f(v) = \frac{w(v)}{\sum_{u \in N(v)} w(u)}$
Label   Subnetwork from pathway 1   Subnetwork from pathway 2   Weight
a       1,2                         1                           0.7
b       1                           2                           0.6
c       3                           1                           0.4
d       4                           3,4,5                       0.9
e       4,5                         5                           0.8
• First pass: f(a) = 0.7, f(b) = 0.6/0.7, f(c) = 0.4/0.7, f(d) = 0.9/0.8, f(e) = 0.8/0.9
• After d is added to the alignment and its conflicting mapping e is dropped: f(a) = 0.7, f(b) = 0.6/0.7, f(c) = 0.4/0.7
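The selection rule can be sketched as a greedy loop over the conflict graph. This is an illustrative reading of the slide, not the authors' implementation: the function name `greedy_alignment` is hypothetical, and two details are assumptions the slide does not spell out, namely that the chosen vertex's neighbors are removed after each pick and that a vertex with no remaining neighbors gets an infinite score.

```python
import math


def greedy_alignment(weights, conflicts):
    """Repeatedly pick the mapping maximizing w(v) / sum of neighbor weights."""
    nbrs = {v: set() for v in weights}
    for u, v in conflicts:
        nbrs[u].add(v)
        nbrs[v].add(u)
    alive = set(weights)
    chosen = []

    def score(v):
        denom = sum(weights[u] for u in nbrs[v] if u in alive)
        # Assumption: a conflict-free mapping is always worth taking.
        return math.inf if denom == 0 else weights[v] / denom

    while alive:
        best = max(sorted(alive), key=score)
        chosen.append(best)
        # Drop the chosen mapping and everything that conflicts with it.
        alive -= nbrs[best] | {best}
    return chosen


weights = {"a": 0.7, "b": 0.6, "c": 0.4, "d": 0.9, "e": 0.8}
conflicts = [("a", "b"), ("a", "c"), ("d", "e")]
alignment = greedy_alignment(weights, conflicts)
```

On the slide's example the loop picks d first (f = 0.9/0.8), then b (f = 0.6/0.7), then c, giving the alignment {d, b, c} with total weight of about 1.9.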
EXPERIMENTAL RESULTS
Comparison MetNetAligner: Cheng & Zelikovsky, Bioinformatics 2009. SubMAP: Ay & Kahveci, RECOMB 2010.
Performance of our algorithm k : maximum size of subnetwork
Conclusion
• Considering subnetworks improves the accuracy of metabolic pathway alignment and reveals alternative paths that are biologically relevant.
• Alignments within and across clades have different characteristics in terms of their mapping cardinalities.
• SubMAP can be used effectively in applications that need to identify different entity sets with the same or similar functions, e.g., filling pathway holes and metabolic/phylogenetic reconstruction.