A Network-based Approach for Predicting Missing Pathway Interactions Ankush Bansal 658347261
Outline • Protein-Protein Interactions and Signaling pathway • Problem statement and variations of problem • Shortest distance algorithm • Greedy approach to predict missing edges • Other approaches • Results • Future Applications
Signaling Pathway • Sub-networks of proteins that communicate via a series of interactions • Contains upstream proteins • Source proteins transmit information to a set of target proteins
Problem Statement Searching for missing edges that will maximally decrease the shortest-path distances between sources & targets
1. Shortcuts Given a weighted digraph G = (V, E), a set of sources S ⊆ V, and a set of targets T ⊆ V, find k edges to add that minimize the total shortest-path distance over all source-target pairs.
2. Shortcuts-X (Restricted) Given a weighted digraph G = (V, E), a set of sources S ⊆ V, a set of targets T ⊆ V, and a maximum allowable number of hops r, find k edges to add that minimize the total shortest-path distance over all source-target pairs, using paths of at most r hops.
3. Shortcuts-SS (Single Source) Given a weighted digraph G = (V, E), a set of sources S ⊆ V, and a set of targets T ⊆ V, find k edges to add that minimize the total shortest-path distance between each target and its single closest source.
4. Shortcuts-X-SS (Restricted, Single Source) Given a weighted digraph G = (V, E), a set of sources S ⊆ V, a set of targets T ⊆ V, and a maximum allowable number of hops r, find k edges to add that minimize the total shortest-path distance between each target and its single closest source, using paths of at most r hops.
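The two kinds of objective above differ only in how distances are aggregated. A minimal sketch, assuming the shortest-path distances are already available in a nested dictionary `dist[s][t]`; the function names and the toy distances are hypothetical and not taken from the slides:

```python
def total_pair_cost(dist, sources, targets):
    """Shortcuts-style objective: sum over all source-target pairs."""
    return sum(dist[s][t] for s in sources for t in targets)


def closest_source_cost(dist, sources, targets):
    """Single-source (SS) objective: each target counts only its closest source."""
    return sum(min(dist[s][t] for s in sources) for t in targets)


# Hypothetical toy distances, for illustration only.
dist = {
    "s1": {"t1": 4.0, "t2": 7.0},
    "s2": {"t1": 2.0, "t2": 9.0},
}
sources, targets = ["s1", "s2"], ["t1", "t2"]

all_pairs = total_pair_cost(dist, sources, targets)    # 4 + 7 + 2 + 9
closest = closest_source_cost(dist, sources, targets)  # 2 (t1) + 7 (t2)
```

For the hop-restricted variants, the same aggregation would apply to distances computed over paths of at most r hops.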
Dijkstra’s Algorithm • Greedy algorithm to solve the single-source shortest-path problem • Doesn’t work with negative edge weights • Time complexity O(|E|+|V|log|V|) with a Fibonacci-heap priority queue, where |E| is # of edges and |V| is # of vertices
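A minimal sketch of Dijkstra's algorithm, assuming the graph is stored as a dictionary of dictionaries (a representation chosen here for illustration). It uses Python's binary heap from `heapq`, so its running time is O((|E|+|V|) log |V|) rather than the Fibonacci-heap bound quoted above:

```python
import heapq


def dijkstra(graph, source):
    """Shortest distances from source; graph[u][v] is a non-negative weight."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry; a shorter path to u was already found
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist  # unreachable nodes are simply absent


# Hypothetical toy graph.
graph = {"a": {"b": 1.0, "c": 4.0}, "b": {"c": 2.0}, "c": {}}
distances = dijkstra(graph, "a")  # a -> b -> c (cost 3) beats a -> c (cost 4)
```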
Bellman-Ford Algorithm • Another algorithm to find shortest distances from one source node • Can handle negative edge weights in the graph, as long as no negative cycle is reachable from the source • Time complexity is O(|V|·|E|), where |V| and |E| are the number of vertices and edges respectively
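A minimal sketch of Bellman-Ford, assuming the graph is given as a node list plus a list of `(u, v, weight)` triples (a representation chosen here for illustration):

```python
def bellman_ford(nodes, edges, source):
    """Shortest distances from source; edges is a list of (u, v, weight)."""
    inf = float("inf")
    dist = {v: inf for v in nodes}
    dist[source] = 0.0
    # At most |V| - 1 rounds of relaxing every edge.
    for _ in range(len(nodes) - 1):
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            break  # distances have converged early
    # If any edge can still be relaxed, a negative cycle is reachable.
    for u, v, w in edges:
        if dist[u] + w < dist[v]:
            raise ValueError("negative cycle reachable from source")
    return dist


# Hypothetical toy graph with one negative edge.
nodes = ["a", "b", "c"]
edges = [("a", "b", 4.0), ("a", "c", 5.0), ("c", "b", -3.0)]
distances = bellman_ford(nodes, edges, "a")  # a -> c -> b costs 5 - 3 = 2
```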
Greedy Algorithm to predict pathway-consistent edges • Adds k edges iteratively, at each step selecting the edge that maximally reduces the cost function • # of non-existent edges is n(n-1) - m, where n and m are the # of nodes and directed edges • Naively, evaluating a single candidate edge requires recomputing the shortest-path length from each source to each target
Greedy Algorithm to predict pathway-consistent edges (cont’d) • Trick is to pre-compute the shortest-path distance from every source to every other node, and from every node to every target • Then, for each candidate edge (u,v), check the following condition: d_prev(s,u) + d(u,v) + d_prev(v,t) < d_prev(s,t) • If it holds, adding (u,v) shortens the s-t distance to the left-hand side
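One greedy step using the pre-computed distances can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the candidate edges and their weights are passed in as a hypothetical dictionary `candidates[(u, v)] = weight`, the cost is the all-pairs (Shortcuts) objective, and distances to targets are obtained by running Dijkstra on the reversed graph.

```python
import heapq

INF = float("inf")


def dijkstra(graph, source):
    """Shortest distances from source; graph[u][v] is a non-negative weight."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, INF):
            continue  # stale heap entry
        for v, w in graph.get(u, {}).items():
            if d + w < dist.get(v, INF):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist


def reverse(graph):
    """Return the graph with every edge direction flipped."""
    rev = {u: {} for u in graph}
    for u, nbrs in graph.items():
        for v, w in nbrs.items():
            rev.setdefault(v, {})[u] = w
    return rev


def best_shortcut(graph, sources, targets, candidates):
    """Pick the candidate edge with the largest total distance reduction."""
    from_src = {s: dijkstra(graph, s) for s in sources}        # d_prev(s, .)
    rev = reverse(graph)
    to_tgt = {t: dijkstra(rev, t) for t in targets}            # d_prev(., t)
    best_edge, best_gain = None, 0.0
    for (u, v), w in candidates.items():
        gain = 0.0
        for s in sources:
            d_su = from_src[s].get(u, INF)
            for t in targets:
                d_st = from_src[s].get(t, INF)
                via_edge = d_su + w + to_tgt[t].get(v, INF)
                if via_edge < d_st:  # the condition from the slide, O(1) per pair
                    gain += d_st - via_edge  # infinite if s-t was unreachable
        if gain > best_gain:
            best_edge, best_gain = (u, v), gain
    return best_edge, best_gain


# Hypothetical toy pathway: s -> a -> b -> t with total length 7.
graph = {"s": {"a": 1.0}, "a": {"b": 5.0}, "b": {"t": 1.0}, "t": {}}
candidates = {("a", "t"): 1.0, ("s", "b"): 2.0}
edge, gain = best_shortcut(graph, ["s"], ["t"], candidates)
```

To select k edges, the chosen edge would be added to `graph` and the step repeated, which is where the per-step shortest-path recomputation comes from.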
Greedy Algorithm to predict pathway-consistent edges (cont’d)
Complexity Improved from O(n^2) · O(|E|+|V|log|V|) for each step to O(n^2) · O(1) + O(|E|+|V|log|V|) for each step
Hop-Restricted Greedy Algorithm • Uses a modified version of the Bellman-Ford algorithm that calculates shortest paths using at most r edges • r is generally 5 edges between a target and its closest source • Time complexity is O(n^2) · O(1) + O(r|E|) for each step
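A minimal sketch of one way to restrict Bellman-Ford to at most r edges: run exactly r relaxation rounds, each reading only the distances from the previous round. The slides do not give the exact modification used, so this is an assumption; the function name and toy graph are hypothetical.

```python
def hop_limited_distances(nodes, edges, source, r):
    """Shortest distances from source over paths with at most r edges."""
    inf = float("inf")
    dist = {v: inf for v in nodes}
    dist[source] = 0.0
    for _ in range(r):
        prev = dict(dist)  # snapshot, so each round extends paths by one edge only
        for u, v, w in edges:
            if prev[u] + w < dist[v]:
                dist[v] = prev[u] + w
    return dist


# Hypothetical toy graph: a direct but expensive edge s -> t,
# and a cheap three-hop route s -> a -> b -> t.
nodes = ["s", "a", "b", "t"]
edges = [("s", "t", 10.0), ("s", "a", 1.0), ("a", "b", 1.0), ("b", "t", 1.0)]

two_hops = hop_limited_distances(nodes, edges, "s", 2)    # only s -> t is usable
three_hops = hop_limited_distances(nodes, edges, "s", 3)  # cheap route now allowed
```

Each round scans every edge once, which gives the O(r|E|) term in the per-step cost.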
Other Algorithms to predict missing interactions • Direct-ST • Predicts direct edges from sources to targets • Reduces the cost function maximally • Betweenness • Predicts edges that are highly “central” to the sources and targets • The number of all-pair shortest paths that use an edge is its “betweenness centrality” • Considers only source-target pairs
Other Algorithms to predict missing interaction (cont’d) • Jaccard • Add an edge between the two proteins with the highest weighted Jaccard coefficient J(u,v) =
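A minimal sketch of a weighted Jaccard score, assuming one common definition: the sum over neighbors of the smaller edge weight divided by the sum of the larger edge weight. The definition used in the slides may differ; the function name and toy weights are hypothetical.

```python
def weighted_jaccard(weights_u, weights_v):
    """Weighted Jaccard of two nodes given their neighbor -> weight maps.

    Assumed definition: sum of min weights / sum of max weights,
    with a missing neighbor treated as weight 0.
    """
    neighbors = set(weights_u) | set(weights_v)
    numerator = sum(min(weights_u.get(x, 0.0), weights_v.get(x, 0.0)) for x in neighbors)
    denominator = sum(max(weights_u.get(x, 0.0), weights_v.get(x, 0.0)) for x in neighbors)
    return numerator / denominator if denominator > 0 else 0.0


# Hypothetical neighbor weights for two proteins u and v.
weights_u = {"a": 0.5, "b": 1.0}
weights_v = {"b": 0.5, "c": 1.0}
score = weighted_jaccard(weights_u, weights_v)  # 0.5 / 2.5
```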
Results (Criteria for evaluation) • Ability to reduce the cost function • Ability to predict edges that lie within the STRING potential edges • Ability to predict edges that lie within the STRING potential edges and HOG-related nodes
Summary • A new framework for predicting missing edges that lie “in-between” given sets of sources and targets • The greedy algorithm substantially reduced source-target distances by adding only a few edges • Shortcut edges formed alternate paths for signal flow, which provide a greater degree of robustness in the pathway
Summary (cont’d) • For Shortcuts, adding 3 edges reduced the distance for 27 out of 55 pairs • Similarly, for Shortcuts-X, adding 3 edges reduced the distance for 18 pairs • Hop-restricted objectives tend to select central nodes through which much of the signal flows
Future Applications • Reducing routing lags or increasing information flow between entities in a network • This pathway-specific context can be applied to other species with such data