Biological Networks Analysis: Introduction and Dijkstra’s algorithm
Genome 559: Introduction to Statistical and Computational Genomics
Elhanan Borenstein
A quick review
• The clustering problem:
  • Partition genes into distinct sets with high homogeneity and high separation
• Hierarchical clustering algorithm:
  1. Assign each object to a separate cluster.
  2. Merge the pair of clusters with the shortest distance.
  3. Repeat 2 until there is a single cluster.
  • Many possible distance metrics
• K-means clustering algorithm:
  1. Arbitrarily select k initial centers
  2. Assign each element to the closest center (Voronoi diagram)
  3. Re-calculate centers (i.e., means)
  4. Repeat 2 and 3 until the termination condition is reached
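The k-means steps above can be sketched in a few lines of Python. This is a minimal illustration, not the course's implementation: the function name `kmeans`, the squared Euclidean distance, the "no center moved" termination condition, and the toy points are all assumptions, and the initial centers are passed in rather than chosen arbitrarily so that the result is reproducible.

```python
def kmeans(points, centers, max_iter=100):
    """Plain k-means on tuples of numbers; `centers` are the initial centers."""
    centers = [tuple(c) for c in centers]
    clusters = [[] for _ in centers]
    for _ in range(max_iter):
        # Assign each point to its closest center (squared Euclidean distance).
        clusters = [[] for _ in centers]
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            clusters[dists.index(min(dists))].append(p)
        # Re-calculate each center as the mean of its cluster.
        new_centers = []
        for c, members in zip(centers, clusters):
            if members:
                new_centers.append(
                    tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(c)  # an empty cluster keeps its old center
        # Assumed termination condition: stop when no center moved.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters


# Hypothetical toy data: two well-separated groups of 2-D points.
points = [(0, 0), (0, 2), (10, 10), (10, 12)]
centers, clusters = kmeans(points, centers=[(0, 0), (10, 10)])
print(centers)   # [(0.0, 1.0), (10.0, 11.0)]
print(clusters)  # [[(0, 0), (0, 2)], [(10, 10), (10, 12)]]
```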
Biological networks
• What is a network?
• What networks are used in biology?
• Why do we need networks (and network theory)?
• How do we find the shortest path between two nodes?
What is a network?
• A map of interactions or relationships
• A collection of nodes and links (edges)
Types of networks
• Edges:
  • Directed/undirected
  • Weighted/non-weighted
  • Simple edges/hyperedges
• Special topologies:
  • Directed Acyclic Graphs (DAG)
  • Trees
  • Bipartite networks
Transcriptional regulatory networks
• Reflect the cell’s genetic regulatory circuitry
• Nodes: transcription factors (TFs) and genes
• Edges: from a TF to the genes it regulates
• Directed; weighted?; “almost” bipartite
• Derived through:
  • Chromatin IP
  • Microarrays
  • Computationally
Metabolic networks
• Reflect the set of biochemical reactions in a cell
• Nodes: metabolites
• Edges: biochemical reactions
• Directed; weighted?; hyperedges?
• Derived through:
  • Knowledge of biochemistry
  • Metabolic flux measurements
  • Homology?
• Example: S. cerevisiae, 1062 metabolites, 1149 reactions
Protein-protein interaction (PPI) networks
• Reflect the cell’s molecular interactions and signaling pathways (interactome)
• Nodes: proteins
• Edges: interactions(?)
• Undirected
• Derived through high-throughput experiments:
  • Protein Complex-IP (Co-IP)
  • Yeast two-hybrid
  • Computationally
• Example: S. cerevisiae, 4389 proteins, 14319 interactions
Non-biological networks
• Computer related networks:
  • WWW; Internet backbone
  • Communications and IP
• Social networks:
  • Friendship (Facebook; clubs)
  • Citations / information flow
  • Co-authorships (papers)
  • Co-occurrence (movies; jazz)
• Transportation:
  • Highway systems; airline routes
• Electronic/logic circuits
• Many, many more…
Why networks?
• Networks as models
• Networks as tools
• Simple, visual representation of complex systems
• Focus on organization (rather than on components)
• Algorithm development
• Problem representation (more common than you think)
• Discovery (topology affects function)
• Predictive models
• Diffusion models (dynamics)
The Seven Bridges of Königsberg
• Published by Leonhard Euler, 1736
• Considered the first paper in graph theory
Leonhard Euler, 1707–1783
The shortest path problem
• Find the minimal number of “links” connecting node A to node B in an undirected network
  • How many friends are between you and someone on Facebook (six degrees of separation)
  • Erdős number, Kevin Bacon number
  • How far apart are two genes in an interaction network
  • What is the shortest (and most likely) infection path
• Find the shortest (cheapest) path between two nodes in a weighted directed graph
  • GPS; Google Maps
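For the first, unweighted version of the problem (minimal number of links), a breadth-first search is enough. The sketch below is only an illustration: the function name `bfs_distance`, the dictionary-of-lists representation, and the toy friendship network are assumptions, not part of the lecture.

```python
from collections import deque


def bfs_distance(neighbors, source, target):
    """Return the minimal number of links from source to target, or None."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return dist[node]
        for nxt in neighbors.get(node, ()):
            if nxt not in dist:  # first visit is along a shortest route
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return None  # target is not reachable from source


# Hypothetical friendship network (undirected: each link is listed both ways).
friends = {
    "you": ["ann", "bob"],
    "ann": ["you", "cat"],
    "bob": ["you"],
    "cat": ["ann", "dan"],
    "dan": ["cat"],
    "eve": [],
}
print(bfs_distance(friends, "you", "dan"))  # 3
print(bfs_distance(friends, "you", "eve"))  # None
```

Breadth-first search ignores edge weights, so it does not solve the second (weighted) version of the problem.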
Dijkstra’s Algorithm
"Computer Science is no more about computers than astronomy is about telescopes."
Edsger Wybe Dijkstra, 1930–2002
Dijkstra’s algorithm
• Solves the single-source shortest path problem:
  • Find the shortest path from a single source to ALL nodes in the network
  • Works on both directed and undirected networks
  • Works on both weighted and non-weighted networks (edge weights must be non-negative)
• Approach:
  • Iterative
  • Maintain shortest path to each intermediate node
• Greedy algorithm
  • … but still guaranteed to provide the optimal solution!
Dijkstra’s algorithm
1. Initialize:
  • Assign a distance value, D, to each node. Set D to zero for the start node and to infinity for all others.
  • Mark all nodes as unvisited.
  • Set the start node as the current node.
2. For each of the current node’s unvisited neighbors:
  • Calculate the tentative distance, Dt, through the current node.
  • If Dt is smaller than D (the previously recorded distance): D ← Dt
3. Mark the current node as visited (note: its shortest distance has been found).
4. Set the unvisited node with the smallest distance as the next "current node" and continue from step 2.
5. Once all nodes are marked as visited, finish.
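The procedure above translates almost line by line into Python. This is a minimal sketch that favors readability over speed: the dictionary-of-dictionaries graph format, the function name `dijkstra`, and the toy network are assumptions, and every node (including those with no outgoing edges) must appear as a key of the graph.

```python
import math


def dijkstra(graph, start):
    """graph maps node -> {neighbor: non-negative edge weight}."""
    # Initialize: D is 0 for the start node and infinity for all others.
    D = {node: math.inf for node in graph}
    D[start] = 0
    unvisited = set(graph)
    current = start
    while True:
        # Check each unvisited neighbor of the current node.
        for nbr, weight in graph[current].items():
            if nbr in unvisited:
                Dt = D[current] + weight  # tentative distance via current
                if Dt < D[nbr]:
                    D[nbr] = Dt
        # Mark the current node as visited; its distance is now final.
        unvisited.remove(current)
        # Finish once all nodes are marked as visited.
        if not unvisited:
            return D
        # The unvisited node with the smallest D becomes the current node.
        current = min(unvisited, key=D.get)


# Hypothetical undirected toy network (each edge is listed in both directions).
toy = {
    "s": {"a": 4, "b": 1},
    "a": {"s": 4, "b": 2, "t": 5},
    "b": {"s": 1, "a": 2, "t": 8},
    "t": {"a": 5, "b": 8},
}
print(dijkstra(toy, "s"))  # {'s': 0, 'a': 3, 'b': 1, 't': 8}
```

Scanning all unvisited nodes for the minimum makes this version quadratic in the number of nodes; a priority queue is the usual refinement for larger networks.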
Dijkstra’s algorithm
• A simple synthetic network
[Figure: weighted network with six nodes, A to F; edge weights shown: 2, 5, 9, 1, 4, 9, 3, 7, 3, 12, 2]
Dijkstra’s algorithm
• Initialization
• Mark A (start) as current node
[Figure: D values: A: 0; B: ∞; C: ∞; D: ∞; E: ∞; F: ∞]
Dijkstra’s algorithm
• Check unvisited neighbors of A
• B: 0+9 vs. ∞; C: 0+3 vs. ∞
[Figure: D values: A: 0; B: ∞; C: ∞; D: ∞; E: ∞; F: ∞]
Dijkstra’s algorithm
• Update D
• Record path
[Figure: D values: A: 0; B: ∞ → 9; C: ∞ → 3; D: ∞; E: ∞; F: ∞]
Dijkstra’s algorithm
• Mark A as visited …
[Figure: D values: A: 0; B: 9; C: 3; D: ∞; E: ∞; F: ∞]
Dijkstra’s algorithm
• Mark C as current (unvisited node with smallest D)
[Figure: D values: A: 0; B: 9; C: 3; D: ∞; E: ∞; F: ∞]
Dijkstra’s algorithm
• Check unvisited neighbors of C
• D: 3+3 vs. ∞; B: 3+4 vs. 9; E: 3+2 vs. ∞
[Figure: D values: A: 0; B: 9; C: 3; D: ∞; E: ∞; F: ∞]
Dijkstra’s algorithm
• Update distance
• Record path
[Figure: D values: A: 0; B: 9 → 7; C: 3; D: ∞ → 6; E: ∞ → 5; F: ∞]
Dijkstra’s algorithm
• Mark C as visited
• Note: Distance to C is final!
[Figure: D values: A: 0; B: 7; C: 3; D: 6; E: 5; F: ∞]
Dijkstra’s algorithm
• Mark E as current node
• Check unvisited neighbors of E
[Figure: D values: A: 0; B: 7; C: 3; D: 6; E: 5; F: ∞]
Dijkstra’s algorithm
• Update D
• Record path
[Figure: D values: A: 0; B: 7; C: 3; D: 6; E: 5; F: ∞ → 17]
Dijkstra’s algorithm
• Mark E as visited
[Figure: D values: A: 0; B: 7; C: 3; D: 6; E: 5; F: 17]
Dijkstra’s algorithm
• Mark D as current node
• Check unvisited neighbors of D
[Figure: D values: A: 0; B: 7; C: 3; D: 6; E: 5; F: 17]
Dijkstra’s algorithm
• Update D
• Record path (note: path has changed)
[Figure: D values: A: 0; B: 7; C: 3; D: 6; E: 5; F: 17 → 11]
Dijkstra’s algorithm
• Mark D as visited
[Figure: D values: A: 0; B: 7; C: 3; D: 6; E: 5; F: 11]
Dijkstra’s algorithm
• Mark B as current node
• Check neighbors
[Figure: D values: A: 0; B: 7; C: 3; D: 6; E: 5; F: 11]
Dijkstra’s algorithm
• No updates
• Mark B as visited
[Figure: D values: A: 0; B: 7; C: 3; D: 6; E: 5; F: 11]
Dijkstra’s algorithm
• Mark F as current
[Figure: D values: A: 0; B: 7; C: 3; D: 6; E: 5; F: 11]
Dijkstra’s algorithm
• Mark F as visited
[Figure: D values: A: 0; B: 7; C: 3; D: 6; E: 5; F: 11]
We are done!
• We now have:
  • Shortest path from A to each node (both length and path)
  • A shortest-path tree rooted at A (in general this is not a minimum spanning tree)
• Final D values: A: 0; B: 7; C: 3; D: 6; E: 5; F: 11
• Will we always get a tree?
• Can you prove it?
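The walkthrough can be replayed in code, this time also recording the paths. The sketch below rests on several assumptions: it includes only the seven edges the walkthrough actually relaxes (A-B 9, A-C 3, C-B 4, C-D 3, C-E 2, E-F 12, D-F 5, as inferred from the tentative distances), leaves out the remaining edges in the figure because their endpoints cannot be recovered from the text, treats edges as undirected, and uses a priority queue (`heapq`) instead of scanning for the smallest D. The function names are illustrative.

```python
import heapq
import math

# Assumption: only the edges used in the walkthrough, taken as undirected.
EDGES = [("A", "B", 9), ("A", "C", 3), ("C", "B", 4), ("C", "D", 3),
         ("C", "E", 2), ("E", "F", 12), ("D", "F", 5)]


def build_graph(edges):
    """Turn (u, v, weight) triples into an undirected adjacency dictionary."""
    graph = {}
    for u, v, w in edges:
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w
    return graph


def dijkstra_tree(graph, start):
    """Return (dist, parent); parent records the path found to each node."""
    dist = {node: math.inf for node in graph}
    dist[start] = 0
    parent = {start: None}
    heap = [(0, start)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue  # stale queue entry for an already finalized node
        visited.add(node)
        for nbr, w in graph[node].items():
            if nbr not in visited and d + w < dist[nbr]:
                dist[nbr] = d + w
                parent[nbr] = node  # "record path"
                heapq.heappush(heap, (d + w, nbr))
    return dist, parent


def path_to(parent, node):
    """Follow parent links back to the start and return the path in order."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


dist, parent = dijkstra_tree(build_graph(EDGES), "A")
print(dist)                  # {'A': 0, 'B': 7, 'C': 3, 'D': 6, 'E': 5, 'F': 11}
print(path_to(parent, "F"))  # ['A', 'C', 'D', 'F']
```

Each reachable node other than the start ends up with exactly one parent, which is a useful starting point for the question of whether the recorded paths always form a tree.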
Computational Representation of Networks
[Figure: example network with nodes A, B, C, D]
• List of edges: (ordered) pairs of nodes: [ (A,C), (C,B), (D,B), (D,C) ]
• Connectivity matrix
• Object oriented: one object per node, with fields Name and ngr holding pointers (p1, p2) to other nodes
• Which is the most useful representation?
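The three representations can be built from the same edge list, as in the sketch below. Some assumptions: the pairs are read as directed (source, target) edges, `ngr` is interpreted as a node's list of neighbors, and the `Node` class and variable names are illustrative rather than taken from the course code.

```python
# List of edges from the slide, read as directed (source, target) pairs.
edges = [("A", "C"), ("C", "B"), ("D", "B"), ("D", "C")]
nodes = sorted({name for edge in edges for name in edge})

# Connectivity matrix: matrix[i][j] is 1 if there is an edge from i to j.
index = {name: i for i, name in enumerate(nodes)}
matrix = [[0] * len(nodes) for _ in nodes]
for src, dst in edges:
    matrix[index[src]][index[dst]] = 1


# Object oriented: each node object holds references to its neighbors.
class Node:
    def __init__(self, name):
        self.name = name
        self.ngr = []  # assumed meaning: neighbor list


objects = {name: Node(name) for name in nodes}
for src, dst in edges:
    objects[src].ngr.append(objects[dst])

for row in matrix:
    print(row)
print([n.name for n in objects["D"].ngr])  # ['B', 'C']
```

Which representation is most useful depends on the task: the edge list is compact, the matrix answers "is there an edge from i to j?" in constant time, and the object form makes it easy to walk from a node to its neighbors.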