Quantifying Network Topology
Jennifer Hallinan
ARC Centre for Bioinformatics, Institute for Molecular Biosciences & School of ITEE
j.hallinan@imb.uq.edu.au
Networks
Image sources:
• www.surrey.ac.uk/ SBMS/Fgenomics/
• http://radio.weblogs.com/0114726/2003/01/02.html
• http://www.life.uiuc.edu/bio100/lectures/s02lects/foodweb.gif
Network Analysis
What is a Network?
• Relational dataset
  • One or more sets with explicit relations between their members
• Nodes (Vertices)
  • People, actors, agents, entities (genes, proteins…)
• Edges
  • Links, ties, interactions
Network Data Representation
• Matrix
  • Not necessarily square
  • Closely related to graph theory
  • Useful for direct computation (matrix algebra)
  • Not very efficient for large sparse networks
• Linked list
  • Generally a list of nodes denoting their attributes
  • Followed by a list of edges
  • Less standardised
  • More efficient data storage for large networks
  • Less efficient computation
A Simple Network
    a  b  c  d  e  f  g  h  i
a   0  1  0  1  1  1  0  0  0
b   1  0  1  0  0  0  0  0  0
c   0  1  0  0  0  1  0  0  1
d   1  0  0  0  1  1  0  0  0
e   1  0  0  1  0  1  0  0  0
f   1  0  1  1  1  0  0  0  0
g   0  0  0  0  0  0  0  1  1
h   0  0  0  0  0  0  1  0  1
i   0  0  0  0  0  0  1  1  0
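As a rough illustration of the two representations, the sketch below stores the matrix from this slide and converts it to an edge list. The function names are illustrative assumptions, not part of the original slides. The matrix is copied exactly as printed, so the helper `asymmetric_entries` simply reports any entry that differs from its mirror image rather than correcting it.

```python
# Adjacency matrix from the "A Simple Network" slide, rows and columns a..i.
LABELS = "abcdefghi"
MATRIX = [
    [0, 1, 0, 1, 1, 1, 0, 0, 0],  # a
    [1, 0, 1, 0, 0, 0, 0, 0, 0],  # b
    [0, 1, 0, 0, 0, 1, 0, 0, 1],  # c
    [1, 0, 0, 0, 1, 1, 0, 0, 0],  # d
    [1, 0, 0, 1, 0, 1, 0, 0, 0],  # e
    [1, 0, 1, 1, 1, 0, 0, 0, 0],  # f
    [0, 0, 0, 0, 0, 0, 0, 1, 1],  # g
    [0, 0, 0, 0, 0, 0, 1, 0, 1],  # h
    [0, 0, 0, 0, 0, 0, 1, 1, 0],  # i
]


def matrix_to_edge_list(matrix, labels):
    """Return one (source, target) pair per non-zero matrix entry."""
    return [
        (labels[r], labels[c])
        for r, row in enumerate(matrix)
        for c, value in enumerate(row)
        if value
    ]


def asymmetric_entries(matrix, labels):
    """Return label pairs (r, c), r before c, where entry (r, c) != entry (c, r)."""
    n = len(matrix)
    return [
        (labels[r], labels[c])
        for r in range(n)
        for c in range(r + 1, n)
        if matrix[r][c] != matrix[c][r]
    ]


EDGES = matrix_to_edge_list(MATRIX, LABELS)
```

The edge list holds only the non-zero entries, which is why it tends to be the more compact choice for large sparse networks, while the matrix form allows direct lookup of any pair.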
Network structure: Random, Small world, Scale free
Image sources: www.discover.com/dec_issue/smallworld.html, www.21cmagazine.com
Network structure: connectivity distribution
Network Metrics
• Distance: minimum number of links between a pair of nodes, may be weighted
• Diameter: longest shortest path between all pairs of nodes
• Density: number of edges as a fraction of all possible edges
• Cluster coefficient: measure of attachment amongst neighbours
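The four metrics above can be sketched for a small unweighted, undirected network as follows. This is a minimal sketch under stated assumptions: the network is a dict mapping each node to a set of neighbours, the example graph `EXAMPLE` is invented for illustration, and the cluster coefficient is taken to be the local form (the fraction of a node's neighbour pairs that are themselves linked), which is one common reading of "attachment amongst neighbours".

```python
from collections import deque

# Hypothetical example: triangle a-b-c with a tail c-d-e.
EXAMPLE = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}


def bfs_distances(adj, source):
    """Distance (number of links) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def diameter(adj):
    """Longest shortest path over all pairs of mutually reachable nodes."""
    return max(max(bfs_distances(adj, n).values()) for n in adj)


def density(adj):
    """Number of edges as a fraction of all possible undirected edges."""
    n = len(adj)
    edges = sum(len(nbs) for nbs in adj.values()) / 2
    return edges / (n * (n - 1) / 2)


def cluster_coefficient(adj, node):
    """Fraction of pairs of neighbours of `node` that are linked to each other."""
    nbs = sorted(adj[node])
    k = len(nbs)
    if k < 2:
        return 0.0
    linked = sum(
        1 for i in range(k) for j in range(i + 1, k) if nbs[j] in adj[nbs[i]]
    )
    return linked / (k * (k - 1) / 2)
```

For `EXAMPLE`, the distance from a to e is 3, which is also the diameter; the density is 5 edges out of 10 possible, i.e. 0.5; and node c has a cluster coefficient of 1/3 because only one of its three neighbour pairs (a, b) is linked.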
Small World Networks
Properties
• Robust to random error
  • Most nodes are sparsely connected
  • Deleting nodes at random tends to leave connectivity of rest of network unchanged
  • Up to 5% tolerance
• Vulnerable to targeted error
  • Selective removal of most highly connected nodes leads to rapid breakdown
  • Internet
• Enhanced signal propagation speed, computational power and synchronizability
• Selective advantage?
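The contrast between random error and targeted error can be explored with a sketch like the one below. It is an illustrative assumption rather than the analysis behind the slide: connectivity is summarised as the size of the largest connected component after removal, and the targeted strategy ranks nodes by their degree in the intact network. All function names are hypothetical.

```python
import random


def largest_component_size(adj, removed):
    """Size of the largest connected component once `removed` nodes are deleted."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        stack = [start]
        seen.add(start)
        size = 0
        while stack:
            node = stack.pop()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        best = max(best, size)
    return best


def random_failure(adj, count, rng):
    """Choose `count` nodes uniformly at random (random error)."""
    return set(rng.sample(sorted(adj), count))


def targeted_attack(adj, count):
    """Choose the `count` most highly connected nodes (targeted error)."""
    ranked = sorted(adj, key=lambda n: len(adj[n]), reverse=True)
    return set(ranked[:count])
```

On a hub-and-spoke network with one hub and nine spokes, removing a single spoke leaves a component of nine nodes, whereas removing the hub leaves only isolated nodes. Comparing `largest_component_size` for `random_failure` (with a seeded `random.Random`) and `targeted_attack` over increasing `count` gives a simple robustness curve.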
Network Generation Algorithms
• Networks can be generated computationally
  • Specified distribution
  • Specified algorithm
• Allows large numbers of networks to be generated and studied
• Draw conclusions about classes of network
Preferential Attachment
• R. Albert & A.-L. Barabási, “Topology of evolving networks: Local events and universality”, Physical Review Letters vol. 85, pp. 5234 – 5246, 2000.
• Start with a small number of nodes connected in a ring
• Network grows by adding nodes, which link to existing nodes with a probability that increases with the degree of the target node
• Scale free, not small world
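The growth procedure can be sketched as below. The attachment probability formula did not survive in the slide text, so this sketch assumes the simplest degree-proportional rule, where node i is chosen with probability k_i / Σ_j k_j; the rule used in the cited paper may differ. The parameter names `n_start`, `links_per_node` and `seed` are illustrative assumptions.

```python
import random


def preferential_attachment(n_nodes, n_start=3, links_per_node=1, seed=0):
    """Grow an undirected network from a ring by degree-proportional attachment.

    Assumes n_start >= 3 and links_per_node <= n_start.
    """
    rng = random.Random(seed)

    # Start with a small number of nodes connected in a ring.
    adj = {i: set() for i in range(n_start)}
    for i in range(n_start):
        j = (i + 1) % n_start
        adj[i].add(j)
        adj[j].add(i)

    # Each node appears in `stubs` once per link it holds, so a uniform draw
    # from `stubs` selects a node with probability proportional to its degree.
    stubs = [node for node in adj for _ in adj[node]]

    for new in range(n_start, n_nodes):
        chosen = set()
        while len(chosen) < links_per_node:
            chosen.add(rng.choice(stubs))
        adj[new] = set()
        for target in sorted(chosen):
            adj[new].add(target)
            adj[target].add(new)
            stubs.extend([new, target])
    return adj
```

With the defaults, every added node contributes exactly `links_per_node` new edges, so a 50-node network grown from a 3-node ring has 3 + 47 = 50 edges. Tabulating `len(adj[n])` over all nodes gives the connectivity distribution of the generated network.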
Preferential Attachment Network
Gene Duplication
• R. Pastor-Satorras, E. Smith & R. V. Solé, “Evolving protein interaction networks through gene duplication”, Santa Fe Institute Working Paper 02-02-008, 2002
• Start with a small ring of nodes
• At each step a node is selected at random and duplicated, with all its links
• Each link to the new node is deleted with a given probability, and new links are added with a second given probability
• Scale free network
• Small world network
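A minimal sketch of the duplication steps follows. The probability symbols are missing from the slide text, so the parameter names `delta` (deletion of an inherited link) and `alpha` (addition of a link to any other existing node) and their default values are assumptions made here for illustration, as is the choice to test every other node once for a new link.

```python
import random


def gene_duplication(n_nodes, n_start=5, delta=0.5, alpha=0.05, seed=0):
    """Grow an undirected network by node duplication with link loss and gain."""
    rng = random.Random(seed)

    # Start with a small ring of nodes (assumes n_start >= 3).
    adj = {i: set() for i in range(n_start)}
    for i in range(n_start):
        j = (i + 1) % n_start
        adj[i].add(j)
        adj[j].add(i)

    for new in range(n_start, n_nodes):
        # Select a node at random and duplicate it with all its links.
        parent = rng.randrange(new)
        # Each inherited link is deleted with probability delta (assumed name).
        links = {nb for nb in sorted(adj[parent]) if rng.random() >= delta}
        # A link to any other existing node is added with probability alpha
        # (assumed name and assumed per-node rule).
        for other in range(new):
            if other not in links and rng.random() < alpha:
                links.add(other)
        adj[new] = set()
        for nb in links:
            adj[new].add(nb)
            adj[nb].add(new)
    return adj
```

Setting `delta=1.0` and `alpha=0.0` leaves every duplicate isolated, while `delta=0.0` and `alpha=0.0` gives exact copies, which makes the two extremes easy to check before exploring intermediate values.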
Gene Duplication Network
Random Network
Modularity in Biological Networks
• Module: “a biological entity characterized by more internal than external integration” (Bolker, 2000)
• Biological systems are inherently modular
  • Cascades of gene activation
  • Developmental modularity
  • Functional modules
• The behaviour or function of a module reflects the integration of its parts, not simply the arithmetic sum of those parts
• Modules are units of selection
• Modular organization provides flexibility: modules can be combined in different ways during development to give different outcomes
• This permits complex anatomies without excessive demand on genomic complexity
Detecting Modularity
• Component: a maximal connected subgraph
  • Strong component: arcs that make up the paths in the subgraph are in the same direction
  • Weak component: component in a digraph whose arcs do not form a path in the same direction
• Core: most cohesive or highly connected members of a component
  • k-core: degree-based measure
  • m-core: multiplicity-based measure (weights on edges)
• Clique: subgraph in which every possible pair of points is directly connected and the clique is not contained in any other clique
  • n-clique: clique with maximal path length of n
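Two of these definitions lend themselves to a short sketch: components and the k-core. The code below assumes an undirected network stored as a dict of neighbour sets, and takes the k-core to be the largest subgraph in which every node has at least k neighbours inside the subgraph, which is the usual degree-based definition. Function names are illustrative.

```python
def connected_components(adj):
    """Return the maximal connected subgraphs as a list of node sets."""
    seen = set()
    components = []
    for start in adj:
        if start in seen:
            continue
        component = {start}
        stack = [start]
        while stack:
            node = stack.pop()
            for nb in adj[node]:
                if nb not in component:
                    component.add(nb)
                    stack.append(nb)
        seen |= component
        components.append(component)
    return components


def k_core(adj, k):
    """Return the nodes of the k-core by repeatedly pruning low-degree nodes."""
    alive = set(adj)
    changed = True
    while changed:
        changed = False
        for node in list(alive):
            # Count only neighbours that have not been pruned yet.
            if len(adj[node] & alive) < k:
                alive.discard(node)
                changed = True
    return alive
```

For a triangle a-b-c with a pendant node d attached to c, plus a separate pair e-f, `connected_components` returns two components, and the 2-core is just the triangle because d, e and f each have fewer than two neighbours.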
Detecting Modularity
• Analysis of flux modes (the smallest sub-networks enabling the metabolic system to operate in steady state)
  • Find the maximum flow that can be routed from a source node to a sink node, while obeying all capacity constraints
• Identify and remove linking nodes or edges
  • Orthologous groups with mutually exclusive associations
  • Nodes which have more than a threshold number of links
  • Betweenness
• Clustering
  • Calculate “distance” between each pair of nodes
  • Cluster according to distance
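The clustering approach in the last bullet can be sketched as follows. The slide does not say which "distance" is used, so this example assumes a Jaccard distance between the closed neighbourhoods of two nodes (each node's neighbours plus the node itself) and a simple single-linkage rule with a fixed threshold. Both choices, and all names, are illustrative assumptions.

```python
from itertools import combinations


def jaccard_distance(adj, u, v):
    """1 minus the overlap of the closed neighbourhoods of u and v."""
    nu = adj[u] | {u}
    nv = adj[v] | {v}
    return 1.0 - len(nu & nv) / len(nu | nv)


def cluster_by_distance(adj, threshold):
    """Single-linkage clustering: merge any two nodes closer than the threshold."""
    parent = {node: node for node in adj}

    def find(node):
        # Follow parent pointers to the cluster representative.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    # Calculate the distance between each pair of nodes and merge close pairs.
    for u, v in combinations(sorted(adj), 2):
        if jaccard_distance(adj, u, v) <= threshold:
            parent[find(u)] = find(v)

    clusters = {}
    for node in adj:
        clusters.setdefault(find(node), set()).add(node)
    return sorted(clusters.values(), key=min)
```

For two triangles (0, 1, 2) and (3, 4, 5) joined by the single edge 2-3, nodes within a triangle are at distance 0.25 or less, while the bridge pair 2-3 is at distance 2/3, so a threshold of 0.5 recovers the two triangles as separate clusters.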
Cancer genes
Clustered
Clusters
Cluster 2: Handling of epidermal growth factor
• Two platelet-derived growth factors (PDGFA and B), two platelet-derived growth factor receptors (PDGFRA and B), a protein which binds the epidermal growth factor receptor (GRB), and one which is involved in the regulation of epidermal growth factor receptor activity (SHC1)
Cluster 4: Mitogen activated protein kinases
• Two mitogen activated protein (MAP) kinase kinases and their targets, three MAP kinases
Hierarchical modularity
Hierarchical modularity
Network Motifs
• What are network motifs?
  • “The simplest units of commonly used transcriptional regulatory network architecture” (Lee et al., 2002).
  • “Recurring, significant patterns of interconnections.” (Milo et al., 2002).
• Motifs with meaning occur in many different network contexts:
GRN Motifs
Motif Detection
• Working on an adjacency matrix representation
• Look for all possible two- or three-node configurations
  • E.g. the 13 possible connected, directed 3-node subgraphs
• Look for patterns which occur significantly more frequently in real than in equivalent randomized networks
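The figure of 13 three-node configurations can be checked by brute force. The sketch below enumerates every directed graph on three nodes without self-loops, keeps those in which all three nodes are joined when arc direction is ignored, and counts them up to relabelling of the nodes. The function names are illustrative, and this is only the enumeration step, not the comparison against randomized networks.

```python
from itertools import permutations, product

# The six possible arcs between three nodes (no self-loops).
PAIRS = [(a, b) for a in range(3) for b in range(3) if a != b]


def canonical(edges):
    """Smallest relabelled edge tuple over all 3! orderings of the nodes."""
    return min(
        tuple(sorted((p[a], p[b]) for a, b in edges))
        for p in permutations(range(3))
    )


def weakly_connected(edges):
    """True if all three nodes are joined when arc direction is ignored."""
    reached = {0}
    frontier = [0]
    while frontier:
        node = frontier.pop()
        for a, b in edges:
            for x, y in ((a, b), (b, a)):
                if x == node and y not in reached:
                    reached.add(y)
                    frontier.append(y)
    return len(reached) == 3


def three_node_patterns():
    """Return one canonical edge tuple per distinct connected 3-node pattern."""
    patterns = set()
    for mask in product([0, 1], repeat=len(PAIRS)):
        edges = [pair for pair, bit in zip(PAIRS, mask) if bit]
        if weakly_connected(edges):
            patterns.add(canonical(edges))
    return sorted(patterns)
```

`len(three_node_patterns())` evaluates to 13. Each returned tuple can serve as a template when counting how often that pattern occurs in a real network and in its randomized counterparts.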
Motif Detection
• Two matrices
  • The overall matrix D consists of binary entries Dij, where a 1 indicates binding of regulator j to intergenic region i with a p-value of less than or equal to 0.001, and a 0 indicates a p-value greater than 0.001.
  • The regulator matrix R is a subset of D, containing only the rows corresponding to the intergenic region assigned to each regulator, in the same order as the columns of regulators
• Autoregulatory motif: Find each non-zero entry on the diagonal of R.
• Feedforward loop: For each master regulator (column of R), find non-zero entries, which correspond to regulators bound. For each master regulator / secondary regulator pair, find all rows in D bound by both regulators.
• Etc.
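The two searches described above can be sketched on plain nested lists. This is a minimal reading of the slide, not the original implementation: `reg_rows` is a hypothetical list giving, for each regulator column j, the row of D that holds its own intergenic region, and the toy matrix `D_EXAMPLE` is invented for illustration.

```python
# Toy data (hypothetical): 5 intergenic regions (rows) by 3 regulators (columns).
D_EXAMPLE = [
    [1, 0, 0],  # region of regulator 0, bound by regulator 0
    [1, 0, 0],  # region of regulator 1, bound by regulator 0
    [0, 0, 0],  # region of regulator 2, bound by nothing
    [1, 1, 0],  # target gene bound by regulators 0 and 1
    [0, 1, 1],  # target gene bound by regulators 1 and 2
]
REG_ROWS_EXAMPLE = [0, 1, 2]  # row of D assigned to each regulator column


def regulator_matrix(D, reg_rows):
    """Build R from the rows of D assigned to each regulator, in column order."""
    return [D[r] for r in reg_rows]


def autoregulatory(R):
    """Regulators with a non-zero entry on the diagonal of R."""
    return [k for k in range(len(R)) if R[k][k]]


def feedforward_loops(D, R):
    """Return (master, secondary, region) triples forming feedforward loops."""
    loops = []
    for master in range(len(R)):
        for secondary in range(len(R)):
            # Non-zero entries in the master's column of R are regulators it binds.
            if secondary == master or not R[secondary][master]:
                continue
            # Find all rows in D bound by both regulators.
            for region, row in enumerate(D):
                if row[master] and row[secondary]:
                    loops.append((master, secondary, region))
    return loops
```

With the toy data, regulator 0 is the only autoregulatory motif, and the only feedforward loop is regulator 0 binding regulator 1, with both binding region 3.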
Conclusions
• Many biological systems can be modelled as networks of interactions
• The dynamics of these networks represent phenomena such as changing gene expression over time, spread of information / disease, etc.
• Network dynamics are affected by topology
• Topological analysis is interesting in its own right, as giving us more information about the global properties of the system
• Topological features of interest include connectivity patterns, cluster coefficient, modularity, and motifs