Discover the concept of small-world networks and their application in modeling complex intracellular molecular interactions. Explore the characteristics and behaviors of small-world networks, including clustering coefficient and characteristic path length.
CENTER FOR INTEGRATIVE BIOINFORMATICS VU Introduction to Bioinformatics Lecture 20 Global network behaviour
Networks "The thousands of components of a living cell are dynamically interconnected, so that the cell’s functional properties are ultimately encoded into a complex intracellular web [network] of molecular interactions." "This is perhaps most evident with cellular metabolism, a fully connected biochemical network in which hundreds of metabolic substrates are densely integrated through biochemical reactions." (Ravasz E, et al.)
TF Ribosomal proteins
Clustering coefficient example: $C = \frac{1}{4(4-1)/2} = \frac{1}{6}$
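The arithmetic on this slide reads as the local clustering coefficient of a node with 4 neighbours among which only 1 of the 4(4-1)/2 = 6 possible links exists. A minimal sketch of that calculation follows; the toy graph and the function name `local_clustering` are illustrative assumptions, not taken from the lecture.

```python
from itertools import combinations


def local_clustering(adj, node):
    """Fraction of the possible links among the neighbours of `node` that exist."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0  # convention used here: report 0 when fewer than 2 neighbours
    possible = k * (k - 1) / 2
    actual = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return actual / possible


# Hypothetical toy graph: node "a" has four neighbours, and only b and c are linked.
adj = {
    "a": {"b", "c", "d", "e"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
    "e": {"a"},
}

print(local_clustering(adj, "a"))  # 1 existing link out of 6 possible
```

Averaging this value over all nodes gives one common definition of the network-wide clustering coefficient C.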
Small-world networks A seminal paper, Collective dynamics of "small-world" networks, by Duncan J. Watts and Steven H. Strogatz, which appeared in Nature volume 393, pp. 440-442 (4 June 1998), has attracted considerable attention. One can consider two extremes of networks: The first are regular networks, where "nearby" nodes have large numbers of interconnections, but "distant" nodes have few. The second are random networks, where the nodes are connected at random. Regular networks are highly clustered, i.e., there is a high density of connections between nearby nodes, but have long path lengths, i.e., to go from one distant node to another one must pass through many intermediate nodes. Random networks are highly un-clustered but have short path lengths. This is because the randomness makes it less likely that nearby nodes will have lots of connections, but introduces more links that connect one part of the network to another.
Regular and random networks [Figure: example networks labelled random, regular, regular and complete]
Regular, small-world and random networks: Rewiring experiments (Watts and Strogatz, 1998) p is the probability that a randomly chosen connection will be randomly redirected elsewhere (i.e., p = 0 means nothing is changed, leaving the network regular; p = 1 means every connection is changed and randomly reconnected, yielding complete randomness). For example, for p = 0.01 (so that only 1% of the edges in the graph have been randomly changed), the "clustering coefficient" is over 95% of what it would be for a regular graph, but the "characteristic path length" is less than 20% of what it would be for a regular graph.
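The rewiring experiment can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the exact procedure of Watts and Strogatz: the names `ring_lattice`, `rewire` and `edge_count` are hypothetical, and each edge keeps one endpoint and moves the other to a node it is not yet linked to.

```python
import random


def ring_lattice(n, k):
    """Regular ring: every node is linked to its k nearest neighbours (k even, k < n)."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k // 2 + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj


def rewire(adj, p, rng):
    """With probability p, redirect each original edge (u, v) to (u, w), w chosen at random."""
    n = len(adj)
    edges = sorted({(min(u, v), max(u, v)) for u in adj for v in adj[u]})
    for u, v in edges:
        if rng.random() < p:
            # Exclude self-loops and links that already exist.
            candidates = [w for w in range(n) if w != u and w not in adj[u]]
            if not candidates:
                continue
            w = rng.choice(candidates)
            adj[u].discard(v)
            adj[v].discard(u)
            adj[u].add(w)
            adj[w].add(u)
    return adj


def edge_count(adj):
    return sum(len(nbrs) for nbrs in adj.values()) // 2


regular = rewire(ring_lattice(20, 4), 0.0, random.Random(1))      # p = 0: unchanged
small_world = rewire(ring_lattice(20, 4), 0.05, random.Random(1))  # a few shortcuts
randomised = rewire(ring_lattice(20, 4), 1.0, random.Random(1))    # p = 1: every edge moved
```

The number of edges is the same for every p, so any change in clustering or path length comes from where the edges go, not from how many there are.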
Small-world networks A small-world network can be generated from a regular one by randomly disconnecting a few points and randomly reconnecting them elsewhere. Another way to think of a small-world network is that some so-called 'shortcut' links are added to a regular network as shown here: The added links are shortcuts because they allow travel from node (a) to node (b) to occur in only 3 steps, instead of 5 without the shortcuts.
Small-world networks
• Network characterisation:
  • L = characteristic path length
  • C = clustering coefficient
• A small-world network is much more highly clustered than an equally sparse random graph (C >> C_random), and its characteristic path length L is close to the theoretical minimum shown by a random graph (L ~ L_random).
• The reason a graph can have small L despite being highly clustered is that a few nodes connecting distant clusters are sufficient to lower L.
• Because C changes little as small-worldliness develops, it follows that small-worldliness is a global graph property that cannot be found by studying local graph properties.
Small-world networks A partially rewired network (0 < p < 1, as in earlier slides) can be characterized by the average shortest path length L(p) between any two points, and a clustering coefficient C(p) that measures the cliquishness of a typical neighbourhood (a local property). These can be calculated from mathematical simulations and yield the following behavior (Watts and Strogatz):
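As a rough illustration of how the characteristic path length L can be computed, the sketch below averages breadth-first-search distances over all connected pairs of nodes. The function names and the 12-node ring are assumptions made for this example; the single added link plays the role of a shortcut.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest hop counts from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def characteristic_path_length(adj):
    """Mean shortest-path length over all ordered pairs of connected nodes."""
    total = 0
    pairs = 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0


# Hypothetical regular ring of 12 nodes, each linked to its two nearest neighbours.
ring = {i: {(i - 1) % 12, (i + 1) % 12} for i in range(12)}

# The same ring with one shortcut between opposite nodes 0 and 6.
with_shortcut = {u: set(nbrs) for u, nbrs in ring.items()}
with_shortcut[0].add(6)
with_shortcut[6].add(0)

print(characteristic_path_length(ring))
print(characteristic_path_length(with_shortcut))
```

A single shortcut lowers L for the whole ring while leaving almost every neighbourhood unchanged, which is the effect the L(p) and C(p) curves describe.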
Small-world networks Part of the reason for the interest in the results of Watts and Strogatz is that small-world networks seem to be good models for a wide variety of physical situations. They showed that the power grid for the western U.S. (nodes are power stations, and there is an edge joining two nodes if the power stations are joined by high-voltage transmission lines), the neural network of a nematode worm (nodes are neurons and there is an edge joining two nodes if the neurons are joined by a synapse or gap junction), and the Internet Movie Database (nodes are actors and there is an edge joining two nodes if the actors have appeared in the same movie) all have the characteristics (high clustering coefficient but low characteristic path length) of small-world networks. Intuitively, one can see why small-world networks might provide a good model for a number of situations. For example, people tend to form tight clusters of friends and colleagues (a regular network), but then one person might move from New York to Los Angeles, say, introducing a random edge. The results of Watts and Strogatz then provide an explanation for the empirically observed phenomenon that there often seem to be surprisingly short connections between unrelated people (e.g., you meet a complete stranger on an airplane and soon discover that your sister's best friend went to college with his boss's wife).
Small world example: metabolism
• Wagner and Fell (2001) modeled the known reactions of 287 substrates that represent the central routes of energy metabolism and small-molecule building block synthesis in E. coli. This included metabolic sub-pathways such as:
  • glycolysis
  • pentose phosphate and Entner-Doudoroff pathways
  • glycogen metabolism
  • acetate production
  • glyoxylate and anaplerotic reactions
  • tricarboxylic acid cycle
  • oxidative phosphorylation
  • amino acid and polyamine biosynthesis
  • nucleotide and nucleoside biosynthesis
  • folate synthesis and 1-carbon metabolism
  • glycerol 3-phosphate and membrane lipids
  • riboflavin
  • coenzyme A
  • NAD(P)
  • porphyrins, haem and sirohaem
  • lipopolysaccharides and murein
  • pyrophosphate metabolism
  • transport reactions
  • glycerol 3-phosphate production
  • isoprenoid biosynthesis and quinone biosynthesis
• These sub-pathways form a network because some compounds are part of more than one pathway and because most of them include common components such as ATP and NADP.
• The graphs on the left show that, considering either reactants or substrates, the clustering coefficient C >> C_random and the characteristic path length L is near L_random, both characteristics of a small-world system.
Wagner A, Fell D (2001) The small world inside large metabolic networks. Proc. R. Soc. London Ser. B 268, 1803-1810.
Scale-free Networks Using a Web crawler, physicist Albert-Laszlo Barabasi and his colleagues at the University of Notre Dame in Indiana in 1998 mapped the connectedness of the Web. They were surprised to find that the structure of the Web didn't conform to the then-accepted model of random connectivity. Instead, their experiment yielded a connectivity map that they christened "scale-free."
• Often small-world networks are also scale-free.
• In a scale-free network the characteristic clustering is maintained even as the networks themselves grow arbitrarily large.
Scale-free Networks
• In any real network some nodes are more highly connected than others.
• P(k) is the proportion of nodes that have k links.
• For large, random graphs only a few nodes have a very small k and only very few have a very large k, leading to a bell-shaped Poisson distribution.
Scale-free networks fall off more slowly and are more highly skewed than random ones due to the combination of small-world local highly connected neighborhoods and more 'shortcuts' than would be expected by chance. Scale-free networks are governed by a power law of the form: $P(k) \sim k^{-\gamma}$
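To make the difference between the two tails concrete, the sketch below compares a Poisson degree distribution with a power law, writing the power-law exponent as gamma. The mean degree of 4, the exponent of 2.5 and the cutoff of 1000 are arbitrary assumptions chosen for illustration, not values from the lecture.

```python
import math


def poisson_pk(k, mean_degree):
    """Poisson probability that a node has exactly k links."""
    return math.exp(-mean_degree) * mean_degree ** k / math.factorial(k)


def power_law_pk(k, gamma, k_max):
    """Power-law probability k**-gamma, normalised over degrees 1..k_max."""
    norm = sum(j ** -gamma for j in range(1, k_max + 1))
    return k ** -gamma / norm


# Compare how likely a highly connected node is under each model.
for k in (1, 5, 10, 50):
    print(k, poisson_pk(k, 4.0), power_law_pk(k, 2.5, 1000))
```

Under the Poisson model a node with 50 links is practically impossible, whereas under the power law it is merely rare. That slowly decaying tail is what allows hubs to exist.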
Scale-free Networks Because of the $P(k) \sim k^{-\gamma}$ power-law relationship, a log-log plot of P(k) versus k gives a straight line of slope $-\gamma$. Some networks, especially small-world networks of modest size, do not follow a power law but are exponential. This point can be significant when trying to understand the rules that underlie the network.
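The straight-line behaviour on a log-log plot can be checked numerically. The sketch below fits a least-squares line to log P(k) against log k and returns its slope, writing the power-law exponent as gamma. It is a rough illustration under stated assumptions: the function name `loglog_slope` and the synthetic data are hypothetical, and a plain least-squares fit is only a crude estimator for real, noisy degree data.

```python
import math


def loglog_slope(ks, pks):
    """Least-squares slope of log(P(k)) versus log(k)."""
    xs = [math.log(k) for k in ks]
    ys = [math.log(p) for p in pks]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic, noise-free power law with an assumed exponent gamma = 2.2.
ks = list(range(1, 50))
pks = [k ** -2.2 for k in ks]

print(loglog_slope(ks, pks))  # close to -2.2, i.e. -gamma
```

For an exponential distribution the same plot bends downwards instead of following a line, which is one simple way to tell the two cases apart.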
Comparing Random and Scale-Free Distribution In the random network (right), the five nodes with the most links (in red) are connected to only 27% of all nodes (green). In the scale-free network (left), the five most connected nodes (red), often called hubs, are connected to 60% of all nodes (green).
Scale-free Networks Before discovering scale-free networks, Barabasi and his team had been doing work that modeled surfaces in terms of fractals, which are also scale-free. Their discoveries about networks have been found to have implications well beyond the Internet; the notion of scale-free networks has turned the study of a number of fields upside down. Scale-free networks have been used to explain behaviors as diverse as those of power grids, the stock market and cancerous cells, as well as the dispersal of sexually transmitted diseases.
Scale-free Networks Put simply, the nodes of a scale-free network aren't randomly or evenly connected. Scale-free networks include many "very connected" nodes, hubs of connectivity that shape the way the network operates. The ratio of very connected nodes to the number of nodes in the rest of the network remains constant as the network changes in size. In contrast, random connectivity distributions—the kinds of models used to study networks like the Internet before Barabasi and his team made their observation—predicted that there would be no well-connected nodes, or that there would be so few that they would be statistically insignificant. Although not all nodes in that kind of network would be connected to the same degree, most would have a number of connections hovering around a small, average value. Also, as a randomly distributed network grows, the relative number of very connected nodes decreases.
Scale-free Networks The ramifications of this difference between the two types of networks are significant, but it's worth pointing out that both scale-free and randomly distributed networks can be what are called "small world" networks. That means it doesn't take many hops to get from one node to another—the science behind the notion that there are only six degrees of separation between any two people in the world. So, in both scale-free and randomly distributed networks, with or without very connected nodes, it may not take many hops for a node to make a connection with another node. There's a good chance, though, that in a scale-free network, many transactions would be funneled through one of the well-connected hub nodes - one like Yahoo’s or Google’s Web portal. Because of these differences, the two types of networks behave differently as they break down. The connectedness of a randomly distributed network decays steadily as nodes fail, slowly breaking into smaller, separate domains that are unable to communicate.
Scale-free Networks Resist Random Failure Scale-free networks, on the other hand, may show almost no degradation as random nodes fail. With their very connected nodes, which are statistically unlikely to fail under random conditions, connectivity in the network is maintained. It takes quite a lot of random failure before the hubs are wiped out, and only then does the network stop working. (Of course, there's always the possibility that the very connected nodes would be the first to go.) In a targeted attack, in which failures aren't random but are the result of mischief, or worse, directed at hubs, the scale-free network fails catastrophically. Take out the very connected nodes, and the whole network stops functioning. In these days of concern about cyber attacks on the critical infrastructure, whether the nodes on the network in question are randomly distributed or are scale-free makes a big difference. With scale-free networks, targeted attacks can be resisted by implementing extra protective measures for the hubs.
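The contrast between random failure and targeted attack can be illustrated by measuring the largest connected component that survives after nodes are removed. The sketch below is a toy example under stated assumptions: the 12-node hub-and-spoke network and the names `largest_component` and `top_hubs` are hypothetical and much smaller than any real scale-free network.

```python
from collections import deque


def largest_component(adj, removed=frozenset()):
    """Size of the largest connected component once `removed` nodes are deleted."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        size = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        best = max(best, size)
    return best


def top_hubs(adj, count):
    """The `count` nodes with the most links."""
    return set(sorted(adj, key=lambda u: len(adj[u]), reverse=True)[:count])


# Hypothetical toy network: hubs 0 and 1 are linked, and each has five spokes.
edges = [(0, 1)] + [(0, leaf) for leaf in range(2, 7)] + [(1, leaf) for leaf in range(7, 12)]
adj = {node: set() for node in range(12)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

print(largest_component(adj))                    # intact network
print(largest_component(adj, {5}))               # failure of a peripheral node
print(largest_component(adj, top_hubs(adj, 2)))  # targeted attack on both hubs
```

Losing a peripheral node barely changes the connected core, while removing the two hubs leaves only isolated nodes.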
Scale-free Networks Epidemiologists are also pondering the significance of scale-free connectivity. Until now, it has been accepted that stopping sexually transmitted diseases requires reaching or immunizing a large proportion of the population; most contacts will be safe, and the disease will no longer spread. But if societies of people include the very connected individuals of scale-free networks—individuals who have sex lives that are quantitatively different from those of their peers—then health offensives will fail unless they target these individuals. These individuals will propagate the disease no matter how many of their more subdued neighbors are immunized. Now consider the following: Geographic connectivity of Internet nodes is scale-free, the number of links on Web pages is scale-free, Web users belong to interest groups that are connected in a scale-free way, and e-mails propagate in a scale-free way. Barabasi's model of the Internet tells us that stopping a computer virus from spreading requires that we focus on protecting the hubs.
Scale Free Network
• Hubs, highly connected nodes, bring together different parts of the network
• Robustness: Removing random nodes has little effect
• Low attack resistance: Removing a hub is lethal.
Random Network
• No hubs
• Low robustness
• Low attack resistance
Robustness of the biodegradation network against perturbations is tested here by removing 200 edges randomly (each removal simulating a mutation of the enzyme catalysing that reaction step). (A) For each connection lost (red line), 1.6 compounds lose their pathway to the Central Metabolism (CM). (B) However, the increase in the average pathway length to the CM for the remaining compounds is small. The biodegradation network appears to be less tolerant to perturbations than metabolic networks (Jeong et al., 2000).
Preferential attachment in biodegradation networks New degradable compounds are observed to attach preferentially to hubs close to (or in) the Central Metabolism
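Preferential attachment as a general growth rule can be sketched as follows: each new node links to existing nodes with probability proportional to their current number of links. This is a generic Barabasi-Albert-style illustration, not the analysis used for the biodegradation network; the function name and the parameters n = 200 and m = 2 are assumptions.

```python
import random


def preferential_attachment(n, m, rng):
    """Grow a graph to n nodes; each new node links to m distinct existing nodes
    chosen with probability proportional to their current degree."""
    # Seed: a small complete graph on m + 1 nodes, so every node starts with links.
    adj = {i: set() for i in range(m + 1)}
    for i in range(m + 1):
        for j in range(i + 1, m + 1):
            adj[i].add(j)
            adj[j].add(i)
    # Each node appears in `stubs` once per link, so a uniform draw is degree-weighted.
    stubs = [u for u in adj for _ in adj[u]]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        adj[new] = set()
        for t in sorted(targets):
            adj[new].add(t)
            adj[t].add(new)
            stubs.extend([new, t])
    return adj


graph = preferential_attachment(200, 2, random.Random(0))
degrees = sorted((len(nbrs) for nbrs in graph.values()), reverse=True)
print(degrees[:5])   # the few most connected nodes (hubs)
print(degrees[-5:])  # typical late arrivals keep close to m links
```

Because well-connected nodes are more likely to receive each new link, early nodes tend to grow into hubs while most nodes stay near the minimum degree.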
Protein Function Prediction
• How can we get the edges (connections) of the cellular networks?
• We can predict functions of genes or proteins so we know where they would fit in a metabolic network
• There are also techniques to predict whether two proteins interact, either functionally (e.g. they are involved in a two-step metabolic process) or directly physically (e.g. are together in a protein complex)
Protein Function Prediction The state of the art: it's not complete. Many genes are not annotated, and many more are partially or erroneously annotated. Given a genome which is partially annotated at best, how do we fill in the blanks? For each sequenced genome, the functions of 20%-50% of the encoded proteins remain unknown! How then do we build a reasonably complete network when the parts list is so incomplete?
Protein Function Prediction For all these reasons, improving automated protein function prediction is now a cornerstone of bioinformatics and computational biology. New methods will need to integrate signals coming from sequence, expression, interaction and structural data, etc.
Classes of function prediction methods (Recap)
• Sequence-based approaches
  • protein A has function X, and protein B is a homolog (ortholog) of protein A; hence B has function X
• Structure-based approaches
  • protein A has structure X, and X has such-and-such structural features; hence A's functional sites are ...
• Motif-based approaches
  • a group of genes have function X and they all have motif Y; protein A has motif Y; hence protein A's function might be related to X
• Function prediction based on “guilt-by-association”
  • gene A has function X and gene B is often “associated” with gene A; hence B might have a function related to X
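The guilt-by-association idea can be sketched as a majority vote over the annotated neighbours of a protein in an association network. This is only one simple way to turn the idea into a rule; the protein names, annotations and the function `predict_function` are hypothetical.

```python
from collections import Counter


def predict_function(adj, annotations, protein):
    """Return the most common annotation among the annotated neighbours, or None."""
    votes = Counter(
        annotations[n] for n in sorted(adj.get(protein, ())) if n in annotations
    )
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Hypothetical association network (e.g. co-expression or interaction links).
adj = {
    "P1": {"P5"},
    "P2": {"P5"},
    "P3": {"P5"},
    "P4": set(),
    "P5": {"P1", "P2", "P3"},
}
# Known functions; P4 and P5 are unannotated.
annotations = {"P1": "glycolysis", "P2": "glycolysis", "P3": "TCA cycle"}

print(predict_function(adj, annotations, "P5"))  # two of three neighbours vote glycolysis
print(predict_function(adj, annotations, "P4"))  # no neighbours, so no prediction
```

Ties are broken by the sorted order of neighbour names here, which is an arbitrary choice; a real method would weight the evidence from each kind of association.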