Gene duplication models and reconstruction of gene regulatory network evolution from network structure. Juris Viksna, David Gilbert. Riga, IMCS, 10.02.2006
Gene regulatory networks • Yeast network (figure) [J. Rung, T. Schlitt, A. Brazma, K. Freivalds, J. Vilo, Bioinformatics 18 S2 (ECCB), 202-210]
Gene regulatory networks • Directed graph • Graph vertices correspond to genes • An edge from gene A to gene B means that gene B is (directly) regulated by gene A
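This representation can be sketched minimally in Python. The sketch assumes the network is stored as a dictionary that maps each regulator to the set of genes it regulates. The function names and the toy genes A, B and C are hypothetical.

```python
from collections import defaultdict

def build_network(edges):
    """Map each regulator to the set of genes it directly regulates."""
    targets = defaultdict(set)
    for regulator, gene in edges:
        targets[regulator].add(gene)
    return dict(targets)

def regulators_of(network, gene):
    """Genes with an edge into `gene`, i.e. its direct regulators."""
    return {a for a, bs in network.items() if gene in bs}

# Hypothetical toy network: A regulates B and C; B regulates C.
net = build_network([("A", "B"), ("A", "C"), ("B", "C")])
print(sorted(regulators_of(net, "C")))  # ['A', 'B']
```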
Properties of gene networks (1) • Believed to be scale-free (vertex degrees satisfy a so-called power law): • N(k) – number of vertices with degree k • $N(k) \propto k^{-\beta}$
Properties of gene networks (1) • (figure: N(k) plotted against k) [F. Chung, L. Lu, T. Dewey, D. Galas, JCB 10, 677-687]
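A degree sequence gives a rough estimate of the exponent β. The sketch below fits a least-squares line of log N(k) against log k. This is a crude estimator, chosen here for brevity; maximum-likelihood fits are generally more reliable. The function name and the synthetic data are illustrative assumptions and are not part of the original analysis.

```python
import math
from collections import Counter

def fit_power_law_exponent(degrees):
    """Estimate beta in N(k) ~ k^(-beta) by a least-squares line on log-log axes.

    N(k) is the number of vertices with degree k; degree-0 vertices are
    ignored and at least two distinct positive degrees are required.
    """
    hist = Counter(k for k in degrees if k > 0)
    points = [(math.log(k), math.log(n)) for k, n in hist.items()]
    mx = sum(x for x, _ in points) / len(points)
    my = sum(y for _, y in points) / len(points)
    slope = (sum((x - mx) * (y - my) for x, y in points)
             / sum((x - mx) ** 2 for x, _ in points))
    return -slope

# Synthetic degree sequence with N(k) close to 1000 * k^-2 for k = 1..10.
degrees = [k for k in range(1, 11) for _ in range(round(1000 * k ** -2))]
print(round(fit_power_law_exponent(degrees), 2))
```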
Properties of gene networks (2) • Believed to have noticeable modularity • $i$ – vertex • $k_i$ – number of neighbours of vertex $i$ • $n_i$ – number of direct links between these $k_i$ neighbours • Clustering coefficient (for vertex $i$): $C_i = \frac{2n_i}{k_i(k_i-1)}$
Properties of gene networks (2) • Clustering coefficient (for vertex $i$): $C_i = \frac{2n_i}{k_i(k_i-1)}$ (figure) [E. Ravasz, A. Somera, D. Mongru, Z. Oltvai, A. Barabasi, Science 297, 1551-1555]
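The clustering-coefficient formula translates directly into code. The sketch assumes an undirected graph stored as a dictionary of neighbour sets. Edge directions are therefore ignored, which is an assumption about how the definition applies to regulatory networks. The convention C_i = 0 for k_i < 2 and the function name are also assumptions.

```python
from itertools import combinations

def clustering_coefficient(adj, i):
    """C_i = 2 n_i / (k_i (k_i - 1)) for vertex i of an undirected graph.

    adj maps each vertex to the set of its neighbours; n_i counts the
    edges between neighbours of i. Returns 0.0 when k_i < 2.
    """
    neighbours = adj[i]
    k = len(neighbours)
    if k < 2:
        return 0.0
    n = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return 2 * n / (k * (k - 1))

# Triangle a-b-c plus a pendant vertex d attached to a.
adj = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
print(clustering_coefficient(adj, "a"))  # k_a = 3, n_a = 1, so C_a = 1/3
```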
Network evolution models (1) • (i) networks expand continuously by the addition of new vertices, and • (ii) new vertices attach preferentially to sites that are already well connected. • A model based on these two ingredients reproduces the observed stationary scale-free distributions. [A. Barabasi, R. Albert, Science 286, 509-512]
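Growth with preferential attachment can be sketched minimally as follows. The sketch assumes an undirected graph seeded with m+1 fully connected vertices, with m new edges for each added vertex. These choices are illustrative, since the model leaves the initial graph open. Targets are drawn from a list that holds every edge endpoint, which makes the choice probability proportional to degree.

```python
import random

def barabasi_albert(n, m, seed=None):
    """Grow an undirected graph to n vertices; each new vertex adds m edges.

    Targets are chosen with probability proportional to current degree,
    by sampling uniformly from a list in which every vertex appears once
    per incident edge end.
    """
    rng = random.Random(seed)
    # Seed graph (an assumption): m+1 fully connected vertices.
    edges = [(u, v) for u in range(m + 1) for v in range(u)]
    ends = [x for e in edges for x in e]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:  # m distinct, degree-biased targets
            targets.add(rng.choice(ends))
        for t in targets:
            edges.append((new, t))
            ends.extend((new, t))
    return edges

edges = barabasi_albert(1000, 2, seed=1)
degree = {}
for u, v in edges:
    degree[u] = degree.get(u, 0) + 1
    degree[v] = degree.get(v, 0) + 1
print(len(edges), max(degree.values()))
```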
Network evolution models (2) • "Hierarchical" model • Sample hierarchical networks (scale-free and modular) (figure) [E. Ravasz, A. Somera, D. Mongru, Z. Oltvai, A. Barabasi, Science 297, 1551-1555]
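The sketch below builds such networks following the construction described by Ravasz et al., as we understand it. It starts from a fully connected five-vertex module. It then repeats a four-fold replication step, linking the peripheral vertices of every replica to the central vertex. The vertex numbering and the function name are assumptions.

```python
import itertools

def hierarchical_network(levels):
    """Hierarchical graph with 5**levels vertices (sketch after Ravasz et al.).

    Level 1 is a fully connected 5-vertex module whose centre is vertex 0.
    Each further level makes four replicas of the current graph and links
    every peripheral vertex of every replica to the central vertex 0.
    Returns (number_of_vertices, set_of_edges).
    """
    n = 5
    edges = {frozenset(p) for p in itertools.combinations(range(5), 2)}
    peripheral = [1, 2, 3, 4]
    for _ in range(levels - 1):
        new_edges = set(edges)
        new_peripheral = []
        for r in range(1, 5):
            offset = r * n
            # Copy the current graph, then wire its peripheral vertices to 0.
            new_edges |= {frozenset(v + offset for v in e) for e in edges}
            shifted = [v + offset for v in peripheral]
            new_edges |= {frozenset((0, v)) for v in shifted}
            new_peripheral.extend(shifted)
        n *= 5
        edges = new_edges
        peripheral = new_peripheral
    return n, edges

n, edges = hierarchical_network(2)
print(n, len(edges))  # 25 66
```

With two levels this gives 25 vertices and 66 edges. Of these, 50 edges lie inside the five modules and 16 are links to the centre.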
Network evolution models (3) • "Duplication" model • Scale-free with $\beta < 2$ for $1/2 < p < 1$ [F. Chung, L. Lu, T. Dewey, D. Galas, JCB 10, 677-687]
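The following is a sketch of the partial duplication model analysed by Chung et al., as we understand it. At each step a random vertex is copied, and the copy keeps each of the original's edges independently with probability p. The undirected representation, the seed graph and the function name are assumptions. The slides' own model M1, which has parameters p and d, may differ from this.

```python
import random

def partial_duplication(seed_adj, n, p, seed=None):
    """Grow an undirected graph to n vertices by repeated partial duplication.

    At each step a uniformly random existing vertex u is copied to a new
    vertex v, and v is linked to each neighbour of u independently with
    probability p. seed_adj must label its vertices 0..len(seed_adj)-1.
    """
    rng = random.Random(seed)
    adj = {v: set(nbrs) for v, nbrs in seed_adj.items()}
    while len(adj) < n:
        u = rng.randrange(len(adj))  # vertex to duplicate
        v = len(adj)                 # label of the new copy
        adj[v] = {w for w in adj[u] if rng.random() < p}
        for w in adj[v]:
            adj[w].add(v)            # keep the adjacency symmetric
    return adj

adj = partial_duplication({0: {1}, 1: {0}}, 500, 0.6, seed=7)
print(len(adj), max(len(nbrs) for nbrs in adj.values()))
```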
Model M1, p = 0.1, 5000 vertices (figure; value shown: 4.5)
Model M1, p = 0.05, d = 0.2, 5000 vertices (figure; value shown: 2.5)
Network evolution models (M1) • Model M1: number of vertices V and number of edges E
V       E
20      40
50      200
100     700
500     15000
1000    50000
5000    800000
Network evolution models (M2) • (figure: genome evolution involving genes A, X and X')
Network evolution models (M2) • (figure: genome evolution with two alternative outcomes ("or") for genes A, X and X')
Network evolution models (M2) • Model M2: number of vertices V and number of edges E
V       E
20      40
50      80
100     150
500     700
1000    1500
5000    7000
Evolution graphs • k+2 vertices • Two types of edges: - for swappable events (black) - for dependent events (grey)
Evolution graphs • Initial graph G • Numbered vertices correspond to evolution steps and are labelled with the vertices duplicated in the corresponding steps • Intermediate graphs between G and G' correspond to cuts of the evolution graph (G and G' can also be obtained in this way) • Graph G' is obtained from G after k evolution steps (in this example k = 6)
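The slides do not define cuts formally. Assume that grey edges are dependencies, meaning step b can only follow step a. Assume also that an intermediate graph corresponds to any set of already performed steps that is closed under these dependencies. Under these assumptions, the cuts of a small evolution graph can be enumerated by brute force. The function name and the encoding are hypothetical.

```python
from itertools import combinations

def consistent_cuts(k, depends_on):
    """All sets of performed steps that respect the dependency (grey) edges.

    Steps are numbered 1..k; (a, b) in depends_on means step b can only be
    performed after step a. A returned frozenset S never contains b without
    a; the empty set corresponds to G and the full set {1..k} to G'.
    Brute force over all 2**k subsets, so only for small k.
    """
    cuts = []
    for r in range(k + 1):
        for chosen in combinations(range(1, k + 1), r):
            s = frozenset(chosen)
            if all(a in s for a, b in depends_on if b in s):
                cuts.append(s)
    return cuts

# Three steps; step 3 depends on step 1, step 2 is swappable with both.
cuts = consistent_cuts(3, [(1, 3)])
print(len(cuts), sorted(sorted(c) for c in cuts))
```

With a single dependency (1, 3), six of the eight subsets are consistent cuts. With no dependencies, where every event is swappable, all 2^k subsets are.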
Evolution graphs – some questions • Equivalence: decide whether two given evolution graphs are equivalent • Irreducible networks: networks that cannot be obtained from simpler networks via an evolution graph • Uniqueness of evolution: is it possible that D(G1,E1) = D(G2,E2) for two different irreducible networks G1 and G2?
"Reverse engineering" problems E Reconstruct: Given: G' G
"Reverse engineering" problem (1) (Assuming either model M1 or M2.) Reconstruction of evolution graph For a given network N’ find an irreducible network N, the sequence of duplication events D1,...,Dm and the corresponding evolution tree, such that N’=D(N,E).
"Reverse engineering" problem (2) (Assuming either model M1 or M2.) Reconstruction of duplication event For a given network N’ find a network N and a duplication event D, such that N’=D(N).
"Reverse engineering" problem (3) (Assuming either model M1 or M2.) Reconstruction of the largest duplication event For a given network N’ find a network N with the smallest possible number of genes and a duplication event D, such that N’=D(N).
"Reverse engineering" - complexity For a given network N’ find a network N with the smallest possible number of genes and a duplication event D, such that N’=D(N). • at least as hard as graph isomorphism problem • likely NP-hard (maximum clique for reconstruction • graphs) • reconstruction graphs are much smaller than • networks • still might be practically solvable for random graphs • of reasonable size (few tens of thousands of vertices).
Algorithm – stage 1 • Partition the vertices of G' into orbits (of its automorphism group) • This can be done, e.g., with the nauty package • Alternatively, one can use some property p that is simpler to compute than automorphisms and satisfies p(G1) = p(G2) for isomorphic graphs G1 and G2.
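Colour refinement (1-dimensional Weisfeiler-Leman) is one example of such a property p. It is named here as a possible substitute, not as the method used in the slides. Vertices in one automorphism orbit always receive the same colour. Each colour class is therefore a union of orbits and may be coarser than the true orbit partition. The sketch below works on a directed network stored as out-neighbour sets, and the function name is hypothetical.

```python
def colour_refinement(out_adj):
    """Iteratively refine vertex colours of a directed graph.

    out_adj maps every vertex to the set of its out-neighbours (every
    vertex must appear as a key). Vertices in the same automorphism orbit
    always end with the same colour; classes may merge several orbits.
    """
    in_adj = {v: set() for v in out_adj}
    for u, targets in out_adj.items():
        for w in targets:
            in_adj[w].add(u)
    colour = {v: 0 for v in out_adj}
    while True:
        # New colour = old colour plus multisets of out- and in-neighbour colours.
        signature = {
            v: (colour[v],
                tuple(sorted(colour[w] for w in out_adj[v])),
                tuple(sorted(colour[w] for w in in_adj[v])))
            for v in out_adj
        }
        palette = {sig: i for i, sig in enumerate(sorted(set(signature.values())))}
        new_colour = {v: palette[signature[v]] for v in out_adj}
        # Classes only split, so an unchanged count means the partition is stable.
        if len(set(new_colour.values())) == len(set(colour.values())):
            return new_colour
        colour = new_colour

# Hypothetical network: X2 is a copy of X (same regulator A, same target B).
out_adj = {"A": {"X", "X2"}, "X": {"B"}, "X2": {"B"}, "B": set()}
colours = colour_refinement(out_adj)
print(colours)
```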
Reconstruction graphs • Vertices correspond to non-singleton orbits • Two types of edges: - (1) the two orbits have to participate in the same duplication event (solid) - (2) the two orbits cannot participate in the same duplication event (dotted)
Algorithm – stage 2 • Construct the reconstruction graph
Algorithm – stage 3 • Find the largest independent set (with respect to type 2 edges) in the reconstruction graph
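For small reconstruction graphs, stage 3 can be done by exhaustive search. The sketch below additionally requires that type 1 edges are never cut, so that either both orbits are selected or neither is. This is our reading of "have to participate in the same duplication event". The function name and the orbit labels are hypothetical.

```python
from itertools import combinations

def largest_compatible_set(orbits, must_join, cannot_join):
    """Largest set of orbits with no type 2 edge inside it.

    orbits: vertices of the reconstruction graph (orbit ids).
    must_join: type 1 edges; both endpoints selected or neither (assumption).
    cannot_join: type 2 edges; endpoints never selected together.
    Tries subsets from the largest size down, so only for small graphs.
    """
    for r in range(len(orbits), 0, -1):
        for chosen in combinations(orbits, r):
            s = set(chosen)
            if any(a in s and b in s for a, b in cannot_join):
                continue
            if any((a in s) != (b in s) for a, b in must_join):
                continue
            return s
    return set()

orbits = ["o1", "o2", "o3", "o4"]
best = largest_compatible_set(orbits, must_join=[("o1", "o2")],
                              cannot_join=[("o2", "o3"), ("o3", "o4")])
print(sorted(best))  # ['o1', 'o2', 'o4']
```

The search is exponential in the number of orbits, in line with the hardness noted for the reconstruction problem.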
Algorithm – stage 4 - if all selected orbits contain just 2 vertices, we are practically done - otherwise we have to find a pair of (largest) sets of vertices from the selected orbits that correspond to a duplication event [currently by exhaustive search]
Algorithm • The evolution graph can be reconstructed by repeatedly finding the largest duplication event
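The repeat-until-irreducible loop can be illustrated with the simplest possible subroutine in place of the orbit-based stages. The sketch assumes, hypothetically, that a duplication copies a gene together with all its regulators and targets and adds no edge between the copies. Duplicates then show up as genes with identical in- and out-neighbour sets. All names are illustrative, and each round is only a batch of such removals, not necessarily one duplication event in the sense of the slides.

```python
def reduce_by_twins(out_adj):
    """Undo single-gene duplications until no twin genes remain.

    Hypothetical duplication rule: a copied gene keeps all regulators and
    targets of the original and gets no edge to it, so the two copies have
    identical in- and out-neighbour sets ("twins"). Every gene, including
    pure targets, must be a key of out_adj.
    Returns (reduced_network, events); each event maps removed copy -> kept gene.
    """
    net = {v: set(ts) for v, ts in out_adj.items()}
    events = []
    while True:
        # Recompute regulators (in-neighbours) for the current network.
        in_adj = {v: set() for v in net}
        for u, ts in net.items():
            for w in ts:
                in_adj[w].add(u)
        # Group genes by (regulators, targets); groups of size > 1 are twins.
        groups = {}
        for v in sorted(net):
            key = (frozenset(in_adj[v]), frozenset(net[v]))
            groups.setdefault(key, []).append(v)
        removed = {c: g[0] for g in groups.values() for c in g[1:]}
        if not removed:
            return net, events
        events.append(removed)
        # Collapse: drop the copies and every edge pointing to them.
        net = {v: ts - set(removed) for v, ts in net.items() if v not in removed}

# Hypothetical example: X2 is an exact copy of X.
network = {"A": {"X", "X2"}, "X": {"B"}, "X2": {"B"}, "B": set()}
reduced, events = reduce_by_twins(network)
print(reduced, events)
```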
Algorithm – efficiency - using nauty we can deal with networks of fewer than 200 genes - for larger graphs one can use heuristics to compute orbits - vertex/edge counts at different DFS levels seem to work quite well - likely to find a large part of the duplication event - for fewer than 200 vertices often gives the exact result
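The sketch below computes a level-count invariant. It reads "levels" as distances from the root vertex in a breadth-first search on the undirected network, which is an interpretation of the slide. For each level it records two counts: the number of vertices at that distance, and the number of edges inside the ball of that radius. Automorphic vertices always get equal signatures, so grouping vertices by signature yields candidate orbits, possibly with several orbits merged. The function names are hypothetical.

```python
from collections import deque

def bfs_level_signature(adj, root):
    """Vertex and edge counts per BFS level around root (undirected adj).

    For each depth d it records (vertices at distance d, edges whose both
    endpoints lie within distance d of root).
    """
    dist = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    signature = []
    for d in range(max(dist.values()) + 1):
        vertices = sum(1 for x in dist.values() if x == d)
        # Each undirected edge appears twice in adj, hence the division by 2.
        edges = sum(1 for u in dist for w in adj[u]
                    if dist[u] <= d and dist[w] <= d) // 2
        signature.append((vertices, edges))
    return tuple(signature)

def group_by_signature(adj):
    """Candidate orbits: vertices grouped by equal level signatures."""
    groups = {}
    for v in adj:
        groups.setdefault(bfs_level_signature(adj, v), []).append(v)
    return sorted(sorted(g) for g in groups.values())

path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
print(group_by_signature(path))  # [['a', 'c'], ['b']]
```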
Algorithm – Model 2 • General case: check automorphisms for all k-tuples of vertices • A serious problem even for k = 2 • However, large components are not duplicated very often • The previous algorithm could be used to find a "large" part of the duplicated genes • Still an open problem • Also an open question: good heuristics
Model 2 – Component sizes • (figure) Model M2, 550 vertices, 132 duplications
Model 2 – Component sizes • Constructing a random network with 20000 genes:
Component size            # of events
1                         177008
2                         342
3                         97
4                         49
5                         37
6                         18
7                         13
8                         10
10, 11, 14                4
9, 12, 13, 15, 27         3
16, 24                    2
17, 18, 21, 22, 31, 27    1
Experiments with yeast network • 6270 genes • 106 regulators
Experiments with yeast network • (figure) p = 0.0001, E = 106, V = 216
Experiments with yeast network • 277 pairs of duplication candidates were discovered • A few are "real": COS5 and COS8, YLR460C and YNL134L • All 5962 genes were compared all-v-all using the Smith-Waterman algorithm (SW) • Normalized compression score: ssearch_score(P1,P2)/min{length(P1), length(P2)} • Scores for the found duplication pairs were compared with average values
Experiments with yeast network • (figure) Observed distances vs. average, all non-adjacent gene pairs
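The normalization and the comparison with average values can be written down directly. The raw ssearch scores would come from the external Smith-Waterman runs, so this sketch takes them as inputs. The function names are assumptions, and so is the choice of a ratio to the mean as the comparison. The numbers in the example are hypothetical.

```python
def normalized_score(raw_score, len1, len2):
    """ssearch (Smith-Waterman) score divided by the shorter sequence length."""
    return raw_score / min(len1, len2)

def compare_with_average(candidate_scores, all_scores):
    """Ratio of each candidate pair's normalized score to the all-v-all mean."""
    mean = sum(all_scores) / len(all_scores)
    return {pair: s / mean for pair, s in candidate_scores.items()}

score = normalized_score(600, 300, 400)  # 600 / 300 = 2.0
ratios = compare_with_average({("P1", "P2"): score}, [0.5, 1.0, 1.5])
print(ratios)  # {('P1', 'P2'): 2.0}
```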