Introduction to genetic network models Roberto Serra Centro Ricerche Ambientali Montecatini rserra@cramont.it • why networks • basics of gene regulation • generic properties • self-organizing dynamical systems • the Kauffman model • continuous models • topology of complex networks • small world networks
[figure: a linear cause-effect chain, unlimited growth, and a tree structure (no feedback)]
[figure: not only genes: genes, proteins and chemicals from outside, linked by regulation, synthesis, activation and catalysis]
[figure: control points in gene expression, from DNA through the primary RNA transcript, mRNA and the mRNA-ribosome complex to protein: transcriptional control, RNA processing, mRNA transport, mRNA degradation, translational control, protein activity control]
control mechanisms • the most effective regulation acts at the transcription level • RNAp binds to a DNA region upstream of the coding region, the promoter • regulatory proteins can “recognize” certain sequences and bind to them • the interactions between proteins and segments of the DNA chain are highly specific • proteins recognize specific sequences of bases without the need to open the DNA double helix • in eucaryotes, the DNA molecule must be unbound (released from its packaging) for the regulatory proteins to operate
[figure: product inhibition: bound RNAp; inactive repressor; repressor activated by tryptophan]
[figure: catabolite-induced activation: inactive CAP; CAP activated by cAMP]
collective regulatory mechanisms • groups of genes may be activated or inactivated simultaneously • sigma factors in bacteria • transcription factors in eucaryotes • these mechanisms introduce correlations among the expression patterns of different genes • certain kinds of packaging of genes in eucaryotes (e.g. heterochromatin) make genes in that region inaccessible to RNAp
modelling level • the choice of the modelling level is a crucial step • while there are detailed models of the protein synthesis process, to understand network properties it is advisable to use a simplified view of synthesis • activation level of a given gene = • concentration of the corresponding mRNA, or • concentration of the corresponding protein • concentrations can be expressed either as continuous or as discrete variables • the latter when there are, say, only a few molecules per cell • a boolean approximation may often be appropriate • our “standard” choice: • activation = concentration of the corresponding protein • activation = continuous or boolean
asking specific questions • modelling specific control circuits • which genes, chemicals etc. directly affect the expression of my-gene? • which affect it indirectly? • which are the control regions? • which interactions take place among the control molecules, and what is the logic of control? • these are “classical” problems in biological research on genetic control • they provide detailed information about specific circuits • which serves as a guide to guess the general principles of network “design”
a complementary approach • trying to understand the properties of large networks • if we knew all the details, we could write down the exact model of the overall network • but so far this is impossible • looking at general properties of networks “of the kind” present in cells • general properties means global structural features, types of possible dynamical behaviour, etc. • this analysis has very strong implications for the theory of biological evolution • the search for generic properties may also provide hints for the analysis of specific circuits • which questions to ask • which features to expect
generic properties of genetic networks • the strategy: analyze ensembles of networks • the ensemble is composed of networks which share some overall features (constraints) • unconstrained features vary at random in the ensemble • characterize the statistical distribution • analyze the generic features
ensembles of networks • a technique from statistical physics • example: the Hopfield model of boolean neural networks • stored patterns are “memorized” in a set of weights W • wij weight connecting nodes i and j • every set of stored patterns gives rise to a set of W values • to analyze the generic properties of these networks • suppose that the stored patterns are random • characterize the properties of W • analyze the interesting features, like storing capacity, crosstalk among patterns, etc.
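A minimal sketch of the ensemble idea for the Hopfield case, assuming the standard Hebb rule $w_{ij} = \frac{1}{N}\sum_\mu \xi_i^\mu \xi_j^\mu$ with ±1 coding (the slide does not state the storage rule). Random patterns are drawn, W is built, and a corrupted pattern is checked for recall; the names `hebbian_weights` and `recall` are illustrative.

```python
import numpy as np

def hebbian_weights(patterns: np.ndarray) -> np.ndarray:
    """Hebb rule (assumed): w_ij = (1/N) sum_mu xi_i^mu xi_j^mu, w_ii = 0.

    patterns: array of shape (P, N) with entries in {-1, +1}.
    """
    _, n = patterns.shape
    w = patterns.T @ patterns / n      # (N, N) sum of outer products
    np.fill_diagonal(w, 0.0)           # no self-connections
    return w

def recall(w: np.ndarray, state: np.ndarray, steps: int = 10) -> np.ndarray:
    """Synchronous sign updates; a zero local field is mapped to +1."""
    s = state.copy()
    for _ in range(steps):
        s = np.where(w @ s >= 0, 1, -1)
    return s

rng = np.random.default_rng(0)
N, P = 100, 5                                  # load P/N well below capacity
patterns = rng.choice([-1, 1], size=(P, N))    # random stored patterns
W = hebbian_weights(patterns)

probe = patterns[0].copy()
probe[:10] *= -1                               # corrupt 10 of the 100 bits
overlap = recall(W, probe) @ patterns[0] / N   # 1.0 means perfect recall
print(overlap)
```

Repeating this over many random draws of `patterns`, and for increasing P/N, is how quantities like storage capacity and crosstalk are characterized statistically rather than for one particular W.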
generic questions • which kind of dynamical behaviour can we expect in a certain type of network? • fixed points, limit cycles, strange attractors? • islands of activation spreading through the network? • how sensitive are these asymptotic states to perturbations? • either in inputs or in the network structure • what kind of topology should we expect in genetic networks? • how does information flow from one point to the rest of the network? • how far • how fast
reduced description • the activation of a gene depends upon proteins and chemicals • let us suppose that • the synthesis of regulatory proteins is “fast” with respect to the time constants of the regulatory processes • regulatory proteins decay with a time constant which is fast with respect to the time constants of the regulatory processes • the concentrations of regulatory chemicals are constant • then we may express the activation at time t+δt as a function of the activations at time t • a single kind of variable is sufficient! • this holds true under both interpretations of “activation” • concentration of mRNA • concentration of protein • the important point is the loss of memory within δt
Kauffman model • a generic model, meant to capture the features of large webs of interconnected genes • gene activations are boolean (1 or 0) • genes update their state at fixed time steps t, t+1, t+2 … • each gene's activation at time t+1 is determined by the activation of a fixed set of input genes at time t • external chemicals are not explicitly taken into account • updating is synchronous
examples (C(t+1) depends upon A(t) and B(t)) • C’ = A and B • C’ = A or B • C’ = A xor B • def: canalyzing functions are those boolean functions in which at least one value of one of the inputs uniquely determines the output, irrespective of the other inputs • examples of canalyzing functions: or, and • examples of noncanalyzing functions: xor, parity
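The definition of a canalyzing function can be checked by brute force over all input combinations. A sketch (the helper `is_canalyzing` is hypothetical, not from the slides):

```python
from itertools import product
from typing import Callable

def is_canalyzing(f: Callable[..., int], k: int) -> bool:
    """True if some input i and some value v of that input fix the output
    of f regardless of the other k-1 inputs."""
    for i in range(k):
        for v in (0, 1):
            outputs = {f(*x) for x in product((0, 1), repeat=k) if x[i] == v}
            if len(outputs) == 1:      # input i = v forces a single output
                return True
    return False

AND = lambda a, b: a & b
OR = lambda a, b: a | b
XOR = lambda a, b: a ^ b
PARITY3 = lambda a, b, c: a ^ b ^ c

for name, f, k in [("and", AND, 2), ("or", OR, 2), ("xor", XOR, 2), ("parity", PARITY3, 3)]:
    print(name, is_canalyzing(f, k))
# and True
# or True
# xor False
# parity False
```

For AND, input A = 0 forces the output to 0; for XOR and parity, flipping any single input always flips the output, so no input value can fix it.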
the Kauffman model is a dynamical system • at time 0, an activation value is given to each gene • at each time step t=1, 2 ..., each gene takes an activation value xi(t) determined according to the previous laws • the global state of the system X = [x1, x2 ... xN] is the ordered set of activation values • X(t) determines X(t+1) • as time passes the system moves from state X(t) to X(t+1), X(t+2), etc., following a trajectory in an N-dimensional state space • allowed states are located on the corners of the unit hypercube
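A minimal sketch of one synchronous update step, on a hypothetical 3-gene network built from the example rules above (A, B, C are genes 0, 1, 2; the wiring is an assumption for illustration):

```python
from typing import Callable, Sequence, Tuple

State = Tuple[int, ...]

def step(state: State,
         inputs: Sequence[Sequence[int]],
         rules: Sequence[Callable[..., int]]) -> State:
    """Synchronous update: every x_i(t+1) is computed from the same X(t)."""
    return tuple(rule(*(state[j] for j in inp))
                 for inp, rule in zip(inputs, rules))

# Hypothetical 3-gene network (genes 0=A, 1=B, 2=C)
inputs = [(1, 2), (0, 2), (0, 1)]
rules = [lambda b, c: b | c,    # A' = B or C
         lambda a, c: a & c,    # B' = A and C
         lambda a, b: a ^ b]    # C' = A xor B

x = (1, 0, 0)
trajectory = [x]
for _ in range(4):
    x = step(x, inputs, rules)
    trajectory.append(x)
print(trajectory)
# [(1, 0, 0), (0, 0, 1), (1, 0, 0), (0, 0, 1), (1, 0, 0)]
```

The trajectory moves between corners of the unit cube and, here, falls immediately into a cycle of length 2.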
[figure: the state space of a 3-gene system: the states 000, 001, 010, 011, 100, 101, 110, 111 lie on the corners of the unit cube with axes x, y, z]
definitions • attractor: a set of states which is either approached in the limit $t \to \infty$, or is reached in a finite time and no longer abandoned by a dynamical system • random boolean networks with a finite number of nodes have a finite number of states, so the attractor is reached in finite time • attractors may be fixed points, cycles, or strange attractors • (the latter are not allowed in finite boolean systems) • the set of initial conditions which evolve towards a given attractor is its basin of attraction • attractors determine the key features of dynamical systems • after transients have died out • qualitative analysis of dynamical systems concentrates on attractors and their basins, the so-called “phase portrait”
asymptotic dynamics of RBN • the state transition rule is such that X(t) determines X(t+1) • since the system has $2^N$ different states, it returns to a previously visited state within at most $2^N$ time steps (the “Poincaré time”) • therefore, after a transient of fewer than $2^N$ time steps, the system enters a cycle • all the system attractors are cycles; a particular case is that of fixed points, i.e. cycles of length 1
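The argument above translates directly into an algorithm: iterate the map, remember when each state was first visited, and stop at the first repeat. A sketch with a hypothetical helper `find_attractor`, tried on two toy networks invented for illustration:

```python
from typing import Callable, Dict, Hashable, List, Tuple

def find_attractor(step: Callable[[Hashable], Hashable],
                   x0: Hashable) -> Tuple[int, List[Hashable]]:
    """Iterate X(t+1) = step(X(t)) until a state repeats.

    Returns (transient_length, cycle). With 2**N possible states a repeat
    must occur within 2**N steps, so the loop always terminates."""
    first_visit: Dict[Hashable, int] = {}
    trajectory: List[Hashable] = []
    x = x0
    while x not in first_visit:
        first_visit[x] = len(trajectory)
        trajectory.append(x)
        x = step(x)
    start = first_visit[x]            # the repeated state opens the cycle
    return start, trajectory[start:]

# Hypothetical 2-gene net A' = A and B, B' = A or B: reaches a fixed point.
print(find_attractor(lambda x: (x[0] & x[1], x[0] | x[1]), (1, 0)))
# (1, [(0, 1)])
# Hypothetical 3-gene shift register: a cycle of length 3, no transient.
print(find_attractor(lambda x: (x[2], x[0], x[1]), (1, 0, 0)))
# (0, [(1, 0, 0), (0, 1, 0), (0, 0, 1)])
```

The fixed point shows up as a cycle of length 1, consistent with treating fixed points as a special case of cycles.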
ensemble properties • there are N genes • each node is influenced directly by k other genes • as we are looking for generic properties, for each node the k input genes are chosen at random • for each node, the boolean function is chosen at random among the set of $2^{2^k}$ possible functions (or among a subset)
input   output
0000    1 0 0
0001    0 0 1
0010    1 0 1
0011    1 1 0
0100    0 1 0
0101    0 1 0
0110    0 1 0
0111    1 0 1
1000    1 1 0
1001    0 0 1
1010    1 1 1
1011    1 0 1
1100    0 0 0
1101    0 0 1
1110    0 1 0
1111    1 1 0
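Drawing one network from the ensemble can be sketched as follows, assuming that inputs exclude the gene itself ("k other genes") and that each truth-table entry is an independent fair coin, which picks uniformly among the $2^{2^k}$ functions. The table index is the input set read as a binary number, matching the ordering 0000 … 1111 of the table above. The names `random_rbn` and `step` are illustrative.

```python
import random
from typing import List, Sequence, Tuple

def random_rbn(n: int, k: int, seed: int = 0) -> Tuple[List[List[int]], List[List[int]]]:
    """Draw one (N, k) network: random wiring and random truth tables."""
    if not 1 <= k <= n - 1:
        raise ValueError("need 1 <= k <= N-1 (k inputs among the other genes)")
    rng = random.Random(seed)
    # k distinct inputs per gene, chosen among the *other* genes
    inputs = [rng.sample([j for j in range(n) if j != i], k) for i in range(n)]
    # one fair random output bit per input set: uniform over 2**(2**k) functions
    tables = [[rng.randint(0, 1) for _ in range(2 ** k)] for _ in range(n)]
    return inputs, tables

def step(state: Sequence[int], inputs, tables) -> Tuple[int, ...]:
    """Synchronous update. The input set of gene i, read as a binary number
    (first input = most significant bit), indexes its truth table."""
    new = []
    for inp, table in zip(inputs, tables):
        idx = 0
        for j in inp:
            idx = (idx << 1) | state[j]
        new.append(table[idx])
    return tuple(new)

inputs, tables = random_rbn(n=8, k=4, seed=42)
x0 = (1, 0, 1, 1, 0, 0, 1, 0)
print(step(x0, inputs, tables))
print(2 ** (2 ** 4))   # number of boolean functions of k = 4 inputs: 65536
```

Restricting the ensemble (e.g. to canalyzing functions) would only change how `tables` is drawn; the wiring and update code stay the same.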
studying the ensemble of networks • each network has its own dynamics • dynamical analysis relies upon extensive simulations, starting from random initial conditions • dynamical analysis is performed by varying connections and rules • the main features of the model (qualitative analysis), attractors and basins, are ruled by the degree of connectivity k
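The simulation procedure described above can be sketched as: draw one random network, start from many random initial states, and record which attractor each reaches; the fraction of initial states per attractor estimates its relative basin size. All names and parameter values (N = 20, k = 2, 200 initial states) are illustrative assumptions.

```python
import random
from collections import Counter

def random_rbn(n, k, rng):
    """Random wiring (k other genes per node) and random truth tables."""
    inputs = [rng.sample([j for j in range(n) if j != i], k) for i in range(n)]
    tables = [[rng.randint(0, 1) for _ in range(2 ** k)] for _ in range(n)]
    return inputs, tables

def step(x, inputs, tables):
    """Synchronous update; the input set indexes the truth table as a binary number."""
    out = []
    for inp, table in zip(inputs, tables):
        idx = 0
        for j in inp:
            idx = (idx << 1) | x[j]
        out.append(table[idx])
    return tuple(out)

def attractor_of(x, inputs, tables):
    """Attractor reached from x, as a tuple of states rotated to start from
    its smallest state, so that the same cycle always compares equal."""
    first_visit, traj = {}, []
    while x not in first_visit:
        first_visit[x] = len(traj)
        traj.append(x)
        x = step(x, inputs, tables)
    cycle = traj[first_visit[x]:]
    m = cycle.index(min(cycle))
    return tuple(cycle[m:] + cycle[:m])

def sample_attractors(n, k, n_init=200, seed=0):
    """Map each attractor found to the number of random initial states reaching it."""
    rng = random.Random(seed)
    inputs, tables = random_rbn(n, k, rng)
    counts = Counter()
    for _ in range(n_init):
        x0 = tuple(rng.randint(0, 1) for _ in range(n))
        counts[attractor_of(x0, inputs, tables)] += 1
    return counts

counts = sample_attractors(n=20, k=2)
total = sum(counts.values())
for att, c in counts.most_common():
    print(f"cycle length {len(att):4d}   estimated basin fraction {c / total:.2f}")
```

Ensemble statistics (mean number of attractors, mean cycle length) would come from repeating this over many seeds and averaging.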
high connectivity • if k=N-1, the state at time t+1 is completely uncorrelated with the state at time t • the input to each node is the vector of values of all the other nodes • the output associated with each input set is random • therefore there is no correlation between the outputs corresponding to two inputs which differ even by a single bit • there are relatively few cycles • relative to the total number of states • cycles are long (their period grows as $2^{bN}$) • systems are fragile with respect to small changes in initial conditions • nearby initial states go to different attractors • the boundaries of the basins of attraction are highly irregular • analogous to “chaotic behaviour” in continuous dynamical systems
fragility (sensitive dependence on initial conditions) • initial state 111111 -> cycle A • initial state 111110 -> cycle B • almost always, B ≠ A
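A rough numerical version of this test, assuming the ensemble construction of the Kauffman model (random inputs among the other genes, random truth tables): flip one bit of a random initial state and count how often the two states end up on different attractors, for low and high k. Parameter values are illustrative and small enough that k = N-1 remains tractable; no particular output values are claimed here.

```python
import random

def random_rbn(n, k, rng):
    """Random wiring (k other genes per node) and random truth tables."""
    inputs = [rng.sample([j for j in range(n) if j != i], k) for i in range(n)]
    tables = [[rng.randint(0, 1) for _ in range(2 ** k)] for _ in range(n)]
    return inputs, tables

def step(x, inputs, tables):
    """Synchronous update; the input set indexes the truth table as a binary number."""
    out = []
    for inp, table in zip(inputs, tables):
        idx = 0
        for j in inp:
            idx = (idx << 1) | x[j]
        out.append(table[idx])
    return tuple(out)

def attractor_of(x, inputs, tables):
    """Attractor reached from x, rotated to start from its smallest state."""
    first_visit, traj = {}, []
    while x not in first_visit:
        first_visit[x] = len(traj)
        traj.append(x)
        x = step(x, inputs, tables)
    cycle = traj[first_visit[x]:]
    m = cycle.index(min(cycle))
    return tuple(cycle[m:] + cycle[:m])

def fragility(n, k, n_nets=20, n_trials=20, seed=1):
    """Fraction of single-bit flips of a random initial state that lead to a
    different attractor, averaged over random (n, k) networks."""
    rng = random.Random(seed)
    different = 0
    for _ in range(n_nets):
        inputs, tables = random_rbn(n, k, rng)
        for _ in range(n_trials):
            x0 = [rng.randint(0, 1) for _ in range(n)]
            x1 = list(x0)
            x1[rng.randrange(n)] ^= 1          # flip one randomly chosen gene
            if attractor_of(tuple(x0), inputs, tables) != attractor_of(tuple(x1), inputs, tables):
                different += 1
    return different / (n_nets * n_trials)

N = 12
print("k = 2   :", fragility(N, 2))
print("k = N-1 :", fragility(N, N - 1))
```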
low connectivity • if k=2, the number of cycles scales as $N^{1/2}$ • cycle length grows as $N^{1/2}$ • basins are regular: systems starting from two nearby initial states usually evolve to the same attractor • the behaviour is much more regular and ordered than in the k=N-1 case • a phase transition occurs at some k value
[figure: connected clusters (high k, interaction with neighbours): oscillating genes, constant genes]
[figure: connected clusters (low k, interaction with neighbours): constant genes, oscillating genes]
phase transition • the network displays a phase transition • as k is lowered, the transition takes place when the cluster of non-oscillating genes percolates through the network • the boundary between ordered and disordered regimes is found at different k values if the set of boolean functions is restricted somehow • e.g. by limiting it to canalyzing functions, i.e. those where at least one of the inputs has one value which forces the output to take a specific value
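The slides leave the location of the transition unspecified. One well-known estimate, not taken from these slides, is the annealed approximation of Derrida and Pomeau: if each truth-table entry is 1 with probability p, the boundary lies at $2p(1-p)k = 1$. Note that this models a biased restriction of the functions, not the canalyzing restriction mentioned above; `critical_k` is an illustrative name.

```python
def critical_k(p: float) -> float:
    """Annealed (Derrida-Pomeau) estimate of the critical mean connectivity
    for random boolean networks whose truth-table entries are 1 with
    probability p: ordered if 2 p (1 - p) k < 1, disordered if > 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return 1.0 / (2.0 * p * (1.0 - p))

print(critical_k(0.5))   # 2.0: unbiased random functions
print(critical_k(0.75))  # about 2.67: biased functions push the boundary up
```

For unbiased functions this places the boundary at k = 2, which is why k = 2 networks sit at the edge of the ordered regime.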
order for free • scaling laws in the self-organized regime • number of cycles ~ $N^b$ (1/2 < b < 1) • length of cycles ~ $N^b$ • the model is consistent with experimental observations over many different phyla • number of cellular types <-> number of different cycles • cell life <-> length of cycles • selection builds upon the network's self-organizing properties • the selective advantages of “the edge of chaos”?
warning • the Kauffman model is a highly idealized representation of real genetic / metabolic nets which is based upon several approximations • no chemicals • proteins are fast with respect to the time step • synchronous activation may introduce “spurious cycles” in boolean dynamical systems (cf. Hopfield nets) • fully random topology, constant k
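A minimal illustration of spurious cycles, using a hypothetical two-gene network A' = not B, B' = not A: synchronous updating from 00 produces a 2-cycle, while updating the genes one at a time (a sequential scheme assumed here for comparison) settles on a fixed point.

```python
def sync_step(x):
    """Both genes read the old state: A' = not B, B' = not A."""
    a, b = x
    return (1 - b, 1 - a)

def sequential_sweep(x):
    """Genes updated one at a time, A first, so B already sees the new A."""
    a, b = x
    a = 1 - b
    b = 1 - a
    return (a, b)

def run(update, x, steps=4):
    states = []
    for _ in range(steps):
        x = update(x)
        states.append(x)
    return states

print(run(sync_step, (0, 0)))         # [(1, 1), (0, 0), (1, 1), (0, 0)]
print(run(sequential_sweep, (0, 0)))  # [(1, 0), (1, 0), (1, 0), (1, 0)]
```

Both schemes share the fixed points (1, 0) and (0, 1); the 2-cycle through 00 and 11 exists only because both genes switch at exactly the same instant.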
but • the Kauffman model allows us to address issues which would otherwise be missed, and to develop an appropriate language in which we can frame some key questions • the very existence of self-organizing dynamics in nonlinear genetic networks • the importance of attractors in determining the properties of gene nets • robustness and basins of attraction • the importance of the average degree of connectivity • it also allows us to examine in a new way the interplay between selection and self-organization • the importance of studying ensembles of networks to gain information about their generic properties
continuous or boolean • the intermediate values of gene expression may be due to • intermediate values of the concentration of stimulating factors • time-dependent phenomena (transients, cycles) • the boolean approximation allows one to better elucidate the logic of control, but must be exercised with care • the boolean dynamics may be different from the continuous one
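constant activation input • t < 0: $A = 0$ • continuous: for t > 0, $dA/dt = s - kA$, hence $A(t) = \frac{s}{k}\left(1 - e^{-kt}\right)$ • boolean: $A = 0$ for $t < \ln 2 / k$, $A = 1$ for $t > \ln 2 / k$ • the switching time $\ln 2 / k$ is the time at which the continuous solution reaches half of its asymptotic value $s/k$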
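The two descriptions can be compared numerically; the sketch checks that the boolean switching time $\ln 2 / k$ is where the continuous solution reaches $s/(2k)$. Here k is the decay constant of this slide, unrelated to the connectivity k of the Kauffman model; the values of s and k are illustrative.

```python
import math

def continuous_A(t: float, s: float, k: float) -> float:
    """Solution of dA/dt = s - k A with A = 0 before the input is switched on at t = 0."""
    return 0.0 if t < 0 else (s / k) * (1.0 - math.exp(-k * t))

def boolean_A(t: float, k: float) -> int:
    """Boolean approximation: A jumps to 1 at t* = ln 2 / k."""
    return 1 if t > math.log(2) / k else 0

s, k = 2.0, 0.5                      # illustrative synthesis rate and decay constant
t_star = math.log(2) / k
# at t*, the continuous solution equals half its asymptote s/k
print(math.isclose(continuous_A(t_star, s, k), s / (2 * k)))   # True
for t in (0.5, 1.0, 2.0, 4.0, 8.0):
    print(t, round(continuous_A(t, s, k), 3), boolean_A(t, k))
```

The printed table shows where the two descriptions disagree most: around $t^*$, during the transient, which is exactly where the boolean approximation must be used with care.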
generalizing Kauffman • it would then be desirable to have a model where • activations can take continuous values • the “logic of control” is explicit and flexible as in Kauffman • there is an embarras de richesse in model development • we require that the models are true generalizations of the Kauffman RBN • they lead to the same dynamics if the initial activations are boolean
continuous model (Serra & Villani) • let $\delta t$ be larger than the time required for protein synthesis and degradation (as in Kauffman) • $a_i$ = activation of gene i (normalized to [0,1]) • i.e. concentration of the corresponding product • $\mathbf{a} = [a_1, a_2, \ldots, a_N]$ • $a_i(t+1) = \sigma[\mathrm{contr}_i(\mathbf{a}(t))]$ • where • $\forall x: \sigma(x) \ge 0$ • $x \le 0: \sigma(x) = 0$ • $x > 0, \varepsilon > 0: \sigma(x+\varepsilon) \ge \sigma(x)$ • for simplicity: $\lim_{x \to +\infty} \sigma(x) = 1$ • for the time being, chemicals are not explicitly considered, as in Kauffman
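The slide constrains σ only through these conditions. Two admissible choices, both assumptions, are sketched below with a numerical check of the conditions on a grid. Only the clipped ramp gives σ(1) = 1, which matters if the model is to reduce exactly to Kauffman's on boolean inputs.

```python
import math

def sigma_clip(x: float) -> float:
    """Linear ramp clipped to [0, 1]: sigma(1) = 1."""
    return min(1.0, max(0.0, x))

def sigma_exp(x: float) -> float:
    """Saturating exponential: reaches 1 only as x -> +infinity."""
    return 0.0 if x <= 0 else 1.0 - math.exp(-x)

def satisfies_conditions(sigma, grid, big=50.0, tol=1e-9) -> bool:
    """Check the slide's conditions on a sorted grid of x values."""
    nonneg = all(sigma(x) >= 0 for x in grid)
    zero_for_nonpositive = all(sigma(x) == 0 for x in grid if x <= 0)
    nondecreasing = all(sigma(b) >= sigma(a) for a, b in zip(grid, grid[1:]))
    tends_to_one = abs(sigma(big) - 1.0) < tol
    return nonneg and zero_for_nonpositive and nondecreasing and tends_to_one

grid = [i / 10 for i in range(-50, 51)]
print(satisfies_conditions(sigma_clip, grid), satisfies_conditions(sigma_exp, grid))  # True True
print(sigma_clip(1.0), round(sigma_exp(1.0), 3))  # 1.0 0.632
```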
summing over the paths • let us focus upon the interactions among genes • mediated, of course, by their synthesis products • i.e. consider $a_i(t+1) = \sigma(\mathrm{contr}_i(\mathbf{a}(t)))$ • the “digital logic” of the genetic switch must be translated into a continuous rule • the transition rule for the activation must take into account contributions from all the combinations of {0,1} values of its inputs • for example, if the rule is an OR, it may receive positive contributions from the input combinations (11), (10), (01) (which tend to turn it on) and a negative one from (00) (which tends to turn it off)
evolution laws • a “set of input values” (input set) to gene i, $Y_i = \{y_{i1}, y_{i2}\}$, is defined as a given combination of boolean values of its input genes (in our case, 11, 10, 01 or 00) • generalization to K inputs is trivial • the truth table assigns a boolean value (the activation of the gene at the next time step) to each input set • we must define a weight for each input set, and a rule to combine the weights of the different input sets • $Q^1_i$: set of the input paths to gene i which correspond to an updated value 1 • $Q^0_i$: set of the input paths to gene i which correspond to an updated value 0
weighting an input set (model A) • the weight $\alpha$ should be computed from the activations of the two input genes, i.e. from $a_1$ and $a_2$ • the contribution of the input set (11) may be estimated to be limited by the gene with the smallest activation • $\alpha(11) = \min(a_1, a_2)$ • the contribution of the input set (00) may be estimated to be limited by the gene with the highest activation • $\alpha(00) = 1 - \max(a_1, a_2) = \min(1-a_1, 1-a_2)$ • the contributions of (10) and (01) are • $\alpha(10) = \min(a_1, 1-a_2)$ • $\alpha(01) = \min(1-a_1, a_2)$
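The four weights of model A for a gene with two inputs, as a direct transcription of the formulas above (the function name `weights_model_a` is illustrative):

```python
def weights_model_a(a1: float, a2: float) -> dict:
    """Weight alpha of each input set of a two-input gene (model A)."""
    return {
        (1, 1): min(a1, a2),            # limited by the least active input
        (1, 0): min(a1, 1 - a2),
        (0, 1): min(1 - a1, a2),
        (0, 0): min(1 - a1, 1 - a2),    # equals 1 - max(a1, a2)
    }

print(weights_model_a(0.8, 0.3))
print(weights_model_a(1.0, 0.0))  # boolean inputs: only (1, 0) gets weight 1
```

With boolean activations exactly one input set receives weight 1 and the others 0; with intermediate activations several input sets contribute at once.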
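the equations of model A • if $y_{ij} = 1$, $\alpha'(y_{ij}) = a_j$ • if $y_{ij} = 0$, $\alpha'(y_{ij}) = 1 - a_j$ • the contribution of the whole input set is • $\alpha(Y_i) = \min_j \{\alpha'(y_{ij})\}$ • the contribution to the activation at time t+1 is the sum of the weights of the input sets which turn the gene on minus the sum of the weights of those which turn it off • $\mathrm{contr}_i = \sum_{Y_i \in Q^1_i} \alpha(Y_i) - \sum_{Y_i \in Q^0_i} \alpha(Y_i)$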
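Putting the pieces together, a sketch of the model A update for one gene with K inputs, assuming the clipped ramp σ(x) = min(1, max(0, x)) (the slides do not fix σ) and a truth table given as a dict from input set to output. All function names are illustrative.

```python
from itertools import product

def sigma(x: float) -> float:
    """Assumed transfer function: linear ramp clipped to [0, 1]."""
    return min(1.0, max(0.0, x))

def input_set_weight(y, a_in):
    """alpha(Y) = min_j alpha'(y_j), with alpha'(y_j) = a_j if y_j = 1 else 1 - a_j."""
    return min(aj if yj == 1 else 1.0 - aj for yj, aj in zip(y, a_in))

def contr(truth_table, a_in):
    """Sum of alpha over Q1 (input sets whose output is 1) minus sum over Q0."""
    total = 0.0
    for y in product((0, 1), repeat=len(a_in)):
        w = input_set_weight(y, a_in)
        total += w if truth_table[y] == 1 else -w
    return total

def update_gene(truth_table, a_in):
    """a_i(t+1) = sigma(contr_i(a(t))) for a single gene."""
    return sigma(contr(truth_table, a_in))

OR = {y: int(any(y)) for y in product((0, 1), repeat=2)}
AND = {y: int(all(y)) for y in product((0, 1), repeat=2)}
a_in = [0.8, 0.3]                      # current activations of the two inputs
print(update_gene(OR, a_in), update_gene(AND, a_in))
```

For these activations the OR gene is driven close to 1 (contributions 0.2 + 0.7 + 0.3 against 0.2), while the AND gene receives a net negative contribution and is switched off by σ.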
dynamical properties • let us start from a set of initial activations which all belong to {0,1} • for every gene, there is one input set which has contribution = 1, precisely the one which corresponds to the “right” 0’s and 1’s • all the other input sets provide a vanishing contribution (as there is at least one “1” corresponding to a “0” real value, or a “0” corresponding to a 1, which gives $\alpha'(y_{ij}) = 0$) • if the output corresponding to the only nonvanishing contribution is 1, then $\mathrm{contr}_i = 1$ and the next state is 1 (provided $\sigma(1) = 1$), otherwise $\mathrm{contr}_i = -1$ and it is 0 • therefore the system always remains on the corners of the unit hypercube • and the rule for determining $a_i(t+1)$ is the same as that of the original Kauffman model • the model therefore represents a true generalization of the Kauffman model
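The argument above can be checked exhaustively on random truth tables, assuming σ(1) = 1 and σ(-1) = 0 (here the clipped ramp): on every boolean input set, the model A update should coincide with the Kauffman truth-table lookup. The number of tables (100) and of inputs (K = 3) are illustrative.

```python
import random
from itertools import product

def sigma(x: float) -> float:
    return min(1.0, max(0.0, x))      # assumed: sigma(1) = 1, sigma(-1) = 0

def model_a_update(table, a_in):
    """Model A: sigma(sum of alpha over Q1 minus sum of alpha over Q0)."""
    total = 0.0
    for y in product((0, 1), repeat=len(a_in)):
        w = min(aj if yj else 1.0 - aj for yj, aj in zip(y, a_in))
        total += w if table[y] else -w
    return sigma(total)

def kauffman_update(table, x_in):
    """Original boolean rule: read the output off the truth table."""
    return table[tuple(x_in)]

rng = random.Random(0)
K = 3
mismatches = 0
for _ in range(100):                      # 100 random truth tables
    table = {y: rng.randint(0, 1) for y in product((0, 1), repeat=K)}
    for x in product((0, 1), repeat=K):   # every boolean input set
        if model_a_update(table, [float(v) for v in x]) != kauffman_update(table, x):
            mismatches += 1
print(mismatches)  # 0
```

With a σ such as $1 - e^{-x}$, which satisfies the slide's conditions but gives σ(1) < 1, the same check would report mismatches, since boolean states would drift off the corners of the hypercube.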