Pattern storage in gene-protein networks

Ronald Westra
Department of Mathematics, Maastricht University

Items in this Presentation
1. Problem formulation
2. Modeling of gene/protein interactions
3. Information Processing in Gene-Protein Networks
4. Information Storage in Gene-Protein Networks
5. Conclusions

1. Problem formulation
How much genome does an organism require to survive in this world? Some observations...

Minimal genome sizes
Organism               | Size   | Genome  | Genes/ORFs | Coding DNA | Lifestyle
Mycoplasma genitalium  | 500 nm | 580 Kbp | 477 genes  | 74%        | Obligatory parasitic endosymbiont
Nanoarchaeum equitans  | 400 nm | 460 Kbp | 487 ORFs   | 95%        | Obligatory parasitic endosymbiont
SARS CoV               | 100 nm | 30 kb   | 5 ORFs     | 98%        | Positive-strand RNA virus (not a retrovirus)

Organisms like Mycoplasma genitalium, Nanoarchaeum equitans and the SARS coronavirus exhibit a large number of complex, well-tuned behavioural patterns despite an extremely small genome. A pattern of behaviour is here the adequate conditional sequence of responses of the gene-protein interaction network to an external input: light, oxygen stress, pH, pheromones, and numerous organic and inorganic molecules.

Problem formulation
Questions:
* How do gene-protein networks perform computations, and how do they process real-time information?
* How is information stored in gene-protein networks?
* How do processing speed, computation power, and storage capacity relate to network properties?

CENTRAL THOUGHT [1]
What is the capacity of a gene-protein network to store input-output patterns, where the stimulus is the input and the behaviour is the output? How does the pattern storage capacity of an organism relate to the size of its genome n and the number of external stimuli m?

CENTRAL THOUGHT [2]
Conjecture: the task of reverse engineering a gene regulatory network from a time series of m observations is actually identical to the task of storing m patterns in that network. In the first case, an engineer tries to design a network that fits the observations; in the second case, Nature selects those networks/organisms that best perform the input-output mapping.

Requirements
For studying the pattern storage capacity of a gene-protein interaction system we need:
1. a suitable parametrized formal model;
2. a method for fixing the model parameters from the given set of input-output patterns.
We will visit these items in the following slides...

2. Modeling the Interactions between Genes and Proteins
A prerequisite for the successful reconstruction of gene-protein networks is an appropriate model of the dynamics of their interactions.

Components in Gene-Protein networks
• Genes: ON/OFF switches
• RNA & proteins: vectors of information exchange between genes
• External inputs: interact with higher-order proteins

General state space dynamics
The evolution of the n-dimensional state vector x (gene expressions) depends on the p-dimensional inputs u, the parameters θ and Gaussian white noise ξ.

[Figure: example of a general dynamics network topology, showing external inputs, genes/proteins, input coupling and interaction coupling.]

The general case is too complex
• Strongly dependent on unknown microscopic details
• Relevant parameters are unidentified and thus unknown
• Therefore, approximate interaction potentials and qualitative methods seem appropriate

1. Linear stochastic state-space models
Following Yeung et al. 2003 and others:
x: the vector (x1, x2, ..., xn), where xi is the relative gene expression of gene i
u: the vector (u1, u2, ..., up), where ui is the value of external input i (e.g. a toxic agent)
νξ(t): white Gaussian noise
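A minimal sketch of simulating such a model, assuming the linear form dx/dt = A x + B u + νξ(t). The matrices A and B, the step size and all numbers below are illustrative assumptions, not values from the slides. It uses an Euler-Maruyama step, in which the noise increment scales with the square root of the time step.

```python
import numpy as np

def simulate_linear_sde(A, B, u, x0, dt=0.01, steps=1000, nu=0.0, seed=0):
    """Euler-Maruyama integration of dx/dt = A x + B u + nu * xi(t).

    A: (n, n) gene-gene coupling, B: (n, p) input coupling,
    u: (p,) constant external input, x0: (n,) initial expression levels.
    Returns the trajectory as an array of shape (steps + 1, n).
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    traj = np.empty((steps + 1, x.size))
    traj[0] = x
    for k in range(steps):
        drift = A @ x + B @ u
        # The white-noise increment has standard deviation nu * sqrt(dt)
        x = x + dt * drift + nu * np.sqrt(dt) * rng.standard_normal(x.size)
        traj[k + 1] = x
    return traj

# Hypothetical example: two self-decaying genes, input 0 drives gene 0,
# and gene 0 activates gene 1.
A = np.array([[-1.0, 0.0],
              [0.5, -1.0]])
B = np.array([[1.0],
              [0.0]])
u = np.array([2.0])
traj = simulate_linear_sde(A, B, u, x0=np.zeros(2), steps=2000)
x_star = -np.linalg.solve(A, B @ u)   # noise-free steady state: A x + B u = 0
print(traj[-1], x_star)
```

With nu > 0 the trajectory fluctuates around the same steady state instead of converging to it.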

2. Piecewise Linear Models
Following Mestl, Plahte, Omholt 1995 and others: bil is a sum of step functions s+, s−.
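The sketch below assumes the common piecewise-linear form dxi/dt = κi · (combination of step functions) - γi xi and applies it to a hypothetical two-gene mutual-repression circuit. The parameters κ, γ and θ are illustrative choices, not values from Mestl et al. Because the step functions make the production terms constant within each threshold-bounded region, the circuit settles into one of two steady states depending on where it starts.

```python
def s_plus(x, theta):
    """Step function s+: 1 if x lies above the threshold theta, else 0."""
    return 1.0 if x > theta else 0.0

def s_minus(x, theta):
    """Step function s-: 1 if x lies below the threshold theta, else 0."""
    return 1.0 - s_plus(x, theta)

def toggle_switch(x0, kappa=1.0, gamma=1.0, theta=0.5, dt=0.01, steps=2000):
    """Euler integration of a hypothetical two-gene mutual-repression circuit:
        dx1/dt = kappa * s-(x2, theta) - gamma * x1
        dx2/dt = kappa * s-(x1, theta) - gamma * x2
    """
    x1, x2 = x0
    for _ in range(steps):
        dx1 = kappa * s_minus(x2, theta) - gamma * x1
        dx2 = kappa * s_minus(x1, theta) - gamma * x2
        x1, x2 = x1 + dt * dx1, x2 + dt * dx2
    return x1, x2

print(toggle_switch((0.9, 0.1)))   # settles near (1, 0): gene 1 ON, gene 2 OFF
print(toggle_switch((0.1, 0.9)))   # settles near (0, 1): gene 1 OFF, gene 2 ON
```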

3. More complex non-linear interaction models
Example: including quadratic terms.

Our mathematical framework for non-linear gene-protein interactions

3. Information processing in sparse hierarchic gene-protein networks
Consider a network as described before with only a few connections (= sparse) and in which a few genes/proteins control a considerable number of the others (= hierarchic).

Information Processing in Random Sparse Gene-Protein Interactions
[Figure: a random sparse network with n = 64, k = 2, and the largest cluster therein.]

Information Processing in Random Sparse Gene-Protein Interactions
Now consider the information processing time (= number of iterations) necessary to reach all nodes (proteins) as a function of the number of connections (= number of non-zero elements) in the network.

[Figure: phase transition from slow to fast processing.]
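A minimal sketch of this measurement, under assumptions not stated on the slides: each node regulates k distinct random targets, and a signal started at one node advances one hop per iteration. Because a sparse random network need not let the signal reach every node, the sketch reports the number of iterations until no new node is reached together with the fraction of nodes reached. All function names are hypothetical.

```python
import random

def random_sparse_network(n, k, seed=0):
    """Hypothetical random directed network: node i regulates k distinct
    randomly chosen targets (self-loops excluded)."""
    rng = random.Random(seed)
    return {i: rng.sample([j for j in range(n) if j != i], k) for i in range(n)}

def spread_time(adj, source=0):
    """Propagate a signal from `source`, one hop per iteration.
    Returns (iterations until no new node is reached, number of nodes reached)."""
    reached = {source}
    frontier = {source}
    iterations = 0
    while True:
        new = {t for node in frontier for t in adj[node]} - reached
        if not new:
            return iterations, len(reached)
        reached |= new
        frontier = new
        iterations += 1

# Sweep the number of links per node for n = 64, averaging over 20 random networks
n = 64
for k in range(1, 7):
    runs = [spread_time(random_sparse_network(n, k, seed=s)) for s in range(20)]
    mean_iter = sum(r[0] for r in runs) / len(runs)
    mean_frac = sum(r[1] for r in runs) / (len(runs) * n)
    print(f"k={k}: mean iterations {mean_iter:.1f}, mean fraction reached {mean_frac:.2f}")
```

Plotting both columns against k is one way to look for the slow-to-fast transition the figure refers to.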

4. Memory storage in gene-protein networks
* Ben-Hur, Siegelmann: Computation in Gene Networks, Chaos, January 2004
* Skarda and Freeman: How brains make chaos in order to make sense of the world, Behavioral and Brain Sciences, Vol. 10, 1987
Philosophy: information is stored in the network topology (weights, sparsity, hierarchy) and in the system dynamics.

Memory storage in gene-protein networks
• We assume a hierarchic, non-symmetric, and sparse gene/protein network (with k out of n possible connections per node) with linear state space dynamics.
• Suppose we want to store M patterns in the network.

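Linearized form of a subsystem
A first-order linear approximation of the system separates the state vector x and the inputs u: dx/dt = A x + B u, with A the gene-gene interaction matrix and B the input-gene coupling matrix.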
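One way to obtain A and B for a given non-linear model is to take its Jacobians at an operating point (x*, u*), so that deviations obey d(δx)/dt ≈ A δx + B δu. The sketch below estimates them by central differences for a hypothetical two-gene model with one quadratic term; the model, its coefficients and the function names are assumptions for illustration.

```python
import numpy as np

def numerical_jacobians(f, x_star, u_star, eps=1e-6):
    """Central-difference estimates of A = df/dx and B = df/du at (x_star, u_star)."""
    n, p = x_star.size, u_star.size
    A = np.empty((n, n))
    B = np.empty((n, p))
    for j in range(n):
        dx = np.zeros(n)
        dx[j] = eps
        A[:, j] = (f(x_star + dx, u_star) - f(x_star - dx, u_star)) / (2 * eps)
    for j in range(p):
        du = np.zeros(p)
        du[j] = eps
        B[:, j] = (f(x_star, u_star + du) - f(x_star, u_star - du)) / (2 * eps)
    return A, B

# Hypothetical non-linear model: linear couplings plus one quadratic term 0.1 * x0 * x1
A0 = np.array([[-1.0, 0.2],
               [0.3, -1.0]])
B0 = np.array([[1.0],
               [0.5]])

def f(x, u):
    return A0 @ x + B0 @ u + np.array([0.1 * x[0] * x[1], 0.0])

x_star = np.array([1.0, 2.0])
u_star = np.array([0.5])
A, B = numerical_jacobians(f, x_star, u_star)
print(A)   # the quadratic term adds 0.1 * x1 = 0.2 and 0.1 * x0 = 0.1 to row 0
print(B)
```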

Input-output pattern
The organism has (evolutionarily) learned to react to an external input u (e.g. a toxic agent or a viral infection) with a gene-protein activity x(t). This combination (x, u) is the input-output PATTERN.

Memory Storage = Network Reconstruction
Using these definitions, it is possible to map the problem of pattern storage onto the *solved* problem of gene network reconstruction with sparse estimation.

Information Pattern
Now suppose that we have M patterns that we want to store in the network:

Pattern Storage: method 1.0
The relation between the desired patterns (state derivatives, states and inputs) defines constraints on the system matrices A and B, which have to be computed.

Pattern Storage: method 1.0
Computing the optimal A and B for storing the patterns
The matrices A and B are sparse (most elements are zero). Using techniques from robust/sparse optimization, this problem can be defined as:
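A sketch of method 1.0 under stated assumptions: the patterns are collected as columns of Xdot (state derivatives), X (states) and U (inputs), the constraint is Xdot = A X + B U, and sparsity is promoted by minimizing the L1 norm of each row of [A B] (basis pursuit), written as a linear program and solved with SciPy's linprog. The exact objective on the slide is not recoverable, so the L1 objective is an assumption, the function names are hypothetical, and the data are synthetic.

```python
import numpy as np
from scipy.optimize import linprog

def l1_min(Phi, y):
    """Basis pursuit: minimize ||w||_1 subject to Phi @ w = y.
    Written as an LP with w = w_pos - w_neg and w_pos, w_neg >= 0."""
    d = Phi.shape[1]
    c = np.ones(2 * d)                   # sum(w_pos) + sum(w_neg) = ||w||_1
    res = linprog(c, A_eq=np.hstack([Phi, -Phi]), b_eq=y,
                  bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:d] - res.x[d:]

def store_patterns(Xdot, X, U):
    """Row-by-row sparse fit of Xdot = A X + B U.
    Xdot, X: (n, M) arrays; U: (p, M) array; one column per pattern."""
    n = X.shape[0]
    Phi = np.vstack([X, U]).T            # (M, n + p): one row per pattern
    W = np.array([l1_min(Phi, Xdot[i]) for i in range(n)])
    return W[:, :n], W[:, n:]            # A is (n, n), B is (n, p)

# Synthetic test: a random sparse [A B] with k non-zeros per row, M < n + p patterns
rng = np.random.default_rng(1)
n, p, M, k = 20, 5, 15, 3
W_true = np.zeros((n, n + p))
for i in range(n):
    W_true[i, rng.choice(n + p, size=k, replace=False)] = rng.standard_normal(k)
A_true, B_true = W_true[:, :n], W_true[:, n:]
X = rng.standard_normal((n, M))
U = rng.standard_normal((p, M))
Xdot = A_true @ X + B_true @ U
A, B = store_patterns(Xdot, X, U)
print("max error in A:", np.abs(A - A_true).max(), "in B:", np.abs(B - B_true).max())
```

The LP always returns matrices that satisfy the pattern constraints; whether they coincide with the generating matrices depends on how sparse those are relative to the number of patterns M, which this sketch does not check.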

[Figure: 1st-order phase transition from error-free memory retrieval at kC. Number of retrieval errors as a function of the number of non-zero entries k, with M = 150 patterns and N = 50000 genes.]

[Figure: 1st-order phase transition to error-free memory retrieval. Number of retrieval errors versus M, with fixed N = 50000 and k = 10.]

[Figure: critical number of patterns Mcrit versus the problem size N.]

Pattern Storage: method 2.0
A pattern corresponds to a converged state of the system, hence dx/dt = 0, i.e. A x + B u = 0. Therefore a sparse system Σ = {A, B} is sought that maps the inputs U to the patterns X, which leads to A X + B U = 0.

Computing optimal sparse matrices
• LP, subject to:
  - the condition for stationary equilibrium;
  - a condition to avoid A = B = 0.
• Avoid A = 0 by using degradation of proteins and auto-decay of genes: diag(A) < 0.
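A minimal sketch of method 2.0, assuming the LP minimizes the L1 norm of [A B] subject to the stationarity condition A X + B U = 0. To enforce diag(A) < 0 and thereby exclude A = B = 0, it fixes every auto-decay term to diag(A) = -1 (an assumed normalization; any negative constant works, because each row can be rescaled). With aii fixed, every row becomes a separate basis-pursuit problem. Function names and data are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def l1_min(Phi, y):
    """Minimize ||w||_1 subject to Phi @ w = y, as an LP with w = w_pos - w_neg."""
    d = Phi.shape[1]
    res = linprog(np.ones(2 * d), A_eq=np.hstack([Phi, -Phi]), b_eq=y,
                  bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:d] - res.x[d:]

def store_equilibria(X, U, decay=1.0):
    """Sparse A, B with A @ X + B @ U = 0 and diag(A) = -decay.
    X: (n, M) patterns (stationary states); U: (p, M) corresponding inputs."""
    n, p = X.shape[0], U.shape[0]
    A, B = np.zeros((n, n)), np.zeros((n, p))
    for i in range(n):
        others = [j for j in range(n) if j != i]
        # Row i of A X + B U = 0 with a_ii = -decay:
        #   sum_{j != i} a_ij x_j + b_i u = decay * x_i
        Phi = np.vstack([X[others], U]).T
        w = l1_min(Phi, decay * X[i])
        A[i, i] = -decay
        A[i, others] = w[:n - 1]
        B[i] = w[n - 1:]
    return A, B

rng = np.random.default_rng(2)
n, p, M = 30, 10, 8
X = rng.standard_normal((n, M))
U = rng.standard_normal((p, M))
A, B = store_equilibria(X, U)
print("equilibrium residual:", np.abs(A @ X + B @ U).max())
print("non-zeros in A:", np.count_nonzero(np.abs(A) > 1e-8),
      "in B:", np.count_nonzero(np.abs(B) > 1e-8))
```

The constraint only makes each pattern an equilibrium; whether it is also stable (all eigenvalues of A with negative real part) has to be checked separately, for instance with np.linalg.eigvals(A).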

The sparsity in A and B
The sparsity in the gene/protein interaction matrix A is kA: the number of non-zero elements in A. This can be scaled to the size of A, N, and we obtain pA = kA/N. Similarly, for the input coupling B: pB = kB/P.
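A small helper for these quantities. It reads kA and kB as the average number of non-zero entries per gene (row), in line with the earlier "k out of n possible connections per node"; this reading, the function name and the tolerance are assumptions.

```python
import numpy as np

def sparsities(A, B, tol=1e-8):
    """pA = kA / N and pB = kB / P, reading kA and kB as the average number of
    non-zero entries per gene (row) of A (N x N) and B (N x P)."""
    N, P = B.shape
    kA = np.count_nonzero(np.abs(A) > tol) / N
    kB = np.count_nonzero(np.abs(B) > tol) / N
    return kA / N, kB / P

A = -np.eye(4)
A[0, 1] = 0.5          # one gene-gene link besides the auto-decay terms
B = np.zeros((4, 2))
B[:, 0] = 1.0          # every gene responds to input 0
pA, pB = sparsities(A, B)
print(pA, pB)
```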

Results:
[Figure: the matrices A (gene-gene) and B (input-gene).]


[Figure: sparsity of A (gene-gene) and B (input-gene) versus the number of stored patterns.]
There are three distinct regions with different ‘learning’ strategies, separated by second-order phase transitions.

Sparsity versus the number of stored patterns
Region I: all information is stored exclusively in B.
Region II: information is preferably stored in A.
Region III: no clear preference for A or B.
(The figure further marks the regions of highest ‘order’ and highest ‘disorder’.)

Sparsity versus the number of stored patterns
Region I: ‘impulsive’; Region II: ‘rational’; Region III: ‘hybrid’.

Phase transitions and entropy
The entropy of the macroscopic system relates to the relative fractions of connections pA and pB as:
As A and B are indiscernible, the total entropy is:

Information entropy
The entropy of the microscopic system A relates to its degree distribution, i.e. to the number of connections fi of each node i. Let P(v) be the probability that a given node has v outgoing connections.
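Assuming the standard Shannon form S = -Σv P(v) ln P(v) (the exact expression on the slide is not recoverable), the entropy of the empirical out-degree distribution can be computed as below. The sketch also assumes that aij is the influence of gene j on gene i, so the outgoing connections of node j are the non-zero off-diagonal entries of column j; the function names are hypothetical.

```python
import math
from collections import Counter

def out_degrees(A, tol=1e-12):
    """f_j: number of non-zero off-diagonal entries in column j of A,
    i.e. the number of genes regulated by gene j."""
    n = len(A)
    return [sum(1 for i in range(n) if i != j and abs(A[i][j]) > tol)
            for j in range(n)]

def degree_entropy(degrees):
    """Shannon entropy S = -sum_v P(v) ln P(v) of the empirical degree distribution."""
    n = len(degrees)
    return -sum((c / n) * math.log(c / n) for c in Counter(degrees).values())

A = [[-1.0, 0.5, 0.0],
     [0.0, -1.0, 0.2],
     [0.3, 0.4, -1.0]]
f = out_degrees(A)        # [1, 2, 1]
S = degree_entropy(f)
print(f, S)
```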

Information entropy [2]
With P the Laplace distribution, for large networks the average entropy per node converges to an expression involving Euler's constant.

Information gain per node
This also allows the computation of the gain in information entropy if one connection is added. If this formalism is applied to our network structure, we obtain:
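Under the same assumption of a Shannon entropy over the out-degree distribution, the gain from adding one connection can be estimated directly: increase the out-degree of one node by one and compare the entropies before and after. The averaging over nodes in mean_gain is an illustrative choice, not necessarily the definition used on the slide, and all names are hypothetical.

```python
import math
from collections import Counter

def degree_entropy(degrees):
    """Shannon entropy (in nats) of the empirical degree distribution."""
    n = len(degrees)
    return -sum((c / n) * math.log(c / n) for c in Counter(degrees).values())

def entropy_gain(degrees, node):
    """Change in entropy when `node` gains one additional outgoing connection."""
    after = list(degrees)
    after[node] += 1
    return degree_entropy(after) - degree_entropy(degrees)

def mean_gain(degrees):
    """Average gain if the extra connection goes to a uniformly random node."""
    return sum(entropy_gain(degrees, i) for i in range(len(degrees))) / len(degrees)

print(entropy_gain([1, 1, 1, 1], 0))   # from a uniform degree sequence: gain > 0
print(mean_gain([1, 2]))
```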

Information gain per node
Again, the three learning strategies {impulsive, rational, hybrid} are clearly visible.
[Figure: left, the entropy S for n = 100, p = 30, based on 1180 observations; right, the gain in entropy for the same data set.]

Relation between sparsities
[Figure: relation between pA = kA/n and pB = kB/p, averaged over 10116 measurements.]

5. Conclusions
• Non-linear time-invariant state space models for gene-protein networks exhibit a range of complex behaviours for storing input-output patterns in sparse representations.
• In this model, information processing (= computing) and pattern storage (= learning) exhibit multiple distinct 1st-order (discontinuous) and 2nd-order (continuous) phase transitions.
• There are two second-order phase transitions that divide the network learning into three distinct regions: ‘impulsive’, ‘rational’ and ‘hybrid’.

Other members of the trans-national University Limburg Bioinformatics Research Team
University of Hasselt (Belgium):
• Goele Hollanders (PhD student)
• Geert Jan Bex
• Marc Gyssens
University of Maastricht (Netherlands):
• Stef Zeemering (PhD student)
• Karl Tuyls
• Ralf Peeters
