
Unraveling the Protein Universe: Power Laws and Scale-Free Networks in Genome Evolution

Explore the distribution of protein sequences and folds in the Protein Universe through the lens of power laws and scale-free networks. Discover how evolution shapes the structure of genomes and networks, paving the way for future genomics advancements.

saiz




Presentation Transcript


  1. Power laws, scale-free networks, the structure of the Protein Universe and genome evolution. “Nothing in (computational) biology makes sense except in the light of evolution” (after Theodosius Dobzhansky, 1970)

  2. The Protein Universe. Total number of potential protein sequences: ~$20^{200}$ (20 amino acids at each of 200 positions). Total number of existing protein sequences: $10^{10}$ to $10^{11}$. GenBank (2002): ~$10^{6}$. What is the distribution of these sequences in the sequence and structure spaces?
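The size of the sequence space is easy to check: with 20 amino acids at each of 200 positions there are $20^{200} \approx 10^{260}$ possible sequences, so even $10^{11}$ existing sequences occupy a vanishing fraction of it. A short sketch of the arithmetic; the constant names are illustrative, and the only inputs are the figures on the slide.

```python
import math

# Slide figures: 20 amino acids; the exponent 200 corresponds to sequences 200 residues long.
AMINO_ACIDS = 20
LENGTH = 200

possible = AMINO_ACIDS ** LENGTH  # exact integer count of length-200 sequences
log10_possible = LENGTH * math.log10(AMINO_ACIDS)

existing_upper = 10 ** 11  # upper estimate of existing sequences from the slide
log10_fraction = math.log10(existing_upper) - log10_possible

print(f"20^200 has {len(str(possible))} digits, i.e. about 10^{log10_possible:.1f}")
print(f"at most about 10^{log10_fraction:.0f} of the space is realized")
```

Python's arbitrary-precision integers give the exact count; the logarithm is what one would use for the order-of-magnitude comparison.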

  3. The distribution of folds by the number of families in the Protein Data Bank (PDB).

  4. There are many folds with 1-3 families but only a few folds with numerous families. Altogether, there might be as many as 5,000-10,000 folds, but >90% of the families belong to <1,000 common folds. Mapping the Protein Universe is feasible!

  5. Size distributions of domain families in two genomes, Thermotoga maritima and C. elegans (log-log plot).

  6. The size distributions of folds and families are approximated by a power law: $f(i) \sim i^{-k}$, with $k$ between about 1 and 3. Power laws describe the distributions of many quantities in biological and other contexts, e.g., the node degrees (numbers of connections) in metabolic and protein interaction networks, the Internet and social networks, citations of scientific papers, populations of cities, personal wealth… Networks whose degree distributions follow a power law are known as scale-free: they look the same at different scales. The small number of highly connected nodes (hubs) in scale-free networks determines their small-world properties and error tolerance.
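The exponent $k$ in $f(i) \sim i^{-k}$ is often read off as the slope of the straight line on a log-log plot like the one on the previous slide. A minimal sketch, assuming noise-free synthetic data rather than real genome counts; the function name `fit_power_law_exponent` is hypothetical, and least squares on log-transformed counts is a crude estimator (maximum-likelihood fits are generally preferred for real, noisy data).

```python
import math


def fit_power_law_exponent(sizes, counts):
    """Estimate k in f(i) ~ i^-k by least squares on log f versus log i.

    Zero counts are skipped because their logarithm is undefined.
    """
    xs, ys = [], []
    for i, f in zip(sizes, counts):
        if f > 0:
            xs.append(math.log(i))
            ys.append(math.log(f))
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return -slope  # f ~ i^-k, so the log-log slope is -k


# Synthetic, noise-free example (not real genome data): f(i) = 1000 * i^-2
sizes = list(range(1, 51))
counts = [1000 * i ** -2 for i in sizes]
k_hat = fit_power_law_exponent(sizes, counts)
```

On exact power-law input the fit recovers the exponent; on real family-size data the tail is sparse and noisy, which is why binning or likelihood-based methods are usually used instead.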

  7. Scale-free networks evolve through preferential attachment: the rich get richer or the fit get fitter
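Preferential attachment can be illustrated with a small growth simulation in the style of the Barabási-Albert model (named here as a standard example; the slide does not specify a particular model): each new node links to m existing nodes chosen with probability proportional to their current degree. The function `preferential_attachment`, its parameters and the network size are illustrative assumptions.

```python
import random
from collections import Counter


def preferential_attachment(n_nodes, m, seed=0):
    """Grow a graph in which each new node links to m distinct existing nodes,
    chosen with probability proportional to their current degree."""
    rng = random.Random(seed)
    # Start from a small complete graph on m + 1 nodes.
    edges = [(u, v) for u in range(m + 1) for v in range(u + 1, m + 1)]
    # Each node appears here once per incident edge, so a uniform pick from
    # this list is a degree-proportional pick ("the rich get richer").
    endpoints = [x for edge in edges for x in edge]
    for new in range(m + 1, n_nodes):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in targets:
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges


edges = preferential_attachment(2000, 2)
degree = Counter(x for edge in edges for x in edge)
```

Most nodes keep a degree close to m, while the earliest nodes accumulate links and become hubs, giving the heavy-tailed degree distribution described on the previous slide.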

  8. Domain accretion in the evolution of orthologous sets of eukaryotic genes (figure: domain architectures of orthologs from yeasts, C. elegans, A. thaliana and D. melanogaster, with domains labeled Br, C1, C2, C3, Zk and Ub).

  9. The distribution of proteins by the number of domains is exponential (if repeats are not counted)! However, we get a power law if repeats are included.
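An exponential distribution gives a straight line on a semi-log plot (log f against i), whereas a power law gives a straight line on a log-log plot (log f against log i). A minimal sketch that compares the two linear fits by R², using made-up synthetic counts rather than the data behind the slide; `compare_fits` and `r_squared` are hypothetical helpers.

```python
import math


def r_squared(xs, ys):
    """Coefficient of determination of the least-squares line through (xs, ys)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


def compare_fits(domains, counts):
    """Return (R^2 of log f vs i, R^2 of log f vs log i).

    A better semi-log fit suggests an exponential; a better log-log fit
    suggests a power law.
    """
    pts = [(i, f) for i, f in zip(domains, counts) if f > 0]
    log_f = [math.log(f) for _, f in pts]
    semilog = r_squared([i for i, _ in pts], log_f)
    loglog = r_squared([math.log(i) for i, _ in pts], log_f)
    return semilog, loglog


# Hypothetical counts of proteins with i domains
domains = list(range(1, 16))
exp_counts = [5000 * math.exp(-0.8 * i) for i in domains]
pow_counts = [5000 * i ** -2.5 for i in domains]

e_semi, e_log = compare_fits(domains, exp_counts)
p_semi, p_log = compare_fits(domains, pow_counts)
```

Counting or not counting repeated domains changes the counts fed into such a comparison, which is how the same proteome can look exponential in one treatment and power-law-like in the other.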

  10. Domain connectivity network

  11. The degree distribution of the domain connectivity graph is roughly approximated by a power law.
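One common way to build a domain connectivity graph, assumed here because the slides do not define it, is to make each domain a node and link two distinct domains whenever they co-occur in at least one protein. The sketch below uses hypothetical architectures with placeholder domain labels and tabulates the degree distribution to which a power-law fit would be applied; a repeat-only protein contributes a node but no edges.

```python
from collections import Counter
from itertools import combinations


def domain_graph(architectures):
    """Map each domain to the set of distinct domains it co-occurs with."""
    neighbors = {}
    for arch in architectures:
        distinct = sorted(set(arch))  # repeats of one domain add no new edge
        for d in distinct:
            neighbors.setdefault(d, set())
        for a, b in combinations(distinct, 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors


def degree_distribution(neighbors):
    """Number of domains with each connectivity (node degree)."""
    return Counter(len(nbrs) for nbrs in neighbors.values())


# Hypothetical domain architectures (labels are placeholders, not real domains)
proteins = [
    ["A", "B", "C"],
    ["C", "B"],
    ["C", "D"],
    ["D", "A"],
    ["E", "E", "E"],  # repeat-only protein: a node with degree 0
    ["C", "F", "D"],
]
graph = domain_graph(proteins)
dist = degree_distribution(graph)
```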

  12. Evolution of protein domain families in genomes can be described by simple models which involve domain birth, death and innovation (“invention”) as elementary events

  13. BDIM (Birth, Death and Innovation Model): the elementary events are domain birth, death and innovation.
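The three elementary events can be simulated directly as a stochastic process. A minimal Gillespie-style sketch, assuming per-family birth and death rates proportional to family size ($\lambda_i = 0.9\,i$, $\delta_i = i$), an innovation rate $\nu = 5$ and a cap of $N = 100$ domains per family; all of these values and the names `simulate_bdim`, `birth_rate` and `death_rate` are hypothetical. Innovation creates a family of size 1, birth adds a domain, and death removes one, with a family that loses its last domain disappearing.

```python
import random

N_MAX = 100  # hypothetical maximum family size N


def birth_rate(size):
    # Hypothetical per-family birth rate: 0.9 per domain, no birth out of class N.
    return 0.9 * size if size < N_MAX else 0.0


def death_rate(size):
    # Hypothetical per-family death rate: 1.0 per domain.
    return 1.0 * size


def simulate_bdim(nu, t_max, seed=0):
    """Gillespie simulation of birth, death and innovation events.

    Returns the list of family sizes at time t_max.
    """
    rng = random.Random(seed)
    families = []  # one entry per family: its number of domains
    t = 0.0
    while True:
        births = [birth_rate(s) for s in families]
        deaths = [death_rate(s) for s in families]
        total = nu + sum(births) + sum(deaths)
        t += rng.expovariate(total)  # waiting time to the next event
        if t > t_max:
            return families
        r = rng.uniform(0.0, total)  # pick an event with probability proportional to its rate
        if r < nu:
            families.append(1)  # innovation: a new single-domain family
            continue
        r -= nu
        for idx, rate in enumerate(births):
            if r < rate:
                families[idx] += 1  # birth: the family gains a domain
                break
            r -= rate
        else:
            for idx, rate in enumerate(deaths):
                if r < rate:
                    families[idx] -= 1  # death: the family loses a domain
                    if families[idx] == 0:
                        families.pop(idx)  # a family with no domains is gone
                    break
                r -= rate


families = simulate_bdim(nu=5.0, t_max=100.0)
```

Recomputing every rate at each step keeps the sketch short but costs time proportional to the number of families per event; that is acceptable for toy sizes only.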

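  14. BDIM: the layout of the model. Domain families are grouped into size classes $i = 1, 2, \ldots, N$ by the number of domains they contain, where $N$ is the maximum number of domains in a family and $f_i$ is the number of families in size class $i$. A family in class $i$ moves to class $i+1$ at the per-family birth rate $\lambda_i$ and to class $i-1$ at the per-family death rate $\delta_i$; new families enter class 1 at the innovation rate $\nu$.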
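In code, the state of the model is just the vector $(f_1, \ldots, f_N)$. A minimal sketch, assuming the size of every family is already known (for example, counted from a genome's domain annotation); `size_class_counts` and the toy sizes are hypothetical.

```python
from collections import Counter


def size_class_counts(family_sizes, n_max):
    """Return [f_1, ..., f_N], where f_i is the number of families with exactly i domains."""
    counts = Counter(family_sizes)
    if any(size < 1 or size > n_max for size in counts):
        raise ValueError("every family size must lie in 1..N")
    return [counts.get(i, 0) for i in range(1, n_max + 1)]


# Hypothetical family sizes (domains per family) for a toy genome with N = 5
f = size_class_counts([1, 1, 2, 5, 1, 3], n_max=5)
total_families = sum(f)  # F, the total number of families
```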

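  15. BDIM: the basic equations. The rate of change of $f_i$ combines gains from birth in class $i-1$ and death in class $i+1$ with losses from birth and death in class $i$:
$$\frac{df_i(t)}{dt} = \lambda_{i-1} f_{i-1} - \lambda_i f_i - \delta_i f_i + \delta_{i+1} f_{i+1}, \quad 1 < i < N.$$
In class 1, innovation takes the place of birth from a "class 0":
$$\frac{df_1(t)}{dt} = \nu - \lambda_1 f_1 - \delta_1 f_1 + \delta_2 f_2.$$
In class $N$ there is no birth into, and no death from, class $N+1$:
$$\frac{df_N(t)}{dt} = \lambda_{N-1} f_{N-1} - \delta_N f_N.$$
$F(t) = \sum_i f_i(t)$ is the total number of families.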
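Setting all derivatives to zero gives the steady state recursively: the class-$N$ equation forces the net flow $\lambda_{N-1} f_{N-1} - \delta_N f_N$ to vanish, working downward every net flow $\lambda_i f_i - \delta_{i+1} f_{i+1}$ vanishes, and the class-1 equation then gives $f_1 = \nu/\delta_1$, so $f_{i+1} = f_i \lambda_i / \delta_{i+1}$. The sketch below, with hypothetical linear rates $\lambda_i = 0.8\,i$, $\delta_i = i$, $\nu = 1$ and $N = 20$ (and hypothetical function names), checks that forward-Euler integration of the equations from an empty start converges to that closed form.

```python
def bdim_rhs(f, lam, delta, nu):
    """Right-hand side of the BDIM equations.

    f[i], lam[i], delta[i] refer to size class i + 1; lam[-1] is unused
    because there is no birth out of class N.
    """
    n = len(f)
    df = [0.0] * n
    for i in range(n):
        gain_birth = lam[i - 1] * f[i - 1] if i > 0 else nu  # innovation feeds class 1
        loss_birth = lam[i] * f[i] if i < n - 1 else 0.0
        loss_death = delta[i] * f[i]
        gain_death = delta[i + 1] * f[i + 1] if i < n - 1 else 0.0
        df[i] = gain_birth - loss_birth - loss_death + gain_death
    return df


def bdim_equilibrium(lam, delta, nu):
    """Steady state: f_1 = nu / delta_1 and f_{i+1} = f_i * lam_i / delta_{i+1}."""
    f = [nu / delta[0]]
    for i in range(len(delta) - 1):
        f.append(f[-1] * lam[i] / delta[i + 1])
    return f


def integrate(lam, delta, nu, dt, n_steps):
    """Forward-Euler integration starting from no families at all."""
    f = [0.0] * len(delta)
    for _ in range(n_steps):
        df = bdim_rhs(f, lam, delta, nu)
        f = [x + dt * d for x, d in zip(f, df)]
    return f


N = 20
lam = [0.8 * i for i in range(1, N + 1)]    # hypothetical lambda_i = 0.8 i
delta = [1.0 * i for i in range(1, N + 1)]  # hypothetical delta_i = i
nu = 1.0
f_eq = bdim_equilibrium(lam, delta, nu)
f_num = integrate(lam, delta, nu, dt=0.01, n_steps=20_000)
```

Forward Euler is used only for brevity; the step size here is small enough relative to the largest rates for the iteration to be stable.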

  16. Power approximation vs. power asymptote under the linear BDIM (power asymptote with $k = a - b - 1$).

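  17. Linear BDIM: Size Does Matter? Per-domain birth rate: $\lambda_i / i = \lambda(1 + a/i)$; per-domain death rate: $\delta_i / i = \delta(1 + b/i)$. (Figure: the per-domain rates $\lambda_i/i$ and $\delta_i/i$ plotted against family size $i$.)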
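Under the linear rates $\lambda_i = \lambda(i + a)$ and $\delta_i = \delta(i + b)$, the steady-state recursion $f_{i+1} = f_i \lambda_i / \delta_{i+1}$ gives $f_i \propto (\lambda/\delta)^{i}\,\Gamma(i + a)/\Gamma(i + 1 + b)$, and the gamma ratio behaves like $i^{a-b-1}$ for large $i$. So when $\lambda = \delta$ the log-log slope of $f_i$ should approach $a - b - 1$, the quantity quoted on the previous slide. A numerical check, with hypothetical values $a = 0.5$, $b = 2$, $\lambda = \delta = 1$ and a hypothetical function name; working in logarithms avoids underflow for large $i$.

```python
import math


def linear_bdim_equilibrium_log(lam, delta, a, b, n_max, nu=1.0):
    """log f_i for the linear BDIM with lam_i = lam*(i + a) and delta_i = delta*(i + b),
    using f_1 = nu / delta_1 and f_{i+1} = f_i * lam_i / delta_{i+1}."""
    log_f = [math.log(nu / (delta * (1 + b)))]
    for i in range(1, n_max):
        log_f.append(log_f[-1] + math.log(lam * (i + a)) - math.log(delta * (i + 1 + b)))
    return log_f  # log_f[i - 1] is log f_i


a, b = 0.5, 2.0
log_f = linear_bdim_equilibrium_log(1.0, 1.0, a, b, 10_000)
i1, i2 = 1_000, 10_000
slope = (log_f[i2 - 1] - log_f[i1 - 1]) / (math.log(i2) - math.log(i1))
```

With $\lambda \ne \delta$ the extra factor $(\lambda/\delta)^i$ bends the log-log curve, so the pure power asymptote holds only in the balanced case.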

  18. Conclusions
  • The world, including biology, is full of power-law distributions and scale-free networks.
  • Their emergence seems to be explained by relatively simple evolutionary models.

  19. Genomics today; tomorrow?? “There are two kinds of science: physics and stamp collection” (attributed to Ernest Rutherford)
