The 7 Bridges in Königsberg and Compositional Representation of Protein Sequences. Bailin Hao (郝柏林) (ITP & BGI, CAS), Huimin Xie (谢惠民) (Math Dept., Suzhou U), Shuyu Zhang (张淑誉) (IP, Acad. Sinica).
Compositional Approach in Prokaryote Phylogeny
• Justification of using K-tuples instead of primary protein sequences.
• Problem of uniqueness of reconstruction of a protein sequence from its constituent K-tuples.
• Picking up a special class of proteins without biological knowledge.
7 bridges in Königsberg
• Euler (1736): can one walk cross each of the 7 bridges once and only once? All 4 nodes are odd, so the answer is: No!
• Dénes König, Theory of Finite and Infinite Graphs, 1st ed. (1936); Birkhäuser (1990)
• “From Königsberg to König’s book, So runs the graphic tale…”
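Euler's parity argument can be checked in a few lines. The sketch below is only an illustration: the node labels `N`, `S`, `K`, `L` (north bank, south bank, and the two islands) are names chosen here, not taken from the slides. It counts how many bridges touch each land mass and applies the rule that a connected undirected graph has an Euler path only if it has 0 or 2 nodes of odd degree.

```python
from collections import Counter

# The 7 bridges as undirected edges between 4 land masses.
# Labels are illustrative: N = north bank, S = south bank, K and L = islands.
bridges = [
    ("N", "K"), ("N", "K"),   # two bridges from the north bank to island K
    ("S", "K"), ("S", "K"),   # two bridges from the south bank to island K
    ("K", "L"),               # one bridge between the islands
    ("N", "L"), ("S", "L"),   # one bridge from each bank to island L
]

# Degree of a node = number of bridge ends that touch it.
degree = Counter()
for u, v in bridges:
    degree[u] += 1
    degree[v] += 1

odd_nodes = sorted(node for node, d in degree.items() if d % 2 == 1)

# The graph is connected, so only the parity condition needs checking.
has_euler_path = len(odd_nodes) in (0, 2)

print(dict(degree))      # degrees: N=3, K=5, S=3, L=3
print(odd_nodes)         # all four nodes are odd
print(has_euler_path)    # False
```

With all four degrees odd, neither an Euler loop nor an Euler path exists, which is the "No!" on the slide.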
Basic Notions
• A graph G=(V, E), where V is a set of nodes (vertices) and E is a set of edges (bonds).
• Edges: undirected, (u,v)=(v,u), u and v adjacent; directed, (u,v) differs from (v,u), u incident to v.
• A weight may be associated with (u,v): cost, distance, transfer function, reaction rate, etc.
• Eulerian graph: there is a path in which each edge appears once and only once.
• Hamiltonian graph: there is a path in which each vertex appears once and only once. Finding a Hamiltonian cycle of minimal weight is the Traveling Salesman Problem (TSP).
• [Figures: an Euler path; an Euler loop]
• Euler graph: has an Euler loop
• Semi-Euler graph: has an Euler path, but no loop
• Problem of Eulerian loops: simple, known solutions
• Problem of Hamiltonian paths: much harder
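Hamiltonian Loops: much harder
• [Figure: two example graphs, marked "No!" and "Yes!" according to whether a Hamiltonian loop exists; labels: 10 nodes, 15 arcs, $d_i = 3$ at all nodes]
• Traveling Salesman Problem
• NP-hard problems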
Graph = nodes + arcs
• Directed, labeled arcs and nodes
• Simple graph: no rings at nodes (no arc from a node $i$ back to itself) and no repeated arcs (at most one arc from node $i$ to node $j$)
• Indegree: $d_{in}(i)$
• Outdegree: $d_{out}(i)$
• Euler graph: $d_{in}(i) = d_{out}(i) \equiv d_i$ for every node $i$
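Simple Euler Graph
• Diagonal matrix: $M = \mathrm{diag}(d_1, d_2, \ldots, d_n)$
• Adjacency matrix: $A = \{a_{ij}\}$, with $a_{ij} = 1$ if there is an arc from node $i$ to node $j$ and $a_{ij} = 0$ otherwise; $a_{ii} = 0$
• Kirchhoff matrix: $C = M - A$, with $\sum_{i} C_{ij} = \sum_{j} C_{ij} = 0$, hence $\det(C) = 0$
• All first minors (cofactors) of $C$ are equal. Denote this common minor by $\Delta$.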
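Number of Euler loops in a simple Euler graph
• BEST Theorem: N. G. de Bruijn, T. van Aardenne-Ehrenfest, C. A. B. Smith, W. T. Tutte
• $$e(G) = \Delta \prod_{i} (d_i - 1)!$$ where $\Delta$ is the common minor of the Kirchhoff matrix $C = M - A$.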
Number of Eulerian loops in a general Euler graph G
• Some $a_{ii} \neq 0$: rings
• Some $a_{ij} > 1$: parallel arcs
• Putting auxiliary nodes on these rings and parallel arcs makes the graph simple.
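• No need to work with a bigger A matrix: just allow some $a_{ii} \neq 0$ and $a_{ij} > 1$ in the original A.
• Eliminate the redundancy caused by unlabeled arcs.
• Modified BEST Theorem: $$e(G) = \frac{\Delta \prod_{i} (d_i - 1)!}{\prod_{ij} a_{ij}!}$$ where $\Delta$ is the common minor of the Kirchhoff matrix $C = M - A$.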
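The modified BEST theorem can be turned into a short counting routine. The following is a minimal sketch under stated assumptions, not the authors' program: nodes are taken to be (K−1)-tuples and arcs K-tuples, an auxiliary arc from the last node back to the first closes the path into a loop, only the original K-tuple arcs enter the $a_{ij}!$ denominator (the auxiliary arc is treated as distinguishable), and the first and last (K−1)-tuples of the sequence are held fixed. The function names `det` and `count_reconstructions` are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from math import factorial


def det(matrix):
    """Exact determinant by Gaussian elimination over Fractions."""
    m = [[Fraction(x) for x in row] for row in matrix]
    n = len(m)
    result = Fraction(1)
    for col in range(n):
        # Find a row with a non-zero entry in this column to use as pivot.
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return 0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return int(result)


def count_reconstructions(seq, k):
    """Count sequences with the same K-tuples (k >= 2) and the same end nodes."""
    # Arcs: each K-tuple goes from its first k-1 letters to its last k-1 letters.
    arcs = Counter((seq[i:i + k - 1], seq[i + 1:i + k])
                   for i in range(len(seq) - k + 1))
    nodes = sorted({node for arc in arcs for node in arc})
    index = {node: i for i, node in enumerate(nodes)}
    n = len(nodes)

    # Adjacency matrix with multiplicities (rings and parallel arcs allowed).
    a = [[0] * n for _ in range(n)]
    for (u, v), mult in arcs.items():
        a[index[u]][index[v]] = mult
    # Auxiliary arc from the last node to the first node closes the path.
    a[index[seq[-(k - 1):]]][index[seq[:k - 1]]] += 1

    d = [sum(row) for row in a]  # outdegrees, equal to indegrees here
    c = [[(d[i] if i == j else 0) - a[i][j] for j in range(n)]
         for i in range(n)]      # Kirchhoff matrix C = M - A
    delta = det([row[1:] for row in c[1:]])  # common minor of C

    loops = delta
    for di in d:
        loops *= factorial(di - 1)

    # Remove the redundancy from interchangeable (unlabeled) original arcs.
    redundancy = 1
    for mult in arcs.values():
        redundancy *= factorial(mult)
    return loops // redundancy


print(count_reconstructions("MALSLFTVGQ", 5))  # 1: every 4-tuple node is distinct
print(count_reconstructions("ABCBDBE", 2))     # 2: ABCBDBE and ABDBCBE
print(count_reconstructions("AABAAB", 2))      # 3: AABAAB, AAABAB, ABAAAB
```

Exact rational arithmetic is used for the determinant so that large factorials do not introduce rounding errors; a floating-point determinant would be faster but could miscount for highly repetitive sequences.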
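ANPA_PSEAM, 82 AA, K=5
• Sequence: MALSLFTVGQLIFLFWTMRITEASPDPAAKAAPAAAAAPAAAAPDTASDAAAAAALTAANAKAAAELTAANAAAAAAATARG
• [Figure: graph of the sequence at K=5, with nodes labeled by 4-letter strings: MALS, ALSL, LSLF, SLFT, LFTV, FTVG, TVGQ, VGQL, …, AKAA]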
ANPA_PSEAM, 82 AA
• Antifreeze protein A/B precursor in winter flounder
• Alanine-rich, amphiphilic
• [Figure labels: 6 rings; auxiliary arc]
• From pdb.seq, a special selection of SWISSPROT: 2821 − 1 = 2820 proteins (May 2000)
• R: number of reconstructed AA sequences from a given protein decomposition
Compositional Representation of Proteins
• A protein of length $N$ is decomposed into $L = N - K + 1$ overlapping K-tuples $\{W_i^K\}_{i=1}^{L}$, or into $M$ distinct K-tuples with their multiplicities, $\{W_j^K, n_j\}_{j=1}^{M}$.
• The collection $\{W_i^K\}$ or $\{W_j^K, n_j\}$ may be used as an equivalent representation of the original protein sequence.
• A result that seems trivial upon further reflection: random AA sequences have unique reconstruction as well.
• Compositional representation works equally well for random AA sequences and for most protein sequences.
• A given realization of a short random AA sequence is as specific as a real protein sequence.
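As an illustration of the two forms of the representation, the sketch below lists the overlapping K-tuples of a sequence and then collapses them into distinct K-tuples with counts. The helper names `k_tuples` and `composition` are chosen here for illustration and do not come from the slides.

```python
from collections import Counter


def k_tuples(seq, k):
    """All L = N - K + 1 overlapping K-tuples, in order of appearance."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def composition(seq, k):
    """Distinct K-tuples with their multiplicities n_j."""
    return Counter(k_tuples(seq, k))


# First 10 residues of the ANPA_PSEAM sequence shown earlier, K = 5.
fragment = "MALSLFTVGQ"
print(k_tuples(fragment, 5))
# ['MALSL', 'ALSLF', 'LSLFT', 'SLFTV', 'LFTVG', 'FTVGQ']

# A repetitive toy string gives multiplicities larger than 1.
print(composition("AAAAB", 2))
# Counter({'AA': 3, 'AB': 1})
```

The ordered list keeps positional information only implicitly, through the overlaps between neighbouring K-tuples; the counted form discards order entirely, which is why the uniqueness of reconstruction becomes the central question.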
Nucleotide correlations in DNA/RNA
• Much studied
• K=2 correlation functions: 16 → 9 → 6
• See Wentian Li, Computers Chem. 21 (1997) 257-271.
Amino acid correlations in proteins
• Almost no study
• Hard to comprehend: 400 correlation functions at K=2
• Proteins are too short to define correlation functions
• One should approach the problem from a more deterministic point of view
• Repeated AA segments in proteins are a strong manifestation of correlations!
On-going study: the other extreme
• Quite a few proteins have an enormous number of reconstructions: transmembrane proteins, antifreeze proteins, fibrous proteins (collagens)
• Coarse-graining: closer to biology by reducing the number of AAs
Preprint: NSF-ITP-01-018; LANL E-archive physics/0103028 (arxiv.org or cn.arxiv.org); cross-referenced in q-bio since 15 Sept 2003