
The 7 Bridges in Königsberg and Compositional Representation of Protein Sequences

Bailin Hao (郝柏林) (ITP & BGI, CAS), Huimin Xie (谢惠民) (Math Dept., Suzhou U), Shuyu Zhang (张淑誉) (IP, Acad. Sinica)


Presentation Transcript


  1. The 7 Bridges in Königsberg and Compositional Representation of Protein Sequences. Bailin Hao (郝柏林) (ITP & BGI, CAS), Huimin Xie (谢惠民) (Math Dept., Suzhou U), Shuyu Zhang (张淑誉) (IP, Acad. Sinica)

  2. Compositional Approach in Prokaryote Phylogeny
     • Justification of using K-tuples instead of primary protein sequences.
     • Problem of uniqueness of reconstruction of a protein sequence from its constituent K-tuples.
     • Picking up a special class of proteins without biological knowledge.

  3. 7 bridges in Königsberg. Euler (1736): 4 odd nodes: No! Dénes König, Theory of Finite and Infinite Graphs, 1st ed. (1936); Birkhäuser (1990). “From Königsberg to König’s book, So runs the graphic tale…”

  4. Basic Notions
     • A graph G=(V, E), where V is a set of nodes (vertices) and E is a set of edges (bonds).
     • Edges: undirected, (u,v)=(v,u), u and v adjacent; directed, (u,v) differs from (v,u), u incident to v.
     • A weight may be associated with (u,v): cost, distance, transfer function, reaction rate, etc.
     • Eulerian graph: there is a closed path in which each edge appears once and only once.
     • Hamiltonian graph: there is a closed path in which each vertex appears once and only once. Hamiltonian cycle of minimal weight: the Traveling Salesman Problem (TSP).

  5. An Euler path; an Euler loop. Euler graph: has a loop. Semi-Euler graph: has a path, but no loop. Problem of Eulerian loops: simple, known solutions. Problem of Hamiltonian paths: much harder.
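
The "4 odd nodes: No!" verdict on the Königsberg slide can be checked with a few lines of Python. This is only a sketch: the node names A to D (A for the island, B to D for the other land masses) and the edge list are labels chosen here for illustration, and the test assumes a connected undirected graph, where an Euler loop needs zero odd-degree nodes and an Euler path needs exactly two.

```python
from collections import Counter

# Hypothetical labelling of the seven bridges: A is the island,
# B, C, D are the remaining land masses.
BRIDGES = [("A", "B"), ("A", "B"), ("A", "C"), ("A", "C"),
           ("A", "D"), ("B", "D"), ("C", "D")]


def odd_nodes(edges):
    """Return the sorted list of nodes with odd degree."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sorted(node for node, d in degree.items() if d % 2 == 1)


def classify(edges):
    """Classify a connected undirected graph by its odd-degree nodes."""
    n_odd = len(odd_nodes(edges))
    if n_odd == 0:
        return "Euler loop"
    if n_odd == 2:
        return "Euler path, no loop"
    return "neither"


print(odd_nodes(BRIDGES), classify(BRIDGES))
```

With this labelling all four nodes have odd degree, so neither an Euler loop nor an Euler path exists.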

  6. Hamiltonian loops: much harder. Example graph with 10 nodes, 15 arcs, di=3 at all nodes: No! Second example graph: Yes! Traveling Salesman Problem; NP-hard problems.

  7. Graph = nodes + arcs. Directed, labeled arcs and nodes. Simple graph: no rings at nodes (no arc from i back to i); no repeated arcs (at most one arc from i to j).

  8. Indegree $d_{\mathrm{in}}(i)$; outdegree $d_{\mathrm{out}}(i)$. Euler graph: $d_{\mathrm{in}}(i) = d_{\mathrm{out}}(i) \equiv d_i$ at every node $i$.

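  9. Simple Euler graph. Diagonal matrix: $M = \mathrm{diag}(d_1, d_2, \ldots, d_n)$. Adjacency matrix: $A = \{a_{ij}\}$, with $a_{ij} = 1$ if there is an arc from $i$ to $j$ and $a_{ij} = 0$ otherwise; $a_{ii} = 0$. Kirchhoff matrix: $C = M - A$, so that $\sum_{i=1}^{n} C_{ij} = \sum_{j=1}^{n} C_{ij} = 0$ and $\det(C) = 0$. All cofactors (signed minors) of $C$ are equal. Denote this common minor by $\Delta$.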

  10. Number of Euler loops in a simple Euler graph. N. G. de Bruijn, T. van Aardenne-Ehrenfest, C. A. B. Smith, W. T. Tutte: the BEST Theorem, $e(G) = \Delta \prod_i (d_i - 1)!$
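
As a small check of the theorem, consider an example that is not in the slides: three nodes with one arc in each direction between every pair of nodes, so that $d_i = 2$ at every node. Here $\Delta$ is the common minor of the Kirchhoff matrix $C = M - A$.

```latex
A = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix},
\qquad
C = M - A = \begin{pmatrix} 2 & -1 & -1 \\ -1 & 2 & -1 \\ -1 & -1 & 2 \end{pmatrix},
\qquad
\Delta = \det \begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix} = 3,
\\[6pt]
e(G) = \Delta \prod_{i=1}^{3} (d_i - 1)! = 3 \cdot (1!)^3 = 3 .
```

The three loops, each written from node 1 along the arc 1→2, are 1→2→1→3→2→3→1, 1→2→3→1→3→2→1 and 1→2→3→2→1→3→1.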

  11. Number of Eulerian loops in a general Euler graph G: some $a_{ii} \neq 0$ (rings), some $a_{ij} > 1$ (parallel arcs). Putting auxiliary nodes on these rings and parallel arcs makes the graph simple.

  12. No need to work with a bigger A matrix. Just let some $a_{ii} \neq 0$, $a_{ij} > 1$ in the original A. Eliminate the redundancy caused by unlabeled arcs. Modified BEST Theorem: $e(G) = \dfrac{\Delta \prod_i (d_i - 1)!}{\prod_{ij} a_{ij}!}$
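
The modified formula can be sketched in Python and compared with brute-force enumeration on short strings. The function names and the test strings are illustrative and not taken from the slides. Two assumptions are made here: reconstructions are required to start with the same (K-1)-tuple as the original sequence, and one auxiliary arc from the last node back to the first is always added and treated as a labeled arc, so it enters the degrees and the Kirchhoff matrix but not the factorials in the denominator.

```python
from collections import Counter
from fractions import Fraction
from math import factorial


def determinant(matrix):
    """Exact determinant by Gaussian elimination over fractions."""
    m = [[Fraction(x) for x in row] for row in matrix]
    det = Fraction(1)
    for c in range(len(m)):
        pivot = next((r for r in range(c, len(m)) if m[r][c] != 0), None)
        if pivot is None:
            return 0
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            det = -det
        det *= m[c][c]
        for r in range(c + 1, len(m)):
            factor = m[r][c] / m[c][c]
            m[r] = [x - factor * y for x, y in zip(m[r], m[c])]
    return int(det)


def count_reconstructions(seq, k):
    """Count sequences sharing seq's K-tuples (with multiplicity) and start."""
    # Arcs are K-tuples; nodes are their leading and trailing (K-1)-tuples.
    arcs = Counter((seq[i:i + k - 1], seq[i + 1:i + k])
                   for i in range(len(seq) - k + 1))
    nodes = sorted({node for arc in arcs for node in arc})
    index = {node: n for n, node in enumerate(nodes)}
    n = len(nodes)
    a = [[0] * n for _ in range(n)]
    for (u, v), mult in arcs.items():
        a[index[u]][index[v]] += mult
    # Assumption: one labeled auxiliary arc closes the path into a loop.
    a[index[seq[-(k - 1):]]][index[seq[:k - 1]]] += 1
    out = [sum(row) for row in a]
    kirchhoff = [[(out[i] if i == j else 0) - a[i][j] for j in range(n)]
                 for i in range(n)]
    # Common minor: delete the first row and the first column.
    numerator = determinant([row[1:] for row in kirchhoff[1:]])
    for d in out:
        numerator *= factorial(d - 1)
    denominator = 1
    for mult in arcs.values():  # real arcs only, auxiliary arc excluded
        denominator *= factorial(mult)
    return numerator // denominator


def brute_force(seq, k):
    """Enumerate distinct reconstructions directly (small inputs only)."""
    left = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = len(seq) - k + 1

    def walk(tail, used):
        if used == total:
            return 1
        found = 0
        for tup in list(left):
            if left[tup] and tup[:-1] == tail:
                left[tup] -= 1
                found += walk(tup[1:], used + 1)
                left[tup] += 1
        return found

    return walk(seq[:k - 1], 0)


print(count_reconstructions("ABACABA", 2), brute_force("ABACABA", 2))
```

For the toy string ABACABA at K=2 both counts are 3: the reconstructions are ABABACA, ABACABA and ACABABA. Excluding the auxiliary arc from the denominator matters when it runs parallel to a real arc, as in ABAB at K=2.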

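  13. ANPA_PSEAM, 82 AA, K=5: MALSLFTVGQLIFLFWTMRITEASPDPAAKAAPAAAAAPAAAAPDTASDAAAAAALTAANAKAAAELTAANAAAAAAATARG. [Figure: graph whose nodes are the overlapping 4-letter segments MALS → ALSL → LSLF → SLFT → LFTV → FTVG → TVGQ → VGQL → …; node AKAA is also labeled.]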

  14. ANPA_PSEAM, 82 AA: antifreeze protein A/B precursor in winter flounder. Alanine-rich, amphiphilic. [Figure: graph with 6 rings and an auxiliary arc.]

  15. From pdb.seq, a special selection of SWISS-PROT: 2821 - 1 = 2820 proteins (May 2000). R: number of AA sequences reconstructed from a given protein decomposition.

  16. Compositional Representation of Proteins. A protein of length $N$ is decomposed into $L = N - K + 1$ overlapping K-tuples. The collection $\{W_i^K\}_{i=1}^{L}$, or $\{W_j^K, n_j\}_{j=1}^{M}$ ($M$ distinct K-tuples with multiplicities $n_j$), may be used as an equivalent representation of the original protein sequence. A result that seems trivial upon further reflection: random AA sequences have unique reconstructions as well. Compositional representation works equally well for random AA sequences and for most protein sequences. A given realization of a short random AA sequence is as specific as a real protein sequence.
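
A minimal sketch of the decomposition into K-tuples with multiplicities, using a hypothetical helper name and toy inputs: a sequence of length N yields N - K + 1 overlapping K-tuples, stored here as a `Counter` that maps each distinct K-tuple to its count.

```python
from collections import Counter


def compositional_representation(seq, k):
    """Map each distinct K-tuple of seq to its number of occurrences."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


# First ten residues of the example protein: all 5-tuples are distinct.
rep = compositional_representation("MALSLFTVGQ", 5)
print(sum(rep.values()), len(rep))

# A toy alanine-rich string: the tuple AAA repeats.
toy = compositional_representation("AAAAPAAAA", 3)
print(sorted(toy.items()))
```

Whether such a collection determines the sequence uniquely is exactly the reconstruction question raised in the slides; the counter itself keeps no positional information.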

  17. Nucleotide correlations in DNA/RNA: much studied. K=2 correlation functions: 16, 9, 6. See Wentian Li, Computers & Chemistry 21 (1997) 257-271. Amino acid correlations in proteins: almost no study. 400 correlation functions at K=2 are hard to comprehend, and proteins are too short to define correlation functions. One should approach the problem from a more deterministic point of view. Repeated AA segments in proteins are a strong manifestation of correlations!

  18. Ongoing study: the other extreme. Quite a few proteins have an enormous number of reconstructions: transmembrane proteins, antifreeze proteins, fibrous proteins (collagens). Coarse-graining: closer to biology by reducing the number of AAs.

  19. Preprint: NSF-ITP-01-018; LANL e-archive physics/0103028 (arxiv.org or cn.arxiv.org). Cross-referenced in q-bio since 15 Sept 2003.
