Computational Biology Applications
Bartosz Nowierski
Poznań University of Technology
Laboratory of Bioinformatics (1)
• Institute of Bioorganic Chemistry (Polish Academy of Sciences)
• Founded: 1.11.1999
• Members:
  • Prof. Dr. Habil. Jacek Błażewicz - director
  • Ph.D. Piotr Formanowicz
  • Ph.D. Marta Kasprzak
  • M.Sc. Marcin Jaroszewski
  • M.Sc. Piotr Łukasiak
  • M.Sc. Piotr Wierzejewski
ASM Team
Laboratory of Bioinformatics (2)
• Basic research areas:
  • algorithms for sequencing by hybridisation
  • analysis of DNA graphs
  • analysis of NMR spectra for RNA chains
  • restriction map construction
  • constructing phylogenetic trees
  • constructing a bio-server for selected problems of computational biology
  • prediction of protein secondary structures
  • DNA sequence assembly
Laboratory of Bioinformatics (3)
• International cooperation:
  • Universite LeHavre, France
  • Max Planck Institutes, Germany
  • TmBioscience, Canada
  • Rutgers University, USA
  • TU Clausthal, Germany
  • RIKEN, Japan
Applications
• DNA sequence assembly
• Prediction of protein secondary structures
• Constructing phylogenetic trees
• Motivation:
  • popular subject
  • great demand
  • faster => distributed
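DNA Sequence Assembly: Problem specification
• Alphabet = {A, C, G, T}
• Problem: reconstruct the original sequence from overlapping fragments
[Figure: the fragments ACCGT, CGTGC, TTACC and TACCGT aligned by their overlaps into the sequence TTACCGTGC]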
DNA Sequence Assembly: Overlap graph
• input sequence => vertex
• overlap of sequences => arc
• arc label: (shift, weight)
[Figure: overlap graph for the sequences ACTGCCTA, CTAGGATC and TCAAGA, with arcs labelled (5, 1) and (6, 1)]
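The overlap graph can be sketched in Python as follows. This is a minimal illustration, assuming that the first number of each arc label is the shift, i.e. the offset at which the second sequence starts inside the first. The function names and the `min_overlap` parameter are hypothetical and do not come from the slides; arc weights are left out.

```python
def best_shift(a, b, min_overlap=2):
    """Return the smallest shift s such that a[s:] is a prefix of b, or None.

    A smaller shift means a longer overlap. min_overlap is an assumed
    threshold that ignores very short, probably accidental overlaps.
    """
    for s in range(1, len(a) - min_overlap + 1):
        if b.startswith(a[s:]):
            return s
    return None


def overlap_graph(reads, min_overlap=2):
    """Map each arc (i, j) between read indices to its shift."""
    arcs = {}
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i == j:
                continue
            s = best_shift(a, b, min_overlap)
            if s is not None:
                arcs[(i, j)] = s
    return arcs


graph = overlap_graph(["ACTGCCTA", "CTAGGATC", "TCAAGA"])
print(graph)
```

For the three sequences from the slide this yields two arcs with shifts 5 and 6, which agrees with the first components of the labels (5, 1) and (6, 1).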
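DNA Sequence Assembly: Redundant arcs
• Arc deletion: an arc is redundant when its shift equals the sum of the shifts along a two-arc path between the same vertices, e.g. 2 + 4 = 6
[Figure: overlap graph for the sequences ATGACTACT, GACTACTGA and ACTGAATCA; arc labels shown: (3, 1), (7, 1), (6, 1), (2, 1), (2, 2), (4, 1), (4, 2)]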
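A minimal Python sketch of the arc deletion step, assuming that an arc u => w is redundant exactly when some intermediate vertex v gives shift(u, v) + shift(v, w) = shift(u, w), as in the 2 + 4 = 6 example. The function name and the dictionary representation of the graph are illustrative assumptions; how the weights change after deletion is not modelled.

```python
def remove_redundant_arcs(arcs):
    """Drop every arc whose shift equals the summed shifts of a two-arc detour.

    arcs maps (u, w) to the shift of the arc u -> w.
    """
    redundant = set()
    for (u, w), shift in arcs.items():
        for (u2, v), first in arcs.items():
            if u2 != u or v == w:
                continue
            second = arcs.get((v, w))
            if second is not None and first + second == shift:
                redundant.add((u, w))
                break
    return {arc: s for arc, s in arcs.items() if arc not in redundant}


# Vertices 0, 1, 2 stand for ATGACTACT, GACTACTGA and ACTGAATCA.
arcs = {(0, 1): 2, (1, 2): 4, (0, 2): 6}
reduced = remove_redundant_arcs(arcs)
print(reduced)
```

Here the arc with shift 6 is removed because the path through the middle sequence accounts for it (2 + 4 = 6).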
DNA Sequence Assembly: Hamiltonian path with max. weight
• NP-hard problem => heuristic
• Selection of first element: a vertex that is an unattractive successor for any other vertex
• Selection of next elements: an attractive successor of the current vertex that is not attractive to the other vertices
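One possible reading of this heuristic is sketched below in Python. The slides do not define "attractive", so the sketch assumes that attractiveness of v as a successor of u is the overlap length of the arc u => v, and that "not attractive to others" means a small best overlap from the remaining vertices. The function name and scoring formula are hypothetical, not the authors' method.

```python
def greedy_path(overlap):
    """Build a vertex order greedily from overlap lengths.

    overlap maps (u, v) to the overlap length of arc u -> v; missing arcs
    count as 0. The scoring rule is an assumed interpretation.
    """
    vertices = sorted({v for arc in overlap for v in arc})

    def ov(u, v):
        return overlap.get((u, v), 0)

    def best_incoming(v, sources):
        # Strongest overlap that any vertex in sources has into v.
        return max((ov(u, v) for u in sources if u != v), default=0)

    # First element: the vertex that is least attractive as a successor.
    start = min(vertices, key=lambda v: best_incoming(v, vertices))
    path = [start]
    remaining = [v for v in vertices if v != start]
    while remaining:
        current = path[-1]
        # Next element: attractive for current, unattractive for the rest.
        nxt = max(remaining,
                  key=lambda v: ov(current, v) - best_incoming(v, remaining))
        path.append(nxt)
        remaining.remove(nxt)
    return path


# Overlap lengths for ATGACTACT (0), GACTACTGA (1), ACTGAATCA (2).
order = greedy_path({(0, 1): 7, (1, 2): 5, (0, 2): 3})
print(order)
```

Being greedy, the sketch gives no optimality guarantee; it only illustrates how the two selection rules could interact.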
DNA Sequence Assembly: Parallelization
• Overlaps: distribute set of sequences
• Arc reduction: distribute set of vertices
• First vertex: distribute set of vertices
• Next vertices
Protein Secondary Structures: Problem specification
• Amino acid alphabet: {A,C,D,E,F,G,H,I,K,....}
• Secondary structure alphabet: {H,E,X}
• Example:
  amino acid sequence: VASYDYLVIGGGSGG...VAIHPTSSEELVTLR
  secondary structure: XEEXXEEEEXXXHHH...XXXXXXXHHHHHXXX
Protein Secondary Structures: Rule usage
• Logical Analysis of Data approach
• Example sequence: VASYDYLVIGGGSGG
• Window values:
  position:  x-3   x-2   x-1   x0    x1    x2    x3
  real:      1.1   -0.7  1.2   -0.5  0.1   -1.1  2.1
  binarized: 1     0     1     0     0     0     1
• Rules on real values:
  H: x-3<0.7, x-1>0.1, x2<-0.5
  E: x-1>0.3, x0>-1.5, x2<0.2, x3>1.2
• Rules on binarized values:
  H: x-3=0, x-1=1, x2=0
  E: x-1=1, x0=0, x2=0, x3=1
• Result in both cases: E
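The rule evaluation on the real-valued window can be reproduced with a short Python sketch. The window values and thresholds are taken from the slide; the data layout and the function name `matching_classes` are illustrative assumptions.

```python
import operator

# Window values from the slide, indexed by position relative to x0.
WINDOW = {-3: 1.1, -2: -0.7, -1: 1.2, 0: -0.5, 1: 0.1, 2: -1.1, 3: 2.1}

# Each rule is a conjunction of (position, comparison, threshold) conditions.
RULES = {
    "H": [(-3, "<", 0.7), (-1, ">", 0.1), (2, "<", -0.5)],
    "E": [(-1, ">", 0.3), (0, ">", -1.5), (2, "<", 0.2), (3, ">", 1.2)],
}

OPS = {"<": operator.lt, ">": operator.gt}


def matching_classes(window, rules):
    """Return the labels of all rules whose conditions all hold."""
    return [
        label
        for label, conditions in rules.items()
        if all(OPS[op](window[pos], threshold)
               for pos, op, threshold in conditions)
    ]


result = matching_classes(WINDOW, RULES)
print(result)
```

The H rule fails on its first condition (1.1 is not below 0.7), while all four conditions of the E rule hold, so only E fires.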
Protein Secondary Structures: Rule generation
[Figure: rule generation scenario, in which rules are derived from amino acid sequences and their secondary structures]
• Good rule properties:
  • if a rule says e.g. H, it must be right
  • if a rule says e.g. not H, it should be right
• The best rules: if 1 variable is left out, the rule is not good anymore
Protein Secondary Structures: Rule generation - algorithm
• Algorithm: generate all reasonable 0-1 arrays
• Classifier generation:
  • set of rules (logical OR)
  • mathematical formula
• Parallelization: division of array space
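A hedged Python sketch of rule generation, assuming that "generate all reasonable 0-1 arrays" can be read as enumerating short conjunctions over binarized window positions and keeping those that cover at least one positive example, cover no negative example, and contain no smaller accepted rule. The training windows, the `max_terms` limit and all function names are hypothetical.

```python
from itertools import combinations, product


def covers(rule, window):
    """A rule is a tuple of (position, value) pairs that must all match."""
    return all(window[pos] == val for pos, val in rule)


def generate_rules(positives, negatives, max_terms=2):
    """Enumerate minimal rules that match some positive and no negative window."""
    n = len(positives[0])
    accepted = []
    for k in range(1, max_terms + 1):
        for positions in combinations(range(n), k):
            for values in product((0, 1), repeat=k):
                rule = tuple(zip(positions, values))
                # Skip rules that merely extend an accepted, shorter rule.
                if any(set(r) <= set(rule) for r in accepted):
                    continue
                if (any(covers(rule, w) for w in positives)
                        and not any(covers(rule, w) for w in negatives)):
                    accepted.append(rule)
    return accepted


def classify(rules, window):
    """The classifier is the logical OR of all rules."""
    return any(covers(rule, window) for rule in rules)


# Hypothetical binarized training windows of length 3.
positives = [(1, 0, 1), (1, 1, 1)]
negatives = [(0, 0, 1), (1, 0, 0)]
rules = generate_rules(positives, negatives)
print(rules)
```

Because candidate rules are independent of each other, the candidate space could be split across workers, which is one way to read "division of array space".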
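Phylogenetic Trees: Problem specification
[Figure: two trees over the species human, monkey, iguana and snake. For the tree with leaf order human, monkey, iguana, snake: Fitch(tree) = something small. For the tree with leaf order human, iguana, monkey, snake: Fitch(tree) = something big]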
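Assuming that Fitch( ) denotes the parsimony score computed by Fitch's algorithm, the difference between the two trees can be illustrated with a small Python sketch. The tree encoding as nested pairs and the single-character states (A for the mammals, B for the reptiles) are made-up illustration data, not taken from the slides.

```python
def fitch(tree, states):
    """Return (state set, number of changes) for one character.

    tree is a species name or a pair (left, right) of subtrees;
    states maps each species name to its observed character state.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # Disjoint state sets: one extra change is needed at this node.
    return left_set | right_set, left_cost + right_cost + 1


# Hypothetical character: mammals share state A, reptiles share state B.
states = {"human": "A", "monkey": "A", "iguana": "B", "snake": "B"}
small = fitch((("human", "monkey"), ("iguana", "snake")), states)[1]
big = fitch((("human", "iguana"), ("monkey", "snake")), states)[1]
print(small, big)
```

With this data the tree that groups human with monkey needs one change, while the tree that groups human with iguana needs two.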
Phylogenetic Trees: Algorithm
• Branch & Bound
• Parallelization:
  • distribution of subtrees
  • exchange of information about solutions
[Figure: search tree with one level labelled "4 species" (T1, T2, T3, T4, T5, T6, T7, T8, …..) and the next level labelled "5 species" (…..)]
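A sequential Python sketch of such a branch and bound search, under stated assumptions: trees are rooted and encoded as nested pairs, species are added one at a time at every possible position, and the Fitch parsimony score of a partial tree serves as the lower bound (adding a species cannot lower that score). All names and the single-character data are hypothetical.

```python
def fitch(tree, states):
    """Return (state set, number of changes) for one character."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    (ls, lc), (rs, rc) = fitch(tree[0], states), fitch(tree[1], states)
    common = ls & rs
    return (common, lc + rc) if common else (ls | rs, lc + rc + 1)


def insertions(tree, leaf):
    """Yield every tree obtained by attaching leaf above one node of tree."""
    yield (tree, leaf)
    if not isinstance(tree, str):
        left, right = tree
        for t in insertions(left, leaf):
            yield (t, right)
        for t in insertions(right, leaf):
            yield (left, t)


def branch_and_bound(species, states):
    """Return (best score, best tree) over all insertion orders explored."""
    best = [float("inf"), None]

    def extend(tree, remaining):
        score = fitch(tree, states)[1]
        if score >= best[0]:
            return  # bound: this subtree cannot beat the best known solution
        if not remaining:
            best[0], best[1] = score, tree
            return
        for bigger in insertions(tree, remaining[0]):
            extend(bigger, remaining[1:])

    extend((species[0], species[1]), species[2:])
    return best[0], best[1]


states = {"human": "A", "monkey": "A", "iguana": "B", "snake": "B"}
score, tree = branch_and_bound(["human", "monkey", "iguana", "snake"], states)
print(score, tree)
```

In a distributed version, each call to `extend` on a partial tree is an independent subtree of the search, and the shared value `best[0]` is the kind of information about solutions that workers would exchange.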
Usage of GridLab (1)
• Resource management (GRMS):
  • assignment of resources
  • application structure (master-slave)
  • checkpointing & migration
  • dynamic assignment of new resources
  • frameworks
  • distributed shared memory (?)
Usage of GridLab (2)
• Monitoring:
  • dynamic link/processor states (tuning)
  • estimation of end time (GRMS, end user)
• Others:
  • visualisation
  • security
  • mobile users
Contact
• Director: Prof. Dr. Habil. Jacek Błażewicz, Jacek.Blazewicz@cs.put.poznan.pl
• DNA Sequence Assembly: B.Sc. Bartosz Nowierski, Bartosz.Nowierski@cs.put.poznan.pl
• Protein Secondary Structures: M.Sc. Piotr Łukasiak, Piotr.Lukasiak@cs.put.poznan.pl
• Phylogenetic Trees: Ph.D. Piotr Formanowicz, Piotr.Formanowicz@cs.put.poznan.pl