Genomics Quick Start • Mikhail Dvorkin, Vladislav Isenbaev, Eugene Kapun • Scientific advisors: Acad. Konstantin Skryabin, Bioengineering RAS; Prof. Anatoly Shalyto, SPbSU ITMO
Collaboration with Bioengineering RAS • Bioengineering RAS • Conducts biological experiments • Sets problems • Provides biological data • SPbSU ITMO • Develops algorithms and programs • Started at the end of 2009 • Why us? SPbSU ITMO: Genomics Quick Start
SPbSU ITMO at ACM ICPC • We train ETH Zürich • Maybe MIT? :-) • Eugene Kapun • Vladislav Isenbaev • Mikhail Dvorkin • Georgiy Korneev
Mikhail Dvorkin
Eugene Kapun • Vladislav Isenbaev
Genome Team • Coach: Georgiy Korneev • Members: Mikhail Dvorkin, Vladislav Isenbaev, Eugene Kapun
Problems Being Solved • De novo DNA assembly based on paired reads • Generalized suffix tree traversal • Reduction to single reads • DNA alignment with transfers
DNA Assembly 1: Generalized suffix tree traversal
Suffix Tree • Built from reads • Arc weight: number and quality of reads • Possible extensions • Detection of erroneous nucleotides
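To make the arc-weight idea concrete, here is a minimal sketch. It is not the team's implementation: it uses a plain depth-limited suffix trie rather than a compressed generalized suffix tree, and the weight counts reads only, ignoring read quality. The names `Node`, `build_suffix_trie`, `arc_weights` and the `max_depth` parameter are hypothetical.

```python
class Node:
    """Trie node; `weight` is the weight of the arc leading into it."""

    def __init__(self):
        self.children = {}  # nucleotide -> Node
        self.weight = 0     # number of read suffixes that pass through this arc


def build_suffix_trie(reads, max_depth=8):
    """Insert every suffix of every read, truncated to max_depth symbols."""
    root = Node()
    for read in reads:
        for start in range(len(read)):
            node = root
            for ch in read[start:start + max_depth]:
                node = node.children.setdefault(ch, Node())
                node.weight += 1
    return root


def arc_weights(root, prefix):
    """Weights of the possible one-nucleotide extensions of `prefix`."""
    node = root
    for ch in prefix:
        node = node.children.get(ch)
        if node is None:
            return {}
    return {ch: child.weight for ch, child in node.children.items()}
```

Under these assumptions, an extension whose weight is much lower than that of its siblings is a candidate erroneous nucleotide.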
Building up a Contig • Start with a high-quality read • Use paired reads to select a nucleotide • “Backward” – match the past • “Forward” – match the future • Build up to a branch
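The "build up to a branch" step can be sketched as a greedy extension that stops at the first ambiguous nucleotide. This is a simplified illustration, not the method on the slide: it selects the next nucleotide from a table of k-mer successors and does not model the paired-read "backward" and "forward" checks. The function names and the parameters `k` and `max_len` are hypothetical.

```python
from collections import defaultdict


def successor_table(reads, k):
    """Map each k-mer to the set of nucleotides that follow it in some read."""
    table = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k):
            table[read[i:i + k]].add(read[i + k])
    return table


def extend_contig(seed, reads, k, max_len=10_000):
    """Extend `seed` (length >= k) to the right until a branch or dead end."""
    table = successor_table(reads, k)
    contig = seed
    seen = {contig[-k:]}
    while len(contig) < max_len:
        options = table.get(contig[-k:], set())
        if len(options) != 1:
            break  # zero options: dead end; several options: a branch
        contig += next(iter(options))
        context = contig[-k:]
        if context in seen:
            break  # the same context came back: stop instead of looping
        seen.add(context)
    return contig
```

In the approach described on the slide, paired reads would be consulted at the point where this sketch simply stops.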
Results • Caenorhabditis elegans • Escherichia coli K-12
DNA Assembly 2: Reduction to single reads
Concept • De Bruijn graph with all reads • Paired reads • Path in the graph • Low density – backtracking • Slow – Meet-in-the-middle
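As an illustration of the path search, the sketch below builds a de Bruijn graph whose nodes are k-mers and joins two k-mers by a meet-in-the-middle search: walks of half the length are enumerated forward from the first k-mer and backward from the second, then joined at shared middle nodes. It assumes the distance between the two k-mers is known exactly as a number of edges (`steps`), which real paired reads do not guarantee. All function names are hypothetical.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Successor and predecessor maps between consecutive k-mers of the reads."""
    succ, pred = defaultdict(set), defaultdict(set)
    for read in reads:
        for i in range(len(read) - k):
            a, b = read[i:i + k], read[i + 1:i + k + 1]
            succ[a].add(b)
            pred[b].add(a)
    return succ, pred


def walks(start, steps, nbrs):
    """Map end node -> list of walks with exactly `steps` edges from `start`."""
    frontier = {start: [[start]]}
    for _ in range(steps):
        nxt = defaultdict(list)
        for node, paths in frontier.items():
            for n in nbrs.get(node, ()):
                nxt[n].extend(p + [n] for p in paths)
        frontier = nxt
    return frontier


def connect(src, dst, steps, succ, pred):
    """All sequences spelled by walks of `steps` edges from src to dst."""
    half = steps // 2
    fwd = walks(src, half, succ)          # forward half, starting at src
    bwd = walks(dst, steps - half, pred)  # backward half, starting at dst
    results = []
    for mid in fwd.keys() & bwd.keys():
        for f in fwd[mid]:
            for b in bwd[mid]:
                path = f + b[::-1][1:]    # reverse the backward walk, drop mid
                results.append(path[0] + "".join(n[-1] for n in path[1:]))
    return sorted(results)
```

Each half explores walks of roughly half the length, which is the usual reason to prefer this over plain backtracking when the full search is too slow.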
Error detection • Poorly covered vertices • Erroneous • Delete them • Repeat • Paths • Single reads • Use another tool
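The delete-and-repeat loop for poorly covered vertices might look like the sketch below. It treats k-mers as graph vertices, flags every read that contains a k-mer seen fewer than `min_cov` times, removes those reads, and recounts. The threshold `min_cov`, the round limit `max_rounds` and the function names are assumptions for illustration; the slides do not state how coverage was measured.

```python
from collections import Counter


def kmers(read, k):
    """All k-mers of a read, in order."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def filter_reads(reads, k, min_cov, max_rounds=10):
    """Repeatedly drop reads that touch a poorly covered k-mer.

    Returns (kept_reads, removed_reads).
    """
    kept, removed = list(reads), []
    for _ in range(max_rounds):
        cov = Counter(km for r in kept for km in kmers(r, k))
        bad = [r for r in kept if any(cov[km] < min_cov for km in kmers(r, k))]
        if not bad:
            break  # coverage is stable, nothing more to delete
        removed.extend(bad)
        kept = [r for r in kept if all(cov[km] >= min_cov for km in kmers(r, k))]
    return kept, removed
```

A filter this simple also discards correct reads from regions that are genuinely under-sequenced, so the threshold has to be chosen with the expected coverage in mind.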
Results • 60% erroneous reads detected • < 0.1% errors left after one iteration • 99.5% DNA coverage
DNA Alignment with transfers
Concept • Parts • Matched (small edit distance) • Unmatched • Swapping allowed • Penalties • Number of parts • Edit distance in matched parts • Length of unmatched parts
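One way to combine the three penalties is a weighted sum, sketched below. The weights `w_parts`, `w_edit` and `w_gap` are hypothetical, as is the choice to count unmatched stretches as parts; the slides do not give the actual scoring function. Edit distance here is the standard Levenshtein distance.

```python
def edit_distance(a, b):
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete from a
                           cur[j - 1] + 1,              # insert into a
                           prev[j - 1] + (ca != cb)))   # match or substitute
        prev = cur
    return prev[-1]


def alignment_penalty(matched, unmatched, w_parts=5.0, w_edit=1.0, w_gap=0.5):
    """Penalty of one candidate alignment.

    matched:   list of (piece_of_first_dna, piece_of_second_dna) pairs
    unmatched: list of pieces left without a counterpart
    """
    parts = len(matched) + len(unmatched)
    edits = sum(edit_distance(x, y) for x, y in matched)
    gap = sum(len(s) for s in unmatched)
    return w_parts * parts + w_edit * edits + w_gap * gap
```

Because the order of the matched pairs does not enter the score, swapped parts cost nothing extra in this sketch beyond the per-part penalty.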
Implementation • First DNA • Tear into small pieces • Hash ’em and store ’em • Second DNA • Tear into small pieces • Look them up • Build them up
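The hash, look up and build up steps can be sketched with a k-mer index. The sketch assumes the "small pieces" are overlapping k-mers and only merges exact matches into longer blocks; matching with a small edit distance, as the concept slide allows, is not handled. Function names and the parameter `k` are hypothetical.

```python
from collections import defaultdict


def index_kmers(seq, k):
    """Hash every k-mer of the first sequence to its start positions."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def matched_blocks(first, second, k):
    """Return (pos_in_first, pos_in_second, length) for maximal exact matches."""
    index = index_kmers(first, k)
    hits = set()
    for j in range(len(second) - k + 1):
        for i in index.get(second[j:j + k], ()):
            hits.add((i, j))
    blocks = []
    for i, j in sorted(hits):
        if (i - 1, j - 1) in hits:
            continue  # interior of a run that starts at an earlier hit
        length = k
        while (i + length - k + 1, j + length - k + 1) in hits:
            length += 1  # the next diagonal hit extends the block by one
        blocks.append((i, j, length))
    return blocks
```

For example, with `first = "AAACCCGGGTTT"` and `second = "GGGTTTAAACCC"`, the two halves are reported as separate blocks at swapped positions, which is the kind of transfer the alignment is meant to tolerate.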
Results