Analysis of DNA Sequence Alignment Tools
John Dorband, Yaacov Yesha, and Ashwin Ganesan
Project Goal
• The goal of our project is to analyze DNA sequence alignment tools such as SHRiMP [1], Bowtie [2], BWA [3], and BFAST [4], to explain why different tools produce different results, and to find ways of improving the tools.
Alignment of Short Reads
• A common task is aligning short reads of DNA to a reference genome (database).
• A common technique used by DNA alignment tools is creating a searchable index.
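One simple form of searchable index is a hash table from k-mers to reference positions. The sketch below is illustrative only: the function names, the toy reference, and the choice of k = 4 are assumptions, and it is not the index structure of any of the tools named above (Bowtie and BWA, for example, use indexes based on the Burrows-Wheeler transform).

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map every length-k substring of the reference to its start positions."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def candidate_positions(read, index, k):
    """Return candidate read start positions, best supported first.

    Each exact k-mer hit votes for the reference position at which the
    whole read would have to start for that hit to be consistent.
    """
    votes = defaultdict(int)
    for offset in range(len(read) - k + 1):
        for pos in index.get(read[offset:offset + k], ()):
            start = pos - offset
            if start >= 0:
                votes[start] += 1
    return sorted(votes, key=lambda s: (-votes[s], s))


# Toy data (hypothetical): the read is an exact copy of reference[9:18].
reference = "ACGTTGCATGCCGATAGGCTTAACG"
index = build_kmer_index(reference, k=4)
hits = candidate_positions("GCCGATAGG", index, k=4)
print(hits)
```

Candidate positions found this way would still need to be verified by a full alignment of the read against the reference at each position.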
Transitions vs. Transversions
• As mentioned in [5], transition mutations (A↔G and C↔T) have a higher probability than transversion mutations (all other substitutions). [5] used this fact to improve DNA alignment.
• We introduced the following technique: in situations where the mutation rate is sufficiently high compared with the sequencing error rate, use different penalties for transition mismatches and transversion mismatches in algorithms related to the Burrows-Wheeler transform [6], such as those used in Bowtie [2] and BWA [3].
• We plan to test our technique.
Comparing DNA Alignment Tools
• Our work also includes comparing several DNA alignment tools.
• We compared Bowtie and SHRiMP and found that SHRiMP mapped 74.18% of the reads, while Bowtie mapped 35.79%.
• We plan to use simulated data, as was used in [5], to compare the sensitivity and specificity of different DNA alignment tools.
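With simulated data the true origin of every read is known, so sensitivity and specificity can be computed directly from a tool's output. The sketch below assumes one possible convention, which the slides do not specify: reads drawn from the reference count as true positives when reported within a tolerance of their true position and as false negatives otherwise, while decoy reads with no true origin count as true negatives when left unmapped and as false positives when mapped. All names and the toy data are hypothetical.

```python
def sensitivity_specificity(truth, reported, tolerance=0):
    """Compute (sensitivity, specificity) for one tool on simulated reads.

    truth:    read id -> true reference position, or None for a decoy read
    reported: read id -> position reported by the tool, or None if unmapped
    """
    tp = fn = tn = fp = 0
    for read_id, true_pos in truth.items():
        found = reported.get(read_id)
        if true_pos is None:
            # Decoy read: the correct outcome is to leave it unmapped.
            if found is None:
                tn += 1
            else:
                fp += 1
        elif found is not None and abs(found - true_pos) <= tolerance:
            tp += 1
        else:
            # Unmapped, or mapped to the wrong place.
            fn += 1
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


# Toy example: three reads from the reference and two decoys.
truth = {"r1": 100, "r2": 250, "r3": 400, "r4": None, "r5": None}
reported = {"r1": 100, "r2": 251, "r3": None, "r4": None, "r5": 37}
sens, spec = sensitivity_specificity(truth, reported, tolerance=2)
print(sens, spec)
```

Other conventions are possible, for example counting a read mapped to the wrong location as a false positive; the choice should be stated when results from different tools are compared.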
A Performance Issue
• At IGS, BWA was found to perform an enormous number of file opens and closes, which resulted in extremely poor performance.
• We analyzed the problem and concluded that it is likely caused by file locking by the system.
• We recommend that the BWA code be checked, and likely modified, to eliminate this problem.
Polymorphism
• One claimed strength of SHRiMP [1] is handling substantial polymorphism.
• We plan to use simulated test data that includes substantial polymorphism in addition to sequencing errors.
• We plan to run SHRiMP and other mapping tools on that data and compare their sensitivity and specificity.
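Test data of this kind could be generated roughly as follows. This is a sketch under simplifying assumptions rather than the planned procedure: polymorphisms and sequencing errors are both modeled as uniform random substitutions (no insertions or deletions, so donor coordinates equal reference coordinates), and the rates, names, and random reference are hypothetical.

```python
import random

BASES = "ACGT"


def substitute(sequence, rate, rng):
    """Replace each base by a different base with probability `rate`."""
    out = []
    for base in sequence:
        if rng.random() < rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def simulate_reads(reference, n_reads, read_len,
                   polymorphism_rate, error_rate, seed=0):
    """Return (read id, true start, sequence) tuples.

    Polymorphisms are applied once, giving a donor genome shared by all
    reads; sequencing errors are applied independently to each read.
    """
    rng = random.Random(seed)
    donor = substitute(reference, polymorphism_rate, rng)
    reads = []
    for i in range(n_reads):
        start = rng.randrange(len(donor) - read_len + 1)
        read = substitute(donor[start:start + read_len], error_rate, rng)
        reads.append((f"read{i}", start, read))
    return reads


# Hypothetical toy reference and rates.
ref_rng = random.Random(1)
reference = "".join(ref_rng.choice(BASES) for _ in range(200))
reads = simulate_reads(reference, n_reads=5, read_len=30,
                       polymorphism_rate=0.05, error_rate=0.01, seed=42)
for read_id, start, sequence in reads:
    print(read_id, start, sequence)
```

Because the true start of every read is recorded, the output can serve as ground truth when comparing the sensitivity and specificity of the mapping tools.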
References
[1] Stephen M. Rumble, Phil Lacroute, Adrian V. Dalca, Marc Fiume, Arend Sidow, and Michael Brudno, SHRiMP: Accurate Mapping of Short Color-space Reads, PLoS Computational Biology, May 2009.
[2] Ben Langmead, Cole Trapnell, Mihai Pop, and Steven L. Salzberg, Ultrafast and memory-efficient alignment of short DNA sequences to the human genome, Genome Biology, 2009.
[3] Heng Li and Richard Durbin, Fast and accurate short read alignment with Burrows–Wheeler transform, Bioinformatics, 2009.
[4] Nils Homer, Barry Merriman, and Stanley F. Nelson, BFAST: An Alignment Tool for Large Scale Genome Resequencing, PLoS ONE, 2009.
[5] Laurent Noé and Gregory Kucherov, Improved hit criteria for DNA local alignment, BMC Bioinformatics 2004, 5:149.
[6] M. Burrows and D.J. Wheeler, A Block-sorting Lossless Data Compression Algorithm, SRC Research Report 124, May 10, 1994, Digital Systems Research Center, 130 Lytton Avenue, Palo Alto, California 94301.