Short read alignment BNFO 601
Short read alignment
• Input:
  • Reads: short DNA sequences, usually up to 100 base pairs (bp), produced by a sequencing machine
    • Reads are fragments of a longer DNA sequence present in the sample given as input to the machine
    • Reads usually number in the millions
  • Genome sequence: a reference DNA sequence much longer than the read length
Short read alignment
• Applications
  • Genome assembly
  • RNA splicing studies
  • Gene expression studies
  • Discovery of new genes
  • Discovery of cancer-causing mutations
Short read alignment
• Two approaches
  • Hashing-based algorithms
    • BFAST
    • SHRIMP
    • MAQ
    • STAMPY (statistical alignment)
  • Burrows-Wheeler transform
    • Bowtie
    • BWA
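To make the two approaches concrete, here is a minimal toy sketch of the core idea behind each: a hash table from k-mers to reference positions, and exact-match counting by backward search over a Burrows-Wheeler transform. The function names, the choice of the read's first k-mer as the seed, and the naive rotation-sorting construction are assumptions for illustration only. This is not how BFAST, Bowtie, or BWA are actually implemented, and it handles exact matches only.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Hash every k-mer of the reference to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def hash_candidates(index, read, k):
    """Use the read's first k-mer as a seed (an assumption of this sketch)
    and return the reference positions where that seed occurs."""
    return list(index.get(read[:k], []))


def bwt(text):
    """Burrows-Wheeler transform by sorting all rotations (toy inputs only)."""
    text += "$"  # sentinel; sorts before A, C, G, T
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def bwt_count(bwt_string, pattern):
    """Count exact occurrences of pattern by backward search over the BWT."""
    # first[c] = number of characters in the text that sort before c
    first = {}
    total = 0
    for c in sorted(set(bwt_string)):
        first[c] = total
        total += bwt_string.count(c)

    lo, hi = 0, len(bwt_string)  # half-open range of sorted rotations
    for c in reversed(pattern):
        if c not in first:
            return 0
        # Naive rank queries; real tools precompute these counts.
        lo = first[c] + bwt_string[:lo].count(c)
        hi = first[c] + bwt_string[:hi].count(c)
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    reference = "ACGTACGTTAGC"
    index = build_kmer_index(reference, 4)
    print(hash_candidates(index, "ACGTT", 4))
    print(bwt_count(bwt(reference), "ACGT"))
```

In the hashing sketch, each candidate position would still need to be verified by aligning the full read there. The BWT sketch trades that lookup table for a compact index that is searched one read character at a time, from the last character to the first.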
BFAST overview PLoS ONE 4(11): e7767.
BFAST algorithm PLoS ONE 4(11): e7767.
Short read alignment
Empirical performance:
• Simulated data:
  • Extract random substrings of fixed length with random mutations and gaps
  • Realign back to reference genome
• Real data:
  • Paired reads: two ends of the same molecule
  • Count number of paired reads within 500 to 10000 bases of each other
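The two evaluation schemes above can be sketched in a few lines. The function names, the per-base error model (independent substitutions and single-base gaps), and the inclusive distance bounds are assumptions made for illustration. The slides do not specify how the benchmark reads were actually generated or how pair distance was measured.

```python
import random


def simulate_read(reference, length, sub_rate, indel_rate, rng):
    """Cut a random fixed-length substring, then add substitutions and gaps.

    Returns (true_start, read) so that a realigned position can later be
    compared against true_start.
    """
    start = rng.randrange(len(reference) - length + 1)
    read = []
    for base in reference[start:start + length]:
        roll = rng.random()
        if roll < sub_rate:
            # Substitution: replace with a different base.
            read.append(rng.choice([b for b in "ACGT" if b != base]))
        elif roll < sub_rate + indel_rate / 2:
            continue  # Deletion: skip this base.
        elif roll < sub_rate + indel_rate:
            # Insertion: keep the base and add a random one after it.
            read.append(base)
            read.append(rng.choice("ACGT"))
        else:
            read.append(base)
    return start, "".join(read)


def fraction_correct(true_starts, aligned_starts, tolerance=0):
    """Fraction of simulated reads realigned within tolerance of their origin."""
    correct = sum(
        1 for t, a in zip(true_starts, aligned_starts) if abs(t - a) <= tolerance
    )
    return correct / len(true_starts)


def count_pairs_in_range(pairs, min_dist=500, max_dist=10000):
    """Count read pairs whose two mapped positions lie 500 to 10000 bases apart.

    pairs is a list of (position_1, position_2) on the same reference sequence.
    """
    return sum(1 for a, b in pairs if min_dist <= abs(a - b) <= max_dist)


if __name__ == "__main__":
    rng = random.Random(0)
    reference = "".join(rng.choice("ACGT") for _ in range(1000))
    start, read = simulate_read(reference, 50, 0.02, 0.01, rng)
    print(start, read)
    print(count_pairs_in_range([(100, 700), (100, 200), (5, 20000)]))
```

For simulated data the true origin of every read is known, so accuracy can be measured directly. For real data no ground truth exists, so the count of pairs mapped at a plausible distance serves as a proxy for correct alignment.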
Short read alignment
Courtesy of Genome Res. June 2011 21: 936-939.