
2009-09-10

InCoB 2009. MapNext: a software tool for spliced and unspliced alignments and SNP detection of short sequence reads. Hua Bao, Sun Yat-sen University, Guangzhou, China. Evolution.sysu.edu.cn. 2009-09-10.



Presentation Transcript


  1. InCoB 2009. MapNext: a software tool for spliced and unspliced alignments and SNP detection of short sequence reads. Hua Bao, Sun Yat-sen University, Guangzhou, China. Evolution.sysu.edu.cn. 2009-09-10

  2. Next-generation sequencing
     • High-throughput (tens of millions of reads per lane)
     • Short read length (25-50 bp)
     • Sequencing error rate is higher than that of Sanger sequencing
     • Applications: genome sequencing, transcriptome sequencing, pooled population sequencing

  3. The objective
     1. Unspliced alignment of reads onto the genome
     2. Spliced alignment of transcript reads over exon-intron boundaries
     3. SNP detection from population sequences

  4. Seed hash table
     Reads:
       Read 1: TACACCACGGTCAGACTTGCATCACAACTGTTAAGC
       Read 2: AGACTTGCATCACAACTGTTAAGCTACACCACGGTC
       …
       Read n
     Seed hash table (seed: hits):
       TACACCACGGTC: Position 1, Read 1, +; Position 25, Read 2, +; …
       GACCGTGGTGTA: Position 1, Read 1, -; Position 25, Read 2, -; …
       AGACTTGCATCA: Position 13, Read 1, +; Position 1, Read 2, +; …
       TGATGCAAGTCT: Position 25, Read 1, -; Position 13, Read 2, -; …
       Other seeds (k-mers): …
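The table on this slide can be sketched in a few lines of Python. This is only an illustration, not MapNext's code: the function names are hypothetical, seeds are assumed to be non-overlapping 12-mers (as the positions 1, 13 and 25 suggest), and the minus-strand entry is assumed to reuse the start position of the forward seed.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def build_seed_table(reads, k=12):
    """Map each seed (k-mer) to a list of (position, read id, strand) hits."""
    table = defaultdict(list)
    for read_id, read in enumerate(reads, start=1):
        # Non-overlapping seeds at 1-based positions 1, 1+k, 1+2k, ...
        for start in range(0, len(read) - k + 1, k):
            seed = read[start:start + k]
            table[seed].append((start + 1, read_id, "+"))
            # Assumption: the minus-strand entry keeps the forward start position.
            table[reverse_complement(seed)].append((start + 1, read_id, "-"))
    return table

reads = [
    "TACACCACGGTCAGACTTGCATCACAACTGTTAAGC",
    "AGACTTGCATCACAACTGTTAAGCTACACCACGGTC",
]
table = build_seed_table(reads)
print(table["TACACCACGGTC"])  # [(1, 1, '+'), (25, 2, '+')]
```

With the two reads from the slide, the plus-strand entries for TACACCACGGTC and AGACTTGCATCA come out as listed above.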

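  5. Key computation of the seed
     Coding: A: 0, T: 1, G: 2, C: 3
     k-mer CCGATT: $\text{key} = 3\cdot4^5 + 3\cdot4^4 + 2\cdot4^3 + 0\cdot4^2 + 1\cdot4^1 + 1\cdot4^0$
     Seed hash table, indexed by key [0], [1], [2], …, [n]: each slot holds entries of the form (read id, position, strand), e.g. (1,1,+), (2,13,-), …
     Reads, indexed [0], [1], [2], …, [n]: each slot holds a read sequence, e.g. CCGATTGGCTAAA…
     A seed with Key=n is stored in slot [n] of the seed hash table.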
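The key is simply the k-mer read as a base-4 number under the coding A=0, T=1, G=2, C=3. A minimal sketch follows; the name `seed_key` is hypothetical and not taken from MapNext.

```python
# Base coding from the slide: A=0, T=1, G=2, C=3.
CODE = {"A": 0, "T": 1, "G": 2, "C": 3}

def seed_key(kmer):
    """Interpret a k-mer as a base-4 number, most significant base first."""
    key = 0
    for base in kmer:
        key = key * 4 + CODE[base]
    return key

print(seed_key("CCGATT"))  # 3973 = 3*4**5 + 3*4**4 + 2*4**3 + 0*4**2 + 1*4 + 1
```

Because every k-mer maps to a distinct integer in the range 0 to 4^k - 1, the key can index an array of 4^k slots directly, which is what makes the lookup O(1).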

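  6. Unspliced alignment
     Seed hash table, indexed by key [0], [1], [2], [3], …, [n]: entries of the form (read id, position, strand), e.g. (1,1,+), (2,13,-), …
     Reads, indexed [0], [1], [2], [3], …, [n]: read sequences
     Genome: TACACCACGGTCAGACTTGCATCA…
     Each genome k-mer is converted to a key (e.g. Key=3) and looked up in the seed hash table in O(1); each hit is then extended.
     K-mer: 8-12 bp. Step size: 1 bp.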
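The seed-and-extend idea can be sketched as follows. This is an illustration under several assumptions that are not stated on the slide: only the first k bases of each read are indexed, only the forward strand is searched, extension is an ungapped base-by-base comparison, and `max_mismatches` is a hypothetical parameter.

```python
from collections import defaultdict

def align_unspliced(genome, reads, k=8, max_mismatches=2):
    """Return (read index, 1-based genome position, mismatches) for each hit."""
    # Index the first k bases of every read (a simplification of this sketch).
    table = defaultdict(list)
    for read_id, read in enumerate(reads):
        table[read[:k]].append(read_id)

    hits = []
    # Slide a k-mer window along the genome with a step size of 1 bp.
    for pos in range(len(genome) - k + 1):
        for read_id in table.get(genome[pos:pos + k], ()):
            read = reads[read_id]
            window = genome[pos:pos + len(read)]
            if len(window) < len(read):
                continue  # the read would run off the end of the genome
            # Extension: count mismatches over the full read length.
            mismatches = sum(a != b for a, b in zip(read, window))
            if mismatches <= max_mismatches:
                hits.append((read_id, pos + 1, mismatches))
    return hits

genome = "GGG" + "TACACCACGGTCAGACTTGCATCA" + "TTT"
reads = ["TACACCACGGTCAGACTT", "TACACCACGGTCAGTCTT"]
print(align_unspliced(genome, reads))  # [(0, 4, 0), (1, 4, 1)]
```

The second read carries one substitution outside the seed, so it is still found by the seed lookup and accepted during extension.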

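  7. Spliced alignment
     Reads, indexed [0], [1], [2], …, [n]: read sequences, e.g. TACACCACG…
     Hash table, indexed by key [0], [1], [2], …, [n]: entries of the form (read id, posi, strand), e.g. (1,H,+), (2,T,-), …
     Seed hit list, indexed by read: entries of the form (genome posi, read posi, strand), e.g. [1] (1,H,+), (780,T,+), …; [2] (1,T,-), …
     Example: the read TACACCACGGTCAGAGTGCCATGGCTAGT aligns to the genome as TACACCACGGTCAGAgtac…ccagGTGCCATGGCTAGT, with seed hits at genome positions 1 and 780.
     Genome: TACACCACGGTCAGACTTGCATCA…; lookup by key (e.g. Key=2) is O(1).
     K-mer: 6-10 bp. Step size: 1 bp.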
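One way to turn two seed hits into a spliced alignment is sketched below. It rests on assumptions the slide does not confirm: H and T are read as seeds taken from the head and tail of a read, matching is exact and forward-strand only, and the split point is chosen by requiring a canonical GT…AG intron. The function names and the `max_intron` parameter are hypothetical, and the intron in the example is made-up filler, since the slide elides the real one.

```python
def find_all(text, pattern):
    """Return every 0-based start position of pattern in text."""
    hits, start = [], text.find(pattern)
    while start != -1:
        hits.append(start)
        start = text.find(pattern, start + 1)
    return hits

def align_spliced(genome, read, k=6, max_intron=10000):
    """Return (exon1 start, exon1 end, exon2 start, exon2 end), 1-based inclusive."""
    head, tail = read[:k], read[-k:]
    results = []
    for h in find_all(genome, head):
        for t in find_all(genome, tail):
            exon2_end = t + k  # genome index just past the read's last base
            intron_len = exon2_end - h - len(read)
            if not 0 < intron_len <= max_intron:
                continue
            # Try every split point between the head seed and the tail seed.
            for s in range(k, len(read) - k + 1):
                donor = h + s                  # first intron base
                acceptor = donor + intron_len  # first base of the second exon
                if (genome[h:donor] == read[:s]
                        and genome[acceptor:exon2_end] == read[s:]
                        and genome[donor:donor + 2] == "GT"          # assumed motif
                        and genome[acceptor - 2:acceptor] == "AG"):  # assumed motif
                    results.append((h + 1, donor, acceptor + 1, exon2_end))
    return results

intron = "GTAC" + "T" * 8 + "CCAG"  # hypothetical filler intron
genome = "TACACCACGGTCAGA" + intron + "GTGCCATGGCTAGT"
read = "TACACCACGGTCAGAGTGCCATGGCTAGT"
print(align_spliced(genome, read))  # [(1, 15, 32, 45)]
```

Without the motif check, the example read would admit two split points, because the base after the first exon in the read (G) is also the first base of the intron.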

  8. Accuracy of alignment: The query dataset was simulated from 5796 coding DNA sequences on chromosome I of Arabidopsis thaliana. It comprises 1893118 reads of 35 bp (134274 spliced and 1758844 unspliced).

  9. SNP detection from population sequences
     Sequence:
       …TACACACGGTCAGACTAGCATCAGTCCGTAATGCT…
     Reads:
       CACGGTCAGACGAGCATCAGTCC
       CACACGGTCAGACGAGCATCAGT
       GGTCAGACGAGCATCAGTCCGTA
       CAGACTAGCATCAGTCCGTAATG
       CACACGGTCAGACTAGCATCAGT
       GGTCAGACTAGCATCAGACCGTA
       GGTCAGACTAGCATCAGTCCGTA
       CGGTCAGACTAGCATCAGTCCG
     Quality control: minimum quality score (MQS), minimum neighbour quality score (MNQS)
     Significance control: minimum coverage (MC), minimum minor allele frequency (MMAF)

  10. SNP detection from population sequences (flowchart)
      Clustered short reads
      → Reads passed QC? (N: rejected; Y: continue)
      → Polymorphic sites covered by at least MC reads? (N: rejected; Y: continue)
      → Frequency of the minor allele higher than MMAF? (N: rejected; Y: continue)
      → Candidate SNPs
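The three decisions in the flowchart can be sketched for a single site as below. This is an illustration, not MapNext's implementation: the input layout (one tuple per read covering the site), the function name `call_snp`, and the choice of the second most frequent base as the minor allele are assumptions. The default thresholds are the values quoted on the following slide (25, 20, 50 and 0.01).

```python
from collections import Counter

def call_snp(column, mqs=25, mnqs=20, mc=50, mmaf=0.01):
    """column: list of (base, base quality, lowest neighbouring quality).

    Returns (major allele, minor allele, minor allele frequency) or None.
    """
    # Quality control: keep only observations that pass MQS and MNQS.
    bases = [b for b, q, nq in column if q >= mqs and nq >= mnqs]
    # Significance control 1: the site must be covered by at least MC reads.
    if len(bases) < mc:
        return None
    counts = Counter(bases).most_common()
    if len(counts) < 2:
        return None  # only one allele observed, so the site is not polymorphic
    (major, _), (minor, minor_count) = counts[0], counts[1]
    maf = minor_count / len(bases)
    # Significance control 2: the minor allele frequency must exceed MMAF.
    if maf <= mmaf:
        return None
    return major, minor, maf

# 60 good T calls, 20 good G calls, and 5 low-quality G calls that fail QC.
column = [("T", 30, 30)] * 60 + [("G", 30, 30)] * 20 + [("G", 10, 30)] * 5
print(call_snp(column))  # ('T', 'G', 0.25)
```

The returned frequency doubles as the minor allele frequency estimate for the pooled sample.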

  11. Accuracy of SNP detection from population sequencing: There were 2162 true SNPs in 50 individuals (haploid) in our simulation. Coverage equals sequencing depth per individual. MQV, MNQV, MMAF and MC were set at 25, 20, 0.01 and 50 (1X per individual), respectively.

  12. Accuracy of MAF estimation from population sequencing [Figure: estimated minor allele frequency (y-axis, 0.00 to 0.48) plotted against real minor allele frequency (x-axis, 0.00 to 0.48).]

  13. Summary
      1. MapNext supports both spliced and unspliced alignment of short reads. Spliced alignment does not require a training process.
      2. MapNext can detect SNPs and estimate minor allele frequencies from population sequences.

  14. MapNext: a software tool for spliced and unspliced alignments and SNP detection of short sequence reads. Thank you! 2009-09-10
