CSC Conference 2.6.2010
Next generation sequencing data analysis: Assembling the Glanville fritillary genome
Panu Somervuo, University of Helsinki, MRG group & DNA sequencing and genomics lab
Next generation sequencing • Roche 454 • Illumina Solexa • ABI SOLiD
Assembly pipeline
• 454: 10M single reads, 400bp
• Illumina Solexa: 52M 2*101 paired-end (insert size 600bp) and 102M 2*76 paired-end (insert size 600bp); error correction, SOAPdenovo scaffolds; 2M 2*75 matepairs sampled from the scaffolds (span 1500bp, one every 25bp)
• Newbler: 320Mbp in 220K contigs, N50: 1700nt
• SOLiD: 420M 2*50 matepairs (insert size 1Kbp); 96M after filtering; 27M unique mapping; SOLiD scaffolding: 40K scaffolds
• EST: 26K
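The pipeline turns the SOAPdenovo scaffolds into 2M artificial 2*75 matepairs (span 1500bp, one every 25bp) so that Newbler can use the Solexa information. The following is a minimal Python sketch of that sampling step. The function names, the read orientation (read 2 reverse-complemented) and the rule of skipping pairs that touch an N gap are assumptions, not details taken from the slides.

```python
from typing import Iterator, Tuple

# Complement table; N is kept so scaffold gaps survive reverse complementing.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def synthetic_matepairs(scaffold: str, read_len: int = 75, span: int = 1500,
                        step: int = 25) -> Iterator[Tuple[str, str]]:
    """Yield (read1, read2) pairs whose outer ends lie `span` bp apart.

    Assumed conventions (not from the slides): read1 is the first `read_len`
    bases of each window on the forward strand, read2 is the reverse
    complement of the last `read_len` bases, and a window starts every `step` bp.
    """
    for start in range(0, len(scaffold) - span + 1, step):
        window = scaffold[start:start + span]
        read1 = window[:read_len]
        read2 = revcomp(window[-read_len:])
        # Assumption: skip pairs that overlap a gap (run of N) in the scaffold.
        if "N" in read1.upper() or "N" in read2.upper():
            continue
        yield read1, read2


if __name__ == "__main__":
    scaffold = "ACGT" * 400  # toy 1600bp scaffold
    print(len(list(synthetic_matepairs(scaffold))))  # windows start at 0, 25, 50, 75, 100
```

With these defaults a scaffold yields roughly (length - 1500) / 25 + 1 pairs, so only scaffolds longer than the span contribute.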
Assembly validation 1: contigs vs nr (BLASTX)
contig | BLASTX hits | top 5 hit species
contig00008 | 216 | Bombyx mori (domestic silkworm), Bombyx mori (domestic silkworm), Aedes aegypti (Stegomyia aegypti), Nasonia vitripennis (jewel wasp), Nasonia vitripennis (jewel wasp)
contig00077 | 2 | Acyrthosiphon pisum (pea aphid), Acyrthosiphon pisum (pea aphid)
contig00084 | 63 | Apis mellifera (honey bee), Forficula auricularia (European earwig), Forficula auricularia (European earwig), Forficula auricularia (European earwig), Forficula auricularia (European earwig)
contig00094 | 2 | Tribolium castaneum (red flour beetle), Apis mellifera (honey bee)
contig00198 | 203 | Tribolium castaneum (red flour beetle), Tribolium castaneum (red flour beetle), Nasonia vitripennis (jewel wasp), Pediculus humanus corporis (human body louse), Apis mellifera (honey bee)
contig00208 | 68 | Acyrthosiphon pisum (pea aphid), Acyrthosiphon pisum (pea aphid), Acyrthosiphon pisum (pea aphid), Tribolium castaneum (red flour beetle), Strongylocentrotus purpuratus
contig00216 | 163 | Pediculus humanus corporis (human body louse), Culex quinquefasciatus (southern house mosquito), Aedes aegypti (Stegomyia aegypti), Culex quinquefasciatus (southern house mosquito), Tribolium castaneum (red flour beetle)
contig00229 | 39 | Tribolium castaneum (red flour beetle), Culex quinquefasciatus (southern house mosquito), Pediculus humanus corporis (human body louse), Apis mellifera (honey bee), Drosophila pseudoobscura pseudoobscura
contig00251 | 76 | Acyrthosiphon pisum (pea aphid), Pediculus humanus corporis (human body louse), Nematostella vectensis (starlet sea anemone), Strongylocentrotus purpuratus, Strongylocentrotus purpuratus
contig00278 | 90 | Aedes aegypti (Stegomyia aegypti), Anopheles gambiae str. PEST, Nasonia vitripennis (jewel wasp), Drosophila willistoni, Drosophila virilis
contig00279 | 43 | Bombyx mori (domestic silkworm), Culex quinquefasciatus (southern house mosquito), Culex quinquefasciatus (southern house mosquito), Anopheles gambiae str. PEST, Tribolium castaneum (red flour beetle)
contig00302 | 250 | Acyrthosiphon pisum (pea aphid), Salmo salar (Atlantic salmon), Branchiostoma floridae (Florida lancelet), Ciona intestinalis, Ciona intestinalis
contig00310 | 26 | Tribolium castaneum (red flour beetle), Acyrthosiphon pisum (pea aphid), Nasonia vitripennis (jewel wasp), Aedes aegypti (Stegomyia aegypti), Aedes aegypti (Stegomyia aegypti)
contig00321 | 218 | Acyrthosiphon pisum (pea aphid), Aedes aegypti (Stegomyia aegypti), Aedes aegypti (Stegomyia aegypti), Tribolium castaneum (red flour beetle), Culex quinquefasciatus (southern house mosquito)
contig00471 | 91 | Drosophila virilis, Drosophila mojavensis, Drosophila ananassae, Drosophila yakuba, Drosophila grimshawi
contig00507 | 3 | Ostrinia nubilalis (European corn borer), Ostrinia nubilalis (European corn borer), Ostrinia nubilalis (European corn borer)
contig00525 | 250 | Bombyx mori (domestic silkworm), Nasonia vitripennis (jewel wasp), Aedes aegypti (Stegomyia aegypti), Apis mellifera (honey bee), Apis mellifera (honey bee)
contig00533 | 8 | Ostrinia nubilalis (European corn borer), Ostrinia nubilalis (European corn borer), Ostrinia nubilalis (European corn borer), Bombyx mori (domestic silkworm), Strongylocentrotus purpuratus
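The table above is a per-contig summary of a BLASTX search against nr: the number of hits and the species of the five best hits. The following is a small sketch of that tally. It assumes a hypothetical tab-separated input with one "contig<TAB>species" line per hit, already in rank order. This is not a specific BLAST output format, and the function name is illustrative.

```python
from typing import Dict, Iterable, List, Tuple


def summarize_hits(lines: Iterable[str], top: int = 5) -> Dict[str, Tuple[int, List[str]]]:
    """Map each contig to (number of hits, species of its first `top` hits).

    Hypothetical input format: one hit per line as "contig<TAB>species",
    sorted best hit first within each contig.
    """
    hits: Dict[str, List[str]] = {}  # dicts keep insertion (contig) order
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        contig, species = line.split("\t", 1)
        hits.setdefault(contig, []).append(species)
    return {contig: (len(sp), sp[:top]) for contig, sp in hits.items()}


if __name__ == "__main__":
    demo = [
        "contig00094\tTribolium castaneum (red flour beetle)",
        "contig00094\tApis mellifera (honey bee)",
    ]
    for contig, (n, species) in summarize_hits(demo).items():
        print(contig, n, ", ".join(species))
```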
Assembly validation 2: Genomic contigs vs EST contigs (figure values: 52, 13)
rev_contig310   1 --TTCAGAGAAACAAGTGAATTGAAATTTGATTATTTAtTTTCGTTTCAG 48
                    |||||||||||||||.|||||||||||||||||||||||||||||.||
contig402106    1 TTTTCAGAGAAACAAGTAAATTGAAATTTGATTATTTATTTtCGTTTTAG 50

rev_contig310  49 TATGAAGCAGCAGCGAGAGGTGCAGAAGCACTTGGAAACAGATATGGTAC 98
                  |||||||||||.||||||||||||||||||||||||||.|||||||||||
contig402106   51 TATGAAGCAGCCGCGAGAGGTGCAGAAGCACTTGGAAAAAGATATGGTAC 100

rev_contig310  99 AAAtTATAGAGTAGGAGtTGCCGCAGATATTCtTTGTAAGtTGTTTTTTT 148
                  ||||||||||||||||||||||||||||||||||||||||||||||||||
contig402106  101 AAATTATAGAGTAGGAGTTGCCGCAGATATTCTTTGTAAGTTGTTTTTTT 150

rev_contig310 149 AATCAGTTTAGCtTGCAGCtTTAAGACTATTATTATATATTTTTTTATCG 198
                  ||||.|||||.||||||||||||||||||||||||||| |||||||||||
contig402106  151 AATCGGTTTATCTTGCAGCTTTAAGACTATTATTATAT-TTTTTTtATCG 199

rev_contig310 199 TTGTACAGTAAGAAGCTACATAAtTTTTcCTACCGcCTA--TT-----gg 241
                  |||||||||||||||||||||||||||||||||||||||  ||     .|
contig402106  200 TTGTACAGTAAGAAGCTACATAATTTTTCCTACCGCCTATTTTGGGGGAG 249

rev_contig310 242 GGGGGGGGATTGTTGAATCAGTTAAGAATTAAAAGATGATGCTAtTTCAG 291
                  ||||||||||||||.|||||||.||||||| |||||||||||||||||||
contig402106  250 GGGGGGGgATTGTTAAATCAGTCAAGAATT-AAAGATGATGCTATTTCAG 298

rev_contig310 292 aATACtTaAACttTTTTTAAGAC--GAC---------T-A-TAA-GTTTA 327
                  ||.||||.|||||||||||||||  |||         | | ||. |||||
contig402106  299 AAAACTTCAACTTTTTTtAAGACTAGACTATTTTTAATAATTAGTGTTTA 348

rev_contig310 328 AATAACACTAATTATTaAAAACTTGGTCTATCTTGGTCTTGGtTTTAGGt 377
                  |||||||||||||||||||||||||.||||||||.||||||||.|.||||
contig402106  349 AATAACACTAATTATTAAAAACTTGATCTATCTTCGTCTTGGTCTAAGGT 398

rev_contig310 378 TTTTCCTCTAGTTAATATTACTGTTACAACTACATAAAAACAATAAAATA 427
                  ||.|||||||||||||.|||||||||||||||||||||||||||||..||
contig402106  399 TTGTCCTCTAGTTAATCTTACTGTTACAACTACATAAAAACAaTAAGGTA 448

rev_contig310 428 CTGTATCTTTGCAGATCCTATGAGCGGAACCACTTTTGACTGGGCGAAGA 477
                  |||||||||||.||||||||||||||||||||||||||||||||||||||
contig402106  449 CTGTATCTTTGTAGATCCTATGAGCGGAACCACTTTtGACTGGGCGAAGA 498

rev_contig310 478 ATACAACAAATGTCCCATTTTCTTACCTGATTGAATTAAGAGACTTGGGG 527
                  ||.|||||||||||||||||||||||||||||||||||||||||||||||
contig402106  499 ATGCAACAAATGTCCcATTTtCTTACCTGATTGAATTAAGAGACTtGGGg 548

rev_contig310 528 CAATACGGTTTCTTGTTACCAGCAGAACAGATTATTCCAACTAATTTAGA 577
                  |||||||||||||||||||||||||||||||||||.||||||||||||||
contig402106  549 CAaTACGGTTtCTTGTTACcAGCAGAACAGATTATACCAACTAATTtAGA 598

rev_contig310 578 AATAATGGATGCACTCCTGGAGATGGATAATACCGCAAGAACACTAgGG 626
                  ||||||||||||||||||||||||||||||.|||||||||||||||||.
contig402106  599 AATAaTGGATGCACTCcTGGAGATGGATAACACCGCAAGAACACTAGGA 647
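The match line in the genomic contig vs EST contig alignment marks identities with "|", mismatches with "." and gaps with a blank. It counts lowercase bases as matching their uppercase counterparts (t against T). The following is a short Python sketch that tallies the same columns for two gapped rows. The function name and the identity definition (identical columns over all alignment columns) are illustrative choices, not something stated on the slide.

```python
from typing import Dict


def alignment_stats(top: str, bottom: str) -> Dict[str, float]:
    """Tally identity, mismatch and gap columns of a gapped pairwise alignment.

    '-' marks a gap; case is ignored, so 't' against 'T' counts as a match.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    matches = mismatches = gaps = 0
    for a, b in zip(top.upper(), bottom.upper()):
        if a == "-" or b == "-":
            gaps += 1
        elif a == b:
            matches += 1
        else:
            mismatches += 1
    columns = len(top)
    return {
        "matches": matches,
        "mismatches": mismatches,
        "gaps": gaps,
        # Illustrative definition: identical columns over all alignment columns.
        "identity": matches / columns if columns else 0.0,
    }


if __name__ == "__main__":
    # First 50-column block of rev_contig310 vs contig402106.
    print(alignment_stats(
        "--TTCAGAGAAACAAGTGAATTGAAATTTGATTATTTAtTTTCGTTTCAG",
        "TTTTCAGAGAAACAAGTAAATTGAAATTTGATTATTTATTTtCGTTTTAG",
    ))
```

For that first block it reports 46 matches, 2 mismatches and 2 gap columns.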
What now? Still more sequencing needed (• target enrichment: 55K 120nt probes • 5’ SAGE • longer matepairs) → longer contigs & scaffolds → annotation
Challenges • no elegant solution for combining SOLiD colorspace reads with reads from other platforms in de novo assembly • read quality: filtering vs error correction • difficulties generating long matepairs • how to finish the assembly project: validation. Goal: to get contigs/scaffolds useful for gene prediction
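SOLiD reads are reported in colour space: a known first base followed by one colour (0-3) for each pair of adjacent bases. The sketch below uses the standard two-base encoding (colour = XOR of the 2-bit base codes, A=0, C=1, G=2, T=3). It shows one reason why combining platforms is awkward: if a read is decoded to bases before assembly, a single colour error corrupts every base after it. The function names are illustrative.

```python
# Two-base encoding used by SOLiD: each colour is the XOR of the 2-bit codes
# of two adjacent bases (A=0, C=1, G=2, T=3).
_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
_BASES = "ACGT"


def to_colorspace(seq: str) -> str:
    """Base space -> colour space: leading base, then one colour per adjacent pair."""
    seq = seq.upper()
    colors = (str(_CODE[a] ^ _CODE[b]) for a, b in zip(seq, seq[1:]))
    return seq[0] + "".join(colors)


def from_colorspace(cs: str) -> str:
    """Colour space -> base space by chaining XORs from the leading base."""
    bases = [cs[0].upper()]
    for color in cs[1:]:
        bases.append(_BASES[_CODE[bases[-1]] ^ int(color)])
    return "".join(bases)


if __name__ == "__main__":
    cs = to_colorspace("ATGGCA")
    print(cs)                         # A31031
    print(from_colorspace(cs))        # ATGGCA
    # One wrong colour (1 -> 2) corrupts every base decoded after it:
    print(from_colorspace("A32031"))  # ATCCGT
```

Because of this, colour-space-aware tools usually compare reads in colour space rather than decoding them first.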
What is the best assembler? • SOAPdenovo, Velvet, Newbler, CLC bio, Celera • criteria: number of contigs, contig lengths, accuracy
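The first two comparison criteria on this slide, number of contigs and contig lengths, are usually summarized as contig count, total length and N50 (the statistic quoted for the Newbler assembly). The following is a minimal sketch over a list of contig lengths. The function name is illustrative, and accuracy cannot be computed from lengths alone.

```python
from typing import Dict, Iterable


def assembly_stats(lengths: Iterable[int]) -> Dict[str, int]:
    """Contig count, total length, longest contig and N50.

    N50 is the largest length L such that contigs of length >= L together
    contain at least half of the assembled bases.
    """
    sizes = sorted(lengths, reverse=True)
    total = sum(sizes)
    n50 = 0
    running = 0
    for size in sizes:
        running += size
        if 2 * running >= total:  # half of all bases reached
            n50 = size
            break
    return {"contigs": len(sizes), "total": total,
            "longest": sizes[0] if sizes else 0, "n50": n50}


if __name__ == "__main__":
    print(assembly_stats([100, 200, 300, 400]))
    # {'contigs': 4, 'total': 1000, 'longest': 400, 'n50': 300}
```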
Assembling Solexa data • 52M 2*101 paired-end (insert size 600bp) • 102M 2*76 paired-end (insert size 600bp) • error correction (SOAPdenovo) (Plots: number of contigs vs contig size; sum of contig lengths vs contig size)
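The axis labels on this slide (number of contigs, sum of contig lengths, contig size) suggest cumulative curves. On that reading, each threshold shows how many contigs are at least that long and how many bases they contain. The following is a sketch that computes such curves under that assumption. The function name and the choice of thresholds are illustrative.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple


def size_curves(lengths: Sequence[int],
                thresholds: Sequence[int]) -> List[Tuple[int, int, int]]:
    """For each threshold t: (t, number of contigs >= t, total bp in those contigs)."""
    sizes = sorted(lengths)
    # suffix[i] = sum(sizes[i:]), so each "length >= t" total is one lookup.
    suffix = [0] * (len(sizes) + 1)
    for i in range(len(sizes) - 1, -1, -1):
        suffix[i] = suffix[i + 1] + sizes[i]
    rows = []
    for t in thresholds:
        i = bisect_left(sizes, t)  # index of the first contig with length >= t
        rows.append((t, len(sizes) - i, suffix[i]))
    return rows


if __name__ == "__main__":
    for row in size_curves([100, 200, 300, 400], [0, 250, 500]):
        print(row)  # prints (0, 4, 1000), then (250, 2, 700), then (500, 0, 0)
```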
Assembling 454 data: 10M single reads, 400bp • Newbler: all 454 data + 2M 1500nt matepairs from SOAPdenovo scaffolds • CLC bio: all 454 data + all Solexa data (Plots: number of contigs vs contig size; sum of contig lengths vs contig size)
De novo assembler history: Part I • read errors • repetitive elements
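A common way to detect read errors is the k-mer spectrum. With enough coverage, true k-mers occur many times, whereas a sequencing error creates k-mers that are rare. Repeats have the opposite effect and inflate k-mer counts. The following is a minimal sketch. Counting on the forward strand only, the fixed count threshold and the function names are simplifying assumptions; this is not any particular assembler's error corrector.

```python
from collections import Counter
from typing import Iterable, List


def kmer_counts(reads: Iterable[str], k: int) -> Counter:
    """Count every k-mer in the reads (forward strand only, for simplicity)."""
    counts: Counter = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def weak_kmer_starts(read: str, counts: Counter, k: int, min_count: int) -> List[int]:
    """Start positions of k-mers in `read` that occur fewer than `min_count` times.

    A single base error makes up to k overlapping k-mers rare, so a run of
    weak positions brackets the likely error.
    """
    read = read.upper()
    return [i for i in range(len(read) - k + 1) if counts[read[i:i + k]] < min_count]


if __name__ == "__main__":
    reads = ["ACGTACGTAC"] * 5 + ["ACGTTCGTAC"]  # last read has an error at index 4
    counts = kmer_counts(reads, 4)
    print(weak_kmer_starts("ACGTTCGTAC", counts, 4, 2))  # [1, 2, 3, 4]
```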
De novo assembler history: Part II • de Bruijn graph
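In a de Bruijn graph assembler the reads are cut into k-mers. Each distinct k-mer becomes an edge from its (k-1)-mer prefix to its (k-1)-mer suffix, and contigs are read off unbranched paths. The following is a toy sketch with illustrative function names. The walk only checks out-degree, whereas a full unitig construction would also stop at nodes with several predecessors and would merge reverse-complement k-mers.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def de_bruijn(reads: Iterable[str], k: int) -> Dict[str, Set[str]]:
    """Nodes are (k-1)-mers; every distinct k-mer adds an edge prefix -> suffix."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


def walk_unambiguous(graph: Dict[str, Set[str]], start: str) -> str:
    """Extend `start` while the current node has exactly one successor."""
    contig, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (node,) = graph[node]
        if node in seen:  # stop on cycles, e.g. tandem repeats
            break
        seen.add(node)
        contig += node[-1]
    return contig


if __name__ == "__main__":
    overlapping = de_bruijn(["ACGTTGCA", "GTTGCATT"], 4)
    print(walk_unambiguous(overlapping, "ACG"))  # ACGTTGCATT
    branching = de_bruijn(["ACGTTGCA", "ACGTAGCA"], 4)
    print(walk_unambiguous(branching, "ACG"))    # ACGT (CGT has two successors)
```

The branching case shows why read errors and repeats matter here. Both create extra edges, and an unambiguous walk must stop at them.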