Genome Assembly

Kelley Bullard, Henry Dewhurst, Kizee Etienne, Esha Jain, VivekSagar KR, Benjamin Metcalf, Raghav Sharma, Charles Wigington, Juliette Zerick

Presentation Transcript


  1. Genome Assembly Kelley Bullard, Henry Dewhurst, Kizee Etienne, Esha Jain, VivekSagar KR, Benjamin Metcalf, Raghav Sharma, Charles Wigington, Juliette Zerick

  2. Outline • Input Data • Sequence read data • Pipeline Review • Un-processed data • Assemblers • Preliminary data – assembler comparison • Visualization • Future

  3. Input Data

  4. Vibrio navarrensis - 454

  5. Vibrio vulnificus - 454

  6. Vibrio navarrensis - Illumina

  7. Vibrio vulnificus - Illumina

  8. Pipeline: Revisited [flowchart flattened in extraction; the grouping of tools under stages is approximate]
     • Inputs: 454 raw reads, Illumina raw reads
     • PRE-PROCESSING: Fastqc, Prinseq, NGS QC; read stats
     • DENOVO ASSEMBLY (Illumina / 454 / hybrid), with parameter optimization
        • Illumina DeNovo: Allpaths LG, SOAP DeNovo, Velvet, Taipan, SUTTA
        • 454 DeNovo: Newbler, CABOG, SUTTA
        • Hybrid DeNovo: Ray, MIRA
     • Assembler evaluation: GAGE, Hawk-eye; statistical analysis; assemblers chosen
     • CONTIG MERGING: Minimus, MAIA; all possible combinations of the best 3; align Illumina reads against 454 contigs; unmapped reads
     • REFERENCE SELECTION: published genomes from public databases (V. vulnificus YJ016, V. vulnificus CMCP6, V. vulnificus MO6-24/O); align Illumina reads against the reference (bwa); samstats; compare mapping statistics; reference chosen
     • REFERENCE BASED ASSEMBLY (Illumina/(454?)): AMOScmp; reference evaluation with MUMmer and DNA Diff; unmapped reads
     • GENOME FINISHING: scaffolds (GRASS, built-in); gap filling; nucleotide identity; MUMmer, PAGIT, Mauve; Mac vector, CLC wb
     • Output: draft/finished genome

  9. Vibrio vulnificus - 454

  10. Vibrio navarrensis - 454; unprocessed data

  11. Vibrio vulnificus - Illumina; unprocessed data

  12. Vibrio navarrensis - Illumina; unprocessed data

  13. Per base sequence quality (panels: vul_454_07-2444, nav_454_2541-90, vul_ill_06-2432, nav_ill_08-2462)
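A per-base quality plot summarizes the quality scores at each read position across all reads. The sketch below is a minimal illustration of that idea, not the QC tool's own code; it assumes four-line FASTQ records with Phred+33 encoding, and the function name `per_base_mean_quality` and the toy records are hypothetical.

```python
from collections import defaultdict


def per_base_mean_quality(fastq_lines, offset=33):
    """Return the mean Phred quality at each read position.

    Assumes four-line FASTQ records and Phred+33 encoding (an assumption,
    not something stated on the slide).
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for i, line in enumerate(fastq_lines):
        # The quality string is the fourth line of every record.
        if i % 4 != 3:
            continue
        for pos, ch in enumerate(line.rstrip("\n")):
            totals[pos] += ord(ch) - offset
            counts[pos] += 1
    return [totals[p] / counts[p] for p in sorted(counts)]


# Toy records for illustration only (not project data).
records = ["@r1", "ACGT", "+", "IIII", "@r2", "ACG", "+", "I5+"]
print(per_base_mean_quality(records))  # [40.0, 30.0, 25.0, 40.0]
```

Reads of different lengths simply contribute to fewer positions, which is why the last position above averages a single read.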

  14. Per base sequence content (panels: vul_454_06-2432, nav_454_08-2462, vul_ill_06-2432, nav_ill_06-2756-81)

  15. Sequence duplication levels (panels: vul_454_08-2435, nav_454_2541-90, nav_ill_08-2462, vul_ill_06-2432)
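Duplication levels describe how many distinct sequences occur once, twice, three times, and so on. The following is a simplified exact-match count, assuming the reads fit in memory; QC tools typically subsample and truncate reads, so this is not their algorithm. The function names and the toy reads are hypothetical.

```python
from collections import Counter


def duplication_levels(sequences):
    """Map duplication level -> number of distinct sequences seen that often."""
    copies = Counter(sequences)
    return dict(sorted(Counter(copies.values()).items()))


def percent_duplicated(sequences):
    """Percentage of reads that are extra copies of an already-seen sequence."""
    total = len(sequences)
    if total == 0:
        return 0.0
    return 100.0 * (total - len(set(sequences))) / total


# Toy reads for illustration only (not project data).
reads = ["ACGT", "ACGT", "TTGA", "ACGT", "GGCC", "TTGA"]
print(duplication_levels(reads))   # {1: 1, 2: 1, 3: 1}
print(percent_duplicated(reads))   # 50.0
```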

  16. Pre-processing stats

  17. Pipeline: Revisited [flowchart repeated from slide 8]

  18. Assemblers

  19. CLC Genomics
     • Word size: automatic. CLC bio's de novo assembly algorithm uses de Bruijn graphs: it builds a table of all sub-sequences of a certain length (called words) found in the reads.
     • Bubble size: automatic. A bubble is a bifurcation in the graph where a path splits into two nodes and then merges back into one.
     • Minimum contig length: 200
     • Mismatch cost: 2. The cost of a mismatch between the read and the reference sequence.
     • Insertion cost: 3. The cost of an insertion in the read (causing a gap in the reference sequence).
     • Deletion cost: 3. The cost of a gap in the read. The score for a match is always 1.
     • Length fraction: 0.5. The minimum fraction of a read's length that must match the reference sequence; at 0.5, at least half the read must match for the read to be included in the final mapping.
     • Similarity: 0.8. The minimum fraction of identity between the read and the reference sequence; to require at least 90% identity, for example, set this value to 0.9.
     • Update contigs based on mapped reads: the contig sequences produced by the de novo assembly are updated to reflect the mapping of the reads.
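The mapping parameters above can be read as a scoring rule plus two filters. The sketch below is one plausible reading, not CLC's implementation: it assumes identity is measured as matches over all alignment columns and that the length fraction counts the read bases inside the alignment. The `Alignment` class and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    read_length: int
    matches: int
    mismatches: int
    insertions: int  # bases present in the read but not in the reference
    deletions: int   # reference bases missing from the read


def alignment_score(a, mismatch_cost=2, insertion_cost=3, deletion_cost=3):
    """Each match scores 1; mismatches and gaps subtract their costs."""
    return (a.matches
            - mismatch_cost * a.mismatches
            - insertion_cost * a.insertions
            - deletion_cost * a.deletions)


def passes_filters(a, length_fraction=0.5, similarity=0.8):
    """Apply the length-fraction and similarity thresholds (assumed definitions)."""
    aligned_read_bases = a.matches + a.mismatches + a.insertions
    columns = aligned_read_bases + a.deletions
    if columns == 0 or a.read_length == 0:
        return False
    covers_enough = aligned_read_bases / a.read_length >= length_fraction
    similar_enough = a.matches / columns >= similarity
    return covers_enough and similar_enough


# Toy alignments for illustration only.
good = Alignment(read_length=100, matches=55, mismatches=3, insertions=1, deletions=1)
short = Alignment(read_length=100, matches=40, mismatches=2, insertions=0, deletions=0)
print(alignment_score(good), passes_filters(good))  # 43 True
print(passes_filters(short))                         # False
```

The second read is rejected because only 42 of its 100 bases align, which is below the 0.5 length fraction even though its identity is high.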

  20. Velvet
     • de Bruijn graph assembler
     • Max k-mer length: 31; default: 29
     • Commands
        • velveth <directory> <k-mer length> -<read type> -<file format> <filename>
        • velvetg VAssemILL -exp_cov auto -cov_cutoff auto
     • exp_cov auto: let the system infer the expected coverage of unique regions
     • cov_cutoff auto: let the system infer the coverage cutoff below which low-coverage nodes are removed
     • Designed for very short reads (25-50 bp)
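To make the de Bruijn idea concrete, here is a textbook-style sketch in which nodes are (k-1)-mers and every k-mer in a read contributes one edge. It is an illustration only: it ignores reverse complements, coverage, and error correction, all of which a real assembler such as Velvet handles, and the function names and toy reads are hypothetical.

```python
from collections import defaultdict


def kmers(read, k):
    """All overlapping substrings of length k, in order."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def de_bruijn(reads, k):
    """Build edges prefix -> suffix, where both are (k-1)-mers of a k-mer."""
    graph = defaultdict(set)
    for read in reads:
        for kmer in kmers(read, k):
            graph[kmer[:-1]].add(kmer[1:])
    return graph


# Toy reads for illustration only.
graph = de_bruijn(["ACGTAC", "CGTACG"], k=4)
for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
```

Because the two toy reads overlap, their k-mers collapse onto the same edges, which is the property that lets a de Bruijn assembler avoid pairwise read comparison.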

  21. Newbler
     • De novo OLC (overlap-layout-consensus) assembler
     • Uses k-mer based hashing
     • Command: runAssembly [filename]
     • Designed for longer reads (454)
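The overlap step of an OLC assembler can be sketched as follows: a k-mer hash narrows down which read pairs might overlap, and only those candidates are checked for a suffix-prefix match. This is a generic illustration under simplifying assumptions (exact matches, forward strand only), not Newbler's actual algorithm; all function names and the toy reads are hypothetical.

```python
from collections import defaultdict


def kmer_index(reads, k):
    """Hash every k-mer to the set of read ids that contain it."""
    index = defaultdict(set)
    for rid, read in enumerate(reads):
        for i in range(len(read) - k + 1):
            index[read[i:i + k]].add(rid)
    return index


def candidate_pairs(reads, k):
    """Ordered pairs of distinct reads that share at least one k-mer."""
    pairs = set()
    for rids in kmer_index(reads, k).values():
        pairs.update((a, b) for a in rids for b in rids if a != b)
    return pairs


def suffix_prefix_overlap(a, b, min_len):
    """Longest suffix of a equal to a prefix of b, if at least min_len; else 0."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-n:] == b[:n]:
            return n
    return 0


def overlaps(reads, k, min_len):
    """Map (read a, read b) -> overlap length for every confirmed overlap."""
    result = {}
    for a, b in candidate_pairs(reads, k):
        n = suffix_prefix_overlap(reads[a], reads[b], min_len)
        if n:
            result[(a, b)] = n
    return result


# Toy reads for illustration only.
reads = ["ACGTTGCA", "TTGCAGGT", "CCCCCCCC"]
print(overlaps(reads, k=4, min_len=4))  # {(0, 1): 5}
```

The third read shares no k-mer with the others, so it is never compared, which is the point of hashing first.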

  22. SOAP DeNovo2
     • Short-read de novo assembler
     • Designed to assemble Illumina GA II short reads
     • Command: SOAPdenovo-127mer all -s test.config -K 30 -R -p 4 -N 4600000 -o test_OP 1>ass.log 2>ass.err
     • Parameters specified:
        • Insert_size: 0 (single-end reads)
        • Kmer_size: 23 (default)
        • asm_flag: both contigs and scaffolds
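The `-s test.config` argument points at a library configuration file. The sketch below writes a minimal single-end configuration matching the parameters listed on the slide. The key names (`max_rd_len`, `avg_ins`, `reverse_seq`, `asm_flags`, `rank`, `q`) are given as recalled from the SOAPdenovo2 documentation and should be checked against the manual for the installed version; the read length of 100, the file name `reads.fastq`, and the helper `soap_config` are assumptions made for illustration.

```python
def soap_config(reads_path, max_rd_len=100, avg_ins=0, asm_flags=3):
    """Render a minimal single-library SOAPdenovo config as text.

    avg_ins=0 mirrors the slide's insert size for single-end reads;
    asm_flags=3 is understood to mean "use reads for contigs and scaffolds".
    Key names are an assumption to verify against the SOAPdenovo2 manual.
    """
    lines = [
        f"max_rd_len={max_rd_len}",
        "[LIB]",
        f"avg_ins={avg_ins}",
        "reverse_seq=0",
        f"asm_flags={asm_flags}",
        "rank=1",
        f"q={reads_path}",  # single-end FASTQ file
    ]
    return "\n".join(lines) + "\n"


# Hypothetical file name, for illustration only.
config_text = soap_config("reads.fastq")
print(config_text)
```

Writing `config_text` to `test.config` would give the file the command on the slide expects.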

  23. Assembler comparison - 454 (panels: nav_454_2541-90, vul_454_06-2432)

  24. Assembler comparison - Illumina (panels: nav_ill_2541-90, vul_ill_06-2432)
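The slides do not say which metrics the comparison used. Contig count, total length, largest contig, and N50 are common choices, so the sketch below computes those as an assumption, with a minimum contig length of 200 echoing the CLC setting. The function names and the toy lengths are hypothetical.

```python
def n50(contig_lengths):
    """Length L such that contigs of length >= L hold at least half the bases."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0


def summarize(contig_lengths, min_length=200):
    """Basic assembly statistics over contigs of at least min_length."""
    kept = [n for n in contig_lengths if n >= min_length]
    return {
        "contigs": len(kept),
        "total_bp": sum(kept),
        "largest": max(kept, default=0),
        "n50": n50(kept),
    }


# Toy contig lengths for illustration only (not project results).
print(summarize([100, 200, 300, 400, 500]))
# {'contigs': 4, 'total_bp': 1400, 'largest': 500, 'n50': 400}
```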

  25. Pipeline: Revisited [flowchart from slide 8, updated: Taipan and MIRA are no longer listed, and reference evaluation lists DNA Diff only]

  26. Reference Genomes • V. vulnificus MO6-24/O • V. vulnificus YJ016 • V. vulnificus CMCP6
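The pipeline selects a reference by aligning Illumina reads against each candidate and comparing mapping statistics. One simple way to compare them, sketched here as an assumption, is to pick the candidate with the highest fraction of mapped reads. The function names are hypothetical and the counts are placeholders, not results from this project.

```python
def mapped_fraction(mapped_reads, total_reads):
    """Fraction of reads that aligned to a reference."""
    return mapped_reads / total_reads if total_reads else 0.0


def best_reference(mapping_counts, total_reads):
    """Name of the reference with the highest mapped fraction."""
    return max(
        mapping_counts,
        key=lambda name: mapped_fraction(mapping_counts[name], total_reads),
    )


# Placeholder counts for illustration only; real values would come from
# the mapping statistics of each alignment.
placeholder_counts = {"YJ016": 900, "CMCP6": 950, "MO6-24/O": 800}
print(best_reference(placeholder_counts, total_reads=1000))  # CMCP6
```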

  27. Reference vs. all contigs - 454 (panels: nav_454_2541-90, vul_454_06-2432)

  28. Reference vs. all contigs - Illumina (panels: nav_ill_2541-90, vul_ill_06-2432)

  29. Visualization

  30. Road ahead • Get all the tools working • Optimize tool parameters • Use Illumina reads to finish 454 contigs • Performance considerations for the tools

  31. Questions???
