
Gene-Boosted Assembly of a Novel Bacterial Genome from Very Short Reads

Harry Presman


Presentation Transcript


  1. Harry Presman Gene-Boosted Assembly of a Novel Bacterial Genome from Very Short Reads

  2. Overview • Motivation • Assembly • Results • Advantages/Limitations

  3. Motivation • Next-gen sequencers produce short read-lengths • Useful for polymorphism discovery • Difficult to assemble whole genomes • Current assembly algorithms produce highly fragmented results

  4. Sequencing P. aeruginosa (PAb1) • Source of common in-hospital infections • Chosen due to available comparators, PAO1 and PA14 • 8,627,900 shotgun reads (Solexa)

  5. Assembly • Step 1: AMOScmp • Comparative assembler • Uses MUMmer • Alignment system based on suffix trees • Referenced in “Comparative Genome Assembly” • PA14 – 2053 contigs • PAO1 – 2797 contigs

  6. Assembly • Step 2: multiple sequence alignment • Align PAO1 and PA14 assemblies • Use Minimus to fill gaps with contigs • AMOS component for small data sets • Re-map reads using AMOScmp to clean assembly • Closed 203 gaps

  7. Assembly • Step 3: gene-boosted assembly • University of Maryland annotation pipeline • Based on BLAST and Glimmer • Protein-coding genes used to fill gaps • Identify genes at contig edges and gaps • Extract amino acid (AA) sequences • tblastn identified potential filler reads • ABBA assembled reads into gaps • Closed 185 gaps
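
The read-recruitment idea in Step 3 can be illustrated with a toy sketch: translate each short read in all six reading frames and keep the reads whose translation shares a short exact peptide with the target protein. This is only an illustration under simplifying assumptions. Real tblastn scores inexact alignments rather than requiring exact matches, and the function names and the `min_match` parameter below are hypothetical, not part of the pipeline described in the slides.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(read):
    """Yield the three forward and three reverse-strand translations of a read."""
    for strand in (read, reverse_complement(read)):
        for offset in range(3):
            yield translate(strand[offset:])


def candidate_reads(protein, reads, min_match=5):
    """Keep reads with a frame that shares an exact peptide of length
    min_match with the protein (hypothetical stand-in for a tblastn hit)."""
    hits = []
    for read in reads:
        for peptide in six_frames(read):
            windows = range(len(peptide) - min_match + 1)
            if any(peptide[i:i + min_match] in protein for i in windows):
                hits.append(read)
                break
    return hits
```

Reads recruited this way would then be handed to an assembler (ABBA in the slides) to build the sequence that spans the gap.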

  8. Aside • Tested gene-boosted analysis alone • PAb1 assembled using PA14 proteins • 96% of PAb1 proteins assembled using only this method • Lacks global genome structure information

  9. Assembly • Step 4: Clean up • SSAKE • “Short Sequence Assembly by K-mer search and 3’ read Extension” • Edena • “Exact DE Novo Assembler” • Velvet • Closed 46 gaps
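
The "3’ read extension" in SSAKE's name can be sketched as a greedy loop: find an unused read whose prefix overlaps the current 3’ end of the contig, append the non-overlapping part, and repeat until nothing overlaps. The sketch below is a simplified illustration, not SSAKE's implementation. It ignores reverse complements, consensus voting, and the prefix-tree lookup a real assembler would use, and the function names and `min_overlap` parameter are assumptions.

```python
def best_extension(contig, reads, min_overlap):
    """Return (overlap_length, read_index) for the read whose prefix has the
    longest overlap with the contig's 3' end, or None if no read qualifies."""
    best = None
    for i, read in enumerate(reads):
        # Cap the overlap at len(read) - 1 so every hit adds at least one base.
        max_k = min(len(read) - 1, len(contig))
        for k in range(max_k, min_overlap - 1, -1):
            if contig.endswith(read[:k]):
                if best is None or k > best[0]:
                    best = (k, i)
                break  # longest overlap for this read found
    return best


def extend_3prime(seed, reads, min_overlap=4):
    """Greedily extend `seed` toward its 3' end, using each read at most once."""
    contig = seed
    unused = list(reads)
    while True:
        hit = best_extension(contig, unused, min_overlap)
        if hit is None:
            return contig
        k, i = hit
        contig += unused.pop(i)[k:]


reads = ["GCGTACGT", "ACGTTAGC", "TAGCCTAG", "CCTAGGA"]
print(extend_3prime("ATGGCGTA", reads))  # ATGGCGTACGTTAGCCTAGGA
```

Because each step consumes one read, the loop always terminates. Repeats longer than `min_overlap` can still mislead a greedy extension like this, which is one reason purely de novo short-read assemblies tend to fragment.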

  10. Results • 76 contigs containing 6,290,005 bp • 94% of bases in single scaffold • 5602 protein-coding genes identified • Error rate per read = 1.04% • Error with coverage > 20X is zero • Slight bias toward high gene coverage regions

  11. Results • SNP analysis • Aligned PA14 and PAb1 • 5,537,508/5,568,550 bp agreed • 1157/5,568,550 possible sequence errors • 187/1104 indels in error • Accuracy of assembly: > 99.97%
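
The figures on this slide can be cross-checked with a few lines of arithmetic. The sketch assumes the quoted accuracy is one minus the fraction of aligned positions flagged as possible sequence errors. The slide does not state how the figure was computed, so treat that formula as an assumption.

```python
# Figures quoted on the slide.
aligned_bp = 5_568_550     # aligned PA14/PAb1 positions
agreeing_bp = 5_537_508    # positions where the two genomes agree
possible_errors = 1_157    # differences flagged as possible sequence errors

differing_bp = aligned_bp - agreeing_bp
identity = agreeing_bp / aligned_bp
# Assumption: accuracy = 1 - (possible errors / aligned positions).
accuracy = 1 - possible_errors / aligned_bp

print(f"differing positions: {differing_bp}")
print(f"PA14/PAb1 identity:  {identity:.4%}")
print(f"assembly accuracy:   {accuracy:.4%}")
```

Under that assumption the result lands just above 99.97%, consistent with the slide. Most of the differing positions are then attributed to genuine strain differences rather than to sequencing or assembly error.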

  12. Advantages/Limitations • Requires related genomes and protein sequences • GenBank contains > 650 microbial genomes • Genome size should not matter • High speed and low cost • ¼ of a single Solexa sequencing run in this case

  13. Thank You Questions?
