1 / 27

SIGS Meeting Michigan State University Friday, March 13, 2009

Joint Announcement on Genome Sequencing Standards . Defining genome project standards in a new era of sequencing. Patrick Chain. SIGS Meeting Michigan State University Friday, March 13, 2009. Outline. Evolution of genome projects From Draft to Finished From Sanger to Sangerless

poppy
Download Presentation

SIGS Meeting Michigan State University Friday, March 13, 2009

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Joint Announcement on Genome Sequencing Standards Defining genome project standards in a new era of sequencing Patrick Chain SIGS Meeting Michigan State University Friday, March 13, 2009

  2. Outline • Evolution of genome projects • From Draft to Finished • From Sanger to Sangerless • Proposed Genome Project Standards • Facing the deluge in the wake of a sequencing revolution

  3. 1995: First complete genome of a free-living organism 1997: Second International Strategy Meeting on Human Genome Sequencing in Bermuda http://www.genome.gov/10001812 Until very recently: genomes fell into 1 of 2 categories - draft or finished Finished in bacteria/archaea is defined as base pair perfect sequence in a single contig Finished in eukaryotes typically conforms to the bermudastandards, though most genomes are only regionally finished Requires significant re-evaluation! A little history

  4. General genome project pipeline Authorization Source Material Library Construction Sequencing Assembly QC Draft QD Finish QA <1/ 10 KB error rate Single contig / replicon Repeats resolved Annotation Analysis Publication End goal

  5. What we get Assembly: unordered set of contigs • Now to finish it • Order sets of contigs (scaffolds) • Fill in gaps • Resolve mis-assemblies

  6. General genome project pipeline Authorization Source Material Library Construction Sequencing Assembly QC Draft QD Finish QA <1/ 10 KB error rate Single contig / replicon Repeats resolved Annotation Analysis Publication End goal

  7. But it just ain’t so easy • Fill in gaps • Not covered by random shotgun (uncaptured) • DNA regions recalcitrant to library creation • Hairpins, GC rich, hard stops or other 2° structure/physical premature termination • Resolve repeats and mis-assemblies • Repeats within or in vicinity of other repeats • Large repetitive regions • Complex repetitive regions (tandems) • Other issues • Homopolymeric tracts; polymorphisms (SNPs, ISs, VNTRs, indels); etc. • Combinations of the above The average (JGI) Joe

  8. Ordering contigs and resolving repeats with paired ends Repeat Repeat genome assembly Solving small repeats

  9. Distribution of repeat elements in the chromosome Nostoc punctiforme 8.23 Mb

  10. Repetitive nature of a putative Synechococcus sp. WH8102 32,373 bpgene * 357 357 bp bp repeat repeat 352 352 bp bp repeat repeat 356 356 bp bp repeat repeat

  11. Hard stop gaps (including hairpins) Homopolymers Uncaptured gaps (no paired ends flank gap) Rely on alternative chemistries and methods Some gaps present a technical challenge

  12. Other non-trivial challenges: Non-clonalpopulation sequencing Also observe: hypervariable loci; active transposition of IS elements; and rearrangements And then there are metagenomes… G … T 3 types: A … T G … C

  13. Finishing = slow and costly

  14. With the advent of 2nd generation sequencing technologies, data generation has become easier and faster Will get worse: PacBio, Helios, Visigen, etc. Advances in sequencing technology

  15. Traditional whole genome Sanger sequencing required paired ends Gaps, repeats Paired end methods for new technology sequencing will make Sanger obsolete Microbial isolates, eukaryotic, single cell, genomes from metagenomes Requires evaluation of methodology Quality of: library, data, assembly Sangerless Sequencing de novo

  16. Finishing = slow and costly What does this mean? Blakesley, R.W. et al. An intermediate grade of finished genomic sequence suitable for comparative analyses. Genome Res 14, 2235-2244 (2004).

  17. Tales from the Finishing meeting • Finishing effort needs to be applied effectively and some gradations between draft and finished are required. • Community needs to be better informed of the quality of product they are receiving. • Granting bodies need to understand what will be delivered. Formed an international genome sequencing standards group

  18. Finishing standards group has representation from most of the world’s large genome centres Representation

  19. An agreement on gradation of finishing Fewer Finished Non-contiguous finished Annotation grade Improved high quality draft High quality draft Standard draft Many Also, regionally improved…

  20. An agreement on gradation of finishing Finished Non-contiguous finished Annotation grade Improved high quality draft High quality draft Standard draft It may not always be possible to remove contaminating sequence data from this incomplete dataset.

  21. An agreement on gradation of finishing Finished Non-contiguous finished Annotation grade Improved high quality draft High quality draft Standard draft still a draft assembly (90% coverage) with little or no manual review of the product, thus sequence errors and misassemblies are possible, with no implied order and orientation to contigs

  22. An agreement on gradation of finishing Finished Non-contiguous finished Annotation grade Improved high quality draft High quality draft Standard draft Automated or manual improvement - although undetectable misassemblies are still possible, they are much less likely….low quality regions and potential bp errors may also be present, but the sequence is of reasonably high quality

  23. An agreement on gradation of finishing Finished Non-contiguous finished Annotation grade Improved high quality draft High quality draft Standard draft Efforts have been made to resolve all errors in gene regions (exceptions to this gene-specific finishing standard should be noted with comments)…repeat regions are not necessarily resolved, so errors in those regions are much more likely

  24. An agreement on gradation of finishing Finished Non-contiguous finished Annotation grade Improved high quality draft High quality draft Standard draft All gaps and sequence uncertainties have been attempted to be resolved, and only those recalcitrant to resolution remain, but are specifically noted as to the nature of the uncertainty

  25. An agreement on gradation of finishing Finished Non-contiguous finished Annotation grade Improved high quality draft High quality draft Standard draft Gold Standard (1 error per 100,000 bp, no misassemblies). Any remaining small exceptions to highly accurate sequence are commented in the submission.

  26. Currently agreed upon by all members of our consortium Discussions with databases underway Planned article in journal and implementation on institute webpagesto publicise broadly Technology-agnostic so can adapt as 3rd (next) generation of sequencers arrives Currently no agreed industry standard from vendors Project/centres can add more detail as required such as sequence coverage and platform type Can integrate with the other domains of the GSC One system for all

  27. Contributors • Sanger Institute • Darren Grafham, Julian Parkhill • Human Microbiome Project • WashU Genome Center • The Broad Institute • J. Craig Venter Institute • Baylor College of Medecine Human GSC • Bob Fulton, George Weinstock, Mike Fitzgerald, Donna Muzny, Jessica Hostetler • JGI • Chris Detter, Nikos Kyrpides, AllaLapidus, Stephanie Malfatti, David Bruce, Dorothy Lang • Many others

More Related