Mo17 shotgun project

  1. Mo17 shotgun project
     • Goal: sequence the Mo17 “gene space” with inexpensive new technologies
     • Datasets in progress:
       - Four phases of 454-FLX sequencing, to a maximum of ~12X
       - Include ~3 kb paired-end sequencing (for short-range structural variation)
       - Ultra-short-read Solexa or ABI SOLiD sequencing (for polishing)
       - Preparation of methyl-spanning linkers to augment IBM map integration and detect rearrangements (Sanger end-sequencing)
       - (Ideally would add Mo17 BAC ends from DuPont, if available)

  2. Shotgun
     • Independent of tiling path
       - Can detect non-repetitive gene space even within otherwise complex regions that may not be in the tiling path
     • Disadvantages of short reads
       - Can’t expect to recover repetitive sequences

  3. Four Phases of Sequencing, Complete in 2007
     • Sequencing contract established with 454/Roche: four phases, including “collaborative” runs at no cost in Phases II-IV.
     • Phase I underway (30 FLX runs): library QC and initial assessment of data quality.
       - 10 FLX runs totaling 1 Gb (~0.4X)
       - 20 FLX-pair runs spanning 12 Gb (~5X span in 3 kb inserts)
       - Assess quality, coverage, contamination, chimerism, accuracy
     • Phase II (80 runs plus 30 runs from Roche, 110 runs total): rough draft stage.
       - 40 FLX-pair runs spanning 36 Gb (total 48 Gb, ~10X span)
       - 70 FLX runs for 7 Gb (total 8 Gb, ~3.5X sequence)
       - Assess rough draft assembly (3 methods); compare with B73 and sorghum

  4. Phases III and IV
     • Phase III (50 runs + 20 contributed)
       - 20 FLX-pair runs (total spanning cover ~20X)
       - 50 FLX runs (total 13 Gb sequence, ~5.5X)
       - Draft assembly; rough annotation; assessment of structural variation based on 20X “clone” cover. Assessment complete by end of 2007.
     • Phase IV (60 runs + 30 contributed)
       - 90 FLX runs (to reach a total of 22 Gb, ~10X)
       - Data collection complete by end of 2007.
     • Early 2008: final assembly; integration with MSSL ends and the IBM map; proceed to annotation and full analysis.
     • Note: later phases may use the next FLX release, with longer read lengths. To be conservative, sequence from FLX-pair reads is not included in sequence coverage estimates.
     • Total sequencing cost for Phases I-IV: $1.6M
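The fold-coverage figures on slides 3-4 follow from coverage = bases sequenced / genome size. The sketch below reproduces that arithmetic and the run totals. It is an illustration, not the project's planning tool: the ~2.3 Gb genome size is an assumption (the slides never state it), and the per-run cost only holds if the $1.6M pays for the non-contributed runs alone.

```python
# Sketch of the coverage and run-count arithmetic on slides 3-4.
# ASSUMPTION: maize genome size of ~2.3 Gb. The slides do not state it; this
# value makes their own ratios (e.g. 8 Gb ~ 3.5X) roughly agree.
GENOME_SIZE_GB = 2.3

# Per phase: (paid runs, contributed runs, cumulative FLX sequence in Gb).
# FLX-pair reads are excluded from the sequence totals, as on slide 4.
PHASES = {
    "I": (30, 0, 1.0),
    "II": (80, 30, 8.0),
    "III": (50, 20, 13.0),
    "IV": (60, 30, 22.0),
}


def fold_coverage(total_gb, genome_gb=GENOME_SIZE_GB):
    """Fold coverage = bases sequenced / genome size."""
    return total_gb / genome_gb


def summarize(phases):
    """Return (paid runs, contributed runs, cumulative coverage per phase)."""
    paid = sum(p for p, _, _ in phases.values())
    contributed = sum(c for _, c, _ in phases.values())
    coverage = {name: round(fold_coverage(gb), 1) for name, (_, _, gb) in phases.items()}
    return paid, contributed, coverage


paid, contributed, coverage = summarize(PHASES)
print(paid, contributed, paid + contributed)  # 220 80 300
print(coverage)  # {'I': 0.4, 'II': 3.5, 'III': 5.7, 'IV': 9.6}
# ASSUMPTION: if the $1.6M covers only the paid runs, this is the cost per run.
print(f"${1.6e6 / paid:,.0f} per paid run")  # $7,273 per paid run
```

Under that assumption the cumulative coverage comes out at about 0.4X, 3.5X, 5.7X and 9.6X, close to the slides' ~0.4X, ~3.5X, ~5.5X and ~10X; the small gaps come from the assumed genome size.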

  5. 454-FLX reads are typically either mostly masked or mostly clean
     • ~29% of reads have less than a quarter of positions masked
     • ~58% of reads have more than 2/3 of positions masked
     [Figure: percent of each read masked by over-represented 16-mers, scale 0 to 1.0]
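Masking by over-represented 16-mers can be sketched as follows: count every 16-mer across the read set, treat k-mers above a count threshold as repeats, and report the fraction of each read they cover. The slides do not say how the threshold was set or whether reverse complements were merged; here `threshold` is a hypothetical parameter, only the forward strand is counted, and `kmer_counts`, `masked_fraction` and `classify` are illustrative names.

```python
from collections import Counter

K = 16  # k-mer length named on the slide


def kmer_counts(reads, k=K):
    """Count every k-mer across all reads (forward strand only, a simplification)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def masked_fraction(read, counts, threshold, k=K):
    """Fraction of positions in `read` covered by at least one k-mer whose
    count exceeds the (hypothetical) `threshold`."""
    if not read:
        return 0.0
    masked = [False] * len(read)
    for i in range(len(read) - k + 1):
        if counts[read[i:i + k]] > threshold:
            for j in range(i, i + k):
                masked[j] = True
    return sum(masked) / len(read)


def classify(reads, threshold, k=K):
    """Return (share of reads < 1/4 masked, share of reads > 2/3 masked),
    the two buckets reported on the slide."""
    counts = kmer_counts(reads, k)
    fractions = [masked_fraction(r, counts, threshold, k) for r in reads]
    clean = sum(f < 0.25 for f in fractions) / len(reads)
    mostly_masked = sum(f > 2 / 3 for f in fractions) / len(reads)
    return clean, mostly_masked
```

With five copies of a 20 bp poly-A read plus one unique 20 bp read, `classify(reads, threshold=3)` returns `(1/6, 5/6)`: the unique read is clean and the repeated reads are fully masked.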

  6. Unique, full-length alignments of Mo17 454 reads to B73 MAGIs show high alignment quality
     • Residual repeats remain in MAGIs that receive multiple hits in the 454 data
     [Figure: unique full-length alignments]
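One way to separate unique from multiple hits is to keep only full-length alignments and count how many distinct MAGIs each read hits. This is a sketch under assumptions not on the slide: “full length” is taken to mean at least 95% of the read aligned, and the hypothetical `Hit` record stands in for whatever the aligner actually reported.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical alignment record: one read aligned to one MAGI."""
    read_id: str
    magi_id: str
    read_len: int
    aligned_len: int  # read bases covered by the alignment


def full_length(hit, min_frac=0.95):
    """ASSUMPTION: 'full length' means >= min_frac of the read is aligned."""
    return hit.aligned_len >= min_frac * hit.read_len


def split_unique(hits, min_frac=0.95):
    """Split reads into unique hits (one MAGI) and multiple hits (several MAGIs),
    considering full-length alignments only."""
    by_read = defaultdict(set)
    for h in hits:
        if full_length(h, min_frac):
            by_read[h.read_id].add(h.magi_id)
    unique = {r: next(iter(m)) for r, m in by_read.items() if len(m) == 1}
    multi = {r: sorted(m) for r, m in by_read.items() if len(m) > 1}
    return unique, multi
```

Reads in `multi` point at MAGIs that may still carry repeat sequence; reads with only partial alignments are dropped from both groups.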

  7. SNPs and indels of 454 reads relative to MAGIs are consistent with a few percent variation between Mo17 and B73 (the rate combines true variation with sequencing errors)
     [Figure: frequency of reads vs. SNPs or indels per base]
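The per-base rate on slide 7 can be computed from alignments that distinguish matches from mismatches, such as SAM extended CIGAR strings with `=` and `X` operations. The slides do not name the alignment format; this sketch assumes extended CIGAR, counts each mismatched base as a SNP and each insertion or deletion run as one indel event. As the slide notes, the result mixes real Mo17/B73 differences with sequencing error (for 454, largely homopolymer-length indels).

```python
import re

# One CIGAR operation: a length followed by an op code (SAM specification).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def variants_per_base(cigar):
    """SNP + indel events per aligned base from an extended CIGAR string.

    '=' and 'X' bases count as aligned; each 'X' base is one SNP and each
    'I'/'D' run is one indel event. Other ops (S, H, N, P) are ignored.
    """
    aligned = snps = indels = 0
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in "=X":
            aligned += n
            if op == "X":
                snps += n
        elif op in "ID":
            indels += 1
        elif op == "M":
            # Plain 'M' mixes matches and mismatches, so the rate is undefined.
            raise ValueError("need '='/'X' ops to separate matches from mismatches")
    return (snps + indels) / aligned if aligned else 0.0
```

For example, `5S95=1X2I100=` has 196 aligned bases, one SNP and one indel, giving 2/196 ≈ 0.0102 events per base; binning such values over all reads gives the histogram on the slide.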

  8. Multiple alternative assembly plans
     • Divide and conquer
       - Reduce ~100 million reads to ~50K unique gene spaces of ~thousands of reads each (~10 kb) by clustering based on various comparisons
       - Plan A: de novo clustering of masked reads
       - Plan B: map to B73, assemble (de novo for the remainder)
       - Plan C: sorghum-assisted
     • Use various assemblers to lay out and produce a consensus for each cluster (454 assembly team engaged)
     • Polish sequence with Solexa or SOLiD for accuracy
     • Link with MSSL pairs; integrate with the map
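Plan A's divide-and-conquer step can be pictured as graph clustering: reads are nodes, and two reads are joined when they share a k-mer that is not masked as repetitive. A minimal sketch using union-find follows; the shared-k-mer criterion, the `k=16` default and the `repeat_kmers` input are assumptions, since the slides only say clustering is “based on various comparisons”. Masking matters because a single shared repeat k-mer would merge unrelated gene spaces.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set forest over read indices 0..n-1."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def cluster_reads(reads, k=16, repeat_kmers=frozenset()):
    """Hypothetical clustering: group reads sharing at least one k-mer that is
    not in `repeat_kmers`. Returns sorted lists of read indices."""
    uf = UnionFind(len(reads))
    first_seen = {}  # k-mer -> index of the first read containing it
    for idx, read in enumerate(reads):
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer in repeat_kmers:
                continue  # masked repeat: must not link reads
            if kmer in first_seen:
                uf.union(first_seen[kmer], idx)
            else:
                first_seen[kmer] = idx
    clusters = defaultdict(list)
    for idx in range(len(reads)):
        clusters[uf.find(idx)].append(idx)
    return sorted(clusters.values())
```

With `k=4`, reads `AAAACCCC` and `CCCCGGGG` join through `CCCC`; passing `repeat_kmers={"CCCC"}` keeps them apart. Each resulting cluster would then go to an assembler separately.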

  9. Backup analyses vs. B73 reference
     • SNP/variation detection by alignment to B73 sequence
       - 454/Solexa/SOLiD (various successful models in other species at JGI and elsewhere)
     • Structural variation detection via paired-end placements
       - Needs to be tolerant of the chimerism rate
       - Model: successful human structural variation analysis done with 454 (unpublished)
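Paired-end structural variation detection is commonly done by flagging pairs whose placement disagrees with the library (different chromosomes, or a separation far from the ~3 kb insert) and calling a candidate only where several such pairs cluster, which gives some tolerance to chimeric pairs. The sketch below follows that common approach; it is not the unpublished human 454 analysis cited on the slide, and the insert tolerance, window and `min_support` values are hypothetical. Orientation checks are omitted for brevity.

```python
from dataclasses import dataclass


@dataclass
class PairPlacement:
    """Placement of both ends of one pair on the B73 reference."""
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int


def is_discordant(p, insert=3000, tol=1000):
    """ASSUMPTION: ~3 kb inserts with +/- tol bp spread; orientation ignored."""
    if p.chrom1 != p.chrom2:
        return True
    return abs(abs(p.pos2 - p.pos1) - insert) > tol


def candidate_svs(placements, window=2000, min_support=3, insert=3000, tol=1000):
    """Group discordant pairs by chromosome and first-end position; report
    (chrom, start, support) for groups with >= min_support pairs, so isolated
    chimeric pairs do not produce calls."""
    discordant = sorted(
        (p for p in placements if is_discordant(p, insert, tol)),
        key=lambda p: (p.chrom1, p.pos1),
    )
    calls, group = [], []
    for p in discordant:
        if group and (p.chrom1 != group[0].chrom1 or p.pos1 - group[0].pos1 > window):
            if len(group) >= min_support:
                calls.append((group[0].chrom1, group[0].pos1, len(group)))
            group = []
        group.append(p)
    if len(group) >= min_support:
        calls.append((group[0].chrom1, group[0].pos1, len(group)))
    return calls
```

A single chimeric pair produces no call, while three discordant pairs within 2 kb of each other produce one; raising `min_support` trades sensitivity for tolerance to a higher chimerism rate.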

  10. Timeline
     • Phase I in progress, complete by end of month; ~10 days of analysis to approve Phase II
     • Phase II: October
     • Phase III: November
     • Phase IV: December
     • 454 sequencing complete by end of year

  11. ~58% of each BAC is masked by over-represented 16-mers

  12. Outreach (Dick McCombie)

  13. Types of Outreach
     • Public presentations
     • Collaborations
     • CSHL DNA Learning Center

  14. Public Presentations

  15. Collaborations
     • “The Maize Genetics and Genomics Database”: letter for Carolyn Lawrence (MaizeGDB)
     • MaizeGDB: website text, links to data
     • Gramene
     • EBI Ensembl
     • Affymetrix Maize Pilot Expression Array Project
     • Optical map
     • TWINSCAN
     • Vmatch
     • Full-Length cDNA Project

  16. CSHL DNA Learning Center http://www.dnalc.org/maize/maize.html
