Sequencing the Maize Genome

Presentation Transcript


  1. Sequencing the Maize Genome Maize Genome Sequencing Consortium rwilson@watson.wustl.edu

  2. Sequencing Progress

  3. A 22 Mb sequence contig on Maize chromosome 4 [Figure labels: Maize Chr4; Genetic; Physical; Synteny]

  4. Plans & Milestones • 22 Mb contig on chromosome 4 • Analysis & publication • Draft sequence of the maize genome • All BACs: shotgun & pre-finishing (?) • End of the calendar year • Announce at the Maize Meeting in D.C. • Completion of the maize genome sequence • Version 1.0 • Analysis & Publication • Future Work • Secondary Annotation • Clean-up sequencing, maintenance

  5. Maize Genome Sequencing at Arizona Rod A. Wing Arizona Genomics Institute BIO5 Department of Plant Sciences University of Arizona

  6. BAC by BAC Strategy to Sequence the Maize Genome • Maize B73 Genome (2300 Mb) • BAC library construction (Hind III, EcoR I, MboI; 27X genome coverage; ~150 kb inserts) • Genetic Anchoring: in silico, overgo hybridization (19,292) • Framework • Fingerprinting: ~460,000 BACs • BAC End Sequencing: ~800,000 • BAC physical maps (HICF & Agarose) • FPC databases (Agarose and HICF) • STC database • Choose a seed BAC (800 kb spacing) • Shotgun sequencing and finishing • STC database search, FP comparison • Determine minimum overlap BACs • Complete maize genome sequence
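
As a rough cross-check of the figures on this slide, the standard library-coverage estimate (clones = coverage × genome size / insert size) can be sketched as below. The function names are hypothetical; the numbers come from the slide, and the result is only an order-of-magnitude comparison with the ~460,000 fingerprinted BACs and the seed BAC count reported later in the deck.

```python
# Figures taken from the slide; function names are illustrative only.
GENOME_MB = 2300        # Maize B73 genome size (Mb)
COVERAGE = 27           # stated library coverage (27X)
INSERT_KB = 150         # approximate BAC insert size (kb)
SEED_SPACING_KB = 800   # seed BAC spacing (kb)


def clones_for_coverage(genome_mb, coverage, insert_kb):
    """Number of clones whose inserts sum to coverage x genome size."""
    return round(genome_mb * 1000 * coverage / insert_kb)


def seed_bacs_needed(genome_mb, spacing_kb):
    """Number of evenly spaced seed BACs at the given spacing."""
    return round(genome_mb * 1000 / spacing_kb)


print(clones_for_coverage(GENOME_MB, COVERAGE, INSERT_KB))  # 414000
print(seed_bacs_needed(GENOME_MB, SEED_SPACING_KB))         # 2875
```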

  7. Estimated Chromosomal Coverage [Figure: percentage coverage (0 to 100), physical and genetic, for chromosomes 1 to 10] The chromosomal coverage is based on maize cv Seneca 60

  8. Minimum Tiling Path Pipeline (CSHL/AGI) • BAC end sequences of potential BACs are BLASTed against the seed BACs • Results are classified based on location on the physical map • A table of filtered BLAST results is created for each BAC, with links to CMap and GBrowse • BLAST results are imported into CMap and GBrowse with additional information such as trace files and FPCs • A table of alignments between the seed BAC and the BAC end sequences contains links to CMap and GBrowse • CMap displays the FPC data for the seed BAC and the candidate BACs to pick • GBrowse provides an alignment of the BES with the seed sequence and displays the trace data
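
The selection step this pipeline supports (picking the candidate BAC that extends a seed with the least overlap) might be sketched as follows. This is not the CSHL/AGI implementation: the `BesHit` record, its fields, the clone names, and the 99% identity cutoff are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BesHit:
    clone: str            # candidate BAC name (hypothetical identifier)
    seed_pos: int         # position on the seed BAC where the BES aligns
    identity: float       # percent identity of the BLAST alignment
    extends_right: bool   # physical map places the clone past the seed's right end


def pick_min_overlap(hits: List[BesHit], seed_len: int,
                     min_identity: float = 99.0) -> Optional[BesHit]:
    """Among right-extending candidates, return the one overlapping the seed least."""
    usable = [h for h in hits if h.identity >= min_identity and h.extends_right]
    if not usable:
        return None
    # Overlap is approximated as the distance from the BES hit to the seed's right end.
    return min(usable, key=lambda h: seed_len - h.seed_pos)


hits = [
    BesHit("cand_a", 100_000, 99.8, True),
    BesHit("cand_b", 140_000, 99.5, True),
    BesHit("cand_c", 148_000, 95.0, True),   # filtered out: low identity
    BesHit("cand_d", 145_000, 99.9, False),  # filtered out: does not extend the seed
]
best = pick_min_overlap(hits, seed_len=150_000)
print(best.clone)  # cand_b
```

In practice the fingerprint (FPC) comparison and trace inspection described on the slide would confirm or reject such a pick; the sketch only covers the BLAST-derived ranking.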

  9. Clone Picking Progress • Seed BACs: 3,400, complete • Clone Walking from Seed BACs: 12,824 complete • Total clones picked = 16,224 (169 96-well plates) • 15,400 successful • 7,800 Year 1 • 7,600 Year 2 • Gap-filling • ~600 Year 3, in progress
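
The counts on this slide are internally consistent, as the short check below shows. The variable names are illustrative; the success rate is derived here and is not stated on the slide.

```python
import math

seed_bacs = 3_400            # seed BACs picked
clone_walking = 12_824       # clones picked by walking from seed BACs
year1, year2 = 7_800, 7_600  # successful clones per year

total_picked = seed_bacs + clone_walking
plates_96 = math.ceil(total_picked / 96)   # 96-well plates needed
successful = year1 + year2
success_rate = successful / total_picked   # derived, not stated on the slide

print(total_picked)            # 16224
print(plates_96)               # 169
print(successful)              # 15400
print(f"{success_rate:.1%}")   # 94.9%
```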

  10. Clone Picking • Clone Walking • By sequence if seed BAC sequence was available • By fingerprints when no sequence was available • Clone verification • BAC end sequence • Seed BAC sequence

  11. Library Picking • 60 cycles to look through 1,221 384-well plates for 16,320 clones

  12. BAC End Sequencing (for Clone Verification) 170 96-well plates for 16,320 clones generating 48,960 BES (2 forward, 1 reverse)

  13. DNA Preparation and Shearing 170 96-well plates for 16,320 clones 10 plates each month 2.5 plates per person
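
The plate, clone, and read counts on slides 12 and 13 follow from simple multiplication. The sketch below reproduces them. Reading "10 plates each month" as the overall throughput is an assumption, so the resulting duration is only indicative.

```python
WELLS_PER_PLATE = 96
PLATES = 170
READS_PER_CLONE = 3      # 2 forward + 1 reverse BAC end reads
PLATES_PER_MONTH = 10    # assumed to be the overall DNA-prep throughput

clones = PLATES * WELLS_PER_PLATE
bes_reads = clones * READS_PER_CLONE
months_of_prep = PLATES / PLATES_PER_MONTH

print(clones)          # 16320
print(bes_reads)       # 48960
print(months_of_prep)  # 17.0
```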

  14. MegaContig 182 in Maize Genome and Its Synteny to Rice: Maize Chr4, all ordered and oriented, 26 Mb [Figure labels: Genetic; Physical; Synteny]

  15. Maize Pseudomolecules for Rice Syntenic Chr3S 6.9 Mb (1.5 gap/BAC) 7.2 Mb (1.7 gap/BAC) Maize Chr9L Rice Chr3S Maize Chr1S

  16. Maize Production Sequencing lfulton@watson.wustl.edu

  17. Maize Production Goals • BAC End Sequencing of 220,000 Clones • Fosmid End Sequencing of 500,000 Clones • Shotgun of 16,000 BAC Clones

  18. Maize BAC End Sequences • 580,000 reads processed • 567 average read length • 60% success

  19. Maize Fosmid End Sequences • 850,000 processed • 79% success • 543 average read length • Completed today

  20. Library Construction Pipeline • Receipt of sheared DNA from AGI • Size selection of insert DNA • Ligation into pSMART vector

  21. Constructed 17,034 Libraries as of August 31st

  22. Average Fail Rate for Library Construction was less than 5%

  23. Shotgun Criteria • 3.5X coverage • Clone size verification • 50% paired ends • BES agreement • 25% of clones failed • 22% need more data • 3% BES disagreement
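
The four criteria could be applied to a clone's shotgun statistics roughly as below. The record type, field names, and the mapping of each failure to "needs more data" or "BES disagreement" are assumptions chosen to mirror the two failure categories on the slide (22% + 3% = 25%). The production pipeline's actual logic is not given in the deck.

```python
from dataclasses import dataclass


@dataclass
class ShotgunStats:
    coverage: float             # fold sequence coverage of the clone
    size_verified: bool         # assembled size consistent with expected insert size
    paired_end_fraction: float  # fraction of reads with a mate (0.0 to 1.0)
    bes_agrees: bool            # BAC end sequences match the assembly


def shotgun_status(s: ShotgunStats, min_coverage: float = 3.5,
                   min_paired: float = 0.5) -> str:
    """Classify a clone as 'pass', 'needs more data', or 'BES disagreement'."""
    if not s.bes_agrees:
        return "BES disagreement"
    if (s.coverage < min_coverage
            or s.paired_end_fraction < min_paired
            or not s.size_verified):
        return "needs more data"
    return "pass"


print(shotgun_status(ShotgunStats(4.1, True, 0.62, True)))   # pass
print(shotgun_status(ShotgunStats(2.8, True, 0.62, True)))   # needs more data
print(shotgun_status(ShotgunStats(4.1, True, 0.62, False)))  # BES disagreement
```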

  24. Shotgun Complete for 12,211 Clones as of August 31st

  25. Final Production Work • 660 Clones Need Library Construction • 2100 Clones In Production Pipeline • Expected Completion Date December 2007

  26. Sequence Improvement Bob Fulton Dick McCombie Rod Wing

  27. Sequence Improvement Pipeline • Shotgun_done triggers the prefinishing pipeline • Initial identification of “do finish” regions • Manual sorting and use of autoedit (Gordon) to break apart misassemblies • Autofinish (Gordon) used to choose directed reactions for all gaps and regions of low quality in “do finish” regions • Reassembly and 2nd iteration of prefinishing pipeline • Final identification of “do finish” regions and handoff to finishing pipeline

  28. Clone Improvement through the Prefinishing Pipeline

  29. Assembly View: Entire Clone [Figure labels: Coverage (green); Spanning Plasmids; End]

  30. Assembly View: Do Finish Region [Figure labels: EST sequence; GSS sequence; Do Finish; Repeat Tags]

  31. [Figure panels: Alignment with cDNA read pairs; Alignment with End Sequences]

  32. [Figure legend: Actual; Projected]

  33. Maize GenBank Submissions Joanne Nelson

  34. Submission Landmarks • HTGS_FULLTOP • HTGS_PREFIN • HTGS_ACTIVEFIN • HTGS_IMPROVED

  35. Improved Sequence: “Non-repetitive portions of the sequence have had sequence improvement (directed attempts) and have been labeled as ‘improved.’ Improved regions are double stranded, sequenced with an alternate chemistry or covered by high quality data (i.e. phred quality greater than or equal to 30 or approval by an experienced finisher), unless otherwise noted. Regions of low sequence complexity (such as dinucleotide repeats and small unit tandem repeats) in the improved regions have not been resolved to previously established finishing standards. BAC end sequence, cot and methyl filtered genome survey sequence and data from overlapping projects of strain B73 may have been included in this project. Where possible, contigs have been ordered and oriented based on read pairing. These regions are designated as scaffolds. Additional order and orientation will be provided upon completion of detailed analysis of the complete finished tiling path.”

  36. Improved Sequence

      FEATURES             Location/Qualifiers
           source          1..184604
                           /organism="Zea mays"
                           /mol_type="genomic DNA"
                           /db_xref="taxon:4577"
                           /chromosome="1"
                           /clone="CH201-132J17; ZMMBBc0132J17"
           misc_feature    1..69252
                           /note="scaffold_name:Scaffold1"
           misc_feature    1..34245
                           /note="assembly_name:Contig28 vector_side:SP6"
           misc_feature    32401..34245
                           /note="Improved sequence."
           unsure          34230..34245
                           /note="Non-repetitive but unresolved region"
           gap             34246..34345
                           /estimated_length=unknown
           misc_feature    34346..68071
                           /note="assembly_name:Contig27"
           misc_feature    34346..36695
                           /note="Improved sequence."
           unsure          34346..34356
                           /note="Non-repetitive but unresolved region"
           misc_feature    38146..46795
                           /note="Improved sequence."
           gap             68072..68171
                           /estimated_length=unknown
           misc_feature    68172..69252
                           /note="assembly_name:Contig14"
           gap             69253..69352
                           /estimated_length=unknown
           misc_feature    69353..132243
                           /note="scaffold_name:Scaffold2"
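
To show how such annotations can be consumed, the sketch below parses a few feature lines copied from this record and totals the bases tagged "Improved sequence." It is a minimal hand-rolled parser for simple `start..end` locations only, not a general GenBank reader. The helper names are hypothetical.

```python
import re

# Subset of the feature table shown on the slide.
FEATURES = """\
misc_feature    32401..34245
                /note="Improved sequence."
gap             34246..34345
                /estimated_length=unknown
misc_feature    34346..36695
                /note="Improved sequence."
misc_feature    38146..46795
                /note="Improved sequence."
gap             68072..68171
                /estimated_length=unknown
"""

# Feature key at line start, followed by a simple start..end location.
FEATURE_RE = re.compile(r"^(\S+)\s+(\d+)\.\.(\d+)\s*$")


def parse_features(text):
    """Return a list of dicts with key, start, end, and qualifier lines."""
    features = []
    for line in text.splitlines():
        m = FEATURE_RE.match(line)
        if m:
            features.append({"key": m.group(1), "start": int(m.group(2)),
                             "end": int(m.group(3)), "qualifiers": []})
        elif line.strip().startswith("/") and features:
            features[-1]["qualifiers"].append(line.strip())
    return features


def improved_bases(features):
    """Sum the lengths (inclusive coordinates) of regions noted as improved."""
    return sum(f["end"] - f["start"] + 1 for f in features
               if '/note="Improved sequence."' in f["qualifiers"])


feats = parse_features(FEATURES)
print(improved_bases(feats))                   # 12845
print(sum(f["key"] == "gap" for f in feats))   # 2
```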

  37. Submission Totals • HTGS_FULLTOP: 3342 • HTGS_PREFIN: 2014 • HTGS_ACTIVEFIN: 4151 • HTGS_IMPROVED: 2660 • TOTAL: 12167
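
A quick check confirms that the four category counts add up to the stated total. The per-category shares computed here are derived values, not figures from the slide.

```python
submissions = {
    "HTGS_FULLTOP": 3342,
    "HTGS_PREFIN": 2014,
    "HTGS_ACTIVEFIN": 4151,
    "HTGS_IMPROVED": 2660,
}

total = sum(submissions.values())
print(total)  # 12167

# Share of each submission category (derived, for orientation only).
for keyword, count in submissions.items():
    print(f"{keyword}: {count / total:.1%}")
```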
