Sequencing the Maize Genome. Maize Genome Sequencing Consortium. rwilson@watson.wustl.edu. Sequencing Progress. A 22 Mb sequence contig on Maize chromosome 4 . Maize Chr4. Genetic. Physical. Synteny. Plans & Milestones. 22 Mb contig on chromosome 4 Analysis & publication
Sequencing the Maize Genome Maize Genome Sequencing Consortium rwilson@watson.wustl.edu
A 22 Mb sequence contig on Maize chromosome 4 Maize Chr4 Genetic Physical Synteny
Plans & Milestones • 22 Mb contig on chromosome 4 • Analysis & publication • Draft sequence of the maize genome • All BACs: shotgun & pre-finishing (?) • End of the calendar year • Announce at the Maize Meeting in D.C. • Completion of the maize genome sequence • Version 1.0 • Analysis & Publication • Future Work • Secondary Annotation • Clean-up sequencing, maintenance
Maize Genome Sequencing at Arizona Rod A. Wing Arizona Genomics Institute BIO5 Department of Plant Sciences University of Arizona
BAC by BAC Strategy to Sequence the Maize Genome Maize B73 Genome (2300 Mb) BAC library construction (HindIII, EcoRI, MboI; 27X genome coverage, ~150 kb inserts) Genetic Anchoring in silico, overgo hybridization (19,292) Framework Fingerprinting ~460,000 BACs BAC End Sequencing ~800,000 BAC physical maps (HICF & Agarose) FPC databases (Agarose and HICF) STC database Choose a seed BAC (800 kb spacing) Shotgun sequencing and finishing STC database search, FP comparison Determine minimum overlap BACs Complete maize genome sequence
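As a rough check on the scale of this strategy, the clone counts implied by the slide's own figures can be worked out directly. This is only a back-of-the-envelope sketch: the function names are illustrative, and it assumes a uniform insert size and perfectly even seed spacing, which a real library will not have.

```python
def clones_for_coverage(genome_mb: float, coverage: float, insert_kb: float) -> int:
    """Clones needed for a given fold coverage, assuming uniform insert size."""
    return round(genome_mb * 1000 * coverage / insert_kb)


def seeds_for_spacing(genome_mb: float, spacing_kb: float) -> int:
    """Seed BACs needed if one seed is chosen every `spacing_kb` along the genome."""
    return round(genome_mb * 1000 / spacing_kb)


GENOME_MB = 2300  # maize B73 genome size quoted on the slide

library_clones = clones_for_coverage(GENOME_MB, coverage=27, insert_kb=150)
seed_bacs = seeds_for_spacing(GENOME_MB, spacing_kb=800)

print(f"Idealized library size: {library_clones:,} clones")
print(f"Idealized seed count:   {seed_bacs:,} seed BACs")
```

Under these assumptions the library would hold about 414,000 clones and an 800 kb spacing would need about 2,875 seeds. The slides report ~460,000 fingerprinted BACs and 3,400 seed BACs, so the idealized figures are the right order of magnitude but not exact.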
Estimated Chromosomal Coverage [Figure: physical and genetic coverage (percentage, 0 to 100) for chromosomes 1 to 10] The chromosomal coverage based on maize cv Seneca 60
Minimum Tiling Path Pipeline (CSHL/AGI) • BAC End Sequences of potential BACs are BLASTed against the Seed BACs • Results are classified based on location on the physical map • A table of filtered BLAST results is created for each BAC, with links to CMap and GBrowse • BLAST results are imported into CMap and GBrowse with additional information such as trace files and FPCs • A table of alignments between the seed BAC and the BAC end sequences contains links to CMap and GBrowse • CMap displays the FPC data for the seed BAC and the candidate BACs to pick • GBrowse provides an alignment of the BES with the seed sequence and displays the trace data
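The filtering step in this pipeline can be illustrated with a small sketch. It is not the CSHL/AGI code: the `Hit` class, the function names, the identity and length thresholds, and the sample rows are all hypothetical. It reads BLAST tabular output in the standard 12-column layout and, assuming every candidate BAC extends past the right end of the seed, picks the BES hit that implies the smallest overlap. The real pipeline also checks candidates against the FPC physical map, which this sketch omits.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    bes_id: str    # BAC end sequence read (BLAST query)
    seed_id: str   # seed BAC (BLAST subject)
    pident: float  # percent identity
    length: int    # alignment length
    sstart: int    # alignment start on the seed
    send: int      # alignment end on the seed


def parse_blast_tabular(lines):
    """Parse BLAST tabular rows (12 standard tab-separated columns)."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        hits.append(Hit(f[0], f[1], float(f[2]), int(f[3]), int(f[8]), int(f[9])))
    return hits


def pick_min_overlap(hits, seed_length, min_pident=98.0, min_length=300):
    """Return the filtered hit lying closest to the right end of the seed.

    Thresholds are illustrative assumptions, not values from the slides.
    """
    good = [h for h in hits if h.pident >= min_pident and h.length >= min_length]
    if not good:
        return None
    # Estimated overlap: distance from the hit's leftmost seed coordinate to the seed end.
    return min(good, key=lambda h: seed_length - min(h.sstart, h.send))


# Hypothetical rows: qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
rows = [
    "bacA.f\tseed1\t99.5\t600\t0\t0\t1\t600\t100000\t100599\t0.0\t1100",
    "bacB.r\tseed1\t99.8\t550\t0\t0\t1\t550\t140550\t140001\t0.0\t1000",
    "bacC.f\tseed1\t92.0\t580\t40\t2\t1\t580\t149000\t149579\t1e-150\t700",
]
best = pick_min_overlap(parse_blast_tabular(rows), seed_length=150000)
print(best.bes_id)
```

In this made-up data, `bacC.f` lies nearest the seed end but is rejected for low identity, so the choice falls to `bacB.r`, whose hit implies roughly 10 kb of overlap.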
Clone Picking Progress • Seed BACs: 3,400, complete • Clone Walking from Seed BACs: 12,824 complete • Total clones picked = 16,224 (169 96-well plates) • 15,400 successful • 7,800 Year 1 • 7,600 Year 2 • Gap-filling • ~600 Year 3, in progress
Clone Picking • Clone Walking • By sequence if seed BAC sequence was available • By fingerprints when no sequence was available • Clone verification • BAC end sequence • Seed BAC sequence
Library Picking • 60 cycles to look through 1,221 384-well plates for 16,320 clones
BAC End Sequencing (for Clone Verification) • 170 96-well plates for 16,320 clones • generating 48,960 BES (two forward, one reverse per clone)
DNA Preparation and Shearing • 170 96-well plates for 16,320 clones • 10 plates each month • 2.5 plates per person
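The plate and read counts on the picking, BES, and DNA preparation slides follow from simple arithmetic, sketched below. The variable names are illustrative, and the staffing and duration figures rest on an assumption: that "2.5 plates per person" means per person per month and that the 10-plates-per-month rate holds throughout.

```python
WELLS_PER_PLATE = 96
plates = 170

clones = plates * WELLS_PER_PLATE          # clones across all 96-well plates
reads_per_clone = 3                        # two forward reads plus one reverse read
bes_reads = clones * reads_per_clone       # BAC end sequences for verification

plates_per_month = 10
plates_per_person = 2.5                    # assumed to be per person per month

months_needed = plates / plates_per_month               # time to prepare every plate
people_needed = plates_per_month / plates_per_person    # staff implied by the monthly rate

print(f"{clones:,} clones, {bes_reads:,} BES reads")
print(f"{months_needed:g} months at {people_needed:g} people")
```

The first two values reproduce the slides' 16,320 clones and 48,960 BES; the last two (17 months, 4 people) are derived under the stated assumption and do not appear in the slides.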
MegaContig 182 in Maize Genome and Its Synteny to Rice Maize Chr4 All ordered and orientated 26 Mb Genetic Physical Synteny
Maize Pseudomolecules for Rice Syntenic Chr3S 6.9 Mb (1.5 gap/BAC) 7.2 Mb (1.7 gap/BAC) Maize Chr9L Rice Chr3S Maize Chr1S
Maize Production Sequencing lfulton@watson.wustl.edu
Maize Production Goals • BAC End Sequencing of 220,000 Clones • Fosmid End Sequencing of 500,000 Clones • Shotgun of 16,000 BAC Clones
Maize BAC End Sequences • 580,000 reads processed • 567 average read length • 60% success
Maize Fosmid End Sequences • 850,000 processed • 79% success • 543 average read length • Completed today
Library Construction Pipeline • Receipt of sheared DNA from AGI • Size selection of insert DNA • Ligation into pSMART vector
Shotgun Criteria • 3.5X coverage • Clone size verification • 50% paired ends • BES agreement • 25% of clones failed: 22% need more data, 3% BES disagreement
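The shotgun criteria can be read as a simple pass/fail check per clone, sketched below. The `ShotgunStats` class and `classify` function are hypothetical, and the way failures map onto the two categories named on the slide ("need more data" and "BES disagreement") is an assumption, since the slide does not say how each failure is assigned.

```python
from dataclasses import dataclass


@dataclass
class ShotgunStats:
    coverage: float         # fold sequence coverage of the clone
    size_verified: bool     # assembled size consistent with the expected clone size
    paired_fraction: float  # fraction of reads with a mate pair (0.0 to 1.0)
    bes_agrees: bool        # BAC end sequences match the assembly


def classify(stats: ShotgunStats, min_coverage: float = 3.5, min_paired: float = 0.5) -> str:
    """Label a clone 'pass', 'needs more data', or 'BES disagreement'."""
    if not stats.bes_agrees:
        return "BES disagreement"
    # Assumption: any other unmet criterion is handled by adding more data.
    if (stats.coverage < min_coverage
            or stats.paired_fraction < min_paired
            or not stats.size_verified):
        return "needs more data"
    return "pass"


print(classify(ShotgunStats(4.1, True, 0.62, True)))
print(classify(ShotgunStats(2.8, True, 0.62, True)))
print(classify(ShotgunStats(4.1, True, 0.62, False)))
```

The three calls show one clone of each kind: a pass, a clone short on coverage, and a clone whose end sequences disagree with the assembly.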
Final Production Work • 660 Clones Need Library Construction • 2100 Clones In Production Pipeline • Expected Completion Date December 2007
Sequence Improvement Bob Fulton Dick McCombie Rod Wing
Sequence Improvement Pipeline • Shotgun_done triggers the prefinishing pipeline • Initial identification of “do finish” regions • Manual sorting and use of autoedit (Gordon) to break apart misassemblies • Autofinish (Gordon) used to choose directed reactions for all gaps and regions of low quality in “do finish” regions • Reassembly and 2nd iteration of prefinishing pipeline • Final identification of “do finish” regions and handoff to finishing pipeline
Assembly View-Entire Clone Coverage (green) Spanning Plasmids End
Assembly View-Do Finish Region EST sequence GSS sequence Do Finish Repeat Tags
Alignment with cDNA read pairs Alignment with End Sequences
[Chart legend: Actual, Projected]
Maize GenBank Submissions Joanne Nelson
Submission Landmarks • HTGS_FULLTOP • HTGS_PREFIN • HTGS_ACTIVEFIN • HTGS_IMPROVED
Improved Sequence “Non-repetitive portions of the sequence have had sequence improvement (directed attempts) and have been labeled as ‘improved.’ Improved regions are double stranded, sequenced with an alternate chemistry or covered by high quality data (i.e. phred quality greater than or equal to 30 or approval by an experienced finisher), unless otherwise noted. Regions of low sequence complexity (such as dinucleotide repeats and small unit tandem repeats) in the improved regions have not been resolved to previously established finishing standards. BAC end sequence, cot and methyl filtered genome survey sequence and data from overlapping projects of strain B73 may have been included in this project. Where possible, contigs have been ordered and oriented based on read pairing. These regions are designated as scaffolds. Additional order and orientation will be provided upon completion of detailed analysis of the complete finished tiling path.”
Improved Sequence
FEATURES             Location/Qualifiers
     source          1..184604
                     /organism="Zea mays"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:4577"
                     /chromosome="1"
                     /clone="CH201-132J17; ZMMBBc0132J17"
     misc_feature    1..69252
                     /note="scaffold_name:Scaffold1"
     misc_feature    1..34245
                     /note="assembly_name:Contig28 vector_side:SP6"
     misc_feature    32401..34245
                     /note="Improved sequence."
     unsure          34230..34245
                     /note="Non-repetitive but unresolved region"
     gap             34246..34345
                     /estimated_length=unknown
     misc_feature    34346..68071
                     /note="assembly_name:Contig27"
     misc_feature    34346..36695
                     /note="Improved sequence."
     unsure          34346..34356
                     /note="Non-repetitive but unresolved region"
     misc_feature    38146..46795
                     /note="Improved sequence."
     gap             68072..68171
                     /estimated_length=unknown
     misc_feature    68172..69252
                     /note="assembly_name:Contig14"
     gap             69253..69352
                     /estimated_length=unknown
     misc_feature    69353..132243
                     /note="scaffold_name:Scaffold2"
Submission Totals • HTGS_FULLTOP 3342 • HTGS_PREFIN 2014 • HTGS_ACTIVEFIN 4151 • HTGS_IMPROVED 2660 • TOTAL 12167
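The submission totals can be tallied and expressed as shares of all submitted clones with a few lines of code. This is a minimal sketch using only the counts on the slide; the dictionary and variable names are illustrative.

```python
# Clone counts per GenBank HTGS keyword, as listed on the slide.
submissions = {
    "HTGS_FULLTOP": 3342,
    "HTGS_PREFIN": 2014,
    "HTGS_ACTIVEFIN": 4151,
    "HTGS_IMPROVED": 2660,
}

total = sum(submissions.values())

# Share of each landmark as a percentage of all submissions.
shares = {name: 100 * count / total for name, count in submissions.items()}

for name, count in submissions.items():
    print(f"{name:<15} {count:>6,} {shares[name]:5.1f}%")
print(f"{'TOTAL':<15} {total:>6,}")
```

The sum matches the slide's total of 12,167, and the shares show that clones at the improved stage make up a little over a fifth of submissions.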