
Sequencing the Maize (B73) Genome

Discover the progress in sequencing the Maize (B73) Genome with the latest data submission, annotation, and collaborations. Track the path from library construction to successful clone selection for a complete genome sequence.

stevenbrown

Presentation Transcript


  1. Sequencing the Maize (B73) Genome
     Maize Genome Sequencing Consortium
     Genome Sequencing Center

  2. The Team
     • WU Genome Sequencing Center (R. Wilson, PI)
        • Bob Fulton, Pat Minx, Sandy Clifton
     • Arizona Genomics Institute (R. Wing)
     • Cold Spring Harbor Laboratory
        • D. Ware, L. Stein
        • R. McCombie, R. Martienssen
     • Iowa State University (P. Schnable & S. Aluru)
     • The maize research community

  3. The Plan

  4. Progress as of 9/30/06

  5. Agenda
     9:00 – 9:15    Introductions and Project Overview (Rick Wilson)
     9:15 – 10:15   Plans and Progress – WU/AGI/CSHL/ISU Project
                    • Map and Tile Path Selection (Rod Wing)
                    • Library Construction and Production (Lucinda Fulton)
                    • Sequence Improvement (Bob Fulton, Dick McCombie, Rod Wing)
                    • Data Submission (Joanne Nelson)
                    • Annotation and Data Display (Doreen Ware)
                    • Outreach (Rick Wilson)
     10:15 – 10:30  Break
     10:30 – 11:00  Plans and Progress – DOE Project (Dan Rokhsar)
     11:00 – 11:30  Future Plans and Collaborations
                    • Pat Schnable (by phone) - retrotransposons
     11:30 – Noon   Executive Session
     Noon – 1:00    Working Lunch and Discussion
     1:00           Depart for Airport

  6. BAC-by-BAC Strategy to Sequence the Maize Genome
     • Maize B73 genome (2300 Mb)
     • BAC library construction (HindIII, EcoRI/MboI; 27X deep; 150 kb avg. insert)
     • Genetic anchoring: in silico, overgo hybridization
     • Fingerprinting: ~460,000 BACs
     • BAC end sequencing: ~800,000
     • BAC physical maps (HICF & agarose)
     • FPC databases (agarose and HICF)
     • STC database
     • Choose a seed BAC
     • Shotgun sequencing and finishing
     • STC database search, FP comparison
     • Determine minimum overlap BACs
     • Complete maize genome sequence
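The library figures on this slide can be cross-checked with the usual physical-coverage relation (depth = clones × average insert / genome size). The sketch below is only that arithmetic; the function names are illustrative and not part of the project's tooling.

```python
GENOME_MB = 2300       # B73 genome size quoted on the slide
AVG_INSERT_KB = 150    # average BAC insert size quoted on the slide

def clones_for_depth(depth, genome_mb=GENOME_MB, insert_kb=AVG_INSERT_KB):
    """Number of clones needed to reach a given physical depth."""
    return depth * genome_mb * 1000 / insert_kb

def depth_from_clones(n_clones, genome_mb=GENOME_MB, insert_kb=AVG_INSERT_KB):
    """Physical depth implied by a given number of clones."""
    return n_clones * insert_kb / (genome_mb * 1000)

# Clones implied by the stated 27X depth
print(round(clones_for_depth(27)))
# Depth implied by the ~460,000 fingerprinted BACs, under the same insert size
print(round(depth_from_clones(460_000), 1))
```

Under these assumptions the stated 27X depth corresponds to roughly 414,000 clones, and ~460,000 clones to roughly 30X; the slide does not say how the two figures relate, so the comparison is only indicative.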

  7. Map Summary
     • Total assembled contigs: 721
        • Equal to 2,150 Mb, 93.5% coverage of the 2300 Mb genome
     • Anchored: 421 contigs, 86.1% of the genome
        • Average anchored contig size: 4.7 Mb
     • Unanchored: 300 contigs, 7.4% coverage
        • Average unanchored contig size: 0.56 Mb
        • 189 of the 300 unanchored contigs have fewer than 10 clones
     • Largest anchored contig: 22.9 Mb, on Chr9
     • Largest unanchored contig: 6.7 Mb
     • Total FPC markers: 25,924
        • STS markers: 9,129
        • Overgo markers: 14,877
        • Anchored markers: 1,918
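The percentages in the map summary follow from the contig counts and average sizes. A minimal sketch of that arithmetic, using only numbers from the slide (variable and function names are illustrative):

```python
GENOME_MB = 2300  # B73 genome size used on the slides

# (number of contigs, average contig size in Mb), as reported on the slide
ANCHORED = (421, 4.7)
UNANCHORED = (300, 0.56)

def assembled_mb(group):
    """Approximate total size of a contig group from count and average size."""
    n_contigs, avg_mb = group
    return n_contigs * avg_mb

def percent_of_genome(mb):
    return 100 * mb / GENOME_MB

anchored_pct = percent_of_genome(assembled_mb(ANCHORED))
unanchored_pct = percent_of_genome(assembled_mb(UNANCHORED))

# The three marker classes should add up to the reported FPC marker total
marker_total = 9_129 + 14_877 + 1_918

print(f"anchored:   {anchored_pct:.1f}% (slide: 86.1%)")
print(f"unanchored: {unanchored_pct:.1f}% (slide: 7.4%)")
print(f"markers:    {marker_total} (slide: 25,924)")
```

The recomputed percentages differ from the slide by about a tenth of a percent, which is consistent with the average contig sizes having been rounded.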

  8. MTP Selection
     • Seed BACs: 4,000, done
     • Mega contig: 197, done
     • Clone walking from seed BACs: 2,800 done; in progress
     • Total clones picked = 6,997
     • On track to deliver 1,000 clones/month until the maize MTP is complete
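The clone tally above, and a rough projection of the remaining picking time, can be sketched as follows. The target size is an assumption: it reuses the 19,000-BAC shotgun figure from the production sequencing slide, which the deck never explicitly calls the MTP size.

```python
from math import ceil

# Clones picked so far, by source (from the slide)
PICKED = {"seed BACs": 4000, "mega contig": 197, "clone walking": 2800}
RATE_PER_MONTH = 1000       # delivery rate stated on the slide
ASSUMED_MTP_SIZE = 19_000   # ASSUMPTION: shotgun target from the production slide

total_picked = sum(PICKED.values())
remaining = max(ASSUMED_MTP_SIZE - total_picked, 0)
months_left = ceil(remaining / RATE_PER_MONTH)

print(f"picked so far: {total_picked}")
print(f"remaining under the assumed target: {remaining}")
print(f"months at {RATE_PER_MONTH} clones/month: {months_left}")
```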

  9. Flowchart for MTP picking and Library Construction Clone selection (combine seed BAC and BAC end sequences with fingerprinting and trace files) Clone picking (Resource Center) GenBank BAC end sequence database MTP sequencing Seed BAC database Library DNA production Library DNA production DNA shearing Hfq sequencing MTP BAC end database Clone verification Clone shipping Continue shotgun library construction at WashU

  10. Seed BAC Walking
      • In the agarose and HICF maps, select large clones next to the seed BAC
      • Run a blastn search of BAC end sequences against seed BAC sequences
      • Check the blastn alignment for candidate clones
      • Check the trace file for dye blobs
      • Check the Sulston score in the HICF map for overlap
      • Check agarose fingerprints to avoid overlap with large bands
      • Choose the walking clone
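The Sulston score checked in this procedure is, in its commonly cited form, the binomial probability that two fingerprints share at least the observed number of bands by chance; lower values indicate a more convincing overlap. The sketch below follows that textbook form. The slides do not give the formula or parameters, so the tolerance and gel length used in the example are illustrative assumptions.

```python
from math import comb

def sulston_score(n_low, n_high, shared, tolerance, gel_length):
    """Probability of >= `shared` chance band matches between two clones.

    n_low / n_high: band counts of the clone with fewer / more bands.
    tolerance, gel_length: band-matching tolerance and gel length in the
    same migration units (ASSUMED values in the example below).
    """
    # Chance that one band matches none of the n_high bands of the other clone
    p_no_match = (1 - 2 * tolerance / gel_length) ** n_high
    p_match = 1 - p_no_match
    return sum(
        comb(n_low, m) * p_match ** m * p_no_match ** (n_low - m)
        for m in range(shared, n_low + 1)
    )

# Illustrative comparison: more shared bands gives a smaller score
weak = sulston_score(n_low=30, n_high=35, shared=10, tolerance=7, gel_length=3300)
strong = sulston_score(n_low=30, n_high=35, shared=25, tolerance=7, gel_length=3300)
print(f"10 shared bands: {weak:.3e}")
print(f"25 shared bands: {strong:.3e}")
```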

  11. Minimum Tile Path Pipeline
      • BAC end sequences (BES) of candidate BACs are BLASTed against the seed BACs
      • Results are classified by their location on the FPC map
      • For each BAC, a table of filtered BLAST results is created, with links to CMap and GBrowse
      • BLAST results are imported into CMap and GBrowse along with additional information such as trace files and FPC data

  12. Minimum Tile Path Pipeline Usage • A table of alignments between the seed BAC and the BAC end sequences contains links to CMap and GBrowse. • CMap displays the FPC data for the seed BAC and the potential next BACs. • GBrowse provides an alignment of the BES with the seed sequence and displays the trace data.
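The filtering step of the pipeline can be sketched as below, assuming the BLAST hits are available in the standard 12-column tabular format (query, subject, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, e-value, bit score). The thresholds, the "near a seed BAC end" rule, and all clone names in the sample are hypothetical; the slides do not state the actual filter criteria.

```python
from collections import defaultdict

# Hypothetical hits of BAC end sequences (query) against one seed BAC (subject)
SAMPLE = (
    "b0012A03.f\tseedBAC1\t99.50\t600\t3\t0\t1\t600\t148900\t149499\t0.0\t1100\n"
    "b0012A03.r\tseedBAC1\t85.00\t200\t30\t0\t1\t200\t5000\t5199\t1e-40\t150\n"
    "b0450H11.f\tseedBAC1\t99.00\t550\t5\t1\t1\t550\t90000\t90549\t0.0\t1000\n"
)

def parse_hits(text):
    """Yield one dict per tabular BLAST line."""
    for line in text.splitlines():
        if not line.strip():
            continue
        f = line.split("\t")
        yield {
            "bes": f[0], "seed": f[1],
            "pident": float(f[2]), "length": int(f[3]),
            "sstart": int(f[8]), "send": int(f[9]),
        }

def candidate_table(text, seed_len, min_ident=98.0, min_len=400, end_window=20_000):
    """Group BES that pass the (assumed) filters by seed BAC."""
    table = defaultdict(list)
    for h in parse_hits(text):
        lo, hi = sorted((h["sstart"], h["send"]))
        # Keep hits that fall close to either end of the seed BAC
        near_end = lo <= end_window or hi > seed_len - end_window
        if h["pident"] >= min_ident and h["length"] >= min_len and near_end:
            table[h["seed"]].append(h["bes"])
    return dict(table)

print(candidate_table(SAMPLE, seed_len=150_000))
```

In this toy input only the first hit survives: the second fails the identity and length cut-offs, and the third lies in the middle of the seed BAC rather than near an end.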

  13. Blast Results Table

  14. Maize Production Sequencing • Shotgun of 19,000 BACs • Fosmid End Sequencing of 1 Million Reads • BAC End Sequencing of 220,000 clones

  15. Maize BAC shotgun
      • BAC DNA received from AGI or prepared at the GSC
      • Small-scale library construction
      • Production sequencing: 1,536 reads/project
      • Automated shotgun_done

  16. To date 3,106 BAC clones are shotgun_done

  17. Maize Fosmid Sequencing
      • Fosmid trays 0001 to 0471 were received from the Messing lab
         • Initial QC was fine, but the bulk shipment has failed to grow
         • Stamping results of the original trays show no growth
      • 85 fosmid ligations, representing ~250,000 clones, were received from the Messing lab; plating is underway
      • GSC fosmid library construction has been completed and represents 1M clones
      • Expected completion date is November of this year

  18. Maize BAC End Sequencing
      • BAC end sequencing will be completed next week
      • Total of 440,000 reads from two different libraries
      • Pass rate of 75%, with an average read length of 600 bases
      • Paired-end read rate is ~70%
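The yield implied by these figures can be worked out directly. This is a rough sketch: it assumes the 600-base average applies to passing reads, and it interprets the ~70% paired-end rate as the fraction of the 220,000 clones (from the production sequencing slide) with both ends passing. Neither interpretation is spelled out in the slides.

```python
TOTAL_READS = 440_000     # from the slide
CLONES = 220_000          # from the production sequencing slide (2 reads per clone)
PASS_RATE = 0.75
AVG_READ_LEN = 600        # bases; ASSUMED to describe passing reads
PAIRED_RATE = 0.70        # ASSUMED to be the fraction of clones with both ends passing
GENOME_MB = 2300

passing_reads = round(TOTAL_READS * PASS_RATE)
sequence_mb = passing_reads * AVG_READ_LEN / 1_000_000
genome_fraction = sequence_mb / GENOME_MB
paired_clones = round(CLONES * PAIRED_RATE)

print(f"passing reads:       {passing_reads}")
print(f"BES sequence:        {sequence_mb:.0f} Mb ({genome_fraction:.1%} of the genome)")
print(f"clones with pairs:   {paired_clones}")
```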

  19. Sequence Improvement Pipeline
      • Shotgun_done triggers the prefinishing pipeline
      • Initial identification of “do finish” regions
      • Manual sorting and use of autoedit (Gordon) to break apart misassemblies
      • Autofinish (Gordon) used to choose directed reactions for all gaps and regions of low quality in “do finish” regions
      • Reassembly and 2nd iteration of prefinishing pipeline
      • Final identification of “do finish” regions and handoff to finishing pipeline

  20. Clone Improvement through the Prefinishing Pipeline

  21. Coverage (green) Spanning Plasmids End

  22. EST sequence GSS sequence Do Finish Repeat Tags

  23. Alignment with cDNA read pairs Alignment with End Sequences

  24. Future Plans for Improved Throughput • Automated Shotgun-done status assigning • Overlap Evaluation at Prefinishing • Addition of Fosmid End Pairs at Prefinishing • Direct Sequencing for Unspanned Gaps • Additional Finishing Staff Hired at all 3 Centers

  25. Maize clone submissions
      Query GenBank by keywords

         Clone status             Submission keywords
         shotgun complete         HTGS_PHASE1; HTGS_FULLTOP
         2 rounds of prefinish    HTGS_PHASE1; HTGS_PREFIN
         in finishing             HTGS_PHASE1; HTGS_ACTIVEFIN
         finished                 HTGS_PHASE1; HTGS_IMPROVED

      Example queries:
         zea mays[ORGN] AND HTGS_PREFIN[KYWD] AND WUGSC[CNTR]
         zea mays[ORGN] AND HTGS_IMPROVED[KYWD] AND WUGSC[CNTR]

      Restrict by date range:
         zea mays[ORGN] AND WUGSC[CNTR] AND HTGS_FULLTOP[KYWD] AND 2006/09[PDAT]
         zea mays[ORGN] AND WUGSC[CNTR] AND HTGS_FULLTOP[KYWD] AND 2006/09/26:2006/10/03[PDAT]
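The query strings on this slide follow a regular pattern, so they can be generated rather than typed. The helper below is a hypothetical convenience, not part of any NCBI or project tool; it only builds the string, which would then be pasted into a GenBank nucleotide search or passed to an Entrez client.

```python
# Clone status -> status-specific submission keyword, as listed on the slide
STATUS_KEYWORDS = {
    "shotgun complete": "HTGS_FULLTOP",
    "2 rounds of prefinish": "HTGS_PREFIN",
    "in finishing": "HTGS_ACTIVEFIN",
    "finished": "HTGS_IMPROVED",
}

def htgs_query(status, organism="zea mays", center="WUGSC",
               date_from=None, date_to=None):
    """Build a keyword query for clones in a given status (hypothetical helper)."""
    terms = [
        f"{organism}[ORGN]",
        f"{center}[CNTR]",
        f"{STATUS_KEYWORDS[status]}[KYWD]",
    ]
    if date_from:
        # A single date (e.g. 2006/09) or a from:to range, both tagged [PDAT]
        date_term = date_from if date_to is None else f"{date_from}:{date_to}"
        terms.append(f"{date_term}[PDAT]")
    return " AND ".join(terms)

print(htgs_query("finished"))
print(htgs_query("shotgun complete", date_from="2006/09/26", date_to="2006/10/03"))
```

The second call reproduces the date-restricted query shown on the slide.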

  26. HTGS_IMPROVED submissions
      Pick a clonename, any clonename:
         DEFINITION  Zea mays chromosome 4 clone CH201-11H16; ZMMBBc0011H16
         Center project name: Z_AF-11H16
      • Improved sequence is annotated on the submission record
      • Where possible, contigs have been ordered and oriented based on read pairing, and these regions are designated as scaffolds
      • Small contigs (<2 kb) that don't represent a clone end, don't contain improved sequence, or are not part of a scaffold are removed from the final submission
      • Contigs are screened for bacterial contamination

  27. FEATURES             Location/Qualifiers
           source          1..173904
                           /organism="Zea mays"
                           /mol_type="genomic DNA"
                           /db_xref="taxon:4577"
                           /chromosome="unknown"
                           /clone="CH201-112C8; ZMMBBc0112C08"
           misc_feature    1..51940
                           /note="scaffold_name:Scaffold1"
           misc_feature    1..36440
                           /note="assembly_name:Contig245 clone_end:left vector_side:T7"
           gap             36441..36540
                           /estimated_length=unknown
           misc_feature    36541..51940
                           /note="assembly_name:Contig240"
           misc_feature    51941..129231
                           /note="scaffold_name:Scaffold2"
           gap             51941..52040
                           /estimated_length=unknown
           misc_feature    52041..59371
                           /note="assembly_name:Contig250"
           ...........
           misc_feature    120342..122491
                           /note="Improved sequence."
           misc_feature    128142..129231
                           /note="Improved sequence."
           misc_feature    129232..139656
                           /note="scaffold_name:Scaffold3"
           .....
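Annotations like these can be summarized programmatically, for example to total the improved sequence in a record. The sketch below parses a simplified one-feature-per-line rendering of a few entries from the slide; real GenBank flat files put each qualifier on its own line, so a production script would more likely use an established parser. The parsing pattern and function names are illustrative.

```python
import re

# A few features from the slide, flattened to one line each (simplification)
RECORD = '''\
misc_feature 1..51940 /note="scaffold_name:Scaffold1"
misc_feature 1..36440 /note="assembly_name:Contig245 clone_end:left vector_side:T7"
gap 36441..36540 /estimated_length=unknown
misc_feature 36541..51940 /note="assembly_name:Contig240"
misc_feature 120342..122491 /note="Improved sequence."
misc_feature 128142..129231 /note="Improved sequence."
'''

FEATURE = re.compile(r'^(\S+)\s+(\d+)\.\.(\d+)\s+/(\w+)=(.*)$')

def parse_features(text):
    """Return (key, start, end, qualifier, value) tuples for each feature line."""
    features = []
    for line in text.splitlines():
        m = FEATURE.match(line.strip())
        if m:
            key, start, end, qualifier, value = m.groups()
            features.append((key, int(start), int(end), qualifier, value.strip('"')))
    return features

def span(feature):
    """Length of a feature; coordinates are 1-based and inclusive."""
    return feature[2] - feature[1] + 1

features = parse_features(RECORD)
improved_bases = sum(span(f) for f in features if f[4].startswith("Improved sequence"))
gap_lengths = [span(f) for f in features if f[0] == "gap"]

print(f"improved bases in this excerpt: {improved_bases}")
print(f"gap placeholder lengths: {gap_lengths}")
```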

  28. GenBank
      • 1005 HTGS_FULLTOP
      • 254 PREFIN_DONE
      • 1532 ACTIVE_FIN
      • 357 HTGS_IMPROVED

  29. Ongoing work at CSHL • BAC Annotation Levels • Data Analysis • Display • Project Management • Collaborations

  30. BAC Data Analysis • Ensembl Pipeline • 3 inclusive phases of annotation • Level I: Display BAC information • Level II: Sequence-based annotations • Level III: Integrative annotations Shiran Pasternak, Apurva Narechania, Joshua Stein

  31. Application of Mathematical Repeat Analysis • Identifies novel repeats w/o dependence on curation. • Based on frequency of 20-mers in JGI WGS sequence • Correlates with presence of retroelements. • Can modulate threshold to optimize application. Apurva Narechania, Joshua Stein
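The idea behind this repeat analysis can be illustrated with a toy k-mer counter: count how often each k-mer occurs in a set of whole-genome shotgun reads, then mark positions in a target sequence that are covered by k-mers above a chosen frequency threshold. This is a simplified sketch using a plain Python `Counter`; it does not reproduce the project's actual implementation (the collaborations slide mentions Vmatch), it ignores reverse complements, and the toy reads, k = 4, and threshold are illustrative stand-ins for 20-mers over the JGI WGS data.

```python
from collections import Counter

def kmer_counts(reads, k):
    """Count every k-mer occurring in the reads (forward strand only)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def repeat_mask(seq, counts, k, threshold):
    """Mark positions covered by any k-mer seen at least `threshold` times."""
    mask = [False] * len(seq)
    for i in range(len(seq) - k + 1):
        if counts[seq[i:i + k]] >= threshold:
            for j in range(i, i + k):
                mask[j] = True
    return mask

def soft_mask(seq, mask):
    """Render masked positions in lowercase."""
    return "".join(c.lower() if m else c for c, m in zip(seq, mask))

# Toy data: "ACGT"-rich reads stand in for a high-copy element
reads = ["ACGTACGTACGT", "TTACGTAA", "GGGCCCAT"]
target = "GGACGTACCC"
counts = kmer_counts(reads, k=4)

print(soft_mask(target, repeat_mask(target, counts, k=4, threshold=3)))
print(soft_mask(target, repeat_mask(target, counts, k=4, threshold=5)))
```

Raising the threshold from 3 to 5 removes the mask entirely in this toy case, which mirrors the slide's point that the threshold can be tuned to the application.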

  32. Retroelement Annotation
      Collaboration with Jeff Bennetzen and Philip SanMiguel
      • Classify retroelement families
      • Current list covers ~68% of genome
      • Ten most prevalent account for ~80% of retroelement sequences
         • ji, huck, opie, zeon, cinful, prem1, grande, xilon, gyma, giepum
      Goal is to visualize the history of transpositions
      Giepum element interrupted by ji and opie in AC148166
      Joshua Stein

  33. Whole Genome Alignments • Wobble Aware Bulk Aligner (WABA)* • TIGR Transcripts Rice • WABA alignments Maize • Distinguishes between: • low similarity regions (grey) • high-similarity regions (medium blue) • high similarity regions w/ wobble-base mismatch of coding regions (green) *Kent, WJ & Zahler, A.M. (2000). Genome Res. 10:1115-25 Joshua Stein

  34. Whole Genome Alignments • BLASTZ* with AXTCHAIN** & CHAINNET** • Sensitive gapped BLAST algorithm designed for aligning long sequences. • Accommodates long gaps & overlapping gaps, inversions, translocations, & duplications *Schwartz, S et al. (2003). Genome Res. 13:103-7 **Kent, WJ, et al. (2003). PNAS 100:11484-11489 Example of BLASTZ(net) display in Ensembl.

  35. www.maizesequence.org
      Sequenced BAC · FPC Contig · Virtual Bin · Core Bin · Marker · Chromosome · Synteny Views
      • Main navigation bar is accessible from every page
      • Contains multiple entry points to the genome

  36. MapView Displays statistics by chromosome and provides entry points based on a single chromosome

  37. CytoView Provides detailed information on features anchored to the FPC map. The side bar highlights the location on the chromosome and provides page-specific functionality, including data export. The detailed view is customizable: users can add or remove tracks. Features have drop-down menus that contain general information as well as internal and external links.

  38. ContigView This view is based on BAC coordinates and displays annotation levels II and III. The header contains the clone name in the physical map, the GenBank accession, and chromosome and FPC contig information. The detailed view offers semantic zooming, is customizable, and provides links to other views and information resources.

  39. SyntenyView

  40. Upcoming Features
      Release October 2006 BlastView December 2006 BAC Annotation Level II January, 2007 Level III annotation April, 2007 WG alignments June, 2007 BioMart January, 2007
      NSF collaborations
      • TwinScan annotations: March, 2007
      • Maize Optical Map: July, 2007
      • Full-length cDNAs: December, 2007
      Notification System
      Users are notified
      • When a region of interest is updated
      • When markers are aligned to a specific sequence
      January, 2007

  41. Hardware Environments • Software • Developed locally • Managed with source control • Frequent releases to staging environment • Quarterly production releases • Data • Timed analysis on staging environment • Mirrored weekly on production Shiran Pasternak, Apurva Narechania

  42. Quality Assurance
      • Unit-testing framework
         • Binary assertions
         • Failure report and automatic notification
      • Software Quality Control
         • e.g., code retrieves correct data from the database
      • Data Quality Control
         • e.g., clone in GenBank record exists in FPC map
      Shiran Pasternak
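The data quality check named on this slide (every clone in a GenBank record should exist in the FPC map) maps naturally onto a binary unit-test assertion. The sketch below uses Python's `unittest` purely for illustration; the slides do not say which framework the project used. The clone names come from the submission slides, while the contig identifiers and the in-memory data structures are hypothetical stand-ins for database lookups.

```python
import unittest

# Hypothetical snapshot of the FPC map: clone name -> contig (contig IDs invented)
FPC_MAP = {
    "ZMMBBc0011H16": "ctg_example_1",
    "ZMMBBc0112C08": "ctg_example_2",
}

# Clone names taken from GenBank records shown on the submission slides
GENBANK_CLONES = ["ZMMBBc0011H16", "ZMMBBc0112C08"]

def missing_from_fpc(genbank_clones, fpc_map):
    """Return GenBank clones that have no entry in the FPC map."""
    return [clone for clone in genbank_clones if clone not in fpc_map]

class DataQualityControl(unittest.TestCase):
    def test_genbank_clones_exist_in_fpc_map(self):
        # Binary assertion: the list of missing clones must be empty;
        # on failure the report names the offending clones
        self.assertEqual(missing_from_fpc(GENBANK_CLONES, FPC_MAP), [])

suite = unittest.defaultTestLoader.loadTestsFromTestCase(DataQualityControl)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print("QC passed" if result.wasSuccessful() else "QC failed")
```

The `result` object could feed the failure report and automatic notification the slide mentions, for example by sending `result.failures` to a mailing list.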

  43. Project Management • Mantis Bug Tracker • Manage tasks using priorities, severities, and resource allocations • Automated submission of issues using feedback form • Generation of progress reports

  44. Project Management • Wiki • Enhances group communication • Meeting notes, flowcharts, specification documents • Maintains history of specifications and design decisions • Seamless editing

  45. Collaborations
      • MaizeGDB (Iowa State University, University of Missouri)
         • C. Lawrence
      • Maize Optical Map (University of Wisconsin)
         • D. Schwartz
      • Maize Transposon Annotation (University of Georgia, Purdue)
         • J. Bennetzen, P. SanMiguel
      • Ensembl (EBI)
         • E. Birney
      • Vmatch for Mathematical Repeats (University of Hamburg)
         • S. Kurtz
      • Maize Full Length cDNA project (Arizona Genomics Institute)
         • Y. Yu
      • TwinScan (Danforth Plant Science Center)
         • B. Barbazuk
