Maize Missouri 17 “chromosome 10” project update
Dan Rokhsar
3 October 2006
Aims: “Plan A”
• Generate and annotate “gene space” for the ~180 Mbp chromosome 10 of Mo17 using a random shotgun approach from flow-sorted chromosomes.
• This resource will complement the BAC-by-BAC sequencing of B73, informing our understanding of intra-species variation, from SNPs to chromosomal organization.
• The project will serve as a pilot R&D study for chromosome-scale random shotgun sequencing of complex genomes.
Challenges
• Produce high-quality shotgun library from a single chromosome (year 1)
  • Apply flow sorting methods to root tip preparations or oat-maize hybrid lines with maize Mo17-10
• Assemble shotgun sequences and relevant mapping data to recover non-repetitive and ‘distinguishable repetitive’ regions (years 1-2)
  • DuPont Mo17 BAC library, BAC-end sequence
  • Targeted mapping to link across complex repeats
• Targeted finishing of “gene space” from whole-chromosome-shotgun draft (year 2)
  • Interplay of finishing with annotation
Project goals for researchers and breeders
• Unlimited markers for mapping
• Nearly complete gene set for Mo17-10
• Conserved synteny/chromosome dynamics with sorghum
• Evolutionary approaches empowered
• Novel reagents begin to emerge
• Framework for understanding strain differences
Milestones
Year 1
• Produce test libraries from mock flow sorted material (JGI)
• Produce preliminary flow sorting data for discussion at Advisory Committee meetings (NFCR)
• Produce 1-10 micrograms of flow sorted chromosome 10 material (NFCR)
• Complete library production (JGI)
• Begin shotgun sequencing, with associated data deposition (JGI)
Milestones
Year 2
• Complete initial shotgun assembly, with associated data deposition (JGI)
• Integrate with physical map data from DuPont (JGI)
• Complete two rounds of primer walking (SHGC)
• Annotate initial draft assembly, with data release (JGI)
• Complete subsequent rounds of targeted finishing reactions (SHGC)
• Complete physical mapping of markers and release to public repositories (PGML)
• Produce final assembly incorporating finishing data (JGI, SHGC)
• Publish detailed analyses of Maize Genome Project outcomes (all)
• Offer summer course on maize genome data (JGI)
Problems at first step
• First milestone from “plan A” not met
• Flow sorting system is going … but no significant progress to chromosome flow sorting at preparative scale
• Some small-scale root tip chromosome preps have been done, but not ready to scale up
• Three months of chromosome preps (~10,000 root tips) would be needed to obtain even a few tenths of a microgram of DNA for a first chromosome-specific cloning attempt; outcome not guaranteed
• JGI library group would prefer more material for robust shotgun library prep (minimum of several µg); previous chromosome-specific lambda cloning (Arumuganathan) is more forgiving but still gave low coverage (2X)
• Attempted to contract to Dolezel’s group in the Czech Republic, but their capacity is taken with wheat BAC preps. Willing to advise.
• Arumuganathan is now doing human cell sorting, not working with chromosome preps, and cannot take on the task.
Even in expert hands, purity of chromosome prep is 85-90%
• Li, Arumuganathan, et al. Flow cytometric sorting of maize chromosome 9 from an oat-maize chromosome addition line. TAG (2001).
Proposal for “Plan B”
• Continue development of flow sorting chromosome 10, but decouple from sequencing plans in current project
• Produce ~3/4 X random whole genome shotgun sequence of Mo17 in plasmid and fosmid paired ends (mix TBD)
  • ~3 months to bulk prep DNA, make libraries, do quality control testing/sampling (Jan 2007)
  • <3 months to schedule and perform production sequencing run (Apr 2007)
• Note: JGI is not in a position to take on significant BAC-based shotgun from the B73 project; perhaps a few hundred clones, maybe ~1% of project
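As a rough guide to what ~3/4 X random shotgun coverage yields, the sketch below uses an idealized Poisson (Lander-Waterman style) model for the expected fraction of bases hit by at least one read. The model is an assumption introduced here for illustration, not the project's own estimate; it ignores cloning bias and repeats.

```python
import math


def fraction_covered(depth):
    """Expected fraction of bases hit by at least one read.

    Assumes reads land uniformly at random, so per-base read counts
    are Poisson with mean `depth` (an idealization).
    """
    return 1.0 - math.exp(-depth)


if __name__ == "__main__":
    # 0.75X is the Plan B target; 5X and 8X are the sorghum depths quoted later.
    for depth in (0.75, 5.0, 8.0):
        print(f"{depth:>5}X -> {fraction_covered(depth):.3f} of bases covered")
```

Under this model, ~3/4 X touches roughly half of the bases at least once, which is why the plan relies on aligning individual reads to B73 rather than on assembling Mo17 on its own.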
Alignment of Mo17 “gene space” with B73 allele
Mo17  1      AACCAATTGGCAGCATTATTATTTTGAACAGATAAAAATCACGCCAGGGCGATGGATACT  60
B73   88023  ..............C.........C...................................  88082
Query 61     CAGCTCAATCACGGAATTCATCCATGAACTTCTCGTGGAACTCCTTGAGCCTGGATACTA  120
Sbjct 88083  ............................................................  88142
Query 121    TCGCAGGTATCTTGTCCTCCTGCGGCAGTATCGTGCACCTGAAGTGCCACGTTCCAGGGA  180
Sbjct 88143  ............................................................  88202
Query 181    CCTTCA--------CG--G-T--G-T-C-GC-AAAGCAACGTGTCAGTATCGTGTGCATC  223
Sbjct 88203  ......CGGTGTCG..AA.T.AA.A.C.A..A................G...........  88262
Query 224    TGAAGCTTAACGATGCTTTGAAACGGCAGGGACTTCCACaaaaaaaGG-CTTTTGAGATT  282
Sbjct 88263  .............................................G..G...........  88322
Query 283    ACCCACCTGTCCAAACCCAGAACCGGGGACGACGACGATTCCAGTGGCTTCCAGTAGGCG  342
Sbjct 88323  ............................................................  88382
Query 343    TTTTGCGTAGTATGCATCTGGCGCAGTGCCGACTGCTTGGGCAGCTCCAATTGCCTTCTG  402
Sbjct 88383  ..........................................T.................  88442
Query 403    GGGTAAATGAAGGCGTGGGAACAGATACATTGCACCTTCGGCTTTGTTGCATGTAATTCC  462
Sbjct 88443  ............................................................  88502
Query 463    TTCTAAACTGTTGAATGCTTCTTCCAAAGCCTGTGACAGAAGAACACGTAACAATAAGAA  522
Sbjct 88503  ............................................................  88562
Query 523    GGTGCTTATAAGATTCAGGaaaaaaaa--TCTTTTTTAAAGTTGTTTTGCATATGTTAAC  580
Sbjct 88563  ...........................GA...............................  88622
Query 581    GGACTACTCGACCAGGGGTATAGCTTTTATTCTTGTTTGATATTTCCATATTAGGACTCT  640
Sbjct 88623  ..........G.................................................  88682
• ~97% identity
• In unique “genic” regions (especially coding sequence), can easily align Mo17 and B73 to detect polymorphism.
• Cf. comparable human-chimp alignments at ~98.5%
• (putative aminotransferase, Morgante et al.)
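A percent-identity figure like the one quoted for this alignment can be tallied from dot-notation rows. The sketch below is a minimal illustration, assuming that "." in the subject row means "same base as the query" and "-" in either row is a gap. The function name and the toy rows are hypothetical; the identity is computed over all alignment columns, which is one of several common conventions and not necessarily the one used for the slide.

```python
def alignment_identity(query, sbjct):
    """Tally one pair of aligned rows in dot notation.

    Returns (matches, mismatches, gap_columns, identity), where identity
    is matches divided by the total number of alignment columns.
    """
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have equal length")
    matches = mismatches = gaps = 0
    for q, s in zip(query, sbjct):
        if q == "-" or s == "-":
            gaps += 1          # gap in either row
        elif s == "." or s.upper() == q.upper():
            matches += 1       # dot (or repeated base) means identical
        else:
            mismatches += 1    # explicit base in subject means substitution
    identity = matches / len(query) if query else 0.0
    return matches, mismatches, gaps, identity


if __name__ == "__main__":
    # Toy rows for illustration only, not taken from the slide.
    print(alignment_identity("ACGTACGT-A", "....T...G."))
```

Summing the per-row tallies over all rows of an alignment before dividing gives the overall identity.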
Likely outcomes of Plan B
• Align Mo17 shotgun to emerging B73 draft (at quarterly intervals)
• Should be easy to recognize allelic variants in non-repetitive (i.e., genic) regions, based on Morgante et al. results. Expect unique coverage of ~40% of B73 sequence. (Alternative: MeF, C0t)
• In a typical genic locus of 5 kb, conservatively expect ~100 mismatches or indels (~97% identity over 5 kb would imply ~150). Dense markers allow rapid development of multiple markers per gene. (Distribute via Gramene, NCBI)
• Repeat copies within B73 are ~90-99% identical to one another, so identifying “allelic” repeats will be difficult given ~97% identity between alleles. (Attempt to localize “sisters” of unique reads based on B73 map.)
• In places where both ends of a clone are alignable, can confirm local colinearity of B73 and Mo17, or identify rearrangements and/or deletions. (A la human-chimp comparison, but expect worse)
• Mo17 fosmid clones with localized ends will be available for distribution and/or targeted sequencing of loci-of-interest
• Potential start towards Mo17 WGS if desirable
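The paired-end colinearity check mentioned above can be sketched as follows. This is a hypothetical illustration, not the project's pipeline: the `EndHit` record, the category labels, and the 30-50 kb fosmid insert window in the usage lines are assumptions made here. The two end reads of one clone are expected to hit the same B73 contig, point toward each other, and lie about one insert length apart.

```python
from dataclasses import dataclass


@dataclass
class EndHit:
    """Best B73 alignment of one clone-end read (hypothetical record)."""
    contig: str
    pos: int       # position of the read start on the B73 contig
    strand: str    # '+' or '-'


def classify_pair(a, b, min_insert, max_insert):
    """Classify a clone whose two end reads both align to B73."""
    if a.contig != b.contig:
        return "different_contigs"
    left, right = sorted((a, b), key=lambda h: h.pos)
    # Ends of one clone should face inward: left on '+', right on '-'.
    if left.strand != "+" or right.strand != "-":
        return "orientation_conflict"
    span = right.pos - left.pos
    if span < min_insert:
        return "span_too_short"   # B73 may lack sequence present in Mo17
    if span > max_insert:
        return "span_too_long"    # B73 may carry sequence absent from Mo17
    return "colinear"


if __name__ == "__main__":
    # Assumed fosmid insert window of 30-50 kb, for illustration only.
    ok = classify_pair(EndHit("ctg1", 10_000, "+"), EndHit("ctg1", 50_000, "-"),
                       30_000, 50_000)
    print(ok)
```

Clones in the non-colinear categories are only candidates; a repeat-induced misplacement of one end would look the same, so such calls would need support from multiple clones.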
JGI Sorghum update
• Sorghum WGS currently at ~7X (in Trace Archive)
  • mostly small insert plasmids sequenced to date
  • BAC-end and fosmid-end sequences coming by end 2006
  • but uniformity of BAC library is in question, may limit assembly
• Quick and dirty assemblies look good using “skeleton” of method proposed for maize
  • ~13 kb contigs and ~300 kb scaffolds (N50 #’s) at ~5X
  • considerable scaffolding even without much BAC/fosmid data
  • recovering ~2/3 of genome is easy even setting aside “difficult” repeats, as predicted for maize
• Expect full 8X assembly (with map integration) ready late Q1 2007.
• Quick and dirty annotation: ~42,000 genes in low copy families plus >100K retrotransposon-ish genes even in easy-to-assemble regions
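The contig and scaffold sizes above are quoted as N50 values. For readers unfamiliar with the statistic, here is a minimal sketch of the usual definition: the length L such that contigs of length L or longer contain at least half of the assembled bases. The function name and the toy lengths are illustrative.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths."""
    if not lengths:
        raise ValueError("need at least one length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


if __name__ == "__main__":
    # Toy contig lengths in kb, for illustration only.
    print(n50([2, 3, 4, 5, 6]))
```

Because N50 is weighted by length, a few long scaffolds can dominate it, which is why the slide reports contig and scaffold N50 separately.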
Early peek at Sorghum-rice comparison shows syntenic segments
• Sorghum-Rice syntenic segments are of uniform molecular “age”
• Comparable to human-chicken divergence
• Younger than Rice-Rice paralogs (from cereal-specific duplication)
[Figure axes: transversions/synonymous site; loci in syntenic block]
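The divergence measure named on the figure axis depends on separating transversions (purine to pyrimidine or the reverse) from transitions. The sketch below shows only that classification step for two aligned sequences. It is a simplification introduced here: it counts over all aligned sites, whereas a per-synonymous-site estimate would also need codon-aware site classification, and it is not the analysis behind the slide.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_substitutions(seq1, seq2):
    """Count transitions and transversions between two aligned sequences.

    Columns containing gaps or ambiguous bases are skipped.
    Returns (transitions, transversions, compared_sites).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    return transitions, transversions, compared


if __name__ == "__main__":
    # Toy sequences for illustration only.
    ts, tv, n = count_substitutions("ACGTAC", "GCTTAA")
    print(ts, tv, tv / n)
```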
Maize divergences (transversions)
• Maize: 7,960 complete/29,922 partial peptides
• Sorghum: 5,927 complete/19,681 peptides
• Sugarcane: 6,566 complete/21,850 peptides
• ~16,000 gene families at base of grasses
• ~12,000 families defined by rice/arabidopsis/poplar
[Figure labels: sugarcane, sorghum, Arabidopsis, rice]