
The Medicago truncatula genome: a progress report

The Medicago truncatula genome: a progress report. Dr. Bruce A. Roe, Advanced Center for Genome Technology, Department of Chemistry and Biochemistry, University of Oklahoma, broe@ou.edu, www.genome.ou.edu. Plant and Animal Genome, San Diego, January 11, 2004.


Presentation Transcript


  1. The Medicago truncatula genome: a progress report. Dr. Bruce A. Roe, Advanced Center for Genome Technology, Department of Chemistry and Biochemistry, University of Oklahoma, broe@ou.edu, www.genome.ou.edu. Plant and Animal Genome, San Diego, January 11, 2004. Photos by Steve Hughes, Genetic Resource Centre (PIRSA-SARDI), Adelaide, Australia. http://www.fao.org/ag/AGP/AGPC/doc/gallery/pictures/meditrunc/meditrunc.htm

  2. Why sequence the Medicago genome? • An important forage crop • A genetically tractable model legume • A relatively small (~500 Mbp) diploid genome • Active legume research community • Medicago Research Consortium • Large collection of ESTs • Excellent BAC library • Integrated physical and genetic map • Large number of BAC-end sequences

  3. Sequence Pipeline at the University of Oklahoma Genome Center (OU-ACGT), from DNA to GenBank • DNA shearing (Hydroshear™) • Colony picking (QPix II™) • Growing subclones (HiGro™) • Subclone isolation I (Mini-Staccato™) • Subclone isolation II (VPrep™) • Thermocycling (ABI 9700) • Sequencing (ABI 3700) • Data assembly and analysis • Closure • Primer synthesis • Miscellaneous liquid handling

  4. Subclone Isolation (Mini-Staccato™) • This Zymark robot has a 384-cannula array, four built-in shakers, three attached storage racks, built-in barcoding and a Twister II robotic arm. • This automation has allowed us to perform DNA isolation completely unattended on as many as eighty 384-well plates of bacterial cells per day.

  5. Subclone Isolation (Mini-Staccato™) • Once all three solutions have been added, the plates are transferred from the SciClone workspace deck to a storage rack by the Twister II robotic arm.

  6. Subclone Isolation and Sequencing Reaction Pipetting (Velocity 11 VPrep) • Liquid handling station with 384-channel pipettor head • Four movable shelves on either side of the pipettor head • Used for subclone isolation, sequencing reaction set-up and clean-up.

  7. Data Assembly and Analysis • Phred/Phrap/Consed and Exgap • Sun V880 server with 32 GB RAM running the Solaris 8 OS, with 3 TB of data stored on RAID-5 arrays and autoloader tape backup • Also: 12 workstations, each with 1 GB RAM
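Phred, named above, reports each base call with a quality value Q, defined by Q = −10 log10(P), where P is the estimated probability that the call is wrong. The Python sketch below shows that conversion in both directions. The Q20 cutoff in `count_high_quality_bases` is a common convention assumed here for illustration, not a setting documented for the OU pipeline.

```python
import math

def phred_to_error_prob(q: float) -> float:
    """Phred quality Q -> estimated probability that the base call is wrong: 10^(-Q/10)."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p: float) -> float:
    """Inverse mapping: Q = -10 * log10(P)."""
    return -10 * math.log10(p)

def count_high_quality_bases(quals, threshold: int = 20) -> int:
    """Number of base calls at or above a quality threshold (assumed Q20 = 1 expected error per 100 calls)."""
    return sum(1 for q in quals if q >= threshold)

if __name__ == "__main__":
    for q in (10, 20, 30, 40):
        print(f"Q{q}: error probability {phred_to_error_prob(q):g}")
    # Hypothetical per-base qualities for a short read.
    print(count_high_quality_bases([12, 25, 30, 40, 8]))
```

A higher Q means a lower error probability on a log scale, so every 10 quality units correspond to a tenfold drop in expected errors.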

  8. Initial WGS Skimming of the ~500 Mb Medicago truncatula Genome • Collected ~25,000 end sequences from ~12,500 plasmid-based WGS clones. • Of these ~25,000 sequences, ~1,000 have homology with Medicago truncatula ESTs. • URL: http://www.genome.ou.edu/medicago.html

  9. Phrap assembly of our Medicago truncatula whole genome shotgun survey sequencing data at 0.005-fold genomic sequence coverage
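Fold coverage is the number of sequenced bases divided by the genome size. Under the Lander–Waterman model, which assumes reads land at random and independently (an idealization that repetitive DNA violates), the expected fraction of the genome touched by at least one read is 1 − e^(−c). A minimal Python sketch of both quantities:

```python
import math

def fold_coverage(total_bases: float, genome_size: float) -> float:
    """Fold coverage c: total sequenced bases divided by genome size."""
    return total_bases / genome_size

def expected_fraction_covered(c: float) -> float:
    """Lander-Waterman expectation for random, independent reads: 1 - e^(-c)."""
    return 1 - math.exp(-c)

if __name__ == "__main__":
    genome_size = 500e6   # ~500 Mb, as stated on the slides
    c = 0.005             # survey coverage reported for the Phrap assembly
    print(f"sequence implied: {c * genome_size / 1e6:.1f} Mb")
    print(f"expected fraction of genome covered: {expected_fraction_covered(c):.4%}")
```

At 0.005-fold coverage of a ~500 Mb genome this corresponds to about 2.5 Mb of sequence and, in expectation, roughly 0.5% of the genome.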

  10. Dot plot of a Phrap-assembled whole-genome shotgun contig showing multiple repeated regions (both axes: position in bases, 0–700)

  11. Dot plot of a Phrap-assembled whole-genome shotgun contig showing 4 repeated blocks of ~600 bases (both axes: position in bases)

  12. Yet another genomic contig (Contig 1931) showing extensive repeated regions (dot plot; both axes: position in bases)
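The dot plots above compare a contig against itself: a dot is placed wherever a window of one copy matches a window of the other, so internal repeats show up as diagonals parallel to the main diagonal. The slides do not say which program drew these plots; the sketch below assumes simple exact k-mer matching, which is one common way to build a dot plot, and uses a hypothetical 16-base unit as input.

```python
def dot_plot(seq_a: str, seq_b: str, k: int = 11):
    """Return (i, j) pairs where the k-mer starting at seq_a[i] equals the one at seq_b[j]."""
    seq_a, seq_b = seq_a.upper(), seq_b.upper()
    index = {}
    for j in range(len(seq_b) - k + 1):
        index.setdefault(seq_b[j:j + k], []).append(j)
    hits = []
    for i in range(len(seq_a) - k + 1):
        for j in index.get(seq_a[i:i + k], []):
            hits.append((i, j))
    return hits

def off_diagonal_offsets(hits):
    """Count hits per diagonal offset (j - i) above the main diagonal.

    In a self-comparison, a strongly populated offset d suggests a repeat
    whose copies lie d bases apart.
    """
    counts = {}
    for i, j in hits:
        d = j - i
        if d > 0:
            counts[d] = counts.get(d, 0) + 1
    return counts

if __name__ == "__main__":
    seq = "ACGTTGCAAGGCTTAC" * 4   # hypothetical tandem repeat of a 16-base unit
    hits = dot_plot(seq, seq, k=8)
    print(sorted(off_diagonal_offsets(hits).items()))  # [(16, 41), (32, 25), (48, 9)]
```

Indexing one sequence's k-mers in a dictionary avoids comparing every window against every other window; larger k suppresses chance matches at the cost of missing diverged repeat copies.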

  13. >Contig1931 TTTACGTCCCCGTAGTGAACTATTTCCTAAGTTGACTAGTCAATTAGGTG ATAGTTCGTCCGGATGACGTACCGCCGTGAACCCGATATGAGAATTTCAT GTGGTGCATCCTTCTATGTTTGATAAGGTCATTTTGAACGGTCGGATTGA ACGTGGCTGGTGTCGTTCACGATAGAGGCACGTTTAGGTCCCTACGGTGA ACTAGTTCCTAAGTTGACTAGTCAATTAGGTGATAGTTTGTCCGGATGAC GTACCTCCGTGAACCCGATCTGAGAAATTCAAGTTTCTGCATCCTTCTAT GTTTGATAAGGTCATTTTGAACGGTCGGATTGAAGGTGGCTGGTGTTCTT CACATTCTAGGCACGTTTAGGTTCCCGCGGTGAACTAGTTCCTAAGTTGA CTAGTCAATTAGGTGATAGTTCGTCCGGATGACCTACCTCCGTGAACCCG ATATTAGAAATTCAAGTTTCTGCATCCTTCTATGTTTGATAAGGTCATTT TGAACGGTCAGATTGAACGTGGCTGGTGTCGTTCACGATCTAGGCACGTT TAGGTCCCCGCAGTGAACTAGTTCCTAAGTTGACTAGTCAATTAGGTGAT AGTTTGTCCGGATGACGTGACTCCGTAAAGCCAGTATGAGAACTTCTAGT TTCTGCATCCTTTTATGTTTGATAAGGTCATTTTGAACGGTGGGATTGAA CGTTGTTGGTGTCGTTCACGATCTAGGCACGTTTAGGTCCCCGCAGTGAA CTAGTTCCTTAGTTGACTAGTCAATTAGGTGATAGTTCGTCCGGATGACG TATCTCCGTCAGCCCGATCTGAGAAATTCAAATTTCTGCATCCTTCTATG TTTGATAAGGTCATTTTGAACGGTCGGATTGAACGTGGCTGGTGTCGTGC ACGATCAAGGCACGTTTAGGTCCCCGCAGCGAACTAGTTCCTAAGTTGAC TAGTCAATTAGGTGATACCTTGTCCGGATGACGTACCTCCGTGAACCCGA TCTGAGAAATTCAAGTTTCTGCATCCTTCTATGTTTGATAAGGTCATTTT GAACGGTTGGATTGAACATGGCTGGTGTCGTTCACGATCTAGGCACGTTT AGGTCCCCGCAGTGAACTAGTTCCTAAGTTGACTAGTCAATTAGGTGATA GTTCGTCTGGATGACGTACCTCCTTGAACCCAATATGAGAAATTCAATTT TCTTCATCCTTCTATGTTTGATAAGGTCATTTTGAACGGTCGGATTGAAC GTGCCTGGTGTCGTTCACGATCGAGGCACGTTTAGGTCCCCGCAGTGAAC . . .
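Contig 1931 reads as a tandem array of similar units. One quick way to estimate the unit length of such an array, assumed here for illustration rather than taken from the talk, is to slide the sequence against itself and find the offset (lag) at which the most bases agree. The method ignores insertions and deletions, so for diverged copies it gives only a rough estimate.

```python
def identity_at_lag(seq: str, lag: int) -> float:
    """Fraction of positions i for which seq[i] == seq[i + lag] (no gaps allowed)."""
    n = len(seq) - lag
    if n <= 0:
        return 0.0
    return sum(1 for i in range(n) if seq[i] == seq[i + lag]) / n

def best_repeat_period(seq: str, min_lag: int = 10, max_lag: int = 400) -> int:
    """Smallest lag with the highest self-identity: a rough tandem-repeat unit length."""
    seq = "".join(seq.split()).upper()   # drop the spaces/line breaks of a slide-style listing
    max_lag = min(max_lag, len(seq) - 1)
    if max_lag < min_lag:
        raise ValueError("sequence too short for the requested lag range")
    # max() returns the first maximum it meets, i.e. the smallest tied lag.
    return max(range(min_lag, max_lag + 1), key=lambda lag: identity_at_lag(seq, lag))

if __name__ == "__main__":
    unit = "TTTACGTCCCCGTAGTGAACTATTTCCTAAGTTGACT"  # 37-base test unit, copied five times
    print(best_repeat_period(unit * 5))
```

Before applying this to the Contig1931 listing, the trailing `. . .` would have to be removed, since the function only strips whitespace.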

  14. Summary of our Medicago truncatula WGS Sequencing Assembly with only 0.005-fold Genomic Sequence Coverage • The largest contig (21,157 bp) contained the 26S rRNA genes. • 19 smaller contigs (105,455 bp total) were from the chloroplast genome. • The remaining ~500 contigs, ranging in size from 2,000 to 12,000 bp, contained highly repetitive DNA that appeared to be unique to Medicago, as it had no significant homology to sequences in GenBank. • We concluded that a more directed strategy was needed.

  15. Mapped BAC approach in collaboration with Doug Cook and DJ Kim at U.C. Davis with funding from the Noble Foundation, Ardmore, OK

  16. The first ~1000 Medicago truncatula BACs • Initially concentrated on BACs carrying known biological markers and in regions of biological interest that were supplied to us by the UC Davis group. • Requests for sequencing specific BACs were directed to Doug Cook and DJ Kim at UC Davis, who supplied us with the BACs once they had been characterized. • Once the BACs were received, we created the shotgun libraries, isolated the sequencing templates and obtained the working draft sequence, followed by closure and finishing. • All data were made publicly available in GenBank within 24 hours of sequence assembly.

  17. UC Davis ↔ University of Oklahoma

  18. The next ~750 Medicago truncatula BACs • With recent NSF funding, we will be sequencing BACs from chromosomes 1, 4, 6, and 8, with the goal of completing the sequence of the euchromatic regions of these chromosomes over the next 3 years. • Chromosomes 2 and 7 will be sequenced at TIGR, chromosome 3 at The Sanger Institute and chromosome 5 at Genoscope. • All data will be released immediately, as before.

  19. www.genome.ou.edu/medicago.html

  20. www.genome.ou.edu/medicago_totals.html

  21. Medicago-specific gene with ESTs but no known homology. Gene density of this BAC is ~1 gene per 10 kb.

  22. Medicago-specific gene with ESTs but no known homology

  23. Myosin-like protein. Gene density ~1 gene per 10 kb.

  24. myosin-like protein

  25. Gene Size Distribution (All Sequence Data), FgenesH vs. Genscan: number of predicted genes per gene-size range, in ranges of 1,000 from 1–1,000 to 19,001–20,000 plus a 20,001-and-above range (13,396 FgeneSH-predicted genes; 11,488 Genscan-predicted genes)

  26. Exon Size Distribution (All Sequence Data), FgenesH vs. Genscan: number of predicted exons per size range (1–50, 51–100, then ranges of 100 up to 901–1,000, then ranges of 500 up to 3,501–4,000); 59,808 FgeneSH-predicted exons vs. 55,792 Genscan-predicted exons

  27. Intron Size Distribution (All Sequence Data), FgenesH vs. Genscan: number of predicted introns per size range (1–50, 51–100, then ranges of 100 up to 901–1,000, then ranges of 500 up to 3,501–4,000); 46,412 FgeneSH-predicted introns vs. 44,305 Genscan-predicted introns
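The three size-distribution slides sort predicted features into fixed size ranges before counting them. A minimal Python sketch of that binning, using the range boundaries listed on the slides, follows. The overflow labels (">4000", ">20000") are an assumption added so that longer features are not dropped; the exon/intron slides stop at 3,501–4,000 even though exons up to 5,789 nt are reported elsewhere in the talk.

```python
from bisect import bisect_left
from collections import Counter

# Inclusive upper bounds (bases) of the size ranges shown on the exon/intron slides.
EXON_INTRON_BINS = [50, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000,
                    1500, 2000, 2500, 3000, 3500, 4000]
# Gene-size slide: ranges of 1,000 bases up to 20,000, then "20001-above".
GENE_BINS = list(range(1000, 20001, 1000))

def bin_label(length: int, upper=EXON_INTRON_BINS) -> str:
    """Label of the size range containing `length`, e.g. '51-100'; beyond the last bound -> '>N'."""
    idx = bisect_left(upper, length)   # index of the first bound >= length
    if idx == len(upper):
        return f">{upper[-1]}"
    lower = 1 if idx == 0 else upper[idx - 1] + 1
    return f"{lower}-{upper[idx]}"

def size_histogram(lengths, upper=EXON_INTRON_BINS) -> Counter:
    """Count features (genes, exons or introns) per size range."""
    return Counter(bin_label(n, upper) for n in lengths)

if __name__ == "__main__":
    # Hypothetical exon lengths, for illustration only.
    print(size_histogram([45, 120, 268, 1200, 5789]))
```

Because the boundaries are inclusive upper bounds, `bisect_left` places a length equal to a boundary (for example 100) in the range that ends there, matching labels such as 51–100.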

  28. Gene Density of the ~450 Mb Medicago truncatula genome

      Measure                                        FgeneSH         Genscan
      Total number of genes                           13,397          11,488
      Total length of genes (bp)                  30,793,326      51,687,528
      Total exon length (bp)                      15,794,243      14,400,445
      Total number of exons                           59,808          55,792
      Total intron length (bp)                    14,999,083      37,287,083
      Total number of introns                         46,412          44,305
      Base pairs sequenced                        87,423,457      87,423,457
      Gene space (gene length / bp sequenced)            35%             59%
      Gene density (genes per 200 Mb)                 30,649          26,281
      Average gene spacing                     1 gene/6.5 kb   1 gene/7.6 kb

      For comparison, Arabidopsis: 25,498 protein-coding genes
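The derived rows of this table follow from the raw counts by simple arithmetic: gene length is exon plus intron length, gene space is gene length over bases sequenced, and the "per 200 Mb" figure scales the gene count linearly from the 87.4 Mb sequenced. The Python sketch below recomputes them from the slide's numbers; the class and method names are illustrative. Note that the linear extrapolation assumes gene density is uniform across the region it is scaled to.

```python
from dataclasses import dataclass

@dataclass
class GenePredictionSummary:
    program: str
    genes: int
    exon_bp: int
    intron_bp: int
    bp_sequenced: int

    @property
    def gene_bp(self) -> int:
        # Total gene length on the slide equals total exon length plus total intron length.
        return self.exon_bp + self.intron_bp

    def gene_space_pct(self) -> float:
        """Percent of sequenced bases that fall within predicted genes."""
        return 100 * self.gene_bp / self.bp_sequenced

    def kb_per_gene(self) -> float:
        """Average spacing: sequenced kilobases per predicted gene."""
        return self.bp_sequenced / self.genes / 1000

    def genes_per(self, region_bp: float) -> float:
        """Linear extrapolation of the gene count to a region of the given size."""
        return self.genes * region_bp / self.bp_sequenced

fgenesh = GenePredictionSummary("FgeneSH", 13_397, 15_794_243, 14_999_083, 87_423_457)
genscan = GenePredictionSummary("Genscan", 11_488, 14_400_445, 37_287_083, 87_423_457)

if __name__ == "__main__":
    for s in (fgenesh, genscan):
        print(f"{s.program}: {s.gene_space_pct():.0f}% gene space, "
              f"1 gene/{s.kb_per_gene():.1f} kb, {s.genes_per(200e6):,.0f} genes per 200 Mb")
    # FgeneSH: 35% gene space, 1 gene/6.5 kb, 30,649 genes per 200 Mb
    # Genscan: 59% gene space, 1 gene/7.6 kb, 26,281 genes per 200 Mb
```

The large gap in gene space (35% vs. 59%) comes almost entirely from intron length: the two programs predict similar total exon lengths, but Genscan's predicted introns total more than twice FgeneSH's.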

  29. Data Analysis and Annotation Schema: BAC sequences → catenated contig sequences (>5 kb) → Medic Repeats • BLASTX (Arabidopsis) • BLASTN (GenBank ESTs) • Genscan • FgeneSH • BLASTX (GenBank NR) • BLASTN (TIGR Plant Gene Indices) • tRNA/rRNA analysis • BLASTP (against KEGG A. thaliana) → KEGG metabolic reconstruction • GBrowse
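The schema starts by joining each BAC's larger contigs into a single sequence before the repeat, gene-prediction and similarity analyses. The slide does not say how the contigs were joined. The Python sketch below assumes they are concatenated with a run of N characters as a spacer (the 100-N spacer is hypothetical) and keeps a coordinate table so that hits on the joined sequence can be mapped back to the original contigs.

```python
SPACER = "N" * 100   # hypothetical spacer length; not specified on the slide

def catenate_contigs(contigs: dict, min_len: int = 5000, spacer: str = SPACER):
    """Join contigs longer than min_len bases into one sequence, separated by spacer.

    Returns (catenated_sequence, coords), where coords lists
    (contig_name, start, end) in 0-based, end-exclusive coordinates on the
    joined sequence.
    """
    kept = [(name, seq) for name, seq in contigs.items() if len(seq) > min_len]
    parts, coords, pos = [], [], 0
    for k, (name, seq) in enumerate(kept):
        if k:                       # spacer only between contigs, not at the ends
            parts.append(spacer)
            pos += len(spacer)
        coords.append((name, pos, pos + len(seq)))
        parts.append(seq)
        pos += len(seq)
    return "".join(parts), coords

def locate(coords, position: int):
    """Map a position on the catenated sequence to (contig_name, offset), or None inside a spacer."""
    for name, start, end in coords:
        if start <= position < end:
            return name, position - start
    return None
```

Using N as the spacer keeps tools such as BLAST and gene predictors from treating adjacent contigs as contiguous sequence, while the coordinate table makes it cheap to translate results back.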

  30. Medicago GC Content for ~90 Mb of Genomic BAC Clones Sequenced (mainly from gene-rich regions)
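The slide reports GC content over ~90 Mb of BAC sequence without stating how it was measured. A minimal Python sketch of the computation follows, assuming that ambiguous bases (N) are excluded from the denominator; the 1 kb window and 500 bp step are arbitrary choices for illustration.

```python
def gc_content(seq: str) -> float:
    """Percent G+C among unambiguous bases (A, C, G, T); other characters such as N are ignored."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return 100 * gc / total if total else 0.0

def windowed_gc(seq: str, window: int = 1000, step: int = 500):
    """GC percent in sliding windows; returns (window_start, gc_percent) pairs."""
    return [(start, gc_content(seq[start:start + window]))
            for start in range(0, max(len(seq) - window, 0) + 1, step)]

if __name__ == "__main__":
    # Hypothetical sequence: a GC-rich stretch followed by an AT-rich one.
    print(windowed_gc("GC" * 5 + "AT" * 5, window=10, step=10))
```

Excluding N from the denominator keeps unfinished gaps in draft BAC sequence from dragging the GC percentage down.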

  31. Metabolic Overview of Medicago: 13,396 FgeneSH-predicted genes classified using the COG database • Multiple COG hits 8% • DNA metabolism 23% • No hits 5% • Poorly characterized 17% • Cellular processes 23% • Metabolism 24%

  32. Metabolic Overview (detailed view) of Medicago: 13,396 FgeneSH-predicted genes classified using the COG database • Translation, ribosomal structure & biogenesis 7% • Multiple COG hits 8% • Transcription 5% • No hits 5% • DNA replication, recombination & repair 11% • Poorly characterized 17% • Cell division & chromosome partitioning 2% • Secondary metabolites biosynthesis, transport & catabolism 3% • Posttranslational modification, protein turnover, chaperones 5% • Lipid metabolism 2% • Cell envelope biogenesis, outer membrane 4% • Coenzyme metabolism 2% • Cell motility & secretion 3% • Nucleotide transport & metabolism 2% • Inorganic ion transport & metabolism 3% • Amino acid transport & metabolism 5% • Signal transduction mechanisms 5% • Carbohydrate transport & metabolism 4% • Energy production & conversion 5%
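Each percentage on these two slides is the share of predicted genes assigned to a COG functional category. How genes with no hit or with hits in several categories were handled is not stated; the Python sketch below assumes they are counted in separate "No Hits" and "Multiple COG Hits" groups, which matches the labels shown but is not confirmed by the talk. The gene IDs and assignments in the example are hypothetical.

```python
from collections import Counter

def cog_category_percentages(gene_hits: dict) -> dict:
    """Percent of predicted genes per COG functional category.

    gene_hits maps a gene ID to the set of COG categories its hits fall in.
    Assumed convention: no hit -> "No Hits"; hits in more than one category
    -> "Multiple COG Hits" (the gene is not split between categories).
    """
    if not gene_hits:
        return {}
    counts = Counter()
    for categories in gene_hits.values():
        if not categories:
            counts["No Hits"] += 1
        elif len(categories) > 1:
            counts["Multiple COG Hits"] += 1
        else:
            counts[next(iter(categories))] += 1
    return {cat: 100 * n / len(gene_hits) for cat, n in counts.items()}

if __name__ == "__main__":
    example = {  # hypothetical assignments for four genes
        "gene1": {"Transcription"},
        "gene2": set(),
        "gene3": {"Lipid metabolism", "Transcription"},
        "gene4": {"Transcription"},
    }
    print(cog_category_percentages(example))
```

Counting each gene exactly once keeps the categories summing to 100%, which is how the overview slide's six groups add up.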

  33. Gene Duplication: Three copies of the phosphoglycerate kinase gene in one BAC

  34. Gene Duplication: Three copies of phosphoglycerate kinase in one BAC

      AC138448.fg.10  MATKRSVGTLKEAELKGKRVFVRVDLNVPLDDNLNITDDTRIRAAVPTIKYLTGYGAKVILSSHL-----
      AC138448.fg.11  MA-KKSVGDLSGAELKGKKVFVRADLNVPLDDNQNITDDTRIRAAIPTIKYLIQNGAKVILSSHL-----
      AC138448.fg.8   MATKRSVGTLKEGELKGKRVFVRVDLNVPLDDNLNITDDTRIRAAVPTIKYLTGYGAKVILSSHLEIYKT

      AC138448.fg.10  ------------------------------------------GRPKGVTPKYSLKPLVPRLSELLGTQVK
      AC138448.fg.11  ------------------------------------------GRPKGVTPKYSLAPLVPRLSELIGIEVI
      AC138448.fg.8   EVSVSEYNLAVSEYKLAISDTYRYRIRVRHDSSPFLEYRGSQGRPKGVTPKYSLKPLVPRLSELLETQVK

      AC138448.fg.10  IADDSIGEEVEKLVAQIPEGGVLLLENVRFHKEEEKNDPEFAKKLASLADLYVNDAFGTAHRAHASTEGV
      AC138448.fg.11  KAEDSIGPEVEKLVASLPDGGVLLLENVRFYKEEEKNDPEHAKKLAALADLYVNDAFGTAHRAHASTEGV
      AC138448.fg.8   ISDDCIGEEVEKLVAQIPEGGVLLLENVRFHKEEEKNEPEFAKKLASLADLYVNDAFGTAHRAHASTEGV

      AC138448.fg.10  AKYLKPSVAGFLMQKELDYLVGAVSNPKKPFAAIVGGSKVSSKIGVIESLLEKVDILLLGGGMIFTFYKA
      AC138448.fg.11  TKYLKPSVAGFLLQKELDYLVGAVSSPKRPFAAIVGGSKVSSKIGVIESLLEKVDILLLGGGMIFTFYKA
      AC138448.fg.8   AKYLKPSVAGFLMQKELDYLVGAVSNPKKPFAAIVGGSKVSSKIGVIESLLEKVDILLLGGGMIYTFYKA

      AC138448.fg.10  QGYAVGSSLVEEDKLDLATTLIEKAKAKGVSLLLPTDVVIADKFAADANDKIVPASSIPDGWMGLDIGPD
      AC138448.fg.11  QGLAVGSSLVEEDKLELATTLIAKAKAKGVSLLLPSDVVIADKFAPDANSQIVPASAIPDGWMGLDIGPD
      AC138448.fg.8   QGYSIGSSLVEEDKLDLATSLMEKAKAKGVSLLLPTDVVIADKFSADANDKIVPASSIPDGWMGLDIGPD

      AC138448.fg.10  SIKTFNEALDKSQTIIWNGPMGVFEFDKFAAGTEAIAKKLAEVSGKGVTTIIGGGDSVAAVEKVGLADKM
      AC138448.fg.11  SIKTFNEALDTTQTIIWNGPMGVFEFDKFAVGTESIAKKLADLSGKGVTTIIGGGDSVAAVEKVGVADVM
      AC138448.fg.8   SIKTFNEALDKSQTIIWNGPMGVFEFDKFAAGTEAIAKKLAEVSGKGVTTIIGGGDSVAAVEKVGLADKM

      AC138448.fg.10  SHISTGGGASLELLEGKPLPGVLALDDA*         401 amino acids
      AC138448.fg.11  SHISTGGGASLELLEGKELPGVLALDEATPVAV*    405 amino acids, differs at 42 positions
      AC138448.fg.8   SHISTGGGASLELLEGKPLPGVLALDDA*         448 amino acids, differs at 6 positions

  35. Printrepeat analysis of M. truncatula BAC AC121240 vs. A. thaliana Chr. 2, showing expansion, duplication and repeat elements (figure labels: ~25 kb region; ~5 kb region)

  36. PIP (percent identity plot) of M. truncatula BAC AC121240 vs. A. thaliana Chr. 2

  37. Medicago truncatula Summary and Conclusions • Average predicted gene density of 1 gene per 6.5 kb (FgeneSH) to 1 gene per 7.6 kb (Genscan). • Genome characteristics such as %GC, intron/exon size and conserved, unique 5’ splice sites reveal features characteristic of Medicago. • The sequence of the Medicago truncatula genome shows homology to the sequenced Arabidopsis thaliana genome, but expansion, rearrangements and duplications are evident.

  38. Data Release and Preliminary Annotation • All our sequence data are available through links on our web site to GenBank and on our ftp site at URL: ftp.genome.ou.edu/medicago • Keyword and BLAST searches can be done on our web site at URL: http://www.genome.ou.edu/medicago.html • Additional annotation via a Genome Browser database is available on our web site at URL: http://www.genome.ou.edu/medicago_table.html • E-mail suggestions for additional annotation to Bruce Roe at: broe@ou.edu

  39. Three Year Plan • Obtain the contiguous sequence of the gene-rich regions of four of the eight Medicago truncatula chromosomes at OU, with the remaining four being completed by our international partners at TIGR, Sanger, and Genoscope. • This information will serve as a solid foundation for anticipated comparative and functional legume genomics.

  40. Laboratory Organization Bruce Roe, PI Support Teams Reagents & Equip. Maint. Informatics Production DNA Synthesis Administration Jim White Steve Kenton Hongshing Lai Sean Qian Rose Morales-Diaz* Mounir Elharam* Yonas Tesfai Steve Shaull** Doug White Work-study Undergraduates** Phoebe Loh* Sulan Qi Bart Ford* Mounir Elharam* Doug White Kay Lynn Hale Dixie Wishnuck Tami Womack Mary Catherine Williams Research Teams Limei Yang Angie Prescott* Audra Wendt** Mandi Aycock** Doris Kupfer Julia Kim* Sun So Graham Wiley** Lauren Ritterhouse** Ziyun Yao Steve Shaull* Youngju Yoon Axin Hua Weihong Xu ShaoPing Lin Honggui Jia Hongming Wu Baifang Qin Peng Zhang Fares Najar Chunmei Qu Keqin Wang Carson Qu Shuling Li Jami Milam Sara Downard** Trang Do Anh Do Lily Fu Yang Ye James Yu Tessa Manning** Stephan Deschamps Shelly Oommen Christopher Lau Yanhong Li Fu Ying Liping Zhou Ruihua Shi Junjie Wu Phoebe Loh* Sulan Qi Bart Ford* Lin Song Ying Ni Huarong Jiang. Funding from the Noble Foundation, DOE, and NSF. Collaborators at Univ. Minnesota, UC Davis, TIGR, Sanger, Genoscope, and the Noble Foundation. * Previous undergraduate research student; ** Present undergraduate research student

  41. The ACGT Team

  42. Conserved Intron/Exon Boundary Features: FELINES** analysis of 181,444 Medicago truncatula ESTs in GenBank vs. genomic sequence

      Feature     Size range        Mean length
      Exons       6–5,789 nt        268 nt
      Introns     20–3,921 nt       429 nt

      Intron conserved splice-site sequence elements      Percent
      Introns with 5’ GU                                  99.21%
      Introns with 5’ GC                                   0.36%*
      Introns with 5’ AU                                   0.31%
      Introns with U12 branch sites instead of A12         0.13%

      * Compared to 0.5–2.5% in fungi and 0.5% in mammals.
      Analysis performed with an EST minimum identity of 90%.
      ** S. Drabenstot, D. Kupfer, J. White, D. Dyer, B. Roe, K. Buchanan and J. Murphy. FELINES: A Utility for Extracting and Examining EST-Defined Introns and Exons. Nucleic Acids Research 31(22), e141 (2003).
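The splice-site percentages above are tallies of the first two bases of each EST-defined intron. The Python sketch below shows that tally for a list of intron sequences. It is not FELINES code, and it leaves out FELINES' EST-to-genome filtering (such as the 90% minimum identity); the input sequences in the test are hypothetical.

```python
from collections import Counter

def five_prime_class_percentages(introns):
    """Percent of introns by their first two bases, reported in RNA letters (GU, GC, AU, ...).

    DNA input is accepted: T is converted to U so that 'GT' and 'GU' pool together.
    """
    classes = Counter(seq.upper().replace("T", "U")[:2] for seq in introns if len(seq) >= 2)
    total = sum(classes.values())
    return {dinuc: 100 * n / total for dinuc, n in classes.items()} if total else {}

if __name__ == "__main__":
    print(five_prime_class_percentages(["GTAAGT", "GUAAGU", "GCAAGT", "ATATCC"]))
```

Normalizing T to U first means introns extracted from genomic DNA and from transcript-style records fall into the same GU, GC and AU classes used on the slide.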

  43. Consensus logogram of the 5’ GU vs. the 5’ AU class of introns in Medicago truncatula, determined by FELINES (panels: GU intron consensus; AU intron consensus)
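A logogram like the one on this slide stacks letters at each position of aligned splice sites, with the height of each stack given by that position's information content. The Python sketch below computes that per-position value in bits using the standard sequence-logo measure log2(4) − H, where H is the Shannon entropy of the base frequencies. It applies no small-sample correction, and whether FELINES computes its logos exactly this way is not stated on the slide.

```python
import math
from collections import Counter

def position_information(sites, alphabet: str = "ACGT"):
    """Per-position information content (bits) of aligned, equal-length sites.

    U is converted to T so that RNA and DNA inputs are treated alike.
    """
    sites = [s.upper().replace("U", "T") for s in sites]
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("sites must be aligned and of equal length")
    max_bits = math.log2(len(alphabet))
    info = []
    for i in range(length):
        counts = Counter(s[i] for s in sites if s[i] in alphabet)
        total = sum(counts.values())
        if total == 0:              # e.g. a column of N only
            info.append(0.0)
            continue
        entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
        info.append(max_bits - entropy)
    return info

if __name__ == "__main__":
    # Hypothetical 5' splice-site fragments.
    print(position_information(["GTAA", "GTAG", "GUGA", "GTGG"]))
```

A fully conserved position, such as the G of a GU intron, reaches the 2-bit maximum, while a position split evenly between two bases carries 1 bit.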
