INSIGHTS ABOUT THE MEDICAGO TRUNCATULA GENOME BASED ON EXTENSIVE BAC SEQUENCING

S. Lin1, F. Ying1, F. Z. Najar1, A. Hua1, H.S. Lai1, S. Kenton1, J. White1, D. White1, S. Deshpande1, S. Qian1, M. Seigfried1, R. Shi1, L. Song1, I. Vasylenko1, J. Wu1, W. Xu1, X. Xu1, J. Yi1, L. Yang1, Z. Yao1, A. Do1, H. Jia1, S. Qi1, B. Qin1, L. Zhou1, S. Downard1, C. Qu1, K. Wang1, Y. Ye1, Y. Xing1, A. Zhang1, T. Do1, M. Elharam1, S. Shaull1, C. D. Town2, DJ. Kim3, D. R. Cook3, E. Retzel4, N. D. Young5, and B. A. Roe1

1 The Advanced Center for Genome Technology (ACGT), Dept. of Chemistry and Biochemistry, University of Oklahoma, 620 Parrington Oval, Norman, OK 73019
2 The Institute for Genomic Research (TIGR), 9712 Medical Center Drive, Rockville, MD 20850
3 Dept. of Plant Pathology, University of California-Davis, 1 Shields Avenue, Davis, CA 95616
4 Center for Computational Biology and Bioinformatics, University of Minnesota, MMC 43, 420 Delaware Street SE, Minneapolis, MN 55455
5 Dept. of Plant Pathology, University of Minnesota, 495 Borlaug Hall, Saint Paul, MN 55108

Analysis

Medicago truncatula (barrel medic), an important forage crop, is widely considered a model legume for laboratory studies. It is genetically tractable, with a relatively small genome of ~470 million base pairs, simple Mendelian genetics, a short seed-to-seed generation time, relatively high transformation efficiency, an excellent collection of phenotypic mutants, and large collections of diverse, naturally occurring ecotypes. Recent work at UC Davis has resulted in construction of a ~30X BAC library and fingerprinting to a depth of ~12X. Parallel research at the Noble Foundation, University of Minnesota, Cornell University, Genoscope and TIGR has generated over 185,000 expressed sequence tags (ESTs) representing genes expressed in almost every Medicago truncatula tissue, developmental stage and growth condition.

Building on these outstanding genomic resources for M. truncatula, we recently initiated work to sequence the gene-space of M. truncatula. Sample sequence data from an initial whole genome shotgun identified approximately 10% of the genome as short tandem repeat sequences. Together with cytogenetic analysis conducted by the Bisseling laboratory in Wageningen, we confirmed that the eight chromosomes of M. truncatula are organized into distinct gene-rich euchromatic regions and separate repeat-rich pericentromeric regions.

With funding from the Noble Foundation, the Advanced Center for Genome Technology (ACGT) at the University of Oklahoma is completing the working draft sequence of ~1000 mapped M. truncatula BAC clones, and is finishing a significant number of these BACs with additional funding from the U.S. Department of Energy. Funding from the National Science Foundation will now enable us to completely sequence the euchromatic regions of six M. truncatula chromosomes as a collaborative effort involving the ACGT (chromosomes 1, 4, 6, and 8) and TIGR (chromosomes 2 and 7), coordinated through the University of Minnesota. Additional funding from the European Union FP6 programme for work on the two remaining chromosomes (chromosome 3 at Sanger/JIC and chromosome 5 at Genoscope/INRA) will make this a truly international effort.

The results of these studies to date indicate that the gene density in M. truncatula is on the order of one gene in every 6-7 kilobase pairs (kbp). The ~200 Mbp of euchromatic regions that we will sequence is therefore expected to encode ~30,000 to 33,000 genes. Following in the tradition of other genome projects, all our sequence data is made freely and rapidly available through the international databases.

Functional Distribution Overview of M. truncatula Predicted ORFs (via BlastP vs Arabidopsis)
• Putative 40%
• No Arabidopsis Homologue 29%**
• Metabolism 28%
• Genetic Information & Processing 2%
• Cellular Processes 1%
• Environmental Information Processing <1%
** Only 3% of the 29% with no Arabidopsis homologue had homology to GenBank, with the majority homologous to genes in pea and rice.

Metabolism
• Metabolism of Complex Lipids 19%
• Biodegradation of Xenobiotics 17%
• Biosynthesis of Secondary Metabolites 13%
• Metabolism of Complex Carbohydrates 13%
• Metabolism of Cofactors & Vitamins 11%
• Carbohydrate Metabolism 10%
• Amino Acid Metabolism 6%
• Energy Metabolism 5%
• Nucleotide Metabolism 3%
• Metabolism of Other Amino Acids 2%
• Lipid Metabolism 1%

Genetic Information & Processing
• Translation 54%
• Sorting and Degradation 23%
• Transcription 20%
• Replication 3%

Data Analysis and Annotation Schema
[Figure: flow diagram. Labels: BAC shotgun sequence data; Concatenated Contig Sequences (>5 KB); Genscan; FgeneSH; BLASTX (Arab.); BLASTN (GB-EST); BLASTX (GB-NR); BLASTN (TIGR_Plant gene Indices); BLASTP (against KEGG-A. thaliana); KEGG Metabolic Reconstruction; GBrowse]

Medicago truncatula Sequencing Progress at OU
[Figure: chart of BAC Sequences (axis 0 to 120,000,000) starting May-02, with October 1, 2003 and May 4, 2004 marked. Labels: Phase 1; Phase 2; Phase 3; Total; Medic Repeats ~20 Mb; Finished ~20 Mb]

Examples of Reconstructed Pathways in M. truncatula (Purine and Pyrimidine Metabolism)
[Figure key: Green Background = A. thaliana gene; Red Numbers = Gene also found in M. truncatula]

Supercontig Construction
• Using a modified version of the SSAHA algorithm coupled with crossmatch:
  • Identify the sequence overlaps between BACs
  • Search the BAC-end sequences for overlap
[Figure key: Light Green = one BAC end matches an OU-ACGT BAC; Dark Green = two BAC ends match an OU-ACGT BAC; Dark Blue = OU-ACGT BAC sequence; Purple = BACs with BAC-end sequences]
BAC sequence overlaps are scored by combining the extent of overlap and the number of supporting BAC-end pairs. Super-contigs are then built in order from the highest-scoring overlaps to the lowest-scoring overlaps.
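The scoring and ordering described for supercontig construction can be illustrated with a minimal sketch. The poster does not say how overlap extent and BAC-end support are combined, so the additive `score` function and its `pair_weight` parameter are assumptions, as are the BAC names and the rule that each BAC joins at most two neighbors. This is a greedy stand-in for the idea, not the modified SSAHA/crossmatch pipeline itself.

```python
from collections import defaultdict

def score(overlap_bp, end_pairs, pair_weight=10_000):
    # Hypothetical scoring: overlap length plus a bonus per supporting
    # BAC-end pair. The actual combination is not given in the poster.
    return overlap_bp + pair_weight * end_pairs

def build_supercontigs(overlaps, pair_weight=10_000):
    """Greedily join BACs, highest-scoring overlap first.

    overlaps: iterable of (bac_a, bac_b, overlap_bp, end_pairs).
    Returns (groups, accepted): sorted BAC groups and the joins used.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving; registers unseen BACs.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    degree = defaultdict(int)  # number of accepted joins per BAC
    accepted = []
    ranked = sorted(overlaps,
                    key=lambda o: score(o[2], o[3], pair_weight),
                    reverse=True)
    for a, b, _bp, _pairs in ranked:
        ra, rb = find(a), find(b)
        # Skip joins that would close a cycle or give a BAC a third
        # neighbor, so every group stays a linear chain.
        if ra == rb or degree[a] >= 2 or degree[b] >= 2:
            continue
        parent[ra] = rb
        degree[a] += 1
        degree[b] += 1
        accepted.append((a, b))

    groups = defaultdict(list)
    for bac in parent:
        groups[find(bac)].append(bac)
    return sorted(sorted(g) for g in groups.values()), accepted

# Made-up example data: (BAC, BAC, overlap in bp, supporting end pairs).
example = [("A", "B", 5000, 2), ("B", "C", 3000, 0),
           ("A", "C", 1000, 0), ("D", "E", 2000, 1)]
groups, joins = build_supercontigs(example)
print(groups)
print(joins)
```

In the example, the A-C overlap is discarded because A and C are already linked through B by higher-scoring joins.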
A detailed annotation of the sequenced BACs is updated daily on our GBrowse website (http://dna8.chem.ou.edu/cgi-bin/gbrowse?source=medicago_new), where BAC accession numbers can be used to view and download the annotations, including the output of gene prediction programs (Genscan and FgeneSH) and BlastX against Arabidopsis. We are also working to build an ontology database to be associated with our current annotations using the Arabidopsis GO database (TAIR at www.geneontology.org). All predicted genes are also compared by BLAST against the KOG database (http://www.ncbi.nlm.nih.gov/COG/new/kognitor.html), from which a preliminary metabolic outlook can be generated for M. truncatula.

• Why sequence the Medicago genome?
  • An important forage crop
  • A genetically tractable model legume
  • A relatively small (~450 Mbp) diploid genome
  • Active legume research community
  • Medicago Research Consortium
  • Large collection of ESTs
  • Excellent BAC library
  • Integrated physical and genetic map
  • Large number of BAC-end sequences

Medicago truncatula Sequencing Centers

Chromosome   Center           Estimated Size (a)   Projected BACs (b)
1            Oklahoma         51 µm / 30 Mbp       156
2            TIGR             42 µm / 25 Mbp       130
3            Sanger/JIC       63 µm / 38 Mbp       197
4            Oklahoma         61 µm / 37 Mbp       192
5            Genoscope/INRA   39 µm / 23 Mbp       119
6            Oklahoma         22 µm / 13 Mbp       68
7            TIGR             43 µm / 26 Mbp       135
8            Oklahoma         26 µm / 16 Mbp       83

(a) Size estimates are based on observations of pachytene chromosomes and previous (upper) estimates of 600 kbp / µm in euchromatic regions of the Medicago genome.
(b) Projections are based on estimated chromosome size, a total euchromatic size of 200 Mbp, an average 100 kbp non-overlapping coverage by each BAC, and proportional distribution of previously sequenced BAC clones.
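As a rough check on footnote (a), the sketch below converts each pachytene length to Mbp at 600 kbp / µm and totals the projected BACs per center. The data are copied from the Sequencing Centers table; the variable and function names are illustrative only, and the table's Mbp values appear to be rounded, so the conversion is expected to agree only to within about 1 Mbp.

```python
KBP_PER_UM = 600  # upper estimate for euchromatin, from footnote (a)

# chromosome: (center, length in µm, listed Mbp, projected BACs)
chromosomes = {
    1: ("Oklahoma", 51, 30, 156),
    2: ("TIGR", 42, 25, 130),
    3: ("Sanger/JIC", 63, 38, 197),
    4: ("Oklahoma", 61, 37, 192),
    5: ("Genoscope/INRA", 39, 23, 119),
    6: ("Oklahoma", 22, 13, 68),
    7: ("TIGR", 43, 26, 135),
    8: ("Oklahoma", 26, 16, 83),
}

def um_to_mbp(length_um):
    """Convert pachytene chromosome length (µm) to Mbp."""
    return length_um * KBP_PER_UM / 1000

# Sum projected BACs for each sequencing center.
bacs_by_center = {}
for center, _um, _mbp, bacs in chromosomes.values():
    bacs_by_center[center] = bacs_by_center.get(center, 0) + bacs

total_mbp = sum(mbp for _c, _um, mbp, _b in chromosomes.values())
total_bacs = sum(bacs_by_center.values())

for chrom, (center, um, mbp, bacs) in chromosomes.items():
    print(chrom, center, f"{um_to_mbp(um):.1f} Mbp (listed {mbp})", bacs)
print(bacs_by_center, total_mbp, total_bacs)
```

The listed sizes sum to 208 Mbp and the projections to 1,080 BACs, close to the ~200 Mbp euchromatic total quoted in footnote (b).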
Gene Density of the ~450 Mb Medicago truncatula genome

                                        FgeneSH         Genscan
Total number of genes                   13,397          11,488
Total length of genes                   30,793,326      51,687,528
Total exon length                       15,794,243      14,400,445
Total number of exons                   59,808          55,792
Total intron length                     14,999,083      37,287,083
Total number of introns                 46,412          44,305
Gene Space (Gene Length/BP Sequenced)   35%             59%
Gene Density (Genes/200 Mb)             30,649          26,281
                                        1 gene/6.5 kb   1 gene/7.6 kb

Base Pairs Sequenced
Total: 95,660,469   Phase 1: 24,169,898   Phase 2: 32,579,732   Phase 3: 38,910,839

Arabidopsis: 25,498 protein coding genes

• Three Year Plan
  • Obtain the contiguous sequence of the gene-rich regions of four of the eight Medicago truncatula chromosomes at OU, with the remaining four being completed by our international partners at TIGR, Sanger, and Genoscope.
  • This information will serve as a solid foundation for anticipated comparative and functional legume genomics.
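The arithmetic behind the gene density figures can be checked with a short sketch. The numbers are copied from the Gene Density table and the Base Pairs Sequenced row; the names are illustrative, and assigning the three phase values to Phase 1, 2 and 3 in the order listed is an assumption. The sketch does not try to reproduce the 35% and 59% gene-space values, because the exact sequence length used as their denominator is not stated.

```python
EUCHROMATIN_BP = 200_000_000  # ~200 Mbp of euchromatic sequence

predictions = {
    "FgeneSH": {"genes": 13_397, "gene_bp": 30_793_326,
                "exon_bp": 15_794_243, "intron_bp": 14_999_083,
                "genes_per_200mb": 30_649},
    "Genscan": {"genes": 11_488, "gene_bp": 51_687_528,
                "exon_bp": 14_400_445, "intron_bp": 37_287_083,
                "genes_per_200mb": 26_281},
}

# Phase labels assumed to follow the order of the listed values.
phases = {"Phase 1": 24_169_898, "Phase 2": 32_579_732,
          "Phase 3": 38_910_839}
TOTAL_BP_SEQUENCED = 95_660_469

def kb_per_gene(n_genes, region_bp):
    """Average spacing, in kb, of n_genes spread over region_bp."""
    return region_bp / n_genes / 1000

def genes_in_region(region_bp, spacing_kb):
    """Expected gene count at one gene per spacing_kb kilobases."""
    return round(region_bp / (spacing_kb * 1000))

for name, p in predictions.items():
    # Exon and intron lengths should add up to total gene length.
    consistent = p["exon_bp"] + p["intron_bp"] == p["gene_bp"]
    spacing = kb_per_gene(p["genes_per_200mb"], EUCHROMATIN_BP)
    print(name, "exon+intron consistent:", consistent,
          f"1 gene/{spacing:.1f} kb")

print("phases sum to total:", sum(phases.values()) == TOTAL_BP_SEQUENCED)
print("genes at 6 kb and 7 kb spacing:",
      genes_in_region(EUCHROMATIN_BP, 6), genes_in_region(EUCHROMATIN_BP, 7))
```

One gene per 6 to 7 kb over 200 Mbp corresponds to roughly 28,600 to 33,300 genes, which brackets both scaled predictions in the table.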