PLAZA: a comparative genomics resource to study gene and genome evolution in plants, Sebastian Proost

Presentation Transcript


  1. PLAZA: a comparative genomics resource to study gene and genome evolution in plants • Sebastian Proost • PSB 2010

  2. Introduction • Number of sequenced plant genomes is increasing rapidly • Currently 9 species (e.g. A. thaliana, O. sativa, poplar, …) • Number is growing … rapidly! • Abundance of data • Excellent dataset for comparative genomics • Transfer knowledge from model to non-model species • Find core characteristics of plant genomes • Study species- and lineage-specific adaptations • Genome evolution • …

  3. Challenges • Data integration • Sequencing by different consortia • Released in different formats and of varying quality • Difficult to analyze • Several pre-processing steps • Special hardware needs • New tools to analyze & visualize data

  4. Challenges • PLAZA • Pipeline to perform all pre-processing steps • Stores generated data in a database • Web interface • Browse the data, find relevant genes, gather more information • Study gene families, function, expansion/contraction, … • Analyze genome evolution, visualize effects of large-scale duplications • …

  5. Gene Family Evolution • Genome Evolution

  6. Initial Data • Structural annotation (where are the genes, what is their structure?) • Collected & stored in a uniform format • Quality checks • Functional annotation (what are they doing?) • Reactome data • Pathway information • Arabidopsis thaliana only • InterPro • Domain information • Provided or generated using InterProScan • Gene Ontology • Controlled vocabulary • Biological Process, Cellular Component and Molecular Function

  7. PLAZA overview • All annotation stored in a MySQL database • Can be browsed using Anno-J (Firefox recommended!) • BLASTP and graph-based clustering • Tribe-MCL to group homologous genes into gene families & OrthoMCL to group orthologous genes into subfamilies

  8. [Diagram: genome data from JGI, TAIR and TIGR → all-against-all sequence similarity search (BLAST) → Tribe-MCL and OrthoMCL clustering; similarity heatmap]
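Slides 7 and 8 name graph-based clustering of the all-against-all BLAST results with Tribe-MCL. The core of Tribe-MCL is the Markov Cluster (MCL) algorithm, which alternates expansion (squaring a column-stochastic flow matrix) and inflation (raising entries to a power, then renormalizing columns) until the matrix stops changing. The sketch below is a minimal pure-Python MCL for illustration only, not PLAZA's pipeline code; the toy 0/1 similarity matrix stands in for BLAST-derived weights, and the gene names and the inflation value of 2.0 are assumptions.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def _normalize_columns(m: Matrix) -> Matrix:
    """Scale every column so it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        if total > 0:
            for i in range(n):
                m[i][j] /= total
    return m


def _matmul(a: Matrix, b: Matrix) -> Matrix:
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mcl_clusters(genes: Sequence[str], sim: Matrix, inflation: float = 2.0,
                 max_iter: int = 100, tol: float = 1e-9) -> List[List[str]]:
    """Markov clustering of a symmetric similarity matrix (toy-scale sketch)."""
    n = len(genes)
    # Add self-loops, then make the matrix column-stochastic.
    m = _normalize_columns([[sim[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
                            for i in range(n)])
    for _ in range(max_iter):
        expanded = _matmul(m, m)                  # expansion: flow along longer paths
        inflated = _normalize_columns(            # inflation: favour strong flows
            [[x ** inflation for x in row] for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:                           # converged
            break
    # Each row with non-negligible entries (an attractor row) defines one cluster.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(genes[j] for j in c) for c in clusters)


# Hypothetical genes: two families with no cross-family BLAST hits.
genes = ["at_g1", "os_g1", "pt_g1", "at_g2", "os_g2"]
sim = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
]
print(mcl_clusters(genes, sim))
# → [['at_g1', 'os_g1', 'pt_g1'], ['at_g2', 'os_g2']]
```

The inflation parameter controls granularity: higher values split the graph into smaller, tighter families.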

  9. PLAZA overview • For each (sub)family, a multiple sequence alignment (MUSCLE) and a phylogenetic tree (PhyML) were generated • [Image: JalView]

  10. ATV/Archaeopteryx

  11. Improving data quantity and quality • An algorithm traverses the phylogenetic trees and detects monocot and dicot TROGs (tree-based orthologous groups) • Reliable GO annotation was projected onto all members of each TROG • New or improved functional information for 36,473 genes (now 39% of papaya genes have annotation)
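Slide 11 only summarizes the tree-based procedure. As a rough illustration, the sketch below treats a TROG as a maximal clade whose genes all come from one lineage (monocots or dicots) and from at least two species, then projects GO terms carrying experimental evidence codes onto every member. Both rules, the nested-tuple tree format, and the gene and GO identifiers are simplifying assumptions, not PLAZA's published criteria.

```python
from typing import Dict, List, Set, Tuple, Union

# A tree is either a leaf (gene ID) or a tuple of subtrees.
Tree = Union[str, tuple]

# GO experimental evidence codes; treating only these as "reliable" is an assumption.
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


def leaves(tree: Tree) -> List[str]:
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def find_trogs(tree: Tree, gene_info: Dict[str, Tuple[str, str]],
               min_species: int = 2) -> List[Set[str]]:
    """Maximal clades whose genes share one lineage and span >= min_species species.

    gene_info maps gene ID -> (species, lineage), e.g. ("ath", "dicot").
    """
    genes = leaves(tree)
    lineages = {gene_info[g][1] for g in genes}
    species = {gene_info[g][0] for g in genes}
    if len(lineages) == 1 and len(species) >= min_species:
        return [set(genes)]  # stop descending: this clade is maximal
    if isinstance(tree, str):
        return []
    return [trog for child in tree for trog in find_trogs(child, gene_info, min_species)]


def project_go(trogs: List[Set[str]],
               annotations: Dict[str, Set[Tuple[str, str]]]) -> Dict[str, Set[str]]:
    """Give every TROG member the reliable GO terms found on any member.

    annotations maps gene ID -> {(GO term, evidence code), ...}.
    """
    projected: Dict[str, Set[str]] = {}
    for trog in trogs:
        reliable = {term for g in trog
                    for term, code in annotations.get(g, set())
                    if code in EXPERIMENTAL_CODES}
        for g in trog:
            projected.setdefault(g, set()).update(reliable)
    return projected


# Hypothetical tree: a dicot clade and a monocot clade.
tree = ((("at1", "pt1"), "vv1"), ("os1", "sb1"))
gene_info = {"at1": ("ath", "dicot"), "pt1": ("ptr", "dicot"), "vv1": ("vvi", "dicot"),
             "os1": ("osa", "monocot"), "sb1": ("sbi", "monocot")}
annotations = {"at1": {("GO:A", "IDA"), ("GO:B", "IEA")}}
print(project_go(find_trogs(tree, gene_info), annotations))
```

Here the experimentally supported term GO:A on at1 reaches pt1 and vv1, while the electronically inferred GO:B is not projected and the monocot TROG receives nothing.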

  12. Colinearity features (very brief) • Genome evolution • i-ADHoRe was used to detect colinear blocks • Study remnants of large-scale duplications • Inversions, … • Tandem and/or block duplicates • Gene loss, retained duplicates, …
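i-ADHoRe's actual detection relies on gap thresholds and statistical validation that the slide does not describe. The sketch below only illustrates the underlying idea of a colinear block: homologous gene pairs (anchor points) that appear in the same, or reversed, order on two genomic segments. The greedy chaining and the `max_gap` and `min_anchors` defaults are hypothetical simplifications, not i-ADHoRe's algorithm.

```python
from typing import Dict, List, Sequence, Tuple


def colinear_blocks(order_a: Sequence[str], order_b: Sequence[str],
                    homolog_pairs: Sequence[Tuple[str, str]],
                    max_gap: int = 3, min_anchors: int = 3) -> List[List[Tuple[str, str]]]:
    """Greedily chain homologous gene pairs that keep their relative order.

    order_a / order_b: gene IDs in chromosomal order on two segments.
    A chain grows while both gene indices move by 1..max_gap, in a consistent
    direction on segment b (+1 = same order, -1 = inverted).
    """
    pos_a = {g: i for i, g in enumerate(order_a)}
    pos_b = {g: j for j, g in enumerate(order_b)}
    anchors = sorted((pos_a[x], pos_b[y]) for x, y in homolog_pairs
                     if x in pos_a and y in pos_b)
    chains: List[Dict] = []  # each: {"anchors": [(i, j), ...], "dir": 0, 1 or -1}
    for i, j in anchors:
        for chain in chains:
            pi, pj = chain["anchors"][-1]
            di, dj = i - pi, j - pj
            step = 1 if dj > 0 else -1
            if 0 < di <= max_gap and 0 < abs(dj) <= max_gap and chain["dir"] in (0, step):
                chain["anchors"].append((i, j))
                chain["dir"] = step
                break
        else:
            chains.append({"anchors": [(i, j)], "dir": 0})  # start a new chain
    return [[(order_a[i], order_b[j]) for i, j in c["anchors"]]
            for c in chains if len(c["anchors"]) >= min_anchors]


# Hypothetical segments and homolog pairs.
order_a = ["a1", "a2", "a3", "a4", "a5"]
order_b = ["b1", "b2", "b3", "b4", "b5"]
pairs = [("a1", "b1"), ("a2", "b2"), ("a4", "b3"), ("a5", "b1")]
print(colinear_blocks(order_a, order_b, pairs))
# → [[('a1', 'b1'), ('a2', 'b2'), ('a4', 'b3')]]
```

A chain with direction -1 corresponds to an inverted block, one of the features listed on the slide; the stray pair (a5, b1) breaks the ordering and is left out.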

  13. Website

  14. Data exploration

  15. [no text]

  16. [no text]

  17. [no text]

  18. Some past questions (1) • Orthologous dicot genes for reference gene X • Paralogs for gene X

  19. [no text]

  20. Some past questions (2) • “I’m looking for a list of genes without close paralogs in Arabidopsis” • Genes in orthologous groups with only 1 Arabidopsis gene, or based on phylogenetic trees • Cannot be done using the website → easy with direct access to the database
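Slide 20 notes that this question is easy to answer with direct database access. As a hedged sketch, the query below uses an in-memory SQLite table as a stand-in for the MySQL database; the `genes` table, its columns and the species codes are hypothetical, not PLAZA's real schema. It implements only the first criterion on the slide (orthologous groups containing a single Arabidopsis gene).

```python
import sqlite3
from typing import List


def build_demo_db() -> sqlite3.Connection:
    """In-memory stand-in for a PLAZA-like database (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE genes (gene_id TEXT PRIMARY KEY, species TEXT, group_id TEXT)")
    conn.executemany(
        "INSERT INTO genes VALUES (?, ?, ?)",
        [
            ("at_g1", "ath", "ORTHO1"), ("at_g2", "ath", "ORTHO1"), ("os_g1", "osa", "ORTHO1"),
            ("at_g3", "ath", "ORTHO2"), ("pt_g1", "ptr", "ORTHO2"),
        ],
    )
    return conn


def genes_without_close_paralogs(conn: sqlite3.Connection, species: str = "ath") -> List[str]:
    """Genes that are the only member of their species in their orthologous group."""
    query = """
        SELECT gene_id FROM genes
        WHERE species = ?
          AND group_id IN (
              SELECT group_id FROM genes
              WHERE species = ?
              GROUP BY group_id
              HAVING COUNT(*) = 1)
        ORDER BY gene_id
    """
    return [row[0] for row in conn.execute(query, (species, species))]


print(genes_without_close_paralogs(build_demo_db()))  # → ['at_g3']
```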

  21. Some past questions (3) • List of all genes/gene families involved in process X in Arabidopsis

  22. [no text]

  23. [no text]

  24. [no text]

  25. Some past questions (4) • I have a set of genes related to process X and would like to compare InterPro domains

  26. Using the phylogenetic tree

  27. Workbench

  28. Workbench • User specifies a set of genes • Requires registration; your workbench is private • The set of genes can be anything • Overexpressed genes, results of a TAP (tandem affinity purification) experiment, … • Use BLAST to map genes from a different organism (e.g. cDNA) to a species included in PLAZA • Using the Workbench toolbox • Compare duplication types, intron-exon structure & functional domains • Find all related genes & gene families • Map genes on the genome • Calculate GO enrichment (for all included species!)

  29. [no text]

  30. GO enrichment • For all species • Different formats • Raw output • Bar charts • Graphs (new)
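The slides do not say which statistic PLAZA uses for GO enrichment. A common choice, shown here as an assumption, is a one-sided hypergeometric test: with N background genes of which K carry a GO term, and a gene set of size n of which k carry it, the p-value is P(X ≥ k) = Σ_{i=k}^{min(K,n)} C(K,i)·C(N−K,n−i) / C(N,n). The sketch omits multiple-testing correction and propagation of annotations up the GO graph, which a real analysis would need; all gene and term IDs are hypothetical.

```python
from math import comb
from typing import Dict, Iterable, List, Set, Tuple


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N population, K annotated, n drawn)."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def go_enrichment(gene_set: Iterable[str], background: Iterable[str],
                  annotations: Dict[str, Set[str]]) -> List[Tuple[str, int, float, float]]:
    """Return (term, k, fold enrichment, p-value) per term, sorted by p-value."""
    background = set(background)
    genes = set(gene_set) & background
    N, n = len(background), len(genes)
    term_counts: Dict[str, int] = {}
    for g in background:                      # K: annotated genes per term
        for term in annotations.get(g, set()):
            term_counts[term] = term_counts.get(term, 0) + 1
    results = []
    for term, K in term_counts.items():
        k = sum(1 for g in genes if term in annotations.get(g, set()))
        if k == 0:
            continue  # term absent from the gene set: nothing to test
        fold = (k / n) / (K / N)
        results.append((term, k, fold, hypergeom_sf(k, N, K, n)))
    return sorted(results, key=lambda r: r[3])


background = [f"g{i}" for i in range(10)]
annotations = {"g0": {"GO:A"}, "g1": {"GO:A"}, "g2": {"GO:A"}, "g3": {"GO:B"}}
print(go_enrichment(["g0", "g1", "g2"], background, annotations))
```

In this toy case all three genes in the set carry GO:A, giving a fold enrichment of 10/3 and p = 1/120.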

  31. Proof-of-Concept Study • [Diagram: species-specific duplicates → divided into block & tandem duplicates → gene sets → PLAZA workbench → GO enrichment]
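Slide 31 splits species-specific duplicates into block and tandem duplicates before running GO enrichment on each gene set. One plausible way to make that split, sketched here under stated assumptions, is to call a pair tandem when both genes lie on the same chromosome within a few genes of each other, and block when the pair was found in a colinear block. The 10-gene threshold, the (chromosome, gene index) position format and the priority of tandem over block are hypothetical choices, not PLAZA's definitions.

```python
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[str, str]


def classify_duplicates(pairs: Sequence[Pair],
                        position: Dict[str, Tuple[str, int]],
                        block_pairs: Sequence[Pair],
                        max_tandem_distance: int = 10) -> Dict[str, List[Pair]]:
    """Split duplicate pairs into tandem, block and other (hypothetical rules).

    position maps gene ID -> (chromosome, gene index along that chromosome).
    block_pairs are pairs already found in colinear blocks (order-insensitive).
    """
    in_block = {frozenset(p) for p in block_pairs}
    result: Dict[str, List[Pair]] = {"tandem": [], "block": [], "other": []}
    for a, b in pairs:
        chrom_a, idx_a = position[a]
        chrom_b, idx_b = position[b]
        if chrom_a == chrom_b and abs(idx_a - idx_b) <= max_tandem_distance:
            result["tandem"].append((a, b))   # nearby on the same chromosome
        elif frozenset((a, b)) in in_block:
            result["block"].append((a, b))    # lies in a colinear block
        else:
            result["other"].append((a, b))
    return result


# Hypothetical gene positions and duplicate pairs.
position = {"g1": ("chr1", 5), "g2": ("chr1", 6), "g3": ("chr2", 40),
            "g4": ("chr3", 12), "g5": ("chr4", 3)}
pairs = [("g1", "g2"), ("g3", "g4"), ("g1", "g5")]
print(classify_duplicates(pairs, position, block_pairs=[("g4", "g3")]))
# → {'tandem': [('g1', 'g2')], 'block': [('g3', 'g4')], 'other': [('g1', 'g5')]}
```

The resulting tandem and block gene lists would then serve as the gene sets submitted to the workbench for GO enrichment.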

  32. [no text]

  33. Workbench • Revealed an enrichment for “response to biotic stimulus” in tandem duplicates of Arabidopsis, poplar and grapevine • Closer inspection of the genes causing the enrichment revealed: • Mostly related to bacterial response in Arabidopsis • In poplar, defense against fungi is increased • Grapevine displayed an intermediate pattern • Seems to be correlated with the number of fungal interactions reported by the USDA Agricultural Research Service and in the literature (Lucas, 1998)

  34. Microarray transcript profiling • [Diagram: grow plants under normal & stress conditions → differential microarray → detect up-regulated genes → PLAZA workbench → mapping, gene families, GO enrichment]

  35. Conclusion • PLAZA is an extensive resource • Step into comparative genomics without various pre-processing steps • All data stored in a database • Web interface to browse these data • Several tools & visualizations are available • Comprehensive comparative genomics data & tools are available to non-bioinformaticians

  36. Perspectives • More genomes • Dicots • Arabidopsis lyrata • Medicago truncatula • Glycine max • Lotus japonicus • Ricinus communis • Manihot esculenta • Cucumis sativus • Monocots • Zea mays • Brachypodium distachyon • More outgroups • Selaginella moellendorffii • Volvox carteri • … • Update genomes to the latest version • One species → more genomes • 1001 Arabidopsis genomes

  37. Perspectives • More data • Map ESTs & markers • Link with other platforms (e.g. CORNET) • More features • Interactive visualizations • Expand workbench

  38. URL: http://bioinformatics.psb.ugent.be/plaza • Extensive documentation, tutorials & FAQ on the website • PLAZA: a comparative genomics resource to study gene and genome evolution in plants, Plant Cell (in press) • Feedback is highly appreciated • Ideas & suggestions • Request new features • Bug & error reports • plaza@psb.vib-ugent.be

  39. Acknowledgments • Michiel Van Bel • Lieven Sterck, PhD • Thomas Van Parys • Klaas Vandepoele, PhD • Kenny Billiau • Yves Van de Peer, PhD, Prof. • plaza@psb.vib-ugent.be
