PLAZA: a comparative genomics resource to study gene and genome evolution in plants
Sebastian Proost
PSB 2010
Introduction
• Number of sequenced plant genomes is increasing rapidly
• Currently 9 species (e.g. A. thaliana, O. sativa, poplar, …)
• Number is growing … rapidly!
• Abundance of data
• Excellent dataset for comparative genomics
• Transfer knowledge from model to non-model species
• Find core characteristics of plant genomes
• Study species-/lineage-specific adaptations
• Genome evolution
• …
Challenges
• Data integration
• Sequencing by different consortia
• Released in different formats & quality
• Problematic to analyze
• Several pre-processing steps
• Special hardware needs
• New tools to analyze & visualize data
Challenges
• PLAZA
• Pipeline to perform all pre-processing steps
• Stores generated data in a database
• Web-interface
• Browse the data, find relevant genes, gather more information
• Study gene families, function, expansion/contraction, …
• Analyze genome evolution, visualize effects of large scale duplications
• …
Gene Family Evolution
Genome Evolution
Initial Data
• Structural Annotation (where are the genes, what is their structure)
• Collected & stored in a uniform format
• Quality checks
• Functional annotation (what are they doing)
• Reactome Data
• Pathway information
• Arabidopsis thaliana only
• InterPro
• Domain information
• Provided or generated using InterProScan
• Gene Ontology
• Controlled vocabulary
• Biological Process, Cellular Component and Molecular Function
PLAZA overview
• All annotation is stored in a MySQL database
• Can be browsed using Anno-J
• (Firefox recommended!)
• BLASTP and graph-based clustering
• Tribe-MCL to group homologous genes into gene families and OrthoMCL to group orthologous genes into sub-families
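The idea of turning all-against-all BLASTP hits into gene families can be sketched in a few lines. This is only an illustration and not the Tribe-MCL or OrthoMCL algorithm: it swaps in plain connected-components (single-linkage) grouping, which is much cruder than Markov clustering. The function name, the e-value cutoff and the gene identifiers are all assumptions made for the example.

```python
from collections import defaultdict


def cluster_families(hits, max_evalue=1e-5):
    """Group genes into families as connected components of the hit graph.

    hits: iterable of (query_id, subject_id, evalue) tuples, e.g. parsed
    from tabular BLASTP output. The e-value cutoff is an assumption.
    """
    parent = {}

    def find(gene):
        # Union-find lookup with path halving.
        parent.setdefault(gene, gene)
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]
            gene = parent[gene]
        return gene

    for query, subject, evalue in hits:
        root_q, root_s = find(query), find(subject)  # registers both genes
        if evalue <= max_evalue and root_q != root_s:
            parent[root_s] = root_q  # merge the two families

    families = defaultdict(list)
    for gene in list(parent):
        families[find(gene)].append(gene)
    return sorted(sorted(members) for members in families.values())


# Hypothetical gene identifiers, for illustration only.
example_hits = [
    ("ath_1", "ath_2", 1e-50),
    ("ath_2", "osa_1", 1e-20),
    ("ptr_1", "ptr_2", 1e-30),
    ("ptr_1", "ath_1", 0.5),  # weak hit, ignored by the cutoff
]
families = cluster_families(example_hits)
```

Single-linkage grouping tends to merge unrelated families through a single spurious hit, which is one reason graph clustering methods such as MCL are preferred in practice.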
[Figure. Labels: JGI, TAIR, TIGR; All-against-all sequence similarity search (BLAST); Tribe-MCL; OrthoMCL; Similarity Heatmap]
PLAZA overview
• For each (sub-)family, a multiple sequence alignment (MUSCLE) and a phylogenetic tree (PhyML) were generated
JalView
ATV/Archaeopteryx
Improving data quantity and quality
• An algorithm traverses the phylogenetic trees and detects monocot & dicot TROGs (tree-based orthologous groups)
• Reliable GO annotation was projected onto all members of each TROG
• New or improved functional information for 36,473 genes (now 39% of papaya genes have annotation)
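The projection step can be illustrated with a small sketch. It assumes the orthologous groups are already known and that each annotation carries a GO evidence code; which evidence codes count as "reliable" is an assumption here, as are the function name, the gene identifiers and the placeholder GO identifiers. The tree traversal that detects the TROGs is not shown.

```python
def project_go(trogs, annotations, reliable=frozenset({"EXP", "IDA", "IMP", "TAS"})):
    """Return, per gene, the GO terms newly gained by projection.

    trogs: dict mapping a group id to a list of member gene ids.
    annotations: dict mapping a gene id to a list of (go_term, evidence_code).
    reliable: evidence codes treated as reliable (an assumption).
    """
    gained = {}
    for members in trogs.values():
        # Collect every reliably supported term found in this group.
        group_terms = {
            term
            for gene in members
            for term, evidence in annotations.get(gene, [])
            if evidence in reliable
        }
        for gene in members:
            own_terms = {term for term, _ in annotations.get(gene, [])}
            new_terms = group_terms - own_terms
            if new_terms:
                gained.setdefault(gene, set()).update(new_terms)
    return gained


# Hypothetical group, genes and placeholder GO identifiers.
trogs = {"trog_1": ["ath_1", "cpa_1", "vvi_1"]}
annotations = {
    "ath_1": [("GO:0000001", "IDA"), ("GO:0000002", "IEA")],
    "vvi_1": [("GO:0000001", "IEA")],
}
gained = project_go(trogs, annotations)
```

In this example only the experimentally supported term is projected; the electronically inferred term on `ath_1` is not passed on to the other members.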
Colinearity features (very brief)
• Genome evolution
• i-ADHoRe was used to detect colinear blocks
• Study remnants of large-scale duplications
• Inversions, …
• Tandem and/or block duplicates
• Gene loss, retained duplicates, …
Website
Data exploration
Some past questions (1)
• Orthologous dicot genes for reference gene X
• Paralogs for gene X
Some past questions (2)
• I’m looking for a list of genes without close paralogs in Arabidopsis
• Genes in orthologous groups with only 1 Arabidopsis gene, or based on phylogenetic trees
• Cannot be done using the website
• Easy when using direct access to the database
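A query of this kind might look roughly like the sketch below. The table and column names (`gene_family`, `gene_id`, `family_id`, `species`) are hypothetical and do not reflect the actual PLAZA schema, and an in-memory SQLite database stands in for the MySQL server so that the example runs on its own.

```python
import sqlite3

# In-memory stand-in for the real database; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gene_family (gene_id TEXT, family_id TEXT, species TEXT)")
conn.executemany(
    "INSERT INTO gene_family VALUES (?, ?, ?)",
    [
        ("ath_1", "fam_1", "ath"),
        ("ath_2", "fam_1", "ath"),
        ("ath_3", "fam_2", "ath"),
        ("ptr_1", "fam_2", "ptr"),
        ("ptr_2", "fam_2", "ptr"),
    ],
)

# Genes of one species whose family holds no other gene of that species.
query = """
    SELECT gene_id
    FROM gene_family
    WHERE species = :species
      AND family_id IN (
          SELECT family_id
          FROM gene_family
          WHERE species = :species
          GROUP BY family_id
          HAVING COUNT(*) = 1
      )
    ORDER BY gene_id
"""
single_copy = [row[0] for row in conn.execute(query, {"species": "ath"})]
```

Here `ath_3` is the only gene returned, because its family contains no second gene from the same species, even though the family has two poplar members.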
Some past questions (3)
• List of all genes/gene families involved in process X in Arabidopsis
Some past questions (4)
• I have a set of genes related to process X and would like to compare InterPro domains
Using the phylogenetic tree
Workbench
Workbench
• User specifies a set of genes
• Requires registration; your workbench is private
• Set of genes can be anything
• Overexpressed genes, results of a TAP, …
• Use BLAST to map genes from a different organism to a species included in PLAZA (e.g. cDNA)
• Using the Workbench toolbox
• Compare duplication types, intron-exon structure & functional domains
• Find all related genes & gene families
• Map genes on the genome
• Calculate GO-enrichment (for all species included!)
GO-enrichment
• For all species
• Different formats
• Raw output
• Bar charts
• Graphs (new)
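The slides do not state which statistical test the workbench uses. One common way to compute GO enrichment is a one-sided hypergeometric test, sketched below under that assumption; the function names and the toy data are hypothetical, and no multiple-testing correction is applied.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) when drawing n genes from N, of which K carry the term."""
    total = comb(N, n)
    upper = min(n, K)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def go_enrichment(gene_set, background):
    """Return {go_term: p-value} for terms present in gene_set.

    background: dict mapping every gene of the species to its set of GO terms.
    """
    gene_set = set(gene_set) & set(background)
    results = {}
    for term in set().union(*(background[g] for g in gene_set)):
        K = sum(term in terms for terms in background.values())  # genes with term
        k = sum(term in background[g] for g in gene_set)  # hits in the gene set
        results[term] = hypergeom_pvalue(k, len(gene_set), K, len(background))
    return results


# Toy background of 10 hypothetical genes; "GO:x" marks only the first three.
background = {f"g{i}": {"GO:y"} for i in range(10)}
for gene in ("g0", "g1", "g2"):
    background[gene].add("GO:x")
pvalues = go_enrichment(["g0", "g1", "g2"], background)
```

A term carried by every gene in the background ("GO:y" here) can never be enriched, while "GO:x" gets a small p-value because all three of its carriers fall inside the gene set.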
Proof-of-Concept Study
[Workflow figure. Labels: Species specific duplicates; Divided in block & tandem duplicates; Gene-sets; PLAZA workbench; GO enrichment]
Workbench
• Revealed an enrichment for “response to biotic stimulus” in tandem duplicates of Arabidopsis, poplar and grapevine
• Closer inspection of the genes causing the enrichment revealed
• Mostly related to bacterial response in Arabidopsis
• In poplar, defense against fungi is increased
• Grapevine displayed an intermediate pattern
• Seems to be correlated with the number of fungal interactions reported by the USDA Agricultural Research Service and in the literature (Lucas, 1998)
Microarray transcript profiling
[Workflow figure. Labels: Detect up-regulated genes; Grow plants normal & stress conditions; Differential microarray; PLAZA workbench; Mapping; Gene Families; GO enrichment]
Conclusion
• PLAZA is an extensive resource
• Step into comparative genomics without various pre-processing steps
• All data is stored in a database
• Web-interface to browse these data
• Several tools & visualizations are available
• Comprehensive comparative genomics data & tools are available for non-bioinformaticians
Perspectives
• More genomes
• Dicots: Arabidopsis lyrata, Medicago truncatula, Glycine max, Lotus japonicus, Ricinus communis, Manihot esculenta, Cucumis sativus
• Monocots: Zea mays, Brachypodium distachyon
• More outgroups: Selaginella moellendorffii, Volvox carteri, …
• Update genomes to latest version
• One species, more genomes: 1001 Arabidopsis genomes
Perspectives
• More data
• Map EST & markers
• Link with other platforms (e.g. CORNET)
• More features
• Interactive visualizations
• Expand workbench
URL: http://bioinformatics.psb.ugent.be/plaza
• Extensive documentation, tutorials & FAQ on the website
• PLAZA: a comparative genomics resource to study gene and genome evolution in plants, Plant Cell (in press)
• Feedback is highly appreciated
• Ideas & suggestions
• Request new features
• Bug & error reports
• plaza@psb.vib-ugent.be
Acknowledgments
Michiel Van Bel
Lieven Sterck, PhD
Thomas Van Parys
Klaas Vandepoele, PhD
Kenny Billiau
Yves Van de Peer, PhD, Prof.
plaza@psb.vib-ugent.be