Melampsora Genome Annotation and Genome Structure Analysis First Annotation Workshop of the Melampsora Genome Consortium. Yao-Cheng Lin Bioinformatics & Evolutionary Genomics VIB Department of Plant Systems Biology, UGent. Overview. Gene prediction (structure annotation)
Melampsora Genome Annotation and Genome Structure Analysis
First Annotation Workshop of the Melampsora Genome Consortium
Yao-Cheng Lin
Bioinformatics & Evolutionary Genomics
VIB Department of Plant Systems Biology, UGent
Overview
• Gene prediction (structure annotation)
• Gene family analysis
• Phylogenetic position of Melampsora
EuGène: gene prediction platform
[Figure: EuGène combines intrinsic and extrinsic information on the genomic sequence to produce predicted genes.
Intrinsic information: coding IMM, intronic IMM, content potential for coding, intronic and intergenic regions, translation start site, GT/AG splice sites (FunSiP).
Other prediction programs: alternative models.
Extrinsic information: BlastN, GenomeThreader, TblastX, BlastX and RepeatMasker, run against EST databases, protein databases, Puccinia genomic sequence, and a TE & repeat database.]
Resources for Melampsora gene prediction
• Gene models for training
  • Previously identified core genes in basidiomycetes
  • Genes with manual curation from INRA-Nancy
• Splice site training/prediction
  • FunSiP: developed by Michiel Van Bel, who also helped with the training
• BlastX database
  • 8 basidiomycete proteomes, Fungi RefSeq, SwissProt
• TBLASTX database
  • Puccinia graminis genomic sequence
• EST libraries
  • JGI Sanger sequencing
  • 454 pyrosequencing (the 1st MIRA assembly)
• Repeat libraries
  • Hadi/Marie-Pierre
  • In-house script; repeats collected from the first run of gene prediction
  • Masked area from JGI
• EuGène 3.4
Example: metallothionein-like protein
• Metallothionein-like protein in Magnaporthe
  • Protein length: 22 amino acids (MMT1)
  • Six cysteine residues
  • mmt1 mutants lose the ability to cause plant disease
• Difficulties in in silico identification
  • Sequence divergence
  • Short sequence, easily rejected by an E-value cutoff
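To illustrate why a short protein is easily lost at an E-value cutoff, the sketch below uses the standard Karlin-Altschul relation E = m * n * 2^(-S'), where S' is the bit score, m the query length and n the database length. The database size and the 1e-5 cutoff are hypothetical values chosen for illustration, and the effective-length corrections that BLAST applies are ignored.

```python
import math


def expected_evalue(bit_score: float, query_len: int, db_len: int) -> float:
    """E = m * n * 2**(-S') for bit score S' (no effective-length correction)."""
    return query_len * db_len * 2.0 ** (-bit_score)


def min_bit_score(evalue_cutoff: float, query_len: int, db_len: int) -> float:
    """Smallest bit score whose E-value still passes the given cutoff."""
    return math.log2(query_len * db_len / evalue_cutoff)


# Hypothetical search: a 22-residue query against 10 million database residues.
needed = min_bit_score(1e-5, query_len=22, db_len=10_000_000)
print(f"bit score needed for E <= 1e-5: {needed:.1f}")
```

Because the attainable bit score is bounded by the alignment length, a divergent 22-residue protein has little room to reach such a threshold.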
Overview
• Gene prediction and annotation platform
• Gene family analysis
• Phylogeny position of Melampsora
Gene family expansion and contraction
• Gene family clustering
  • Similarity search across 12 fungal genomes (10 basidiomycetes, 2 ascomycetes): all-against-all BLASTP, E-value cutoff 1e-5.
  • Gene families constructed by TribeMCL with inflation factor 4.0.
• Species/lineage-specific gene family expansions
  • The mean gene family size and standard deviation were calculated for all gene families (excluding species-specific families (SSFs) and orphans).
  • To center and normalize the data, the gene family profile matrix was transformed into a matrix of z-scores.
• Functional assignment
  • Domain based: RPS-BLAST
  • HMM profile for each family -> search the SwissProt and NR databases.
  • GO terms.
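The E-value filtering step of the all-against-all search could look like the sketch below. It assumes the BLASTP results are in the common 12-column tabular format, where the E-value is the 11th column; the function name and the example rows are hypothetical, and whether the original pipeline used this format is not stated on the slide.

```python
def filter_blast_hits(lines, evalue_cutoff=1e-5):
    """Yield (query, subject, evalue) for hits at or below the E-value cutoff.

    Assumes 12-column tab-separated BLAST output (E-value in column 11).
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        evalue = float(fields[10])
        if evalue <= evalue_cutoff:
            yield fields[0], fields[1], evalue


# Made-up rows for illustration only.
example = [
    "protA\tprotB\t85.0\t200\t30\t0\t1\t200\t1\t200\t1e-80\t290",
    "protA\tprotC\t30.0\t50\t35\t0\t1\t50\t1\t50\t0.5\t25.0",
]
hits = list(filter_blast_hits(example))
```

The retained pairs would then be passed to the clustering step (TribeMCL in the workflow above).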
Protein phylogeny profile / z-score
[Figure: a protein phylogeny profile (gene families by genomes) is converted into a z-score profile; core-gene families and species-specific gene families are marked.]
Z = (gene number - mean gene number) / standard deviation
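A minimal sketch of the z-score transformation, assuming each family is normalized across genomes and that the population standard deviation is used (the slide specifies neither). The species labels and counts are made up.

```python
from statistics import mean, pstdev


def zscore_profile(counts):
    """Convert one family's per-species gene counts into z-scores."""
    values = list(counts.values())
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    if sigma == 0:
        # Every species has the same count: no deviation to report.
        return {species: 0.0 for species in counts}
    return {species: (n - mu) / sigma for species, n in counts.items()}


# Hypothetical gene counts for one family in four genomes.
family = {"species_A": 12, "species_B": 10, "species_C": 2, "species_D": 4}
z = zscore_profile(family)
```

A positive z-score marks a genome where the family is larger than its average size, a negative one a genome where it is smaller.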
Difference in average gene family size
* Total 8035 families, excluding the species-specific families
Hierarchical clustering of gene families
[Figure: heat map; species in displayed order: N. crassa, M. grisea, S. roseus, P. graminis, M. larici-populina, U. maydis, M. globosa, P. placenta, P. chrysosporium, C. cinereus, L. bicolor, C. neoformans]
• Top 100 most variable profiles, ranked by standard deviation.
• Red: protein kinase, esterase/lipase, Cre recombinase, DNA/RNA helicase, leucine-rich repeat
• Blue: major facilitator superfamily
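Selecting the most variable profiles can be sketched as follows. It assumes the standard deviation is taken over each family's gene-count profile and uses the population standard deviation; the family identifiers and counts are made up, and the clustering itself is not shown.

```python
from statistics import pstdev


def most_variable_families(profiles, top_n=100):
    """Return family ids ranked by profile standard deviation, highest first."""
    ranked = sorted(profiles, key=lambda fam: pstdev(profiles[fam]), reverse=True)
    return ranked[:top_n]


# Hypothetical gene-count profiles (one value per genome).
profiles = {
    "fam1": [1, 1, 1, 1],
    "fam2": [0, 2, 9, 1],
    "fam3": [2, 3, 2, 3],
}
top = most_variable_families(profiles, top_n=2)
```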
Overview
• Gene prediction and annotation platform
• Gene family analysis
• Phylogeny position of Melampsora
Phylogenies of Melampsora
• Construct the Melampsora phylogenetic tree based on FUNYBASE with selected fungal genomes.
• FUNYBASE: single-copy gene families (246 genes) within 21 fungal species (mostly ascomycetes).
• 22 selected species:
  • Ascomycete: Aspergillus nidulans, Coccidioides immitis, Fusarium graminearum, Mycosphaerella graminicola, Magnaporthe grisea, Neurospora crassa, Nectria haematococca, Pyrenophora tritici-repentis, Stagonospora nodorum, Schizosaccharomyces pombe, Sclerotinia sclerotiorum
  • Basidiomycete: Coprinus cinereus, Cryptococcus neoformans, Laccaria bicolor, Malassezia globosa, Melampsora larici-populina, Phanerochaete chrysosporium, Puccinia graminis, Postia placenta, Sporobolomyces roseus, Ustilago maydis
  • Zygomycete: Rhizopus oryzae
* new genome; rejected in FUNYBASE
Phylogenies of Melampsora - Method
• 246 HMM models for the conserved protein sequence blocks in FUNYBASE.
• For each genome, run a HMMER search against the whole proteome and retain the protein sequence of the best hit for each model.
• 148 models have a single-copy gene in our 22 selected species.
• Concatenate the 148 single-copy orthologs for tree building.
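The selection and concatenation steps can be sketched as below. The data layout (per species, per model, a list of `(protein_id, score, sequence)` hits) and the rule "exactly one hit means single copy" are assumptions made for illustration; the slide does not say how single-copy status was decided or how the sequences were aligned before concatenation.

```python
def best_hit(hits):
    """Return the highest-scoring (protein_id, score, sequence) tuple."""
    return max(hits, key=lambda h: h[1])


def concatenate_single_copy(hits_by_species, models):
    """Keep models with exactly one hit in every species, then concatenate."""
    kept = [
        m for m in models
        if all(len(sp_hits.get(m, [])) == 1 for sp_hits in hits_by_species.values())
    ]
    concatenated = {
        species: "".join(best_hit(sp_hits[m])[2] for m in kept)
        for species, sp_hits in hits_by_species.items()
    }
    return kept, concatenated


# Made-up hits for two species and three models.
hits_by_species = {
    "sp1": {
        "m1": [("p1", 250.0, "MKT")],
        "m2": [("p2", 90.0, "AAV"), ("p3", 80.0, "AAL")],  # two copies: dropped
        "m3": [("p4", 120.0, "GG")],
    },
    "sp2": {
        "m1": [("q1", 240.0, "MRT")],
        "m2": [("q2", 95.0, "AAI")],
        "m3": [("q3", 110.0, "GA")],
    },
}
kept, concatenated = concatenate_single_copy(hits_by_species, ["m1", "m2", "m3"])
```

Concatenating in a fixed model order keeps homologous blocks in the same columns for every species.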
Melampsora in the phylogenetic tree of fungi
Tree built with Phylo_win: neighbor-joining method with Poisson correction, 500 bootstrap replicates.
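For reference, the Poisson correction converts the observed proportion of differing amino acids p into a distance d = -ln(1 - p). The sketch below computes it for a pair of aligned sequences; ignoring columns with a gap in either sequence is an assumption and may differ from how Phylo_win treats gaps.

```python
import math


def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing residues over columns without gaps."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not pairs:
        raise ValueError("no comparable columns")
    return sum(a != b for a, b in pairs) / len(pairs)


def poisson_distance(seq_a, seq_b):
    """Poisson-corrected distance d = -ln(1 - p)."""
    p = p_distance(seq_a, seq_b)
    if p >= 1.0:
        return math.inf  # every site differs: distance is undefined/infinite
    return -math.log(1.0 - p)
```

The resulting pairwise distance matrix is the input to neighbor joining; bootstrapping repeats the calculation on resampled alignment columns.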
Acknowledgements
Gent: Stephane Rombauts, Michiel Van Bel, Klaas Vandepoele, Kenny Billiau, Thomas Abeel, Pierre Rouzé, Lieven Sterck, Yves Van de Peer
Nancy: Stéphane Hacquard, Emilie Tisserant, Marie-Pierre Oudot-Le Secq, Sébastien Duplessis, Francis Martin