Mlp Summer workshop – INRA Nancy, August 20-21 2008. The genome sequence of Melampsora larici-populina, the causal agent of the poplar rust disease. Gene content in the Mlp genome (automated annotation). Duplessis Sébastien (INRA Nancy).
Mlp Summer workshop – INRA Nancy, August 20-21 2008
The genome sequence of Melampsora larici-populina, the causal agent of the poplar rust disease
Gene content in the Mlp genome (automated annotation)
Duplessis Sébastien (INRA Nancy), Tree/Microbe Interactions Joint Unit, INRA/University Nancy, UMR 1136 IAM
Annotation of Mlp genome – Gene prediction 2006-2007 [pipeline diagram]
• Gene predictors: EuGene, FGeneSH, Genewise
• Intrinsic approaches: coding potential search, SpliceMachine, Netstart, repeats
• Extrinsic approaches: Blastn, Blastx, tBlastx against Puccinia, Sporobolomyces, basidiomycetes, Mlp ESTs, Swissprot
• Output: predicted genes (manual curation)
Mlp Genome Project – Summer 2007
• Pre-release of the Mlp genome assembly (16.4% gaps, assembled with JAZZ); main genome scaffold total: 2,682
• ESTs from 50/50 spores and germ tubes of Mlp 98AG31: INRA Nancy => ~4,000 (2004); JGI => ~60,000 (2007) => ~52,000 ESTs
• ESTs from spores and germlings of Melampsora spp. [Mlp, Mmd, Mmt, Mo]: CFS Laval => ~3,000 Mlp / ~4,200 Mmd / ~3,000 Mo / ~3,000 Mmt
• In planta ESTs from Mlp haustoria => ~1,700 Mlp H3B => ~15,000 ESTs
Melampsora IAM website => summer 2007 (B. Hilselberger), updated in 2008 (E. Tisserant)
• Blast against Mlp scaffolds
• Blast against Mlp ESTs
• Blast against available basidiomycete genomes
Melampsora IAM website => summer 2007 (B. Hilselberger), updated in 2008 (E. Tisserant)
• Files to help in annotation using Artemis
• => fasta of genome scaffolds
• => gff files of EST clusters
• => gff files of blastn hits vs. Puccinia, Sporobolomyces & Ustilago gene models
Annotation of FL sequences = TRAINING SET for gene predictors (EuGene, fgenesh)
• Gene model annotation based on complete EST support & homology
• Coding for known ubiquitous functions (metabolism, cytoskeleton elements…)
• Coding for hypothetical proteins and new genes?
• Coding for proteins of various sizes
• Manual curation performed with Artemis (Nancy & Québec) => 348 GM curated
• Edition of annotation cards => Melampsora Genome Consortium website
TRAINING SET for gene prediction (EuGene, fgenesh)
• => 348 GM curated
• => 52,269 ESTs from Mlp 98AG31
• => raw TE prediction based on Mlp genome pre-release
JGI Gene prediction (Andrea Aerts – Jan-Mar 2008)
• 39 scaffolds (43.9 Mbp)
• 409 repetitive elements provided by collaborator, 87 generated in pipeline
• nr: N. crassa, M. grisea, F. graminearum
• ESTs
• 3,941 uniseqs described in 2003 paper
• 6,318 uniseqs described in 2008 paper
• 8,799 JGI cluster consensi (includes external ESTs)
• 5 C. parasitica CDSs from NCBI
Prediction of Gene Models using EuGene (VIB - Ghent)
Annotation performed with Mlp genome pre-release
• M-P Oudot Le Secq - EuGene annotation using Laccaria bicolor annotation parameters => ~17,000 Mlp gene models (<1,500 TEs) => Mlp GM v0.0
• Yao-Cheng Lin - EuGene annotation using parameters specifically defined for M. larici-populina => ~9,000 Mlp gene models (> 200 aa)
Annotation performed with Mlp genome assembly release Jan 2008
• Yao-Cheng Lin - EuGene annotation using specific training for M. larici-populina => 12,386 Mlp gene models: 4,308 hits vs yeast; 4,899 hits against Uniprot (7,487 no hits - 1/3 ; 2/3); 4,708 supported by ESTs
• Yao-Cheng Lin - Last EuGene annotation (summer 2008) including 454 data (~5,000 contigs) and adjusted parameters for small secreted proteins prediction => 17,167 Mlp gene models (6,989 < 300 aa)
JGI Gene prediction (Andrea Aerts – 03/28/2008)
• Genewise: 9,193 models
• Fgenesh_pm: 3,147 models
• estExt_fpm: 2,438 models
+ EuGene prediction
Reconciliation and release in April 2008
JGI Gene Models prediction
16,694 gene models predicted by JGI (& EuGene)
• Prediction method:
  • Ab initio: 51 %
  • EuGene: 27 %
  • Homology based: 14 %
  • EST based: 8 %
• Gene Model validation:
  • Complete (5'M-3'*): 94 %
  • Alignment with nr: 43 %
  • Alignment with pfam: 25 %
  • EST support: 27 %
• 16,694 gene models by source:
  • 4,465 EuGene models (27%)
  • 4,810 fgenesh1 (29%) + 5,422 fgenesh2 (32%) => 10,232 fgenesh models (61.3%)
  • 1,997 Genewise/GenewisePlus models (12%)
• 21% of fgenesh/genewise models were consolidated with EST extension
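The per-source shares can be recomputed from the model counts on this slide. A minimal Python sketch, using only the four counts listed above; the variable names are illustrative and not part of the JGI pipeline:

```python
# Gene model counts per prediction source, as listed on the slide.
counts = {
    "EuGene": 4465,
    "fgenesh1": 4810,
    "fgenesh2": 5422,
    "Genewise/GenewisePlus": 1997,
}

# The four sources should add up to the full gene catalog.
total = sum(counts.values())

# Share of each source, in percent, rounded to one decimal.
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}

# Combined share of the two fgenesh sets.
fgenesh_total = counts["fgenesh1"] + counts["fgenesh2"]
fgenesh_share = round(100 * fgenesh_total / total, 1)

for name, pct in shares.items():
    print(f"{name}: {counts[name]} models ({pct}%)")
print(f"fgenesh combined: {fgenesh_total} models ({fgenesh_share}%)")
```

The four counts sum to the 16,694 models of the catalog, and the two fgenesh sets together account for 10,232 of them.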
JGI Gene Models prediction
16,694 gene models predicted by JGI (& EuGene)
• Mean gene length: 1,685 bp (Laccaria: 1.5 kb)
• Mean transcript length: 1,224 b (Laccaria: 1.1 kb)
• Exons per gene: 4.90 (Laccaria: 5.4)
• Mean exon size: 250 bp (Laccaria: 210 bp)
• Mean intron size: 120 bp (Laccaria: 93 bp)
• Mean protein size: 378 aa (Laccaria: 367 aa)
• Protein length < 300 aa: Laccaria 52%, Coprinus 40%, Melampsora 49%, Puccinia 54%
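As a rough plausibility check, the mean transcript and gene lengths can be approximated from the exon and intron means. This Python sketch assumes a gene with n exons has n - 1 introns and that products of means are an acceptable approximation (they are not exact for real distributions); the variable names are illustrative:

```python
# Mean gene structure values for Mlp, as given on the slide.
exons_per_gene = 4.90
mean_exon_bp = 250
mean_intron_bp = 120

# Assumption: a gene with n exons contains n - 1 introns.
introns_per_gene = exons_per_gene - 1

# Approximate transcript length: exons only.
transcript_bp = exons_per_gene * mean_exon_bp

# Approximate gene length: exons plus introns.
gene_bp = transcript_bp + introns_per_gene * mean_intron_bp

print(f"approx. transcript length: {transcript_bp:.0f} b (slide: 1224 b)")
print(f"approx. gene length: {gene_bp:.0f} bp (slide: 1685 bp)")
```

Both approximations land within about 10 bp of the reported means, so the slide's figures are mutually consistent.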
Gene model density on the 20 largest scaffolds
Mean gene density of 2.04 genes / 10 kb => 1 gene / 4.9 kb (Laccaria: 1 gene / 3.1 kb)
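The two ways of expressing density are reciprocals of each other. A short Python sketch of the conversion, using the slide's values (variable names are illustrative):

```python
# Mlp: mean density on the 20 largest scaffolds, in genes per 10 kb.
mlp_genes_per_10kb = 2.04

# Convert to the average spacing, in kb per gene.
mlp_kb_per_gene = 10 / mlp_genes_per_10kb

# Laccaria is given as spacing (kb per gene); convert the other way.
laccaria_kb_per_gene = 3.1
laccaria_genes_per_10kb = 10 / laccaria_kb_per_gene

print(f"Mlp: 1 gene / {mlp_kb_per_gene:.1f} kb")
print(f"Laccaria: {laccaria_genes_per_10kb:.2f} genes / 10 kb")
```

On the same scale, Laccaria carries roughly 3.2 genes per 10 kb against 2.04 for Mlp.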
JGI Gene Models prediction – The Mlp gene space
• 28% of the genome is coding sequence
• 16,694 putative proteins (gene models) = JGI prediction + extra putative proteins identified with EuGene
• 15,725 proteins > 100 aa
• For comparison: Laccaria >17,000; Phanerochaete 10,048; Coprinopsis 8,759; Ustilago 6,522
• 7,830 with homologs in nr (47%), including 3,893 hypothetical proteins (Puccinia, Laccaria, mostly basidiomycete)
• 5,461 with homologs in Swissprot (33%)
• 6,820 with homologs in Laccaria (41%)
• 4,507 supported by Mlp ESTs (27%)
• A large proportion (30%) of Mlp genes do not have homologues in other fungal genomes, including P. graminis (Pucciniales) and Sporobolomyces roseus
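The support percentages on this slide are each count divided by the 16,694 gene models. A minimal Python sketch reproducing them (the dictionary keys are illustrative labels):

```python
# Total number of gene models in the catalog.
TOTAL_MODELS = 16694

# Number of gene models with each kind of support, from the slide.
support = {
    "nr": 7830,
    "Swissprot": 5461,
    "Laccaria": 6820,
    "Mlp ESTs": 4507,
}

# Percentage of the catalog for each category, rounded to whole percent.
percent = {name: round(100 * n / TOTAL_MODELS) for name, n in support.items()}

for name, pct in percent.items():
    print(f"{name}: {support[name]} models ({pct}%)")
```

The rounded values match the slide: 47%, 33%, 41% and 27%.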
Blast vs. other fungal deduced proteomes
• 33% of gene models are specific to Melampsora larici-populina (5,500 models with no homologs but ~300 Pfam/IPR hits)
• 10,344 homologs in P. graminis (62%)
• ~25% of orthologs with P. graminis
JGI summary – A complete table to help in annotating Mlp gene models
Mlp 98AG31 the 'bad guy' genomic team at INRA UMR 1136 IAM
• Duplessis Sébastien & Francis Martin
• Emilie Tisserant & Benoît Hilselberger (INRA Nancy): Mlp bioinfo
• Yao-Cheng Lin (VIB, Ghent, BE): EuGene prediction, Mlp gene families
• Marie-Pierre Oudot-Le Secq (INRA Nancy): early EuGene gene prediction