Mlp Summer workshop – INRA Nancy, August 20-21 2008. The genome sequence of Melampsora larici-populina, the causal agent of the poplar rust disease. M. larici-populina Transcriptome. Duplessis Sébastien (INRA Nancy).
Mlp Summer workshop – INRA Nancy, August 20-21 2008
The genome sequence of Melampsora larici-populina, the causal agent of the poplar rust disease
M. larici-populina Transcriptome
Duplessis Sébastien (INRA Nancy), Tree/Microbe Interactions Joint Unit, INRA/University Nancy, UMR 1136 IAM
Mlp Transcriptome – Goals and Means
Goals
Gene Expression
- Identify genetic determinants involved in Mlp biology
- Identify sets of genes involved in development of infection structures (secretion, effectors, avirulence, ...)
- Identify sets of genes involved in biotrophy (nutrition, transport)
- Identify profiles of genes expressed during the plant-fungal interaction
Gene Models Annotation
- Validation of Gene Model predictions
- Detection of new Gene Models
Mlp Transcriptome – Goals and Means
Means
EST sequencing
- Sanger ESTs from a specific cDNA library (cDNA cloning / 100s to 1000s of ESTs)
- 454-pyrosequencing from a specific tissue (no cDNA cloning / 200-400k reads)
454: 80 Mb in 1 run for 10 K€ vs. 1000s of Sanger ESTs for much more
=> Genes expressed in a given tissue (specific and ubiquitous)
=> No gene prediction needed a priori
Array-based expression profiling
- DNA chips: NimbleGen Systems oligonucleotide arrays
=> Expression of all predicted genes represented on the array
=> Gene prediction a priori or EST sequencing required
Mlp Transcriptome – EST Sequencing I
cDNA library of Mlp 98AG31 urediniospores and germlings
250 µg of DNase-free RNA were isolated from Mlp 98AG31 urediniospores and germlings (urediniospores grown for less than 12 h on agar) and sent to JGI
Mlp is an obligate biotroph, so spores are the only source of uncontaminated ESTs
cDNA library => 29,081 cDNA clones
5'/3' sequencing => 52,269 ESTs (including ~4,500 ESTs previously obtained at INRA Nancy)
EST assembly => 11,535 consensus sequences (mean size 780 nt, range 100 to 5,052 nt)
- 6,599 singletons
- 4,936 clusters
- 119 consensus sequences contain > 50 ESTs
Best BLAST hits of the most abundant ESTs consisted of:
- stress response TF rds1, HSP, glycosidase, ubiquitin, fruiting body protein, cyclin, SOD, Ras, antibiotic resistance, protease, laccase, tubulin
- dehydrogenases and cytochrome P450 from Uromyces fabae
- predicted gene models from P. graminis
Mlp Transcriptome – EST Sequencing I
Comparison to released Pucciniales ESTs (e-value < 10^-5)
- Phakopsora pachyrhizi (Pp, soybean rust) ESTs => germinated/non-germinated spores, infected tissues
- Puccinia graminis f. sp. tritici (Pgt, wheat stem rust) ESTs => germinated/non-germinated urediniospores and teliospores
[Figure: Venn diagrams, Pp vs. Mlp (Pp spore ESTs) and Pgt vs. Mlp (Pgt spore ESTs). Values as extracted, in order: 45,812; 6,483; 56,753; 46,411; 5,858; 28,536; 5,738; 4,045]
Mlp Transcriptome – EST Sequencing I
Mlp 98AG31 ESTs for gene prediction and gene model support
ESTs were used in the JGI and EuGene predictions
=> 27% of Gene Models supported
=> 4,507 Gene Models supported
ESTs to support gene curation
=> ESTs and clusters are shown on the JGI Melampsora website
Mlp Transcriptome – EST Sequencing II
cDNA libraries from various Melampsora spp. (Feau, Joly, Hamelin, CFS, Canada)
- M. medusae f. sp. deltoidae (MMD)
  - Multiple isolates, different growth stages (field)
- M. larici-populina (MLP and MLP-H)
  - Multiple isolates, different growth stages (field)
  - Single isolate, haustoria-enriched (in vitro)
- M. medusae f. sp. tremuloidae (MMT)
  - Single isolate, 13 days growth (in vitro)
- M. occidentalis (MO)
  - Single isolate, 13 days growth (in vitro)
Mlp Transcriptome – EST Sequencing II cDNA Libraries from various Melampsora Spp.
Mlp Transcriptome – EST Sequencing II
Annotation of Melampsora spp. ESTs (Feau et al. 2007, Can. J. Bot.)
Mlp Transcriptome – EST Sequencing III: 454-pyrosequencing
454-pyrosequencing of infected poplar leaf tissues
Melampsora is an obligate biotroph => specialized infection structures (haustoria) are formed after 16 h post-inoculation (hpi) and uredinia are formed after 7 days post-inoculation (dpi), only in the plant host
Strong Mlp invasion of plant tissues was observed at 4 dpi (Rinaldi et al., 2007)
Pyrosequencing allows the generation of 100,000s of sequences from isolated transcripts
=> 200,000 ESTs from transcripts isolated from infected poplar leaves at 4 and 7 dpi with 454 GS-FLEX (Roche) by Cogenix
- Transcripts expressed during plant infection
- Transcripts involved in infection structure development, maintenance and biotrophy
- Transcripts involved in spore formation and maturation
- Identification of plant infection-specific transcripts by comparison with Sanger ESTs
Mlp Transcriptome – 454-pyrosequencing (From Ellegren, Mol. Ecol. 2008)
Mlp Transcriptome – 454-pyrosequencing 454-sequencing at JGI
Mlp Transcriptome – 454-pyrosequencing
1. 250 µg of total RNA were isolated from infected poplar leaves ('Beaupré') at 4 dpi and 7 dpi with Mlp 98AG31
2. cDNA synthesis with the SMART cDNA synthesis kit from 60 ng of purified mRNA
3. 10 µg of cDNA recovered and sent to Cogenix for 454-pyrosequencing on GS-FLEX (Roche)
4 dpi: infection hyphae, haustoria
7 dpi: infection hyphae, haustoria, uredinia, spore-forming cells
Pictures by S Hacquard & S Duplessis (2008), confocal microscopy with PI/Uvitex staining
Mlp Transcriptome – 454-pyrosequencing
Cogenix report on 454-sequencing
454-pyrosequencing can generate > 400,000 sequences, or 2 x 200,000 sequences, in 1 run
Infected poplar tissues => 185,663 sequences
454 sequences are short (mean length 203 nt) and require assembly for transcript reconstruction
Assembly by Newbler => 148,688 reads assembled into 10,629 contigs & 36,975 unassembled reads (= singletons?)
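The read accounting on this slide can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted above; the variable names are illustrative, and the total yield estimate assumes every read has the mean length, which is a simplification.

```python
# Figures quoted in the Cogenix report slide (Newbler assembly).
total_reads = 185_663
assembled_reads = 148_688
unassembled_reads = 36_975
contigs = 10_629
mean_read_length_nt = 203

# Assembled and unassembled reads should account for every read.
accounted = assembled_reads + unassembled_reads

# Share of reads that ended up in a contig.
fraction_assembled = assembled_reads / total_reads

# Average depth of a contig, expressed in reads.
reads_per_contig = assembled_reads / contigs

# Rough sequence yield in Mb, assuming all reads have the mean length.
approx_yield_mb = total_reads * mean_read_length_nt / 1e6

print(f"reads accounted for: {accounted} of {total_reads}")
print(f"fraction assembled:  {fraction_assembled:.1%}")
print(f"reads per contig:    {reads_per_contig:.1f}")
print(f"approximate yield:   {approx_yield_mb:.1f} Mb")
```

About four reads in five were placed in a contig, at roughly 14 reads per contig on average.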
Mlp Transcriptome – 454-pyrosequencing
Newbler assembly vs. MIRA assembly
Newbler is a de novo assembler designed for genomic sequences (not transcripts), working in flow space, not nucleotide space
Newbler tends to discard reads for no obvious reason (> 38,000 reads are lost)
Cogenix recommended the use of another de novo assembler dedicated to transcript assembly
CAP3 is not recommended
MIRA is an EST assembler recently updated for 454 data => http://chevreux.org/projects_mira.html
MIRA generates more contigs than Newbler => 17,511 contigs (including 2,600 singletons)
MIRA provides information on the overall quality of sequences (tag 'too short' = low quality sequences)
GenomeThreader (Gth) maps transcript sequences to a genome sequence
MIRA contigs are mapped to the Mlp and poplar genomes to identify fungal and plant transcripts
Mlp Transcriptome – 454-pyrosequencing
Newbler vs. MIRA
[Figure panels: Mlp sequences; Poplar sequences]
Singleton reads from Newbler are mostly low quality sequences
Mlp Transcriptome – 454-pyrosequencing
Final MIRA assembly vs. poplar and Mlp genomes
- Contigs that showed a Gth score < 0.9 were dissolved into singletons
- Contigs attributed to both genomes with Gth scores > 0.9 were manually resolved
- Contigs attributed to one genome and containing reads attributed to the other genome were manually inspected with Consed => new contigs/singletons
- Singletons with Gth scores < 0.9 were not retained
5,956 contigs & 9,562 singletons attributed to Mlp
6,414 contigs & 21,400 singletons attributed to poplar
PASA (Program to Assemble Spliced Alignments)
PASA is a tool designed for curation of gene catalogs using sets of ESTs and full-length cDNAs. It is based on stringent alignment to the genome sequence with GMAP, assembly into clusters based on position on the genome sequence, and comparison to the current catalogue of gene models => curation
PASA was used in several published 454 analyses, and in the Arabidopsis community for gene curation
PASA => Mlp ESTs (Sanger & 454 contigs) vs. Mlp genome/gene models
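The Gth-based attribution rules on this slide can be sketched as a small decision function. This is a minimal illustration, not the pipeline actually used: the function name, the labels, and the treatment of a score of exactly 0.9 as passing are assumptions (the slide only states "< 0.9" and "> 0.9"), and the manual Consed inspection step is represented only by a label.

```python
from typing import Optional

GTH_THRESHOLD = 0.9  # cutoff quoted on the slide


def attribute_sequence(mlp_score: Optional[float],
                       poplar_score: Optional[float],
                       threshold: float = GTH_THRESHOLD) -> str:
    """Attribute a contig or singleton to a genome from its Gth scores.

    A score of None means the sequence did not map to that genome.
    Hypothetical helper: the labels returned here are illustrative.
    """
    mlp_ok = mlp_score is not None and mlp_score >= threshold
    poplar_ok = poplar_score is not None and poplar_score >= threshold
    if mlp_ok and poplar_ok:
        return "manual"           # maps well to both genomes: resolve by hand
    if mlp_ok:
        return "Mlp"
    if poplar_ok:
        return "poplar"
    return "below_threshold"      # contig: dissolve; singleton: discard


# Totals quoted on the slide after attribution.
contigs = {"Mlp": 5_956, "poplar": 6_414}
singletons = {"Mlp": 9_562, "poplar": 21_400}
total_contigs = sum(contigs.values())
total_singletons = sum(singletons.values())
print(total_contigs, total_singletons)
```

The two totals match the 12,370 contigs and 30,962 singletons reported in the conclusions.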
Mlp Transcriptome – 454-pyrosequencing
PASA outputs for Mlp 454 contigs
PASA was also run using all 454 reads against the Mlp genome, and a similar number of gene models were supported
Mlp Transcriptome – 454-pyrosequencing
PASA outputs for Mlp Sanger contigs
Total of 6,294 Mlp Gene Models supported (38%)
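The support percentages can be reproduced under one assumption: that the denominator is the 16,694 JGI gene models quoted on the array-design slide. The slides do not state the denominator explicitly, so the sketch below is a consistency check rather than the authors' calculation.

```python
# Assumption: percentages are relative to the 16,694 JGI gene models
# mentioned on the array-design slide.
JGI_GENE_MODELS = 16_694


def percent_supported(supported: int, total: int = JGI_GENE_MODELS) -> float:
    """Return the share of gene models with EST support, in percent."""
    return 100.0 * supported / total


sanger_only = percent_supported(4_507)       # Sanger ESTs alone
sanger_plus_454 = percent_supported(6_294)   # Sanger + 454 contigs (PASA)
gain_points = sanger_plus_454 - sanger_only  # gain in percentage points

print(f"Sanger only:   {sanger_only:.1f}%")
print(f"Sanger + 454:  {sanger_plus_454:.1f}%")
print(f"Gain:          {gain_points:.1f} percentage points")
```

With this denominator the figures round to the 27% and 38% quoted in the slides, and the gain is an increase in percentage points rather than a relative increase.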
Mlp Transcriptome – 454-pyrosequencing
Examples of gene model curation based on Mlp 454 contigs, proposed by PASA
Mlp Transcriptome – 454-pyrosequencing
Most abundant transcripts supporting Mlp Gene Models identified through 454-sequencing
4,010 Gene Models supported by 454 ESTs
- 935 with no hits in nr/Swiss-Prot: 391 specific to Pucciniales, 519 specific to Mlp
- 265 encode SSPs (small secreted proteins) => 166 with no hits in nr/Swiss-Prot: 34 specific to Pucciniales, 128 specific to Mlp
Mlp Transcriptome – NimbleGen Systems oligonucleotide arrays
~390,000 60-mer oligoprobes evenly distributed on a 2 cm² array
4-plex arrays = 80,000 to 90,000 probes per array (+ controls)
Set of 8 oligoprobes/gene, duplicated, in Laccaria bicolor
16,694 JGI models + new EuGene models with 454 support [All 454 supported new CDS?]
17,000 to 20,000 Mlp Gene Models => 4 probes/gene => no duplicated probes => Populus filtered
10 x 4-plex NimbleGen arrays ordered – Design ASAP
Mlp gene expression during a time course of infection: NimbleGen Systems expression oligonucleotide arrays
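The probe budget behind the "4 probes/gene" choice can be sketched as follows. The capacity figures come from the slide; the function names are illustrative, and comparing against the upper usable limit of 90,000 probes per sub-array is an assumption made for the sake of the example.

```python
TOTAL_PROBES = 390_000          # probes on the full array (slide figure)
SUBARRAYS = 4                   # 4-plex format
USABLE_PER_SUBARRAY = 90_000    # upper usable limit quoted, controls excluded


def probes_needed(n_gene_models: int, probes_per_gene: int) -> int:
    """Number of probes required to cover every gene model."""
    return n_gene_models * probes_per_gene


def fits(n_gene_models: int, probes_per_gene: int,
         capacity: int = USABLE_PER_SUBARRAY) -> bool:
    """True if the design fits on one sub-array of the 4-plex array."""
    return probes_needed(n_gene_models, probes_per_gene) <= capacity


raw_per_subarray = TOTAL_PROBES // SUBARRAYS  # before reserving controls

for n_models in (17_000, 20_000):
    for per_gene in (4, 8):
        need = probes_needed(n_models, per_gene)
        print(f"{n_models} models x {per_gene} probes = {need:>7} "
              f"-> fits: {fits(n_models, per_gene)}")
```

At 4 probes per gene, 17,000 to 20,000 gene models need 68,000 to 80,000 probes, which fits one sub-array; 8 probes per gene would not.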
Mlp Transcriptome – Conclusions
Conclusions
- 52,269 Mlp 98AG31 ESTs support 27% of JGI Mlp Gene Models
- ESTs from other Melampsora spp. to help in annotation (+ polymorphism study)
- 185,000 454 reads were assembled into 12,370 contigs & 30,962 singletons
  - 5,956 contigs & 9,562 singletons attributed to Mlp by Gth
  - 6,414 contigs & 21,400 singletons attributed to poplar by Gth
- PASA identified a total of 6,294 Mlp Gene Models supported by both 454 and Sanger EST contigs = 38% of Mlp Gene Models (an increase of 11 percentage points)
- MIRA identified many Gene Models that may need annotation
- MIRA also identified more than 2,500 putative new genes (to be verified)
- Among the 4,010 Gene Models expressed in planta
  => 519 are specific to Mlp and 391 to Pucciniales
  => 265 encode SSPs and 128 SSPs are specific to Mlp
Mlp Transcriptome – Conclusions
Ongoing…
- Curation of Gene Models supported by 454 contigs
- Prediction/curation of putative new genes with 454 contig support
- Design of the NimbleGen Systems oligoarray Mlp v1.0
To come…
- Alternative splicing
- Presence of SNPs (transcripts expressed in both nuclei?)
- Profiles of candidate genes during a time course of infection of poplar leaves
Mlp 98AG31 Duplessis Sébastien & Francis Martin the 'bad guy' genomic team at INRA UMR 1136 IAM
- Emilie Tisserant & Benoît Hilselberger (INRA Nancy): Mlp Bioinfo
- Stéphane Hacquard (INRA Nancy): Mlp effectors
- Marie-Pierre Oudot-LeSecq (INRA Nancy): EST annotation
- Yao-Cheng Lin (VIB, Ghent, BE): EuGene prediction, Mlp gene families