UMR 1095 - ASP. Structural & Comparative Genomics in Bread Wheat. TriAnnotPipeline: A LifeGrid Project based on AUVERGRID. 3rd EGEE User Forum, February 12th, 2008. F. Giacomoni, M. Reichstadt, P. Leroy. Génétique, Diversité & Ecophysiologie des Céréales, Clermont-Ferrand, France.
Wheat as a challenge for Genomics • Important economic crop • Large genome size
[Figure: genome sizes and repeat-sequence content. Bread wheat 17,000 Mb; barley 4,800 Mb; human ~3,000 Mb; maize 2,800 Mb; rice 380 Mb; A. thaliana 140 Mb. Repeat fractions shown: 85%, 70-80%, 50-80%, 50% and 10%.]
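A back-of-envelope check of why genome size makes wheat a challenge. The sizes are read off the slide; pairing the 85% repeat fraction with bread wheat is an assumption about how the flattened figure labels line up. Under that assumption, even the non-repeat part of the wheat genome is about 2.5 Gb, not far below the entire ~3 Gb human genome.

```python
# Values read off the slide; attributing the 85% repeat fraction to bread wheat
# is an assumption about how the flattened figure labels pair up.
WHEAT_GENOME_MB = 17_000
WHEAT_REPEAT_FRACTION = 0.85
HUMAN_GENOME_MB = 3_000


def non_repeat_mb(genome_mb: float, repeat_fraction: float) -> float:
    """Size in Mb of the sequence that remains once repeats are set aside."""
    return genome_mb * (1.0 - repeat_fraction)


wheat_unique_mb = non_repeat_mb(WHEAT_GENOME_MB, WHEAT_REPEAT_FRACTION)
size_ratio = WHEAT_GENOME_MB / HUMAN_GENOME_MB

print(f"non-repeat wheat sequence: ~{wheat_unique_mb:.0f} Mb")
print(f"wheat genome / human genome: ~{size_ratio:.1f}x")
```

This prints about 2550 Mb of non-repeat sequence and a size ratio of about 5.7x.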
I.N.R.A. work on the wheat genome • Sequencing • Annotating: discover genes, find transposable elements, study other biological components
[Figure: a raw DNA sequence (AAAATCGATATAGAGTATGTAGACAAATTTTAAACCCGGGGGAGAGAGAGA) and the results after annotation of the DNA sequence.]
General Pipeline Structure of TriAnnot
[Figure: DNA sequences are processed by TriAnnotPipelineGRID and stored in a database (Chado) with viewers (GBrowse), http://urgi.versailles.inra.fr/projects/TriAnnot/. Transposable elements (TEs) and genes each go through manual curation. TE components: TREPcons, REPET. Gene components: TriSet, GeneFarm training data set; gene predictors EuGene, GeneMarkHMM, GeneID.]
TriAnnotPipelineGRID Architecture
• Web / pipeline development: RepeatMasker, est2genome, GMap, BLAST and HMMPfam run against the data banks on the grid and on a cluster
• Web / pipeline production: used by the end users
• Outputs: GFF displayed in GBrowse (login/password); GnpGenome online; manual curation in Apollo (login/password); download as GFF for Artemis or as GAME XML for Apollo (login/password); GnpDB; local upload of GFF (login/password)
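Every tool in the pipeline ends up as GFF features that GBrowse and Apollo can display. A minimal sketch of turning one hit into a GFF line, assuming the GFF3 flavour (nine tab-separated columns, 1-based inclusive coordinates, "." for missing values, percent-encoding of reserved characters in column 9). The class name `GffFeature` and the example BAC name `BAC_001` are hypothetical, not taken from TriAnnot.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Characters that GFF3 reserves in column 9 and that must be percent-encoded.
_GFF3_ATTR_ESCAPES = {
    "%": "%25", ";": "%3B", "=": "%3D", "&": "%26", ",": "%2C",
    "\t": "%09", "\n": "%0A", "\r": "%0D",
}


def _escape_attr(text: str) -> str:
    # Character by character, so an existing "%" is never double-encoded.
    return "".join(_GFF3_ATTR_ESCAPES.get(ch, ch) for ch in text)


@dataclass
class GffFeature:
    seqid: str
    source: str
    type: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    score: Optional[float] = None
    strand: str = "."  # "+", "-" or "."
    phase: Optional[int] = None  # only meaningful for CDS features
    attributes: Dict[str, str] = field(default_factory=dict)

    def to_line(self) -> str:
        """Render the feature as one tab-separated GFF3 line (nine columns)."""
        if not 1 <= self.start <= self.end:
            raise ValueError("GFF3 requires 1 <= start <= end")
        attrs = ";".join(
            f"{_escape_attr(k)}={_escape_attr(v)}" for k, v in self.attributes.items()
        )
        return "\t".join([
            self.seqid,
            self.source,
            self.type,
            str(self.start),
            str(self.end),
            "." if self.score is None else f"{self.score:g}",
            self.strand,
            "." if self.phase is None else str(self.phase),
            attrs or ".",
        ])


# Hypothetical example: one RepeatMasker hit on a BAC named "BAC_001".
hit = GffFeature("BAC_001", "RepeatMasker", "repeat_region", 1201, 3450,
                 strand="+", attributes={"ID": "rep1", "Note": "TE hit; masked"})
print("##gff-version 3")
print(hit.to_line())
```

A file starts with the `##gff-version 3` pragma, followed by one such line per feature.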
TriAnnotPipelineGRID Detailed Architecture
Input: BAC sequence in FASTA format.
• Panel 1, transposable elements & repeats (Blocks 1a, 1b): RepeatMasker against TREPnr, TREPtotal and RepBase; BLASTx against TREPprot; TRF for SSRs. Masking produces the BAC with masked TEs.
• Panel 2, gene annotation:
  * Block 2, ab initio gene structure prediction: GeneMarkHMM, GeneID, EuGene, GENSCAN, GeneZilla
  * Block 3a, BLAST/GMap with transcripts (FL-cDNA, EST, mRNA)
  * Block 3b, BLASTx against SwissProt / TrEMBL
  * Block 3c, gene model: RAP-like (Japan), EVM + PASA (US), EuGene (France)
  * Block 4, gene function, following the IWGSC annotation guideline (best-hit proteins from At and Os): known protein, putative protein, domain-containing protein, expressed gene, conserved hypothetical gene, hypothetical gene
  Output: BAC with masked TEs & genes.
• Panel 3, other biological target searches (annotation best hit):
  * Block 5a, BLASTn against UGset / IRGSP / TIGR pseudo, and against nt, sts, htgs, gss
  * Block 5b, tRNA
  * Block 5c, miRNA
  * Block 5d, mtDNA, cpDNA, …
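The masking step in Panel 1 hides repeat regions so that the gene-annotation blocks do not waste effort on them. A minimal sketch of what masking does to a sequence, given repeat hits as 1-based inclusive intervals: hard masking replaces bases with N (RepeatMasker's default output), soft masking lowercases them instead (RepeatMasker's `-xsmall` option). The function name and the example interval are hypothetical.

```python
from typing import Iterable, Tuple


def mask_sequence(sequence: str, intervals: Iterable[Tuple[int, int]],
                  soft: bool = False) -> str:
    """Mask 1-based, inclusive (start, end) intervals of a DNA sequence.

    Hard masking replaces each covered base with "N"; soft masking lowercases it
    instead. Overlapping intervals are allowed.
    """
    bases = list(sequence)
    for start, end in intervals:
        if not 1 <= start <= end <= len(bases):
            raise ValueError(f"interval ({start}, {end}) is outside the sequence")
        for i in range(start - 1, end):
            bases[i] = bases[i].lower() if soft else "N"
    return "".join(bases)


# The first 20 bases of the example sequence shown earlier in the talk, with a
# hypothetical repeat hit on positions 5-8.
seq = "AAAATCGATATAGAGTATGT"
print(mask_sequence(seq, [(5, 8)]))             # AAAANNNNTATAGAGTATGT
print(mask_sequence(seq, [(5, 8)], soft=True))  # AAAAtcgaTATAGAGTATGT
```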
WEB INTERFACE PART:
* Upload of the BAC sequence (Wheat Seq) in FASTA format
* Setting the annotation parameters, organized in 5 blocks
* Production of a step.xml file
PIPELINE PART:
STEP_0: * 3 RepeatMasker vs 3 data banks
STEP_1: * 8 BLASTn vs 8 data banks * 1 BLASTx vs 1 data bank * 1 Tandem Repeats Finder
STEP_2: * 1 EugeneIMM Rice * 1 GeneId * 4 GeneMarkHMM with 4 matrices
STEP_3: * 1 tBLASTx vs 1 data bank * 1 BLASTn vs 1 data bank * 1 BLASTx vs 1 data bank
STEP_4: * 2 tBLASTn vs 2 data banks
RESULTS FILES (GFF format)
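The web interface turns the user's choices into a step.xml file that drives the pipeline. The real TriAnnot schema is not shown in the talk, so the sketch below is only an illustration: the element and attribute names (`triannot`, `step`, `program`, `databank`) are hypothetical, and assigning the three STEP_0 data banks to TREPnr, TREPtotal and RepBase is an assumption borrowed from the detailed architecture.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def build_step_xml(sequence_file: str,
                   steps: Dict[str, List[Dict[str, str]]]) -> str:
    """Serialize the chosen annotation steps into a step.xml document.

    Element and attribute names are hypothetical; the real schema is not shown.
    """
    root = ET.Element("triannot", {"sequence": sequence_file})
    for step_id, programs in steps.items():
        step_el = ET.SubElement(root, "step", {"id": step_id})
        for program in programs:
            # Each program's parameters become attributes of its element.
            ET.SubElement(step_el, "program", program)
    return ET.tostring(root, encoding="unicode")


# Hypothetical parameters: STEP_0 with one RepeatMasker run per data bank,
# and part of STEP_2.
steps = {
    "STEP_0": [{"name": "RepeatMasker", "databank": db}
               for db in ("TREPnr", "TREPtotal", "RepBase")],
    "STEP_2": [{"name": "GeneId"}, {"name": "EugeneIMM Rice"}],
}
xml_text = build_step_xml("BAC_001.fasta", steps)
print(xml_text)
```

Keeping every step as data in one file lets the server-side driver read it back and launch the corresponding jobs without further user interaction.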
TriAnnotPipelineGRID Architecture
WEB INTERFACE PART:
* Upload of the BAC sequence (Wheat Seq) in FASTA format
* Setting the annotation parameters, organized in 5 blocks
* Production of a step.xml file
PIPELINE PART:
PIPELINE_GRID PART I (STEP_1A): * 5 RepeatMasker (RM)
PIPELINE_GRID PART II (STEP_1B, 3A, 3B, 4A, 4B, 5A and 5D): * 14 BLASTn * 8 GMap * 5 RM * 3 BLASTx * 6 BLASTp * 1 tBLASTn * 1 PFAM
PIPELINE LOCAL PART: STEP_1B: * 1 TRF; STEP_2: * 1 EugeneIMM Rice * 1 GeneId * 4 GeneMarkHMM; STEP_3C: * 3 gene modelling
RESULTS FILES (GFF format)
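Tallying the job counts listed for this version of the architecture shows how the work is split: 5 jobs in grid part I, 38 in grid part II, and 10 on the server. The counts come from the slide; the routing function `target_for` is a hypothetical simplification that keeps a program on the server whenever it appears in the local list and sends everything else to the grid.

```python
from collections import Counter

# Job counts transcribed from the slide.
GRID_PART_I = Counter({"RepeatMasker": 5})  # STEP_1A
GRID_PART_II = Counter({  # STEP_1B, 3A, 3B, 4A, 4B, 5A and 5D
    "BLASTn": 14, "GMap": 8, "RepeatMasker": 5, "BLASTx": 3,
    "BLASTp": 6, "tBLASTn": 1, "PFAM": 1,
})
LOCAL_JOBS = Counter({  # STEP_1B, STEP_2, STEP_3C
    "TRF": 1, "EugeneIMM Rice": 1, "GeneId": 1, "GeneMarkHMM": 4, "gene modelling": 3,
})


def target_for(program: str) -> str:
    """Hypothetical routing rule: programs in the local part stay on the server."""
    return "local" if program in LOCAL_JOBS else "grid"


grid_total = sum((GRID_PART_I + GRID_PART_II).values())
local_total = sum(LOCAL_JOBS.values())
print(grid_total, local_total)  # 43 10
```

The short, lightweight predictors stay local, while the many independent similarity searches, which dominate the job count, are the ones worth distributing.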
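[Figure: deployment split into a server part and a grid part. Components: User Interface (UI) server issuing JDL job descriptions; Computing Element (CE) running the bioinformatic algorithms; Storage Element (SE) holding the bioinformatic databases, with a DB update service; bioinformatic package.]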
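Each grid job is described to the gLite middleware by a JDL file written on the UI. A minimal sketch that generates one, using standard JDL attributes (`Executable`, `Arguments`, `StdOutput`, `StdError`, `InputSandbox`, `OutputSandbox`); the wrapper script name, file names and argument string are hypothetical, and the pipeline's actual JDL may set more attributes (for example `Requirements`).

```python
from typing import List


def _jdl_list(items: List[str]) -> str:
    # JDL lists are written as {"a", "b", ...}
    return "{" + ", ".join(f'"{item}"' for item in items) + "}"


def make_jdl(executable: str, arguments: str,
             inputs: List[str], outputs: List[str]) -> str:
    """Build a minimal gLite JDL job description for one pipeline job.

    The executable is shipped in the InputSandbox together with the inputs;
    stdout/stderr come back in the OutputSandbox with the result files.
    """
    lines = [
        "[",
        f'  Executable = "{executable}";',
        f'  Arguments = "{arguments}";',
        '  StdOutput = "std.out";',
        '  StdError = "std.err";',
        f"  InputSandbox = {_jdl_list([executable] + inputs)};",
        f"  OutputSandbox = {_jdl_list(['std.out', 'std.err'] + outputs)};",
        "]",
    ]
    return "\n".join(lines)


# Hypothetical job: a wrapper script running one BLASTn search on a BAC.
jdl = make_jdl("run_blast.sh", "BAC_001.fasta",
               inputs=["BAC_001.fasta"], outputs=["BAC_001_blastn.gff"])
print(jdl)
```

The file would then be submitted from the UI with the gLite WMS client, e.g. `glite-wms-job-submit -a job.jdl`. Large data banks are better kept on the SE than shipped through the sandbox, which is meant for small files.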
Workflow across the server, the User Interface (UI) and the Computing Element (CE), with the bioinformatic algorithms described in JDL:
• Get the parameters
• Create the XML step file
• Get the input (sequence) file
• Create the grid environment (JDL, shell scripts)
• Mask the repeated sequences
• Run RepeatMasker/BLAST/GMap/HMMER
• Retrieve the output
• Fill the database
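Creating the grid environment also means writing the shell script that the CE actually runs. A minimal sketch of such a wrapper generator, assuming a POSIX `sh` on the worker node; the example command uses the legacy NCBI `blastall` syntax (`-p` program, `-d` database, `-i` query file), which is an assumption about the tool version, and the file names are hypothetical.

```python
import shlex
from typing import List


def make_wrapper(command: List[str], output_file: str) -> str:
    """Return a POSIX sh wrapper that runs one program and captures its output."""
    return "\n".join([
        "#!/bin/sh",
        "set -e  # abort on the first failing command so the job shows as failed",
        # shlex quoting keeps file names with spaces or shell metacharacters safe.
        f"{shlex.join(command)} > {shlex.quote(output_file)}",
        "",
    ])


# Hypothetical command line for one BLASTn search against nt.
script = make_wrapper(
    ["blastall", "-p", "blastn", "-d", "nt", "-i", "BAC_001.fasta"],
    "BAC_001_nt.blastn",
)
print(script)
```

Building the command from a list rather than a string avoids quoting mistakes when the same generator serves RepeatMasker, BLAST, GMap and HMMER jobs.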
Job life cycle between the User Interface (UI) and the Computing Element (CE):
1. Parameters + input file
2. Creation of the XML file
3. Copy of the input files
4. Creation of the environment
5. Job submission
6. Job running on the CE (BLAST/HMMER/RepeatMasker/GMap)
7. Job output
8. Output transfer
9. DB filling
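The last step loads the GFF results into the database. The production pipeline uses a Chado database, whose schema is far richer than anything shown here; the sketch below swaps in an in-memory SQLite table with one column per GFF3 field, purely to illustrate the parse-and-insert loop. The table layout, column names and example records are hypothetical.

```python
import sqlite3
from typing import Iterable

# Minimal stand-in table; NOT the Chado schema used by the real pipeline.
SCHEMA = """
CREATE TABLE feature (
    seqid TEXT, source TEXT, ftype TEXT,
    start_pos INTEGER, end_pos INTEGER, score REAL,
    strand TEXT, phase INTEGER, attributes TEXT
)
"""


def load_gff(conn: sqlite3.Connection, lines: Iterable[str]) -> int:
    """Insert the features of GFF3 text into the feature table; return the count."""
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # blank lines, the ##gff-version pragma and comments
        cols = line.split("\t")
        if len(cols) != 9:
            raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
        seqid, source, ftype, start, end, score, strand, phase, attrs = cols
        rows.append((seqid, source, ftype, int(start), int(end),
                     None if score == "." else float(score),
                     strand, None if phase == "." else int(phase), attrs))
    with conn:  # commit all rows in one transaction
        conn.executemany("INSERT INTO feature VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)", rows)
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
gff_text = [
    "##gff-version 3",
    "BAC_001\tRepeatMasker\trepeat_region\t1201\t3450\t.\t+\t.\tID=rep1",
    "BAC_001\tBLASTn\tmatch\t5000\t5800\t250\t-\t.\tID=hit1",
]
print(load_gff(conn, gff_text))  # 2
```

Loading all rows of a result file in one transaction means a malformed file leaves the database unchanged rather than half-filled.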
TriAnnotPipelineGRID Partners 2007-2008: F. Giacomoni, C. Charpentier, N. Guilhot, F. Choulet, P. Leroy, C. Feuillet, M. Reichstadt, A. Claude, M. Liauzu, A. Mahul, M. Alaux, T. Flutre, I. Blanc-Lenfle, S. Reboux, H. Quesneville, B. Haas, F. Legeai, T. Tanaka, H. Ikawa, H. Numa, T. Itoh, B. Kronmiller