The Integrated Microbial Genomes (IMG) Systems for Comparative Analysis of Microbial Genomes & Metagenomes: Postmortem
[Figure: counts by region, with 1,979 shown separately]
• N America: 1,180
• Europe: 386
• Asia: 235
• S America: 96
• Oceania: 81
• Africa: 6
Outline
• User Community
• Data Transformation
• Data Flow
• What’s Next
Data interpretation for individual genomes
[Diagram: pipeline stages, grouped under Transformation and Characterization]
• Sequencing: Raw reads
• Sequencing: Qualified reads
• Assembly: Assembled reads
• Structural annotation: Predicted genes
• Functional annotation: Annotated proteins
• Functional annotation*: Pathways
Genome exploration: Chromosome map, Sequence browser
• Browse & search genome
  • Browse genome sequence: gene coordinates, features
  • Search genome for presence of specific genes, functions
Questions
• Data
  • Structure & semantics
  • Logical: objects, correlations
  • Physical: files, formats, size
• Processing
  • Methods, tools
  • Implementation
  • Data management
  • Computing infrastructure
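The two genome exploration operations on this slide (browsing genes by coordinates and searching for specific functions) can be sketched as below. This is only an illustration under stated assumptions: the `Gene` record, its fields, and the sample genes are hypothetical and do not reflect the actual IMG data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record: coordinates plus a functional annotation."""
    gene_id: str
    start: int      # 1-based start coordinate on the scaffold
    end: int        # inclusive end coordinate
    strand: str     # "+" or "-"
    function: str   # free-text product name


def genes_in_region(genes, start, end):
    """Browse: return genes overlapping the closed interval [start, end]."""
    return [g for g in genes if g.start <= end and g.end >= start]


def genes_with_function(genes, keyword):
    """Search: return genes whose annotation contains the keyword (case-insensitive)."""
    kw = keyword.lower()
    return [g for g in genes if kw in g.function.lower()]


# Made-up example genes, for illustration only.
GENES = [
    Gene("g1", 100, 950, "+", "DNA polymerase III subunit beta"),
    Gene("g2", 1000, 1800, "-", "hypothetical protein"),
    Gene("g3", 1750, 2600, "+", "ATP synthase subunit alpha"),
]

region_hits = [g.gene_id for g in genes_in_region(GENES, 900, 1100)]
function_hits = [g.gene_id for g in genes_with_function(GENES, "atp synthase")]
```

A linear scan is enough for a sketch; a real system holding millions of genes would more likely rely on an indexed database query.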
Data interpretation across genomes
[Figure: Function Profile across Genome 1 ... Genome n, with genes g1, g2, g3, g4; Genome k: Genes, Functions]
Comparative Analysis: Conserved genes, Pathways, Gene correlations
Genome integration
• Review, revise, improve quality of annotations
• Explore/compare gene & functional content of genomes & metagenomes
• Detect/correct annotation gaps & inconsistencies
Genome fusion: pangenomes
“OMICS” integration
• Gene expression from:
  • Proteomics
  • Transcriptomics
Questions
• Data
  • Structure & semantics
  • Logical: data model
  • Physical: database system
• Integration
  • Methods, tools
  • Implementation
  • Analysis operations
  • Flow (composition)
  • Performance
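A function profile of the kind named on this slide can be sketched as a per-genome count of genes assigned to each function. The example below is a minimal illustration, assuming made-up genome names, gene names, and function identifiers (`F1`, `F2`, `F3`); it is not the IMG implementation. It also shows how such a profile could flag functions conserved in every genome and candidate annotation gaps.

```python
from collections import Counter

# Hypothetical annotations: genome -> list of (gene, function) pairs.
ANNOTATIONS = {
    "genome_1": [("g1", "F1"), ("g2", "F2"), ("g3", "F2")],
    "genome_2": [("g1", "F1"), ("g2", "F3")],
    "genome_3": [("g1", "F1"), ("g2", "F2"), ("g3", "F3"), ("g4", "F3")],
}


def function_profile(annotations):
    """Map each genome to a Counter of gene counts per function."""
    return {
        genome: Counter(func for _gene, func in genes)
        for genome, genes in annotations.items()
    }


def conserved_functions(profile):
    """Functions present (count >= 1) in every genome of the profile."""
    per_genome = [set(counts) for counts in profile.values()]
    return set.intersection(*per_genome) if per_genome else set()


def missing_functions(profile, genome):
    """Functions seen in other genomes but absent here: candidate annotation gaps."""
    others = set().union(
        *(set(counts) for name, counts in profile.items() if name != genome)
    )
    return others - set(profile[genome])


profile = function_profile(ANNOTATIONS)
conserved = conserved_functions(profile)
gaps_in_genome_1 = missing_functions(profile, "genome_1")
```

A missing function is only a candidate gap: the genome may genuinely lack it, which is why the slide pairs detection with review and correction.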
Biological data interpretation process
[Diagram: Genomes; Structural annotation; Scaffolds, Σ genes; Data Integration; Genes, Functions; Functional annotation; Functional catalogue; Phenotype rules; Phenotype prediction; Phenotypes]
Questions
• Gene prediction accuracy
• Need re-annotation of all microbial genomes
Questions
• Multiple resources, methods
• Potential conflicts, errors
• Missing annotations
• Requires integrated context (IMG ER) + tools for review/curation
Questions
• Completeness & consistency of functional catalogue for genomes
  • Consistency: IMG terms & pathways
  • Completeness: IMG metabolic reconstruction
• Expert curation in IMG ER
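The step from a functional catalogue to phenotypes via phenotype rules can be sketched as rule evaluation over the set of functions annotated in a genome. The slide does not say how rules are encoded, so the rule form below (a set of required functions plus a set of forbidden ones) and all names are assumptions made purely for illustration.

```python
# Hypothetical rules: phenotype -> (required functions, forbidden functions).
PHENOTYPE_RULES = {
    "example_phenotype_A": ({"pathway_1", "pathway_2"}, set()),
    "example_phenotype_B": ({"pathway_3"}, {"pathway_1"}),
}


def predict_phenotypes(functions, rules):
    """Return phenotypes whose required functions are all present and
    whose forbidden functions are all absent."""
    predicted = []
    for phenotype, (required, forbidden) in rules.items():
        if required <= functions and not (forbidden & functions):
            predicted.append(phenotype)
    return sorted(predicted)


def unmet_requirements(functions, rules):
    """For each rule that fails on a requirement, list the missing functions.
    These are the places where an incomplete catalogue blocks a prediction."""
    return {
        phenotype: sorted(required - functions)
        for phenotype, (required, _forbidden) in rules.items()
        if required - functions
    }


genome_functions = {"pathway_1", "pathway_2"}
phenotypes = predict_phenotypes(genome_functions, PHENOTYPE_RULES)
missing = unmet_requirements({"pathway_1"}, PHENOTYPE_RULES)
```

The second helper reflects the completeness question on the slide: a prediction is only as reliable as the functional catalogue behind it, so missing annotations surface as unmet requirements.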
IMG systems data flow: up to Dec 2011
[Diagram labels: Every 4 months; On demand; Monthly; Instructor & Student Tools; Monthly]
IMG systems data flow: May 2012
[Diagram labels]
• 7,989 genomes, 12.6 million genes
• Every 2-3 weeks
• 9,991 public genomes, 22.5 million genes
• 1,293 private genomes, 6.1 million genes
• On demand
• Biweekly
• + 1,077 samples: > 120 studies, + 2.5 billion genes
• Instructor & Student Tools
• Monthly
• 357 samples: > 95 studies, + 140 million genes
IMG development focus
• Large metagenome datasets in IMG/M ER
  • Extended underlying datastore
  • Revision of metagenome analysis tools
• New User Workspace for handling sets of genomes, functions, genes
  • Long-running operations transitioned to background execution mode
• Content update process
  • New genomes added to IMG ER & IMG/M ER at the same time
• Data distribution
• Documentation
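The workspace and background execution items above can be illustrated with a small sketch: named sets of identifiers saved in a workspace, and a set operation submitted as a background job instead of blocking the caller. The `Workspace` class, the set names, and the genome identifiers are hypothetical; the slides do not describe how IMG implements either feature.

```python
from concurrent.futures import ThreadPoolExecutor


class Workspace:
    """Hypothetical user workspace holding named sets of identifiers."""

    def __init__(self):
        self._sets = {}

    def save(self, name, items):
        """Store an immutable copy of the items under the given name."""
        self._sets[name] = frozenset(items)

    def get(self, name):
        return self._sets[name]


def shared_members(workspace, first, second):
    """Stand-in for a long-running comparison: members common to two saved sets."""
    return sorted(workspace.get(first) & workspace.get(second))


ws = Workspace()
ws.save("my_genomes", ["genome_1", "genome_2", "genome_3"])
ws.save("reference_genomes", ["genome_2", "genome_3", "genome_4"])

# Submit the operation to a worker thread; the caller keeps a handle (a Future)
# and collects the result when it is needed.
with ThreadPoolExecutor(max_workers=1) as pool:
    job = pool.submit(shared_members, ws, "my_genomes", "reference_genomes")
    result = job.result()
```

Storing sets as `frozenset` keeps a saved set from changing while a background job is still reading it, which is one simple way to make such jobs safe to run alongside interactive use.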
IMG data distribution
genome.jgi.doe.gov