From Metagenomic Sample to Useful Visual Anna Shcherbina Bioinformatics Challenge Day 02/02/2013 This work is sponsored by the Defense Threat Reduction Agency under Air Force Contract #FA8721-05-C-0002. Opinions, interpretations, recommendations and conclusions are those of the authors and are not necessarily endorsed by the United States Government. Distribution Statement A: Approved for public release; distribution is unlimited.
The Opportunity • NGS instruments have recently given us the ability to characterize the microbiomes that we live in and that live in us. • We can get a step closer to making sense of these microbiomes by creating a visualization program that facilitates manual data curation by a human.
Your Mission • Invent novel visualization approaches to represent metagenomic data. • Subgoals: • Pick out anomalies within a given dataset. • Generate time series representation of multiple datasets. • Compress data efficiently to allow visualization of huge datasets.
The Data (I) Metagenomic datasets (FASTQ format) from clinical and environmental samples. • Metagenome of the human oral cavity under healthy and diseased conditions, with a focus on supragingival dental plaque and cavities. • “oral_healthy” and “oral_diseased” datasets • Roche 454 • Nose/throat swab from Nicaraguan child with acute respiratory illness • “nicaragua” dataset • Illumina
The Data (II) • Skin surface from the palm of a human hand • “palm” dataset • Roche 454 • Human abscess sample of unknown etiology • “abscess” dataset • Illumina • Cultivated corn soil metagenome • “soil” dataset • Illumina
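Since all of the datasets above are distributed as FASTQ, a first step for any custom visualization is reading the records. The following is a minimal sketch, assuming the common four-line-per-record FASTQ layout (header, sequence, "+" separator, quality string); the function name `read_fastq` and the example reads are hypothetical and are not taken from the challenge data.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) from a four-line-per-record FASTQ stream.

    Assumption: sequences and qualities are not wrapped across lines.
    """
    while True:
        header = handle.readline()
        if not header:          # end of file
            return
        header = header.rstrip("\r\n")
        if not header:          # tolerate blank lines between records
            continue
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        # The read ID is the first whitespace-delimited token after '@'.
        yield header[1:].split()[0], seq, qual


# Made-up two-read example, for illustration only.
example = io.StringIO("@read1 sample\nACGT\n+\nIIII\n@read2\nGGCC\n+\n!!!!\n")
records = list(read_fastq(example))
for read_id, seq, qual in records:
    print(read_id, len(seq))
```

For the real files, the same generator can be fed an open file handle instead of the in-memory string.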
Our Processing Pipeline Data is available from each stage of the processing pipeline
Parsed BLAST File Example for a Single Hit
• Query Name: S62.141238_159200
• Query Strand: +
• Query Start: 1
• Query End: 232
• Query Organism: Neisseria meningitidis
• Query Taxonomy: Bacteria; Proteobacteria; Betaproteobacteria;
• Identities: 232
• Percent: 100
• Number Gaps: 0
• Number Characters: 0
• Target Name: GU561418
• Target Strand: -
• Target Start: 47
• Target End: 278
• Target Organism: Neisseria subflava
• Target Taxonomy: Bacteria;Proteobacteria;Betaproteobacteria;Neisseriales;Neisseriaceae;Neisseria.
• Query Sequence: CTGGGCCGTGTCTCAGTCCCAGTGTGGC
• Target Sequence: CTGGGCCGTGTCTCAGTCCCAGTGTGGC
• Analysis Program: BLASTN
• Database: bacteria.gdna
Your Open-Source Toolkit • MEGAN4 • IMG/M • KRONA (included with PhymmBL) • MG-RAST • METAREP • Mothur • Feel free to use any additional tools you think are useful.
MEGAN4-MEtaGenome ANalyzer • A simple lowest common ancestor algorithm assigns reads to taxa. • Taxonomic level reflects the degree of conservation of a sequence. • Dissects large datasets without assembly or the targeting of specific phylogenetic markers. • Graphical and statistical output for comparing different datasets.
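The lowest common ancestor idea can be illustrated with a short sketch. This is a simplified stand-in, not MEGAN4's implementation: it assumes each hit carries a semicolon-delimited taxonomy string like the one in the parsed BLAST example, and it ignores the score thresholds and taxonomy tree that a real tool would use. The function names and the second lineage are hypothetical.

```python
from typing import List, Sequence


def parse_lineage(taxonomy: str) -> List[str]:
    """Split 'Bacteria;Proteobacteria;...' into ranks, dropping a trailing '.'."""
    cleaned = taxonomy.strip().rstrip(".")
    return [rank.strip() for rank in cleaned.split(";") if rank.strip()]


def lowest_common_ancestor(lineages: Sequence[Sequence[str]]) -> List[str]:
    """Return the longest prefix of ranks shared by every lineage."""
    if not lineages:
        return []
    shared: List[str] = []
    for ranks in zip(*lineages):  # stops at the shortest lineage
        if all(rank == ranks[0] for rank in ranks):
            shared.append(ranks[0])
        else:
            break
    return shared


# Two hits for one read: the first lineage is from the parsed BLAST example,
# the second is made up for illustration.
hits = [
    "Bacteria;Proteobacteria;Betaproteobacteria;Neisseriales;Neisseriaceae;Neisseria.",
    "Bacteria; Proteobacteria; Gammaproteobacteria; Enterobacteriales",
]
lca = lowest_common_ancestor([parse_lineage(h) for h in hits])
print(";".join(lca))
```

A read whose hits agree down to genus is assigned at genus level, while a read with hits scattered across classes is assigned higher up, which is how the taxonomic level ends up reflecting sequence conservation.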
MEGAN4-MEtaGenome ANalyzer [Figure panels: Oral Diseased Virus; Oral Healthy Virus; Oral Diseased Bacteria; Oral Healthy Bacteria]
MEGAN4-MEtaGenome ANalyzer [Figure panels: Oral Healthy vs. Oral Diseased Virus; Oral Healthy vs. Oral Diseased Bacteria]
IMG/M – Integrated Microbial Genomes with Microbiome Samples • Web interface: http://img.jgi.doe.gov/cgi-bin/m/main.cgi source: http://img.jgi.doe.gov/m/doc/about_index.html
IMG/M Phylogenetic Distribution of Genes Based on Distribution of BLAST Hits source: http://img.jgi.doe.gov/m/doc/about_index.html
IMG/M Abundance Profile Overview source: http://img.jgi.doe.gov/m/doc/about_index.html
KRONA • KRONA allows hierarchical data to be explored with zoomable pie charts. • Excel template or KRONA tools. • Support for several bioinformatics tools and raw data formats. source: http://sourceforge.net/p/krona/home/krona/
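One way to feed hierarchical taxonomy data into a chart like this is to collapse per-read lineages into counts per lineage. The sketch below assumes a tab-delimited text layout (a count followed by one hierarchy level per column), which is my understanding of what the KRONA tools text importer accepts; verify against the KRONA documentation before relying on it. The function name and the example lineages are hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def krona_text_lines(lineages: Iterable[str]) -> List[str]:
    """Collapse semicolon-delimited lineages into 'count<TAB>level1<TAB>level2...' lines."""
    counts: Counter = Counter()
    for taxonomy in lineages:
        cleaned = taxonomy.strip().rstrip(".")
        ranks = tuple(r.strip() for r in cleaned.split(";") if r.strip())
        if ranks:  # skip empty taxonomy strings
            counts[ranks] += 1
    # Sort by lineage so the output is deterministic.
    return ["\t".join([str(n), *ranks]) for ranks, n in sorted(counts.items())]


# Made-up per-read assignments, for illustration only.
assignments = [
    "Bacteria;Proteobacteria;Betaproteobacteria",
    "Bacteria;Proteobacteria;Betaproteobacteria",
    "Bacteria;Firmicutes",
]
lines = krona_text_lines(assignments)
print("\n".join(lines))
```

Writing `lines` to a file gives a compact summary that is far smaller than the per-read data, which also helps with the goal of visualizing huge datasets.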
MG-RAST Oral Diseased source: http://blog.metagenomics.anl.gov/
MG-RAST Oral Healthy source: http://blog.metagenomics.anl.gov/
MG-RAST Oral Healthy Oral Diseased source: http://blog.metagenomics.anl.gov/
JCVI Metagenomics Reports (METAREP) • A Web 2.0 application to analyze and compare annotated metagenomic datasets. • Compare absolute and relative counts of multiple datasets at various functional and taxonomic levels. • Statistical tests, multidimensional scaling, heatmap and hierarchical clustering plots. [Figure panels: Heatmap Plot; Hierarchical Clustering Plot; METASTAT Results] source: http://blogs.jcvi.org/tag/metarep/
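The difference between absolute and relative counts matters when datasets differ in sequencing depth. The sketch below shows the basic arithmetic only; it is not METAREP's code, and the function names and the counts are made up for illustration.

```python
from typing import Dict, Tuple


def relative_counts(absolute: Dict[str, int]) -> Dict[str, float]:
    """Convert absolute read counts per taxon into fractions of the dataset total."""
    total = sum(absolute.values())
    if total == 0:
        return {taxon: 0.0 for taxon in absolute}
    return {taxon: n / total for taxon, n in absolute.items()}


def compare_datasets(
    a: Dict[str, int], b: Dict[str, int]
) -> Dict[str, Tuple[float, float]]:
    """Pair up relative abundances for every taxon seen in either dataset."""
    rel_a, rel_b = relative_counts(a), relative_counts(b)
    taxa = sorted(set(a) | set(b))
    return {t: (rel_a.get(t, 0.0), rel_b.get(t, 0.0)) for t in taxa}


# Hypothetical counts; NOT taken from the oral_healthy / oral_diseased datasets.
healthy = {"Streptococcus": 60, "Neisseria": 40}
diseased = {"Streptococcus": 150, "Veillonella": 50}
comparison = compare_datasets(healthy, diseased)
for taxon, (h, d) in comparison.items():
    print(f"{taxon}\t{h:.2f}\t{d:.2f}")
```

In this made-up example the absolute Streptococcus count more than doubles between datasets, yet its relative share only moves from 0.60 to 0.75, which is why both views are worth plotting.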
Mothur: 16S rRNA Sequence Analysis • A single platform for sequence alignment, pairwise distance calculation, distance matrix analysis. • Venn diagrams, community trees, heat maps, sample-based rarefaction curves.