
Metagenome analysis

Metagenome analysis. Natalia Ivanova. MGM Workshop February 2, 2012. 1. Metagenome definitions: a refresher course. Metagenome is a collective genome of microbial community, AKA microbiome (native, enriched, sorted, etc.).


Presentation Transcript


  1. Metagenome analysis. Natalia Ivanova. MGM Workshop, February 2, 2012.

  2. 1. Metagenome definitions: a refresher course

  3. Metagenome definitions. A metagenome is the collective genome of a microbial community, AKA microbiome (native, enriched, sorted, etc.). A metagenomic library (or libraries) is constructed from isolated DNA (native, enriched, etc.). A metagenomic library can be single-end (AKA standard) or paired-end.

  4. Metagenome definitions. A single-end (standard) metagenomic library will produce contigs upon assembly (i.e. longer sequences based on overlap between reads); any Ns found in contigs correspond to low-quality bases. A paired-end metagenomic library will produce scaffolds upon assembly (non-contiguous joining of reads based on read-pair information); Ns found in scaffolds correspond either to low-quality bases or to gaps of unknown size. [Figure: a double-stranded contig, ATGCAAAGGCCGCATCCAGCAGGTT / TACGTTTCCGGCGTAGGTCGTCCAA, next to a scaffold in which the same sequence is split into ATGCAAAGGCCGCATCC / TACGTTTCCGGCGTAGG and AGCAGGTT / TCGTCCAA, joined by the gap NNNNNN]
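The contig/scaffold distinction on this slide can be illustrated with a small sketch that splits a scaffold back into contig-like pieces at runs of N. The function name `split_scaffold` and the `min_gap` parameter are hypothetical, not part of any tool named in the presentation. Because an N can also mark a low-quality base rather than a gap, a minimum run length is one possible heuristic for telling the two apart.

```python
import re


def split_scaffold(scaffold: str, min_gap: int = 1) -> list[str]:
    """Split a scaffold into pieces at runs of N that are at least min_gap long.

    Shorter runs of N are left in place, on the assumption (not from the
    slides) that they are low-quality bases rather than gaps.
    """
    gap = re.compile(r"[Nn]{%d,}" % min_gap)
    return [piece for piece in gap.split(scaffold) if piece]


# The scaffold drawn on the slide: two pieces joined by a gap of six Ns.
scaffold = "ATGCAAAGGCCGCATCC" + "NNNNNN" + "AGCAGGTT"
contigs = split_scaffold(scaffold)           # two pieces
unsplit = split_scaffold(scaffold, min_gap=10)  # run of 6 Ns is kept
```

Raising `min_gap` keeps short N runs inside a piece, which is only a rough stand-in for the quality information a real assembler would use.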

  5. Amplified and Unamplified Libraries. [Figure: side-by-side library preparation workflows for an amplified and an unamplified library. Steps named in the figure: fragmentation (1 µg), double SPRI, end repair / phosphorylation, SPRI clean, A-tailing with Klenow exo-, heat inactivation, adaptor ligation, PCR 10-cycle amplification, DNA Chip, qPCR quantification]

  6. Metagenome definitions (contd): overlap = alignment of reads at x% identity. Unless the community has very low complexity (i.e. is dominated by one or a few clonal populations), assembly at 100% nucleotide identity will be very fragmented. What to do with k-mer based assemblies? Use multiple k-mer settings, then combine the assemblies with an overlap-layout-consensus assembler such as minimus2, using a minimum identity of 95%. There is a tradeoff between overlap length and % identity.
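To make "overlap = alignment of reads at x% identity" concrete, here is a minimal sketch that looks for the longest suffix-prefix overlap between two sequences at a given identity threshold. It is ungapped and brute-force, so it is not how minimus2 or any real assembler computes overlaps. The name `best_overlap` and its parameters are assumptions made for illustration.

```python
def best_overlap(left: str, right: str, min_len: int = 5, min_identity: float = 0.95):
    """Return (length, identity) of the longest ungapped overlap between the
    end of `left` and the start of `right` that reaches min_identity.

    Returns None when no overlap of at least min_len qualifies.
    """
    max_len = min(len(left), len(right))
    for length in range(max_len, min_len - 1, -1):
        suffix = left[-length:]
        prefix = right[:length]
        matches = sum(a == b for a, b in zip(suffix, prefix))
        identity = matches / length
        if identity >= min_identity:
            return length, identity
    return None


left = "ACGTACGTTTGACCA"
exact = best_overlap(left, "TTGACCAGGCTA")                           # exact 7-base overlap
strict = best_overlap(left, "TTGTCCAGGCTA", min_identity=1.0)        # one mismatch: rejected
relaxed = best_overlap(left, "TTGTCCAGGCTA", min_identity=0.85)      # one mismatch: accepted
```

With one mismatch in the overlap, the strict setting finds nothing while the relaxed one joins the sequences. This is the tradeoff the slide mentions: a lower identity threshold yields more joins but risks merging sequences from different populations.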

  7. Reasoning behind combining multiple assemblies

  8. A snapshot of older (454-Illumina) metagenome assembly pipeline. [Figure: Assembly Pipeline v.0.9, with annotations: trimming does not appear to be ideal for this process; CPU-time intensive, no known metagenomic k-mer prediction algorithm; picking the best k-mer is a manual process]

  9. Metagenome definitions (contd): overlap = alignment of reads at x% identity. Assembly of sequences at less than 100% identity => population contigs and scaffolds representing a consensus sequence of a species population. [Figure labels: isolate contig; species population contigs]

  10. 2 more important definitions. 1. Sequence coverage (AKA read depth): how many times each base has been sequenced => needs to be considered when calculating protein family abundance. It can be reported as per-contig average coverage or as per-base coverage => per-gene coverage. 2. Bins: scaffolds, contigs and unassembled reads can be binned into sets of sequences (bins) that likely originated from the same species population or from a population of a broader taxonomic lineage.
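The two coverage summaries named on this slide can be sketched as follows, assuming per-base read depth is already available as a list of integers for one contig. The function names and the 0-based, half-open gene coordinates are illustrative choices, not a convention taken from IMG/M.

```python
def per_contig_average(per_base_depth):
    """Average read depth over every base of one contig."""
    return sum(per_base_depth) / len(per_base_depth)


def per_gene_coverage(per_base_depth, genes):
    """Average read depth over each gene.

    genes maps a gene name to (start, end) in 0-based, half-open coordinates.
    """
    return {
        name: sum(per_base_depth[start:end]) / (end - start)
        for name, (start, end) in genes.items()
    }


# Hypothetical contig of 10 bases with two genes on it.
depth = [2, 2, 4, 4, 4, 4, 6, 6, 6, 6]
genes = {"geneA": (0, 4), "geneB": (6, 10)}

contig_avg = per_contig_average(depth)      # 4.4
gene_cov = per_gene_coverage(depth, genes)  # geneA: 3.0, geneB: 6.0
```

Here the contig average (4.4) misrepresents both genes. This is why per-gene coverage, derived from per-base coverage, is the better input when estimating protein family abundance.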

  11. What IMG does and doesn’t do
     • Scaffolds and contigs are generated by assembly – not provided in IMG/M.
     • Sequence coverage can be computed by the assembler based on the alignments it generates (preferable) or can be added later by aligning reads to contigs – the latter can be provided in IMG/M.
     • Bins are generated by binning software – not provided in IMG/M.
     • Scaffolds, contigs and unassembled reads are annotated with non-coding RNAs, repeats (CRISPRs), and protein coding genes (CDSs); the latter are assigned to protein families (COGs, Pfams, TIGRfams, KEGG Orthology, EC numbers, internal clusters) – provided in IMG/M.

  12. What’s the difference between IMG and MG-RAST, IMG and CAMERA?
     • We prefer to assemble the data: longer sequences -> better quality of gene prediction and functional annotation; longer sequences -> chromosomal context and binning -> population-level analysis.
     • But we don’t provide assembly services, except for metagenomes sequenced at the JGI: we may be able to help with assembly of 454 data; we’re not equipped to assemble massive amounts of Illumina data. http://galaxy.jgi-psf.org Contact person: Ed Kirton, ESKirton@lbl.gov
     • IMG does not provide tools for analysis of 16S data from the metagenome itself: we do assembly -> none of the assembled 16S sequences is reliable; BLASTn of reads matching conserved regions is misleading; we do pyrotags for every metagenome sequenced at the JGI. http://pyrotagger.jgi-psf.org

  13. 2. IMG/M features: divide and conquer (see also IMG/M -> Using IMG/M -> Using IMG/M -> IMG User Guide and IMG/M Addendum) http://img.jgi.doe.gov/m http://img.jgi.doe.gov/mer username: public password: public

  14. IMG/M User Interface Map: About IMG/M -> Using IMG/M -> User Interface Map

  15. Dividing the contigs by GC content or length • Statistics Microbiome Details -> Genome Statistics -> DNA Scaffolds • Search Microbiome Details -> Scaffold Search
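Outside the IMG/M interface, the same kind of division can be sketched in a few lines, assuming the contigs are held in a dictionary of name to sequence. The function names, thresholds, and example sequences below are hypothetical.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous (A, C, G, T) bases."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def divide_contigs(contigs, min_length=0, gc_range=(0.0, 1.0)):
    """Keep contigs that are at least min_length long and whose GC content
    falls inside gc_range (inclusive)."""
    low, high = gc_range
    return {
        name: seq
        for name, seq in contigs.items()
        if len(seq) >= min_length and low <= gc_content(seq) <= high
    }


contigs = {
    "c1": "ATATATATAT",  # GC 0.0
    "c2": "GCGCGCATGC",  # GC 0.8
    "c3": "GCGC",        # GC 1.0, but short
}
high_gc = divide_contigs(contigs, min_length=8, gc_range=(0.6, 1.0))
```

Only `c2` passes both filters in this toy input. Ns are ignored when computing GC so that scaffold gaps do not dilute the value.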

  16. Dividing the genes phylogenetically: Phylogenetic Distribution • Phylogenetic Distribution of Genes Microbiome Details -> Phylogenetic Distribution of Genes Components: • histograms • Protein Recruitment Plots • summary statistics tables • lists of genes [Figure: navigation diagram linking the histograms (phylum/class, family, species) to gene counts, gene lists, summary statistics and recruitment plots]

  17. Dividing the contigs: Scaffold Cart • Lists of contigs or genes in Gene Cart E. g. Microbiome Details -> Genome Statistics -> DNA Scaffolds -> scaffold counts Scaffold Cart Features: • Scaffold Export • Adding all genes to Gene Cart • Function Profile (against functions in Function Cart) • Histograms by GC content, length and gene count • Phylogenetic Distribution

  18. All Carts in IMG are interconnected Gene Cart Scaffold Cart Function Cart

  19. Dividing the genes by abundance/ by function • Abundance Profiles Compare Genomes -> Abundance Profiles Tools Components: Common parameters: • Normalization (none/scale for size) • Type of count (raw counts/estimated gene copies) • Type of protein family (COG, Pfam, Enzyme, TIGRfam)
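The common parameters listed on this slide can be illustrated with a rough sketch. It makes two assumptions that do not come from the slides: that "estimated gene copies" means weighting each gene by its read depth, and that "scale for size" means dividing by the sample total. The function `abundance_profile` and the input format are hypothetical and do not reflect how IMG/M computes its profiles.

```python
from collections import defaultdict


def abundance_profile(genes, use_estimated_copies=False, scale_for_size=False):
    """Summarize protein family abundance for one sample.

    genes is an iterable of (family_id, read_depth) pairs, one per gene.
    Raw counts add 1 per gene; estimated copies add the gene's read depth
    (an assumption about what the slide means by "estimated gene copies").
    """
    profile = defaultdict(float)
    for family, depth in genes:
        profile[family] += depth if use_estimated_copies else 1.0

    if scale_for_size:
        total = sum(profile.values())
        if total > 0:
            # Assumed meaning of "scale for size": fraction of the sample total.
            return {family: value / total for family, value in profile.items()}
    return dict(profile)


# Hypothetical sample: two genes in COG0001 at low depth, one in COG0002 at high depth.
sample = [("COG0001", 2.0), ("COG0001", 4.0), ("COG0002", 10.0)]

raw = abundance_profile(sample)
estimated = abundance_profile(sample, use_estimated_copies=True)
scaled = abundance_profile(sample, use_estimated_copies=True, scale_for_size=True)
```

In this toy sample, raw counts rank COG0001 above COG0002 (2 genes vs 1), while depth-weighted counts reverse the order (6 vs 10). That reversal is the reason coverage has to be considered when comparing family abundance.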

  20. Other tools • Phylogenetic Marker COGs Find Functions -> Phylogenetic Marker COGs • SNP BLAST and SNP Vista Gene Page -> SNP BLAST -> SNP VISTA IMG/M exercises: http://genomebiology.jgi-psf.org/Content/MGM-11.Feb2012/agenda.html The first 3 pages are questions without answers; the rest is a cheat sheet

  21. Life outside IMG: binning tools
     Alignment-based tools
     • MEGAN – BLAST+LCA http://www-ab.informatik.uni-tuebingen.de/software/megan
     • MTR – BLAST+ MTR http://cs.ru.nl/gori/software/MTR.tar.gz
     • SOrt-ITEMS – processed BLAST best hit http://metagenomics.atc.tcs.com/binning/SOrt-ITEMS
     • CARMA and Web-CARMA – MSA + neighbor-joining tree http://webcarma.cebitec.uni-bielefeld.de
     Compositional tools
     • PhyloPythia – 6-mers, SVM http://cbcsrv.watson.ibm.com/phylopythia.html
     • TACOA – 2-6 mers, k-nearest neighbor classifier http://www.cebitec.uni-bielefeld.de/brf/tacoa/tacoa.html
     • Phymm and PhymmBL – Interpolated Markov models (IMMs) http://www.cbcb.umd.edu/software/phymm/
     • ClaMS – DOR, DBC http://clams.jgi-psf.org
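The "BLAST+LCA" idea listed for MEGAN can be sketched as a lowest-common-ancestor computation over the taxonomic lineages of a read's BLAST hits. This is a simplified illustration, not MEGAN's actual implementation, which also applies score and support thresholds. The lineages below are hypothetical hits written root-first, and the function name is an assumption.

```python
def lowest_common_ancestor(lineages):
    """Return the longest shared root-first prefix of the given lineages.

    Each lineage is a list of taxon names ordered from the root downwards.
    """
    if not lineages:
        return []
    shared = []
    for taxa_at_rank in zip(*lineages):
        if all(taxon == taxa_at_rank[0] for taxon in taxa_at_rank):
            shared.append(taxa_at_rank[0])
        else:
            break
    return shared


# Hypothetical lineages of three BLAST hits for a single read.
hits = [
    ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Escherichia"],
    ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Pseudomonas"],
    ["Bacteria", "Proteobacteria", "Alphaproteobacteria", "Rhizobium"],
]
assignment = lowest_common_ancestor(hits)
```

The read is placed at Proteobacteria, the deepest taxon that all three hits agree on. A read with conflicting hits therefore gets a conservative, higher-level assignment rather than a possibly wrong species.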

  22. Life outside IMG: statistical analysis tools
     Comparison of 2 samples
     • MEGAN - http://www-ab.informatik.uni-tuebingen.de/software/megan
     • STAMP - http://kiwi.cs.dal.ca/Software/STAMP
     Comparison of sets of samples
     • ShotgunFunctionalizeR – R package for statistical analysis - http://shotgun.zool.gu.se
     • METAREP – package from JCVI, includes multidimensional scaling, hierarchical clustering, etc. - http://www.jcvi.org/metarep
     • METASTATS – package for analysis of paired samples with replicates - http://metastats.cbcb.umd.edu/
     • LEfSE – package for comparison of multiple classes of samples with replicates - http://huttenhower.sph.harvard.edu/lefse/
