Metagenomics and the microbiome

lucia

Presentation Transcript


  1. Metagenomics and the microbiome

  2. What is metagenomics? • Looking at microorganisms via genomic sequencing rather than culturing • Environmental use case: ag, biofuels, pollution monitoring • Health use case: The human microbiome

  3. Why care about the microbiome? • You = 10^13 of your own cells + 10^14 bacterial cells (a long-cited estimate; later work puts the ratio closer to 1:1) • More actionable genomics Source: http://www.med-health.net/Best-Time-To-Take-Probiotics.html http://www.mayo.edu/research/labs/gut-microbiome/projects/fecal-microbiota-transplant-c-diff-colitis

  4. Why care about the microbiome? • Diagnostic or modulatory implications in: • Obesity, Diabetes, Fatigue, Pain disorders • Anxiety, Depression, Autism • Antibiotic resistant bacteria • IBD and other gut disorders • Cardiac function, cancer

  5. Diseases and the microbiome Source: The human microbiome: at the interface of health and disease. Nature Reviews Genetics

  6. Why care about the microbiome? Publications containing ‘microbiome’ by date on ScienceDirect

  7. Goal 1: Composition Source: The human microbiome: at the interface of health and disease, Nature Reviews Genetics http://huttenhower.sph.harvard.edu/metaphlan

  8. Diversity measures • Alpha diversity: how diverse is a single population? Simpson’s index, Shannon’s index, etc. • Example: the difference in alpha diversity before and after antibiotics • Beta diversity: taxonomic similarity between two samples • Example: finding compositional associations between a disease cohort and microbial makeup
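
The following minimal Python sketch covers the diversity measures above. It computes three quantities:
- Shannon's index, using the natural log.
- Simpson's index in its Gini-Simpson form, 1 - Σp². Other conventions are also in use, such as Σp² itself or its reciprocal.
- Bray-Curtis dissimilarity. The slide does not name a beta-diversity metric, so Bray-Curtis is an assumption, chosen as one common option.

The genus counts in the example are hypothetical.

```python
import math
from typing import Dict, Sequence

def shannon_index(counts: Sequence[int]) -> float:
    """Shannon index H = -sum(p_i * ln p_i), skipping taxa with zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def simpson_index(counts: Sequence[int]) -> float:
    """Gini-Simpson index 1 - sum(p_i^2): chance that two reads drawn with
    replacement come from different taxa."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def bray_curtis(a: Dict[str, int], b: Dict[str, int]) -> float:
    """Bray-Curtis dissimilarity: 0 = identical composition, 1 = no shared taxa."""
    taxa = set(a) | set(b)
    shared = sum(min(a.get(t, 0), b.get(t, 0)) for t in taxa)
    return 1.0 - 2.0 * shared / (sum(a.values()) + sum(b.values()))

if __name__ == "__main__":
    # Hypothetical genus-level read counts for one person before and after antibiotics.
    before = {"Bacteroides": 40, "Prevotella": 30, "Faecalibacterium": 20, "Akkermansia": 10}
    after = {"Bacteroides": 90, "Prevotella": 10}
    print(round(shannon_index(list(before.values())), 3))  # 1.28
    print(round(shannon_index(list(after.values())), 3))   # 0.325
    print(round(simpson_index(list(before.values())), 3))  # 0.7
    print(round(simpson_index(list(after.values())), 3))   # 0.18
    print(round(bray_curtis(before, after), 3))            # 0.5
```

In this toy case, the post-antibiotic sample has lower alpha diversity on both indices. Its Bray-Curtis value of 0.5 places the pair halfway between identical (0) and completely disjoint (1).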

  9. Sequencing for diversity • Pyrosequencing of the 16S ribosomal RNA gene • Fewer than 10 taxa appear in more than 95% of people in the Human Microbiome Project (HMP) • Recall the implicated diseases: this looks like GWAS, with both the common-disease/small-effect-size and the common-disease/rare-variant patterns

  10. Goal 2: Functional profiling Source: The human microbiome: at the interface of health and disease. Nature reviews genetics

  11. Functional profiling • Current: Which genes are present and are being transcribed • In development: proteomics, metabolomics

  12. Sequencing for function • Whole microbiome sequencing (WMS) • Avoids primer biases and is more kingdom-agnostic • Assembly is hard, especially where reference genomes don’t exist

  13. Two big problems • Can’t understand the body without understanding the microbiome • Can’t understand the microbiome by only looking at bacteria • Read fragment assembly is very hard in metagenomics

  14. Kingdom-Agnostic Metagenomics

  15. The players in your body • Your cells • Metabolites • Bacteria • Bacteriophages • Other viruses • Fungi

  16. That’s not complexity Source: A comprehensive map of the toll‐like receptor signaling network. Molecular Systems Biology

  17. Prokaryotic virome: bacteriophages • Infect bacteria • Transfer genetic material between bacteria • Evolve rapidly • Put constant selection pressure on the bacterial microbiome

  18. Bacteriophages: deep sequencing results • 60% of sequences dissimilar from all sequence databases • More than 80% come from 3 families • Little intrapersonal variation • Large interpersonal variation, even among relatives • Diet affects community structure • Antibiotic resistance genes found in viral material

  19. Bacteriophages and function • Cross the intestinal barrier, possibly affecting the systemic immune response • Adhere to mucin glycoproteins, potentially triggering an immune response in the gut epithelium • IBD/Crohn’s: relative increase in Caudovirales bacteriophages • Affect bacterial composition and/or the host directly

  20. Eukaryotic virome • Fecal samples from healthy children show a complex community of typically pathogenic viruses • Includes plant RNA viruses from food • Anelloviruses and circoviruses are present in nearly 100% of children by age 5, likely from industrial agriculture

  21. Eukaryotic viruses and function • Simian immunodeficiency virus (SIV) experiments showed expansion of the enteric virome • Increased gut permeability and inflammation of the intestinal lining • Subjects with acute diarrhea showed novel and highly divergent viruses, with less than 35% amino-acid similarity to catalogued viruses

  22. Meiofauna • Fungi, protozoa, and helminths (worms) • No experiments have yet sampled to saturation, so much more work remains • 18S sequencing found 66 genera of fungi in the gut, and fungi were present in 100% of samples • Most subjects had fewer than 10 genera • But high fungal diversity is bad: it increases in IBD and with antibiotic use

  23. But it’s very hard • Amplicon-based methods don’t work well for viruses, which lack a universally conserved marker gene like 16S • Heterogeneous sample prep is required • Genome sizes differ enormously, from a few kb in viruses to 100+ Mb in fungi • Small genomes plus high divergence require lots of coverage to assemble contigs

  24. Getting the whole picture Source: Meta'omic Analytic Techniques for Studying the Intestinal Microbiome. Gastroenterology.

  25. The assembly problem

  26. Isn’t assembly easy? • Recall: 500-1000 species of bacteria in the gut, but about 30 of them make up 99% of composition • 33% of bacterial microbiome not well-represented in reference databases, > 60% for bacteriophages

  27. Coverage • Coverage C: mean number of reads covering each base, $C = LN/G$ • L = read length, N = number of reads, G = genome size • Problem: with 2nd-gen WMS technologies, L is low and G is astronomical or unknown • Thus, “full or sometimes even adequate coverage may be unattainable” Source: A primer on metagenomics
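
A small Python sketch of the coverage arithmetic follows. It uses the standard formula $C = LN/G$ for the slide's L, N and G. It adds the Lander-Waterman (Poisson) approximation that a fraction $1 - e^{-C}$ of bases is covered by at least one read. Applying it to a single community member, by scaling N with that member's share of the DNA, is a simplifying assumption: it presumes uniform sampling with no GC or extraction bias. The run parameters are hypothetical.

```python
import math

def mean_coverage(read_length: float, n_reads: float, genome_size: float) -> float:
    """Mean coverage C = L * N / G (average number of reads over each base)."""
    return read_length * n_reads / genome_size

def fraction_covered(c: float) -> float:
    """Lander-Waterman / Poisson estimate of the fraction of bases hit by >= 1 read."""
    return 1.0 - math.exp(-c)

def organism_coverage(total_reads: float, rel_abundance: float,
                      read_length: float, genome_size: float) -> float:
    """Coverage of one community member, assuming (hypothetically) that its reads
    are drawn in proportion to its share of the DNA and uniformly along its genome."""
    return mean_coverage(read_length, total_reads * rel_abundance, genome_size)

if __name__ == "__main__":
    # Hypothetical run: 10 million 100 bp reads; a 5 Mb genome making up 0.1% of the DNA.
    c = organism_coverage(10_000_000, 0.001, 100, 5_000_000)
    print(f"coverage = {c:.2f}x, fraction covered = {fraction_covered(c):.3f}")
    # prints: coverage = 0.20x, fraction covered = 0.181
```

Even a 10-million-read run leaves this rare, hypothetical member at 0.2x coverage, with roughly 82% of its genome unseen.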

  28. Sequence length and discovery Source: A primer on metagenomics

  29. All is not lost • Rarefaction curves can be used to estimate our coverage
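
A rarefaction curve plots the expected number of distinct taxa against the number of reads subsampled. The sketch below computes it with the standard hypergeometric (Hurlbert-style) expectation rather than by repeated random subsampling. The function names and the per-taxon counts in the example are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

def expected_richness(counts: Sequence[int], n: int) -> float:
    """Expected number of distinct taxa in a random subsample of n reads, drawn
    without replacement from a sample with the given per-taxon read counts."""
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("n must be between 0 and the total number of reads")
    denom = math.comb(total, n)
    # A taxon with c reads is missed only if all n reads come from the other total - c.
    return sum(1.0 - math.comb(total - c, n) / denom for c in counts if c > 0)

def rarefaction_curve(counts: Sequence[int], step: int) -> List[Tuple[int, float]]:
    """(subsample size, expected richness) pairs from step up to the full sample."""
    total = sum(counts)
    return [(n, expected_richness(counts, n)) for n in range(step, total + 1, step)]

if __name__ == "__main__":
    counts = [50, 20, 10, 5, 5, 3, 2, 2, 1, 1, 1]  # hypothetical reads per taxon
    for n, s in rarefaction_curve(counts, 10):
        print(n, round(s, 2))
```

If expected richness is still climbing at the full sample size, more taxa probably remain unsampled. A plateau suggests the sample is close to saturation.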

  30. All is not lost • For composition analysis, the phylogenetic marker regions (18S, 16S) work pretty well • For functional analysis, open reading frames (ORFs) can still be found fairly reliably and aligned to homologs in databases • Failing this, clustering and motif-finding yield some information
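
Naive ORF finding scans all six reading frames for an ATG start codon followed in frame by a stop codon (TAA, TAG, TGA). It keeps stretches above a minimum length. The sketch below assumes the standard genetic code and ignores alternative start codons. More importantly for short metagenomic reads, it also ignores partial ORFs that run off either end of a read. Dedicated gene callers such as Prodigal (in its metagenomic mode) or FragGeneScan are designed to handle those cases. The example read is hypothetical.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumption)

def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string; non-ACGT characters are left unchanged."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_orfs(seq: str, min_codons: int = 100) -> List[Tuple[str, int, int]]:
    """Naive six-frame scan for ATG ... in-frame stop, at least min_codons long
    (stop codon included). Returns (strand, start, end): 0-based, end-exclusive
    coordinates on the strand that was scanned."""
    seq = seq.upper()
    orfs: List[Tuple[str, int, int]] = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG in this frame opens an ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3))
                    start = None  # look for the next ATG downstream
    return orfs

if __name__ == "__main__":
    read = "CCATGAAACCCGGGTTTTAGCC"  # hypothetical read
    print(find_orfs(read, min_codons=3))  # [('+', 2, 20)]
```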

  31. Different sequencing approaches? • Future: single-cell microfluidics • Now: hybrid long/short-read approaches, “finishing” with Sanger sequencing • Pacific Biosciences SMRT approach • SMRT errors are random and unbiased • De novo assembly is 99.999% concordant with reference genomes

  32. HGAP (Hierarchical Genome-Assembly Process): the SMRT assembly algorithm • Select the longest reads as seeds • Use seed reads to recruit short reads • Assemble using off-the-shelf assembly tools • Refine the assembly using sequencer metadata Source: Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data. Nature Methods

  33. Seed selection • Order reads by length • Consider reads above a length cutoff L ~ 6 kb • Rough end-pair align reads until ~20x coverage is reached • 17.7k seed reads, averaging 7.2 kb in length, already at 86.9% accuracy compared to the reference
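
One plausible reading of the seed-selection step can be sketched in Python. Sort the reads by length and take the longest ones until their total length reaches a target multiple (here ~20x) of the expected genome size. The shortest read taken then sets the cutoff L. The helper's name and interface are hypothetical, and this is not HGAP's actual code. It only shows how a length cutoff falls out of a coverage target.

```python
from typing import List, Tuple

def select_seed_reads(read_lengths: List[int], genome_size: int,
                      target_coverage: float = 20.0) -> Tuple[int, List[int]]:
    """Hypothetical helper: take the longest reads until they sum to
    target_coverage * genome_size bases. Returns (length cutoff, seed lengths)."""
    target_bases = target_coverage * genome_size
    seeds: List[int] = []
    total = 0
    for length in sorted(read_lengths, reverse=True):
        if total >= target_bases:
            break
        seeds.append(length)
        total += length
    # The cutoff L is the length of the shortest read that made it into the seed set.
    cutoff = seeds[-1] if seeds else 0
    return cutoff, seeds

if __name__ == "__main__":
    # Toy numbers: a 1 kb "genome" and five reads; a 20x target needs 20,000 seed bases.
    cutoff, seeds = select_seed_reads([1000, 8000, 6000, 7000, 2000], genome_size=1000)
    print(f"cutoff = {cutoff} bp, {len(seeds)} seed reads")  # cutoff = 6000 bp, 3 seed reads
```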

  34. Recruiting short reads • Align all reads to the seed reads • Each read can be mapped to multiple seed reads, controlled by the -bestn parameter • -bestn must be chosen so that the coverage of seeds plus aligned short reads roughly equals the expected coverage of the sequenced genome • Use multiple sequence alignment (MSA) and consensus to error-correct the long reads • Result: 17.2k reads averaging 5.7 kb, with 99.9% accuracy
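
The error-correction idea can be sketched as a per-column majority vote over reads already aligned to a seed. This simplified stand-in assumes the alignment and the -bestn read-to-seed assignment have already produced equal-length gapped rows. HGAP's actual consensus step is more sophisticated, and the function below is hypothetical.

```python
from collections import Counter
from typing import List

def column_consensus(aligned_rows: List[str], gap: str = "-") -> str:
    """Majority-vote consensus over the columns of a multiple sequence alignment.
    Rows must be equal-length gapped strings. Columns whose most common symbol is a
    gap are dropped; ties go to the symbol seen first in the column, because
    Counter.most_common orders equal counts by first appearance."""
    if not aligned_rows:
        return ""
    width = len(aligned_rows[0])
    if any(len(row) != width for row in aligned_rows):
        raise ValueError("all aligned rows must have the same length")
    consensus = []
    for column in zip(*aligned_rows):
        symbol, _count = Counter(column).most_common(1)[0]
        if symbol != gap:
            consensus.append(symbol)
    return "".join(consensus)

if __name__ == "__main__":
    # Hypothetical pile-up of four reads over the same region, each with its own errors.
    rows = [
        "ACGT-ACGT",
        "ACGTTACGT",
        "ACCT-ACGT",
        "ACGT-ACGA",
    ]
    print(column_consensus(rows))  # ACGTACGT
```

Each individual error (the inserted T, the G→C, the T→A) is outvoted because the other reads agree at that column. This is why random, unbiased errors are easy to correct with enough depth.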

  35. Overlap layout consensus assembly Source: Overview of Genome Assembly Algorithms. Ntino Krampis. http://www.slideshare.net/agbiotec/overview-of-genome-assembly-algorithms
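
Overlap-layout-consensus (OLC) assembly works in three steps:
1. Find overlaps between reads.
2. Lay the reads out along a path through the resulting overlap graph.
3. Call a consensus sequence.

The sketch below is a deliberately simplified greedy variant. It repeatedly merges the pair of reads with the longest exact suffix-prefix overlap, which collapses overlap and layout into one step. It skips consensus because the toy reads are error-free. Real OLC assemblers build and traverse a full overlap graph and tolerate sequencing errors. The reads are hypothetical.

```python
from typing import List

def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that matches a prefix of b, if at least
    min_len long; otherwise 0."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)  # next spot where b's first bases occur in a
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1

def greedy_assemble(reads: List[str], min_len: int = 3) -> List[str]:
    """Greedy simplification of overlap + layout: repeatedly merge the pair of reads
    with the longest suffix-prefix overlap. Returns the resulting contigs."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    reads = [r for r in reads if not any(r != o and r in o for o in reads)]  # drop contained reads
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return reads  # no overlaps left: each remaining string is a contig
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)] + [merged]

if __name__ == "__main__":
    toy_reads = ["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]  # error-free toy reads
    print(greedy_assemble(toy_reads))  # ['ATTAGACCTGCCGGAATAC']
```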

  36. Refinement • Use the Quiver algorithm, which looks at raw physical data from the sequencer • Uses an HMM and the observed data to classify base calls as genuine or spurious • Do a final consensus alignment, conditioned on Quiver’s probabilities • Final result: 17.2k reads, length of 5.7kb, accuracy of 99.999506%
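
The accuracy figures quoted on slides 33, 34 and 36 are easier to compare on the Phred scale, QV = -10·log10(error rate). This is a common way of reporting consensus quality. The helper below is a small sketch, and its name is hypothetical, not part of any PacBio tool.

```python
import math

def accuracy_to_qv(accuracy: float) -> float:
    """Phred-scaled quality value: QV = -10 * log10(1 - accuracy)."""
    error_rate = 1.0 - accuracy
    if error_rate <= 0.0:
        return math.inf  # a perfect match has no finite QV
    return -10.0 * math.log10(error_rate)

if __name__ == "__main__":
    # Accuracies quoted for seed reads, corrected reads, and the Quiver-refined result.
    for stage, acc in [("seed reads", 0.869), ("corrected reads", 0.999), ("after Quiver", 0.99999506)]:
        print(f"{stage}: {acc} -> QV {accuracy_to_qv(acc):.1f}")  # roughly QV 8.8, 30.0, 53.1
```

On this scale, each step of the pipeline is a large jump. The result goes from roughly one error in 8 bases to roughly one in 200,000.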

  37. Summary • By the commonly cited estimate, most of the cells in your body aren’t yours • But looking at bacteria alone is insufficient • Expanding our view means looking for needles in haystacks, which is beyond most conventional approaches • Motif-finding and hybrid approaches will work until third-generation sequencing arrives

  38. References • Cho, Ilseung, and Martin J. Blaser. "The human microbiome: at the interface of health and disease." Nature Reviews Genetics 13.4 (2012): 260-270. • Wooley, John C., Adam Godzik, and Iddo Friedberg. "A primer on metagenomics." PLoS Computational Biology 6.2 (2010): e1000667. • Chin, Chen-Shan, et al. "Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data." Nature Methods 10.6 (2013): 563-569. • Human Microbiome Project Consortium. "Structure, function and diversity of the healthy human microbiome." Nature 486.7402 (2012): 207-214. • Norman, Jason M., Scott A. Handley, and Herbert W. Virgin. "Kingdom-agnostic metagenomics and the importance of complete characterization of enteric microbial communities." Gastroenterology 146.6 (2014): 1459-1469. • Morgan, X. C., and C. Huttenhower. "Meta'omic Analytic Techniques for Studying the Intestinal Microbiome." Gastroenterology (2014).
