Metagenomics and the microbiome
What is metagenomics? • Looking at microorganisms via genomic sequencing rather than culturing • Environmental use case: ag, biofuels, pollution monitoring • Health use case: The human microbiome
Why care about microbiome? • You = 10^13 of your own cells + 10^14 bacterial cells • More actionable genomics Source: http://www.med-health.net/Best-Time-To-Take-Probiotics.html http://www.mayo.edu/research/labs/gut-microbiome/projects/fecal-microbiota-transplant-c-diff-colitis
Why care about microbiome? • Diagnostic or modulatory implications in: • Obesity, Diabetes, Fatigue, Pain disorders • Anxiety, Depression, Autism • Antibiotic resistant bacteria • IBD and other gut disorders • Cardiac function, cancer
Diseases and the microbiome Source: The human microbiome: at the interface of health and disease. Nature Reviews Genetics
Why care about microbiome? Publications containing ‘microbiome’ by date on Science Direct
Goal 1: Composition Source: The human microbiome: at the interface of health and disease, Nature Reviews Genetics http://huttenhower.sph.harvard.edu/metaphlan
Diversity measures • Alpha diversity: how diverse is this population? Simpson’s index, Shannon’s index, etc • Difference in alpha diversity before and after antibiotics • Beta diversity: Taxonomical similarity between 2 samples • Finding compositional associations between disease cohort and microbial makeup
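The diversity measures named on this slide can be sketched in a few lines. This is a minimal illustration, not the method of any particular pipeline: it uses the Gini-Simpson form of Simpson's index and Bray-Curtis dissimilarity as one common choice of beta diversity (the slide does not say which measure was used). The function names and the taxon counts are made up for the example.

```python
import math

def shannon_index(counts):
    """Shannon index H = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def simpson_index(counts):
    """Gini-Simpson index 1 - sum(p_i^2): chance that two random reads differ in taxon."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples given as {taxon: count} dicts."""
    taxa = set(a) | set(b)
    shared = sum(min(a.get(t, 0), b.get(t, 0)) for t in taxa)
    return 1.0 - 2.0 * shared / (sum(a.values()) + sum(b.values()))

# Illustrative (invented) read counts before and after an antibiotic course.
before = {"Bacteroides": 50, "Faecalibacterium": 30, "Escherichia": 20}
after = {"Bacteroides": 90, "Escherichia": 10}

print("alpha (Shannon) before:", shannon_index(list(before.values())))
print("alpha (Shannon) after: ", shannon_index(list(after.values())))
print("beta (Bray-Curtis):    ", bray_curtis(before, after))
```

Alpha diversity is computed per sample, so the before/after comparison is just two calls; beta diversity always takes a pair of samples.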
Sequencing for diversity • Pyrosequencing the 16S ribosomal RNA subunit • Fewer than 10 taxa appear in more than 95% of people in the HMP • Recall the implicated diseases. Looks like the GWAS picture: common disease, small effect size + common disease, rare variant
Goal 2: Functional profiling Source: The human microbiome: at the interface of health and disease. Nature Reviews Genetics
Functional profiling • Current: Which genes are present and are being transcribed • In development: proteomics, metabolomics
Sequencing for function • Whole microbiome sequencing • Avoids primer biases and is more kingdom agnostic • Assembly is hard, especially where reference genomes don’t exist
Two big problems • Can’t understand the body without understanding the microbiome • Can’t understand the microbiome by only looking at bacteria • Read fragment assembly is very very hard in metagenomics
The players in your body • Your cells • Metabolites • Bacteria • Bacteriophages • Other viruses • Fungi
That’s not complexity Source: A comprehensive map of the toll‐like receptor signaling network. Molecular Systems Biology
Prokaryotic virome: bacteriophages • Infect bacteria • Transfer genetic material among bacteria • Rapidly evolving • Put constant selection pressure on bacterial microbiome
Bacteriophages: deep sequencing results • 60% of sequences dissimilar from all sequence databases • More than 80% come from 3 families • Little intrapersonal variation • Large interpersonal variation, even among relatives • Diet affects community structure • Antibiotic resistance genes found in viral material
Bacteriophages and function • Cross the intestinal barrier, possibly affecting systemic immune response • Adhere to mucin glycoproteins, potentially causing immune response in gut epithelium • IBD/Crohn’s: relative increase in Caudovirales bacteriophages • Affect bacterial composition and/or host directly
Eukaryotic virome • Fecal samples from healthy children show a complex community of typically pathogenic viruses • Includes plant RNA viruses from food • Anelloviruses and circoviruses present in nearly 100% by age 5, likely from industrial ag
Eukaryotic viruses and function • Simian immunodeficiency virus (SIV) experiment showed enteric virome expansion • Increased gut permeability and caused intestinal lining inflammation • Acute diarrhea subjects showed novel viruses and highly divergent viruses with less than 35% similarity to catalogued viruses at the amino acid level
Meiofauna • Fungi, protozoa, and helminths (worms) • No experiments conducted with sampling to saturation, much more work to be done • 18S sequencing showed 66 genera of fungi in gut and fungi were found in 100% of samples • Most subjects had fewer than 10 genera • But high fungal diversity is bad: increases in IBD, increases with antibiotic usage
But it’s very hard • Amplicon-based approaches don’t work well for viruses • Heterogeneous sample prep is required • Large differences in genome sizes, from a few kb in viruses to 100+ Mb in fungi • Small genomes + divergence require lots of coverage to get contigs
Getting the whole picture Source: Meta'omic Analytic Techniques for Studying the Intestinal Microbiome. Gastroenterology.
Isn’t assembly easy? • Recall: 500-1000 species of bacteria in the gut, but about 30 of them make up 99% of composition • 33% of bacterial microbiome not well-represented in reference databases, > 60% for bacteriophages
Coverage • Coverage: mean number of reads per base • L=read length, N=number of reads, G=genome size • Problem, with 2nd gen WMS technologies, L is low and G is astronomical or unknown • Thus, “full or sometimes even adequate coverage may be unattainable” Source: A primer on metagenomics
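The formula itself did not survive in the slide text. With the symbols defined above, the standard relation is C = L·N / G. The sketch below computes it, together with the Lander-Waterman estimate 1 − e^(−C) for the fraction of bases covered at least once. That estimate assumes reads land uniformly at random on a single genome, which a mixed metagenomic sample violates, so treat the numbers as optimistic. Function names are illustrative.

```python
import math

def mean_coverage(read_length, num_reads, genome_size):
    """Mean number of reads covering each base: C = L * N / G."""
    return read_length * num_reads / genome_size

def expected_fraction_covered(coverage):
    """Lander-Waterman estimate of the fraction of bases hit by at least one read.

    Assumes uniformly random read placement on one genome.
    """
    return 1.0 - math.exp(-coverage)

def reads_needed(read_length, genome_size, target_coverage):
    """Number of reads N required to reach a target mean coverage."""
    return math.ceil(target_coverage * genome_size / read_length)

# Example: 100 bp reads against a 5 Mb bacterial genome.
c = mean_coverage(read_length=100, num_reads=1_000_000, genome_size=5_000_000)
print("mean coverage:", c)
print("fraction covered at 1x:", expected_fraction_covered(1.0))
```

When G is the combined size of hundreds of genomes at very uneven abundance, the same N gives rare members a far lower effective C, which is the problem the slide points at.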
Sequence length and discovery Source: A primer on metagenomics
All is not lost Can use rarefaction curves to estimate our coverage
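A rarefaction curve plots the expected number of distinct taxa against the number of reads subsampled; a curve that flattens suggests the sample has been sequenced deeply enough. The sketch below uses the classical analytical expectation for sampling without replacement (Hurlbert's formula) rather than repeated random subsampling. The function names and the example counts are assumptions for illustration.

```python
from math import comb

def rarefied_richness(counts, n):
    """Expected number of taxa seen when n reads are drawn without replacement.

    E[S_n] = sum_i (1 - C(N - N_i, n) / C(N, n)), with N the total read count.
    """
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("n must be between 0 and the total read count")
    denom = comb(total, n)
    return sum(1 - comb(total - c, n) / denom for c in counts if c > 0)

def rarefaction_curve(counts, step):
    """List of (depth, expected richness) pairs from 0 reads up to all reads."""
    total = sum(counts)
    depths = list(range(0, total + 1, step))
    if depths[-1] != total:
        depths.append(total)
    return [(n, rarefied_richness(counts, n)) for n in depths]

# Invented per-taxon read counts: a few dominant taxa and a long tail.
counts = [500, 300, 100, 50, 20, 10, 5, 5, 5, 5]
for depth, richness in rarefaction_curve(counts, step=200):
    print(depth, round(richness, 2))
```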
All is not lost • For composition analysis the phylogenetic marker regions (18S, 16S) work pretty well • For functional analysis: can still find ORFs fairly reliably and can be aligned to homologs in databases • Barring this, clustering and motif-finding yield some information
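To make the ORF bullet concrete, here is a deliberately naive six-frame ORF scan: it looks for an ATG followed in frame by a stop codon on both strands. Real gene predictors handle alternative start codons, partial genes at contig ends, and ambiguous bases. This sketch does none of that, and the `min_codons` threshold is an arbitrary assumption.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string containing only A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]

def find_orfs(seq, min_codons=30):
    """Return (strand, start, end, orf) tuples for ATG...stop ORFs in all six frames.

    Coordinates are 0-based, end-exclusive, and relative to the strand scanned
    (for "-" they index into the reverse complement). Length counts the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame start codon opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3, s[start:i + 3]))
                    start = None  # look for the next ORF in this frame
    return orfs

# Toy contig with one short ORF on the forward strand.
contig = "CC" + "ATG" + "GCT" * 5 + "TAA" + "G"
print(find_orfs(contig, min_codons=3))
```

The ORFs found this way would then be translated and searched against protein databases for homologs, which is the step the slide refers to.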
Different sequencing approaches? • Single-cell microfluidics in the future • Now: hybrid long/short read approaches, “finishing” with Sanger sequencing • Pacific Biosciences SMRT approach • SMRT errors are random, unbiased • De novo assembly is 99.999% concordant with reference genomes
HGAP: the SMRT assembly algorithm • Select longest reads as seeds • Use seed reads to recruit short reads • Assemble using off-the-shelf assembly tools • Refine assembly using sequencer metadata Source: Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data. Nature Methods
Seed selection • Order reads according to length • Considering reads above length L ~ 6kb • Rough end-pair align reads until ~20x coverage is reached • 17.7k seed reads, averaging 7.2kb in length, already at 86.9% accuracy compared to reference
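The selection rule on this slide can be sketched on read lengths alone: sort longest first and keep taking reads until their summed length reaches the target coverage of the genome. This is not HGAP's actual code. The function name, the `min_length` parameter, and the use of an estimated genome size are assumptions, and the alignment step the slide mentions is left out entirely.

```python
def select_seed_reads(read_lengths, genome_size, target_coverage=20.0, min_length=0):
    """Pick the longest reads until they sum to target_coverage * genome_size.

    Returns (seed_lengths, achieved_coverage). Reads shorter than min_length
    are never used, so the target may not be reached.
    """
    needed = target_coverage * genome_size
    seeds, total = [], 0
    for length in sorted(read_lengths, reverse=True):
        if length < min_length or total >= needed:
            break
        seeds.append(length)
        total += length
    return seeds, total / genome_size

# Toy numbers: six reads against a 1 kb "genome".
lengths = [1000, 8000, 7000, 500, 6000, 9000]
seeds, cov = select_seed_reads(lengths, genome_size=1000, target_coverage=20)
print(seeds, cov)
```

The shortest selected read is, in effect, the length cutoff; the slide's figure of roughly 6 kb would come out of this kind of calculation for that data set.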
Recruiting short reads • Align all reads to the seed reads • Each read can be mapped to multiple seed reads, controlled by the -bestn parameter • -bestn must be chosen so that the coverage of seeds + short aligned reads is about equal to the expected coverage of the sequenced genome • Use MSA and consensus to error correct long reads • Result is 17.2k reads of length 5.7kb with 99.9% accuracy
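The "MSA and consensus" step can be pictured as a per-column vote over reads already aligned to a seed. The sketch below assumes the alignment has been done elsewhere and that every row is padded with "-" to the same length. It is a plain majority vote for illustration, not the consensus algorithm used by HGAP.

```python
from collections import Counter

def majority_consensus(aligned_rows, gap="-"):
    """Column-wise majority vote over equal-length aligned rows.

    Columns where the gap symbol wins are dropped from the output.
    Ties go to the symbol seen first in the column.
    """
    if len({len(row) for row in aligned_rows}) != 1:
        raise ValueError("need at least one row, and all rows the same length")
    consensus = []
    for column in zip(*aligned_rows):
        symbol, _ = Counter(column).most_common(1)[0]
        if symbol != gap:
            consensus.append(symbol)
    return "".join(consensus)

# A seed read with one substitution error and one spurious insertion elsewhere.
rows = [
    "ACGT-A",  # recruited read
    "ACCT-A",  # seed read with a G->C error at the third base
    "ACGTTA",  # recruited read with an extra T
]
print(majority_consensus(rows))
```

With enough recruited reads per seed, random errors are outvoted column by column, which is why random, unbiased errors matter for this approach.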
Overlap layout consensus assembly Source: Overview of Genome Assembly Algorithms. Ntino Krampis. http://www.slideshare.net/agbiotec/overview-of-genome-assembly-algorithms
Refinement • Use the Quiver algorithm, which looks at raw physical data from the sequencer • Uses an HMM and observed data to classify base calls as genuine or spurious • Do a final consensus alignment, conditioned on Quiver’s probabilities • Final result: 17.2k reads, length of 5.7kb, accuracy of 99.999506%
Summary • Most of the cells in your body aren’t yours • But looking at bacteria alone is insufficient • Expanding our view causes us to look for needles in haystacks which is beyond most conventional approaches • Motif-finding and hybrid approaches will work until 3rd gen sequencing arrives
References • Cho, Ilseung, and Martin J. Blaser. "The human microbiome: at the interface of health and disease." Nature Reviews Genetics 13.4 (2012): 260-270. • Wooley, John C., Adam Godzik, and Iddo Friedberg. "A primer on metagenomics." PLoS Computational Biology 6.2 (2010): e1000667. • Chin, Chen-Shan, et al. "Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data." Nature Methods 10.6 (2013): 563-569. • Human Microbiome Project Consortium. "Structure, function and diversity of the healthy human microbiome." Nature 486.7402 (2012): 207-214. • Norman, Jason M., Scott A. Handley, and Herbert W. Virgin. "Kingdom-agnostic metagenomics and the importance of complete characterization of enteric microbial communities." Gastroenterology 146.6 (2014): 1459-1469. • Morgan, X. C., and C. Huttenhower. "Meta'omic Analytic Techniques for Studying the Intestinal Microbiome." Gastroenterology (2014).