3.0 An Introduction to Microbiome Studies. MMIC 7050. Natalie Knox. October 8th, 2019.
What we will cover today • Applications of NGS for microbiome/metagenomics • Overall challenges in the field • Microbial ecology terminology and concepts • Sequencing approaches • Metataxonomics • Shotgun metagenomics • Framing your research questions, study design, and statistics • Take home messages
Other “meta-” approaches • metataxonomics (Marchesi and Ravel, 2015): microbiome survey via marker gene sequencing • metagenomics: unbiased and random shotgun sequencing of DNA in a sample • metatranscriptomics: similar to metagenomics but based on RNA; profile of actively transcribed genes, i.e. what is expressed under certain conditions • metaproteomics: using mass spectrometry to generate profiles of protein expression and post-translational modifications of proteins • metabolomics: survey of the metabolites in a given sample (also using mass spectrometry)
Microbiome research is complicated • method depends on the research question • not “one size fits all” • lots of data • need lots of compute muscle and storage • need expertise in many different areas • biology • genomics • data sciences (bioinformatics, biostatistics, big data visualization) • microbial ecology
Other challenges • tools are “research grade” • reliant on databases • methodology biases • Lack of standards • sample collection and storage • DNA extraction • library preparation • sequencing technology • informatics approach
Microbial biodiversity WHO? • Identification HOW MANY? • Their proportions • Predominant vs. rare organisms WHAT DO THEY DO? • Metabolic phenotype • How do they interact with their environment, host, and other microbes • Complex network of functionalities
What are microbes • Bacteria • Archaea • Virus (non-living) • Parasite • Helminths (parasitic worms) • Fungi • yeast, mold, mushrooms • Protozoa • Ciliates • Amoebae • flagellates
Microbial Ecology Terminology • Microbiota: assemblage of microorganisms present in a given environment • Metagenome: genes and/or genomes of the members of a microbiota • Microbiome: includes both the microbes and their entire genetic content, for a given environment
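Taxonomic Ranking System • Escherichia coli: Life > Bacteria (domain) > Eubacteria (kingdom) > Proteobacteria (phylum) > Gammaproteobacteria (class) > Enterobacteriales (order) > Enterobacteriaceae (family) > Escherichia (genus) > Escherichia coli (species) • Homo sapiens: Life > Eukaryota (domain) > Animalia (kingdom) > Chordata (phylum) > Mammalia (class) > Primates (order) > Hominidae (family) > Homo (genus) > Homo sapiens (species)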
History of microbiome studies • Realization that only about 1% of microorganisms could be cultured with standard laboratory methods • Led to the development of culture-independent methods
NGS-based microbiome profiling approaches • Metataxonomics • High-throughput sequencing of phylogenetically informative biomarker amplicons • Metagenomics • Shotgun sequencing of all genetic material
NGS-based microbiome profiling approaches • Targeted amplicon sequencing: generates uniform reads for the targeted Region of Interest (ROI) only • Shotgun metagenomics: generates random reads for the entirety of sample-derived templates, including the possibility of ‘host’ reads
Metataxonomic approach: Overview • Useful for characterizing microbial community structure • Selection of biomarker • Taxonomically informative and discriminatory, and under low selective pressure • Broad taxonomic coverage • Ideally single copy • Conserved anchors (for primer binding) • Amplicon length appropriate for the chosen NGS technology • Amplifies efficiently • Highly curated and comprehensive reference database
Metataxonomic approach: Biomarkers • Taxonomically informative biomarkers • Bacteria and archaea: 16S rRNA, cpn60, rpoB • Microeukaryotes: 18S rRNA • Fungi: internal transcribed spacer (ITS), ITS1 and ITS2 • Viruses: ?
Metataxonomic approach: The 16S rRNA biomarker • Found in bacteria and archaea • Also present in mitochondria and chloroplasts • Multiple copy numbers (anywhere from 1 to 15) dispersed throughout the genome • e.g. E. coli has ~7 copies of the rRNA operon (rrnA, rrnB, etc.) • rrnDB: https://rrndb.umms.med.umich.edu/ • A ribosomal RNA operon database for bacteria and archaea (Stoddard et al. 2015) • Eukaryotic rRNA operons typically occur in tandem arrays • The “ultimate chronometer” for phylogenetic classification of bacterial species (Woese 1987)
Metataxonomic approach: Other biomarkers • ITS: Internal transcribed spacers (ITS1 and ITS2) for Fungi • Variable lengths: ~360/232 bp each (600-700 bp) • cpn60: chaperonin 60 (cpn60 group I) ~550 bp • rpoB: beta subunit of RNA polymerase ~370 bp • 18S rRNA ~1800 bp
Metataxonomic approach: Workflow for Illumina Targeted Sequencing • Genomic template extraction → 16S variable region amplification → high-throughput sequencing → …
Metataxonomic approach: Illumina Targeted Sequencing considerations • Low base diversity libraries • PhiX incorporation (~10-50%) • Cluster density • Amplicon length • Sequencing overlap
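The amplicon length and sequencing overlap considerations above come down to simple arithmetic. A minimal sketch in Python, assuming paired-end reads that start at opposite ends of the amplicon and ignoring quality trimming; the function name and the example lengths are illustrative assumptions, not values from the slides:

```python
def paired_end_overlap(amplicon_len: int, read_len: int) -> int:
    """Number of bases covered by both reads of a pair.

    A negative value means the two reads leave a gap in the middle of the
    amplicon and cannot be merged. If the amplicon is shorter than one read,
    the overlap is capped at the amplicon length.
    """
    return min(amplicon_len, 2 * read_len - amplicon_len)


# Hypothetical amplicon / read-length combinations.
for amplicon_len, read_len in [(250, 250), (450, 300), (450, 250), (600, 250)]:
    overlap = paired_end_overlap(amplicon_len, read_len)
    status = "mergeable" if overlap > 0 else "gap, reads cannot be merged"
    print(f"{amplicon_len} bp amplicon, 2x{read_len}: overlap {overlap} bp ({status})")
```

In practice some overlap is lost to quality trimming at the 3' ends of reads, so a comfortable margin is usually preferable to a design that overlaps by only a few bases.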
Metataxonomic approach: Overall workflow … … Less standardized approaches
Metataxonomic approach: Operational Taxonomic Units (OTUs) • A ‘bin’ containing sequences that share at least X% sequence similarity (a sorting process) • Suggested guidelines: 97% represents species level; 95% is roughly equivalent to genus level • [Figure: sequences sorted into OTU1, OTU2, OTU3, OTU4]
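The binning idea can be sketched as a greedy, centroid-based sort. This is a toy illustration only: it assumes the sequences are already aligned and of equal length, uses a plain fraction of matching positions as the similarity, and processes sequences in input order. The function names are made up for this example and do not correspond to any particular OTU-picking tool.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(seqs: list[str], threshold: float = 0.97) -> list[list[str]]:
    """Assign each sequence to the first OTU whose centroid it matches.

    The first member of each cluster acts as its centroid. A sequence that
    matches no existing centroid at >= threshold starts a new OTU.
    """
    clusters: list[list[str]] = []
    for seq in seqs:
        for cluster in clusters:
            if identity(seq, cluster[0]) >= threshold:
                cluster.append(seq)
                break
        else:
            clusters.append([seq])
    return clusters


# Toy 100 bp sequences: one differs from the reference at 1 position (99%),
# the other at 4 positions (96%).
reference = "ACGT" * 25
one_mismatch = "T" + reference[1:]
four_mismatches = "TTTTT" + reference[5:]

otus = greedy_otus([reference, one_mismatch, four_mismatches], threshold=0.97)
print(len(otus), [len(c) for c in otus])
```

Because the result depends on input order and on the choice of centroid, different clustering strategies can place the same sequence in different OTUs.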
Metataxonomic approach:OTUs and phylogeny Goodrich 2014
Metataxonomic approach: Understanding diversity indices • Alpha diversity (“within-sample diversity”: richness and evenness) • Richness: observed species, Chao1 • Richness and evenness: Shannon index, Simpson index • Beta diversity (“diversity between samples”: distance between samples)
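The indices named above can be computed from a vector of per-taxon counts. A minimal sketch, assuming natural-log Shannon, the Gini-Simpson form of the Simpson index (1 minus the sum of squared proportions), the bias-corrected Chao1 estimator, and Bray-Curtis dissimilarity as one common choice of between-sample distance (the slides do not name a specific beta-diversity metric):

```python
import math


def observed_richness(counts):
    """Number of taxa with at least one read."""
    return sum(1 for c in counts if c > 0)


def shannon(counts):
    """Shannon index H' = -sum(p * ln p) over taxa that are present."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson(counts):
    """Gini-Simpson index, 1 - sum(p^2)."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)


def chao1(counts):
    """Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1))."""
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    return observed_richness(counts) + singletons * (singletons - 1) / (2 * (doubletons + 1))


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples over the same taxa."""
    total = sum(a) + sum(b)
    if total == 0:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / total


even_sample = [10, 10, 10, 10]    # four equally abundant taxa
skewed_sample = [37, 1, 1, 1]     # same richness, one dominant taxon
print(shannon(even_sample), shannon(skewed_sample))
print(simpson(even_sample), simpson(skewed_sample))
print(bray_curtis(even_sample, skewed_sample))
```

The two toy samples have the same richness but different evenness, which is why Shannon and Simpson separate them while observed richness does not.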
Metataxonomic approach: Challenges • Problematic for several reasons • Bias in taxonomic coverage • Variability in copy number (in some cases) • PCR bias and generation of chimeric sequences • Low discriminatory power • Variable amplification efficiency • …
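To see why copy number variability matters, the sketch below divides read counts by the number of 16S copies per genome before computing proportions. The value of 7 copies for E. coli comes from the earlier slide; the second taxon, its single copy, and the read counts are hypothetical, and in practice copy numbers would have to be looked up (for example in rrnDB) and are often unknown for uncultured organisms.

```python
def copy_number_corrected(counts: dict, copies: dict) -> dict:
    """Relative abundance after dividing each taxon's reads by its 16S copy number."""
    adjusted = {taxon: counts[taxon] / copies[taxon] for taxon in counts}
    total = sum(adjusted.values())
    return {taxon: value / total for taxon, value in adjusted.items()}


# Hypothetical read counts for two taxa.
read_counts = {"Escherichia coli": 700, "taxon_b": 100}
# 7 copies for E. coli (from the slide); 1 copy for taxon_b is an assumption.
copy_numbers = {"Escherichia coli": 7, "taxon_b": 1}

raw_total = sum(read_counts.values())
raw = {taxon: n / raw_total for taxon, n in read_counts.items()}
corrected = copy_number_corrected(read_counts, copy_numbers)
print("raw:      ", raw)
print("corrected:", corrected)
```

With these made-up numbers the raw read proportions suggest E. coli dominates, while the corrected estimate puts the two taxa at equal cell abundance.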
Metataxonomic approach: Other considerations • Lab reagent microbiome • e.g. DNA extraction reagents • DNA is everywhere! • Autoclaving ≠ DNA-free • Bleach and UV will destroy DNA • Separate your DNA extraction and PCR setup stations • Sample-to-sample contamination • Some reagents produced by bacteria • Sequence a negative control • Even if no band present on gel • Sequence a mock community
Metataxonomic approach: Other considerations • Low biomass samples more susceptible to sequencing contaminants • Aim for a starting sample of >10^3 to 10^4 cells • Careful sample collection (e.g. aseptic) • Random order processing (different kits for replicates) • Documentation is key (e.g. lot numbers) • Critical evaluation of results
Metagenomics approach:Overview • Unrestricted sequencing of all DNA present in a sample • Eukaryotic, prokaryotic, virus • Sampling all genomic content • Sequencing depth • Sample matrix • Low or high biomass sample • Functional profiling • Assemblies challenging
Metagenomics approach:Preliminary analytical workflow Breitwieser, Lu, and Salzberg 2017
Metagenomics approach:Downstream analytical workflow No turnkey solution! Ruppe et al (2017) Sci Rep
Metagenomics approach: Assembly-based vs. read-based analysis • will depend on your research question • taxonomic profiling? • detection (presence or absence)? • functional potential (metabolic pathway) profiling? • known or novel organisms expected? • contamination detection?
Metagenomics approach: Taxonomic profiling approaches • Assignment of every read • aligning reads • mapping k-mers • Using complete genome • Aligning marker genes • Translating DNA and aligning protein sequences • amino acid sequences more conserved than DNA • can be used for better sensitivity and classification • slow
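The "mapping k-mers" strategy can be sketched as follows. This is a toy illustration under strong assumptions: exact k-mer matching only, no reverse complements, and short made-up reference strings standing in for complete genomes. The function names and sequences are hypothetical and do not reproduce any specific classifier.

```python
from collections import Counter, defaultdict


def kmers(seq: str, k: int):
    """Yield every overlapping k-mer in seq."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def build_index(references: dict, k: int) -> dict:
    """Map each k-mer to the set of taxa whose reference contains it."""
    index = defaultdict(set)
    for taxon, seq in references.items():
        for kmer in kmers(seq, k):
            index[kmer].add(taxon)
    return index


def classify(read: str, index: dict, k: int) -> str:
    """Assign a read to the taxon with the most k-mer hits."""
    votes = Counter()
    for kmer in kmers(read, k):
        for taxon in index.get(kmer, ()):
            votes[taxon] += 1
    if not votes:
        return "unclassified"
    (best, best_hits), *rest = votes.most_common(2)
    if rest and rest[0][1] == best_hits:
        return "ambiguous"  # tie between the two top-scoring taxa
    return best


# Made-up reference sequences, far shorter than real genomes.
references = {
    "taxon_a": "ACGTACGTTAGCCGATAGCTAGG",
    "taxon_b": "TTGACCAGTTTGACCATGGACCA",
}
index = build_index(references, k=5)
for read in ["CGTTAGCCGA", "GACCATGGAC", "GGGGGGGG"]:
    print(read, "->", classify(read, index, k=5))
```

Real classifiers differ in how they resolve k-mers shared by several taxa; reporting ties as ambiguous is just one simple design choice for this sketch.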
Metagenomics approach:Metagenomics de novo assemblies • Metagenome-assembled genomes (MAGs) • Difficult and complicated • nearly impossible • Uneven sequencing depth of organisms • assemblers assume sequencing coverage uniformity across genome • Untangling closely related organisms • Lack of deep coverage for all organisms • Quality control important
Metagenomics approach: Metagenomics de novo assemblies • Resulting contigs after read assembly • Ambiguous reads: to which genome do they belong? • Rarely yields more than partial (unambiguous) genomes
Metagenomics approach: Contig binning • Attempts to bin contigs into operational taxonomic units (OTUs) • reference-based vs. reference-free • Features: • Compositional features • Tetranucleotide frequency • GC content • Abundance • Relative abundance • Copy number • Followed by reads mapped back to contigs • Annotation of bins Kang et al. 2015
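The compositional features listed above can be computed directly from a contig sequence. A minimal sketch, assuming uppercase A/C/G/T input and counting all 256 tetranucleotides on the forward strand only (merging reverse complements is left out here for brevity); the function names are illustrative:

```python
from itertools import product

# All 256 possible tetranucleotides, in a fixed order.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def gc_content(seq: str) -> float:
    """Fraction of G and C among unambiguous bases."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return gc / acgt if acgt else 0.0


def tetranucleotide_freq(seq: str) -> list:
    """Relative frequency of each tetranucleotide, in TETRAMERS order."""
    seq = seq.upper()
    counts = dict.fromkeys(TETRAMERS, 0)
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if kmer in counts:  # skips windows containing N or other symbols
            counts[kmer] += 1
    total = sum(counts.values())
    return [counts[t] / total if total else 0.0 for t in TETRAMERS]


contig = "ACGTACGTGGCCGGCCATAT"  # made-up contig, far shorter than real ones
profile = tetranucleotide_freq(contig)
print(f"GC content: {gc_content(contig):.2f}")
print(f"non-zero tetranucleotides: {sum(1 for f in profile if f > 0)} of {len(profile)}")
```

A binning tool would combine such composition vectors with per-sample abundance (coverage) and cluster contigs whose profiles are similar.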
Study design • Most critical point in microbiomics • Considerations • Power calculations • Pilot study • Sources of variability • Sample selection (e.g. appropriate controls) • Sample collections (e.g. temporal dynamic) • Technical considerations • Negative controls • Mock community • Ideally include spike-in
Data structure • Multi-dimensional • Generally have more features than samples • Non-normally distributed • Zero-inflated (excessive zero observations in taxa counts) • Sparse (taxa not present in all samples) • Overdispersion (variance is larger than the mean) • uneven (unbalanced) library sizes • How to deal with rare taxa – are they real?
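These properties are easy to check on a count table before choosing a statistical method. A minimal sketch on a made-up taxa-by-sample table; the taxon names, counts, and function names are hypothetical:

```python
from statistics import mean, variance


def summarize_taxon(counts: list) -> dict:
    """Zero fraction, mean, sample variance, and a simple overdispersion flag."""
    m = mean(counts)
    v = variance(counts)
    return {
        "zero_fraction": counts.count(0) / len(counts),
        "mean": m,
        "variance": v,
        "overdispersed": v > m,  # variance larger than the mean
    }


def library_sizes(table: dict) -> list:
    """Total reads per sample (column sums of a taxon -> counts table)."""
    return [sum(column) for column in zip(*table.values())]


# Hypothetical counts: rows are taxa, columns are four samples.
table = {
    "taxon_a": [120, 0, 340, 15],
    "taxon_b": [0, 0, 0, 8],
    "taxon_c": [900, 45, 2100, 300],
}

print("library sizes:", library_sizes(table))
for taxon, counts in table.items():
    print(taxon, summarize_taxon(counts))
```

Even in this tiny example the library sizes differ by more than an order of magnitude and one taxon is absent from most samples, which is the situation the bullets above describe.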
Statistical analysis • What data structure to use • Counts • Proportions/ratios • Relative abundance • Normalized • Rarefied • Parametric vs. non-parametric methods • Multiple-testing correction – loss of power • Principal Component Analysis (PCA) • Microbial differential abundance testing
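Three of the choices above can be written out in a few lines: relative abundance, rarefying to a common depth, and multiple-testing correction. The sketch below uses random subsampling without replacement for rarefying and the Benjamini-Hochberg procedure as one example of a correction; the slides do not prescribe either, and the counts and p-values are made up.

```python
import random


def relative_abundance(counts: list) -> list:
    """Convert counts to proportions that sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]


def rarefy(counts: list, depth: int, seed: int = 0) -> list:
    """Subsample reads without replacement down to a fixed depth."""
    if depth > sum(counts):
        raise ValueError("depth exceeds the library size")
    # One entry per read, labelled with the index of its taxon.
    pool = [taxon for taxon, n in enumerate(counts) for _ in range(n)]
    picked = random.Random(seed).sample(pool, depth)
    rarefied = [0] * len(counts)
    for taxon in picked:
        rarefied[taxon] += 1
    return rarefied


def benjamini_hochberg(pvalues: list) -> list:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


sample = [500, 300, 150, 50]  # hypothetical counts for four taxa
print(relative_abundance(sample))
print(rarefy(sample, depth=100))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Rarefying discards reads and relative abundances are compositional, so the choice among these representations should follow from the research question rather than from convenience.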
Stats take-home messages • Don’t torture the data till it talks • aka: p-hacking, data dredging, data fishing • Death of the p-value? • p-value is only one data point - you need to put it in biological context. It's about the weight of evidence • No perfect method • Bias is inevitable
Summary • Lack of standardization • Garbage in = garbage out • Data QC, filtering, and de-noising critical • Databases are critical • Metagenomics data analysis (including statistics) should be tailored to your research questions • Extreme caution should be used when using bioinformatics software • Steep learning curve • Many proficiencies required • Best to start with simulated datasets • Does it make sense biologically? • Validations are critical • Going forward: multi-omics dataset integration studies