3.0 An Introduction to Microbiome Studies

3.0 An Introduction to Microbiome Studies. MMIC 7050, Natalie Knox, October 8th, 2019. What we will cover today: applications of NGS for microbiome/metagenomics; overall challenges in the field; microbial ecology terminology and concepts; sequencing approaches; metataxonomics

Presentation Transcript


  1. 3.0 An Introduction to Microbiome Studies • MMIC 7050 • Natalie Knox • October 8th, 2019

  2. What we will cover today • Applications of NGS for microbiome/metagenomics • Overall challenges in the field • Microbial ecology terminology and concepts • Sequencing approaches • Metataxonomics • Shotgun metagenomics • Framing your research questions, study design, and statistics • Take home messages

  3. Applications of microbiomics

  4. Other “meta-” approaches • Metataxonomics (Marchesi and Ravel, 2015) • microbiome survey via marker gene sequencing • Metagenomics • untargeted, random shotgun sequencing of the DNA in a sample • Metatranscriptomics • similar to metagenomics but based on RNA; gene profile of actively transcribed genes • what is expressed under certain conditions • Metaproteomics • mass spectrometry to generate profiles of protein expression and post-translational modifications • Metabolomics • survey of the metabolites in a given sample (also using mass spectrometry)

  5. Microbiome research is complicated • method depends on the research questions • not “one size fits all” • large volumes of data • requires substantial compute and storage • requires expertise in many different areas • biology • genomics • data sciences (bioinformatics, biostatistics, big data visualization) • microbial ecology

  6. Other challenges • tools are “research grade” • reliant on databases • methodology biases • Lack of standards • sample collection and storage • DNA extraction • library preparation • sequencing technology • informatics approach

  7. Microbial biodiversity • WHO? • identification • HOW MANY? • their proportions • predominant vs. rare organisms • WHAT DO THEY DO? • metabolic phenotype • how they interact with their environment, host, and other microbes • complex network of functionalities

  8. What are microbes? • Bacteria • Archaea • Viruses (non-living) • Parasites • Helminths (parasitic worms) • Fungi • yeasts, molds, mushrooms • Protozoa • ciliates • amoebae • flagellates

  9. Microbial Ecology Terminology • Microbiota • assemblage of microorganisms present in a given environment • Metagenome • genes and/or genomes of the microbiota • Microbiome • includes both the microbes and their entire genetic content in a given environment

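  10. Taxonomic Ranking System (Life > Domain > Kingdom > Phylum > Class > Order > Family > Genus > Species) • Escherichia coli: Life > Bacteria > Eubacteria > Proteobacteria > Gammaproteobacteria > Enterobacteriales > Enterobacteriaceae > Escherichia > Escherichia coli • Human: Life > Eukaryota > Animalia > Chordata > Mammalia > Primates > Hominidae > Homo > Homo sapiens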

  11. History of microbiome studies • Realization that only ~1% of microorganisms were culturable with standard laboratory methods • Led to the development of culture-independent methods

  12. Scientific Nomenclature

  13. NGS-based microbiome profiling approaches • Metataxonomics • high-throughput sequencing of a phylogenetically informative biomarker amplicon • Metagenomics • shotgun sequencing of all genetic material

  14. NGS-based microbiome profiling approaches • Targeted amplicon sequencing • generates uniform reads for the targeted region of interest (ROI) only • Shotgun metagenomics • generates random reads for the entirety of sample-derived templates, including possible ‘host’ reads

  15. Metataxonomic approach

  16. Metataxonomic approach: Overview • Useful for characterizing microbial community structure • Selection of biomarker • taxonomically informative and discriminatory, and under low selective pressure • broad taxonomic coverage • ideally single copy • conserved anchors (primer binding sites) • amplicon length appropriate for the chosen NGS technology • efficient amplification • highly curated and comprehensive reference database

  17. Metataxonomic approach: Biomarkers • Taxonomically informative biomarkers • Bacteria and archaea • 16S rRNA • cpn60 • rpoB • Microeukaryotes • 18S rRNA • Fungi • internal transcribed spacer (ITS) – ITS1 and ITS2 • Viruses • ?

  18. Metataxonomic approach: The 16S rRNA biomarker • Bacteria and archaea specific • but also present in mitochondria and chloroplasts • Multiple copies per genome (anywhere from 1 to 15), dispersed throughout the genome • e.g. E. coli has 7 copies of the rRNA operon (rrnA, rrnB, etc.) • rrnDB: https://rrndb.umms.med.umich.edu/ • a ribosomal RNA operon copy number database for bacteria and archaea • Eukaryotic rRNA operons typically occur in tandem arrays • The “ultimate chronometer” for phylogenetic classification of bacterial species (Woese 1997) • Stoddard et al. 2015
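The number of 16S operons per genome varies (1 to 15, as noted above), so raw read counts over-represent taxa that carry many copies. One common adjustment divides each taxon's read count by its copy number before computing proportions. The sketch below assumes copy numbers are already known, for example looked up in rrnDB. The taxon names and values are hypothetical, and copy-number estimates for poorly characterized taxa are themselves uncertain.

```python
def copy_number_corrected_proportions(read_counts, copy_numbers):
    """Estimate relative cell abundance from 16S read counts.

    read_counts:  taxon -> number of 16S reads assigned to that taxon
    copy_numbers: taxon -> assumed 16S rRNA operon copies per genome
    """
    corrected = {}
    for taxon, reads in read_counts.items():
        copies = copy_numbers.get(taxon)
        if copies is None or copies < 1:
            raise ValueError(f"missing or invalid copy number for {taxon!r}")
        corrected[taxon] = reads / copies  # reads per genome equivalent
    total = sum(corrected.values())
    return {taxon: value / total for taxon, value in corrected.items()}


# Hypothetical community: taxon_A yields 7x more reads than taxon_B,
# but it also carries 7 operons versus 1.
reads = {"taxon_A": 700, "taxon_B": 100}
copies = {"taxon_A": 7, "taxon_B": 1}
print(copy_number_corrected_proportions(reads, copies))
# → {'taxon_A': 0.5, 'taxon_B': 0.5}  (raw read proportions would be 0.875 / 0.125)
```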

  19. Metataxonomic approach: 16S rRNA biomarker

  20. Metataxonomic approach: Other biomarkers • ITS: internal transcribed spacers (ITS1 and ITS2) for fungi • variable lengths: ~360/232 bp each (600–700 bp) • cpn60: chaperonin 60 (cpn60 group I), ~550 bp • rpoB: beta subunit of RNA polymerase, ~370 bp • 18S rRNA: ~1800 bp

  21. Metataxonomic approach: Workflow for Illumina targeted sequencing • Genomic template extraction → 16S variable region amplification → high-throughput sequencing → …

  22. Metataxonomic approach: Illumina targeted sequencing considerations • Low base diversity libraries • PhiX incorporation (~10–50%) • Cluster density • Amplicon length • Sequencing overlap
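Amplicon length and read length together determine whether the forward and reverse reads overlap enough to be merged into one sequence. The arithmetic is: expected overlap = read 1 length + read 2 length − amplicon length. In the sketch below, the amplicon lengths and the 20 bp minimum overlap are illustrative assumptions, since merging tools each set their own thresholds. The sketch also ignores quality trimming, which shortens reads in practice.

```python
def paired_end_overlap(amplicon_len, read1_len, read2_len):
    """Expected overlap (bp) between the forward and reverse read of one amplicon.
    A negative value means the two reads never meet, so they cannot be merged."""
    return read1_len + read2_len - amplicon_len


def can_merge(amplicon_len, read1_len, read2_len, min_overlap=20):
    """min_overlap=20 is an illustrative assumption; merging tools set their own defaults.
    Quality trimming is ignored here, although it shortens reads in practice."""
    return paired_end_overlap(amplicon_len, read1_len, read2_len) >= min_overlap


# Hypothetical amplicon (insert) lengths sequenced on 2 x 150 bp and 2 x 250 bp runs.
for amplicon in (250, 300, 460):
    for read_len in (150, 250):
        overlap = paired_end_overlap(amplicon, read_len, read_len)
        print(f"{amplicon} bp amplicon, 2 x {read_len} bp: overlap {overlap} bp, "
              f"mergeable={can_merge(amplicon, read_len, read_len)}")
```

With these numbers, a ~460 bp amplicon merges only on the longer reads, and the margin is thin.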

  23. Metataxonomic approach: Overall workflow … … Less standardized approaches

  24. Metataxonomic approach: Operational Taxonomic Units (OTUs) • A ‘bin’ of sequences sharing at least X% sequence similarity (a sorting process) • *Suggested guidelines: 97% ≈ species level; 95% ≈ genus level • [Figure: sequences sorted into OTU1–OTU4]
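The OTU ‘sorting process’ can be illustrated with a toy greedy clustering. Each sequence joins the first existing centroid it matches at or above the identity threshold; otherwise it founds a new OTU. This is a simplified sketch, not the algorithm of any particular tool, and it rests on two assumptions. First, the sequences are already aligned to equal length (real tools compute pairwise alignments). Second, the input is pre-sorted (commonly by abundance), because the result depends on order.

```python
def identity(a, b):
    """Fraction of identical positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes pre-aligned sequences of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otu_clustering(sequences, threshold=0.97):
    """Return (centroids, assignments); assignments[i] is the OTU index of sequences[i]."""
    centroids = []
    assignments = []
    for seq in sequences:
        for otu_index, centroid in enumerate(centroids):
            if identity(seq, centroid) >= threshold:
                assignments.append(otu_index)
                break
        else:  # no centroid was close enough: this sequence seeds a new OTU
            centroids.append(seq)
            assignments.append(len(centroids) - 1)
    return centroids, assignments


# Toy 10 bp reads. With sequences this short, a 97% cutoff would tolerate zero
# mismatches, so 0.9 (one mismatch in ten) is used for illustration.
reads = ["AAAAAAAAAA", "AAAAAAAAAT", "CCCCCCCCCC", "AAAAAAAATT"]
centroids, otus = greedy_otu_clustering(reads, threshold=0.9)
print(otus)  # → [0, 0, 1, 2]
# The last read is 90% identical to the second read but only 80% identical to the
# OTU 0 centroid, so it starts its own OTU: comparisons are made against centroids only.
```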

  25. Metataxonomic approach: OTUs and phylogeny (Goodrich 2014)

  26. Metataxonomic approach: Understanding diversity indices • Alpha diversity (“within-sample diversity”: richness and evenness) • richness: observed species, Chao1 • richness and evenness: Shannon index, Simpson index • Beta diversity (“diversity between samples”: distance between samples)
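The indices named above can be computed directly from a vector of per-taxon counts. The sketch below makes the following choices:

- Shannon uses the natural log.
- Simpson uses the Gini-Simpson form (1 − Σp²).
- Chao1 is the classic estimator, with a bias-corrected fallback when there are no doubletons.
- Bray-Curtis serves as one common beta-diversity distance.

Other variants and log bases exist, so check which one a given tool reports. The counts are made up.

```python
import math


def observed_richness(counts):
    """Number of taxa with a non-zero count."""
    return sum(1 for c in counts if c > 0)


def shannon(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def gini_simpson(counts):
    """1 - sum(p_i^2): probability that two reads drawn with replacement
    belong to different taxa."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)


def chao1(counts):
    """Chao1 richness from singletons (F1) and doubletons (F2).
    Falls back to the bias-corrected form F1*(F1-1)/2 when F2 == 0."""
    s_obs = observed_richness(counts)
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    return s_obs + f1 * (f1 - 1) / 2


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two count vectors over the same taxa
    (0 = identical composition, 1 = no shared taxa)."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    return numerator / (sum(u) + sum(v))


# Made-up counts for four taxa in two samples.
even_sample = [10, 10, 10, 10]
skewed_sample = [37, 1, 1, 1]
for name, counts in (("even", even_sample), ("skewed", skewed_sample)):
    print(name, observed_richness(counts), round(chao1(counts), 2),
          round(shannon(counts), 3), round(gini_simpson(counts), 3))
print("Bray-Curtis:", bray_curtis(even_sample, skewed_sample))  # → Bray-Curtis: 0.675
```

Both samples have the same observed richness, but the skewed one has lower Shannon and Simpson values (lower evenness). Its three singletons also push the Chao1 estimate above the observed count.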

  27. Metataxonomic approach: Challenges • Problematic for several reasons • bias in taxonomic coverage • variability in copy number (in some cases) • PCR bias and generation of chimeric sequences • low discriminatory power • variable amplification efficiency • …

  28. Metataxonomic approach: Other considerations • Lab reagent microbiome • e.g. DNA extraction reagents • DNA is everywhere! • autoclaving ≠ DNA-free • bleach and UV degrade DNA • separate your DNA extraction and PCR setup stations • Sample-to-sample contamination • Some reagents are produced by bacteria • Sequence a negative control • even if no band is present on the gel • Sequence a mock community
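A sequenced negative control is only useful once it is compared against the samples. One simple heuristic, sketched here, flags taxa detected in at least as large a fraction of negative controls as of real samples. This is an illustrative prevalence comparison, not the statistical test of any published decontamination tool, and the taxon names and counts are hypothetical. Treat flagged taxa as candidates for review rather than automatic removal, because real taxa can also reach blanks through sample-to-sample contamination.

```python
def prevalence(counts):
    """Fraction of libraries in which a taxon was detected (count > 0)."""
    return sum(1 for c in counts if c > 0) / len(counts)


def flag_possible_contaminants(sample_table, control_table):
    """Flag taxa detected at least as often in negative controls as in real samples.

    Both tables map taxon -> per-library counts. Taxa missing from control_table
    are treated as never detected in the controls.
    """
    flagged = []
    for taxon, sample_counts in sample_table.items():
        control_counts = control_table.get(taxon)
        if not control_counts:
            continue
        if prevalence(control_counts) >= prevalence(sample_counts):
            flagged.append(taxon)
    return sorted(flagged)


# Hypothetical counts: four biological samples and two extraction blanks.
samples = {
    "taxon_gut": [50, 40, 60, 30],
    "taxon_kit": [0, 1, 0, 0],
    "taxon_rare": [0, 0, 2, 0],
}
blanks = {
    "taxon_gut": [0, 1],
    "taxon_kit": [3, 2],
}
print(flag_possible_contaminants(samples, blanks))  # → ['taxon_kit']
```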

  29. Metataxonomic approach: Other considerations • Low-biomass samples are more susceptible to sequencing contaminants • aim for a starting sample of >10³–10⁴ cells • Careful sample collection (e.g. aseptic) • Random processing order (different kits for replicates) • Documentation is key (e.g. lot numbers) • Critical evaluation of results
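‘Random order processing’ can be made concrete by shuffling sample IDs with a recorded seed before assigning them to extraction batches. This keeps study groups from being confounded with batch (kit lot, day, operator). The sample names, batch size, and seed below are hypothetical. A plain shuffle does not guarantee balance in small studies, where a stratified or blocked layout may be preferable.

```python
import random


def randomized_batches(sample_ids, batch_size, seed=2019):
    """Shuffle samples with a fixed, recorded seed, then cut the shuffled list into
    extraction batches so that processing order is not tied to study group."""
    rng = random.Random(seed)  # seed value is arbitrary; record it with the lot numbers
    order = list(sample_ids)
    rng.shuffle(order)
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]


# Hypothetical study: six cases and six controls, extracted four at a time.
ids = [f"case_{i}" for i in range(6)] + [f"control_{i}" for i in range(6)]
for batch_number, batch in enumerate(randomized_batches(ids, batch_size=4), start=1):
    print(batch_number, batch)
```

Because the seed is fixed, rerunning the function reproduces the same layout, which makes the documented processing order verifiable.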

  30. Metagenomics approach

  31. Metagenomics approach: Overview • Unrestricted sequencing of all DNA present in a sample • eukaryotic, prokaryotic, viral • Sampling all genomic content • sequencing depth • sample matrix • low or high biomass sample • Functional profiling • Assemblies are challenging
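How deeply an organism is sampled depends on sequencing depth and on how much of the sample's DNA it contributes. A back-of-the-envelope estimate of average fold coverage is reads × read length × relative abundance ÷ genome size. The sketch assumes uniform coverage and takes relative abundance as the organism's share of sequenced DNA (host DNA in the sample matrix lowers it). The run parameters are hypothetical.

```python
def expected_coverage(total_reads, read_length, relative_abundance, genome_size):
    """Average fold coverage of one organism in a shotgun metagenome.

    relative_abundance: the organism's share of *sequenced DNA* (0-1), which differs
    from its share of cells when genome sizes differ or host DNA dominates.
    Assumes reads are spread uniformly over the genome, which real data rarely are.
    """
    bases_from_organism = total_reads * read_length * relative_abundance
    return bases_from_organism / genome_size


# Hypothetical run: 10 million 150 bp reads; the organism makes up 1% of the DNA
# and has a 5 Mb genome.
print(expected_coverage(10_000_000, 150, 0.01, 5_000_000))  # roughly 3-fold
```

At about 3x, a 1% member is detectable but far too shallowly covered for a confident assembly, which is one reason depth must match the question.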

  32. Metagenomics approach: Preliminary analytical workflow (Breitwieser, Lu, and Salzberg 2017)

  33. Metagenomics approach: Downstream analytical workflow • No turnkey solution! (Ruppé et al. 2017, Sci Rep)

  34. Metagenomics approach: Assembly-based vs. read-based analysis • will depend on your research question • taxonomic profiling? • detection (presence or absence)? • functional potential (metabolic pathway) profiling? • known or novel organisms expected? • contamination detection?

  35. Metagenomics approach: Taxonomic profiling approaches • Assignment of every read • aligning reads • mapping k-mers • using complete genomes • Aligning marker genes • Translating DNA and aligning protein sequences • amino acid sequences are more conserved than DNA • can give better sensitivity and classification • slow
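The ‘mapping k-mers’ strategy can be illustrated with a tiny classifier. Every k-mer from the reference genomes is indexed to the lowest common ancestor (LCA) of all genomes that contain it, and a read is assigned to the lineage with the most k-mer hits. This is a simplified majority-vote sketch, not the scoring used by real k-mer classifiers such as Kraken. It also ignores reverse complements, and the reference sequences are made up.

```python
from collections import Counter


def kmers(seq, k):
    """All overlapping k-mers of a sequence (forward strand only in this sketch)."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def lowest_common_ancestor(lineage_a, lineage_b):
    """Longest shared prefix of two root-to-leaf lineages."""
    shared = []
    for a, b in zip(lineage_a, lineage_b):
        if a != b:
            break
        shared.append(a)
    return tuple(shared)


def build_index(references, k):
    """Map each k-mer to the LCA lineage of every reference genome containing it."""
    index = {}
    for sequence, lineage in references.values():
        for kmer in kmers(sequence, k):
            if kmer in index:
                index[kmer] = lowest_common_ancestor(index[kmer], lineage)
            else:
                index[kmer] = tuple(lineage)
    return index


def classify_read(read, index, k):
    """Return the lineage supported by the most k-mer hits, or None if nothing matches."""
    hits = Counter(index[kmer] for kmer in kmers(read, k) if kmer in index)
    if not hits:
        return None
    lineage, _ = hits.most_common(1)[0]
    return lineage


# Toy reference "genomes" (made-up sequences) with lineages in the style of slide 10.
references = {
    "E_coli": ("ACGTACGTTTGACCA", ("Bacteria", "Proteobacteria", "Escherichia coli")),
    "S_enterica": ("ACGTACGTGGCATTA", ("Bacteria", "Proteobacteria", "Salmonella enterica")),
}
index = build_index(references, k=5)
print(classify_read("TACGTTTGA", index, k=5))  # 4 of its 5 k-mers are E. coli-specific
print(classify_read("ACGTACG", index, k=5))    # only shared k-mers: resolved to the LCA
```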

  36. Metagenomics approach: Metagenomics de novo assemblies • Metagenome-assembled genomes (MAGs) • Difficult and complicated • nearly impossible • Uneven sequencing depth of organisms • assemblers assume uniform sequencing coverage across a genome • Untangling closely related organisms • Lack of deep coverage for all organisms • Quality control important
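One routine quality-control statistic for any assembly, including metagenomic ones, is N50, computed below from contig lengths. In a metagenome, N50 mostly reflects the well-covered organisms. It also says nothing about whether contigs are chimeric across closely related strains, so it complements other QC rather than replacing it.

```python
def n50(contig_lengths):
    """Length L such that contigs of length >= L hold at least half of all assembled bases."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly


print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # → 8  (10 + 9 + 8 = 27, half of 54)
```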

  37. Metagenomics approach: Metagenomics de novo assemblies • Resulting contigs after read assembly • Ambiguous reads: to which genome do they belong? • Rarely yields more than partial (unambiguous) genomes

  38. Metagenomics approach: Contig binning • Attempts to bin contigs into operational taxonomic units (OTUs) • reference-based vs. reference-free • Features: • Compositional features • Tetranucleotide frequency • GC content • Abundance • Relative abundance • Copy number • Followed by reads mapped back to contigs • Annotation of bins Kang et al. 2015
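The compositional and abundance features listed above can be computed per contig as a numeric vector, which a clustering method then groups into bins. This sketch computes GC content and the 256 tetranucleotide frequencies, then appends a mean read coverage supplied by the caller (for example, from reads mapped back to the contig). Many binners merge each 4-mer with its reverse complement (136 canonical tetranucleotides); this sketch keeps all 256 for simplicity. The function names are hypothetical.

```python
from itertools import product

TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]  # all 256 4-mers


def gc_content(seq):
    """Fraction of G or C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def tetranucleotide_frequencies(seq):
    """Normalized 4-mer counts; windows containing N or other symbols are skipped."""
    seq = seq.upper()
    counts = dict.fromkeys(TETRAMERS, 0)
    for i in range(len(seq) - 3):
        word = seq[i:i + 4]
        if word in counts:
            counts[word] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(TETRAMERS)
    return [counts[w] / total for w in TETRAMERS]


def contig_features(seq, mean_coverage):
    """Hypothetical binning feature vector: [GC, 256 tetranucleotide freqs, mean coverage]."""
    return [gc_content(seq)] + tetranucleotide_frequencies(seq) + [mean_coverage]


features = contig_features("ATGCGCGATTACAGGCTTAN", mean_coverage=12.5)
print(len(features), round(features[0], 3))
```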

  39. Study design, methodology, analytical plan

  40. Study design • Most critical point in microbiomics • Considerations • Power calculations • Pilot study • Sources of variability • Sample selection (e.g. appropriate controls) • Sample collections (e.g. temporal dynamic) • Technical considerations • Negative controls • Mock community • Ideally include spike-in
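Power calculations for full taxa tables are complex (zero inflation, overdispersion). For a single, roughly normal summary such as Shannon diversity, however, a textbook normal-approximation formula gives a first estimate. The sketch assumes a two-sided, two-group comparison with equal variances. The effect size d (difference in means divided by the standard deviation) is an assumed input that a pilot study could supply.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate samples per group for a two-sided, two-sample comparison of means:
    n = 2 * (z_{1-alpha/2} + z_{power})^2 / d^2, rounded up."""
    z = NormalDist()  # standard normal
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


# Assumed effect sizes, e.g. a difference in mean Shannon diversity of 0.5 or 0.8 SD.
for d in (0.5, 0.8):
    print(d, n_per_group(d))
```

Halving the expected effect size roughly quadruples the required sample size, which is why a pilot estimate of variability matters.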

  41. Selecting the right tool for the job

  42. Data structure • Multi-dimensional • Generally have more features than samples • Non-normally distributed • Zero-inflated (excessive zero observations in taxa counts) • Sparse (taxa not present in all samples) • Overdispersion (variance is larger than the mean) • uneven (unbalanced) library sizes • How to deal with rare taxa – are they real?
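The properties listed above can be checked on a count table before choosing a method. This sketch reports three things:

- per-sample library sizes;
- overall sparsity (fraction of zero cells);
- each taxon's variance-to-mean ratio, which is about 1 under a Poisson model, so values far above 1 indicate overdispersion.

The table layout (taxon → per-sample counts) and the numbers are hypothetical.

```python
from statistics import mean, variance


def describe_count_table(table):
    """Summarize a taxa x samples count table given as {taxon: [count per sample]}.

    Returns (library_sizes, zero_fraction, dispersion) where dispersion maps each
    taxon to its sample variance / mean (NaN for taxa never observed).
    """
    rows = list(table.values())
    n_samples = len(rows[0])
    library_sizes = [sum(row[j] for row in rows) for j in range(n_samples)]
    cells = [c for row in rows for c in row]
    zero_fraction = sum(1 for c in cells if c == 0) / len(cells)
    dispersion = {
        taxon: (variance(counts) / mean(counts) if mean(counts) > 0 else float("nan"))
        for taxon, counts in table.items()
    }
    return library_sizes, zero_fraction, dispersion


# Hypothetical table: three taxa across four samples.
table = {
    "taxon_1": [0, 0, 12, 0],
    "taxon_2": [5, 3, 4, 8],
    "taxon_3": [0, 1, 0, 0],
}
sizes, zeros, disp = describe_count_table(table)
print(sizes, zeros, {t: round(v, 2) for t, v in disp.items()})
```

Here half the cells are zero, library sizes differ four-fold, and taxon_1 has a variance twelve times its mean: in miniature, the sparse, uneven, overdispersed structure described in the slide.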

  43. Statistical analysis • What data structure to use • Counts • Proportions/ratios • Relative abundance • Normalized • Rarefied • Parametric vs. non-parametric methods • Multiple-testing correction – loss of power • Principal Component Analysis (PCA) • Microbial differential abundance testing
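Three of the choices above can each be sketched in a few lines:

- converting counts to relative abundance;
- rarefying (random subsampling without replacement to a common depth);
- Benjamini-Hochberg adjustment, one widely used multiple-testing correction that controls the false discovery rate.

These are illustrative implementations, not a recommendation of rarefying over other normalizations, and the seed and depth are arbitrary.

```python
import random


def relative_abundance(counts):
    """Convert raw counts to proportions of the library total."""
    total = sum(counts)
    return [c / total for c in counts]


def rarefy(counts, depth, seed=0):
    """One random subsample of a library to `depth` reads, without replacement.

    Reads above `depth` are discarded, which is why rarefying is debated. Expanding
    counts into a list is fine for a sketch but memory-heavy for real libraries.
    """
    if depth > sum(counts):
        raise ValueError("depth exceeds library size")
    pool = [taxon for taxon, c in enumerate(counts) for _ in range(c)]
    drawn = random.Random(seed).sample(pool, depth)
    out = [0] * len(counts)
    for taxon in drawn:
        out[taxon] += 1
    return out


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(relative_abundance([50, 30, 20]))
print(rarefy([50, 30, 20], depth=40, seed=1))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```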

  44. Stats take-home messages • Don’t torture the data till it talks • aka: p-hacking, data dredging, data fishing • Death of the p-value? • p-value is only one data point - you need to put it in biological context. It's about the weight of evidence • No perfect method • Bias is inevitable

  45. Summary

  46. Summary • Lack of standardization • Garbage in = garbage out • Data QC, filtering, and de-noising are critical • Databases are critical • Metagenomics data analysis (including statistics) should be tailored to your research questions • Use extreme caution with bioinformatics software • Steep learning curve • many proficiencies required • Best to start with simulated datasets • Does it make sense biologically? • Validations are critical • Going forward: multi-omics dataset integration studies
