CS 6293 Advanced Topics: Chapter 12: Human Microbiome Analysis OMER ASLAN
Outline • An overview of the analysis of microbial communities • Understanding the human microbiome from phylogenetic and functional perspectives • Methods and tools for calculating taxonomic and phylogenetic diversity • Metagenomic assembly and pathway analysis • Human Microbiome Project (HMP) • The impact of the microbiome on the human host • Summary
An overview of the analysis of microbial communities • The human microbiome is the aggregate of microorganisms that reside on the surface and in deep layers of the skin, in the saliva and oral mucosa, in the conjunctiva, and in the gastrointestinal tract [1]. They include bacteria, archaea, fungi, and viruses.
Understanding the human microbiome from phylogenetic and functional perspectives • Under normal circumstances, some of these organisms perform tasks that are useful for the human host, such as helping to digest our food, producing certain vitamins, regulating our immune system, and keeping us healthy by protecting us against disease-causing bacteria. • However, under some conditions, some of them contribute to diseases ranging from inflammatory bowel disease to diabetes to antibiotic-resistant infections.
Human microbiome cont. • According to some scientists, humans are born with only their own eukaryotic human cells, but over the first several years of life the skin surface, oral cavity, and gut are colonized by a tremendous diversity of bacteria (the majority), archaea, fungi, single-celled eukaryotes, and viruses. • However, other scientists, including Dr. Madan and a number of other researchers, are now convinced that mothers seed their fetuses with microbes during pregnancy.
Human microbiome: functional perspectives. Recent research shows that, at a given body site, different sets of microbes may perform the same function in different people. For instance, on the tongues of two different people, two entirely different sets of organisms may break down sugars in the same way. This suggests that medical science may need to abandon the one-microbe model of disease and instead pay attention to the function of a group of microbes that has somehow gone awry.
Some characteristics of the human microbiome • Humans are host to > 100 trillion organisms • They outnumber human cells 10:1 • Their combined genome is 100-fold larger than the human genome • They comprise 700-800 separate species • The human microbiome makes up about one to two percent of the body mass of an adult. • Microbes contribute more genes responsible for human survival than humans' own genes: it is estimated that bacterial protein-coding genes are 360 times more abundant than human genes.
Some characteristics of the human microbiome cont. • So, by cell count, we can say that we are more microbe than human. • Short video about the human microbiome: http://www.youtube.com/watch?v=5DTrENdWvvM
Bacteria • Bacteria are a large domain of prokaryotic microorganisms. • The Howard Hughes Medical Institute of Maryland reports that the largest concentration of bacteria in the human body is found in the intestines. Bacteria also inhabit the skin and mucosa. • If microbe numbers grow beyond their typical ranges, or if microbes populate atypical areas of the body (for example, through poor hygiene or injury), disease can result.
Human Bacteria • It is estimated that 500 to 1,000 species of bacteria live in the human gut [1]. • Bacterial cells are much smaller than human cells, and there are at least ten times as many bacteria as human cells in the body. The mass of microorganisms is estimated to account for 1-3% of total body mass [1]. • Many of the bacteria in the digestive tract are able to break down certain nutrients, such as carbohydrates, that humans otherwise could not digest. The majority of these commensal bacteria (in a commensal relationship, one organism benefits without affecting the other) survive in an environment with no oxygen.
Archaea • Archaea are a domain of single-celled microorganisms. These microbes are prokaryotes, meaning they have no nucleus in the cell. • Archaea were initially classified as bacteria, receiving the name archaebacteria, but this classification is outdated. • Archaeal cells have unique properties separating them from the other two domains of life: Eukarya and Bacteria.
Human Archaea • Archaea are present in the human gut but, in contrast to the enormous variety of bacteria in this organ, far fewer archaeal species are found there. • Although a relationship has been proposed between the presence of some methanogens and human periodontal disease, no clear examples of archaeal pathogens are known.
Fungi • Fungi are a large group of eukaryotic organisms, separate from plants, animals, protists, and bacteria. • Fungi, in particular yeasts, are present in the human gut. • The best-studied of these are Candida species, because of their ability to become pathogenic in immunocompromised hosts.
Viruses • A virus is a small infectious agent that replicates only inside the living cells of other organisms [6]. • Viruses can infect all types of life forms, from animals and plants to bacteria and archaea. • Virus species within the same family generally share basic structural characteristics, such as genome type, virion shape, and replication site. There are currently 21 families of viruses known to cause disease in humans.
History of Microbiome Studies • Historically, members of microbial communities were identified in situ by stains that targeted their physiological characteristics, such as the Gram stain. • Such stains distinguish many broad clades of bacteria but are non-specific at lower taxonomic levels. Hence, microbiology was culture-dependent: it was necessary to grow an organism in the lab in order to study it. • Specific kinds of microbial species were detected by plating samples on specialized media selective for the growth of that organism. • This approach limited the range of detectable organisms to those that actively grow in laboratory culture.
History of Microbiome Studies cont. • However, the majority of microbial species have never been grown in the laboratory, and options for studying and quantifying uncultured organisms were severely limited until the development of DNA-based, culture-independent methods in the 1980s. • Culture-independent techniques analyze DNA extracted directly from a sample rather than from individually cultured microbes. These techniques allow us to investigate many aspects of microbial communities, including taxonomic diversity, such as which microbes are present in a community and in what numbers.
History of Microbiome Studies cont. • One of the earliest targeted metagenomic assays for studying uncultured communities without prior DNA extraction was fluorescent in situ hybridization (FISH). • FISH probes can be targeted to almost any level of taxonomy, from species to phylum. Although FISH was initially limited to the 16S rRNA marker gene, and therefore to diversity studies, it has since been expanded to functional gene probes that can identify specific enzymes in communities. • However, FISH remains a primarily low-throughput, imaging-based technology.
History of Microbiome Studies cont. • Although DNA sequencing has existed since the 1970s, sequencing environmental samples was expensive because it required the additional time and cost of clone library construction. It has since become economically feasible for most scientists to sequence the DNA of an entire environmental sample, and metagenomic studies have become increasingly common.
Taxonomic Diversity • 1. The 16S rRNA marker gene • 2. Binning 16S rRNA sequences into OTUs (operational taxonomic units), which I will explain a bit later • 3. Measuring population diversity
The 16S rRNA Marker Gene • Generally, a microbial community consists of a collection of individual cells, each carrying its own complement of genomic DNA. However, communities differ from multicellular organisms in that their component cells may or may not carry identical genomes, although substantial subsets of these cells are typically assumed to be clonal.
16S rRNA cont. • Ideally, one would therefore assign a frequency to each distinct genome within the community, describing either the absolute number of cells that carry it or its relative abundance within the population. However, it is not practical to fully sequence every genome in every cell. • Instead, microbial ecology has defined a number of molecular markers that uniquely tag distinct genomes. A marker is a DNA sequence that identifies the genome containing it, without the need to sequence the entire genome.
The 16S rRNA Marker Gene • Although different markers can be chosen for analyzing different populations, several properties are desirable for a good marker. • A marker should be present in every member of a population. • A number of such markers have been defined, including ribosomal protein subunits and the ~1.5 kbp gene for the small-subunit 16S ribosomal RNA. • 16S ribosomal RNA (16S rRNA) is a component of the 30S small subunit of prokaryotic ribosomes. The genes coding for it are referred to as 16S rDNA and are used in reconstructing phylogenies.
The 16S rRNA Marker Gene • Multiple 16S rRNA sequences can exist within a single bacterium. • It is relatively cheap and simple to sequence only the 16S sequences from a microbiome, describing the population as a set of 16S sequences and the number of times each was detected. • Sequences assayed in this manner have been characterized for a wide range of cultured species and environmental isolates. • These are stored in, and can be automatically matched against, several databases, including the Ribosomal Database Project, Greengenes, and SILVA.
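The "set of 16S sequences and the number of times each was detected" can be sketched as a simple tally, from which relative abundances follow by dividing by the total. This is a minimal illustration assuming reads are already trimmed to an identical 16S region; the short reads below are hypothetical toy data, and real pipelines first cluster near-identical reads rather than counting exact strings.

```python
from collections import Counter

def profile_16s(reads):
    """Tally identical 16S reads; return (absolute counts, relative abundances)."""
    counts = Counter(reads)                 # sequence -> number of times detected
    total = sum(counts.values())
    relative = {seq: n / total for seq, n in counts.items()}
    return dict(counts), relative

# Hypothetical toy reads (real 16S amplicons are hundreds of bp long)
reads = ["ACGTTGCA", "ACGTTGCA", "ACGTTGCT", "TTGACCGA", "ACGTTGCA"]
counts, relative = profile_16s(reads)
print(counts)
print(relative)
```

Note that `ACGTTGCA` and `ACGTTGCT` differ by a single base yet are counted separately, which is exactly the problem OTU binning addresses.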
Ribosomal Database Project • The Ribosomal Database Project (RDP) is a curated database that offers ribosome data along with related programs and services. • The offerings include phylogenetically ordered alignments of ribosomal RNA (rRNA) sequences, rRNA secondary structure diagrams, and various software packages for handling, analyzing, and displaying alignments and trees. • The data are available via FTP and electronic mail. Certain analytic services are also provided by the electronic mail server. • The database can be accessed at http://rdp.cme.msu.edu/
Greengenes • The Greengenes web application provides access to the 2011 version of the Greengenes 16S rRNA gene sequence alignment for browsing, BLASTing, probing, and downloading. • The data and tools provided by Greengenes can help researchers choose phylogenetically specific probes, interpret microarray results, and align novel sequences. • The data can be downloaded from http://greengenes.secondgenome.com/
SILVA • SILVA provides comprehensive, quality-checked, and regularly updated datasets of aligned small (16S/18S, SSU) and large subunit (23S/28S, LSU) ribosomal RNA (rRNA) sequences for all three domains of life (Bacteria, Archaea, and Eukarya). • SILVA is available at http://www.arb-silva.de/
Binning 16S rRNA Sequences into OTUs • A key challenge in the analysis of rRNA genes is the precise definition of a "unique" sequence. Although much of the 16S rRNA gene is highly conserved, several of the sequenced regions are variable or hypervariable, so small numbers of base pairs can change in a very short period of evolutionary time. • In addition, because 16S regions are typically sequenced in only a single pass, there is a fair chance that a given read will contain at least one sequencing error.
OTUs cont. • Some degree of sequence divergence is therefore typically allowed: 95%, 97%, or 99% sequence similarity cutoffs are often used in practice, and the resulting cluster of nearly identical tags is referred to as an Operational Taxonomic Unit (OTU), or sometimes a phylotype. • OTUs take the place of "species" in many microbiome diversity analyses, because named species genomes are often unavailable for particular marker sequences.
OTUs cont. • The assignment of sequences to OTUs is referred to as binning, and it can be performed by • 1) Unsupervised clustering of similar sequences • 2) Phylogenetic models incorporating mutation rates and evolutionary relationships • 3) Supervised methods (whole genome shotgun) that directly assign sequences to taxonomic bins based on labeled training data
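Option 1), unsupervised clustering, can be illustrated with a greedy centroid scheme: unique sequences are visited from most to least abundant, and each joins the first existing OTU whose representative is at least 97% identical, otherwise it seeds a new OTU. This is a simplified sketch in the spirit of greedy OTU pickers, not the algorithm of any particular tool; it assumes the sequences are already aligned to equal length, and the `mutate` helper and toy sequences are hypothetical.

```python
from collections import Counter

def identity(a, b):
    """Fraction of identical positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def greedy_otus(reads, threshold=0.97):
    """Greedy centroid clustering; returns [(representative, read count), ...]."""
    centroids, sizes = [], []
    for seq, n in Counter(reads).most_common():   # most abundant uniques seed OTUs first
        for k, centroid in enumerate(centroids):
            if identity(seq, centroid) >= threshold:
                sizes[k] += n                     # close enough: join this OTU
                break
        else:
            centroids.append(seq)                 # no match: start a new OTU
            sizes.append(n)
    return list(zip(centroids, sizes))

def mutate(seq, positions):
    """Hypothetical helper: substitute a different base at each position (toy data only)."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    s = list(seq)
    for p in positions:
        s[p] = swap[s[p]]
    return "".join(s)

base = "ACGT" * 25                       # 100 bp toy "16S fragment"
v1 = mutate(base, [10])                  # 99% identical to base
v2 = mutate(base, range(0, 100, 10))     # 90% identical to base
otus = greedy_otus([base] * 5 + [v1] * 2 + [v2] * 3)
print([size for _, size in otus])
```

At the 97% cutoff the 99%-identical variant is merged into the first OTU, while the 90%-identical one forms its own OTU; the result depends on the visiting order, which is why abundance-sorted seeding is a common design choice.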
Measuring Population Diversity • Population diversity is an important concept when dealing with OTUs or other taxonomic bins, because it is relevant to human health. • A number of disease conditions have been shown to correlate with decreased microbiome diversity, presumably because one or a few microbes overgrow during an immune or nutrient imbalance, which can seriously affect human health. • Human intestinal contents appear to be highly personalized when considered in terms of microbial presence, absence, and abundance.
Measuring Population Diversity cont. • We can ask two well-defined questions when quantifying population diversity: • Given that x bins have been observed in a sample of size y from a population of size z, how many bins are expected to exist in the population? • Or, given that x bins exist in a population of size z, how large a sample y is needed to observe all of them? • In other words: if I have sequenced some amount of diversity, how much more exists in my microbiome? And how much do I need to sequence to completely characterize my microbiome?
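One widely used answer to "how many bins are expected to exist?" is the Chao1 richness estimator, which extrapolates from the number of OTUs seen exactly once (singletons, F1) and exactly twice (doubletons, F2): S_chao1 = S_obs + F1²/(2·F2). Good's coverage, 1 − F1/N, gives a rough sense of how much of the community the current reads already represent. Neither estimator is named in the slides; this is a minimal sketch of these two standard formulas, and the OTU counts are hypothetical.

```python
def chao1(otu_counts):
    """Chao1 richness estimate from per-OTU read counts."""
    counts = [c for c in otu_counts if c > 0]
    s_obs = len(counts)
    f1 = sum(1 for c in counts if c == 1)    # singletons
    f2 = sum(1 for c in counts if c == 2)    # doubletons
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    return s_obs + f1 * (f1 - 1) / 2         # bias-corrected form when there are no doubletons

def goods_coverage(otu_counts):
    """Good's coverage: estimated fraction of reads that belong to already-observed OTUs."""
    n = sum(otu_counts)
    f1 = sum(1 for c in otu_counts if c == 1)
    return 1 - f1 / n

otu_counts = [10, 5, 2, 2, 1, 1, 1]          # hypothetical reads per OTU
print(chao1(otu_counts))                     # 7 observed OTUs, estimate is higher
print(goods_coverage(otu_counts))
```

Many singletons relative to doubletons push the estimate up, signalling that more sequencing is likely to reveal additional OTUs.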
Measuring Population Diversity cont. • Measures exist for calculating alpha diversity: the number (richness) and distribution (evenness) of taxa expected within a single population. • These give rise to figures known as collector's curves or rarefaction curves, since increasing numbers of sequenced taxa allow increasingly precise estimates of total population diversity. • On the other hand, when comparing multiple populations, beta diversity measures, including absolute or relative overlap, describe how many taxa are shared between them.
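A rarefaction curve can be sketched by repeatedly drawing random subsamples of reads at increasing depths (without replacement) and recording how many distinct OTUs each subsample contains. This is a minimal Monte Carlo sketch; the OTU counts, depths, iteration count, and seed are all hypothetical choices.

```python
import random

def rarefaction_curve(otu_counts, depths, iterations=20, seed=0):
    """Mean number of OTUs observed when subsampling `depth` reads without replacement."""
    rng = random.Random(seed)
    # Expand counts into a pool of read labels, one entry per read (label = OTU index)
    pool = [otu for otu, n in enumerate(otu_counts) for _ in range(n)]
    curve = []
    for depth in depths:
        if depth > len(pool):
            raise ValueError("depth exceeds the number of reads")
        observed = [len(set(rng.sample(pool, depth))) for _ in range(iterations)]
        curve.append(sum(observed) / iterations)
    return curve

otu_counts = [50, 30, 10, 5, 3, 1, 1]        # hypothetical: 100 reads in 7 OTUs
depths = [1, 10, 50, 100]
curve = rarefaction_curve(otu_counts, depths)
print(list(zip(depths, curve)))
```

If the curve is still rising steeply at the full sequencing depth, the sample has probably not captured the community's full richness; a flattening curve suggests diminishing returns from further sequencing.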
Alpha vs. beta diversity • Whereas an alpha diversity measure acts like a summary statistic of a single population, a beta diversity measure acts like a similarity score between populations, allowing analysis by sample clustering.
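Alpha diversity is often quantified by the Shannon index, $H' = -\sum_{i=1}^{S} p_i \ln p_i$, or the Simpson index, $D = 1 / \sum_{i=1}^{S} p_i^2$, where $S$ is the number of species and $p_i$ is the fraction of all individuals (e.g., sequences) in the population that belong to species $i$.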
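Both alpha diversity indices can be computed directly from per-species (or per-OTU) counts. This sketch uses the Shannon index with the natural logarithm, $H' = -\sum_i p_i \ln p_i$, and the Simpson index in its inverse form, $D = 1/\sum_i p_i^2$; other conventions (e.g. $\sum_i p_i^2$ or $1 - \sum_i p_i^2$) are also common, so check which one a given tool reports. The counts are hypothetical.

```python
import math

def shannon(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over species with nonzero counts."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)

def inverse_simpson(counts):
    """Simpson index in the inverse form D = 1 / sum(p_i^2)."""
    n = sum(counts)
    return 1 / sum((c / n) ** 2 for c in counts)

even = [25, 25, 25, 25]      # hypothetical: four equally abundant species
skewed = [97, 1, 1, 1]       # same richness, one dominant species
print(shannon(even), inverse_simpson(even))
print(shannon(skewed), inverse_simpson(skewed))
```

Both communities have the same richness (four species), but the skewed one scores lower on both indices because these measures also reflect evenness.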
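Beta diversity • Beta diversity can be measured by simple taxa overlap or quantified by the Bray-Curtis dissimilarity, $BC_{ij} = \frac{S_i + S_j - 2C_{ij}}{S_i + S_j}$ • where $S_i$ and $S_j$ are the total numbers of individuals (e.g., sequences) counted in populations $i$ and $j$, and $C_{ij}$ is the sum, over the species present in both populations, of the lesser of the two counts. With presence/absence data, $S_i$ and $S_j$ become the numbers of species in each population and $C_{ij}$ the number of shared species. • Like similarity measures in expression array analysis, many alpha- and beta-diversity measures have been developed, each revealing slightly different aspects of community ecology.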
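The Bray-Curtis formula $BC_{ij} = (S_i + S_j - 2C_{ij})/(S_i + S_j)$ translates almost line for line into code. This sketch assumes both samples are given as count vectors over the same taxa in the same order; the counts are hypothetical. A value of 0 means identical composition and 1 means no shared taxa.

```python
def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two count vectors over the same taxa."""
    if len(x) != len(y):
        raise ValueError("vectors must cover the same taxa in the same order")
    s_i, s_j = sum(x), sum(y)
    # C_ij: for each taxon, the lesser of the two counts (0 if absent from either sample)
    c_ij = sum(min(a, b) for a, b in zip(x, y))
    return (s_i + s_j - 2 * c_ij) / (s_i + s_j)

sample_i = [6, 7, 4]     # hypothetical counts of three taxa in population i
sample_j = [10, 0, 6]    # the same three taxa in population j
print(bray_curtis(sample_i, sample_j))
```

Computing this for every pair of samples yields a dissimilarity matrix, which is the input to the sample clustering mentioned above. Converting counts to 0/1 first gives the presence/absence form.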
Shotgun Sequencing and Metagenomics • Metagenomics is the investigation, using sequencing technologies, of the microbes that inhabit oceans, soils, the human body, and other environments. • Studies of the composition and function of uncultured microbial communities are often referred to collectively as "metagenomic." • Metagenomics is the study of metagenomes: genetic material recovered directly from environmental samples.
Metagenomics • While traditional microbiology and microbial genome sequencing rely on cultivated clonal cultures, early environmental gene sequencing cloned specific genes (often the 16S rRNA gene) to produce a profile of diversity in a natural sample. • Such work revealed that the vast majority of microbial biodiversity had been missed by cultivation-based methods. • Recent studies use "shotgun" Sanger sequencing or massively parallel pyrosequencing to obtain largely unbiased samples of all genes from all the members of the sampled communities.
Metagenomics • Because of its ability to reveal the previously hidden diversity of microscopic life, metagenomics offers a powerful lens for viewing the microbial world, one that has the potential to revolutionize our understanding of the entire living world. • The term metagenomics is also frequently used to describe the entire body of high-throughput studies now possible with microbial communities, although it also refers more specifically to whole-metagenome shotgun (WMS) sequencing of genomic DNA fragments from a community's metagenome.
Shotgun metagenomics • The approach, used to sequence many cultured microorganisms and the human genome, randomly shears DNA, sequences many short fragments, and reconstructs them into a consensus sequence. Shotgun sequencing reveals the genes present in environmental samples. • This provides information both on which organisms are present and on what metabolic processes are possible in the community. This can be helpful in understanding the ecology of a community, especially if multiple samples are compared to each other.
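The "reconstruct the fragments into a consensus sequence" step can be illustrated with a toy greedy overlap assembler: repeatedly find the pair of reads where a suffix of one matches a prefix of the other over the longest stretch, and merge them. This is only a teaching sketch under strong assumptions (error-free reads, forward strand only, no repeats); real metagenome assemblers use overlap or de Bruijn graphs and must cope with errors, repeats, and reverse complements. The genome string and read layout are hypothetical.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b (>= min_len), else 0."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads, min_len=4):
    """Merge the best-overlapping pair of reads until no overlaps remain; return contigs."""
    reads = list(dict.fromkeys(reads))          # drop exact duplicate reads, keep order
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break                               # remaining reads do not overlap: separate contigs
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for n, r in enumerate(reads) if n not in (best_i, best_j)] + [merged]
    return reads

genome = "ATGGCGTACCTTAGAGCCATGTCAGGTAAC"           # hypothetical 30 bp "genome"
reads = [genome[i:i + 10] for i in range(0, 21, 5)]  # 10 bp reads overlapping by 5 bp
print(greedy_assemble(reads, min_len=4))
```

Greedy merging is simple but can commit to a wrong join when a genome contains repeats longer than the overlap, which is one reason production assemblers use graph-based methods instead.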
Shotgun metagenomics • Shotgun metagenomics also is capable of sequencing nearly complete microbial genomes directly from the environment. • Because the collection of DNA from an environment is largely uncontrolled, the most abundant organisms in an environmental sample are most highly represented in the resulting sequence data. • To achieve the high coverage needed to fully resolve the genomes of under-represented community members, large samples, often prohibitively so, are needed.
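Why abundant organisms dominate, and how much sequencing rare members need, can be made concrete with Lander-Waterman-style arithmetic: if a member makes up a fraction a of the sequenced DNA, then of N reads of length L roughly a·N land on its genome of size G, giving mean coverage C ≈ a·N·L/G, and under a Poisson model about e^(−C) of its bases receive no reads at all. This is a back-of-the-envelope sketch assuming uniform sampling; the read counts, lengths, genome size, and abundances are hypothetical.

```python
import math

def expected_coverage(total_reads, read_len, genome_len, rel_abundance):
    """Mean depth for one community member, assuming reads are drawn in proportion
    to its share of the sequenced DNA and land uniformly on its genome."""
    return total_reads * rel_abundance * read_len / genome_len

def reads_needed(target_coverage, read_len, genome_len, rel_abundance):
    """Total reads required for this member to reach the target mean coverage."""
    return math.ceil(target_coverage * genome_len / (rel_abundance * read_len))

def fraction_uncovered(coverage):
    """Poisson (Lander-Waterman) estimate of the fraction of bases with zero reads."""
    return math.exp(-coverage)

# Hypothetical run: 10 million 100 bp reads; two members with 5 Mbp genomes
dominant = expected_coverage(10_000_000, 100, 5_000_000, 0.5)    # 50% of the DNA
rare = expected_coverage(10_000_000, 100, 5_000_000, 0.001)      # 0.1% of the DNA
print(dominant, fraction_uncovered(dominant))
print(rare, fraction_uncovered(rare))
print(reads_needed(20, 100, 5_000_000, 0.001))                   # reads for 20x on the rare member
```

The required number of reads grows inversely with abundance, which is why fully resolving under-represented genomes can demand prohibitively large sequencing efforts.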
Shotgun metagenomics cont. • On the other hand, the random nature of shotgun sequencing ensures that many of these organisms, which would otherwise go unnoticed using traditional culturing techniques, will be represented by at least some small sequence segments.
High-throughput sequencing • The first metagenomic studies conducted using high-throughput sequencing used massively parallel 454 pyrosequencing. • Three other technologies commonly applied to environmental sampling are the Ion Torrent Personal Genome Machine, the Illumina Genome Analyzer II, and the Applied Biosystems SOLiD system. • These technologies generate read lengths significantly shorter than the typical Sanger sequencing read length of ~750 bp.
Sequence pre-filtering • The first step of metagenomic data analysis requires the execution of certain pre-filtering steps, including the removal of redundant sequences, low-quality sequences, and sequences of probable eukaryotic origin (especially in metagenomes of human origin). • Methods available for removing contaminating eukaryotic genomic DNA sequences include Eu-Detect and DeconSeq.
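The redundancy and quality parts of pre-filtering can be sketched as a single pass over FASTQ-style records: drop reads that are too short, drop reads whose mean Phred quality falls below a threshold, and drop exact duplicates of reads already kept. This sketch assumes Phred+33 quality encoding, and the thresholds (mean Q20, 50 bp) and records are hypothetical. It does not perform host/eukaryotic read removal, which is the job of dedicated tools such as Eu-Detect or DeconSeq.

```python
def mean_phred(qual, offset=33):
    """Mean Phred score of a FASTQ quality string (Phred+33 encoding assumed)."""
    return sum(ord(ch) - offset for ch in qual) / len(qual)

def prefilter(records, min_mean_q=20, min_len=50):
    """Keep (name, seq, qual) records that are long enough, of sufficient mean
    quality, and not exact duplicates of an earlier kept read."""
    seen = set()
    kept = []
    for name, seq, qual in records:
        if len(seq) < min_len or mean_phred(qual) < min_mean_q:
            continue                 # too short or low quality
        if seq in seen:
            continue                 # redundant: exact duplicate sequence
        seen.add(seq)
        kept.append((name, seq, qual))
    return kept

records = [                                   # hypothetical reads
    ("r1", "ACGT" * 15, "I" * 60),            # Q40, kept
    ("r2", "ACGT" * 15, "I" * 60),            # duplicate of r1, dropped
    ("r3", "CCGA" * 15, "#" * 60),            # Q2, dropped
    ("r4", "GGTA" * 8, "I" * 32),             # 32 bp, too short, dropped
    ("r5", "TTAG" * 15, "5" * 60),            # exactly Q20, kept
]
print([name for name, _, _ in prefilter(records)])
```

Real pipelines usually trim low-quality read ends rather than discarding whole reads, and may collapse near-duplicates as well as exact ones.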
Metagenome Data Analysis • Unlike whole-genome shotgun (WGS) sequencing of individual organisms, metagenomes tend not to have a single finish line, and they have been successfully analyzed using a range of assembly techniques. • Metagenome-specific assembly algorithms have been proposed that reconstruct only the open reading frames from a population, recruiting highly similar fragments into complete single-gene sequences and avoiding the assembly of larger contigs.
Metagenome Data Analysis cont. • The most challenging option is to attempt full assemblies of the complete genomes present in the community. • When successful, this has the obvious benefits of establishing synteny and structural variation and of opening up the range of tools developed for whole-genome analysis. • A key bioinformatic tradeoff in analyzing metagenomic WMS sequences, regardless of their degree of assembly, is whether they should be analyzed by homology or de novo.
Metagenome Data Analysis cont. • An illustrative example is gene finding (or gene calling): determining which parts of each sequence read encode one or more genes. • By homology, each sequence can be BLASTed against a large database of reference genomes. This method is robust to sequencing and assembly errors, but its results depend on the contents of the reference database. • Conversely, de novo methods have been developed to directly bin and call genes within metagenomic sequences using DNA features alone.
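The simplest DNA feature a de novo gene caller can use is the open reading frame: a start codon (ATG) followed in the same frame by a stop codon (TAA, TAG, or TGA). This sketch scans only the three forward-strand frames and reports ORFs above a minimum length; it is a deliberately naive illustration, not how production metagenomic gene callers work (they also scan the reverse complement, handle reads truncated mid-gene, and score codon usage statistics). The minimum length and test sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=30):
    """Return (start, end, frame) for forward-strand ATG...stop ORFs.
    `end` is exclusive and includes the stop codon; length counts codons incl. the stop."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                                # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:   # keep only long enough ORFs
                    orfs.append((start, i + 3, frame))
                start = None                             # close it and keep scanning
    return orfs

read = "CC" + "ATG" + "GCT" * 4 + "TAA" + "GG"           # hypothetical 22 bp read
print(find_orfs(read, min_codons=5))
print(find_orfs(read))                                   # default threshold: too short
```

Because short reads often contain only part of a gene, a realistic caller would also report ORFs that run off either end of the read, which this sketch ignores.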
Computational Functional Metagenomics • Computational functional metagenomics typically focuses on the function of individual genes and gene products within a community and falls into one of two categories: top-down approaches and bottom-up approaches. • Both approaches rely, first, on cataloging some or all of the gene products present in a community and assigning them molecular functions and/or biological roles, in the typical sense of protein function prediction.
Computational Functional Metagenomics cont. • Top-down approaches screen a metagenome for a functional class of interest, for instance a particular enzyme family, transporter, pathway, or biological activity, essentially asking the question, "Does this community carry out this function and, if so, in what way?" • On the other hand, bottom-up approaches attempt to reconstruct profiles, either descriptive or predictive, of overall functionality within a community, typically relying on pathway and/or metabolic reconstructions and asking the question, "What functions are carried out by this community?"
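The contrast between the two styles can be sketched on an already-annotated gene catalog. The top-down function asks whether, and in which genes, one function of interest occurs; the bottom-up function summarizes every known pathway by the fraction of its required steps found in the community. This is a minimal sketch assuming annotation has already been done; the gene IDs, enzyme labels, and pathway definitions are hypothetical placeholders, not entries from a real pathway database.

```python
def screen_for_function(gene_annotations, function_of_interest):
    """Top-down: list the genes annotated with one specific function of interest."""
    return [gene for gene, func in gene_annotations.items() if func == function_of_interest]

def pathway_coverage(gene_annotations, pathways):
    """Bottom-up: fraction of each pathway's required steps present in the community."""
    present = set(gene_annotations.values())
    return {name: len(steps & present) / len(steps) for name, steps in pathways.items()}

# Hypothetical annotated catalog: gene -> assigned function
gene_annotations = {"gene1": "enzA", "gene2": "enzB", "gene3": "enzD", "gene4": "enzA"}
# Hypothetical pathway definitions: pathway -> set of required functions
pathways = {"pathway_X": {"enzA", "enzB", "enzC"}, "pathway_Y": {"enzD", "enzE"}}

print(screen_for_function(gene_annotations, "enzA"))   # "Does the community do this?"
print(pathway_coverage(gene_annotations, pathways))    # "What does the community do?"
```

Step coverage is only one simple summary; bottom-up reconstructions often also weight genes by abundance and account for alternative routes through a pathway.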