Gabe Al-Ghalith, Jimmy Reeve. Chapter 28, data mining. Machine Learning and Data Mining: A Case Study with Enterotypes.
http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-047-computational-biology-genomes-networks-evolution-fall-2008/lecture-notes/MIT6_047f08_lec04_slide04.pdf
Choosing Between Clustering and Classification
• Clustering: summarize big data without a priori hypotheses
• How would you categorize people based on their:
  • Blood type?
  • Gut bacteria?
• Blood type calls for classification
  • Consensus on blood groups: A, B, AB, O
• Gut bacteria call for clustering
  • No consensus on the types, or even on the number of categories
http://www.nytimes.com/2011/04/21/science/21gut.html?_r=2&scp=2&sq=bacteria&st=cse&
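To make the contrast concrete, here is a minimal sketch of what "classification" means when the categories are agreed on in advance. The function name `abo_blood_group` and its boolean inputs are hypothetical, chosen only for illustration; the point is that every sample must land in one of the four fixed labels, which is exactly what gut bacteria lack.

```python
# The label set is fixed before any data is seen: A, B, AB, O.
ABO_GROUPS = ("A", "B", "AB", "O")


def abo_blood_group(has_a_antigen, has_b_antigen):
    """Assign one of the four consensus ABO groups from antigen presence.

    Hypothetical helper for illustration only; real typing involves
    laboratory assays, not two booleans.
    """
    if has_a_antigen and has_b_antigen:
        return "AB"
    if has_a_antigen:
        return "A"
    if has_b_antigen:
        return "B"
    return "O"


# Every possible input maps onto a label we already knew existed.
observed = {abo_blood_group(a, b) for a in (True, False) for b in (True, False)}
```

With gut bacteria there is no such `ABO_GROUPS` tuple to start from, so the groups themselves have to be discovered from the data.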
Reasons to Consider Gut Bacteria
• Contribute to diseases and to responses to treatments
• Protective role, digestive role
• Humans have hundreds of genes involved in handling these bacteria
• NPR.org: “Gut bacteria might guide the workings of our minds”
• Characterizing these bacteria can help us tease out these associations:
  • Personalized medicine and treatment
http://www.npr.org/blogs/health/2013/11/18/244526773/gut-bacteria-might-guide-the-workings-of-our-minds
http://www.gutmicrobiotawatch.org/gut-microbiota-info/
3 Distinct “Enterotypes” Revealed by a Clustering Approach
• Bacterial populations fell into 3 groups based on population composition
• Each of these three “enterotypes” has one representative member of the gut bacteria (chief/first principal component)
  • Enterotype 1: Bacteroides, enriched in biosynthesis of vitamins B5, B7, C
  • Enterotype 2: Prevotella, enriched in biosynthesis of vitamins B1, B9
  • Enterotype 3: Blautia (Ruminococcus), converts H2/CO2 to acetate
• ~1500 known sequences were used as a filter for raw metagenomic reads. These are the “features.” A “sample” is the population composition in a subject's gut.
• 85 metagenomes from one source, 154 from another, 33 from a third. The same 3 groups emerged upon clustering each data set.
Enterotypes of the human gut microbiome. Nature 473: 174–180.
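As a rough illustration of the "features" and "sample" vocabulary, the sketch below turns per-feature read counts into a population composition (proportions that sum to 1). The counts and the helper name `relative_abundance` are made up for this example and are not taken from the paper.

```python
def relative_abundance(counts):
    """Convert raw read counts per feature into proportions summing to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no mapped reads")
    return {feature: n / total for feature, n in counts.items()}


# Hypothetical read counts for one subject; each key is a "feature".
sample_counts = {"Bacteroides": 600, "Prevotella": 100, "Ruminococcus": 300}

# The resulting composition is one "sample" in the sense used above.
profile = relative_abundance(sample_counts)
```

Normalizing this way makes samples with different sequencing depths comparable before any distance is computed between them.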
Clustering Methodology Used in the Original Paper
• Karhunen–Loève transform (KLT), i.e. PCA
  • Dimensionality reduction technique
  • Parallels with SQL3: “pivot” along the axis with the most variance, then a final “roll up” based on a distance metric
• Some metrics: Euclidean, Manhattan, vector angle, Pearson's, Jensen-Shannon...
• The “pam” algorithm (“K-medoids”) comes from R's cluster package, which is used alongside ade4
Enterotypes of the human gut microbiome. Nature 473: 174–180.
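The distance-then-medoids idea can be sketched in a few lines of plain Python. This is not the R code behind the paper: it uses Jensen-Shannon distance (one of the metrics listed above) and, in place of PAM's build/swap heuristic, a brute-force search over every possible set of k medoids, which is only practical for toy data. The six profiles and all function names are invented for illustration.

```python
import math
from itertools import combinations


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; zero-probability terms add 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_distance(p, q):
    """Square root of the Jensen-Shannon divergence between two profiles."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    jsd = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    return math.sqrt(max(jsd, 0.0))  # guard against tiny negative rounding


def k_medoids_exhaustive(dist, k):
    """Pick the k medoids minimizing total distance to the nearest medoid.

    Brute force over all k-subsets (assumption: tiny data sets only).
    PAM targets the same objective but searches heuristically.
    """
    n = len(dist)
    best_cost, best_medoids = float("inf"), None
    for medoids in combinations(range(n), k):
        cost = sum(min(dist[i][m] for m in medoids) for i in range(n))
        if cost < best_cost:
            best_cost, best_medoids = cost, medoids
    # Label each sample with the index of its nearest medoid.
    labels = [min(best_medoids, key=lambda m: dist[i][m]) for i in range(n)]
    return best_medoids, labels


# Hypothetical relative abundances over three genera, two samples per group.
profiles = [
    [0.70, 0.10, 0.20],
    [0.65, 0.15, 0.20],
    [0.10, 0.75, 0.15],
    [0.15, 0.70, 0.15],
    [0.20, 0.10, 0.70],
    [0.15, 0.15, 0.70],
]
dist = [[jensen_shannon_distance(p, q) for q in profiles] for p in profiles]
medoids, labels = k_medoids_exhaustive(dist, 3)
```

Because medoids are actual samples rather than averaged centroids, the method works from the distance matrix alone, so any of the metrics above can be swapped in without changing the clustering code.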
References
• cluster package in R (ade4 hooks this): http://cran.r-project.org/web/packages/cluster/cluster.pdf
• ade4 primer on dimensionality reduction: cran.r-project.org/web/packages/ade4/index.html
• “The human gut microbiome: are we our enterotypes?” Microbial Biotechnology (2011) 4(5), 550–553.
• “Bacteria Divide People Into 3 Types, Scientists Say.” New York Times, April 20, 2011.
• Dan Knights. Seminar: “Diet and microbiome: Which came first, the chicken nuggets or the Eggerthella?” Sep 26, 2013.