Hierarchical Clustering in R



  1. Hierarchical Clustering in R

  2. Quick R Tips
     • How to list the packages that are installed and available to load
     • library()
     • How to list the packages that are currently attached to your session
     • (.packages())
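A short sketch of these checks using base R only. The variable names `installed` and `attached` are illustrative and do not come from the slides.

```r
# Names of all packages installed in the local library paths
installed <- rownames(installed.packages())
head(installed)

# Names of the packages attached to the current session
attached <- .packages()
print(attached)

# Check whether a given package is installed before trying to load it
"stats" %in% installed
```

A check like the last line can be useful before calling library() on a package that may be missing.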

  3. Hierarchical Clustering
     • A type of cluster analysis
     • There are both “divisive” and “agglomerative” approaches; agglomerative is the most commonly used
     • Groups objects that are “close” to one another based on some distance/similarity metric
     • Clusters are created and linked based on a metric that evaluates the cluster-to-cluster distance
     • Results are displayed as a dendrogram
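To make the cluster-to-cluster distance concrete, here is a minimal sketch that computes three common linkage criteria (single, complete, average) between two toy clusters. The points are made-up values chosen only for illustration.

```r
# Two toy clusters of points in 2-D (made-up values)
a <- rbind(c(0, 0), c(0, 1))
b <- rbind(c(3, 0), c(4, 1))

# Euclidean distances between every point in a and every point in b
d <- as.matrix(dist(rbind(a, b)))[1:2, 3:4]

# Cluster-to-cluster distance under three linkage rules
linkage <- c(single = min(d), complete = max(d), average = mean(d))
print(linkage)
```

Single linkage uses the closest pair of points, complete linkage the farthest pair, and average linkage the mean over all pairs, so the same two clusters can be "near" or "far" depending on the rule chosen.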

  4. Step 1: Data matrix
     • First you need a numeric matrix
     • A typical array data set has samples as columns and genes as rows
     • We want to be sure our data are in the form of an expression matrix
     • Use the Biobase package
     • See http://www.bioconductor.org/packages/2.2/bioc/vignettes/Biobase/inst/doc/ExpressionSetIntroduction.pdf
     • The header, sep, row.names, and as.is arguments belong to read.table(), not as.matrix(); the call below assumes data holds the path to a tab-delimited text file
     > exprs <- as.matrix(read.table(data, header=TRUE, sep="\t", row.names=1, as.is=TRUE))
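A minimal sketch of reading a tab-delimited table into a numeric matrix. The inline text `txt`, the gene and sample names, and the variable `expr_mat` are all hypothetical stand-ins for a real data file.

```r
# Hypothetical tab-delimited table: genes as rows, samples as columns
txt <- "gene\tS1\tS2\tS3\ng1\t1.0\t2.0\t3.0\ng2\t4.0\t5.0\t6.0"

# read.table() parses the text; row.names = 1 uses the first column as gene IDs
tab <- read.table(text = txt, header = TRUE, sep = "\t",
                  row.names = 1, as.is = TRUE)

# Convert the data frame to a matrix and confirm it is numeric
expr_mat <- as.matrix(tab)
stopifnot(is.numeric(expr_mat))
dim(expr_mat)
```

With a real file, the `text = txt` argument would be replaced by the file path as the first argument.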

  5. Step 2: Calculate Distance Matrix
     • The default dist() method in R uses rows as the vectors, but we want the distance between samples, i.e., the columns of our matrix
     • A handy package from MD Anderson, oompaBase, helps with this
       source("http://bioinformatics.mdanderson.org/OOMPA/oompaLite.R")
       oompaLite()
       oompainstall(groupName="all")
     • Once installed, be sure to load the libraries in your session
       library(oompaBase)
       library(ClassDiscovery)
       library(ClassComparison)
     • oompaBase also requires the mclust and cobs packages; download these from CRAN
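The row-versus-column behaviour of dist() can be checked in base R without any extra packages. This sketch uses a made-up random matrix `m` and transposes it with t() to get sample-to-sample distances.

```r
set.seed(1)
# Made-up matrix: 5 genes (rows) x 4 samples (columns)
m <- matrix(rnorm(20), nrow = 5,
            dimnames = list(paste0("g", 1:5), paste0("S", 1:4)))

d_rows <- dist(m)      # distances between genes (rows)
d_cols <- dist(t(m))   # distances between samples (columns)

# Number of objects compared in each case
attr(d_rows, "Size")
attr(d_cols, "Size")
```

Transposing is a base-R alternative for simple metrics such as Euclidean distance; it is not a replacement for the additional metrics the OOMPA packages provide.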

  6. Use the function distanceMatrix() to create a distance matrix of your samples
     • Uses the expression set created in Step 1 as input
     • Remember that there are many different types of distance metrics to choose from!
     • See help(distanceMatrix)
       x <- distanceMatrix(exprs, 'pearson')
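For readers without the OOMPA packages, a correlation-based distance between samples can be sketched in base R. This uses 1 minus the Pearson correlation as an assumed definition; the exact scaling that distanceMatrix() applies for its 'pearson' option may differ, so check help(distanceMatrix) before treating the two as equivalent.

```r
set.seed(1)
# Made-up matrix: 10 genes (rows) x 4 samples (columns)
m <- matrix(rnorm(40), nrow = 10,
            dimnames = list(NULL, paste0("S", 1:4)))

# cor() correlates columns, so samples are compared directly (no transpose)
r <- cor(m, method = "pearson")

# Turn similarity into a distance: 0 for identical profiles, up to 2 for opposite ones
d_pearson <- as.dist(1 - r)
print(d_pearson)
```

The resulting object has class "dist", so it can be passed to hclust() in the same way as the output of dist().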

  7. Step 3: Cluster
     • Use the hclust() function to create a hierarchical cluster based on your distance matrix, x, created in Step 2
     > y <- hclust(x, method="complete")
     > plot(y)
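A self-contained sketch of the clustering step on simulated data, using base R only. The matrix `m`, the shift of 5 between the two sample groups, and the choice of k = 2 are assumptions made for illustration; cutree() is an additional step not shown on the slide.

```r
set.seed(1)
# Simulated data: 10 genes x 6 samples, with samples S4-S6 shifted upward
m <- cbind(matrix(rnorm(30, mean = 0), nrow = 10),
           matrix(rnorm(30, mean = 5), nrow = 10))
colnames(m) <- paste0("S", 1:6)

# Distance between samples (columns), then complete-linkage clustering
d <- dist(t(m))
hc <- hclust(d, method = "complete")

# Cut the tree into two groups and show the assignment per sample
groups <- cutree(hc, k = 2)
print(groups)
```

Calling plot(hc) on the result draws the dendrogram, and cutree() gives an explicit group label for each sample when a fixed number of clusters is wanted.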

  8. Testing for Differential Gene Expression with the T-test

  9. Get the multtest package from Bioconductor
     • Package contains data from the Golub leukemia microarray data set (ALL vs. AML)
     • 38 arrays
     • 27 from lymphoblastic
     • 11 from myeloid
     http://people.cryst.bbk.ac.uk/wernisch/macourse/

  10. library(multtest)
      data(golub)
      golub.cl
      • Generate the t statistic
        teststat <- mt.teststat(golub, golub.cl)
      • Convert into p-values
        rawp0 <- 2*pt(abs(teststat), lower.tail=FALSE, df=38-2)
      • Correct for multiple testing and show the ten most significant genes
        procs <- c("Bonferroni", "BH")
        res <- mt.rawp2adjp(rawp0, procs)
        res$adjp[1:10,]
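The same workflow can be sketched in base R on simulated data, for readers who want to see each step without multtest. Everything here is an assumption for illustration: the data are random, five genes are given an artificial shift, and an equal-variance t statistic is used so that 38 - 2 degrees of freedom applies, which may differ from what mt.teststat() computes by default.

```r
set.seed(1)
n_genes <- 100
# Class labels with the same group sizes as the Golub data (27 vs. 11)
cl <- rep(c(0, 1), times = c(27, 11))

# Simulated expression matrix; the first 5 genes are shifted in class 1
x <- matrix(rnorm(n_genes * length(cl)), nrow = n_genes)
x[1:5, cl == 1] <- x[1:5, cl == 1] + 3

# Equal-variance two-sample t statistic for each gene (row)
tstat <- apply(x, 1, function(g) {
  t.test(g[cl == 1], g[cl == 0], var.equal = TRUE)$statistic
})

# Two-sided raw p-values
rawp <- 2 * pt(abs(tstat), df = length(cl) - 2, lower.tail = FALSE)

# Bonferroni and Benjamini-Hochberg adjustments
adjp <- cbind(rawp = rawp,
              Bonferroni = p.adjust(rawp, method = "bonferroni"),
              BH = p.adjust(rawp, method = "BH"))

# Ten most significant genes
adjp[order(rawp)[1:10], ]
```

Bonferroni controls the family-wise error rate and is the more conservative of the two; BH controls the false discovery rate, so its adjusted p-values are never larger than the Bonferroni ones.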
