Lab 5 Unsupervised and supervised clustering

Lab 5: Unsupervised and supervised clustering. Feb 22nd, 2012. Daniel Fernandez, Alejandro Quiroz

Outline
• Unsupervised
  • Hierarchical clustering
  • Principal component analysis
• Supervised
  • LIMMA package
  • Linear models for microarray data

Before any high-level analysis
• Download the data set used in lab 4
  • Go to and download GSE10940
• Load the .CEL files and use the custom CDF annotation file used in lab 4: “drosophila2dmrefseqcdf”
• Perform RMA normalization and store the expression intensities in a matrix
• Obtain the genes that are up- or down-regulated with a fold change of 2 or more
• Store the gene IDs in: X.top
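The fold-change filter in the last two steps can be sketched as follows. This is a minimal illustration on simulated data, not the lab's actual solution: the matrix `X`, the `control`/`mutant` grouping, and the planted genes are all assumptions made up for the example. Because the later slides call `dist(t(X.top))`, `X.top` is kept here as the expression sub-matrix whose row names are the selected gene IDs.

```r
# Simulated stand-in for an RMA-normalized matrix (log2 scale):
# rows are genes, columns are arrays. All names and values are made up.
set.seed(1)
n.genes <- 200
X <- matrix(rnorm(n.genes * 6, mean = 8, sd = 0.2), nrow = n.genes,
            dimnames = list(paste0("gene", 1:n.genes),
                            c(paste0("control", 1:3), paste0("mutant", 1:3))))

# Assumed grouping: the first three arrays are controls, the last three mutants.
group <- factor(rep(c("control", "mutant"), each = 3))

# Plant ten up- and ten down-regulated genes so the filter has something to find.
X[1:10, group == "mutant"]  <- X[1:10, group == "mutant"] + 2
X[11:20, group == "mutant"] <- X[11:20, group == "mutant"] - 2

# RMA values are on the log2 scale, so a fold change of 2 means |log2 FC| >= 1.
logFC   <- rowMeans(X[, group == "mutant"]) - rowMeans(X[, group == "control"])
top.ids <- names(logFC)[abs(logFC) >= 1]

# Keep the expression values of the selected genes for the later steps.
X.top <- X[top.ids, , drop = FALSE]
print(dim(X.top))
```

With real data, `X` would be the matrix of RMA expression values from lab 4 and `group` would come from the sample annotation of GSE10940.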

The data set
• Secretory and transmembrane proteins traverse the endoplasmic reticulum (ER) and Golgi compartments for final maturation prior to reaching their functional destinations.
• Members of the p24 protein family function in trafficking some secretory proteins in yeast and higher eukaryotes.
• Yeast p24 mutants have minor secretory defects and induce an ER stress response that likely results from accumulation of proteins in the ER due to disrupted trafficking.
• Test the hypothesis that loss of Drosophila melanogaster p24 protein function causes a transcriptional response characteristic of ER stress activation.

Supervised Method: LIMMA
• Linear Models for MicroArray data
• A package for differential expression analysis of microarray data.
• Makes use of linear models to describe the expression of each gene.
• Uses empirical Bayes and other shrinkage methods to borrow information across genes, making the analyses stable even for experiments with a small number of arrays.

LIMMA uses linear models to analyze microarray data.
• The approach requires the definition of 2 matrices
  • Design matrix
    • Represents how the different factors are distributed across the arrays
    • A linear model is assumed: $E[y_j] = X\alpha_j$
    • where $y_j$ contains the expression data for gene $j$, $X$ is the design matrix, and $\alpha_j$ is the vector of coefficients
    • The estimates of $\alpha_j$ are provided by lmFit()
  • Contrast matrix
    • Allows the definition of the comparisons between factors of interest
    • If the parameters $\beta_j = C^T\alpha_j$ are of interest, $C$ is the contrast matrix
    • These parameters are estimated by contrasts.fit()
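The two matrices can be illustrated with a small two-group example. This is a sketch under assumptions: the simulated matrix `y`, the `control`/`mutant` group labels, and the single planted gene are hypothetical. The coefficients are first computed by ordinary least squares in base R to show what the design and contrast matrices do; the equivalent limma calls run only if the package is installed.

```r
# Simulated log2 expression for 100 genes on 6 arrays (made-up data).
set.seed(2)
group <- factor(rep(c("control", "mutant"), each = 3))
y <- matrix(rnorm(100 * 6, mean = 8, sd = 0.3), nrow = 100,
            dimnames = list(paste0("gene", 1:100), NULL))
y[1, group == "mutant"] <- y[1, group == "mutant"] + 2  # one planted gene

# Design matrix X: one column per group, no intercept,
# so each coefficient is a group mean.
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# Contrast matrix C: the comparison of interest is mutant minus control.
C <- matrix(c(-1, 1), ncol = 1,
            dimnames = list(levels(group), "mutant-control"))

# Least-squares estimates of alpha_j for every gene (one row per gene),
# followed by beta_j = C^T alpha_j.
alpha <- t(solve(crossprod(design), crossprod(design, t(y))))
beta  <- alpha %*% C
print(head(beta, 3))

# The same steps with limma, if it is available.
if (requireNamespace("limma", quietly = TRUE)) {
  fit  <- limma::lmFit(y, design)                  # estimates alpha_j
  fit2 <- limma::contrasts.fit(fit, contrasts = C) # estimates beta_j
  fit2 <- limma::eBayes(fit2)                      # moderated statistics
  print(limma::topTable(fit2, number = 5))
}
```

With one column per group, the contrast is simply the difference of the two group means, which is why `beta` matches the log fold change computed by hand.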

Given the large number of linear model fits arising from a microarray experiment, there is a pressing need to take advantage of the parallel structure whereby the same model is fitted to each gene
• Using a hierarchical framework, a moderated t-statistic is computed
• Standard errors are shrunk towards a common value using a Bayesian model
  • This borrows information across genes for the inference on each individual gene
• The degrees of freedom are increased
  • This reflects the greater reliability of the smoothed standard errors
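The shrinkage can be made concrete with a small numerical sketch. In the empirical Bayes model behind the moderated t-statistic, the posterior variance of gene j is (d0 * s0^2 + d * s_j^2) / (d0 + d), where s_j^2 is the gene's residual variance on d degrees of freedom and s0^2, d0 are the prior variance and prior degrees of freedom. limma estimates s0^2 and d0 from all genes; the values below, like the variances and effect sizes, are invented for illustration only.

```r
# Hypothetical residual variances for four genes, each with d = 4 residual df
# (two groups of three arrays).
s2 <- c(0.01, 0.05, 0.20, 1.00)
d  <- 4

# Assumed prior values. limma would estimate these from the data;
# here they are made up.
d0  <- 4
s02 <- 0.10

# Posterior (shrunken) variances: a weighted average of prior and gene variance.
s2.post <- (d0 * s02 + d * s2) / (d0 + d)

# Hypothetical estimated contrasts and the unscaled variance factor
# for a difference of two group means with three arrays each.
beta.hat <- c(0.5, 0.5, 0.5, 0.5)
v <- 1/3 + 1/3

t.ordinary  <- beta.hat / sqrt(s2 * v)       # d degrees of freedom
t.moderated <- beta.hat / sqrt(s2.post * v)  # d0 + d degrees of freedom
df.moderated <- d0 + d
p.moderated  <- 2 * pt(-abs(t.moderated), df = df.moderated)

print(data.frame(s2, s2.post, t.ordinary, t.moderated, p.moderated))
```

Very small variances are pulled up and very large ones are pulled down, so a gene with an accidentally tiny variance no longer produces an extreme statistic, while the test gains degrees of freedom.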

Unsupervised Method: Hierarchical clustering
• Hierarchical clustering
• First, calculate all the pairwise distances between samples
  • D=dist(t(X.top))
• Then, perform the hierarchical clustering with different linkage methods
  • H1=hclust(D,method="single")
  • H2=hclust(D,method="complete")
  • H3=hclust(D,method="average")
  • plot(Hi), for i = 1, 2, 3
• Is there something odd about the clustering?
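The calls above can be tried end to end on simulated data. This is a sketch, assuming `X.top` is a genes-by-samples matrix; the values and the `control`/`mutant` column names are made up. `cutree()` is added to compare the group assignments produced by the three linkage methods.

```r
# Simulated stand-in for X.top: 50 genes (rows) by 6 samples (columns).
set.seed(3)
X.top <- matrix(rnorm(50 * 6), nrow = 50,
                dimnames = list(paste0("gene", 1:50),
                                c(paste0("control", 1:3), paste0("mutant", 1:3))))
X.top[, 4:6] <- X.top[, 4:6] + 3  # shift the mutant arrays so groups differ

# dist() works on rows, so transpose to get sample-to-sample distances.
D <- dist(t(X.top))

# One tree per linkage method.
methods <- c("single", "complete", "average")
H <- lapply(methods, function(m) hclust(D, method = m))
names(H) <- methods

# Draw the three dendrograms side by side.
op <- par(mfrow = c(1, 3))
for (m in methods) plot(H[[m]], main = m)
par(op)

# Cut each tree into two clusters and compare the assignments.
clusters <- sapply(H, cutree, k = 2)
print(clusters)
```

On real data the three linkages need not agree; comparing the `cutree()` columns is a quick way to spot samples that move between clusters.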

Unsupervised Method: MDS
• Multidimensional scaling (MDS) is a set of related statistical techniques to explore similarities in data*.
• *Wikipedia.
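The slide does not name an R function for MDS; one common choice is classical MDS via `cmdscale()` from base R, sketched here on simulated data. The matrix `X.top` and its column names are made up for the example.

```r
# Simulated stand-in for X.top: 50 genes (rows) by 6 samples (columns).
set.seed(4)
X.top <- matrix(rnorm(50 * 6), nrow = 50,
                dimnames = list(paste0("gene", 1:50),
                                c(paste0("control", 1:3), paste0("mutant", 1:3))))
X.top[, 4:6] <- X.top[, 4:6] + 3  # shift the mutant arrays so groups differ

# Sample-to-sample distances, then a 2-dimensional classical MDS embedding.
D   <- dist(t(X.top))
mds <- cmdscale(D, k = 2)

# Plot the samples in the embedded space, labelled by name.
plot(mds, type = "n", xlab = "Coordinate 1", ylab = "Coordinate 2")
text(mds, labels = rownames(mds))
```

Samples that are close in the original distance matrix land close together in the plot, so the two simulated groups separate along the first coordinate.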

Unsupervised Method: Principal component analysis
• In R, the function prcomp performs principal component analysis
• In our context, the idea is to visualize the effect of reducing the dimension of the gene space
• Important: Remember that in prcomp, the genes have to be columns and the samples rows, so a genes-by-samples matrix must be transposed first.
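A minimal sketch of the `prcomp` call, assuming `X.top` is a genes-by-samples matrix as in the clustering step. The data are simulated and the choice not to scale the genes is an assumption of this example, not something the slides specify.

```r
# Simulated stand-in for X.top: 50 genes (rows) by 6 samples (columns).
set.seed(5)
X.top <- matrix(rnorm(50 * 6), nrow = 50,
                dimnames = list(paste0("gene", 1:50),
                                c(paste0("control", 1:3), paste0("mutant", 1:3))))
X.top[, 4:6] <- X.top[, 4:6] + 3  # shift the mutant arrays so groups differ

# prcomp expects samples in rows and genes in columns, hence the transpose.
pca <- prcomp(t(X.top), center = TRUE, scale. = FALSE)

# Proportion of the total variance captured by each component.
var.explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var.explained, 3))

# Samples projected onto the first two principal components.
plot(pca$x[, 1:2], type = "n", xlab = "PC1", ylab = "PC2")
text(pca$x[, 1:2], labels = rownames(pca$x))
```

`pca$x` holds the sample coordinates and `pca$rotation` holds the gene loadings, which show how much each gene contributes to a component.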
