Tutorial 8

Tutorial 8: Clustering

• Clustering
  • General methods
  • Unsupervised clustering
    • Hierarchical clustering
    • K-means clustering
• Expression data
  • GEO
  • UCSC
  • ArrayExpress
• Tools
  • EPCLUST
  • MeV

Microarray - Reminder

Expression Data Matrix
• Each column represents the gene expression levels from a single experiment.
• Each row represents the expression of one gene across all experiments.

Expression Data Matrix
Each element is a log ratio, log2(T/R), where:
• T is the expression level of the gene in the test sample
• R is the expression level of the gene in the reference sample
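To make the definition concrete, here is a minimal Python sketch that turns hypothetical test (T) and reference (R) intensities into a log-ratio matrix, with genes as rows and experiments as columns. Treating a missing or non-positive intensity as missing data (None) is an assumption of this sketch, not something the slides specify.

```python
import math

def log_ratio_matrix(test, reference):
    """Return log2(T/R) for each gene (row) in each experiment (column).

    test, reference: lists of rows with the same shape; None marks a missing spot.
    Missing or non-positive values are reported as None (assumed convention).
    """
    result = []
    for t_row, r_row in zip(test, reference):
        row = []
        for t, r in zip(t_row, r_row):
            if t is None or r is None or t <= 0 or r <= 0:
                row.append(None)  # missing or unusable measurement
            else:
                row.append(math.log2(t / r))
        result.append(row)
    return result

# Hypothetical intensities: 2 genes x 3 experiments
T = [[200.0, 100.0, 50.0],
     [80.0, None, 320.0]]
R = [[100.0, 100.0, 100.0],
     [80.0, 90.0, 80.0]]
print(log_ratio_matrix(T, R))
# [[1.0, 0.0, -1.0], [0.0, None, 2.0]]
```

A doubling in the test sample gives +1, a halving gives -1, and equal levels give 0, which is why log ratios are symmetric around zero while plain ratios are not.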

Microarray Data Matrix
• Black: a log ratio of zero, i.e. T ≈ R
• Red: a positive log ratio, i.e. T > R
• Green: a negative log ratio, i.e. T < R
• Grey: missing data
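The colour convention can be sketched as a small lookup. Real viewers shade each cell continuously (brighter red or green for larger magnitudes); this discrete version, and its `threshold` parameter, are simplifying assumptions for illustration.

```python
def spot_color(log_ratio, threshold=0.5):
    """Map one log2(T/R) value to the black/red/green/grey convention.

    threshold is a hypothetical cut-off: values within +/- threshold
    of zero are treated as T ~ R (black).
    """
    if log_ratio is None:
        return "grey"    # missing data
    if log_ratio > threshold:
        return "red"     # T > R
    if log_ratio < -threshold:
        return "green"   # T < R
    return "black"       # T ~ R

row = [1.0, 0.1, -2.0, None]
print([spot_color(v) for v in row])
# ['red', 'black', 'green', 'grey']
```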

Microarray Data: Different representations
[Figure: expression data plotted as log ratio against experiment (Exp), with T>R and T<R regions marked]

A real example: ~500 genes under 3 knockdown conditions, too complicated to analyze without "help".

Microarray Data: Clusters

How to determine the similarity between two genes (for clustering)?
• Patrik D'haeseleer, "How does gene expression clustering work?", Nature Biotechnology 23, 1499-1501 (2005).
• http://www.nature.com/nbt/journal/v23/n12/full/nbt1205-1499.html
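Two common ways to score how similar two expression profiles are are Euclidean distance and Pearson correlation (turned into a distance as 1 - r). The sketch below implements both on hypothetical log-ratio profiles so the difference between them is visible; it assumes complete profiles with no missing values.

```python
import math

def euclidean(x, y):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def pearson(x, y):
    """Pearson correlation: +1 when profiles rise and fall together."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_distance(x, y):
    """1 - r: 0 for perfectly correlated profiles, 2 for anti-correlated ones."""
    return 1.0 - pearson(x, y)

# Hypothetical log-ratio profiles over 4 experiments
g1 = [1.0, 2.0, 3.0, 4.0]
g2 = [2.0, 4.0, 6.0, 8.0]   # same shape as g1, larger amplitude
g3 = [4.0, 3.0, 2.0, 1.0]   # opposite trend
print(euclidean(g1, g2), correlation_distance(g1, g2))
print(euclidean(g1, g3), correlation_distance(g1, g3))
```

Here Euclidean distance rates g3 as closer to g1 than g2 is, while correlation distance treats g1 and g2 as having the same shape (distance about 0) and g3 as opposite (distance about 2). The choice of measure therefore changes which genes end up clustered together.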

Unsupervised Clustering
Hierarchical Clustering

Hierarchical Clustering
Genes with similar expression patterns are grouped together and connected by a series of branches (a dendrogram).
[Figure: six genes, drawn as shapes and numbered 1 to 6, joined by a dendrogram]
Leaves (shapes, in our case) represent genes, and the length of the path between two leaves represents the distance between those genes.
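The grouping described above can be sketched as bottom-up (agglomerative) clustering: start with every gene in its own cluster and repeatedly merge the two closest clusters. This minimal sketch assumes Euclidean distance and average linkage (the distance between two clusters is the mean distance over all pairs of their members); the slides do not say which distance or linkage is used, and other choices such as single or complete linkage give different trees. The function returns the merge history, which is the information a dendrogram draws.

```python
import math

def agglomerative(profiles, dist=math.dist):
    """Bottom-up clustering with average linkage (assumed choice).

    Returns the merge history as (members_a, members_b, height) tuples,
    where members are tuples of gene indices and height is the linkage distance.
    """
    clusters = [(i,) for i in range(len(profiles))]   # each gene starts alone
    merges = []

    def linkage(c1, c2):
        # Average linkage: mean distance over all cross-cluster gene pairs
        total = sum(dist(profiles[i], profiles[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the closest pair of current clusters
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = tuple(sorted(clusters[a] + clusters[b]))
        del clusters[b]   # b > a, so delete b first to keep index a valid
        del clusters[a]
        clusters.append(merged)
    return merges

# Hypothetical 2-experiment profiles for 4 genes: two tight pairs far apart
profiles = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
for step in agglomerative(profiles):
    print(step)
```

Genes 0 and 1 merge first, then genes 2 and 3, and the two pairs join last at a much larger height; drawn as a tree, that last merge is the long branch connecting two well-separated groups.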

Hierarchical clustering produces an entire hierarchy of clusters. To obtain a specific number of clusters, we cut the tree at the level that yields that number (in this case, four).
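Cutting the tree can be sketched as replaying the merge history only partway: with n genes, applying the first n - k merges leaves exactly k clusters. The merge list below is hypothetical (six genes indexed 0 to 5, ordered by increasing merge height); in practice it would come from the clustering step.

```python
def cut_tree(n_leaves, merges, k):
    """Cut a dendrogram so that exactly k clusters remain.

    merges: merge history ordered by increasing height, each entry
    (members_a, members_b, height) with members as tuples of leaf indices.
    Applying the first n_leaves - k merges is equivalent to cutting the
    tree just below the height of the next merge.
    """
    label = list(range(n_leaves))          # cluster label of each leaf
    for members_a, members_b, _height in merges[: n_leaves - k]:
        old, new = label[members_b[0]], label[members_a[0]]
        label = [new if lab == old else lab for lab in label]
    groups = {}
    for leaf, lab in enumerate(label):
        groups.setdefault(lab, []).append(leaf)
    return sorted(groups.values())

# Hypothetical merge history for 6 genes
merges = [
    ((0,), (5,), 0.5),
    ((2,), (4,), 0.8),
    ((1,), (3,), 1.1),
    ((0, 5), (2, 4), 2.0),
    ((0, 2, 4, 5), (1, 3), 3.5),
]
print(cut_tree(6, merges, 4))
# [[0, 5], [1], [2, 4], [3]]
```

The height field is unused here; cutting at a fixed distance threshold instead of a fixed number of clusters would apply every merge whose height is below that threshold.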

Hierarchical clustering result: five clusters

K-means Clustering
An algorithm that partitions the data into K groups (here, K = 4).

How does it work?
1. k initial "means" (in this case k = 3) are randomly selected from the data set (shown in color).
2. k clusters are created by associating every observation with the nearest mean.
3. The centroid of each of the k clusters becomes the new mean.
4. Steps 2 and 3 are repeated until convergence is reached.
The algorithm iteratively divides the genes into K groups and recalculates the center of each group. The result is a grouping into K clusters that minimizes the distances of genes to their cluster centers. Because the initial means are chosen at random, this is a local optimum, and different runs can give different results.
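The four steps above are the standard k-means loop (often called Lloyd's algorithm). This minimal pure-Python sketch assumes Euclidean distance and lets the caller pass the initial means explicitly, so a run can be reproduced; by default they are sampled from the data with a seeded random generator.

```python
import math
import random

def kmeans(points, k, initial_means=None, max_iter=100, seed=0):
    """Plain k-means on lists of numbers. Returns (means, labels)."""
    rng = random.Random(seed)
    if initial_means is None:
        means = [list(p) for p in rng.sample(points, k)]   # step 1: random means
    else:
        means = [list(m) for m in initial_means]
    labels = None
    for _ in range(max_iter):
        # Step 2: assign every observation to its nearest mean
        new_labels = [min(range(k), key=lambda c: math.dist(p, means[c]))
                      for p in points]
        if new_labels == labels:        # step 4: stop when nothing changes
            break
        labels = new_labels
        # Step 3: move each mean to the centroid of its cluster
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:                 # keep the old mean if a cluster empties
                means[c] = [sum(col) / len(members) for col in zip(*members)]
    return means, labels

# Hypothetical 2-experiment profiles for 6 genes forming two groups
points = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
          [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
means, labels = kmeans(points, 2, initial_means=[[0.0, 0.0], [10.0, 10.0]])
print(labels)
# [0, 0, 0, 1, 1, 1]
```

Stopping when the assignments no longer change is one common convergence test; a maximum iteration count is kept as a safety net.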

Different types of clustering – different results

How to search for expression profiles
• GEO (Gene Expression Omnibus): http://www.ncbi.nlm.nih.gov/geo/
• Human genome browser (UCSC): http://genome.ucsc.edu/
• ArrayExpress: http://www.ebi.ac.uk/arrayexpress/

Searching for expression profiles in GEO
• DataSets: suitable for analysis with GEO tools
• Expression profiles by gene
• Probe sets
• Microarray experiments
• Groups of related microarray experiments

Dataset options: Clustering • Download dataset • Statistical analysis

Clustering analysis

The expression distribution for different lines in the cluster

Searching for expression profiles in the Human Genome browser.

Keratin 10 is highly expressed in skin.

ArrayExpress http://www.ebi.ac.uk/arrayexpress/

What can we do with all these expression profiles? Cluster them!
How? With EPCLUST: http://www.bioinf.ebc.ee/EP/EP/EPCLUST/

In the input matrix, each column should represent a gene and each row should represent an experiment (or individual).
Options: Hierarchical clustering • K-means clustering • Edit the input matrix (Transpose, Normalize, Randomize)
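Because this layout is the transpose of the expression data matrix introduced earlier (genes as rows), data often has to be flipped before upload. The sketch below shows a transpose plus one possible normalization. The slides do not describe what EPCLUST's "Normalize" option does, so scaling each gene's profile to mean 0 and standard deviation 1 is an assumption chosen for illustration.

```python
import statistics

def transpose(matrix):
    """Swap rows and columns (genes-as-rows <-> genes-as-columns)."""
    return [list(col) for col in zip(*matrix)]

def zscore_rows(matrix):
    """Scale each row to mean 0 and standard deviation 1 (assumed normalization)."""
    out = []
    for row in matrix:
        mu = statistics.mean(row)
        sd = statistics.pstdev(row)
        out.append([(x - mu) / sd if sd > 0 else 0.0 for x in row])
    return out

# Hypothetical data in the expression-matrix layout: rows = genes, columns = experiments
genes_by_experiments = [
    [1.0, 2.0, 3.0],    # gene A
    [-1.0, 0.0, 1.0],   # gene B
]
normalized = zscore_rows(genes_by_experiments)   # normalize each gene's profile
epclust_input = transpose(normalized)            # rows = experiments, columns = genes
print(epclust_input)
```

After this kind of normalization, genes A and B become identical even though their raw levels differ, so any clustering afterwards groups genes by the shape of their profiles rather than by absolute expression level.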

Data Clusters

Samples found in the cluster • Graphical representation of the cluster

10 clusters, as requested

MultiExperiment Viewer (MeV): http://www.tm4.org/mev/
