Clustered alignments of gene-expression time series data

Clustered alignments of gene-expression time series data. Adam A. Smith, Aaron Vollrath, Christopher A. Bradfield and Mark Craven. Department of Biostatistics & Medical Informatics, Department of Computer Sciences and Department of Oncology, University of Wisconsin, Madison, USA. Bioinformatics, Vol. 25, pages i119-i127, 2009

Outline • Introduction • Method • SCOW • Clustered alignments • Results and Discussion • Conclusion

Introduction • Characterizing and comparing temporal gene-expression responses is an important computational task for answering a variety of questions in biological studies. • One application: toxicogenomics, i.e. characterizing the potential toxicity of chemicals

Introduction • Answering similarity queries: assess similarity by determining the temporal correspondence between the query and the treatment

Introduction • Two issues: • First: for some treatments (Treatment B) all genes should be aligned together, while for others (Treatment C) some genes need to be warped separately. • Second: • The best alignment may not account for the complete extent of both time series. • Allow a type of local alignment in which the end of one series is unaligned • This is called shorting the alignment

Introduction • Multi-segment alignment method • Shorting: the alignment path that represents shorting ends in the top row or the right column of the alignment space diagram, but not in the top-right cell.

Introduction • To solve “all genes are assumed to be aligned in lockstep with one another” • Calculate clustered alignments • Find clusters of genes such that genes within a cluster share a common alignment • Each cluster is aligned independently of the others • Similar to k-means • Alternates between assigning genes to clusters and recomputing the alignment for each cluster using the genes assigned to it • To solve “alignment for the complete extent of both time series” • Multi-segment alignment • Shorting

Method – SCOW (Shorting COW) • COW (correlation optimized warping; Nielsen et al., 1998) • A dynamic programming algorithm designed to find an optimal alignment between two series with multiple channels of information (such as genes). • Briefly, it aligns and scores two given time series based on their similarity • The two series are denoted q (for query series) and d (for database series) • The series are partitioned into m segments, in which the i-th segments of the two series correspond to each other. • The score of a given alignment is the sum of correlations between corresponding segments
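The scoring rule on this slide can be sketched for a single gene as follows. This is a minimal illustration, not the authors' implementation: the function names, the inclusive knot convention, and the use of linear interpolation to bring the two segments to a common length are all assumptions made for the example.

```python
import numpy as np

def pearson(u, v):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    u = u - u.mean()
    v = v - v.mean()
    denom = np.sqrt((u @ u) * (v @ v))
    return float(u @ v / denom) if denom > 0 else 0.0

def resample(seg, n):
    """Linearly interpolate a 1-D segment onto n evenly spaced points (assumed scheme)."""
    seg = np.asarray(seg, dtype=float)
    old = np.linspace(0.0, 1.0, len(seg))
    new = np.linspace(0.0, 1.0, n)
    return np.interp(new, old, seg)

def alignment_score(q, d, q_knots, d_knots):
    """Sum of per-segment correlations for one gene.

    q_knots and d_knots are index lists of equal length m+1; segment i
    spans knot i to knot i+1, both inclusive (an assumed convention).
    """
    total = 0.0
    for i in range(len(q_knots) - 1):
        q_seg = q[q_knots[i]:q_knots[i + 1] + 1]
        d_seg = d[d_knots[i]:d_knots[i + 1] + 1]
        total += pearson(q_seg, resample(d_seg, len(q_seg)))
    return total

# Toy data: d rises twice as slowly as q, then falls at the same rate.
q = [0, 1, 2, 1, 0]
d = [0, 0.5, 1, 1.5, 2, 1, 0]
score = alignment_score(q, d, q_knots=[0, 2, 4], d_knots=[0, 4, 6])
```

With the knots placed at the true turning points, both segments correlate perfectly, so the score equals the number of segments.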

Method – SCOW • COW searches for good segment boundaries in only a limited area of the alignment space. • The segments are assumed to be of constant length and are usually evenly spaced in q; their lengths are variable in d. • The vector K contains the coordinates of the knots (segment endpoints) in q.

Method – SCOW • The zero-indexed score matrix is of dimensions m+1 by |d|+1. • Each element contains the score of the best alignment of d from zero to x and q from zero to k. • The segment score is the Pearson correlation. • q(a,b): the subseries of q from a to b; d(a,b) is defined likewise. • The predecessor function lists the valid starting locations in d for segments ending at x.
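The dynamic program described on this slide can be sketched as below for one gene. It is an illustration under stated assumptions rather than the published algorithm: knots in q are fixed by K, segments are half-open index ranges, and the predecessor function simply bounds the segment length in d by the hypothetical parameters `min_len` and `max_len`.

```python
import numpy as np

def seg_corr(q_seg, d_seg):
    """Pearson correlation after linearly resampling d_seg to len(q_seg)."""
    d_res = np.interp(np.linspace(0, 1, len(q_seg)),
                      np.linspace(0, 1, len(d_seg)), d_seg)
    qc, dc = q_seg - q_seg.mean(), d_res - d_res.mean()
    denom = np.sqrt((qc @ qc) * (dc @ dc))
    return float(qc @ dc / denom) if denom > 0 else 0.0

def predecessors(x, min_len, max_len):
    """Valid start positions in d for a segment ending at x (assumed length bounds)."""
    return range(max(0, x - max_len), x - min_len + 1)

def cow_align(q, d, K, min_len=2, max_len=None):
    """Return (best score, knots in d) for fixed knots K in q (K[0]=0, K[-1]=len(q))."""
    q, d = np.asarray(q, float), np.asarray(d, float)
    m, n_d = len(K) - 1, len(d)
    max_len = n_d if max_len is None else max_len
    G = np.full((m + 1, n_d + 1), -np.inf)    # (m+1) x (|d|+1) score matrix
    back = np.zeros((m + 1, n_d + 1), dtype=int)
    G[0, 0] = 0.0
    for i in range(1, m + 1):
        q_seg = q[K[i - 1]:K[i]]
        for x in range(1, n_d + 1):
            for p in predecessors(x, min_len, max_len):
                if G[i - 1, p] == -np.inf:
                    continue                   # no valid alignment ends at p
                s = G[i - 1, p] + seg_corr(q_seg, d[p:x])
                if s > G[i, x]:
                    G[i, x], back[i, x] = s, p
    if G[m, n_d] == -np.inf:
        raise ValueError("no feasible alignment under the length bounds")
    knots, x = [n_d], n_d                      # trace back from the full alignment
    for i in range(m, 0, -1):
        x = int(back[i, x])
        knots.append(x)
    return float(G[m, n_d]), knots[::-1]

# Toy data: the rising half of d is stretched relative to q.
q = [0, 1, 2, 3, 3, 2, 1, 0]
d = [0, 0.5, 1, 1.5, 2, 2.5, 3, 3, 2, 1, 0]
best_score, d_knots = cow_align(q, d, K=[0, 4, 8])
```

Here the score is read from the element for the full extent of both series; a shorting variant would instead consider endpoints along the last row or column.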

Method – SCOW • The best score for the full alignment is the matrix element at row m and column |d|. • A one-channel time series: the expression profile of a single gene. • A multi-channel time series: the expression profile of a set of genes. • The only difference between these two cases is in how the correlations are calculated. • Because correlation ignores amplitude, COW is apt to align segments which differ greatly in magnitude.

Method – SCOW • SCOW • Searches for optimal knots in both dimensions • First step: search independently in both dimensions. • Second step: SCOW alternates horizontal and vertical movement of each knot until it converges.

Method – SCOW • Figure: the first step and the second step of the knot search

Method – SCOW • One score matrix is calculated when the algorithm searches for knots with respect to q and holds them constant with respect to d, while a second matrix is calculated in the opposite case. • The predecessor function: a cone-shaped search space

Method – SCOW • Score function: • Includes terms that incur penalties for segments that involve stretching and significant differences in amplitude. • The stretching s_i is defined as the ratio of lengths between q_i and d_i, and a_i is the amplitude ratio between the two as determined by a weighted least squares fitting procedure.
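The two per-segment quantities named here can be sketched as follows. The slide gives neither the direction of the length ratio, nor the weights, nor the form of the penalty terms, so this example assumes the ratio is taken as q over d, fits q ≈ a·d through the origin, and defaults to uniform weights; it computes only s_i and a_i and does not attempt the penalty itself.

```python
import numpy as np

def stretch_ratio(q_seg, d_seg):
    """Length ratio of the query segment to the database segment (assumed direction)."""
    return len(q_seg) / len(d_seg)

def amplitude_ratio(q_seg, d_seg, weights=None):
    """Weighted least-squares estimate of a in q ~ a * d.

    d_seg is linearly resampled to the length of q_seg first.
    Uniform weights are used when none are given (an assumption).
    """
    q_seg = np.asarray(q_seg, dtype=float)
    d_seg = np.asarray(d_seg, dtype=float)
    d_res = np.interp(np.linspace(0, 1, len(q_seg)),
                      np.linspace(0, 1, len(d_seg)), d_seg)
    w = np.ones(len(q_seg)) if weights is None else np.asarray(weights, dtype=float)
    denom = np.sum(w * d_res * d_res)
    return float(np.sum(w * q_seg * d_res) / denom) if denom > 0 else float("nan")

# Toy segments: q has twice the amplitude of d and is shorter.
s = stretch_ratio([0, 2, 4], [0, 0.5, 1, 1.5, 2])
a = amplitude_ratio([0, 2, 4], [0, 0.5, 1, 1.5, 2])
```

A perfectly matched pair of segments would give s = 1 and a = 1, so penalties would naturally grow as either ratio moves away from 1.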

Method – Clustered alignment • Find sets of genes that would have very similar alignments if they were aligned independently. • A variant of traditional k-means clustering • Identifies clusters in which the genes have similar warpings • The genes in one of these clusters may have very different expression profiles.

Method – Clustered alignment • The first step is to assign the initial alignment centroids by selecting a representative set of gene alignments. • Subroutine Align returns the best alignment between two series based on a given set of genes. • ScoreGene returns the score of two series when aligned using a given alignment and a specified gene. • The algorithm records the best score so far for each gene using one of the current centroids.

Method – Clustered alignment • The algorithm alternates between assigning genes to clusters and recomputing the alignment for each cluster using the genes assigned to it.
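The alternating procedure can be sketched as a k-means-style loop. To keep the example short and runnable, an "alignment" here is just an integer time shift of q within d, standing in for a full SCOW warping; `align` and `score_gene` are toy stand-ins for the Align and ScoreGene subroutines, and the choice of initial genes is made by hand rather than by the centroid selection step from the slides.

```python
import numpy as np

def score_gene(q, d, shift, g):
    """Toy ScoreGene: correlation of gene g in q with d offset by `shift` points."""
    n = q.shape[1]
    a = q[g] - q[g].mean()
    b = d[g, shift:shift + n]
    b = b - b.mean()
    denom = np.sqrt((a @ a) * (b @ b))
    return float(a @ b / denom) if denom > 0 else 0.0

def align(q, d, genes, shifts):
    """Toy Align: the shift that maximizes the summed score over a set of genes."""
    return max(shifts, key=lambda s: sum(score_gene(q, d, s, g) for g in genes))

def clustered_alignment(q, d, init_genes, max_iter=20):
    """Alternate between assigning genes to alignments and refitting each alignment."""
    n_genes, n = q.shape
    shifts = range(d.shape[1] - n + 1)
    # Initial centroids: the individual alignments of the chosen genes.
    centroids = [align(q, d, [g], shifts) for g in init_genes]
    assign = None
    for _ in range(max_iter):
        # Assignment step: each gene joins the centroid that scores it best.
        new_assign = [
            max(range(len(centroids)),
                key=lambda c: score_gene(q, d, centroids[c], g))
            for g in range(n_genes)
        ]
        if new_assign == assign:
            break                              # converged: no gene changed cluster
        assign = new_assign
        # Update step: realign each cluster using only its member genes.
        for c in range(len(centroids)):
            members = [g for g in range(n_genes) if assign[g] == c]
            if members:
                centroids[c] = align(q, d, members, shifts)
    return centroids, assign

# Synthetic data: genes 0-1 are copied from d at offset 1, genes 2-3 at offset 4.
rng = np.random.default_rng(0)
d = rng.normal(size=(4, 10))
q = np.stack([d[0, 1:6], d[1, 1:6], d[2, 4:9], d[3, 4:9]])
centroids, assign = clustered_alignment(q, d, init_genes=[0, 2])
```

Note that genes are grouped by how they align, not by the shape of their expression profiles, which is the point made two slides earlier.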

Results and Discussion • SCOW experiments • We construct queries for which we know the correct matching database treatments and their correct alignments. • The data we use come from the EDGE toxicology database (http://edge.oncology.wisc.edu) • The dataset consists of 216 unique observations of microarray data, each of which represents the values for 1600 different genes. • Time points range from 6 h up to 96 h. • The data span 11 different treatments.

Results and Discussion • Assemble 10 queries for each treatment by randomly sub-sampling time series in our dataset • We measure two kinds of accuracy: • Treatment accuracy: identify the treatment from which each query series was extracted • Alignment accuracy: align the query points to their actual time points in the treatment.

Results and Discussion The top line : treatment accuracy with different orders of splines The middle line : alignment accuracy by adding the criterion that the average time error in the mapping is less than or equal to 24 h The bottom line : alignment accuracy where this tolerance is decreased to 12 h.
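The two accuracy measures can be sketched as below. This is one plausible reading of the slides, in which a query counts toward alignment accuracy only if its treatment is identified correctly and its average time error is within the tolerance; the record layout and the treatment labels are hypothetical.

```python
def treatment_accuracy(results):
    """Fraction of queries whose predicted treatment matches the true one."""
    return sum(pred == true for pred, true, _, _ in results) / len(results)

def alignment_accuracy(results, tolerance_h):
    """Fraction of queries with the right treatment AND mean time error <= tolerance.

    Each record is (predicted_treatment, true_treatment,
    mapped_times_h, true_times_h), with times in hours.
    """
    correct = 0
    for pred, true, mapped, actual in results:
        mean_err = sum(abs(m - t) for m, t in zip(mapped, actual)) / len(actual)
        if pred == true and mean_err <= tolerance_h:
            correct += 1
    return correct / len(results)

# Hypothetical records: exact match, correct treatment with a 16 h mean error,
# and a query matched to the wrong treatment.
results = [
    ("A", "A", [6, 24, 48], [6, 24, 48]),
    ("A", "A", [6, 48, 96], [6, 24, 72]),
    ("B", "C", [6, 12], [6, 12]),
]
acc_treatment = treatment_accuracy(results)
acc_24h = alignment_accuracy(results, tolerance_h=24)
acc_12h = alignment_accuracy(results, tolerance_h=12)
```

Tightening the tolerance from 24 h to 12 h can only lower the measured accuracy, which matches the ordering of the three lines in the figure.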

Results and Discussion • Conclusion: • Multi-segment alignments computed by SCOW, COW and Generative Multi-segment are superior to the alignments determined by ordinary dynamic time warping and the linear alignment method • SCOW finds more accurate alignments than the other two multi-segment algorithms

Results and Discussion • Clustered alignment experiments

Conclusion • Presents new methods that advance the state of the art in two ways: • Computing clustered alignments • A new multi-segment alignment method, called SCOW
