Clustered alignments of gene-expression time series data • Adam A. Smith, Aaron Vollrath, Christopher A. Bradfield and Mark Craven • Department of Biostatistics & Medical Informatics, Department of Computer Sciences and Department of Oncology, University of Wisconsin, Madison, USA • Bioinformatics, Vol. 25, pages i119-i127, 2009
Outline • Introduction • Method • SCOW • Clustered alignments • Results and Discussion • Conclusion
Introduction • Characterizing and comparing temporal gene-expression responses is an important computational task for answering a variety of questions in biological studies. • One application: toxicogenomics, which characterizes the potential toxicity of chemicals
Introduction • Answering similarity queries: assess similarity by determining the temporal correspondence between the query and each treatment
Introduction • Two issues: • First: for some treatments (Treatment B) all genes should be aligned together, while for others (Treatment C) some genes need to be warped separately. • Second: • The best alignment does not always account for the complete extent of both time series. • Allow a type of local alignment in which the end of one series is unaligned • This is called shorting the alignment
Introduction • Multi-segment alignment method • Shorting: an alignment path that represents shorting ends in the top row or the right column of the alignment space diagram, but not in the top-right cell.
Introduction • To avoid assuming that all genes are aligned in lockstep with one another: • Calculate clustered alignments • Find clusters of genes such that genes within a cluster share a common alignment • Each cluster is aligned independently of the others • Similar to k-means • Alternates between assigning genes to clusters and recomputing the alignment for each cluster using the genes assigned to it • To avoid forcing the alignment to cover the complete extent of both time series: • Multi-segment alignment • Shorting
Method – SCOW (Shorting COW) • COW (Nielsen et al., 1998) • A dynamic programming algorithm designed to find an optimal alignment between two series with multiple channels of information (such as genes). • Briefly, it aligns and scores two given time series based on their similarity • The two series are denoted q (the query series) and d (the database series) • The series are partitioned into m segments, in which the i-th segments of the two series correspond to each other. • The score of a given alignment is the sum of the correlations between corresponding segments
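The scoring rule on this slide (sum of correlations between corresponding segments) can be sketched as follows. This is a minimal single-channel illustration, not the authors' implementation: the function names `pearson`, `resample` and `alignment_score` are hypothetical, and the use of linear interpolation to bring a segment of d to the length of its partner in q is an assumption made so that segments of unequal length can be correlated.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def resample(series, n):
    """Linearly interpolate `series` onto n evenly spaced points (assumed warping rule)."""
    if n == 1:
        return [series[0]]
    out = []
    for j in range(n):
        pos = j * (len(series) - 1) / (n - 1)
        lo = int(pos)
        hi = min(lo + 1, len(series) - 1)
        frac = pos - lo
        out.append(series[lo] * (1 - frac) + series[hi] * frac)
    return out

def alignment_score(q, d, q_knots, d_knots):
    """Sum of correlations between the i-th segment of q and the i-th segment of d.

    q_knots and d_knots are lists of segment endpoints (indices), one more than
    the number of segments; adjacent segments share their boundary point.
    """
    total = 0.0
    for i in range(len(q_knots) - 1):
        q_seg = q[q_knots[i]: q_knots[i + 1] + 1]
        d_seg = d[d_knots[i]: d_knots[i + 1] + 1]
        total += pearson(q_seg, resample(d_seg, len(q_seg)))
    return total
```

With m segments the score lies between -m and m. For a multi-channel series, one plausible extension is to combine the per-gene correlations of each segment, but the slides do not say how this is done.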
Method – SCOW • COW searches for good segment boundaries in only a limited area of the alignment space. • The segments are assumed to be of constant length, and usually evenly spaced, in q • The vector K contains the coordinates of the knots (segment endpoints) in q • The segment boundaries are variable in d
Method – SCOW • The zero-indexed score matrix has dimensions (m+1) by (|d|+1). • Each element contains the score of the best alignment of d from zero to x with q from zero to k. • Segment similarity is measured with the Pearson correlation • q(a,b): the subseries of q from a to b; d(a,b) is defined likewise. • The predecessor function lists the valid starting locations in d for segments ending at x
Method – SCOW • The best score • A one-channel time series: the expression profile of a single gene • A multi-channel time series: the expression profile of a set of genes • The only difference between these two cases is in how the correlations are calculated. • COW is apt to align segments which differ greatly in magnitude.
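The dynamic program described on the preceding slides can be sketched as below, under stated assumptions. The knots in q are fixed and evenly spaced, and the table entry `best[i][x]` holds the best score for aligning q up to its i-th knot with d up to position x. The `slack` rule inside `predecessors` is a hypothetical stand-in for the predecessor function on the slides, the table here is indexed by point positions (so it has |d| columns rather than |d|+1), and only a single channel is handled.

```python
from math import ceil, floor, sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return 0.0 if vx == 0 or vy == 0 else cov / sqrt(vx * vy)

def resample(series, n):
    """Linearly interpolate `series` onto n evenly spaced points (assumed warping rule)."""
    if n == 1:
        return [series[0]]
    out = []
    for j in range(n):
        pos = j * (len(series) - 1) / (n - 1)
        lo = int(pos)
        hi = min(lo + 1, len(series) - 1)
        out.append(series[lo] * (1 - (pos - lo)) + series[hi] * (pos - lo))
    return out

def predecessors(x, nominal, slack):
    """Valid start positions in d for a segment ending at x (hypothetical slack rule)."""
    shortest = max(1, floor(nominal) - slack)
    longest = ceil(nominal) + slack
    return range(max(0, x - longest), x - shortest + 1)

def cow_align(q, d, m, slack=1):
    """Return (score, q_knots, d_knots) for the best m-segment global alignment."""
    q_knots = [round(i * (len(q) - 1) / m) for i in range(m + 1)]
    nominal = (len(d) - 1) / m  # segment length in d if no warping occurred
    neg = float("-inf")
    best = [[neg] * len(d) for _ in range(m + 1)]
    back = [[None] * len(d) for _ in range(m + 1)]
    best[0][0] = 0.0
    for i in range(1, m + 1):
        q_seg = q[q_knots[i - 1]: q_knots[i] + 1]
        for x in range(1, len(d)):
            for p in predecessors(x, nominal, slack):
                if best[i - 1][p] == neg:
                    continue  # no valid alignment ends at (i-1, p)
                cand = best[i - 1][p] + pearson(q_seg, resample(d[p:x + 1], len(q_seg)))
                if cand > best[i][x]:
                    best[i][x], back[i][x] = cand, p
    end = len(d) - 1
    if best[m][end] == neg:
        raise ValueError("no valid alignment under the given slack")
    d_knots = [end]
    for i in range(m, 0, -1):  # follow back-pointers from the last knot to the first
        d_knots.append(back[i][d_knots[-1]])
    d_knots.reverse()
    return best[m][end], q_knots, d_knots
```

Because the final knot is forced to the last point of d, this sketch always aligns the complete extent of both series; it does not implement shorting or the two-dimensional knot search that SCOW adds.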
Method – SCOW • SCOW • Searches for optimal knots in both dimensions • First step: search independently in both dimensions. • Second step: SCOW alternates horizontal and vertical movement of each knot until it converges.
Method – SCOW • Figure: the first step and the second step of the SCOW knot search
Method – SCOW • One score matrix is calculated when the algorithm searches for knots with respect to q and holds them constant with respect to d, while a second matrix is calculated in the opposite case. • The predecessor function: a cone-shaped search space
Method – SCOW • Score function: • Includes terms that incur penalties for segments that involve stretching and significant differences in amplitude. • The stretching s_i is defined as the ratio of lengths between q_i and d_i, and a_i is the amplitude ratio between the two, as determined by a weighted least squares fitting procedure.
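The two per-segment quantities named on this slide can be illustrated with a small sketch. It only computes a stretching ratio and an amplitude ratio; how they enter the penalty terms of the score is not given here and is not reconstructed. The direction of each ratio (q relative to d), the function names, and the choice of weights are assumptions; with all weights equal to 1 the fit reduces to ordinary least squares through the origin.

```python
def stretch_ratio(q_bounds, d_bounds):
    """Ratio of segment lengths, taken here as length in q over length in d.

    Each argument is a (start, end) pair of knot coordinates for one segment.
    """
    q_len = q_bounds[1] - q_bounds[0]
    d_len = d_bounds[1] - d_bounds[0]
    if d_len == 0:
        raise ValueError("segment in d has zero length")
    return q_len / d_len

def amplitude_ratio(q_seg, d_seg, weights=None):
    """Weighted least-squares scale a that minimizes sum(w * (q - a * d) ** 2).

    q_seg and d_seg must already have the same number of points (for example
    after interpolation). The weights are hypothetical; the default is uniform.
    """
    if len(q_seg) != len(d_seg):
        raise ValueError("segments must have equal length")
    if weights is None:
        weights = [1.0] * len(q_seg)
    num = sum(w * qv * dv for w, qv, dv in zip(weights, q_seg, d_seg))
    den = sum(w * dv * dv for w, dv in zip(weights, d_seg))
    if den == 0:
        raise ValueError("segment in d is identically zero")
    return num / den
```

A ratio of 1 means no stretching or no amplitude difference for that segment, so a penalty would naturally grow as either ratio moves away from 1.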
Method – Clustered alignment • Find sets of genes that would have very similar alignments if they were aligned independently. • A variant of traditional k-means clustering • Identifies clusters in which the genes have similar warpings • The genes in one of these clusters may have very different expression profiles.
Method – Clustered alignment • The first step is to assign the initial alignment centroids by selecting a representative set of gene alignments. • Subroutine Align returns the best alignment between two series based on a given set of genes. • ScoreGene returns the score of two series when aligned using a given alignment and a specified gene. • For each gene, record the best score achieved so far using one of the current centroids.
Method – Clustered alignment • The algorithm alternates between assigning genes to clusters and recomputing the alignment for each cluster using the genes assigned to it.
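The alternation between assignment and re-alignment can be sketched as follows. To keep the example short and runnable, an "alignment" is reduced to a single integer time shift of the database series, which is a deliberately simplified stand-in for a full multi-segment alignment. The functions `score_gene` and `align` play the roles of ScoreGene and Align, but their signatures, the `shifts` candidate set, and the caller-supplied initial centroids are all assumptions of this sketch.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return 0.0 if vx == 0 or vy == 0 else cov / sqrt(vx * vy)

def score_gene(q, d, shift, gene):
    """Score of one gene when q is laid over d starting at position `shift`."""
    window = d[gene][shift: shift + len(q[gene])]
    return pearson(q[gene], window)

def align(q, d, genes, shifts):
    """Best shared alignment (here: shift) for a set of genes."""
    return max(shifts, key=lambda s: sum(score_gene(q, d, s, g) for g in genes))

def clustered_alignment(q, d, initial_centroids, shifts, max_iter=20):
    """k-means-like loop: assign genes to centroids, then re-align each cluster."""
    centroids = list(initial_centroids)
    genes = list(q)
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each gene joins the centroid alignment that scores it best.
        new_assignment = {
            g: max(range(len(centroids)), key=lambda c: score_gene(q, d, centroids[c], g))
            for g in genes
        }
        if new_assignment == assignment:
            break  # converged: no gene changed cluster
        assignment = new_assignment
        # Update step: recompute each cluster's alignment from its own genes only.
        for c in range(len(centroids)):
            members = [g for g in genes if assignment[g] == c]
            if members:
                centroids[c] = align(q, d, members, shifts)
    return assignment, centroids
```

Here `q` and `d` are dictionaries mapping gene names to expression profiles, and every shift in `shifts` must leave a full-length window inside d. As with k-means, the result depends on the initial centroids, and an empty cluster simply keeps its previous alignment in this sketch.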
Results and Discussion • SCOW experiments • We construct queries for which we know the correct matching database treatments and their correct alignments. • The data we use come from the EDGE toxicology database (http://edge.oncology.wisc.edu) • The dataset consists of 216 unique observations of microarray data, each of which represents the values for 1600 different genes. • Times range from 6 h up to 96 h. • The data span 11 different treatments.
Results and Discussion • Assemble 10 queries for each treatment by randomly sub-sampling time series in our dataset • We measure two kinds of accuracy: • Treatment accuracy: identify the treatment from which each query series was extracted • Alignment accuracy: align the query points to their actual time points in the treatment.
Results and Discussion • The top line: treatment accuracy with different orders of splines • The middle line: alignment accuracy, obtained by adding the criterion that the average time error in the mapping is less than or equal to 24 h • The bottom line: alignment accuracy where this tolerance is decreased to 12 h.
Results and Discussion • Conclusion: • Multi-segment alignments computed by SCOW, COW and the generative multi-segment method are superior to the alignments determined by ordinary dynamic time warping and the linear alignment method • SCOW finds more accurate alignments than the other two multi-segment algorithms
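The two accuracy measures can be made concrete with a short sketch. It assumes, following the description of the figure, that a query counts toward alignment accuracy only when its treatment is identified correctly and the average time error of its mapping is within the tolerance. The record layout and function names are hypothetical, and the example data in the usage note are made up purely to show the call.

```python
def mean_time_error(mapped_times, true_times):
    """Average absolute difference (in hours) between mapped and actual time points."""
    return sum(abs(m - t) for m, t in zip(mapped_times, true_times)) / len(true_times)

def treatment_accuracy(results):
    """Fraction of queries whose source treatment was identified correctly.

    Each result is (predicted_treatment, true_treatment, mapped_times, true_times).
    """
    return sum(1 for pred, true, _, _ in results if pred == true) / len(results)

def alignment_accuracy(results, tolerance_h):
    """Fraction of queries with the right treatment and mean time error <= tolerance_h."""
    correct = sum(
        1
        for pred, true, mapped, actual in results
        if pred == true and mean_time_error(mapped, actual) <= tolerance_h
    )
    return correct / len(results)
```

For example, with three illustrative queries where one is mapped perfectly, one has a mean error of 18 h, and one is assigned to the wrong treatment, `treatment_accuracy` and `alignment_accuracy(results, 24)` both count two of the three queries, while `alignment_accuracy(results, 12)` counts only one.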
Results and Discussion • Clustered alignment experiments
Conclusion • Presents a new approach that advances time series alignment in two ways: • Computing clustered alignments • A new multi-segment alignment method, called SCOW