Unsupervised clustering in mRNA expression profiles. D.K. Tasoulis, V.P. Plagianakos, and M.N. Vrahatis Computational Intelligence Laboratory (CILAB), Department of Mathematics, University of Patras, GR-26110 Patras, Greece
Unsupervised clustering in mRNA expression profiles
D.K. Tasoulis, V.P. Plagianakos, and M.N. Vrahatis
Computational Intelligence Laboratory (CILAB), Department of Mathematics, University of Patras, GR-26110 Patras, Greece
University of Patras Artificial Intelligence Research Center (UPAIRC), University of Patras, GR-26110 Patras, Greece
Computers in Biology and Medicine, In Press, Corrected Proof, available online 24 October 2005
K-Windows Clustering
• Adaptation of K-means, originally proposed in 2002 by Vrahatis et al.
• A windowing technique improves speed and accuracy
• Tries to place a d-dimensional window (box) around all patterns that belong to a single cluster
K-Windows: Basic Concepts
• Move windows to find cluster centers (fig a)
  • Select k points as the centers of d-dimensional windows of size a.
  • The mean of the points inside a window becomes its new center.
  • Repeat until the stopping criterion is met (the center no longer moves appreciably).
• Enlarge windows to determine cluster edges (fig b)
  • Enlarge one dimension by a specified percentage.
  • Relocate the window as above.
  • Keep the enlargement only if the increase in the number of instances inside the window exceeds a threshold.
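The movement and enlargement steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: windows are represented by a center and per-dimension half-widths, and the function names, `percent`, `min_gain`, `tol`, and the toy data are all hypothetical. In particular, `min_gain` is read here as a relative increase in point count, which may differ from how the paper applies its enlargement threshold.

```python
def points_in_window(points, center, half_width):
    """Return the points inside the axis-aligned box around `center`."""
    return [p for p in points
            if all(abs(x - c) <= h for x, c, h in zip(p, center, half_width))]


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(coords) / len(points) for coords in zip(*points)]


def move_window(points, center, half_width, tol=1e-6, max_iter=100):
    """Recenter the window on the mean of its points until it stops moving."""
    center = list(center)
    for _ in range(max_iter):
        inside = points_in_window(points, center, half_width)
        if not inside:
            break
        new_center = mean(inside)
        shift = max(abs(a - b) for a, b in zip(new_center, center))
        center = new_center
        if shift < tol:
            break
    return center


def enlarge_window(points, center, half_width, percent=0.5, min_gain=0.1):
    """Grow one dimension at a time; keep a step only if enough points are gained.

    `min_gain` (assumed meaning): required relative increase in point count.
    """
    center, half_width = list(center), list(half_width)
    for d in range(len(half_width)):
        while True:
            before = len(points_in_window(points, center, half_width))
            trial = list(half_width)
            trial[d] *= 1.0 + percent            # enlarge dimension d
            trial_center = move_window(points, center, trial)  # relocate
            after = len(points_in_window(points, trial_center, trial))
            if before == 0 or (after - before) / before <= min_gain:
                break                            # reject this enlargement
            center, half_width = trial_center, trial
    return center, half_width


# Toy 2-D data: one cluster near the origin, one near (10, 10).
points = [(0.0, 0.0), (0.5, 0.2), (-0.4, 0.3), (0.2, -0.5), (1.5, 0.1),
          (-1.4, -0.2), (10.0, 10.0), (10.3, 9.8), (9.7, 10.2)]
moved = move_window(points, [0.3, 0.3], [1.0, 1.0])
final_center, final_half_width = enlarge_window(points, moved, [1.0, 1.0])
captured = points_in_window(points, final_center, final_half_width)
```

In this toy run the window first settles on the four innermost points and is then widened along the first dimension until it also covers the two outlying members of the same cluster. A real implementation would answer the "which points are in this box" query with a range-search structure rather than a linear scan.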
Unsupervised K-Windows (UKW)
• Start with a sufficiently large number of windows
• Merge windows to determine the number of clusters automatically
  • For each pair of overlapping windows, calculate the proportion of each window's points that lie in the overlap.
    • Large overlap: the two windows are taken to cover the same cluster, and W1 is deleted.
    • Many points in common: the windows are considered parts of the same cluster.
    • Low overlap: the windows are considered two different clusters.
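The merging rule can be illustrated with a small sketch. Several details here are assumptions rather than the paper's specification: windows are `(lower_corner, upper_corner)` pairs, the two overlap proportions are averaged into a single score, and the threshold names `theta_same` and `theta_merge` and their values are hypothetical. A union-find structure tracks which windows end up in the same cluster.

```python
def inside(p, window):
    """True if point p lies in the axis-aligned window (lo, hi)."""
    lo, hi = window
    return all(l <= x <= h for x, l, h in zip(p, lo, hi))


def overlap_fractions(points, w1, w2):
    """Fraction of each window's points that also lie in the other window."""
    n1 = sum(inside(p, w1) for p in points)
    n2 = sum(inside(p, w2) for p in points)
    shared = sum(inside(p, w1) and inside(p, w2) for p in points)
    return (shared / n1 if n1 else 0.0, shared / n2 if n2 else 0.0)


def merge_windows(points, windows, theta_same=0.8, theta_merge=0.1):
    """Return {surviving window index: cluster label}."""
    parent = list(range(len(windows)))   # union-find forest over windows
    deleted = set()

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(windows)):
        for j in range(i + 1, len(windows)):
            if i in deleted:
                break
            if j in deleted:
                continue
            f_i, f_j = overlap_fractions(points, windows[i], windows[j])
            score = (f_i + f_j) / 2      # assumed way of combining the two
            if score >= theta_same:      # near-duplicate: drop the first window
                deleted.add(i)
                parent[find(i)] = find(j)
            elif score >= theta_merge:   # enough shared points: same cluster
                parent[find(i)] = find(j)
            # otherwise: the windows stay in different clusters
    return {k: find(k) for k in range(len(windows)) if k not in deleted}


points = [(0.5, 0.5), (1.0, 1.0), (1.2, 0.8), (1.8, 1.0), (1.9, 1.5),
          (3.0, 1.0), (3.5, 0.5), (2.5, 1.5), (11.0, 11.0), (10.5, 11.5)]
windows = [
    ([0.0, 0.0], [2.0, 2.0]),     # W0
    ([0.1, 0.1], [2.1, 2.1]),     # W1: almost identical to W0
    ([1.5, 0.0], [4.0, 2.0]),     # W2: shares some points with W1
    ([10.0, 10.0], [12.0, 12.0])  # W3: far away
]
labels = merge_windows(points, windows)
n_clusters = len(set(labels.values()))
```

Here W0 is dropped as a near-duplicate of W1, W1 and W2 receive the same label, and W3 remains a separate cluster, so the number of clusters falls out of the merge step rather than being supplied up front.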
Experimental Setup
• Leukemia dataset (well characterized)
• Default UKW parameters used
• Supervised dimension reduction
  • Two previously published gene subsets and their union
• Unsupervised dimension reduction
  • Biclustering with UKW
  • PCA
  • PCA and UKW hybrid
Supervised Feature Selection
• Use two gene subsets that previously published papers selected with supervised techniques.
• All algorithms performed best on the combined set; results below.
Unsupervised Feature Selection (Biclustering Technique)
• Apply UKW to cluster the genes, then select the one gene closest to each cluster center as that cluster's representative.
• Apply UKW to the samples, using only those representative genes (239).
• UKW accuracy: 93.6% (ALL) and 76% (AML)
• No results reported for other algorithms
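The representative-gene step can be sketched as follows, assuming the gene clusters have already been found. The data layout (one row per gene, one column per sample), the Euclidean distance, the function names, and the toy values are assumptions made for illustration.

```python
import math


def centroid(rows):
    """Coordinate-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def representative_genes(expression, clusters):
    """For each gene cluster, pick the member closest to the cluster centroid.

    expression: list of gene profiles (one row per gene, one column per sample)
    clusters:   list of lists of gene indices
    """
    reps = []
    for members in clusters:
        center = centroid([expression[g] for g in members])
        reps.append(min(members,
                        key=lambda g: math.dist(expression[g], center)))
    return reps


def reduce_samples(expression, reps):
    """Describe each sample by the representative genes only."""
    n_samples = len(expression[0])
    return [[expression[g][s] for g in reps] for s in range(n_samples)]


# Toy data: 6 genes measured on 3 samples, grouped into 2 gene clusters.
expression = [
    [1.0, 1.0, 1.0],
    [1.2, 0.9, 1.1],
    [2.0, 2.0, 2.0],
    [8.0, 9.0, 8.0],
    [8.1, 9.1, 8.2],
    [8.2, 9.2, 8.4],
]
clusters = [[0, 1, 2], [3, 4, 5]]
reps = representative_genes(expression, clusters)
reduced = reduce_samples(expression, reps)
```

The reduced matrix has one row per sample and one column per gene cluster, and it is this smaller matrix that would be passed to the sample-clustering step.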
Unsupervised Feature Selection (PCA Techniques)
• PCA and scree plot to reduce features
  • Poor performance
• Hybrid PCA and UKW method
  • Partition genes using UKW
  • Transform each partition using PCA
  • Select representative factors from each cluster
  • UKW accuracy: 97.87% (ALL) and 88% (AML)
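A rough sketch of the hybrid idea, assuming the gene partitions are already available: each partition is reduced to a single factor, taken here to be the first principal component, and every sample is scored on it. Choosing only the first component, estimating it by power iteration, and the names and toy data are all assumptions of this sketch; the slides do not say how the representative factors are chosen.

```python
import math


def first_component_scores(rows, n_iter=200):
    """Score each sample on the first principal component of `rows`.

    rows: samples x genes. Uses power iteration on X^T X of the centered data.
    Caveat: a start vector orthogonal to the leading component would fail.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    X = [[r[j] - means[j] for j in range(d)] for r in rows]
    v = [1.0 / math.sqrt(d)] * d                      # starting direction
    for _ in range(n_iter):
        scores = [sum(x[j] * v[j] for j in range(d)) for x in X]           # X v
        w = [sum(X[i][j] * scores[i] for i in range(n)) for j in range(d)]  # X^T X v
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            break
        v = [c / norm for c in w]
    return [sum(x[j] * v[j] for j in range(d)) for x in X]


def hybrid_features(samples, partitions):
    """Build one feature per gene partition: the first-component score."""
    columns = []
    for genes in partitions:
        sub = [[s[g] for g in genes] for s in samples]
        columns.append(first_component_scores(sub))
    # Transpose so that each row describes one sample.
    return [list(row) for row in zip(*columns)]


# Toy data: 4 samples x 4 genes; genes {0, 1} and {2, 3} form two partitions.
samples = [
    [1.0, 1.1, 5.0, 5.2],
    [1.2, 0.9, 5.1, 4.9],
    [4.0, 4.2, 1.0, 1.1],
    [4.1, 3.9, 0.9, 1.2],
]
partitions = [[0, 1], [2, 3]]
features = hybrid_features(samples, partitions)
```

In the toy data the first two samples and the last two samples land on opposite sides of zero in both derived features, which is the kind of separation a subsequent sample clustering would exploit. In practice a linear-algebra library would replace the hand-written power iteration.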
Default parameters
• initial window size a = 5
• enlargement threshold θe = 0.8
• merging threshold θm = 0.1
• coverage threshold θc = 0.2
• variability threshold θv = 0.02
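For experimentation, the defaults listed above could be collected in a single immutable configuration object. The class and field names below are hypothetical; only the numeric values come from the slide.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class UKWParams:
    """Container for the UKW defaults listed on the slide (names are assumed)."""
    a: float = 5.0          # initial window size
    theta_e: float = 0.8    # enlargement threshold
    theta_m: float = 0.1    # merging threshold
    theta_c: float = 0.2    # coverage threshold
    theta_v: float = 0.02   # variability threshold


defaults = UKWParams()
# Derive a variant without mutating the defaults (the value 0.2 is arbitrary).
stricter_merge = replace(defaults, theta_m=0.2)
```

Freezing the dataclass keeps a run's settings from being altered by accident, and `replace` makes it easy to vary one threshold at a time.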