Modified Multi-Dimensional Scaling (MDS) Algorithm for Mining Gene Expression Patterns

X.J. Ge*, S. Yonamene*, Y.M. Mi*, S. Tsutsumi**, Y. Kobune**, H. Aburatani** and S. Iwata*

*Research into Artifacts, Center for Engineering (RACE), The University of Tokyo, Komaba 4-6-1, Meguro-ku, Tokyo 153-8904, Japan
**Department of Life Sciences, Research Center for Advanced Science and Technology (RCAST), The University of Tokyo, Komaba 4-6-1, Meguro-ku, Tokyo 153-8904, Japan

ABSTRACT: The dataset of Golub et al. is analyzed using dimensionality-reduction techniques, including principal component analysis (PCA), multi-dimensional scaling (MDS) and a modified MDS algorithm. These methods produce snapshots that are helpful for class discovery.

Data Set 2: Golub et al., Science 286: 531 (1999).
Background
Gene expression patterns can be considered as points in a multi-dimensional Euclidean space. Because the high dimensionality makes analysis difficult, it is helpful to have a low-dimensional representation that captures some characteristics of the raw dataset. In principal component analysis (PCA), the raw data points are linearly projected onto the plane along which their variance is largest. In multi-dimensional scaling (MDS), data points are represented in a low-dimensional space such that the distances between points are preserved as closely as possible. MDS is nonlinear.
[Figure labels: n-D data points, similarity matrix, 2-D map]
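The pairwise distances that MDS tries to preserve can be collected in a matrix before any mapping is done. The sketch below is illustrative only: the function name `distance_matrix` and the toy array `X` are assumptions, not part of the original poster, and the rows of `X` stand in for expression patterns.

```python
import numpy as np

def distance_matrix(X):
    """Euclidean distances between all pairs of rows of X (samples x genes)."""
    sq = (X ** 2).sum(axis=1)                      # squared norm of each row
    D2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # squared distances
    return np.sqrt(np.maximum(D2, 0.0))             # clip tiny negatives from rounding

# Toy stand-in for three expression patterns measured on three genes.
X = np.array([[0.0, 0.0, 0.0],
              [3.0, 4.0, 0.0],
              [3.0, 4.0, 12.0]])
D = distance_matrix(X)
```

The resulting matrix is symmetric with a zero diagonal, which is the form of input a distance-preserving mapping would typically take.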
Results of PCA
A linear projection of gene expression patterns using the first two principal components. Samples of ALL and AML are roughly mapped into different clusters.
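A projection onto the first two principal components can be sketched as follows. This is a minimal illustration on synthetic data, not the Golub et al. dataset; the function name `pca_project` and the two synthetic groups are assumptions made for the example.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project the rows of X onto the leading principal components."""
    Xc = X - X.mean(axis=0)                           # center each gene (column)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)  # rows of Vt are principal axes
    return Xc @ Vt[:n_components].T

# Synthetic stand-in: two groups of 10 samples, 50 "genes" each,
# differing by a shift in mean expression.
rng = np.random.default_rng(0)
group_a = rng.normal(0.0, 1.0, size=(10, 50))
group_b = rng.normal(3.0, 1.0, size=(10, 50))
X = np.vstack([group_a, group_b])

Y = pca_project(X)   # one 2-D point per sample
```

With a mean shift this large, the two groups separate along the first component; real expression data would generally overlap more.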
Results of conventional MDS
MDS minimizes an objective function [equation missing in source] defined in terms of two quantities: the distance between points in the x-y plot, and the Euclidean distance between the corresponding gene expression patterns.
Mapping of gene expression patterns by multi-dimensional scaling (MDS). AML and two subtypes of ALL samples are found in different regions, but classification is difficult without clinical information.
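Because the poster's objective function did not survive, the sketch below assumes the common raw-stress form, the sum over pairs of the squared difference between the map distance and the original Euclidean distance, and minimizes it with the standard SMACOF (Guttman transform) iteration. Both the stress form and the optimizer are assumptions, not necessarily what the authors used; the names `stress`, `smacof` and the synthetic clusters are hypothetical.

```python
import numpy as np

def pairwise_distances(P):
    """Euclidean distances between all pairs of rows of P."""
    diff = P[:, None, :] - P[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def stress(Y, D):
    """Assumed raw stress: sum over pairs i<j of (map distance - D_ij)^2."""
    d = pairwise_distances(Y)
    iu = np.triu_indices(len(D), k=1)
    return ((d[iu] - D[iu]) ** 2).sum()

def smacof(D, n_components=2, n_iter=300, seed=0):
    """Minimize raw stress by repeated Guttman transforms (unit weights)."""
    rng = np.random.default_rng(seed)
    n = D.shape[0]
    Y = rng.normal(size=(n, n_components))   # random starting map
    start = stress(Y, D)
    for _ in range(n_iter):
        d = pairwise_distances(Y)
        # ratio_ij = D_ij / d_ij, taken as 0 where the map distance is 0
        ratio = np.divide(D, d, out=np.zeros_like(D), where=d > 0)
        B = -ratio
        B[np.diag_indices(n)] = ratio.sum(axis=1)
        Y = B @ Y / n                         # Guttman transform
    return Y, start

# Synthetic stand-in: three clusters of 10 samples in 20 dimensions.
rng = np.random.default_rng(1)
centers = rng.normal(scale=5.0, size=(3, 20))
X = np.vstack([c + rng.normal(size=(10, 20)) for c in centers])
D = pairwise_distances(X)

Y, initial_stress = smacof(D)
final_stress = stress(Y, D)
```

Each Guttman transform cannot increase the stress, so no step size has to be tuned; the result can still depend on the random starting map.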
Modified MDS
Goal: Enlarge trans-cluster distances to make separation easier.
Physics background: condensation of atoms to form solids with minimum free energy.
[Equations missing in source: objective function; dissimilarity]
Mapping of gene expression patterns by a modified multi-dimensional scaling (MDS) algorithm. AML and two subtypes of ALL samples are found in different regions.
Conclusions
• The difference between AML and ALL can be discovered even with linear methods such as principal component analysis (PCA). But for more complicated data structures, such as the difference between subtypes of ALL, PCA is not sufficient.
• Multi-dimensional scaling (MDS) can produce 2-D maps of gene expression patterns that reveal more complicated data structures.