A Nonlinear Mapping for Data Structure Analysis

A Nonlinear Mapping for Data Structure Analysis John W. Sammon, Jr., IEEE Transaction on Computers, Vol. C-18, No. 5, 1969, pp. 401-409. Presenter : Wei-Shen Tai Advisor : Professor Chung-Chian Hsu 2007/4/4

Outline • Introduction • Nonlinear mapping • Some computer results • Relationship of NLM to other structure analysis algorithm • Limitations and extensions • Comments • MDS

Motivation • Data structure visualization • Provide a highly effective visualization method in the analysis of multivariate data. • Data structure refers to geometric relationships among subsets of the data vectors in the L-space.

Objective • Nonlinear mapping algorithm (NLM) • Based upon a point mapping of the NL-dimensional vectors from the L-space to a lower dimensional space such that the inherent structure of the data is approximately preserved under the mapping.

Nonlinear mapping • N vectors in an L-space designated Xi, i= 1, …, N and corresponding to these we define N vectors in a d-space (d = 2 or 3) designated Yi, i=l, …, N. • Let the distance between the vectors Xi and Xj in the L-space be defined by dij*=dist [Xi, Xj] and the distance between the corresponding vectors-Yi and Yj in the d-space be defined by dij= dist [Yi, Yj]. • A steepest descent procedure to search for a minimum of the error

Computer results

19-dimensional Gaussian simplex distribution Fig 7. result of principle eigenvector plots Fig 6. result of NLM

Experiments in document classification • A document classification space • Every document in the library was represented as a 17-dimensional vector. All of them are described a mapping of 1125 preselected words and phrases into the C-space. • Query 1 ~ 5 and their related documents are shown, respectively. • Documents considered relevant to a given request were clustered. • Documents tend to be uniformly distributed throughout the space. • Clusters 2 and 3 tend to overlap, yet they are well-separated from clusters 4 and 5. In general, the intercluster relationships seem consistent with their respective subject relationships.

Relationship to other related algorithm • Multidimensional Scaling • Find a configuration of points in a t-space such that the resultant inter-point distances preserve a monotonic relationship to a given set of inter-element similarities (or dissimilarities). • Deficiencies • Resulting cluster configuration is highly dependent upon a set of control parameters which must be fixed by the user. • Particularly sensitive to hyper-spherical structure and are inefficient in detecting more complex relationships in the data. • Do not exist really good ways for evaluating a resultant cluster configuration. • When two clusters are close, the vectors between tend to form a bridge and cause spurious mergers.

Nonlinear mapping vantage • A highly promising structure analysis algorithm • None control parameters require a priori knowledge. • Highly efficient in identifying complex data structures. • Easy to detect and identify data structure. • Dealing extraneous data and spurious mergers. • Simple and efficient.

Limitation and extension • Limitations • Reliability of the scatter diagram in displaying extremely complex high-dimensional structure. • Minimum mapping error is too large (E>>0.1) and the 2-dimensional scatter plot fails to portray the true structure. • Number of vectors that it can handle. • Limited at present to N< 250 vectors. • When N> 250, we suggest using a data compression technique to reduce the data set to less than 250 vectors. • Extension • On-Line Pattern Analysis and Recognition System (OLPARS)

Comments • Advantage • A visualization method for hyper-space data. • The distance of data space can be preserved and interpreted in geometric relationship in the low-dimension map. • Drawback • Easy to learn and hard to compute. • The computational cost seems quite high. • Application • Data structure visualization related applications.

MDS • A very simple example, using mileage distances between cities. • Start with a map, which illustrates the relative geographic locations of a set of American cities. • The map is a geometric model in which cities are represented as points in two-dimensional space. The distances between the points are proportional to the geographic proximities of the cities. • Using the map/model it is easy to construct a square matrix containing the distances between any pair of cities. • The matrix, itself, is analogous to the mileage chart that is often included with road maps.

MDS algorithm • MDS uses the matrix of distances (i.e., the “mileage chart”) as input data. • The output from MDS consists of two parts: • A model showing the cities as points in space, with the distances between the points proportional to the entries in the input data matrix (i.e., a map). • A goodness-of-fit measure showing how closely the geometric point configuration corresponds to the data values from the input data matrix.

A Nonlinear Mapping for Data Structure Analysis

A Nonlinear Mapping for Data Structure Analysis

Presentation Transcript

A Closed Form Solution for Nonlinear Tolerance Analysis

Nonlinear Analysis: Hyperelastic Material Analysis

A Plea for Adaptive Data Analysis: An Introduction to HHT for Nonlinear and Nonstationary Data

Nonlinear Analysis: Riks Analysis

A Plea for Adaptive Data Analysis: Instantaneous Frequencies and Trends For Nonstationary Nonlinear Data

NDAR Data Dictionary Data Structure Definition: Creating a Mapping File

Nonlinear Analysis: Impact

Nonlinear Analysis: Viscoelastic Material Analysis

Structure of a model for mineral resource potential mapping

A Nonlinear Mapping for Data Structure Analysis

DATA ANALYSIS Fitting RDC Data to Structure

Modeling Nonlinear Data

Localization with a Nonlinear, Noninvertible Sensor Mapping

Can we use nonlinear and selfconsistent models for data analysis?

Chapter 14. TERRAIN MAPPING AND ANALYSIS 14.1 Data for Terrain Mapping and Analysis 14.1.1 DEM

A Handy Data Structure

Data structure for a continuous-time event history analysis

Genome Structure/Mapping

Structure-Property Mapping

NDAR Data Dictionary Data Structure Definition: Creating a Mapping File

Data structure for a discrete-time event history analysis