FastMap: A Fast Algorithm for Indexing, Data-Mining and Visualization of Traditional and Multimedia Datasets
Abstract • Describes a fast algorithm that maps objects into points in some k-dimensional space such that the dissimilarities are preserved.
Abstract • Thus, we can subsequently use fine-tuned spatial access methods (SAMs) to answer queries such as “query by example” or “all pairs query”.
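Introduction • It is not easy to design k feature-extraction functions that map objects to k-dimensional points • For instance, for typed English words, where the distance is the editing distance (the minimum number of insertions, deletions and substitutions needed to transform one string into the other), it is unclear what the features should be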
Solutions • Old: Multi-Dimensional Scaling (MDS) • Unsuitable for indexing • Proposed: a fast algorithm (FastMap) • Much faster • Allows indexing
Applications • Image and multimedia databases • Medical databases
Applications • String databases, e.g. OCR • Time series, e.g. financial data
Applications • Data mining and visualization applications
Desirable types of queries • Query-by-example: search a collection of objects to find the ones that are within a user-defined distance from the query object • All-pairs query: find the pairs of objects that are within distance ε of each other
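As a rough illustration of the two query types, here is a minimal brute-force sketch over objects that have already been mapped to k-d points. The function names and sample data are hypothetical, and a spatial access method would replace the linear scans in practice.

```python
from itertools import combinations
from math import dist


def query_by_example(points, query, radius):
    """Return indices of points within `radius` of `query` (a range query)."""
    return [i for i, p in enumerate(points) if dist(p, query) <= radius]


def all_pairs(points, eps):
    """Return index pairs (i, j), i < j, whose points lie within `eps` of each other."""
    return [
        (i, j)
        for (i, p), (j, q) in combinations(enumerate(points), 2)
        if dist(p, q) <= eps
    ]


# Hypothetical 2-d images of three objects.
points = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
near = query_by_example(points, (0.0, 0.5), 1.0)
pairs = all_pairs(points, 1.5)
```

Both functions scan every point (or every pair), which is exactly the cost that indexing the mapped points is meant to avoid.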
Benefit of mapping objects • Accelerate the search time for queries, by employing SAMs like R*-trees and z-ordering • Help with visualization, clustering and data-mining
Ideal mapping fulfills… • Fast to compute: O(N) or O(N log N), but not O(N^2) • Preserves distances with little discrepancy • Very fast to map a new object
MDS • Used to discover the underlying (spatial) structure of a set of data items from the (dis)similarity information • Map objects to a k-dimensional space, so as to minimize the stress function
MDS • Stress function • Measures the average difference between the distances of the "images" and the actual distances.
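The slide gives the stress function only in words. A minimal sketch follows, assuming the commonly used normalized form stress = sqrt( Σ (d'_{ij} − d_{ij})^2 / Σ d_{ij}^2 ), where d_{ij} is the original dissimilarity and d'_{ij} the distance between the images. The dictionary layout and sample values are hypothetical.

```python
from math import sqrt


def stress(original, mapped):
    """Normalized stress between two distance tables keyed by object pair (i, j)."""
    # Sum of squared differences between image distances and actual distances.
    num = sum((mapped[pair] - original[pair]) ** 2 for pair in original)
    # Normalize by the sum of squared actual distances.
    den = sum(d ** 2 for d in original.values())
    return sqrt(num / den)


# Hypothetical distances for three objects; only pair (1, 2) is distorted.
original = {(0, 1): 3.0, (0, 2): 4.0, (1, 2): 5.0}
mapped = {(0, 1): 3.0, (0, 2): 4.0, (1, 2): 4.0}
s = stress(original, mapped)
```

A stress of 0 means every distance is preserved exactly; larger values indicate a worse fit.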
Drawbacks of MDS • Requires O(N^2) time, which is impractical for large databases • Fast retrieval is questionable, as MDS is not designed for “query-by-example” operations
Definitions • The k-d point P_i that corresponds to the object O_i will be called the ‘image’ of object O_i. That is, P_i = (x_{i,1}, x_{i,2}, …, x_{i,k}) • The k-d space containing the ‘images’ will be called the target space
Proposed algorithm • Assumption: a domain expert has provided only a distance/dissimilarity function D(*, *) • For instance, the Euclidean distance between two feature vectors can serve as the distance function between the corresponding objects
Proposed algorithm • Pretend that the objects are indeed points in some unknown n-dimensional space, and try to project these points onto k mutually orthogonal directions • The challenge is to compute these projections from the distance matrix only
Proposed algorithm • Project the objects onto a carefully selected “line” • Choose two objects O_a and O_b as the “pivot objects”; the line passes through them
Proposed algorithm • Compute the distance of each point from the pivot points using only the information we have, i.e., the distances between objects
Proposed algorithm • [Figure: triangle O_a O_i O_b, with x_i the projection of O_i onto the line O_a O_b]
Proposed algorithm • By the cosine law, in any triangle O_a O_i O_b: d_{b,i}^2 = d_{a,i}^2 + d_{a,b}^2 − 2 x_i d_{a,b} • d_{i,j} is shorthand for the distance D(O_i, O_j)
Proposed algorithm • Solving for x_i: x_i = (d_{a,i}^2 + d_{a,b}^2 − d_{b,i}^2) / (2 d_{a,b}) • We can map objects into points on a line, preserving some of the distance information
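The projection formula can be checked with a few lines of code. This is a minimal sketch; the function name `project` and the sample triangle are illustrative assumptions, not part of the original slides.

```python
def project(d_ai, d_bi, d_ab):
    """Coordinate x_i of object O_i on the line through pivots O_a and O_b.

    d_ai, d_bi: distances from O_i to the pivots; d_ab: distance between the pivots.
    """
    return (d_ai ** 2 + d_ab ** 2 - d_bi ** 2) / (2 * d_ab)


# 3-4-5 right triangle with the right angle at O_a: O_i projects onto O_a itself.
x_right_angle = project(3.0, 5.0, 4.0)

# The pivots land at the two ends of the segment: O_a at 0 and O_b at d_ab.
x_a = project(0.0, 4.0, 4.0)
x_b = project(4.0, 0.0, 4.0)
```

Only the three pairwise distances are needed, which is why the method works even when no coordinates for the objects are known.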
Proposed algorithm • The problem is solved for a 2-d target space • Apply the same steps recursively to extend to higher dimensions
Proposed algorithm • Each of the k recursive calls determines the coordinates of the N objects on a new axis • The “pivot objects” of each recursive call are recorded to facilitate queries • Pivot objects are chosen by a heuristic algorithm
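The steps above can be put together in a short sketch, written iteratively rather than recursively. It rests on several assumptions: the residual squared distance after each axis is taken to be D'(O_i, O_j)^2 = D(O_i, O_j)^2 − (x_i − x_j)^2; the pivot heuristic starts from an arbitrary object and repeatedly jumps to the farthest object; the names `fastmap` and `pivot_iters` are hypothetical. For simplicity it recomputes residual distances on demand and makes no claim to match the cost of the original algorithm.

```python
from math import dist, sqrt


def fastmap(objects, distance, k, pivot_iters=5):
    """Map objects to k-d points using only `distance`; returns (coords, pivots)."""
    n = len(objects)
    coords = [[0.0] * k for _ in range(n)]
    pivots = []  # one (a, b) index pair per axis, kept for mapping queries later

    def d2(i, j, col):
        # Squared distance left over after removing the first `col` coordinates.
        base = distance(objects[i], objects[j]) ** 2
        return base - sum((coords[i][c] - coords[j][c]) ** 2 for c in range(col))

    for col in range(k):
        # Pivot heuristic: start anywhere, then repeatedly move to the farthest object.
        a, b = 0, 0
        for _ in range(pivot_iters):
            a = max(range(n), key=lambda i: d2(b, i, col))
            b = max(range(n), key=lambda i: d2(a, i, col))
        dab2 = d2(a, b, col)
        if dab2 <= 1e-12:
            break  # no distance left to explain; remaining coordinates stay 0
        pivots.append((a, b))
        dab = sqrt(dab2)
        # Cosine-law projection of every object onto the line through the pivots.
        for i in range(n):
            coords[i][col] = (d2(a, i, col) + dab2 - d2(b, i, col)) / (2 * dab)
    return coords, pivots


# Usage: corners of a 3-by-4 rectangle, with Euclidean distance as D.
points = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0), (3.0, 4.0)]
coords, pivots = fastmap(points, dist, 2)
```

Because the sample objects really are 2-d points and k = 2, the images reproduce the original pairwise distances up to rounding, although the axes are rotated relative to the input.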
Proposed algorithm • All steps are linear in N • Complexity is O(Nk)
Experiments • Compare FastMap with MDS • speed and quality • Illustrate the visualization and clustering abilities • real and synthetic datasets
Comparison with MDS • Response time vs. database size N
Comparison with MDS • Response time vs. number of dimensions k
Comparison with MDS • Response time vs. stress
Conclusion • A fast algorithm to map objects into points in k-d space • Accelerates searching through highly optimized SAMs, e.g. R-trees and R*-trees • Applicable to multimedia databases, data mining, clustering and document retrieval
References • Christos Faloutsos, King-Ip (David) Lin. FastMap: A Fast Algorithm for Indexing, Data-Mining and Visualization of Traditional and Multimedia Datasets. • Joseph B. Kruskal, Myron Wish. Multidimensional Scaling.