Support Vector Clustering Algorithm. Presentation by: Jialiang Wu
Reference paper and code website • Support Vector Clustering by Asa Ben-Hur, David Horn, Hava T. Siegelmann, and Vladimir Vapnik. • www.cs.tau.ac.il/~borens/course/ml/cluster.html by Elhanan Borenstein, Ofer, and Orit.
Clustering A clustering algorithm groups data according to the distance between points. • Points that are close to each other are allocated to the same cluster. • Clustering is most effective if the data has some geometric structure. • Outliers may cause an unwarranted increase in cluster size or a faulty clustering.
Support Vector Machine (SVM) • An SVM maps the data from data space to a higher-dimensional feature space through a suitable nonlinear mapping. • With an appropriate mapping to a sufficiently high dimension, data from two categories can always be separated by a hyperplane.
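As a small illustration of the mapping idea, the sketch below takes four XOR-style points, which no straight line in the plane separates, and shows that one extra product coordinate makes them separable by a hyperplane. The map `feature_map`, the weights `w`, `b`, and the toy points are assumptions chosen for illustration; they are not taken from the slides.

```python
# Four XOR-style points: label +1 when both coordinates share a sign, else -1.
points = [(1.0, 1.0), (-1.0, -1.0), (1.0, -1.0), (-1.0, 1.0)]
labels = [1, 1, -1, -1]


def feature_map(x):
    """Hypothetical nonlinear map from 2-D data space to a 3-D feature space."""
    x1, x2 = x
    return (x1, x2, x1 * x2)


# In feature space the hyperplane w . z + b = 0 looks only at the product coordinate.
w, b = (0.0, 0.0, 1.0), 0.0


def side(z):
    """Return +1 or -1 depending on which side of the hyperplane z lies."""
    return 1 if sum(wi * zi for wi, zi in zip(w, z)) + b > 0 else -1


predicted = [side(feature_map(p)) for p in points]
print(predicted == labels)
```

The hyperplane here was picked by hand; an SVM would instead choose the separating hyperplane with the largest margin.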
Support Vector Machine (SVM) Main idea: 1. Much of the geometry of the data in the embedding space (relative positions) is contained in the pairwise inner products. We can work in that space by specifying an inner product function between points in it; an explicit mapping is not necessary. 2. In many cases, the inner product has a simple kernel representation and can therefore be evaluated easily.
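The two points above can be checked numerically. The sketch below compares an explicit feature map with its kernel for the homogeneous degree-2 polynomial kernel (an example chosen here, not one from the slides), and also defines a Gaussian kernel, whose feature space is infinite-dimensional and so can only be used through the kernel. All function names are illustrative.

```python
import math


def dot(u, v):
    """Ordinary inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def phi(x):
    """Explicit feature map whose inner product equals (x . y)^2 in 2-D."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def poly2_kernel(x, y):
    """Same inner product, evaluated directly in data space."""
    return dot(x, y) ** 2


def gaussian_kernel(x, y, q=1.0):
    """K(x, y) = exp(-q * ||x - y||^2); no finite explicit map exists."""
    return math.exp(-q * sum((a - b) ** 2 for a, b in zip(x, y)))


x, y = (1.0, 2.0), (3.0, -1.0)
explicit = dot(phi(x), phi(y))   # inner product computed in feature space
implicit = poly2_kernel(x, y)    # same quantity via the kernel
print(abs(explicit - implicit) < 1e-9)
```

The kernel evaluation never builds the feature vectors, which is what makes high-dimensional feature spaces practical.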
Support Vector Clustering (SVC) • SVC maps data from data space to a higher-dimensional feature space using a Gaussian kernel. • In feature space we look for the smallest sphere that encloses the image of the data. • When the sphere is mapped back to data space, it forms a set of contours that enclose the data points. Points enclosed by the same contour are assigned to the same cluster.
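In symbols, the construction can be written as follows. The notation (the map Φ, the sphere centre a and radius R, the multipliers β_j, and the kernel width q) follows the reference paper rather than the slides, and only the hard-sphere case without outliers is shown.

```latex
\begin{align}
  % smallest sphere enclosing the images of all data points
  \|\Phi(x_j) - a\|^2 &\le R^2 \quad \text{for all } j, \\
  % Gaussian kernel with width parameter q
  K(x_i, x_j) = \Phi(x_i)\cdot\Phi(x_j) &= e^{-q\,\|x_i - x_j\|^2}, \\
  % centre of the sphere as a combination of the images
  a &= \sum_j \beta_j\,\Phi(x_j), \qquad \beta_j \ge 0,\ \sum_j \beta_j = 1, \\
  % squared distance of the image of x from the centre
  R^2(x) = \|\Phi(x) - a\|^2 &= K(x,x) - 2\sum_j \beta_j K(x_j, x)
      + \sum_{i,j} \beta_i \beta_j K(x_i, x_j), \\
  % contours in data space
  \text{contours} &= \{\, x \mid R(x) = R \,\}.
\end{align}
```

Because R(x) is expressed through the kernel alone, the contours can be evaluated in data space without ever constructing Φ.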
Support Vector Clustering (SVC) • The clustering level is controlled by: 1) q, the width parameter of the Gaussian kernel: as q increases, the number of disconnected contours increases, and so does the number of clusters. 2) C, the soft-margin constant, which allows the sphere in feature space not to enclose all points.
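The effect of q can be tried on a toy data set with the minimal sketch below. It makes several assumptions that are not in the slides: C is fixed at 1 (no outliers; a soft margin would add the bound β_j ≤ C to the dual), the dual is solved with plain Frank-Wolfe iterations instead of a proper QP solver, and two points are taken to share a cluster when sampled points on the segment between them all stay inside the sphere. The names `solve_dual` and `svc_labels` and the six data points are hypothetical.

```python
import math


def gaussian_kernel(x, y, q):
    """K(x, y) = exp(-q * ||x - y||^2); q is the width parameter."""
    return math.exp(-q * sum((a - b) ** 2 for a, b in zip(x, y)))


def solve_dual(K, iters=2000):
    """Frank-Wolfe sketch: minimise beta^T K beta with beta >= 0, sum(beta) = 1."""
    n = len(K)
    beta = [1.0 / n] * n
    for t in range(iters):
        grad = [2.0 * sum(K[i][j] * beta[j] for j in range(n)) for i in range(n)]
        best = min(range(n), key=grad.__getitem__)  # vertex with the lowest gradient
        step = 2.0 / (t + 2.0)
        beta = [(1.0 - step) * b for b in beta]
        beta[best] += step
    return beta


def svc_labels(points, q, samples=10):
    """Label points by connected components of the 'segment stays inside' relation."""
    n = len(points)
    K = [[gaussian_kernel(a, b, q) for b in points] for a in points]
    beta = solve_dual(K)
    quad = sum(beta[i] * beta[j] * K[i][j] for i in range(n) for j in range(n))

    def r2(x):
        # Squared feature-space distance from the image of x to the sphere centre.
        cross = sum(b * gaussian_kernel(p, x, q) for b, p in zip(beta, points))
        return 1.0 - 2.0 * cross + quad

    # With no outliers every data point lies in the sphere: use the largest distance.
    radius2 = max(r2(p) for p in points)

    def connected(a, b):
        for k in range(samples + 1):
            t = k / samples
            y = tuple((1.0 - t) * ai + t * bi for ai, bi in zip(a, b))
            if r2(y) > radius2 + 1e-9:
                return False
        return True

    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # depth-first search over connected points
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and connected(points[i], points[j]):
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


data = [(0.0, 0.0), (0.3, 0.0), (0.0, 0.3), (4.0, 4.0), (4.3, 4.0), (4.0, 4.3)]
for q in (1.0, 100.0):
    labels = svc_labels(data, q)
    print(q, max(labels) + 1, labels)
```

On this toy set the sketch should report two clusters at q = 1 and six at q = 100, where every point becomes its own cluster, in line with the trend stated above.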
Conclusions • Points located close to one another tend to be allocated to the same cluster. • The number of clusters increases as q grows. • The appropriate q depends considerably on the specific sample points (scaling, range, scatter, etc.); no single q is always appropriate. A drill-down search over q for each dataset is one solution, but it is very time-consuming. • When the samples represent a relatively large number of classes, SVC is less efficient.
My work in progress • Theoretical exploration: to find out whether there is a restriction we can impose on the inner product such that the figure mapped back to data space is connected (i.e., has only one component). • Importance