Non Parametric Methods Pattern Recognition and Machine Learning Debrup Chakraborty
Nearest Neighbor classification Given: A labeled sample of n feature vectors (call it X) A distance measure (say the Euclidean distance) To find: The class label of a given feature vector x which is not in X
Nearest Neighbor classification (contd.) The NN rule: Find the point y in X which is nearest to x, and assign the label of y to x.
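A minimal sketch of the NN rule in Python with NumPy (the function and array names here are illustrative, not from the lecture):

```python
import numpy as np

def nn_classify(X, labels, x):
    # Euclidean distance from x to every training point in X (shape n x d)
    dists = np.linalg.norm(X - x, axis=1)
    # y is the training point nearest to x; assign its label to x
    return labels[np.argmin(dists)]

# toy usage
X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
labels = np.array([0, 1, 1])
print(nn_classify(X, labels, np.array([0.2, 0.1])))  # -> 0
```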
Nearest Neighbor classification (contd.) This rule partitions the feature space into cells, each consisting of all points closer to a given training point x than to any other training point. All points in such a cell are labeled by the class of that training point. This partitioning is called a Voronoi tessellation.
Nearest Neighbor classification (contd.) [Figure: Voronoi cells in 2D]
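As a small illustration of the tessellation, assuming SciPy is available, the Voronoi cells of a 2D training set can be computed directly (the points below are made up):

```python
import numpy as np
from scipy.spatial import Voronoi

points = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [0.5, 0.5]])
vor = Voronoi(points)

# vor.vertices holds the cell corners, vor.regions the cells themselves
print(vor.vertices)
print(vor.regions)
```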
Nearest Neighbor classification Complexity of the NN rule: Distance calculation: O(dn) for n training points in d dimensions Finding the minimum distance: O(n)
Nearest Neighbor classification Nearest Neighbor (Voronoi) Editing
• X = data set, n = number of training points, j = 0
• Construct the full Voronoi diagram of X
• Do: j = j + 1; find the Voronoi neighbors of x_j; if any neighbor is not from the same class as x_j, then mark x_j
• Until j == n
• Discard all points that are not marked.
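A sketch of this editing procedure, assuming SciPy; it relies on the fact that the Voronoi neighbors of a point are exactly its neighbors in the Delaunay triangulation (the function name is illustrative):

```python
import numpy as np
from scipy.spatial import Delaunay

def voronoi_edit(X, y):
    # build the Delaunay triangulation; its edges give the Voronoi neighbors
    tri = Delaunay(X)
    indptr, indices = tri.vertex_neighbor_vertices
    marked = np.zeros(len(X), dtype=bool)
    for j in range(len(X)):
        neighbors = indices[indptr[j]:indptr[j + 1]]
        # mark x_j if any Voronoi neighbor has a different class label
        if np.any(y[neighbors] != y[j]):
            marked[j] = True
    # discard all points that are not marked
    return X[marked], y[marked]
```

The points that survive are those lying on a class boundary, so the NN decision boundary is unchanged while the training set shrinks.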
k nearest neighbor classification Given: A labeled sample of N feature vectors (call it X) A distance measure (say the Euclidean distance) An integer k (generally odd) To find: The class label of a given feature vector x which is not in X
k-NN classification (contd.) Algorithm: Find the k nearest neighbors of x in X; call them x_(1), ..., x_(k). Out of these k samples, let k_i of them belong to class c_i. Choose as the class of x that c_i for which k_i is maximum.
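A compact k-NN sketch with NumPy, using a simple majority vote (the names are illustrative):

```python
import numpy as np
from collections import Counter

def knn_classify(X, labels, x, k=3):
    # distances from x to all training points, then the k nearest indices
    dists = np.linalg.norm(X - x, axis=1)
    nearest = np.argsort(dists)[:k]
    # choose the class c_i with the largest count k_i among the k neighbors
    return Counter(labels[nearest]).most_common(1)[0][0]
```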
k-NN Classification [Figure: a query point z surrounded by training points from Class 1, Class 2, and Class 3]
k-NN classification (contd.) Distance weighted nearest neighbor
Training set: {(x_i, f(x_i)) : i = 1, ..., n}
Given an instance x to be classified:
Let x_1, ..., x_k be the k nearest neighbors of x
Return the weighted estimate (Σ_{i=1}^{k} w_i f(x_i)) / (Σ_{i=1}^{k} w_i), where w_i = 1 / d(x, x_i)^2
In case x = x_i, return f(x_i)
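A sketch of the distance-weighted rule for a real-valued target f, following the weights above (the function name and the tie-breaking epsilon are illustrative):

```python
import numpy as np

def weighted_knn(X, f, x, k=3, eps=1e-12):
    dists = np.linalg.norm(X - x, axis=1)
    nearest = np.argsort(dists)[:k]
    # in case x coincides with a training point x_i, return f(x_i) directly
    if dists[nearest[0]] < eps:
        return f[nearest[0]]
    w = 1.0 / dists[nearest] ** 2          # w_i = 1 / d(x, x_i)^2
    return np.sum(w * f[nearest]) / np.sum(w)
```

For classification, the same weights would be summed per class and the class with the largest total returned.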
Remarks on k-NN classification
• The distance-weighted kNN is robust to noisy training data and is quite effective when it is provided with a sufficiently large set of training examples.
• One drawback of the kNN method is that it defers all computation until a new query point is presented. Various methods have been developed to index the training examples so that the nearest neighbor can be found with less search time. One such indexing method is the kd-tree, developed by Bentley (1975).
• kNN is a lazy learner.
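For instance, with SciPy's kd-tree the index is built once and each query avoids a full linear scan (the data below is random and purely illustrative):

```python
import numpy as np
from scipy.spatial import cKDTree

X = np.random.rand(1000, 2)               # training points
tree = cKDTree(X)                          # build the kd-tree index once
dist, idx = tree.query([0.5, 0.5], k=3)    # 3 nearest neighbors of the query
print(idx, dist)
```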
Locally Weighted Regression
• In the linear regression problem, to find h(x) at a point x we would do the following:
• Minimize J(θ) = Σ_i (y_i - θ^T x_i)^2 over the training set {(x_i, y_i)}
• Output h(x) = θ^T x
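A minimal sketch of this global least-squares fit with NumPy (the toy data and the intercept column are illustrative):

```python
import numpy as np

# toy 1-D data with an intercept column prepended
X = np.column_stack([np.ones(5), np.arange(5.0)])
y = np.array([1.0, 2.1, 2.9, 4.2, 5.1])

# theta minimizes sum_i (y_i - theta^T x_i)^2
theta, *_ = np.linalg.lstsq(X, y, rcond=None)

x_query = np.array([1.0, 2.5])
print(theta @ x_query)   # output h(x) = theta^T x
```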
Locally Weighted Regression
• In the locally weighted regression problem we would do the following:
• Minimize J(θ) = Σ_i w_i(x) (y_i - θ^T x_i)^2
• Output h(x) = θ^T x
• A standard choice of weights is w_i(x) = exp(-||x_i - x||^2 / (2τ^2))
• τ is called the bandwidth parameter
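A sketch of locally weighted regression with the Gaussian weights above, solved as a weighted least-squares problem at each query point (the bandwidth value and the names are illustrative):

```python
import numpy as np

def lwr_predict(X, y, x_query, tau=0.5):
    # Gaussian weights w_i(x) centered at the query point
    w = np.exp(-np.sum((X - x_query) ** 2, axis=1) / (2 * tau ** 2))
    W = np.diag(w)
    # solve the weighted normal equations (X^T W X) theta = X^T W y
    theta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    return theta @ x_query

# toy usage with an intercept column
X = np.column_stack([np.ones(20), np.linspace(0, 1, 20)])
y = np.sin(2 * np.pi * X[:, 1])
print(lwr_predict(X, y, np.array([1.0, 0.3])))
```

Note that, unlike the global fit, a new θ is computed for every query point, which is why the method is called locally weighted.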
Clustering Is different from Classification: Classification partitions the feature space, whereas Clustering partitions the data into "homogeneous groups". Clustering is Unsupervised!!
K-means Clustering
Given: A data set X = {x_1, ..., x_n}
Fix the number of clusters K
Let v_i^(k) represent the i-th cluster center (prototype) at the k-th iteration
Let S_j^(k) represent the j-th cluster at the k-th iteration
K-means Clustering Steps
1. Choose the initial cluster centers v_1^(1), ..., v_K^(1)
2. At the k-th iterative step, distribute the points in X among the K clusters using: x ∈ S_j^(k) if ||x - v_j^(k)|| ≤ ||x - v_i^(k)|| for all i = 1, ..., K
3. Compute the new cluster centers v_j^(k+1) = (1 / |S_j^(k)|) Σ_{x ∈ S_j^(k)} x
4. If v_j^(k+1) = v_j^(k) for all j, then the procedure has converged; else repeat from step 2.
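A compact k-means sketch with NumPy following these steps; the random initialization, the names, and the assumption that no cluster becomes empty are all illustrative choices:

```python
import numpy as np

def kmeans(X, K, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # step 1: choose initial cluster centers (here, K random training points)
    centers = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(max_iter):
        # step 2: assign each point to its nearest center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        assign = np.argmin(dists, axis=1)
        # step 3: recompute each center as the mean of its cluster
        new_centers = np.array([X[assign == j].mean(axis=0) for j in range(K)])
        # step 4: stop when the centers no longer change
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, assign
```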