Questions and Topics Review Nov. 30, 2010
• Give an example of a problem that might benefit from feature creation.
• How does DENCLUE form clusters? Why does DENCLUE use grid-cells? What are the main differences between DENCLUE and DBSCAN?
• Compute the Silhouette of the following clustering that consists of 2 clusters: {(0,0), (0,1), (2,2)} {(3,2), (3,3)}.
• Compare Decision Trees, Support Vector Machines, and K-NN with respect to the number of decision boundaries each approach uses!
• K-NN is a lazy approach; what does it mean? What are the disadvantages of K-NN’s lazy approach? Do you see any advantages in using K-NN’s lazy approach?
• Why do some support vector machine approaches map examples from a lower dimensional space to a higher dimensional space?
• What is the role of slack variables in the Linear/SVM/Non-separable approach (textbook pages 266-270); what do they measure? What properties of hyperplanes are maximized by the objective function f(w) (on page 268) in the approach?
Silhouette: for an individual point i
• Calculate a = average distance of i to the other points in its cluster
• Calculate b = min (average distance of i to the points in another cluster)
• The silhouette coefficient for the point is then given by: s = (b - a) / max(a, b)
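The silhouette definition above can be applied directly to the two-cluster example from the question list. The following is a minimal sketch, not a reference solution: it assumes Euclidean distance (the slide does not name a distance function), and the function name `silhouette` and the zero score for single-point clusters are illustrative choices rather than anything stated on the slide.

```python
from math import dist  # Euclidean distance, Python 3.8+


def silhouette(clusters):
    """Return {point: s} with s = (b - a) / max(a, b) for every point."""
    scores = {}
    for ci, cluster in enumerate(clusters):
        for p in cluster:
            others = [q for q in cluster if q != p]
            if not others:
                # Assumption: a single-point cluster gets s = 0 by convention.
                scores[p] = 0.0
                continue
            # a: average distance from p to the other points in its own cluster
            a = sum(dist(p, q) for q in others) / len(others)
            # b: smallest average distance from p to the points of any other cluster
            b = min(
                sum(dist(p, q) for q in other) / len(other)
                for cj, other in enumerate(clusters)
                if cj != ci
            )
            scores[p] = (b - a) / max(a, b)
    return scores


clusters = [[(0, 0), (0, 1), (2, 2)], [(3, 2), (3, 3)]]
scores = silhouette(clusters)
for point, s in scores.items():
    print(point, round(s, 3))
```

Under these assumptions the point (2,2) receives a negative coefficient, because it is on average closer to the second cluster than to the other points of its own cluster; the remaining four points score positive.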
Support Vector Machines
• What if the problem is not linearly separable?
Linear SVM for Non-linearly Separable Problems
• What if the problem is not linearly separable?
• Introduce slack variables
• Need to minimize:
• Subject to (i=1,..,N):
• C is chosen using a validation set, trying to keep the margins wide while keeping the training error low.
Annotations on the objective function:
• C: parameter
• Margin term: inverse size of the margin between the hyperplanes
• Slack-variable sum: measures the training error
• Slack variable: allows constraint violation to a certain degree
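The objective and the constraints themselves do not appear in the text above. As a sketch, assuming the slide follows the standard soft-margin formulation, they would read as follows. The symbols are assumptions taken from that standard form: weight vector $\mathbf{w}$, bias $b$, labels $y_i \in \{-1, +1\}$, and slack variables $\xi_i$.

```latex
\min_{\mathbf{w},\, b,\, \xi} \quad
  f(\mathbf{w}) = \frac{\lVert \mathbf{w} \rVert^{2}}{2} + C \sum_{i=1}^{N} \xi_i
\qquad \text{subject to} \qquad
  y_i \,(\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1 - \xi_i,
  \quad \xi_i \ge 0,
  \quad i = 1, \dots, N
```

In this form the first term shrinks as the margin widens, the sum of the slack variables grows with the amount by which training examples violate the margin, and C sets the trade-off between the two. Some textbook variants raise the slack sum to a power k; the version shown takes k = 1.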
Questions and Topics Review Nov. 30, 2010
• Discussion of Problem 1/2 of Assignment 4
• Give an example of a problem that might benefit from feature creation.
• How does DENCLUE form clusters? Why does DENCLUE use grid-cells? What are the main differences between DENCLUE and DBSCAN?
• Compute the Silhouette of the following clustering that consists of 2 clusters: {(0,0), (0,1), (2,2)} {(3,2), (3,3)}.
• Compare Decision Trees, Support Vector Machines, and K-NN with respect to the number of decision boundaries each approach uses!
DT: many, rectangular for numerical attributes
K-NN: many, convex polygons (Voronoi cells)
SVM: one, hyperplane
• K-NN is a lazy approach; what does it mean? What are the disadvantages of K-NN’s lazy approach? Do you see any advantages in using K-NN’s lazy approach?
… advantages: for quickly changing streaming data, learning the model might be a waste of time and a lazy approach might be better…
• Why do some support vector machine approaches map examples from a lower dimensional space to a higher dimensional space?
To make them linearly separable.
• What is the role of slack variables in the Linear/SVM/Non-separable approach (textbook pages 266-270); what do they measure? What properties of hyperplanes are maximized by the objective function f(w) (on page 268) in the approach?
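The answer "to make them linearly separable" can be illustrated with a toy case. The sketch below is an assumption-laden example, not material from the slides: the four 1-D points, their labels, the feature map `phi(x) = (x, x^2)`, and the hand-picked hyperplane `w`, `b` are all hypothetical. It checks that no single threshold separates the classes in one dimension, while one hyperplane does so after the mapping.

```python
def phi(x):
    """Hypothetical feature map from 1-D to 2-D: x -> (x, x^2)."""
    return (x, x * x)


# Toy data (illustrative): the positive class lies on both sides of the negative class.
points = [-2.0, -1.0, 1.0, 2.0]
labels = [1, -1, -1, 1]


def separable_1d(xs, ys):
    """True if some threshold t and orientation s classify every point correctly."""
    srt = sorted(xs)
    # Thresholds outside the data and between each pair of neighbouring points.
    thresholds = [srt[0] - 1] + [(u + v) / 2 for u, v in zip(srt, srt[1:])] + [srt[-1] + 1]
    for t in thresholds:
        for s in (1, -1):
            if all((s * (x - t) > 0) == (y == 1) for x, y in zip(xs, ys)):
                return True
    return False


# Hand-picked hyperplane in the mapped space: x^2 - 2.5 = 0.
w, b = (0.0, 1.0), -2.5


def predict(x):
    """Classify x with the linear rule sign(w . phi(x) + b)."""
    z = phi(x)
    return 1 if w[0] * z[0] + w[1] * z[1] + b > 0 else -1


print("separable in 1-D:", separable_1d(points, labels))
print("predictions after mapping:", [predict(x) for x in points])
```

In practice an SVM would learn the hyperplane and would usually apply such a mapping implicitly through a kernel; the hand-picked `w` and `b` here only demonstrate that a separating hyperplane exists in the higher-dimensional space.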