Questions and Topics Review Nov. 22, 2011
1. Assume you have to do feature selection for a classification task. What are the characteristics of features (attributes) you might remove from the dataset prior to learning the classification algorithm?
2. How is region discovery in spatial datasets different from traditional clustering?
3. What are the unique characteristics of hierarchical clustering?
4. Compute the silhouette of the following clustering, which consists of 2 clusters: {(0,0), (0,1), (2,2)} and {(3,2), (3,3)}. To be discussed in the review on Dec. 1! (A small code sketch follows this list.)
   Silhouette: for an individual point i,
   • calculate a = the average distance of i to the points in its own cluster;
   • calculate b = the minimum, over the other clusters, of the average distance of i to the points in that cluster.
   The silhouette coefficient of the point is then s = (b - a) / max(a, b).
5. Compare decision trees, support vector machines, and k-NN with respect to the number and shape of the decision boundaries each approach uses!
6. k-NN is a lazy approach; what does that mean? What are the disadvantages of k-NN's lazy approach? Do you see any advantages in using k-NN's lazy approach?
7. Why do some support vector machine approaches map examples from a lower-dimensional space to a higher-dimensional space?
8. What is the role of the slack variables in the linear SVM for the non-separable case (textbook pages 266-270); what do they measure? What properties of hyperplanes are maximized by the objective function f(w) (on page 268) in that approach?
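The silhouette in question 4 can be checked by coding the definition directly. Below is a minimal sketch in Python; the silhouette function and variable names are my own illustrative choices, not from the textbook or lecture.

    import math

    def silhouette(clusters):
        # Silhouette coefficient per point: s = (b - a) / max(a, b)
        scores = {}
        for ci, cluster in enumerate(clusters):
            for p in cluster:
                # a: average distance of p to the other points in its own cluster
                a = sum(math.dist(p, q) for q in cluster if q != p) / (len(cluster) - 1)
                # b: minimum over the other clusters of the average distance
                # of p to that cluster's points
                b = min(sum(math.dist(p, q) for q in other) / len(other)
                        for cj, other in enumerate(clusters) if cj != ci)
                scores[p] = (b - a) / max(a, b)
        return scores

    clusters = [[(0, 0), (0, 1), (2, 2)], [(3, 2), (3, 3)]]
    for point, s in silhouette(clusters).items():
        print(point, round(s, 3))

Running this, the point (2,2) comes out with a negative silhouette: on average it lies closer to the other cluster than to its own, which is exactly what a questionable cluster assignment looks like.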
Support Vector Machines
• What if the problem is not linearly separable?
[Figure: two margin hyperplanes, one for the circle class and one for the square class, with examples of each class falling on the wrong side.]
Linear SVM for Non-linearly Separable Problems
• What if the problem is not linearly separable? Introduce slack variables ξi; each slack variable allows its constraint to be violated to a certain degree and measures the error of the i-th example.
• Need to minimize:
  f(w) = ||w||^2 / 2 + C · Σ(i=1,..,N) ξi
  Minimizing the first term maximizes the margin between the two class hyperplanes (the margin is 2/||w||); the second term measures the training error.
• Subject to (i = 1,..,N):
  yi (w · xi + b) ≥ 1 - ξi,  ξi ≥ 0
• The parameter C is chosen using a validation set, trying to keep the margins wide while keeping the training error low.
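To make the role of the slack variables concrete, here is a minimal numeric sketch: given a toy dataset and a hand-picked hyperplane (the data, w, b, and C below are made-up values for illustration only), it computes each slack variable as ξi = max(0, 1 - yi(w·xi + b)) and evaluates the soft-margin objective.

    import numpy as np

    # Toy 2-D data: class +1 vs. class -1; the last -1 example lies on the
    # wrong side of the chosen hyperplane, so it cannot satisfy its constraint.
    X = np.array([[2.0, 3.0], [3.0, 3.0], [2.0, 2.5],    # class +1
                  [0.0, 0.5], [1.5, 0.0], [2.5, 2.5]])   # class -1
    y = np.array([1, 1, 1, -1, -1, -1])

    # A hand-picked candidate hyperplane w.x + b = 0 (illustrative values).
    w = np.array([1.0, 1.0])
    b = -3.5

    # Slack: xi_i = max(0, 1 - y_i (w.x_i + b)); zero when the example is on
    # the correct side of its class hyperplane, otherwise the size of the violation.
    xi = np.maximum(0.0, 1.0 - y * (X @ w + b))

    C = 1.0  # trade-off parameter; chosen on a validation set in practice
    objective = 0.5 * np.dot(w, w) + C * xi.sum()
    print(xi)         # only the overlapping (2.5, 2.5) example gets xi > 0
    print(objective)

Increasing C penalizes the single violation more heavily, pushing the optimization toward a narrower margin with lower training error; decreasing C does the opposite.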
Answers Review Nov. 22, 2011
1. Assume you have to do feature selection for a classification task. What are the characteristics of features (attributes) you might remove from the dataset prior to learning the classification algorithm? Redundant attributes; irrelevant attributes.
2. How is region discovery in spatial datasets different from traditional clustering? Clustering is performed in the subspace of the spatial attributes; e.g., clusters are contiguous in the space of the spatial attributes. There is a separation between spatial and non-spatial attributes: the non-spatial attributes are only used by the objective functions/cluster evaluation measures.
3. What are the unique characteristics of hierarchical clustering? It computes a dendrogram and multiple clusterings; a dendrogram captures the hierarchical relationships between clusters; HC relies on agglomerative/divisive approaches that compute clusters using a union operator: C = C1 ∪ C2.
4. …
5. Compare decision trees, support vector machines, and k-NN with respect to the number and shape of the decision boundaries each approach uses! DT: many, rectangular for numerical attributes; k-NN: many, convex polygons (Voronoi cells); SVM: one, a single hyperplane.
7. Why do some support vector machine approaches map examples from a lower-dimensional space to a higher-dimensional space? To make them linearly separable (see the sketch below).
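A minimal illustration of answer 7, using the common mapping φ(x) = (x, x²) (an assumed example, not taken from the lecture): the 1-D data below cannot be separated by any single threshold, but become linearly separable in the mapped 2-D space.

    import numpy as np

    # 1-D data: the positive class sits between the negatives, so no single
    # threshold (a hyperplane in 1-D) can separate the two classes.
    x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
    y = np.array([-1, 1, 1, 1, -1])

    # Map each example to 2-D with phi(x) = (x, x^2).
    phi = np.column_stack([x, x ** 2])

    # In the mapped space the hyperplane w = (0, -1), b = 2.5 (the line
    # x^2 = 2.5) separates the classes: x^2 <= 1 for every positive example,
    # x^2 = 4 for both negatives.
    print(np.sign(phi @ np.array([0.0, -1.0]) + 2.5))  # matches y: [-1, 1, 1, 1, -1]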
Answers2 Review Nov. 22, 2011
6. k-NN is a lazy approach; what does that mean? What are the disadvantages of k-NN's lazy approach? Do you see any advantages in using k-NN's lazy approach? Lazy: it postpones "learning a model". Disadvantage: it is slow, as the model is "obtained" when classifying an object and not beforehand. Another disadvantage (not asked for here): there is no explicit model and therefore no way to show one to a domain expert, which makes it difficult to establish trust in the learnt model. Advantage of "being lazy": for quickly changing streaming data, learning a model might be a waste of time, as the model changes quickly over time, and a lazy approach might be better. (A minimal sketch of a lazy k-NN classifier follows at the end of this section.)
8. What is the role of the slack variables in the linear SVM for the non-separable case (textbook pages 266-270); what do they measure? What properties of hyperplanes are maximized by the objective function f(w) (on page 268) in that approach? The slack variable ξi measures the distance of the i-th example to its class hyperplane if the example is on the wrong side of that hyperplane, and is 0 if the example is on the correct side; that is, ξi measures the error associated with the i-th example. Soft-margin SVMs solve a two-objective optimization problem, trying to minimize the error associated with the examples (Σ ξi) while keeping the margins as wide as possible (minimizing ||w||); the parameter C determines how much emphasis is put on each of the two objectives.
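To illustrate the "lazy" part of answer 6, here is a minimal sketch of a lazy k-NN classifier; the LazyKNN class and its method names are my own, purely illustrative. Note that fit merely memorizes the training examples, and all distance computation is deferred to prediction time.

    from collections import Counter
    import math

    class LazyKNN:
        # "Training" just stores the data; all work happens at query time.
        def __init__(self, k=3):
            self.k = k

        def fit(self, X, y):
            # Lazy: no model is built here; the examples are merely memorized.
            self.X, self.y = list(X), list(y)
            return self

        def predict(self, q):
            # All distance computation is deferred to classification time,
            # which is why predictions are slow relative to eager learners.
            neighbors = sorted(zip(self.X, self.y),
                               key=lambda xy: math.dist(q, xy[0]))[:self.k]
            return Counter(label for _, label in neighbors).most_common(1)[0][0]

    knn = LazyKNN(k=3).fit([(0, 0), (0, 1), (2, 2), (3, 2), (3, 3)],
                           ["a", "a", "a", "b", "b"])
    print(knn.predict((2.5, 2.5)))  # -> "b"

This also shows the streaming-data advantage mentioned above: incorporating a new example is just an append to the stored lists, whereas an eager learner would have to re-train its model.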