
K Nearest Neighbors Classifier & Decision Trees


Presentation Transcript


  1. X. K Nearest Neighbors Classifier & Decision Trees

  2. Content • K Nearest Neighbors • Decision Trees • Binary Decision Trees • Linear Decision Trees • Chi-Squared Automatic Interaction Detector (CHAID) • Classification and Regression Trees (CART)

  3. K Nearest Neighbors • Advantages: nonparametric architecture; simple; powerful; requires no training time • Disadvantages: memory intensive; classification/estimation is slow

  4. K Nearest Neighbors • The key issues in training this model include setting: • the variable K, chosen with validation techniques (e.g. cross-validation) • the type of distance metric (e.g. the Euclidean measure)

  5. Figure: K Nearest Neighbors example. Legend: stored training set patterns; input pattern for classification; dashed lines mark the Euclidean distance from the input pattern to its nearest three patterns.

  6. • Store all input data in the training set. • For each pattern in the test set: • Search for the K nearest patterns to the input pattern using a Euclidean distance measure. • For classification, compute the confidence for each class as $C_i/K$, where $C_i$ is the number of patterns among the K nearest patterns belonging to class i. • The classification for the input pattern is the class with the highest confidence.
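The steps on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's code: the function name `knn_classify` and the toy patterns are assumptions made for the example.

```python
import math
from collections import Counter

def knn_classify(train_X, train_y, x, k=3):
    """Return (predicted_class, confidences) for one input pattern x."""
    # Euclidean distance from x to every stored training pattern
    dists = [(math.dist(p, x), label) for p, label in zip(train_X, train_y)]
    # Keep the k nearest patterns
    nearest = sorted(dists, key=lambda d: d[0])[:k]
    # Confidence for class i is C_i / K
    counts = Counter(label for _, label in nearest)
    confidences = {c: n / k for c, n in counts.items()}
    # The prediction is the class with the highest confidence
    best = max(confidences, key=confidences.get)
    return best, confidences

# Hypothetical two-class training set
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9)]
train_y = ["A", "A", "A", "B", "B"]

label, conf = knn_classify(train_X, train_y, (1.1, 1.0), k=3)
label2, conf2 = knn_classify(train_X, train_y, (4.0, 4.0), k=3)
```

Every query scans the whole training set, which is why the earlier slide lists memory use and slow classification as disadvantages.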

  7. Training parameters and typical settings • Number of nearest neighbors • The number of nearest neighbors (K) should be chosen by cross-validation over a range of K settings. • K=1 is a good baseline model to benchmark against. • A good rule of thumb is that K should be less than the square root of the total number of training patterns.
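One way to follow this advice is sketched below, assuming leave-one-out cross-validation as the validation scheme (the slide does not say which variant to use). The helper names and the toy data are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train, x, k):
    """Majority class among the k training patterns nearest to x."""
    nearest = sorted(train, key=lambda pl: math.dist(pl[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def loo_accuracy(data, k):
    """Leave-one-out accuracy: classify each pattern using all the others."""
    hits = 0
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += knn_predict(rest, x, k) == y
    return hits / len(data)

def choose_k(data):
    """Score every K below sqrt(N) and return (best_k, scores)."""
    # Rule of thumb from the slide: K < sqrt(N), i.e. K * K < N
    candidates = [k for k in range(1, len(data)) if k * k < len(data)]
    scores = {k: loo_accuracy(data, k) for k in candidates}
    # On ties, max() keeps the smallest K because candidates are ascending
    return max(candidates, key=scores.get), scores

# Hypothetical data: two well-separated clusters of five patterns each
data = [((0.0, 0.0), "A"), ((0.1, 0.2), "A"), ((0.2, 0.1), "A"),
        ((0.3, 0.3), "A"), ((0.2, 0.4), "A"), ((3.0, 3.0), "B"),
        ((3.1, 3.2), "B"), ((2.9, 3.1), "B"), ((3.2, 2.8), "B"),
        ((3.0, 3.3), "B")]

best_k, scores = choose_k(data)
```

The K=1 score in `scores` doubles as the baseline the slide recommends benchmarking against.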

  8. Training parameters and typical settings • Input compression • Since KNN is very storage intensive, we may want to compress data patterns as a preprocessing step before classification. • Using input compression usually results in slightly worse performance. • Sometimes, however, compression improves performance because it automatically normalizes the data, which can equalize the effect of each input in the Euclidean distance measure.
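The normalization effect mentioned on this slide can be illustrated on its own. The sketch below does not implement any compression scheme (the slide does not name one). It only assumes z-score scaling as the normalization and uses made-up patterns to show how an input on a large scale dominates the raw Euclidean distance.

```python
import math
import statistics

def zscore_columns(rows):
    """Rescale every input (column) to zero mean and unit standard deviation."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) for c in cols]  # assumes no constant column
    return [tuple((v - m) / s for v, m, s in zip(row, means, stds))
            for row in rows]

# Hypothetical patterns: input 0 lies in [0, 1], input 1 is in the thousands
rows = [(0.1, 1000.0), (0.9, 1050.0), (0.15, 1300.0)]

# Raw distance is driven almost entirely by input 1
raw = math.dist(rows[0], rows[1])

# After scaling, input 0 is no longer drowned out by input 1
scaled = zscore_columns(rows)
norm = math.dist(scaled[0], scaled[1])
```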

  9. Decision trees • Figure: example tree showing the root node, the nodes, the branches (decision points), and the leaves (terminal nodes), with leaves labeled by class (A, B, C). • Decision trees are popular for pattern recognition because the models they produce are comparatively easy to understand.

  10. Decision trees-Binary decision trees • Figure: example binary tree whose root node tests BMI<24, with Yes/No branches at each node. • Classification of an input vector is done by traversing the tree, beginning at the root node and ending at a leaf. • Each node of the tree computes an inequality (e.g. BMI<24, yes or no) based on a single input variable. • Each leaf is assigned to a particular class.
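The traversal can be sketched as follows. Only the BMI<24 test comes from the slide. The second variable (`age`), its threshold, and the class labels are hypothetical, added so the tree has more than one level.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Leaf:
    label: str  # class assigned to this leaf

@dataclass
class Node:
    feature: str      # the single input variable tested at this node
    threshold: float  # the node asks: x[feature] < threshold ?
    yes: Union["Node", Leaf]
    no: Union["Node", Leaf]

def classify(tree, x):
    """Walk from the root to a leaf, following the yes/no answers."""
    node = tree
    while isinstance(node, Node):
        node = node.yes if x[node.feature] < node.threshold else node.no
    return node.label

# Root tests BMI<24 (from the slide); everything below it is made up
tree = Node("BMI", 24.0,
            yes=Leaf("A"),
            no=Node("age", 50.0, yes=Leaf("B"), no=Leaf("C")))

r1 = classify(tree, {"BMI": 22.0, "age": 30})
r2 = classify(tree, {"BMI": 27.0, "age": 40})
r3 = classify(tree, {"BMI": 27.0, "age": 60})
```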

  11. Decision trees-Binary decision trees • Each inequality used to split the input space is based on only one input variable, so each node draws a boundary that can be geometrically interpreted as a hyperplane perpendicular to that variable's axis.

  12. Decision trees-Linear decision trees • Figure: example linear decision tree whose root node tests a linear combination aX1+bX2, with Yes/No branches at each node. • Linear decision trees are similar to binary decision trees, except that the inequality computed at each node takes an arbitrary linear form that may depend on multiple variables.
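A node test of this kind might look like the sketch below. The slide shows only the expression aX1+bX2, so the comparison direction (`<`) and the threshold are assumptions made for illustration.

```python
def linear_split(x, weights, threshold):
    """Return True (the "yes" branch) when the weighted sum is below threshold."""
    return sum(w * v for w, v in zip(weights, x)) < threshold

# a = b = 0.5: the test depends on both inputs
yes_branch = linear_split((1.0, 2.0), (0.5, 0.5), 2.0)  # 0.5*1 + 0.5*2 = 1.5
no_branch = linear_split((3.0, 2.0), (0.5, 0.5), 2.0)   # 0.5*3 + 0.5*2 = 2.5

# Weights (1, 0) reduce the test to X1 < 2, an ordinary single-variable split
axis_aligned = linear_split((1.0, 100.0), (1.0, 0.0), 2.0)
```

Setting all but one weight to zero recovers the single-variable inequality of a binary decision tree, so the linear form is a generalization of it.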

  13. Decision trees-Chi-Squared Automatic Interaction Detector (CHAID) • Figure: a node splitting into three branches (Branch #1, Branch #2, Branch #3). • CHAID is a non-binary decision tree. • The decision or split made at each node is still based on a single variable, but can result in multiple branches. • The split search algorithm is designed for categorical variables.

  14. Decision trees-Chi-Squared Automatic Interaction Detector (CHAID) • Continuous variables must be grouped into a finite number of bins to create categories. • A reasonable number of "equal population bins" can be created for use with CHAID. • e.g. with 1000 samples, 10 equal population bins contain 100 samples each. • A chi-squared ($\chi^2$) value is computed for each variable and used to determine the best variable to split on.
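Both steps on this slide, binning and the chi-squared value, can be sketched as below. This is a simplified illustration under stated assumptions: tied values may land in different bins, the statistic is the plain Pearson chi-squared of the category-by-class table, and the category-merging steps of a full CHAID implementation are not attempted. The function names are hypothetical.

```python
from collections import Counter

def equal_population_bins(values, n_bins):
    """Assign each value a bin index so that bins hold (nearly) equal counts."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins

def chi_squared(categories, labels):
    """Pearson chi-squared statistic of the category-by-class table."""
    n = len(labels)
    cat_tot = Counter(categories)
    lab_tot = Counter(labels)
    observed = Counter(zip(categories, labels))
    stat = 0.0
    for c in cat_tot:
        for l in lab_tot:
            expected = cat_tot[c] * lab_tot[l] / n
            stat += (observed[(c, l)] - expected) ** 2 / expected
    return stat

# The slide's example: 1000 samples, 10 bins of 100 samples each
bin_sizes = Counter(equal_population_bins(list(range(1000)), 10))

# A variable that separates the classes perfectly, and one that does not
informative = chi_squared([0, 0, 1, 1], ["a", "a", "b", "b"])
uninformative = chi_squared([0, 0, 1, 1], ["a", "b", "a", "b"])
```

A larger statistic means the binned variable and the class are more strongly associated, which is what makes that variable a better candidate to split on.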

  15. Decision trees-Classification and regression trees (CART) • Classification and regression trees (CART) are binary decision trees that split on a single variable at each node. • The CART algorithm recursively performs an exhaustive search over all variables and split values to find the optimal splitting rule for each node.

  16. Decision trees-Classification and regression trees (CART) • The optimal splitting criterion at a specific node can be found as follows: • $\Phi(s'/t) = \max_i \Phi(s_i/t)$, where $s'$ is the best of the candidate splits $s_i$ at node $t$.
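The slide does not define $\Phi$ itself. A commonly cited form of this goodness-of-split measure, given here as an assumption rather than as the presenter's definition, is:

```latex
\Phi(s/t) = 2\,P_L\,P_R \sum_{j} \bigl| P(j/t_L) - P(j/t_R) \bigr|
```

In this form, $P_L$ and $P_R$ are the proportions of the patterns at node $t$ that split $s$ sends to the left and right offspring, and $P(j/t_L)$ and $P(j/t_R)$ are the proportions of class $j$ patterns within each offspring. The measure is largest when the two offspring are similar in size and differ as much as possible in class makeup.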

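  17. CART • Figure: the training set enters root node $t$, which splits into offspring $t_L$ and $t_R$, each marked "Class j". • $\Phi(s'/t) = \max_i \Phi(s_i/t)$ • $t_L$ = left offspring of node $t$ • $t_R$ = right offspring of node $t$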
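The exhaustive search for one node can be sketched as below. Two assumptions are made that the slides do not state: the goodness measure is taken to be 2·P_L·P_R·Σ_j |P(j/t_L) − P(j/t_R)|, and candidate split values are the midpoints between consecutive distinct values of each variable. The function names and data are hypothetical.

```python
from collections import Counter

def phi(left, right):
    """Assumed goodness of split: 2 * P_L * P_R * sum_j |P(j/t_L) - P(j/t_R)|."""
    n = len(left) + len(right)
    p_l, p_r = len(left) / n, len(right) / n
    cl, cr = Counter(left), Counter(right)
    classes = set(cl) | set(cr)
    diff = sum(abs(cl[j] / len(left) - cr[j] / len(right)) for j in classes)
    return 2 * p_l * p_r * diff

def best_split(X, y):
    """Try every variable and every midpoint; return (variable, threshold, phi)."""
    best = (None, None, -1.0)
    for var in range(len(X[0])):
        values = sorted({row[var] for row in X})
        # Midpoints between distinct values keep both offspring non-empty
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [c for row, c in zip(X, y) if row[var] < thr]
            right = [c for row, c in zip(X, y) if row[var] >= thr]
            score = phi(left, right)
            if score > best[2]:
                best = (var, thr, score)
    return best

# Hypothetical patterns: variable 0 separates the classes, variable 1 does not
X = [(1.0, 5.0), (2.0, 1.0), (3.0, 4.0), (4.0, 2.0)]
y = ["a", "a", "b", "b"]

var, thr, score = best_split(X, y)
```

Growing a full tree would apply `best_split` again to the patterns in each offspring until a stopping condition is met, which is the recursion the CART slides describe.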

  18. Decision trees-Classification and regression trees (CART) • Pruning rule • Cut off the branches of the tree. • $R(t) = r(t)\,p(t)$ • The sub-tree with the smallest $g(t)$ can then be pruned from the tree.
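The slide uses $g(t)$ without defining it. In the standard cost-complexity (weakest-link) formulation of CART pruning, which is assumed here rather than taken from the slides, the quantities relate as:

```latex
R(t) = r(t)\,p(t), \qquad
R(T_t) = \sum_{t' \in \tilde{T}_t} R(t'), \qquad
g(t) = \frac{R(t) - R(T_t)}{|\tilde{T}_t| - 1}
```

Under that reading, $r(t)$ is the misclassification rate at node $t$, $p(t)$ is the proportion of training patterns reaching $t$, $T_t$ is the sub-tree rooted at $t$, and $\tilde{T}_t$ is its set of leaves. A small $g(t)$ means the sub-tree reduces error very little per extra leaf, so it is the first candidate for removal.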

  19. End
