
Understanding Classification: Decision Trees & Naïve Bayes

Learn about two essential classification methods, decision trees & Naïve Bayes, for predictive analytics. Decision trees involve tree structures and predictive modeling, while Naïve Bayes relies on Bayes’ theorem to predict outcomes based on independent features.


Presentation Transcript


  1. Chap 4: Classification. Chapter Sections • Decision Trees • Naïve Bayes • Diagnostics of Classifiers • Additional Classification Models • Summary

  2. Classification • Classification is widely used for prediction • Most classification methods are supervised • This chapter focuses on two fundamental classification methods • Decision trees • Naïve Bayes

  3. Decision Trees • Tree structure specifies a sequence of decisions • Given input X = {x1, x2, …, xn}, predict output Y • Input attributes/features can be categorical or continuous • Each internal node tests a particular input variable • Node types: root node, internal nodes, and leaf nodes (leaf nodes return class labels) • Depth of a node = minimum number of steps to reach it from the root • Branch (connects two nodes) = specifies a decision • Two varieties of decision trees • Classification trees: categorical output, often binary • Regression trees: numeric output

  4. Decision Trees: Overview of a Decision Tree • Example of a decision tree • Predicts whether customers will buy a product

  5. Decision Trees: Overview of a Decision Tree • Example: will a bank client subscribe to a term deposit?

  6. Decision Trees: The General Algorithm • Construct a tree T from training set S • Requires a measure of attribute information • Simplistic method (data from previous Fig.) • Purity = probability of the corresponding class • E.g., P(no) = 1789/2000 = 89.45%, P(yes) = 10.55% • Entropy methods • Entropy measures the impurity of an attribute • Information gain measures the reduction in entropy achieved by splitting on an attribute
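
As an illustration of the purity and entropy idea above, a minimal R sketch using the class counts quoted on this slide (1789 "no" and 211 "yes" out of 2000 records):

# entropy of the output variable, from the class counts above
counts <- c(no = 1789, yes = 211)
p <- counts / sum(counts)        # purity of each class: 0.8945 and 0.1055
entropy <- -sum(p * log2(p))     # about 0.49 bits
entropy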

  7. Decision Trees: The General Algorithm • Entropy methods of attribute information • HX = - Σx P(X=x) log2 P(X=x), the entropy of X • Information gain of an attribute = base entropy – conditional entropy, i.e., InfoGain(A) = HY – HY|A
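
A minimal R sketch of the information-gain calculation, assuming a hypothetical binary attribute split over the same 2000 records (the branch counts below are made up for illustration):

# helper: entropy of a vector of class counts
entropy <- function(counts) {
  p <- counts / sum(counts)
  p <- p[p > 0]                  # avoid log2(0)
  -sum(p * log2(p))
}

# base entropy of the output (1789 no, 211 yes)
base <- entropy(c(1789, 211))

# hypothetical split on a binary attribute: (no, yes) counts in each branch
left  <- c(1200, 60)    # e.g., attribute value A
right <- c(589, 151)    # e.g., attribute value B

# conditional entropy = weighted average of the branch entropies
n <- sum(left) + sum(right)
cond <- (sum(left) / n) * entropy(left) + (sum(right) / n) * entropy(right)

info_gain <- base - cond   # information gain of this attribute
info_gain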

  8. Decision Trees: The General Algorithm • Construct a tree T from training set S • Choose root node = most informative attribute A • Partition S according to A’s values • Construct subtrees T1, T2, … for the subsets of S recursively until one of the following occurs • All leaf nodes satisfy a minimum purity threshold • The tree cannot be split further while meeting the minimum purity threshold • Another stopping criterion is satisfied – e.g., maximum depth

  9. Decision Trees: Decision Tree Algorithms • ID3 Algorithm (T = training set, P = output variable, A = attribute)

  10. Decision Trees: Decision Tree Algorithms • C4.5 Algorithm • Handles missing data • Handles both categorical and continuous variables • Uses bottom-up pruning to address overfitting • CART (Classification And Regression Trees) • Also handles continuous variables • Uses the Gini diversity index as its information measure
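
For reference, a minimal R sketch of the Gini diversity index that CART uses, computed from the same class counts as before:

# Gini index of a node: 1 - sum of squared class proportions
gini <- function(counts) {
  p <- counts / sum(counts)
  1 - sum(p^2)
}
gini(c(1789, 211))   # about 0.19 for the 1789/211 split above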

  11. Decision Trees: Evaluating a Decision Tree • Decision trees are greedy algorithms • Best option at each step, maybe not best overall • Addressed by ensemble methods such as random forest • Model might overfit the data (figure: error on the training set in blue vs. the test set in red) • To overcome overfitting: stop growing the tree early, or grow the full tree and then prune it (see the pruning sketch below)
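
A minimal sketch of post-pruning with the rpart package, assuming a fitted tree fit like the one built on slide 15; the choice of cp here (lowest cross-validated error) is one common convention, not part of the original deck:

# examine the complexity-parameter (cp) table produced during cross-validation
printcp(fit)
# prune back to the subtree whose cp gives the lowest cross-validated error
best_cp <- fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"]
pruned  <- prune(fit, cp = best_cp)
rpart.plot(pruned, type = 4, extra = 2)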

  12. Decision Trees: Evaluating a Decision Tree • Decision trees -> rectangular decision regions

  13. Decision Trees: Evaluating a Decision Tree • Advantages of decision trees • Computationally inexpensive • Outputs are easy to interpret – a sequence of tests • Show the importance of each input variable • Decision trees handle • Both numerical and categorical attributes • Categorical attributes with many distinct values • Variables with nonlinear effect on outcome • Variable interactions

  14. Decision Trees: Evaluating a Decision Tree • Disadvantages of decision trees • Sensitive to small variations in the training data • Overfitting can occur because each split reduces the training data available for subsequent splits • Poor performance if the dataset contains many irrelevant variables

  15. Decision Trees: Decision Trees in R
  # install packages rpart, rpart.plot
  # put this code into an RStudio source file and execute lines via Ctrl+Enter
  library("rpart")
  library("rpart.plot")
  setwd("c:/data/rstudiofiles/")
  banktrain <- read.table("bank-sample.csv", header=TRUE, sep=",")
  ## drop a few columns to simplify the tree
  drops <- c("age", "balance", "day", "campaign", "pdays", "previous", "month")
  banktrain <- banktrain[, !(names(banktrain) %in% drops)]
  summary(banktrain)
  # Make a simple decision tree by only keeping the categorical variables
  fit <- rpart(subscribed ~ job + marital + education + default + housing + loan + contact + poutcome,
               method="class", data=banktrain,
               control=rpart.control(minsplit=1),
               parms=list(split='information'))
  summary(fit)
  # Plot the tree
  rpart.plot(fit, type=4, extra=2, clip.right.labs=FALSE, varlen=0, faclen=3)
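
A short usage sketch of the fitted tree; scoring a new record is not on the slide, so the client values below are assumed for illustration:

# score a hypothetical new client with the fitted tree
newclient <- data.frame(job="management", marital="married", education="tertiary",
                        default="no", housing="yes", loan="no",
                        contact="cellular", poutcome="unknown")
predict(fit, newclient, type="class")   # predicted class label
predict(fit, newclient, type="prob")    # class probabilities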

  16. Naïve Bayes • The naïve Bayes classifier • Based on Bayes’ theorem (or Bayes’ Law) • Assumes the features contribute independently • Features (variables) are generally categorical • Continuous variables are handled by discretization – converting them into categorical ones • Output is usually a class label plus a probability score • A log probability is often used instead of the raw probability
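
A minimal R sketch of discretizing a continuous variable for use with naïve Bayes; the age breakpoints and labels are assumed for illustration:

# bin the continuous variable age into categorical ranges
age <- c(23, 35, 41, 58, 67)
age_group <- cut(age, breaks=c(0, 30, 45, 60, Inf),
                 labels=c("young", "mid", "senior", "retired"))
age_group   # factor with levels young, mid, senior, retired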

  17. Naïve Bayes: Bayes’ Theorem • Bayes’ Theorem: P(C|A) = P(A|C) P(C) / P(A), where C = class, A = observed attributes • Typical medical example • Used because doctors frequently get this wrong
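
The slide’s medical example is an image in the original deck; a stand-in calculation with assumed numbers (1% prevalence, 99% sensitivity, 5% false-positive rate) shows the typical surprise:

# P(disease | positive test) via Bayes' theorem, with illustrative numbers
prior       <- 0.01    # P(disease): prevalence
sensitivity <- 0.99    # P(positive | disease)
fpr         <- 0.05    # P(positive | no disease)
p_positive  <- sensitivity * prior + fpr * (1 - prior)
sensitivity * prior / p_positive   # about 0.17, far lower than most people guess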

  18. Naïve Bayes Classifier • Conditional independence assumption: P(A|cj) = P(a1|cj) P(a2|cj) … P(an|cj) • Dropping the common denominator P(A), we get P(cj|A) proportional to P(a1|cj) … P(an|cj) P(cj) • Find the cj that maximizes P(cj|A)

  19. Naïve Bayes Classifier • Example: client subscribes to term deposit? • The following record is from a bank client. Is this client likely to subscribe to the term deposit?

  20. Naïve Bayes Classifier • Compute probabilities for this record

  21. Naïve Bayes Classifier • Compute Naïve Bayes classifier outputs: yes/no • The client is assigned the label subscribed = yes • The scores are small, but the ratio is what counts • Using logarithms helps avoid numerical underflow
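
A minimal sketch of the manual computation described on slides 20–21; the conditional probabilities below are assumed for illustration (the actual values come from the book’s tables):

# illustrative conditional probabilities P(attribute value | class) for one record
p_given_yes <- c(job=0.15, marital=0.55, education=0.30, poutcome=0.10)
p_given_no  <- c(job=0.20, marital=0.60, education=0.25, poutcome=0.005)
prior_yes <- 211/2000; prior_no <- 1789/2000

# raw scores: product of the conditionals times the class prior (both are tiny)
score_yes <- prod(p_given_yes) * prior_yes
score_no  <- prod(p_given_no)  * prior_no

# log scores avoid numerical underflow when there are many attributes
log_yes <- sum(log(p_given_yes)) + log(prior_yes)
log_no  <- sum(log(p_given_no))  + log(prior_no)
ifelse(log_yes > log_no, "subscribed = yes", "subscribed = no")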

  22. Smoothing • A smoothing technique assigns a small nonzero probability to rare events that are missing in the training data • E.g., Laplace smoothing assumes every outcome occurs once more than it actually does in the dataset • Smoothing is essential – without it, a single zero conditional probability forces P(cj|A) = 0
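
A minimal sketch of Laplace (add-one) smoothing for one conditional probability; the counts are assumed for illustration:

# P(poutcome = "success" | subscribed = yes) with add-one smoothing
count_value <- 0        # "success" never co-occurs with yes in this (assumed) training set
count_class <- 211      # records with subscribed = yes
n_levels    <- 4        # distinct values of poutcome

unsmoothed <- count_value / count_class                     # 0: would zero out the whole product
smoothed   <- (count_value + 1) / (count_class + n_levels)  # small nonzero probability
smoothed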

  23. Diagnostics • Naïve Bayes advantages • Handles missing values • Robust to irrelevant variables • Simple to implement • Computationally efficient • Handles high-dimensional data efficiently • Often competitive with other learning algorithms • Reasonably resistant to overfitting • Naïve Bayes disadvantages • Assumes variables are conditionally independent • Therefore, sensitive to double counting correlated variables • In its simplest form, used only for categorical variables

  24. Naïve Bayes in R • This section explores two methods of using the naïve Bayes Classifier • Manually compute probabilities from scratch • Tedious with many R calculations • Use naïve Bayes function from e1071 package • Much easier – starts on page 222 • Example: subscribing to term deposit

  25. Naïve Bayes in R • Get data and e1071 package
  > setwd("c:/data/rstudio/chapter07")
  > sample <- read.table("sample1.csv", header=TRUE, sep=",")
  > traindata <- as.data.frame(sample[1:14,])
  > testdata <- as.data.frame(sample[15,])
  > traindata   # lists train data
  > testdata    # lists test data, no Enrolls variable
  > install.packages("e1071", dep = TRUE)
  > library(e1071)   # contains the naiveBayes function

  26. Naïve Bayes in R • Perform modeling
  > model <- naiveBayes(Enrolls ~ Age + Income + JobSatisfaction + Desire, traindata)
  > model                                # generates model output
  > results <- predict(model, testdata)
  > results                              # provides test prediction
  Using a Laplace parameter gives the same result
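
For the Laplace remark above, a sketch of rerunning the same model with e1071’s laplace argument (the value 1 is chosen here for illustration):

> model_l <- naiveBayes(Enrolls ~ Age + Income + JobSatisfaction + Desire, traindata, laplace = 1)
> predict(model_l, testdata)   # same predicted class as before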

  27. Diagnostics of Classifiers • The book covered three classifiers • Logistic regression, decision trees, naïve Bayes • Tools to evaluate classifier performance • Confusion matrix

  28. Diagnostics of Classifiers • Bank marketing example • Training set of 2000 records • Test set of 100 records, evaluated below

  29. Diagnostics of Classifiers • Evaluation metrics
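
The metric formulas on this slide are images in the original deck; a sketch of the standard definitions, computed in R from an assumed confusion matrix (the TP, TN, FP, FN counts are illustrative):

# confusion-matrix counts (assumed for illustration; 100 test records)
TP <- 3; TN <- 90; FP <- 4; FN <- 3

accuracy  <- (TP + TN) / (TP + TN + FP + FN)   # fraction of correct predictions
TPR       <- TP / (TP + FN)                    # true positive rate (recall)
FPR       <- FP / (FP + TN)                    # false positive rate
FNR       <- FN / (TP + FN)                    # false negative rate
precision <- TP / (TP + FP)                    # fraction of predicted positives that are correct
c(accuracy=accuracy, TPR=TPR, FPR=FPR, FNR=FNR, precision=precision)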

  30. Diagnostics of Classifiers • Evaluation metrics on the bank marketing 100-record test set • Two of the metrics in the table are flagged as poor

  31. Diagnostics of Classifiers • ROC curve: good for evaluating binary detection • Bank marketing: 2000 training set + 100 test set
  > library(ROCR)   # provides prediction() and performance()
  > banktrain <- read.table("bank-sample.csv", header=TRUE, sep=",")
  > drops <- c("balance", "day", "campaign", "pdays", "previous", "month")
  > banktrain <- banktrain[, !(names(banktrain) %in% drops)]
  > banktest <- read.table("bank-sample-test.csv", header=TRUE, sep=",")
  > banktest <- banktest[, !(names(banktest) %in% drops)]
  > nb_model <- naiveBayes(subscribed ~ ., data=banktrain)
  > nb_prediction <- predict(nb_model, banktest[, -ncol(banktest)], type='raw')
  > score <- nb_prediction[, c("yes")]
  > actual_class <- banktest$subscribed == 'yes'
  > pred <- prediction(score, actual_class)
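
A sketch of how the ROC curve and AUC shown on the next slide can be produced from pred using ROCR’s performance(); the exact plotting options in the book may differ:

> perf <- performance(pred, "tpr", "fpr")   # ROC: true positive rate vs false positive rate
> plot(perf, lwd=2)                         # draw the ROC curve
> auc <- performance(pred, "auc")           # area under the curve
> unlist(auc@y.values)                      # AUC as a single number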

  32. Diagnostics of Classifiers • ROC curve: good for evaluating binary detection • Bank marketing: 2000 training set + 100 test set

  33. Additional Classification Methods • Ensemble methods that use multiple models • Bagging: bootstrap method that uses repeated sampling with replacement • Boosting: similar to bagging but an iterative procedure • Random forest: uses an ensemble of decision trees (see the sketch below) • These models usually perform better than a single decision tree • Support Vector Machine (SVM) • Linear model defined by a small number of support vectors
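
A minimal sketch of one ensemble method above, assuming the randomForest package and the same banktrain data frame loaded on slide 31 (not part of the original deck):

library(randomForest)
# ensure categorical columns are factors (required for classification)
banktrain[] <- lapply(banktrain, function(x) if (is.character(x)) as.factor(x) else x)
rf_model <- randomForest(subscribed ~ ., data = banktrain, ntree = 500)
rf_model                 # out-of-bag error estimate and confusion matrix
importance(rf_model)     # variable importance across the ensemble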

  34. Summary • How to choose a suitable classifier among • Decision trees, naïve Bayes, & logistic regression
