
Data Mining CSCI 307, Spring 2019 Lecture 14

This lecture discusses how to classify new data with numeric attributes using the Naive Bayes algorithm. It covers calculating probabilities with mean and standard deviation, using the normal probability distribution, and constructing decision trees.


Presentation Transcript


  1. Data Mining, CSCI 307, Spring 2019, Lecture 14: Naive Bayes

  2. We Want to Classify a New Day New instance: Outlook = Sunny, Temperature = 66, Humidity = 90, Windy = True, Play = ? • But what do we do when the data is numeric? • We cannot just "count" up occurrences of a particular numeric value to calculate probabilities. • Instead, we calculate the mean (the average) and the standard deviation (a measure of how spread out the data is: a low standard deviation means the data points are close to the mean, a high standard deviation means they are spread out) and then... (see the sketch below for this first step)
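As a minimal sketch of that first step (assuming Python's standard statistics module, and using the nine temperature values recorded on "play = yes" days from the statistics slide below), the mean and standard deviation can be computed like this:

```python
import statistics

# Temperature readings on the nine "play = yes" days (values from the statistics slide)
temps_yes = [64, 68, 69, 70, 72, 75, 75, 81, 83]

mean = statistics.mean(temps_yes)    # sample mean
stdev = statistics.stdev(temps_yes)  # sample standard deviation (n - 1 in the denominator)

print(f"mean = {mean:.1f}, std dev = {stdev:.1f}")   # mean = 73.0, std dev = 6.2
```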

  3. Numeric Attributes Usual assumption: attributes have a normal (Gaussian) probability distribution, given the class. The probability density function for the normal distribution is defined by two parameters: the sample mean m and the standard deviation s. Then the density function f(x) is f(x) = (1 / (s·√(2π))) · e^(−(x − m)² / (2s²))
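As a minimal sketch (plain Python with the math module; the function name gaussian_pdf is my own), the density function can be coded directly from that definition:

```python
import math

def gaussian_pdf(x, m, s):
    """Normal (Gaussian) density at x, given sample mean m and standard deviation s."""
    return (1.0 / (s * math.sqrt(2 * math.pi))) * math.exp(-((x - m) ** 2) / (2 * s ** 2))

# Example: density of temperature 66 given the "yes" class statistics from the next slide
print(gaussian_pdf(66, 73, 6.2))   # roughly 0.034
```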

  4. Statistics for Weather Data http://easycalculation.com/statistics/normal-pdf.php
  Outlook: Sunny 2 yes / 3 no; Overcast 4 yes / 0 no; Rainy 3 yes / 2 no → Sunny 2/9, 3/5; Overcast 4/9, 0/5; Rainy 3/9, 2/5
  Temperature (yes): 64, 68, 69, 70, 72, 75, 75, 81, 83 → m=73, s=6.2; (no): 65, 71, 72, 80, 85 → m=74.6, s=7.9
  Humidity (yes): 65, 70, 70, 75, 80, 80, 86, 90, 96 → m=79.1, s=10.2; (no): 70, 85, 90, 91, 95 → m=86.2, s=9.7
  Windy: False 6 yes / 2 no, True 3 yes / 3 no → False 6/9, 2/5; True 3/9, 3/5
  Play: yes 9/14, no 5/14
  f[temperature=66|yes] =   f[humidity=90|yes] =
  f[temperature=66|no] =    f[humidity=90|no] =
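The four blanks can be filled in with the same Gaussian density helper sketched above, plugging in the per-class means and standard deviations from the table; the values in the comments are simply what that formula gives, rounded.

```python
import math

def gaussian_pdf(x, m, s):   # same helper as in the earlier sketch
    return math.exp(-((x - m) ** 2) / (2 * s ** 2)) / (s * math.sqrt(2 * math.pi))

print(gaussian_pdf(66, 73.0, 6.2))    # f[temperature=66|yes] ~ 0.0340
print(gaussian_pdf(90, 79.1, 10.2))   # f[humidity=90|yes]    ~ 0.0221
print(gaussian_pdf(66, 74.6, 7.9))    # f[temperature=66|no]  ~ 0.0279
print(gaussian_pdf(90, 86.2, 9.7))    # f[humidity=90|no]     ~ 0.0381
```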

  5. PDF Calculator http://easycalculation.com/statistics/normal-pdf.php m = 73, s = 6.2. What is the PDF?

  6. Classifying a New Day New instance: Outlook = Sunny, Temperature = 66, Humidity = 90, Windy = True, Play = ?
  Likelihood of "yes" =
  Likelihood of "no" =
  P("yes") =
  P("no") =
  Note: Missing values during training are not included in the calculation of the mean and standard deviation.
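As a hedged sketch of how the pieces combine (the density values are the rounded ones from the previous slides, so the final percentages are approximate): multiply the class prior by each per-attribute conditional probability or density, then normalize so the two results sum to 1.

```python
# New day: Outlook=Sunny, Temperature=66, Humidity=90, Windy=True
# likelihood = P(outlook|class) * f(temperature|class) * f(humidity|class) * P(windy|class) * P(class)
like_yes = (2/9) * 0.0340 * 0.0221 * (3/9) * (9/14)
like_no  = (3/5) * 0.0279 * 0.0381 * (3/5) * (5/14)

# Normalize the likelihoods into probabilities
p_yes = like_yes / (like_yes + like_no)
p_no  = like_no  / (like_yes + like_no)
print(f"P(yes) = {p_yes:.2f}, P(no) = {p_no:.2f}")   # roughly 0.21 and 0.79 with these rounded densities
```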

  7. Naive Bayes: Discussion • Naive Bayes works surprisingly well (even if the independence assumption is clearly violated) • Why? Because classification doesn't require accurate probability estimates as long as the maximum probability is assigned to the correct class • However: adding too many redundant attributes will cause problems (e.g. identical attributes) • Note also: many numeric attributes are not normally distributed; if we know the distribution for an attribute, it can be used, otherwise --> kernel density estimators

  8. Constructing Decision Trees • Strategy: top down, in a recursive divide-and-conquer fashion • First: select an attribute for the root node and create a branch for each possible attribute value • Then: split the instances into subsets, one for each branch extending from the node • Finally: repeat recursively for each branch, using only the instances that reach the branch • Stop if all instances have the same class (a recursive sketch follows below)
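As a rough illustration of this top-down, divide-and-conquer strategy (a sketch only, not the exact algorithm used later in the course; choose_best_attribute is a stand-in for the selection criterion discussed on the next slides), the recursion might look like this for nominal attributes:

```python
def choose_best_attribute(instances, attributes):
    # Placeholder criterion; information gain (next slides) would go here.
    return attributes[0]

def build_tree(instances, attributes):
    """Top-down, recursive divide-and-conquer construction of a decision tree (sketch)."""
    classes = [inst["class"] for inst in instances]
    if len(set(classes)) == 1:          # stop: all instances have the same class
        return classes[0]               # leaf labeled with that class
    if not attributes:                  # no attributes left: majority-class leaf
        return max(set(classes), key=classes.count)

    best = choose_best_attribute(instances, attributes)
    tree = {best: {}}
    for value in {inst[best] for inst in instances}:        # one branch per attribute value
        subset = [inst for inst in instances if inst[best] == value]
        remaining = [a for a in attributes if a != best]
        tree[best][value] = build_tree(subset, remaining)   # recurse with only the instances that reach the branch
    return tree
```

Each instance here is assumed to be a dictionary mapping attribute names (plus a "class" key) to values; called on the nominal weather attributes, it returns nested dictionaries of attribute tests ending in class labels.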

  9. Which Attribute to Select?

  10. Criterion for Attribute Selection Which is the best attribute? • Want to get the smallest tree • Heuristic: choose the attribute that produces the “purest” nodes • Popular impurity criterion: information gain • Information gain increases with the average purity of the subsets • Strategy: choose attribute that gives greatest information gain

  11. Information Theory: Measure Information in Bits Information gain = amount of information gained by knowing the value of the attribute = (entropy of distribution before the split) − (entropy of distribution after it) Claude Shannon, American mathematician and scientist (1916–2001), came up with the whole idea of information theory and quantifying entropy, which measures information in bits. He could ride a unicycle and juggle clubs at the same time -- when he was in his 80s. That's pretty impressive. He was living in Massachusetts when he died of Alzheimer's disease.

  12. Computing Information Measure information in bits • Given a probability distribution, the information required to predict an event is the distribution's entropy • Entropy gives the information required in bits (it can involve fractions of a bit) • Formula for computing the entropy: entropy(p1, p2, ..., pn) = −p1 log2 p1 − p2 log2 p2 − ... − pn log2 pn Example: Weather Data. What do we know before we split? There are 9 yes and 5 no outcomes. Calculate the information: info([9,5]) = entropy(9/14, 5/14) = ? (see the sketch below)
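A minimal sketch of that computation (plain Python, math module only); with 9 yes and 5 no outcomes the entropy works out to roughly 0.940 bits:

```python
import math

def entropy(*probs):
    """entropy(p1, ..., pn) = -p1*log2(p1) - ... - pn*log2(pn), skipping zero probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Information before any split: 9 "yes" and 5 "no" outcomes
print(entropy(9/14, 5/14))   # info([9,5]) ~ 0.940 bits
```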
