
Data Mining Classification: Alternative Techniques


Presentation Transcript


  1. Data Mining Classification: Alternative Techniques Bayesian Classifiers

  2. Bayes Classifier • A probabilistic framework for solving classification problems • Conditional probability: P(Y|X) = P(X, Y) / P(X) • Bayes theorem: P(Y|X) = P(X|Y) P(Y) / P(X)

  3. Example of Bayes Theorem • Given: • A doctor knows that meningitis causes stiff neck 50% of the time • Prior probability of any patient having meningitis is 1/50,000 • Prior probability of any patient having stiff neck is 1/20 • If a patient has stiff neck, what’s the probability he/she has meningitis?
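
The question on this slide can be answered directly with Bayes theorem. A minimal sketch of the arithmetic, using the 0.5, 1/50,000, and 1/20 figures quoted above:

```python
# Bayes theorem: P(M | S) = P(S | M) * P(M) / P(S)
p_s_given_m = 0.5        # P(stiff neck | meningitis)
p_m = 1 / 50000          # prior P(meningitis)
p_s = 1 / 20             # prior P(stiff neck)

p_m_given_s = p_s_given_m * p_m / p_s
print(p_m_given_s)       # 0.0002
```

So even with a stiff neck, the posterior probability of meningitis is only 0.0002, because the prior is so small.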

  4. Using Bayes Theorem for Classification • Consider each attribute and class label as random variables • Given a record with attributes (X1, X2,…, Xd) • Goal is to predict class Y • Specifically, we want to find the value of Y that maximizes P(Y| X1, X2,…, Xd ) • Can we estimate P(Y| X1, X2,…, Xd ) directly from data?

  5. Using Bayes Theorem for Classification • Approach: • compute posterior probability P(Y | X1, X2, …, Xd) using the Bayes theorem • Maximum a-posteriori: Choose Y that maximizes P(Y | X1, X2, …, Xd) • Equivalent to choosing value of Y that maximizes P(X1, X2, …, Xd|Y) P(Y) • How to estimate P(X1, X2, …, Xd | Y )?
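
The equivalence claimed in the last two bullets follows directly from Bayes theorem; a short derivation in standard notation (a sketch, not reproduced from the slides):

```latex
P(Y \mid X_1, \dots, X_d) = \frac{P(X_1, \dots, X_d \mid Y)\, P(Y)}{P(X_1, \dots, X_d)}
\quad\Rightarrow\quad
\arg\max_{Y} P(Y \mid X_1, \dots, X_d) = \arg\max_{Y} P(X_1, \dots, X_d \mid Y)\, P(Y)
```

The denominator P(X1, X2, …, Xd) is the same for every value of Y, so it can be dropped when maximizing over Y.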

  6. Naïve Bayes Classifier • Assume independence among attributes Xi when class is given: • P(X1, X2, …, Xd |Yj) = P(X1| Yj) P(X2| Yj) … P(Xd| Yj) • Can estimate P(Xi| Yj) for all Xi and Yj from data • New point is classified to Yj if P(Yj) ∏i P(Xi| Yj) is maximal.
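
A minimal sketch of this decision rule, assuming the class priors and per-attribute conditional probabilities have already been estimated (the dictionary layout below is a hypothetical choice, not something prescribed by the slides):

```python
import math

def naive_bayes_predict(x, priors, cond_probs):
    """Return the class Yj maximizing P(Yj) * prod_i P(xi | Yj).

    priors:     {class: P(class)}
    cond_probs: {class: {attribute: {value: P(value | class)}}}
    x:          {attribute: value} record to classify
    """
    best_class, best_score = None, -math.inf
    for y, prior in priors.items():
        # Sum of logs instead of product of probabilities avoids underflow
        score = math.log(prior)
        for attr, value in x.items():
            p = cond_probs[y][attr].get(value, 0.0)
            score += math.log(p) if p > 0 else -math.inf
        if best_class is None or score > best_score:
            best_class, best_score = y, score
    return best_class
```

Working in log space is only a numerical convenience; the maximizing class is the same as for the product on the slide.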

  7. Conditional Independence • X and Y are conditionally independent given Z if P(X | Y, Z) = P(X | Z) • Example: Arm length and reading skills • A young child has shorter arms and more limited reading skills than an adult • If age is fixed, there is no apparent relationship between arm length and reading skills • Arm length and reading skills are conditionally independent given age

  8. Estimate Probabilities from Data • Class prior: P(Y) = Nc/N • e.g., P(No) = 7/10, P(Yes) = 3/10 • For discrete attributes: P(Xi | Yk) = |Xik| / Nc • where |Xik| is the number of instances having attribute value Xi and belonging to class Yk • Examples: P(Status=Married|No) = 4/7, P(Refund=Yes|Yes) = 0
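
A minimal sketch of these frequency estimates, using a tiny hypothetical training list in place of the slide's table (so the counts here will not reproduce the 7/10 and 4/7 figures):

```python
from collections import Counter, defaultdict

records = [  # (Refund, Status, Class) -- hypothetical toy data
    ("Yes", "Single", "No"), ("No", "Married", "No"),
    ("No", "Single", "Yes"), ("No", "Married", "No"),
]

# Class priors: P(Y) = Nc / N
class_counts = Counter(c for *_, c in records)
n = len(records)
priors = {c: cnt / n for c, cnt in class_counts.items()}

# Discrete attribute: P(Status = s | Y = c) = |Xik| / Nc
status_counts = defaultdict(Counter)
for refund, status, c in records:
    status_counts[c][status] += 1
p_married_given_no = status_counts["No"]["Married"] / class_counts["No"]
print(priors, p_married_given_no)
```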

  9. Estimate Probabilities from Data • For continuous attributes: • Discretization: Partition the range into bins • Replace continuous value with bin value • Attribute changed from continuous to ordinal • Probability density estimation: • Assume attribute follows a normal distribution • Use data to estimate parameters of distribution (e.g., mean and standard deviation) • Once probability distribution is known, use it to estimate the conditional probability P(Xi|Y)

  10. Estimate Probabilities from Data • Normal distribution: P(Xi | Yj) = (1 / √(2πσij²)) exp( −(Xi − μij)² / (2σij²) ) • One distribution for each (Xi, Yj) pair • For (Income, Class=No): • sample mean = 110 • sample variance = 2975
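
A minimal sketch of this density estimate using the slide's numbers (mean 110, variance 2975 for Income given Class=No); evaluating it at Income = 120K reproduces the 0.0072 used on the next slide:

```python
import math

def gaussian_density(x, mean, var):
    """Normal density used as the estimate of P(Xi = x | Yj)."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Income given Class=No: sample mean = 110, sample variance = 2975 (from the slide)
print(gaussian_density(120, 110, 2975))   # ~0.0072
```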

  11. Example of Naïve Bayes Classifier Given a test record X = (Refund=No, Status=Married, Income=120K): • P(X|Class=No) = P(Refund=No|Class=No) × P(Married|Class=No) × P(Income=120K|Class=No) = 4/7 × 4/7 × 0.0072 = 0.0024 • P(X|Class=Yes) = P(Refund=No|Class=Yes) × P(Married|Class=Yes) × P(Income=120K|Class=Yes) = 1 × 0 × 1.2 × 10⁻⁹ = 0 • Since P(X|No)P(No) > P(X|Yes)P(Yes), we have P(No|X) > P(Yes|X) => Class = No
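
A minimal sketch of the comparison on this slide, plugging in the conditional probabilities quoted above (the priors P(No) = 7/10 and P(Yes) = 3/10 come from slide 8):

```python
p_no, p_yes = 7 / 10, 3 / 10                       # class priors (slide 8)

# P(X | Class) for X = (Refund=No, Married, Income=120K)
p_x_given_no  = (4 / 7) * (4 / 7) * 0.0072         # ~0.0024
p_x_given_yes = 1 * 0 * 1.2e-9                     # = 0

# Compare the un-normalized posteriors P(X | Y) * P(Y)
print(p_x_given_no * p_no, p_x_given_yes * p_yes)  # ~0.00165 > 0 -> Class = No
```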

  12. Naïve Bayes Classifier • If one of the conditional probabilities is zero, then the entire expression becomes zero • Probability estimation: • Original: P(Ai|C) = Nic / Nc • Laplace: P(Ai|C) = (Nic + 1) / (Nc + c) • m-estimate: P(Ai|C) = (Nic + m p) / (Nc + m) • where c: number of classes, p: prior probability of the class, m: parameter, Nc: number of instances in the class, Nic: number of instances having attribute value Ai in class c
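
A minimal sketch of the two smoothed estimates, assuming the Laplace and m-estimate formulas reconstructed above (variable names mirror the slide's Nic, Nc, c, p, m; the example numbers reuse the zero count P(Refund=Yes|Yes) = 0/3 from slide 8):

```python
def laplace_estimate(n_ic, n_c, num_classes):
    """Laplace-smoothed estimate: P(Ai | C) = (Nic + 1) / (Nc + c)."""
    return (n_ic + 1) / (n_c + num_classes)

def m_estimate(n_ic, n_c, prior_p, m):
    """m-estimate: P(Ai | C) = (Nic + m*p) / (Nc + m)."""
    return (n_ic + m * prior_p) / (n_c + m)

# P(Refund=Yes | Yes) had Nic = 0, Nc = 3, so the raw estimate is zero
print(laplace_estimate(0, 3, num_classes=2))   # 1/5 = 0.2
print(m_estimate(0, 3, prior_p=0.5, m=3))      # 1.5/6 = 0.25 (p and m hypothetical)
```

Either correction keeps a single unseen attribute value from forcing the whole product to zero.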

  13. Example of Naïve Bayes Classifier • A: attributes • M: mammals • N: non-mammals • Since P(A|M)P(M) > P(A|N)P(N) => Mammals

  14. Naïve Bayes (Summary) • Robust to isolated noise points • Handle missing values by ignoring the instance during probability estimate calculations • Robust to irrelevant attributes • Independence assumption may not hold for some attributes • Use other techniques such as Bayesian Belief Networks (BBN)

  15. Bayesian Belief Networks • Provides graphical representation of probabilistic relationships among a set of random variables • Consists of: • A directed acyclic graph (DAG) • Node corresponds to a variable • Arc corresponds to a dependence relationship between a pair of variables • A probability table associating each node with its immediate parents

  16. Conditional Independence • A node in a Bayesian network is conditionally independent of all of its non-descendants, if its parents are known • In the example graph: D is a parent of C, A is a child of C, B is a descendant of D, and D is an ancestor of A

  17. Conditional Independence • Naïve Bayes assumption: P(X1, X2, …, Xd | Y) = P(X1|Y) P(X2|Y) … P(Xd|Y), i.e., each attribute Xi is conditionally independent of the others given the class Y

  18. Probability Tables • If X does not have any parents, table contains prior probability P(X) • If X has only one parent (Y), table contains conditional probability P(X|Y) • If X has multiple parents (Y1, Y2,…, Yk), table contains conditional probability P(X|Y1, Y2,…, Yk)
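
A minimal sketch of how these tables can be stored, using nested dictionaries keyed by the parents' values (the variable names E, D, HD, CP echo the inference example two slides below; the root-node priors are hypothetical, the conditional values are taken from slide 20):

```python
# Root node (no parents): prior probability table P(E)
p_exercise = {"Yes": 0.7, "No": 0.3}            # hypothetical priors

# Node with one parent: conditional table P(CP | HD)
p_cp_given_hd = {
    "Yes": {"Yes": 0.8, "No": 0.2},             # P(CP | HD=Yes), from slide 20
    "No":  {"Yes": 0.01, "No": 0.99},           # P(CP | HD=No),  from slide 20
}

# Node with two parents: conditional table P(HD | E, D), keyed by (E, D)
p_hd_given_e_d = {
    ("No", "Yes"): {"Yes": 0.55, "No": 0.45},   # from slide 20
    # ... remaining parent-value combinations filled in the same way
}
```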

  19. Example of Bayesian Belief Network

  20. Example of Inferencing using BBN • Given: X = (E=No, D=Yes, CP=Yes, BP=High) • Compute P(HD | E, D, CP, BP) • P(HD=Yes|E=No,D=Yes) = 0.55, P(CP=Yes|HD=Yes) = 0.8, P(BP=High|HD=Yes) = 0.85 • P(HD=Yes|E=No,D=Yes,CP=Yes,BP=High) ∝ 0.55 × 0.8 × 0.85 = 0.374 • P(HD=No|E=No,D=Yes) = 0.45, P(CP=Yes|HD=No) = 0.01, P(BP=High|HD=No) = 0.2 • P(HD=No|E=No,D=Yes,CP=Yes,BP=High) ∝ 0.45 × 0.01 × 0.2 = 0.0009 • Classify X as Yes
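
A minimal sketch of this calculation, reusing the conditional probabilities quoted on the slide and normalizing the two un-normalized scores to obtain actual posteriors:

```python
# Un-normalized scores for HD=Yes and HD=No given E=No, D=Yes, CP=Yes, BP=High
score_yes = 0.55 * 0.80 * 0.85   # P(HD=Yes|E,D) * P(CP=Yes|HD=Yes) * P(BP=High|HD=Yes)
score_no  = 0.45 * 0.01 * 0.20   # P(HD=No|E,D)  * P(CP=Yes|HD=No)  * P(BP=High|HD=No)

total = score_yes + score_no
print(score_yes, score_no)       # 0.374 and 0.0009
print(score_yes / total)         # ~0.998 -> classify X as HD=Yes
```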
