Data Mining Classification: Alternative Techniques Bayesian Classifiers
Bayes Classifier • A probabilistic framework for solving classification problems • Conditional probability: P(Y|X) = P(X,Y) / P(X) and P(X|Y) = P(X,Y) / P(Y) • Bayes theorem: P(Y|X) = P(X|Y) P(Y) / P(X)
Example of Bayes Theorem • Given: • A doctor knows that meningitis causes stiff neck 50% of the time • Prior probability of any patient having meningitis is 1/50,000 • Prior probability of any patient having stiff neck is 1/20 • If a patient has stiff neck, what’s the probability he/she has meningitis?
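Plugging the given numbers into Bayes theorem (with M = meningitis, S = stiff neck):

```latex
P(M \mid S) = \frac{P(S \mid M)\,P(M)}{P(S)}
            = \frac{0.5 \times (1/50000)}{1/20}
            = 0.0002
```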
Using Bayes Theorem for Classification • Consider each attribute and class label as random variables • Given a record with attributes (X1, X2,…, Xd) • Goal is to predict class Y • Specifically, we want to find the value of Y that maximizes P(Y| X1, X2,…, Xd ) • Can we estimate P(Y| X1, X2,…, Xd ) directly from data?
Using Bayes Theorem for Classification • Approach: • Compute the posterior probability P(Y | X1, X2, …, Xd) using Bayes theorem • Maximum a posteriori: choose the Y that maximizes P(Y | X1, X2, …, Xd) • Equivalent to choosing the value of Y that maximizes P(X1, X2, …, Xd | Y) P(Y) • How to estimate P(X1, X2, …, Xd | Y)?
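The equivalence holds because the evidence term in Bayes theorem does not depend on Y, so it can be dropped when maximizing:

```latex
P(Y \mid X_1, \ldots, X_d)
  = \frac{P(X_1, \ldots, X_d \mid Y)\, P(Y)}{P(X_1, \ldots, X_d)}
  \;\propto\; P(X_1, \ldots, X_d \mid Y)\, P(Y)
```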
Naïve Bayes Classifier • Assume independence among the attributes Xi when the class is given: • P(X1, X2, …, Xd | Yj) = P(X1 | Yj) P(X2 | Yj) … P(Xd | Yj) • Can estimate P(Xi | Yj) for all Xi and Yj from data • A new point is classified as Yj if P(Yj) ∏i P(Xi | Yj) is maximal (see the sketch below)
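A minimal sketch of the resulting decision rule, assuming the priors and conditional tables have already been estimated (the dictionary layouts below are our own convention, not from the slides):

```python
def naive_bayes_classify(x, priors, cond):
    """Return the class Yj that maximizes P(Yj) * prod_i P(xi | Yj).

    x      : dict mapping attribute -> observed value
    priors : dict mapping class -> P(class)
    cond   : dict mapping (attribute, value, class) -> P(value | class)
    """
    best_class, best_score = None, -1.0
    for y, prior in priors.items():
        score = prior
        for attr, val in x.items():
            # Conditional probability P(Xi = val | Y = y); unseen
            # (attribute, value, class) combinations default to 0.
            score *= cond.get((attr, val, y), 0.0)
        if score > best_score:
            best_class, best_score = y, score
    return best_class
```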
Conditional Independence • X and Y are conditionally independent given Z if P(X | Y, Z) = P(X | Z) • Example: arm length and reading skills • A young child has shorter arms and more limited reading skills than an adult • If age is fixed, there is no apparent relationship between arm length and reading skills • Arm length and reading skills are conditionally independent given age
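Equivalently, conditional independence can be stated as a product rule, which is the form the naïve Bayes factorization relies on:

```latex
P(X \mid Y, Z) = P(X \mid Z)
\quad\Longleftrightarrow\quad
P(X, Y \mid Z) = P(X \mid Z)\, P(Y \mid Z)
```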
Estimate Probabilities from Data • Class prior: P(Y) = Nc / N • e.g., P(No) = 7/10, P(Yes) = 3/10 • For discrete attributes: P(Xi | Yk) = |Xik| / Nc • where |Xik| is the number of instances having attribute value Xi and belonging to class Yk • Examples: P(Status=Married|No) = 4/7, P(Refund=Yes|Yes) = 0
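A minimal sketch of these counts; since the slide's data table did not survive extraction, the ten records below are an assumption chosen only to be consistent with the quoted numbers:

```python
from collections import Counter

# Toy (Refund, Marital Status, Class) records, assumed for illustration.
records = [
    ("Yes", "Single",   "No"),  ("No",  "Married", "No"),
    ("No",  "Single",   "No"),  ("Yes", "Married", "No"),
    ("No",  "Divorced", "Yes"), ("No",  "Married", "No"),
    ("Yes", "Divorced", "No"),  ("No",  "Single",  "Yes"),
    ("No",  "Married",  "No"),  ("No",  "Single",  "Yes"),
]

# Class priors: P(Y) = Nc / N
class_counts = Counter(y for _, _, y in records)
n = len(records)
priors = {y: c / n for y, c in class_counts.items()}
print(priors)  # {'No': 0.7, 'Yes': 0.3}

# Discrete conditional: P(Status=Married | No) = |Xik| / Nc
married_no = sum(1 for _, s, y in records if s == "Married" and y == "No")
print(married_no / class_counts["No"])  # 4/7 ≈ 0.571
```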
Estimate Probabilities from Data • For continuous attributes: • Discretization: partition the range into bins and replace each continuous value with its bin value • The attribute is changed from continuous to ordinal • Probability density estimation: • Assume the attribute follows a normal distribution • Use the data to estimate the parameters of the distribution (e.g., mean and standard deviation) • Once the probability distribution is known, use it to estimate the conditional probability P(Xi | Y)
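As a rough illustration of the discretization option, a minimal numpy sketch (the income values and bin edges here are made up for the example):

```python
import numpy as np

# Hypothetical continuous attribute (income in $1000s).
income = np.array([125, 100, 70, 120, 95, 60, 220, 85, 75, 90])

# Replace each continuous value with the index of its bin,
# turning the attribute from continuous into ordinal.
bins = np.array([80, 120, 160])       # edges between the 4 bins
ordinal = np.digitize(income, bins)   # ordered bin indices 0..3
print(ordinal)  # [2 1 0 2 1 0 3 1 0 1]
```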
Estimate Probabilities from Data • Normal distribution: P(Xi | Yj) = 1/√(2πσij²) · exp(−(Xi − μij)² / (2σij²)) • One distribution for each (Xi, Yj) pair • For (Income, Class=No): • sample mean = 110 • sample variance = 2975 • This gives P(Income=120 | No) ≈ 0.0072, the value used in the worked example below
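A quick check of that value in Python (the helper name is ours; it just evaluates the normal density above):

```python
import math

def gaussian_pdf(x, mean, var):
    """Normal density used as the class-conditional estimate P(Xi|Yj)."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# For (Income, Class=No): sample mean = 110, sample variance = 2975.
print(gaussian_pdf(120, 110, 2975))  # ~0.0072
```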
Example of Naïve Bayes Classifier Given a test record: X = (Refund=No, Status=Married, Income=120K) • P(X|Class=No) = P(Refund=No|Class=No) × P(Married|Class=No) × P(Income=120K|Class=No) = 4/7 × 4/7 × 0.0072 = 0.0024 • P(X|Class=Yes) = P(Refund=No|Class=Yes) × P(Married|Class=Yes) × P(Income=120K|Class=Yes) = 1 × 0 × 1.2×10⁻⁹ = 0 Since P(X|No)P(No) = 0.0024 × 0.7 = 0.00168 > P(X|Yes)P(Yes) = 0, we have P(No|X) > P(Yes|X) => Class = No
Naïve Bayes Classifier • If one of the conditional probabilities is zero, then the entire expression becomes zero • Probability estimation: • Original: P(Ai|C) = Nic / Nc • Laplace: P(Ai|C) = (Nic + 1) / (Nc + c) • m-estimate: P(Ai|C) = (Nic + m·p) / (Nc + m) • c: number of classes p: prior probability of the class m: parameter Nc: number of instances in the class Nic: number of instances having attribute value Ai in class c
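A minimal sketch of both smoothed estimators, applied to the zero from the worked example (P(Refund=Yes|Yes) had Nic = 0 and Nc = 3, with c = 2 classes):

```python
def laplace_estimate(n_ic, n_c, c):
    """Laplace smoothing: P(Ai|C) = (Nic + 1) / (Nc + c)."""
    return (n_ic + 1) / (n_c + c)

def m_estimate(n_ic, n_c, m, p):
    """m-estimate: P(Ai|C) = (Nic + m*p) / (Nc + m)."""
    return (n_ic + m * p) / (n_c + m)

# The zero count no longer wipes out the whole product:
print(laplace_estimate(0, 3, 2))       # 0.2
print(m_estimate(0, 3, m=3, p=0.1))    # 0.05 (m and p illustrative)
```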
Example of Naïve Bayes Classifier • A: attributes • M: mammals • N: non-mammals Since P(A|M)P(M) > P(A|N)P(N) => Mammals
Naïve Bayes (Summary) • Robust to isolated noise points • Handles missing values by ignoring the instance during probability estimation • Robust to irrelevant attributes • The independence assumption may not hold for some attributes • In that case, use other techniques such as Bayesian Belief Networks (BBN)
Bayesian Belief Networks • Provide a graphical representation of probabilistic relationships among a set of random variables • Consist of: • A directed acyclic graph (DAG), where each node corresponds to a variable and each arc corresponds to a dependence relationship between a pair of variables • A probability table associating each node with its immediate parents
Conditional Independence • A node in a Bayesian network is conditionally independent of all of its nondescendants, given its parents • Example relationships in the accompanying network diagram: D is a parent of C, A is a child of C, B is a descendant of D, and D is an ancestor of A
Conditional Independence • Naïve Bayes assumption: P(X1, X2, …, Xd | y) = P(X1 | y) P(X2 | y) … P(Xd | y) • This corresponds to the Bayesian network in which the class y is the sole parent of every attribute Xi
Probability Tables • If X does not have any parents, table contains prior probability P(X) • If X has only one parent (Y), table contains conditional probability P(X|Y) • If X has multiple parents (Y1, Y2,…, Yk), table contains conditional probability P(X|Y1, Y2,…, Yk)
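One simple way to store such tables, sketched as plain Python dictionaries keyed by parent values (variable names, and any numbers not quoted elsewhere in these slides, are illustrative):

```python
# Node with no parents: prior table P(X).
p_exercise = {"Yes": 0.7, "No": 0.3}            # illustrative numbers

# Node with one parent Y: table P(X | Y), keyed by the parent's value.
p_cp_given_hd = {"Yes": 0.8, "No": 0.01}        # P(CP=Yes | HD=y)

# Node with multiple parents: keyed by the tuple of parent values.
p_hd_given_e_d = {("No", "Yes"): 0.55,          # P(HD=Yes | E=No, D=Yes)
                  ("Yes", "No"): 0.25}          # illustrative entry
```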
Example of Inferencing using BBN • Given: X = (E=No, D=Yes, CP=Yes, BP=High) • Compute P(HD | E, D, CP, BP) • P(HD=Yes | E=No, D=Yes) = 0.55, P(CP=Yes | HD=Yes) = 0.8, P(BP=High | HD=Yes) = 0.85 • P(HD=Yes | E=No, D=Yes, CP=Yes, BP=High) ∝ 0.55 × 0.8 × 0.85 = 0.374 • P(HD=No | E=No, D=Yes) = 0.45, P(CP=Yes | HD=No) = 0.01, P(BP=High | HD=No) = 0.2 • P(HD=No | E=No, D=Yes, CP=Yes, BP=High) ∝ 0.45 × 0.01 × 0.2 = 0.0009 • Since 0.374 > 0.0009, classify X as HD = Yes (see the check below)
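A quick arithmetic check of the two unnormalized scores (probabilities taken from this slide; the evidence term cancels when comparing):

```python
# Conditional probabilities quoted above, indexed by the HD value.
p_hd = {"Yes": 0.55, "No": 0.45}   # P(HD | E=No, D=Yes)
p_cp = {"Yes": 0.8,  "No": 0.01}   # P(CP=Yes | HD)
p_bp = {"Yes": 0.85, "No": 0.2}    # P(BP=High | HD)

# Unnormalized posterior scores for HD given all the evidence.
scores = {hd: p_hd[hd] * p_cp[hd] * p_bp[hd] for hd in ("Yes", "No")}
print(scores)  # {'Yes': ~0.374, 'No': ~0.0009}

# Classify by the larger score (the normalizing constant cancels).
print(max(scores, key=scores.get))  # Yes
```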