Robust Bayesian Classifier Presented by Chandrasekhar Jakkampudi
Classification Classification consists of assigning a class label to a set of unclassified cases. 1. Supervised Classification: the set of possible classes is known in advance. 2. Unsupervised Classification: the set of possible classes is not known in advance; after classification we can try to assign a name to each discovered class. Unsupervised classification is also called clustering.
Supervised Classification • The input data, also called the training set, consists of multiple records each having multiple attributes or features. • Each record is tagged with a class label. • The objective of classification is to analyze the input data and to develop an accurate description or model for each class using the features present in the data. • This model is used to classify test data for which the class descriptions are not known. (1)
Bayesian Classifier • Assumptions : • The classes are mutually exclusive and exhaustive. • The attributes are independent given the class. • Called “Naïve” classifier because of these assumptions. • Empirically proven to be useful. • Scales very well.
Bayesian Classifier A Bayesian classifier is defined by a set C of classes and a set A of attributes. A generic class belonging to C is denoted by cj and a generic attribute belonging to A by Ai. Consider a database D of cases, each consisting of a set of attribute values together with the case's class label. Training the Bayesian classifier consists of estimating the conditional probability distribution of each attribute given the class.
Bayesian Classifier Let n(aik|cj) be the number of cases in which Ai appears with value aik and the class is cj. Then p(aik|cj) = n(aik|cj)/n(cj) Also p(cj) = n(cj)/n These are only estimates based on frequency. To incorporate our prior belief about p(aik|cj) we add αj imaginary cases with class cj, of which αjk is the number of imaginary cases in which Ai appears with value aik and the class is cj.
Bayesian Classifier Thus p(aik|cj) = (αjk + n(aik|cj))/(αj + n(cj)) Also p(cj) = (αj + n(cj))/(α + n) where α is the prior global precision. Once the training (estimation of the conditional probability distribution of each attribute given the class) is complete, we can classify new cases. To find p(cj|ek) we begin by calculating p(cj|a1k) = p(a1k|cj)p(cj)/Σh p(a1k|ch)p(ch) p(cj|a1k, a2k) = p(a2k|cj)p(cj|a1k)/Σh p(a2k|ch)p(ch|a1k) and so on.
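The smoothed estimation and sequential-update classification described above can be sketched in Python. This is an illustrative sketch, not the authors' code: the function names and the even split of the class prior αj across the k values of each attribute are assumptions.

```python
from collections import Counter, defaultdict

def train(records, labels, alpha=1.0):
    """Estimate p(cj) and p(aik|cj) with 'alpha' imaginary cases per class
    (Dirichlet smoothing), following the formulas above."""
    n = len(records)
    classes = sorted(set(labels))
    class_counts = Counter(labels)
    counts = defaultdict(Counter)   # (attribute index, class) -> value counts
    values = defaultdict(set)       # attribute index -> observed values
    for rec, c in zip(records, labels):
        for i, v in enumerate(rec):
            counts[(i, c)][v] += 1
            values[i].add(v)

    def p_class(c):
        # p(cj) = (alpha_j + n(cj)) / (alpha + n), with alpha split evenly
        return (alpha + class_counts[c]) / (alpha * len(classes) + n)

    def p_attr(i, v, c):
        # p(aik|cj) = (alpha_jk + n(aik|cj)) / (alpha_j + n(cj)),
        # with alpha_j split evenly over the k observed values of Ai
        k = len(values[i])
        return (alpha / k + counts[(i, c)][v]) / (alpha + class_counts[c])

    return classes, p_class, p_attr

def classify(case, classes, p_class, p_attr):
    """Sequentially update p(cj | a1k, ..., aik) one attribute at a time."""
    post = {c: p_class(c) for c in classes}
    for i, v in enumerate(case):
        post = {c: p_attr(i, v, c) * post[c] for c in classes}
        z = sum(post.values())        # normalising sum over all classes h
        post = {c: post[c] / z for c in classes}
    return max(post, key=post.get)
```

Because each attribute's likelihood multiplies in independently, the update order does not change the final posterior, which is what the "and so on" chain on this slide exploits.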
Bayesian Classifier • Works well with complete databases. • Methods exist to classify incomplete databases. • Examples include the EM algorithm, Gibbs sampling, Bound and Collapse (BC), and the Robust Bayesian Classifier.
Robust Bayesian Classifier Incomplete databases seriously compromise the computational efficiency of Bayesian classifiers. One approach is to throw away all the incomplete entries. Another approach is to try to complete the database by allowing the user to specify the pattern of the data. Robust Bayesian Classifier makes no assumption about the nature of the data. It provides probability intervals that contain estimates learned from all possible completions of the database.
Training • We need to estimate the conditional probability p(aik|cj). • There are three types of incomplete cases: • Ai is missing. • C is missing. • Both are missing. • Consider the case where the value of Ai is not known. • Fill in all the missing values of Ai with aik and calculate pmax(aik|cj). • Fill in none of the missing values of Ai with aik and calculate pmin(aik|cj). • The actual value of p(aik|cj) lies somewhere between these two extremes.
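For the case where only Ai is missing (the class is observed), the two extreme completions above give a simple closed-form interval. A hypothetical Python sketch, where the function name and the prior hyperparameters are assumptions:

```python
def p_interval(n_ik, n_j, m, alpha_jk=1.0, alpha_j=2.0):
    """Interval for p(aik|cj) when m cases with class cj have Ai missing.

    n_ik     -- complete cases with Ai = aik and class cj
    n_j      -- all cases with class cj, including the m incomplete ones
    alpha_*  -- illustrative prior counts (these hyperparameters are assumptions)
    """
    p_min = (alpha_jk + n_ik) / (alpha_j + n_j)       # none of the m completed as aik
    p_max = (alpha_jk + n_ik + m) / (alpha_j + n_j)   # all of the m completed as aik
    return p_min, p_max
```

For example, with 3 of 10 class-cj cases showing value aik and 2 of them missing Ai, the interval under these priors is (4/12, 6/12); as m grows the interval widens, reflecting greater uncertainty.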
Prediction Prediction involves computing p(cj|ek). Since we now have an interval for p(aik|cj), we calculate pmax(cj|ek) and pmin(cj|ek). To make the actual prediction of the class, the authors introduce two criteria. 1. Stochastic dominance: assign the class label cj if pmin(cj|ek) is greater than pmax(ch|ek) for all h ≠ j. 2. Weak dominance: arrive at a single probability for p(cj|ek) by assigning a score that falls in the interval between pmin(cj|ek) and pmax(cj|ek).
Prediction The stochastic dominance criterion reduces coverage because the probability intervals may overlap; it is the more conservative and safe method. The weak dominance criterion improves coverage; classification then depends on the score used to arrive at a single probability for p(cj|e). The score used by the authors is pmin(cj|e)·(c−1)/c + pmax(cj|e)/c, where c is the total number of classes.
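Both criteria can be sketched in Python. This is illustrative only; the dictionary-of-intervals representation and the function names are assumptions, not the authors' API:

```python
def stochastic_dominant(intervals):
    """Return the class whose p_min exceeds every other class's p_max,
    or None when the intervals overlap (no prediction: coverage loss).
    intervals: dict mapping class label -> (p_min, p_max)."""
    for cj, (lo, _) in intervals.items():
        if all(lo > hi for ch, (_, hi) in intervals.items() if ch != cj):
            return cj
    return None

def weak_dominant(intervals):
    """Collapse each interval with the authors' score
    p_min*(c-1)/c + p_max/c, then return the argmax (always predicts)."""
    c = len(intervals)
    score = {cj: lo * (c - 1) / c + hi / c for cj, (lo, hi) in intervals.items()}
    return max(score, key=score.get)
```

Note how the score weights pmin more heavily as the number of classes c grows, keeping the collapsed probability on the conservative end of the interval.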
Results The Robust Bayesian Classifier was tested on the Adult database, which consists of 14 attributes over 48841 cases from the 1994 US Census; 7% of the database is incomplete. The database is divided into two classes: people who earn more than $50,000 a year and people who do not. The Naïve Bayesian classifier gave an accuracy of 81.74% with a coverage of 93%. The Robust Bayesian classifier under the stochastic dominance criterion gave an accuracy of 86.51% with a coverage of 87%. Under the weak dominance criterion it gave an accuracy of 82.5% with 100% coverage.
Conclusion The Robust Bayesian Classifier retains or improves upon the accuracy of the Naïve Bayesian Classifier. The stochastic dominance criterion should be used when accuracy is more important than coverage. For more general databases, the weak dominance criterion should be used because it maintains the accuracy of the classification while improving the coverage.
Bibliography
1. SLIQ: A Fast Scalable Classifier for Data Mining; Manish Mehta, Rakesh Agrawal and Jorma Rissanen
2. An Introduction to the Robust Bayesian Classifier; Marco Ramoni and Paola Sebastiani
3. A Bayesian Approach to Filtering Junk E-mail; Mehran Sahami, Susan Dumais, David Heckerman and Eric Horvitz
4. Bayesian Networks without Tears; Eugene Charniak
5. Bayesian Networks Basics; Finn V. Jensen