Test-Cost Sensitive Naïve Bayes Classification

C. Ling, Dept. of Computer Science, The University of Western Ontario
X. Chai, L. Deng, Q. Yang, Dept. of Computer Science, The Hong Kong University of Science and Technology

Example – Medical Diagnosis
[Figure: a patient record in which temperature is known (39°C) while blood test, pressure, essay and cardiogram are unknown (?)]
• Is the patient healthy?
• Which test should be taken first?
• Which test to perform next?
• Concern: cost the patient as little as possible while maintaining a low misdiagnosis risk

Test-Cost Sensitive Learning
• Traditional inductive learning techniques (decision trees, NB) have had great success
  – but they do not handle different types of costs during classification
• Misclassification costs (Cmc): the costs incurred by classification errors
  – methods that consider only Cmc distinguish different types of classification errors
  – but they neglect the possibility of obtaining missing values in a test case by performing attribute tests
• Test costs (Ctest): the costs incurred by obtaining missing values of attributes
• Goal: minimize the total cost Ctotal = Cmc + Ctest

Some Related Work
• MDP-based cost-sensitive learning (Zubek and Dietterich 2002)
  – Cast as a Markov decision process
  – Solutions are given in terms of optimal policies
  – Very high computational cost to conduct the search
• Decision trees with minimal cost (Ling et al. 2004)
  – Consider both misclassification and test costs in tree building
  – Splitting criterion: minimal total cost instead of InfoGain
  – Attributes not appearing on the testing branch are ignored, although they are still informative for classification
  – Not suitable for batch tests due to its sequential nature

Decision trees with minimal cost (Ling et al. 2004)
• Attribute selection criterion: minimal total cost (Ctotal = Cmc + Ctest) instead of minimal entropy as in C4.5
• If growing the tree gives a smaller total cost, choose the attribute with the minimal total cost. Otherwise, stop and form a leaf.
• Label the leaf according to minimal total cost as well:
  – Suppose the leaf has P positive examples and N negative examples
  – FP denotes the cost of a false positive example and FN the cost of a false negative
  – If (P×FN ≥ N×FP) THEN label = positive ELSE label = negative
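The leaf-labeling rule can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `label_leaf` is hypothetical, and the comparison is assumed to be ≥ (ties go to positive).

```python
def label_leaf(p, n, fp_cost, fn_cost):
    """Label a leaf holding p positive and n negative examples.

    Returns (label, misclassification cost of that label).
    Assumption: ties are resolved in favour of "positive".
    """
    cost_if_positive = n * fp_cost  # every negative example becomes a false positive
    cost_if_negative = p * fn_cost  # every positive example becomes a false negative
    if cost_if_negative >= cost_if_positive:  # i.e. P*FN >= N*FP
        return "positive", cost_if_positive
    return "negative", cost_if_negative
```

The returned cost equals min(P×FN, N×FP), so the label chosen is always the cheaper of the two.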

A Tree Building Example
• A node holds P positive and N negative examples (P:N). Treated as a leaf:
  – Cmc = min(P×FN, N×FP)
  – Ctest = 0
  – Ctotal = Cmc + Ctest
• Consider attribute A, with a test cost C, as a potential splitting attribute. A = v1 gives a child with P1:N1 and A = v2 a child with P2:N2:
  – C’mc = min(P1×FN, N1×FP) + min(P2×FN, N2×FP)
  – C’test = (P1 + N1 + P2 + N2) × C
  – C’total = C’mc + C’test
• If C’total < Ctotal, splitting on A would reduce the total cost. Choose the attribute with the minimal total cost for splitting.
• If C’total ≥ Ctotal for all remaining attributes, no further sub-tree is built and the set becomes a leaf.
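The split test on this slide amounts to a small cost comparison. The sketch below assumes hypothetical function names and made-up counts; only the formulas come from the slide.

```python
def leaf_cost(p, n, fp_cost, fn_cost):
    """Cmc of a leaf with p positives and n negatives (Ctest = 0 for a leaf)."""
    return min(p * fn_cost, n * fp_cost)


def split_total_cost(branches, test_cost, fp_cost, fn_cost):
    """C'total for splitting on an attribute.

    branches: list of (p, n) counts, one pair per attribute value.
    test_cost: cost C of testing the attribute once.
    """
    c_mc = sum(leaf_cost(p, n, fp_cost, fn_cost) for p, n in branches)
    c_test = sum(p + n for p, n in branches) * test_cost  # every example pays C
    return c_mc + c_test


def should_split(p, n, branches, test_cost, fp_cost, fn_cost):
    """Split only if it strictly reduces the total cost."""
    return split_total_cost(branches, test_cost, fp_cost, fn_cost) < leaf_cost(
        p, n, fp_cost, fn_cost
    )
```

For illustration, take a 50:50 node with FP = 150 and FN = 450 that an attribute splits into 45:5 and 5:45. The leaf costs 7500, and the split costs 3000 in misclassification plus 100 × C in tests, so it pays off at C = 20 but not at C = 50.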

Sequential Test Strategy
• Optimal Sequential Test (OST):
  – Each test example goes down the tree until it meets an attribute whose value is unknown in the test example.
  – The test is then done and the missing value is revealed.
  – The process continues until the example falls into a leaf node.
  – The leaf node label is used as the prediction.
  – The total cost is the sum of the misclassification cost and the test cost.
• Problems with the OST strategy:
  – The algorithm chooses a locally optimal attribute without backtracking, so the OST strategy is not globally optimal.
  – Attributes not appearing on the testing branch are ignored, although they are still informative for classification.
  – Not suitable for batch tests due to its sequential nature.

Problem Formulation
• Given:
  – D: a training dataset of N samples {x1,…,xN} from P classes {c1,…,cP}, where each sample xi is described by M attributes (A1,…,AM), among which there can be missing values.
  – C: a misclassification cost matrix. Cij = C(i,j) specifies the cost of classifying a sample from ci as belonging to class cj.
  – T: a test-cost vector. Tk = T(k) specifies the cost of taking a test on attribute Ak (1 ≤ k ≤ M).
• Build:
  – csNB: a cost-sensitive naïve Bayes classifier
  – S: a test strategy for every new case, with the aim of minimizing the sum of the misclassification cost Cmc and the test cost Ctest

csNB classification
• Two procedures: learning and prediction
• Learning a csNB classifier
  – Same as learning a traditional NB classifier
  – Estimate the prior probabilities P(cj) and the likelihoods P(Am=vm,k|cj) from the training dataset D.
  – Missing values are simply ignored in the likelihood computation.
• Prediction
  – Sequential test strategy
  – Batch test strategy
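The learning step can be sketched as plain frequency counting that skips missing entries. This is a sketch under assumptions: `None` marks a missing value, the name `learn_csnb` is hypothetical, and no smoothing is applied (the slides do not say whether the authors smooth their estimates).

```python
from collections import Counter, defaultdict

MISSING = None  # assumed marker for a missing attribute value


def learn_csnb(samples, labels):
    """Estimate P(c) and P(A_m = v | c) by counting, ignoring missing values.

    samples: list of tuples, one value per attribute (MISSING if unknown).
    labels: list of class labels, aligned with samples.
    Returns (priors, likelihoods) where likelihoods[(m, c)][v] = P(A_m = v | c).
    """
    class_counts = Counter(labels)
    priors = {c: k / len(labels) for c, k in class_counts.items()}

    # value_counts[(m, c)][v] = number of class-c samples observed with A_m = v
    value_counts = defaultdict(Counter)
    for x, c in zip(samples, labels):
        for m, v in enumerate(x):
            if v is not MISSING:  # a missing value contributes to no count
                value_counts[(m, c)][v] += 1

    likelihoods = {}
    for key, counts in value_counts.items():
        observed = sum(counts.values())
        likelihoods[key] = {v: k / observed for v, k in counts.items()}
    return priors, likelihoods
```

Each likelihood is normalized by the number of samples in which that attribute was actually observed, so a missing value neither helps nor hurts any estimate.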

Sequential Test Strategy vs. Batch Test Strategy
• What is a sequential test strategy?
  – Decisions are made sequentially on whether a further test on an unknown attribute should be performed and, if so, which attribute to select, based on the values of the attributes initially known or previously tested.
  – A test strategy that is designed on the fly during classification.
• What is a batch test strategy?
  – The selection of tests on unknown attributes must be determined in advance, before any test is carried out.
  – A test strategy that is designed beforehand.
• Both aim to minimize the sum of the misclassification and test costs.

Example: Diagnosis of Hepatitis
• Test costs and likelihoods of each attribute: [table not reproduced]
• Assume:
  – 21% of patients are positive (c1, have hepatitis): P(c1) = 21%
  – 79% of patients are negative (c2, healthy): P(c2) = 79%
  – Classification costs: C12 = 450, C21 = 150, C11 = C22 = 0
  – Four attributes describe a patient
• Suppose a patient comes with all attribute values unknown: (?,?,?,?)
• Sequential test: [Figure: starting from (?,?,?,?), test ascites; the outcomes (?,?,?,pos) and (?,?,?,neg) lead to further tests (spleen, spiders), and so on until classify]
• Batch test: [Figure: (?,?,?,?) → Test {spleen, spiders, ascites} → (?,neg,neg,pos)]
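Before any test is taken, the priors and the cost matrix already determine a default prediction. The sketch below works this out for the numbers on the slide, taking the two zero-cost entries to be the correct classifications (C11 = C22 = 0). The function name `expected_cost` and the dictionary layout are illustrative choices.

```python
# Priors from the slide: c1 = hepatitis (positive), c2 = healthy (negative).
priors = {"c1": 0.21, "c2": 0.79}

# cost[(true_class, predicted_class)], following Cij = cost of classifying ci as cj.
cost = {
    ("c1", "c1"): 0,
    ("c1", "c2"): 450,  # C12: a sick patient classified as healthy
    ("c2", "c1"): 150,  # C21: a healthy patient classified as sick
    ("c2", "c2"): 0,
}


def expected_cost(predicted, class_probs, cost):
    """Expected misclassification cost of predicting `predicted`."""
    return sum(cost[(true, predicted)] * p for true, p in class_probs.items())


risks = {c: expected_cost(c, priors, cost) for c in priors}
best = min(risks, key=risks.get)  # class with the lowest expected cost
```

With no attribute known, predicting c1 costs 150 × 0.79 = 118.5 in expectation and predicting c2 costs 450 × 0.21 = 94.5, so the untested patient would be classified as negative. A test is worth taking only if it lowers this figure by more than its own cost.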

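Prediction with Sequential Test Strategy
• Suppose x is a test example. Let $\tilde{A}$ denote the set of known attributes and $\bar{A}$ the set of unknown attributes.
• The utility of testing an unknown attribute $A_i \in \bar{A}$ is defined as:
  $\mathrm{Util}(A_i) = \mathrm{Gain}(\tilde{A}, A_i) - C_{test}(A_i)$
• Where:
  – $\mathrm{Gain}(\tilde{A}, A_i)$ is the reduction in the expected misclassification cost if we know $A_i$'s true value
  – $C_{test}(A_i)$ is the test cost of attribute $A_i$, given by $T_i$

Prediction with Sequential Test Strategy
• $\mathrm{Gain}(\tilde{A}, A_i)$ is defined as:
  $\mathrm{Gain}(\tilde{A}, A_i) = C_{mc}(\tilde{A}) - E_{A_i}\left[ C_{mc}(\tilde{A} \cup \{A_i\}) \right]$
• Where:
  – $C_{mc}(\tilde{A})$ is the expected Cmc based on $\tilde{A}$
  – $E_{A_i}[\cdot]$ takes the expectation over all possible values of $A_i$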
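The gain and utility can be computed directly from a learned naïve Bayes model. The sketch below rests on two assumptions: the expected Cmc is taken to be the minimum over predicted classes of the posterior-weighted cost, and the likelihood figures for `ascites` are invented for illustration (the slide's table is not available). All function names are hypothetical.

```python
def posterior(priors, likelihoods, known):
    """P(c | known) under naive Bayes. known maps attribute name -> value."""
    scores = {}
    for c, prior in priors.items():
        score = prior
        for attr, value in known.items():
            score *= likelihoods[attr][c][value]
        scores[c] = score
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}


def expected_cmc(post, cost):
    """Assumed form: lowest expected misclassification cost over predictions."""
    return min(sum(cost[(t, pred)] * p for t, p in post.items()) for pred in post)


def gain(priors, likelihoods, cost, known, attr):
    """Cmc(known) minus the expectation of Cmc(known + attr) over attr's values."""
    post = posterior(priors, likelihoods, known)
    before = expected_cmc(post, cost)
    after = 0.0
    for v in likelihoods[attr][next(iter(priors))]:  # assumes classes share values
        # P(attr = v | known) = sum over c of P(v | c) * P(c | known)
        p_v = sum(likelihoods[attr][c][v] * post[c] for c in post)
        if p_v > 0:
            revealed = posterior(priors, likelihoods, {**known, attr: v})
            after += p_v * expected_cmc(revealed, cost)
    return before - after


def utility(priors, likelihoods, cost, test_costs, known, attr):
    """Util(A_i) = Gain(known, A_i) - T_i."""
    return gain(priors, likelihoods, cost, known, attr) - test_costs[attr]


# Hypothetical numbers: only the priors and the cost matrix come from the slides.
priors = {"c1": 0.21, "c2": 0.79}
cost = {("c1", "c1"): 0, ("c1", "c2"): 450, ("c2", "c1"): 150, ("c2", "c2"): 0}
likelihoods = {"ascites": {"c1": {"pos": 0.7, "neg": 0.3},
                           "c2": {"pos": 0.1, "neg": 0.9}}}
test_costs = {"ascites": 20.0}
example_utility = utility(priors, likelihoods, cost, test_costs, {}, "ascites")
```

Under these invented likelihoods, the expected Cmc drops from 94.5 to 40.2 once `ascites` is known, so the gain is 54.3 and, at a test cost of 20, the utility is 34.3.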

Prediction with Sequential Test Strategy
• Overall, an attribute is worth testing if testing it offers more gain than the cost it brings.
• By calculating the utilities of testing all the attributes in $\bar{A}$ (the set of unknown attributes), we can decide:
  – whether a further test is needed
  – which attribute to test
• After an attribute $A_i$ is tested, its true value is revealed and it is moved from $\bar{A}$ to $\tilde{A}$ (the set of known attributes). The same procedure continues until:
  – no unknown attribute is left ($\bar{A} = \emptyset$)
  – or the utility of testing any unknown attribute is non-positive
• Finally, the example is predicted as the class with the minimum expected misclassification cost given $\tilde{A}$, and Ctest is the total cost of the tests performed.

csNB-sequential-predict Algorithm
[Flowchart: compute the utility of testing every unknown attribute → further test? → Yes: select the unknown attribute with the highest utility to test, then repeat; No: classify]
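The flowchart maps onto a short loop. The sketch below is deliberately generic rather than the authors' implementation: the utility computation, the act of running a test, and the final classifier are passed in as callables, and all parameter names are hypothetical.

```python
def sequential_predict(known, unknown, utility, run_test, classify, test_costs):
    """Greedy sequential test strategy.

    known: dict of attribute -> value available at the start.
    unknown: iterable of attribute names whose values are missing.
    utility(known, attr): utility of testing attr given what is known so far.
    run_test(attr): performs the test and returns the revealed value.
    classify(known): final cost-sensitive prediction from the known values.
    test_costs: dict of attribute -> cost of testing it.
    Returns (label, total test cost, list of attributes tested in order).
    """
    known = dict(known)
    unknown = sorted(unknown)  # fixed order keeps tie-breaking deterministic
    tested = []
    c_test = 0.0
    while unknown:
        utils = {a: utility(known, a) for a in unknown}
        best = max(unknown, key=utils.get)
        if utils[best] <= 0:  # no remaining test is worth its cost
            break
        known[best] = run_test(best)  # the true value is revealed
        unknown.remove(best)
        tested.append(best)
        c_test += test_costs[best]
    return classify(known), c_test, tested
```

Utilities are recomputed on every pass because each revealed value changes the posterior, and with it the worth of the remaining tests.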

Prediction with Batch Test Strategy
• A natural extension of the sequential test algorithm of csNB
• All the unknown attributes with non-negative utility are selected. The batch of selected attributes is $A' = \{A_i \in \bar{A} : \mathrm{Util}(A_i) \ge 0\}$, where $\bar{A}$ is the set of unknown attributes, and the test cost is $C_{test} = \sum_{A_i \in A'} T_i$
• After $A'$ is selected, the values of these attributes are revealed and the class label is then predicted.
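Batch selection reduces to a single filter over the unknown attributes. In this minimal sketch the `utility` callable and the parameter names are assumptions; every utility is evaluated against the initially known values only, since nothing is revealed before the batch is fixed.

```python
def batch_select(known, unknown, utility, test_costs):
    """Pick every unknown attribute whose utility is non-negative.

    utility(known, attr): utility of testing attr given the initial known values.
    Returns (batch of attributes to test, total test cost of the batch).
    """
    batch = [a for a in sorted(unknown) if utility(known, a) >= 0]
    c_test = sum(test_costs[a] for a in batch)
    return batch, c_test
```

Once the batch has been tested, the revealed values are merged into the known attributes and the example is classified once.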

Experiments
• Experiments were carried out on eight datasets from the UCI ML repository (Ecoli, Heart, Australia, Voting, Breast, …).
• Four algorithms were implemented for comparison:
  – csNB: the test-cost sensitive naïve Bayes
  – csDT: the cost-sensitive decision trees proposed in Ling et al. 2004
  – LNB: lazy naïve Bayes, which predicts based only on the known attributes and requires no tests on any unknown attribute
  – ENB: exacting naïve Bayes, which requires all the missing values to be obtained before prediction
• The performance of the algorithms is measured in terms of the total cost Ctotal = Cmc + Ctest, where Cmc is obtained by comparing the predicted and true labels of the test examples.
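The evaluation metric can be sketched as follows. The function name and argument layout are hypothetical, and the cost matrix is indexed as (true class, predicted class) to match Cij.

```python
def average_total_cost(true_labels, predicted_labels, test_costs_paid, cost):
    """Average Ctotal = Cmc + Ctest over a set of test examples.

    test_costs_paid: per-example sum of the costs of the tests performed.
    cost[(true, predicted)]: misclassification cost matrix.
    """
    total = 0.0
    for t, p, c_test in zip(true_labels, predicted_labels, test_costs_paid):
        total += cost[(t, p)] + c_test  # Cmc for this example plus its Ctest
    return total / len(true_labels)
```

Under this measure LNB always pays zero test cost and ENB always pays for every missing attribute, which makes them natural reference points for csNB and csDT.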

Experimental Results – Sequential Test
[Figure: average total cost comparison on the Ecoli, Breast, Heart and Thyroid datasets; legend: ENB, csNB, LNB, csDT]

Experimental Results – Sequential Test
[Figure: average total cost comparison on the Australia, Cars, Voting and Mushroom datasets]

Experimental Results – Sequential Test
• Comparison of LNB, csNB and csDT with an increasing percentage of unknown attributes on the Mushroom dataset

Experimental Results – Sequential Test
• Comparison of csNB and csDT with varying test costs (missing rates set to 20% and 60%) on the Mushroom dataset
• Compared with csDT, csNB is more effective at balancing the misclassification and test costs.

Experimental Results – Batch Test
• Overall, csNB incurs 29.6% less total cost than csDT.
• csDT is ill-suited to deriving batch test strategies because its tree building is sequential by nature.
• csNB has no such constraint: all the attributes can be evaluated at the same level.

Conclusion and future work
• We proposed a test-cost sensitive naïve Bayes algorithm for designing classifiers that minimize the sum of the misclassification and test costs.
• In the framework of csNB, attributes can be intelligently selected to design both sequential and batch test strategies.
• In the future, we plan to develop more effective algorithms and to consider more complicated situations where the test cost of an attribute may be conditional on other attributes.
• It is also interesting to consider the cost of finding the missing values for the training data.

THANK YOU! Q & A
