Active Cost-sensitive Learning (Intelligent Test Strategies)

Charles X. Ling, PhD • Department of Computer Science, University of Western Ontario, Ontario, Canada • cling@csd.uwo.ca • http://www.csd.uwo.ca/faculty/cling • Joint work with Victor Sheng, Qiang Yang, …

Outline • Introduction • Cost-sensitive decision trees • Test strategies • Sequential Test • Single Batch Test • Sequential Batch Test • Conclusions and future work

Everything has a cost/benefit! • Materials, products, services • Disease, working/living conditions, waiting, … • Happiness, love, life, … • "Money, Sex and Happiness: An Empirical Study", by David G. Blanchflower & Andrew J. Oswald, The Scandinavian Journal of Economics, 106:3, 2004, pages 393-415 • A lasting/happy marriage is worth about $100,000 in happiness • Utility-based learning: optimization; it unifies many issues and is the ultimate goal

Everything has a cost/benefit! • In medical diagnosis… • Tests have costs: temperature ($1), X-ray ($30), biopsy ($900) • Diseases have costs: flu ($100), diabetes ($100k), cancer ($10^8) • Misdiagnosis has (different) costs • Cost of a false alarm ($500) << cost of missing a cancer ($500,000) • Doctors balance the cost of tests against the cost of misdiagnosis • Our goal: to minimize the total cost • Many other similar applications… • Model this process • Cost-sensitive learning • Intelligent test strategies
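The "total cost" that this slide asks us to minimize can be illustrated with a small calculation. The sketch below uses the test and misdiagnosis costs quoted on the slide; the function name and the error rates are hypothetical, chosen only to show how a more expensive set of tests can still give a lower total cost.

```python
# Costs quoted on the slide (dollars).
TEST_COSTS = {"temperature": 1.0, "x_ray": 30.0, "biopsy": 900.0}
FALSE_ALARM_COST = 500.0
MISSED_CANCER_COST = 500_000.0


def expected_total_cost(tests, p_false_alarm, p_missed):
    """Test costs plus expected misdiagnosis cost for one patient."""
    test_cost = sum(TEST_COSTS[name] for name in tests)
    misdiagnosis_cost = (p_false_alarm * FALSE_ALARM_COST
                         + p_missed * MISSED_CANCER_COST)
    return test_cost + misdiagnosis_cost


# Hypothetical error rates, not taken from the slides.
cheap = expected_total_cost(["temperature"],
                            p_false_alarm=0.10, p_missed=0.02)
thorough = expected_total_cost(["temperature", "x_ray", "biopsy"],
                               p_false_alarm=0.01, p_missed=0.001)
print(f"cheap: ${cheap:,.2f}  thorough: ${thorough:,.2f}")
```

Under these assumed error rates the thorough policy spends far more on tests but has the lower total, because a missed cancer dominates the cost.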

Review of Previous Work • Cost-sensitive (CS) learning: a survey (Turney 2000) • Active research, also for the imbalanced data problem • CS meta learning (wrapper): thresholding, sampling, weighting, … • CS learning algorithms: CSNB, our CS trees • …but all of these consider misclassification costs only • Some work considers test costs only • A few previous works consider both test costs and misclassification costs (Turney 1995, Zubek and Dietterich 2002, Lizotte et al 2003); all are computationally expensive

Review of Previous Work • Active learning: actively seeking extra information • Pool-based: given a pool of unlabeled examples, which ones to label? • Membership query: is this instance positive? • Feature value acquisition • During training. But "missing is useful!" • During testing: our work • Human learning is active in many ways

Review of Previous Work • Diagnosis: wide applications in medicine, mechanical systems, software, … • Most previous AI-based diagnosis systems… • Are manually built (partially) • Do not incorporate costs/benefits • Cannot actively suggest the processes • Our work: cost-sensitive and active; useful for diagnosis and policy setting

Cost-sensitive Decision Tree • [Figure: example decision tree with tests T1, T2, T3, T6 at internal nodes and class labels 0/1 at the leaves] • Advantages: tree structure, comprehensibility • Objective: minimizing the total cost of tests and misclassification.

Attribute Splitting Criteria • Previous methods: C4.5 reduces the entropy (randomness); it performs badly on cost-sensitive tasks • New (ICML’04): we reduce the total expected cost • [Figure: a node split by test T into three branches, annotated with entropies E, E1, E2, E3 for C4.5 and with costs C, C1, C2, C3 for our method] • C4.5: choose T such that E - (E1+E2+E3) is maximal • Ours: choose T such that C - (C1+C2+C3+C_Test) is maximal
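The cost-based criterion C - (C1+C2+C3+C_Test) can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes that correct predictions cost nothing, that a node's cost is the misclassification cost of its cheaper label, and that C_Test is the test cost paid once by every example reaching the node. The function names and the example counts are hypothetical.

```python
def leaf_cost(pos, neg, fp_cost, fn_cost):
    """Misclassification cost of the cheaper label at a node.

    Labelling the node positive misclassifies the `neg` negatives (FP);
    labelling it negative misclassifies the `pos` positives (FN).
    """
    return min(neg * fp_cost, pos * fn_cost)


def cost_reduction(parent, children, test_cost, fp_cost, fn_cost):
    """C - (C1 + ... + Ck + C_Test) for one candidate test.

    parent:   (pos, neg) counts at the node before splitting
    children: list of (pos, neg) counts, one per branch
    """
    pos, neg = parent
    c = leaf_cost(pos, neg, fp_cost, fn_cost)
    c_children = sum(leaf_cost(p, n, fp_cost, fn_cost) for p, n in children)
    c_test = (pos + neg) * test_cost  # assumption: every example pays
    return c - (c_children + c_test)


# Hypothetical node: 40 positives, 60 negatives, FP = $200, FN = $600.
cheap_test = cost_reduction((40, 60), [(30, 10), (10, 50)],
                            test_cost=20.0, fp_cost=200.0, fn_cost=600.0)
pricey_test = cost_reduction((40, 60), [(30, 10), (10, 50)],
                             test_cost=50.0, fp_cost=200.0, fn_cost=600.0)
print(cheap_test, pricey_test)
```

The same split is worth taking at $20 per test but not at $50, which is how the criterion trades discriminating power against test cost.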

Case Study: Heart Disease • Predict coronary artery disease • Class 0: less than 50% artery narrowing; Class 1: more than 50% artery narrowing • ~300 patients, collected from hospitals • 13 non-invasive tests on patients

13 Tests (Heart Disease)

Cost-sensitive tree for Heart Disease • [Figure: cost-sensitive decision tree with cp ($1) at the root; the other tests in the tree are sex ($1), age ($1), fbs ($5.2), chol ($7.27), restecg ($15.5), slope ($87.3) and thal ($102.9); leaves are labelled 0 or 1] • Naturally prefers tests with small cost • Balances cost and discriminating power • A local heart-failure specialist thinks this tree is reasonable.

Considering Group Discount • [Table: tests that share a group discount; the discounts shown are $2.10, $101.90 and $86.30]

Different trees without/with group discount • [Figure: the Heart Disease tree before and after applying group discounts. The two trees differ at one node, where one tree tests restecg ($15.5) and the other tests thalach ($1); individual cost: $102.9]

Algorithm of Cost-sensitive Decision Tree • CSDT(Examples, Attributes, TestCosts) • If all examples are positive, return root with label = + • If all examples are negative, return root with label = - • If the maximum cost reduction < 0, return root with the label given by min(P×TP + N×FP, N×TN + P×FN), where P and N are the numbers of positive and negative examples • Let A be an attribute with maximum cost reduction • root ← A • Update TestCosts if a discount applies • For each possible value v_i of the attribute A: • Add a new branch A = v_i below root • Segment the training examples Examples_vi into the new branch • Call CSDT(Examples_vi, Attributes - A, TestCosts) to build the subtree
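The CSDT steps above can be sketched as a short recursive function. This is an illustrative reading of the slide, not the authors' implementation: it assumes TP and TN costs of zero, discrete attributes with no missing values, and a test cost paid by every example at the node, and it leaves out the group-discount update. The names `build_csdt` and `leaf_cost` and the toy data are hypothetical.

```python
def leaf_cost(pos, neg, fp_cost, fn_cost):
    """Cost of the cheaper label; TP and TN costs are assumed to be zero."""
    return min(neg * fp_cost, pos * fn_cost)


def build_csdt(examples, attributes, test_costs, fp_cost, fn_cost):
    """examples: list of (attribute_dict, label) pairs, label in {0, 1}."""
    pos = sum(label for _, label in examples)
    neg = len(examples) - pos
    # Labelling positive costs neg * FP; labelling negative costs pos * FN.
    node = {"label": 1 if neg * fp_cost <= pos * fn_cost else 0}
    if pos == 0 or neg == 0 or not attributes:
        return node

    cost_here = leaf_cost(pos, neg, fp_cost, fn_cost)
    best = None  # (cost reduction, attribute, partition of the examples)
    for attr in attributes:
        partition = {}
        for features, label in examples:
            partition.setdefault(features[attr], []).append((features, label))
        cost_split = len(examples) * test_costs[attr]  # C_Test
        for subset in partition.values():
            sub_pos = sum(label for _, label in subset)
            cost_split += leaf_cost(sub_pos, len(subset) - sub_pos,
                                    fp_cost, fn_cost)
        reduction = cost_here - cost_split
        if best is None or reduction > best[0]:
            best = (reduction, attr, partition)

    reduction, attr, partition = best
    if reduction < 0:  # no test pays for itself: stay a leaf
        return node
    node["test"] = attr
    remaining = [a for a in attributes if a != attr]
    node["children"] = {
        value: build_csdt(subset, remaining, test_costs, fp_cost, fn_cost)
        for value, subset in partition.items()
    }
    return node


# Toy data: "cp" separates the classes perfectly, "noise" does not.
data = ([({"cp": 1, "noise": i % 2}, 1) for i in range(4)]
        + [({"cp": 2, "noise": i % 2}, 0) for i in range(4)])
tree = build_csdt(data, ["cp", "noise"], {"cp": 1.0, "noise": 1.0},
                  fp_cost=200.0, fn_cost=600.0)
print(tree)
```

Raising both test costs to $1000 in this toy example makes every cost reduction negative, so the function returns a single leaf: the tree only asks for tests that are worth their price.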

Test Strategies • [Figure: example cost-sensitive decision tree with tests T1, T2, T3, T6] • Three categories of intelligent test strategies • Sequential Test: one test, wait, … then predict • Single Batch Test: one batch of tests, then predict • Sequential Batch Test: batch 1, batch 2, … then predict • Minimizing the total cost of tests and misclassification is not trivial • Our methods: utilizing the minimum-cost tree structure

Sequential Test • Use tree structure to guide test sequence • “Optimal” because tree is (locally) optimal
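Using the tree to guide the test sequence amounts to walking from the root and paying for a test only when its value is needed. The sketch below assumes a nested-dict tree in which every node stores a label, so that prediction can stop at an internal node if a value has no branch. The tree shape, the callback `run_test` and the patient values are hypothetical; only the test names and costs (cp $1, thal $102.9) come from the slides.

```python
def sequential_test(tree, known, run_test, test_costs):
    """Walk the tree, performing one test at a time as it is needed.

    known:    dict of attribute values available before any test is done
    run_test: callback that performs a test and returns its value
    Returns (predicted label, tests performed in order, total test cost).
    """
    values = dict(known)
    performed, spent = [], 0.0
    node = tree
    while "test" in node:
        attr = node["test"]
        if attr not in values:          # unknown: request the test, then wait
            values[attr] = run_test(attr)
            performed.append(attr)
            spent += test_costs[attr]
        child = node["children"].get(values[attr])
        if child is None:               # unseen value: predict at this node
            break
        node = child
    return node["label"], performed, spent


# Hypothetical two-level tree; not the tree from the slides.
tree = {"test": "cp", "label": 0, "children": {
    1: {"label": 0},
    4: {"test": "thal", "label": 1, "children": {
        3: {"label": 0},
        7: {"label": 1},
    }},
}}
costs = {"cp": 1.0, "thal": 102.9}

patient = {"cp": 4, "thal": 7}          # true values, revealed test by test
label, performed, spent = sequential_test(tree, {}, patient.get, costs)
print(label, performed, round(spent, 2))
```

A patient with cp = 1 would reach a leaf after the $1 test alone, so the expensive thal test is requested only on the branches that need it.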

Sequential Test • [Figure: the cost-sensitive tree for Heart Disease, with tests cp ($1), sex ($1), age ($1), thalach ($1), fbs ($5.2), chol ($7.27), restecg ($15.5), slope ($87.3) and thal ($102.9)]

Experimental Comparison • Using 10 datasets from UCI

Comparing Sequential Test • Eager learning: Sequential Test (OST) (ICML’04) • Lazy learning: Lazy Sequential Test (LazyOST) (TKDE’05) • Cost-sensitive Naïve Bayes (CSNB) (ICDM’04)

Single Batch Test • Only one batch – not an easy task • If too few, important tests not requested; prediction is not accurate; total cost high • If too many, some tests are wasted; total cost high • The test example may not be classified by a leaf

Single Batch Test • Expected cost reduction: if a test is done, what are the possible outcomes and the resulting cost reduction? • R(.): all reachable unknown nodes and leaves • [Figure: a node i whose three branches lead to nodes j1, j2, j3]

Single Batch Test • A*-like search algorithm • Form a candidate list (L) and a batch list (B) • Choose the test with maximum positive expected cost reduction from L and add it to B • Update L: add all of that test's reachable unknowns to L • Repeat until no candidate has a positive expected cost reduction • Efficient with the tree structure

Single Batch Test: algorithm
L = empty /* list of reachable and unknown attributes */
B = empty /* the batch of tests */
u = the first unknown attribute when classifying a test case
Add u into L
Loop
    For each i ∈ L, calculate E(i):
        E(i) = misc(i) - [c(i) + Σ p(R(i)) × misc(R(i))]
    E(t) = max E(i) /* t has the maximum cost reduction */
    If E(t) > 0 then add t into B, delete t from L, add r(t) into L
    else exit Loop /* No positive cost reduction */
Until L is empty
Output B as the batch of tests
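The batch-selection loop can be sketched as follows. The sketch reads the bracketed term of E(i) as the test cost c(i) plus the probability-weighted misclassification cost of the reachable nodes R(i); that reading is an assumption, as are the tree encoding (each node stores `misc`, and children are `(probability, node)` pairs), the function names, and the numbers. It also assumes each attribute appears at most once in the tree.

```python
def follow_known(node, known):
    """Descend through tests whose values are already known."""
    while "test" in node and node["test"] in known:
        _, node = node["children"][known[node["test"]]]
    return node


def reachable(node, known):
    """R(node): (probability, node) pairs for the unknown nodes and leaves
    reached once the test at `node` has been performed."""
    return [(prob, follow_known(child, known))
            for prob, child in node["children"].values()]


def expected_reduction(node, known, test_costs):
    """E(i) = misc(i) - [c(i) + sum of p(j) * misc(j) over j in R(i)]."""
    after = sum(prob * reached["misc"]
                for prob, reached in reachable(node, known))
    return node["misc"] - (test_costs[node["test"]] + after)


def single_batch(tree, known, test_costs):
    first = follow_known(tree, known)
    candidates = [first] if "test" in first else []    # the list L
    batch = []                                         # the list B
    while candidates:
        best = max(candidates,
                   key=lambda n: expected_reduction(n, known, test_costs))
        if expected_reduction(best, known, test_costs) <= 0:
            break                                      # no positive reduction
        candidates = [n for n in candidates if n is not best]
        batch.append(best["test"])
        candidates += [n for _, n in reachable(best, known) if "test" in n]
    return batch


# Hypothetical tree: misc = expected misclassification cost if we predict here.
thal_node = {"test": "thal", "misc": 600.0,
             "children": {3: (0.5, {"misc": 20.0}), 7: (0.5, {"misc": 30.0})}}
tree = {"test": "cp", "misc": 800.0,
        "children": {1: (0.5, {"misc": 50.0}), 4: (0.5, thal_node)}}

batch = single_batch(tree, {}, {"cp": 1.0, "thal": 102.9})
print(batch)
```

If thal cost $700 instead, its expected reduction in this example would be negative and the batch would contain cp only, matching the slide's point that too many tests waste money while too few hurt accuracy.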

Single Batch Test: example on the Heart Disease tree • [Figure, repeated at each step: the cost-sensitive tree for Heart Disease, with tests cp ($1), sex ($1), age ($1), thalach ($1), fbs ($5.2), chol ($7.27), restecg ($15.5), slope ($87.3) and thal ($102.9)] • Step 1: cp is unknown. cp has positive expected cost reduction, so cp is added to the batch. cp’s reachable unknown nodes are added to the candidate list. • Step 2: From the candidate list, choose the test with maximum positive expected cost reduction, add it to the batch, and update the candidate list. Repeat. After 7 steps, the expected cost reduction is 0. • Step 3: Do all tests in the batch. • Step 4: Make a prediction. Some tests are wasted. The prediction may be made by an internal node.

Comparing Single Batch Tests • Naïve Single Batch (NSB) (ICML’04) • Cost-sensitive Naïve Bayes Single Batch (CSNB-SB) (ICDM’04) • Greedy Single Batch (GSB) (TKDE’05) • Single Batch Test (OSB) (TKDE’05)

Sequential Batch • Batch 1, batch 2, … , prediction • Must include the cost of waiting for tests • Wait cost of a batch: the maximum wait cost in the batch • This is less than the sum of the individual wait costs • Combines Sequential Test and Single Batch Test • If all waiting costs = 0, it becomes Sequential Test • If all waiting costs are very large, it becomes Single Batch Test
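The wait-cost rule on this slide is simple arithmetic: tests in one batch run in parallel, so the batch pays each test cost but only the largest wait cost. The sketch below uses test costs from the Heart Disease tree; the wait costs and the function name are hypothetical.

```python
def batch_cost(batch, test_costs, wait_costs):
    """Cost of one batch: all test costs plus the largest wait cost."""
    if not batch:
        return 0.0
    return (sum(test_costs[t] for t in batch)
            + max(wait_costs[t] for t in batch))


test_costs = {"cp": 1.0, "restecg": 15.5, "thal": 102.9}
wait_costs = {"cp": 0.0, "restecg": 10.0, "thal": 40.0}  # hypothetical

tests = ["cp", "restecg", "thal"]
together = batch_cost(tests, test_costs, wait_costs)
one_by_one = sum(batch_cost([t], test_costs, wait_costs) for t in tests)
print(round(together, 2), round(one_by_one, 2))
```

Doing the three tests in one batch saves the wait cost of restecg here, which is the "less than the sum" point; the price is that some tests in the batch may turn out to be unnecessary.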

Sequential Batch • The wait cost is derived from the wait time • [Table: wait time of each test, in hours]

Sequential Batch • Extending the Single Batch to include the batch cost • An additional constraint: a test is added to the batch only if the cumulative ROI increases • When no test can be added: no more batches!

Sequential Batch: algorithm
Loop
    L = empty /* list of reachable and unknown attributes */
    B = empty /* the batch of tests */
    u = the first unknown attribute when classifying a test case
    Add u into L
    Loop
        For each i ∈ L, calculate E(i):
            E(i) = misc(i) - [c(i) + Σ p(R(i)) × misc(R(i))]
        E(t) = max E(i) /* t has the maximum cost reduction */
        If E(t) > 0 & ROI increases then add t into B, delete t from L, add r(t) into L
        else exit Loop /* No positive cost reduction */
    Until L is empty
    If (B is not empty) then
        Output B as the current batch of tests; obtain their values at a cost
        Classify the test example further, until encountering another unknown test
    Else exit the first Loop

Comparing Sequential Batch Test

Future Work • Deal with different test examples differently • Consider more costs: acquiring new examples • If $10 for each new example, how many do I need? • For $10, tell me if this patient has cancer • If test is not accurate (e.g. 90%), how to build trees and how to do tests (will I do it again)? • From cost-sensitive trees, derive medical policy for expensive/risky or cheap/effective tests

Conclusions • Cost-sensitive decision tree: effective for learning with minimal total cost • Can be used to model learning from data with costs • Design and compare various test strategies • Sequential Test: one test, wait, …: low cost but long wait • Single Batch Test: one batch of tests: quick but higher cost • Sequential Batch Test: batch, wait, batch, …: best tradeoff • Our methods perform better than previous ones • Can be readily applied to real-world diagnoses

References
C.X. Ling, Q. Yang, J. Wang, and S. Zhang. Decision Trees with Minimal Costs. ICML'2004.
X. Chai, L. Deng, Q. Yang, and C.X. Ling. Test-Cost Sensitive Naive Bayes Classification. ICDM'2004.
C.X. Ling, S. Sheng, and Q. Yang. Intelligent Test Strategies for Cost-sensitive Decision Trees. IEEE TKDE, to appear, 2005.
S. Zhang, Z. Qin, C.X. Ling, and S. Sheng. "Missing is Useful": Missing Values in Cost-sensitive Decision Trees. IEEE TKDE, to appear, 2005.
P.D. Turney. Types of Cost in Inductive Concept Learning. Workshop on Cost-Sensitive Learning at ICML'2000.
V.B. Zubek and T. Dietterich. Pruning Improves Heuristic Search for Cost-sensitive Learning. ICML'2002.
P.D. Turney. Cost-Sensitive Classification: Empirical Evaluation of a Hybrid Genetic Decision Tree Induction Algorithm. JAIR, 2:369-409, 1995.
D. Lizotte, O. Madani, and R. Greiner. Budgeted Learning of Naïve-Bayes Classifiers. In Uncertainty in AI, 2003.
