
  1. Active Cost-sensitive Learning (Intelligent Test Strategies) Charles X. Ling, PhD Department of Computer Science University of Western Ontario, Ontario, Canada cling@csd.uwo.ca http://www.csd.uwo.ca/faculty/cling Joint work with Victor Sheng, Qiang Yang, …

  2. Outline • Introduction • Cost-sensitive decision trees • Test strategies • Sequential Test • Single Batch Test • Sequential Batch Test • Conclusions and future work

  3. Outline • Introduction • Cost-sensitive decision trees • Test strategies • Sequential Test • Single Batch Test • Sequential Batch Test • Conclusions and future work

  4. Everything has a cost/benefit! • Materials, products, services • Disease, working/living conditions, waiting, … • Happiness, love, life, … • Money, Sex and Happiness: An Empirical Study, by David G. Blanchflower & Andrew J. Oswald, The Scandinavian Journal of Economics, 106(3), 2004, pp. 393-415 • A lasting/happy marriage is worth about $100,000 in happiness • Utility-based learning: optimization; unifies many issues and is the ultimate goal

  5. Everything has a cost/benefit! • In medical diagnosis… • Tests have costs: temperature ($1), X-ray ($30), biopsy ($900) • Diseases have costs: flu ($100), diabetes ($100k), cancer ($10^8) • Misdiagnosis has (different) costs • Cost of a false alarm ($500) << cost of missing a cancer ($500,000) • Doctors balance the cost of tests against the cost of misdiagnosis • Our goal: minimize the total cost • Many other similar applications… • Model this process • Cost-sensitive learning • Intelligent test strategies
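The trade-off the slide describes can be made concrete with a little arithmetic. A minimal sketch, using the hypothetical dollar figures above (the error rates are illustrative assumptions, not numbers from the talk): the quantity to minimize is the sum of test costs plus the expected misclassification cost.

```python
# Total cost = money spent on tests + expected cost of a misdiagnosis.
# Numbers for error rates are made-up assumptions for illustration.

def total_expected_cost(tests_done, p_error, misclassification_cost, test_costs):
    """Cost of running `tests_done` and then predicting with error rate p_error."""
    spent_on_tests = sum(test_costs[t] for t in tests_done)
    expected_misclassification = p_error * misclassification_cost
    return spent_on_tests + expected_misclassification

test_costs = {"temperature": 1, "x_ray": 30, "biopsy": 900}

# Cheap tests leave a lot of uncertainty; the expensive biopsy removes most of it.
cheap = total_expected_cost(["temperature", "x_ray"], 0.10, 500_000, test_costs)
full = total_expected_cost(["temperature", "x_ray", "biopsy"], 0.001, 500_000, test_costs)
```

Here the expensive test plan is far cheaper in expectation ($1,431 vs. $50,031), which is exactly why a doctor orders a $900 biopsy when a $500,000 mistake is on the table.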

  6. Review of Previous Work • Cost-sensitive learning: a survey (Turney 2000) • An active research area, also relevant to the imbalanced-data problem • CS meta-learning (wrappers): thresholding, sampling, weighting, … • CS learning algorithms: CSNB, our CS trees • … but all consider misclassification costs only • Some work considers test costs only • A few previous works consider both test costs and misclassification costs (Turney 1995; Zubek and Dietterich 2002; Lizotte et al. 2003); all computationally expensive

  7. Review of Previous Work • Active learning: actively seeking extra information • Pool-based: given a pool of unlabeled examples, which ones should be labeled? • Membership query: is this instance positive? • Feature-value acquisition • During training. But "missing is useful!" • During testing: our work • Human learning is active in many ways

  8. Review of Previous Work • Diagnosis: wide applications in medicine, mechanical systems, software, … • Most previous AI-based diagnosis systems… • are (at least partially) manually built • do not incorporate costs/benefits • cannot actively suggest the diagnostic process • Our work: cost-sensitive and active; useful for diagnosis and policy setting

  9. Outline • Introduction • Cost-sensitive decision trees • Test strategies • Sequential Test • Single Batch Test • Sequential Batch Test • Conclusions and future work

  10. Cost-sensitive Decision Tree [Tree diagram: internal test nodes T1, T2, T3, T6 with branches such as Low/Med, >=36/<36, and a/b/c; leaves labeled 0 or 1] Advantages: tree structure, comprehensibility. Objective: minimizing the total cost of tests and misclassification.

  11. Attribute Splitting Criteria [Diagram: a node split by a test into three branches, with child entropies E1, E2, E3 and child costs C1, C2, C3] • Previous methods: C4.5 reduces the entropy (randomness); performs badly on cost-sensitive tasks • New (ICML’04): we reduce the total expected cost • Entropy E: choose T such that E – (E1 + E2 + E3) is maximal • Cost C: choose T such that C – (C1 + C2 + C3 + C_Test) is maximal
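The cost-based criterion above can be sketched in a few lines. This is a hedged illustration, not the authors' implementation: it scores one candidate attribute by comparing the misclassification cost at the current node with the summed costs of the child partitions plus the test's own cost. Function names and the (FP, FN) cost parameters are illustrative.

```python
# Sketch of the cost-reduction splitting criterion C - (C1 + C2 + ... + C_Test).
# `examples` is a list of ({attribute: value}, label) pairs with labels 0/1.

def node_cost(labels, fp_cost, fn_cost):
    """Misclassification cost if this node is labeled with the cheaper class."""
    pos, neg = labels.count(1), labels.count(0)
    return min(neg * fp_cost,   # label the node positive: each negative is a false positive
               fn_cost * pos)   # label the node negative: each positive is a false negative

def cost_reduction(examples, attr, test_cost, fp_cost, fn_cost):
    labels = [y for _, y in examples]
    before = node_cost(labels, fp_cost, fn_cost)
    after = 0.0
    for v in {x[attr] for x, _ in examples}:
        branch = [y for x, y in examples if x[attr] == v]
        after += node_cost(branch, fp_cost, fn_cost)
    return before - (after + test_cost)  # split on attr only if this is positive
```

For a perfectly discriminating attribute on four examples with $100 misclassification costs and a $10 test, the reduction is 200 - (0 + 10) = 190, so the split pays for itself; C4.5's entropy criterion would never see the dollar amounts.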

  12. Case Study: Heart Disease • Predict coronary artery disease • Class 0: less than 50% artery narrowing; Class 1: more than 50% artery narrowing • ~300 patients, collected from hospitals • 13 non-invasive tests per patient

  13. 13 Tests (Heart Disease)

  14. Cost-sensitive tree for Heart Disease [Tree diagram: root cp ($1) with branches 1-4 leading to sex ($1), slope ($87.3), fbs ($5.2), and thal ($102.9); deeper splits on age ($1), chol ($7.27), thal ($102.9), and restecg ($15.5); leaves labeled 0 or 1] • Naturally prefers tests with small cost • Balances cost and discriminating power • A local heart-failure specialist thinks this tree is reasonable.

  15. Considering Group Discount [Diagram: groups of tests with group discounts of $2.10, $101.90, and $86.30]

  16. Different trees without/with group discount [Two tree diagrams, before and after applying the group discount; both are rooted at cp ($1) and split on sex ($1), slope ($87.3), fbs ($5.2), thal ($102.9), age ($1), chol ($7.27), and restecg ($15.5); with the discount, one subtree requests thalach ($1) instead] Individual cost without discount: $102.9

  17. Algorithm of Cost-sensitive Decision Tree
CSDT(Examples, Attributes, TestCosts)
• If all examples are positive, return root with label = +
• If all examples are negative, return root with label = -
• If the maximum cost reduction < 0, return root with label according to min(P·TP + N·FP, N·TN + P·FN)
• Let A be an attribute with maximum cost reduction
• root ← A
• Update TestCosts if a discount applies
• For each possible value v_i of the attribute A
 • Add a new branch A = v_i below root
 • Segment the training examples Examples_vi into the new branch
 • Call CSDT(Examples_vi, Attributes - A, TestCosts) to build the subtree
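The recursion above can be sketched end-to-end. This is a simplified, self-contained illustration under assumptions not in the slide: binary labels 0/1, a single uniform cost per misclassified example, and no group discounts. All names are hypothetical, not the authors' code.

```python
# Simplified CSDT sketch: grow a split only while it reduces total cost.

def misclass_cost(labels, err_cost):
    """Cost of labeling a node with its cheaper (majority) class."""
    pos = sum(labels)
    return min(pos, len(labels) - pos) * err_cost

def best_split(examples, attrs, test_costs, err_cost):
    """Return (attribute, cost reduction) for the best remaining attribute."""
    labels = [y for _, y in examples]
    best, best_red = None, float("-inf")
    for a in attrs:
        by_value = {}
        for x, y in examples:
            by_value.setdefault(x[a], []).append(y)
        after = sum(misclass_cost(ys, err_cost) for ys in by_value.values())
        red = misclass_cost(labels, err_cost) - after - test_costs[a]
        if red > best_red:
            best, best_red = a, red
    return best, best_red

def csdt(examples, attrs, test_costs, err_cost=100):
    labels = [y for _, y in examples]
    if len(set(labels)) == 1 or not attrs:
        return max(set(labels), key=labels.count)      # leaf: majority label
    a, reduction = best_split(examples, attrs, test_costs, err_cost)
    if reduction <= 0:                                  # no test pays for itself
        return max(set(labels), key=labels.count)
    tree = {"test": a, "branches": {}}
    for v in {x[a] for x, _ in examples}:
        subset = [(x, y) for x, y in examples if x[a] == v]
        tree["branches"][v] = csdt(subset, [b for b in attrs if b != a],
                                   test_costs, err_cost)
    return tree
```

Note the stopping rule: unlike C4.5, the recursion halts not when a node is pure but when no test's cost reduction is positive, which is how cheap tests end up near the root and expensive tests are requested only when they earn their price.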

  18. Outline • Introduction • Cost-sensitive decision trees • Test strategies • Sequential Test • Single Batch Test • Sequential Batch Test • Conclusions and future work

  19. [Tree diagram: test nodes T1, T2, T3, T6 with branches Low/Med, >=36/<36, and a/b/c; leaves labeled 0 or 1] • Three categories of intelligent test strategies • Sequential Test: one test, wait, … then predict • Single Batch Test: one batch of tests, then predict • Sequential Batch Test: batch 1, batch 2, … then predict • Minimizing the total cost of tests and misclassification is not trivial • Our methods: utilize the minimum-cost tree structure

  20. Outline • Introduction • Cost-sensitive decision trees • Test strategies • Sequential Test • Single Batch Test • Sequential Batch Test • Conclusions and future work

  21. Sequential Test • Use tree structure to guide test sequence • “Optimal” because tree is (locally) optimal

  22. Sequential Test [Heart-disease tree diagram: tests are requested one at a time, following the branch of each outcome from the root cp ($1) down to a leaf]

  23. Experimental Comparison • Using 10 datasets from UCI

  24. Comparing Sequential Test • Eager learning: Sequential Test (OST) (ICML’04) • Lazy learning: Lazy Sequential Test (LazyOST) (TKDE’05) • Cost-sensitive Naïve Bayes (CSNB) (ICDM’04)

  25. Outline • Introduction • Cost-sensitive decision trees • Test strategies • Sequential Test • Single Batch Test • Sequential Batch Test • Conclusions and future work

  26. Single Batch Test • Only one batch: not an easy task • If too few tests, important tests are not requested; the prediction is inaccurate; the total cost is high • If too many, some tests are wasted; the total cost is high • The test example may not be classified by a leaf

  27. Single Batch Test [Diagram: a test node i with outcomes 1, 2, 3 leading to reachable nodes j1, j2, j3] • Expected cost reduction: if a test is done, what are the possible outcomes and the resulting cost reduction? • R(·): all reachable unknown nodes and leaves of node i

  28. Single Batch Test • An A*-like search algorithm • Form a candidate list (L) and a batch list (B) • Repeatedly choose the test with maximum positive expected cost reduction from L and add it to B • Update L: add all of the chosen test’s reachable unknown nodes to L • Stop when the maximum expected cost reduction reaches 0 • Efficient with the tree structure

  29. Single Batch Test
L = empty /* list of reachable and unknown attributes */
B = empty /* the batch of tests */
u = the first unknown attribute when classifying a test case
Add u into L
Loop
 For each i ∈ L, calculate E(i) = misc(i) – [c(i) + ]
 E(t) = max E(i) /* t has the maximum cost reduction */
 If E(t) > 0 then add t into B, delete t from L, add r(t) into L
 else exit Loop /* no positive cost reduction */
Until L is empty
Output B as the batch of tests
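The batch-collection loop above can be sketched over a toy tree. This is an illustrative simplification, not the paper's algorithm verbatim: the expected cost reduction E(i) is abbreviated here to (misclassification cost saved) minus (test cost), and the node fields are made-up names.

```python
# Sketch of the A*-like single-batch loop: greedily add the frontier test
# with the largest positive expected cost reduction, then expose its
# reachable unknown children as new candidates.

def single_batch(root):
    """Collect a batch of tests starting from the first unknown attribute."""
    batch, candidates = [], [root]            # candidates plays the role of L
    while candidates:
        best = max(candidates, key=lambda n: n["misc_saved"] - n["cost"])
        if best["misc_saved"] - best["cost"] <= 0:
            break                             # no remaining test pays for itself
        candidates.remove(best)
        batch.append(best["name"])
        # the chosen test's reachable unknown nodes join the candidate list
        candidates.extend(best.get("children", []))
    return batch

toy_tree = {"name": "cp", "misc_saved": 300, "cost": 1, "children": [
    {"name": "sex", "misc_saved": 30, "cost": 1},
    {"name": "thal", "misc_saved": 50, "cost": 102.9},
]}
```

On this toy tree, `single_batch(toy_tree)` selects cp and sex but leaves out thal, whose $102.9 price exceeds the $50 it would be expected to save, mirroring how the heart-disease tree avoids requesting expensive tests speculatively.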

  30. Single Batch Test [Heart-disease tree diagram]

  31. Single Batch Test [Heart-disease tree diagram] cp is unknown. cp has a positive expected cost reduction, so cp is added to the batch, and cp’s reachable unknown nodes are added to the candidate list.

  32. Single Batch Test [Heart-disease tree diagram] From the candidate list, choose the test with maximum positive expected cost reduction, add it to the batch, and update the candidate list. Repeat. After 7 steps, the expected cost reduction is 0.

  33. Single Batch Test [Heart-disease tree diagram] Do all tests in the batch.

  34. Single Batch Test [Heart-disease tree diagram] Make a prediction, possibly at an internal node. Some tests are wasted.

  35. Comparing Single Batch Tests • Naïve Single Batch (NSB) (ICML’04) • Cost-sensitive Naïve Bayes Single Batch (CSNB-SB) (ICDM’04) • Greedy Single Batch (GSB) (TKDE’05) • Single Batch Test (OSB) (TKDE’05)

  36. Outline • Introduction • Cost-sensitive decision trees • Test strategies • Sequential Test • Single Batch Test • Sequential Batch Test • Conclusions and future work

  37. Sequential Batch • Batch 1, batch 2, …, then prediction • Must include the cost of waiting in the test costs • Wait cost of a batch: the maximum wait cost in the batch • Less than the sum • Combines Sequential Test and Single Batch Test • If all wait costs = 0, it becomes Sequential Test • If all wait costs are very large, it becomes Single Batch Test
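The "maximum, not sum" point above is worth a tiny numeric illustration: tests grouped into one batch wait in parallel, so the batch's wait cost is its slowest member, while sequential testing pays every wait in full. The wait times below are made-up for illustration, not from the talk's wait-time table.

```python
# Batch wait cost = max of member wait costs (tests run in parallel);
# sequential wait cost = sum (each test's result is awaited before the next).
# Wait times in hours are illustrative assumptions.

wait_hours = {"ecg": 1, "x_ray": 2, "biopsy": 48}
tests = ["ecg", "x_ray", "biopsy"]

sequential_wait = sum(wait_hours[t] for t in tests)  # wait for each in turn
one_batch_wait = max(wait_hours[t] for t in tests)   # all ordered at once
```

With these numbers the single batch waits 48 hours instead of 51, and the gap grows with every extra test; this is the pull toward batching that Sequential Batch trades off against wasted tests.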

  38. Sequential Batch • The wait cost is derived from the wait time [Table: wait time in hours for each test]

  39. Sequential Batch • Extending Single Batch to include the batch (wait) cost • An additional constraint: the cumulative ROI must increase; otherwise, no more batches

  40. Sequential Batch
Loop
 L = empty /* list of reachable and unknown attributes */
 B = empty /* the batch of tests */
 u = the first unknown attribute when classifying a test case
 Add u into L
 Loop
  For each i ∈ L, calculate E(i) = misc(i) – [c(i) + ]
  E(t) = max E(i) /* t has the maximum cost reduction */
  If E(t) > 0 and ROI increases then add t into B, delete t from L, add r(t) into L
  else exit Loop /* no positive cost reduction */
 Until L is empty
 If B is not empty then
  Output B as the current batch of tests; obtain their values at a cost
  Classify the test example further, until encountering another unknown test
 Else exit the outer Loop

  41. Comparing Sequential Batch Test

  42. Outline • Introduction • Cost-sensitive decision trees • Test strategies • Sequential Test • Single Batch Test • Sequential Batch Test • Conclusions and future work

  43. Future Work • Deal with different test examples differently • Consider more costs: acquiring new examples • If each new example costs $10, how many do I need? • For $10, tell me if this patient has cancer • If a test is not accurate (e.g. 90%), how do we build trees and perform tests (will I do it again)? • From cost-sensitive trees, derive medical policies for expensive/risky or cheap/effective tests

  44. Conclusions • Cost-sensitive decision tree: effective for learning with minimal total cost • Can be used to model learning from data with costs • Design and compare various test strategies • Sequential Test: one test, wait, …: low cost but long wait • Single Batch Test: one batch of tests: quick but higher cost • Sequential Batch Test: batch, wait, batch, …: best tradeoff • Our methods perform better than previous ones • Can be readily applied to real-world diagnoses

  45. References
• C.X. Ling, Q. Yang, J. Wang, and S. Zhang. Decision Trees with Minimal Costs. ICML 2004.
• X. Chai, L. Deng, Q. Yang, and C.X. Ling. Test-Cost Sensitive Naive Bayes Classification. ICDM 2004.
• C.X. Ling, S. Sheng, and Q. Yang. Intelligent Test Strategies for Cost-sensitive Decision Trees. IEEE TKDE, to appear, 2005.
• S. Zhang, Z. Qin, C.X. Ling, and S. Sheng. "Missing is Useful": Missing Values in Cost-sensitive Decision Trees. IEEE TKDE, to appear, 2005.
• P.D. Turney. Types of Cost in Inductive Concept Learning. Workshop on Cost-Sensitive Learning at ICML 2000.
• V.B. Zubek and T. Dietterich. Pruning Improves Heuristic Search for Cost-sensitive Learning. ICML 2002.
• P.D. Turney. Cost-Sensitive Classification: Empirical Evaluation of a Hybrid Genetic Decision Tree Induction Algorithm. JAIR, 2:369-409, 1995.
• D. Lizotte, O. Madani, and R. Greiner. Budgeted Learning of Naïve-Bayes Classifiers. UAI 2003.
