
Treatment Learning: Implementation and Application

Ying Hu, Electrical & Computer Engineering, University of British Columbia


Presentation Transcript


  1. Treatment Learning: Implementation and Application • Ying Hu, Electrical & Computer Engineering, University of British Columbia

  2. Outline • An example • Background Review • TAR2 Treatment Learner (TARZAN: Tim Menzies; TAR2: Ying Hu & Tim Menzies) • TAR3: improved TAR2 (Ying Hu) • Evaluation of treatment learning • Application of Treatment Learning • Conclusion Ying Hu http://www.ece.ubc.ca/~yingh 2

  3. First Impression • Boston Housing Dataset (506 examples, 4 classes) • C4.5's decision tree: (figure) • Treatment learner: 6.7 <= rooms < 9.8 and 12.6 <= parent teacher ratio < 15.9; 0.6 <= nitric oxide < 1.9 and 17.16 <= living standard < 39 • (Figure: class distribution, low to high)
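Each treatment above is a conjunction of range tests on numeric attributes. The sketch below shows one way to represent and apply such a treatment. It assumes half-open ranges `lo <= value < hi`, as written on the slide; the attribute key `ptratio` and the sample rows are hypothetical, not taken from the Boston Housing data.

```python
from typing import Dict, List, Tuple

# A treatment (Rx): attribute name -> half-open range [lo, hi).
Treatment = Dict[str, Tuple[float, float]]
Row = Dict[str, float]


def satisfies(row: Row, rx: Treatment) -> bool:
    """True if the row lies inside every range of the treatment (a conjunction)."""
    return all(lo <= row[attr] < hi for attr, (lo, hi) in rx.items())


def apply_treatment(rows: List[Row], rx: Treatment) -> List[Row]:
    """Keep only the rows selected by the treatment."""
    return [row for row in rows if satisfies(row, rx)]


# First treatment from the slide; "ptratio" is a hypothetical key for parent teacher ratio.
rx_rooms: Treatment = {"rooms": (6.7, 9.8), "ptratio": (12.6, 15.9)}

# Hypothetical rows, for illustration only.
rows: List[Row] = [
    {"rooms": 7.1, "ptratio": 14.7},  # inside both ranges
    {"rooms": 7.1, "ptratio": 20.2},  # parent teacher ratio out of range
    {"rooms": 5.9, "ptratio": 13.0},  # rooms out of range
]

print(len(apply_treatment(rows, rx_rooms)))  # 1
```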

  4. Review: Background • What is KDD? • KDD = Knowledge Discovery in Databases [fayyad96] • Data mining: one step in the KDD process • Machine learning: learning algorithms • Common data mining tasks • Classification • Decision tree induction (C4.5) [quinlan86] • Nearest neighbors [cover67] • Neural networks [rosenblatt62] • Naive Bayes classifier [duda73] • Association rule mining • APRIORI algorithm [agrawal93] • Variants of APRIORI

  5. Treatment Learning: Definition • Input: classified dataset • Assume: classes are ordered • Output: Rx = conjunction of attribute-value pairs • Size of Rx = number of pairs in the Rx • confidence(Rx w.r.t. Class) = P(Class | Rx) • Goal: find Rx that have different levels of confidence across classes • Evaluate Rx: lift • Visualization form of output
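The definitions of confidence and lift can be sketched in a few lines. This is a minimal illustration under stated assumptions: an Rx is a list of exact (attribute, value) matches, and lift is taken to be the mean class score of the rows selected by the Rx divided by the mean class score of the whole dataset, with ordered classes scored as `2 ** rank`. That scoring scheme is an assumption for illustration; the slide does not give TAR2's exact weights.

```python
from typing import Dict, List, Sequence, Tuple

Item = Tuple[str, str]  # (attribute, value)
Row = Dict[str, str]


def selects(row: Row, rx: Sequence[Item]) -> bool:
    """An Rx is a conjunction: every (attribute, value) pair must hold."""
    return all(row.get(attr) == val for attr, val in rx)


def confidence(rows: List[Row], labels: List[str], rx: Sequence[Item], cls: str) -> float:
    """P(cls | Rx): share of the rows selected by rx that belong to cls."""
    picked = [lab for row, lab in zip(rows, labels) if selects(row, rx)]
    return picked.count(cls) / len(picked) if picked else 0.0


def worth(labels: List[str], class_order: List[str]) -> float:
    """Mean class score; ASSUMES score = 2 ** rank, classes listed worst to best."""
    score = {c: 2 ** i for i, c in enumerate(class_order)}
    return sum(score[lab] for lab in labels) / len(labels)


def lift(rows: List[Row], labels: List[str], rx: Sequence[Item], class_order: List[str]) -> float:
    """Worth of the rows selected by rx relative to the worth of all rows."""
    treated = [lab for row, lab in zip(rows, labels) if selects(row, rx)]
    if not treated:
        return 0.0
    return worth(treated, class_order) / worth(labels, class_order)


# Hypothetical toy data.
rows = [{"a": "x", "b": "p"}, {"a": "x", "b": "q"}, {"a": "y", "b": "p"}, {"a": "y", "b": "q"}]
labels = ["high", "high", "low", "low"]
order = ["low", "high"]

print(confidence(rows, labels, [("a", "x")], "high"))  # 1.0
print(lift(rows, labels, [("b", "p")], order))         # 1.0
```

An Rx that only reproduces the baseline class mix (here `b = p`) gets lift 1.0; one that shifts rows toward the better class (here `a = x`) gets lift above 1.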

  6. Motivation: Narrow Funnel Effect • When is enough learning enough? • Using < 50% of the attributes: accuracy decreases by 3-5% [shavlik91] • A 1-level decision tree is comparable to C4 [Holte93] • Data engineering: ignoring 81% of the features results in a 2% increase in accuracy [kohavi97] • Scheduling: random sampling outperforms complete (depth-first) search [crawford94] • Narrow funnel effect • Control variables vs. derived variables • Treatment learning: finding funnel variables

  7. TAR2: The Algorithm • Search + attribute utility estimation • Estimation heuristic: Confidence1 • Search: depth-first search • Search space: confidence1 > threshold • Discretization: equal width interval binning • Reporting Rx • Lift(Rx) > threshold • Software package and online distribution
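The TAR2 ingredients listed above (equal-width binning, a per-item score that prunes the search space, depth-first search, and a lift threshold for reporting) can be combined into a small sketch. It is not TAR2's implementation: the slide does not define confidence1, so this sketch substitutes the lift of each single item as a stand-in, and the function name `tar2_like_search`, the class scores and the thresholds are illustrative assumptions.

```python
from typing import Dict, List, Tuple

Item = Tuple[str, int]  # (attribute, bin index)
Row = Dict[str, int]


def equal_width_bins(values: List[float], n_bins: int) -> List[int]:
    """Equal-width interval binning: split [min, max] into n_bins equal intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # a constant column gets width 1 (all in bin 0)
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def lift(rows: List[Row], labels: List[str], rx: List[Item], scores: Dict[str, float]) -> float:
    """Mean class score of the rows selected by rx, relative to all rows."""
    picked = [scores[lab] for row, lab in zip(rows, labels)
              if all(row[a] == b for a, b in rx)]
    if not picked:
        return 0.0
    base = sum(scores[lab] for lab in labels) / len(labels)
    return (sum(picked) / len(picked)) / base


def tar2_like_search(rows, labels, scores, max_size=3, item_threshold=1.0, lift_threshold=1.0):
    """Depth-first search over conjunctions of items that pass the per-item threshold."""
    all_items = sorted({(a, b) for row in rows for a, b in row.items()})
    # ASSUMPTION: single-item lift stands in for TAR2's confidence1 heuristic.
    candidates = [it for it in all_items if lift(rows, labels, [it], scores) > item_threshold]
    found: List[List[Item]] = []

    def dfs(start: int, rx: List[Item]) -> None:
        if rx and lift(rows, labels, rx, scores) > lift_threshold:
            found.append(list(rx))  # report: lift(Rx) > threshold
        if len(rx) == max_size:
            return
        for i in range(start, len(candidates)):
            attr = candidates[i][0]
            if any(a == attr for a, _ in rx):
                continue  # at most one range per attribute
            rx.append(candidates[i])
            dfs(i + 1, rx)
            rx.pop()

    dfs(0, [])
    return sorted(found, key=lambda rx: -lift(rows, labels, rx, scores))


xs = equal_width_bins([0.0, 1.0, 2.0, 3.0], n_bins=2)  # [0, 0, 1, 1]
ys = equal_width_bins([5.0, 7.0, 5.0, 7.0], n_bins=2)  # [0, 1, 0, 1]
rows = [{"x": bx, "y": by} for bx, by in zip(xs, ys)]
labels = ["low", "low", "high", "high"]
print(tar2_like_search(rows, labels, scores={"low": 1, "high": 2}))  # [[('x', 1)]]
```

Pruning candidates before the search keeps the depth-first enumeration small; the price is that an item which is only useful in combination with others is never considered.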

  8. The Pilot Case Study • Requirement optimization • Goal: an optimal set of mitigations, chosen in a cost-effective manner • (Figure: Requirements, Risks, Mitigations, Cost and Benefit, linked by the relations relates, incur, reduce and achieve) • Iterative learning cycle

  9. The Pilot Study (continued) • Cost-benefit distribution (30/99 mitigations) • Compared to simulated annealing

  10. Problem of TAR2 • Runtime vs. Rx size • To generate Rx of size r: • To generate Rx from size [1..N]
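The slide's runtime formulas did not survive extraction. As an independent back-of-the-envelope count (an assumption, not the slide's formula): if each of A attributes is discretized into B bins and an Rx uses at most one range per attribute, there are C(A, r) * B**r candidate Rx of size r, and the sum of that over r = 1..N for all sizes up to N. The sizes A = 20 and B = 5 below are arbitrary.

```python
from math import comb


def rx_count(n_attrs: int, n_bins: int, size: int) -> int:
    """Candidate Rx of one size: choose `size` attributes, then one bin for each."""
    return comb(n_attrs, size) * n_bins ** size


def rx_count_up_to(n_attrs: int, n_bins: int, max_size: int) -> int:
    """Candidate Rx of every size from 1 to max_size."""
    return sum(rx_count(n_attrs, n_bins, r) for r in range(1, max_size + 1))


for r in range(1, 6):
    print(r, rx_count(20, 5, r))
print(rx_count_up_to(20, 5, 5))  # 51625475
```

Under these assumptions a complete search grows combinatorially with the Rx size, which is the pressure that motivates replacing exhaustive search with sampling.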

  11. TAR3: the improvement • Random sampling • Key idea: • Treat the confidence1 distribution as a probability distribution • Sample Rx from the confidence1 distribution • Steps: • Sort the items (a_i) in increasing order of confidence1 value • Compute the CDF of each a_i • Sample a uniform value u in [0..1] • The sample is the least a_i whose CDF > u • Repeat until an Rx of the given size is obtained
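The sampling steps above amount to inverse-CDF sampling, where each item is drawn with probability proportional to its confidence1 score. A minimal sketch, assuming confidence1 scores are already computed and non-negative (items scoring 0 are never drawn); the function name `sample_rx`, the item names and the scores are hypothetical, and this sketch only enforces that the items in one Rx are distinct.

```python
import bisect
import random
from typing import Dict, List, Tuple

Item = Tuple[str, str]  # (attribute, range label)


def sample_rx(conf1: Dict[Item, float], size: int, rng: random.Random) -> List[Item]:
    """Draw an Rx of `size` distinct items, each in proportion to its confidence1."""
    # Step 1: order items by increasing confidence1 (only positive scores can be drawn).
    items = sorted((it for it, c in conf1.items() if c > 0), key=lambda it: conf1[it])
    if size > len(items):
        raise ValueError("not enough items with positive confidence1")
    # Step 2: cumulative distribution over the ordered items.
    total = sum(conf1[it] for it in items)
    cdf, running = [], 0.0
    for it in items:
        running += conf1[it]
        cdf.append(running / total)
    rx: List[Item] = []
    while len(rx) < size:
        u = rng.random()                   # Step 3: uniform u in [0, 1)
        idx = bisect.bisect_right(cdf, u)  # Step 4: least item whose CDF > u
        idx = min(idx, len(items) - 1)     # guard against rounding in the last CDF value
        if items[idx] not in rx:           # Step 5: repeat until Rx has `size` items
            rx.append(items[idx])
    return rx


conf1 = {("rooms", "high"): 5.0, ("ptratio", "low"): 3.0, ("nox", "low"): 1.0, ("age", "old"): 0.0}
print(sample_rx(conf1, size=2, rng=random.Random(0)))
```

Passing an explicit `random.Random` keeps runs reproducible; drawing many Rx this way spends most effort on high-confidence1 items without enumerating every combination.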

  12. Comparison of Efficiency • Runtime vs. data size • Runtime vs. TAR2 • Runtime vs. Rx size

  13. Comparison of Results • pilot2 dataset (58 × 30k) • Mean and STD in each round • 10 UCI domains: identical best Rx • Final Rx: TAR2 = 19, TAR3 = 20

  14. External Evaluation • FSS framework: All attributes (10 UCI datasets) → learning (C4.5, Naive Bayes); All attributes → feature subset selector (TAR2less) → some attributes → learning (C4.5, Naive Bayes) → compare accuracy

  15. The Results • Number of attributes • Accuracy using Naïve Bayes • Accuracy using C4.5 (avg. decrease 0.9%) (avg. increase 0.8%)

  16. Compare to other FSS methods • Number of attributes selected (Naive Bayes) • Number of attributes selected (C4.5) • 17/20: fewest attributes selected • Further evidence for funnels

  17. Applications of Treatment Learning • Downloading site: http://www.ece.ubc.ca/~yingh/ • Collaborators: JPL, WV, Portland, Miami • Application examples • Pair programming vs. conventional programming • Identify software metrics that are superior error indicators • Identify attributes that make FSMs easy to test • Find the best software inspection policy for a particular software development organization • Other applications: • 1 journal, 4 conference, 6 workshop papers

  18. Main Contributions • New learning approach • A novel mining algorithm • Algorithm optimization • Complete package and online distribution • Narrow funnel effect • Treatment learner as FSS • Application on various research domains

  19. ====================== • Some notes follow

  20. Rx Definition Example • Input example: classified dataset • Output example: Rx = conjunction of attribute-value pairs • confidence(Rx w.r.t. C) = P(C | Rx)

  21. TAR2 in practice • Domains containing narrow funnels • A tail in the confidence1 distribution • A small number of variables with disproportionately large confidence1 values • Satisfactory Rx of small size (< 6)

  22. Background: Classification • 2-step procedure • The learning phase • The testing phase • Strategies employed • Eager learning • Decision tree induction (e.g. C4.5) • Neural Networks (e.g. Backpropagation) • Lazy learning • Nearest neighbor classifiers (e.g. K-nearest neighbor classifier)

  23. Background: Association Rule • Possible rule: B => C,E [support = 2%, confidence = 80%], where support(X => Y) = P(X ∧ Y) and confidence(X => Y) = P(Y | X) • Representative algorithms • APRIORI: Apriori property of large itemsets • Max-Miner: more concise representation of the discovered rules • Different pruning strategies
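Support and confidence of a rule such as B => C,E can be computed directly from transaction counts. In this sketch, support(X => Y) is the fraction of transactions containing every item of X and Y (the standard definition from Agrawal et al.), and confidence is that count divided by the number of transactions containing X. The transactions are hypothetical.

```python
from typing import List, Set, Tuple


def count(itemset: Set[str], transactions: List[Set[str]]) -> int:
    """Number of transactions that contain every item of itemset."""
    return sum(itemset <= t for t in transactions)


def rule_stats(lhs: Set[str], rhs: Set[str], transactions: List[Set[str]]) -> Tuple[float, float]:
    """(support, confidence) of the rule lhs => rhs."""
    n_both = count(lhs | rhs, transactions)
    n_lhs = count(lhs, transactions)
    support = n_both / len(transactions)
    confidence = n_both / n_lhs if n_lhs else 0.0
    return support, confidence


transactions = [
    {"A", "B", "C", "E"},
    {"B", "C", "E"},
    {"B", "D"},
    {"A", "C"},
    {"B", "C", "E", "D"},
]

print(rule_stats({"B"}, {"C", "E"}, transactions))  # (0.6, 0.75)

# Apriori property: an itemset is never less frequent than any of its supersets.
assert count({"B", "C"}, transactions) >= count({"B", "C", "E"}, transactions)
```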

  24. Background: Extension • CBA classifier • CBA = Classification Based on Associations • X => Y, Y = class label • More accurate than C4.5 (16/26) • JEP classifier • JEP = Jumping Emerging Patterns • Support(X w.r.t. D1) = 0, Support(X w.r.t. D2) > 0 • Model: collection of JEPs • Classify: maximum collective impact • More accurate than both C4.5 & CBA (15/25)
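The JEP condition above can be checked by brute force on small data. This sketch enumerates itemsets (up to a hypothetical size limit `max_size`) that occur in D2, keeps those with zero support in D1, and then keeps only the minimal ones (no proper subset is also a JEP), since those are the most general patterns. Practical JEP miners use far more efficient border-based methods; this is only an illustration on hypothetical data.

```python
from itertools import combinations
from typing import FrozenSet, List, Set


def support(itemset: FrozenSet[str], data: List[Set[str]]) -> float:
    """Fraction of transactions in data that contain every item of itemset."""
    return sum(itemset <= t for t in data) / len(data)


def jumping_emerging_patterns(d1: List[Set[str]], d2: List[Set[str]],
                              max_size: int = 2) -> Set[FrozenSet[str]]:
    """Itemsets of up to max_size items with Support(X, D1) = 0 and Support(X, D2) > 0."""
    candidates: Set[FrozenSet[str]] = set()
    for t in d2:
        for k in range(1, max_size + 1):
            candidates.update(frozenset(c) for c in combinations(sorted(t), k))
    return {x for x in candidates if support(x, d1) == 0 and support(x, d2) > 0}


def minimal(patterns: Set[FrozenSet[str]]) -> Set[FrozenSet[str]]:
    """Keep patterns that have no proper subset among the patterns."""
    return {x for x in patterns if not any(y < x for y in patterns)}


d1 = [{"a", "b"}, {"b", "c"}]
d2 = [{"a", "c"}, {"a", "b", "d"}]
jeps = jumping_emerging_patterns(d1, d2)
print(sorted(sorted(x) for x in minimal(jeps)))  # [['a', 'c'], ['d']]
```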

  25. Background: Standard FSS Method • Information Gain attribute ranking • Relief • Principal Component Analysis (PCA) • Correlation based feature selection • Consistency based subset evaluation • Wrapper subset evaluation

  26. Comparison • Relation to classification • Class boundary / class density • Class weighting • Relation to association rule mining • Multiple classes / no class • Confidence-based pruning • Relation to change-detection algorithms • support: |P(X|y=c1) - P(X|y=c2)| • confidence: |P(y=c1|X) - P(y=c2|X)| • Bayes' rule
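The two change-detection measures above, and the Bayes' rule link between them, can be checked on class counts. A sketch assuming exactly two classes and a pattern X that occurs at least once; the function names and counts are hypothetical. It shows how the same counts can give a modest support difference but a larger confidence difference, because confidence also depends on the class priors P(y=c).

```python
from fractions import Fraction
from typing import Tuple


def change_measures(n_x_c1: int, n_c1: int, n_x_c2: int, n_c2: int) -> Tuple[Fraction, Fraction]:
    """Support and confidence differences of a pattern X between classes c1 and c2.

    n_c: rows of class c; n_x_c: rows of class c that contain X (assumes n_x_c1 + n_x_c2 > 0).
    """
    support_diff = abs(Fraction(n_x_c1, n_c1) - Fraction(n_x_c2, n_c2))  # |P(X|c1) - P(X|c2)|
    n_x = n_x_c1 + n_x_c2
    confidence_diff = abs(Fraction(n_x_c1, n_x) - Fraction(n_x_c2, n_x))  # |P(c1|X) - P(c2|X)|
    return support_diff, confidence_diff


def posterior_c1(n_x_c1: int, n_c1: int, n_x_c2: int, n_c2: int) -> Fraction:
    """Bayes' rule: P(c1 | X) = P(X | c1) * P(c1) / P(X), with exact fractions."""
    n = n_c1 + n_c2
    return Fraction(n_x_c1, n_c1) * Fraction(n_c1, n) / Fraction(n_x_c1 + n_x_c2, n)


print(change_measures(30, 100, 10, 300))  # (Fraction(4, 15), Fraction(1, 2))
print(posterior_c1(30, 100, 10, 300))     # 3/4
```

The Bayes'-rule value 3/4 equals the direct count 30/40 of c1 rows among rows containing X, which is the posterior used inside `change_measures`.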

  27. Confidence Property • Universal-existential upward closure • R1: Age.young -> Salary.low • R2: Age.young, Gender.m -> Salary.low • R3: Age.young, Gender.f -> Salary.low • Long rules tend to have high confidence • Large Rx tend to have high lift values
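The closure property behind the first rule can be checked numerically: the confidence of Age.young -> Salary.low is the average of the confidences of its Gender.m and Gender.f specializations, weighted by P(Gender | Age.young), so at least one specialization is at least as confident as the general rule. The counts below are hypothetical.

```python
from fractions import Fraction
from typing import Dict, Tuple

# Hypothetical Age.young counts: gender -> (rows, rows with Salary.low).
young: Dict[str, Tuple[int, int]] = {"m": (40, 30), "f": (60, 33)}

total_rows = sum(n for n, _ in young.values())
conf_general = Fraction(sum(h for _, h in young.values()), total_rows)  # Age.young -> Salary.low
conf_special = {g: Fraction(h, n) for g, (n, h) in young.items()}      # add Gender.m / Gender.f

# conf(general rule) = sum over genders of P(gender | young) * conf(specialized rule).
weighted = sum(Fraction(n, total_rows) * conf_special[g] for g, (n, _) in young.items())
assert weighted == conf_general
# Hence at least one specialization is at least as confident as the general rule.
assert max(conf_special.values()) >= conf_general

print(conf_general, conf_special["m"], conf_special["f"])  # 63/100 3/4 11/20
```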

  28. TAR3: Usability • Usability: more user-friendly • Intuitive default settings
