Comparing the Parallel Automatic Composition of Inductive Applications with Stacking Methods


Presentation Transcript


  1. Comparing the Parallel Automatic Composition of Inductive Applications with Stacking Methods. Hidenao Abe & Takahira Yamaguchi, Shizuoka University, Japan. {hidenao,yamaguti}@ks.cs.inf.shizuoka.ac.jp

  2. Contents
     • Introduction
     • Constructive meta-learning based on method repositories
     • Case study with common data sets
     • Conclusion
     Parallel and Distributed Computing for Machine Learning 2003

  3. Overview of meta-learning scheme
     [Diagram labels: Data Set, Learning Algorithms, Meta-learning algorithm, Meta Knowledge, Meta-Learning Executor]
     Goal: a better result on the given data set than any single learning algorithm achieves.

  4. Selective meta-learning scheme
     • Integrating base-level classifiers learned from different training data sets, generated by
        • bootstrap sampling (bagging)
        • weighting misclassified instances (boosting)
     • Integrating base-level classifiers learned with different algorithms, using
        • simple voting (voting)
        • a meta-classifier built from a meta data set and a meta-level learning algorithm (stacking, cascading)
     These schemes do not work well when no base-level learning algorithm works well on the given data set, because they do not decompose the base-level learning algorithms.
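
The simplest of the integration schemes listed above, simple voting, can be sketched in a few lines. This is only an illustration: the three threshold classifiers and the helper names `majority_vote` and `vote_ensemble` are hypothetical and do not come from the presentation.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(predictions).most_common(1)[0][0]

def vote_ensemble(classifiers, instance):
    """Ask every base-level classifier for a label and combine them by voting."""
    return majority_vote([clf(instance) for clf in classifiers])

# Three hypothetical base-level classifiers over a two-feature instance.
classifiers = [
    lambda x: int(x[0] > 0.5),
    lambda x: int(x[1] > 0.5),
    lambda x: int(sum(x) > 1.0),
]

result = vote_ensemble(classifiers, (0.9, 0.2))  # votes are 1, 0, 1
```

Voting only selects among the outputs of whole algorithms, which is why it cannot help when every base-level algorithm fits the data poorly.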

  5. Contents
     • Introduction
     • Constructive meta-learning based on method repositories
        • Basic idea of our constructive meta-learning
        • An implementation of the constructive meta-learning called CAMLET
        • Parallel execution and search for CAMLET
     • Case study with common data sets
     • Conclusion

  6. Basic Idea of our Constructive Meta-Learning
     [Diagram labels: Analysis of Representative Inductive Learning Algorithms (C4.5, CS, VS, AQ15); Decomposition & Organization; Search & Composition]

  7. Basic Idea of Constructive Meta-Learning
     [Diagram labels: Analysis of Representative Inductive Learning Algorithms (CS, AQ15); Decomposition & Organization; Search & Composition]

  8. Basic Idea of Constructive Meta-Learning
     [Diagram labels: Analysis of Representative Inductive Learning Algorithms; Decomposition & Organization (organizing inductive learning methods, control structures and treated objects); Search & Composition (automatic composition of inductive applications)]

  9. Analysis of Representative Inductive Learning Algorithms
     We have analyzed the following 8 learning algorithms:
     • Version Space
     • AQ15
     • ID3
     • C4.5
     • Boosted C4.5
     • Bagged C4.5
     • Neural Network with back propagation
     • Classifier Systems
     We have identified 22 specific inductive learning methods.

  10. Inductive Learning Method Repository

  11. Data Type Hierarchy
      Organization of input/output/reference data types for inductive learning methods
      Objects
        classifier-set
          If-Then rules
          tree (Decision Tree)
          network (Neural Net)
        data-set
          training data set
          validation data set
          test data set
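
One way to picture the hierarchy above is as a small class tree, so that a method declared to take a classifier set accepts a tree, a rule set or a network. This is a sketch under assumptions: the class names and the `accepts` helper are hypothetical, and the presentation does not say how CAMLET represents these types internally.

```python
class TreatedObject:
    """Root of the hierarchy ('Objects' on the slide)."""

class ClassifierSet(TreatedObject):
    pass

class IfThenRules(ClassifierSet):
    pass

class Tree(ClassifierSet):
    """Decision tree."""

class Network(ClassifierSet):
    """Neural net."""

class DataSet(TreatedObject):
    pass

class TrainingDataSet(DataSet):
    pass

class ValidationDataSet(DataSet):
    pass

class TestDataSet(DataSet):
    pass

def accepts(declared_type, value):
    """True if `value` may be passed where `declared_type` is expected."""
    return isinstance(value, declared_type)

# An evaluation method declared on ClassifierSet accepts any concrete classifier.
ok = accepts(ClassifierSet, Tree())
```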

  12. Identifying Inductive Learning Methods and Control Structures
      [Diagram labels: Generating training & validation data sets; Generating a classifier set; Evaluating a classifier set; Modifying a classifier set (shown twice); Modifying training and validation data sets; Evaluating classifier sets on the test data set]

  13. CAMLET: a Computer Aided Machine Learning Engineering Tool
      [Diagram: the user supplies data sets and a goal accuracy. CAMLET holds the method repository, data type hierarchy and control structures. Steps: Construction, Instantiation, Compile, Go & Test. Decision "Reached the goal accuracy?": Yes leads to the output (inductive application, etc.); No leads to Refinement.]

  14. CAMLET with a parallel environment
      [Diagram: the user supplies data sets and a goal accuracy. CAMLET holds the method repository, data type hierarchy and control structures. Composition Level: Construction, Refinement. Execution Level: Instantiation, Compile, Go & Test. Specifications and data sets pass to the Execution Level; results pass back. Decision "Reached the goal accuracy?": Yes leads to the output (inductive application, etc.); No leads to Refinement.]

  15. Parallel Executions of Inductive Applications
      • The Composition Level requires one process: initializing, constructing specs, sending specs, then receiving a result, refining the spec and sending the refined one.
      • The Execution Level requires one or more processing elements: waiting, receiving, executing, sending.
      [Timeline diagram. Legend: R: receiving, Ex: executing, S: sending]
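
The division of labor described above resembles a master/worker loop: one composition-level process sends specifications, and each time a result comes back it refines that spec and sends the refined one. The sketch below uses Python threads purely for illustration. The integer "spec", the toy scoring in `execute`, and `refine` are hypothetical stand-ins, and the presentation does not state which parallel runtime CAMLET uses.

```python
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

def execute(spec):
    """Execution level (hypothetical): run one spec and return (spec, accuracy).
    A toy score stands in for instantiating, compiling and testing."""
    return spec, 1.0 - abs(spec - 42) / 100.0

def refine(spec):
    """Composition level (hypothetical): derive a new spec from an executed one."""
    return spec + 1

def search(initial_specs, goal_accuracy, max_executions, n_workers):
    best = (None, float("-inf"))
    sent = 0
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        pending = set()
        for spec in initial_specs[:max_executions]:
            pending.add(pool.submit(execute, spec))  # sending specs
            sent += 1
        while pending:
            # Receive whichever results are ready; the others keep executing.
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                spec, accuracy = future.result()
                if accuracy > best[1]:
                    best = (spec, accuracy)
                if best[1] < goal_accuracy and sent < max_executions:
                    pending.add(pool.submit(execute, refine(spec)))  # refine & send
                    sent += 1
    return best, sent

best, sent = search([30, 35, 40], goal_accuracy=1.0, max_executions=100, n_workers=10)
```

Because the composition level reacts to each result as it arrives, a slow execution does not block the others, but it can still occupy a worker for a long time near the end of the search.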

  16. Refinement of Inductive Applications with GA
      • Transform executed specifications into chromosomes
      • Select parents with the tournament method
      • Cross over the parents and mutate one of their children
      • Transform the children's chromosomes back into specifications and execute them
      * If CAMLET can execute more than two inductive applications (I.A.) at the same time, some slow I.A. will be added to generation t+2 or later.
      [Diagram labels: Executed Specs & Their Results; t Generation; Selection; Parents; Crossover and Mutation; Children; execute & add; t+1 Generation; X Generation]
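
The refinement cycle above can be illustrated with a minimal genetic-algorithm step. The encoding is an assumption: a chromosome is taken to be a list of method indices, one per slot of a control structure, and `fitness` stands in for the accuracy measured when a specification was executed. None of the function names come from CAMLET.

```python
import random

def tournament(population, fitness, size, rng):
    """Draw `size` chromosomes at random and keep the fittest."""
    return max(rng.sample(population, size), key=fitness)

def crossover(a, b, rng):
    """One-point crossover producing two children."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:], b[:point] + a[point:]

def mutate(chromosome, n_choices, rng):
    """Replace one randomly chosen gene with a random method index."""
    child = list(chromosome)
    child[rng.randrange(len(child))] = rng.randrange(n_choices)
    return child

def next_children(population, fitness, n_choices, rng, tournament_size=2):
    """Select two parents, cross them over, and mutate one of the children."""
    p1 = tournament(population, fitness, tournament_size, rng)
    p2 = tournament(population, fitness, tournament_size, rng)
    c1, c2 = crossover(p1, p2, rng)
    return [c1, mutate(c2, n_choices, rng)]

rng = random.Random(0)
population = [[0, 1, 2, 0], [1, 1, 0, 2], [2, 0, 1, 1], [0, 2, 2, 1]]
children = next_children(population, fitness=sum, n_choices=3, rng=rng)
```

In a parallel setting the children would be converted back to specifications and sent for execution, with their results joining a later generation once they return.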

  17. CAMLET on a parallel GA refinement
      [Diagram: same architecture as slide 14. Composition Level: Construction, Refinement. Execution Level: Instantiation, Compile, Go & Test. Specifications and data sets pass to the Execution Level; results pass back. Decision "Reached the goal accuracy?": Yes leads to the output (inductive application, etc.); No leads to Refinement.]

  18. Contents
      • Introduction
      • Constructive meta-learning based on method repositories
      • Case study with common data sets
         • Accuracy comparison with common data sets
         • Evaluation of parallel efficiencies in CAMLET
      • Conclusion

  19. Setup of the accuracy comparison with common data sets
      • We used two kinds of common data sets:
         • 10 Statlog data sets
         • 32 UCI data sets (distributed with WEKA)
      • We gave CAMLET a goal accuracy for each data set as the criterion.
      • CAMLET output just one specification, that of the best inductive application for each data set, together with its performance.
      • CAMLET searched about 6,000 inductive applications for the best one, executing up to one hundred of them.
      • 10 "Execution Level" PEs were used for each data set.
      • The stacking methods are those implemented in the WEKA data mining suite.

  20. Statlog common data sets
      To generate the pairs of data sets for 10-fold cross validation, we used SplitDatasetFilter as implemented in WEKA.

  21. Setup of stacking methods for the Statlog common data sets
      Meta-level learning algorithms:
      • J4.8 (with pruning, C=0.25)
      • Naïve Bayes
      • Classification with Linear Regression (with all features)
      Base-level learning algorithms:
      • J4.8 (with pruning, C=0.25)
      • Naïve Bayes
      • Part (with pruning, C=0.25)
      • IBk (k=5)
      • Bagged J4.8 (sampling rate=55%, 5 iterations)
      • Bagged J4.8 (sampling rate=55%, 10 iterations)
      • Boosted J4.8 (5 iterations)
      • Boosted J4.8 (10 iterations)

  22. Evaluation of accuracies on the Statlog common data sets
      Inductive applications composed by CAMLET performed as well as stacking with Classification via Linear Regression.

  23. Setup of stacking methods for the UCI common data sets
      Meta-level learning algorithms:
      • J4.8 (with pruning, C=0.25)
      • Naïve Bayes
      • Linear Regression (with all features)
      Base-level learning algorithms:
      • J4.8 (with pruning, C=0.25)
      • Naïve Bayes
      • Part (with pruning, C=0.25)
      • IBk (k=5)
      • Bagged J4.8 (sampling rate=55%, 10 iterations)
      • Boosted J4.8 (10 iterations)

  24. Evaluation of accuracies on the UCI common data sets

  25. Analysis of parallel efficiency of CAMLET
      • We analyzed the parallel efficiency of the executions in the case study with the Statlog common data sets.
      • We used 10 processing elements to execute the inductive applications.
      • Parallel efficiency is calculated by the following formula: [formula not preserved], where 1 ≤ x < 100, n: number of folds, e: execution time of each PE.

  26. Parallel efficiency of the case study with Statlog data sets
      [Figure panels: Parallel efficiencies of 10-fold CV data sets; Parallel efficiencies of training-test data sets]

  27. Why was the parallel efficiency of the "satimage" data set so low?
      • CAMLET could not find a satisfactory inductive application for this data set within 100 executions of inductive applications.
      • At the end of the search, inductive applications with large computational cost badly affected the parallel efficiency.

  28. Contents
      • Introduction
      • Constructive meta-learning based on method repositories
      • Case study with common data sets
      • Conclusion

  29. Conclusion
      • CAMLET has been implemented as a tool for the "Constructive Meta-Learning" scheme based on method repositories.
      • CAMLET shows significant performance as a meta-learning scheme.
      • We are extending the method repository to construct data mining applications covering the whole data mining process.
      • CAMLET achieved a parallel efficiency of more than 5.93 times.
      • It should be improved
         • with methods that equalize execution times, such as load balancing and/or a three-layered architecture.
         • with a more efficient search method that finds a satisfactory inductive application faster.

  30. Thank you!

  31. Stacking: training and prediction
      • Training phase (repeated with CV): the training set is split into an internal training set and an internal test set; base-level classifiers are learned on the internal training set; their outputs on the internal test set are transformed into the meta-level training set; meta-level learning then produces the meta-level classifier.
      • Prediction phase: base-level classifiers are learned on the training set; the test set is transformed into the meta-level test set; the meta-level classifier produces the prediction results.

  32. How to transform base-level attributes to meta-level attributes
      [Diagram labels: Learning with Algorithm 1; Classifier by Algorithm 1]
      1. Get the prediction probability for each class value with each classifier.
      2. Add the base-level class value of each base-level instance.
      With A algorithms and C class values, the number of meta-level attributes is A × C.
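
The transformation above, combined with the cross-validated training phase of stacking, can be sketched as follows. Everything concrete here is an assumption made for illustration: the two toy base learners (`learn_prior`, `learn_1nn`), the single numeric feature, the interleaved fold assignment, and the two class values. With A = 2 learners and C = 2 classes, each meta-level instance gets A × C = 4 attributes and keeps its base-level class value as the label.

```python
from collections import Counter

CLASSES = [0, 1]  # assumed class values (C = 2)

def learn_prior(X, y):
    """Hypothetical base learner 1: predicts class frequencies, ignoring x."""
    counts = Counter(y)
    probs = [counts[c] / len(y) for c in CLASSES]
    return lambda x: probs

def learn_1nn(X, y):
    """Hypothetical base learner 2: 1-nearest neighbour on one numeric feature."""
    def predict(x):
        nearest = min(range(len(X)), key=lambda i: abs(X[i] - x))
        return [1.0 if y[nearest] == c else 0.0 for c in CLASSES]
    return predict

def folds(n, k):
    """Yield (train_indices, test_indices) for k interleaved folds."""
    for f in range(k):
        yield ([i for i in range(n) if i % k != f],
               [i for i in range(n) if i % k == f])

def meta_training_set(X, y, learners, k):
    """Build meta-level instances from out-of-fold class probabilities."""
    meta_X = [None] * len(X)
    for train, test in folds(len(X), k):
        models = [learn([X[i] for i in train], [y[i] for i in train])
                  for learn in learners]
        for i in test:
            # Step 1: concatenate each classifier's probability per class value.
            meta_X[i] = [p for model in models for p in model(X[i])]
    # Step 2: the base-level class value becomes the meta-level label.
    return meta_X, list(y)

X = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
y = [0, 0, 0, 1, 1, 1]
meta_X, meta_y = meta_training_set(X, y, [learn_prior, learn_1nn], k=3)
```

A meta-level learning algorithm would then be trained on `meta_X` and `meta_y`; that step is omitted here.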

  33. Meta-level classifiers (Ex.)
      [Diagram: example decision tree over meta-level attributes. Node tests: prediction probability of "0" with the classifier by algorithm 1; prediction probability of "1" with the classifier by algorithm 2; prediction probability of "1" with the classifier by algorithm 1; prediction probability of "0" with the classifier by algorithm 3. Branch thresholds: > 0.1 / ≤ 0.1; > 0.95 / ≤ 0.95; ≤ 0.9 / > 0.9. Leaves: Prediction "1", Prediction "1", Prediction "0", ...]

  34. Accuracies of 8 base-level learning algorithms on the Statlog data sets
      [Figure panels: CAMLET base-level algorithms; Stacking base-level algorithms. Axis: Accuracy (%)]

  35. C4.5 Decision Tree without pruning (reproduction of CAMLET)
      Steps:
      1. Select just one control structure
      2. Fill each method with specific methods
      3. Instantiate this spec.
      4. Compile the spec.
      5. Execute its executable code
      6. Refine the spec. (if needed)
      Composed specification:
      • Generating training and validation data sets: with a void validation set
      • Generating a classifier set (decision tree): with entropy + information ratio
      • Evaluating a classifier set: with set evaluation
      • Evaluating classifier sets on the test data set: with single classifier set evaluation
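
The "entropy + information ratio" method named above is read here as the information gain ratio that C4.5 uses to choose split attributes; that reading is an assumption, since the slide gives no formula. A minimal sketch for one nominal attribute, with hypothetical helper names:

```python
import math
from collections import Counter

def entropy(items):
    """Shannon entropy (in bits) of the value distribution in `items`."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())

def gain_ratio(values, labels):
    """Information gain of splitting on `values`, divided by the split information."""
    n = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    split_info = entropy(values)
    # An attribute with a single value carries no split information.
    return info_gain / split_info if split_info > 0 else 0.0

perfect = gain_ratio(["a", "a", "b", "b"], [0, 0, 1, 1])
useless = gain_ratio(["a", "b", "a", "b"], [0, 0, 1, 1])
```

A tree-generating method would evaluate this score for every candidate attribute and split on the highest one.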
