Active Learning based on Bayesian Networks
Luis M. de Campos, Silvia Acid and Moisés Fernández
Index of Contents
1. Introduction
• The scenario is a pool-based active learning cycle.
2. Data and evaluation
• We participated in five of the six datasets considered. Evaluation uses AUC and ALC.
3. Methods
• Features, implemented modules, general procedure, how to query labels, and a practical example.
4. Results
• Our best result was sixth position.
5. Conclusions
6. Acknowledgments
2. Data and evaluation
• The final test phase has 6 datasets. We participated in five of the six: A, C, D, E and F.
• The datasets come from different application domains:
  • Chemoinformatics.
  • Embryology.
  • Marketing.
  • Text ranking.
• Evaluation with:
  • Area under the ROC curve (AUC).
  • Area under the Learning Curve (ALC).
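As a rough illustration of the two measures, the sketch below computes AUC with the pairwise (Mann-Whitney) formulation and a raw area under a learning curve. The function names are hypothetical, and the log2 scaling of the x-axis is an assumption; the slides do not give the challenge's exact ALC normalization, so none is applied here.

```python
import math


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    scores: classifier scores, higher means "more likely class 1".
    labels: true labels in {-1, 1}.
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == -1]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def learning_curve_area(n_labels, aucs):
    """Unnormalized trapezoid area of AUC against log2(number of labels).

    Assumption: log2 x-axis and no normalization; the official ALC
    definition may differ.
    """
    xs = [math.log2(n) for n in n_labels]
    area = 0.0
    for i in range(1, len(xs)):
        area += (xs[i] - xs[i - 1]) * (aucs[i] + aucs[i - 1]) / 2.0
    return area
```

For example, `auc([0.9, 0.8, 0.3, 0.1], [1, 1, -1, -1])` scores a perfect ranking, and a flat learning curve at AUC 0.5 over 1, 2 and 4 labels has raw area 1.0 on the log2 axis.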
3. Methods. Features
• Hardware used: laptop running Ubuntu 8.10, with 4 GB of memory and an Intel Core Duo at 2.53 GHz.
• We used three Bayesian network base classifiers:
  • Naive Bayes. Used on dataset D.
  • TAN (Tree Augmented Network) with the BDeu score. Used on dataset F.
  • CHillClimber. A new classifier that searches a reduced space centered on the class node. Used with the BDeu score on datasets A, C and E.
• Discretization method for numerical variables:
  • Fayyad & Irani MDL in TAN and CHillClimber.
  • None in Naive Bayes.
3. Methods. Features and Modules
• Active learning method: uncertainty sampling.
• We did not use unlabeled data for training.
• Software implemented (several modules):
  • Matlab: main module. It calls the C++ module.
  • C++: intermediate module. It calls the Weka-Java module.
  • Weka-Java: final module. It is implemented in Java on top of Weka, with several modifications.
3. Methods. Procedure
• The procedure is as follows:
  1. The algorithm trains with all known instances; initially it only has the seed.
  2. It selects new examples to query using a particular method (a, b or c); see the next slide.
  3. It adds the newly labeled examples to the set of known instances.
  4. Are all instances known? No: go to 1. Yes: end.
• The number of instances to query in each iteration is fixed in advance (three different schedules):
  • Exponential.
  • Equal10-All.
  • All-Equal10.
“n” is the total number of labels in the dataset.
[Figure: batch sizes per iteration for the three schedules. Recoverable labels: 1, 2, 4, 8, 16, 32, 64, … for the exponential schedule; runs of (n/2)/10-sized batches combined with one (n/2)-sized batch for the other two.]
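The cycle above can be sketched as follows, using the exponential schedule. This is a minimal illustration rather than the authors' implementation: `train`, `select` and `oracle` are hypothetical callables supplied by the caller, and capping the last batch at the number of remaining instances is an assumption.

```python
def exponential_schedule(pool_size):
    """Yield batch sizes 1, 2, 4, ... until pool_size labels are queried.

    Assumption: the final batch is truncated to whatever is left.
    """
    remaining = pool_size
    size = 1
    while remaining > 0:
        batch = min(size, remaining)
        yield batch
        remaining -= batch
        size *= 2


def active_learning_loop(seed, pool, train, select, oracle):
    """Pool-based active learning cycle.

    seed:   dict mapping instance id -> known label (the initial seed).
    pool:   list of unlabeled instance ids.
    train:  callable(known) -> model (hypothetical).
    select: callable(model, unknown, k) -> list of k ids to query.
    oracle: callable(instance id) -> label.
    """
    known = dict(seed)
    unknown = list(pool)
    for batch in exponential_schedule(len(unknown)):
        model = train(known)                     # step 1: train on known data
        chosen = select(model, unknown, batch)   # step 2: pick examples
        for i in chosen:                         # step 3: query and merge
            known[i] = oracle(i)
        chosen_set = set(chosen)
        unknown = [i for i in unknown if i not in chosen_set]
    return known                                 # step 4: pool exhausted
```

With a pool of 10 instances, the schedule produces batches of 1, 2, 4 and 3, so the loop retrains four times before every label is known.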
3. Methods. How to query examples (a, b or c)
• In each iteration we sort the examples in increasing order of the probability of their most probable class. Then we choose “x” examples with the selected method:
  a) We query the “x” examples with the lowest probabilities.
  b) We query the “x1” and “x2” examples with the lowest probabilities for class -1 and class 1 respectively, maintaining the class proportions of the examples known so far. x = x1 + x2.
  c) Like method b, but “x1” and “x2” are calculated from the class proportions estimated using both the labels returned by the oracle and the values returned by our classifier.
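A minimal sketch of methods a and b, under stated assumptions: `probs` is a hypothetical dict mapping an example id to the classifier's probability of class 1, an example is assigned to the class the classifier considers most probable, x1 is obtained with Python's `round`, and any shortfall in one class is filled from the remaining least-confident examples. The slides specify none of these details. Method c would call the same function with class counts estimated from both oracle labels and classifier predictions.

```python
def confidence(p_pos):
    """Probability of the most probable class, given P(class 1)."""
    return max(p_pos, 1.0 - p_pos)


def predicted_class(p_pos):
    """Most probable class: 1 or -1 (ties go to class 1, an assumption)."""
    return 1 if p_pos >= 0.5 else -1


def query_a(probs, x):
    """Method a: the x examples with the lowest confidence."""
    ranked = sorted(probs, key=lambda i: confidence(probs[i]))
    return ranked[:x]


def query_b(probs, x, n_neg, n_pos):
    """Method b: split x between classes in the proportion n_neg : n_pos.

    n_neg, n_pos: counts of known examples of class -1 and class 1.
    """
    ranked = sorted(probs, key=lambda i: confidence(probs[i]))
    x1 = round(x * n_neg / (n_neg + n_pos))  # rounding rule is an assumption
    x2 = x - x1
    neg = [i for i in ranked if predicted_class(probs[i]) == -1][:x1]
    pos = [i for i in ranked if predicted_class(probs[i]) == 1][:x2]
    chosen = neg + pos
    picked = set(chosen)
    # Assumption: if one class runs short, fill with the next least-confident.
    for i in ranked:
        if len(chosen) >= x:
            break
        if i not in picked:
            chosen.append(i)
            picked.add(i)
    return chosen
```

With the made-up probabilities `{1: 0.9, 2: 0.48, 3: 0.55, 4: 0.2, 5: 0.6, 6: 0.05}`, 6 known examples of class -1, 4 of class 1 and x = 4, method b takes two examples per class, whereas method a would simply take the four least-confident examples regardless of class.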
3. Methods. An example
• Prior knowledge: 6 examples of class -1 and 4 of class 1. In addition, our classifier gives the following probabilities:
[Figure: table of class probabilities per example, with the steps "Max probability", "Sort" and "Select"; values not recovered.]
• Our exponential strategy indicates that we have to choose 4 examples (we are in iteration three):
  • With method a, we would choose examples 3, 5, 4, 6.
  • With method b, we would choose examples 3, 5, 2, 1.
  • With method c, we would choose examples 3, 5, 4, 2.
4. Results
• Our results are rather modest: we obtained reasonable performance on only two datasets, C and E.
• The plot for dataset E is on the left and the plot for dataset C is on the right.
5. Conclusions
• We could improve the process by adding a clustering step for the stages when only a few instances are known.
• Advantages:
  • Simple.
  • Not time-consuming.
• Disadvantages:
  • Static behavior.
  • Lack of knowledge in the early stages of the process.
Acknowledgments • This work has been supported by the Spanish research programme Consolider Ingenio 2010: MIPRCV (CSD2007-00018).