Genetic Programming With Boosting for Ambiguities in Regression Problem Grégory Paris Laboratoire d’informatique du Littoral Université du Littoral-côte d’Opale 62228 Calais Cedex, France Paris@lil.univ-littoral.fr
What Are Ambiguities? For a given x, several values are possible for f(x).
Contents • Boosting to get several values • Boosting in a few words • GPboost: our algorithm for regression problems • How boosting deals with ambiguities and clusters the data • Dealing with several values: dendrograms • Presentation • Application • Results and conclusion
Presentation of Boosting • Introduced by Freund and Schapire in the 1990s • Improves machine learning methods • Works with weak learners (methods that perform only slightly better than random guessing) • The error on the learning set is guaranteed to decrease • Builds several hypotheses on different distributions • Makes them vote to produce a final hypothesis
Boosting and GP • Iba's version (1999): the distributions are used to build the fitness set • Our version (2001): the distribution is included in the fitness function
GPboost (notation) • Fitness set: $F = \{(x_1, y_1), \dots, (x_N, y_N)\}$ • Distribution: each example $(x_i, y_i)$ has a weight $w_i$; the initial weight is $1/N$ for each example • "Weak learner": a GP algorithm including the distribution in its fitness • Fitness function: the error of a candidate function on $F$, weighted by the distribution • GP will be run T times (T rounds of boosting) with different distributions
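A rough sketch of how the distribution can enter the fitness; the absolute-error form and all names below are assumptions for illustration, not taken from the slides:

```python
import numpy as np

def weighted_fitness(f, xs, ys, weights):
    """Distribution-weighted error of candidate f on the fitness set
    (lower is better; absolute error is an assumption)."""
    errors = np.abs(f(xs) - ys)          # per-example error
    return float(np.sum(weights * errors))

# Initial distribution: uniform weight 1/N on each of the N examples.
# weights = np.full(len(xs), 1.0 / len(xs))
```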
GPboost (main loop) For $t = 1, \dots, T$ do: • Run GP using the distribution $D_t$; the best-of-run is denoted $f_t$ • $\epsilon_t$: error of $f_t$ on $F$ • $\alpha_t$: the confidence given to function $f_t$ • Update the distribution for the next round, dividing by the normalization factor $Z_t$ so that the weights again sum to 1 • End For
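A minimal sketch of this loop, assuming an AdaBoost.R2-style reweighting (the slide names the error, the confidence and the normalization factor $Z_t$, but not the exact update rule); `run_gp` is a hypothetical stand-in for one GP run:

```python
import numpy as np

def gpboost(run_gp, xs, ys, T):
    """Sketch of a GPboost-style main loop (AdaBoost.R2-style update assumed).
    run_gp(xs, ys, weights) stands in for one GP run and returns the
    best-of-run function f_t."""
    N = len(xs)
    weights = np.full(N, 1.0 / N)                  # initial distribution D_1
    functions, confidences = [], []
    for t in range(T):
        f_t = run_gp(xs, ys, weights)              # best-of-run under D_t
        losses = np.abs(f_t(xs) - ys)
        losses = losses / (losses.max() + 1e-12)   # per-example loss in [0, 1]
        eps = float(np.sum(weights * losses))      # epsilon_t: error on F
        beta = eps / (1.0 - eps + 1e-12)
        weights = weights * beta ** (1.0 - losses) # well-fit points lose weight
        weights = weights / weights.sum()          # Z_t: normalization factor
        functions.append(f_t)
        confidences.append(np.log(1.0 / (beta + 1e-12)))  # confidence alpha_t
    return functions, confidences
```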
GPboost (final hypothesis) • Each function $f_t$ gives a value for x • A median weighted by the confidence values is computed • Other medians provide similar results
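A small sketch of this combination step, assuming the `functions` and `confidences` lists returned by the loop above:

```python
import numpy as np

def weighted_median(values, weights):
    """Median of `values`, each value counted with its weight."""
    values, weights = np.asarray(values, float), np.asarray(weights, float)
    order = np.argsort(values)
    values, weights = values[order], weights[order]
    cum = np.cumsum(weights)
    # first value whose cumulative weight reaches half the total weight
    return values[int(np.searchsorted(cum, 0.5 * cum[-1]))]

def final_hypothesis(functions, confidences, x):
    """Combined prediction: confidence-weighted median of the T outputs."""
    return weighted_median([f(x) for f in functions], confidences)
```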
Using Boosting (1) • The principle of boosting is to focus on points that have not been matched in previous rounds • With ambiguities, all the points cannot be matched by a single function • We use the weights to alternately focus on the different values of the ambiguity
Using Boosting (2) • We are seeking a fitness function that will focus on extrema rather than on average points • e.g. [figure: target function vs. an RMS-fitted average]
Application • We run GPboost on this ambiguous problem • We use our fitness function • We set the number of rounds to T = 6
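With the sketches above, the experiment might look roughly as follows; the ambiguous data set here (the two-valued inverse of $y = x^2$) is only an illustration, not the slides' actual problem, and `run_gp` is still a stand-in for the GP engine:

```python
import numpy as np

# Illustrative ambiguous data set (an assumption): inverting y = x^2
# gives two values, +sqrt(x) and -sqrt(x), for each x.
x = np.linspace(0.0, 1.0, 100)
xs = np.concatenate([x, x])                     # each x appears twice...
ys = np.concatenate([np.sqrt(x), -np.sqrt(x)])  # ...once per branch

# T = 6 rounds of boosting, as on the slide:
# functions, confidences = gpboost(run_gp, xs, ys, T=6)
```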
Merging the data • We are given 6 functions • For a given x, we can provide 6 values • We have to find a way to pick 2 values out of the 6 • We propose dendrograms to solve this problem
Dendrogram • For a given x, we have T values • Cluster the set of values and take the median of each cluster • To cluster the values, we build a dendrogram • Start with T clusters • At each step, group the two nearest clusters
Dendrogram (Building) S={-1.1; -1; 0; 0.15; 1; 1.05}
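A sketch of this construction on the slide's set S, using SciPy's agglomerative clustering; single linkage is an assumption, since the slide does not say how the distance between clusters is measured:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

S = np.array([-1.1, -1.0, 0.0, 0.15, 1.0, 1.05])

# Start with one cluster per value; at each step merge the two nearest
# clusters (single linkage assumed).
Z = linkage(S.reshape(-1, 1), method='single')

# Cutting so that 2 clusters remain (the two-branch case from the
# previous slide), then taking each cluster's median:
labels = fcluster(Z, t=2, criterion='maxclust')
print([float(np.median(S[labels == k])) for k in np.unique(labels)])
```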
Cut the dendrogram • The dendrogram must be cut off at a height corresponding to the number of values we want.
Computing cut-off values • A fixed cut-off value gives better results but requires a priori knowledge of the problem • Dynamic cut-off value: the number of values is computed so as to reduce the error made on the fitness set at each ambiguity
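The slide only states the criterion, so the following is a hypothetical realization: keep increasing the number of clusters only while that still reduces the error against the target values known at this x:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def dynamic_cut(values, targets, tol=1e-3):
    """Hypothetical dynamic cut-off: grow the number of clusters only
    while doing so still reduces the error on the fitness-set targets."""
    vals = np.asarray(values, dtype=float)
    Z = linkage(vals.reshape(-1, 1), method='single')

    def medians_and_error(k):
        labels = fcluster(Z, t=k, criterion='maxclust')
        meds = np.array([np.median(vals[labels == m]) for m in np.unique(labels)])
        # each known target value is matched to its nearest cluster median
        err = sum(float(np.min(np.abs(meds - y))) for y in targets)
        return meds, err

    best_meds, best_err = medians_and_error(1)
    for k in range(2, len(vals) + 1):
        meds, err = medians_and_error(k)
        if best_err - err <= tol:        # no worthwhile improvement: stop
            break
        best_meds, best_err = meds, err
    return best_meds
```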
Results • [Figure: approximation with the dynamic cut-off value] • [Figure: approximation with the static cut-off value]
Other Benchmarks • Inverting [figure: first benchmark function and results]
Other Benchmarks • Inverting [figure: second benchmark function and results]
Conclusion and Future Work • Good results on classical, simple problems • To do: • Improve the computation of the cut-off value • Apply the method to real-world problems