Gene Expression Programming for Data Mining and Knowledge Discovery
Investigators: Peter Nelson, CS; Xin Li, CS; Chi Zhou, Motorola Inc.
Prime Grant Support: Physical Realization Research Center of Motorola Labs
• Real-world data mining tasks involve large data sets, high-dimensional feature sets, and hidden knowledge of non-linear form, so they need effective algorithms.
• Gene Expression Programming (GEP) is a new evolutionary computation technique for the creation of computer programs, capable of producing solutions of any possible form.
• Research goal: applying and enhancing the GEP algorithm to fulfill complex data mining tasks.

Figure 1. Representations of solutions in GEP (genotype, phenotype, and mathematical form)
Genotype: sqrt.*.+.*.a.*.sqrt.a.b.c./.1.-.c.d
[Phenotype (expression tree) and mathematical form appear as images in the original figure.]

• Overview: improving the problem-solving ability of the GEP algorithm by preserving and utilizing the self-emergence of structures during its evolutionary process.
• Constant creation methods for GEP: local optimization of constant coefficients, given the evolved solution structures, to speed up the learning process.
• A new hierarchical genotype representation: a natural hierarchy in forming the solution and more protective genetic operations for functional components.
• Dynamic substructure library: defining and reusing self-emergent substructures in the evolutionary process.

• The initial implementation of the proposed approaches is finished.
• Preliminary testing has demonstrated the feasibility and effectiveness of the implemented methods: the constant creation methods achieved a significant improvement in the fitness of the best solutions, and the dynamic substructure library helps identify meaningful building blocks that incrementally form the final solution along a faster fitness convergence curve.
• Future work includes investigation of parametric constants, exploration of higher-level emergent structures, and comprehensive benchmark studies.
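To make the genotype-to-phenotype mapping in Figure 1 concrete, the sketch below decodes the genotype string into an expression tree and prints a mathematical form. It assumes the standard breadth-first (Karva notation) reading used in GEP, where each function symbol takes its arguments from the next unread symbols, level by level. The names `ARITY`, `decode`, `to_infix`, and `evaluate` are illustrative and do not come from the investigators' implementation.

```python
import math
from collections import deque

# Assumed arities of the function symbols; any other symbol is a terminal.
ARITY = {"sqrt": 1, "+": 2, "-": 2, "*": 2, "/": 2}

def decode(genotype):
    """Decode a dot-separated genotype into a nested-list expression tree.

    Symbols are read left to right and attached breadth-first. Any symbols
    left over after the tree is complete are simply ignored (in GEP they
    form the non-coding part of the gene).
    """
    symbols = genotype.split(".")
    root = [symbols[0]]
    queue = deque([root])
    pos = 1
    while queue:
        node = queue.popleft()
        for _ in range(ARITY.get(node[0], 0)):
            child = [symbols[pos]]
            pos += 1
            node.append(child)
            queue.append(child)
    return root

def to_infix(node):
    """Render the tree as a fully parenthesized infix string."""
    sym, args = node[0], node[1:]
    if not args:
        return sym
    if len(args) == 1:
        return f"{sym}({to_infix(args[0])})"
    return f"({to_infix(args[0])} {sym} {to_infix(args[1])})"

def evaluate(node, env):
    """Evaluate the tree; terminals are looked up in env or parsed as numbers."""
    sym, args = node[0], [evaluate(child, env) for child in node[1:]]
    if sym == "sqrt":
        return math.sqrt(args[0])
    if sym == "+":
        return args[0] + args[1]
    if sym == "-":
        return args[0] - args[1]
    if sym == "*":
        return args[0] * args[1]
    if sym == "/":
        return args[0] / args[1]
    return env[sym] if sym in env else float(sym)

GENOTYPE = "sqrt.*.+.*.a.*.sqrt.a.b.c./.1.-.c.d"
tree = decode(GENOTYPE)
print(to_infix(tree))
print(evaluate(tree, {"a": 2.0, "b": 3.0, "c": 5.0, "d": 1.0}))
```

Under this reading, the genotype corresponds to the square root of (a + b·c) multiplied by a·sqrt(1/(c − d)). The input values passed to `evaluate` are arbitrary and chosen only to exercise the tree.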