Distributed Genetic Algorithm for feature selection in Gaia RVS spectra. Application to ANN parameterization. D. Fustes, D. Ordóñez, C. Dafonte, M. Manteiga and B. Arcay.
Introduction • GGG (Galician Group for Gaia): part of CU8 in DPAC, involved in classification and parameterization tasks using AI techniques • Work with simulated data from the RVS instrument: • Estimation of physical parameters: • Effective temperatures • Surface gravities • Metallicities • Abundances of alpha elements
Gaia RVS simulated data • Library compiled by A. Recio, P. de Laverny and B. Plez • 971 points per spectrum • Different SNR levels: 5, 10, 50, 200, ... • 70% of the data used to train the network and 30% to test the model • ANNs used to perform the parameterization
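The slides do not say how the noisy versions of the library or the 70/30 split were produced. The following is a minimal sketch under two assumptions. First, SNR is taken as mean flux divided by a constant Gaussian noise sigma. Second, the split is a seeded random shuffle. The helper names `add_noise` and `train_test_split` are hypothetical.

```python
import random

def add_noise(flux, snr, rng):
    """Return a noisy copy of `flux` with Gaussian noise of constant sigma.

    Assumption: SNR = mean flux / sigma. The slides do not state the noise
    model used to build the SNR 5, 10, 50, 200, ... versions of the library.
    """
    mean_flux = sum(flux) / len(flux)
    sigma = mean_flux / snr
    return [f + rng.gauss(0.0, sigma) for f in flux]

def train_test_split(items, train_fraction=0.7, seed=0):
    """Shuffle a copy of the library and cut it into train / test parts."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_train = int(round(train_fraction * len(shuffled)))
    return shuffled[:n_train], shuffled[n_train:]

# Usage with placeholder data: 100 spectrum indices and one flat 971-point spectrum.
library = list(range(100))
train, test = train_test_split(library)
flat = [1.0] * 971
noisy = add_noise(flat, snr=50, rng=random.Random(1))
```

Each SNR level would be a separate noisy copy of the library. Fixing the seed means every SNR version can reuse the same train/test membership.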
Discrete Wavelet Transform • Redundant filtering process: • High-pass filters generate the Details • Low-pass filters generate the Approximations • Level-3 DWT used: A3 + D3 + D2 + D1, 997 points
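The slides do not name the wavelet, so this sketch uses the Haar filter pair as an assumption. Each level splits the current approximation into a low-pass approximation and a high-pass detail. The level-3 feature vector is the concatenation A3 + D3 + D2 + D1.

`coeff_len` is the usual per-band length for non-periodic boundary extension. For example, it is what PyWavelets' `dwt_coeff_len` returns for its non-periodization modes. With a 2-tap Haar filter, 971 samples give 973 coefficients. A 10-tap filter (such as Daubechies db5) gives exactly the 997 quoted above. Which filter and boundary mode the authors used is not stated. `haar_step`, `dwt_level3` and `total_coeffs_level3` are hypothetical names.

```python
import math

def haar_step(signal):
    """One DWT level with the Haar filter pair.

    Low-pass (scaled sum) -> approximation; high-pass (scaled difference) -> detail.
    Odd-length inputs are padded by repeating the last sample (an assumption;
    wavelet libraries offer several boundary modes).
    """
    x = list(signal)
    if len(x) % 2:
        x.append(x[-1])
    s = 1.0 / math.sqrt(2.0)
    approx = [s * (x[i] + x[i + 1]) for i in range(0, len(x), 2)]
    detail = [s * (x[i] - x[i + 1]) for i in range(0, len(x), 2)]
    return approx, detail

def dwt_level3(signal):
    """Return the feature vector A3 + D3 + D2 + D1 (concatenated)."""
    a1, d1 = haar_step(signal)
    a2, d2 = haar_step(a1)
    a3, d3 = haar_step(a2)
    return a3 + d3 + d2 + d1

def coeff_len(n, filter_len):
    """Coefficients per band for a non-periodic boundary extension."""
    return (n + filter_len - 1) // 2

def total_coeffs_level3(n, filter_len):
    """Length of A3 + D3 + D2 + D1; each level transforms the previous approximation."""
    d1 = coeff_len(n, filter_len)
    d2 = coeff_len(d1, filter_len)
    d3 = coeff_len(d2, filter_len)
    return d3 + d3 + d2 + d1  # A3 has the same length as D3

features = dwt_level3([1.0] * 971)
```

A flat spectrum has no variation, so all of its energy lands in A3 and every detail coefficient is zero. This is why the detail bands carry the local features (lines) that a selection method can target.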
Feature selection • Reduce the dimensionality of the spectra • Reduce the complexity of the models • Reduce the computational needs • Variability-based methods (e.g. PCA): reduce the dimensionality of a data set while capturing most of its variability • They cannot be specialized to capture the features relevant to the estimation of each parameter • Genetic algorithm to select the relevant areas for each parameter
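One plausible way to encode "relevant areas" for a genetic algorithm is one bit per fixed-width window of input coefficients. The slides do not give the encoding, so the window scheme and the names `expand_mask`, `select` and `n_areas` are hypothetical. A separate genome would be evolved for each parameter (effective temperature, surface gravity, metallicity, alpha abundance). The selected coefficients become that ANN's inputs.

```python
def n_areas(n_features, window):
    """Number of windows (genome length) needed to cover all features."""
    return -(-n_features // window)  # ceiling division

def expand_mask(genome, n_features, window):
    """Expand one bit per area into a per-feature boolean mask.

    Hypothetical encoding: bit k switches on features
    [k*window, (k+1)*window); the last area may be shorter.
    """
    mask = [False] * n_features
    for k, bit in enumerate(genome):
        if bit:
            start = k * window
            for i in range(start, min(start + window, n_features)):
                mask[i] = True
    return mask

def select(features, mask):
    """Keep only the features switched on by the mask (the ANN inputs)."""
    return [f for f, keep in zip(features, mask) if keep]

# Usage: 997 wavelet coefficients, 20-coefficient areas, first and last area on.
genome = [0] * n_areas(997, 20)
genome[0] = 1
genome[-1] = 1
selected = select(list(range(997)), expand_mask(genome, 997, 20))
```

Unlike PCA, the mask keeps original coefficients rather than mixing them. The chosen areas can therefore be read back as wavelength or scale regions for each parameter.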
Genetic algorithm • Based on the theory of evolution • The best individuals reproduce and pass on to the next generation • Fitness function: train the ANN, test it and take the inverse of the mean error. Computationally expensive!
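The loop below is a minimal generational GA with fitness = 1 / mean error. It uses tournament selection, one-point crossover, bit-flip mutation and elitism. These operators are common textbook choices assumed here, since the slides do not specify them. In the original work, `mean_error(genome)` would train and test an ANN on the selected inputs, which is the expensive step. Here a hypothetical toy error function stands in for it. `evolve` and its parameters are illustrative names.

```python
import random

def fitness_from_error(mean_error, eps=1e-12):
    """Fitness = inverse of the mean test error (eps avoids division by zero)."""
    return 1.0 / (mean_error + eps)

def tournament(population, scores, rng, k=3):
    """Pick the fittest of k randomly drawn individuals."""
    contenders = rng.sample(range(len(population)), k)
    return population[max(contenders, key=lambda i: scores[i])]

def crossover(a, b, rng):
    """One-point crossover of two bit strings."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(genome, rng, rate):
    """Flip each bit independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in genome]

def evolve(mean_error, n_bits, pop_size=20, generations=30, rate=0.02, seed=0):
    """Generational GA with elitism; returns the best genome and per-generation best fitness."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        # One fitness call per individual: this is what gets distributed.
        scores = [fitness_from_error(mean_error(g)) for g in population]
        best_i = max(range(pop_size), key=lambda i: scores[i])
        history.append(scores[best_i])
        children = [population[best_i]]  # elitism: the best survives unchanged
        while len(children) < pop_size:
            p1 = tournament(population, scores, rng)
            p2 = tournament(population, scores, rng)
            children.append(mutate(crossover(p1, p2, rng), rng, rate))
        population = children
    scores = [fitness_from_error(mean_error(g)) for g in population]
    return population[max(range(pop_size), key=lambda i: scores[i])], history

# Hypothetical stand-in for "train the ANN and test it": distance to a hidden mask.
TARGET = [1, 0] * 10
def toy_error(genome):
    return 0.01 + sum(g != t for g, t in zip(genome, TARGET)) / len(TARGET)

best, history = evolve(toy_error, n_bits=len(TARGET))
```

Elitism makes the best fitness per generation non-decreasing, provided the fitness is deterministic. A real ANN fitness would need fixed training seeds to keep that property.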
Distributed computation • The huge computational cost calls for a scalable solution • Multicomputers are cheaper than supercomputers • Ways to distribute the algorithm: • Low level: distribute the ANN computation • It would have to be performed in hardware • Medium level: distribute the ANN learning • Possible with batch learning • Online learning performs better in this case • High level: distribute the fitness computation • Implemented in C++ with MPI and OpenMP
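This sketch illustrates two of the levels listed above, using a single linear neuron as a hypothetical stand-in for the ANN.

At the medium level, a batch gradient is a sum over samples. Workers can therefore compute partial sums over their own data shards, and a master can add those sums to recover the full gradient exactly. Online learning updates the weights after every sample, so each step depends on the previous one and cannot be split this way.

At the high level, each fitness evaluation (one full ANN training) is independent, so whole evaluations are farmed out. The original implementation used C++ with MPI and OpenMP. Here Python's `concurrent.futures` stands in for that, and the function names are hypothetical. In CPython a `ThreadPoolExecutor` only speeds up fitness code that releases the GIL, so `ProcessPoolExecutor` (or mpi4py) is the closer analogue for pure-Python work.

```python
from concurrent.futures import ThreadPoolExecutor

def batch_gradient(w, data):
    """Gradient of the mean squared error of a linear neuron y = w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in data) / len(data)

def sharded_gradient(w, shards):
    """Medium level: each worker sums the gradient over its own shard and the
    master combines the partial sums, giving the same batch gradient."""
    n = sum(len(s) for s in shards)
    partial = [sum(2.0 * (w * x - y) * x for x, y in s) for s in shards]
    return sum(partial) / n

def evaluate_population(population, fitness, executor):
    """High level: one task per fitness evaluation (one ANN training each).
    `executor.map` preserves input order, so scores line up with individuals."""
    return list(executor.map(fitness, population))

# Usage with placeholder data and a trivial fitness (`sum`) per genome.
data = [(float(x), 3.0 * x + 1.0) for x in range(10)]
shards = [data[:3], data[3:7], data[7:]]
with ThreadPoolExecutor(max_workers=4) as pool:
    scores = evaluate_population([[1, 0, 1], [0, 0, 1]], sum, pool)
```

Distributing whole fitness evaluations needs communication only when genomes go out and scores come back. That coarse grain suits a cluster of cheap machines.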
Results (1) • SNR 200 • Original spectra
Results (2) • SNR 200 • Wavelet domain