Experiments with Distributed Training of Neural Networks on the Grid

Maciej Malawski 1, Marian Bubak 1,2, Elżbieta Richter-Wąs 3,4, Grzegorz Sala 3,5, Tadeusz Szymocha 3

1 Institute of Computer Science AGH, Mickiewicza 30, 30-059 Kraków, Poland
2 Academic Computer Centre CYFRONET, Nawojki 11, 30-950 Kraków, Poland
3 Institute of Nuclear Physics, Polish Academy of Sciences, Kraków, Poland
4 Institute of Physics, Jagiellonian University, Kraków, Poland
5 Faculty of Physics and Applied Computer Science AGH, Kraków, Poland

{bubak,malawski}@agh.edu.pl, elzbieta.richter-was@cern.ch, sala@fatcat.ftj.agh.edu.pl, Tadeusz.Szymocha@ifj.edu.pl

• Target application
  • High Energy Physics
  • Discrimination between signal and background events coming from the particle detector (simulation)
  • ROOT and Athena as the basic data analysis tools

• Challenges
  • Neural network training is a highly compute-intensive task – may need High Performance Computing
  • Finding an optimal configuration may be time-consuming: many experiments with various parameters – may need High Throughput Computing

• Why neural networks
  • Once trained, they are efficient and accurate
  • Applicable to classification and prediction
  • Proven in a wide range of applications

• Solution: the Grid
  • Distributing the computation over a cluster of machines can significantly reduce computation time.
  • Utilizing resources (multiple clusters) available on the Grid makes this task less time-consuming for the researcher.

• Observation
  • Training neural networks on the Grid requires many repeated tasks:
    • job preparation,
    • submission,
    • monitoring of status,
    • gathering results.
  • Performing them manually is time-consuming for the researcher.
  • → Tools that automate these tasks can facilitate the whole process considerably.
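The repeated job-handling tasks listed above lend themselves to scripting. As a minimal sketch, a parameter sweep over network configurations can be turned into a batch of gLite-style JDL job descriptions, ready for bulk submission to the Grid. The executable name, its arguments, and the sandbox file names below are illustrative assumptions, not part of the original work.

```python
# Hypothetical sketch: generate one gLite-style JDL job description per
# neural-network configuration, so an entire sweep can be submitted in bulk.
# The wrapper script "train_nn.sh", its arguments, and the input/output
# file names are illustrative assumptions.
from itertools import product

def make_jdl(hidden_units: int, learning_rate: float) -> str:
    """Return a JDL job description for one training configuration."""
    tag = f"h{hidden_units}_lr{learning_rate}"
    return "\n".join([
        'Executable = "train_nn.sh";',                    # assumed wrapper script
        f'Arguments = "--hidden {hidden_units} --lr {learning_rate}";',
        f'StdOutput = "train_{tag}.out";',
        f'StdError = "train_{tag}.err";',
        'InputSandbox = {"train_nn.sh", "events.root"};', # assumed input files
        f'OutputSandbox = {{"train_{tag}.out", "train_{tag}.err", "weights_{tag}.dat"}};',
    ])

def sweep(hidden_sizes, learning_rates) -> dict:
    """Build JDL texts for the full Cartesian product of parameters."""
    return {f"job_h{h}_lr{lr}.jdl": make_jdl(h, lr)
            for h, lr in product(hidden_sizes, learning_rates)}

jobs = sweep([5, 10, 20], [0.01, 0.1])
print(len(jobs))  # 6 job descriptions, one per configuration
```

Each generated file could then be submitted and tracked by a wrapper loop, removing the manual preparation, submission, and result-gathering steps entirely.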
• Our Goals
  • Develop tools facilitating the use of the Grid for multiple classification experiments
  • Investigate and validate algorithms for distributed neural network training
  • Allow seamless integration with data analysis tools such as ROOT

• Testbed for our experiments: the EGEE project
  • Virtual Organization for Central Europe
  • Grid sites at CYFRONET Kraków, PSNC Poznań, KFKI Budapest, CESNET Prague, and TU Košice
  • Support for MPI applications
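One common approach to distributed training, consistent with the MPI support mentioned above, is data parallelism: each worker computes a gradient on its own shard of the signal/background sample, and the gradients are averaged (as an MPI allreduce would do) before every weight update. The sketch below simulates the workers in a single process with NumPy; the single-layer logistic model and the synthetic data are illustrative stand-ins for the real network and detector events.

```python
# Sketch of data-parallel training: each "worker" holds a shard of the
# signal/background sample, computes a local gradient, and the gradients
# are averaged (mimicking an MPI allreduce) before the shared update.
# The logistic model and synthetic data are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 2-class sample: signal shifted away from background in feature space.
n, d = 400, 3
X = np.vstack([rng.normal(0.0, 1.0, (n // 2, d)),   # background events
               rng.normal(1.5, 1.0, (n // 2, d))])  # signal events
X = np.hstack([X, np.ones((n, 1))])                 # bias column
y = np.concatenate([np.zeros(n // 2), np.ones(n // 2)])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def local_gradient(w, Xs, ys):
    """Logistic-loss gradient computed on one worker's shard."""
    return Xs.T @ (sigmoid(Xs @ w) - ys) / len(ys)

def train(n_workers=4, steps=200, lr=0.5):
    shards = list(zip(np.array_split(X, n_workers),
                      np.array_split(y, n_workers)))
    w = np.zeros(d + 1)
    for _ in range(steps):
        # "allreduce": average the per-worker gradients, then update everywhere
        g = np.mean([local_gradient(w, Xs, ys) for Xs, ys in shards], axis=0)
        w -= lr * g
    return w

w = train()
acc = np.mean((sigmoid(X @ w) > 0.5) == y)
print(f"training accuracy: {acc:.2f}")
```

Because the shards here are equal-sized, the averaged gradient equals the full-batch gradient, so the distributed run follows the same trajectory as a sequential one; the gain is that each gradient step costs only a fraction of the data per worker.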