GaiaGrid – A Three-Year Experience Salim Ansari Toulouse, 20th October 2005
Why Grid? • The GDAAS Study had underestimated the computational power necessary to carry out the Gaia Data Analysis prototype • The number of parallel activities spun out of control, as algorithm providers began delivering algorithms that could not be implemented on the limited infrastructure dedicated to GDAAS • The need for a collaborative environment became clear
Objectives • to increase computational power whenever and wherever needed, at low cost • to provide a framework for developing Shell Task algorithms for Gaia • to establish a collaborative environment where the community may share and exchange results
Constraints Motto: low cost, high return on investment • Low-cost hardware budget: reuse of low-end PCs • Small investment in industrial effort [0.5 FTE] • System administration: 1 junior staff member + maintenance [1 FTE]
Core vs. Shell Tasks As a result of the GDAAS Study, two categories of algorithms had been established: Core Tasks (centralised, acting upon the totality of the data): • Initial Data Treatment • Global Iterative Solution • Cross-correlations Shell Tasks (any data analysis involving remote expertise, acting upon a portion of the data at a time): • Classification • Photometric analysis • Spectroscopic analysis
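To make the split concrete, here is a minimal Python sketch of how the two categories behave; all function and variable names are illustrative stand-ins, not part of the GDAAS or GaiaGrid codebase.

```python
# Illustrative sketch only: a core task runs once over the full dataset,
# while a shell task is mapped over independent data partitions that can
# be farmed out to remote nodes.
from concurrent.futures import ProcessPoolExecutor

def core_task(full_dataset):
    """Core tasks (e.g. the Global Iterative Solution) act on the
    totality of the data and therefore run centralised."""
    return sum(full_dataset) / len(full_dataset)  # stand-in global fit

def shell_task(partition):
    """Shell tasks (e.g. photometric analysis) act on one portion of the
    data at a time, so partitions can be processed in parallel."""
    return [x * 2.0 for x in partition]  # stand-in per-source analysis

if __name__ == "__main__":
    data = list(range(1000))
    global_result = core_task(data)               # one centralised pass
    partitions = [data[i::4] for i in range(4)]   # e.g. one cell of sky each
    with ProcessPoolExecutor() as pool:           # local proxy for Grid nodes
        shell_results = list(pool.map(shell_task, partitions))
```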
Gaia Virtual Organisation June 2005
The Processing Scope (Michael Perryman, GAIA-MP-009, 17 August 2004, Version 1.1) Top tasks*: • GIS processing: 125 days (CPU processing on a 2012 machine) • first look: 125 days (assumed equal to GIS at present) • spectro PSF fitting: 71 days • variability period: 33 days • various DMS classes: 60 days (DMS: ASM analysis; multiples: ASM) * Assuming a 40 GFlop machine today, extrapolated to 2012 with Moore's Law
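The footnote's extrapolation is easy to check with a back-of-envelope calculation; the 18-month doubling period used below is an assumption, since the slide does not state one.

```python
# Back-of-envelope for the footnote: a 40 GFlop machine in ~2004,
# extrapolated to 2012 assuming Moore's-Law doubling every 18 months
# (the doubling period is our assumption).
gflops_now = 40.0
years = 2012 - 2004
doublings = years * 12 / 18                 # ~5.3 doublings in 8 years
gflops_2012 = gflops_now * 2 ** doublings
print(f"~{gflops_2012 / 1000:.1f} TFlop/s assumed for the 2012 machine")
# -> roughly 1.6 TFlop/s, the scale against which the 125-day
#    GIS-processing estimate is quoted
```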
The first months • Setting up the hardware and nodes was easy and took 2 man-months • Globus was installed on: the ESTEC nodes; ESRIN (already up and running); the CESCA node in Barcelona; the ULB node in Brussels; the ARI node in Heidelberg • GridAssist was identified as a potential workflow tool
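For flavour, a job on a Globus node of that era could be launched through the GRAM command-line client; globus-job-run is a real Globus Toolkit command, but the host name below is a placeholder and the exact invocation used on GaiaGrid is an assumption.

```python
# Hypothetical sketch: launching a trivial job on a Globus node via the
# GRAM command-line client, wrapped in Python. Requires a valid Grid
# proxy; the contact string is a placeholder.
import subprocess

node = "gaiagrid-node.example.org"  # placeholder contact string
result = subprocess.run(
    ["globus-job-run", node, "/bin/hostname"],
    capture_output=True, text=True)
print(result.stdout)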
Task distribution on GaiaGrid [architecture diagram: the GridAssist Controller at ESTEC dispatches Shell Tasks to Globus nodes at Barcelona, ULB and ESRIN; core processing sits behind a Data Access Layer comprising the GDAAS DB, Initial Data Treatment and the Gaia Simulator]
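The flow in the diagram can be sketched as follows; the classes and methods are invented for illustration and are not the GridAssist API.

```python
# A highly simplified sketch of the diagrammed workflow: a central
# controller (GridAssist at ESTEC) splits the data and dispatches Shell
# Tasks to Globus nodes. All names here are illustrative.

class GlobusNode:
    def __init__(self, site):
        self.site = site

    def run(self, task, chunk):
        # A real node would stage input files and submit a Globus job;
        # here the task is simply executed locally.
        return task(chunk)

class WorkflowController:
    """Stands in for the GridAssist Controller at ESTEC."""

    def __init__(self, nodes):
        self.nodes = nodes

    def dispatch(self, task, chunks):
        # Round-robin the data portions over the available nodes.
        return [self.nodes[i % len(self.nodes)].run(task, chunk)
                for i, chunk in enumerate(chunks)]

controller = WorkflowController(
    [GlobusNode("Barcelona"), GlobusNode("ULB"), GlobusNode("ESRIN")])
print(controller.dispatch(len, [[1, 2], [3], [4, 5, 6]]))  # -> [2, 1, 3]
```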
Current Infrastructure 9 infrastructures in 7 countries (voluntary) [51 CPUs]: • ESTEC [14 CPUs] (SCI-CI) + 1 Gigabit dedicated link to Surfnet • ESAC [4 CPUs] (SCI-SD) + 8 Mb link to REDIRIS • ESRIN [16 CPUs] (EOP) + 155 Mb link to GARR • CESCA [5 CPUs] (Barcelona) + REDIRIS connectivity • ARI [2 CPUs] (Heidelberg) + academic backbone • ULB [1 CPU] (Brussels) + academic backbone • DutchSpace [7 CPUs] (Leiden) + commercial link • IoA [1 CPU] (Cambridge) + academic backbone • UGE [1 CPU] (Geneva) + academic backbone 3 data storage elements: • CESCA [5 Terabytes] • ESTEC [2 Terabytes] • ESAC [up to 4 Terabytes] The current infrastructure has been created on an experimental basis and should not yet be considered part of an operational environment
Current Applications • Gaia Simulator • Astrometric Binary Star Shell Task • Variable Star Analysis Shell Task • RVS Cross-Correlation Shell Task
Global Gaia Data Processing
The GridAssist Client: Performance Grid Computation [screenshot of a workflow spanning Heidelberg, Rome, Leiden and Barcelona]
The GridAssist Client: Distributed Grid Computation [screenshot of a workflow spanning Barcelona, Brussels and Noordwijk]
Results • The Gaia Simulator profited tremendously from GaiaGrid, which accelerated the simulations of the astrometric binary stars; these would otherwise have had to be scheduled on a single infrastructure at CESCA, which was running GDAAS tasks at the same time • The Astrometric Binary Star Analysis for a single HTM cell (383 systems) is down to 15 minutes (and falling) on 2 infrastructures, from the 3 hours it took on a single CPU in Brussels
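Taking the slide's figures at face value, the gain is easy to quantify:

```python
# Quick check on the quoted numbers: 383 systems per HTM cell,
# 3 hours on a single Brussels CPU vs. 15 minutes on two infrastructures.
single_cpu_min = 3 * 60          # 180 minutes
grid_min = 15
speedup = single_cpu_min / grid_min
per_system_s = grid_min * 60 / 383
print(f"speedup ~{speedup:.0f}x, ~{per_system_s:.1f} s per binary system")
# -> a ~12x wall-clock gain, about 2.3 s per system on the Grid
```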
Possible Implementation: The Gaia Collaboration Environment [diagram: Shell Tasks such as Binary Star Analysis, Variable Star Analysis, Radial Velocity Cross Correlations, Photometric Analysis and Classification plug into a Core Interface over the Gaia Data, Results and GaiaLib] The Gaia Community would develop, analyse and update the data transparently, without having any notion of where each component is running, or having to worry about CPU and storage limitations
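A minimal sketch of what such a location-transparent Core Interface could look like; all names are hypothetical and nothing is assumed about the eventual implementation.

```python
# Sketch of the "Core Interface" idea: analysis modules see one flat API
# for Gaia data and results, with the Grid location of storage and CPU
# hidden behind it. All names here are invented for illustration.

class CoreInterface:
    """Location-transparent access point shared by all Shell Tasks."""

    def __init__(self, catalogue):
        self._catalogue = catalogue   # could live on any storage element

    def fetch(self, region):
        # Would resolve which storage element holds this portion of data.
        return self._catalogue.get(region, [])

    def publish(self, region, results):
        # Would write results back so other institutes can see them.
        self._catalogue[region] = results

core = CoreInterface({"htm:N032": ["src1", "src2"]})
sources = core.fetch("htm:N032")               # a shell task pulls its cell
core.publish("htm:N032:binaries", ["fit1"])    # ...and shares its results
```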
Security Issues • All ESTEC and ESRIN Grid machines lie outside the ESA firewall • Security is controlled via ESA Grid certification • Currently no distinction is made between projects (e.g. GaiaGrid and PlanckGrid) • The GridAssist tool provides basic functionality to distinguish an administrator (a person who may add/remove resources) from a workflow user
Certification The Certification Authority for the ESA Grid currently lies with ESTEC (SCI-C) This is under review in light of higher-level discussions within the EIROforum Grid Group
Future Activities • The GaiaGrid environment is available to anyone wishing to experiment with the parallelisation and distribution of tasks • In the current Gaia Data Processing framework, the environment can only be used standalone • The possibility of using the Grid environment to also carry out some Core Tasks is being investigated • GaiaGrid can be considered the testbed for all algorithms under development
Conclusions • GaiaGrid has demonstrated that it is easy to set up a Grid environment • GaiaGrid has also demonstrated its collaborative capabilities by allowing results to be shared amongst multiple institutes • The deployment of the Gaia Simulator has led programmers to think more “portably”
Lessons learned • The development of Gaia algorithms is a task that involves a community of people dispersed across Europe • No single group should believe that it can implement all of these algorithms without proper support from the community • A sound collaboration environment is essential to ensure that everyone in a community has a common understanding of the problems involved • Processing is cheap and the technology is simple, but cumbersome to maintain: each Shell Task has to be installed on all the Grid machines used in a Virtual Organisation • There is no magic to Grid! • The main hurdles in Grid involve security and certification: who should be allowed to run jobs on my machine(s)? • Grid should always be considered “added value” and should not fall within the scope of day-to-day operations such as the Gaia data processing (if it becomes that, you have underestimated the effort of carrying out your project and should review your internal resources for the long term)