Thanks to: Jorge Gomes, Isabel Campos, Rafael Marco, Jose Salt. GRID Initiatives at CSIC (Iniciativas GRID en el CSIC). Valencia, 15 July 2008. Jesús Marco de Lucas, Research Professor (Profesor de Investigación) of CSIC, Instituto de Física de Cantabria
Outline • Why Grid in CSIC? • An impressive track record: • Data Grid times and CrossGrid • LHC Computing Grid & EGEE • Interactive European Grid i2g • EGEE-III, DORII, EUFORIA • GRID-CSIC • NGI, EGI
Collaboration • An increasing problem: fragmentation of knowledge • Too many fields • Large amounts of information • Complex modelling • Why is collaboration so important? • Projects: big success in industry and in science • Add globalization… • Why is collaboration so difficult? • Who were Newton's collaborators? • How do we understand collaboration for • Engineers • Scientists • How can we support collaboration in the (post-)Internet era?
Collaboration: my experience as a researcher • My experience as a researcher: • Physicist with (some) computing background, (some) maths background and (some) electronics background, and NO management/collaboration background • Working ALWAYS in medium (>10) to large (<2000) research collaborations • DELPHI experiment at LEP in CERN (~500 physicists) • Long-term collaboration: • 1985: building and setup of the detector (10 m high) • 1989: first electron-positron collisions: there are ONLY 3 neutrino generations in nature! • 1995: first secondary collision vertex reconstructed from three-meter-long tracks crossing with 10 micron precision in 3D • 2000: best world results in the Higgs boson search • E-Infrastructure projects: CrossGrid, Interactive European Grid, EGEE (~50-500 physicists & computer scientists) • CMS experiment at LHC in CERN (~2500 physicists) • Even longer-term collaboration: • Since 1995: building and setup of the detector • 2008: first proton-proton collisions • 2010-2015: results in the Higgs boson search
Collaboration: any answer? • An increasing problem: fragmentation of knowledge • Too many fields • Large amounts of information • Complex modelling • Why is collaboration so important? • Projects: big success in industry and in science • Add globalization… • Why is collaboration so difficult? • Who were Newton's collaborators? • How do we understand collaboration for • Engineers • Scientists • How can we support collaboration in the (post-)Internet era? • Join [distributed/multidisciplinary] forces to make a project REAL • Collaborative & managerial tools • Share resources in an open framework • Support interaction • Recognize efforts and contributions • Get REAL added value • Where, and what for, was the Web born?
What is this man doing here?
A good example: flood management • Problem: flood crisis in Slovakia • Solution: • Monitoring • Forecasting • Simulation • Real-time actions
Flood Management: Data, Models, Simulations • Precipitation forecasts based on meteorological simulations of different resolution from the meso-scale to the storm-scale. • For flash floods, high-resolution (1 km) regional atmospheric models have to be used along with remote sensing data (satellite, radar) • From the quantitative precipitation forecast, hydrological models are used to determine the discharge from the affected area. • Then hydraulic models simulate water flow through various river structures to predict the impact of the flood
Computing-intensive science • Science is becoming increasingly digital and needs to deal with increasing amounts of data • Simulations get ever more detailed • Nanotechnology: design of new materials from the molecular scale • Modelling and predicting complex systems (weather forecasting, river floods, earthquakes) • Decoding the human genome • Experimental science uses ever more sophisticated sensors to make precise measurements • Need high statistics • Huge amounts of data • Serves user communities around the world
INTERACTIVE EUROPEAN GRID i2g • Provide an advanced grid-empowered infrastructure for scientific computing, targeted at demanding interactive and parallel applications • Provide services to integrate computing resources into a grid • Coordinate the deployment, maintenance and operation of the grid infrastructure • Provide support for Virtual Organizations and resource providers • Coordinate resource providers and virtual organizations • Provide a development infrastructure for research activities • Test and validate new middleware components • Ensure adequate network support
Grid Operations Management: Grid Infrastructure • 12 sites • 7 countries • ~1000 cores (Xeon, Opteron, Pentium) • ~77 TB of storage • Resources shared with other infrastructures • ~10 FTE
Grid Infrastructure • Two sets of sites: production and development • Production: 9 sites • Development: 4 sites
Services • 12 sites • 3 management centres • Core services • Distributed services • Taking advantage of the partners' expertise • Redundancy • Better use of resources [Diagram: two production core service sets (CrossBroker, RAS, BDII, VOMS, LFC, MyProxy); a development core with the same services plus pure gLite WMS, Autobuild and a repository; shared services: APEL accounting, GridICE, R-GMA (with a separate R-GMA for development), SAM, network monitoring, security coordination]
Capacity • CPU capacity is higher than foreseen in the technical annex • Storage is higher than foreseen in the technical annex
Overall usage • Job submissions • Job submissions by job type
Usage sites and users • Jobs per sites • Registered users
VO usage • Most active application VOs: • ienvmod • ihep • ibrain • ifusion • iplanck • iusct
Added value: CrossBroker • MPI and interactive job scheduling • Schedules MPI jobs with a gLite-compatible broker • Decoupled from the MPI implementation • Enables MPI inside clusters and across clusters • Selects the best possible site or set of sites for running • Support for policy "extensions" in the information system • Enables interactivity transparently • Built-in support for i2g visualization and steering mechanisms • Priority for interactive jobs • Flexible support for interactive agents • Fast application start-up • Glide-ins for fast application execution • Agents are submitted together with jobs to enable injection of interactive applications on cluster nodes • Heart of the i2g workload management
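The CrossBroker bullet about selecting "the best possible site or set of sites" can be illustrated with a minimal Python sketch. It is not the CrossBroker's actual matchmaking algorithm: the `Site` fields, the best-fit rule for a single cluster and the greedy fallback across clusters are assumptions chosen only to show the idea of preferring intra-cluster MPI and falling back to a multi-site allocation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Site:
    name: str
    free_cpus: int
    supports_mpi: bool  # hypothetical flag, as if published in the information system


def select_sites(sites: List[Site], cpus_needed: int,
                 allow_cross_site: bool) -> Optional[List[Site]]:
    """Pick one site, or a set of sites, able to run an MPI job of cpus_needed processes."""
    candidates = [s for s in sites if s.supports_mpi and s.free_cpus > 0]

    # Prefer a single cluster: intra-cluster MPI avoids wide-area latency.
    fitting = [s for s in candidates if s.free_cpus >= cpus_needed]
    if fitting:
        # Best fit: choose the site that leaves the fewest CPUs idle.
        return [min(fitting, key=lambda s: s.free_cpus - cpus_needed)]

    if not allow_cross_site:
        return None

    # Cross-site fallback (e.g. for PACX-MPI): take the largest sites first
    # until the requested number of CPUs is covered.
    chosen: List[Site] = []
    total = 0
    for s in sorted(candidates, key=lambda s: s.free_cpus, reverse=True):
        chosen.append(s)
        total += s.free_cpus
        if total >= cpus_needed:
            return chosen
    return None


sites = [Site("A", 16, True), Site("B", 64, True), Site("C", 40, True)]
print([s.name for s in select_sites(sites, 32, allow_cross_site=False)])  # ['C']
print([s.name for s in select_sites(sites, 100, allow_cross_site=True)])  # ['B', 'C']
```

Best fit keeps the larger site B free for bigger requests; a real broker would also weigh queue state, policies and network characteristics.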
Added value: CrossBroker [Diagram components: User Interface; Migrating Desktop (user-friendly GUI); RAS; CrossBroker; gLite catalogue; gLite MyProxy; gLite information index; i2g MPI clusters running parallel jobs (OpenMPI, PACX-MPI) and sequential jobs; standard gLite cluster]
Added value: CrossBroker [Diagram: batch jobs (submitted together with a glide-in) and interactive jobs pass from the User Interface or Migrating Desktop through RAS and the CrossBroker to the i2g CE, the LRMS and the worker node (WN)]
Added value: MPI • MPI support in gLite-based grids • Enable a gLite cluster to run MPI jobs properly • MPI_START (developed by i2g) • Common layer to handle MPI application start-up at the cluster level • Hides cluster and MPI implementation details from the user • Provides hooks for application management • Now adopted by EGEE and other infrastructures as the method to start MPI applications • MPI • OpenMPI: an implementation well suited to grid computing (modular) • PACX-MPI: running jobs across sites • Debug tools integrated in the i2g framework • Marmot • MPI-Trace • Support for MPI in PBS and SGE grid clusters • CE and LRMS changes and configuration
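The layering that MPI_START provides can be illustrated with a minimal Python sketch: one entry point reads the node allocation from the batch system, builds the launch command for the requested MPI implementation and runs optional hooks before and after the application. The functions `read_hosts`, `build_command` and `start`, the flavour strings and the hook signatures are hypothetical and do not reproduce the real MPI_START interface; only the `$PBS_NODEFILE`/`$PE_HOSTFILE` conventions of PBS/Torque and SGE and the `mpirun`/`mpiexec` options of Open MPI and MPICH's Hydra launcher come from those tools.

```python
import os
import subprocess
import tempfile
from typing import Callable, Dict, List, Optional


def read_hosts(scheduler: str, env: Dict[str, str]) -> List[str]:
    """Return one hostname per allocated MPI slot, whatever the batch system."""
    if scheduler == "pbs":
        # PBS/Torque: $PBS_NODEFILE lists one hostname per allocated slot.
        with open(env["PBS_NODEFILE"]) as f:
            return [line.strip() for line in f if line.strip()]
    if scheduler == "sge":
        # SGE: each $PE_HOSTFILE line reads "host slots queue processor-range".
        hosts: List[str] = []
        with open(env["PE_HOSTFILE"]) as f:
            for line in f:
                fields = line.split()
                if fields:
                    hosts.extend([fields[0]] * int(fields[1]))
        return hosts
    raise ValueError(f"unsupported scheduler: {scheduler}")


def build_command(flavour: str, n_procs: int, machinefile: str,
                  app: List[str]) -> List[str]:
    """Translate a generic request into an implementation-specific launch line."""
    if flavour == "openmpi":
        return ["mpirun", "-np", str(n_procs), "-machinefile", machinefile, *app]
    if flavour == "mpich":  # MPICH's Hydra launcher
        return ["mpiexec", "-n", str(n_procs), "-f", machinefile, *app]
    raise ValueError(f"unsupported MPI flavour: {flavour}")


def start(flavour: str, scheduler: str, app: List[str],
          env: Optional[Dict[str, str]] = None,
          pre_hook: Optional[Callable[[List[str]], None]] = None,
          post_hook: Optional[Callable[[object], None]] = None,
          run: Callable[[List[str]], object] = subprocess.run) -> object:
    """Single entry point: the user names the flavour, never the launcher details."""
    env = dict(os.environ) if env is None else env
    hosts = read_hosts(scheduler, env)
    # One-host-per-line machinefile, accepted by both launchers.
    # The file is left on disk for the launcher; clean-up is omitted for brevity.
    with tempfile.NamedTemporaryFile("w", suffix=".hosts", delete=False) as mf:
        mf.write("\n".join(hosts) + "\n")
    cmd = build_command(flavour, len(hosts), mf.name, app)
    if pre_hook:
        pre_hook(hosts)    # e.g. stage input files to the allocated nodes
    result = run(cmd)
    if post_hook:
        post_hook(result)  # e.g. collect output or copy it to a storage element
    return result
```

Passing `run=lambda cmd: cmd` returns the command instead of executing it, which is a convenient way to check what would be launched on a given site.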
Added value: MPI [Diagram: an MPI job, carrying parameters for MPI_START, goes from the User Interface or Migrating Desktop via RAS to the MPI_START-aware CrossBroker, then to the CE, the LRMS and the i2g worker nodes, where MPI_START calls MPIEXEC to launch the application] • MPI_START encapsulates the MPI implementation and the LRMS and cluster details • The CrossBroker is independent of the MPI implementation • MPI_START can also be injected at sites that do not have it installed
Added value: Interactivity • Take control of your application • Application steering and graphical visualization • Powerful graphical visualization (GVID) • Application steering while running remotely (GVID) • Support for OpenMPI and PACX-MPI applications • All from an easy-to-use desktop (Migrating Desktop) • Interactive terminal • i2glogin and glogin • SSH-like access fully compatible with gLite and GSI security • Secure, low-latency, bidirectional connectivity • Excellent for debugging and working remotely • Used to tunnel GVID and application steering • Can be used to tunnel other applications and data
Added value: Tools • Assist in infrastructure management • Accounting • Accounting portal • Support for job type, parallel type and interactive type accounting • Support for MPI accounting • Collect data from APEL and the Resource Brokers • Monitoring • Development of SAM (Service Availability Monitoring) tests • PACX-MPI • OpenMPI • Interactivity • i2g software versions • VO-specific tests • Other tests and tools • Verify passwordless SSH connectivity for MPI • Improve reliability • Replication methods for VOMS and LFC
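Job-type and MPI accounting essentially means tagging each usage record with the job type known to the broker and aggregating. The sketch below is a hypothetical illustration, not the accounting portal's schema: the `JobRecord` fields, the job-type labels and the choice to charge parallel jobs wall-clock time multiplied by allocated CPUs are all assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, Iterable


@dataclass
class JobRecord:
    vo: str
    job_type: str        # assumed labels: "sequential", "mpi", "interactive"
    wall_seconds: float  # wall-clock time, as a batch-system record would report it
    cpus: int            # CPUs allocated (1 for sequential jobs)


def usage_hours(records: Iterable[JobRecord],
                key: Callable[[JobRecord], Hashable]) -> Dict[Hashable, float]:
    """Sum CPU-hours per group; a parallel job is charged wall time x allocated CPUs."""
    totals: Dict[Hashable, float] = defaultdict(float)
    for r in records:
        totals[key(r)] += r.wall_seconds * r.cpus / 3600.0
    return dict(totals)


records = [
    JobRecord("ihep", "mpi", 3600, 8),
    JobRecord("ihep", "sequential", 1800, 1),
    JobRecord("ifusion", "mpi", 7200, 4),
]
print(usage_hours(records, key=lambda r: r.job_type))  # {'mpi': 16.0, 'sequential': 0.5}
print(usage_hours(records, key=lambda r: (r.vo, r.job_type)))
```

Passing the grouping as a `key` function lets the same routine produce per-type, per-VO or per-site views.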
Advanced strategies for interactive jobs • Immediate execution • Jobs either start immediately or fail • Implemented for SGE and in production at LIP • Faster application start-up at the CE level with the pbs jobmanager instead of the LCG jobmanager • Prioritization of jobs • Method to preempt batch jobs • Tested in PBS
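The two policies on this slide, immediate execution and preemption of batch jobs, can be combined in a toy model. This is a minimal sketch under stated assumptions, not the SGE or PBS configuration used at LIP: the `Slot` and `Cluster` classes are hypothetical, and "preemption" is modelled simply as moving the batch job to a suspended list.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Slot:
    job: Optional[str] = None
    kind: Optional[str] = None  # "batch" or "interactive"


@dataclass
class Cluster:
    slots: List[Slot]
    suspended: List[str] = field(default_factory=list)  # preempted batch jobs, to resume later

    def start_interactive(self, job_id: str, allow_preemption: bool = True) -> bool:
        """Immediate execution: the job starts now or the request fails; it is never queued."""
        # 1. Use a free slot if there is one.
        for slot in self.slots:
            if slot.job is None:
                slot.job, slot.kind = job_id, "interactive"
                return True
        # 2. Otherwise preempt (suspend) a batch job, never another interactive job.
        if allow_preemption:
            for slot in self.slots:
                if slot.kind == "batch":
                    self.suspended.append(slot.job)
                    slot.job, slot.kind = job_id, "interactive"
                    return True
        # 3. Fail fast so the user can try elsewhere instead of waiting.
        return False


cluster = Cluster([Slot("b1", "batch"), Slot("b2", "batch")])
print(cluster.start_interactive("i1", allow_preemption=False))  # False
print(cluster.start_interactive("i1"))                          # True
print(cluster.suspended)                                        # ['b1']
```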
Interoperability • i2g is committed to interoperability • Fundamental to: • Enlarge the infrastructure and attract users • Enable easier porting • Share resources • Study infrastructure interoperability • Define scenarios for deploying i2g middleware on top of gLite • Enable other VOs and sites to use i2g developments: • Enabling users from other VOs to access i2g resources • Enabling sites from other infrastructures to join the i2g infrastructure • Enabling other projects or VOs to deploy the i2g developments on gLite-based infrastructures • Involvement with national grid initiatives
Interoperability [Diagram: the Int.EU.Grid infrastructure and the EGEE infrastructure side by side. Components shown: Migrating Desktop and I2G UI software on the UI; I2G CrossBroker and lcg-RB; a Top-BDII and an LFC registry in each infrastructure; local services (LCG-CE with I2G CE software, Site-BDII, MonBox, batch server, SE); gLite WNs with I2G WN software providing MPI and visualization]
Development Support • Savannah at FZK • Repository SVN+CVS • Bugtracker • Autobuild • SL3 and SL4 • Development testbed • Developers guide • Middleware Validation
From development to deployment [Diagram stages: source repository, Autobuild, development repository, development testbed, validation repository, production repository, production infrastructure] • Development support • Development guidelines • Repositories • Autobuild • Development infrastructure • Integration • Packaging • Installation scripts • Integration in release • Validation • Verify installation • Test functionalities • Deployment • Coordinate the sites • Ensure proper deployment
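The path from development to deployment can be read as a sequence of repositories, each guarded by a check. The sketch below is a hypothetical model of that promotion flow, not the project's actual release tooling: the stage order is inferred from the slide, and `Package`, `promote` and the gate functions are illustrative names.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Repository stages named on the slide, in the order assumed here.
PIPELINE = [
    "source repository",
    "development repository",  # packages produced by Autobuild
    "validation repository",   # after integration on the development testbed
    "production repository",   # after validation; deployed by the sites
]


@dataclass
class Package:
    name: str
    version: str
    stage: int = 0
    history: List[str] = field(default_factory=list)


Gate = Callable[[Package], bool]


def promote(pkg: Package, gates: Dict[str, Gate]) -> bool:
    """Move pkg one stage forward if the check guarding the next stage passes."""
    if pkg.stage + 1 >= len(PIPELINE):
        return False  # already in the production repository
    target = PIPELINE[pkg.stage + 1]
    check = gates.get(target, lambda p: True)  # stages without a check always pass
    if not check(pkg):
        pkg.history.append(f"rejected for {target}")
        return False
    pkg.stage += 1
    pkg.history.append(f"promoted to {target}")
    return True


# Hypothetical checks standing in for "Autobuild succeeded", "installs on the
# testbed" and "validation tests pass".
gates: Dict[str, Gate] = {
    "development repository": lambda p: True,
    "validation repository": lambda p: True,
    "production repository": lambda p: not p.version.endswith("-rc"),
}
pkg = Package("example-middleware", "1.0")
while promote(pkg, gates):
    pass
print(PIPELINE[pkg.stage])  # production repository
```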
User and site support • Wiki • Web pages • SA1 mailing lists • VRVS • Support team • Contributions from all partners • Contribution from JRA1 and NA3 • Helpdesk tool dropped: not much used and not well accepted • Concentrate on the wiki • Concentrate on mailing lists for support
Security • Authentication based on IGTF CAs • Good contacts with national CAs • Coordination with EUGridPMA • Authorization based on VOMS • Fault-tolerant setup • Security policies • Follow JSPG policies • VOs can have stricter policies • Active security • Distributed IDS (intrusion detection system) developed and tested • Incident and vulnerability management • Tracking vulnerabilities • Security contacts • Coordination in case of intrusion
Grid Operations Management • Infrastructure management • Task coordination • Site coordination • Service coordination • Coordination with other activities • Coordination with VOs • Quality assurance • SAM: • I2G-specific tests • OpenMPI, PACX-MPI, interactivity… • VO-specific tests • Site notifications • GridICE: • Monitoring for production and development • Separate R-GMA infrastructures • Accounting: • Job type analysis, MPI accounting, … • Getting information from brokers
Conclusions • The i2g infrastructure was successful: • Supported multiple applications from multiple domains • Showed that: • MPI and interactivity can be well supported in grids • There is a wide range of applications that can be supported with grid computing instead of traditional HPC • Interoperability is possible and is a desired feature • It can be done on top of gLite but requires time, effort, dedication, patience and a very good and supportive team
Conclusions • i2g achievements and legacy: • Middleware components enabling: • Interactivity and visualization • Parallel computing support • User-friendly access to the infrastructure • All on top of gLite • Procedures and methods to: • Enable interoperability across infrastructures • Enable MPI and interactivity with gLite grids • Immediate job execution • Experience in deploying and running infrastructures for: • Both sequential and parallel jobs • Both batch and interactive jobs • Tools to assist in the operation of such infrastructures • An example for others to follow • A production infrastructure that is fully operational!
Application porting and development: MPI support • Two levels of support for MPI applications • Support for already existing MPI applications • Compiler support issues • Infrastructure-oriented services • Application-specific • Modify serial applications to be used in the grid environment • Parametric simulations (sweeping over parameter spaces) • Intra-cluster versus inter-cluster MPI • It is a question of latencies • Some applications can be adapted to work in such an environment
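Why intra- versus inter-cluster MPI "is a question of latencies" can be made concrete with a back-of-the-envelope estimate. The sketch uses the standard Hockney (latency plus size over bandwidth) model for message cost; the latency and bandwidth figures are illustrative assumptions, not measurements from the i2g infrastructure.

```python
def message_time(size_bytes: float, latency_s: float, bandwidth_Bps: float) -> float:
    """Hockney (latency-bandwidth) model: t = latency + size / bandwidth."""
    return latency_s + size_bytes / bandwidth_Bps


def comm_fraction(compute_s: float, n_messages: int, size_bytes: float,
                  latency_s: float, bandwidth_Bps: float) -> float:
    """Fraction of each iteration spent communicating (no compute/communication overlap)."""
    comm = n_messages * message_time(size_bytes, latency_s, bandwidth_Bps)
    return comm / (compute_s + comm)


# Illustrative, assumed network parameters (not measured on the i2g infrastructure).
INTRA = dict(latency_s=5e-6, bandwidth_Bps=1e9)   # low-latency cluster interconnect
INTER = dict(latency_s=10e-3, bandwidth_Bps=1e7)  # wide-area link between sites

# One second of computation per iteration, 1 kB messages.
for label, n_msgs in [("tightly coupled", 1000), ("loosely coupled", 10)]:
    intra = comm_fraction(1.0, n_msgs, 1e3, **INTRA)
    inter = comm_fraction(1.0, n_msgs, 1e3, **INTER)
    print(f"{label}: intra-cluster {intra:.1%}, inter-cluster {inter:.1%}")
```

With these assumed numbers, the tightly coupled case spends about 91% of each iteration communicating across sites but under 1% inside a cluster, while the loosely coupled case drops to roughly 9% across sites: the kind of application that can be adapted to cross-site MPI.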
Application porting and development: intra-cluster MPI support • Compiler support • gLite offers only limited Fortran support (F77) • We have extended the support to F90 (Intel compilers) to avoid static compilation • F90 applications are very widespread • Infrastructure-oriented support • Low-latency Infiniband cluster integrated with gLite • Flags for detailed hardware configuration • Application-oriented support (see demo session) • Scripts and hooks have been developed for parametric MPI simulations
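A parametric simulation reduces to enumerating the points of a parameter grid and submitting one job per point. The following minimal Python sketch shows only that enumeration; it is not the i2g scripts or hooks, and the parameter names and the `--name=value` argument style are hypothetical.

```python
from itertools import product
from typing import Dict, List, Sequence


def sweep(params: Dict[str, Sequence]) -> List[Dict[str, object]]:
    """Cartesian product of the parameter values: one dict per simulation point."""
    names = list(params)
    return [dict(zip(names, values)) for values in product(*(params[n] for n in names))]


def to_arguments(point: Dict[str, object]) -> str:
    """Render one point as command-line arguments (the --name=value style is assumed)."""
    return " ".join(f"--{name}={value}" for name, value in point.items())


points = sweep({"temperature": [0.5, 1.0], "lattice_size": [8, 16, 32]})
for p in points:
    print(to_arguments(p))  # one MPI job per line, e.g. "--temperature=0.5 --lattice_size=8"
```

Each printed argument string would become the arguments of one MPI job; the pre- and post-run hooks mentioned for MPI_START are a natural place to stage the inputs and collect the outputs of each point.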
Application porting and development: demos of OpenMPI support • Reacflow: simulation of large-scale explosions (e.g. hydrogen-air mixtures) • Sequential version already existing • European Commission Joint Research Centres at Ispra and Petten • C++ and Fortran 77 • MPI parallelisation done as a joint effort between JRC Petten and GUP Linz • Adaptive mesh refinement • Dynamic load balancing • int.eu.grid capabilities employed: • OpenMPI support • Interactivity • Monitoring simulation progress • Using SEs to store output • Using MPI hooks to upload simulation input
Application porting and development: PACX-MPI support • Spin glass using parallel tempering • Simulation of a Heisenberg spin glass • Many replicas of the same system need to be simulated: MPI distributed • The temperature of the replicas is controlled and set periodically for all of them by a master process: parallel tempering algorithm • Intensity-Modulated Radiation Therapy • Distribution of the Monte Carlo simulations for optimization of the radiation dose using MPI • All MPI processes are independent
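The master process's periodic temperature update can be sketched with the standard replica-exchange Metropolis test: neighbouring temperatures $t$ and $t+1$ swap their replicas with probability $\min\bigl(1, \exp[(\beta_t - \beta_{t+1})(E_r - E_s)]\bigr)$. This is a generic textbook formulation, not the code used in the demo; the function and argument names are hypothetical, the energies are made up, and the per-replica Monte Carlo updates of the Heisenberg spins (run by the MPI workers) are omitted.

```python
import math
import random
from typing import List, Protocol


class UniformSource(Protocol):
    def random(self) -> float: ...


def exchange_sweep(betas: List[float], assign: List[int], energies: List[float],
                   rng: UniformSource) -> int:
    """One pass of replica-exchange moves over neighbouring temperatures.

    betas[t]: inverse temperature of ladder slot t (fixed)
    assign[t]: index of the replica currently simulated at slot t (updated in place)
    energies[r]: current energy of replica r, reported by its MPI worker
    Returns the number of accepted swaps.
    """
    accepted = 0
    for t in range(len(betas) - 1):
        r, s = assign[t], assign[t + 1]
        # Metropolis criterion: accept with probability min(1, exp(delta)).
        delta = (betas[t] - betas[t + 1]) * (energies[r] - energies[s])
        if delta >= 0 or rng.random() < math.exp(delta):
            assign[t], assign[t + 1] = s, r  # the replicas trade temperatures
            accepted += 1
    return accepted


rng = random.Random(42)
betas = [0.2, 0.4, 0.6, 0.8]             # assumed temperature ladder
assign = [0, 1, 2, 3]
energies = [-50.0, -80.0, -60.0, -95.0]  # made-up energies, for illustration only
exchange_sweep(betas, assign, energies, rng)
print(assign)  # new replica-to-temperature assignment the master sends to the workers
```

Exchanging temperatures rather than spin configurations keeps the communication small: the master only needs one energy per replica and sends back one temperature.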
Application porting and development: interactivity + visualization • Evolution of pollution clouds in the atmosphere. Uses: • Open MPI • Interactivity • Visualization • Integrated in MD (Migrating Desktop) • Visualization of plasma in fusion devices. Uses: • Open MPI • Interactivity • Visualization • Integrated in MD • Visualization of Gaussian runs. Uses: • Interactivity • Visualization
Application porting and development: interactivity with GridSolve • Ultrasound Computer Tomography • Method for breast cancer detection • Data are taken by an ultrasound scanner • The method is based on image reconstruction from the data • User requirements: • Matlab environment • Speed up algorithm development • Resource gathering • Using GridSolve: • int.eu.grid middleware is used to send GridSolve agents (pilot jobs) to WNs • Integrated within the Migrating Desktop
Application porting and development: fast job allocation with Glidein • Glidein provides a mechanism to share computing resources on a per-VO basis • If all the resources of a VO are occupied (no free CPUs), the user can still submit an interactive job and immediately get a CPU shared with a batch job of the same VO • Example: analysis of water quality in reservoirs. Uses: • Interactivity • Visualization
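The per-VO sharing rule can be sketched as a two-step placement: a free CPU if one exists, otherwise a CPU already running a batch job of the same VO whose glide-in can host the interactive job. This is a toy model under stated assumptions: the `Cpu` fields and `place_interactive` are hypothetical, and in i2g the mechanism works through the CrossBroker and glide-in agents rather than a central list of CPUs.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cpu:
    vo: Optional[str] = None               # VO of the batch job occupying the CPU
    batch_job: Optional[str] = None
    has_glidein: bool = False              # glide-in agent started with the batch job
    interactive_job: Optional[str] = None


def place_interactive(cpus: List[Cpu], vo: str, job: str) -> Optional[int]:
    """Return the index of the CPU given to an interactive job, or None."""
    # A completely free CPU is the obvious first choice.
    for i, cpu in enumerate(cpus):
        if cpu.batch_job is None and cpu.interactive_job is None:
            cpu.vo, cpu.interactive_job = vo, job
            return i
    # Otherwise share a CPU with a batch job of the *same* VO through its glide-in;
    # the batch job keeps running alongside the interactive one.
    for i, cpu in enumerate(cpus):
        if cpu.has_glidein and cpu.vo == vo and cpu.interactive_job is None:
            cpu.interactive_job = job
            return i
    return None


cpus = [Cpu("ihep", "b1", True), Cpu("ifusion", "b2", True)]
print(place_interactive(cpus, "ifusion", "i1"))  # 1
print(place_interactive(cpus, "ibrain", "i2"))   # None
```

Restricting sharing to the same VO means a VO only ever slows down its own batch work to gain interactive response.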
Future and sustainability • Strong points of the int.eu.grid infrastructure from the user's point of view • Reliable support for MPI parallel jobs on the grid • Support for interactivity: makes daily work easier • Speeds up development work • Easy and direct access to grid resources for test purposes • Support for pilot jobs, enabling Grid RPC via GridSolve • The support for applications developed in int.eu.grid is being used in FP7 projects • DORII: Deployment of Remote Instrumentation Infrastructure • EUFORIA: EU Fusion for ITER Applications
GRID-CSIC Project • Origin: • CSIC's experience in GRID projects • An area of possible collaboration with CNRS • Opportunity (national e-Science initiative, NGI, EGI) • Joint push from VORI-VICYT • Goal: to set up an advanced distributed computing infrastructure enabling research projects that require capabilities beyond the reach of a single user or research group. • In particular, it is expected to boost multidisciplinary or multi-centre projects in which researchers need to simulate, analyse, process, distribute or access large volumes of data. • Examples (e-Science): • Particle physics experiments (CDF, CMS, ATLAS, ILC…) • Phenomenology (SUSY models) and lattice • Space missions (XMM, Planck…) • Astronomical observations • Climate change modelling • Computational chemistry • Biocomputing
Project basis • The project is based on Grid technology, which allows geographically distributed resources to be shared and accessed transparently. In particular, it proposes using middleware that allows interoperability with European Grid infrastructures, such as those of the EGEE project and the i2g project (the latter coordinated by CSIC). • In particular, the infrastructure developed could be shared with the IberGrid initiative being developed with Portugal, and with the infrastructure of CNRS's Institut des Grilles in France, with which a collaboration agreement will be established. • The project involves developing a total computing capacity estimated at about 8,000 processors and an online storage capacity of 1,000 terabytes (1 petabyte). • This infrastructure will be deployed in three phases over a three-year period (2008, 2009, 2010): • In the first year, the pilot phase will include three centres that already have experience in this type of project (IFCA, IFIC and IAA) • The second, extension phase will include centres in Madrid and Catalonia. • Finally, the consolidation phase will complete the national coverage map.
Structure • The project consists of three work areas: • Infrastructure • Installation and operation of the computing equipment, and its integration into the Grid environment. • Applications and development • Will support the adaptation of the applications and of specific software • Project coordination • Management, internal organisation and dissemination. • Initial team:
Activities • Planned activities:
Planning
Planning II
Current status • First-year equipment purchased: • IFCA: • Computing: • IBM blades, 182 (dual quad-core: 1456 cores) • 70 + 14 with Infiniband • Network connections: 3 x 10G • Storage: • SATA disk arrays (~175 terabytes) • 4 GPFS servers • IFIC • Computing: • HP + DELL • Storage: • SUN • IAA • Computing: • IBM x3850 M2 servers with 4th-generation X-Architecture technology, which can scale from 4 to 16 processors (Intel Quad Core Xeon X7350) and up to 1 TB of RAM in the 16-processor configuration • Storage: • DELL • Installation should be completed in September • Staff contracts (1 university graduate + 1 PhD) under way in September • Contact with CNRS established • Next edition of the Grid and e-Science course in Santander (within the UIMP)
Interoperability • The GRID-CSIC infrastructure will maintain interoperability with other existing Grid computing infrastructures, in particular those of the European projects Interactive European Grid (i2g), EGEE-II, DORII and EUFORIA, and those of the national Tier-2 projects of the ATLAS and CMS collaborations • As well as with the new national Grid initiative within the Spanish e-Science Network (Red Española de e-Ciencia), in which CSIC plays a relevant role (coordinating the Grid infrastructure) • This initiative is also expected to allow CSIC to participate directly in the future European Grid Infrastructure (EGI).