DEISA: integrating HPC infrastructures in Europe
Prof. Victor Alessandrini, va@idris.fr
Distributed European Infrastructure for Supercomputing Applications - Consortium
• IDRIS – CNRS, France (coordinator)
• FZJ – Juelich, Germany
• RZG – Garching, Max Planck Society, Germany
• CINECA, Italy
• EPCC, Edinburgh, UK
• CSC, Helsinki, Finland
• SARA, Amsterdam, The Netherlands
• ECMWF (European Organization), Reading, UK
DEISA: mission statement
• To contribute to a significant enhancement of capabilities and capacities of high performance computing (HPC) in Europe, by the integration of leading national supercomputing infrastructures.
• To deploy and operate a distributed multi-terascale European computing platform, based on a strong coupling of existing national supercomputers. DEISA plans to operate as a virtual European supercomputing centre.
• To contribute to the deployment of an extended, heterogeneous Grid computing environment for HPC in Europe, needed to interface the DEISA research infrastructure with the rest of the European IT infrastructures.
Strategic vision
• DEISA is based on a deep integration and tightly coupled operation of high end computational resources, rather than a loose federation model.
• Scientific impact (enabling new science) is the only criterion for success.
• Integration of IT systems is mainly a strategic issue. Technology choices follow from the business and operational models of virtual organizations.
• This is why DEISA puts forward an innovative vision for the integration of high end computing systems: promoting cluster software (global file systems, batch managers) to distributed super-cluster middleware (distributed global file systems, multi-cluster batch managers).
• DEISA technology choices are fully open. DEISA is not tied to any specific pre-established technology.
The DEISA facility
• Dedicated bandwidth network: GEANT, RENATER, DFN, GARR, …
• National supercomputing platforms: IDRIS (France), JULICH (Germany), GARCHING (Germany), CINECA (Italy), …, SARA (The Netherlands), CSC (Finland)
• Extended Grid services: portals, Web-like services, … Interfacing the core platform to other virtual organizations. Hiding complex environments from end users.
DEISA – Phase 1 (Q1 2005)
• 4000 processors (5 to 8 Gf per processor)
• 24 Teraflops integrated peak performance
• 125 cabinets spread over 3 countries (Germany, France, Italy)
• IBM systems: 690, 690+, 655+
• Diversified configurations
Phase 2 (Q3 2005)
• Incorporation of other sites (CSC, SARA) and heterogeneous extension (SARA Linux ALTIX SGI platform to start, vector platforms, …)
• Higher VPN bandwidth (10 Gb or more across platforms). The Geant2 infrastructure is critical for DEISA.
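As a quick consistency check on the Phase 1 figures, the sketch below multiplies the processor count by the quoted per-processor peak range. The function name and the assumption that every one of the 4000 processors falls within the 5 to 8 Gf range are illustrative only and are not part of the original slides.

```python
def aggregate_peak_tflops(processors: int, gflops_per_processor: float) -> float:
    """Aggregate peak in Teraflops, assuming identical processors (a simplification)."""
    return processors * gflops_per_processor / 1000.0


PROCESSORS = 4000          # Phase 1 processor count quoted on the slide
QUOTED_PEAK_TFLOPS = 24.0  # integrated peak performance quoted on the slide

# Bounds if all processors sat at the low or high end of the quoted range.
lower_bound = aggregate_peak_tflops(PROCESSORS, 5.0)
upper_bound = aggregate_peak_tflops(PROCESSORS, 8.0)

# Average per-processor peak implied by the quoted aggregate figure.
implied_avg_gflops = QUOTED_PEAK_TFLOPS * 1000.0 / PROCESSORS

print(f"range: {lower_bound} to {upper_bound} Tflops")
print(f"implied average: {implied_avg_gflops} Gf per processor")
```

Under these assumptions the quoted 24 Teraflops lies inside the 20 to 32 Teraflops range and corresponds to an average of 6 Gf per processor.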
The DEISA super-cluster (phase 1)
• Global, high performance, distributed file system with continental scope (GPFS).
• Dynamic pool of resources.
• VPN connecting computing platforms on NRENs – GEANT infrastructures.
Operational model
• DEISA provides an integrated supercomputing environment, with efficient data sharing through high performance global file systems. This is highly transparent to end users.
• DEISA enables job migration across sites (also transparent to end users). Exceptional resources for very demanding applications are made available by the operation of the global resource pool. We are load balancing computational workload at a European scale.
• Huge, demanding applications can be run “as such”.
• Support of Grid applications (which are distributed by design).
• With this operational model, the DEISA super-cluster is not very different from a “true” monolithic European supercomputer (which must be partitioned in any case for fault tolerance and QoS).
• The main difference comes from the coexistence of several independent administration domains. This requires, as in TeraGrid, coordinated production environments.
Managing the European resource pool
• Each “core” partner initially contributes 10-15% of its computing capacity to a common resource pool. This pool benefits from a DEISA global file system.
• The sharing model is based on simple exchanges: on average, each partner recovers as much as it contributes. This leaves the different business models of the partner organizations unchanged.
• The pool is dynamic: in 2005, computing nodes will be able to join or leave the pool in real time, without disrupting the national services. The pool can therefore be reconfigured to match user requirements and application profiles.
• Each DEISA site is a fully independent administration domain, with its own AAA policies. The VPN connects computing nodes, not sites. A network of trust can be established to operate the pool.
• Some DEISA services require replication of user accounts across sites, but most of them do not. They are transparent to end users, and only require actions from system administrators.
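The exchange-based sharing model can be illustrated with a small ledger. This is a minimal sketch under stated assumptions, not DEISA's actual accounting: the `Partner` class, the `record_migrated_job` helper, the site names and all CPU-hour figures are hypothetical. Only the 10-15% contribution range and the idea that each partner recovers on average what it contributes come from the slide.

```python
from dataclasses import dataclass


@dataclass
class Partner:
    """One site in the pool. All fields are illustrative, not DEISA data."""
    name: str
    capacity_cpu_hours: float  # total capacity over some accounting period
    pool_fraction: float       # share offered to the pool (slide: 10-15%)
    provided: float = 0.0      # CPU-hours run here on behalf of other partners
    consumed: float = 0.0      # CPU-hours this partner ran at other sites

    def pool_contribution(self) -> float:
        return self.capacity_cpu_hours * self.pool_fraction

    def balance(self) -> float:
        # Positive: the partner has given more than it has received so far.
        return self.provided - self.consumed


def record_migrated_job(owner: Partner, host: Partner, cpu_hours: float) -> None:
    """Account for a job owned by `owner` that ran on `host`'s pool nodes."""
    if owner is host:
        return  # local jobs do not affect the exchange ledger
    owner.consumed += cpu_hours
    host.provided += cpu_hours


# Hypothetical sites and workloads, chosen only to illustrate the bookkeeping.
site_a = Partner("site_a", capacity_cpu_hours=1_000_000, pool_fraction=0.10)
site_b = Partner("site_b", capacity_cpu_hours=2_000_000, pool_fraction=0.15)

record_migrated_job(owner=site_a, host=site_b, cpu_hours=5_000)
record_migrated_job(owner=site_b, host=site_a, cpu_hours=5_000)

for site in (site_a, site_b):
    print(site.name, site.pool_contribution(), site.balance())
```

In this toy ledger the balances always sum to zero across partners, so a site whose balance drifts away from zero is giving or receiving more than the simple-exchange model intends.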
Resource allocation - 1
For users belonging to member organizations:
• Each national supercomputing centre has a body acting as a Scientific Council that performs a scientific evaluation of the users' demands and supervises the resource allocation.
• The national Scientific Councils will select the projects that will carry the DEISA label, with access to the services of the distributed core infrastructure.
• International collaborations will require the support of the Scientific Councils of all the organizations involved.
• Note, however, that the DEISA sites may use the global pool for the migration of non-DEISA jobs, if this is needed to create a specific production environment for a DEISA project at a particular site.
Resource allocation - 2
For users belonging to external organizations:
• DEISA partners reserve a fraction of computing resources, between 0.5% and 3%, for allocation to external partners (other organizations, industries, …).
• Allocation of computing resources to external partners will be decided by the DEISA Executive Committee, assisted by the DEISA Advisory Scientific Committee.
• The Advisory Scientific Committee will supervise the whole scientific activity of the infrastructure, integrating the policies emanating from the national Scientific Councils.
Conclusions
• DEISA adopts Grid technologies to integrate national supercomputing infrastructures.
• This includes service activities supported by the coordinated action of the national centres' staff. DEISA operates as a virtual European supercomputing centre.
• This is one possible strategy to contribute to the development of HPC for science and technology in Europe, based on existing national investments and human know-how (which is the most critical IT resource today).
• The big challenge ahead is to demonstrate the relevance of this strategy for the production of first class computational science.
• “The proof of the pudding is in the eating”. Measurable scientific impact is the only criterion of success.