D-Grid and DEISA: Towards Sustainable Grid Infrastructures
Michael Gerndt, Technische Universität München, gerndt@in.tum.de
Technische Universität München • Founded in 1868 by Ludwig II, King of Bavaria • 3 campuses: München, Garching, Freising • 12 faculties with 20,000 students, 480 professors, 5,000 employees • 5 Nobel Prize winners and three winners who studied at TUM • 26 Diplom study programs, 45 Bachelor and Master programs
Faculty of Informatics • 19 chairs • 30 professors, 3,000 students • 340 first-year students in winter semester 2004/05 • Computer science: Diplom, Bachelor, Master • Information systems: Bachelor, Master • Bioinformatics: Bachelor, Master • Applied informatics, computational science and engineering
Thanks • Prof. Gentzsch: Slides on D-Grid • My students in Grid Computing: Slides on DEISA • Helmut Reiser from LRZ: Discussion on status of D-Grid and DEISA
e-Infrastructure • Resources: Networks with computing and data nodes, etc. • Development/support of standard middleware & grid services • Internationally agreed authentication, authorization, and auditing infrastructure • Discovery services and collaborative tools • Data provenance • Open access to data and publications via interoperable repositories • Remote access to large-scale facilities: Telescopes, LHC, ITER, .. • Application- and community-specific portals • Industrial collaboration • Service Centers for maintenance, support, training, utility, applications, etc. Courtesy Tony Hey
Biomedical Scenario • Bioinformatics scientists have to execute complex tasks • These tasks require orchestrating tools, storage and data services (SOA), and computational power into workflows • Courtesy Livia Torterolo
Gridified Scenario • Applications reach tools, storage and data services (SOA), and computational power through a grid portal/gateway • Grid technology leverages both the computational and data management resources • Providing optimisation, scalability, reliability, fault tolerance, QoS, … • Courtesy Livia Torterolo (a minimal workflow sketch follows below)
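To make the orchestration idea concrete, here is a minimal, self-contained Python sketch of a two-step workflow (data staging followed by an analysis step) submitted through a generic portal abstraction. All names (GridPortal, Job, stage_data, run_analysis) are hypothetical and do not correspond to any specific D-Grid or DEISA API; a real portal would delegate each step to grid middleware such as Globus, UNICORE, or gLite.

```python
# Minimal, self-contained illustration of chaining grid services into a workflow.
# All names (GridPortal, Job, stage_data, run_analysis) are hypothetical; a real
# portal would delegate to middleware such as Globus, UNICORE, or gLite.

from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Job:
    name: str
    action: Callable[[dict], dict]


@dataclass
class GridPortal:
    """Toy stand-in for a grid portal/gateway that runs jobs in order."""
    jobs: List[Job] = field(default_factory=list)

    def submit(self, job: Job) -> None:
        self.jobs.append(job)

    def run_workflow(self, context: dict) -> dict:
        # Each job receives the outputs of its predecessors (a simple pipeline).
        for job in self.jobs:
            print(f"running {job.name} ...")
            context = job.action(context)
        return context


def stage_data(ctx: dict) -> dict:
    ctx["dataset"] = ["seq1", "seq2", "seq3"]   # pretend we fetched sequences
    return ctx


def run_analysis(ctx: dict) -> dict:
    ctx["result"] = [s.upper() for s in ctx["dataset"]]  # pretend analysis step
    return ctx


if __name__ == "__main__":
    portal = GridPortal()
    portal.submit(Job("stage-data", stage_data))
    portal.submit(Job("analysis", run_analysis))
    print(portal.run_workflow({}))
```

In a real gridified scenario the same pipeline structure would be expressed with the community's workflow tooling rather than plain function calls; the point here is only the separation between submitting steps and letting the portal/grid run them.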
D-Grid e-Infrastructure
Building a National e-Infrastructure for Research and Industry
• 01/2003: Pre-D-Grid working groups, recommendation to the government
• 09/2005: D-Grid-1: early adopters, 'Services for Science' (25 MEuro, > 100 organisations, > 200 researchers)
• 07/2007: D-Grid-2: new communities, 'Service Grids' (30 MEuro, > 100 additional organisations, > 200 additional researchers and industry)
• 12/2008: D-Grid-3: Service Grids for research and industry (call 05/2008)
Important:
• Sustainable production grid infrastructure after funding stops
• Integration of new communities
• Evaluating business models for grid services
Structure of D-Grid [organisational chart]
• D-Grid Plenum: all partners in the projects
• Steering Committee: Community Grids, DGI, leaders of the technical areas
• Advisory Board: external experts and industry
• Community Grids and the DGI Integration Project
• Relations shown in the chart: informs, reports, advises, reviews, coordinates cooperation
Integration Project DGI-2 [layer diagram, D-Grid-1, -2, -3, 2005-2011]
• User-friendly access layer, portals
• Community grids: Astro-Grid, C3-Grid, HEP-Grid, IN-Grid, MediGrid, Textgrid, WISENT, ONTOVERSE, WIKINGER, Im Wissensnetz, …
• Business services, SLAs, SOA integration, virtualization, …
• Generic grid middleware and grid services
DGI Infrastructure Project • Goals • Scalable, extensible, generic grid platform for the future • Long-term, sustainable grid operation, SLA-based • Structure • WP 1: D-Grid basic software components • large storage, data interfaces, virtual organizations, management • WP 2: Develop, operate and support a robust core grid • resource description, monitoring, accounting, and billing • WP 3: Network (transport protocols, VPN) • Security (AAI, CAs, firewalls) • WP 4: Business platform and sustainability • project management, communication and coordination
DGI Services, Available Dec 2006 • Sustainable grid operation environment with a set of core D-Grid middleware services • Central registration and information management for all resources • Packaged middleware components for gLite, Globus and UNICORE and for the data management systems SRB, dCache and OGSA-DAI • D-Grid support infrastructure for new communities with installation and integration of new grid resources • Help desk, monitoring system and central information portal
DGI Services, Dec 2006, cont. • Tools for managing VOs based on VOMS and Shibboleth • Prototype for monitoring & accounting of grid resources, and a first concept for a billing system • Network and security support for communities (firewalls in grids, alternative network protocols, ...) • DGI operates "Registration Authorities" with internationally accepted grid certificates of DFN & GridKa Karlsruhe • Partners support new D-Grid members in building their own "Registration Authorities"
DGI Services, Dec 2006, cont. • DGI will offer resources to other communities, with access via gLite, Globus Toolkit 4, and UNICORE • The portal framework GridSphere can be used by future users as a graphical user interface • For administration and management of large scientific datasets, DGI will offer dCache for testing • New users can use the D-Grid resources of the core grid infrastructure upon request
D-Grid Middleware [layered architecture diagram]
• User application development and user access: GAT API, GridSphere plug-ins, UNICORE clients
• High-level grid services: scheduling, workflow management, monitoring, data management
• Basic grid services: accounting, billing, user/VO management, security; middleware: UNICORE, LCG/gLite, Globus 4.0.1
• Resources in D-Grid: distributed compute resources, network infrastructure, distributed data archives, data/software
The DGI Infrastructure (10/2007): 2,200 CPU cores, 800 TB disk, 1,400 TB tape
HEP-Grid: p-p collisions at LHC at CERN [tiered computing model; image courtesy Harvey Newman, Caltech]
• Online system → offline processor farm (~20 TIPS) and Tier 0 (CERN Computer Centre) at ~100 MBytes/sec, out of ~PBytes/sec of raw detector data
• Tier 1: regional centres (France, Germany, Italy, FermiLab ~4 TIPS) with HPSS mass storage, connected at ~622 Mbits/sec (or air freight, deprecated)
• Tier 2: centres of ~1 TIPS each (e.g. Caltech), linked at ~622 Mbits/sec
• Institutes (~0.25 TIPS) with physics data caches (~1 MBytes/sec); Tier 4: physicist workstations
• 1 TIPS is approximately 25,000 SpecInt95 equivalents
• There is a "bunch crossing" every 25 nsec; there are ~100 "triggers" per second, and each triggered event is ~1 MByte in size (see the check below)
• Physicists work on analysis "channels"; each institute will have ~10 physicists working on one or more channels, and data for these channels should be cached by the institute server
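The rates quoted on this slide are internally consistent; the following Python snippet (purely illustrative, using only numbers taken from the slide) reproduces the ~100 MBytes/sec recorded data rate from the trigger rate and event size, and the 40 MHz bunch-crossing rate from the 25 ns crossing interval.

```python
# Back-of-the-envelope check of the data rates quoted on the slide.
# Inputs are taken directly from the slide; everything else is arithmetic.

bunch_crossing_interval_s = 25e-9      # a "bunch crossing" every 25 ns
trigger_rate_hz = 100                  # ~100 "triggers" (recorded events) per second
event_size_bytes = 1e6                 # each triggered event is ~1 MByte

crossing_rate_hz = 1 / bunch_crossing_interval_s          # 40 million crossings/s
recorded_rate_bytes_per_s = trigger_rate_hz * event_size_bytes

print(f"bunch crossing rate: {crossing_rate_hz / 1e6:.0f} MHz")
print(f"recorded data rate:  {recorded_rate_bytes_per_s / 1e6:.0f} MBytes/sec")
# -> 40 MHz of crossings, but only ~100 MBytes/sec leaves the online system,
#    matching the ~100 MBytes/sec link to the Tier 0 centre on the slide.
```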
BauVOGrid: Grid-based Platform for Virtual Organisations in the Construction Industry
BIS-Grid: Grid Technologies for EAI (Enterprise Application Integration) • Grid-based enterprise information systems • Use of grid technologies to integrate distributed enterprise information systems
D-Grid: Towards a Sustainable Infrastructure for Science and Industry • 3rd Call: focus on service provisioning for sciences & industry • Close collaboration with: Globus Project, EGEE, DEISA, CoreGrid, NextGrid, … • Application- and user-driven, not infrastructure-driven => NEED • Focus on implementation and production, not grid research, in a multi-technology environment (Globus, UNICORE, gLite, etc.) • The government is considering changing its policies for resource acquisition (HBFG!) to enable a service model
DEISA: Distributed European Infrastructure for Supercomputing Applications • What is DEISA? • An EU Framework Programme 6 project • A consortium of leading supercomputing centres in Europe • What are the main goals? • To deploy and operate a persistent, production-quality, distributed supercomputing environment with continental scope • To enable scientific discovery across a broad spectrum of science and technology; scientific impact (enabling new science) is the only criterion for success
DEISA: Principal Project Partners (project director and executive committee)
• IDRIS-CNRS, Paris (F): Prof. Victor Alessandrini (project director)
• FZJ, Jülich (D): Dr. Achim Streit
• SARA, Amsterdam (NL): Dr. Axel Berg
• RZG, Garching (D): Dr. Stefan Heinzel
• ECMWF, Reading (GB) (weather forecast): Mr. Walter Zwieflhofer
• CINECA, Bologna (I): Dr. Sanzio Bassini
• LRZ, München (D): Dr. Horst-Dieter Steinhoefer
• EPCC, Edinburgh (GB): Dr. David Henty
• BSC, Barcelona (ESP): Prof. Mateo Valero
• CSC, Espoo (FIN): Mr. Klaus Lindberg
• HLRS, Stuttgart (D): Prof. Michael Resch
DEISA: Principal Project Partners [infrastructure timeline 2004, 2006, 2008]
• DEISA deployed UNICORE as the enabler for transparent access to distributed resources
• It allows high-performance data sharing at a continental scale as well as transparent job migration across similar platforms
• A virtual dedicated 1 Gb/s internal network provided by GEANT
• Deployment of a co-scheduling service for synchronizing remote supercomputers and allowing high-performance data transfer services across sites
• Evolving towards a star-like configuration with a 10 Gb/s Phase 2 network
DEISA Architecture • The DEISA environment incorporates different platforms and operating systems (AIX distributed super-clusters, Linux clusters, vector systems): IBM Linux on PowerPC, IBM AIX on Power4/Power5, SGI Linux on Itanium, NEC vector systems • Since 2007 the DEISA infrastructure's aggregated computing power has been close to 190 Teraflops • The GRID file system allows storage of data in heterogeneous environments, avoiding data redundancy
DEISA Operation and Services • Load balancing the computational workload across national borders • Huge, demanding applications are run by reorganizing the global operation in order to allocate substantial resources at one site • The application runs "as such", with no modification; this strategy relies only on network bandwidth, which will keep improving in the years to come
UNICORE Portal [client screenshot] • Jobs may be attached to different target sites and systems • Dependencies between tasks/sub-jobs can be expressed (see the sketch below)
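To illustrate what dependencies between tasks/sub-jobs mean for submission order, here is a small, self-contained Python sketch that topologically orders a hypothetical workflow before dispatching each sub-job to its target site. The workflow, the site names, and the dispatch step are invented for illustration; this is not UNICORE's job model or client API.

```python
# Conceptual sketch: order sub-jobs so every job runs after its dependencies.
# This is plain topological sorting, not UNICORE's actual job description format.

from graphlib import TopologicalSorter   # standard library, Python 3.9+

# Hypothetical workflow: two preprocessing jobs on different sites feed a
# simulation, whose output is then post-processed.
subjobs = {
    "preprocess-siteA": set(),
    "preprocess-siteB": set(),
    "simulation":       {"preprocess-siteA", "preprocess-siteB"},
    "postprocess":      {"simulation"},
}

target_site = {                     # jobs may be attached to different systems
    "preprocess-siteA": "site-A",
    "preprocess-siteB": "site-B",
    "simulation":       "site-A",
    "postprocess":      "site-C",
}

for job in TopologicalSorter(subjobs).static_order():
    print(f"submit {job!r} to {target_site[job]}")
```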
DEISA – Extreme Computing Initiative • Launched in May 2005 by the DEISA Consortium as a way to enhance its impact on science and technology • Applications adapted to the current DEISA grid • International collaborations involving scientific teams • Workflow applications involving at least two platforms • Coupled applications involving more than one platform • The Applications Task Force (ATASKF) was created in April 2005 • It is a team of leading experts in high-performance and grid computing whose major objective is to provide the consultancy needed to enable users' adoption of the DEISA research infrastructure
Common Production Environment • DEISA Common Production Environment (DCPE) • Defined and deployed on each computer integrated in the platform • The DCPE includes: shells (Bash and Tcsh), compilers (C, C++, Fortran and Java), libraries (for communication, data formatting, numerical analysis, etc.), tools (debuggers, profilers, editors, batch and workflow managers, etc.), and applications • Accessible via the module command to list, load and unload each component (a conceptual sketch follows below)
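The module mechanism essentially edits the user's environment per component (PATH, LD_LIBRARY_PATH, and so on). The sketch below is a purely conceptual Python illustration of what a "load" operation does; the component names and install paths are made up, and this is not the actual Environment Modules implementation behind the DCPE.

```python
# Conceptual illustration of what "module load <name>" does: prepend the
# component's directories to the relevant environment variables. This is NOT
# the DCPE/Environment Modules implementation, just the underlying idea.

import os

# Hypothetical install locations; real module files would define these per site.
MODULES = {
    "fortran-compiler": {"PATH": "/opt/fortran/bin",
                         "LD_LIBRARY_PATH": "/opt/fortran/lib"},
    "numerics-lib":     {"LD_LIBRARY_PATH": "/opt/numerics/lib"},
}

def module_load(name: str) -> None:
    """Prepend the component's directories to the current environment."""
    for var, path in MODULES[name].items():
        current = os.environ.get(var, "")
        os.environ[var] = path + (os.pathsep + current if current else "")

def module_list() -> None:
    print("available components:", ", ".join(sorted(MODULES)))

if __name__ == "__main__":
    module_list()
    module_load("fortran-compiler")
    print("PATH now starts with:", os.environ["PATH"].split(os.pathsep)[0])
```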
Monitoring: The Inca System • The Inca system provides user-level grid monitoring • Periodic, automated testing of the software and services required to support persistent, reliable grid operation • It collects, archives, publishes, and displays the resulting data (a minimal sketch of the pattern follows below)
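The pattern behind this kind of user-level monitoring can be sketched in a few lines: run a set of checks periodically from an ordinary user account, archive the outcomes with timestamps, and publish them. The Python sketch below illustrates only that pattern; the checks and the archive file name are invented, and it is not Inca's reporter or consumer API.

```python
# Minimal sketch of periodic user-level grid testing: run checks as a normal
# user, archive results with a timestamp, and (here) just print a summary.
# This illustrates the pattern only; it is not Inca's reporter API.

import json
import shutil
import subprocess
import time
from datetime import datetime, timezone


def check_command_exists(cmd: str) -> bool:
    """A trivial 'is the software there?' test, e.g. for a required client tool."""
    return shutil.which(cmd) is not None


def check_command_runs(cmd: list[str]) -> bool:
    """Run a harmless command and report whether it exits cleanly."""
    try:
        return subprocess.run(cmd, capture_output=True, timeout=30).returncode == 0
    except Exception:
        return False


CHECKS = {
    "python-present": lambda: check_command_exists("python3"),
    "hostname-runs":  lambda: check_command_runs(["hostname"]),
}


def run_once(archive_path: str = "inca_like_results.jsonl") -> None:
    record = {"time": datetime.now(timezone.utc).isoformat(),
              "results": {name: check() for name, check in CHECKS.items()}}
    with open(archive_path, "a") as f:          # append-only archive
        f.write(json.dumps(record) + "\n")
    print(record)


if __name__ == "__main__":
    # Periodic execution; a real deployment would use cron or a scheduler.
    for _ in range(2):
        run_once()
        time.sleep(5)
```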
Potential Grid Inhibitors • Sensitive data, sensitive applications (medical patient records) • Different organizations have different ROI • Accounting: who pays for what (sharing!) • Security policies: consistent and enforced across the grid! • Lack of standards prevents interoperability of components • The current IT culture is not predisposed to sharing resources • Not all applications are grid-ready or grid-enabled • SLAs based on open source (liability?) • "Static" licensing models don't embrace the grid • Protection of intellectual property • Legal issues (privacy, national laws, multi-country grids)
Lessons Learned and Recommendations • Large infrastructure update cycles • During development and operation, the grid infrastructure should be modified and improved in large cycles only: all applications depend on this infrastructure! • Funding required after the project • Continuity, especially for the infrastructure part of grid projects, is important; therefore, funding should be available after the project to guarantee services, support, continuous improvement, and adjustment to new developments • Interoperability • Use software components and standards from open-source and standards initiatives, especially in the infrastructure and application middleware layers • Close collaboration of grid developers and users • Mandatory to best utilize grid services and to avoid application silos
Lessons Learned and Recommendations • Management board steering collaboration • For complex projects (infrastructure and application projects), a management board consisting of the leaders of the different projects should steer coordination and collaboration among the projects • Reduce re-invention of the wheel • New projects should utilize the general infrastructure and focus on an application or on a specific service, to avoid complexity, re-inventing wheels, and building grid application silos • Participation of industry has to be industry-driven • A push from outside, even with government funding, is not promising; success will come only from real needs, e.g. through existing collaborations between research and industry, as a first step