PNC 2007 Annual Conference, Berkeley, 18 - 20 October 2007
Building and Operating Grid Infrastructures for e-Science: Lessons Learned and Recommendations
Wolfgang Gentzsch
What can we learn from our existing Service Infrastructures? Water, Gas, Electrical Power, Transportation
• The Why: driving force, need, pain, lack, etc. …
• … or: desire for a ‘better’ life
• The How: idea > prototype > architecture > implementation
• The What: organizational and operational structure, market concepts, providers and consumers, business models (QoS, SLA, ROI, TCO), and so on
. . . to finally result in a sustainable infrastructure
e-Infrastructure
1. Resources: networks with computing and data nodes, etc.
2. Development/support of standard middleware & grid services
3. Internationally agreed AAA infrastructure
4. Discovery services and collaborative tools
5. Data provenance, curation and preservation
6. Open access to data and publications via interoperable repositories
7. Remote access to large-scale facilities: telescopes, LHC, ITER, ...
8. Application- and community-specific portals
9. Industrial collaboration
10. Service Centers for maintenance, support, training, utility, applications, etc.
Courtesy Tony Hey
Many Grid Projects: Grid5000
Before we started with D-Grid, we studied other Grid initiatives (funding in €)

Initiative        Time         Funding   People *)    Users **)
UK e-Science-I    2001 - 2004  140M      900          Res.
UK e-Science-II   2004 - 2006  160M      1100         Res. Ind.
TeraGrid-I        2001 - 2004  70M       500          Res.
TeraGrid-II       2005 - 2007  120M *)   850          Res.
ChinaGrid-I       2003 - 2006  4M        400          Res.
ChinaGrid-II      2007 - 2010  4M *)     1000         Res.
NAREGI-I          2003 - 2005  25M       150          Res.
NAREGI-II         2006 - 2010  40M *)    250          Res. Ind.
EGEE-I            2004 - 2006  30M       800          Res.
EGEE-II           2006 - 2008  35M       1000         Res. Ind.
For comparison:
D-Grid-1          2005 - 2008  25M       220          Res.
D-Grid-2          2007 - 2010  35M       220 (= 440)  Res. Ind.
D-Grid-3          2008 - 2011                         Ind. Res.

*) estimated   **) Res = Research, Ind = Industry
Before we started with D-Grid, we studied other Grid initiatives (funding in US$)

Initiative        Time         Funding   People *)    Users **)
UK e-Science-I    2001 - 2004  $180M     900          Res.
UK e-Science-II   2004 - 2006  $220M     1100         Res. Ind.
TeraGrid-I        2001 - 2004  $90M      500          Res.
TeraGrid-II       2005 - 2010  $150M     850          Res.
ChinaGrid-I       2003 - 2006  $4M       400          Res.
ChinaGrid-II      2007 - 2010  $5M *)    1000         Res.
NAREGI-I          2003 - 2005  $25M      150          Res.
NAREGI-II         2006 - 2010  $40M *)   250          Res. Ind.
EGEE-I            2004 - 2006  $40M      800          Res.
EGEE-II           2006 - 2008  $45M      1000         Res. Ind.
For comparison:
D-Grid-1          2005 - 2008  $35M      220          Res.
D-Grid-2          2007 - 2010  $45M      220 (= 440)  Res. Ind.
D-Grid-3          2008 - 2011                         Ind. Res.

*) estimated   **) Res = Research, Ind = Industry
Report available from www.RENCI.org
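The funding and headcount columns of the dollar-denominated table invite a rough per-person comparison. The sketch below is only an illustration, not part of the original talk: the figures are copied from the table, the helper name `funding_per_person` is made up here, and D-Grid-2 is left out because its headcount entry, "220 (= 440)", is ambiguous. Headcounts are estimates and the phases differ in length, so the result is a crude indicator at best.

```python
# Funding (million USD) and estimated headcount, copied from the table above.
# D-Grid-2 is omitted because its headcount "220 (= 440)" is ambiguous.
initiatives = {
    "UK e-Science-I": (180, 900),
    "UK e-Science-II": (220, 1100),
    "TeraGrid-I": (90, 500),
    "TeraGrid-II": (150, 850),
    "ChinaGrid-I": (4, 400),
    "ChinaGrid-II": (5, 1000),
    "NAREGI-I": (25, 150),
    "NAREGI-II": (40, 250),
    "EGEE-I": (40, 800),
    "EGEE-II": (45, 1000),
    "D-Grid-1": (35, 220),
}


def funding_per_person(funding_musd, people):
    """Return funding per person in thousand USD (hypothetical helper)."""
    return funding_musd * 1000 / people


per_person = {
    name: funding_per_person(funding, people)
    for name, (funding, people) in initiatives.items()
}

# Print the initiatives from highest to lowest funding per person.
for name, kusd in sorted(per_person.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<16} {kusd:8.1f} kUSD per person")
```

Dividing by the project duration as well would give a per-person, per-year figure, which is arguably a fairer comparison across phases of different length.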
On e-Infrastructure items 2, 3, 4: gLite Grid Middleware
[Figure: gLite service groups and their components]
• Access: CLI, API
• Security: Authorization, Auditing, Authentication
• Information & Monitoring: Information & Monitoring, Application Monitoring
• Data Management: Metadata Catalog, File & Replica Catalog, Storage Element, Data Movement
• Workload Management: Job Provenance, Package Manager, Accounting, Computing Element, Workload Management, Site Proxy
On e-Infrastructure items 6, 7, 8 (Access): e.g. Biomedical Scenario
• Bioinformatics scientists have to execute complex tasks
• There is a need to orchestrate these services in workflows
[Figure labels: Tools; Storage and Data Services (SOA); Computational Power]
Courtesy Livia Torterolo
Gridified Scenario
• Grid technology leverages both the computational and data management resources
• Providing optimisation, scalability, reliability, fault tolerance, QoS, …
[Figure labels: Appl.; Grid Portal / Gateway; Tools; Storage and Data Services (SOA); Grid; Computational Power]
Courtesy Livia Torterolo
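The biomedical scenario calls for orchestrating services in workflows. As a minimal sketch of that idea, assuming nothing about the actual portal or middleware used, the Python below orders a set of dependent steps with the standard library's `graphlib.TopologicalSorter` (Python 3.9+). The step names and the `run_step` stand-in are hypothetical; a real gateway would submit each step to a grid compute or data service instead of appending to a list.

```python
from graphlib import TopologicalSorter

# Hypothetical bioinformatics steps; each step maps to the steps it depends on.
workflow = {
    "fetch_sequences": set(),
    "align_sequences": {"fetch_sequences"},
    "build_tree": {"align_sequences"},
    "annotate_genes": {"fetch_sequences"},
    "store_results": {"build_tree", "annotate_genes"},
}


def run_step(name, log):
    """Stand-in for submitting one job to a compute or data service."""
    log.append(name)


def run_workflow(graph):
    """Run every step only after all of its dependencies have run."""
    log = []
    for step in TopologicalSorter(graph).static_order():
        run_step(step, log)
    return log


order = run_workflow(workflow)
print(order)
```

Steps with no mutual dependency, such as `build_tree` and `annotate_genes` here, could run in parallel on different resources; the sequential loop is kept only for brevity.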
Example: D-Grid e-Infrastructure *)
Building a National e-Infrastructure for Research and Industry
• 01/2003: Pre-D-Grid Working Groups, Recommendation to Government
• 09/2005: D-Grid-1: early adopters, ‘Services for Science’
• 07/2007: D-Grid-2: new communities, ‘Service Grids’
• …/2008: D-Grid-3: Service Grids for research and industry
• D-Grid-1: 25 MEuro, > 100 organizations, > 200 researchers
• D-Grid-2: 30 MEuro, > 100 additional organizations, > 200 additional researchers and industry
• D-Grid-3: Call in 2007
• Important:
  • Sustainable production grid infrastructure after the end of the funding
  • Integration of new communities
  • Evaluating business models (operational models) for grid services
*) funded by the German Federal Ministry for Science and Education
D-Grid-1, 2005 - 2008
[Figure labels]
• Integration Project DGI
• User-friendly Access Layer, Portals
• Community projects: Astro-Grid, C3-Grid, HEP-Grid, IN-Grid, MediGrid, Textgrid, WISENT, ONTOVERSE, WIKINGER, Im Wissensnetz, . . .
• Generic Grid Middleware and Grid Services
D-Grid-1, -2, -3, 2005 - 2011
[Figure labels]
• Integration Project DGI-2
• User-friendly Access Layer, Portals
• Community projects: Astro-Grid, C3-Grid, HEP-Grid, IN-Grid, MediGrid, Textgrid, WISENT, ONTOVERSE, WIKINGER, Im Wissensnetz, . . .
• Business Services, SLAs, SOA Integration, Virtualization
• Knowledge Management
• Generic Grid Middleware and Grid Services
D-Grid Middleware
[Figure: layered middleware stack]
• User: Application Development and User Access (GAT API, GridSphere, Plug-In, UNICORE)
• High-level Grid Services: Scheduling, Workflow Management, Monitoring, LCG/gLite, Data management
• Basic Grid Services: Accounting, Billing, User/VO Management, Globus 4.0.1, Security
• Resources in D-Grid: Distributed Compute Resources, Network Infrastructure, Distributed Data Archive, Data/Software
The DGI Infrastructure (10/2007): 2,200 CPU cores, 800 TB disk, 1,400 TB tape. D-Grid Integration Project (DGI)
Challenges: Sustainable Competitive Advantage
• Technical
• Cultural
• Legal & regulatory
Potential Grid Inhibitors
• Sensitive data, sensitive applications (medical patient records)
• Different organizations have different ROI
• Accounting: who pays for what (sharing!)
• Security policies: consistent and enforced across the grid!
• Lack of standards prevents interoperability of components
• Current IT culture is not predisposed to sharing resources
• Not all applications are grid-ready or grid-enabled
• Not all open source is equal (read the fine print)
• SLAs based on open source (liability?)
• “Static” licensing models don’t embrace grid
• Protection of intellectual property
• Legal issues (privacy, national laws, multi-country grids)
Lessons Learned and Recommendations
• During development and operation, the grid infrastructure should be modified and improved in large cycles only: all applications depend on this infrastructure!
• Continuity is important, especially for the infrastructure part of grid projects. Funding should therefore remain available after the project ends, to guarantee services, support, and continuous improvement and adjustment to new developments.
• Interoperability: use software components and standards from open-source and standards initiatives, especially in the infrastructure and application middleware layers.
• Close collaboration between the developers of the grid infrastructure and of the applications is mandatory, to make the best use of grid services and to avoid application silos.
• The infrastructure should be user-friendly so that new communities can adopt it easily. The infrastructure group should offer installation and operation services and support.
• Centers of Excellence should specialize in specific services, e.g. integration of new communities, grid operation, utility services, training, support, etc.
Lessons Learned and Recommendations (continued)
• For complex projects (infrastructure and application projects), a management board consisting of the leaders of the different projects should steer coordination and collaboration among the projects.
• New projects should use the general infrastructure and focus on an application or a specific service, to avoid complexity, re-inventing wheels, and building grid application silos.
• Participation of industry has to be industry-driven. Push from outside, even with government funding, is not promising. Success will come only from real needs, e.g. through existing collaborations between research and industry as a first step.
• Implement utility computing in small steps, enhancing existing service models moderately and testing utility models first as pilots. Today’s government funding models are often counter-productive for utility services.
• More Info: www.renci.org (Publications, Reports)
… resulting in the D-Grid-3 Call in 2007
• User-friendly access: intuitive, interactive, informative, participative, collaborative, collective => Portals and Web 2.0
• Community Service Grids: new application communities and service providers in research and industry, using the D-Grid platform as the basis; industrial consortium leader
• Business Layer: Service Level Agreements; sustainable support of the requirements of users in research and industry
• Grid-based Knowledge Layer: integration of digital content with suitable technologies and tools
• Transformation of D-Grid into a sustainable service infrastructure for research and industry (DGI => DGI-2, gap projects in agreement with DGI)
Challenge: D-Grid and Industry: Grids vs SOA
[Figure: direction of technology adaptation across the scales department, enterprise, global. Labels: research activity (Grid); industry interest (SOA)]
Challenge: D-Grid and User-Friendly Access
Web 2.0: SciVee, a YouTube for Scientists
SciVee is about the free and widespread dissemination and comprehension of science. Created for scientists, by scientists, SciVee moves science beyond the printed word and lecture theater, taking advantage of the internet as a communication medium where scientists have a place and a voice.
SciVee is a collaboration between the
- National Science Foundation
- Public Library of Science
- San Diego Supercomputer Center
Challenge: D-Grid and Knowledge Management
[Figure labels, translated from German]
• Cross-disciplinary tools and infrastructure
• Service catalog, service registry
• Ontology registry and services
• Metadata registry and services
• Persistent identifier resolver
• Grid/VO search
• Information extraction
• Visualization
• Repository systems
• Long-term archiving services
• Data: redundancy avoidance, replica management
• "Information brokering"
• Data lifecycle management
• Service infrastructure
• Annotation and referencing of objects and parts of objects
Courtesy Dr. Lossau
Challenge: D-Grid and new Application Communities
• Sciences
• Business
• Healthcare
• Education (K-20)
• Social science, social systems
• Arts and humanities
• Web 2.0, from peer reviews to interactive masses
• Grid service providers, application service providers
• Etc.
Challenge: Towards a Sustainable Infrastructure for Science and Industry
• D-Grid is the core of the German e-Science Initiative
• 3rd Call: focus on service provisioning for sciences & industry
• Close collaboration with: Globus Project, EGEE, DEISA, CrossGrid, CoreGrid, GridCoord, GRIP, UniGrids, NextGrid, …, EGI
• Application- and user-driven, not infrastructure-driven => NEED
• Focus on implementation and production, not grid research, in a multi-technology environment (Globus, UNICORE, gLite, etc.)
• Government is (thinking of) changing policies for resource acquisition (HBFG!) to enable a service model
• 19th Century: Steam Engine
• 20th Century: Combustion Engine
• 21st Century: Grid Engine
Thank You!
Slides are available: wgentzsch@d-grid.de
Report is available at www.renci.org => Reports