Explore the impact of grid computing on science and the role of CERN as a driving force. Understand different definitions of grid computing and the challenges of distinguishing significant developments from the hype.
Current trends in Grid computing Dobre Ciprian Mihai cipsm {at} cs.pub.ro “The Internet is about getting computers to talk together; grid computing is about getting computers to work together.” (from IBM’s Grid definition)
Outline of the presentation • What is Grid computing – sorting out the alphabet soup. • Impact of Grid computing on science. • CERN as a driving force in Grid computing. • Grids – where to?
What is Grid ? • Many definitions of Grid computing • Term coined as analogy to electrical power grid • According to Ian Foster, the “father of grid computing”, the term grid has been hijacked to “embrace everything from advanced networking to artificial intelligence” • Marketers are applying grid labels to all sorts of products and services, adding to the confusion and hype • “From the wide ranging definitions of Grid, to the volume of standards bodies and organizations -- it can be a real challenge to distinguish the significant developments from the hype.” (Ian Foster, 2005)
Ian Foster’s evolving definitions
• 1999: “A computational grid is a hardware and software infrastructure that provides dependable, pervasive, and inexpensive access to high-end computing capabilities” (Ian Foster and Carl Kesselman, editors, “The GRID: Blueprint for a New Computing Infrastructure”, Morgan Kaufmann Publishers, 1999)
• 2001: “The grid infrastructure consists of protocols, application programming interfaces, and software development kits to provide authentication, authorization, and resource location/access” (Foster, Kesselman, Tuecke: “The Anatomy of the Grid: Enabling Scalable Virtual Organizations”, http://www.globus.org/research/papers.html)
• 2002: “The grid integrates services across distributed, heterogeneous, dynamic ‘virtual organizations’ formed from the disparate resources within a single enterprise and/or from external resource sharing and service provider relationships in both e-business and e-science” (Foster, Kesselman, Nick, Tuecke: “The Physiology of the Grid”, http://www.globus.org/research/papers/ogsa.pdf)
Other definitions
• GGF: “A system that is concerned with the integration, virtualization, and management of services and resources in a distributed, heterogeneous environment that supports collections of users and resources (virtual organizations) across traditional administrative and organizational domains (real organizations).”
• SUN: “A way of managing and dynamically sharing disparate sets of resources”
• IDC: “Set of independent computers combined into unified system through systems software and networking technologies”
• CoreGRID: “A fully distributed, dynamically reconfigurable, scalable and autonomous infrastructure to provide location independent, pervasive, reliable, secure and efficient access to a coordinated set of services encapsulating and virtualizing resources (computing power, storage, instruments, data, etc.) in order to generate knowledge.”
• IBM: “ability, using a set of open standards and protocols, to gain access to applications and data, processing power, storage capacity, and a vast array of other computing resources over the Internet”
• Gartner: “a collection of resources owned by multiple organizations coordinated in such a way as to allow them to solve a single common problem.”
• A hardware and software infrastructure that connects distributed computers, storage devices, databases, and software applications through a network, and is managed by distributed resource management software
• A dependable, universal information infrastructure that builds on the power of the Internet and enables more efficient computation, collaboration, and communication
• Grid computing is a network of computation: tools and protocols for coordinated resource sharing and problem solving among pooled assets
• Application processing, distributed across multiple locations, and interconnected through a shared network such as the Internet
• “A Grid is a large, heterogeneous system that coordinates resources spread over wide areas”
• “A Grid is a heterogeneous system that allows multiple entities to share and use resources, under various administrative policies, offering a transparent access to the user”
• “A Grid is a large, heterogeneous system that allows sharing and coordinating resources in a dependable and pervasive manner”
• “A Grid is a heterogeneous system spread over a wide geographical area, which allows multiple entities to share and use resources, under various administrative policies, offering a transparent access to the user, through the use of consistent access protocols and interfaces”
Why so many definitions? • Computer science and software engineering sometimes do not have definitions as strict as those in physics or mathematics. This “lack of definitions” leads many Grid researchers and people working with Grid technology to hold different views on what a Grid is. • Hardware discrepancies: for some, a local cluster with a middleware system on top is a Grid, whereas others believe that a wide-area network connection has to be involved. • Software problems: what actually makes a piece of software “Grid software”? Is any kind of middleware using Grid security already Grid software? • Given the recent advances in Web and Grid service technologies, where should the line between Web services and Grid services be drawn?
So what is Grid after all? • In this Soup of grid definitions there are two that were widely accepted by the community: I. Foster, Research view: • “A Grid is a system that (1) coordinates resources that are not subject to centralized control (2) using standard, open, general-purpose protocols and interfaces (3) to deliver nontrivial qualities of service” A. Grimshaw, Industry view: • “From a hardware perspective, a Grid is a collection of distributed resources connected by a network. From a user perspective a Grid gathers together resources and makes them accessible in a secure manner to users and applications”
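Foster's three-point definition can be read as a simple checklist. The snippet below is a minimal Python sketch of that reading, not part of the original slides: the `System` class, its field names, and the two example systems are hypothetical, and deciding whether a real system meets each criterion is a judgment the code does not make.

```python
from dataclasses import dataclass


@dataclass
class System:
    name: str
    centralized_control: bool       # criterion (1) requires this to be False
    open_standard_protocols: bool   # criterion (2)
    nontrivial_qos: bool            # criterion (3)


def is_grid(system):
    """Apply the three criteria from the research view; all must hold."""
    return (not system.centralized_control
            and system.open_standard_protocols
            and system.nontrivial_qos)


# Hypothetical examples: a single-site cluster under one scheduler,
# and a federation of sites that each keep control of their resources.
cluster = System("local batch cluster", True, True, True)
federation = System("multi-site federation", False, True, True)
```

Under these assumed attribute values, `is_grid(cluster)` is false because control is centralized, while `is_grid(federation)` is true.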
Describing the elephant A Grid infrastructure must provide a set of technical capabilities: • Resource modeling – describes available resources, their capabilities, and the relationships between them to facilitate discovery, provisioning, and quality of service management. • Monitoring and notification – provides visibility into the state of resources to enable discovery and maintain quality of service. Logging of significant events and state transitions is also needed to support accounting and auditing functions. • Allocation – assures quality of service across an entire set of resources for the lifetime of their use by an application. This is enabled by negotiating the required level(s) of service and ensuring the availability of appropriate resources through some form of reservation, which is essentially the dynamic creation of a service-level agreement. • Provisioning, life-cycle management, and decommissioning – enables an allocated resource to be configured automatically for application use, manages the resource for the duration of the task at hand, and restores the resource to its original state for future use. • Accounting and auditing – tracks the usage of shared resources and provides mechanisms for transferring cost among user communities and for charging for resource use by applications and users. • In addition, security is an important aspect. • Foster, Tuecke, “Describing the elephant: the different faces of IT as services”, ACM Queue, 2005.
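The capabilities listed above can be pictured as one life cycle for a shared resource. What follows is a minimal Python sketch under stated assumptions, not code from the cited article or from any Grid toolkit: `Resource`, `ToyGrid`, the state names, and the CPU-hour accounting unit are all hypothetical, and security is left out entirely.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """Resource modeling: a name plus a capability used for discovery."""
    name: str
    cpus: int
    state: str = "free"   # assumed states: free -> reserved -> configured -> free


@dataclass
class ToyGrid:
    resources: list
    events: list = field(default_factory=list)   # monitoring log
    usage: dict = field(default_factory=dict)    # accounting: user -> CPU-hours

    def _log(self, resource, new_state):
        # Monitoring and notification: record every state transition.
        resource.state = new_state
        self.events.append((resource.name, new_state))

    def allocate(self, min_cpus):
        # Allocation: reserve the first free resource that meets the request.
        for r in self.resources:
            if r.state == "free" and r.cpus >= min_cpus:
                self._log(r, "reserved")
                return r
        return None

    def run(self, user, resource, hours):
        # Provisioning, use, and decommissioning of an allocated resource.
        self._log(resource, "configured")
        # Accounting: charge the user for CPU-hours consumed.
        self.usage[user] = self.usage.get(user, 0) + resource.cpus * hours
        self._log(resource, "free")


grid = ToyGrid([Resource("node-a", 2), Resource("node-b", 8)])
node = grid.allocate(min_cpus=4)
grid.run("alice", node, hours=3)
```

In this sketch the request for four CPUs skips `node-a`, reserves `node-b`, and the event log ends up holding the three transitions that an auditing function would later read.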
The two key Grid computing groups • The Globus Alliance (www.globus.org) • Composed of people from: Argonne National Labs, University of Chicago, University of Southern California Information Sciences Institute, University of Edinburgh and others. • OGSA/I standards initially proposed by the Globus Group • Based off papers “Anatomy of the Grid” & “Physiology of the Grid” • The Global Grid Forum (www.ggf.org) • First meeting in June of 1999, Based off the IETF charter • Heavy involvement of Academic Groups and Industry (e.g. IBM Grid Computing, HP, United Devices, Oracle, UK e-Science Programme, US DOE, US NSF, Indiana University, and many others) • Meets three times annually • Solicits involvement from industry, research groups, and academics
More on Grids • The Grid relies on advanced software, called middleware, which ensures seamless communication between different computers in different parts of the world. • The Grid search engine finds not only the data the scientist needs, but also the data processing techniques and the computing power to carry them out. • It then distributes the computing task to wherever in the world there is spare capacity, and sends the result to the scientist. • Why use the Grids? • Industrial and academic partners form an “extended enterprise” in which resources are intrinsically distributed, and only partially shared. • Partners may be prepared to share data, but not the hardware and proprietary software that produces the data.
Grid-like Vision • In 1969, Leonard Kleinrock, one of the chief scientists of the original ARPA project which seeded the Internet, wrote: • “As of now, computer networks are still in their infancy, but as they grow up and become sophisticated, we will probably see the spread of ‘computer utilities’, which, like present electric and telephone utilities, will service individual homes and offices across the country” • Despite major advances in hardware and software systems over the past 35 years, we are yet to realize this vision. How far are we still from delivering computing as a utility? • Let us look into the ICT evolution and project the future.
[Figure: Computing and Communication Technologies Evolution: 1960-2010. Time axis: 1960, 1970, 1975, 1980, 1985, 1990, 1995, 2000, 2010. Control axis: Centralised to Decentralised. Computing: Mainframes, Minicomputers, Crays, MPPs, Workstations, PCs, XEROX PARC worm, WS Clusters, PC Clusters, PDAs, HTC, P2P, Grids, e-Science, e-Business, Computing as Utility. Communication: Sputnik, ARPANET, Email, Ethernet, TCP/IP, IETF, Internet Era, HTML, Mosaic, W3C, WWW Era, XML, Web Services, SocialNet.]
[Figure: Computing is Scaling: Towards Inter-Planetary Level. Axes: Services + Performance against Administrative Barriers (Individual, Group, Department, Campus, State, National, Globe, Inter Planet, Universe). Stages: Personal Device, SMPs or SuperComputers, Local Cluster, Enterprise Cluster/Grid, Global Grid, Inter Planet Grid. Year label: 2100.]
A little bit more… • Benefits for Science: • More effective and seamless collaboration of dispersed communities, both scientific and commercial • Ability to run large-scale applications spanning thousands of computers, for a wide range of applications • Transparent access to distributed resources from your desktop, or even your mobile phone • The term “e-Science” has been coined to express these benefits: the application domain “Science” of Grid & Web • Impact: e-Science • From the EPSRC e-Science web site: "In the future, e-Science will refer to the large-scale science that will increasingly be carried out through distributed global collaborations enabled by the Internet. Typically, a feature of such collaborative scientific enterprises is that they will require access to very large data collections, very large scale computing resources and high performance visualisation back to the individual user scientists."
Healthy, Wealthy, and Wise? • e-Health: electronic patient records, distributed and/or remote diagnosis, collaborative surgical planning. • e-Business: streamline, distribute, and enhance business processes. • e-Commerce: use the Grid as a marketplace for both traditional and innovative goods and services. • e-Learning: remove barriers to education and training. • Grid applications for Science: • Medical/Healthcare (imaging, diagnosis and treatment). • Bioinformatics (study of the human genome and proteome to understand genetic diseases). • Nanotechnology (design of new materials from the molecular scale). • Engineering (design optimization, simulation, failure analysis and remote instrument access and control). • Natural Resources and the Environment (weather forecasting, earth observation, modeling and prediction of complex systems).
Grid and Web Services Convergence • [Figure: Grid (GT1, GT2, OGSI) and Web (HTTP, XML, WSDL, SOAP, WS-*, BPEL, WS-I Compliant Technology Stack) started far apart in applications & technology and have been converging on WSRF.] • The definition of the Web Services Resource Framework (WSRF) makes an explicit distinction between the “service” and the stateful entities the service acts upon, i.e. the resources. • This means that the Grid and Web communities can move forward on a common base!
Grid and Web Services • The Global Grid Forum (GGF) standard was (2004) divided into: • Open Grid Services Architecture (OGSA) • Defines standard mechanisms for creating, naming, and discovering Grid service instances. • Addresses architectural issues relating to interoperable Grid services. • An open, service-oriented architecture (SOA): resources as first-class entities, dynamic service/resource creation and destruction • Built on a Web service infrastructure • Resource virtualization at the core • Build grids from a small number of standards-based components (replaceable, coarse-grained) • Customizable: support for dynamic, domain-specific content within the same standardized framework • Described in “The Physiology of the Grid” http://www.globus.org/research/papers/ogsa.pdf • Open Grid Services Infrastructure (OGSI) • It was based upon the Grid Service specification. It specifies the way clients interact with a grid service (service invocation management, data interface, security interface, ...). • In the new draft (2005-06) some mandatory specifications of OGSI are merged with OGSA and the new WSRF is introduced (GT4) • WSRF: Web Services Resource Framework: defines a generic and open framework for modeling and accessing stateful resources using web services
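The idea of accessing stateful resources through web services, with dynamic resource creation and destruction, can be illustrated without any real Web service stack. The snippet below is a plain-Python sketch under stated assumptions: it does not use or imitate the actual WSRF message formats or any Globus API, and `CounterService`, its method names, and the `counter-N` keys are hypothetical.

```python
import itertools


class CounterService:
    """Service front end: every call names the resource it acts on.

    The dictionary stands in for whatever store holds resource state;
    the service keeps no per-client session of its own.
    """

    def __init__(self):
        self._resources = {}              # resource key -> state
        self._ids = itertools.count(1)

    def create(self):
        # Dynamic resource creation; the client keeps the returned key.
        key = f"counter-{next(self._ids)}"
        self._resources[key] = {"value": 0}
        return key

    def add(self, key, amount):
        # State lives in the resource identified by the key.
        self._resources[key]["value"] += amount
        return self._resources[key]["value"]

    def destroy(self, key):
        # Explicit destruction ends the resource's lifetime.
        del self._resources[key]


service = CounterService()
a = service.create()
b = service.create()
service.add(a, 5)
service.add(a, 2)
service.add(b, 10)
```

The two counters evolve independently even though the same service handles every request, which is the separation between service and resource that the slide describes.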
[Figure: The core elements of the Open Grid Services Architecture. One layer carries the note: “This layer eliminated in recent version of standard”.]
[Figure: Virtualizing Resources. Resources (Storage, Sensors, Applications, Information, Computers) expose type-specific, resource-specific interfaces; Web services provide access to them through common interfaces.]
[Figure: A Service-Oriented Grid. Grid middleware services (Job-Submit Service, Registry Service, Brokering Service, with Advertise and Notify interactions) sit above virtualized resources (CPU Resource, Compute Service, Data Service, Application Service, Printer Service).]
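The interactions named in the service-oriented picture (services advertise to a registry, a broker places submitted jobs and notifies the submitter) can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not the behaviour of any particular middleware: the `Registry` and `Broker` classes, the "most free CPUs" policy, and the service names are all hypothetical.

```python
class Registry:
    """Services advertise themselves here; brokers query it."""

    def __init__(self):
        self._entries = []

    def advertise(self, name, kind, free_cpus):
        self._entries.append({"name": name, "kind": kind, "free_cpus": free_cpus})

    def find(self, kind):
        return [e for e in self._entries if e["kind"] == kind]


class Broker:
    """Picks a compute service for a job and notifies the submitter."""

    def __init__(self, registry):
        self.registry = registry

    def submit(self, job_cpus, notify):
        candidates = [e for e in self.registry.find("compute")
                      if e["free_cpus"] >= job_cpus]
        if not candidates:
            notify("rejected: no compute service has enough free CPUs")
            return None
        # Assumed policy: choose the service with the most free CPUs.
        chosen = max(candidates, key=lambda e: e["free_cpus"])
        chosen["free_cpus"] -= job_cpus
        notify(f"scheduled on {chosen['name']}")
        return chosen["name"]


registry = Registry()
registry.advertise("cluster-1", "compute", free_cpus=16)
registry.advertise("cluster-2", "compute", free_cpus=64)
registry.advertise("archive-1", "data", free_cpus=0)

messages = []
broker = Broker(registry)
placed = broker.submit(job_cpus=32, notify=messages.append)
```

A real broker would weigh policy, cost, and data locality; the point here is only that the submitter never names a machine, it names a requirement.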
CERN? • CERN is: • ~2500 staff scientists (physicists, engineers, …) • Some 6500 visiting scientists (half of the world's particle physicists) • They come from 500 universities representing 80 nationalities. • CERN is the world's largest particle physics centre • Particle physics is about: • elementary particles of which all matter in the Universe is made • fundamental forces which hold matter together • Particle physics requires: • special tools to create and study new particles • With its 27 km circumference, the LHC accelerator will be the largest superconducting installation in the world.
Computing @ CERN • The latest trend is to federate national Grids to achieve a global Grid infrastructure; High Energy Physics is a driving force behind this. • High-throughput computing based on reliable “commodity” technology • LHC data analysis requires computing power equivalent to ~100,000 of today’s fastest PC processors! • More than 2500 dual processor PCs • About 3 million Gigabytes of data on disk and tapes • PROBLEM: nowhere near enough! • SOLUTION: use the Grid to unite computing resources of particle physics institutes around the world. • CERN leads two major global Grid projects: • WLCG: Worldwide LHC Computing Grid Collaboration • EGEE: Enabling Grids for E-sciencE project for all sciences • WLCG: All the institutions participating in the provision of the Worldwide LHC Computing Grid with a Tier-1 and/or Tier-2 Computing Centre form the WLCG Collaboration. • The LHC Computing Grid project launched a service with 12 sites in 2003. Today: 200 sites in 30 countries with 16,000 PCs.
Computing @ CERN • The LCG architecture consists of an agreed set of services and applications running on the Grid infrastructures provided by the LCG partners. • These infrastructures at present consist of those provided by the Enabling Grids for E-sciencE (EGEE) project in Europe, the Open Science Grid (OSG) project in the U.S.A. and the Nordic Data Grid Facility in the Nordic countries. • Grid3 was the start-up of OSG • The LCG Project builds and maintains computing infrastructure for LHC experiments • Original (’02) LCG plan: “The LCG is not a middleware project” • Was to be delivered... too little, too late • Feature set, performance, scalability disappointing • New (’04) plan: Middleware “re-engineering” as part of the LCG program, in collaboration with EGEE
EGEE-II: A brief description of the project • EGEE launched in 2004 and already supports 20 applications in six scientific domains (biomedicine, geophysics, quantum chemistry…) • EGEE brings together scientists and engineers of 90 institutions in over 30 countries worldwide to provide a seamless Grid infrastructure for e-Science, available 24 hours a day, 7 days a week • Funded by the EU (European Commission) • Two original scientific fields, HEP and Life Sciences, but it integrates many other fields, from Geology to Computational Chemistry • Infrastructure: 30,000 CPUs, 5 PB of storage, 200 sites in 39 countries, 60 Virtual Organizations • Maintains 10,000 concurrent jobs on average
Three Generations of Grid (Source: Charlie Catlett)
• First generation: Local “metacomputers” • Distributed file systems • Site-wide single sign-on • “Metacenters” explore inter-organizational integration • Totally custom-made, top-to-bottom: proofs of concept
• Second generation (We are here!): Utilize software services and communications protocols developed by grid projects: Condor, Globus, UNICORE, Legion, etc. • Need significant customization to deliver a complete solution • Interoperability is still very difficult!
• Third generation: Common interface specifications support interoperability of discrete, independently developed services • Competition and interoperability among applications, toolkits, and implementations of key services
Standardization is key for third-generation grids!
Grids – Where to ? • Commercial interest in Grid systems and related technologies is increasing. • Companies such as Sun Microsystems, IBM, Oracle, Intel, Microsoft, and HP show particular interest in getting a piece of the $12 billion market predicted for 2007 (according to IDC).
Grids – Where to ? • After the year 2007, the business popularity of Grid computing is expected to accelerate. • In particular, financial services and ERP services are expected to account for major parts of the spending (Source: Insight Research Corp.) [Chart axis: Billions]
Grids – Where to ? • An interesting prediction (the 451 Group analysts) is that grid technology will be slowly absorbed into enterprise fabrics… • One consequence for grid computing might be that the term grid computing "will become both more relevant and less used […] It will be more relevant as grids are used to support far more than HPC tasks, but less used as vendors seek to be associated with far more activity, and far higher up the stack, than grid computing." • IBM and Oracle could drop "grid" from their products in favour of a broader term, while Microsoft has made it very clear that it will not use the term “grid”. • In the new era of Grid computing, grids must support automated data, storage and service activities just as capably as they handle computational tasks. • These challenges are being addressed by a new paradigm called “Grid 2.0”
Grids – Where to ? • Grid 1.0 – concerned with the virtualization, aggregation and sharing of compute resources • Grid 2.0 – focused on the virtualization, aggregation and sharing of all compute, storage, network and data resources • The key term is “virtualization” (encapsulation of diverse implementations behind a common interface). It is being driven by the need of various enterprises to create a virtual resource market that allocates resources based on business demand. • Virtualization introduces a layer of abstraction: instead of having to snoop out what resources are available and try to adapt a problem to use them, a user can describe a resource environment (virtual workspace) and expect it to be deployed on the grid. The mapping between the physical resources and the virtual workspace will be handled using virtual machines, virtual appliances, distributed storage facilities and network overlays (“virtual grids”). • The promise is that in Grid 2.0 the resources will be easier to define, test, install, transport and adjust on demand.
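The virtual-workspace idea, in which the user describes the environment and the grid decides where it runs, can be sketched as a small placement routine. This is a minimal Python illustration under stated assumptions, not a description of any Grid 2.0 product: the `deploy` function, the greedy largest-first policy, and the site and VM names are hypothetical.

```python
def deploy(workspace, hosts):
    """Greedily place each requested virtual machine on a host with room.

    workspace: list of (vm_name, cpus_needed) describing the environment
    hosts: dict of host_name -> free CPUs (left unmodified)
    Returns a dict of vm_name -> host_name; raises if a VM does not fit.
    """
    free = dict(hosts)
    placement = {}
    # Place the largest VMs first so they are not squeezed out.
    for vm, cpus in sorted(workspace, key=lambda item: -item[1]):
        host = next((h for h, c in free.items() if c >= cpus), None)
        if host is None:
            raise RuntimeError(f"no host can run {vm}")
        free[host] -= cpus
        placement[vm] = host
    return placement


# The user describes what is needed, not where it should run.
workspace = [("web", 2), ("db", 4), ("worker", 4)]
hosts = {"site-a": 4, "site-b": 8}
placement = deploy(workspace, hosts)
```

With these inputs `db` lands on `site-a` and the other two machines on `site-b`; changing the host inventory changes the mapping without any change to the workspace description.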
Web 2.0 By Example (Tim O’Reilly)
• Web 1.0: Napster, Britannica On Line, Akamai, MP3.com, Double Click, Content Management
• Web 2.0: Google, Wikipedia, BitTorrent, iTUNES or Napster, Adsense, Wikis
Google Earth™ a Mega API for Web 2.0 • Illustrates the Benefits of SOA and GRID with a Web 2.0 Delivery Model • Distributed, re-usable core services on shared infrastructure • Shared data • Exposed interfaces • Application is streamed to client and works offline
Wikipedia is a Collaborative Dictionary Being Edited in Realtime by Anyone
Grid 2.0 Emerging
• Grid 1.0: Compute Intensive, Cycle Aggregation
• Virtualization: Consolidation of Resources
• SOA: Software Services with SLA & QoS Metrics
• Grid 2.0*: Virtualized Compute, Storage, Network, Data • Service Oriented • Policy Driven Automation • Distributed across firewalls • Parallel, stateless, stateful and transactional apps
*The 451 Group: 'grid 2.0' is focused on the virtualization, aggregation and sharing of all compute, storage, network and data resources. It is both service-oriented and automated.
Virtualization • Virtualization covers both data (flat files, databases, etc.) and computing resources. • Grid as workflow virtualization: the Grid computing services are used to execute and manage processes across multiple compute platforms. • Data Grid as data virtualization: the management of shared collections independently of the remote storage systems where the data is stored. • Semantic Grid as information virtualization: the ability to reason on inferred attributes from multiple independent information repositories. • Name space virtualization: logical names for resources, users, files, and metadata that are independent of the name spaces used on the remote resource. • Trust virtualization: the ability to manage authentication and authorization independently of the remote resource. • Constraint virtualization: the ability to manage access controls independently of the remote resource. • Access virtualization: the ability to port an arbitrary access mechanism on top of the Grid middleware. For Data Grids, this is the ability to support access through multiple loadable libraries, Java, digital libraries, workflow actors, Web browsers, etc. • Network virtualization: the ability to manage transport in the presence of network devices such as firewalls, load levelers, and virtual private networks. This typically requires multiple protocols to support client-initiated versus server-initiated I/O, and bulk operations versus single-file operations. • Latency management: the ability to minimize the number of messages sent over wide area networks. Examples include execution of procedures at the remote resource when the complexity (ratio of operations to bytes transmitted) is sufficiently small. The standard case is data filtering or sub-setting. • Federation: the ability to interoperate across multiple grid environments. This requires the ability to share logical name spaces, and Shibboleth-style authentication. Grids establish trust mechanisms to allow assertions about the authenticity of an individual to be verified from the “home” Grid.
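The latency-management rule in the list above (run a procedure at the remote resource when the ratio of operations to bytes transmitted is sufficiently small) amounts to a one-line comparison. The Python sketch below illustrates it under stated assumptions: the function name and the threshold value of 1.0 operation per byte are hypothetical, since the slides do not say what "sufficiently small" means in practice.

```python
def run_at_remote_resource(operations, bytes_transmitted, threshold=1.0):
    """Return True when the procedure should execute where the data lives.

    complexity = operations / bytes_transmitted. A low value means the
    work is cheap relative to the data that would otherwise have to
    cross the wide area network, so it pays to move the code instead.
    The default threshold is an assumed placeholder, not a measured value.
    """
    complexity = operations / bytes_transmitted
    return complexity < threshold


# Filtering or sub-setting: about one operation per ten bytes scanned.
filter_remote = run_at_remote_resource(operations=1e8, bytes_transmitted=1e9)

# A heavy simulation on a small input: many operations per byte.
simulate_remote = run_at_remote_resource(operations=1e12, bytes_transmitted=1e6)
```

Under the assumed threshold the filter runs next to the data, while the simulation's small input is shipped to wherever compute capacity is available.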
So, are we there yet ? • Will the Grid be available to all of you? Hard to predict… • Jules Piccard, a professor at the University of Basel, installed the first telephone in the city, around 1880, between his home and his institute. He showed it proudly to other scientists and got the comment: “Looks very good, but I doubt it will ever have any practical use”. • "There is absolutely no need for a computer in the home" (attributed to Ken Olsen, DEC, once a leading minicomputer manufacturer) • "The world will only need five computers" (attributed to Thomas J. Watson, IBM) • "640 kilobytes is all the memory you will ever need" (attributed to Bill Gates, Microsoft)
So, are we there yet ? • The complete success of the Grid hype depends on at least three conditions: • The Grid can be considered a success when there are no more “Grid papers”, but only a footnote in the work that states, “This work was achieved using the Grid”. • The Grid can be considered a success when supercomputer centres don't give a user the choice of using their machines or using the Grid, they just use the Grid. • The Grid can be considered a success when a SuperComputing demo can be run any time of the year. We are not yet there…
What’s holding us back? • Organizational politics act as a barrier to implementing Grid computing: • “server-hugging” – organizations have a sense of ownership over the resources bought or allocated for their use. • unrealistic expectations of Grid computing – marketing departments have run amok and have marketed the grid “nirvana” and not the grid that exists and is possible today. • perceived loss of control or access over resources. • loss or reduction of budget dollars. • lack of data security among departments. • fear of external data leaks. • reduced priority of projects – sometimes users believe that they need dedicated IT resources to complete their work accurately and efficiently. • risks associated with enterprise-wide deployment – how do different geographies and cultures come together to agree on global priorities, configurations, standards, and policies?
In the end… • One of the biggest fears for Grid computing is that it might be seen as today’s sexy technology that will quickly get replaced by tomorrow’s sexy technology. • Grid researchers and technologists have to start pointing to results and applications that use the Grid to solve problems, or that enable new applications that would have been unachievable without the Grid. • Contemporary Grid implementations are still far from the initially described vision and from being widely adopted.
Grid computing in pictures • Thanks to GridCafe (http://gridcafe.web.cern.ch/gridcafe; I strongly recommend that you also visit this link), it is now MOVIE time.
Thank you! Questions? Observations?
Grid characteristics • Collaboration – Grid is sharing of resources in a distributed fashion. A Grid spans multiple administrative domains seamlessly. • Aggregation – A Grid is more than the sum of all parts. A Grid aggregates many resources and therefore combines the capacity of the individual resources into a higher capacity virtual resource. The capability of individual resources is preserved. As a consequence, from a global standpoint the Grid enables running larger applications faster (aggregation capacity), while from a local standpoint the Grid enables running new applications. • Virtualization – Grid services are often provided with a certain interface that hides the complexity of the underlying resources. Virtualization provides an abstract “layer” between clients and resources. Therefore, a Grid provides the ability to virtualize the sum of parts into a singular wide-area programming model.
Grid characteristics • Service orientation – Grids provide services, following the concept of a service-oriented architecture. In the widest sense all large scale collections of services can be viewed as Grids. • Heterogeneity – A Grid typically consists of heterogeneous computing resources, i.e. there is a variety of different hardware and software components with different performance and latency characteristics. • Decentralized control – components are under the control of multiple entities, i.e. the key difficulties in Grids lie exactly in not having a single ‘owner’ of the whole system. One of the requirements of a Grid is the use of distributed control mechanisms. • Standardization and interoperability – A Grid promotes standard interface definitions for services that need to interoperate to create a general distributed infrastructure that fulfills users’ tasks and provides user level utilities. The Grid exposes the need for increased levels of integration of distinct technologies and for increased agreement on the standardization of services. The success of the implementation of the Grid very much depends on these aspects. Furthermore, the Grid should provide uniform access to heterogeneous resources through virtualization.
Grid characteristics • Access transparency – The Grid should allow its users to access the computing infrastructure without having to be intimately aware of the underlying architecture or network topology. The level of transparency provided to the end user through the virtualization of resources is sometimes considered the most distinctive aspect of Grid computing. • Scalability – Even if Grid implementations and infrastructures sometimes do not solve a new problem, it is often the scale of data, resources and users that contributes to the additional complexity of a Grid. • Reconfigurability – A Grid should be “dynamically reconfigurable” (CoreGRID definition). • Security – Grid security is one of the first things that real Grid users have to deal with and therefore is essential for any Grid software system that spans multiple administrative domains. • Application support – Applications should also be part of the Grid, and the whole Grid environment (by environment I mean the hardware, middleware, and applications) should be data-driven. In particular, it should be able to react to changes in system and application behavior captured by application and system data.