Grid Computing • The Grid Dream • Grid Electricity • Social Perspective • Uptake • Example • Hadron Collider • Virtual Organisations • VO concepts • A Grid • Criteria • Globus • A brief overview of the Globus toolkit • Basic Execution Service • Web Service interface to Grid processes
The Grid • The Grid takes its name from an analogy with the electrical power grid: • electricity on demand via a wall socket • source unknown but reliable • transparency and resilience are key to its success • The Grid dream is to allow users to tap into resources over the Internet as easily as electrical power can be drawn from a wall socket
Can this Happen? • To make this happen, we need: • Pervasive deployment of infrastructure • Security • Accounting, i.e. pay for what you use • Transparent access • The user is not aware of (and doesn’t care) which computing resources are used to solve their problem, just as with electrical power.
Social Uptake • The history of the evolution of infrastructures, e.g. the electricity grid, shows that returns on initial investments are an important factor in providing access to the capital required for further roll-outs (and hence a reduction in the 'Digital Divide'). • This is why Edison chose Wall Street in New York • Needed a dense population • Needed to find capital for switching costs • The same is needed for Grid computing • strong industrial involvement (and profit) • pervasive uptake, which needs a standards-based infrastructure
Early Uptake of the Grid • Infrastructure • The US National Science Foundation committed: • 2001: $53 million on the TeraGrid: 13.6 teraflops of computing power, over 450 terabytes of data storage, and high-resolution visualisation systems, interconnected by a 40 Gbps network. • 2002: $35 million supplement • 2003: further $10 million supplement • Standards Based • All Grid software is based on open software and tries to be standards-based through the OGF (Open Grid Forum, www.ogf.org) and other organizations
Example: UK e-Science Program • Spending Reviews • 2000 : £120m for 3 years • 2002 : Further £115m for years 4 & 5 • 2003 : Further £16.2 million • 2004 : Further £18 Million • Development of key IT infrastructure to support e-Science • Managed by Research Councils & DTI • Application specific Pilot Projects • Core programme to identify and develop generic Grid middleware
UK e-Science Network • Note: Cardiff is not officially an e-Science Centre now
Example – Large Hadron Collider • 15 Petabytes (15 million Gigabytes) of data generated per year • Raw data: about 1 MB per event at ~40 million events per second • Needs to be stored, distributed and analyzed • Worldwide LHC Computing Grid • More than 8,000 physicists • Four main experiments using the collider
Example – Large Hadron Collider • In grid computing a process is called a job. • Users submit jobs to the system via one of the participating institutions • Currently about one million jobs per day • peaks of 10 gigabytes per second data transfer
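To put the LHC figures from these two slides side by side, the arithmetic can be sketched as below. This is only a back-of-envelope calculation using the slide values; it assumes decimal units, a 365-day year, and that the per-event size is meant as roughly one megabyte. The variable names are illustrative.

```python
# Back-of-envelope figures derived from the numbers on the LHC slides.
# Assumptions: decimal units, ~1 MB per event, a 365-day year.

EVENT_SIZE_BYTES = 1e6            # ~1 MB per event (slide value, read as megabyte)
EVENTS_PER_SECOND = 40e6          # ~40 million events per second
STORED_PER_YEAR_BYTES = 15e15     # 15 petabytes generated per year
JOBS_PER_DAY = 1e6                # about one million jobs per day

SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY

raw_rate = EVENT_SIZE_BYTES * EVENTS_PER_SECOND          # bytes/s at the detector
stored_rate = STORED_PER_YEAR_BYTES / SECONDS_PER_YEAR   # average bytes/s over a year
jobs_per_second = JOBS_PER_DAY / SECONDS_PER_DAY

print(f"raw rate:      {raw_rate / 1e12:.0f} TB/s")
print(f"yearly (avg):  {stored_rate / 1e6:.0f} MB/s")
print(f"ratio:         {stored_rate / raw_rate:.1e}")
print(f"job rate:      {jobs_per_second:.1f} jobs/s")
```

Under these assumptions the raw rate works out to about 40 TB/s, while 15 PB per year averages under 500 MB/s, so the two slide figures are only consistent if a very small fraction of the raw events ends up in the yearly total. A million jobs per day is roughly a dozen job submissions per second.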
Grid Applications • Grid jobs are applications with particular needs: • Processing huge data • Potentially huge computation • Two examples: • Monte Carlo experiments • For problems which it is infeasible to compute completely • Select input domain • Randomly select inputs • Run deterministic algorithm on the random inputs • Parameter Sweep applications • Run many parameters on the same data • In parallel
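The two application types above can be sketched together in a few lines. This is a minimal local illustration, not Grid code: the Monte Carlo job estimates pi (a textbook example chosen here, not one from the slides), and the "sweep" runs that job for several sample counts using a thread pool as a stand-in for submitting independent Grid jobs. The function names are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def inside_unit_circle(x: float, y: float) -> bool:
    """Deterministic step: is the point inside the quarter circle?"""
    return x * x + y * y <= 1.0


def estimate_pi(samples: int, seed: int) -> float:
    """Monte Carlo estimate of pi from `samples` random points."""
    rng = random.Random(seed)          # private generator, reproducible per job
    hits = 0
    for _ in range(samples):
        # Input domain: the unit square [0, 1) x [0, 1)
        x, y = rng.random(), rng.random()
        if inside_unit_circle(x, y):
            hits += 1
    return 4.0 * hits / samples


def sweep(sample_counts, seed=42):
    """Parameter sweep: one independent job per parameter value."""
    with ThreadPoolExecutor() as pool:
        futures = {n: pool.submit(estimate_pi, n, seed) for n in sample_counts}
        return {n: f.result() for n, f in futures.items()}


if __name__ == "__main__":
    for n, value in sweep([1_000, 10_000, 100_000]).items():
        print(f"{n:>7} samples -> pi ~ {value:.4f}")
```

Each job depends only on its own parameters, which is what makes this kind of workload easy to spread across many machines. On a real Grid each call to `estimate_pi` would be a separately submitted job rather than a local thread.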
What is required? • Resource sharing • Computers, data, instruments, networks • Security infrastructure • Not just anyone can play • Expensive kit • Executing code on remote machines belonging to other organizations • Multi-institutional “virtual organisations” • Overlaying traditional organisational structures • Large or small, static or dynamic • In terms of the Worldwide LHC Computing Grid (WLCG), the virtual organizations are the experiments
Brief History • First Generation: Early Metacomputing environments, such as FAFNER (http://www.npac.syr.edu/factoring.html) and the I-WAY (see next slide) • Second Generation: • Core Grid technologies like the Globus toolkit (www.globus.org – later) and Legion (http://legion.virginia.edu/download/) • Distributed object systems e.g. Jini (www.jini.org) and CORBA (www.corba.org) • Grid resource brokers and schedulers e.g. Condor (http://www.cs.wisc.edu/condor/) and SGE (http://wwws.sun.com/software/gridware/sge.html) • Integrated systems including Cactus (cactuscode.org), DataGrid, UNICORE (www.unicore.org) and gLite • Application user interfaces for remote steering and visualization e.g. Portals and Grid Computing Environments (later) • Third Generation: • Introduction of a service-oriented approach (e.g. Web services) • Increasing use of metadata (giving more detailed information describing services)
The I-WAY • Connected supercomputers and other resources at 17 sites across North America, based on ATM • Consisted of a number of I-POP (point of presence) machines • Connected by the Internet or ATM networks • I-Soft software could access the configured I-POP machines and provided an environment consisting of a number of services, including: • scheduling (jobs) • security (authentication and auditing) • parallel programming support (process creation and communication) • distributed file system (using AFS, the Andrew File System) • I-WAY became Globus
The I-WAY [Figure: three sites, each shown with local resources, an ATM switch and an I-POP machine running AFS, Kerberos and a scheduler, behind a possible firewall; the sites are connected by the Internet or ATM]
Grid Definition • Foster I, Kesselman C and Tuecke S (2001), The Anatomy of the Grid: Enabling Scalable Virtual Organizations • “The Grid is flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources” • This introduces the concept of Virtual Organisations
Virtual Organisations (VOs) • Virtual Organisations provide a highly controlled environment in which each resource provider specifies exactly what they want to share, who is allowed to share it and the conditions under which this sharing occurs. • The set of individuals and/or institutions defined by such sharing rules is collectively known as a virtual organisation (VO).
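One way to picture "what is shared, who may share it, and under which conditions" is as a small data structure. The sketch below is purely illustrative: the class names, the `max_cpu_hours` condition and the example members are all hypothetical, and real VO middleware expresses such policies very differently.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SharingRule:
    """One provider-defined rule: what is shared, with whom, on what condition."""
    resource: str                 # what is shared
    allowed_members: frozenset    # who may use it
    max_cpu_hours: int            # example condition on the sharing


@dataclass
class VirtualOrganisation:
    name: str
    rules: list = field(default_factory=list)

    def may_use(self, member: str, resource: str, cpu_hours: int) -> bool:
        """True if some rule grants this member this much of the resource."""
        return any(
            rule.resource == resource
            and member in rule.allowed_members
            and cpu_hours <= rule.max_cpu_hours
            for rule in self.rules
        )


vo = VirtualOrganisation("example-vo")
vo.rules.append(SharingRule("cluster-a", frozenset({"alice", "bob"}), 100))

print(vo.may_use("alice", "cluster-a", 10))    # allowed by the rule
print(vo.may_use("carol", "cluster-a", 10))    # not a listed member
print(vo.may_use("bob", "cluster-a", 500))     # exceeds the condition
```

The point of the sketch is that the provider keeps control: a request is granted only when an explicit rule covers the member, the resource and the condition.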
A VO Overview [Figure: users/clients reach resources over Internet routing through middleware, which forms the Virtual Organization (VO)] • In Grid computing you can execute your own code on remote resources • This must be secure • The VO provides a blanket security policy for sharing between organizations • A VO is implemented by middleware, e.g. Globus
To be or not to be a Grid: The Criteria • A Grid must: • coordinate resources that are not subject to centralized control • use standard, open, general-purpose protocols and interfaces • deliver non-trivial qualities of service (QoS)
Decentralized Control • The first point in the checklist concerns how the resources that make up the distributed system are controlled, i.e. whether they: • are centrally controlled by one administrator (not a Grid) • consist of a number of interacting administrative domains that pull resources together using common policies. • Therefore, computational Grids should connect resources across different administrative domains
Standard, Open, General-Purpose Protocols • Grid computing aims to help standardize the way we do distributed computing, rather than having a multitude of non-interoperable distributed systems. • A standards-based open architecture promotes extensibility, interoperability and portability because the standards have general agreement within the community. • To help with this standardization process, the Grid community has the Open Grid Forum (OGF)
QoS • There are three types of quality support that can be provided: • None: No QoS is supported at all. • Soft: You can specify QoS requirements and the Grid will try to meet them, but they cannot be guaranteed. This is the most common form of QoS implemented in Grid applications. • Hard: All nodes on the Grid support and guarantee the level of QoS requested. • A Grid should be able to deliver non-trivial QoS, whether this is measured by, for example: • performance • service and data availability • data transfer, etc. • QoS is application specific: it depends on the needs of the application. For example, in a Physics experiment the QoS may be specified in terms of computational throughput, but in other experiments the QoS may be specified in terms of reliability of file transfers or data content.
Globus – globus.org • The most widely deployed Grid middleware. • Consists of three elements: • Information Services: to provide information about Grid services • Data Management: involves accessing and managing data • Resource Management: to allocate resources provided by a Grid. • And, of course, security: • Security: to provide authentication, delegation and authorization
Grid Security Infrastructure • Allows for mutual authentication • we looked at X.509 certificates • GSI uses these at both the server and client side • To use Grid resources, you usually need a certificate signed by a well-known Grid CA. • Delegation • To access remote machines with certain privileges, delegation is used. • A user identity on one machine delegates its privileges to another identity • Proxy certificates are used for this: • new public/private key pair generated • short lifespan • signed by the user • trusted because the user’s certificate is signed by the CA (a certificate chain). • The above allows GSI to support single sign-on. • Once I am trusted within a realm, I don’t need to authenticate again • My trusted certificate is checked, then I run my code on another machine under a different, locally trusted user
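The chain-of-trust idea behind proxy certificates can be sketched with a toy model. Note the heavy simplification: no cryptography is involved here, and "signed by" is replaced with plain name matching between a certificate's issuer and its signer's subject. The `Cert` class, the names and the dates are hypothetical; real GSI validates X.509 signatures.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Cert:
    subject: str
    issuer: str          # subject of the certificate that signed this one
    not_after: datetime  # expiry time


def chain_is_trusted(chain, trusted_cas, now):
    """Check a chain ordered leaf-first: proxy, ..., user certificate.

    Toy model only: real GSI verifies cryptographic signatures, which
    are replaced here by simple issuer/subject name matching.
    """
    if not chain:
        return False
    for cert, signer in zip(chain, chain[1:]):
        if cert.issuer != signer.subject:
            return False               # broken link in the chain
    if any(cert.not_after <= now for cert in chain):
        return False                   # something in the chain has expired
    return chain[-1].issuer in trusted_cas


now = datetime(2024, 1, 1, 12, 0)
user = Cert("alice", "Example Grid CA", now + timedelta(days=365))
proxy = Cert("alice/proxy", "alice", now + timedelta(hours=12))  # short-lived

print(chain_is_trusted([proxy, user], {"Example Grid CA"}, now))
print(chain_is_trusted([proxy, user], {"Example Grid CA"}, now + timedelta(days=1)))
```

The first check succeeds because the proxy is issued by the user and the user by a trusted CA; the second fails because the short-lived proxy has expired a day later, which is the property that limits the damage if a proxy is stolen.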
Grid Information Services • Monitoring and Discovery Service (MDS) • umbrella for underlying protocols • GRIS (Grid Resource Information Servers) that collect data on each resource and can be located on a well-known port (i.e., 2135). • IP (Information Providers) which provide the interface between local data collection service and the GRIS servers. • GIIS (Grid Index Information Services) that collect information from one or more GRIS servers and act as a lookup service for resource information.
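The division of labour between information providers, GRIS and GIIS can be sketched as follows. This is an in-process toy, assuming nothing about the real MDS protocols: the class and method names are hypothetical, and information providers are modelled as plain callables returning attribute dictionaries.

```python
class GRIS:
    """Toy stand-in for a per-resource information server."""

    def __init__(self, resource_name, providers):
        self.resource_name = resource_name
        self.providers = providers      # callables playing the role of IPs

    def query(self):
        """Collect the current attributes from every information provider."""
        info = {"name": self.resource_name}
        for provider in self.providers:
            info.update(provider())
        return info


class GIIS:
    """Toy index that aggregates several GRIS servers and answers lookups."""

    def __init__(self):
        self.registered = []

    def register(self, gris):
        self.registered.append(gris)

    def find(self, predicate):
        """Return names of resources whose attributes satisfy `predicate`."""
        return [info["name"]
                for info in (g.query() for g in self.registered)
                if predicate(info)]


index = GIIS()
index.register(GRIS("node-a", [lambda: {"free_cpus": 8}, lambda: {"os": "linux"}]))
index.register(GRIS("node-b", [lambda: {"free_cpus": 0}, lambda: {"os": "linux"}]))

print(index.find(lambda info: info["free_cpus"] > 0))
```

A client asks the index rather than each resource in turn, which is the lookup-service role the slide gives to GIIS.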
Data Management • GridFTP • optimized data transfer • Data Replication • replica catalogue • replication manager • important for sharing resources, storing closer to computation, and slicing data. • GASS • Global Access to Secondary Storage • provides URL based interface to variety of data transfer protocols.
Resource Management • GRAM • Globus Resource Allocation Manager • main component – Gatekeeper • checks user credentials, creates job manager, gets files using GASS, monitors job using GRIS
The Globus Grid [Figure: users/clients reach VO resources over Internet routing via the Globus middleware layer. Labelled components: GSI (X.509, mutual authentication, single sign-on), MDS, GRAM and GridFTP.]
OGSA • Open Grid Services Architecture • There are a number of toolkits and interfaces to computational and data resources, e.g.: • Globus Toolkit • UNICORE • gLite • They do not interoperate, which means different institutions cannot easily collaborate by creating VOs • OGSA stresses open, standard protocols to facilitate interoperability • Web Services standards provide a technology set to achieve this • By ‘wrapping’ legacy interfaces with WS interfaces
OGSA Basic Execution Service (BES) • A service to which clients can send requests to initiate, monitor, and manage computational activities • A basic requirement of any Grid - run a job on some resource • BES aims to provide a WS interface for a variety of job submission systems. • Defines an extensible state model for activities • Uses another specification for defining activities - JSDL • What is JSDL?
Job Submission Description Language (JSDL) • This is an XML language for defining a Job (process) to be run somewhere • Independent of BES • e.g. supported by GridSAM - Grid Job Submission and Monitoring Web Service (London eScience Centre) • Describes everything you need to run an executable.
JSDL Elements • <JobDefinition> • <JobDescription> • <JobIdentification>? • Describes the job - name, description, project • <Application>? • Describes the executable - name, version, description, args • <Resources>? • Describes resources needed by the application - possible hosts, OS, CPU architecture, CPU speed, CPU number, memory etc, etc. • <DataStaging>* • Describes the files that need to be pre and post staged
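To make the element structure above concrete, here is a minimal sketch that builds a small JSDL document with Python's standard XML library. The namespace URIs and the POSIXApplication child elements are written from memory of the JSDL 1.0 specification and should be checked against it before use; the job name and executable are made up.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as recalled from the JSDL 1.0 specification; verify before use.
JSDL = "http://schemas.ggf.org/jsdl/2005/11/jsdl"
POSIX = "http://schemas.ggf.org/jsdl/2005/11/jsdl-posix"
ET.register_namespace("jsdl", JSDL)
ET.register_namespace("jsdl-posix", POSIX)


def q(ns, tag):
    """Build a namespace-qualified tag name in ElementTree's {uri}tag form."""
    return f"{{{ns}}}{tag}"


def build_job(name, executable, args):
    """Return a JobDefinition element describing one executable and its args."""
    job = ET.Element(q(JSDL, "JobDefinition"))
    desc = ET.SubElement(job, q(JSDL, "JobDescription"))

    ident = ET.SubElement(desc, q(JSDL, "JobIdentification"))
    ET.SubElement(ident, q(JSDL, "JobName")).text = name

    app = ET.SubElement(desc, q(JSDL, "Application"))
    posix = ET.SubElement(app, q(POSIX, "POSIXApplication"))
    ET.SubElement(posix, q(POSIX, "Executable")).text = executable
    for arg in args:
        ET.SubElement(posix, q(POSIX, "Argument")).text = arg
    return job


job = build_job("hello-grid", "/bin/echo", ["hello"])
print(ET.tostring(job, encoding="unicode"))
```

The optional `<Resources>` and `<DataStaging>` elements are left out of this sketch; they would be added as further children of `<JobDescription>` in the same way.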
BES basic state model • BES defines a state model for activities, also known as jobs. • The model is extensible within certain boundaries, i.e. an extension should not break the basic model for a client that does not understand the extension.
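A basic state model of this kind can be sketched as a small table of allowed transitions. The slides only name the RUNNING and TERMINATED states; the remaining state names and the transition table below are an assumption based on recollection of the BES specification, so treat them as illustrative rather than normative.

```python
from enum import Enum


class State(Enum):
    PENDING = "Pending"
    RUNNING = "Running"
    FINISHED = "Finished"
    TERMINATED = "Terminated"
    FAILED = "Failed"


# Allowed transitions in this sketch; the three end states have no exits.
TRANSITIONS = {
    State.PENDING: {State.RUNNING, State.TERMINATED, State.FAILED},
    State.RUNNING: {State.FINISHED, State.TERMINATED, State.FAILED},
    State.FINISHED: set(),
    State.TERMINATED: set(),
    State.FAILED: set(),
}


def advance(current: State, target: State) -> State:
    """Return `target` if the move is allowed, otherwise raise ValueError."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target


state = State.PENDING
state = advance(state, State.RUNNING)
state = advance(state, State.FINISHED)
print(state.value)
```

An extension that refines one of these states into sub-states would leave this table intact, which is how a client that only knows the basic states can still follow the activity.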
Operations of the BES • With a service called ‘Basic Execution Service’ you might expect an operation called ‘execute(activity)’, but no. • BES defines two WSDL portTypes with operations relating to activities • BES-Management PortType • StopAcceptingNewActivities • StartAcceptingNewActivities • BES-Factory PortType • Create an activity (hence ‘factory’) • Terminate an activity • Get the current state as defined by the state model • Get the attributes of the Factory service
BES-Factory Operations • CreateActivity • Input: • a JSDL document • Output: • a WS-Addressing EndpointReference (EPR) referring to the activity created from the JSDL document • This EPR is used in subsequent calls, but passed explicitly in the operation input, NOT in the way WS-RF uses EPRs
BES-Factory Operations • GetActivityStatuses • Input: • An array of WS-Addressing EndpointReferences (EPRs) • Output: • An array of ActivityStatus XML types • These contain a value encapsulating the current state of activity, e.g. TERMINATED or RUNNING • As well as the EPR associated with the activity (i.e. the EPR the client sent in the request)
BES-Factory Operations • TerminateActivities • Input: • An array of WS-Addressing EndpointReferences (EPRs) • Output: • An array of TerminateActivityResponse XML types • These contain the EPR associated with the activity • As well as a boolean indicating whether or not the activity has been terminated
BES-Factory Operations • GetActivityDocuments • Input: • An array of WS-Addressing EndpointReferences (EPRs) • Output: • An array of GetActivityDocumentResponse XML types • These contain the EPR associated with the activity • As well as the JSDL document initially sent by the client when creating the activity
BES-Factory Operations • GetFactoryAttributesDocument • Input: • None • Output: • A BESResourceAttributesDocument XML type • This contains the attributes of the Factory service, e.g. number of running activities, CPU type, OS etc.
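The five BES-Factory operations can be pictured with an in-memory mock. This is a sketch of the call shapes only: there is no SOAP, no WSDL and no real job execution, the EPR is reduced to a made-up URN string, and the Python method names simply mirror the operation names from the slides.

```python
import itertools


class ToyBESFactory:
    """In-memory stand-in for a BES-Factory; no SOAP, no real job execution."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._activities = {}          # EPR string -> {"jsdl": ..., "state": ...}

    def create_activity(self, jsdl_document: str) -> str:
        """CreateActivity: take a JSDL document, return an EPR for the activity."""
        epr = f"urn:example:activity:{next(self._ids)}"   # hypothetical EPR format
        self._activities[epr] = {"jsdl": jsdl_document, "state": "RUNNING"}
        return epr

    def get_activity_statuses(self, eprs):
        """GetActivityStatuses: one (EPR, state) pair per requested EPR."""
        return [(epr, self._activities[epr]["state"]) for epr in eprs]

    def terminate_activities(self, eprs):
        """TerminateActivities: one (EPR, terminated?) pair per requested EPR."""
        results = []
        for epr in eprs:
            activity = self._activities.get(epr)
            if activity is not None:
                activity["state"] = "TERMINATED"
            results.append((epr, activity is not None))
        return results

    def get_activity_documents(self, eprs):
        """GetActivityDocuments: the JSDL originally submitted for each EPR."""
        return [(epr, self._activities[epr]["jsdl"]) for epr in eprs]

    def get_factory_attributes_document(self):
        """GetFactoryAttributesDocument: attributes of the factory itself."""
        running = sum(a["state"] == "RUNNING" for a in self._activities.values())
        return {"running_activities": running}


factory = ToyBESFactory()
epr = factory.create_activity("<JobDefinition/>")
print(factory.get_activity_statuses([epr]))
print(factory.terminate_activities([epr]))
print(factory.get_factory_attributes_document())
```

Note how the EPR returned by `create_activity` is passed back explicitly as an argument to every later call, and how each response echoes the EPR so the client can match results to requests.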
Summary • The Grid Dream • A read, write, execute space • Types of applications • Big data, big computation • Virtual Organisations • Inter-organizational • No central control