CS 602 — eScience and Grids. John Brooke j.m.brooke@man.ac.uk Donal Fellows donal.fellows@man.ac.uk. Lecture 1: What is a Grid.
Lecture 1: What is a Grid? We examine how the Grid concept arose and how it relates to other concepts such as e-Science and Cyberinfrastructure. We then develop a more precise definition of a Computational Grid. There are other types of Grid, but the Computational Grid is the main focus of this module.
e-Science “In the future, e-Science will refer to the large scale science that will increasingly be carried out through distributed global collaborations enabled by the Internet. Typically, a feature of such collaborative scientific enterprises is that they will require access to very large data collections, very large scale computing resources and high performance visualisation back to the individual user scientists.” Dr John Taylor, Director General of the Research Councils, OST
Cyber Infrastructure • Term coined by a US Blue Ribbon panel to describe the emergence of an infrastructure linking high-performance computers, experimental facilities and data repositories. • Seems to be distinguished from the term Grid, which is considered to apply more directly to computation and cluster-style computing. • May or may not be the same thing as eScience. • eScience focuses on the way that science is done; cyber-infrastructure on how the infrastructure is provided to support this way of working.
Grids as Virtual Organizations • Term used in the paper “The Anatomy of the Grid” (Foster, Kesselman, Tuecke). • “… Grid concept is coordinated resource sharing in dynamic, multi-institutional virtual organizations …” • There is an analogy with an electrical power grid, where producers share resources to provide a unified service to consumers. • A large unresolved question is how Virtual Organizations federate across security boundaries (e.g. firewalls) and organisational boundaries (resource allocation). • Grids may have hierarchical structures, e.g. the EU DataGrid, or more federated structures, e.g. EuroGrid.
What can Grids be used for? [Diagram: an example Grid scenario linking:] • Storage devices • Grid infrastructure (Globus, Unicore, …) • “Instruments”: XMT devices, LUSI, … • HPC resources: scalable MD, MC, mesoscale modelling • User with laptop/PDA (web-based portal) • Steering (ReG steering API); performance control/monitoring • Visualization engines; VR and/or AG nodes. Moving the bottleneck out of the hardware and into the human mind…
Grids for Knowledge/Information Flow [Diagram labels: Data Capture (Clinical Image/Signal, Genomic/Proteomic); Analysis; Annotation/Knowledge Representation; Information Sources; Knowledge Repositories; Model & Analysis Libraries; Hypotheses; Design; Integration; Clinical Resources/Individualised Medicine; Data Mining; Case-Based Reasoning; Information Fusion]
Parallel and Distributed Computing • Parallel computing is the synchronous coupling of computing resources, usually within a single machine architecture or a single administrative domain, e.g. a cluster. • Distributed computing refers to a much looser use of resources, often across multiple administrative domains. • Grid computing is an attempt to provide a persistent and reliable infrastructure for distributed computing. • Users may wish to run workflows many times over a set of distributed resources, e.g. in bioinformatics applications. • Users may wish to couple heterogeneous resources for scientific collaboration, e.g. telescopes, computers, databases, video-conferencing facilities.
Re-usability and Components • We wish to develop sufficient reusable components to provide common facilities so that applications and services can interoperate. • This can be done in various ways: Globus develops a toolkit, while in Unicore all actions on the Grid are modelled by abstractions encapsulated in an inheritance hierarchy. • As part of this course you should start to identify the strengths and weaknesses of these two approaches. • A more radical approach is to impose a meta-operating system that presents the resources as a single virtual computer. This was tried by the Legion project, and the idea partially survives in the concept of a DataGrid.
Toolkits for Grid Functions • Software development toolkits • Standard protocols, services & APIs • A modular “bag of technologies” • Enable incremental development of grid-enabled tools and applications • Reference implementations • Learn through deployment and applications • Open source [Layer diagram: Applications → Diverse global services → Core services → Local OS]
Layered Architecture [Diagram layers, top to bottom:] • Applications / Problem Solving Environments: LUSI Portal, Computational PSE, Visualization & Steering, Component Repository • Application Toolkits: MPICH-G, DUROC, GlobusView, globusrun, VIPAR Component Framework • Grid Services: GASS, SRB, MDS, GSI, GSI-FTP, HBM, GRAM • Grid Fabric: LSF, NQE, PBS, MPI; Tru64, UNICOS, Linux, Solaris, IRIX • Grid Resources: Oxford, EPCC, Manchester, Imperial College, QM-LUSI/XMT, Loughborough, QM
Core Functions for Grids Acknowledgements to Bill Johnston of LBL
A Set of Core Functions for Grids • The GGF document “Core Functions for Production Grids” attempts to define Grids by the minimal set of functions that a Grid must implement to be “usable”. • This is a higher-level approach that does not attempt to specify how the functions are implemented, or what base technology is used to implement them. • In the original Globus Toolkit, functions were implemented in C and could be called via APIs, from scripts, or on the command line. • In Unicore, functions were abstracted as a hierarchy of Java classes and then mapped to Perl scripts at a lower level (the “incarnation” process). • In the Open Grid Services Architecture there is a move to a Web-services-based approach, in which the hosting environment assumes prominence.
Converging Technologies [Diagram: Grid Computing, Web Service & Semantic Web Technologies, and Agents converging]
Web Services • Early Grids were built on the technologies used for accessing supercomputers, e.g. ssh, shell scripts, ftp. Information services were built on directory services such as LDAP, the Lightweight Directory Access Protocol. • In the commercial sphere, however, Web Services are becoming dominant, based on SOAP (Simple Object Access Protocol), WSDL (Web Services Description Language) and UDDI. • Early Grid systems such as Unicore and Globus are trying to refactor their functionality in terms of Web Services. • The key Grid concept not captured in Web Services is state, e.g. the state of a job queue, the load on a resource, etc.
Other Types of Grid • The word Grid is used very loosely. • Some aspects of collaborative video-conferencing and advanced visualization are termed Grid. • These currently try to reuse technology developed for running computations, and the results are not always usable. • This is one indication that we must conceptualise what abstractions we need to capture in Grid software. • We also need to develop abstractions for both high- and low-level protocols, for security models, and for user access policies. • The Unicore system we present has captured the key semantics and abstractions of a Computational Grid.
Access Grid • Manchester is the official UK Constellation site. [Images: Solar Terrestrial Physics Workshop; Teleradiology, Denver]
Lecture 2: Computational Resource If the Grid concept is to move from a vague analogy to a workable scientific concept, its terms need to be defined more carefully. Here we describe one approach to defining one key abstraction, namely computational resource.
Terminology • We identify a problem: terms in distributed computing are used loosely and are thus not amenable to analysis. • We identify a possible programme: to seek invariants which are conserved or are subject to identifiable constraints. • We now try to trace an analysis of the concept of “Computational Resource”, since distributed computing networks are increasingly referred to as Grids. • An electricity grid distributes electrical power, a water grid distributes water, and an information grid distributes information. • What does a computational grid distribute?
The Analogy with a Power Grid • The power grid delivers electrical power in the form of a wave (an AC waveform). • The form of the wave can change over the grid, but there is a universal (scalar) measure of power: power = voltage × current. • This universal measure facilitates the underlying economy of the power grid. Since it is indifferent to the way the power is produced (gas, coal, hydro, etc.), different production centres can all switch into the same grid. • To define the abstractions necessary for a Computational Grid we must understand what we mean by computational resource.
Information Grids • Information can be quantified as bits, with sending and receiving protocols. • Bandwidth × time gives a measure of information flow, which allows telcos to charge. • Internet protocols allow discovery of static resources (e.g. WWW pages). • Information “providers” do not derive income directly according to the volume of information supplied; they use other means (e.g. advertising, grants) to sustain the resources needed. • The current Web is static and need not consider dynamic state, hence the extensions needed for the Open Grid Services Architecture.
What is Computational Power? • Is there an equivalent of voltage × current? Megaflops? • Power is a rate of delivery of energy, so should we take Mflop/s? However, this is application-dependent. • Consider two different computations: • SETI@home, where time factors are not important. • Distributed collaborative working on a CFD problem, with computation and visualization of results in multiple locations, where time and synchronicity are important! • Yet both may use exactly the same number of floating-point operations.
Invariants in Distributed Computation • To draw an analogy with the current situation, we refer to the status of physics in the 17th and 18th centuries. • It was not then clear which invariant quantities persisted through changes in physical phenomena. • Gradually, quantities such as momentum, energy and electric charge were isolated and their invariance expressed in the form of conservation laws. • Without conservation laws, a precise science of physics is inconceivable. • The scope has since extended to important inequalities, e.g. the Second Law of Thermodynamics and Bell’s inequality. • We must have constraints and invariants, or analysis and modelling are impossible.
An Abstract Space for Job-Costing • Define a job as a vector of computational resources: (r1, r2, …, rn) • A Grid resource advertises a cost function for each resource: (c1, c2, …, cn) • The cost function takes the vector argument and produces the job cost: (r1*c1 + r2*c2 + … + rn*cn)
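The cost function above is just an inner product of the request and cost vectors. A minimal sketch in Java (the class name, resource units and numbers are illustrative only, not part of any Grid toolkit):

```java
// Illustrative sketch of the job-costing scheme above: a job is a vector of
// requested resources (r1, ..., rn), a provider advertises a cost vector
// (c1, ..., cn), and the job cost is the inner product r1*c1 + ... + rn*cn.
// All names and figures here are hypothetical.
public class JobCosting {
    static double cost(double[] request, double[] costs) {
        if (request.length != costs.length)
            throw new IllegalArgumentException("vectors must have equal length");
        double total = 0.0;
        for (int i = 0; i < request.length; i++)
            total += request[i] * costs[i]; // ri * ci
        return total;
    }

    public static void main(String[] args) {
        // e.g. (CPU hours, GB of memory, GB of disk); unused entries are null (zero)
        double[] job      = {128.0, 40.0, 0.0};
        double[] provider = {2.0, 0.5, 0.1}; // tokens per unit of each resource
        System.out.println(cost(job, provider)); // 128*2 + 40*0.5 + 0*0.1 = 276.0
    }
}
```

A broker could then compare the scalar costs returned by several providers for the same request vector.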
A Dual Job-Space • Thus we have a space of “requests”, defined as a vector space of the computational needs of users over a Grid. For many jobs, most of the entries in the vector will be null. • We have another space of “services”, which can produce “cost vectors” for costing the user jobs (provided they can accommodate them). • This is an example of a dual vector space. • A strictly defined dual space is probably too rigid, but it can provide a basis for simulations. • The abstract job requirements will need to be agreed. It may be a task for a broker to translate a job specification into a “user job” for a given Grid node. • A Mini-Grid can help to investigate a given dual job-space with vectors of known length.
Dual Space [Diagram: (1) the user submits a job vector; (2) the provider applies its cost vector, yielding a scalar cost in tokens]
Computational Resource • Computational jobs ask questions about the internal structure of the provider of computational power in a manner that an electrically powered device does not. • For example, do we require specific compilers, libraries, disk resources, visualization servers? • What if it goes wrong: do we get support? If we transfer data and methods of analysis over the Internet, is it secure? • A resource broker for high-performance computation is of a different order of complexity from a broker for an electricity supplier.
Emergent Behaviour • Given this complexity, self-sustaining global Grids are likely to emerge rather than be planned. • Planned Grids can be important for specific tasks; the EU DataGrid project is an example. They are not required to be self-sustaining, and questions of accounting and resource transfer are not of central interest. • We consider the EuroGrid multi-level structure as an emergent phenomenon that could offer some pointers to the development of large-scale, complex, self-sustaining computational Grids. • The Unicore Usite and Vsite structure is an elegant means of encapsulating such structure.
Fractal Structure and Complexity • Grids are envisaged as having internal structure and also external links. • Via the external links (WANs, intercontinental networks) Grids can be federated. • The action of joining Grids raises interesting research questions: • 1. How do we conceptualise the joining of two Grids? • 2. Is there a minimum set of services that defines a Grid? • 3. Are there environments for distributed services and computing that are not Grids (e.g. a cluster)? • We focus on the emergent properties of such federations in considering whether they are Virtual Organizations.
Resource Requestor and Provider Spaces • Resource requestor (RR) space, in terms of what the user wants: e.g. relocatable weather model, 10^6 points, 24 hours, full topography. • Resource provider (RP) space: 128 processors, Origin 3000 architecture, 40 gigabytes of memory, 1000 gigabytes of disk space, 100 Mb/s connection. • We may even forward requests from one resource provider to another: recasting an Origin 3000 job in terms of an IA64 cluster gives a different resource set. • Linkage and staging of the different stages of a workflow require environmental support: a hosting environment.
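One way to picture the RR-to-RP mapping is a broker that takes a request, already recast in abstract resource terms, and checks it against a provider's advertised capacities. This is a toy sketch; the resource names and quantities are assumptions for illustration, not a real broker API:

```java
import java.util.Map;

// Toy sketch of matching a request (RR space, translated into abstract
// resource terms) against a provider's capacities (RP space). The resource
// names and quantities are hypothetical.
public class RrToRp {
    // RP space: what one provider advertises.
    static final Map<String, Long> PROVIDER = Map.of(
        "processors", 128L,
        "memoryGB",   40L,
        "diskGB",     1000L);

    // The provider can host the job only if every requested quantity is
    // within its advertised capacity (an unknown resource has capacity 0).
    static boolean canHost(Map<String, Long> request) {
        return request.entrySet().stream()
            .allMatch(e -> PROVIDER.getOrDefault(e.getKey(), 0L) >= e.getValue());
    }

    public static void main(String[] args) {
        // A weather-model request recast as abstract resource amounts.
        Map<String, Long> job = Map.of("processors", 64L, "memoryGB", 32L);
        System.out.println(canHost(job)); // true: within the provider's capacity
    }
}
```

Request referral then amounts to re-running the same check against a different provider's capacity map.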
RR and RP Spaces Figure 1: A request from RR space at A is mapped into resource providers at B and C, with C forwarding a request formulated in RR space to RP space at D. B and C synchronize at the end of the workflow before results are returned to the initiator A.
Résumé • We have shown how some concepts from abstract vector spaces may be able to provide a definition of computational resource. • We do not yet know what conservation laws or constraints could apply to such an abstraction, or whether these would be useful in analysing distributed computing. • We believe that we can show convincingly that simple scalar measures such as megaflops are inadequate to the task. • This invalidates the “league table” concept such as the Top 500 list of computers. Computational resource will increasingly be judged by its utility within a given infrastructure.
The Resource Universe • What is the “universe” of resources for which we should broker? • One might use a search engine, but there is no agreed resource description language, nor would users be able to run on most of the resources selected. • Globus uses a hierarchical directory structure, MDS, based on LDAP. Essentially this is a “join the Grid” model, based on the VO concept. • By making Vsites capable of brokering we can potentially access the whole universe of Vsites. • The concept of a Shadow Resource DAG makes the resource search structurally similar to its implementation and maintains the AJO abstraction.
Towards a Global Grid Economy? • Much access to HPC resources is via national grants, or the resources are private (governmental, commercial). There are many problems with sharing resources: what are the incentives? • Grid resources can be owned by international projects, but resources are allocated by national bodies. This is like collaboration in large-scale facilities, e.g. CERN. • Europe has to go down the shared-resource route; the US doesn’t. Will this produce separate types of Grid economy? • The problems of accounting and resource trading are rarely touched on. Mini-Grids can help explore the technical issues outside of the political ones.
Summary • The three different views of a distributed infrastructure relate to the way it is used. • We need to abstract usage patterns and see if we can link them to invariants that can be quantified. • We have investigated in depth the concept of “Computational Resource”. • This ties into all three definitions: eScience collaborations use resources; cyber-infrastructures connect resources; Grids distribute resources.
Human Factors • A prediction arises from this: the abstracted idea of human collaboration will be essential to success in this field. • In an electricity grid the human participants are completely anonymised and exert influence only via mass action, e.g. a power surge. • Patterns of usage in eScience will be much more complex and dynamic. • It will belong to the post-Fordist model of industrial production; this time the product will be knowledge. • Our search for abstractions to encapsulate this will be far more challenging and exciting.
Lecture 3: Introduction to Unicore Unicore is the Grid middleware system you will study in depth. It is a complete system based on a three-tier architecture. We have chosen it as an illustration because of its compact and complete nature, and because it is very well engineered for a Computational Grid. Thanks to Michael Parkin, who created the slides in this lecture.
UNICORE Grid • UNiform Interface to COmputing REsources • A European Grid infrastructure giving secure and seamless access to High Performance Computing (HPC) resources. • Secure: strong authentication of users based on X.509 certificates; communication uses SSL connections over TCP/IP, as defined in the UNICORE Protocol Layer (UPL) specification. • Seamless: a uniform interface and consistent access to computing resources regardless of the underlying hardware, system software, etc., achieved using Abstract Job Objects (AJOs). • HPC resources based in centres in Switzerland, Germany, Poland, France, and the United Kingdom are integrated into a single grid.
UNICORE Grid Architecture • The UNICORE architecture is based on three layers: • Client: the interface to the user. Prepares and submits the job over the unsecured network to the… • Gateway: the entry point to the computing centre and its secured network. Authenticates the user and passes the job to the… • Server: schedules the job for execution and translates it to commands appropriate for the target system.
UNICORE Terminology • USite: a site providing UNICORE services (e.g. CSAR). • VSite: a computing resource within the USite. • USpace: dedicated file space on a VSite; may exist only during the execution of a job. • XSpace: permanent storage on the VSite (e.g. the user’s home directory).
UNICORE Security • Between the user and the computing centre, communications are over SSL. • The user’s X.509 certificate is stored in the client. • The certificate encrypts data using Secure Sockets Layer (SSL) technology, the industry-standard method for protecting web communications, at 128-bit encryption strength. • Defined in the UNICORE Protocol Layer (UPL) standard. • Prevents eavesdropping on and tampering with communications and data. • Provides authentication of the user’s identity without requiring individual usernames and passwords. • Within the computing centre, communications are within a secure network; local site policy can specify encrypted communication if necessary.
UNICORE Protocol Layer (UPL) • A set of rules by which data is exchanged between computers. • Request/reply structure.
The Abstract Job Object (AJO) • A collection of approximately 250 Java classes representing actions, tasks, dependencies and resources. • v4.0 can be downloaded from www.unicore.org. • Specifies work to be done at a remote site seamlessly: no knowledge of the underlying execution mechanism is required. • Example classes: ExecuteScriptTask, ListDirectory, CompileTask, Dependency, Processor, Storage. • A signed, serialised Java object is transmitted from the Client to the gateway using the UPL.
Simplified AJO Class Diagram (1) The diagram shows how an AbstractJob can be constructed from tasks and groups of tasks; resources can be allocated to each task. [Diagram classes: AbstractAction; ActionGroup; AbstractJob; AbstractTask; ExecuteTask; ExecuteScriptTask; UserTask; FileTransfer; FileAction; FileTask (ChangePermissions, CopyFile, CreateDirectory, DeleteFile, FileCheck, ListDirectory, RenameFile, SymbolicLink); portfolio tasks (CopyPortfolioTask, ExportTask, GetPortfolio, ImportTask, PutPortfolio, CopySpooled, DeclarePortfolio, DeleteSpooled, IncarnateFiles, MakePortfolio, Spool, UnSpool); Dependency {ordered}; Resource; CapacityResource (Memory, Node, PerformanceResource, Processor, RunTime, Storage)]
Simplified AJO Class Diagram (2) [Diagram classes: Outcome; AbstractTask_Outcome; ActionGroup_Outcome; AbstractJob_Outcome; and per-task outcomes: ChangePermissions_Outcome, CopyFile_Outcome, CopyPortfolio_Outcome, CopyPortfolioToOutcome_Outcome, CopySpooled_Outcome, CreateDirectory_Outcome, DeclarePortfolio_Outcome, DeleteFile_Outcome, DeletePortfolio_Outcome, DeleteSpooled_Outcome, ExecuteTask_Outcome, ExportTask_Outcome, FileCheck_Outcome, GetPortfolio_Outcome, ImportTask_Outcome, IncarnateFiles_Outcome, ListDirectory_Outcome, MakeFifo_Outcome, MakePortfolio_Outcome, MoveFifoToOutcome_Outcome, PutPortfolio_Outcome, RenameFile_Outcome, Spool_Outcome, SymbolicLink_Outcome, UnSpool_Outcome]
AJO Example 1: ListDirectory [Sequence diagram: a listDirectory task has its target directory set using the setTarget(String target) method; a storage resource is attached with addResource(); the task is add()-ed to an abstractJob; the AbstractJob is consigned to the gateway.]
AJO Example 2: ImportTask Used to download files on a specified VSite to the Client. An ImportTask imports a file from the storage area to the job’s USpace (a Portfolio represents a collection of files in the USpace). [Sequence diagram: an importTask (file name set using the addFile(String target) method, storage attached with addResource()), a copyPortfolioToOutcome and a dependency are add()-ed to an abstractJob; the dependency ensures that the file(s) are in the USpace before they are copied to the outcome; the AbstractJob is consigned to the gateway.]
AJO Example 3: ExecuteScriptTask [Sequence diagram: incarnateFiles (name, files), makePortfolio, an executeScriptTask (script type set with setScriptType(), arguments with setCommandLine(String args)) and a resourceSet (setResource()) are add()-ed, with dependencies d1 and d2, to an actionGroup within an abstractJob; the dependencies ensure that the files arrive before the task is executed; the AbstractJob is consigned to the gateway. (Diagram to be completed.)]
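The pattern in these examples — tasks grouped into a job, with Dependency objects forcing an execution order — can be illustrated with a self-contained toy model. The classes below are stand-ins, not the real AJO classes (a real AJO is serialised and consigned to a gateway rather than executed locally):

```java
import java.util.*;

// Toy stand-in for the AJO pattern: tasks are add()-ed to a job and
// Dependency-style links force an execution order. NOT the real AJO API.
public class ToyAjo {
    static final class Task {
        final String name;
        Task(String name) { this.name = name; }
    }

    static final class Job {
        private final List<Task> tasks = new ArrayList<>();
        private final Map<Task, Task> prereq = new HashMap<>(); // task -> its prerequisite
        void add(Task t) { tasks.add(t); }
        void addDependency(Task before, Task after) { prereq.put(after, before); }

        // Emit task names in an order respecting the dependencies
        // (assumes at most one prerequisite per task and no cycles).
        List<String> executionOrder() {
            List<String> out = new ArrayList<>();
            Set<Task> done = new HashSet<>();
            while (done.size() < tasks.size()) {
                for (Task t : tasks) {
                    Task p = prereq.get(t);
                    if (!done.contains(t) && (p == null || done.contains(p))) {
                        out.add(t.name);
                        done.add(t);
                    }
                }
            }
            return out;
        }
    }

    // Mirrors Example 3: files are incarnated, gathered into a portfolio,
    // and only then is the script executed.
    static List<String> demo() {
        Task incarnate = new Task("incarnateFiles");
        Task portfolio = new Task("makePortfolio");
        Task script    = new Task("executeScriptTask");
        Job job = new Job();
        job.add(script);            // added first, but runs last...
        job.add(incarnate);
        job.add(portfolio);
        job.addDependency(incarnate, portfolio); // ...because of these
        job.addDependency(portfolio, script);    // dependency links
        return job.executionOrder();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [incarnateFiles, makePortfolio, executeScriptTask]
    }
}
```

The point of the exercise is that the order in which tasks are add()-ed does not matter; only the dependency graph determines execution order.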
Lectures 4-5: Unicore Client We now present a client-side view of the Computational Grid. This will allow you to begin the practical exercises before engaging with the full complexity of the server-side components and the complete Grid architecture of Unicore. We thank Ralf Ratering of Intel for permission to use this material.
UNICORE • A production-ready Grid system that connects supercomputers and clusters into a computing Grid. • Originally developed in the German research projects UNICORE (1997-2000) and UNICORE Plus (2000-2003). • Client implemented by Pallas (now Intel PDSD); server implemented by Fujitsu as a sub-contractor of Pallas. • Further enhanced in European research projects: EuroGrid (2000-2003), GRIP (2001-2003), OpenMolGrid (2002-2005), NextGrid (2004-2008), SimDat (2004-2008), and others. • Used as middleware for NaReGI.