Grid Computing ECI, July 2005
Living in an Exponential World • Moore’s Law: transistor count x2 every 18 months • Storage density x2 every 12 months • Online data x10 every 12 months (currently 10 PB) • Telescope to generate > 10 PB by 2008 • Network speed x2 every 9 months • 1986-2000: CPU x500, network x340,000 • 2001-2010: CPU x60, network x4,000
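The two period growth factors on this slide can be cross-checked against the per-month doubling rates. Below is a minimal Python sketch. It assumes that 1986-2000 spans 14 years and 2001-2010 spans 9 years, because the slide does not say how the ranges are counted.

```python
import math


def doubling_time_months(growth_factor: float, period_months: float) -> float:
    """Months per doubling implied by an overall growth factor over a period."""
    # growth_factor = 2 ** (period_months / T)  =>  T = period_months / log2(growth_factor)
    return period_months / math.log2(growth_factor)


# Assumed period lengths for the slide's year ranges.
PERIODS = {"1986-2000": 14 * 12, "2001-2010": 9 * 12}
GROWTH = {
    "1986-2000": {"cpu": 500, "network": 340_000},
    "2001-2010": {"cpu": 60, "network": 4_000},
}

if __name__ == "__main__":
    for span, factors in GROWTH.items():
        for resource, factor in factors.items():
            t = doubling_time_months(factor, PERIODS[span])
            print(f"{span} {resource}: x{factor} -> doubles every {t:.1f} months")
```

Under those assumptions, the CPU figures work out to a doubling roughly every 18 to 19 months. The network figures work out to roughly every 9 months. Both match the Moore's Law and network-speed bullets.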
What is a Grid (informal) • Three key criteria: • Coordinates resources not under centralized control • Using standard, open, general-purpose protocols and interfaces • To deliver non-trivial quality of service • What is not a Grid? • A cluster, a network-attached storage device, a scientific instrument, a network (though these are important components)
So… • We’ve got: • Fast computers (but not fast enough…) • Bigger storage (but not big enough…) • Fast networks (well, not speedy enough…) • And we want to: • Solve big computational problems… • In that case: • How about joining resources together ? • That’s GRID!
Why “Grid”? • Analogy with the power grid • Service with known characteristics: • Stable voltage (~220 V) • Contracted power • Pay for installed capacity and consumed power • Standard sockets, outlets, devices • Available 24/7 (usually…)
And in Computers • A “Computer Grid” is similar to the “Power Grid” • Special socket to get connected • Pay a subscription plus the power consumed • Need more? Contract more
Definitions of Grid • A paradigm/infrastructure that enables the sharing, selection, & aggregation of geographically distributed resources to solve large-scale problems/applications • Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations • Resources include computers, software, catalogue data and databases, special devices/instruments, people
Grid and the Hype • [Figure: the classic hype curve]
Types of Grids • Grid systems can be classified by usage: • Computational Grid: Distributed Supercomputing, High Throughput • Data Grid • Services Grid: On Demand, Collaborative, Multimedia
Types of Grids • Computational Grids • Distributed Supercomputing: grand challenge apps • High-Throughput: parametric modeling, independent tasks • Data Grids • Data mining, analysis, data processing • Service Grids • Collaborative: connects users, apps and devices • Multimedia: real time multimedia, virtual reality • Demand: aggregate more resource if required
A Typical Grid Computing Environment • [Figure: an Application hands work to a Grid Resource Broker; brokers consult the Grid Information Service and dispatch work to resources R1 to RN, including a database]
How it Really Happens (A Simplified View) • [Figure: Web Browser, Web Portal, Registration Service, Compute Servers running a Simulation Tool, Cameras feeding a Telepresence Monitor, Data Viewer Tool, Chat Tool, Data Catalog, Database services, Credential Repository, Certificate Authority] • Users work with client applications • Application services organize VOs & enable access to other services • Collective services aggregate &/or virtualize resources • Resources implement standard access & management interfaces
How it Really Happens (without Grid Software) • [Figure: the same components as in the simplified view, grouped under labels A to E] • Users work with client applications • Application services organize VOs & enable access to other services • Collective services aggregate &/or virtualize resources • Resources implement standard access & management interfaces
How it Really Happens (with Grid Software) • [Figure: the same components, now built on Grid software: Globus GRAM in front of the Compute Servers, CHEF as the Web Portal, the Globus Index Service as the Registration Service, OGSA-DAI in front of each Database service, Globus MCS/RLS as the Data Catalog, MyProxy as the Credential Repository, plus the Certificate Authority] • Users work with client applications • Application services organize VOs & enable access to other services • Collective services aggregate &/or virtualize resources • Resources implement standard access & management interfaces
Grid Characteristics • Resource Management • Application Construction
Why is it Complex? • Size (nodes, providers, consumers) • Heterogeneity of resources • Heterogeneity of fabric management • Systems, policies • Heterogeneity of applications • Type, requirements, patterns • Geographic distribution, varying time zones • Insecure and unreliable environment
Layered Grid Architecture • APPLICATIONS: Applications and Portals (Scientific, Engineering, Collaboration, Problem Solving Environments, Web-enabled Apps, …) • USER LEVEL MIDDLEWARE: Development Environments and Tools (Languages/Compilers, Libraries, Debuggers, Monitors, Web tools, …); Resource Management, Selection, and Aggregation (Brokers) • CORE MIDDLEWARE: Distributed Resources Coupling Services (Security, Information, Data, Process, Trading, QoS, …) • SECURITY LAYER • FABRIC: Local Resource Managers (Operating Systems, Queuing Systems, Libraries & App Kernels, Internet Protocols, …); Networked Resources across Organizations (Computers, Networks, Storage Systems, Data Sources, Scientific Instruments, …)
Resource/Service Integration as a Fundamental Challenge • [Figure: resources (R) behind resource managers (RM), with registries, policy services and security services] • Many sources of data, services, computation • Discovery: registries organize services of interest to a community • Access: resource management is needed to ensure progress & arbitrate competing demands • Security & policy must underlie access & management decisions • Data integration activities may require access to, & exploration/analysis of, data at many locations • Exploration & analysis may involve complex, multi-step workflows
Grid Middleware Technologies • Globus – Argonne National Lab and ISI • Gridbus – University of Melbourne • Unicore – Germany • Legion – University of Virginia
Globus Toolkit Services • Security (GSI) • PKI-based Security (Authentication) Service • Job submission and management (GRAM) • Uniform Job Submission • Information services (MDS) • LDAP-based Information Service • Remote file management (GASS) • Remote Storage Access Service • Remote Data Catalogue and Management Tools
Security • Resources and users belong to organizations • An authentication infrastructure is needed • Both users and resource owners should be protected from each other • Ensure security and privacy of: • Data • Code • Messages
Grid Security Infrastructure (GSI) • GSI is a layered stack: • Proxies and delegation (GSI extensions) for secure single sign-on • SSL/TLS for authentication and message protection • PKI (CAs and certificates) for credentials
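Proxies and delegation provide single sign-on. The user's long-lived certificate signs a short-lived proxy, and that proxy can in turn sign a delegated proxy for a remote job. The Python sketch below is a hypothetical model of that chain. Class and function names are illustrative, signature and CA checks are omitted, and the "/CN=proxy" suffix imitates legacy Globus proxy naming. It shows that a delegated credential is usable only while every link above it is still valid.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List


@dataclass(frozen=True)
class Credential:
    """Hypothetical stand-in for an X.509 certificate: who it names, who signed it, when it expires."""
    subject: str
    issuer: str
    not_after: datetime


def make_proxy(parent: Credential, lifetime: timedelta, now: datetime) -> Credential:
    """Derive a short-lived proxy signed by `parent` (illustrative model of delegation)."""
    # Design choice in this sketch: clamp the proxy so it never outlives its issuer.
    expiry = min(now + lifetime, parent.not_after)
    return Credential(subject=parent.subject + "/CN=proxy", issuer=parent.subject, not_after=expiry)


def chain_valid(chain: List[Credential], now: datetime) -> bool:
    """True if each link was issued by the previous one and no link has expired.

    Signature verification and the check against the CA certificate are out of scope.
    """
    for parent, child in zip(chain, chain[1:]):
        if child.issuer != parent.subject:
            return False
    return all(now <= cred.not_after for cred in chain)


if __name__ == "__main__":
    now = datetime(2005, 7, 1, tzinfo=timezone.utc)
    user = Credential("/O=Grid/CN=Jane Doe", "/O=Grid/CN=Example CA", now + timedelta(days=365))
    proxy = make_proxy(user, timedelta(hours=12), now)        # the user's day-to-day proxy
    delegated = make_proxy(proxy, timedelta(hours=24), now)   # handed on to a remote job
    print(delegated.subject, delegated.not_after)
    print(chain_valid([user, proxy, delegated], now))                        # True
    print(chain_valid([user, proxy, delegated], now + timedelta(hours=13)))  # False
```

The delegated proxy asked for 24 hours but is clamped to its parent's 12. Once the user's proxy expires, the whole chain stops validating.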
Simple job submission • globus-job-run provides a simple rsh-compatible interface • % grid-proxy-init Enter PEM pass phrase: ***** • % globus-job-run host program [args] • Authentication test • % globusrun -a -r hostname • Running a job on a remote node • % globus-job-run hostname <executable> • globus-job-run belle.anu.edu.au /bin/date
Authorization • GSI handles authentication, but not authorization • Authorization issues: • Management of authorization on a multi-organization grid is still an interesting problem • Mapping resources to users does not scale well • Large communities that share resources...
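At a Globus site of this era, an authenticated certificate subject is commonly turned into a local identity with a grid-mapfile. The file holds one line per user: the quoted subject name followed by a local account name. The Python sketch below parses lines of that form and looks up an account. The file contents and function names are hypothetical, and it assumes that multiple accounts, if present, are comma-separated. Because each site keeps its own copy listing every remote user, the sketch also shows why per-user mapping scales poorly for large communities.

```python
import shlex
from typing import Dict, List, Optional


def parse_gridmap(text: str) -> Dict[str, List[str]]:
    """Parse grid-mapfile-style text into {certificate subject: [local accounts]}."""
    mapping: Dict[str, List[str]] = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue                               # skip blank lines and comments
        parts = shlex.split(line)                  # keeps the quoted subject (with spaces) intact
        if len(parts) != 2:
            raise ValueError(f"line {lineno}: expected a quoted subject followed by an account list")
        subject, accounts = parts
        mapping[subject] = accounts.split(",")     # assumption: extra accounts are comma-separated
    return mapping


def local_account(mapping: Dict[str, List[str]], subject: str) -> Optional[str]:
    """Return the first local account mapped to `subject`, or None if the user is not authorized."""
    accounts = mapping.get(subject)
    return accounts[0] if accounts else None


if __name__ == "__main__":
    sample = '# hypothetical entries\n"/C=AU/O=ANU/CN=Jane Doe" jdoe\n"/O=Grid/CN=Bob" bob,guest\n'
    table = parse_gridmap(sample)
    print(local_account(table, "/C=AU/O=ANU/CN=Jane Doe"))
```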
Globus Resource Access Manager • Resource Specification Language (RSL) • GRAM allows programs to be started on remote resources • A layered architecture allows app-specific resource brokers and co-allocators to be defined as services
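Resource Management Architecture • [Figure: the Application passes RSL to a Broker, which exchanges queries & info with the Information Service and specializes the RSL into ground RSL; a Co-allocator splits the ground RSL into simple ground RSL requests, one per GRAM; each GRAM fronts a local resource manager such as LSF, EASY-LL or NQE]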
GRAM Components • [Figure: client-side and site-side components separated by a site boundary] • The client makes MDS client API calls to locate resources (via the MDS Grid Index Info Server) and to get resource info (via the MDS Grid Resource Info Server, which queries the current status of the resource) • The client makes GRAM client API calls to request resource allocation and process creation; the request goes to the Gatekeeper, protected by the Globus Security Infrastructure • The Gatekeeper creates a Job Manager, which parses the request with the RSL Library • The Job Manager asks the Local Resource Manager (e.g., PBS, Condor, or OS fork()) to allocate & create processes, then monitors & controls them • The client receives GRAM client API state change callbacks
A simple run • Interactive Run/Output: • > globus-job-run belle.anu.edu.au /bin/date • Mon May 3 15:05:42 EST 2004 • > globusrun -o -r belle.anu.edu.au "&(executable=/bin/date)" • Sun May 22 17:27:22 EST 2005 • Batch Commands: • > globusrun -b -r belle.anu.edu.au "&(executable=/bin/date)(stdout=MyOutputFile)" • > gsincftpget belle.anu.edu.au . MyOutputFile (Pull output file to local directory)
Resource Specification Language (RSL) • Common notation for information exchange • Provides two types of information: • Resource requirements: machine type, number of nodes, memory, etc. • Job configuration: directory, executable, args, environment • API provided for manipulating RSL
RSL Syntax • Elementary form: parenthesized clauses • (attribute op value [ value … ] ) • Operators supported: • <, <=, =, >=, >, != • Some supported attributes: • executable, arguments, environment, stdin, stdout, stderr • Unknown attributes are passed through • May be handled by subsequent tools
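Following the elementary form above, a request can be assembled clause by clause. Below is a minimal Python sketch with illustrative function names. Two behaviours are assumptions of this sketch, not RSL rules: it conservatively quotes any value containing whitespace or RSL punctuation, and it rejects embedded double quotes.

```python
RSL_OPS = {"<", "<=", "=", ">=", ">", "!="}
_NEEDS_QUOTES = set("()<>=!&|+")   # conservative choice for this sketch


def rsl_value(value) -> str:
    """Render one value, quoting it if it contains whitespace or RSL punctuation."""
    text = str(value)
    if '"' in text:
        raise ValueError("embedded double quotes are not handled in this sketch")
    if text == "" or any(ch.isspace() or ch in _NEEDS_QUOTES for ch in text):
        return f'"{text}"'
    return text


def relation(attribute: str, op: str, *values) -> str:
    """Build one (attribute op value [value ...]) clause."""
    if op not in RSL_OPS:
        raise ValueError(f"unsupported operator {op!r}")
    if not values:
        raise ValueError("a relation needs at least one value")
    return f"({attribute}{op}{' '.join(rsl_value(v) for v in values)})"


def conjunction(*relations: str) -> str:
    """Join clauses with the '&' operator."""
    return "&" + "".join(relations)


if __name__ == "__main__":
    print(conjunction(relation("executable", "=", "/bin/date"),
                      relation("stdout", "=", "MyOutputFile")))
```

For instance, `conjunction(relation("count", ">=", 5), relation("count", "<=", 10), relation("max_time", "=", 240), relation("memory", ">=", 64), relation("executable", "=", "myprog"))` yields `&(count>=5)(count<=10)(max_time=240)(memory>=64)(executable=myprog)`.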
Constraints: “&” • globusrun -o -r belle.anu.edu.au "&(executable=/bin/date)" • For example: & (count>=5) (count<=10) (max_time=240) (memory>=64) (executable=myprog) “Create 5-10 instances of myprog, each on a machine with at least 64 MB memory that is available to me for 4 hours”
Running job as batch job • globusrun -b -r belle.anu.edu.au '&(executable=/bin/date)(stdout=filename)' • It prints a "handle" that you can use to interrogate the job while it is running: • https://belle.anu.edu.au:4029/288/1116418550/ • Check job status: • > globusrun -status https://belle.anu.edu.au:4029/288/1116418550/ • Terminate job execution: • > globusrun -kill https://belle.anu.edu.au:4029/288/1116418550/
Disjunction: “|” • For example: • & (executable=myprog) • ( | (&(count=5)(memory>=64)) • (&(count=10)(memory>=32))) • Create 5 instances of myprog on a machine that has at least 64MB of memory, or 10 instances on a machine with at least 32MB of memory
Multirequest: “+” • A multirequest allows us to specify multiple resource needs, for example + (& (count=5)(memory>=64) (executable=p1)) (&(network=atm) (executable=p2)) • Execute 5 instances of p1 on a machine with at least 64 MB of memory • Execute p2 on a machine with an ATM connection • Multirequests are central to co-allocation
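To show how the "&", "|" and "+" forms nest, here is a small recursive-descent parser that turns a request string into a tree of tuples. It is a Python sketch built from the syntax on these slides, not the Globus RSL library. It does not handle $(...) variable substitution or parenthesized value sequences (such as those used by environment), and quoted strings may not contain a double quote.

```python
from typing import List, Tuple

Relation = Tuple[str, str, List[str]]

SPEC_OPS = ("&", "|", "+")
REL_OPS = ("<=", ">=", "!=", "<", ">", "=")   # two-character operators are tried first


class _Parser:
    def __init__(self, text: str):
        self.s, self.i = text, 0

    def peek(self) -> str:
        """Skip whitespace and return the next character ("" at end of input)."""
        while self.i < len(self.s) and self.s[self.i].isspace():
            self.i += 1
        return self.s[self.i] if self.i < len(self.s) else ""

    def expect(self, ch: str) -> None:
        if self.peek() != ch:
            raise ValueError(f"expected {ch!r} at offset {self.i}")
        self.i += 1

    def spec(self):
        """spec := ('&' | '|' | '+') clause+"""
        op = self.peek()
        if op not in SPEC_OPS:
            raise ValueError(f"expected one of {SPEC_OPS} at offset {self.i}")
        self.i += 1
        clauses = []
        while self.peek() == "(":
            clauses.append(self.clause())
        if not clauses:
            raise ValueError(f"operator {op!r} has no clauses")
        return (op, clauses)

    def clause(self):
        """clause := '(' (spec | relation) ')'"""
        self.expect("(")
        node = self.spec() if self.peek() in SPEC_OPS else self.relation()
        self.expect(")")
        return node

    def relation(self) -> Relation:
        """relation := attribute operator value+"""
        attr = self.word('()"<>=!')
        self.peek()                                # allow whitespace before the operator
        for op in REL_OPS:
            if self.s.startswith(op, self.i):
                self.i += len(op)
                break
        else:
            raise ValueError(f"missing operator after {attr!r}")
        values = []
        while self.peek() not in (")", ""):
            values.append(self.value())
        if not values:
            raise ValueError(f"no value for {attr!r}")
        return (attr, op, values)

    def value(self) -> str:
        if self.peek() == '"':
            end = self.s.find('"', self.i + 1)
            if end < 0:
                raise ValueError("unterminated quoted string")
            text, self.i = self.s[self.i + 1:end], end + 1
            return text
        return self.word('()"')

    def word(self, stops: str) -> str:
        self.peek()
        start = self.i
        while self.i < len(self.s) and not self.s[self.i].isspace() and self.s[self.i] not in stops:
            self.i += 1
        if start == self.i:
            raise ValueError(f"unexpected character at offset {self.i}")
        return self.s[start:self.i]


def parse_rsl(text: str):
    """Parse an RSL request into nested (op, [clauses]) and (attr, op, [values]) tuples."""
    parser = _Parser(text)
    tree = parser.spec()
    if parser.peek():
        raise ValueError(f"trailing text at offset {parser.i}")
    return tree
```

Applied to the multirequest on this slide, `parse_rsl` returns a "+" node with two "&" children, one per resource need. A co-allocator would hand each child to a different resource manager.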
Job Submission Interfaces • Command line programs for job submission • globus-job-run: Interactive jobs • globus-job-submit: Batch/offline jobs • globusrun: Flexible scripting infrastructure • Other High Level Interfaces • General purpose • Nimrod-G, Condor-G, Gridbus Broker, PBS, etc • Application specific • Web portals
globus-job-run • For running interactive jobs • Additional functionality beyond rsh • Ex: Run a 2-process job with executable staging: globus-job-run -: host -np 2 -s myprog arg1 arg2 • Ex: Run 5 processes across 2 hosts: globus-job-run -: host1 -np 2 -s myprog.linux arg1 -: host2 -np 3 -s myprog.aix arg2 • For a list of arguments run: globus-job-run -help
globus-job-submit • For running of batch/offline jobs • globus-job-submit Submit job • Same interface as globus-job-run • Returns immediately • globus-job-status Check job status • globus-job-cancel Cancel job • globus-job-get-output Get job stdout/err • globus-job-clean Cleanup after job
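Because globus-job-submit returns immediately, a script typically polls until the job reaches a final state. The Python sketch below shows that loop. It assumes GRAM-style state names (PENDING, ACTIVE, DONE, FAILED). The helper that shells out to globus-job-status is hypothetical and assumes the command prints only the state, so check that against your installation. Injecting the status and sleep functions keeps the loop testable without a grid.

```python
import subprocess
import time
from typing import Callable

TERMINAL_STATES = {"DONE", "FAILED"}


def status_via_cli(contact: str) -> str:
    """Hypothetical wrapper: assumes `globus-job-status <contact>` prints a single state word."""
    result = subprocess.run(["globus-job-status", contact],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


def wait_for_job(get_status: Callable[[], str], poll_seconds: float = 30.0,
                 timeout_seconds: float = 3600.0,
                 sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the job is DONE or FAILED; raise TimeoutError if it takes too long."""
    waited = 0.0
    while True:
        state = get_status()
        if state in TERMINAL_STATES:
            return state
        if waited >= timeout_seconds:
            raise TimeoutError(f"job still {state} after {waited:.0f} s")
        sleep(poll_seconds)
        waited += poll_seconds
```

For a real job, one might call `wait_for_job(lambda: status_via_cli(contact))` with the contact string printed by globus-job-submit. Then fetch the results with globus-job-get-output and clean up with globus-job-clean.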
Resource Brokers • [Figure: a hierarchy of brokers, backed by an Information Service, refining high-level requests into resource-level ones] • “Perform a parameter study involving 10,000 separate trials” → Parameter-study-specific broker → “. . .” • “Create a shared virtual space with participants X, Y, and Z” → Collaborative-environment-specific resource broker → “. . .” • “Run a distributed interactive simulation involving 100,000 entities” → DIS-specific broker → “Supercomputers providing 100 GFLOPS, 100 GB, < 100 msec latency” → Supercomputer resource broker → “80 nodes on Argonne SP, 256 nodes on CIT Exemplar, 300 nodes on NCSA O2000” → Simultaneous start co-allocator → “Run SF-Express on 80 nodes” (Argonne Resource Manager), “Run SF-Express on 256 nodes” (CIT Resource Manager), “Run SF-Express on 300 nodes” (NCSA Resource Manager)
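The simultaneous-start co-allocator at the bottom of this hierarchy has all-or-nothing semantics: the SF-Express run only makes sense if all three sites provide their nodes. Below is a Python sketch of that idea. Hypothetical reserve/release/start callbacks stand in for the per-site resource managers. It illustrates the semantics only and does not reproduce how Globus implemented co-allocation.

```python
from typing import Callable, Dict, Optional


class AllocationError(Exception):
    """Raised when any part of a co-allocation request cannot be satisfied."""


def co_allocate_and_start(requests: Dict[str, int],
                          reserve: Callable[[str, int], Optional[str]],
                          release: Callable[[str], None],
                          start: Callable[[str], None]) -> Dict[str, str]:
    """Reserve every sub-request, then start them all; on any failure release what was reserved."""
    granted: Dict[str, str] = {}
    for manager, nodes in requests.items():
        handle = reserve(manager, nodes)          # hypothetical per-site reservation call
        if handle is None:
            for held in granted.values():         # roll back the partial allocation
                release(held)
            raise AllocationError(f"{manager} could not provide {nodes} nodes")
        granted[manager] = handle
    for handle in granted.values():               # nothing starts until everything is held
        start(handle)
    return granted
```

With requests `{"Argonne": 80, "CIT": 256, "NCSA": 300}`, a refusal at NCSA releases the Argonne and CIT reservations, and no part of the simulation is started.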
Remote I/O and Data Access • Tell GRAM to pull the executable from a remote location • Access files from a remote location • Redirect stdin/stdout/stderr from/to a remote location
What is GASS? • GASS file access API • Replace open/close with globus_gass_open/close; read/write calls can then proceed directly • RSL extensions • URLs used to name executables, stdout, stderr • Remote cache management utility • Low-level APIs for specialized behaviors
GASS File Naming • URL encoding of resource names: https://quad.mcs.anl.gov:9991/~bester/myjob (protocol: https; server address: quad.mcs.anl.gov:9991; file name: ~bester/myjob) • Other examples: https://pitcairn.mcs.anl.gov/tmp/input_dataset.1 https://pitcairn.mcs.anl.gov:2222/./output_data http://www.globus.org/~bester/input_dataset.2 • Supports http & https • Supports ftp & gsiftp
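The three parts of a GASS name map directly onto a standard URL split. Below is a small Python sketch using `urllib.parse.urlsplit`. The `GassName` type and the scheme check are illustrative, and the scheme list is taken from this slide. `urlsplit` returns the path with its leading "/".

```python
from typing import NamedTuple
from urllib.parse import urlsplit

SUPPORTED_SCHEMES = {"http", "https", "ftp", "gsiftp"}


class GassName(NamedTuple):
    protocol: str
    server: str      # host, optionally with :port
    file_name: str   # path on that server


def split_gass_url(url: str) -> GassName:
    """Split a GASS-style URL into protocol, server address and file name."""
    parts = urlsplit(url)
    if parts.scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported scheme {parts.scheme!r}")
    if not parts.netloc:
        raise ValueError("missing server address")
    return GassName(parts.scheme, parts.netloc, parts.path)


if __name__ == "__main__":
    print(split_gass_url("https://quad.mcs.anl.gov:9991/~bester/myjob"))
```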
Example GASS Applications • On-demand, transparent loading of data sets • Caching of data sets • Automatic staging of code and data to remote supercomputers • (Near) real-time logging of application output to remote server
GASS File Access API • Minimal changes to the application • globus_gass_open(), globus_gass_close() • Same as open(), close() but use URLs instead of filenames • Caches URL in case of multiple opens • Returns descriptors to files in the local cache or sockets to the remote server
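The caching behaviour described above (fetch once, open many times, hand out handles to the local copy) can be modelled with a reference count per URL. This Python class is a hypothetical model for illustration, not the GASS cache implementation. It returns in-memory file objects instead of real descriptors and evicts an entry on its last close, which is a simplification of this sketch.

```python
import io
from typing import Callable, Dict


class UrlCache:
    """Hypothetical reference-counted cache keyed by URL."""

    def __init__(self, fetch: Callable[[str], bytes]):
        self._fetch = fetch                  # downloads the remote contents (injected)
        self._data: Dict[str, bytes] = {}
        self._refs: Dict[str, int] = {}

    def open(self, url: str) -> io.BytesIO:
        """Fetch on the first open; later opens reuse the cached copy."""
        if url not in self._data:
            self._data[url] = self._fetch(url)
        self._refs[url] = self._refs.get(url, 0) + 1
        return io.BytesIO(self._data[url])   # each open gets its own read position

    def close(self, url: str) -> None:
        count = self._refs.get(url, 0)
        if count == 0:
            raise ValueError(f"{url} is not open")
        if count == 1:
            del self._refs[url]
            del self._data[url]              # simplification: evict on last close
        else:
            self._refs[url] = count - 1

    def is_cached(self, url: str) -> bool:
        return url in self._data
```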
GASS File Access API (cont) • Support for different access patterns • Read-only (from local cache) • Write-only (to local cache) • Read-write (to/from local cache) • Write-only, append (to remote server)
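GRAM & GASS • [Figure: globus-job-run, a GASS server, the gatekeeper, the jobmanager and the program, with arrows numbered by step] • 1. Derive the contact string from the host name • 2. Build the RSL string from the command-line args • 3. Start up a GASS server • 4. Submit the request (to the gatekeeper, which starts the jobmanager and the program) • 5. Return output (the program's stdout, via the GASS server)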
Example: A Simple Broker • Select machines based on availability • Use MDS queries to get current host loads • Look at the output and figure out which machines to use • Generate RSL based on the selection • globus-job-run -dumprsl can assist • Execute globusrun, feeding it the RSL generated in the previous step
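The selection and RSL-generation steps of this broker can be sketched in a few lines of Python. The load figures are a hypothetical dict standing in for MDS query results. The sub-requests use the resourceManagerContact attribute that appears in Globus multirequests. Compare with `globus-job-run -dumprsl` output for the exact attributes your installation expects.

```python
from typing import Dict, List


def pick_hosts(loads: Dict[str, float], needed: int, max_load: float = 1.0) -> List[str]:
    """Return the `needed` least-loaded hosts whose load is below `max_load`."""
    # Sorting (load, host) pairs breaks ties between equal loads by host name.
    candidates = sorted((load, host) for host, load in loads.items() if load < max_load)
    if len(candidates) < needed:
        raise RuntimeError(f"only {len(candidates)} hosts are below load {max_load}")
    return [host for _, host in candidates[:needed]]


def multirequest_rsl(hosts: List[str], executable: str, count_per_host: int = 1) -> str:
    """Build a '+' multirequest with one sub-request per selected host."""
    subrequests = [
        f'(&(resourceManagerContact="{host}")(count={count_per_host})(executable={executable}))'
        for host in hosts
    ]
    return "+" + "".join(subrequests)


if __name__ == "__main__":
    # Hypothetical load figures, standing in for the result of MDS queries.
    loads = {"belle.anu.edu.au": 0.2, "hostb.example.org": 1.5, "hostc.example.org": 0.1}
    chosen = pick_hosts(loads, needed=2)
    print(chosen)
    print(multirequest_rsl(chosen, "/bin/date"))   # this string would then be handed to globusrun
```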
MDS: Monitoring and Discovery Service • General information infrastructure • Locate and determine characteristics of resources • Locate resources • Where are resources with required architecture, installed software, available capacity, network bandwidth, etc.? • Determine resource characteristics • What are the physical characteristics, connectivity, capabilities of a resource?
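MDS is LDAP-based (see the Globus Toolkit Services slide), so "where are resources with the required characteristics?" becomes an LDAP search filter. Below is a minimal Python sketch that builds such a filter. The attribute names in the example are hypothetical, not actual MDS schema names. Note that LDAP filters offer only =, >= and <=, and that whether >= compares numerically depends on the attribute's schema.

```python
from typing import Iterable, Tuple

LDAP_OPS = {"=", ">=", "<="}   # LDAP search filters have no strict "<" or ">"


def ldap_escape(value: str) -> str:
    """Escape the characters RFC 4515 reserves in filter values: * ( ) \\ and NUL."""
    return "".join(f"\\{ord(ch):02x}" if ch in "*()\\\0" else ch for ch in value)


def ldap_and(conditions: Iterable[Tuple[str, str, object]]) -> str:
    """Combine (attribute, operator, value) triples into one AND filter."""
    parts = []
    for attribute, op, value in conditions:
        if op not in LDAP_OPS:
            raise ValueError(f"operator {op!r} is not available in LDAP filters")
        parts.append(f"({attribute}{op}{ldap_escape(str(value))})")
    if not parts:
        raise ValueError("at least one condition is required")
    return "(&" + "".join(parts) + ")"


if __name__ == "__main__":
    # Hypothetical attribute names for illustration only.
    print(ldap_and([("osName", "=", "Linux"), ("freeMemoryMB", ">=", 64)]))
```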