Globus: A Core Grid Middleware. Source: The Globus Project Argonne National Laboratory, University of Southern California / ISI www.globus.org Localised by: Rajkumar Buyya. The Globus Project. Basic research in grid-related technologies
Globus: A Core Grid Middleware Source: The Globus Project Argonne National Laboratory, University of Southern California / ISI www.globus.org Localised by: Rajkumar Buyya
The Globus Project • Basic research in grid-related technologies • Resource & data management, security, QoS, policy, communication, adaptation, etc. • Development of Globus Toolkit • Core services for grid-enabled tools & apps • Construction of production grids & testbeds • Multiple deployments to distributed organizations for production & prototyping • Application experiments • Distributed applications, tele-immersion, etc.
Globus Approach • A toolkit and collection of services addressing key technical problems • Modular “bag of services” model • Not a vertically integrated solution • General infrastructure tools (aka middleware) that can be applied to many application domains • Inter-domain issues, rather than clustering • Integration of intra-domain solutions • Distinguish between local and global services
Globus Hourglass • Focus on architecture issues • Propose set of core services as basic infrastructure • Use to construct high-level, domain-specific solutions • Design principles • Keep participation cost low • Enable local control • Support for adaptation • “IP hourglass” model (top to bottom): Applications; Diverse global services; Core Globus services; Local OS
Technical Focus & Approach • Enable incremental development of grid-enabled tools and applications • Model neutral: Support many programming models, languages, tools, and applications • Evolve in response to user requirements • Deploy toolkit on international-scale production grids and testbeds • Large-scale application development & testing • Information-rich environment • Basis for configuration and adaptation
Globus Toolkit Services • Security (GSI) • PKI-based Security (Authentication) Service • Job submission and management (GRAM) • Uniform Job Submission • Information services (MDS) • LDAP-based Information Service • Remote file management (GASS) • Remote Storage Access Service • Remote Data Catalogue and Management Tools • Supported by Globus 2.0, released in 2002
Grid Services Architecture (figure) • Applications: High-energy physics data analysis, Collaborative engineering, On-line instrumentation, Regional climate studies, Parameter studies • Application Toolkit Layer: High throughput, Collab. design, Remote control, Data-intensive, Remote viz • Grid Services Layer: Information, Resource mgmt, Security, Data access, Fault detection, . . . • Grid Fabric Layer: Transport, Multicast, Instrumentation, Control interfaces, QoS mechanisms, . . .
Layered Architecture (figure) • Applications • Application Toolkits: GlobusView, Testbed Status, DUROC, MPI, Condor-G, HPC++, Nimrod/G, globusrun • Grid Services: Nexus, GRAM, GSI-FTP, I/O, HBM, GASS, GSI, MDS • Grid Fabric: Condor, MPI, TCP, UDP, DiffServ, Solaris, LSF, PBS, NQE, Linux, NT
Sample of High-Level Services • Resource brokers and co-allocators • DUROC, Nimrod/G, Condor-G, GridbusBroker • Communication & I/O libraries • MPICH-G, PAWS, RIO (MPI-IO), PPFS, MOL • Parallel languages • HPC++, CC++, Nimrod Parameter Specification • Collaborative environments • CAVERNsoft, ManyWorlds • Others • MetaNEOS, NetSolve, LSA, AutoPilot, WebFlow
The Nimrod-G Grid Resource Broker • A resource broker for managing, steering, and executing task farming (parameter sweep/SPMD model) applications on the Grid based on deadline and computational economy. • Based on users’ QoS requirements, our Broker dynamically leases services at runtime depending on their quality, cost, and availability. • Key Features • A single window to manage & control experiment • Persistent and Programmable Task Farming Engine • Resource Discovery • Resource Trading • Scheduling & Predictions • Generic Dispatcher & Grid Agents • Transportation of data & results • Steering & data management • Accounting • Uses Globus – MDS, GRAM, GSI, GASS
Condor-G: Condor for the Grid • Condor is a high-throughput scheduler • Condor-G uses Globus Toolkit libraries for: • Security (GSI) • Managing remote jobs on Grid (GRAM) • File staging & remote I/O (GSI-FTP) • Grid job management interface & scheduling • Robust replacement for Globus Toolkit programs • Globus Toolkit focus is on libraries and services, not end user vertical solutions • Supports single or high-throughput apps on Grid • Personal job manager which can exploit Grid resources
Production Grids & Testbeds • Production deployments underway at: • NSF PACIs National Technology Grid • NASA Information Power Grid • DOE ASCI • European Grid • Research testbeds • EMERGE: Advance reservation & QoS • GUSTO: Globus Ubiquitous Supercomputing Testbed Organization • Particle Physics Data Grid • World-Wide Grid (WWG)
Production Grids & Testbeds NASA’s Information Power Grid The Alliance National Technology Grid GUSTO Testbed
World Wide Grid (WWG) testbed @ SC 2002, Baltimore (figure) • Components shown: GMonitor, Gridbus + Nimrod-G, MEG Visualisation, Solaris WS, Grid Market Directory • Australia • Melbourne + Monash U: VPAC, Physics • North America • ANL: SGI/Sun/SP2 • NCSA: Cluster • Wisc: PC/cluster • NRC, Canada • Many others • Europe • ZIB: T3E/Onyx • AEI: Onyx • CNR: Cluster • CUNI/CZ: Onyx • Poznan: SGI/SP2 • Vrije U: Cluster • Cardiff: Sun E6500 • Portsmouth: Linux PC • Manchester: O3K • Cambridge: SGI • Many others • Asia • AIST, Japan: Solaris Cluster • Osaka University: Cluster • Doshisha: Linux cluster • Korea: Linux cluster • Sites are linked via the Internet
Example Applications Projects (via Nimrod-G or Gridbus) • Molecular Docking for Drug Discovery • Docking molecules from chemical databases with target protein • Neuro Science • Brain Activity Analysis • High Energy Physics • Belle Detector Data Analysis • Natural Language Engineering • Analyzing audio data (e.g., to identify emotional state of a person!)
Example Application Projects • Computed microtomography (ANL, ISI) • Real-time, collaborative analysis of data from X-Ray source (and electron microscope) • Hydrology (ISI, UMD, UT; also NCSA, Wisc.) • Interactive modeling and data analysis • Collaborative engineering (“tele-immersion”) • CAVERNsoft @ EVL • OVERFLOW (NASA) • Large CFD simulations for aerospace vehicles
Example Application Experiments • Distributed interactive simulation (CIT, ISI) • Record-setting SF-Express simulation • Cactus • Astrophysics simulation, viz, and steering • Including trans-Atlantic experiments • Particle Physics Data Grid • High Energy Physics distributed data analysis • Earth Systems Grid • Climate modeling data management
The Globus Advantage • Flexible Resource Specification Language which provides the necessary power to express the required constraints • Services for resource co-allocation, executable staging, remote data access and I/O streaming • Integration of these services into high-level tools • MPICH-G: grid-enabled MPI • globus-job-*: flexible remote execution commands • Nimrod-G Grid Resource broker • Gridbus: Grid Business Infrastructure • Condor-G: high-throughput broker • PBS, GRD: meta-schedulers
Resource Management • Resource Specification Language (RSL) is used to communicate requirements • The Globus Resource Allocation Manager (GRAM) API allows programs to be started on remote resources, despite local heterogeneity • A layered architecture allows application-specific resource brokers and co-allocators to be defined in terms of GRAM services
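Resource Management Architecture (figure) • Application passes RSL to a Broker • Broker performs RSL specialization, exchanging queries & info with the Information Service • Broker passes ground RSL to a Co-allocator • Co-allocator passes simple ground RSL to the GRAMs • Each GRAM fronts a local resource manager: LSF, EASY-LL, NQE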
GRAM Components (figure) • Client: MDS client API calls to locate resources (MDS: Grid Index Info Server) • Client: MDS client API calls to get resource info (MDS: Grid Resource Info Server) • Client: GRAM client API calls to request resource allocation and process creation • Client receives GRAM client API state change callbacks • Inside the site boundary, behind the Globus Security Infrastructure: Gatekeeper creates a Job Manager • Job Manager parses the request using the RSL Library • Job Manager sends the request to the Local Resource Manager, which allocates & creates processes • Job Manager monitors & controls the processes • Grid Resource Info Server queries current status of resource
A simple run • [raj@belle raj]$ globus-job-run belle.anu.edu.au /bin/date • Mon May 3 15:05:42 EST 2004
Resource Specification Language (RSL) • Common notation for exchange of information between components • Syntax similar to MDS/LDAP filters • RSL provides two types of information: • Resource requirements: Machine type, number of nodes, memory, etc. • Job configuration: Directory, executable, args, environment • API provided for manipulating RSL
RSL Syntax • Elementary form: parenthesized clauses • (attribute op value [ value … ] ) • Operators supported: • <, <=, =, >=, >, != • Some supported attributes: • executable, arguments, environment, stdin, stdout, stderr, resourceManagerContact, resourceManagerName • Unknown attributes are passed through • May be handled by subsequent tools
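A minimal sketch, in Python, of how a single clause in the elementary form could be assembled. The helper name `rsl_relation` is hypothetical and is not part of any Globus API; it only formats strings and checks the operator against the list on the slide.

```python
# Hypothetical helper (not a Globus API): format one RSL relation
# of the form (attribute op value [value ...]).
RSL_OPERATORS = {"<", "<=", "=", ">=", ">", "!="}

def rsl_relation(attribute, op, *values):
    if op not in RSL_OPERATORS:
        raise ValueError(f"unsupported RSL operator: {op!r}")
    if not values:
        raise ValueError("at least one value is required")
    joined = " ".join(str(v) for v in values)
    return f"({attribute}{op}{joined})"

print(rsl_relation("executable", "=", "/bin/date"))  # (executable=/bin/date)
print(rsl_relation("count", ">=", 5))                # (count>=5)
```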
Constraints: “&” • globusrun -o -r belle.anu.edu.au "&(executable=/bin/date)" • For example: & (count>=5) (count<=10) (max_time=240) (memory>=64) (executable=myprog) “Create 5-10 instances of myprog, each on a machine with at least 64 MB memory that is available to me for 4 hours”
Disjunction: “|” • For example: • & (executable=myprog) • ( | (&(count=5)(memory>=64)) • (&(count=10)(memory>=32))) • Create 5 instances of myprog on a machine that has at least 64MB of memory, or 10 instances on a machine with at least 32MB of memory
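The way “&” and “|” combine can be illustrated with a toy evaluator in Python. This is a sketch under stated assumptions, not how GRAM or a broker actually matches requests: expressions are written as nested tuples rather than parsed from RSL text, and any attribute missing from the machine description (such as executable or count) is treated as job configuration and skipped.

```python
import operator

OPS = {"<": operator.lt, "<=": operator.le, "=": operator.eq,
       ">=": operator.ge, ">": operator.gt, "!=": operator.ne}

def matches(expr, machine):
    """Return True if `machine` (a dict) satisfies the expression tree."""
    kind = expr[0]
    if kind == "&":                 # conjunction: every clause must hold
        return all(matches(e, machine) for e in expr[1:])
    if kind == "|":                 # disjunction: at least one clause holds
        return any(matches(e, machine) for e in expr[1:])
    attribute, op, value = expr     # leaf relation
    if attribute not in machine:    # assumption: not a machine property
        return True
    return OPS[op](machine[attribute], value)

# The disjunction example from the slide, as nested tuples.
request = ("&", ("executable", "=", "myprog"),
           ("|", ("&", ("count", "=", 5), ("memory", ">=", 64)),
                 ("&", ("count", "=", 10), ("memory", ">=", 32))))

print(matches(request, {"memory": 48}))  # True  (second alternative)
print(matches(request, {"memory": 16}))  # False (neither alternative)
```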
Multirequest: “+” • A multi-request allows us to specify multiple resource needs, for example: + (& (count=5)(memory>=64) (executable=p1)) (&(network=atm) (executable=p2)) • Execute 5 instances of p1 on a machine with at least 64 MB of memory • Execute p2 on a machine with an ATM connection • Multirequests are central to co-allocation
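As a rough illustration of why a multi-request maps naturally onto co-allocation, the Python sketch below splits a “+” request into its top-level subrequests, each of which could then go to a different resource manager. The function name `split_multirequest` is hypothetical; a real tool would use the RSL API rather than this simple parenthesis counter.

```python
def split_multirequest(rsl):
    """Split '+ (...) (...)' into its top-level parenthesised subrequests."""
    text = rsl.strip()
    if not text.startswith("+"):
        raise ValueError("not a multi-request")
    parts, depth, start = [], 0, None
    for i, ch in enumerate(text):
        if ch == "(":
            if depth == 0:
                start = i           # opening of a top-level subrequest
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced parentheses")
            if depth == 0:
                parts.append(text[start + 1:i].strip())
    if depth != 0:
        raise ValueError("unbalanced parentheses")
    return parts

multi = ("+ (& (count=5)(memory>=64) (executable=p1)) "
         "(&(network=atm) (executable=p2))")
for sub in split_multirequest(multi):
    print(sub)
```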
Co-allocation • Simultaneous allocation of a resource set • Handled via optimistic co-allocation based on free nodes or queue prediction • In the future, advance reservations will also be supported • globusrun and globus-job-* will co-allocate specific multi-requests • Uses a Globus component called the Dynamically Updated Request Online Co-allocator (DUROC)
DUROC Functions • Submit a multi-request • Edit a pending request • Add new nodes, edit out failed nodes • Commit to configuration • Delay to last possible minute • Barrier synchronization • Initialize computation • Bootstrap library • Monitor and control collection
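The edit, commit, and barrier steps can be mimicked locally with threads. The Python sketch below is only an analogy and does not use the DUROC API: subjob names are made up, "failure" is simulated by a set, and `threading.Barrier` stands in for the barrier that holds every subjob until the whole collection has started.

```python
import threading

def run_coallocated(subjobs):
    """Toy model: start every subjob, release them together at a barrier."""
    barrier = threading.Barrier(len(subjobs))
    released = []
    lock = threading.Lock()

    def subjob(name):
        barrier.wait()              # block until all subjobs have checked in
        with lock:
            released.append(name)   # "initialize computation"

    threads = [threading.Thread(target=subjob, args=(n,)) for n in subjobs]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(released)

pending = ["job1@RM1", "job2@RM2", "job3@RM3"]         # submitted multi-request
failed = {"job2@RM2"}                                  # pretend RM2 never responded
committed = [j for j in pending if j not in failed]    # edit out failed nodes
print(run_coallocated(committed))                      # ['job1@RM1', 'job3@RM3']
```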
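DUROC Architecture (figure) • Controlling Application sends an RSL multi-request to DUROC and may edit the request • Subjobs go to resource managers RM1 to RM4 • Controlled jobs: Job 1 to Job 5 • Subjob status is reported back • Jobs synchronize at a barrier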
RSL Creation Using globus-job-run • globus-job-run can be used to generate RSL from command-line args: globus-job-run -dumprsl \ -: host1 -np N1 [-s] executable1 args1 \ -: host2 -np N2 [-s] executable2 args2 \ ... > rslfile • -np: number of processors • -s: stage file • argument options for all RSL keywords • -help: description of all options
Job Submission Interfaces • Globus Toolkit includes several command line programs for job submission • globus-job-run: Interactive jobs • globus-job-submit: Batch/offline jobs • globusrun: Flexible scripting infrastructure • Other High Level Interfaces • General purpose • Nimrod-G, Condor-G, PBS, GRD, etc • Application specific • ECCE’, Cactus, Web portals
globus-job-run • For running of interactive jobs • Additional functionality beyond rsh • Ex: Run 2 process job w/ executable staging globus-job-run -: host -np 2 -s myprog arg1 arg2 • Ex: Run 5 processes across 2 hosts globus-job-run \ -: host1 -np 2 -s myprog.linux arg1 \ -: host2 -np 3 -s myprog.aix arg2 • For list of arguments run: globus-job-run -help
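When such command lines are generated from a script, building the argument list programmatically avoids quoting mistakes. The Python sketch below assembles the two-host example; the function `job_run_argv` and the dictionary keys are hypothetical, and only flags that appear on these slides (-dumprsl, -:, -np, -s) are used. It builds the list but does not execute anything.

```python
def job_run_argv(subjobs, dump_rsl=False):
    """Build an argument list for globus-job-run (nothing is executed).

    subjobs: list of dicts with keys host, np, executable and,
    optionally, args (list of str) and stage (bool).
    """
    argv = ["globus-job-run"]
    if dump_rsl:
        argv.append("-dumprsl")
    for job in subjobs:
        argv += ["-:", job["host"], "-np", str(job["np"])]
        if job.get("stage"):
            argv.append("-s")       # stage the executable to the remote host
        argv.append(job["executable"])
        argv += list(job.get("args", []))
    return argv

argv = job_run_argv([
    {"host": "host1", "np": 2, "stage": True,
     "executable": "myprog.linux", "args": ["arg1"]},
    {"host": "host2", "np": 3, "stage": True,
     "executable": "myprog.aix", "args": ["arg2"]},
])
print(" ".join(argv))
```

The resulting list could be handed to `subprocess.run` on a machine where the Globus Toolkit is installed.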
globus-job-submit • For running of batch/offline jobs • globus-job-submit Submit job • Same interface as globus-job-run • Returns immediately • globus-job-status Check job status • globus-job-cancel Cancel job • globus-job-get-output Get job stdout/err • globus-job-clean Cleanup after job
globusrun • Flexible job submission for scripting • Uses an RSL string to specify job request • Contains an embedded globus-gass-server • Defines GASS URL prefix in RSL substitution variable: (stdout=$(GLOBUSRUN_GASS_URL)/stdout) • Supports both interactive and offline jobs • Complex to use • Must write RSL by hand • Must understand its esoteric features • Generally you should use globus-job-* commands instead
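To make the substitution variable concrete, here is a small Python sketch of the kind of textual expansion that turns `$(GLOBUSRUN_GASS_URL)` into the address of the embedded GASS server. The function `expand_rsl_variables` is hypothetical and the URL value is invented for illustration; globusrun performs the real substitution itself.

```python
import re

def expand_rsl_variables(rsl, variables):
    """Replace each $(NAME) in `rsl` with variables[NAME]."""
    def lookup(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"undefined RSL variable: {name}")
        return variables[name]
    return re.sub(r"\$\(([A-Za-z_][A-Za-z0-9_]*)\)", lookup, rsl)

rsl = "&(executable=/bin/date)(stdout=$(GLOBUSRUN_GASS_URL)/stdout)"
# The URL below is a made-up value, used only to show the expansion.
expanded = expand_rsl_variables(
    rsl, {"GLOBUSRUN_GASS_URL": "https://client.example.org:20001"})
print(expanded)
```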
Resource Brokers (figure) • “Run a distributed interactive simulation involving 100,000 entities” → DIS-Specific Broker • “Perform a parameter study involving 10,000 separate trials” → Parameter study specific broker → " . . ." • “Create a shared virtual space with participants X, Y, and Z” → Collaborative environment-specific resource broker → " . . ." • Brokers consult the Information Service • “Supercomputers providing 100 GFLOPS, 100 GB, < 100 msec latency” → Supercomputer resource broker • “80 nodes on Argonne SP, 256 nodes on CIT Exemplar, 300 nodes on NCSA O2000” → Simultaneous start co-allocator • “Run SF-Express on 80 nodes” → Argonne Resource Manager • “Run SF-Express on 256 nodes” → CIT Resource Manager • “Run SF-Express on 300 nodes” → NCSA Resource Manager
Brokering via Lowering • Resource location by refining an RSL expression (RSL lowering): (MFLOPS=1000) ⇒ (& (arch=sp2)(count=200)) ⇒ (+ (& (arch=sp2) (count=120) (resourceManagerContact=anlsp2)) (& (arch=sp2) (count=80) (resourceManagerContact=uhsp2)))
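The last lowering step, from a node count to a multi-request over concrete resource managers, can be sketched in Python as a greedy split. This is an assumed strategy for illustration, not the algorithm of any particular broker; the function name and the `free_nodes` table (which a broker might fill from the information service) are hypothetical.

```python
def lower_to_multirequest(arch, count, free_nodes):
    """Split `count` nodes of `arch` across resource managers.

    free_nodes maps a resourceManagerContact to its free node count.
    Managers are filled greedily in the order given.
    """
    remaining, parts = count, []
    for contact, free in free_nodes.items():
        if remaining == 0:
            break
        take = min(free, remaining)
        if take > 0:
            parts.append(f"(& (arch={arch}) (count={take}) "
                         f"(resourceManagerContact={contact}))")
            remaining -= take
    if remaining:
        raise RuntimeError(f"{remaining} nodes could not be placed")
    return "(+ " + " ".join(parts) + ")"

lowered = lower_to_multirequest("sp2", 200, {"anlsp2": 120, "uhsp2": 80})
print(lowered)
```

With the node counts chosen here, the output reproduces the final expression on the slide.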
Remote I/O and Staging • Tell GRAM to pull executable from remote location • Access files from a remote location • stdin/stdout/stderr from a remote location
What is GASS? (a) GASS file access API • Replace open/close with globus_gass_open/close; read/write calls can then proceed directly (b) RSL extensions • URLs used to name executables, stdout, stderr (c) Remote cache management utility (d) Low-level APIs for specialized behaviors
GASS Architecture (figure) • (a) GASS file access API: main( ) { fd = globus_gass_open(…) … read(fd,…) … globus_gass_close(fd) } • (b) RSL extensions: &(executable=https://…) passed to GRAM • (c) Remote cache management: % globus-gass-cache • (d) Low-level APIs for customizing cache & GASS server • Cache sits between the application and the remote GASS Server, HTTP Server, and FTP Server
GASS File Naming • URL encoding of resource names • https://quad.mcs.anl.gov:9991/~bester/myjob • Parts: protocol (https), server address (quad.mcs.anl.gov:9991), file name (/~bester/myjob) • Other examples • https://pitcairn.mcs.anl.gov/tmp/input_dataset.1 • https://pitcairn.mcs.anl.gov:2222/./output_data • http://www.globus.org/~bester/input_dataset.2 • Supports http & https • Supports ftp & gsiftp
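The three parts of a GASS file name are ordinary URL components, so a standard URL parser can separate them. A minimal Python sketch using `urllib.parse` from the standard library; the function name `parse_gass_url` and the dictionary keys are illustrative choices.

```python
from urllib.parse import urlsplit

def parse_gass_url(url):
    """Split a GASS URL into protocol, server address, and file name."""
    parts = urlsplit(url)
    return {"protocol": parts.scheme, "host": parts.hostname,
            "port": parts.port, "file": parts.path}

info = parse_gass_url("https://quad.mcs.anl.gov:9991/~bester/myjob")
print(info)
# {'protocol': 'https', 'host': 'quad.mcs.anl.gov', 'port': 9991, 'file': '/~bester/myjob'}
```

When the URL carries no explicit port, `port` is `None` and the protocol's default applies.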
GASS RSL Extensions • executable, stdin, stdout, stderr can be local files or URLs • executable and stdin loaded into local cache before job begins (on front-end node) • stdout, stderr handled via GASS append mode • Cache cleaned after job completes
GASS/RSL Example &(executable=https://quad:1234/~/myexe) (stdin=https://quad:1234/~/myin) (stdout=/home/bester/output) (stderr=https://quad:1234/dev/stdout)
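To see which attributes in this example would go through GASS and which name plain local files, the Python sketch below classifies each simple `(attribute=value)` relation by its URL scheme. The regular expression only handles flat relations like the ones shown, and `classify_rsl_files` is a hypothetical name, not a Globus function.

```python
import re
from urllib.parse import urlsplit

GASS_SCHEMES = {"http", "https", "ftp", "gsiftp"}

def classify_rsl_files(rsl):
    """Map each simple (attribute=value) relation to 'url' or 'local'."""
    result = {}
    for attribute, value in re.findall(r"\((\w+)=([^()\s]+)\)", rsl):
        scheme = urlsplit(value).scheme
        result[attribute] = "url" if scheme in GASS_SCHEMES else "local"
    return result

example = ("&(executable=https://quad:1234/~/myexe) "
           "(stdin=https://quad:1234/~/myin) "
           "(stdout=/home/bester/output) "
           "(stderr=https://quad:1234/dev/stdout)")
print(classify_rsl_files(example))
# {'executable': 'url', 'stdin': 'url', 'stdout': 'local', 'stderr': 'url'}
```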
Example GASS Applications • On-demand, transparent loading of data sets • Caching of data sets • Automatic staging of code and data to remote supercomputers • (Near) real-time logging of application output to remote server
GASS File Access API • Minimum changes to application • globus_gass_open(), globus_gass_close() • Same as open(), close() but use URLs instead of filenames • Caches URL in case of multiple opens • Return descriptors to files in local cache or sockets to remote server • globus_gass_fopen(), globus_gass_fclose()
GASS File Access API (cont) • Support for different access patterns • Read-only (from local cache) • Write-only (to local cache) • Read-write (to/from local cache) • Write-only, append (to remote server)
globus_gass_open()/close() flow (figure) • globus_gass_open(): URL in cache? • no: download file into cache • yes: open cached file, add cache reference • globus_gass_close(): Modified? • yes: upload changes • no: nothing to upload • Remove cache reference
GASS File API Semantics • Copy on open to cache, unless the file is opened for truncate or write-only append, or is already in cache • Copy on close from cache, unless the file is read-only or other copies are still open • Multiple globus_gass_open() calls share local copy of file • Append to remote file if write-only append, e.g., for stdout and stderr • Reference counting keeps track of open files
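These rules can be modelled with a few dictionaries. The Python sketch below is a toy model of the read-write case only, with no network I/O: a dict stands in for the remote servers, the class and method names are hypothetical, and dropping the cached copy when the last reference goes away is a simplifying assumption.

```python
class GassCacheModel:
    """Toy model of the open/close cache semantics; no network I/O."""

    def __init__(self, remote):
        self.remote = remote    # dict: URL -> contents (stands in for servers)
        self.cache = {}         # URL -> locally cached contents
        self.refs = {}          # URL -> number of outstanding opens
        self.dirty = set()      # URLs modified in the local cache

    def open(self, url):
        if url not in self.cache:               # copy on open
            self.cache[url] = self.remote[url]
        self.refs[url] = self.refs.get(url, 0) + 1

    def write(self, url, data):
        self.cache[url] = data
        self.dirty.add(url)

    def close(self, url):
        self.refs[url] -= 1
        if self.refs[url] == 0:                 # no other copies open
            if url in self.dirty:               # copy on close
                self.remote[url] = self.cache[url]
                self.dirty.discard(url)
            del self.refs[url], self.cache[url]

url = "https://quad:1234/~/data"
remote = {url: "v1"}
gass = GassCacheModel(remote)
gass.open(url)
gass.open(url)              # second open shares the local copy
gass.write(url, "v2")
gass.close(url)             # another copy is still open: nothing uploaded
print(remote[url])          # v1
gass.close(url)             # last close uploads the change
print(remote[url])          # v2
```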
globus-gass-server • Simple file server • Run by user wherever necessary • Secure https protocol, using GSI • APIs for embedding server into other programs • Example: globus-gass-server -r -w -t • -r: Allow files to be read from this server • -w: Allow files to be written to this server • -t: Tilde expand (~/… → $(HOME)/…) • -help: For list of all options
GRAM & GASS: Putting It Together (figure) • Inputs to globus-job-run: host name, command line args • 1. Derive contact string (from host name) • 2. Build RSL string (from command line args) • 3. Start up GASS server • 4. Submit request, via gatekeeper and jobmanager, to start the program • 5. Return output: program stdout travels back through the GASS server to the user's stdout