A Look at Globus Grids Leesa Brieger
What is a GRID? A GRID must: • Coordinate resources that are not under centralized control • Use standard, open, general-purpose protocols and interfaces • Deliver non-trivial quality of service
Not a grid: • a cluster • a network • a network-attached device These are important components of a grid, not the grid itself.
Why a GRID? • Construction and mining of large (distributed) databases • National Virtual Observatory - widespread access to and use of astronomical data collections; mosaicking services • Protein Data Bank - single worldwide repository for the processing and distribution of 3-D biological macromolecular structure data; CE portal - allows structural comparison of proteins • Develop community resources (computational and data) • Large Hadron Collider particle physics experiments at CERN • Biomedical Informatics Research Network - to develop a "protocol" for collaborative research among neuroscientists and medical scientists in the neurosciences
Why a GRID? • Access specialized devices remotely • Brain imaging – data acquisition in situ, fast imaging on a (distant) supercomputer • Networks of sensors to accumulate field measurements for environmental studies • Develop simulations and analyses • ENZO astronomy project • Outsourcing becomes feasible • Geo-Services rents computing time from IBM
GRID Technology A GRID must: • allow the establishment and evolution of virtual organizations • manage security, trust, privacy • manage and monitor distributed resources • Computers, databases, networks, storage, software • federate data • discover available data, resources, services And all in a robust, secure, scalable, interoperable fashion.
How to enable Grid Technology? With APIs, standard protocols, services: • Protocol-mediated access to grid resources • Services must “speak” intergrid protocols • APIs form interface to grid protocols and facilitate application development
Open standards, so all can participate Global Grid Forum (GGF) - standards organization (www.gridforum.org) Open Grid Services Architecture (OGSA) – standard interface definitions (in evolution) Open source to reduce barriers to entry The Globus Project (www.globus.org)
Globus Toolkit • Open-source reference software base for developing Grid infrastructure and apps • Implements GGF standards • Service-oriented • Services can be decoupled from any fixed resource • A service consumes resources, but how it does so is secondary • A better base abstraction for managing dependability, end-to-end quality of service
Globus Toolkit • Vendor-neutral • Moving over to OGSA structure (GT3) • embrace web services technologies • standard interfaces and behaviors for distributed system management • unify resources/services/information • easier integration with hosting environment and improved extensibility • leverage commercial efforts
Grid Requirements • identity & authentication • authorization & policy • resource/service discovery • resource allocation • (co-)reservation, workflow • remote data access • rapid data transfer • monitoring • intrusion detection • resource management • accounting • fault management • system evolution
Layered Grid Architecture • Fabric Layer - provides the local services of a resource: • computational, storage, network • Connectivity Layer - core communication and authentication protocols • Enables exchange of data between fabric layer resources • Security and authentication important here
Layered Grid Architecture (2) • Resource Layer – enables resource sharing • Builds on connectivity layer to control and access resources (Ex: data servers) • Collective Layer - coordinates interactions across multiple resources • Ties multiple resources and services together (Ex: metacatalogues) • Application Layer - user applications use collective, resource, and connectivity layers to perform grid operations in a virtual organization
Layered Grid Architecture • Application (User) - specialized services: user- or application-specific distributed services • Collective - managing multiple resources: ubiquitous infrastructure services • Resource - sharing resources: negotiating access, controlling use • Connectivity - talking to things: communication (Internet protocols) and security • Fabric - controlling things locally: access to and control of resources
Globus Protocols - Connectivity Layer Grid Security Infrastructure (GSI): • Authentication/authorization, message protection across institutions • Single sign-on, delegation, identity mapping • Public key technology • Certificate authorities, certificate & key management
Globus Protocols - Resource Layer • Grid Resource Allocation Management (GRAM) • Remote allocation, control of compute resources • Furnishes information on state of the resources to the Metacomputing Directory Service (MDS) • GridFTP • High-performance data access and transport • Grid Resource Information Service (GRIS) • Access to structure and state info (MDS) • All built on connectivity layer
Globus Protocols - Collective Layer • Metadirectory services • SRB MCAT metacatalogue • Resource brokers • Condor Matchmaker • Co-reservation/co-allocation services • Workflow management services
Grid Security Infrastructure (GSI) Public key cryptography (asymmetric cryptography): • Encryption relies on two keys, related mathematically so that if either key encrypts a message, the other must be used to decrypt it • One key is public, the other is kept private • A user proves identity by encrypting a message with the private key; if the public key decrypts it correctly, the user must indeed hold the private key • No password is ever exchanged
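The key-pair property on this slide can be illustrated with textbook RSA on deliberately tiny numbers. This is only a toy sketch, not how GSI itself is implemented: the primes, the exponent, and the message are made-up values, and real deployments use large keys with padding from a vetted cryptographic library.

```python
# Toy RSA with tiny, insecure numbers (illustration only).
p, q = 61, 53                  # hypothetical small primes
n = p * q                      # modulus, part of both keys
phi = (p - 1) * (q - 1)
e = 17                         # public exponent
d = pow(e, -1, phi)            # private exponent: modular inverse (Python 3.8+)


def apply_key(value: int, exponent: int) -> int:
    """Raise value to the given exponent modulo n; both keys use this operation."""
    return pow(value, exponent, n)


message = 1234                 # any integer smaller than n

# Encrypt with the public key, decrypt with the private key.
assert apply_key(apply_key(message, e), d) == message

# Encrypt with the private key, decrypt with the public key:
# anyone holding the public key can check that the sender holds the private key.
proof = apply_key(message, d)
assert apply_key(proof, e) == message
```

Neither direction needs a shared password: only the public half (n, e) is ever handed out.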
GSI: Certificates • Globus uses the X.509 certification system to provide authentication services. • X.509 certificates identify and authenticate users and services on the grid. • Certificates contain: • subject name: entity represented by the certificate • public key belonging to the subject • identity of a Certificate Authority (CA) that has signed (issued) the certificate certifying that both public key and identity belong to the subject • digital signature of the named CA
GSI: Certificates A Certificate Authority’s purpose is to offer a third-party verification of the link between a public key and the Common Name (CN) in a certificate. A certificate establishes/verifies the connection between a public key and a Distinguished Name (DN) at any site.
GSI: Authentication Mutual Authentication between A & B: • Each party must trust the CA who signed the other party’s certificate • A sends B his certificate • B checks the certificate for a valid signature of the CA to see if it really came from a trusted CA • B generates a random message, asks A to encrypt it with his private key, and decrypts the result using A’s public key • If the decrypted text matches the original message, B now trusts A • Same procedure in reverse for A to establish trust in B
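The handshake above can be sketched as a small simulation. It assumes toy RSA keys built from tiny primes and a drastically simplified "certificate" (subject name, public modulus, CA signature); every name here (`KeyPair`, `issue`, `authenticate`) is hypothetical and none of it is the Globus API.

```python
import hashlib
import secrets
from dataclasses import dataclass

E = 17  # public exponent shared by all toy keys (assumption for brevity)


@dataclass
class KeyPair:
    n: int  # modulus (public)
    d: int  # private exponent (secret)


def make_keypair(p: int, q: int) -> KeyPair:
    """Build a toy RSA key pair from two small primes."""
    return KeyPair(n=p * q, d=pow(E, -1, (p - 1) * (q - 1)))


def digest(subject: str, public_n: int, modulus: int) -> int:
    """Hash the certificate contents to an integer below the signer's modulus."""
    raw = hashlib.sha256(f"{subject}|{public_n}".encode()).digest()
    return int.from_bytes(raw, "big") % modulus


@dataclass
class Certificate:
    subject: str
    public_n: int
    signature: int  # CA signature over (subject, public_n)


def issue(ca: KeyPair, subject: str, key: KeyPair) -> Certificate:
    """The CA signs the link between a subject name and a public key."""
    return Certificate(subject, key.n, pow(digest(subject, key.n, ca.n), ca.d, ca.n))


def authenticate(cert: Certificate, ca_public_n: int, sign_challenge) -> bool:
    """One direction of the handshake: the verifier checks the peer's certificate."""
    # Step 1: was the certificate really signed by the trusted CA?
    expected = digest(cert.subject, cert.public_n, ca_public_n)
    if pow(cert.signature, E, ca_public_n) != expected:
        return False
    # Step 2: random challenge; the peer must answer using its private key.
    challenge = secrets.randbelow(cert.public_n - 2) + 2
    response = sign_challenge(challenge)
    # Step 3: decrypt the response with the public key from the certificate.
    return pow(response, E, cert.public_n) == challenge


ca = make_keypair(61, 53)
key_a, key_b = make_keypair(67, 71), make_keypair(73, 79)
cert_a, cert_b = issue(ca, "A", key_a), issue(ca, "B", key_b)

# B authenticates A, then A authenticates B (mutual authentication).
b_trusts_a = authenticate(cert_a, ca.n, lambda c: pow(c, key_a.d, key_a.n))
a_trusts_b = authenticate(cert_b, ca.n, lambda c: pow(c, key_b.d, key_b.n))
print(b_trusts_a and a_trusts_b)
```

A party that presents A's certificate without holding A's private key cannot produce a valid response to the challenge, so the second step fails even though the certificate itself is genuine.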
GSI: User Setup To access grid resources using GSI, must: • hold a valid account on the login resources • procure a certificate and private key from a trusted CA • ensure that the Distinguished Name (DN) is present in the grid-mapfile at each site • grid-mapfile entry establishes the connection between a certificate holder and a valid (local) user • For Teragrid machines, go to http://accounts.teragrid.org
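As a rough illustration of the grid-mapfile lookup, the sketch below parses the common one-entry-per-line layout: a quoted DN followed by a local user name. The sample entry reuses the subject shown in the sample certificate of this deck; the function name is hypothetical, and a real grid-mapfile may carry features (such as several local names per DN) that this sketch ignores.

```python
import shlex


def parse_grid_mapfile(text: str) -> dict:
    """Map each quoted Distinguished Name to its local user name."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # shlex honours the double quotes around the DN, which contains spaces
        dn, local_user = shlex.split(line)[:2]
        mapping[dn] = local_user
    return mapping


sample = '''
# "Distinguished Name" local_user
"/C=US/O=NPACI/OU=SDSC/CN=Leesa Brieger/USERID=leesa" leesa
'''

users = parse_grid_mapfile(sample)
print(users["/C=US/O=NPACI/OU=SDSC/CN=Leesa Brieger/USERID=leesa"])
```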
GSI: Keys • Private keys • typically stored in directory ~/.globus • should be readable only by user • encrypted via password (pass phrase) • ~/.globus: • usercert.pem: certificate signed by your CA • userkey.pem: encrypted private key file
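The "readable only by user" rule can be checked programmatically. The sketch below assumes a POSIX system and uses a throwaway temporary file in place of a real ~/.globus/userkey.pem; `key_is_private` is a hypothetical helper name.

```python
import os
import stat
import tempfile


def key_is_private(path: str) -> bool:
    """Return True if neither group nor others have any permission on the file."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & 0o077) == 0


# Demonstration on a throwaway file standing in for userkey.pem.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    demo_path = handle.name

os.chmod(demo_path, 0o600)           # owner read/write only
print(key_is_private(demo_path))

os.chmod(demo_path, 0o644)           # also readable by group and others
print(key_is_private(demo_path))

os.remove(demo_path)
```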
GSI: A Sample Certificate (1)
Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number: 314 (0x13a)
        Signature Algorithm: md5WithRSAEncryption
        Issuer: C=US, O=NPACI, OU=SDSC, CN=Certificate Manager/USERID=certman
        Validity
            Not Before: Jun 20 22:25:59 2002 GMT
            Not After : Jun 20 22:25:59 2006 GMT
        Subject: C=US, O=NPACI, OU=SDSC, CN=Leesa Brieger/USERID=leesa
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
            RSA Public Key: (1024 bit)
GSI: A Sample Certificate (2)
                Modulus (1024 bit):
                    00:e5:a4:d1:41:34:d0:39:31:e6:02:1a:d9:a2:de:
                    <snip>
                    af:15:9a:17:f3:6c:59:9c:ef
                Exponent: 65537 (0x10001)
        X509v3 extensions:
            X509v3 Basic Constraints:
                CA:FALSE
            Netscape Cert Type:
                SSL Client, S/MIME, Object Signing
            Netscape Comment:
                OpenSSL Generated Certificate
            X509v3 Subject Key Identifier:
                A8:BD:02:2D:B1:4C:0A:74:B6:9D:6E:57:AB:D0:1F: . . .
            X509v3 Authority Key Identifier:
                keyid:66:CC:08:D9:FD:63:0F:CA:A6:24:56:86:E7:0B: . . .
GSI: A Sample Certificate (3)
                DirName:/C=US/O=NPACI/OU=SDSC/CN=Certificate Manager/USERID=certman
                serial:00
    Signature Algorithm: md5WithRSAEncryption
        49:73:c4:ce:e6:9c:53:08:61:4b:2a:bb:02:6e:b5:38:ab:df:
        <snip>
        5e:0f:73:50
-----BEGIN CERTIFICATE-----
MIIDtTCCAp2gAwIBAgICATowDQYJKoZIhvcNAQEEBQAwYzELMAkGA1UEBhMCVVMx
<snip>
Ib2a9AyA9coNnshWg+sWs6xTk0wWXMf4tHiS7dTLPQle1Gav5V4Pc1A=
-----END CERTIFICATE-----
GSI: Keys • To use GSI, must enter pass phrase which decrypts the private key • Delegation capability: • single sign-on allows access to all the grid resources without further authorization through creation of a proxy • A proxy consists of a new certificate and a new private key (written to /tmp).
GSI: Proxies • Proxy certificate: • contains a new public key • contains owner's identity, modified to indicate that this is a proxy • signed by owner, rather than by a CA • limited lifetime • Proxy’s private key: • not password-encrypted, limited lifetime • goes into /tmp, readable only by user ( Ex: /tmp/x509up_u12345 )
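A short sketch of how a client-side script might locate the proxy file. It assumes the naming pattern shown in the example on this slide (/tmp/x509up_u followed by the numeric user id) and assumes that the X509_USER_PROXY environment variable, when set, overrides that default; `proxy_path` is a hypothetical helper, not a Globus function.

```python
import os


def proxy_path(environ=None, uid=None) -> str:
    """Return the expected location of the user's proxy file."""
    environ = os.environ if environ is None else environ
    uid = os.getuid() if uid is None else uid      # POSIX only
    # Assumption: an explicit X509_USER_PROXY wins over the /tmp default.
    return environ.get("X509_USER_PROXY") or f"/tmp/x509up_u{uid}"


print(proxy_path(environ={}, uid=12345))
```

Passing `environ` and `uid` explicitly keeps the function easy to test; called without arguments it reads the real environment and user id.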
GSI: Some commands • grid-proxy-init: create a full proxy • creates the proxy private key in /tmp • grid-proxy-destroy • destroys the proxy private key in /tmp • grid-proxy-info • examines proxy file in /tmp • grid-cert-info • examines certificate file ~/.globus/usercert.pem • grid-change-pass-phrase
GSI: Some commands (2) GSI-enabled OpenSSH: • gsissh: GSI-enabled ssh • replaces default ssh when there is a proxy • gsiscp: GSI-enabled scp • replaces default scp when there is a proxy Use the -help option to see command usage.
GSI: Some commands (3)
grid-change-pass-phrase -help
    grid-change-pass-phrase [-help] [-version] [-file private_key_file]
    Changes the passphrase that protects the private key. If the -file
    argument is not given, the default location of the file containing
    the private key is assumed:
      -- The location pointed to by X509_USER_KEY
      -- If X509_USER_KEY not set, /users/science/leesa/.globus/userkey.pem
    Options
      -help            Displays usage
      -version         Displays version
      -file location   Change passphrase on key stored in the file at
                       the non-standard location 'location'.
GSI: Notes • grid-proxy-info options (some): • -subject: to see DN (grid-mapfile entry) • -text: shows proxy certificate • -timeleft: time remaining till proxy expires • grid-proxy-init command can only be given on a machine where you have your certificate and private key (~/.globus directory) • Not all proxies are created equal • Full/limited/delegate proxies • MyProxy • Depending on configuration of grid services, authentication may require full proxies or may accept limited proxies
GSI: Ongoing R&D • See www.globus.org/research/papers.html • See www.gridforum.org/security
Resource Management • Grid Resource Allocation Management (GRAM) protocol and client API used to start programs on remote resources • Resource Specification Language (RSL) communicates requirements to remote resource • Layered architecture allows resource brokers and co-allocators to be defined in terms of GRAM services
Grid Resource Allocation Management GRAM allows jobs to run remotely. How? • Job is submitted • Request is sent to gatekeeper (server) of the remote computer • Gatekeeper handles the request and creates a job manager for the job • Job manager starts and monitors the program, communicating state changes back to the user on the local machine • When remote application terminates, normally or by failing, the job manager terminates as well
GRAM • Gatekeeper • a process running as root on the remote computer, listening at a specific port • single point of entry • authenticates user (mutual authentication with client making the allocation request and then mapping to “local” user) • starts job manager on “local” host (as “local” user) • passes allocation arguments to job manager
GRAM • Job Manager • a gatekeeper service, running as “local” user • one job manager for every request to gatekeeper • layers on top of local resource management system (e.g., PBS) • handles all (further) communication with the client about the job
GRAM - States of a job • Unsubmitted - job not yet submitted to the scheduler. No job state callback for this state; introduced for case when job manager is stopped and restarted before the job is submitted. • StageIn - job manager is staging executable, input, or data files to the job. Jobs which do not involve any staging will not enter this state. • Pending - job has been submitted to scheduler, resources not yet allocated for the job. • Active - job has received all its resources; application is executing. • Suspended - job has been stopped temporarily by scheduler. Only some schedulers will cause a job to enter the Suspended state. • StageOut - job manager is staging output files from job manager host to remote storage. Jobs with no staging will not enter this state. • Done - job completed successfully. • Failed - job terminated before completion, as a result of an error, or a user or system cancel.
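The job states listed above can be modelled as a small state machine. The state names come from the slide; the transition table is only one plausible reading of the descriptions (for example, that Suspended is entered from Active and that Failed can be reached from any non-final state), not a statement of what the job manager actually enforces.

```python
from enum import Enum


class JobState(Enum):
    UNSUBMITTED = "Unsubmitted"
    STAGE_IN = "StageIn"
    PENDING = "Pending"
    ACTIVE = "Active"
    SUSPENDED = "Suspended"
    STAGE_OUT = "StageOut"
    DONE = "Done"
    FAILED = "Failed"


S = JobState

# Assumed forward transitions; staging states are optional, as the slide notes.
ALLOWED = {
    S.UNSUBMITTED: {S.STAGE_IN, S.PENDING},
    S.STAGE_IN: {S.PENDING},
    S.PENDING: {S.ACTIVE},
    S.ACTIVE: {S.SUSPENDED, S.STAGE_OUT, S.DONE},
    S.SUSPENDED: {S.ACTIVE},
    S.STAGE_OUT: {S.DONE},
    S.DONE: set(),
    S.FAILED: set(),
}


def can_transition(current: JobState, new: JobState) -> bool:
    """Failed is reachable from any non-final state; otherwise consult the table."""
    if new is S.FAILED:
        return current not in (S.DONE, S.FAILED)
    return new in ALLOWED[current]


def is_valid_history(states) -> bool:
    """Check that every consecutive pair of states is an allowed transition."""
    return all(can_transition(a, b) for a, b in zip(states, states[1:]))


print(is_valid_history([S.UNSUBMITTED, S.PENDING, S.ACTIVE, S.DONE]))
```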
GRAM Environment Variables GRAM Job Manager provides a minimal environment for jobs: • HOME - user's home directory • LOGNAME - user's login name • X509_USER_PROXY - path to job manager's delegated credential (GSI only) • GLOBUS_GRAM_JOB_CONTACT - job manager's contact string for this job • GLOBUS_GRAM_MYJOB_CONTACT - GRAM MyJob contact string for intrajob communication • GLOBUS_LOCATION - path to the Globus installation on the job manager host
GRAM Environment Variables (2) • X509_CERT_DIR - path to a trusted certificate directory (this variable is set only if the -x509-cert-dir argument is given to job manager) • GLOBUS_GASS_CACHE_DEFAULT - path to the job's GASS cache, where output is sent (if the gass_cache RSL attribute is present) • GLOBUS_TCP_PORT_RANGE - system-specific range of TCP ports usable by the job; Globus I/O honors this range. Only present if the related configuration option is present in the job manager configuration file. • GLOBUS_REMOTE_IO_URL - path to a file containing a URL string of a GASS server which the job may access (if the remote_io_url attribute is present).
Job Submission Interfaces Command line programs: • globus-job-run - remote interactive jobs • globus-job-submit - remote batch jobs • globusrun - the others are wrappers around this one • globus-url-copy - remote copy • -help to see usage notes Others: Condor-G, HotPage, web portals
globus-job-run: Examples
• Ping a resource:
    % globusrun -a -r tfglobus.sdsc.edu
    GRAM Authentication test successful
• Run a remote command:
    % globus-job-run tf005i.sdsc.edu /bin/echo '$(GLOBUS_LOCATION)'
    /usr/local/apps/globus-2.2.3
  (correct: the quoted variable is substituted on the remote side)
• Watch out! This is not the same as:
    % globus-job-run tf005i.sdsc.edu /bin/echo $GLOBUS_LOCATION
    /usr/local/apps/nmi-2.1
  (incorrect: the local shell expands the variable before the job is submitted)
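The quoting pitfall in the last two commands can be mimicked in a few lines. This is a simulation under stated assumptions, not Globus code: `local_shell_expand` stands in for the local shell expanding an unquoted `$NAME`, `remote_substitute` stands in for the remote side replacing `$(NAME)`, and the two environment dictionaries simply reuse the paths printed on the slide.

```python
import re

# Paths taken from the slide; which machine holds which is the slide's own claim.
LOCAL_ENV = {"GLOBUS_LOCATION": "/usr/local/apps/nmi-2.1"}
REMOTE_ENV = {"GLOBUS_LOCATION": "/usr/local/apps/globus-2.2.3"}


def local_shell_expand(arg: str, quoted: bool) -> str:
    """Mimic the local shell: unquoted $NAME is expanded, single-quoted text is not."""
    if quoted:
        return arg
    return re.sub(r"\$(\w+)", lambda m: LOCAL_ENV.get(m.group(1), ""), arg)


def remote_substitute(arg: str) -> str:
    """Mimic remote-side substitution of $(NAME) references."""
    return re.sub(r"\$\((\w+)\)", lambda m: REMOTE_ENV.get(m.group(1), ""), arg)


def echoed_on_remote(arg: str, quoted: bool) -> str:
    """What /bin/echo would print remotely for the given command-line argument."""
    return remote_substitute(local_shell_expand(arg, quoted))


print(echoed_on_remote("$(GLOBUS_LOCATION)", quoted=True))    # remote installation path
print(echoed_on_remote("$GLOBUS_LOCATION", quoted=False))     # local path leaks into the job
```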
globus-job-run: Examples
• Can take a look at the remote "globus environment":
    % globus-job-run tf005i.sdsc.edu /bin/printenv
• The executable can be a shell script:
    % globus-job-run tf005i.sdsc.edu -s hello.sh
    Hello from tf005i.sdsc.edu
  hello.sh:
    #!/bin/tcsh -f
    echo -n "Hello from "
    $GLOBUS_LOCATION/bin/globus-hostname
• -s "stages" the executable to the remote machine
globus-job-run: Examples
Additional functionality beyond ssh
• can run jobs remotely when executable resides on local machine via staging
• can stage and run jobs across machines ("-:" indicates multi-requests):
    globus-job-run -args 20 30 \
        -: tf005i -s add.sh 1 3 \
        -: tg64.ncsa.uiuc.edu -s add.sh

    Hello from tf005i.sdsc.edu
    sum is 4
    executable = /paci/sdsc/leesa/.globus/.gass_cache/local/…

    Hello from tg64-u01.ncsa.uiuc.edu
    sum is 50
    executable = /home/ac/leesa/.globus/.gass_cache/local/…
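globus-job-run
add.sh (from previous example):
    #!/bin/csh -f
    echo " "
    # print the host the job landed on
    echo -n "Hello from "
    $GLOBUS_LOCATION/bin/globus-hostname
    # add the two command-line arguments with bc
    echo -n "sum is "
    echo "scale=4; $1+$2" | /usr/bin/bc -l
    # show where the staged copy of the script lives
    echo executable = $0
    echo " "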
globus-job-run: Examples
• Run multiple shell commands:
    globus-job-run tg64.ncsa.uiuc.edu /bin/sh -c \
        "cd my_dir ; ls"
• Run several MPI jobs:
    globus-job-run \
        -: tf005i.sdsc.edu -np 64 -s my-aix-exec \
        -: tg64.ncsa.uiuc.edu -np 128 -s my-linux-exec
• For help:
    globus-job-run -help
globus-job-submit: Remote batch jobs
• For help:
    globus-job-submit -help
• To submit jobs to the remote batch scheduler (tfglobus.sdsc.edu):
    % globus-job-submit \
        tfglobus.sdsc.edu/jobmanager-batch \
        -queue normal -np 4 /paci/sdsc/leesa/mpi/little
    https://tf004i.sdsc.edu:44864/68982/1047069851/
  (jobID in response to submission)
globus-job-submit
• Use jobID to check on job status:
    globus-job-status https://tf004i.sdsc.edu:44864/68982/1047069851
    PENDING … ACTIVE … DONE
• Use jobID to retrieve output or cancel job:
    globus-job-get-output \
        https://tf004i.sdsc.edu:44864/68982/1047069851
    globus-job-cancel \
        https://tf004i.sdsc.edu:44864/68982/1047069851
• Use jobID to clean up cached output from job (on remote machine):
    globus-job-clean https://tf004i.sdsc.edu:44864/68982/1047069851
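Checking the status by hand can be wrapped in a small polling loop. The sketch below assumes that `globus-job-status` prints a single state word such as PENDING, ACTIVE or DONE, as in the slide, and treats FAILED as a second final state by analogy with the job-state list; the function names, polling interval and poll limit are all hypothetical. The status query is injectable so the loop can be exercised without a Globus installation.

```python
import subprocess
import time

FINAL_STATES = {"DONE", "FAILED"}   # FAILED is assumed by analogy with the state list


def query_status(contact: str) -> str:
    """Ask globus-job-status for the job state (needs Globus and a valid proxy)."""
    result = subprocess.run(
        ["globus-job-status", contact],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def wait_for_job(contact, get_status=query_status, interval=30.0, max_polls=120):
    """Poll until the job reaches a final state; return the distinct states seen."""
    history = []
    for _ in range(max_polls):
        state = get_status(contact)
        if not history or history[-1] != state:
            history.append(state)           # record each state change once
        if state in FINAL_STATES:
            return history
        time.sleep(interval)
    raise TimeoutError(f"job {contact} still {history[-1]} after {max_polls} polls")


# Dry run with a canned sequence of answers instead of a real job.
canned = iter(["PENDING", "PENDING", "ACTIVE", "ACTIVE", "DONE"])
print(wait_for_job("https://example.invalid/job", lambda _c: next(canned), interval=0))
```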
globus-job-submit – variations
• On dtf-login, things are somewhat different:
    % globus-job-submit dtf-login/jobmanager-pbs \
        -np 6 /users/leesa/mpi/little
    https://dtf-login1.sdsc.teragrid.org:35764/14629/1047083644/
Watch out! Incompatibilities between different versions of Globus may affect how/whether globus-job-get-output works.
globusrun • Runs scripts written in the globus Resource Specification Language (RSL) • RSL provides information to job manager: • resource requirements: machine type, number of nodes, memory, etc • job configuration: directory, executable, arguments, environment variables
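To make the idea of an RSL request concrete, the sketch below assembles a simple conjunction of `(attribute=value)` relations prefixed with `&`. The helper names are hypothetical, the attribute names used in the example (executable, arguments, count, directory) are common ones chosen for illustration, and the quoting is deliberately minimal: values containing whitespace are wrapped in double quotes and nothing else is escaped.

```python
def rsl_value(value) -> str:
    """Render one value, quoting it only if it contains whitespace."""
    text = str(value)
    return f'"{text}"' if any(ch.isspace() for ch in text) else text


def build_rsl(**attributes) -> str:
    """Join attribute/value pairs into a single '&(name=value)...' request."""
    relations = []
    for name, value in attributes.items():
        if isinstance(value, (list, tuple)):
            rendered = " ".join(rsl_value(item) for item in value)
        else:
            rendered = rsl_value(value)
        relations.append(f"({name}={rendered})")
    return "&" + "".join(relations)


request = build_rsl(
    executable="/bin/echo",
    arguments=["hello", "grid world"],
    count=4,
    directory="/tmp",
)
print(request)
```

The resulting string covers both kinds of information named on the slide: resource requirements (here, the count) and job configuration (executable, arguments, directory).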