
A Look at Globus Grids



Presentation Transcript


  1. A Look at Globus Grids Leesa Brieger

  2. What is a GRID? A GRID must: • Coordinate resources that are not under centralized control • Use standard, open, general-purpose protocols and interfaces • Deliver non-trivial quality of service

  3. Not a grid: • a cluster • a network • a network-attached device These are important components of a grid, not the grid itself.

  4. Why a GRID? • Construction and mining of large (distributed) databases • National Virtual Observatory - widespread access to and use of astronomical data collections; mosaicking services • Protein Data Bank - single worldwide repository for the processing and distribution of 3-D biological macromolecular structure data; CE portal - allows structural comparison of proteins • Develop community resources (computational and data) • Large Hadron Collider particle physics experiments from CERN • Biomedical Informatics Research Network - to develop a "protocol" for collaborative research among neuroscientists and medical scientists

  5. Why a GRID? • Access specialized devices remotely • Brain imaging – data acquisition in situ, fast imaging on a (distant) supercomputer • Networks of sensors to accumulate field measurements for environmental studies • Develop simulations and analyses • ENZO astronomy project • Outsourcing becomes feasible • Geo-Services rents computing time from IBM

  6. GRID Technology A GRID must: • allow the establishment and evolution of virtual organizations • manage security, trust, privacy • manage and monitor distributed resources • Computers, databases, networks, storage, software • federate data • discover available data, resources, services And all in a robust, secure, scalable, interoperable fashion.

  7. How to enable Grid Technology? With APIs, standard protocols, services: • Protocol-mediated access to grid resources • Services must “speak” intergrid protocols • APIs form interface to grid protocols and facilitate application development

  8. Open standards, so all can participate Global Grid Forum (GGF) - standards organization (www.gridforum.org) Open Grid Services Architecture (OGSA) – standard interface definitions (in evolution) Open source to reduce barriers to entry The Globus Project (www.globus.org)

  9. Globus Toolkit • Open-source reference software base for developing Grid infrastructure and apps • Implements GGF standards • Service-oriented • Services can be decoupled from any fixed resource • A service consumes resources, but how it does so is of secondary importance • A better base abstraction for managing dependability, end-to-end quality of service

  10. Globus Toolkit • Vendor-neutral • Moving over to OGSA structure (GT3) • embrace web services technologies • standard interfaces and behaviors for distributed system management • unify resources/services/information • easier integration with hosting environment and improved extensibility • leverage commercial efforts

  11. Grid Requirements • identity & authentication • authorization & policy • resource/service discovery • resource allocation • (co-)reservation, workflow • remote data access • rapid data transfer • monitoring • intrusion detection • resource management • accounting • fault management • system evolution

  12. Layered Grid Architecture • Fabric Layer - provides the local services of a resource: • computational, storage, network • Connectivity Layer - core communication and authentication protocols • Enables exchange of data between fabric layer resources • Security and authentication important here

  13. Layered Grid Architecture (2) • Resource Layer - enables resource sharing • Builds on connectivity layer to control and access resources (Ex: data servers) • Collective Layer - coordinates interactions across multiple resources • Ties multiple resources and services together (Ex: metacatalogues) • Application Layer - user applications use collective, resource, and connectivity layers to perform grid operations in a virtual organization

  14. Layered Grid Architecture (layers from top to bottom) • Application • User - specialized services: user- or application-specific distributed services • Collective - managing multiple resources: ubiquitous infrastructure services • Resource - sharing resources: negotiating access, controlling use • Connectivity - talking to things: communication (Internet protocols) and security • Fabric - controlling things locally: access to and control of resources

  15. Globus Protocols - Connectivity Layer Grid Security Infrastructure (GSI): • Authentication/authorization, message protection across institutions • Single sign-on, delegation, identity mapping • Public key technology • Certificate authorities, certificate & key management

  16. Globus Protocols - Resource Layer • Grid Resource Allocation Management (GRAM) • Remote allocation, control of compute resources • Furnishes information on state of the resources to the Metacomputing Directory Service (MDS) • GridFTP • High-performance data access and transport • Grid Resource Information Service (GRIS) • Access to structure and state info (MDS) • All built on connectivity layer

  17. Globus Protocols - Collective Layer • Metadirectory services • SRB MCAT metacatalogue • Resource brokers • Condor Matchmaker • Co-reservation/co-allocation services • Workflow management services

  18. Grid Security Infrastructure (GSI) Public key cryptography (asymmetric cryptography): • Encryption relies on two keys, related mathematically so that if either key encrypts a message, the other must be used to decrypt it • One key is public, the other is kept private • A user proves their identity by encrypting a message with the private key; if the public key decrypts it, the user must be holding the private key • No password is ever exchanged
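The key-pair property on this slide can be illustrated with textbook RSA on tiny numbers. This is only a toy sketch: the primes, exponent, and challenge value are illustrative assumptions, there is no padding or hashing, and it does not represent how GSI or any real library implements its cryptography.

```python
# Toy RSA with tiny primes: illustrative only, never use for real security.
p, q = 61, 53                # assumed small primes
n = p * q                    # modulus shared by both keys
phi = (p - 1) * (q - 1)
e = 17                       # public exponent (public key is (e, n))
d = pow(e, -1, phi)          # private exponent (private key is (d, n)); Python 3.8+


def apply_key(message: int, exponent: int) -> int:
    """Apply one half of the key pair to a number smaller than n."""
    return pow(message, exponent, n)


challenge = 1234

# The holder of the private key transforms the challenge ...
proof = apply_key(challenge, d)
# ... and anyone with the public key can undo the transformation.
recovered = apply_key(proof, e)

# The relationship also holds in the other direction.
secret = apply_key(challenge, e)
opened = apply_key(secret, d)

print(recovered == challenge, opened == challenge)
```

Only `proof` travels over the wire in the first direction; the private exponent `d` never leaves its owner, which is the sense in which no password is exchanged.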

  19. GSI: Certificates • Globus uses the X.509 certification system to provide authentication services. • X.509 certificates identify and authenticate users and services on the grid. • Certificates contain: • subject name: entity represented by the certificate • public key belonging to the subject • identity of a Certificate Authority (CA) that has signed (issued) the certificate certifying that both public key and identity belong to the subject • digital signature of the named CA

  20. GSI: Certificates A Certificate Authority’s purpose is to offer a third-party verification of the link between a public key and the Common Name (CN) in a certificate. A certificate establishes/verifies the connection between a public key and a Distinguished Name (DN) at any site.

  21. GSI: Authentication Mutual Authentication between A & B: • Each party must trust the CA who signed the other party’s certificate • A sends B its certificate • B checks the certificate for a valid signature of the CA to see if it really came from a trusted CA • B generates a random message, asks A to encrypt it with A’s private key, and decrypts the result using A’s public key • If decryption recovers the original message, B now trusts A • Same procedure in reverse for A to establish trust in B
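One direction of the exchange above (B establishing trust in A) might be sketched as follows. Everything here is a hedged illustration: the toy RSA keys, the dictionary standing in for a certificate, and the names `issue_certificate`, `authenticate`, and `a_proves` are assumptions made for the example, not GSI or X.509 interfaces.

```python
import hashlib
import secrets


def make_keys(p, q, e=17):
    """Return (public, private) toy RSA keys built from two small primes."""
    n, phi = p * q, (p - 1) * (q - 1)
    return (e, n), (pow(e, -1, phi), n)


def digest(text, n):
    """Hash text to an integer smaller than the modulus n."""
    return int.from_bytes(hashlib.sha256(text.encode()).digest(), "big") % n


def issue_certificate(subject, subject_pub, ca_priv):
    """CA signs the binding between a subject name and its public key."""
    d, n = ca_priv
    body = f"{subject}|{subject_pub}"
    return {"subject": subject, "public_key": subject_pub,
            "signature": pow(digest(body, n), d, n)}


def certificate_is_valid(cert, ca_pub):
    """Check the CA signature using only the CA's public key."""
    e, n = ca_pub
    body = f"{cert['subject']}|{cert['public_key']}"
    return pow(cert["signature"], e, n) == digest(body, n)


def authenticate(cert, ca_pub, prove):
    """B's side: verify the CA signature, then challenge the certificate holder."""
    if not certificate_is_valid(cert, ca_pub):
        return False
    e, n = cert["public_key"]
    challenge = secrets.randbelow(n - 2) + 2      # random message chosen by B
    return pow(prove(challenge), e, n) == challenge


# Hypothetical parties: a CA trusted by B, and user A.
ca_pub, ca_priv = make_keys(1009, 1013)
a_pub, a_priv = make_keys(61, 53)
a_cert = issue_certificate("CN=A", a_pub, ca_priv)


def a_proves(challenge):
    """A's side: transform the challenge with A's private key."""
    d, n = a_priv
    return pow(challenge, d, n)


print(authenticate(a_cert, ca_pub, a_proves))
```

Mutual authentication would repeat the same call with the roles swapped, using B's certificate and a prover held by B.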

  22. GSI: User Setup To access grid resources using GSI, must: • hold a valid account on the login resources • procure a certificate and private key from a trusted CA • ensure that the Distinguished Name (DN) is present in the grid-mapfile at each site • grid-mapfile entry establishes the connection between a certificate holder and a valid (local) user • For Teragrid machines, go to http://accounts.teragrid.org
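The role of the grid-mapfile can be sketched as a lookup from a certificate's Distinguished Name to a local account. The sketch assumes the common one-entry-per-line layout of a quoted DN followed by a single local user name; the sample entry is built from the subject of the sample certificate in this talk, and real files may carry variations this parser ignores.

```python
import shlex


def parse_grid_mapfile(text):
    """Return {distinguished_name: local_user} for well-formed entries."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                      # skip blanks and comments
        parts = shlex.split(line)         # honours the quotes around the DN
        if len(parts) != 2:
            continue                      # not the simple form assumed here
        dn, local_user = parts
        mapping[dn] = local_user
    return mapping


# Hypothetical file content for illustration.
sample_text = '''
# DN -> local account
"/C=US/O=NPACI/OU=SDSC/CN=Leesa Brieger/USERID=leesa" leesa
'''

mapping = parse_grid_mapfile(sample_text)
print(mapping.get("/C=US/O=NPACI/OU=SDSC/CN=Leesa Brieger/USERID=leesa"))
```

A DN missing from the mapping simply has no local account at that site, which is why the entry must be present at each site the user wants to reach.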

  23. GSI: Keys • Private keys • typically stored in directory ~/.globus • should be readable only by user • encrypted via password (pass phrase) • ~/.globus: • usercert.pem: certificate signed by your CA • userkey.pem: encrypted private key file
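The "readable only by user" requirement can be checked from the file mode. The sketch below assumes a POSIX file system and uses a throwaway file in place of a real `userkey.pem`; the helper name `is_private_to_owner` is made up for the example.

```python
import os
import stat
import tempfile


def is_private_to_owner(path):
    """Return True if group and others have no permissions on the file."""
    mode = os.stat(path).st_mode
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


# Demonstration on a placeholder file standing in for userkey.pem.
with tempfile.TemporaryDirectory() as tmp:
    key_path = os.path.join(tmp, "userkey.pem")
    with open(key_path, "w") as handle:
        handle.write("placeholder, not a real key\n")

    os.chmod(key_path, 0o600)            # owner read/write only
    strict_ok = is_private_to_owner(key_path)

    os.chmod(key_path, 0o644)            # group and others can read
    loose_ok = is_private_to_owner(key_path)

print(strict_ok, loose_ok)
```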

  24. GSI: A Sample Certificate (1)
      Certificate:
        Data:
          Version: 3 (0x2)
          Serial Number: 314 (0x13a)
          Signature Algorithm: md5WithRSAEncryption
          Issuer: C=US, O=NPACI, OU=SDSC, CN=Certificate Manager/USERID=certman
          Validity
            Not Before: Jun 20 22:25:59 2002 GMT
            Not After : Jun 20 22:25:59 2006 GMT
          Subject: C=US, O=NPACI, OU=SDSC, CN=Leesa Brieger/USERID=leesa
          Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
            RSA Public Key: (1024 bit)

  25. GSI: A Sample Certificate (2)
              Modulus (1024 bit):
                00:e5:a4:d1:41:34:d0:39:31:e6:02:1a:d9:a2:de:
                <snip>
                af:15:9a:17:f3:6c:59:9c:ef
              Exponent: 65537 (0x10001)
          X509v3 extensions:
            X509v3 Basic Constraints:
              CA:FALSE
            Netscape Cert Type:
              SSL Client, S/MIME, Object Signing
            Netscape Comment:
              OpenSSL Generated Certificate
            X509v3 Subject Key Identifier:
              A8:BD:02:2D:B1:4C:0A:74:B6:9D:6E:57:AB:D0:1F: . . .
            X509v3 Authority Key Identifier:
              keyid:66:CC:08:D9:FD:63:0F:CA:A6:24:56:86:E7:0B: . . .

  26. GSI: A Sample Certificate (3)
              DirName:/C=US/O=NPACI/OU=SDSC/CN=Certificate Manager/USERID=certman
              serial:00
        Signature Algorithm: md5WithRSAEncryption
          49:73:c4:ce:e6:9c:53:08:61:4b:2a:bb:02:6e:b5:38:ab:df:
          <snip>
          5e:0f:73:50
      -----BEGIN CERTIFICATE-----
      MIIDtTCCAp2gAwIBAgICATowDQYJKoZIhvcNAQEEBQAwYzELMAkGA1UEBhMCVVMx
      <snip>
      Ib2a9AyA9coNnshWg+sWs6xTk0wWXMf4tHiS7dTLPQle1Gav5V4Pc1A=
      -----END CERTIFICATE-----
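The Validity fields of the sample certificate define the window in which it can be used. As a small sketch, the two timestamps can be parsed and compared against a given moment; the format string is an assumption matched to the text shown in the sample, and the helper names are made up for illustration.

```python
from datetime import datetime, timezone

# Format matched to the "Not Before" / "Not After" lines in the sample output.
CERT_TIME_FORMAT = "%b %d %H:%M:%S %Y GMT"


def parse_cert_time(text):
    """Parse a validity timestamp as printed in the sample certificate."""
    return datetime.strptime(text, CERT_TIME_FORMAT).replace(tzinfo=timezone.utc)


not_before = parse_cert_time("Jun 20 22:25:59 2002 GMT")
not_after = parse_cert_time("Jun 20 22:25:59 2006 GMT")


def is_within_validity(moment):
    """True if the moment falls inside the certificate's validity window."""
    return not_before <= moment <= not_after


lifetime_days = (not_after - not_before).days
print(lifetime_days)
print(is_within_validity(datetime(2003, 3, 7, tzinfo=timezone.utc)))
```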

  27. GSI: Keys • To use GSI, must enter pass phrase which decrypts the private key • Delegation capability: • single sign-on allows access to all the grid resources without further authorization through creation of a proxy • A proxy consists of a new certificate and a new private key (written to /tmp).

  28. GSI: Proxies • Proxy certificate: • contains a new public key • contains owner's identity, modified to indicate that this is a proxy • signed by owner, rather than by a CA • limited lifetime • Proxy’s private key: • not password-encrypted, limited lifetime • goes into /tmp, readable only by user ( Ex: /tmp/x509up_u12345 )

  29. GSI: Some commands • grid-proxy-init: create a full proxy • creates the proxy private key in /tmp • grid-proxy-destroy • destroys the proxy private key in /tmp • grid-proxy-info • examines proxy file in /tmp • grid-cert-info • examines certificate file ~/.globus/usercert.pem • grid-change-pass-phrase

  30. GSI: Some commands (2) GSI-enabled OpenSSH: • gsissh: GSI-enabled ssh • replaces default ssh when there is a proxy • gsiscp: GSI-enabled scp • replaces default scp when there is a proxy Use the -help option to see command usage.

  31. GSI: Some commands (3)
      % grid-change-pass-phrase -help
      grid-change-pass-phrase [-help] [-version] [-file private_key_file]
      Changes the passphrase that protects the private key. If the -file argument is not given, the default location of the file containing the private key is assumed:
        -- The location pointed to by X509_USER_KEY
        -- If X509_USER_KEY not set, /users/science/leesa/.globus/userkey.pem
      Options
        -help            Displays usage
        -version         Displays version
        -file location   Change passphrase on key stored in the file at the non-standard location 'location'.
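The lookup order described in this help text (an explicit `X509_USER_KEY`, otherwise the key under `~/.globus`) can be sketched as a small function. The function name and the way the environment and home directory are passed in are assumptions for the example, chosen so the logic can be exercised without touching a real account.

```python
import os


def private_key_path(environ, home):
    """Resolve the private key file the way the help text describes."""
    override = environ.get("X509_USER_KEY")
    if override:
        return override                               # explicit location wins
    return os.path.join(home, ".globus", "userkey.pem")  # default location


# Default case, using the home directory shown on the slide.
default_path = private_key_path({}, "/users/science/leesa")

# Override case with a hypothetical non-standard location.
custom_path = private_key_path({"X509_USER_KEY": "/secure/keys/leesa.pem"},
                               "/users/science/leesa")

print(default_path)
print(custom_path)
```

In everyday use the call would be `private_key_path(os.environ, os.path.expanduser("~"))`.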

  32. GSI: Notes • grid-proxy-info options (some): • -subject: to see DN (grid-mapfile entry) • -text: shows proxy certificate • -timeleft: time remaining till proxy expires • grid-proxy-init command can only be given on a machine where you have your certificate and private key (~/.globus directory) • Not all proxies are created equal • Full/limited/delegate proxies • MyProxy • Depending on configuration of grid services, authentication may require full proxies or may accept limited proxies

  33. GSI: Ongoing R&D • See www.globus.org/research/papers.html • See www.gridforum.org/security

  34. Resource Management • Grid Resource Allocation Management (GRAM) protocol and client API used to start programs on remote resources • Resource Specification Language (RSL) communicates requirements to remote resource • Layered architecture allows resource brokers and co-allocators to be defined in terms of GRAM services

  35. Grid Resource Allocation Management GRAM allows jobs to run remotely. How? • Job is submitted • Request is sent to gatekeeper (server) of the remote computer • Gatekeeper handles the request and creates a job manager for the job • Job manager starts and monitors the program, communicating state changes back to the user on the local machine • When remote application terminates, normally or by failing, the job manager terminates as well

  36. GRAM • Gatekeeper • a process running as root on the remote computer, listening at a specific port • single point of entry • authenticates user (mutual authentication with client making the allocation request and then mapping to “local” user) • starts job manager on “local” host (as “local” user) • passes allocation arguments to job manager

  37. GRAM • Job Manager • a gatekeeper service, running as “local” user • one job manager for every request to gatekeeper • layers on top of local resource management system (e.g., PBS) • handles all (further) communication with the client about the job

  38. GRAM - States of a job • Unsubmitted - job not yet submitted to the scheduler. No job state callback for this state; introduced for case when job manager is stopped and restarted before the job is submitted. • StageIn - job manager is staging executable, input, or data files to the job. Jobs which do not involve any staging will not enter this state. • Pending - job has been submitted to scheduler, resources not yet allocated for the job. • Active - job has received all its resources; application is executing. • Suspended - job has been stopped temporarily by scheduler. Only some schedulers will cause a job to enter the Suspended state. • StageOut - job manager is staging output files from job manager host to remote storage. Jobs with no staging will not enter this state. • Done - job completed successfully. • Failed - job terminated before completion, as a result of an error, or a user or system cancel.
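The job states listed above can be modelled as a small state machine. The state names come from the slide, but the transition table is an assumption inferred from the descriptions (for example, that staging in precedes Pending and that a suspended job resumes to Active); it is a sketch, not the job manager's actual logic.

```python
from enum import Enum


class JobState(Enum):
    UNSUBMITTED = "Unsubmitted"
    STAGE_IN = "StageIn"
    PENDING = "Pending"
    ACTIVE = "Active"
    SUSPENDED = "Suspended"
    STAGE_OUT = "StageOut"
    DONE = "Done"
    FAILED = "Failed"


S = JobState
TERMINAL = {S.DONE, S.FAILED}

# Assumed forward transitions; Failed is treated as reachable from any
# non-terminal state because errors and cancels can happen at any point.
ASSUMED_NEXT = {
    S.UNSUBMITTED: {S.STAGE_IN, S.PENDING},
    S.STAGE_IN: {S.PENDING},
    S.PENDING: {S.ACTIVE},
    S.ACTIVE: {S.SUSPENDED, S.STAGE_OUT, S.DONE},
    S.SUSPENDED: {S.ACTIVE},
    S.STAGE_OUT: {S.DONE},
}


def can_move(current, new):
    """True if the assumed table allows going from current to new."""
    if current in TERMINAL:
        return False
    return new is S.FAILED or new in ASSUMED_NEXT[current]


def is_plausible_history(states):
    """Check every consecutive pair in a reported sequence of states."""
    return all(can_move(a, b) for a, b in zip(states, states[1:]))


print(is_plausible_history([S.PENDING, S.ACTIVE, S.DONE]))
```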

  39. GRAM Environment Variables GRAM Job Manager provides a minimal environment for jobs: • HOME - user's home directory • LOGNAME - user's login name • X509_USER_PROXY - path to job manager's delegated credential (GSI only) • GLOBUS_GRAM_JOB_CONTACT - job manager's contact string for this job • GLOBUS_GRAM_MYJOB_CONTACT - GRAM MyJob contact string for intrajob communication • GLOBUS_LOCATION - path to the Globus installation on the job manager host

  40. GRAM Environment Variables (2) • X509_CERT_DIR - path to a trusted certificate directory (this variable is set only if the -x509-cert-dir argument is given to job manager) • GLOBUS_GASS_CACHE_DEFAULT - path to the job's GASS cache, where output is sent (if the gass_cache RSL attribute is present) • GLOBUS_TCP_PORT_RANGE - system-specific range of TCP ports usable by the job; Globus I/O honors this range. Only present if the related configuration option is present in the job manager configuration file. • GLOBUS_REMOTE_IO_URL - path to a file containing a URL string of a GASS server which the job may access (if the remote_io_url attribute is present).

  41. Job Submission Interfaces Command line programs: • globus-job-run - remote interactive jobs • globus-job-submit - remote batch jobs • globusrun - the others are wrappers around this one • globus-url-copy - remote copy • -help to see usage notes Others: Condor-G, HotPage, web portals

  42. globus-job-run: Examples
      • Ping a resource:
          % globusrun -a -r tfglobus.sdsc.edu
          GRAM Authentication test successful
      • Run a remote command (correct):
          % globus-job-run tf005i.sdsc.edu /bin/echo '$(GLOBUS_LOCATION)'
          /usr/local/apps/globus-2.2.3
      • Watch out! This is not the same as the following (incorrect: the local shell expands the variable before the request is sent):
          % globus-job-run tf005i.sdsc.edu /bin/echo $GLOBUS_LOCATION
          /usr/local/apps/nmi-2.1
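The difference between the two echo commands comes down to where the variable is expanded. The simulation below is a hedged sketch: it assumes the second path on the slide is the local installation and the first is the remote one, and the functions only mimic the two expansion steps rather than calling any Globus tool.

```python
import re
import string

# Assumed from the slide output: nmi-2.1 is local, globus-2.2.3 is remote.
LOCAL_ENV = {"GLOBUS_LOCATION": "/usr/local/apps/nmi-2.1"}
REMOTE_ENV = {"GLOBUS_LOCATION": "/usr/local/apps/globus-2.2.3"}


def local_shell_expand(arg, env):
    """Mimic the local shell expanding $NAME in an unquoted argument."""
    return string.Template(arg).safe_substitute(env)


def remote_substitute(arg, env):
    """Mimic remote-side replacement of $(NAME) references."""
    return re.sub(r"\$\((\w+)\)",
                  lambda match: env.get(match.group(1), match.group(0)),
                  arg)


def run_remote_echo(arg, quoted):
    """Return what the remote /bin/echo would print for the argument."""
    sent = arg if quoted else local_shell_expand(arg, LOCAL_ENV)
    return remote_substitute(sent, REMOTE_ENV)


print(run_remote_echo("$(GLOBUS_LOCATION)", quoted=True))
print(run_remote_echo("$GLOBUS_LOCATION", quoted=False))
```

Single quotes keep the local shell from touching the argument, so the reference survives until it reaches the remote side.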

  43. globus-job-run: Examples
      • Can take a look at the remote “globus environment”:
          % globus-job-run tf005i.sdsc.edu /bin/printenv
      • The executable can be a shell script (-s “stages” the executable to the remote machine):
          % globus-job-run tf005i.sdsc.edu -s hello.sh
          Hello from tf005i.sdsc.edu
        hello.sh:
          #!/bin/tcsh -f
          echo -n "Hello from "
          $GLOBUS_LOCATION/bin/globus-hostname

  44. globus-job-run: Examples
      Additional functionality beyond ssh:
      • can run jobs remotely when executable resides on local machine, via staging
      • can stage and run jobs across machines (“-:” indicates multi-requests):
          globus-job-run -args 20 30 \
            -: tf005i -s add.sh 1 3 \
            -: tg64.ncsa.uiuc.edu -s add.sh
        Output:
          Hello from tf005i.sdsc.edu
          sum is 4
          executable = /paci/sdsc/leesa/.globus/.gass_cache/local/…
          Hello from tg64-u01.ncsa.uiuc.edu
          sum is 50
          executable = /home/ac/leesa/.globus/.gass_cache/local/…

  45. globus-job-run
      add.sh (from previous example):
          #!/bin/csh -f
          echo " "
          echo -n "Hello from "
          $GLOBUS_LOCATION/bin/globus-hostname
          echo -n "sum is "
          echo "scale=4; $1+$2" | /usr/bin/bc -l
          echo executable = $0
          echo " "

  46. globus-job-run: Examples
      • Run multiple shell commands:
          globus-job-run tg64.ncsa.uiuc.edu /bin/sh -c \
            "cd my_dir ; ls"
      • Run several MPI jobs:
          globus-job-run \
            -: tf005i.sdsc.edu -np 64 -s my-aix-exec \
            -: tg64.ncsa.uiuc.edu -np 128 -s my-linux-exec
      • For help:
          globus-job-run -help

  47. globus-job-submit: Remote batch jobs
      • For help:
          globus-job-submit -help
      • To submit jobs to the remote batch scheduler (tfglobus.sdsc.edu):
          % globus-job-submit \
              tfglobus.sdsc.edu/jobmanager-batch \
              -queue normal -np 4 /paci/sdsc/leesa/mpi/little
          https://tf004i.sdsc.edu:44864/68982/1047069851/
        (the URL returned in response to the submission is the jobID)

  48. globus-job-submit
      • Use jobID to check on job status:
          globus-job-status https://tf004i.sdsc.edu:44864/68982/1047069851
          PENDING … ACTIVE … DONE
      • Use jobID to retrieve output or cancel job:
          globus-job-get-output \
            https://tf004i.sdsc.edu:44864/68982/1047069851
          globus-job-cancel \
            https://tf004i.sdsc.edu:44864/68982/1047069851
      • Use jobID to clean up cached output from job (on remote machine):
          globus-job-clean https://tf004i.sdsc.edu:44864/68982/1047069851
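Checking status by hand can be turned into a polling loop that waits for a terminal state. In the sketch below, `get_status` is a hypothetical callable supplied by the caller (for instance a wrapper that runs `globus-job-status` and returns its one-word output); the polling interval, the limit on polls, and the assumption that FAILED is reported alongside DONE are illustrative choices.

```python
import time

# Assumed terminal status words; the slide itself only shows DONE.
TERMINAL_STATES = {"DONE", "FAILED"}


def wait_for_job(job_id, get_status, poll_seconds=30.0, max_polls=100,
                 sleep=time.sleep):
    """Poll until the job reaches a terminal state; return distinct states seen."""
    seen = []
    for _ in range(max_polls):
        status = get_status(job_id)
        if not seen or seen[-1] != status:
            seen.append(status)           # record each change of state once
        if status in TERMINAL_STATES:
            return seen
        sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")


# Stand-in status source for demonstration; no grid access is involved.
fake_reports = iter(["PENDING", "PENDING", "ACTIVE", "ACTIVE", "DONE"])
history = wait_for_job(
    "https://tf004i.sdsc.edu:44864/68982/1047069851",
    lambda job_id: next(fake_reports),
    sleep=lambda seconds: None,           # skip real waiting in the demo
)
print(history)
```

Passing `sleep` and `get_status` as parameters keeps the loop testable and leaves the choice of how to query the job to the caller.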

  49. globus-job-submit – variations
      • On dtf-login, things are somewhat different:
          % globus-job-submit dtf-login/jobmanager-pbs \
              -np 6 /users/leesa/mpi/little
          https://dtf-login1.sdsc.teragrid.org:35764/14629/1047083644/
      Watch out! Incompatibilities between different versions of Globus may affect how/whether globus-job-get-output works.

  50. globusrun • Runs scripts written in the globus Resource Specification Language (RSL) • RSL provides information to job manager: • resource requirements: machine type, number of nodes, memory, etc • job configuration: directory, executable, arguments, environment variables
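An RSL request is, in its simplest form, a conjunction of `(attribute=value)` relations prefixed with `&`. The helper below is a minimal sketch that renders a dictionary in that form; the quoting rule (double quotes around values containing whitespace, with embedded quotes doubled) is a simplifying assumption, and the attribute values reuse the executable and queue from the earlier batch example.

```python
def to_rsl(attributes):
    """Render a dict as a conjunction of RSL relations: &(name=value)..."""

    def render(value):
        if isinstance(value, (list, tuple)):
            # Multi-valued attributes become space-separated values.
            return " ".join(render(item) for item in value)
        text = str(value)
        if any(ch.isspace() for ch in text) or '"' in text:
            return '"' + text.replace('"', '""') + '"'
        return text

    return "&" + "".join(f"({name}={render(value)})"
                         for name, value in attributes.items())


job = to_rsl({
    "executable": "/paci/sdsc/leesa/mpi/little",
    "count": 4,
    "queue": "normal",
})
print(job)

echo_job = to_rsl({"executable": "/bin/echo",
                   "arguments": ["hello world", "again"]})
print(echo_job)
```

A string built this way would typically be written to a file or passed on the globusrun command line; check the usage notes from `globusrun -help` for the options supported by a given installation.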
