High-Performance Grid Computing and Research Networking
Grid Computing
Presented by Selim Kalayci
Instructor: S. Masoud Sadjadi
http://www.cs.fiu.edu/~sadjadi/Teaching/
sadjadi At cs Dot fiu Dot edu

Acknowledgements
• The content of many slides in these lecture notes has been adapted from online resources prepared by the people listed below. Many thanks!
• Henri Casanova
  • Principles of High Performance Computing
  • http://navet.ics.hawaii.edu/~casanova
  • henric@hawaii.edu
• Ian Foster
  • Presentations & Tutorials from www.globus.org

Agenda
• Grid Computing
• Grid Middleware - Globus
• Security in Globus
• Data Management
• Execution Management
• Monitoring
• Metaschedulers - GridWay

Multiple Computers
• Adding CPUs to a single computer becomes very expensive
• How about using multiple computers together?
• Linux clusters (60% of the Top-500 list)
• Blue Gene: 30K computers

Beyond the machine room?
[Figure labels: Campus, Machine Room, Nation]
• Need more capacity than available at (most) single sites
• Everyone would like a 10K-node 100GHz cluster
• Very expensive (cooling, power)
• More economical to have multiple sites
• Need to locate available resources now
• Data/instruments are inherently distributed

Grid Computing
A dynamic multi-institutional network of computers that come together to share resources for the purpose of coordinated problem solving.
[Figure labels: resource, application, institutional boundary]
• Achieved through:
  • Open general-purpose protocols
  • Standard interfaces

A Grid Checklist
• coordinates resources that are not subject to centralized control …
• … using standard, open, general-purpose protocols and interfaces …
• … to deliver nontrivial qualities of service.
Virtual Organizations
• A group of individuals or institutions, defined by sharing rules, that share the resources of the Grid for a common goal.
• Examples: application service providers, storage service providers, databases, crisis management teams, consultants.

How is a grid different?
• Grids focus on site autonomy
• Grids involve heterogeneity
• Grids involve more resources than just computers and networks
• Grids focus on the user

Grid Infrastructure
• Distributed management
  • Of physical resources
  • Of software services
  • Of communities and their policies
• Unified treatment
  • Build on Web services framework
  • Use WS-RF, WS-Notification (or WS-Transfer/WS-Management) to represent/access state
  • Common management abstractions & interfaces

Globus is Open Source Grid Infrastructure
• Implements key Web services standards
  • State, notification, security, …
• Software for Grid infrastructure
  • Service-enable new & existing resources
  • E.g., GRAM on computer, GridFTP on storage system, custom application services
  • Uniform abstractions & mechanisms
• Tools to build applications that exploit Grid infrastructure
  • Registries, security, data management, …
• Enabler of a rich tool & service ecosystem

GLOBUS TOOLKIT 4 - GT4
• Open source toolkit, developed by the Globus Alliance, for building Grid applications.
• Organized as a collection of loosely coupled components.
• Consists of services, programming libraries, and development tools.

GT Domain Areas
• Core runtime
  • Infrastructure for building new services
• Security
  • Apply uniform policy across distinct systems
• Execution management
  • Provision, deploy, & manage services
• Data management
  • Discover, transfer, & access large data
• Monitoring
  • Discover & monitor dynamic services

WSRF & WS-Notification
• Naming and bindings (basis for virtualization)
  • Every resource can be uniquely referenced, and has one or more associated services for interacting with it
• Lifecycle (basis for fault-resilient state management)
  • Resources created by services following the factory pattern
  • Resources destroyed immediately or at a scheduled time
• Information model (basis for monitoring, discovery)
  • Resource properties associated with resources
  • Operations for querying and setting this info
  • Asynchronous notification of changes to properties
• Service groups (basis for registries, collective services)
  • Group membership rules & membership management
• Base Fault type

Security Services
• Form the underlying communication medium for all the services
• Secure authentication and authorization
• Single sign-on
  • Users need not explicitly authenticate each time a service is requested
• Uniform credentials
• Ex: GSI (Grid Security Infrastructure)

Grid Security Infrastructure - GSI
• Use GSI as a standard mechanism for bridging disparate security mechanisms
  • Doesn't solve the trust problem, but now things talk the same protocol and understand each other's identity credentials
  • Basic support for delegation, policy distribution
• Translate from other mechanisms to/from GSI as needed
• Convert from GSI identity to local identity for authorization

Grid Security Infrastructure - GSI
• Based on standard PKI technologies
  • CAs allow one-way, light-weight trust relationships (not just site-to-site)
  • SSL protocol or WS-Security for authentication, message protection
  • X.509 certificates for asserting identity
    • for users, services, hosts, etc.
• Proxy certificates
  • GSI extension to X.509 certificates for delegation, single sign-on

Gridmap file
• A gridmap file at each site maps the grid id of a user to a local id
• The grid id of a user is the subject of his/her grid user certificate
• The local id is site-specific
  • Multiple grid ids can be mapped to a single local id
  • Usually a local id exists for each VO participating in that grid effort
• The local ids are then used to implement site-specific policies
  • Priorities, etc.

Gridmap file entry
• The gridmap file is maintained by the site administrator
• Each entry maps a Grid DN (the distinguished name of the user, i.e. the certificate subject name) to a local user name

    #
    # Distinguished Name                                         Local username
    #
    "/DC=org/DC=doegrids/OU=People/CN=Laukik Chitnis 712960"     ivdgl
    "/DC=org/DC=doegrids/OU=People/CN=Richard Cavanaugh 710220"  grid3
    "/DC=org/DC=doegrids/OU=People/CN=JangUk In 712961"          ivdgl
    "/DC=org/DC=doegrids/OU=People/CN=Jorge Rodriguez 690211"    osg

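The mapping described above can be sketched in a few lines of Python. This is only an illustration of the file format shown on the slide, not code from the Globus Toolkit; the function names `parse_gridmap` and `users_by_local_id` are hypothetical, and the sketch assumes each entry holds exactly one quoted DN followed by one local user name.

```python
import shlex
from collections import defaultdict


def parse_gridmap(text):
    """Map each grid DN (certificate subject) to its local user name."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        # shlex keeps the quoted DN, which contains spaces, as one token
        parts = shlex.split(line)
        if len(parts) != 2:
            raise ValueError(f"malformed gridmap entry: {raw!r}")
        dn, local_user = parts
        mapping[dn] = local_user
    return mapping


def users_by_local_id(mapping):
    """Invert the mapping: several grid DNs may share one local id."""
    grouped = defaultdict(list)
    for dn, local_user in mapping.items():
        grouped[local_user].append(dn)
    return dict(grouped)


# Entries copied from the slide (assumed layout: quoted DN, then local name)
SAMPLE = '''
#Distinguished Name   Local username
"/DC=org/DC=doegrids/OU=People/CN=Laukik Chitnis 712960" ivdgl
"/DC=org/DC=doegrids/OU=People/CN=Richard Cavanaugh 710220" grid3
"/DC=org/DC=doegrids/OU=People/CN=JangUk In 712961" ivdgl
'''

if __name__ == "__main__":
    for local_user, dns in users_by_local_id(parse_gridmap(SAMPLE)).items():
        print(local_user, len(dns))
```

Grouping by local id makes the many-to-one relationship visible: two of the sample DNs share the `ivdgl` account.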
How to create and use an Identity (1)
• Run the command below to generate a personal grid identity certificate:
    grid-cert-request
• This will create the following files in $HOME/.globus:
  • usercert_request.pem (request to sign certificate)
  • userkey.pem (private key - encrypted)
  • usercert.pem (public key - signed)

How to create and use an Identity (2)
• After you have created the request, mail it to the local certificate authority (CA):
    cat $HOME/.globus/usercert_request.pem | mail skala001@cis.fiu.edu
  (or dvill013@cs.fiu.edu)
• The CA will then mail back a signed certificate, which you should save as $HOME/.globus/usercert.pem (it can take up to a day for the CA to process the request)

Commands to log in / log out
• grid-proxy-init
  • This "logs you into" the Globus system.
• grid-proxy-info
  • Use this to see your status.
• grid-proxy-destroy
  • Use this to log out.
• A proxy is like a temporary ticket to use the Grid; by default it is valid for 12 hours.
• Once this is done, you should be able to run "grid jobs":
    globus-job-run site-name command

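Because a proxy expires, a script that submits jobs may want to check the remaining lifetime first. The sketch below is a rough illustration under one stated assumption: that `grid-proxy-info -timeleft` prints the remaining lifetime in seconds. The helper names and the one-hour threshold are hypothetical choices, not part of the toolkit.

```python
import subprocess


def format_timeleft(seconds):
    """Render a remaining proxy lifetime as hours and minutes."""
    hours, rem = divmod(max(seconds, 0), 3600)
    return f"{hours}h {rem // 60:02d}m"


def needs_renewal(seconds_left, min_seconds=3600):
    """True when less than min_seconds (hypothetical threshold) remain."""
    return seconds_left < min_seconds


def proxy_seconds_left():
    """Ask the local Globus installation how long the proxy is still valid.

    Assumption: `grid-proxy-info -timeleft` prints an integer number of
    seconds. This call requires a Globus client installation.
    """
    result = subprocess.run(
        ["grid-proxy-info", "-timeleft"],
        capture_output=True, text=True, check=True,
    )
    return int(result.stdout.strip())


if __name__ == "__main__":
    # The default lifetime mentioned on the slide is 12 hours
    print(format_timeleft(12 * 3600))
```

If `needs_renewal(proxy_seconds_left())` is true, the user would run `grid-proxy-init` again before submitting further jobs.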
GT4 Data Management
• Stage/move large data to/from nodes
  • GridFTP, Reliable File Transfer (RFT)
  • Alone, and integrated with GRAM
• Locate data of interest
  • Replica Location Service (RLS)
• Replicate data for performance/reliability
  • Distributed Replication Service (DRS)
• Provide access to diverse data sources
  • File systems, parallel file systems, hierarchical storage: GridFTP
  • Databases: OGSA-DAI

GridFTP
• What is GridFTP?
  • A secure, robust, fast, efficient, standards-based, widely accepted data transfer protocol
• A protocol
  • Multiple independent implementations can interoperate
  • This works: both the Condor Project at UWis and Fermilab have home-grown servers that work with ours.
  • Lots of people have developed clients independent of the Globus Project.
• We also supply a reference implementation:
  • Server
  • Client tools (globus-url-copy)
  • Development libraries

globus-url-copy
• GridFTP-compliant client from the Globus team
• Copies files from one URL to another URL
  • One URL is usually a gsiftp:// URL
  • The other URL is usually a file:/ URL
• To move a file from a remote GridFTP-enabled server to the local machine:
    % globus-url-copy gsiftp://gcb.fiu.edu/tmp/jt file:/home/skala001/jt
• To put a file onto the server, reverse the URLs:
    % globus-url-copy file:/home/skala001/jt gsiftp://gcb.fiu.edu/tmp/jt
• Monitor performance using the -vb flag:
    % globus-url-copy -vb gsiftp://gcb.fiu.edu/tmp/jt file:/home/skala001/jt

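The download and upload commands above differ only in the order of the two URLs. A small Python sketch can make that symmetry explicit by building the argument list for either direction. The helper names are hypothetical, and the sketch only constructs the command; running it would need a Globus client and a valid proxy.

```python
def url_copy_command(source_url, dest_url, verbose=False):
    """Build the argv list for a globus-url-copy invocation."""
    cmd = ["globus-url-copy"]
    if verbose:
        cmd.append("-vb")  # performance monitoring flag from the slide
    cmd += [source_url, dest_url]
    return cmd


def download(host, remote_path, local_path, verbose=False):
    """Remote GridFTP server -> local file."""
    return url_copy_command(
        f"gsiftp://{host}{remote_path}", f"file:{local_path}", verbose
    )


def upload(host, local_path, remote_path, verbose=False):
    """Local file -> remote GridFTP server (URLs reversed)."""
    return url_copy_command(
        f"file:{local_path}", f"gsiftp://{host}{remote_path}", verbose
    )


if __name__ == "__main__":
    print(" ".join(download("gcb.fiu.edu", "/tmp/jt", "/home/skala001/jt")))
    print(" ".join(upload("gcb.fiu.edu", "/home/skala001/jt", "/tmp/jt")))
```

The resulting list could be passed to `subprocess.run` on a machine where `globus-url-copy` is installed.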
Reliable File Transfer - RFT
• WSRF-compliant, fault-tolerant, high-performance data transfer service
  • Soft state.
  • Notifications/Query
• Reliability on top of the high performance provided by GridFTP.
  • Fire and forget.
  • Integrated automatic failure recovery.
    • Network-level failures.
    • System-level failures, etc.
• Essentially a data transfer scheduler with FIFO as its queue policy.

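The "FIFO queue plus automatic failure recovery" idea can be illustrated with a toy scheduler. This is a conceptual sketch only and does not reflect how the RFT service is implemented: the function `run_transfers`, the retry limit, and the use of `OSError` to stand in for a transfer failure are all assumptions made for the example.

```python
from collections import deque


def run_transfers(requests, transfer, max_attempts=3):
    """Process transfer requests in FIFO order, retrying failed ones.

    `transfer` is any callable that performs one transfer and raises
    OSError on failure (a stand-in for network or system failures).
    """
    queue = deque((request, 1) for request in requests)
    done, failed = [], []
    while queue:
        request, attempt = queue.popleft()  # oldest request first
        try:
            transfer(request)
            done.append(request)
        except OSError:
            if attempt < max_attempts:
                # Requeue at the back so other requests are not blocked
                queue.append((request, attempt + 1))
            else:
                failed.append(request)
    return done, failed


if __name__ == "__main__":
    attempts = {}

    def flaky(request):
        # Simulated transfer: request "b" fails on its first attempt only
        attempts[request] = attempts.get(request, 0) + 1
        if request == "b" and attempts[request] == 1:
            raise OSError("simulated network failure")

    print(run_transfers(["a", "b", "c"], flaky))
```

The caller hands over the list of requests once and collects the outcome at the end, which mirrors the "fire and forget" usage described on the slide.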
[Figure: RFT architecture. Labels: RFT Client, SOAP Messages, Notifications (Optional), RFT Service, and two GridFTP Servers, each with Protocol Interpreter, MasterDSI, SlaveDSI, IPCReceiver, Data Channel, and IPC Link.]

Execution Management
• Common WS interface to schedulers
  • Unix, Condor, LSF, PBS, SGE, …
• More generally: interface for process execution management
  • Lay down execution environment
  • Stage data
  • Monitor & manage lifecycle
  • Kill it, clean up
• A basis for application-driven provisioning

Grid Job Management Goals
Provide a service to securely:
• Create an environment for a job
• Stage files to/from environment
• Cause execution of job process(es)
  • Via various local resource managers
• Monitor execution
  • Signal important state changes to client
• Enable client access to output files
  • Streaming access during execution

GRAM
• GRAM: Globus Resource Allocation and Management
• GRAM is a Globus Toolkit component
  • For Grid job management
• GRAM is a unifying remote interface to Resource Managers
  • Yet preserves local site security/control
• GRAM is for stateful job control
  • Reliable operation
  • Asynchronous monitoring and control
  • Remote credential management
  • File staging via RFT and GridFTP

GT4 WS GRAM Architecture
[Figure: GT4 WS GRAM architecture. Labels: Client, Delegate, Delegation, Transfer request, Service host(s) and compute element(s), GT4 Java Container, GRAM services, RFT File Transfer, Job events, SEG, Job functions, sudo, GRAM adapter, Local job control, Local scheduler, Compute element, User job, GridFTP, FTP control, FTP data, Remote storage element(s).]
Delegated credential can be:
• Made available to the application
• Used to authenticate with RFT
• Used to authenticate with GridFTP

A Simple Example
• Command example:
    % globusrun-ws -submit -c /bin/date
    Submitting job...Done.
    Job ID: uuid:002a6ab8-6036-11d9-bae6-0002a5ad41e5
    Termination time: 01/07/2005 22:55 GMT
    Current job state: Active
    Current job state: CleanUp
    Current job state: Done
    Destroying job...Done.
• A successful submission will create a new ManagedJob resource with its own unique EPR for messaging
• Use the -o option to create the EPR file:
    % globusrun-ws -submit -o job.epr -c /bin/date

A Simple Example (2)
• To see the output, use the -s (stream) option:
    % globusrun-ws -submit -s -c /bin/date
    Termination time: 06/14/2007 18:07 GMT
    Current job state: Active
    Current job state: CleanUp-Hold
    Wed Jun 13 14:07:54 EDT 2007
    Current job state: CleanUp
    Current job state: Done
    Destroying job...Done.
    Cleaning up any delegated credentials...Done.
• If you want to send the output to a file, use the -so option:
    % globusrun-ws -submit -s -so job.out -c /bin/date
    …
    % cat job.out
    Wed Jun 13 14:07:54 EDT 2007

A Simple Example (3)
Submitting your job to different schedulers
• Fork
    % globusrun-ws -submit -Ft Fork -s -c /bin/date
  (Fork is the default, so -Ft Fork can be omitted in this case.)
• SGE
    % globusrun-ws -submit -Ft SGE -s -c /bin/hostname

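The submission variants shown so far combine a handful of flags (-Ft, -s, -so, -o, -batch, -c). A hedged Python sketch of a command builder follows; it uses only flags that appear on these slides, the function and parameter names are hypothetical, and it merely assembles the argument list rather than contacting a GRAM service.

```python
def submit_command(executable, args=(), factory_type=None, stream=False,
                   stdout_file=None, epr_file=None, batch=False):
    """Build the argv list for `globusrun-ws -submit`."""
    cmd = ["globusrun-ws", "-submit"]
    if batch:
        cmd.append("-batch")            # return right after submission
    if factory_type:
        cmd += ["-Ft", factory_type]    # e.g. "Fork" or "SGE"
    if stream:
        cmd.append("-s")                # stream job output to the terminal
    if stdout_file:
        cmd += ["-so", stdout_file]     # write streamed stdout to a file
    if epr_file:
        cmd += ["-o", epr_file]         # save the job EPR for later use
    # -c goes last: everything after it is the command line of the job
    cmd += ["-c", executable, *args]
    return cmd


if __name__ == "__main__":
    print(" ".join(submit_command("/bin/hostname",
                                  factory_type="SGE", stream=True)))
```

Keeping `-c` at the end matters because the job's own arguments follow it, as in the `-c /bin/sleep 50` batch submission elsewhere in the deck.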
Batch Job Submissions
    % globusrun-ws -submit -batch -o job_epr -c /bin/sleep 50
    Submitting job...Done.
    Job ID: uuid:f9544174-60c5-11d9-97e3-0002a5ad41e5
    Termination time: 01/08/2005 16:05 GMT
    % globusrun-ws -status -j job_epr
    Current job state: Active
    % globusrun-ws -status -j job_epr
    Current job state: Done
    % globusrun-ws -kill -j job_epr
    Requesting original job description...Done.
    Destroying job...Done.

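With batch submission the client has to poll for the job state itself. The following Python sketch parses the "Current job state:" lines shown above and polls until a terminal state is reached. It is an illustration under assumptions: the helper names are hypothetical, "Failed" is assumed to be a terminal state alongside "Done", and `fetch_status` is any callable returning the text printed by `globusrun-ws -status -j job_epr`.

```python
import re
import time

STATE_RE = re.compile(r"Current job state:\s*([A-Za-z-]+)")


def job_states(output):
    """Return every job state mentioned in globusrun-ws output, in order."""
    return STATE_RE.findall(output)


def last_state(output):
    """Return the most recent job state, or None if none was printed."""
    states = job_states(output)
    return states[-1] if states else None


def wait_for_done(fetch_status, interval=5.0, max_polls=100):
    """Poll until the job is finished; returns the final state or None.

    Assumption: "Done" and "Failed" are the terminal states.
    """
    for _ in range(max_polls):
        state = last_state(fetch_status())
        if state in ("Done", "Failed"):
            return state
        time.sleep(interval)
    return None


if __name__ == "__main__":
    sample = (
        "Current job state: Active\n"
        "Current job state: CleanUp-Hold\n"
        "Wed Jun 13 14:07:54 EDT 2007\n"
        "Current job state: Done\n"
    )
    print(job_states(sample))
```

On a machine with the Globus client installed, `fetch_status` could wrap `subprocess.run(["globusrun-ws", "-status", "-j", "job_epr"], capture_output=True, text=True).stdout`.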
Complete Factory Contact
• Override default EPR
  • Select a different host/service
• Use "contact" shorthand for convenience
  • Relies on proprietary knowledge of EPR format!
• Command example:
    % globusrun-ws -submit -F gcb.fiu.edu -c /bin/date

Read RSL from File
• Command:
    % globusrun-ws -submit -f touch.xml
• Contents of touch.xml file:
    <job>
      <executable>/bin/touch</executable>
      <argument>touched_it</argument>
    </job>

Resource Specification Language (RSL)
• RSL is the language used by clients to submit a job.
• All job submission requests are described in RSL, including the executable file and arguments.
• You can specify the type and capabilities of resources to execute your job.
• You can also coordinate stage-in and stage-out operations through RSL.

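Since an RSL job description is plain XML, it can be generated programmatically instead of written by hand. The sketch below builds a description using only element names that appear in this deck (job, executable, directory, argument, stdout, stderr); the function names are hypothetical and no validation against the real GRAM schema is attempted.

```python
import xml.etree.ElementTree as ET


def build_job(executable, arguments=(), directory=None,
              stdout=None, stderr=None):
    """Build an RSL <job> element for the given executable."""
    job = ET.Element("job")
    ET.SubElement(job, "executable").text = executable
    if directory:
        ET.SubElement(job, "directory").text = directory
    for argument in arguments:
        # One <argument> element per command-line argument
        ET.SubElement(job, "argument").text = argument
    if stdout:
        ET.SubElement(job, "stdout").text = stdout
    if stderr:
        ET.SubElement(job, "stderr").text = stderr
    return job


def to_xml(job):
    """Serialize the job element to a string without an XML declaration."""
    return ET.tostring(job, encoding="unicode")


if __name__ == "__main__":
    # Reproduces the touch.xml example as a single line of XML
    print(to_xml(build_job("/bin/touch", ["touched_it"])))
```

The string could be written to a file and submitted with `globusrun-ws -submit -f`, as in the touch.xml example.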
Common/useful options
• globusrun-ws -J
  • Perform delegation as necessary for job
• globusrun-ws -S
  • Perform delegation as necessary for job's file staging
• globusrun-ws -s
  • Stream stdout/err during job execution to the terminal
• globusrun-ws -self
  • Useful for testing, when you have started the service using your credentials instead of host credentials

Staging job
    <job>
      <executable>/bin/echo</executable>
      <directory>/tmp</directory>
      <argument>Hello</argument>
      <stdout>job.out</stdout>
      <stderr>job.err</stderr>
      <fileStageOut>
        <transfer>
          <sourceUrl>file:///tmp/job.out</sourceUrl>
          <destinationUrl>gsiftp://host.domain:2811/tmp/stage.out</destinationUrl>
        </transfer>
      </fileStageOut>
    </job>

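To check what a staging job will actually move, the description can be parsed and its transfers listed. This is a minimal sketch that assumes the un-namespaced element names shown on the slide (real RSL documents may carry XML namespaces, which this example does not handle); `stage_out_transfers` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# The staging job from the slide, including whitespace inside an element
RSL = """
<job>
  <executable>/bin/echo</executable>
  <directory>/tmp</directory>
  <argument>Hello</argument>
  <stdout>job.out</stdout>
  <stderr>job.err</stderr>
  <fileStageOut>
    <transfer>
      <sourceUrl>file:///tmp/job.out</sourceUrl>
      <destinationUrl>
        gsiftp://host.domain:2811/tmp/stage.out
      </destinationUrl>
    </transfer>
  </fileStageOut>
</job>
"""


def stage_out_transfers(rsl_text):
    """Return (source, destination) URL pairs from the fileStageOut block."""
    job = ET.fromstring(rsl_text.strip())
    transfers = []
    for transfer in job.findall("./fileStageOut/transfer"):
        # strip() removes line breaks and indentation around the URLs
        source = transfer.findtext("sourceUrl", default="").strip()
        destination = transfer.findtext("destinationUrl", default="").strip()
        transfers.append((source, destination))
    return transfers


if __name__ == "__main__":
    for source, destination in stage_out_transfers(RSL):
        print(source, "->", destination)
```

Here the job's stdout, written to /tmp/job.out on the execution host, is the source of the stage-out and a GridFTP URL is the destination.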
RSL Variable
• Enables late binding of values
  • Values resolved by GRAM service
• System-specific variables
  • ${GLOBUS_USER_HOME}
  • ${GLOBUS_LOCATION}
  • ${GLOBUS_SCRATCH_DIR}
    • Alternative directory that is shared with compute node
    • Typically providing more space than user's HOME dir

RSL Variable Example
    <job>
      <executable>/bin/echo</executable>
      <argument>HOME is ${GLOBUS_USER_HOME}</argument>
      <argument>SCRATCH = ${GLOBUS_SCRATCH_DIR}</argument>
      <argument>GL is ${GLOBUS_LOCATION}</argument>
      <stdout>${GLOBUS_USER_HOME}/echo.stdout</stdout>
      <stderr>${GLOBUS_USER_HOME}/echo.stderr</stderr>
    </job>
!!!/tmp/rslExample

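Late binding means the `${...}` placeholders stay in the job description until the GRAM service fills them in on the target system. The effect can be previewed locally with a rough Python sketch. This only imitates the substitution and is not how GRAM resolves variables; the values in `EXAMPLE_VALUES` are invented for illustration.

```python
from string import Template

# Hypothetical values; the real ones are chosen by the GRAM service
EXAMPLE_VALUES = {
    "GLOBUS_USER_HOME": "/home/alice",
    "GLOBUS_SCRATCH_DIR": "/scratch/alice",
}


def resolve(text, values):
    """Replace ${NAME} placeholders, leaving unknown names untouched."""
    return Template(text).safe_substitute(values)


if __name__ == "__main__":
    for line in (
        "HOME is ${GLOBUS_USER_HOME}",
        "SCRATCH = ${GLOBUS_SCRATCH_DIR}",
        "GL is ${GLOBUS_LOCATION}",   # not in EXAMPLE_VALUES, stays as-is
        "${GLOBUS_USER_HOME}/echo.stdout",
    ):
        print(resolve(line, EXAMPLE_VALUES))
```

Leaving unknown names untouched mirrors the point of late binding: a value that is not known on the client is simply deferred.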