
Chapter 4: Resource Management

Gong Bin (龚斌), School of Computer Science and Technology, Shandong University; Shandong High Performance Computing Center. Topics: Globus and the Resource Specification Language (RSL); resource management in Globus.



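  1. Chapter 4: Resource Management. Gong Bin, School of Computer Science and Technology, Shandong University; Shandong High Performance Computing Center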

  2. Globus and the Resource Specification Language (RSL)

  3. Resource management in Globus

  4. [Figure: Globus RMS. Labels: Application, RSL, Broker (RSL specialization), Information Service (Queries & Info), Ground RSL, Co-allocator, Simple ground RSL, GRAM (three instances), local resource managers LSF, Condor, SGEEE]

  5. [Figure: Globus Components In Action. Local Machine: X509 User Cert, grid-proxy-init, User Proxy Cert, mpirun, globusrun, RSL string, RSL parser, Machines, RSL multi-request, DUROC, RSL single request, GRAM Clients (GSI), GASS Server (GSI). Each Remote Machine: GRAM Gatekeeper (GSI), GRAM Job Manager, GASS Client, PBS or Unix Fork, App, Nexus, AIX MPI or Solaris MPI]

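  6. GRAM (Globus Resource Allocation Manager) Overview • Position: the lowest layer of resource management • Function: runs jobs remotely; jobs are submitted, monitored, and terminated through the provided API • Responsibilities of GRAM • Processes job requests expressed in the Resource Specification Language (RSL) • Remotely monitors and manages the jobs it creates • Updates the information held in MDS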

  7. [Figure: Globus Pre-WS Component Interaction Diagram] • GRAM: Grid Resource Allocation Manager • GASS: Global Access to Secondary Storage • MDS: Monitoring and Discovery Service • GRIS: Grid Resource Information Service • GIIS: Grid Index Information Service • From IBM Redbook SG24-6895-01 (2003): Intro to Grid Computing

  8. GRAM • Service that provides remote execution and status management of the request • When a job is submitted by a client, the request is sent to the remote host and handled by the gatekeeper daemon located in the remote host. • Then the gatekeeper creates a job manager to start and monitor the job. • When the job is finished, the job manager sends the status information back to the client and terminates.

  9. [Figure: GRAM Architecture] From IBM Redbook SG24-6895-01 (2003): Intro to Grid Computing

  10. GRAM Elements • Clients • Gatekeeper daemon • Job Manager • Global Access to Secondary Storage (GASS) • Dynamically-Updated Request Online Coallocator (DUROC) • User Resource Specification Language (RSL)

  11. GRAM Clients • Three clients: globusrun, globus-job-run, globus-job-submit

  12. [Figure: GRAM management flow. Labels: Client API, Job Request, state change callback, Job cancel, Gatekeeper, fork/su/exec, Job Manager, Scheduler Specific Plugin (fork/exec/wait, spsubmit/spq, condor, lsf), Job Process]

  13. Role of the gatekeeper • Gatekeeper: a process, running as root, that begins the handling of allocation requests by • performing mutual authentication of user and resource, • determining a local user name for the remote user, • starting a job manager that executes as that local user and actually handles the request. • To start the job manager, the gatekeeper must run as a privileged program.

  14. Glossary of related terms • Resource: an entity capable of running one or more processes on behalf of a user • Client: the process that uses the resource allocation client-side API • Job: a process or set of processes resulting from a job request • Job Request: a request to the gatekeeper to create one or more job processes, expressed in the supplied Resource Specification Language • Job Manager: the gatekeeper creates one job manager for every request submitted to it

  15. [Figure: GRAM scheduling and state transition model]

  16. Explanation of each state • Unsubmitted: The job has not yet been submitted to the scheduler. • StageIn: The job manager is staging executable, input, or data files to the job. • Pending: The job has been submitted to the scheduler, but resources have not yet been allocated for the job. • Active: The job has received all of its resources, and the application is executing. • Suspended: The job has been stopped temporarily by the scheduler. • StageOut: The job manager is staging output files from the job manager host to remote storage. • Done: The job completed successfully. • Failed: The job terminated before completion, as a result of an error or of a user or system cancel.
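
The states listed above can be modelled as a small state machine. The sketch below is illustrative only: the state names come from the slide, but the transition table is an assumption inferred from the state descriptions, because the original state diagram is not reproduced in this transcript. `advance` and `is_terminal` are hypothetical helper names, not part of any Globus API.

```python
from enum import Enum


class JobState(Enum):
    UNSUBMITTED = "Unsubmitted"
    STAGE_IN = "StageIn"
    PENDING = "Pending"
    ACTIVE = "Active"
    SUSPENDED = "Suspended"
    STAGE_OUT = "StageOut"
    DONE = "Done"
    FAILED = "Failed"


# Assumed transitions, inferred from the state descriptions; the original
# state diagram is not available here, so treat this table as a guess.
TRANSITIONS = {
    JobState.UNSUBMITTED: {JobState.STAGE_IN, JobState.PENDING, JobState.FAILED},
    JobState.STAGE_IN: {JobState.PENDING, JobState.FAILED},
    JobState.PENDING: {JobState.ACTIVE, JobState.FAILED},
    JobState.ACTIVE: {
        JobState.SUSPENDED,
        JobState.STAGE_OUT,
        JobState.DONE,
        JobState.FAILED,
    },
    JobState.SUSPENDED: {JobState.ACTIVE, JobState.FAILED},
    JobState.STAGE_OUT: {JobState.DONE, JobState.FAILED},
    JobState.DONE: set(),
    JobState.FAILED: set(),
}


def advance(current: JobState, new: JobState) -> JobState:
    """Return the new state, or raise if the move is not in the assumed table."""
    if new not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.value} -> {new.value}")
    return new


def is_terminal(state: JobState) -> bool:
    """Done and Failed have no outgoing transitions."""
    return not TRANSITIONS[state]
```

A client that receives state-change callbacks could use such a table to reject out-of-order notifications, for example a report of Active after Done.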

  17. [Figure: GRAM Components. Labels: Client; MDS client API calls to locate resources (MDS: Grid Index Info Server); MDS client API calls to get resource info (MDS: Grid Resource Info Server); GRAM client API calls to request resource allocation and process creation; GRAM client API state change callbacks; Site boundary; Globus Security Infrastructure; Gatekeeper; Create; Job Manager; Parse; RSL Library; Request; Local Resource Manager; Allocate & create processes; Monitor & control; Query current status of resource; Process (three instances)]

  18. DUROC(Dynamically-Updated Request Online Co-allocator) • Simultaneous allocation of a resource set • Handled via optimistic co-allocation based on free nodes or queue prediction • advance reservations will also be supported • globusrun will co-allocate specific multi-requests using DUROC

  19. GRAM Examples. The globus-job-run client is a sample GRAM client, using command-line arguments rather than RSL.
      % globus-job-run pitcairn.mcs.anl.gov /bin/ls
      % globus-job-run pitcairn.mcs.anl.gov -s myprog
      % globus-job-run pitcairn.mcs.anl.gov \
            -s myprog -stdin -s in.txt -stdout -s out.txt

  20. GRAM Examples. The globusrun client is a more involved prototype that allows complicated RSL expressions.
      % globusrun -r pitcairn.mcs.anl.gov -f myjob.rsl
      % globusrun -r pitcairn.mcs.anl.gov \
            '&(executable=myprog)'
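
When globusrun is driven from a script, passing the arguments as a list avoids shell-quoting problems with RSL, which is full of `&`, `(` and `)`. The following is a minimal sketch that only builds the argument vector; it does not run anything. It uses just the `-r` and `-f` options shown on the slide, and `globusrun_argv` is a hypothetical helper name.

```python
import shlex


def globusrun_argv(contact, rsl=None, rsl_file=None):
    """Build the argument vector for globusrun; pass exactly one of rsl/rsl_file."""
    if (rsl is None) == (rsl_file is None):
        raise ValueError("pass exactly one of rsl or rsl_file")
    argv = ["globusrun", "-r", contact]
    if rsl_file is not None:
        argv += ["-f", rsl_file]  # RSL read from a file
    else:
        argv.append(rsl)  # RSL given inline as a single argument
    return argv


inline = globusrun_argv("pitcairn.mcs.anl.gov", rsl="&(executable=myprog)")
from_file = globusrun_argv("pitcairn.mcs.anl.gov", rsl_file="myjob.rsl")

# shlex.join renders a copy-pasteable shell command with safe quoting.
print(shlex.join(inline))
print(shlex.join(from_file))
```

The list form can be handed to `subprocess.run` unchanged on a machine where globusrun is installed.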

  21. Resource Management APIs • Globus Toolkit has APIs for RSL, GRAM, and DUROC: • globus_rsl • globus_gram_client • globus_gram_myjob • globus_duroc_control • globus_duroc_runtime

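  22. Resource Specification Language • A general-purpose language for stating job requirements • RSL is a core part of GRAM: it is the means by which components exchange information, for example between an application and a resource broker, or between the resource co-allocator and the resource managers • Form • (attribute=value) • GRAM must understand the attributes used • Globus provides an API for working with RSL • RSL can also be used in settings beyond those listed above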

  23. Some RSL attributes • (executable=string) • Program to run • A file path (absolute or relative) or URL • (directory=string) • Directory in which to run (default is $HOME) • (arguments=arg1 arg2 arg3...) • List of string arguments to program • (environment=(E1 v1)(E2 v2)) • List of environment variable name/value pairs

  24. Some RSL attributes • (stdin=string) • Stdin for program • A file path (absolute or relative) or URL • (stdout=string) • Stdout for program • A file path (absolute or relative) or URL • (stderr=string) • Stderr for program • A file path (absolute or relative) or URL • (count=integer) • Number of processes to run (default is 1) • (hostCount=integer) • On SMP multi-computers, number of nodes to distribute the "count" processes across • (project=string) • Project (account) against which to charge • (queue=string) • Queue into which to submit job

  25. Some RSL attributes • (maxTime=integer) • Maximum wall clock or CPU runtime (scheduler's choice) in minutes • (maxWallTime=integer) • Maximum wall clock runtime in minutes • (maxCpuTime=integer) • Maximum CPU runtime in minutes • (maxMemory=integer) • Maximum amount of memory for each process in megabytes • (minMemory=integer) • Minimum amount of memory for each process in megabytes
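
The `(attribute=value)` form described in the preceding slides is easy to generate programmatically. The sketch below assembles a single-request RSL string from a Python dictionary. It is not the `globus_rsl` API: `build_rsl` and `rsl_value` are hypothetical helpers, and the quoting rule (double quotes, with an embedded quote doubled) is a simplified reading of RSL quoting that treats every value as a literal.

```python
def rsl_value(value):
    """Render one RSL value; quote strings that contain spaces or RSL syntax."""
    text = str(value)
    if text == "" or any(ch in text for ch in ' ()"=&+|<>!#$\'^'):
        return '"' + text.replace('"', '""') + '"'
    return text


def build_rsl(attrs):
    """Build a single-request RSL string: &(name=value)(name=value)..."""
    parts = []
    for name, value in attrs.items():
        if isinstance(value, dict):
            # Name/value pairs, e.g. environment=(E1 v1)(E2 v2)
            rendered = "".join(
                f"({rsl_value(k)} {rsl_value(v)})" for k, v in value.items()
            )
        elif isinstance(value, (list, tuple)):
            # Value sequences, e.g. arguments=arg1 arg2 arg3
            rendered = " ".join(rsl_value(v) for v in value)
        else:
            rendered = rsl_value(value)
        parts.append(f"({name}={rendered})")
    return "&" + "".join(parts)


job = build_rsl({
    "executable": "/bin/ls",
    "arguments": ["-l", "/tmp"],
    "environment": {"LANG": "C"},
    "count": 2,
    "maxWallTime": 10,
})
print(job)
# &(executable=/bin/ls)(arguments=-l /tmp)(environment=(LANG C))(count=2)(maxWallTime=10)
```

Keeping arguments as a list rather than one space-separated string makes it explicit which tokens are separate arguments and which belong to one quoted value.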

  26. RSL Attributes for GRAM • (jobType=value) • Value is one of "mpi", "single", "multiple", or "condor" • mpi: Run the program using "mpirun -np <count>" • single: Only run a single instance of the program, and let the program start the other count-1 processes. • multiple: Start <count> instances of the program using the appropriate scheduler mechanism • condor: Start <count> Condor processes running in "standard universe"
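
The difference between the job types is easiest to see as the set of command lines that end up being started. The sketch below is a simplified model based only on the descriptions above, not on the job manager's actual code: `launch_plan` is a hypothetical name, the bare `mpirun` path is an assumption, and the "condor" type is left out because the slide gives too little detail to model it.

```python
def launch_plan(job_type, executable, count=1):
    """Return the command lines a job manager might start (assumed model)."""
    if count < 1:
        raise ValueError("count must be at least 1")
    if job_type == "mpi":
        # One launcher invocation that itself starts <count> processes.
        return [["mpirun", "-np", str(count), executable]]
    if job_type == "single":
        # One instance; the program starts the other count-1 processes itself.
        return [[executable]]
    if job_type == "multiple":
        # <count> independent instances of the same program.
        return [[executable] for _ in range(count)]
    raise ValueError(f"unsupported jobType: {job_type}")


for kind in ("mpi", "single", "multiple"):
    print(kind, launch_plan(kind, "myprog", count=3))
```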

  27. RSL Attributes for GRAM • (gramMyjob=value) • Value is one of “collective”, “independent” • Defines how the globus_gram_myjob library will operate on the <count> processes • collective: Treat all <count> processes as part of a single job • independent: Treat each of the <count> processes as an independent uniprocessor job • (dryRun=true) • Do not actually run job

  28. RSL substitutions • RSL supports simple variable substitutions • Substitutions are declared using a list of pairs • (rslSubstitution=(SUB1 val1)(SUB2 val2)) • A substitution is invoked with $(SUB) • Processing order: • Within a scope, processed left to right • Outer scope processed before inner scope • A variable definition can reference previously defined variables

  29. Substitution example • This:
      &(rslSubstitution=(URLBASE "ftp://host:1234"))
       (rslSubstitution=(URLDIR $(URLBASE)/dir))
       (executable=$(URLDIR)/myfile)
      is equivalent to this:
      &(executable=ftp://host:1234/dir/myfile)
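
The left-to-right rule, where a definition may refer only to earlier definitions, can be reproduced in a few lines. This is a minimal sketch of the expansion step alone: it works on already-separated name/value pairs, does not parse RSL text, and ignores nested scopes. `expand` and `resolve` are hypothetical helper names.

```python
import re

# Matches a substitution reference such as $(URLBASE)
_REF = re.compile(r"\$\(([A-Za-z_][A-Za-z0-9_]*)\)")


def expand(text, table):
    """Replace every $(NAME) in text with its value from table."""
    def lookup(match):
        name = match.group(1)
        if name not in table:
            raise KeyError(f"undefined substitution: {name}")
        return table[name]
    return _REF.sub(lookup, text)


def resolve(definitions, attributes):
    """Process definitions left to right, then expand the attribute values."""
    table = {}
    for name, value in definitions:
        # A definition may only use names that were defined before it.
        table[name] = expand(value, table)
    return {attr: expand(value, table) for attr, value in attributes.items()}


result = resolve(
    [("URLBASE", "ftp://host:1234"), ("URLDIR", "$(URLBASE)/dir")],
    {"executable": "$(URLDIR)/myfile"},
)
print(result["executable"])  # ftp://host:1234/dir/myfile
```

Swapping the two definitions makes `resolve` raise `KeyError`, which mirrors the rule that only previously defined variables can be referenced.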

  30. GRAM Defined RSL Substitutions • GRAM defines a set of RSL substitutions before processing the job request • Machine Information • GLOBUS_HOST_MANUFACTURER • GLOBUS_HOST_CPUTYPE • GLOBUS_HOST_OSNAME • GLOBUS_HOST_OSVERSION

  31. GRAM Defined RSL Substitutions • Paths to Globus • GLOBUS_INSTALL_PATH • GLOBUS_TOOLS_PATH • GLOBUS_SERVICES_PATH • GLOBUS_DEPLOY_PATH • Miscellaneous • HOME • LOGNAME • GLOBUS_ID

  32. RSL attributes for DUROC • (subjobStartType=value) • Alters the startup barrier mechanism • Values are "strict-barrier", "loose-barrier", "no-barrier" • (subjobCommsType=value) • Values are "blocking-join" and "independent" • If the value is set to "independent", the subjob won't be seen by the other subjobs during inter-subjob communication. • (label=string) • Identifier for this subjob • (resourceManagerContact=string)(resourceManagerName=string) • Resource manager to which to submit a subjob
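
A co-allocation request combines several subjobs, each carrying its own `resourceManagerContact`, into one multi-request. The sketch below assembles such a string using the attributes listed above. The leading `+` that joins the subjobs is standard RSL multi-request syntax but does not appear on the slides; the host names are placeholders, and `subjob` and `multi_request` are hypothetical helpers.

```python
def subjob(contact, executable, count=1, label=None, start_type=None):
    """Render one subjob as an RSL conjunction starting with '&'."""
    parts = [f'(resourceManagerContact="{contact}")', f"(count={count})"]
    if label is not None:
        parts.append(f'(label="{label}")')
    if start_type is not None:
        parts.append(f"(subjobStartType={start_type})")
    parts.append(f"(executable={executable})")
    return "&" + "".join(parts)


def multi_request(subjobs):
    """Join subjobs into a multi-request: +(&...)(&...)."""
    if not subjobs:
        raise ValueError("a multi-request needs at least one subjob")
    return "+" + "".join(f"({s})" for s in subjobs)


request = multi_request([
    subjob("host1.example.org", "myprog", count=2,
           label="subjob 0", start_type="strict-barrier"),
    subjob("host2.example.org", "myprog", count=2,
           label="subjob 1", start_type="strict-barrier"),
])
print(request)
```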

  33. Example (single resource for now…):
      $ globusrun -r chi/jobmanager-pbs '&
          (executable="/home/abose/test.exe")
          (host_count=2)
          (count=4)
          (arguments="-t 100 -f out.dat")
          (email_address="abose@umich.edu")
          (queue="cac")
          (pbs_stagein="morpheus:/home/abose/test.exe")
          (pbs_stageout="morpheus:/home/abose/out.dat")
          (pbs_stdout="/tmp/stdout")
          (pbs_stderr="/tmp/stderr")
          (maxwalltime=10)(jobtype="mpi")'
      "Get test.exe from morpheus and run it on hypnos": submitted by the Globus gatekeeper on chi using the PBS job manager

  34. RSL Example: Resulting PBS Submission Script on Hypnos
      #! /bin/sh
      # PBS batch job script built by Globus job manager
      #
      #PBS -S /bin/sh
      #PBS -M abose@umich.edu
      #PBS -m n
      #PBS -q cac
      #PBS -W stagein=/home/abose/test.exe@morpheus.engin.umich.edu:/home/abose/test.exe
      #PBS -W stageout=/home/abose/out.dat@morpheus.engin.umich.edu:/home/abose/out.dat
      #PBS -l walltime=10:00
      #PBS -o hypnos:/tmp/stdout
      #PBS -e hypnos:/tmp/stderr
      #PBS -l nodes=2
      #PBS -v X509_USER_PROXY=/home/abose/.globus/.gass_cache/local/md5/1c/fd/d3/753b9028dfec2ddd6df84cd06c/md5/0a/4b/1d/599dac54863d650c2531cb92fc/data,GLOBUS_LOCATION=/usr/grid,GLOBUS_GRAM_JOB_CONTACT=https://chi.grid.umich.edu:58963/575/1047861360/,GLOBUS_GRAM_MYJOB_CONTACT=URLx-nexus://chi.grid.umich.edu:58964/,HOME=/home/abose,LOGNAME=abose,LD_LIBRARY_PATH=
      # Change to directory requested by user
      cd /home/abose
      /usr/gmpi.pgi/bin/mpirun -np 4 /home/abose/test.exe -t 100 -f out.dat
      Slides taken from NPACI Training, 2003
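
Comparing the request with the generated script suggests a fairly direct mapping from RSL attributes to `#PBS` directives. The sketch below reproduces a small part of that mapping. It is a guess based only on the two slides, not on the job manager's source: `pbs_directives` is a hypothetical name, and staging, the stdout and stderr host prefixes, and the exported environment are omitted.

```python
def pbs_directives(rsl):
    """Translate a few RSL-style attributes into script lines (assumed mapping)."""
    lines = ["#! /bin/sh", "#PBS -S /bin/sh"]
    if "email_address" in rsl:
        lines.append(f"#PBS -M {rsl['email_address']}")
    if "queue" in rsl:
        lines.append(f"#PBS -q {rsl['queue']}")
    if "maxwalltime" in rsl:
        # maxwalltime is given in minutes; the slide renders 10 as 10:00.
        lines.append(f"#PBS -l walltime={int(rsl['maxwalltime'])}:00")
    if "host_count" in rsl:
        lines.append(f"#PBS -l nodes={int(rsl['host_count'])}")
    command = [rsl["executable"]] + list(rsl.get("arguments", []))
    if rsl.get("jobtype") == "mpi":
        # Plain "mpirun" is an assumption; the slide uses a site-specific path.
        command = ["mpirun", "-np", str(rsl.get("count", 1))] + command
    lines.append(" ".join(command))
    return lines


script = pbs_directives({
    "executable": "/home/abose/test.exe",
    "arguments": ["-t", "100", "-f", "out.dat"],
    "host_count": 2,
    "count": 4,
    "email_address": "abose@umich.edu",
    "queue": "cac",
    "maxwalltime": 10,
    "jobtype": "mpi",
})
print("\n".join(script))
```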

  35. Programming with Globus API • Command-line programs are named grid-* or globus-* • Function calls/APIs start with globus_* • Library binaries start with libglobus_*.a • Includes: • #include <globus_common.h> //defines most common data structures • and others depending on which modules/functions are called in the program. • Module Activation/Deactivation: • Functions are arranged in several modules. The corresponding module must be activated before one of its functions is called: • globus_module_activate(MODULE_NAME) • globus_module_deactivate(MODULE_NAME) • globus_module_deactivate_all() • GLOBUS_SUCCESS (0) is returned if successful. • Example Module Names: • GLOBUS_GRAM_CLIENT_MODULE • GLOBUS_IO_MODULE • GLOBUS_GASS_COPY_MODULE • Dependencies among module activations exist. Read the API documentation.

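  36. Evaluation • Strengths: • Adds a description of the resources a job needs • Defines many attributes and supports several resource management mechanisms, such as GRAM and DUROC • Weaknesses: • Still concentrates on describing computing resources and resource requests, so its coverage is not broad enough • Poor extensibility • Currently used only by Globus and not yet supported by other Grid projects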

  37. Web Services Description Language (WSDL)

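  38. WSDL • Web Services Description Language • Describes the technical invocation syntax of a Web service. • WSDL defines an XML-based grammar that describes a Web service as a collection of service access points (endpoints) capable of exchanging messages, which is how it meets this need. • A WSDL service definition gives distributed systems machine-readable, SDK-style documentation, and it can describe the details involved in automating communication between applications. • The current version of WSDL (as of these slides) is 1.1; the specification is available at http://www.w3.org/TR/wsdl.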

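  39. WSDL • WSDL was proposed by vendors including Ariba, Intel, IBM, and Microsoft. • It defines the operations and messages that a given Web service sends and receives in an abstract way that is independent of any programming language. • WSDL stays protocol-neutral, but it has built-in support for binding to SOAP, which ties it closely to SOAP.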

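  40. The WSDL information model • The WSDL information model relies on separating an abstract specification from its concrete realization, that is, separating the service interface definition (abstract interface) from the service implementation definition (concrete endpoint). • The abstract interface specification describes what an endpoint can do; in WSDL it is expressed as a portType. The binding mechanism, expressed in WSDL as the binding element, maps the abstract definition of the Web service to a specific implementation by fixing a communication protocol, a data encoding model, and an underlying transport. Once a binding is combined with the access address of an implementation, the abstract endpoint becomes a concrete endpoint that service requesters can invoke; the WSDL port element represents this combination. • An abstract interface can support any number of operations. An operation is defined by a set of messages, and the messages define the interaction pattern of the operation. The concrete realization of the abstract messages and operations is specified by the binding element. Like other XML applications, the WSDL schema defines several high-level, or major, elements.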

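  41. Basic properties described by WSDL • What the service does: the operations (methods) the service provides. • How the service is accessed: details of the data formats and the protocols needed to reach the service's operations. • Where the service is located: a protocol-specific network address, such as a URL.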

  42. [Figure: Meaning of the basic WSDL elements]

  43. [Figure: WSDL information model]

  44. [Figure: WSDL object structure diagram]

  45. [Figure: WSDL document types]

  46. [Figure: WSDL document structure. Elements shown: types, message, operation, portType, binding, port, service]
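
The split between abstract interface (`portType`, `operation`, `message`) and concrete endpoint (`binding`, `port`, `service`) shows up directly when a WSDL 1.1 document is read with a generic XML parser. The sketch below answers two of the three questions from slide 41, what the service does and where it lives, using only Python's standard library. The Echo service, its names, and the example.org address are invented for illustration; only the two namespace URIs are fixed by WSDL 1.1 and its SOAP binding.

```python
import xml.etree.ElementTree as ET

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"
SOAP_NS = "http://schemas.xmlsoap.org/wsdl/soap/"

# A made-up, minimal WSDL 1.1 document for a hypothetical Echo service.
DOC = """<definitions name="Echo"
    targetNamespace="http://example.org/echo"
    xmlns="http://schemas.xmlsoap.org/wsdl/"
    xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/"
    xmlns:tns="http://example.org/echo"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <message name="EchoRequest"><part name="text" type="xsd:string"/></message>
  <message name="EchoResponse"><part name="text" type="xsd:string"/></message>
  <portType name="EchoPortType">
    <operation name="echo">
      <input message="tns:EchoRequest"/>
      <output message="tns:EchoResponse"/>
    </operation>
  </portType>
  <binding name="EchoBinding" type="tns:EchoPortType">
    <soap:binding style="rpc" transport="http://schemas.xmlsoap.org/soap/http"/>
    <operation name="echo">
      <soap:operation soapAction=""/>
      <input><soap:body use="literal" namespace="http://example.org/echo"/></input>
      <output><soap:body use="literal" namespace="http://example.org/echo"/></output>
    </operation>
  </binding>
  <service name="EchoService">
    <port name="EchoPort" binding="tns:EchoBinding">
      <soap:address location="http://example.org/echo"/>
    </port>
  </service>
</definitions>"""


def summarize(wsdl_text):
    """Return (portType, operation) pairs and (port, binding, address) triples."""
    root = ET.fromstring(wsdl_text)
    ns = {"w": WSDL_NS, "s": SOAP_NS}
    operations = [
        (pt.get("name"), op.get("name"))
        for pt in root.findall("w:portType", ns)
        for op in pt.findall("w:operation", ns)
    ]
    endpoints = [
        (port.get("name"), port.get("binding"), addr.get("location"))
        for port in root.findall("w:service/w:port", ns)
        for addr in port.findall("s:address", ns)
    ]
    return operations, endpoints


operations, endpoints = summarize(DOC)
print(operations)
print(endpoints)
```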

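  47. WSDL tools • Omniopera: a graphical editor for WSDL, XML, and XSD. • Microsoft SOAP Toolkit: a toolkit that includes a wizard for creating COM interfaces from WSDL definitions and a wizard for creating WSDL from COM interfaces. • IBM Web Services Toolkit: a toolkit that includes wizards for generating WSDL and SOAP deployment descriptors.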

  48. Resource Description Framework (RDF)

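  49. RDF • Resource Description Framework (RDF) • The goal of the W3C's RDF is to provide a standard for accessing metadata about network resources, and with it a standard way of describing the content of a particular resource. • The W3C's recommended standard for applying metadata • A model, with one or more syntaxes • When used on the Web, RDF is usually encoded in XML • The foundation and underpinning of the Semantic Web • Source: W3C, Resource Description Framework (RDF), http://www.w3.org/RDF/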

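  50. RDF • A language for expressing information about resources on the World Wide Web. • Designed specifically for expressing metadata about Web resources, such as the title, author, and modification time of a Web page, the copyright and licensing information of a Web document, or the availability schedule of a shared resource.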
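
RDF metadata of the kind listed above reduces to subject, predicate, object triples. The sketch below stores a few such statements about a Web page as plain Python tuples and queries them. It illustrates the data model only and is not an RDF library: the page URL and the values are invented, and the Dublin Core element URIs used as predicates are an assumption, since the slides do not name a vocabulary.

```python
# Dublin Core element namespace, used here as an example vocabulary.
DC = "http://purl.org/dc/elements/1.1/"
PAGE = "http://www.example.org/index.html"

# Each statement is a (subject, predicate, object) triple.
triples = {
    (PAGE, DC + "title", "Example home page"),
    (PAGE, DC + "creator", "A. Author"),
    (PAGE, DC + "date", "2003-01-01"),
}


def objects(graph, subject, predicate):
    """Return all objects for a subject/predicate pair, sorted for stable output."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


def describe(graph, subject):
    """Return every (predicate, object) pair known about one resource."""
    return sorted((p, o) for s, p, o in graph if s == subject)


print(objects(triples, PAGE, DC + "title"))
for predicate, value in describe(triples, PAGE):
    print(predicate, "=", value)
```

Because every statement has the same three-part shape, new kinds of metadata can be added without changing the structure; only new predicates are needed.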
