This project aims to build a platform for academic research using grid computing technologies in Taiwan. The system integrates various resources within a large network environment and allows collaboration for large-scale applications.
Taiwan UniGrid Yeh-Ching Chung Department of Computer Science National Tsing Hua University Hsin-Chu, 300, Taiwan
Outline • Introduction • Portal • Broker and Scheduler • Resource Information Service • Storage Service • Applications • Conclusion
Introduction (1) • The purpose of grid computing is to integrate various resources within a large network environment. • The purpose of the UniGrid project is to build a platform for academic research using grid-related technologies in Taiwan.
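Introduction (2) • Nine institutes are jointly developing the system • National Center for High-Performance Computing (NCHC) • Department of Computer Science, National Tsing Hua University • Institute of Information Science, Academia Sinica • Department of Computer Science and Information Engineering, National Dong Hwa University • Department of Computer Science, Tunghai University • Department of Computer Science and Information Engineering, Chung Hua University • Department of Information Management, Providence University • Department of Electronic Commerce, Hsing Kuo University of Management • Department of Atmospheric Sciences, National Taiwan University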
Introduction (3) • All institutes that participate in the UniGrid project contribute some resources. • These resources can be used in collaboration for large scale applications.
Introduction (4) • System Architecture
Portal • The UniGrid portal provides an interface for UniGrid users to use the resources available in the UniGrid system. • Functionalities of the portal • System status monitoring • Single sign-on • User workflow management • Project information
System Status Monitoring (1) • UniGrid users can examine the status of system resources through the portal. • The portal gathers the current system information from the information service and presents this information to the users.
System Status Monitoring (2) • Screenshot of the system status monitoring web page
Single Sign-On (1) • Single sign-on is a mechanism whereby a single authentication permits a user to access every resource they are authorized to use, without entering multiple passwords. • All user account information is kept in a database at the portal site. • When a user requests a service, the user's verification data is passed to that service. • The request is granted only if the identity is verified by the verification web service.
Single Sign-On (2) • User identity verification through single sign-on service
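The verification flow described above can be sketched in a few lines. This is only an illustration under stated assumptions: the slides do not say what the verification data looks like, so the HMAC-signed token, the `SECRET_KEY` value, and the function names `issue_token`, `verify_token`, and `handle_request` are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret known only to the portal's verification service.
SECRET_KEY = b"portal-demo-secret"


def issue_token(username: str) -> str:
    """Portal side: create verification data after one successful login."""
    signature = hmac.new(SECRET_KEY, username.encode(), hashlib.sha256).hexdigest()
    return f"{username}:{signature}"


def verify_token(token: str) -> bool:
    """Verification service: check that the token was issued by the portal."""
    username, _, signature = token.rpartition(":")
    expected = hmac.new(SECRET_KEY, username.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(signature, expected)


def handle_request(service_name: str, token: str) -> str:
    """Any grid service: grant the request only if verification succeeds."""
    if not verify_token(token):
        return f"{service_name}: denied"
    return f"{service_name}: granted"


# One login yields one token, which is then reused for several services.
token = issue_token("alice")
print(handle_request("broker", token))
print(handle_request("storage", token))
print(handle_request("storage", "alice:forged"))
```

The point of the sketch is that the user authenticates once and each service delegates the identity check to a single verifier instead of asking for a password again.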
User Workflow Management (1) • A UniGrid user can design and save his own workflows at the UniGrid portal. • A user can select any workflow he designed and execute the workflow through the UniGrid portal. • A user can also monitor the status of his workflow through the UniGrid portal.
User Workflow Management (2) • Structure of a workflow [Figure: workflow structure, with parts labeled "parallel execution" and "sequential execution"]
User Workflow Management (3) • The workflows of each user are stored in the portal storage in XML format.
<flow name="testflow" numstages="3">
  <stage name="stage1" numjobs="1">
    <job id="0">
      <sortkey>1</sortkey>
      <runtype>mpi</runtype>
      <workdir>/home/test/</workdir>
      <filename>mm_mpi</filename>
      <runrp>true</runrp>
      <datafile/>
      <argu>256</argu>
      <otherurl/>
      <cpuno>4</cpuno>
    </job>
  </stage>
  …
</flow>
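As a rough illustration of how a consumer of this format might read a stored workflow, the sketch below parses the XML with Python's standard `xml.etree.ElementTree`. The element and attribute names come from the slide; the `load_jobs` helper and the choice of fields to extract are assumptions, and only the one stage shown on the slide is included (the elided stages are not reconstructed).

```python
import xml.etree.ElementTree as ET

# Only the stage shown on the slide; the other stages are elided there.
WORKFLOW_XML = """
<flow name="testflow" numstages="3">
  <stage name="stage1" numjobs="1">
    <job id="0">
      <sortkey>1</sortkey>
      <runtype>mpi</runtype>
      <workdir>/home/test/</workdir>
      <filename>mm_mpi</filename>
      <runrp>true</runrp>
      <datafile/>
      <argu>256</argu>
      <otherurl/>
      <cpuno>4</cpuno>
    </job>
  </stage>
</flow>
"""


def load_jobs(xml_text):
    """Return one dictionary per <job>, in document order (hypothetical helper)."""
    root = ET.fromstring(xml_text.strip())
    jobs = []
    for stage in root.findall("stage"):
        for job in stage.findall("job"):
            jobs.append({
                "stage": stage.get("name"),
                "id": job.get("id"),
                "runtype": job.findtext("runtype"),
                "workdir": job.findtext("workdir"),
                "filename": job.findtext("filename"),
                "argu": job.findtext("argu"),
                "cpuno": int(job.findtext("cpuno")),
            })
    return jobs


for job in load_jobs(WORKFLOW_XML):
    print(job)
```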
User Workflow Management (4) • Screenshot of the workflow editing web page
User Workflow Management (5) • When a user submits a workflow, the portal passes the selected workflow information to the broker. • Upon receiving an execution request, the resource broker finds the required resources for that workflow and schedules its execution.
User Workflow Management (7) • Users can examine the execution status of their workflows through the portal’s workflow monitoring system. • All workflow execution information is stored in a database on the machine where the resource broker is installed. • The portal queries the database and obtains the current status of a particular workflow. • The status information is processed and presented in the form of web pages.
User Workflow Management (8) • Screenshot of the workflow monitoring web page
User Workflow Management (9) • Screenshot of the UniGrid workflow management web page
Broker & Scheduler (1) • The broker provides a uniform interface to access available resources in the UniGrid system. • The broker uses the resource information service to obtain the current status of the resources in the system. • After this information is gathered, the broker allocates the resources that meet the requirements of the current job. • The jobs are then passed to the corresponding local schedulers to be executed locally.
Broker & Scheduler (2) • Broker workflow
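The broker steps above (query the information service, pick resources that meet the job's requirements, hand the job to a site) might look roughly like the following. Everything here is a hypothetical stand-in: the `SiteStatus` and `Job` types, the made-up status values, and the tightest-fit tie-breaking rule are assumptions, not the UniGrid broker's actual logic.

```python
from dataclasses import dataclass


@dataclass
class SiteStatus:
    name: str
    free_cpus: int


@dataclass
class Job:
    name: str
    cpuno: int  # number of CPUs requested, as in the workflow XML


def query_information_service():
    """Stand-in for the resource information service; values are made up."""
    return [SiteStatus("site-a", 2), SiteStatus("site-b", 8), SiteStatus("site-c", 4)]


def allocate(job, sites):
    """Return the name of a site that can run the job, or None if none can."""
    candidates = [site for site in sites if site.free_cpus >= job.cpuno]
    if not candidates:
        return None
    # Assumed rule: prefer the tightest fit so larger sites stay available.
    chosen = min(candidates, key=lambda site: site.free_cpus)
    chosen.free_cpus -= job.cpuno
    return chosen.name


sites = query_information_service()
for job in [Job("mm_mpi", 4), Job("second", 4), Job("too_big", 16)]:
    print(job.name, "->", allocate(job, sites))
```

In the real system the chosen site's local scheduler would then queue and run the job; the sketch stops at the allocation decision.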
Broker & Scheduler (3) • Each participating organization has a local scheduler (Condor) installed to schedule the jobs assigned to that organization. • Condor • A scheduler for large collections of distributively owned computing resources • Developed by researchers at the University of Wisconsin-Madison • Specialized for compute-intensive jobs • Uses the “ClassAd” mechanism to match job requirements to machine status and schedules the jobs according to the matching results
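The matchmaking idea behind ClassAds can be illustrated with a toy example. Condor's real ClassAds are written in their own expression language; the Python dictionaries, lambdas, and attribute names below are stand-ins chosen for illustration only. The sketch shows the two-way nature of a match: the job states requirements on the machine, and the machine states requirements on the job.

```python
def matches(job_ad, machine_ad):
    """Two-way match: each side's requirements must accept the other ad."""
    return job_ad["requirements"](machine_ad) and machine_ad["requirements"](job_ad)


def best_match(job_ad, machine_ads):
    """Among matching machines, return the one the job ranks highest."""
    matching = [machine for machine in machine_ads if matches(job_ad, machine)]
    if not matching:
        return None
    return max(matching, key=job_ad["rank"])


# Hypothetical machine ads; node1 refuses jobs owned by "guest".
machines = [
    {"name": "node1", "memory_mb": 512, "os": "LINUX",
     "requirements": lambda job: job["owner"] != "guest"},
    {"name": "node2", "memory_mb": 2048, "os": "LINUX",
     "requirements": lambda job: True},
    {"name": "node3", "memory_mb": 4096, "os": "WINDOWS",
     "requirements": lambda job: True},
]

# Hypothetical job ad: needs Linux with at least 256 MB, prefers more memory.
job = {
    "owner": "alice",
    "requirements": lambda m: m["os"] == "LINUX" and m["memory_mb"] >= 256,
    "rank": lambda m: m["memory_mb"],
}

print(best_match(job, machines)["name"])
```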
Related Research (1) • Tools have been developed to simulate different load-sharing and scheduling policies on computing grids and to analyze their performance • Queuing methods • Independent clusters • Multiple queues • Forwarding to no-need-to-wait site • Forwarding to shortest-queue site • Forwarding to least-load site, load=
Related Research (2) • Queuing methods (cont’d.) • Single queue • Multi-pool centralized queue • Single-pool centralized queue • One big cluster • Two-level scheduling • Empty queue only • Shortest queue first • Least load first • Two-level local queues • Forwarding to shortest-queue site
Related Research (3) • Scheduling policies • Non-FCFS • Multi-pool centralized queue • Single-pool centralized queue • FCFS • Two-level scheduling • The performance of Non-FCFS is three times better than FCFS
Related Research (4) • Implementation Approaches • Multi-Pool Centralized Queue • Global queue scheduling in the broker, no local queuing system • Global queue scheduling in the broker, confirming processor availability through the local queuing system • Single-Pool Centralized Queue • Global queue scheduling in the broker, no local queuing system
Related Research (5) • Two-Level Scheduling (Empty-Queue-Only Multi-Pool Grid) • Global queue in the broker, local queues in the local queuing systems
Related Research (6) • Simulation results
Related Research (7) • Simulation results (cont’d.)
Related Research (8) • Discussion • Non-FCFS methods can effectively improve the overall system utilization and performance. • The smallest-first non-FCFS policy outperforms all other policies in terms of waiting time and waiting ratio. • As far as the worst case is concerned, the backfilling policy is superior because it does not allow jobs to be delayed by the backfilling activities.
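To make the FCFS versus smallest-first comparison concrete, here is a minimal simulation sketch. It is not the simulator used for the results above: it assumes a single pool of processors, jobs that all arrive at time 0, "smallest" meaning fewest requested processors, and a made-up four-job workload.

```python
import heapq


def average_wait(jobs, total_procs, smallest_first=False):
    """Average waiting time for jobs that all arrive at time 0.

    jobs: list of (procs, runtime) tuples in arrival order.
    The queue is served strictly in order (no backfilling).
    """
    if any(procs > total_procs for procs, _ in jobs):
        raise ValueError("a job requests more processors than the pool has")
    queue = list(jobs)
    if smallest_first:
        queue.sort(key=lambda job: job[0])  # stable: ties keep arrival order
    running = []  # min-heap of (finish_time, procs)
    free = total_procs
    now = 0.0
    total_wait = 0.0
    for procs, runtime in queue:
        # Advance time until enough processors are free for the head job.
        while free < procs:
            finish, released = heapq.heappop(running)
            now = max(now, finish)
            free += released
        total_wait += now
        free -= procs
        heapq.heappush(running, (now + runtime, procs))
    return total_wait / len(jobs)


# Toy workload: one wide job at the head of the queue blocks three small ones.
workload = [(4, 10), (1, 2), (1, 2), (2, 3)]
print("FCFS:          ", average_wait(workload, total_procs=4))
print("Smallest first:", average_wait(workload, total_procs=4, smallest_first=True))
```

On this toy workload the wide job at the head of the FCFS queue forces every small job to wait for it, which is the effect the non-FCFS policies are meant to avoid. The numbers say nothing about the magnitude of the improvement on real traces.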
Resource Information Services • The resource information service provides information about the current resource status; this information can be used by other services of the system. • Functionalities of the resource information service • Information system • Performance visualization of MPI parallel program execution
Information System (1) • Provides an interface for other services to query various information about computing nodes • The statistics about the individual nodes are obtained using MDS (Monitoring & Discovery Service) provided by the Globus Toolkit • The current network status between machines is gathered using NWS (Network Weather Service) • Automatic update of node information • When a computing node is added or removed
Information System (2) • The Network Weather Service (NWS) • A distributed system that periodically monitors and dynamically forecasts the performance that various network and computational resources can deliver over a given time interval • Developed by researchers at UCSB • It uses numerical models to generate forecasts of what the conditions will be for a given time frame • Because this functionality is analogous to weather forecasting, the system is called the Network Weather Service
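The forecasting idea can be sketched as follows: keep several simple numerical predictors, score each against the measurement history, and use the one that has been most accurate so far. This is a simplified illustration in the spirit of that approach, not NWS code; the three predictors, the `forecast` function, and the sample bandwidth values are assumptions.

```python
from statistics import mean, median

# Hypothetical predictor set; each maps a measurement history to a forecast.
PREDICTORS = {
    "last_value": lambda history: history[-1],
    "running_mean": lambda history: mean(history),
    "median_of_last_5": lambda history: median(history[-5:]),
}


def forecast(history):
    """Forecast the next measurement with the predictor that has the lowest
    cumulative absolute error on the history seen so far."""
    errors = {name: 0.0 for name in PREDICTORS}
    for i in range(1, len(history)):
        past, actual = history[:i], history[i]
        for name, predict in PREDICTORS.items():
            errors[name] += abs(predict(past) - actual)
    best = min(errors, key=errors.get)
    return best, PREDICTORS[best](history)


# Made-up bandwidth measurements (Mbit/s) with one transient spike.
bandwidth = [10.0, 10.0, 10.0, 50.0, 10.0, 10.0]
name, value = forecast(bandwidth)
print(f"best predictor: {name}, forecast: {value}")
```

With a transient spike in the history, the median-based predictor accumulates the least error here, so it is the one used for the next forecast.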
Information System (4) • Screenshot of the node status webpage
Performance Visualization of MPI Programs (1) • Input: any application (depending on the availability of a compiler on the grid platform) • Output: performance visualization of the execution of this application
Performance Visualization of MPI Programs (2) • Execution of a Parallel Application using 4 computing nodes
Related Research (1) • Communication localization & data partitioning techniques in cluster-based grid system • Localized communication enhances performance of parallel applications on grid • Adaptive data partitioning for identical cluster & non-identical cluster grid topology • In-core & out-of-core applications
Related Research (2) • Communication localization techniques for identical clusters [Figure: original communication patterns and localized communication patterns]
Related Research (3) • Communication localization techniques for non-identical clusters [Figure: original communication table]
Related Research (4) • Communication localization techniques for non-identical clusters (cont’d.) [Figure: localized communication table]
Storage Service • The goal of the storage service is to provide a collaborative space where UniGrid users can share their data and resources with others. • Components of the storage service • Virtual storage system • Data management system
Virtual Storage System (1) • Virtual storage system architecture
Virtual Storage System (2) • The virtual storage system is implemented in Java as a web service • UniGrid services access the virtual storage system when they need to fetch or modify users’ data files • A client program is available for users to manage their own storage space • The files are stored on a master file server, and replicas of the files are distributed to other machines