GASS: A Data Movement and Access Service for Wide Area Computing Systems
Joseph Bester, Ian Foster, Carl Kesselman, Jean Tedesco, Steven Tuecke
Outline
1. Introduction
2. Background
3. GASS Architecture
4. GASS Implementation
5. Application
6. Performance Studies
Introduction
Requirements
- Little or no modification of applications.
- High-performance implementations and support for application-oriented management.
Existing technology
- Web-based file systems: transparent access.
- Condor: relinks applications against a special I/O library.
- RIO (Remote I/O system): adopts the MPI-IO parallel I/O library.
Strategy
- Optimize for the I/O patterns common in grid applications, so the service fits the system.
- Require no specialized services, so that all resources can be used.
- Manage bandwidth to optimize performance.
- Work with resources that are not known until runtime.
Background (1/4)
I/O requirements in Grid applications
- Hierarchical Data Format (HDF) input: just access the 'nearest' copy.
- Regional model: accesses other input, e.g. topography.
- Diagnostic data: information streamed back to the user.
- Output data: must be stored somewhere.
- Exploratory model
Background (2/4)
GASS must therefore support:
- Uniform access to files
  - Diverse data sources: FTP, HTTP, tape, disk.
  - Dynamic resource sets.
- Streaming I/O
  - Users are accustomed to UNIX I/O.
  - Little or no program modification.
- Programmer-directed performance optimization
  - A way to override the default strategy for performance.
Background (3/4)
Existing systems
- Andrew File System (AFS): a kernel-level distributed file system (DFS).
- Prospero File System: a DFS for heterogeneous environments, but it does not address performance.
- Condor: a high-throughput computing system. Applications are linked with a run-time library that replaces the I/O system calls. This gives transparent access with few requirements on the application, but no cache management.
Background (4/4)
Existing systems (continued)
- Legion: a next-generation virtual computer. Uses its own specialized conventions to copy data into LS, with some bandwidth management.
- WebFS and UFO: Web-based data sources.
GASS Architecture
Access patterns: default data movement strategies
- Strategy one: fetch and cache the file on the first open for reading. This is inappropriate if a file is large: computation may be delayed too long while the file is transferred, or the local cache may be too small to hold the entire file.
- Strategy two: flush the cache and transfer the file on the last close of a file opened for writing.
Figure 1: The GASS system is optimized for I/O patterns (a)-(c); patterns (d)-(f) are not supported efficiently.
Figure 2: The GASS cache architecture. Files opened by application processes (represented by circles) are maintained in a local cache directory; they are copied from the remote location (on open, if opened for reading) and/or to the remote location (on close, if created or opened for writing).
GASS Architecture
Specialized data movement strategies
- Prestaging and poststaging: extensions of cache management.
  - Prestaging acts like 'open file for reading': allocate a cache entry, move the file into it, and increment its reference count.
  - Poststaging acts like 'close file that was open for writing'.
  - Typical uses: file staging and redirection of standard I/O.
- Low-level cache management: more fine-grained control.
  - Benefit 1: allows file caching to be directed to specific locations, on a per-file and/or per-user basis: for example, to access-controlled user file systems.
  - Benefit 2: exploits a local DFS, e.g. DPSS, NFS, AFS.
GASS Architecture
GASS operation
- Minimal changes to the application.
- Multiple file paths; higher performance than a DFS.
- globus_gass_open(), globus_gass_close(), globus_gass_fopen(), globus_gass_fclose():
  - Use URLs instead of filenames.
  - Cache the file behind a URL in case of multiple opens.
  - Return descriptors to files in the local cache or sockets to a remote server.
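As a rough illustration of what "URLs instead of filenames" means for an application, the Python sketch below wraps the built-in open() so that a name may be either a plain path or a URL. It is not the GASS API: the function name open_url_or_file, the CACHE_DIR directory, and the hash-based naming of cached files are all assumptions made for this example, and only reading is handled.

```python
import hashlib
import os
import shutil
import tempfile
import urllib.request
from urllib.parse import urlparse

# Hypothetical cache directory for this sketch; GASS manages its own cache.
CACHE_DIR = tempfile.mkdtemp(prefix="gass_sketch_")


def open_url_or_file(name, mode="r"):
    """Open `name` like open(), but also accept URLs (read-only).

    Plain filenames are opened directly. A URL is copied into the local
    cache directory on first use, and a handle to the cached copy is
    returned, so repeated opens of the same URL reuse one download.
    """
    scheme = urlparse(name).scheme
    # An empty scheme is a plain path; a one-letter "scheme" is treated
    # as a Windows drive letter rather than a URL.
    if len(scheme) <= 1:
        return open(name, mode)
    if mode not in ("r", "rb"):
        raise ValueError("this sketch only reads from URLs")
    # Derive a stable local name from the URL (an assumption of this sketch).
    local_name = os.path.join(
        CACHE_DIR, hashlib.sha1(name.encode("utf-8")).hexdigest()
    )
    if not os.path.exists(local_name):
        with urllib.request.urlopen(name) as src, open(local_name, "wb") as dst:
            shutil.copyfileobj(src, dst)
    return open(local_name, mode)
```

The application-side change is then limited to swapping the open call and passing a URL where a filename was used before.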
Flowchart: globus_gass_open() / globus_gass_close()
globus_gass_open():
  URL in cache?
    no  -> download file into cache
    yes -> continue
  open cached file, add cache reference
globus_gass_close():
  Modified?
    yes -> upload changes
    no  -> continue
  remove cache reference
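The open/close flow on this slide can be modelled in a few lines of Python. This is a toy sketch, not GASS code: the class name CacheSketch, the dict standing in for remote servers, the write() helper, and the choice to drop the cached copy once the last reference is removed are assumptions made for illustration.

```python
class CacheSketch:
    """Toy model of the open/close flow with reference counting."""

    def __init__(self, remote):
        self.remote = remote   # url -> bytes; hypothetical stand-in for servers
        self.data = {}         # url -> bytearray; the local cached copies
        self.refs = {}         # url -> number of open references
        self.modified = set()  # urls whose cached copy has been changed
        self.downloads = 0     # counts transfers from the "remote" side

    def open(self, url):
        # URL in cache? If not, download the file into the cache.
        if url not in self.data:
            self.data[url] = bytearray(self.remote.get(url, b""))
            self.downloads += 1
        # Open the cached file and add a cache reference.
        self.refs[url] = self.refs.get(url, 0) + 1
        return url

    def write(self, url, payload):
        # Writes go to the cached copy only.
        self.data[url] += payload
        self.modified.add(url)

    def close(self, url):
        # Modified? If so, upload the changes.
        if url in self.modified:
            self.remote[url] = bytes(self.data[url])
            self.modified.discard(url)
        # Remove the cache reference.
        self.refs[url] -= 1
        if self.refs[url] == 0:
            # Assumption of this sketch: evict once nobody holds the file.
            del self.refs[url]
            del self.data[url]
```

Two overlapping opens of the same URL trigger a single download, and a close after a write pushes the new contents back to the remote store.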
GASS Architecture
Integration with Globus
- GRAM features:
  - Allocates resources.
  - Initiates and manages computation.
- GASS can therefore build on GRAM, with small overhead.
GASS Implementation
APIs
- File Access API [synchronous]: reads from the cache.
- Cache Management API [synchronous]: supports insertion, locking (reference counting), and removal; allows multiple overlapping caches.
- Client Implementation API [asynchronous]: allows applications to eliminate data copies, to select the transfer unit size and proxy server, and to overlap data transfers with cache operations.
- Server Implementation API [asynchronous]: implements the remote file access protocol.
GASS Implementation
Cache Management API
- Adds and deletes entries, and maintains a log (filename, local name, timestamp, tag list).
- Guards against race conditions: on contention, the caller blocks.
- Does not provide any wide-area cache coherency.
Client and Server APIs
- Nexus-based protocols can deliver superior performance to IP-based wide-area networking protocols.
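The cache log described on this slide can be pictured with the following Python sketch. It is only an illustration under stated assumptions: the names CacheEntry and CacheLog, the add/delete/lookup methods, the generated local names, and the rule that an entry disappears when its tag list becomes empty are hypothetical, and a single in-process lock stands in for whatever mechanism GASS uses to make contending callers block.

```python
import itertools
import threading
import time
from dataclasses import dataclass, field


@dataclass
class CacheEntry:
    url: str            # the remote name (the "filename" field of the log)
    local_name: str     # name of the copy in the local cache
    timestamp: float    # when the entry was created
    tags: list = field(default_factory=list)  # one tag per holder


class CacheLog:
    """Toy cache log: entries keyed by URL, protected by one lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._entries = {}
        self._counter = itertools.count()

    def add(self, url, tag):
        """Add `tag` to the entry for `url`; return (local_name, created)."""
        with self._lock:  # contending callers block here
            entry = self._entries.get(url)
            created = entry is None
            if created:
                local_name = f"cache_file_{next(self._counter)}"
                entry = CacheEntry(url, local_name, time.time())
                self._entries[url] = entry
            entry.tags.append(tag)
            return entry.local_name, created

    def delete(self, url, tag):
        """Remove `tag`; return True if the entry itself was removed."""
        with self._lock:
            entry = self._entries[url]
            entry.tags.remove(tag)
            if not entry.tags:  # assumption: no tags left means no holders
                del self._entries[url]
                return True
            return False

    def lookup(self, url):
        with self._lock:
            return self._entries.get(url)
```

Because every operation takes the same lock, only the first of several simultaneous add calls for a URL creates the entry; the others see it and just append their tag.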
Application
Globus Executable Management System
- Assigns executables, combining the GASS client and caching APIs.
GASS command-line tools
- Allow the programmer to implement pre-staging, post-staging, and other remote file operations without modifying user applications.
- globus-rcp: third-party-initiated data transfer, with remote start-up and peer-to-peer authentication.
- globusrun: automates the tasks associated with running a job.
  - The cache avoids unnecessary downloads.
  - Supports arbitrary applications.
  - Job management can schedule in linear time.
- MPICH-G = Message Passing Interface + globusrun (to initiate the MPI program).
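Application
SF-Express: distributed supercomputing
- 100,000 entities, 1,000 CPUs.
- Requirements: locate, assemble, and manage resources.
- Result: append-mode output and redirected diagnostics.
If GASS did not exist:
- You would have to contact the administrators of these computers one by one, agree on a time, and reserve the machines.
- How would the SF-Express program code and initial data be transferred to each parallel computer and started? Apparently only by logging in to each machine and doing it by hand.
- What if something abnormal happened while the program was running? SF-Express could only stop and wait for a chance to start over.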
Performance Studies
GASS cache overhead
Table 1: Time to transfer remote files of various sizes directly into memory (To Memory), through a /tmp file (No Cache), and through the GASS cache on /tmp (GASS Cache). All times are in seconds and are the average of multiple runs. See text for details.
Performance Studies
GASS cache contention
Table 2: Results of contention experiments in which multiple processes open and read a file at the same time, via standard Unix open and close calls; GASS transfer followed by read; and GASS access to a prestaged file. All times are in seconds. See text for details.
Performance Studies
GASS and AFS performance
Table 3: Overall time required to read the content of a remote file using GASS and AFS to access the file. All times are in seconds. See text for details.
Conclusion
High performance
- Efficient use of bandwidth.
- Data movement strategies without modifying code.
Useful
- Suited to HPSS, DPSS, and SRB.
Future
- Interfacing SRB into GASS.
- Globus = GASS + advance reservation.
Thank you!