A Proposal of Capacity and Performance Assured Storage in The PRAGMA Grid Testbed
Yusuke Tanimura 1), Hidetaka Koie 1,2), Tomohiro Kudoh 1), Isao Kojima 1), and Yoshio Tanaka 1)
1) National Institute of Advanced Industrial Science and Technology (AIST), Japan
2) SURIGIKEN Co., Ltd.
Background
• Support of data-intensive scientific applications is one of the challenges of the PRAGMA grid testbed work.
  • Avian Flu Grid applications
  • Geoscience applications
  • GLEON/CLEON applications
• Gfarm provides ...
  • Global file access using a single namespace
    • POSIX-like interface
  • Efficient file replication among sites
    • Competitive with GridFTP
  • Excellent performance with data access locality
    • Each process simply accesses a local disk drive.
• However, storage resource management from the viewpoint of resource sharing is still missing.
Problem 1
• Most PRAGMA sites are multi-tenant, and a shared storage system tends to become a performance bottleneck.
  • The total performance requirement is not met.
  • Access conflicts occur at some of the storage servers.
[Figure: at each PRAGMA site, Application A (a parallel job) and Application B (striping I/O) run on the compute servers of a cluster and share a single storage system for the cluster (NFS, PVFS, Lustre, etc.).]
Problem 2
• Using a remote storage system over the Internet is required in some use cases, but ...
  • The remote storage is not fast enough, its performance is unpredictable, or its disk space is insufficient.
• On the other hand, high-bandwidth or bandwidth-guaranteed dynamic networks (e.g., lambda paths) are available.
[Figure: a client at Site A accesses storage at Site B and Site C over a lambda-path network; the performance and disk space of the remote storage are uncertain.]
Our proposed storage (Papio)
• Allows users to reserve performance in advance
  • Specify date, time, and read/write throughput.
  • For write access, disk space can be reserved, too.
  • During the reserved time, the storage servers are dedicated to the user, or the user's access is prioritized (SLA).
• Uses existing I/O control techniques for prioritization
  • Disk I/O scheduling
    • Stable disk throughput, or at least performance prediction, is expected when using flash disks (e.g., SSDs).
  • Flow control of the I/O path
  • Reservation of buffer cache on storage servers
• Reservation interface
  • Provides a special command and a Web-services based interface (see the sketch after this slide).
  • Collocation with network resources is also supported.
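To make the reservation attributes above concrete, here is a minimal sketch of what a Papio reservation request could look like, expressed in Python. It is an illustration only: the PapioReservation class, its field names, and submit_reservation are assumptions, since the actual command and Web-services interfaces are not detailed in this proposal.

# Hypothetical sketch of a Papio-style performance reservation request.
# Field names and the submit function are illustrative assumptions.
from dataclasses import dataclass
from datetime import datetime


@dataclass
class PapioReservation:
    user: str
    start: datetime          # reserved start time
    end: datetime            # reserved end time
    access: str              # "read" or "write"
    throughput_mb_s: int     # throughput to guarantee during the slot
    disk_space_gb: int = 0   # only meaningful for write reservations


def submit_reservation(res: PapioReservation) -> str:
    """Pretend to send the request to the reservation management service
    (e.g., over the Web-services based protocol) and return a ticket ID."""
    # In a real deployment this would be a remote call; here we only
    # validate the request and echo a fake ticket for illustration.
    if res.access == "write" and res.disk_space_gb <= 0:
        raise ValueError("write reservations must also reserve disk space")
    return f"ticket-{res.user}-{res.start:%Y%m%d%H%M}"


if __name__ == "__main__":
    request = PapioReservation(
        user="applicationA",
        start=datetime(2010, 6, 1, 9, 0),
        end=datetime(2010, 6, 1, 12, 0),
        access="write",
        throughput_mb_s=150,
        disk_space_gb=500,
    )
    print(submit_reservation(request))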
Our proposed storage (Papio): architecture (deployed in a single site)
[Architecture diagram]
• A management server hosts the reservation management service, the file metadata service, and the Storage Resource Manager (SRM).
• Clients send reserve requests by command or by the Web-services based protocol (GNS-WSI3).
• The SRM administrates I/O controls on the storage servers according to the reservation: disk I/O control by dm-ioband and flow control by PSPacer.
• A Global Resource Coordinator performs collocation with the Network Resource Manager.
• Applications on client nodes access the storage servers.
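The following is a minimal sketch of the flow implied by the architecture diagram, assuming hypothetical component classes (ReservationService, NetworkResourceManager, StorageServer) and method names; it is not Papio's real API. It only shows the order of interactions: collocate a network path for the reservation, then apply the per-server I/O controls (dm-ioband, PSPacer) for the reserved slot.

# Hypothetical sketch of the reservation/enforcement flow in the diagram.
# All class and method names are illustrative assumptions, not Papio's API.

class NetworkResourceManager:
    def reserve_path(self, src: str, dst: str, bandwidth_mb_s: int) -> str:
        # Stand-in for a GNS-WSI3 request to the network resource manager.
        return f"{src}->{dst}@{bandwidth_mb_s}MB/s"

class StorageServer:
    def __init__(self, name: str):
        self.name = name

    def apply_io_controls(self, user: str, throughput_mb_s: int) -> None:
        # In a real deployment this step would configure dm-ioband (disk I/O)
        # and PSPacer (flow control) for the reserved user.
        print(f"{self.name}: guarantee {throughput_mb_s} MB/s for {user}")

class ReservationService:
    def __init__(self, servers: list, nrm: NetworkResourceManager):
        self.servers = servers
        self.nrm = nrm

    def reserve(self, user: str, throughput_mb_s: int, client_site: str) -> None:
        # 1. Collocation: reserve a network path along with the storage.
        path = self.nrm.reserve_path(client_site, "storage-site", throughput_mb_s)
        print("collocated network path:", path)
        # 2. At the reserved time, administrate I/O controls on each server.
        per_server = throughput_mb_s // len(self.servers)
        for server in self.servers:
            server.apply_io_controls(user, per_server)

if __name__ == "__main__":
    service = ReservationService([StorageServer(f"ss{i}") for i in range(3)],
                                 NetworkResourceManager())
    service.reserve("applicationA", 150, "siteA")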
Resource allocation
• Papio allocates the storage resources (disk space, I/O path, etc.) to each application according to its reservation.
[Figure: example allocation. Application A (e.g., an MPI-IO application or virtual clusters requiring high throughput) reserves 60 MB/s for each of its processes, Application B reserves 150 MB/s, and Application C reserves 420 MB/s striped at 140 MB/s per server; these reservations are mapped onto seven storage servers with capacities of 200, 200, 200, 200, 100, 100, and 100 MB/s.]
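For illustration, the sketch below packs the reserved throughputs from the figure onto the seven servers with a simple greedy strategy. The server capacities and reservation totals come from the slide; the allocation algorithm itself and all names are assumptions, not necessarily how Papio schedules resources.

# Hypothetical greedy packing of reserved throughput onto storage servers.
# Capacities are taken from the slide; the algorithm is an illustrative
# assumption, not Papio's documented scheduler.

def allocate(reservations: dict, capacities: list):
    """Split each application's reserved MB/s across servers with free capacity."""
    free = capacities[:]                          # remaining capacity per server
    placement = {app: [] for app in reservations}
    for app, need in reservations.items():
        remaining = need
        for i, cap in enumerate(free):
            if remaining == 0:
                break
            take = min(cap, remaining)
            if take > 0:
                free[i] -= take
                placement[app].append((i, take))  # (server index, MB/s)
                remaining -= take
        if remaining > 0:
            raise RuntimeError(f"cannot satisfy {app}: {remaining} MB/s short")
    return placement


if __name__ == "__main__":
    servers = [200, 200, 200, 200, 100, 100, 100]   # MB/s per storage server
    apps = {"A": 4 * 60, "B": 150, "C": 420}         # reserved MB/s per application
    for app, parts in allocate(apps, servers).items():
        print(app, "->", parts)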
Current status and future work
• A prototype implementation is planned to be completed by this summer.
  • Not for production, but for tests and demonstration.
• Guaranteeing performance is challenging.
  • We will first support dedicated use, and then try to support prioritized use.
  • We will first support sequential read throughput (MB/sec) reservation; write access is more complicated.
  • We need to study how fine a performance granularity can be guaranteed.
• If someone is interested in this, we can deploy the software on the AIST cluster for experimental use around PRAGMA 19.
Acknowledgement
• A part of this research is supported by the Special Coordination Funds for Promoting Science and Technology of the Ministry of Education, Culture, Sports, Science and Technology (MEXT), Japan.