Reliable and Efficient Grid Data Placement using Stork and DiskRouter Tevfik Kosar University of Wisconsin-Madison kosart@cs.wisc.edu April 15th, 2004
A Single Project.. • LHC (Large Hadron Collider) • Comes online in 2006 • Will produce 1 Exabyte of data by 2012 • Accessed by ~2000 physicists, 150 institutions, 30 countries
And Many Others.. • Genomic information processing applications • Biomedical Informatics Research Network (BIRN) applications • Cosmology applications (MADCAP) • Methods for modeling large molecular systems • Coupled climate modeling applications • Real-time observatories, applications, and data-management (ROADNet)
The Same Big Problem.. • Need for data placement: • Locate the data • Send data to processing sites • Share the results with other sites • Allocate and de-allocate storage • Clean up everything • Do these reliably and efficiently
Outline • Introduction • Stork • DiskRouter • Case Studies • Conclusions
Stork • A scheduler for data placement activities in the Grid • What Condor is for computational jobs, Stork is for data placement • Stork comes with a new concept: “Make data placement a first class citizen in the Grid.”
The Concept • An end-to-end run consists of individual jobs: allocate space for input & output data, stage-in, execute the job, stage-out, release input space, release output space.
The Concept • The same chain, separated into data placement jobs (allocate space for input & output data, stage-in, stage-out, release input space, release output space) and computational jobs (execute the job).
The Concept • The whole chain is expressed as a DAG; DAGMan sends data placement (DaP) nodes to the Stork job queue and computational nodes to the Condor job queue. • DAG specification:
DaP A A.submit
DaP B B.submit
Job C C.submit
…..
Parent A child B
Parent B child C
Parent C child D, E
…..
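As a minimal sketch (the attribute names and URLs reuse the Stork ClassAds shown on later slides; the file name A.submit matches the DAG fragment above, but the exact contents are illustrative), the submit file for a stage-in DaP node could look like:

[
  dap_type   = "transfer";                                      // stage-in: move input data to the execute site
  src_url    = "srb://ghidorac.sdsc.edu/kosart.condor/x.dat";
  dest_url   = "nest://turkey.cs.wisc.edu/kosart/x.dat";
  max_retry  = 10;                                              // retried automatically on failure
  restart_in = "2 hours";
]

Because the DAG makes A and B ancestors of Job C, DAGMan hands the computational job to Condor only after the stage-in transfers have completed.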
Why Stork? • Stork understands the characteristics and semantics of data placement jobs. • Can make smart scheduling decisions for reliable and efficient data placement.
Failure Recovery and Efficient Resource Utilization • Fault tolerance • Just submit a bunch of data placement jobs, and then go away.. • Control the number of concurrent transfers from/to any storage system • Prevents overloading • Space allocation and de-allocation (see the sketch below) • Make sure space is available
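Space allocation could itself be expressed as a data placement job in the same ClassAd format. The sketch below is purely illustrative: the "reserve" job type and its attributes are assumptions, not attribute names taken from these slides.

[
  dap_type     = "reserve";                 // hypothetical job type: allocate space before stage-in
  dest_host    = "turkey.cs.wisc.edu";      // storage system where space is needed (host reused from the transfer example)
  reserve_size = "5 GB";                    // hypothetical attribute: amount of space to allocate
  duration     = "12 hours";                // hypothetical attribute: how long to hold the reservation
]

A matching release job would then free the space once the stage-out completes.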
Support for Heterogeneity • Protocol translation using the Stork memory buffer.
Support for Heterogeneity • Protocol translation using the Stork disk cache.
Flexible Job Representation and Multilevel Policy Support
[
  Type       = "Transfer";
  Src_Url    = "srb://ghidorac.sdsc.edu/kosart.condor/x.dat";
  Dest_Url   = "nest://turkey.cs.wisc.edu/kosart/x.dat";
  ……
  ……
  Max_Retry  = 10;
  Restart_in = "2 hours";
]
Run-time Adaptation • Dynamic protocol selection
[
  dap_type      = "transfer";
  src_url       = "drouter://slic04.sdsc.edu/tmp/test.dat";
  dest_url      = "drouter://quest2.ncsa.uiuc.edu/tmp/test.dat";
  alt_protocols = "nest-nest, gsiftp-gsiftp";
]
[
  dap_type = "transfer";
  src_url  = "any://slic04.sdsc.edu/tmp/test.dat";
  dest_url = "any://quest2.ncsa.uiuc.edu/tmp/test.dat";
]
Run-time Adaptation • Run-time Protocol Auto-tuning
[
  link     = "slic04.sdsc.edu – quest2.ncsa.uiuc.edu";
  protocol = "gsiftp";
  bs       = 1024KB;   // block size
  tcp_bs   = 1024KB;   // TCP buffer size
  p        = 4;        // parallelism
]
Outline • Introduction • Stork • DiskRouter • Case Studies • Conclusions
DiskRouter • A mechanism for high-performance, large-scale data transfers • Uses hierarchical buffering to aid in large-scale data transfers • Enables an application-level overlay network for maximizing bandwidth • Supports application-level multicast
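From Stork's point of view, routing a transfer through DiskRouter is just another protocol choice. A sketch reusing the drouter:// endpoints and file path already shown on the dynamic protocol selection slide (nothing else is assumed):

[
  dap_type = "transfer";
  src_url  = "drouter://slic04.sdsc.edu/tmp/test.dat";       // data leaves the SDSC host via DiskRouter
  dest_url = "drouter://quest2.ncsa.uiuc.edu/tmp/test.dat";  // and is delivered to the NCSA host via DiskRouter
]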
Store and Forward • [Figure: data flowing from A to C through intermediate node B, with and without a DiskRouter buffering at B] • Improves performance when the bandwidth fluctuation between A and B is independent of the bandwidth fluctuation between B and C.
DiskRouter Overlay Network • [Figure: the direct path between A and B achieves only 90 Mb/s] • Add a DiskRouter node C, which is not necessarily on the path from A to B, to enforce use of an alternative path: both A and B have 400 Mb/s links to C.
Data Mover/Distributed Cache • The source writes to its closest DiskRouter, and the destination picks the data up from its closest DiskRouter. • [Figure: source and destination connected through a DiskRouter cloud]
Outline • Introduction • Stork • DiskRouter • Case Studies • Conclusions
Case Study I: SRB-UniTree Data Pipeline • Transfer ~3 TB of DPOSS data from SRB @SDSC to UniTree @NCSA • A data pipeline created with Stork and DiskRouter • [Figure: the pipeline's components (submit site, SRB server, SDSC cache, NCSA cache, UniTree server)]
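A rough sketch of how the hops of this pipeline could be written as chained Stork transfer jobs, tied together by DAGMan dependencies as on the concept slide. The cache paths, the file:// usage, and the unitree:// destination are assumptions made for illustration; the srb:// and drouter:// endpoints reuse hosts from the earlier slides:

// Hop 1: SRB server at SDSC -> SDSC cache
[
  dap_type = "transfer";
  src_url  = "srb://ghidorac.sdsc.edu/kosart.condor/x.dat";
  dest_url = "file:///tmp/x.dat";                            // assumed local-file URL on the SDSC cache node
]
// Hop 2: SDSC cache -> NCSA cache over DiskRouter
[
  dap_type = "transfer";
  src_url  = "drouter://slic04.sdsc.edu/tmp/x.dat";
  dest_url = "drouter://quest2.ncsa.uiuc.edu/tmp/x.dat";
]
// Hop 3: NCSA cache -> UniTree mass storage
[
  dap_type = "transfer";
  src_url  = "file:///tmp/x.dat";                            // assumed local-file URL on the NCSA cache node
  dest_url = "unitree://mss.ncsa.uiuc.edu/kosart/x.dat";     // hypothetical UniTree host and protocol name
]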
Failure Recovery • [Figure: transfer timeline annotated with the failures recovered from: DiskRouter reconfigured and restarted; UniTree not responding; SDSC cache reboot & UW CS network outage; software problem]
Run-time Adaptation • Before Tuning: • parallelism = 1 • block_size = 1 MB • tcp_bs = 64 KB • After Tuning: • parallelism = 4 • block_size = 1 MB • tcp_bs = 256 KB
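Expressed in the auto-tuning ClassAd format from the earlier slide, the tuned parameters would look roughly like this (assuming the same SDSC-NCSA link; the values are those listed above):

[
  link     = "slic04.sdsc.edu – quest2.ncsa.uiuc.edu";
  protocol = "gsiftp";
  bs       = 1024KB;   // block size, unchanged at 1 MB
  tcp_bs   = 256KB;    // TCP buffer size, raised from 64 KB
  p        = 4;        // parallelism, raised from 1
]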
Conclusions • Regard data placement as a first-class citizen. • Introduce a specialized scheduler for data placement. • Introduce a high-performance data transfer tool. • End-to-end automation, fault tolerance, run-time adaptation, multilevel policy support, reliable and efficient transfers.
Future work • Enhanced interaction between Stork, DiskRouter and higher level planners • co-scheduling of CPU and I/O • Enhanced authentication mechanisms • More run-time adaptation
You don’t have to FedEx your data anymore.. We deliver it for you! • For more information • Stork: • Tevfik Kosar • Email: kosart@cs.wisc.edu • http://www.cs.wisc.edu/condor/stork • DiskRouter: • George Kola • Email: kola@cs.wisc.edu • http://www.cs.wisc.edu/condor/diskrouter