Stork: State of the Art Tevfik Kosar Computer Sciences Department University of Wisconsin-Madison kosart@cs.wisc.edu http://www.cs.wisc.edu/condor/stork
The Imminent Data “deluge” • Exponential growth of scientific data • 2000 : ~0.5 Petabyte • 2005 : ~10 Petabytes • 2010 : ~100 Petabytes • 2015 : ~1000 Petabytes • “I am terrified by terabytes” -- Anonymous • “I am petrified by petabytes” -- Jim Gray • Moore’s Law outpaced by growth of scientific data!
Example data-intensive applications and their annual data volumes (slide figure):
• Bioinformatics: BLAST
• High Energy Physics: LHC
• Educational Technology: WCER EVP
• Astronomy: LSST, 2MASS, SDSS, DPOSS, GSC-II, WFCAM, VISTA, NVSS, FIRST, GALEX, ROSAT, OGLE, ...
• Data rates shown: 2-3 PB/year, 11 PB/year, 500 TB/year, and 20 TB - 1 PB/year.
How to access and process distributed data at terabyte-to-petabyte scale?
I/O Management in the History (diagram built up across four slides):
• Hardware level: CPU and I/O processor on a shared bus; a DMA controller moves data between memory and disk.
• Operating systems level: a CPU scheduler plus an I/O subsystem with its own I/O scheduler and I/O control system.
• Distributed systems level: batch schedulers take on the role of the CPU scheduler across machines.
• The counterpart of the I/O subsystem at the distributed systems level is a data placement scheduler.
Individual Jobs
• Each compute job (JOB i, JOB j, JOB k) is wrapped by data placement jobs: allocate space for input & output data, stage-in the input (get), execute the job, stage-out the output (put), then release the input space and release the output space.
• Space allocation/release and stage-in/stage-out are thus explicit jobs of their own, separate from the compute jobs (a minimal sketch of this sequence follows below).
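A minimal sketch of this decomposed lifecycle in Python, assuming stand-in step functions (allocate_space, stage_in, execute, stage_out, release_space are illustrative names, not Stork's actual interface; in the real system each step is submitted as a separate job):

    # Hypothetical stand-ins for the separate data placement (Stork) and
    # compute (Condor) jobs shown on the slide; in the real system each step
    # is a scheduled job, not a local function call.

    def allocate_space(site, size_gb):
        print(f"allocate {size_gb} GB at {site} for input & output data")

    def stage_in(src_url, site):
        print(f"stage-in (get): {src_url} -> {site}")

    def execute(job, site):
        print(f"execute compute job {job} at {site}")

    def stage_out(site, dest_url):
        print(f"stage-out (put): {site} -> {dest_url}")

    def release_space(site, which):
        print(f"release {which} space at {site}")

    def run_with_data_placement(job, src_url, dest_url, site="example-cluster"):
        allocate_space(site, size_gb=10)   # allocate space for input & output data
        stage_in(src_url, site)            # data placement job
        execute(job, site)                 # compute job (JOB j)
        stage_out(site, dest_url)          # data placement job
        release_space(site, "input")       # release input space
        release_space(site, "output")      # release output space

    run_with_data_placement("JOB_j",
                            "srb://ghidorac.sdsc.edu/kosart.condor/x.dat",
                            "nest://turkey.cs.wisc.edu/kosart/x.dat")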
Separation of Jobs
• A workflow manager (Condor DAGMan) feeds data placement jobs (A, B, D, E, F) to the data job queue (Stork) and compute jobs (e.g., C) to the compute job queue (Condor), following the dependencies in the DAG.
• DAG specification (a toy sketch of this routing follows below):
    Data A A.stork
    Data B B.stork
    Job C C.condor
    .....
    Parent A child B
    Parent B child C
    Parent C child D, E
    .....
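A toy sketch of how the separation works, using a simplified version of the DAG text above; the parsing and the two queue lists are illustrative only (the real routing is done by the workflow manager and the two schedulers, not user code):

    # Toy illustration of "separation of jobs": route each node of the DAG to
    # the data placement scheduler (Stork) or the compute scheduler (Condor)
    # based on the kind of submit file it names. Parsing is deliberately naive.

    dag_lines = [
        "Data A A.stork",
        "Data B B.stork",
        "Job C C.condor",
        "Parent A child B",
        "Parent B child C",
    ]

    data_job_queue, compute_job_queue, edges = [], [], []

    for line in dag_lines:
        parts = line.split()
        keyword = parts[0].lower()
        if keyword == "data":            # data placement job -> Stork
            data_job_queue.append((parts[1], parts[2]))
        elif keyword == "job":           # compute job -> Condor
            compute_job_queue.append((parts[1], parts[2]))
        elif keyword == "parent":        # dependency edge(s)
            child_idx = [p.lower() for p in parts].index("child")
            parents = parts[1:child_idx]
            children = [c.strip(",") for c in parts[child_idx + 1:]]
            edges += [(p, c) for p in parents for c in children]

    print("Stork queue: ", data_job_queue)
    print("Condor queue:", compute_job_queue)
    print("dependencies:", edges)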
Stork: Data Placement Scheduler • First scheduler specialized for data movement/placement. • De-couples data placement from computation. • Understands the characteristics and semantics of data placement jobs. • Can make smart scheduling decisions for reliable and efficient data placement. • A prototype is already implemented and deployed at several sites. • Now distributed with Condor Developers Release v6.7.6 http://www.cs.wisc.edu/condor/stork
Support for Heterogeneity [ICDCS'04]
• Provides uniform access to different data storage systems and transfer protocols.
• Acts as an I/O control system (IOCS) for distributed systems.
• Multilevel policy support.
• Protocol translation, either directly or via a Stork memory buffer or Stork disk cache.
• Example job description:
    [
      Type       = "Transfer";
      Src_Url    = "srb://ghidorac.sdsc.edu/kosart.condor/x.dat";
      Dest_Url   = "nest://turkey.cs.wisc.edu/kosart/x.dat";
      ......
      Max_Retry  = 10;
      Restart_in = "2 hours";
    ]
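A conceptual sketch of protocol translation through an intermediate disk cache, assuming hypothetical helpers (transfer_via_cache, the cache path, and the printed "hops" are illustrative; they are not Stork's real modules or interfaces):

    # Conceptual sketch: when no module can move data directly between the
    # source and destination protocols, route the transfer through an
    # intermediate disk cache (or memory buffer) in two hops.
    # Illustrative only; not Stork's actual implementation.

    from urllib.parse import urlparse

    def transfer_via_cache(src_url, dest_url, cache_path="/tmp/stork-cache/x.dat"):
        src_proto = urlparse(src_url).scheme    # e.g. "srb"
        dest_proto = urlparse(dest_url).scheme  # e.g. "nest"
        if src_proto == dest_proto:
            # Same protocol on both ends: pretend a direct module exists.
            print(f"direct {src_proto} transfer: {src_url} -> {dest_url}")
        else:
            # Protocol translation via the cache: src -> local file -> dest.
            print(f"{src_proto} -> file: {src_url} -> {cache_path}")
            print(f"file -> {dest_proto}: {cache_path} -> {dest_url}")

    transfer_via_cache("srb://ghidorac.sdsc.edu/kosart.condor/x.dat",
                       "nest://turkey.cs.wisc.edu/kosart/x.dat")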
Dynamic Protocol Selection [ICDCS'04]
• Alternative protocols can be listed explicitly in the job description:
    [
      dap_type      = "transfer";
      src_url       = "drouter://slic04.sdsc.edu/tmp/test.dat";
      dest_url      = "drouter://quest2.ncsa.uiuc.edu/tmp/test.dat";
      alt_protocols = "gsiftp-gsiftp, nest-nest";
    ]
• Or the protocol choice can be left entirely to Stork:
      src_url  = "any://slic04.sdsc.edu/tmp/test.dat";
      dest_url = "any://quest2.ncsa.uiuc.edu/tmp/test.dat";
• Result (figure): the traditional scheduler achieved 48 Mb/s; Stork achieved 72 Mb/s, switching to an alternative protocol when DiskRouter crashes and back when DiskRouter resumes.
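A small sketch of the fallback logic this enables, with a stubbed attempt_transfer standing in for the real protocol modules (the function names and the simulated DiskRouter outage are assumptions for illustration):

    # Sketch of dynamic protocol selection: try the preferred protocol pair
    # first, then fall back to the alternatives listed in alt_protocols.
    # attempt_transfer is a stub; here it simulates DiskRouter being down.

    def attempt_transfer(protocol_pair, src, dest):
        return protocol_pair != "drouter-drouter"

    def transfer_with_fallback(src, dest, preferred, alt_protocols):
        for pair in [preferred] + alt_protocols:
            if attempt_transfer(pair, src, dest):
                print(f"transfer succeeded using {pair}")
                return pair
            print(f"{pair} failed, trying next alternative")
        raise RuntimeError("all protocols failed")

    transfer_with_fallback("slic04.sdsc.edu/tmp/test.dat",
                           "quest2.ncsa.uiuc.edu/tmp/test.dat",
                           preferred="drouter-drouter",
                           alt_protocols=["gsiftp-gsiftp", "nest-nest"])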
Run-time Auto-tuning [AGridM'03]
• GridFTP parameters, before tuning: parallelism = 1, block_size = 1 MB, tcp_bs = 64 KB; after tuning: parallelism = 4, block_size = 1 MB, tcp_bs = 256 KB.
• Tuned parameters are kept per link:
    [
      link     = "slic04.sdsc.edu – quest2.ncsa.uiuc.edu";
      protocol = "gsiftp";
      bs       = 1024KB;   // I/O block size
      tcp_bs   = 1024KB;   // TCP buffer size
      p        = 4;        // number of parallel streams
    ]
• Result (figure): 0.5 MB/s with a traditional scheduler (without tuning) vs. 10 MB/s using Stork (with tuning).
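A hedged sketch of how per-link tuned parameters might be applied to a transfer, with the table mirroring the values above (the dictionary layout and apply_tuning are illustrative, not Stork's data structures):

    # Keep measured per-link parameters in a table and merge them into each
    # transfer on that link. Values mirror the slide; structures are made up.

    link_parameters = {
        ("slic04.sdsc.edu", "quest2.ncsa.uiuc.edu"): {
            "protocol": "gsiftp",
            "bs": "1024KB",      # I/O block size
            "tcp_bs": "1024KB",  # TCP buffer size
            "p": 4,              # number of parallel streams
        },
    }

    def apply_tuning(job):
        """Merge tuned parameters for the job's link into its description."""
        key = (job["src_host"], job["dest_host"])
        job.update(link_parameters.get(key, {}))   # untuned links keep defaults
        return job

    job = {"src_host": "slic04.sdsc.edu", "dest_host": "quest2.ncsa.uiuc.edu",
           "parallelism": 1, "tcp_bs": "64KB"}     # defaults before tuning
    print(apply_tuning(job))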
Controlling Throughput [Europar'04] (figure panels: Local Area, Wide Area)
• Increasing concurrency/parallelism does not always increase the transfer rate.
• The effect differs between local-area and wide-area transfers.
• Concurrency and parallelism have slightly different impacts on transfer rate.
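For clarity, a toy illustration of the two knobs being compared; the definitions in the comments follow common usage in this line of work, and the job lists are hypothetical, not Stork syntax:

    # concurrency = number of transfers running at the same time
    # parallelism = number of parallel streams within a single transfer
    # (definitions as commonly used in this context; job dicts are made up)

    files = [f"/data/part{i}.dat" for i in range(8)]

    # concurrency = 4: up to four single-stream transfers in flight at once
    concurrent_plan = {"in_flight": 4,
                       "jobs": [{"file": f, "streams": 1} for f in files]}

    # parallelism = 4: one transfer in flight at a time, each using 4 streams
    parallel_plan = {"in_flight": 1,
                     "jobs": [{"file": f, "streams": 4} for f in files]}

    for name, plan in [("concurrency", concurrent_plan),
                       ("parallelism", parallel_plan)]:
        streams = plan["in_flight"] * plan["jobs"][0]["streams"]
        print(f"{name}: {plan['in_flight']} transfer(s) in flight, "
              f"{streams} streams open on the link")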
Controlling CPU Utilization [Europar'04] (figure panels: Client, Server)
• Concurrency and parallelism have totally opposite impacts on CPU utilization at the server side.
Detecting and Classifying Failures [Grid'04]
• A chain of checks detects and classifies failures, and failure-handling POLICIES act on the classification. Each failing check identifies the error and labels it transient or permanent; each passing check hands off to the next (flowchart on slide):
• Check DNS server → DNS server error (transient)
• Check DNS → no DNS entry (permanent)
• Check network → network outage (transient)
• Check host → host down (transient)
• Check protocol → protocol unavailable (transient)
• Check credentials → not authenticated (permanent)
• Check file → source file does not exist (permanent)
• Test transfer → transfer failed
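A small sketch of this classification chain, with stub check functions standing in for the real probes (DNS lookup, ping, protocol handshake, credential check, file stat); the stub here simulates an unreachable protocol service:

    # Run the checks in order; the first failing check names the error and
    # whether it is transient or permanent. run_check is a stub.

    CHECKS = [
        ("check_dns_server",  "DNS server error",           "transient"),
        ("check_dns_entry",   "no DNS entry",               "permanent"),
        ("check_network",     "network outage",             "transient"),
        ("check_host",        "host down",                  "transient"),
        ("check_protocol",    "protocol unavailable",       "transient"),
        ("check_credentials", "not authenticated",          "permanent"),
        ("check_file",        "source file does not exist", "permanent"),
        ("test_transfer",     "transfer failed",            None),
    ]

    def run_check(name, host):
        return name != "check_protocol"   # simulate: protocol service is down

    def classify_failure(host):
        for name, error, kind in CHECKS:
            if not run_check(name, host):
                return error, kind
        return None, None                 # every check passed

    error, kind = classify_failure("quest2.ncsa.uiuc.edu")
    print(f"error: {error}, classification: {kind}")
    # transient errors can be retried later; permanent ones are reported back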
Detecting Hanging Transfers [Cluster'04]
• Collect job execution time statistics and fit a distribution.
• Detect and avoid black holes and hanging transfers (a small sketch follows below).
• E.g., for a normal distribution, 99.7% of job execution times should lie within [(avg - 3*stdev), (avg + 3*stdev)] (the slide's figure marks this 99.7% region, with 15.8 min as an annotated bound).
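A minimal sketch of the 3-sigma check, assuming made-up historical durations (statistics is Python's standard library; the thresholds follow the formula on the slide):

    # Fit a normal distribution to past transfer durations and treat a
    # transfer still running past avg + 3*stdev as hanging. Sample data is
    # made up for illustration.

    import statistics

    past_durations_min = [9.8, 10.5, 11.2, 10.1, 9.5, 10.9, 11.5, 10.3]
    avg = statistics.mean(past_durations_min)
    stdev = statistics.stdev(past_durations_min)

    lower, upper = avg - 3 * stdev, avg + 3 * stdev   # 99.7% interval
    print(f"expected execution time range: [{lower:.1f}, {upper:.1f}] min")

    def is_hanging(elapsed_min):
        """A transfer still running past the upper bound is treated as hanging."""
        return elapsed_min > upper

    print(is_hanging(12.0))   # within range: keep waiting
    print(is_hanging(25.0))   # past the bound: kill and reschedule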
Stork can also: • Allocate/de-allocate (optical) network links • Allocate/de-allocate storage space • Register/un-register files with a metadata catalog • Locate the physical location of a logical file name • Control concurrency levels on storage servers • For details, see [ICDCS'04][JPDC'05][AGridM'03]
Failure Recovery • The slide's figure shows transfers continuing across several incidents: DiskRouter reconfigured and restarted; UniTree not responding; SDSC cache reboot & UW CS network outage; a software problem.
End-to-end Processing of 3 TB DPOSS Astronomy Data • Traditional scheduler: 2 weeks • Using Stork: 6 days
Summary • Stork provides solutions for the data placement needs of the Grid community. • It is ready to fly! • Now distributed with Condor developers release v6.7.6. • All basic features you will need are included in the initial release. • More features coming in the future releases.