This paper focuses on harnessing and managing remote storage for batch-pipelined, I/O-intensive workloads, specifically scientific workloads in wide-area grid computing. The authors propose BAD-FS, a batch-aware distributed file system that combines workload information with explicit storage control to improve performance and simplify implementation.
Focus of work
• Harnessing, managing remote storage
• Batch-pipelined, I/O-intensive workloads
• Scientific workloads
• Wide-area grid computing
Batch-pipelined workloads
• General properties
  • Large number of processes
  • Process and data dependencies
  • I/O intensive
• Different types of I/O (see the sketch below)
  • Endpoint
  • Batch
  • Pipeline
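To make the three I/O types concrete, here is a minimal Python sketch, illustrative only and not BAD-FS code, of how a batch-pipelined workload might be modeled: jobs with dependencies, reading and writing datasets tagged as endpoint, batch, or pipeline.

```python
# Minimal model of a batch-pipelined workload (illustrative, not BAD-FS code).
from dataclasses import dataclass, field
from enum import Enum

class IOType(Enum):
    ENDPOINT = "endpoint"   # unique input/output; must reach home storage
    BATCH = "batch"         # read-shared across many pipelines
    PIPELINE = "pipeline"   # passed only between jobs within one pipeline

@dataclass
class Dataset:
    name: str
    kind: IOType
    size_mb: int

@dataclass
class Job:
    name: str
    parents: list["Job"] = field(default_factory=list)   # process dependencies
    reads: list[Dataset] = field(default_factory=list)
    writes: list[Dataset] = field(default_factory=list)
```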
[Figure: a batch-pipelined workload — parallel pipelines of jobs chained by pipeline data, all reading shared batch datasets, with endpoint inputs and outputs at the edges]
Wide-area grid computing
[Figure: remote clusters connected across the Internet to home storage]
Cluster-to-cluster (c2c)
• Not quite p2p
  • More organized
  • Less hostile
  • More homogeneity
  • Correlated failures
• Each cluster is autonomous
  • Run and managed by different entities
• An obvious bottleneck is the wide-area Internet
How do we manage the flow of data into, within, and out of these clusters?
Current approaches
• Remote I/O
  • Condor standard universe
  • Very easy
  • Consistency through serialization
• Prestaging
  • Condor vanilla universe
  • Manually intensive
  • Good performance through knowledge
• Distributed file systems (AFS, NFS)
  • Easy to use, uniform name space
  • Impractical in this environment
BAD-FS
• Solution: Batch-Aware Distributed File System
  • Leverages workload information with storage control
  • Detailed information about the workload is known
  • Storage layer allows external control
  • External scheduler makes informed storage decisions
• Combining information and control results in
  • Improved performance
  • More robust failure handling
  • Simplified implementation
Practical and deployable
• User-level; requires no privilege
• Packaged as a modified Condor system
  • A Condor system which includes BAD-FS
• General; glide-in works everywhere
[Figure: BAD-FS glided in over SGE-managed clusters, connected across the Internet to home storage]
BAD-FS == Condor ++
1) NeST storage management
2) Batch-Aware Distributed File System
3) Expanded Condor submit language
4) BAD-FS scheduler (Condor DAGMan ++)
[Figure: compute nodes each run a Condor startd alongside BAD-FS; NeST manages storage on each node; a BAD-FS scheduler extending Condor DAGMan drives the job queue from home storage]
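For flavor, here is a hypothetical workflow description in the spirit of the expanded submit language. The keywords (volume, mount, extract) are drawn loosely from the NSDI '04 paper, but the exact syntax shown is an illustrative assumption, not the real grammar.

```
# Hypothetical BAD-FS workflow description (illustrative syntax)
job a a.submit            # first stage of one pipeline
job b b.submit            # second stage, consumes a's output
parent a child b          # DAGMan-style dependency

volume btch ftp://home/input.data 500MB   # batch data, cached near the jobs
volume pipe scratch 200MB                 # pipeline data, never leaves the cluster
mount btch a /mydata
mount pipe a /tmp
mount pipe b /tmp
extract b /tmp/out ftp://home/out.1       # endpoint output shipped back home
```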
BAD-FS knowledge
• Remote cluster knowledge
  • Storage availability
  • Failure rates
• Workload knowledge
  • Data type (batch, pipeline, or endpoint)
  • Data quantity
  • Job dependencies
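A compact sketch of the two kinds of knowledge as the scheduler might hold them; the field names are assumptions for illustration, not structures from the paper.

```python
# Illustrative containers for the scheduler's knowledge (names are assumptions).
from dataclasses import dataclass

@dataclass
class ClusterKnowledge:
    storage_avail_mb: int   # space the remote storage servers will guarantee
    failure_rate: float     # observed failure probability per job

@dataclass
class WorkloadKnowledge:
    batch_mb: int           # shared batch data read by every pipeline
    pipeline_mb: int        # intermediate data private to one pipeline
    endpoint_mb: int        # unique per-pipeline input/output
    job_deps: dict[str, set[str]]   # job -> set of parent jobs
```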
Control through lots
• Abstraction that allows external storage control (sketched below)
  • Guaranteed storage allocations
  • Containers for job I/O
  • e.g. "I need 2 GB of space for at least 24 hours"
• Scheduler
  • Creates lots to cache input data
    • Subsequent jobs can reuse this data
  • Creates lots to buffer output data
    • Destroys pipeline, copies endpoint
  • Configures workload to access lots
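A minimal sketch of what a lot might look like as an API; the class and method names are assumptions for illustration, not NeST's actual interface.

```python
import time

class Lot:
    """A guaranteed storage allocation: a fixed size for a fixed duration."""
    def __init__(self, size_mb: int, duration_s: int):
        self.size_mb = size_mb
        self.used_mb = 0
        self.expires_at = time.time() + duration_s

    def write(self, mb: int) -> None:
        # The allocation is both a guarantee and a hard limit.
        if self.used_mb + mb > self.size_mb:
            raise IOError("lot full")
        self.used_mb += mb

# "I need 2 GB of space for at least 24 hours"
lot = Lot(size_mb=2048, duration_s=24 * 3600)
```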
Knowledge plus control
• Enhanced performance
  • I/O scoping
  • Capacity-aware scheduling
• Improved failure handling
  • Cost-benefit replication
• Simplified implementation
  • No cache consistency protocol
I/O scoping
• Technique to minimize wide-area traffic
• Allocate lots to cache batch data
• Allocate lots for pipeline and endpoint data
• Extract endpoint
• Cleanup
[Figure: AMANDA on BAD-FS — per pipeline, 200 MB pipeline data, 500 MB batch data, 5 MB endpoint data. Steady state: only 5 of 705 MB traverse the wide-area link.]
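The savings are simple arithmetic; a quick check using AMANDA's per-pipeline numbers from the slide:

```python
# Wide-area traffic with and without I/O scoping (AMANDA figures).
pipeline_mb, batch_mb, endpoint_mb = 200, 500, 5

naive_wan_mb = pipeline_mb + batch_mb + endpoint_mb   # 705 MB via home storage
scoped_wan_mb = endpoint_mb                           # batch cached, pipeline stays local

print(f"steady state: {scoped_wan_mb} of {naive_wan_mb} MB traverse the WAN")
```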
Capacity-aware scheduling
• Technique to avoid over-allocations
• Scheduler has knowledge of
  • Storage availability
  • Storage usage within the workload
• Scheduler runs as many jobs as fit (see the sketch below)
  • Avoids wasted utilization
  • Improves job throughput
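One plausible reading of this policy as a greedy sketch, assuming each job declares a storage demand; this is an illustration, not the paper's actual algorithm.

```python
def schedule(jobs: list[tuple[str, int]], storage_avail_mb: int):
    """Greedily dispatch jobs whose storage demand fits; defer the rest."""
    run, defer, free = [], [], storage_avail_mb
    for name, demand_mb in jobs:
        if demand_mb <= free:
            free -= demand_mb
            run.append(name)
        else:
            defer.append(name)   # over-allocating would thrash the caches
    return run, defer

run_now, deferred = schedule([("j1", 700), ("j2", 700), ("j3", 700)], 1500)
# run_now == ['j1', 'j2']; j3 waits until a running job frees its space
```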
Improved failure handling
• Scheduler understands data semantics
  • Data is not just a collection of bytes
• Losing data is not catastrophic
  • Output can be regenerated by rerunning jobs
• Cost-benefit replication (see the sketch below)
  • Replicates only data whose replication cost is cheaper than the cost to rerun the job
  • Can improve throughput in a lossy environment
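The cost-benefit test reduces to comparing two times; a minimal sketch, assuming the per-job failure probability and rerun time are known to the scheduler.

```python
def should_replicate(data_mb: float, wan_mbps: float,
                     rerun_s: float, failure_prob: float) -> bool:
    """Replicate only if copying is cheaper than the expected rerun cost."""
    replication_s = data_mb * 8 / wan_mbps      # time to push a copy over the WAN
    expected_rerun_s = failure_prob * rerun_s   # expected loss if not replicated
    return replication_s < expected_rerun_s

# 200 MB over a 10 Mbps WAN (160 s) vs. a 1-hour job with 10% failure odds (360 s)
should_replicate(200, 10, 3600, 0.10)   # True: replicate
```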
Simplified implementation
• Data dependencies known
• Scheduler ensures proper ordering
• Build a distributed file system
  • With cooperative caching
  • But without a cache consistency protocol
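Why no consistency protocol is needed, in miniature: the scheduler already runs jobs in dependency order, so every cached file is fully written before any consumer reads it. A sketch using Python's standard topological sort:

```python
from graphlib import TopologicalSorter

# child -> parents; outputs of 'a' feed 'b' and 'c', whose outputs feed 'd'
deps = {"b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
print(list(TopologicalSorter(deps).static_order()))
# ['a', 'b', 'c', 'd'] (b and c may swap): one writer finishes before any reader starts
```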
Real workloads
• AMANDA
  • Astrophysics study of cosmic events such as gamma-ray bursts
• BLAST
  • Biology search for proteins within a genome
• CMS
  • Physics simulation of large particle colliders
• HF
  • Chemistry study of non-relativistic interactions between atomic nuclei and electrons
• IBIS
  • Ecology global-scale simulation of Earth's climate used to study effects of human activity (e.g. global warming)
Real workload experience
• Setup
  • 16 jobs, 16 compute nodes
  • Emulated wide-area
• Configurations
  • Remote I/O
  • AFS-like with /tmp
  • BAD-FS
• Result is an order of magnitude improvement
BAD Conclusions
• Schedulers can obtain workload knowledge
• Schedulers need storage control
  • Caching
  • Consistency
  • Replication
• Combining this control with knowledge
  • Enhanced performance
  • Improved failure handling
  • Simplified implementation
For more information
• "Pipeline and Batch Sharing in Grid Workloads," Douglas Thain, John Bent, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau, Miron Livny. HPDC-12, 2003.
• "Explicit Control in a Batch-Aware Distributed File System," John Bent, Douglas Thain, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau, Miron Livny. NSDI '04, 2004.
• http://www.cs.wisc.edu/condor/publications.html
• Questions?