Storage Resource Management (SRM) for Grid Applications
A SciDAC-supported middleware component
Arie Shoshani
Computing Sciences Directorate
Lawrence Berkeley National Laboratory
http://sdm.lbl.gov/srm
Participants
PI: Arie Shoshani
• LBNL – 2 FTEs: Arie Shoshani (PI), Alex Sim (co-PI), Junmin Gu, Andreas Mueller
• Fermilab – ½ FTE: Don Petravick (co-PI), Rich Wellner
Motivation
• Grid architecture emphasized in the past
  • Security
  • Compute resource coordination & scheduling
  • Network resource coordination & scheduling (QoS)
• SRMs' role in the data grid architecture
  • Storage resource coordination & scheduling
• Types of storage resource managers
  • Disk Resource Manager (DRM)
  • Tape Resource Manager (TRM)
  • Hierarchical Resource Manager (HRM = TRM + DRM)
Where Do SRMs Fit in Grid Architecture?
[Figure: architecture diagram. Labeled components: clients (at the client’s site), Request Interpreter, property-file index, request planning, Replica catalog, Network Weather Service, Request Executer, DRMs (each with a Disk Cache), HRM (Disk Cache plus tape system). Labeled flows: logical query, logical files, site-specific files, site-specific files requests, pinning & file transfer requests, network.]
Challenges (1)
• Managing storage resources in an unreliable, distributed, large, heterogeneous system
• Long-lasting, data-intensive transactions
  • Can’t afford to restart jobs
  • Can’t afford to lose data, especially from experiments
• Types of failures
  • Storage system failures
    • Mass Storage System (MSS)
    • Disk system
  • Server failures
  • Network failures
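One way to avoid restarting a whole job after a transient failure is to retry only the failed file transfer. The sketch below is illustrative and not taken from the SRM software: `TransferError`, `transfer_with_retry`, and the exponential back-off policy are all assumptions made for the example.

```python
import time


class TransferError(Exception):
    """Hypothetical error raised when a single file transfer fails."""


def transfer_with_retry(transfer, file_name, attempts=3, delay_s=1.0,
                        sleep=time.sleep):
    """Call transfer(file_name), retrying on TransferError.

    The wait doubles after each failed attempt (assumed policy).
    The last error is re-raised once all attempts are used up.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return transfer(file_name)
        except TransferError as err:
            last_error = err
            if attempt < attempts:
                sleep(delay_s * 2 ** (attempt - 1))
    raise last_error
```

Only the failing transfer is repeated, so files that were already moved successfully are not touched again.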
Challenges (2)
• Heterogeneity
  • Operating systems (well understood)
  • MSS: HPSS, Castor, Enstore, …
  • Disk systems: system attached, network attached, parallel
• Optimization issues
  • Avoiding extra file transfers: what to keep in each disk cache over time
  • How to maximize sharing for multiple users
  • Global optimization
  • Multi-tier storage system optimization
Specific Problems
• Managing resource space allocation
  • What if there is no space?
• Managing pinning of files
  • What if files can be removed in the middle of a transfer?
• Space reservations
  • What if multiple files are needed concurrently?
• File streaming
  • For processing a large set of files
• Pin-lock
  • What if you pinned files and the system deadlocks?
• User priorities
• Access control: who can read/write a file
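The interaction between space allocation, pinning, and pin-lock can be illustrated with a toy disk cache. This is a minimal sketch under stated assumptions, not the DRM implementation: the `DiskCache` class, its method names, the least-recently-used eviction order, and the idea of giving every pin a lifetime (so that abandoned pins cannot block the cache forever) are all hypothetical choices made for the example.

```python
import time
from collections import OrderedDict


class DiskCache:
    """Toy disk cache: a file may be evicted only while it is not pinned."""

    def __init__(self, capacity_bytes, clock=time.monotonic):
        self.capacity = capacity_bytes
        self.clock = clock
        self.files = OrderedDict()  # name -> size, least recently used first
        self.pins = {}              # name -> time at which the pin expires

    def used(self):
        return sum(self.files.values())

    def is_pinned(self, name):
        expiry = self.pins.get(name)
        if expiry is None:
            return False
        if expiry <= self.clock():
            del self.pins[name]     # pin lifetime elapsed: release it
            return False
        return True

    def pin(self, name, lifetime_s):
        """Protect a cached file from eviction for lifetime_s seconds."""
        if name not in self.files:
            raise KeyError(name)
        self.pins[name] = self.clock() + lifetime_s
        self.files.move_to_end(name)

    def release(self, name):
        self.pins.pop(name, None)

    def add(self, name, size):
        """Store a file, evicting unpinned files if needed.

        Returns False when enough space cannot be freed.
        """
        if name in self.files:
            self.files.move_to_end(name)
            return True
        evictable = [n for n in self.files if not self.is_pinned(n)]
        free = self.capacity - self.used()
        reclaimable = sum(self.files[n] for n in evictable)
        if free + reclaimable < size:
            return False
        for n in evictable:
            if free >= size:
                break
            free += self.files.pop(n)
        self.files[name] = size
        return True
```

A pinned file cannot be removed in the middle of a transfer, and because each pin expires, a client that disappears without releasing its files does not hold the space indefinitely.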
HRMs in PPDG (high-level view)
Replica Coordinator:
• Monitors files written into BNL’s HPSS
• Selects files to replicate
• Issues request_to_put for file (or many files)
[Figure: two sites, LBNL and BNL, each with an HRM in front of a Disk Cache and a tape system. Labels: Replica Coordinator, HRM-COPY, HRM-GET, HRM (performs writes), HRM (performs reads), GridFTP GET (pull mode).]
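The pull-mode replication on this slide can be sketched as a small in-memory simulation. Everything here is an assumption made for illustration: the `Hrm` class, the signature given to `request_to_put`, the event names in the log, and the use of a plain dictionary copy in place of the GridFTP GET. It does not reproduce the actual HRM protocol.

```python
from dataclasses import dataclass, field


@dataclass
class Hrm:
    """Hypothetical stand-in for an HRM: tape storage behind a disk cache."""
    site: str
    tape: dict = field(default_factory=dict)  # file name -> contents on tape
    disk: dict = field(default_factory=dict)  # file name -> contents on disk

    def stage(self, name):
        """Copy a file from tape into the disk cache and return it."""
        self.disk[name] = self.tape[name]
        return self.disk[name]

    def migrate(self, name):
        """Copy a file from the disk cache onto tape."""
        self.tape[name] = self.disk[name]


def request_to_put(name, source, target, log):
    """Replicate one file: stage at the source, pull to the target, migrate."""
    log.append(("staging_requested", source.site))
    try:
        data = source.stage(name)
    except KeyError:
        log.append(("request_failed", source.site))
        return False
    log.append(("staging_finished", source.site))
    target.disk[name] = data  # stands in for the GridFTP GET (pull mode)
    log.append(("transferred", target.site))
    target.migrate(name)
    log.append(("migration_finished", target.site))
    return True
```

For example, with `bnl = Hrm("BNL", tape={"f1": b"data"})` and `lbnl = Hrm("LBNL")`, calling `request_to_put("f1", bnl, lbnl, log)` leaves a copy of `f1` on the LBNL tape and records the four steps in `log`.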
Measurements
[Figure: plot of file replication events. Legend entries: File replication request start, Staging_requested_at_BNL, Staging_started_at BNL, Staging_finished_at_BNL, Transfered_to_PDSF_from_BNL, Migration_Requested, Migration_Finished, Notified_Client, FILE_REQUEST_FAILED.]
SC 2001 Demo Setup
[Figure: demo topology. A client in Denver sends a Logical Request to the Request Manager, which uses a BIT-MAP Index and File Transfer Monitoring. Servers in Chicago, Berkeley, and Livermore, each with a Disk Cache, are reached through GridFTP, FTP, DRM, and HRM. Legend: Control path, Data Path.]
Accomplishments
• Developed HRMs and DRMs using the same uniform protocols
• Deployed in PPDG
• Developed command-line interface to HRM
• Wrote a joint design specification in coordination with EDG, JLab, and Fermilab (to be presented at GGF)
• Wrote a paper for the MSS conference
• Future: develop a standard protocol
• Future: deploy HRM at ORNL & NERSC for the ESG II project