Storage Resource Managers: Essential Components for the Grid Arie Shoshani Staff: Alex Sim, Junmin Gu, Alex Romosan, Viji Natarajan Scientific Data Management Group Lawrence Berkeley National Laboratory http://sdm.lbl.gov/srm
Outline • What are Storage Resource Managers - Motivation • General Analysis Scenario and the use of SRMs • SRM functionality • Real examples of working SRMs • Advantages of using SRMs • Conclusions and Future Work
Motivation • Grid architecture needs to include reservation & scheduling of: • Compute resources • Storage resources • Network resources • The role of Storage Resource Managers (SRMs) in the data grid architecture • Shared storage resource allocation & scheduling • Especially important for data-intensive applications • Often files are archived on a mass storage system (MSS) • Large scientific collaborations (100s of clients) offer opportunities for file sharing • File replication and caching may be used • Need to support non-blocking (asynchronous) requests
Types of SRMs • Types of storage resource managers • Disk Resource Manager (DRM) • Manages one or more disk resources • Tape Resource Manager (TRM) • Manages access to a tertiary storage system (e.g. HPSS) • Hierarchical Resource Manager (HRM=TRM + DRM) • An SRM that stages files from tertiary storage into its disk cache • SRMs and File transfers • SRMs DO NOT perform file transfer • SRMs DO invoke file transfer service if needed (GridFTP, FTP, HTTP, …) • SRMs DO monitor transfers and recover from failures • TRM: from/to MSS • DRM: from/to network
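The division of labor on this slide (the SRM delegates the transfer to an external service, then monitors it and recovers from failures) can be sketched roughly as follows. This is a minimal illustration and not SRM code: `transfer_with_recovery`, `TransferError`, and the `transfer` callable are hypothetical names, and the callable stands in for whichever service (GridFTP, FTP, HTTP) is actually invoked.

```python
import time
from typing import Callable


class TransferError(Exception):
    """Raised by the (hypothetical) transfer service when an attempt fails."""


def transfer_with_recovery(
    transfer: Callable[[str, str], None],
    source_url: str,
    target_url: str,
    max_attempts: int = 3,
    delay_seconds: float = 0.0,
) -> int:
    """Invoke an external transfer service and retry on failure.

    Returns the number of attempts used; re-raises the last error
    once max_attempts have been exhausted.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            # The SRM moves no bytes itself; it only invokes the service.
            transfer(source_url, target_url)
            return attempt
        except TransferError:
            if attempt == max_attempts:
                raise
            time.sleep(delay_seconds)  # wait before the next attempt
    raise ValueError("max_attempts must be at least 1")
```

A real SRM would presumably also track partial transfers and per-file status; the sketch only shows the invoke, monitor, retry loop.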
A multi-file request to a Disk Resource Manager [Figure: clients send multi-file requests to a DRM (client-SRM communication) and access files in its disk cache; the DRM issues file transfer requests over the network to File Transfer Services that front other disk caches and a tape system.]
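Accessing Remote Storage Resource Managers [Figure: clients send multi-file requests to a local DRM and access files in its disk cache; the DRM issues file transfer requests over the network to a remote DRM and a remote HRM (SRM-SRM communication), each with its own disk cache, the HRM in front of a tape system.]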
General Analysis Scenario [Figure: at the client's site, a client submits a logical query to a Request Interpreter, which consults a metadata catalog (yielding a set of logical files) and a replica catalog. Request planning produces an execution plan and site-specific files for the Request Executer, which uses the Network Weather Service and sends requests for data placement and remote computation (an execution DAG) over the network to Site 1, Site 2, ... Site N. Each site has a Storage Resource Manager and a Compute Resource Manager with disk caches and compute engines; one site also has an MSS. Result files are returned to a Storage Resource Manager at the client's site. All SRMs expose a uniform SRM interface.]
SRM is a Service (OGSA, CORBA, C++, Java, …) • SRM functionality • Manage space • Negotiate and assign space to users • Manage “lifetime” of spaces • Manage files on behalf of a user • Pin files in storage till they are released • Manage “lifetime” of files • Manage action when pins expire (depends on file types) • Manage file sharing • Policies on what should reside on a storage resource at any one time • Policies on what to evict when space is needed • Get files from remote locations when necessary • Purpose: to simplify client’s task • Manage multi-file requests • A brokering function: queue file requests, pre-stage when possible • Provide grid access to/from mass storage systems • HPSS (LBNL, ORNL, BNL), Enstore (Fermi), JasMINE (Jlab), Castor (CERN), MSS (NCAR), …
SRM works with other SRMs as well as legacy systems by using GridFTP [Figure: a client submits a logical request to a Request Interpreter and Request Manager, which work with servers labeled Chicago, Berkeley, Livermore, and Berkeley. The servers are a mix of DRMs, an HRM, GridFTP servers, and an FTP server, each with a disk cache. Legend: control path, data path.]
Earth System Grid [Figure: deployment diagram spanning LBNL, ANL, NCAR, LLNL, ORNL, and USC-ISI. Components shown include HRM and DRM (Storage Resource Management), gridFTP servers (including a striped server), an openDAPg server, a Tomcat servlet engine, a GRAM gatekeeper, MyProxy server and client, CAS (Community Authorization Services) and CAS client, MCS (Metadata Cataloguing Services) and MCS client, RLS (Replica Location Services) and RLS client, HPSS (High Performance Storage System), NCAR-MSS (Mass Storage System), and local disks. Links are labeled gridFTP, SOAP, and RMI.]
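Uniformity of Interface: Compatibility of SRMs [Figure: client users/applications sit above grid middleware, which talks to several SRMs through the same interface. The SRMs front different storage systems: Enstore, dCache, JASMine, CASTOR, and a disk cache.]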
Where do SRMs belong in the Grid architecture? [Figure: layered Grid architecture. Collective 2: Request Interpretation and Planning Services; Workflow or Request Management Services; Application-Specific Data Discovery Services; Community Authorization Services; Consistency Services (e.g., Update Subscription, Versioning, Master Copies). Collective 1: Data Transport Services; Data Federation Services; Data Filtering or Transformation Services; General Data Discovery Services; Storage Management (Brokering); Compute Scheduling (Brokering); Monitoring/Auditing Services. Resource: File Transfer Service (GridFTP); Storage Resource Manager; Data Filtering or Transformation Services; Database Management Services; Compute Resource Management; Resource Monitoring/Auditing. Connectivity: Communication Protocols (e.g., TCP/IP stack); Authentication and Authorization Protocols (e.g., GSI). Fabric: Other Storage Systems; Mass Storage System (HPSS); Compute Systems; Networks. This figure is based on the Grid Architecture paper by the Globus Team.]
SRMs provide a brokering service by supporting multi-file requests [Figure: Grid architecture layers as on the previous slide (Fabric, Connectivity, Resource, Collective 1, Collective 2), with the Storage Resource Manager in the Resource layer and Storage Management (Brokering) in the Collective 1 layer. This figure is based on the Grid Architecture paper by the Globus Team.]
DataMover: SRMs use in ESG and PPDG for Robust Multi-file replication [Figure: a DataMover command-line interface, which can run anywhere, gets a list of files from a directory and issues an HRM-COPY for thousands of files. The HRM that performs writes issues SRM-GET requests, one file at a time, to the HRM that performs reads. Files are staged from mass storage (NCAR-MSS) into a disk cache, moved across the network with GridFTP GET (pull mode) into the other disk cache, and archived. Sites shown: NCAR and LBNL/ORNL. The system recovers from staging failures, file transfer failures, and archiving failures, and a web-based file monitoring tool tracks progress.]
Concepts: Types of Files • Volatile: temporary files with a lifetime guarantee • Files are “pinned” and “released” • Files can be removed by SRM when released or when lifetime expires • Permanent • No lifetime • Files can only be removed by creator (owner) • Durable: files with a lifetime that CANNOT be removed by SRM • Files are “pinned” and “released” • Files can only be removed by creator (owner) • If lifetime expires – invoke administrative action (e.g. notify owner, archive and release)
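The removal rules for the three file types can be written down as a small decision function. This is a minimal sketch under stated assumptions: `FileType`, `ManagedFile`, `srm_may_remove`, and `needs_admin_action` are hypothetical names, and an unpinned file is treated as "released".

```python
from dataclasses import dataclass
from enum import Enum


class FileType(Enum):
    VOLATILE = "volatile"
    DURABLE = "durable"
    PERMANENT = "permanent"


@dataclass
class ManagedFile:
    file_type: FileType
    pinned: bool = False            # assumption: unpinned means released
    lifetime_expired: bool = False


def srm_may_remove(f: ManagedFile) -> bool:
    """True if the SRM itself (not the owner) may remove the file."""
    if f.file_type is FileType.VOLATILE:
        # Removable once released or once the lifetime has run out.
        return (not f.pinned) or f.lifetime_expired
    # Durable and permanent files can only be removed by their owner.
    return False


def needs_admin_action(f: ManagedFile) -> bool:
    """Durable files whose lifetime expired trigger an administrative action."""
    return f.file_type is FileType.DURABLE and f.lifetime_expired
```

The administrative action itself (notify the owner, archive and release) is left out, since the slide gives it only as an example.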
Concepts: Types of Spaces • Types • Volatile • Space can be reclaimed by SRM when lifetime expires • Durable • Space can be reclaimed by SRM only if it does NOT contain files • Can choose to archive files and release space • Permanent • Space can only be released by owner or administrator • Assignment of files to spaces • Files can only be assigned to spaces of the same type • Spaces can be reserved • No limit on number of spaces • Space reference handle is returned to client • Total space of each type is subject to SRM and/or VO policies • Default spaces • Files can be put into SRM spaces without explicit reservation • Defaults are not visible to client • Compacting space • Release all unused space – space that has no files or files whose lifetime expired
Concepts: Directory Management • Usual Unix semantics • srmLs, srmMkdir, srmMv, srmRm, srmRmdir • A single directory for all file types • No directories for each type • File assignment to types is virtual • Files can be placed in SRM-managed directories by maintaining a mapping to the client’s directory • Access control services • Support owner/group/world permission • Can only be assigned by owner • When a file is requested by a user, SRM should check permission with the source site
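Examples of Directory Structures (user defined) [Figure: two directory trees, each with directories D1, D2, D3, D4. (1) Mixed file types: F1 (D), F2 (P), F3 (V), F4 (P), F5 (D) are spread across the directories regardless of type. (2) By file type: F1 (V), F2 (V), F3 (V); F4 (D), F5 (D), F6 (D); F7 (P), F8 (P) are grouped by type. V = volatile, D = durable, P = permanent.] • Supported function: ChangeFileType • Advantage of (1): no need to move files when file types are changed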
Concepts: Space Reservations • Negotiation • Client asks for space: C-guaranteed, MaxDesired • SRM returns: S-guaranteed <= C-guaranteed, best effort <= MaxDesired • Type of space • Can be specified • Subject to limits per client (SRM or VO policies) • Default: volatile • Lifetime • Negotiated: C-lifetime requested • SRM returns: S-lifetime <= C-lifetime • Reference handle • SRM returns space reference handle • User can provide: srmSpaceTokenDescription to recover handles
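The negotiation on this slide amounts to clamping the client's request by what the SRM can offer. A minimal sketch, assuming the SRM's limits are its free space and per-client policy caps: `negotiate_space`, `SpaceGrant`, and the parameters `free_bytes`, `policy_max_bytes`, and `policy_max_lifetime` are hypothetical and not part of the SRM specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpaceGrant:
    s_guaranteed: int  # bytes the SRM guarantees (<= C-guaranteed)
    best_effort: int   # bytes the SRM will try to provide (<= MaxDesired)
    s_lifetime: int    # seconds granted (<= C-lifetime)


def negotiate_space(
    c_guaranteed: int,
    max_desired: int,
    c_lifetime: int,
    *,
    free_bytes: int,
    policy_max_bytes: int,
    policy_max_lifetime: int,
) -> SpaceGrant:
    """Clamp a client's space request by hypothetical SRM limits."""
    if max_desired < c_guaranteed:
        raise ValueError("MaxDesired must be at least C-guaranteed")
    # Guarantee no more than was asked for, is free, or policy allows.
    s_guaranteed = min(c_guaranteed, free_bytes, policy_max_bytes)
    # Best effort may exceed what is free now, but never the policy cap.
    best_effort = min(max_desired, policy_max_bytes)
    s_lifetime = min(c_lifetime, policy_max_lifetime)
    return SpaceGrant(s_guaranteed, best_effort, s_lifetime)
```

For example, a request for 100 guaranteed and 500 desired bytes against 80 free bytes and a 300-byte policy cap yields a guarantee of 80 and a best effort of 300.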
Concepts: Transfer Protocol Negotiation • Negotiation • Client provides an ordered list • SRM returns: the highest-ranked protocol on the list that it supports • Example • Protocols list: bbftp, gridftp, ftp • SRM returns: gridftp • Advantages • Easy to introduce new protocols • User controls which protocol to use • Default – SRM policy choice • How is it returned? • The protocol of the Transfer URL (TURL) • Example: bbftp://dm.slac.edu/temp/run11/File678.txt
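Protocol negotiation reduces to picking the first entry in the client's ordered list that the SRM supports. A minimal sketch: `choose_protocol` is a hypothetical helper, and the supported set used in the usage note is an assumption chosen so that the slide's example (bbftp, gridftp, ftp yields gridftp) comes out as stated.

```python
from typing import Iterable, Optional, Sequence


def choose_protocol(
    client_preferences: Sequence[str],
    srm_supported: Iterable[str],
    srm_default: Optional[str] = None,
) -> Optional[str]:
    """Return the client's most-preferred protocol that the SRM supports.

    With an empty preference list, fall back to the SRM's policy default.
    Returns None when no listed protocol is supported.
    """
    supported = {p.lower() for p in srm_supported}
    if not client_preferences:
        return srm_default
    for protocol in client_preferences:  # ordered by client preference
        if protocol.lower() in supported:
            return protocol.lower()
    return None
```

With `choose_protocol(["bbftp", "gridftp", "ftp"], ["gridftp", "ftp", "http"])` the result is `"gridftp"`, because this hypothetical SRM does not support bbftp. The chosen protocol would then appear as the scheme of the returned TURL.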
Concepts: Multi-file requests • Can srmRequestToGet multiple files • Required: File URLs • Optional: space file type, space handle, Protocol list • Optional: total retry time • Provide: Site URL (SURL) • URL known externally – e.g. in Rep Catalogs • e.g. srm://sleepy.lbl.gov:4000/tmp/foo-123 • Get back: transfer URL (TURL) • Path can be different than in SURL – SRM internal mapping • Protocol chosen by SRM • e.g. gridftp://dm.lbl.gov:4000/home/level1/foo-123 • Managing request queue • Allocate space according to policy, system load, etc. • Bring in as many files as possible • Provide information on each file brought in or pinned • Bring additional files as soon as files are released • Support file streaming
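The queue-management bullets describe a streaming pattern: stage as many files as fit, then stage more as the client releases files. A minimal sketch under simplifying assumptions: `MultiFileRequest` is a hypothetical class, files are served strictly in request order, space is a single byte budget, and the SURLs in the usage note are made-up `example.org` addresses.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


class MultiFileRequest:
    """Toy model of one multi-file request against a fixed-size cache."""

    def __init__(self, files: List[Tuple[str, int]], cache_bytes: int) -> None:
        self._pending: Deque[Tuple[str, int]] = deque(files)  # (SURL, size)
        self._free = cache_bytes
        self._pinned: Dict[str, int] = {}

    def stage_available(self) -> List[str]:
        """Bring in as many queued files as currently fit; return their SURLs."""
        staged = []
        while self._pending and self._pending[0][1] <= self._free:
            surl, size = self._pending.popleft()
            self._free -= size
            self._pinned[surl] = size  # file is pinned until released
            staged.append(surl)
        return staged

    def release(self, surl: str) -> List[str]:
        """Client releases a file; the freed space is reused for queued files."""
        self._free += self._pinned.pop(surl)
        return self.stage_available()

    @property
    def pending(self) -> int:
        return len(self._pending)
```

With three 4-byte files and an 8-byte cache, `stage_available()` brings in the first two; releasing the first then brings in the third. Note that in this simple version a file larger than the whole cache would block the queue, which a real policy would have to handle.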
SRM functionality • Space reservation • Negotiate and assign space to users • Manage “lifetime” of spaces • Release and compact space • File management • Assign space for putting files into SRM • Pin files in storage when requested till they are released • Manage “lifetime” of files • Manage action when pins expire (depends on file types) • Get files from remote locations when necessary • Purpose: to simplify client’s task • srmCopy: in “pull” and “push” modes
SRM functionality (Cont’d) • Space management policies and file sharing • Policies on what should reside on a storage resource at any one time • Policies on what to evict when space is needed • Share files to avoid getting them from remote locations • Manage multi-file requests • Queues file requests, pre-stage when possible • Status functions • Files: lifetime remaining, what’s available locally • Requests: what files are available (needed in lieu of callbacks) • Request summary: for progress report • Space metadata: space in use, space available, lifetime • Provide grid access to/from mass storage systems • HPSS (LBNL, ORNL, BNL), Enstore (Fermi), JasMINE (Jlab), Castor (CERN), MSS (NCAR), SE (RAL) …
SRM Methods • Space management: srmReserveSpace, srmReleaseSpace, srmUpdateSpace, srmCompactSpace, srmGetCurrentSpace • FileType management: srmChangeFileType • Status/metadata: srmGetRequestStatus, srmGetFileStatus, srmGetRequestSummary, srmGetRequestID, srmGetFilesMetaData, srmGetSpaceMetaData • File movement: srm(Prepare)Get, srm(Prepare)Put, srmReplicate • Lifetime management: srmReleaseFiles, srmPutDone, srmExtendFileLifeTime • Terminate/resume: srmAbortRequest, srmAbortFile, srmSuspendRequest, srmResumeRequest
Summary: advantages of using SRMs • Synchronization between storage resources • Pinning files, releasing files • Allocating space dynamically on an “as-needed” basis • Insulate clients from storage and network system failures • Transient MSS failure • Network failures • Interruption of large file transfers • Facilitate file sharing • Eliminate unnecessary file transfers • Support “streaming model” • Use space allocation policies by SRMs: no reservations needed • Use explicit release by client for reuse of space • Control number of concurrent file transfers • From/to MSS – avoid flooding MSS and thrashing • From/to network – avoid flooding and packet loss
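Controlling the number of concurrent transfers is commonly done with a counting semaphore. A minimal sketch of that general technique, not taken from any SRM implementation: `run_transfers` and the `transfer` callable are hypothetical, and the callable stands in for a real transfer to or from the MSS or the network.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def run_transfers(
    urls: Sequence[str],
    transfer: Callable[[str], None],
    max_concurrent: int,
) -> List[str]:
    """Run all transfers, with at most max_concurrent active at a time."""
    gate = threading.BoundedSemaphore(max_concurrent)

    def guarded(url: str) -> str:
        with gate:  # blocks while max_concurrent transfers are in flight
            transfer(url)
        return url

    # The pool is deliberately larger than the cap, so the semaphore
    # (not the pool size) is what limits concurrency.
    with ThreadPoolExecutor(max_workers=4 * max_concurrent) as pool:
        return list(pool.map(guarded, urls))
```

Separate caps for MSS and network transfers, as the slide suggests, could be modeled with one semaphore per resource.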
Web-Based File Monitoring Tool • Shows: • Files already transferred • Files during transfer • Files to be transferred • Also shows for each file: • Source URL • Target URL • Transfer rate
File tracking helps to identify bottlenecks • Shows that archiving is the bottleneck
File tracking shows recovery from transient failures • Total: 45 GB
Ongoing and Future Work • Ongoing work • Developing Standard SRM interfaces • Particle Physics Data Grid (PPDG) project • LBNL, TJNAF, FNAL • European Data Grid (EDG) project • WP2 - data management • WP5 – mass storage • Deployment • LBNL, BNL, ORNL, TJNAF, FNAL, CERN, (SE-England) • Use of SRM by other agents • Storage Resource Broker (SDSC) calling HRM to Stage files from HPSS • GridFTP invoking HRM • New Spec completed (SRM V2.1) • directory management • File/directory file movement • dynamic space management • Future work • Access authorization – community access service (CAS) • “On-demand” space allocation, accounting, and charging • Replica management – invoke SRMs and RLS as a single service • Request executer (e.g. DAGMAN) to invoke SRMs • SRMs over NeST (Network STorage)