Advanced topic on data management The SRM protocol and the StoRM implementation
Advanced topic on data management • I will briefly describe how the classic SE works: • Highlighting design points and their consequences for file security. • File security: POSIX-like ACL access to files from the GRID. • I’ll then talk about the SRM protocol: • Its origin in allowing tape resources to be accessed from the GRID. • Particular attention to design differences with the classic SE. • SRM’s transition to an interface for disk storage resources. • Differences with tape-based systems. • I’ll finally talk about StoRM: an SRM implementation that allows POSIX-like ACL access.
Classic SE • It allows disk resources to be accessed from the GRID. • What makes a machine into an SE? Three components are needed: • A component that publishes information and tells the GRID that it is an available storage resource. • The usual framework for authentication: GSI. • A component that actually moves the files around: the characterizing feature!
Classic SE • Component that allows the GRID to be aware of its presence, i.e. to be included in the GRID information system. • There is an LDAP server that publishes information about the SE. • Information is organised according to the GlueSchema: specifically by the GlueSEUniqueID entity. • Information describing the SE, such as its name and the listening port of its service. • Information specific to each VO that the SE is serving, such as the local path to the file-holding directory, available space, etc. • Part of the information is updated dynamically, especially that concerning available and occupied disk space. • This is done through LDAP Providers found in /opt/lcg/libexec. • The providers periodically run scripts that update the dynamic information. • Finally, the rest of the grid information system periodically polls the information made available by the SEs present there.
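The published record can be pictured as a small LDIF-style entry. The sketch below builds one in Python; the `make_glue_se_entry` helper and the exact `GlueSA*` attribute set are illustrative assumptions, not the full list a real provider emits.

```python
# Sketch of the kind of GlueSchema record a classic SE publishes via LDAP.
# Attribute names follow the GlueSE/GlueSA entities; the helper and the
# exact attribute set are illustrative, not the full provider output.
def make_glue_se_entry(unique_id, name, port, vo_info):
    """Build an LDIF-style GlueSE entry as a list of 'attr: value' lines."""
    lines = [
        f"dn: GlueSEUniqueID={unique_id},mds-vo-name=local,o=grid",
        f"GlueSEUniqueID: {unique_id}",
        f"GlueSEName: {name}",
        f"GlueSEPort: {port}",
    ]
    # Per-VO information: storage path plus the dynamically updated
    # space figures that the LDAP provider scripts refresh periodically.
    for vo, info in vo_info.items():
        lines.append(f"GlueSAPath: {info['path']}")
        lines.append(f"GlueSAStateAvailableSpace: {info['available_kb']}")
        lines.append(f"GlueSAStateUsedSpace: {info['used_kb']}")
    return lines

entry = make_glue_se_entry(
    "storage.egrid.it", "egrid-classic-se", 2811,
    {"egrid": {"path": "/storage/egrid", "available_kb": 1048576, "used_kb": 2048}},
)
```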
Classic SE • User authentication: Grid Security Infrastructure (GSI) • Core of the GLOBUS 2.4 libraries: used by the service in charge of moving files around! • i.e. /opt/globus/lib/libglobus_gsi_credential_gcc32dbg.so.0, /opt/globus/lib/libglobus_gsi_proxy_core_gcc32dbg.so.0, etc. • Set of scripts run by cron jobs to manage pool accounts: • /opt/edg/sbin/edg-mkgridmap creates a gridmap file by reading a local configuration file that specifies sources of allowed credentials, either an LDAP server or a specific file. • /opt/edg/sbin/lcg-expiregridmapdir removes the mapping to local credentials when a grid user is no longer working on that machine. • /opt/edg/sbin/edg-fetch-crl retrieves revocation lists of invalid certificates.
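The gridmap file produced by edg-mkgridmap maps certificate subject DNs to local accounts (pool accounts are written as `.vo`). A minimal parsing sketch, assuming the standard `"<subject DN>" <account>` line format:

```python
# Minimal sketch of reading a gridmap file: each line maps a quoted
# certificate DN to a local account; ".vo" entries denote pool accounts.
def parse_gridmap(text):
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The DN is quoted and may contain spaces; the account follows
        # the closing quote, so split on the last '" ' occurrence.
        dn, _, account = line.rpartition('" ')
        mapping[dn.lstrip('"')] = account.strip()
    return mapping

sample = '"/C=IT/O=INFN/CN=Mario Rossi" .egrid\n"/C=IT/O=ICTP/CN=Admin" storadm\n'
gridmap = parse_gridmap(sample)
```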
Classic SE Component that carries out the functionality of moving files around the GRID. • In general it is just any implementation of a transport protocol that supports GSI! • GridFTP is the most common! • RFIO • Anything that somebody comes up with, as long as it is GSI-enabled: it is just a matter of who will adopt and use it!
Classic SE GridFTP: • Essentially an FTP server extended/optimized for large data transfers: • Parallel streams for speed. • Allows checkpoints during file transfers, for later resuming. • Authentication through GSI certificates instead of user name + password
Classic SE • Central point: • It is FTP! A user can do whatever an FTP client allows! • There is no separation between what can be done from the grid and the actual transport protocol. • There is no explicit and separate list of file manipulation operations that can be done from the grid! • There is no uniform view of the possible file manipulations: they are tied to the underlying transport protocol! • Depending on the protocol you may not have the same functionality. • For the same functionality the same specific protocol must be used: it may not be possible to access all SEs seamlessly!
Classic SE Compare with CEs that have LRMS interface to forked jobs or to batch jobs. • It is an abstraction layer on the kinds of computations that can be done. • LRMS may not be a great protocol (gLite CEs are somewhat different)… yet it is an attempt to introduce an abstraction.
Classic SE A more serious consequence of the lack of abstraction is how to apply POSIX-ACL-like control on files from the grid. It is left up to the transport protocol! • For GridFTP: • It is FTP modified for GSI. • FTP allows file manipulation compatible with the underlying Unix filesystem permissions. • If grid control on files is needed, it is the underlying filesystem that must be carefully managed! • Map users to specific local accounts, not pool accounts: each grid user can then be controlled individually once they get onto the machine. • Partition local accounts into specially created groups: these reflect data access patterns. • A carefully crafted directory tree guides data access. • So a grid user with no access rights to a file is stopped because the GridFTP server is stopped in its tracks by the local filesystem!
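The filesystem-level check the classic SE relies on is just the plain Unix mode-bit logic. A sketch of that decision, with illustrative uid/gid values:

```python
# Sketch of the check the local filesystem performs on behalf of the
# GridFTP server: may a local account (uid + group memberships) read a
# file, given the file's owner, group, and Unix mode bits?
import stat

def may_read(mode, file_uid, file_gid, uid, gids):
    if uid == file_uid:
        return bool(mode & stat.S_IRUSR)   # owner read bit
    if file_gid in gids:
        return bool(mode & stat.S_IRGRP)   # group read bit
    return bool(mode & stat.S_IROTH)       # other read bit

# A file owned by uid 1001, group gid 500, mode rw-r----- (0o640):
# the owner and group members can read it, everyone else is refused.
mode = 0o640
```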
Classic SE • In any case the proposed solution is problematic, because data may be present in several SEs: • Users must have the same UID across all SEs. • The directory structure must be replicated/synchronised across all SEs. • Users must be supplied with tools to manage permissions coherently across all SEs.
Classic SE Central point: • The GRID lacked the concept of access control within the same VO. • It was only possible to find it when passing to the local machine. • The local machine had the means to enforce it: users + group membership. • Security is therefore set up behind the scenes, at the implementation level! • No GRID concept is involved! No GRID abstraction is available to: • Express fine grained authorization. • Express what can be accessed. • Check GRID credentials.
Classic SE VOMS proxies and GridFTP • VOMS allows roles and groups to be defined: it therefore allows fine tuning of who the GRID user is. • It is up to the system receiving these detailed credentials to decide what local resources to use. • For SEs there is still the same problem of explicitly listing what these resources are: the dependency on the transport protocol, as stated.
The SRM protocol Storage Resource Manager protocol: • Originally devised to allow grid access to tape based resources that had a disk area acting as a cache. • Staging of files: • A request for a file arrives. • If the file is in the cache it is returned right away. • Otherwise it is first fetched from tape, copied to disk and then returned. • The system takes care of consistency between cache and tapes. • Needed to offset the latency due to the robotic arm switching tapes.
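The staging steps above can be sketched as a toy cache in front of a tape backend; the class and names are illustrative, not part of any SRM implementation:

```python
# Toy sketch of the staging logic: serve from the disk cache on a hit,
# otherwise copy from tape into the cache first. The dict-backed "tape"
# and "cache" stand in for the real robotic library and disk area.
class StagingCache:
    def __init__(self, tape):
        self.tape = tape      # name -> bytes, the tape backend
        self.cache = {}       # name -> bytes, the disk cache

    def get(self, name):
        if name in self.cache:          # cache hit: return right away
            return self.cache[name]
        data = self.tape[name]          # cache miss: stage in from tape
        self.cache[name] = data         # cache copy kept consistent with tape
        return data

store = StagingCache(tape={"run-2005.dat": b"events"})
```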
The SRM protocol SRM was designed to handle that Tape/Disk-cache scenario from the GRID: • The presence of the cache area introduces the concept of file type: • Volatile: files get written in cache and the system then removes them automatically after a lifetime expires. • Permanent: files that get into cache are not removed automatically by the system. • Durable: files do have a lifetime that may expire, but the system does not remove them and instead sends an e-mail notification to the user.
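The three behaviours on lifetime expiry can be condensed into one decision function; this is a sketch of the semantics just described, with illustrative names (real systems notify by e-mail rather than a callback):

```python
# Sketch of what happens when a file's lifetime expires, per file type:
# volatile files are removed, durable ones are kept but the user is
# notified, permanent ones are untouched. The callback stands in for
# the e-mail notification a real system would send.
def on_lifetime_expired(file_type, notify):
    """Return True if the expired file should be removed from cache."""
    if file_type == "volatile":
        return True                     # removed automatically
    if file_type == "durable":
        notify("lifetime expired")      # kept, but the user is warned
        return False
    return False                        # permanent: never removed

messages = []
removed = on_lifetime_expired("durable", messages.append)
```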
The SRM protocol • File staging introduces the concept of asynchronous calls to get or put a file: • An SRM request is issued to get a file. • The server replies immediately, without waiting for staging to complete. • The server returns a Request Token which the client uses to periodically poll the request’s status.
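The token-and-poll exchange can be sketched as follows. The method names mimic srmPrepareToGet / srmStatusOfGetRequest but the class, the counter-based token, and the `polls_needed` knob are simplifying assumptions:

```python
# Sketch of the asynchronous get: the server answers immediately with a
# request token, and the client polls until staging finishes.
import itertools

class SrmServer:
    def __init__(self):
        self._tokens = itertools.count(1)
        self._requests = {}             # token -> polls remaining until staged

    def prepare_to_get(self, surl, polls_needed=2):
        token = next(self._tokens)      # reply immediately, staging pending
        self._requests[token] = polls_needed
        return token

    def status_of_get_request(self, token):
        if self._requests[token] > 0:   # staging still in progress
            self._requests[token] -= 1
            return "SRM_REQUEST_INPROGRESS"
        return "SRM_SUCCESS"

server = SrmServer()
token = server.prepare_to_get("srm://storage.egrid.it:8334/old-stocks/NYSE.txt")
statuses = [server.status_of_get_request(token) for _ in range(3)]
```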
The SRM protocol • The cache area also introduces a partition of the file namespace: • Tape must store files: there have to be names that uniquely identify a file on tape! • The cache area must serve files. • It may return a path to fetch the file on disk that is different from the name that uniquely identifies the file on tape. • It can easily support different fetching mechanisms… that is, different transport protocols! • SRM reflects this distinction in the concepts of SURLs and TURLs: • SURL: Storage URL – a name that identifies a grid file in SRM storage: it is what the GRID sees! • srm://storage.egrid.it:8334/old-stocks/NYSE.txt • TURL: Transfer URL – a name that identifies a transport protocol and the path to fetch the file: it is how the GRID moves the file around! • gsiftp://storage.egrid.it:2110/home/ecorso/examples/2005/data.txt
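The SURL-to-TURL resolution can be sketched as a pure renaming step. The port, the local root, and the helper below are illustrative assumptions; only the SURL itself is the slide's example:

```python
# Sketch of the SURL/TURL distinction: the same grid name (SURL) is
# resolved to a transfer name (TURL) for whichever protocol the client
# asks for. The TURL path need not match the SURL path: here the storage
# system exposes files under an assumed local root directory.
from urllib.parse import urlparse

def surl_to_turl(surl, protocol, local_root):
    """Map a SURL onto a TURL for the requested transfer protocol."""
    parts = urlparse(surl)
    assert parts.scheme == "srm", "not a SURL"
    return f"{protocol}://{parts.hostname}:2811{local_root}{parts.path}"

turl = surl_to_turl("srm://storage.egrid.it:8334/old-stocks/NYSE.txt",
                    "gsiftp", "/storage/egrid")
```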
The SRM protocol Central point: • SRM introduces an abstraction to separate transfer protocol from the file operation itself. • Although introduced to handle the cache area, it also solves classic SE issues! • It decouples file operations from transfer protocol!
The SRM protocol Direct consequence: • SRM servers do not move files in and out of GRID storage! • They only return TURLs! • It is up to the SRM client, once it gets a TURL, to call a GridFTP/RFIO/etc client for moving files! • SRM acts only as a broker for file management requests! • Transfer is decoupled from data presentation!
The SRM protocol Extra features and concepts in the protocol: • The big issue of not running out of space during a large file transfer. • The system is used by the HEP community to store/manage huge amounts of data from the LHC. • SRM therefore introduced a space management and reservation interface.
The SRM protocol • It distinguishes three types of reserved disk space: • Volatile: will be freed by the system as soon as its lifetime expires. • Permanent: will not be freed by the system. • Durable: will not be freed, but the user that allocated it will be warned. • Space type and file type cannot be mixed in arbitrary ways: • Permanent space can host all three types of files. • Volatile space can only host Volatile files. • The general way of working: • A space request is made. • The server returns a SpaceToken. • All subsequent SRM calls made by the client pass on the token. • The SRM server keeps track of tokens and recognises the allocated space.
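The reserve-then-use flow can be sketched with a small accounting class; the class, the byte sizes, and the error strings are illustrative assumptions loosely modelled on srmReserveSpace and srmPrepareToPut:

```python
# Sketch of space reservation: reserving returns a SpaceToken, later
# writes quote the token, and the server charges them against the
# reservation instead of unreserved free space.
import uuid

class SpaceManager:
    def __init__(self, total):
        self.free = total               # unreserved bytes
        self.spaces = {}                # token -> remaining reserved bytes

    def reserve_space(self, size, space_type="permanent"):
        if size > self.free:
            raise RuntimeError("SRM_NO_FREE_SPACE")
        self.free -= size               # set the space aside up front
        token = str(uuid.uuid4())
        self.spaces[token] = size
        return token

    def put_file(self, token, size):
        if self.spaces[token] < size:
            raise RuntimeError("reserved space exhausted")
        self.spaces[token] -= size      # charge against the reservation

mgr = SpaceManager(total=10_000)
tok = mgr.reserve_space(4_000)
mgr.put_file(tok, 1_500)
```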
The SRM protocol The protocol calls: Data Transfer Functions • A misnomer… no data is moved by an SRM server. • srmPrepareToPut, srmPrepareToGet: for putting a file into GRID storage or getting one out. • srmStatusOfPutRequest, srmStatusOfGetRequest: for polling! • They work on SURLs!
The SRM protocol The protocol calls: Cache area management • srmExtendFileLifeTime for extending the lifetime of volatile files • srmRemoveFiles to remove permanent files • srmReleaseFiles, srmPutDone to force early lifetime expiry
The SRM protocol The protocol calls: Directory functions to manage files on tape • srmRmdir • srmMkdir • srmRm • srmLs • They work on SURLs!
The SRM protocol • The protocol calls: Space management functions • srmReserveSpace • srmReleaseSpace • srmGetSpaceMetaData • A Space Token is returned and used with all Data Transfer functions.
SRM applied to disk storage! • SRM addresses the issues of the classic SE: it is natural to use it for disk resources too. • There was also another important driving force for its adoption: • Many facilities were in place for LHC analysis of data coming from the experiments’ production centres. • These facilities had high-performance storage solutions in place, employing parallel disk file systems such as GPFS and Lustre. • With the advent of GRID technologies it became necessary to adapt the existing installations to the GRID.
SRM applied to disk storage! • The context of operation is now different: • No tape with a cache in between. • In general all concepts are kept, with slight semantic adjustments: • The SURL/TURL distinction is kept – it decouples the transfer protocol from data presentation, as stated. • The three file types are kept – some files may be copied and live just for a certain amount of time. • Space reservation is kept – it is an important functionality. • Directory functions are kept.
SRM applied to disk storage! Some compromises: • The asynchronous nature of srmPrepareToGet, srmPrepareToPut and srmCopy remains, although without staging latency it no longer makes much sense. • The SpaceType distinction makes less sense: • Arguably the whole disk can be seen as permanent space, and so allowed to host all three file types. • Akin to tapes, which are permanent by their nature. • Releasing of files and lifetime extension remain for volatile files; srmRemoveFiles for managing cache files no longer makes sense.
StoRM SRM implementation The result of a collaboration between: INFN – the Grid.IT Project from the Physics community + ICTP – the EGRID Project, to build a pilot national grid facility for research in Economics and Finance (www.egrid.it)
StoRM SRM implementation • StoRM’s implementation of SRM 2.1.1 is meant to meet three important requirements from the Physics community: • Large volumes of data straining disk resources: Space Reservation is paramount. • Boosted performance for data management: direct POSIX I/O calls. • Security on data as expressed by VOMS: strategic integration with VOMS proxies.
StoRM SRM implementation • EGRID requirements: • Data comes from Stock Exchanges: very strict, legally binding disclosure policies. POSIX-like ACL access from the GRID environment. • Promiscuous file access: the existing file organisation on disk must be seamlessly available from the grid, and files entering from the grid must blend seamlessly with the existing file organisation. Very challenging – probably only partly achievable! • StoRM: a disk based storage resource manager… allows for controlled access to files – a major opportunity for low level intervention during implementation.
StoRM SRM implementation • How StoRM solves POSIX-like ACL access from the GRID: • All file requests are brokered through the SRM protocol. • When StoRM receives an SRM request for a file: • StoRM asks the policy source for access rights: for the given SURL and the given grid credentials. • The check is made at the grid credential level, not at the local user level as before! And it is done on a grid view of the file, as identified by the SURL!
StoRM SRM implementation • The only part of the implementation outside of the protocol is the Policy Source: a GRID service that is able to formulate/express physical access rules to resources. • StoRM leverages the grid’s LogicalFileCatalogue (LFC) as policy source: the LFC is intended for Logical Names, so StoRM stretches its use. Still, it is very GRID-friendly: it is not a proprietary solution! • It would be better to have this explicitly in the SRM protocol: SRM 2.1.1 does have some Permission functions, but their expressive power is weak, and in the next version of the protocol they will be re-addressed (srmSetPermission, srmReassignToUser, srmCheckPermission).
StoRM SRM implementation • A last note: physical enforcement through JustInTime ACL setup. • All files start with no ACLs set up: no user can access them. • The local Unix account corresponding to the grid credentials is determined. • An ACL granting the requested access is set up for that local user. • The ACL is removed when the file is no longer needed.
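The grant-then-revoke cycle can be sketched as follows. A real deployment would manipulate filesystem ACLs (e.g. via setfacl); here they are modelled in memory, and the class, account name, and path are all illustrative:

```python
# Sketch of just-in-time ACL enforcement: files start with no ACL entries,
# an entry for the mapped local account is added while the SRM request is
# active, and removed once the file is released. In-memory stand-in for
# real filesystem ACL manipulation.
class JitAclStore:
    def __init__(self):
        self.acls = {}                  # path -> {account: permission string}

    def grant(self, path, account, perms):
        self.acls.setdefault(path, {})[account] = perms

    def revoke(self, path, account):
        self.acls.get(path, {}).pop(account, None)

    def allowed(self, path, account, perm):
        return perm in self.acls.get(path, {}).get(account, "")

store = JitAclStore()
store.grant("/storage/egrid/NYSE.txt", "egrid001", "r")   # during the request
ok = store.allowed("/storage/egrid/NYSE.txt", "egrid001", "r")
store.revoke("/storage/egrid/NYSE.txt", "egrid001")       # after release
```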
Advanced topic on data management Thank-you!