Storage and Data Grid Middleware 6 David Groep, lecture series 2005-2006
Grid Middleware VI 2 Outline • Data management concepts • metadata, logical filename, SURL, TURL, object store • Protocols • GridFTP, SRM • RFT/FTS, FPS & scheduled transfers with GT4 (LIGO) • End-to-end integrated systems • SRB • Structured data and databases • OGSA-DAI • Data curation issues • media migration • content conversion (emulation or translation?)
Grid Middleware VI 3 Grid data management • Data in a grid need to be • located • replicated • life-time managed • accessed (sequentially and at random) • and the user does not know where the data is
Grid Middleware VI 4 Types of storage • ‘File oriented’ storage • cannot support content-based queries • needs annotation & metadata to be useful (note that a file system and name is a ‘type of meta-data’) • most implementations can handle any-sized object (but MSS tape systems cannot handle very small files) • Databases • structured data representation • supports content queries well via indexed searches • good for small data objects (with BLOBs of MBytes, not GBytes)
Grid storage structure For file oriented storage
Grid Middleware VI 6 File storage layers (file system analogy) • Separating the storage concepts • helps both interoperation and scalability • Semantic view • description of data in words and phrases • Meta-data view • describe data by attribute-value pairs (filename is also an A-V pair) • as in filesystems such as HPFS, EXT2+, AppleFS with ‘extended attributes’ • Object view • refers to a blob of data by a meaningless handle (unique ID) • e.g. in typical Unix FS’s: inode • FAT: directory entry + alloc table (mixes filename and object view) • Physical view • block devices: series of blocks on a disk, or a specific tape & offset
Grid Middleware VI 7 Storage layers (grid naming terminology) • LFN (Logical File Name) – level 2 • like the filename in the traditional file system • may have hierarchical structure • is not directly suitable for access, as it is site independent • GUID (Globally Unique ID) – level 3 • opaque handle to reference a specific data object • still independent of the site • GUID-LFN mapping is 1-n • SURL (Storage URL, or physical file name PFN) – level 3 • SE specific reference to a file • understood by the storage management interface • GUID-SURL mapping is 1-n • TURL (Transfer URL) – ‘griddy level 4’ • current physical location of a file inside a specific SE • is transient (i.e. only exists after being returned by the SE management interface) • has a specific lifetime • SURL-TURL mapping is 1-(small number, typically 1) terminology from EDG, gLite and Globus
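The naming chain on this slide (LFN to GUID to SURL to TURL) can be sketched as a pair of lookup tables plus a request to the storage element. This is a minimal illustration, not any real catalogue or SRM client: the class `Catalogue`, the function `request_turl`, the host names, the `pool1` directory and the default lifetime are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Catalogue:
    """Toy in-memory catalogue holding LFN -> GUID and GUID -> SURL mappings."""
    lfn_to_guid: dict = field(default_factory=dict)    # several LFNs may alias one GUID
    guid_to_surls: dict = field(default_factory=dict)  # one GUID, n replicas (SURLs)

    def register(self, lfn, guid, surl):
        """Record that `lfn` names object `guid`, which has a replica at `surl`."""
        self.lfn_to_guid[lfn] = guid
        self.guid_to_surls.setdefault(guid, []).append(surl)

    def replicas(self, lfn):
        """Resolve a site-independent LFN to all site-specific SURLs."""
        return list(self.guid_to_surls[self.lfn_to_guid[lfn]])


def request_turl(surl, protocol="gsiftp", lifetime_s=3600):
    """Stand-in for the SE management interface: turn a SURL into a transient TURL.

    The 'pool1' path component is a made-up example of an SE-internal location.
    """
    host, path = surl.split("://", 1)[1].split("/", 1)
    return {"turl": f"{protocol}://{host}/pool1/{path}", "lifetime_s": lifetime_s}


catalogue = Catalogue()
catalogue.register("/grid/vo/run1.dat", "guid-0001", "srm://se1.example.org/vo/run1.dat")
catalogue.register("/grid/vo/run1.dat", "guid-0001", "srm://se2.example.org/data/0001")
first_replica = catalogue.replicas("/grid/vo/run1.dat")[0]
print(request_turl(first_replica))
```

The point of the sketch is the direction of the mappings: only the last step touches a specific SE, and its result carries a lifetime because the TURL is transient.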
Grid Middleware VI 8 Data Management Services Overview
Grid Middleware VI 9 Storage concepts using the OSG-EDG-gLite terminology … • Storage Element • management interface • transfer interface(s) • Catalogues • File Catalogue (meta-data catalogues) • Replica Catalogue (location services & indices) • Transfer Service • File Placement • Data Scheduler
Grid Middleware VI 10 Grid Storage Concepts: Storage Element • Storage Element • responsible for manipulating files, on anything from disk to tape-backed mass storage • contains services up to the filename level • the filename is typically an opaque handle for files, • as a higher-level file catalogue serves the meta-data, and • the same physical file will be replicated to several SEs with different local file names • SE is a site function (not a VO function) • Capabilities • Storage space for files • Storage Management interface (staging, pinning) • Space management (reservation) • Access (read/write, e.g. via GridFTP, HTTP(S), POSIX (like)) • File Transfer Service (controlling influx of data from other SEs)
Grid Middleware VI 11 Storage Element: grid transfer services Possibilities • GridFTP • de-facto standard protocol • supports GSI security • features: striping & parallel transfers, third-party transfers (TPTs, like regular FTP) part of protocol • issue: firewalls don’t ‘like’ the open port ranges needed by FTP (neither active nor passive) • HTTPS • single port, so more firewall-friendly • implementation of GSI and delegation required (mod_gridsite) • TPTs not part of protocol • …
Grid Middleware VI 12 GridFTP • ‘secure, robust, fast, efficient, standards based, widely accepted’ data transfer protocol • Protocol based • Multiple independent implementations can interoperate • Globus Toolkit supplies reference implementation • Server, Client tools (globus-url-copy), Development Libraries
Grid Middleware VI 13 GridFTP: The Protocol • FTP protocol is defined by several IETF RFCs • Start with most commonly used subset • Standard FTP: get/put etc., 3rd-party transfer • Implement standard but often unused features • GSS binding, extended directory listing, simple restart • Extend in various ways, while preserving interoperability with existing servers • Striped/parallel data channels, partial file, automatic & manual TCP buffer setting, progress monitoring, extended restart source: Bill Allcock, ANL, Overview of GT4 Data Services, 2004
Grid Middleware VI 14 GridFTP: The Protocol (cont) • Existing standards • RFC 959: File Transfer Protocol • RFC 2228: FTP Security Extensions • RFC 2389: Feature Negotiation for the File Transfer Protocol • Draft: FTP Extensions • GridFTP: Protocol Extensions to FTP for the Grid • Grid Forum Recommendation • GFD.20 • http://www.ggf.org/documents/GWD-R/GFD-R.020.pdf source: Bill Allcock, ANL, Overview of GT4 Data Services, 2004
Grid Middleware VI 15 Striped Server Mode • Multiple nodes work together *on a single file* and act as a single GridFTP server • An underlying parallel file system allows all nodes to see the same file system and must deliver good performance (usually the limiting factor in transfer speed) • I.e., NFS does not cut it • Each node then moves (reads or writes) only the pieces of the file that it is responsible for. • This allows multiple levels of parallelism: CPU, bus, NIC, disk, etc. • Critical if you want to achieve better than 1 Gb/s without breaking the bank source: Bill Allcock, ANL, Overview of GT4 Data Services, 2004
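To make "each node moves only the pieces of the file that it is responsible for" concrete, the sketch below divides one file into fixed-size blocks and deals them out to the nodes in turn. The round-robin assignment and the function name `stripe_layout` are assumptions chosen for illustration; the slide does not say how a striped GridFTP server actually partitions a file.

```python
def stripe_layout(file_size, block_size, n_nodes):
    """Assign (offset, length) byte ranges of one file to nodes, round-robin by block.

    Assumed layout for illustration only; real servers may partition differently.
    """
    if block_size <= 0 or n_nodes <= 0:
        raise ValueError("block_size and n_nodes must be positive")
    layout = {node: [] for node in range(n_nodes)}
    offset = 0
    block_index = 0
    while offset < file_size:
        # The final block may be shorter than block_size.
        length = min(block_size, file_size - offset)
        layout[block_index % n_nodes].append((offset, length))
        offset += length
        block_index += 1
    return layout


# A 10-byte file in 4-byte blocks over two nodes.
print(stripe_layout(10, 4, 2))
```

Because every node reads or writes disjoint byte ranges of the same file, all nodes need to see the same file system, which is why the slide insists on a parallel file system underneath.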
Grid Middleware VI 16 source: Bill Allcock, ANL, Overview of GT4 Data Services, 2004
Grid Middleware VI 17 Disk to Disk Striping Performance source: Bill Allcock, ANL, Overview of GT4 Data Services, 2004
Grid Middleware VI 18 GridFTP: Caveats • Protocol requires that the sending side do the TCP connect (possible Firewall issues) • Working on V2 of the protocol • Add explicit negotiation of streams to relax the directionality requirement above(*) • Optionally adds block checksums and resends • Add a unique command ID to allow pipelining of commands • Client / Server • Currently, no server library, therefore Peer to Peer type apps VERY difficult • Generally needs a pre-installed server • Looking at a “dynamically installable” server (*)DG: like a kind of application-level BEEP protocol source: Bill Allcock, ANL, Overview of GT4 Data Services, 2004
Grid Middleware VI 19 SE transfers: random access wide-area R/A for files is new; typically addressed by adding GSI to existing cluster protocols • dcap -> GSI-dcap • rfio -> GSI-RFIO • xrootd -> ?? One (new) OGSA-style service • WS-ByteIO • Bulk interface • RandomIO interface • posix-like • needs negotiation of actual transfer protocol • attachment, DIME, …
Grid Middleware VI 20 SE transfer: local back-end access backend of a grid store is not always just a disk • distributed storage systems without native posix • even if posix emulation is provided, that is always slower! • for grid use, need to also provide GridFTP • and a management interface: SRM • local access might be through the native protocol • but the application may not know • and it is usually not secure enough to run over WAN • so no use for ‘non-LAN’ use by others in the grid
Grid Middleware VI 21 Storage Management (SRM) • common management interface on top of many backend storage solutions • a GGF draft standard (from the GSM-WG)
Grid Middleware VI 22 Standards for Storage Resource Management • Main concepts • Allocate spaces • Get/put files from/into spaces • Pin files for a lifetime • Release files and spaces • Get files into spaces from remote sites • Manage directory structures in spaces • SRMs communicate with other SRMs as peers • Negotiate transfer protocols • No logical name space management (can come from GGF-GFS) source: A. Sim, CRD, LBNL 2005
Grid Middleware VI 23 SRM Functional Concepts • Manage Spaces dynamically • Reservation, allocation, lifetime • Release, compact • Negotiation • Manage files in spaces • Request to put files in spaces • Request to get files from spaces • Lifetime, pinning of files, release of files • No logical name space management (rely on GFS) • Access remote sites for files • Bring files from other sites and SRMs as requested • Use existing transport services (GridFTP, http, https, ftp, bbftp, …) • Transfer protocol negotiation • Manage multi-file requests • Manage request queues • Manage caches, pre-caching (staging) when possible • Manage garbage collection • Directory Management • Manage directory structure in spaces • Unix semantics: srmLs, srmMkdir, srmMv, srmRm, srmRmdir • Possible Grid access to/from MSS • HPSS, MSS, Enstore, JasMINE, Castor source: A. Sim, CRD, LBNL 2005
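The interplay of space, pin lifetime, release and garbage collection listed above can be illustrated with a toy in-memory model. It is a sketch under stated assumptions, not the SRM interface: the class `ToySRM` and its methods `put`, `release` and `collect_garbage` are hypothetical names, and time is passed in explicitly as a number so the behaviour is easy to follow.

```python
class ToySRM:
    """In-memory stand-in for one SRM-managed space (hypothetical, for illustration)."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.files = {}  # name -> (size_bytes, pin_expiry_time)

    def used(self):
        return sum(size for size, _ in self.files.values())

    def put(self, name, size, now, pin_lifetime):
        """Store a file pinned until now + pin_lifetime, evicting unpinned files if needed."""
        if self.used() + size > self.capacity:
            self.collect_garbage(now)
        if self.used() + size > self.capacity:
            raise RuntimeError("space full: remaining files are still pinned")
        self.files[name] = (size, now + pin_lifetime)

    def release(self, name, now):
        """Drop the pin early so the file becomes eligible for garbage collection."""
        size, _ = self.files[name]
        self.files[name] = (size, now)

    def collect_garbage(self, now):
        """Remove every file whose pin has expired and return their names."""
        expired = [n for n, (_, expiry) in self.files.items() if expiry <= now]
        for n in expired:
            del self.files[n]
        return expired


space = ToySRM(capacity_bytes=100)
space.put("a", 60, now=0, pin_lifetime=10)
space.release("a", now=2)                   # the client is done with "a"
space.put("b", 60, now=3, pin_lifetime=10)  # "a" is evicted to make room
print(sorted(space.files))
```

A pinned file is protected from eviction until its lifetime runs out or the client releases it; everything else is fair game for the garbage collector when space is needed.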
Grid Middleware VI 24 SRM Methods by the features • Space management: srmCompactSpace, srmGetSpaceMetaData, srmGetSpaceToken, srmReleaseFilesFromSpace, srmReleaseSpace, srmReserveSpace, srmUpdateSpace • Authorization Functions: srmCheckPermission, srmGetStatusOfReassignment, srmReassignToUser, srmSetPermission • Request Administration: srmAbortRequestedFiles, srmRemoveRequestedFiles, srmResumeRequest, srmSuspendRequest • Core (Basic): srmChangeFileStorageType, srmExtendFileLifetime, srmGetFeatures, srmGetRequestSummary, srmGetRequestToken, srmGetSRMStorageInfo, srmGetSURLMetaData, srmGetTransferProtocols, srmPrepareToGet, srmPrepareToPut, srmPutFileDone, srmPutRequestDone, srmReleaseFiles, srmStatusOfGetRequest, srmStatusOfPutRequest, srmTerminateRequest • Copy Function: srmCopy, srmStatusOfCopyRequest • Directory Function: srmCp, srmLs, srmMkdir, srmMv, srmRm, srmRmdir, srmStatusOfCpRequest, srmStatusOfLsRequest source: A. Sim, CRD, LBNL 2005
Grid Middleware VI 25 SRM interactions
Grid Middleware VI 26 SRM Interactions
Grid Middleware VI 27 SRM Interactions
Grid Middleware VI 28 SRM Interactions
Grid Middleware VI 29 SRM Interactions
Grid Middleware VI 30 SRM Interactions
Grid Middleware VI 31 Storage infra example with SRM graphic: Mark van de Sanden, SARA
Grid Middleware VI 32 SRM Summary • SRM is a functional definition • Adaptable to different frameworks for operation (WS, WSRF, …) • Multiple implementations interoperate • Permits special purpose implementations for unique products • Permits replacing one SRM product with another • SRM implementations exist and some are in production use • Particle Physics Data Grid • Earth System Grid • More coming … • Cumulative experiences • SRM v3.0 specifications to complete source: A. Sim, CRD, LBNL 2005
Grid Middleware VI 33 Replicating Data • Data on the grid may, will and should exist in multiple copies • Replicas may be temporary • for the duration of the job • opportunistically stored on cheap but unreliable storage • contain output cached near a compute site for later scheduled replication • Replicas may also provide redundancy • application level instead of site-local RAID or backup
Grid Middleware VI 34 Replication issues • Replicas are difficult to manage • if the data is modifiable • and consistency is required • Grid DM today does not address modifiable data sets as soon as more than one copy of the data exists • otherwise the result would be inconsistency • or require close coordination between storage locations (slow) • or almost guarantee a deadlock • Some wide-area distributed file systems do this (AFS, DFS) • but are not scalable • or require a highly available network
Grid Middleware VI 35 Grid Storage concepts: Catalogues • Catalogues • index of files that link to a single object (referenced by GUID) • Catalogues are logically a VO function, with local instances per site • Capabilities • expose mappings, not actual data • File or Meta-data Catalogue: names, metadata -> GUID • Replica Catalogue and Index: GUID -> SURLs for all SEs containing the file
Grid Middleware VI 36 File Catalogues
Grid Middleware VI 37 graphic: Peter Kunszt, EGEE DJRA1.4 gLite Architecture
Grid Middleware VI 38 Alternatives to the File Catalogue • Store SURLs with data in application DB schema • better adapted to the application needs • easier integration in existing frameworks
Grid Middleware VI 39 Grid Storage Concepts: Transfer Service • Transfer service • responsible for moving (replicating) data between SEs • transfers are scheduled, as data movement capacity is scarce (not because of WAN network bandwidth, but because of CPU capacity and disk/tape bandwidth in data movement nodes!) • logically a per VO function, hosted at the site • builds on top of the SE abstraction and a data movement protocol and is co-ordinated with a specific SE • Capabilities • transfer SURL at SE1 to new SURL at SE2 • using SE mechanisms such as SRM-COPY, or directly GridFTP • either push or pull • subject to a set of policies, e.g. • max. number of simultaneous transfers between SE1 and SE2 • with specific timeout or #retries • asynchronous • states like: SUBMITTED, PENDING, ACTIVE, CANCELLING, CANCELLED, DONE_INCOMPLETE, DONE_COMPLETE • update replica catalogues (GUID->SURL mappings)
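A scheduled, policy-limited transfer queue for one SE1 to SE2 pair might look like the sketch below. The state names are taken from the slide, but the transitions between them, the use of DONE_INCOMPLETE for a transfer that has exhausted its retries, and the class `ChannelQueue` with its methods are assumptions made for illustration; the slide does not define them.

```python
from collections import deque


class ChannelQueue:
    """Toy scheduler for one SE1 -> SE2 pair with a cap on simultaneous transfers."""

    def __init__(self, max_active=2, max_retries=1):
        self.max_active = max_active    # policy: simultaneous transfers allowed
        self.max_retries = max_retries  # policy: retries after a failure
        self.pending = deque()
        self.jobs = {}     # job id -> (source SURL, destination SURL)
        self.state = {}    # job id -> state name
        self.retries = {}  # job id -> failed attempts so far

    def submit(self, src_surl, dst_surl):
        """Accept a request and queue it; submission is asynchronous."""
        job_id = len(self.jobs) + 1
        self.jobs[job_id] = (src_surl, dst_surl)
        self.retries[job_id] = 0
        self.state[job_id] = "PENDING"
        self.pending.append(job_id)
        return job_id

    def active(self):
        return [j for j, s in self.state.items() if s == "ACTIVE"]

    def schedule(self):
        """Start pending jobs until the policy limit is reached; return those started."""
        started = []
        while self.pending and len(self.active()) < self.max_active:
            job_id = self.pending.popleft()
            self.state[job_id] = "ACTIVE"
            started.append(job_id)
        return started

    def finish(self, job_id, ok):
        """Record the outcome of an ACTIVE transfer, retrying failures within policy."""
        if self.state[job_id] != "ACTIVE":
            raise ValueError("job is not active")
        if ok:
            self.state[job_id] = "DONE_COMPLETE"
        elif self.retries[job_id] < self.max_retries:
            self.retries[job_id] += 1
            self.state[job_id] = "PENDING"
            self.pending.append(job_id)
        else:
            self.state[job_id] = "DONE_INCOMPLETE"


queue = ChannelQueue(max_active=2, max_retries=1)
for n in range(3):
    queue.submit(f"srm://se1.example.org/f{n}", f"srm://se2.example.org/f{n}")
print(queue.schedule(), queue.state)
```

The cap on active jobs models the scarce resource named on the slide: the capacity of the data movement nodes, not the WAN link itself.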
Grid Middleware VI 40 File Transfer Service graphic: gLite Architecture v1.0 (EGEE-I DJRA1.1)
Grid Middleware VI 41 FTS ‘Channels’ • Scheduled number of transfers from one site to a (set of) other sites • below: CERNCI to sites on the OPN (next slide)
Grid Middleware VI 42 FTS channels • for scaling reasons • one transfer agent for each channel, i.e. each SRC<->TGT pair • agents can be spread over multiple boxes
Grid Middleware VI 43 LHC OPN
Grid Middleware VI 44 in network terms • Cricket graph 2006 CERN->SARA via OPN • link speed is 10 Gb/s
Grid Middleware VI 45 FTS complex services • Protocol translation • although many will, not all SEs support GridFTP • FTS in that case needs protocol translation • translation through memory excludes third-party transfers Other Issues • credential handling • files on the source and target SE are readable for specific users and specific VO (groups) • SEs are site services, and sites want to be accessed with the end-user credential for traceability (not a generic “VO” account) • continued access to the user credential needed (like in any compute broker)
Grid Middleware VI 46 Grid Storage Concept: File Placement • Placement Service • manage transfers for which the host site is the destination • coordinate updates of the VO file catalogue and the actual transfers (via the FTS, a site-managed service) • Capabilities • transfer GUID or LFN from A to B (note: the FTS could only operate on SURLs) • needs access to the VO catalogues, and thus needs sufficient privileges to do the job (i.e. update the catalogues) • API can be the same as for the FTS
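The difference between the placement service and the transfer service can be sketched as a thin wrapper: resolve the logical name to a SURL, hand SURLs to the transfer layer, then record the new replica in the catalogue. Everything here is hypothetical: the function `place`, the plain-dictionary catalogues, the `transfer` callback standing in for the FTS, and the choice to name the destination file after the GUID.

```python
def place(lfn, dest_se, lfn_to_guid, guid_to_surls, transfer):
    """Replicate the file named by `lfn` to `dest_se` and record the new replica.

    `transfer(src_surl, dst_surl)` stands in for the transfer service and
    returns True on success. Naming the destination after the GUID is an
    arbitrary choice for this sketch.
    """
    guid = lfn_to_guid[lfn]
    replicas = guid_to_surls[guid]
    prefix = f"srm://{dest_se}/"
    for surl in replicas:
        if surl.startswith(prefix):
            return surl  # a replica already exists at the destination
    source = replicas[0]
    dest = f"{prefix}{guid}"
    if not transfer(source, dest):
        raise RuntimeError(f"transfer of {source} to {dest} failed")
    # Only update the catalogue once the data has actually arrived.
    replicas.append(dest)
    return dest


lfn_to_guid = {"/grid/vo/run1.dat": "guid-0001"}
guid_to_surls = {"guid-0001": ["srm://se1.example.org/vo/run1.dat"]}
new_surl = place("/grid/vo/run1.dat", "se2.example.org",
                 lfn_to_guid, guid_to_surls, transfer=lambda src, dst: True)
print(new_surl, guid_to_surls)
```

Writing to the catalogue is what requires the extra privileges mentioned on the slide; the transfer callback itself never sees an LFN or GUID.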
Grid Middleware VI 47 Data Scheduler • Like the placement service, but can direct requests to different sites
Grid Middleware VI 48 DM: Putting it all together graphic: gLite Architecture v1.0 (EGEE-I DJRA1.1)
Grid Middleware VI 49 GT4 view on the same issues • Similar functionality but more closely linked to the VO than the site • based on soft-state registrations (like the information system) • treats files as the basic resource abstraction next two slides: Ann Chervenak, ISI/USC: Overview of GT4 Data Management Services, 2004
RLS Framework • Local Replica Catalogs (LRCs) contain consistent information about logical-to-target mappings • Replica Location Index (RLI) nodes aggregate information about one or more LRCs • LRCs use soft state update mechanisms to inform RLIs about their state: relaxed consistency of index • Optional compression of state updates reduces communication, CPU and storage overheads • Membership service registers participating LRCs and RLIs and deals with changes in membership (figure: two Replica Location Indexes (RLIs) above five Local Replica Catalogs (LRCs))
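The soft-state relationship between LRCs and an RLI can be sketched as periodic updates that the index forgets after a timeout. This is an illustration only, not the Globus RLS API: the classes `LocalReplicaCatalog` and `ReplicaLocationIndex`, their methods and the timeout value are hypothetical, and the optional compression of updates is left out.

```python
class ReplicaLocationIndex:
    """Toy RLI: remembers which LRCs know a logical name and forgets stale entries."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.entries = {}  # (logical name, LRC id) -> time of last update

    def update(self, lrc_id, logical_names, now):
        """Refresh the soft state for every name the LRC currently holds."""
        for name in logical_names:
            self.entries[(name, lrc_id)] = now

    def lookup(self, logical_name, now):
        """Return the LRCs whose registration for this name has not expired."""
        return sorted(
            lrc for (name, lrc), seen in self.entries.items()
            if name == logical_name and now - seen <= self.timeout
        )


class LocalReplicaCatalog:
    """Toy LRC: holds the authoritative logical-to-target mappings for one site."""

    def __init__(self, lrc_id):
        self.lrc_id = lrc_id
        self.mappings = {}  # logical name -> set of target names

    def add(self, logical_name, target):
        self.mappings.setdefault(logical_name, set()).add(target)

    def send_update(self, rli, now):
        """Tell the index which logical names this catalog can resolve."""
        rli.update(self.lrc_id, self.mappings.keys(), now)


rli = ReplicaLocationIndex(timeout=30)
lrc_a, lrc_b = LocalReplicaCatalog("a"), LocalReplicaCatalog("b")
lrc_a.add("lfn1", "gsiftp://se1.example.org/f1")
lrc_b.add("lfn1", "gsiftp://se2.example.org/f1")
lrc_a.send_update(rli, now=0)
lrc_b.send_update(rli, now=0)
lrc_a.send_update(rli, now=25)   # only "a" keeps refreshing
print(rli.lookup("lfn1", now=40))
```

The index only says which catalog to ask; the LRC remains the consistent source for the actual mappings, which is what "relaxed consistency of index" allows.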