SA1 – Data Grid Interoperation
Enabling Grids for E-sciencE
Grid Data Interoperation (Part II): Data, Metadata, Catalogues

Three interoperation modes for data transfers:
• FTS
• SRM drives the transfer via srmCopy() (not shown)
• lcg-utils

Data Mode 1: Pretend SRB is a “Classic SE”
• The Classic SE is (still) supported by gLite FTS.
• Add a static information provider using BDII.
• Can also be used to move data to/from disks with GridFTP.
[Figure: FTS data interoperation. SRM storage elements (DPM, dCache, StoRM, CASTOR, …; the SRM selects the pool node) and SRB exchange data over GridFTP, each backed by disk storage.]

Data Mode 3: Using lcg-utils
• SRB SURLs are GSIFTP URLs; SRB TURLs are the same as the SURLs.
• lcg-* tools do not accept GSIFTP SURLs.
• Strategy: always clone the file to an SRM, then register the clone in the LFC. (Fallback: register the GridFTP SURL in another catalogue, hack the LFC, or use AMGA to keep track of replicas.)
[Figure: lcg-* tools in the user domain consult the BDII and the catalogue domain (LFC, the LCG File Catalogue: LFN → GUID → SURL), then reach the storage elements (SRM or SRB, on disk storage) via SURL/TURL over GridFTP.]

Metadata
• Two approaches to file metadata management (primary key):
  • Use the GUID as the filename (shallow hierarchy), e.g. /grid/isis/guid/c4756f6e-7963-47ad-ac8a-59726afa4992
  • Use the original filename (or an algorithmically derived name), e.g. /grid/isis/NDXINTER/Instrument/data/cycle_08_5/INTER00000544.raw
The former makes sense on the grid: register meaningful LFNs that point to the GUID in the LFC. The latter does not depend on the LFC or on replication.
Metadata is associated with the primary key. Avoid metadata in filenames!

ISIS Metadata
ISIS (the neutron source at RAL) holds its metadata in a separate catalogue, the iCAT (not to be confused with the iRODS catalogue). The format is XML; iCAT uses Oracle.
[Figure: data mover]

Proposal
• FTS still supports the “Classic SE”.
• The ASGC SRM interface to SRB will become the preferred route. It doesn't move metadata, though.
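The naming and cloning strategy above can be sketched in Python. This is a minimal, hypothetical model and not the LFC or lcg-utils API: the Catalogue class, the copy_to_srm callback and the example.org hosts are assumptions made for illustration. It clones a file with a GSIFTP SURL to an SRM, registers the clone under a GUID-based LFN, adds the meaningful filename as an alias LFN, and keys metadata on the GUID rather than on any filename.

```python
import uuid
from urllib.parse import urlparse


class Catalogue:
    """Hypothetical in-memory stand-in for an LFC-style catalogue (not the LFC API).

    LFNs point at a GUID (the primary key); each GUID has replica SURLs
    and its own metadata.
    """

    def __init__(self):
        self.lfn_to_guid = {}
        self.replicas = {}
        self.metadata = {}

    def register(self, lfn, surl, guid=None):
        """Record a replica under an LFN, reusing or creating its GUID."""
        guid = self.lfn_to_guid.get(lfn, guid) or str(uuid.uuid4())
        self.lfn_to_guid[lfn] = guid
        self.replicas.setdefault(guid, []).append(surl)
        return guid

    def alias(self, lfn, guid):
        """Point an additional, meaningful LFN at an existing GUID."""
        if guid not in self.replicas:
            raise KeyError(f"unknown GUID {guid}")
        self.lfn_to_guid[lfn] = guid

    def annotate(self, guid, **attrs):
        """Attach metadata to the primary key (GUID), not to a filename."""
        self.metadata.setdefault(guid, {}).update(attrs)

    def lookup(self, lfn):
        """Return (replica SURLs, metadata) for an LFN."""
        guid = self.lfn_to_guid[lfn]
        return self.replicas[guid], self.metadata.get(guid, {})


def guid_lfn(guid, root="/grid/isis"):
    """Approach 1: use the GUID as the filename (shallow hierarchy)."""
    return f"{root}/guid/{guid}"


def usable_by_lcg_tools(surl):
    # lcg-* tools reject GSIFTP SURLs; this sketch assumes srm:// SURLs are accepted.
    return urlparse(surl).scheme == "srm"


def clone_and_register(catalogue, surl, meaningful_lfn, copy_to_srm):
    """Clone a file to an SRM if needed, then register the clone.

    copy_to_srm is a hypothetical caller-supplied transfer function
    (for example one driving FTS) that returns the SURL of the copy.
    """
    if not usable_by_lcg_tools(surl):
        surl = copy_to_srm(surl)
    guid = str(uuid.uuid4())
    catalogue.register(guid_lfn(guid), surl, guid=guid)
    catalogue.alias(meaningful_lfn, guid)
    return guid


if __name__ == "__main__":
    def fake_copy_to_srm(surl):
        # Hypothetical: pretend the copy lands at the same path on an SRM endpoint.
        return "srm://srm.example.org" + urlparse(surl).path

    cat = Catalogue()
    lfn = "/grid/isis/NDXINTER/Instrument/data/cycle_08_5/INTER00000544.raw"
    guid = clone_and_register(
        cat, "gsiftp://srb.example.org/isis/INTER00000544.raw", lfn, fake_copy_to_srm
    )
    cat.annotate(guid, owner="example-user")  # illustrative value only
    print(cat.lookup(lfn))
    # (['srm://srm.example.org/isis/INTER00000544.raw'], {'owner': 'example-user'})
    print(cat.lookup(guid_lfn(guid)) == cat.lookup(lfn))  # True
```

Because both LFNs resolve to the same GUID, a user can browse by instrument and cycle while replicas and metadata stay attached to the primary key. The fallbacks listed above (another catalogue, a modified LFC, or AMGA) would replace the Catalogue stand-in, not the overall flow.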
[Figure: iCAT hierarchy: Instrument/experiment → Dataset → File(s), with Parameters]
• Dataset attributes: date, owner, run title, status, keywords, location
• Datasets are currently managed with a separate dataset sequence attribute:
  • Once one file is found, the rest of the dataset is located
  • Mirrors the original use, where metadata is kept with a single file

Metadata migration
• iCAT metadata is hierarchical: it is associated with an individual file or with a dataset.
• Current support is simplistic: the metadata is dumped with each file in the dataset (this works in the current, limited scenarios).
• Data is held in SRB; SRB's metadata facility is hardly used.

Experiences
• Needs glue to make the pieces work together.
• Can perhaps improve on the original use.
• Had to custom-build a metadata schema on the gLite side.
• Custom-built metadata copier between SRB metadata (key/value pairs) and the iCAT schema.
• Only basic attributes so far (datasets, instrument, owner).

TODO
• Other SRB users: eMinerals, eMaterials, RMCS
• Work on integration with job submission, maybe a portal
• Track work on datasets: provenance
• Improve metadata support

Data mover TODO
• Improve
• Modularise metadata porting
• Generalise?

iCAT on Google Code: http://code.google.com/p/icatproject/

Authors: Jens Jensen, STFC (corresponding); Sam Skipsey, University of Glasgow; Chris Moreton-Smith, ISIS, STFC

References
• S. Burke et al.: gLite User Guide, CERN EDMS 722398
• J. Jensen, R. Downing, M. Hodges, D. Ross: SRM and SRB interoperation
• F. Bonifazi et al.: LHCb experience with LFC replication, Proc. CHEP 2007
• M. Gleaves: ICAT software suite

Special thanks to Michael Gleaves and Brian Matthews, STFC, for iCAT discussions, and to Birger Koblitz, CERN, for AMGA support and suggestions.

EGEE-III INFSO-RI-222667
http://www.ngs.ac.uk/ http://www.isis.rl.ac.uk/ http://www.gridpp.ac.uk/