Data Management Ron Trompert SARA Grid Tutorial, 18-19 September 2006
Outline • Storage Infrastructures • SRM • Storage Elements in gLite • Low Level Data Management • LCG File Catalog (LFC) • Datamanagement CLIs and APIs • Examples • FTS Grid Tutorial, RC RUG, 18-19 September 2006
Storage Infrastructures
• Disk-only
• Hierarchical storage management (HSM)
  • Policy-based management of file backup and archiving that uses storage devices economically, without the user needing to be aware of when files are retrieved from or stored on backup storage media.
  • The hierarchy represents different types of storage media, such as disk systems, optical storage, or tape, each with a different cost and retrieval speed. For example, as a file ages in an archive, it can be automatically moved to a slower but less expensive form of storage.
• HSM software: TSM, DMF, CASTOR, Enstore, HPSS, …
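The age-based migration described above can be sketched as a small policy function. This is only an illustration: the tier names and the 30-day and 365-day thresholds are assumptions, not the policy of any of the HSM products listed.

```python
# Hypothetical tiers and age thresholds, for illustration only;
# real HSM products apply their own, site-specific policies.
TIER_LIMITS = [
    (30, "disk"),      # files up to 30 days old stay on disk
    (365, "optical"),  # files up to a year old go to optical storage
]
OLDEST_TIER = "tape"   # everything older ends up on tape


def choose_tier(age_days: int) -> str:
    """Return the storage tier for a file of the given age in days."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    for max_age_days, tier in TIER_LIMITS:
        if age_days <= max_age_days:
            return tier
    return OLDEST_TIER
```

The point of the sketch is that the user never calls such a function: the HSM applies the policy in the background and recalls the file when it is accessed.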
Storage Infrastructures • HSM example at SARA
SRM • SRM standard • SRM implementations provide uniform access to heterogeneous storage resources on the Grid • Storage Resource Managers • SRM is a control protocol for: • Space reservation • File management • Pinning • Lifetime management • Replication • Protocol negotiation Grid Tutorial, RC RUG, 18-19 September 2006
SRM • SRM implementation • SRM I/F is implemented as a web service • Implementations: • dCache (disk/HSM) • DPM (disk) • CASTOR (HSM) • SRB (disk/HSM) • …. • SRM Examples • srmRm • srmLs • srmPrepareToPut • srmBringOnline • srmCopy • srmGetTransferProtocols • …. Grid Tutorial, RC RUG, 18-19 September 2006
Storage Elements in gLite • Classic SE • No SRM • Will become deprecated in the autumn of this year • Transfer protocols: gridftp • Storage type: disk • DPM • SRM • Transfer protocols: gridftp, secure rfio • Storage type: disk • dCache • SRM • Transfer protocols: gridftp, gsidcap • Storage type: disk, HSM Grid Tutorial, RC RUG, 18-19 September 2006
Low Level Data Management
• GridFTP (all SEs)
    globus-url-copy file:///home/ron/file \
        gsiftp://srm.grid.sara.nl/pnfs/grid.sara.nl/data/dteam/file
  • Third-party transfer
    globus-url-copy gsiftp://hostA/pathA gsiftp://hostB/pathB
  • Also edg-gridftp-ls, edg-gridftp-rm, edg-gridftp-mkdir, etc.
• UberFTP
  • Interactive GridFTP client
  • ftp commands
  • GSI authentication
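As a rough sketch, the two globus-url-copy invocations above can be assembled from Python before being handed to subprocess. The helper name gridftp_copy_command is hypothetical, and the check only accepts the two URL schemes that appear on the slide.

```python
import shlex

ALLOWED_SCHEMES = ("file://", "gsiftp://")  # only the schemes shown on the slide


def gridftp_copy_command(source_url: str, dest_url: str) -> list:
    """Build the argument list for a globus-url-copy call (hypothetical helper)."""
    for url in (source_url, dest_url):
        if not url.startswith(ALLOWED_SCHEMES):
            raise ValueError("unsupported URL: " + url)
    return ["globus-url-copy", source_url, dest_url]


# Upload from the local machine to an SE.
upload = gridftp_copy_command(
    "file:///home/ron/file",
    "gsiftp://srm.grid.sara.nl/pnfs/grid.sara.nl/data/dteam/file",
)
# Third-party transfer: both ends are remote GridFTP servers.
third_party = gridftp_copy_command("gsiftp://hostA/pathA", "gsiftp://hostB/pathB")

print(shlex.join(upload))
print(shlex.join(third_party))
```

The lists can be passed to subprocess.run(cmd, check=True) on a machine with the Globus client and a valid proxy.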
Low Level Data Management
• gsidcap (dCache SEs)
    dccp -p 20000:25000 /tmp/file \
        gsidcap://srm.grid.sara.nl:22128/pnfs/grid.sara.nl/data/dteam/file
  • 20000:25000 is derived from the GLOBUS_TCP_PORT_RANGE environment variable
• Secure rfio
    rfcp /path/myfile \
        t2se01.physics.ox.ac.uk:/dpm/physics.ox.ac.uk/home/dteam/file
• srmcp (not for Classic SEs, which have no SRM)
    srmcp file:////tmp/file \
        srm://srm.grid.sara.nl:8443//pnfs/grid.sara.nl/data/dteam/file
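The port range passed to dccp with -p can be derived from the environment, as noted above. The sketch below assumes that GLOBUS_TCP_PORT_RANGE holds two numbers separated by a comma or a space (for example "20000,25000"); the helper name is hypothetical.

```python
import os


def dccp_port_range(environ=os.environ):
    """Turn GLOBUS_TCP_PORT_RANGE into the low:high form used with dccp -p.

    Assumes the variable looks like "20000,25000" or "20000 25000".
    Returns None when the variable is not set.
    """
    raw = environ.get("GLOBUS_TCP_PORT_RANGE")
    if raw is None:
        return None
    parts = raw.replace(",", " ").split()
    if len(parts) != 2:
        raise ValueError("expected two port numbers, got: " + raw)
    low, high = int(parts[0]), int(parts[1])
    if low > high:
        raise ValueError("port range is reversed: " + raw)
    return "{}:{}".format(low, high)
```

The result would then be placed after -p in the dccp command line shown above.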
Information system
• LDAP-based
  • LDAP servers running on service nodes (GRIS/BDII)
  • LDAP servers collecting the information for a site (site BDII)
  • LDAP servers collecting the information for all sites (BDII)
• Need to set environment variable LCG_GFAL_INFOSYS
  • Needs to be set to a BDII
• lcg-infosites
  • Example: finding an SE:
> lcg-infosites --vo tutor se
Avail Space(Kb)  Used Space(Kb)  Type  SEs
----------------------------------------------------------
214632           1901097784      n.a   tbn15.nikhef.nl
626880000        1163120000      n.a   tbn18.nikhef.nl
488106596        368854044       n.a   mu2.matrix.sara.nl
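A listing like the one above is easy to post-process. The following sketch parses the sample output from the slide and picks the SE with the most available space. It assumes four whitespace-separated columns with numeric space values, which holds for this sample but may not hold for every site; the function names are hypothetical.

```python
SAMPLE_OUTPUT = """\
Avail Space(Kb) Used Space(Kb) Type SEs
----------------------------------------------------------
214632 1901097784 n.a tbn15.nikhef.nl
626880000 1163120000 n.a tbn18.nikhef.nl
488106596 368854044 n.a mu2.matrix.sara.nl
"""


def parse_se_listing(text):
    """Return one dict per SE row; header and separator lines are skipped."""
    records = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have exactly four fields and start with a number.
        if len(fields) != 4 or not fields[0].isdigit() or not fields[1].isdigit():
            continue
        avail, used, se_type, host = fields
        records.append(
            {"avail_kb": int(avail), "used_kb": int(used), "type": se_type, "host": host}
        )
    return records


def se_with_most_space(records):
    """Return the hostname of the SE with the largest available space."""
    return max(records, key=lambda r: r["avail_kb"])["host"]


records = parse_se_listing(SAMPLE_OUTPUT)
print(se_with_most_space(records))
```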
Information system
• lcg-info
  • For more advanced searches. For example, finding out where to put your files:
> lcg-info --list-se --query 'SE=mu2.matrix.sara.nl' --attrs Path
- SE: mu2.matrix.sara.nl
  - Path /flatfiles/SE00/tutor
• ldapsearch
  • For the real troopers among us
LFC
• LFC stands for LCG File Catalog
• LCG stands for LHC Computing Grid
• LHC stands for Large Hadron Collider
• Users and programs produce and require data
• The Resource Broker can send (small amounts of) data to/from jobs: Input and Output Sandbox. Not recommended for large amounts of data
• Data is stored on the grid
  • Located in Storage Elements
  • Several replicas of one file at different sites
  • Accessible by Grid users and applications from “anywhere”
  • Locatable by the WMS/RB (data requirements in JDL)
• Also…
  • Data may be copied from/to local filesystems (WNs, UIs) to the Grid or opened remotely on the SE (GFAL, gsidcap, rfio).
LFC • Keeps track of the location of copies (replicas) of files on the Grid
Naming conventions
• Logical File Name (LFN)
  • An alias created by a user to refer to some item of data, e.g. “lfn:/grid/tutor/mydir/myfile”
• Globally Unique Identifier (GUID)
  • A non-human-readable unique identifier for an item of data, e.g. “guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6”
• Site URL (SURL) (or Physical File Name (PFN) or Site FN)
  • The location of an actual piece of data on a storage system, e.g.
    “srm://pcrd24.cern.ch/flatfiles/cms/output10_1” (SRM)
    “sfn://lxshare0209.cern.ch/data/alice/ntuples.dat” (Classic SE)
• Transport URL (TURL)
  • Locator of a replica plus access protocol, understood by an SE, e.g. “rfio://lxshare0209.cern.ch//data/alice/ntuples.dat”
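The four kinds of names can be told apart by their prefix. The sketch below classifies the example names from this slide. Treating every other URL scheme as a TURL is a simplifying assumption, and classify_name is a hypothetical helper, not part of any grid library.

```python
def classify_name(name: str) -> str:
    """Classify a grid file name as LFN, GUID, SURL or TURL by its prefix."""
    if name.startswith("lfn:"):
        return "LFN"
    if name.startswith("guid:"):
        return "GUID"
    if name.startswith(("srm://", "sfn://")):
        return "SURL"
    if "://" in name:
        # Assumption: any other scheme (rfio, gsiftp, gsidcap, ...) is a TURL.
        return "TURL"
    raise ValueError("not a recognised grid file name: " + name)


for example in (
    "lfn:/grid/tutor/mydir/myfile",
    "guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6",
    "srm://pcrd24.cern.ch/flatfiles/cms/output10_1",
    "sfn://lxshare0209.cern.ch/data/alice/ntuples.dat",
    "rfio://lxshare0209.cern.ch//data/alice/ntuples.dat",
):
    print(classify_name(example), example)
```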
Naming conventions
• How do they fit together?
• LFC holds the mapping LFN-GUID-SURL
[Figure: LFN 1 … LFN i all refer to one GUID; the GUID refers to SURL 1 … SURL j; each SURL has one or more TURLs. The LFC covers the LFN, GUID and SURL part.]
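The LFN-GUID-SURL mapping can be modelled with two dictionaries. This is a toy in-memory model for illustration only; the class and method names are hypothetical and unrelated to the real LFC API, and the second SURL in the usage is made up.

```python
class ToyCatalog:
    """Toy model: many LFNs map to one GUID, one GUID maps to many SURLs."""

    def __init__(self):
        self._guid_by_lfn = {}
        self._surls_by_guid = {}

    def register(self, lfn, guid, surl):
        """Register a new file with its first replica."""
        self._guid_by_lfn[lfn] = guid
        self._surls_by_guid[guid] = [surl]

    def add_alias(self, existing_lfn, new_lfn):
        """Add another LFN (like a symbolic link) for the same GUID."""
        self._guid_by_lfn[new_lfn] = self._guid_by_lfn[existing_lfn]

    def add_replica(self, lfn, surl):
        """Record one more physical copy of the file."""
        self._surls_by_guid[self._guid_by_lfn[lfn]].append(surl)

    def list_replicas(self, lfn):
        """Return all SURLs of the file behind this LFN."""
        return list(self._surls_by_guid[self._guid_by_lfn[lfn]])


catalog = ToyCatalog()
catalog.register(
    "lfn:/grid/tutor/mydir/myfile",
    "guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6",
    "srm://pcrd24.cern.ch/flatfiles/cms/output10_1",
)
catalog.add_alias("lfn:/grid/tutor/mydir/myfile", "lfn:/grid/tutor/aLink")
# Hypothetical second replica on another SE.
catalog.add_replica("lfn:/grid/tutor/aLink", "sfn://se.example.org/data/myfile")
print(catalog.list_replicas("lfn:/grid/tutor/mydir/myfile"))
```

Both LFNs resolve to the same list of replicas because they share one GUID, which is the property the diagram expresses.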
LFC • LFN acts as main key in the database. It has: • Symbolic links to it (additional LFNs) • Unique Identifier (GUID) • System metadata • Information on replicas • One field of user metadata Grid Tutorial, RC RUG, 18-19 September 2006
LFC
• Two kinds of LFC
  • Central LFC: for each VO, one site on the grid will publish a global catalog. This will record entries (file replicas or dataset entities) across the whole of the grid.
  • Local LFC: local catalogs record the file replicas stored at that site's SEs only.
LFC • Provides: • User exposed transaction C/C++ API (+ auto rollback on failure) • Python wrapper provided (python module lfc) • Command line tools with administrative functionality • Hierarchical unix-like namespace and namespace operations for LFNs • lfn:/grid/<vo name>/mydir/myfile • lfc-mkdir, lfc-chmod • Integrated GSI Authentication + Authorization • Access Control Lists (Unix Permissions and POSIX ACLs) • Checksums • Sessions (multiple operations inside a single transaction ) • Bulk operations (inside transactions ) Grid Tutorial, RC RUG, 18-19 September 2006
LFC Summary of the LFC Catalog commands
LFC C/C++ API
Low-level methods (many POSIX-like):
lfc_access lfc_aborttrans lfc_addreplica lfc_apiinit lfc_chclass lfc_chdir lfc_chmod lfc_chown lfc_closedir lfc_creat lfc_delcomment lfc_delete lfc_deleteclass lfc_delreplica lfc_endtrans lfc_enterclass lfc_errmsg
lfc_getacl lfc_getcomment lfc_getcwd lfc_getpath lfc_lchown lfc_listclass lfc_listlinks lfc_listreplica lfc_lstat lfc_mkdir lfc_modifyclass lfc_opendir lfc_queryclass lfc_readdir lfc_readlink lfc_rename lfc_rewind lfc_rmdir lfc_selectsrvr
lfc_setacl lfc_setatime lfc_setcomment lfc_seterrbuf lfc_setfsize lfc_starttrans lfc_stat lfc_symlink lfc_umask lfc_undelete lfc_unlink lfc_utime send2lfc
LFC Interfaces
• Integration with GFAL and lcg_utils APIs
  • lcg-utils/GFAL access the catalog in a transparent way
• Integration with the WMS
  • The RB can locate Grid files: allows for data-based match-making
  • JDL file:
    InputData = "lfn:/grid/tutor/MyFile";
Data Management CLIs & APIs • lcg_utils: lcg-* commands + lcg_* API calls • Provide (all) the functionality needed by the LCG user • Transparent interaction with file catalogs and storage interfaces when needed • Abstraction from technology of specific implementations • Grid File Access Library (GFAL): API • Adds file I/O and explicit catalog interaction functionality • Still provides the abstraction and transparency of lcg_utils Grid Tutorial, RC RUG, 18-19 September 2006
Data Management CLIs & APIs lcg-utils commands: Replica Management lcg-utils commands: File Catalog Interaction
Data Management CLIs & APIs lcg-utils C/C++ API:
Data Management CLIs & APIs
• GFAL
• Grid storage interactions today require using some existing software components:
  • The file catalog services, to locate valid replicas of files in order to:
    • Download them to the user's local machine
    • Move them from one SE to another
    • Enable jobs running on a worker node to access and manage files stored on a remote storage element
  • The SRM software, to ensure:
    • File existence on disk or disk pool (files are recalled from mass storage if necessary)
    • Space allocation on disk for new files (they are possibly migrated to mass storage later)
Data Management CLIs & APIs
• GFAL features
  • Hides interactions with the SRM from the end user
  • Provides a POSIX-like interface for file I/O operations
    • POSIX calls prefixed with gfal_
  • Based on shared libraries (both threaded and unthreaded versions)
  • Needs only one header file (gfal_api.h) to write C applications
  • Supports the following protocols:
    • file for local access, also lfn/guid
    • dcap, gsidcap and kdcap for the dCache access protocol
    • rfio for the CASTOR access protocol
    • SRM
  • Access to SRMs in secure mode, i.e. using a valid Grid proxy obtained with the voms-proxy-init command
Examples
• Using lcg-utils and lfc commands:
  • Define the LFC server hostname
  • The LFC server must be published in the BDII ($LCG_GFAL_INFOSYS)
  • Set the environment variable LFC_HOST=<LFC_server_hostname>
  • $LFC_HOST must be set
Examples
Listing the entries of an LFC directory
lfc-ls [-cdiLlRTu] [--class] [--comment] [--deleted] [--display_side] [--ds] path…
where path specifies the LFN pathname (mandatory)
• Remember that the LFC has a directory tree structure
  • /grid/<VO_name>/<you create it>
  • (callouts on the path: “LFC Namespace”, “Defined by the user”)
• All members of a VO have read-write permissions under their directory
• You can set LFC_HOME to use relative paths
> lfc-ls /grid/tutor/me
> export LFC_HOME=/grid/tutor
> lfc-ls -l me
> lfc-ls -l -R /grid
-l : long listing
-R : list the contents of directories recursively. Don't use it!
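The effect of LFC_HOME on relative paths can be mimicked in a few lines. This sketch only illustrates the behaviour shown above (lfc-ls me listing /grid/tutor/me once LFC_HOME=/grid/tutor is set); it is not the LFC client's own implementation, and resolve_lfc_path is a hypothetical name.

```python
import posixpath


def resolve_lfc_path(path, lfc_home=None):
    """Resolve an LFC path the way a relative path combines with LFC_HOME."""
    if path.startswith("/"):
        # Absolute paths are used as they are.
        return posixpath.normpath(path)
    if not lfc_home:
        raise ValueError("relative path given but LFC_HOME is not set")
    return posixpath.normpath(posixpath.join(lfc_home, path))


print(resolve_lfc_path("me", "/grid/tutor"))
print(resolve_lfc_path("/grid/tutor/me"))
```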
Examples
Creating directories in the LFC
lfc-mkdir [-m mode] [-p] path...
• Where path specifies the LFC pathname
• Remember that when registering a new file (using lcg-cr, for example), the corresponding destination directory must already have been created in the catalog.
• Examples:
> lfc-mkdir /grid/tutor/me
You can check the directory with:
> lfc-ls -l /grid/tutor/me
drwxr-xrwx 0 19122 1077 0 Jun 14 11:36 demo
Examples
Let us copy and register a file using lcg-utils
> lcg-cr --vo tutor -l me/test -d mu2.matrix.sara.nl file:`pwd`/test
guid:7b4efaef-bb0f-42a3-bb6f-bbe35080d105
> lcg-lr --vo tutor lfn:me/test
sfn://mu2.matrix.sara.nl/flatfiles/SE00/tutor/generated/2006-09-18/file378fc829-351f-4558-8679-9d2ce530cbb4
> lfc-ls -l me
-rw-rw-r-- 1 30010 2024 114 Sep 18 10:33 test
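When lcg-cr is called from a script, the GUID it prints is usually what you want to keep. The sketch below validates and extracts it from the output line shown above, assuming the output is a single line of the form guid:<uuid>; parse_guid is a hypothetical helper.

```python
import uuid


def parse_guid(output_line: str) -> uuid.UUID:
    """Extract the GUID from a line such as 'guid:7b4efaef-...'."""
    text = output_line.strip()
    prefix = "guid:"
    if not text.startswith(prefix):
        raise ValueError("expected a line starting with 'guid:', got: " + text)
    # uuid.UUID raises ValueError if the remainder is not a well-formed UUID.
    return uuid.UUID(text[len(prefix):])


guid = parse_guid("guid:7b4efaef-bb0f-42a3-bb6f-bbe35080d105\n")
print(guid)
```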
Examples
Creating a symbolic link
lfc-ln -s file linkname
lfc-ln -s directory linkname
Creates a link named linkname to the specified file or directory
• Examples:
> lfc-ln -s /grid/tutor/me/test /grid/tutor/aLink
Let's check the link using lfc-ls with long listing (-l):
> lfc-ls -l
lrwxrwxrwx 1 30010 2024 0 Sep 18 10:38 aLink -> /grid/tutor/me/test
(aLink is the symbolic link; /grid/tutor/me/test is the original file)
Examples
Adding/deleting metadata information
lfc-setcomment path comment
lfc-delcomment path
lfc-setcomment adds or replaces a comment associated with a file or directory in the LFC
lfc-delcomment deletes a comment previously added
• This is the only metadata (one field) supported by the catalog
• Examples:
> lfc-setcomment me/test "nice file"
• Let's see what happened:
> lfc-ls --comment /grid/tutor/me/test
/grid/tutor/me/test nice file
Examples
Deleting the file
lfc-rm
  lfc-rm removes a file/link/directory from the catalog only
lcg-del
  lcg-del removes the file from the SEs and the LFNs/links from the catalog
• Example, delete all replicas:
> lcg-del -a --vo tutor guid:8e413879-7cb3-4260-af9f-6964392da7e8
• Example, delete only one replica (no -a, so only the replica on the SE given with -s is removed):
> lcg-del --vo tutor -s mu2.matrix.sara.nl guid:8e413879-7cb3-4260-af9f-6964392da7e8
File Transfer Service
• A batch system for submitting data transfer jobs
• For data-intensive sciences
• Currently in use in the LCG project
FTS
• Allows for
  • Managed transfers by means of channels to sites
  • Channels are between sites, e.g. CERN-SARA
  • Site admins can adapt the configuration of incoming channels to their site, switch their channel off, etc.
  • Setting priorities for different VOs
  • Optimisation of network tuning parameters per channel
FTS • Command line interface • glite-transfer-cancel • Cancels a file transfer job • glite-transfer-list • Lists ongoing data transfer jobs • glite-transfer-status • Displays the status of an ongoing data transfer job • glite-transfer-submit • Submits a new data transfer job Grid Tutorial, RC RUG, 18-19 September 2006