gLite Data Services and Data Management

gLite Data Services and Data Management. Meteo VO Training, Belgrade 24-25. June 2008. Branko Marovic RCUB - UoB. Data Management. LCG-2 (LCG-2 User Guide, “man” pages) LCG-UTILS API – C/C++ LFC API – C/C++, Python GFAL API – C/C++, Python


Presentation Transcript


  1. gLite Data Services and Data Management Meteo VO Training, Belgrade 24-25. June 2008 Branko Marovic RCUB - UoB The SEE-GRID-SCI initiative is co-funded by the European Commission under the FP7 Research Infrastructures contract no. 211338

  2. Data Management • LCG-2 (LCG-2 User Guide, “man” pages) • LCG-UTILS API – C/C++ • LFC API – C/C++, Python • GFAL API – C/C++, Python • http://grid-deployment.web.cern.ch/grid-deployment/gis/GFAL/GFALindex.html • SEEGRID Wiki “SG Using file replicas and RFIO: UI configuration, rfiod, usage in apps, limitations and workarounds” • http://wiki.egee-see.org/index.php/SG_Using_file_replicas_and_RFIO:_UI_configuration%2C_rfiod%2C_usage_in_apps%2C_limitations_and_workarounds • Configuring UI, SE, RB • Site testing of RFIO/GFAL • Typical problems and solutions • Java access to LFC and LCG-UTILS • Java LFC/GFAL wrapper • http://grid02.rcub.bg.ac.yu/LFCJavaAPI/index.html • Customizable LFC web front end (upload, list, replicate, delete) • http://grid02.rcub.bg.ac.yu/repmngr/ • gLite • http://grid-deployment.web.cern.ch/grid-deployment/documentation/DataManagement/R3.0/ Application Gridification 3/31

  3. Scope of data services in gLite • Simply put, the DMS provides all the operations that we are all used to performing on files • Uploading/downloading files • Creating files/directories • Renaming files/directories • Deleting files/directories • Moving files/directories • Listing directories • Creating symbolic links

  4. Scope of data services in gLite • Files are write-once, read-many • Files cannot be changed unless removed or replaced • If users edit files then • They manage the consequences! • Maybe just create a new filename! • No intention of providing a global file management system • Three service types for data • Storage • Catalogs • Transfer

  5. Data Issues and Grid Solutions • Resource centers need to meet growing demand for storage • Storage Element capable of managing multiple disk pools • Disk Pool Manager (DPM), dCache, CASTOR • Data is stored on different storage system technologies • Common interface required to hide underlying complexity • Storage Resource Manager (SRM) – storage management protocol • GridFTP – secure file transfer • Data is stored at different locations with separate namespaces • File catalogue to provide a uniform view of Grid data • LCG File Catalog (LFC) • Applications need to access Grid data management services • Data management API • GFAL

  6. Data Management • The Storage Element (SE) is the service that allows a user or an application to store data for future retrieval. In gLite, every SE must have a GSIFTP server, which offers basically the same functionality as FTP, enhanced to support GSI security. • Files that are copied to an SE should then be registered in a catalog. A catalog is basically a database that maps the name of a file (logical file name, LFN) to its physical location (physical file name). • A file in a catalog may have more than one LFN (in principle, an LFN has nothing to do with the file's real name) and more than one replica (that is, the same file may be present on two different SEs). What uniquely identifies a file is its GUID (Grid Unique Identifier), a 36-character string.
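
The relationship between LFNs, the GUID and replicas can be pictured with a small in-memory model. This is only an illustrative sketch in Python, not the LFC API: the class `ToyCatalog`, its method names and the `example.org` host names are hypothetical, and the GUID is generated with Python's `uuid` module purely for illustration.

```python
import uuid


class ToyCatalog:
    """Hypothetical in-memory stand-in for a file catalog (not the LFC API)."""

    def __init__(self):
        self._guid_by_lfn = {}       # logical file name -> GUID
        self._replicas_by_guid = {}  # GUID -> list of physical locations

    def register(self, lfn, surl):
        """Register a new file: one GUID, one LFN and a first replica."""
        if lfn in self._guid_by_lfn:
            raise ValueError("LFN already in use: " + lfn)
        guid = "guid:" + str(uuid.uuid4())
        self._guid_by_lfn[lfn] = guid
        self._replicas_by_guid[guid] = [surl]
        return guid

    def add_alias(self, lfn, new_lfn):
        """Give an already registered file a second logical name."""
        if new_lfn in self._guid_by_lfn:
            raise ValueError("LFN already in use: " + new_lfn)
        self._guid_by_lfn[new_lfn] = self._guid_by_lfn[lfn]

    def add_replica(self, lfn, surl):
        """Record that the same file is also stored at another location."""
        self._replicas_by_guid[self._guid_by_lfn[lfn]].append(surl)

    def guid_of(self, lfn):
        return self._guid_by_lfn[lfn]

    def list_replicas(self, lfn):
        """Return a copy of the replica list, whichever LFN is used."""
        return list(self._replicas_by_guid[self._guid_by_lfn[lfn]])


catalog = ToyCatalog()
catalog.register("lfn:/grid/myvo/run1/data.dat",
                 "srm://se1.example.org/myvo/data.dat")
catalog.add_replica("lfn:/grid/myvo/run1/data.dat",
                    "srm://se2.example.org/myvo/data.dat")
catalog.add_alias("lfn:/grid/myvo/run1/data.dat", "lfn:/grid/myvo/latest.dat")
print(catalog.list_replicas("lfn:/grid/myvo/latest.dat"))
```

Both logical names resolve to the same GUID, so listing replicas through either name gives the same two locations.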

  7. Data management example • [Figure: “User interface”, Resource Broker, Computing Element, Storage Element 1, Storage Element 2 and LCG File Catalogue (LFC); labels: input “sandbox” (max. 20 MByte), input “sandbox” + broker info, output “sandbox”, datasets info] • 1st job writes and replicates output onto 2 SEs

  8. Data management example cont. • [Figure: “User interface”, Resource Broker, Computing Element, Storage Element 1, Storage Element 2 and LCG File Catalogue (LFC); labels: input “sandbox” (max. 20 MByte), input “sandbox” + broker info, output “sandbox”, datasets info] • Keep computation close to the stored data • 2nd job reads input from an SE

  9. Data management example • [Figure: “User interface”, LCG File Catalogue (LFC), Storage Element 1 and Storage Element 2; the catalogue entry “Myfile.dat” and its guid point to File_on_se1 and File_on_se2] • File replicated onto 2 SEs

  10. Resolving logical file name • [Figure: “User interface”, LCG File Catalogue (LFC), Storage Element 1 and Storage Element 2; the “logical filename” Myfile.dat is resolved through its “GUID” (Globally Unique Identifier) to File_on_se1 and File_on_se2, each a “SURL” (site URL)] • File content cannot change, so there is no need to synchronize replicas • Content is available on 2 SEs

  11. Name conventions
     • Logical File Name (LFN): an alias created by a user to refer to some item of data, e.g.
         lfn:/grid/gilda/budapest23/run2/track1
     • Globally Unique Identifier (GUID): a non-human-readable unique identifier for an item of data, e.g.
         guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6
     • Site URL (SURL) (or Physical File Name (PFN) or Site FN): the location of an actual piece of data on a storage system, e.g.
         srm://pcrd24.cern.ch/flatfiles/cms/output10_1 (SRM)
         sfn://lxshare0209.cern.ch/data/alice/ntuples.dat (Classic SE)
     • Transport URL (TURL): a temporary locator of a replica plus an access protocol, understood by an SE, e.g.
         rfio://lxshare0209.cern.ch//data/alice/ntuples.dat
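
Since each kind of name is recognizable by its scheme prefix, a script can tell them apart before deciding what to do with a name. The sketch below is a hypothetical helper, not part of any gLite library. It maps only the schemes that appear in these slides, and treating `gsiftp` as a TURL scheme is an assumption added here.

```python
# Scheme prefix -> kind of grid file name (hypothetical helper, not a gLite API).
KIND_BY_SCHEME = {
    "lfn": "LFN",      # logical file name
    "guid": "GUID",    # globally unique identifier
    "srm": "SURL",     # site URL on an SRM-managed SE
    "sfn": "SURL",     # site URL on a classic SE
    "rfio": "TURL",    # transport URL (access protocol included)
    "gsiftp": "TURL",  # assumption: GridFTP locators are treated as TURLs
}


def classify_grid_name(name):
    """Return 'LFN', 'GUID', 'SURL' or 'TURL' for a scheme-prefixed name."""
    scheme, separator, rest = name.partition(":")
    if not separator or not rest:
        raise ValueError("not a scheme-prefixed name: " + repr(name))
    try:
        return KIND_BY_SCHEME[scheme.lower()]
    except KeyError:
        raise ValueError("unknown scheme: " + repr(scheme))


for example in (
    "lfn:/grid/gilda/budapest23/run2/track1",
    "guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6",
    "srm://pcrd24.cern.ch/flatfiles/cms/output10_1",
    "rfio://lxshare0209.cern.ch//data/alice/ntuples.dat",
):
    print(classify_grid_name(example), example)
```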

  12. Name conventions • LFC has a directory tree structure: lfn:/grid/<VO_name>/<you create it> (LFC namespace; the part below the VO name is defined by the user) • Users primarily access and manage files through “logical filenames” • Mapping by the “LFC” catalogue server

  13. LFC directories
     • LFC directories = virtual directories
     • Each entry in the directory may be stored on different SEs
     • [Figure: the LFC directory lfn:/grid/gilda/budapest23/run2/ in the LCG File Catalogue, with entries input1, input2 and input3, and the storage locations below]
         Storage Element 1: sfn://grid005.iucc.ac.il/storage/gilda/generated/2007-06-23/fileb233d43f-5bc6-4ede-a5fe-611d48be2ba5
         Storage Element 2: srm://aliserv6.ct.infn.it/dpm/ct.infn.it/home/gilda/generated/2007-06-23/filea21ab3e2-8ff6-4a44-82a7-f2
         Storage Element 3: sfn://trigriden01.unime.it/flatfiles/SE00/gilda/generated/2007-06-23/filec79a9e3c-2485-4206-a2a5-235f
         Storage Element 4: sfn://grid005.iucc.ac.it/flatfiles/SE00/gilda/generated/2007-06-23/filec79a9e3c-2485-4206-a2a5-235f

  14. LCG File Catalog
     gLite supports two different types of catalogs: LFC (LCG File Catalog) and RLS (Replica Location Service). The catalog can be accessed using data management commands from the UI. Two environment variables must be set, the file catalog type and its address:
         export LCG_CATALOG_TYPE=lfc
         export LFC_HOST=lfc-atlas-test.cern.ch
     There are several LFC hosts on LCG and they are not synchronized, so a user has to keep using the same host throughout their work. Usually there is one central LFC per VO, so in practice this risk is small.
     LFNs in the LFC have a particular form: they are organized in a hierarchical, directory-like structure that looks like this:
         lfn:/grid/<VO>/<dir>/<filename>
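
A script that wraps the data management commands can check these two variables up front and build LFNs in the expected form. The following is a minimal sketch in Python. The helper names `catalog_settings` and `make_lfn` are hypothetical, and only the variable names and the LFN layout come from the slide.

```python
import os

# The two catalog variables named on the slide.
REQUIRED_VARS = ("LCG_CATALOG_TYPE", "LFC_HOST")


def catalog_settings(env=None):
    """Return the catalog variables, or fail early if one is unset or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("unset catalog variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}


def make_lfn(vo, *parts):
    """Build lfn:/grid/<VO>/<dir>/<filename> from its components."""
    cleaned = [vo.strip("/")] + [p.strip("/") for p in parts if p.strip("/")]
    return "lfn:/grid/" + "/".join(cleaned)


settings = catalog_settings({
    "LCG_CATALOG_TYPE": "lfc",
    "LFC_HOST": "lfc-atlas-test.cern.ch",
})
print(settings["LFC_HOST"], make_lfn("seegrid", "meteo", "run1.dat"))
```

Passing a dictionary instead of the real environment, as in the usage lines, keeps the check easy to try outside a UI machine.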

  15. Two sets of commands • LFC = LCG File Catalogue • LCG = LHC Computing Grid • LHC = Large Hadron Collider • Use LFC commands to interact with the catalogue only • To create catalogue directories • To list files • Used by you, your application and by lcg-utils (see below) • lcg-utils • Couples catalogue operations with file management • Keeps SEs and catalogue in step! • Copy files to/from/between SEs • Replicate files

  16. LFC basics • LFC has a directory tree structure: /grid/<VO_name>/<you create it> (LFC namespace; the part below the VO name is defined by the user) • All members of a given VO have read-write permissions in their directory • Commands look like UNIX commands with “lfc-” in front (often)

  17. Storage Element • Provides • Storage for files: massive storage system, disk or tape based • Transfer protocol (gsiFTP) ~ GSI-based FTP server • Striped file transfer – cluster as back-end • [Figure labels: file request + VOMS proxy; authentication, authorization; Storage Element server; file system]

  18. Data management commands
     Data management commands are of the form lcg-*. Some of them only access the catalog:
         lcg-aa     add alias
         lcg-ra     remove alias
         lcg-rf     register file
         lcg-uf     unregister file
         lcg-la     list aliases
         lcg-lg     list guid
         lcg-lr     list replicas
     Others perform real data movement, usually updating the catalog to reflect the change:
         lcg-cp     copy a file to a local destination (this command does not write to the catalog)
         lcg-cr     copy and register a file on an SE
         lcg-del    delete a file (physically) and its entry in the catalog
         lcg-rep    replicate a file from one SE to another
     For these commands to work, one more environment variable must be set besides the two catalog variables:
         export LCG_GFAL_INFOSYS=<BDII_address:2170>
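
A typical upload, replicate and list sequence with these commands can be sketched as argument lists built in Python. The slide does not show any command options. The `--vo`, `-d` and `-l` options used below are recalled from the lcg-utils examples in the gLite User Guide and should be checked against the man pages. The function names, the VO name and the `example.org` hosts are hypothetical. Nothing is executed; the commands are only built and printed.

```python
import shlex


def lcg_cr(vo, dest_se, lfn, local_path):
    """Copy a local file to an SE and register it under an LFN."""
    if not local_path.startswith("/"):
        raise ValueError("local path must be absolute: " + local_path)
    return ["lcg-cr", "--vo", vo, "-d", dest_se, "-l", lfn,
            "file://" + local_path]


def lcg_rep(vo, dest_se, lfn):
    """Replicate an already registered file to another SE."""
    return ["lcg-rep", "--vo", vo, "-d", dest_se, lfn]


def lcg_lr(vo, lfn):
    """List the replicas of a file."""
    return ["lcg-lr", "--vo", vo, lfn]


def lcg_cp(vo, lfn, local_path):
    """Copy a grid file to a local file (the catalog is not modified)."""
    if not local_path.startswith("/"):
        raise ValueError("local path must be absolute: " + local_path)
    return ["lcg-cp", "--vo", vo, lfn, "file://" + local_path]


def as_shell(command):
    """Render an argument list as a copy-and-paste shell line."""
    return " ".join(shlex.quote(arg) for arg in command)


lfn = "lfn:/grid/myvo/meteo/run1.dat"
for command in (
    lcg_cr("myvo", "se1.example.org", lfn, "/home/user/run1.dat"),
    lcg_rep("myvo", "se2.example.org", lfn),
    lcg_lr("myvo", lfn),
    lcg_cp("myvo", lfn, "/tmp/run1.dat"),
):
    print(as_shell(command))
```

On a configured UI the same lists could be handed to `subprocess.run`; keeping them as lists avoids shell quoting problems with unusual file names.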

  19. GFAL C API • GFAL (Grid File Access Library) is a POSIX-like interface for operating on files on Storage Elements • Enables remote handling of files • Libraries are in C and can be included in C/C++ sources (GFAL Java API tomorrow!) • The most common I/O operations are available; just prefix the function name with gfal_ (gfal_open(), gfal_read()…) • See man gfal for further details • The destination SE must provide secure rfio (classic SEs don’t) • GFAL API description • http://grid-deployment.web.cern.ch/grid-deployment/documentation/LFC_DPM/gfal/html

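  20. GFAL API code snippet
     Examples in gLite3 User Guide (Appendix F)
     • https://edms.cern.ch/file/722398//gLite-3-UserGuide.pdf
         #include <fcntl.h>      /* O_RDONLY */
         #include <sys/stat.h>   /* struct stat */
         #include "gfal_api.h"   /* gfal_open(), gfal_stat(), gfal_read(), gfal_close() */

         int fd, cod_ex;
         struct stat file_stat;                               /* filled in by gfal_stat() */
         fd = gfal_open(file_ref, O_RDONLY, 0644);            /* file_ref names the Grid file */
         cod_ex = gfal_stat(file_ref, &file_stat);            /* get the file size */
         ...
         cod_ex = gfal_read(fd, buffer, file_stat.st_size);   /* read the whole file into buffer */
         ...
         cod_ex = gfal_close(fd);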

  21. LFC and LCG utils • [Figure: “User interface”, LCG File Catalogue (LFC), Storage Element 1 and Storage Element 2, with lfc-* and lcg-* commands between them] • List directory • Create a local file, then upload it to an SE and register it with a logical name (lfn) in the catalogue • Create a duplicate in another SE • List the replicas

  22. LFC and LCG utils • [Figure: “User interface”, LCG File Catalogue (LFC), Storage Element 1 and Storage Element 2, with lfc-* and lcg-* commands between them] • List directory • Create a local file, then upload it to an SE and register it with a logical name (lfn) in the catalogue • Create a duplicate in another SE • List the replicas • Create a second logical file name for a file • Download a file from an SE to the UI

  23. LFC commands
     On the UI there are some commands that interact directly with the LFC catalog. Because of the LFN structure, files in the LFC catalog can be browsed as if they were in a Unix filesystem. Try this:
         > lfc-ls /grid/seegrid
     The lfc-ls command works just like ls on a local filesystem (it also accepts the -l option). In the same way, lfc-mkdir, lfc-chmod and lfc-chown behave almost like their Unix counterparts.
     Although the LFC commands are easy to use, usually only lfc-ls is used. Commands that write to the catalog or delete “directories” from it should be used with great caution: the risk is to cause inconsistencies between the catalog and the files on the SEs. The data management (lcg-*) commands ensure that such inconsistencies are not created: they also write to the catalog, but they check that no “harm” is done to the system.

  24. LFC Catalog commands
     Summary of the LFC Catalog commands:
         lfc-chmod         Change access mode of the LFC file/directory
         lfc-chown         Change owner and group of the LFC file/directory
         lfc-delcomment    Delete the comment associated with the file/directory
         lfc-getacl        Get file/directory access control lists
         lfc-ln            Make a symbolic link to a file/directory
         lfc-ls            List file/directory entries in a directory
         lfc-mkdir         Create a directory
         lfc-rename        Rename a file/directory
         lfc-rm            Remove a file/directory
         lfc-setacl        Set file/directory access control lists
         lfc-setcomment    Add/replace a comment

  25. Low level commands
     There are some “low-level” commands available to grid users that should be used with caution, because they work directly on the SE without updating the catalog. Still, two of them are real friends to anyone who has to look for files on the grid:
         edg-gridftp-ls gsiftp://<SE_address>/<dir>/
         globus-url-copy <src_file> <dest_file>
     The first command lists the content of a directory on a remote SE; the second is the basis of every lcg tool that has to move data. <src_file> and <dest_file> must be in a fully qualified format:
         file:///<abs_path>/<file_name>                  for local files
         gsiftp://<SE_address>/<abs_path>/<file_name>    for remote files
     Other useful low-level commands (to be used carefully!) are:
         edg-gridftp-rm <URL>
         edg-gridftp-rmdir <URL>
         edg-gridftp-rename <src_URL> <dest_URL>
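
Getting the fully qualified format wrong (for example writing `file://tmp/x` with two slashes) is an easy mistake, so a small check before calling globus-url-copy can help. The sketch below is a hypothetical helper in Python that builds and validates the two URL forms given above. The function names and the `example.org` host are illustrative assumptions.

```python
from urllib.parse import urlsplit


def local_url(abs_path):
    """file:///<abs_path>/<file_name> for a local file."""
    if not abs_path.startswith("/"):
        raise ValueError("path must be absolute: " + abs_path)
    return "file://" + abs_path


def gsiftp_url(se_address, abs_path):
    """gsiftp://<SE_address>/<abs_path>/<file_name> for a remote file."""
    if not se_address or "/" in se_address:
        raise ValueError("bad SE address: " + repr(se_address))
    if not abs_path.startswith("/"):
        raise ValueError("path must be absolute: " + abs_path)
    return "gsiftp://" + se_address + abs_path


def is_fully_qualified(url):
    """True for the two URL forms described on the slide, False otherwise."""
    parts = urlsplit(url)
    if parts.scheme == "file":
        # Three slashes: empty host followed by an absolute path.
        return url.startswith("file:///") and len(parts.path) > 1
    if parts.scheme == "gsiftp":
        return bool(parts.netloc) and len(parts.path) > 1
    return False


source = local_url("/home/user/run1.dat")
destination = gsiftp_url("se1.example.org", "/storage/myvo/run1.dat")
if is_fully_qualified(source) and is_fully_qualified(destination):
    print("globus-url-copy", source, destination)
```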

  26. File Transfer Service • FTS is a low-level data movement service • Why is it needed? • Improves reliability of transfers • Provides asynchronous file transfer • Schedules transfers when resources are available • Provides control of transfer properties (channel concept) • No catalogue interactions yet, so users have to handle SURLs

  27. FTS Concepts • Transfer Job • A set of source/destination pairs specifying files to transfer • Submitted to FTS for processing • Channel • A job is assigned to a channel after submission • Represents a point-to-point network link • Catch-all channels are possible: any-to-me, me-to-any • Similar to a queue where you can specify • VO share for the queue • Number of concurrent file transfers • Number of concurrent streams (gridFTP)
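
The job and channel concepts can be illustrated with a toy model. This is not the FTS implementation or its API. The classes, the `*` wildcard marker, the site names and the rule that a point-to-point channel is preferred over a catch-all one are all assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

ANY = "*"  # hypothetical marker for the "any" side of a catch-all channel


@dataclass
class TransferJob:
    source_site: str
    destination_site: str
    files: List[Tuple[str, str]]  # (source SURL, destination SURL) pairs


@dataclass
class Channel:
    source: str
    destination: str
    max_transfers: int = 1  # number of concurrent file transfers
    queue: List[TransferJob] = field(default_factory=list)

    def matches(self, job):
        return (self.source in (ANY, job.source_site)
                and self.destination in (ANY, job.destination_site))

    def specificity(self):
        """2 for point-to-point, 1 for any-to-me or me-to-any, 0 for any-to-any."""
        return int(self.source != ANY) + int(self.destination != ANY)


def assign(job, channels):
    """Queue the job on the most specific matching channel (assumed policy)."""
    candidates = [channel for channel in channels if channel.matches(job)]
    if not candidates:
        raise LookupError("no channel for %s -> %s"
                          % (job.source_site, job.destination_site))
    best = max(candidates, key=Channel.specificity)
    best.queue.append(job)
    return best


channels = [Channel(ANY, "SITE_B"), Channel("SITE_A", "SITE_B", max_transfers=4)]
job = TransferJob("SITE_A", "SITE_B",
                  [("srm://se.a.example.org/f1", "srm://se.b.example.org/f1")])
chosen = assign(job, channels)
print(chosen.source, "->", chosen.destination)
```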

  28. FTS architecture • Experiments interact via a web service • User: FileTransfer • Admin: ChannelManagement • VO agents assign jobs to channels • Channel agents manage assigned file transfers • Monitoring and statistics can be collected via the DB • All components are decoupled from each other • Each interacts only with the database

  29. Summary of FTS client commands • [Figure: FTS client]
