EUChinaGrid 3rd tutorial, Beijing, Nov 2006 Grid Data Management Zongchang Yang 杨宗长
Outline
• Introduction
• Grid Data Management Services
• File catalogues
• Data Management commands
• Hands on
Introduction
Users and applications produce and require data
• The Input/Output Sandbox is used for transferring relatively small files (< 20 MB)
Users and applications need to handle files on the Grid
• “Large” files are stored in permanent resources called Storage Elements (SEs)
• SEs are present at almost every site, together with the computing resources
Grid Data Management Services
Grid Data Management Services enable users to:
• Move files in and out of the Grid
• Replicate files on different SEs
• Locate files on various SEs
Data Management means movement and replication of files on grid elements
Grid Data Management Services – cont’d
• Data transfer is done by a number of protocols (gsiftp, rfio, file, etc.)
• Usage of a central file catalogue
• High-level data management tools hide the transport-layer details (protocols), the storage location and the internal structure of the SEs
• The SE is a “black box”
Files: name conventions
• Logical File Name (LFN)
  • An alias created by the user to refer to some file
  • An LFN is of the form: lfn:/grid/<MyVO>/<MyDir>/<MyFile>
  • Example: lfn:/grid/gilda/importantResults/Test1240.dat
• Globally Unique Identifier (GUID)
  • A file can always be identified by its GUID (based on UUID)
  • A GUID is of the form: guid:<unique_string>
  • All replicas of a file share the same GUID
  • Example: guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6
Both LFNs and GUIDs refer to files (not replicas)
Replicas: name conventions
• Storage URL (SURL)
  • Also known as Physical/Storage File Name (PFN/SFN)
  • Used by the Replica Management System (RMS) to find where the replica is physically stored
  • A SURL is of the form: sfn://<SE_hostname>/<VO_path>/<file_name>
  • Example: sfn://tbed1.cern.ch/flatfiles/SE00/gilda/project1/testSUTL.dat
• Transport URL (TURL)
  • Temporary locator of a physical replica, including the access protocol understood by an SE
  • A TURL is of the form: <protocol>://<SE_hostname>/<VO_path>/<filename>
  • Example: gsiftp://tbed1.cern.ch/gilda/project1/testTURL.dat
SURLs and TURLs provide info about the physical location of the replica
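The prefixes in these naming conventions are enough to tell the four kinds of name apart. The following is a minimal sketch, assuming a POSIX shell. The function name `classify_name` is hypothetical and is not part of the LCG tools. It only inspects the prefix and never contacts a catalogue. The `srm://` scheme is treated as a SURL because the replica listings in the hands-on part of this tutorial use it.

```shell
# classify_name: hypothetical helper, not an LCG command.
# Looks only at the prefix of a name; it does not check that the file exists.
classify_name() {
    case "$1" in
        lfn:*)           echo "LFN" ;;
        guid:*)          echo "GUID" ;;
        sfn://*|srm://*) echo "SURL" ;;
        *://*)           echo "TURL" ;;   # any other <protocol>://, e.g. gsiftp, rfio
        *)               echo "unknown" ;;
    esac
}

# The four examples from the name-convention slides.
classify_name lfn:/grid/gilda/importantResults/Test1240.dat
classify_name guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6
classify_name sfn://tbed1.cern.ch/flatfiles/SE00/gilda/project1/testSUTL.dat
classify_name gsiftp://tbed1.cern.ch/gilda/project1/testTURL.dat
```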
File Catalogs
• How do I keep track of all the files I have on the Grid?
• Even if I remember all the LFNs of my files, what about someone else's files?
• How does the Grid keep track of LFN-GUID-SURL associations?
• For that we have a FILE CATALOG
File Catalogs – cont’d
RMS = Replica Management System
RLS = Replica Location Service
[Diagram: Logical File Names 1 … n map to a single GUID, which maps to Physical File SURLs 1 … n]
File Catalogs – cont’d
• The LFN acts as a main key in the database. It has:
  • Symbolic links to it (additional LFNs)
  • Unique Identifier (GUID)
  • System metadata
  • Information on replicas
[Diagram: catalogue entry for the LFN /grid/dteam/dir1/dir2/file1.root, showing its GUID (Xxxxxx-xxxx-xxx-xxx-), system metadata (“size” => 10234, “cksum_type” => “MD5”, “cksum” => “yy-yy-yy”), user-defined metadata, symlinks (/grid/dteam/mydir/mylink) and replicas (srm://host.example.com/foo/bar on host.example.com)]
Data Management commands
• lcg-cp   Copies a Grid file to a local destination
• lcg-cr   Copies a file to a SE and registers the file in the LRC
• lcg-del  Deletes one file (either one replica or all replicas)
• lcg-lg   Gets the GUID for a given LFN or SURL
Data Management commands – cont’d
• lcg-rep  Copies a file from SE to SE and registers it in the LRC
• lcg-aa   Adds an alias in RMC for a given GUID
• lcg-la   Lists the aliases for a given LFN, GUID or SURL
• lcg-gt   Gets the TURL for a given SURL and transfer protocol
Data Management commands – cont’d
• lcg-lr   Lists the replicas for a given LFN, GUID or SURL
• lcg-ra   Removes an alias in RMC for a given GUID
• lcg-rf   Registers a SE file in the LRC (optionally in the RMC)
• lcg-uf   Un-registers a file residing on an SE from the LRC
Data Management commands – cont’d
• lfc-ls      List file/directory entries in a directory
• lfc-mkdir   Create a directory
• lfc-rename  Rename a file/directory
• lfc-rm      Remove a file/directory
• lfc-chmod   Change access mode of a file/directory
• lfc-chown   Change owner and group of a file/directory
Environment variables
• Check the current values of the following variables:
$ echo $LCG_GFAL_INFOSYS; echo $LCG_CATALOG_TYPE; echo $LFC_HOST
gilda06.ihep.ac.cn:2170
lfc
gilda07.ihep.ac.cn
• If one or more of them has a different or empty value, set them as follows:
$ export LCG_GFAL_INFOSYS=gilda06.ihep.ac.cn:2170
$ export LCG_CATALOG_TYPE=lfc
$ export LFC_HOST=gilda07.ihep.ac.cn
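The check-then-set step can also be scripted. This is a sketch only, assuming a POSIX shell; `ensure_var` is a hypothetical helper name, and the values are the ones given on this slide for the tutorial testbed.

```shell
# ensure_var NAME VALUE: hypothetical helper that exports NAME=VALUE when NAME
# is empty or holds a different value, and reports what it did.
ensure_var() {
    eval "current=\${$1-}"          # read the variable whose name is in $1
    if [ "$current" = "$2" ]; then
        echo "$1 already set to $2"
    else
        echo "$1 was '$current', setting it to $2"
        export "$1=$2"
    fi
}

ensure_var LCG_GFAL_INFOSYS gilda06.ihep.ac.cn:2170
ensure_var LCG_CATALOG_TYPE lfc
ensure_var LFC_HOST gilda07.ihep.ac.cn
```

The helper must be defined in (or sourced into) the current shell, not run as a separate script, otherwise the exported values are lost when the script exits.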
Listing files and directories
• A separate top-level directory exists under /grid for each supported VO. To see the files stored for the gilda VO, first ensure you have a valid VOMS proxy and then type:
$ lfc-ls -l /grid/gilda
• You will see a listing of the contents of the /grid/gilda directory.
• Rather than typing an absolute path for every file and directory, you can define a home directory and then use paths relative to it. Do this now by setting the environment variable LFC_HOME:
$ export LFC_HOME=/grid/gilda/tutorial
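To make the effect of LFC_HOME concrete, the sketch below mimics how a relative catalogue path is interpreted. It is an illustration under assumptions, not the LFC implementation: `resolve_lfn` is a hypothetical helper, `alice` is a made-up user name, and the real resolution happens inside the LFC client tools.

```shell
# resolve_lfn: hypothetical helper showing how a relative catalogue path is
# interpreted against LFC_HOME. It performs string handling only.
resolve_lfn() {
    case "$1" in
        /*) printf '%s\n' "$1" ;;                     # absolute path: unchanged
        *)  printf '%s/%s\n' "${LFC_HOME%/}" "$1" ;;  # relative: prefix LFC_HOME
    esac
}

LFC_HOME=/grid/gilda/tutorial
resolve_lfn alice/text_file.txt   # relative to LFC_HOME
resolve_lfn /grid/gilda           # already absolute
```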
Create directory
• Before creating and uploading any of your own files, create a personal directory for storage by typing:
$ lfc-mkdir $USER
• To check that you have created your directory, type:
$ lfc-ls -l
• You should see your directory (plus possibly those of other attendees).
Upload a file to an SE
• First create a simple text file locally:
$ echo "Put something here" > text_file.txt
• The command used to upload it is lcg-cr (LCG copy and register). Type the following to store this file on the gilda04.ihep.ac.cn storage element (use lcg-infosites to find out which SEs are available):
$ lcg-cr --vo gilda file://$PWD/text_file.txt -l lfn:$USER/text_file.txt -d gilda04.ihep.ac.cn
• The output should be something like this:
$ lcg-cr --vo gilda file://$PWD/text_file.txt -l lfn:$USER/text_file.txt -d gilda04.ihep.ac.cn
guid:030486cc-3c60-4551-a714-c3683a913d07
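Since a successful lcg-cr prints the new GUID, a script can capture it and check its shape before going on. A hedged sketch, assuming a POSIX shell and that the GUID is the only thing written to standard output, as in the sample above; `is_guid` is a hypothetical helper, and the upload itself is attempted only where lcg-cr is installed.

```shell
# is_guid: hypothetical helper; succeeds if the argument looks like
# guid:<8-4-4-4-12 hexadecimal digits>, the shape shown in the sample output.
is_guid() {
    printf '%s\n' "$1" |
        grep -Eq '^guid:[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$'
}

# Only attempt the upload where the LCG tools are available.
if command -v lcg-cr >/dev/null 2>&1; then
    guid=$(lcg-cr --vo gilda "file://$PWD/text_file.txt" \
               -l "lfn:$USER/text_file.txt" -d gilda04.ihep.ac.cn)
    if is_guid "$guid"; then
        echo "registered as $guid"
    else
        echo "upload failed or unexpected output: $guid" >&2
    fi
fi
```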
Get the file SURL
• For several purposes, FTS for instance, it is useful to know the file's SURL (there can be many if the file has replicas elsewhere). The appropriate command is lcg-lr (list replicas), given an LFN or a GUID:
$ lcg-lr --vo gilda lfn:$USER/text_file.txt
srm://gilda04.ihep.ac.cn/dpm/ihep.ac.cn/home/gilda/generated/2006-11-23/file6a08a1ef-c06a-4c4c-a170-da84719da050
Replicate file between SEs
• A file can be stored on multiple SEs, so a running job can access the closest SE that holds it, giving faster access to the data. This also protects against failures or access difficulties at a particular SE. To find the list of SEs available to you, see the tutorial on lcg-infosites.
• We will replicate the file just created to the SE grid005.iucc.ac.il with the command:
$ lcg-rep --vo gilda -d grid005.iucc.ac.il lfn:$USER/text_file.txt
• There is no output from this command on success, but you can check that the replica was created by listing all the replicas of your file with the LCG list replicas command:
$ lcg-lr --vo gilda lfn:$USER/text_file.txt
• You should get two replicas listed, as here:
$ lcg-lr --vo gilda lfn:$USER/text_file.txt
sfn://grid005.iucc.ac.il/storage/gilda/generated/2006-11-23/file8aa62592-f3a5-40fe-aa0e-b0b419f7a095
srm://gilda04.ihep.ac.cn/dpm/ihep.ac.cn/home/gilda/generated/2006-11-23/file6a08a1ef-c06a-4c4c-a170-da84719da050
• Note how the path where each replica is stored is different. This demonstrates how the use of an LFN avoids the need to understand the local filesystem where the replica is actually stored.
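When a file has several replicas it is often handy to see just which SEs hold it. The sketch below, assuming a POSIX shell with sed, extracts the hostname from each SURL. `replica_hosts` is a hypothetical helper, and the two SURLs from the listing above stand in for real lcg-lr output so the example runs without grid access.

```shell
# replica_hosts: hypothetical helper; reads SURLs on standard input (one per
# line, as printed by lcg-lr) and prints the SE hostname of each.
replica_hosts() {
    sed -n 's|^[a-z]*://\([^/:]*\).*|\1|p'
}

# Sample input copied from the replica listing in this tutorial.
replica_hosts <<'EOF'
sfn://grid005.iucc.ac.il/storage/gilda/generated/2006-11-23/file8aa62592-f3a5-40fe-aa0e-b0b419f7a095
srm://gilda04.ihep.ac.cn/dpm/ihep.ac.cn/home/gilda/generated/2006-11-23/file6a08a1ef-c06a-4c4c-a170-da84719da050
EOF
```

On a machine with the LCG tools installed, the same helper could be fed directly: `lcg-lr --vo gilda lfn:$USER/text_file.txt | replica_hosts`.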
Create symbolic link
• As an example of some of the other functionality available, we create a second LFN for the file you uploaded. This is very similar to a Unix symbolic link. The second LFN is created with the command:
$ lfc-ln -s /grid/gilda/tutorial/$USER/text_file.txt $USER/text_file_symlink.txt
• Note that the full path of the source file was used, to avoid problems with relative links being used from the wrong directory. If you now list the contents of your directory you should see both names; the symbolic link has size 0 and its target is shown:
$ lfc-ls -l $USER
-rw-rw-r-- 1 554 102 19 Jul 19 15:36 text_file.txt
lrwxrwxrwx 1 554 102 0 Jul 19 15:37 text_file_symlink.txt -> /grid/gilda/tutorial/$USER/text_file.txt
Download a file from SE to UI
• Having uploaded a file, the next step is to download one. To download the file you uploaded, refer to it by its LFN:
$ lcg-cp --vo gilda lfn:$USER/text_file.txt file://$PWD/text_file_copy.txt
• You can check that the downloaded file is the same as the one you uploaded with the cat command; you should see the text you entered earlier:
$ cat text_file_copy.txt
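For anything larger than a one-line text file, comparing by eye with cat is impractical. A small sketch, assuming a POSIX shell with cmp and that both the original and the downloaded copy are in the current directory; `same_content` is a hypothetical helper name.

```shell
# same_content: hypothetical helper; succeeds when two local files are
# byte-for-byte identical.
same_content() {
    cmp -s "$1" "$2"
}

# Run the comparison only if both files from the hands-on steps are present.
if [ -f text_file.txt ] && [ -f text_file_copy.txt ]; then
    if same_content text_file.txt text_file_copy.txt; then
        echo "download matches upload"
    else
        echo "files differ" >&2
    fi
fi
```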
Clean all!
• You can delete a file from the SE with lcg-del:
$ lcg-del -a --vo gilda lfn:/grid/gilda/tutorial/$USER/text_file.txt
• This removes the File Catalog entries as well:
$ lfc-ls $USER
$
• Finish by removing the working directory from the file catalog:
$ lfc-rm -r /grid/gilda/tutorial/$USER
$ lfc-ls /grid/gilda/tutorial/ | grep $USER