Advanced Data Movement and Management Features of SRB
By Michael Wan
SDSC/UCSD

Sput: upload files to SRB
Usage: Sput [-fprabvsmMkKV] [-c container] [-D dataType] [-n replNum] [-N numThreads] [-S resourceName] [-P pathName] [-R retry_count] localFileName|localDirectory ... TargetName
Upload one or more local files and/or directories
Default mode: sequential
  Sput -v /tmp/srb/lfile
  LOCAL:/tmp/srb/lfile->SRB:lfile | 84.315 MB | 13.219 MB/s | 6.38 s | 2005.07.29 21:41:56
  Sls -l lfile
  fedsrbbrick8 0 demoResc 84314624 2005-07-29-15.18 % lfile

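Sput: serial mode
[Figure: serial-mode upload, steps numbered 1 to 6. Labeled elements: the Sput client calling srbObjCreate and srbObjWrite; SRB server1 and SRB server2, each with an SRB agent; server spawning; a peer-to-peer request between the servers; the MCAT (1. logical-to-physical mapping, 2. identification of replicas, 3. access and audit control); the resource (R); data transfer.]
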
Serial Mode Data Transfer
Simple to implement and use
  Unix-like API: srbObjCreate, srbObjWrite
Performance issues
  Two-hop data transfer
  Single data stream
  One file at a time: overhead relatively high for small files
    MCAT interaction (query and registration)
    Small buffer transfer
Improvements
  Large files: single hop, multiple data streams
  Small files: single hop, multiple files at a time

Parallel Mode Data Transfer
For large file transfers
Multiple data streams
Single-hop data transfer
Two modes
  Server initiated
  Client initiated (for clients behind a firewall)
Up to 5 times speedup over a WAN
Two simple API calls: srbObjPut and srbObjGet
Use the -m (server initiated) or -M (client initiated) option
Available to all Scommands involving data transfer
  Sput, Sget, Srsync, Sreplicate, Scp, Sbkupsrb, SsyncD, Ssyncont

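The idea of moving one large file over multiple data streams can be sketched in Python. This is only an in-memory illustration, not the SRB wire protocol: `split_ranges` and `parallel_copy` are hypothetical names, and worker threads stand in for the separate network streams.

```python
from concurrent.futures import ThreadPoolExecutor


def split_ranges(size: int, num_streams: int) -> list[tuple[int, int]]:
    """Split [0, size) into up to num_streams contiguous (offset, length) ranges."""
    if size < 0 or num_streams < 1:
        raise ValueError("size must be >= 0 and num_streams >= 1")
    base, extra = divmod(size, num_streams)
    ranges, offset = [], 0
    for i in range(num_streams):
        # The first `extra` ranges carry one additional byte.
        length = base + (1 if i < extra else 0)
        if length:
            ranges.append((offset, length))
            offset += length
    return ranges


def parallel_copy(source: bytes, num_streams: int = 4) -> bytes:
    """Copy source into a new buffer, one worker thread per byte range."""
    target = bytearray(len(source))

    def copy_range(rng: tuple[int, int]) -> None:
        offset, length = rng
        # Same-length slice assignment: each thread writes only its own range.
        target[offset:offset + length] = source[offset:offset + length]

    with ThreadPoolExecutor(max_workers=num_streams) as pool:
        list(pool.map(copy_range, split_ranges(len(source), num_streams)))
    return bytes(target)
```

Each range is independent, which is why the streams can proceed concurrently and the receiver can reassemble the file by offset.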
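Parallel Mode Data Transfer: Server Initiated
[Figure: server-initiated parallel upload, steps numbered 1 to 6. Labeled elements: the client running Sput -m, which sends srbObjPut together with its socket address, port and cookie; SRB server1 and SRB server2, each with an SRB agent; a peer-to-peer request between the servers; "connect to client"; data transfer; the MCAT (1. logical-to-physical mapping, 2. identification of replicas, 3. access and audit control); the resource (R).]
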
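Parallel Mode Data Transfer: Client Initiated
[Figure: client-initiated parallel upload, steps numbered 1 to 8. Labeled elements: the client running Sput -M, which sends srbObjPut; SRB server1 and SRB server2, each with an SRB agent; "return socket addr., port and cookie"; "connect to server"; data transfer; the MCAT (1. logical-to-physical mapping, 2. identification of replicas, 3. access and audit control); the resource (R).]
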
Small Files Data Transfer (Bulk Operation)
Upload/download a large number of small files
One file at a time: relatively high overhead
  MCAT interaction, small buffer transfer
  <= 0.5 sec/file on a LAN, > 1 sec/file on a WAN
Bulk operation
  Bulk data transfer: transfer multiple files in a single large buffer (8 MB)
  Bulk registration: register a large number of files (1,000) in a single call
  Multiple threads for transfer and registration
  Single hop
  3-10 times speedup
Specify -b in Sput/Sget

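A rough sketch of the two batching steps behind the bulk operation follows. It assumes an 8-megabyte buffer and 1,000 files per registration call; the function names are hypothetical, and the greedy packing policy is an illustration rather than SRB's actual implementation.

```python
BUFFER_BYTES = 8 * 1024 * 1024   # assumed bulk buffer size (8 megabytes)
REGISTRATION_BATCH = 1000        # files registered per catalog call


def pack_into_buffers(files, buffer_bytes=BUFFER_BYTES):
    """Group (name, size) pairs into batches whose total size fits one buffer.

    Files are taken in order; a file larger than the buffer gets its own batch.
    """
    batches, current, used = [], [], 0
    for name, size in files:
        if current and used + size > buffer_bytes:
            batches.append(current)   # buffer is full: ship it
            current, used = [], 0
        current.append(name)
        used += size
    if current:
        batches.append(current)
    return batches


def registration_batches(names, batch_size=REGISTRATION_BATCH):
    """Split names into groups that would each be registered in a single call."""
    return [names[i:i + batch_size] for i in range(0, len(names), batch_size)]
```

The saving comes from paying the per-transfer and per-registration overhead once per batch instead of once per file.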
Bulk Load Operation
[Figure: bulk upload with Sput -b, steps numbered 1 to 6. Labeled elements: "query resource" and "return resource location"; a bulk data transfer thread using an 8 Mb buffer; bulk registration threads; SRB server1 and SRB server2, each with an SRB agent; "store data in a temp file"; "bulk register"; "unfold temp file"; the MCAT (1. logical-to-physical mapping, 2. identification of replicas, 3. access and audit control); the resource (R).]

Container - Archival of Small Files
Performance issues with storing/retrieving a large number of small files to/from tape
Container design
  Physical grouping of small files
  Implemented with a logical resource
    A pool of cache resources for the frontend resource
    An archival resource for the backend resource
  The entire container is stored on tape as a single file
Bulk operation with container: faster
Container-specific commands: Smkcont, Srmcont, Ssyncont, Slscont, Sreplcont

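The physical grouping that a container provides can be pictured with a toy model: many small files concatenated into one blob, plus an index of offsets. The `Container` class below is hypothetical and says nothing about SRB's real on-disk or on-tape layout.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    """Toy container: member files concatenated into one blob plus an index."""
    blob: bytearray = field(default_factory=bytearray)
    index: dict = field(default_factory=dict)  # name -> (offset, length)

    def add(self, name: str, data: bytes) -> None:
        """Append a file to the blob and record where it lives."""
        if name in self.index:
            raise KeyError(f"{name} already in container")
        self.index[name] = (len(self.blob), len(data))
        self.blob.extend(data)

    def read(self, name: str) -> bytes:
        """Return one member file by slicing it out of the blob."""
        offset, length = self.index[name]
        return bytes(self.blob[offset:offset + length])
```

Because the archive only ever sees one large object, the tape system handles a single file however many members the container holds.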
Summary of Data Transfer Modes
Serial: default mode
Parallel: for large files
Bulk: for a large number of small files
Container: archiving small files (to tape)
Container + bulk: faster archival of small files

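The summary above can be turned into a small decision helper. Everything here is an assumption made for illustration: the function name, the 64 MB "large file" threshold and the 100-file "many files" threshold are not taken from SRB; only the flags -b, -m and -M come from the slides.

```python
def suggest_sput_flags(file_sizes, large_file_bytes=64 * 1024 * 1024,
                       many_files=100, behind_firewall=False):
    """Suggest an Sput mode flag for a list of file sizes in bytes.

    Thresholds are hypothetical; tune them for the actual network and workload.
    """
    if not file_sizes:
        raise ValueError("no files to transfer")
    largest = max(file_sizes)
    if len(file_sizes) >= many_files and largest < large_file_bytes:
        return "-b"                                  # bulk: many small files
    if largest >= large_file_bytes:
        return "-M" if behind_firewall else "-m"     # parallel: large files
    return ""                                        # serial: the default
```

For example, `suggest_sput_flags([1024] * 200)` suggests bulk mode, while a single large file from a firewalled client suggests client-initiated parallel mode.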
Sput (cont)
-m  parallel, server-initiated connection
-M  parallel, client-initiated connection
    addresses the client-behind-firewall problem
-r  recursive
-b  bulk (directories of small files)
    time Sput -r /tmp/srb/d200 d200a
    time Sput -b /tmp/srb/d200 d200b
-k  register checksum computed by client
    Sput -kv /tmp/srb/mfile
    Schksum -l mfile
-K  checksum verification
    Client computes checksum
    Server independently computes checksum by reading back the uploaded file

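The -K verification flow (client checksum, then an independent server-side checksum of the stored copy) might look like the sketch below. SHA-256 is a stand-in, since the slides do not name the checksum algorithm, and a plain dictionary plays the role of server storage; both are assumptions.

```python
import hashlib


def checksum(data: bytes) -> str:
    # SHA-256 is a stand-in; the slides do not say which algorithm SRB uses.
    return hashlib.sha256(data).hexdigest()


def upload_with_verification(data: bytes, store: dict, name: str) -> str:
    """Upload data into store[name] and verify it by reading the copy back."""
    client_sum = checksum(data)          # client side, before upload
    store[name] = bytes(data)            # the "upload"
    server_sum = checksum(store[name])   # server side, from the stored copy
    if client_sum != server_sum:
        raise IOError(f"checksum mismatch for {name}")
    return client_sum
```

Reading the file back, rather than checksumming the bytes as they arrive, is what lets the check catch corruption introduced while writing to storage.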
Sget: download files from SRB
Usage: Sget [-n n] [-N numThreads] [-pbfrvsmMV] [-T ticketFile | -t ticket] [-A condition] [-R retry_count] [-k] srbObj|Collection ... [localFile|
Download one or more files from SRB to the local file system
  -r  recursive
  -b  bulk
  -m  parallel (server based)
  -M  parallel (client based, with -N numThreads)
  -k
  -n  replica number

Types of Data Transfer
Local to SRB: Sput, Srsync
SRB to local: Sget, Srsync
SRB to SRB: Scp, Sreplicate, Sbkupsrb, Srsync, Sphymove
  Third-party transfer
  Server-to-server data transfer, client not involved
  Parallel I/O

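Third Party Data Transfer
[Figure: third-party copy with Scp, steps numbered 1 to 6. Labeled elements: the client calling srbObjCopy on an SRB server; the MCAT; SRB server1 and SRB server2 with their SRB agents; "dataPut - socket addr., port and cookie"; "connect to server2"; data transfer; a resource (R) at each end.]
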
Sreplicate
Usage: Sreplicate [-n replicaNum] [-pr] [-S resourceName] [-P pathName] srbFile|collection ...
Makes a replica of SRB files or collections
Replicas have the same path but different replica numbers
Uses third-party parallel transfer

Sreplicate/Sbkupsrb
  Sreplicate -S demoResc1 mfile
  Sls -l mfile
  fedsrbbrick8 0 demoResc 3029449 2005-07-29-15.37 % mfile
  fedsrbbrick8 1 demoResc1 3029449 2005-07-29-21.28 % mfile
  Sget -vn1 mfile
  Sreplicate -rS demoResc1 testdir
  Sls -lr testdir
Sbkupsrb: similar to Sreplicate, but won't make a copy if a good copy already exists in the target resource
  Sbkupsrb -S demoResc1 mfile
  Sls -l mfile
  Sbkupsrb -S demoResc2 mfile
  Sls -l mfile

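The difference between Sreplicate and Sbkupsrb comes down to one check before copying. The sketch below models it with a hypothetical `Replica` record; the `is_good` flag is an assumed stand-in for whatever SRB uses to mark a replica as a good copy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    resource: str
    is_good: bool = True   # hypothetical marker for an up-to-date copy


def needs_backup(replicas, target_resource: str) -> bool:
    """Return True when no good replica already exists on target_resource."""
    return not any(
        r.resource == target_resource and r.is_good for r in replicas
    )
```

With the replicas from the listing above (demoResc and demoResc1), a backup to demoResc1 would be skipped, while a backup to demoResc2 would create a new copy.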
Sphymove: move file to another resource
Move a file to another resource without making another replica
Normally used by admins to move files around
Used by the BBSRC project
Usage: Sphymove [-b|r] [-c container] [-S targetResource] [-s sourceResource] srbFile|srbCollection ...
  -b  bulk
  -r  recursive (for collections)
  -c container  move files into the container
  -S targetResource  move files to this resource, if specified
  -s sourceResource  if specified, move only files stored in sourceResource to targetResource; otherwise, move all files that are not in targetResource

Sphymove (cont)
  Sphymove -b -s demoResc -S demoResc1 testdir
    Bulk move all files stored in demoResc in the 'testdir' collection to demoResc1
  Sphymove -b -c myContainer testdir
    Bulk move all files in the 'testdir' collection into the 'myContainer' container

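The -s/-S selection rule for Sphymove can be written out as a short function. The name `files_to_move` and the dictionary-based catalog are hypothetical; only the rule itself is taken from the option descriptions.

```python
def files_to_move(locations, target, source=None):
    """Pick the files a move to `target` would touch.

    locations maps each file name to the resource currently holding it.
    With `source` given, only files stored there are moved; otherwise every
    file not already on `target` is moved.
    """
    if source is not None:
        return sorted(
            name for name, res in locations.items()
            if res == source and res != target
        )
    return sorted(name for name, res in locations.items() if res != target)
```

For a collection spread over demoResc, demoResc1 and a third resource, passing `source="demoResc"` selects only the demoResc files, while omitting it selects everything outside the target.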
Scp: SRB to SRB copy
Usage: Scp [-n n] [-fpra] [-c container] [-S newResourceName] [-P newPathName] srcFile|srcCollection ... targFile|targCollection
Copies from one SRB path to another SRB path
Uses third-party parallel transfer
Cross-zone copy
  -a  write to all resources
  -r  recursive
  -b  bulk
Examples
  Scp -S demoResc1 -r testdir testdir1
  Scp -S xyzResc /z1/a/b/c /Z2/x/y/z   (cross zone)

Data Synchronization
Usage: Srsync [-S resource] [-t tmpInxDir] [-rvamMls] sourceFile|sourceDirectory [....] targetFile|targetDirectory
Similar to Unix rsync
Modes
  Local to SRB
  SRB to local
  SRB to SRB
Uses checksum values for synchronization
  Data is transferred only when the checksums differ

Srsync (cont)
  Srsync -vMr /tmp/srb/testdir s:testdir2
  /tmp/srb/testdir/./Sget.c 11514 55818 N
  LOCAL:/tmp/srb/testdir/./Sget.c->SRB:Sget.c&REG_CHKSUM=55818 | 0.012 MB | 0.026 MB/s | 0.45 s | 2005.07.29 22:13:40 | 1 thr
  .
  Sls -lr testdir2
Change /tmp/srb/testdir/SgetD.c with an editor, then:
  Srsync -vMr /tmp/srb/testdir s:testdir2   (local to SRB)
  Srsync -vMr s:testdir2 testdir2           (SRB to local)
  Srsync -vMr s:testdir2 s:testdir3         (SRB to SRB)

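The checksum-driven behavior of Srsync (transfer only when checksums differ) can be sketched with two in-memory "directories". The dictionaries, the `sync` function and the use of SHA-256 are all assumptions for illustration; the slides do not specify the checksum algorithm.

```python
import hashlib


def digest(data: bytes) -> str:
    # SHA-256 is a stand-in for whatever checksum the real tool registers.
    return hashlib.sha256(data).hexdigest()


def sync(source: dict, target: dict) -> list:
    """Copy files that are missing or whose checksum differs.

    Returns the names actually transferred, in sorted order.
    """
    transferred = []
    for name in sorted(source):
        data = source[name]
        if name not in target or digest(target[name]) != digest(data):
            target[name] = data
            transferred.append(name)
    return transferred
```

Running `sync` twice in a row transfers nothing the second time; after one file is edited, only that file is transferred, which mirrors the SgetD.c example above.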
Schksum: checksum utility
Usage: Schksum [-f|l|c] [-n replNum] [-rv] srbFile|collection ...
Computes and lists checksum values of SRB files
  -f  force recompute
  -c  verification mode
  -l  list mode
  -r  recursive
  -v  verbose mode
  -n replNum
  -s  verify integrity based on size
Examples
  Schksum -lr testdir
  Schksum -crv testdir2

SRB Proxy Operation
Perform operations on the server on behalf of the user
Operate where the data is located
  File format conversion, md5 checksum, subsetting and filtering, etc.
Two types of proxy operations
  Proxy commands
    Invoked by the client using Spcommand
    Server forks and execs an executable/script in bin/commands on the server
    Output is piped back to the client
  Proxy functions
    Functions built into the server
    Well-defined framework for writing proxy functions

Spcommand: proxy command
Usage: Spcommand [-hc] [-H hostAddr | -d srbPath] command
  command  an executable/script in bin/commands on the server
  -H  the server host address where the command should be executed
  -d  execute on the host where this srbPath is located
Example
  Spcommand "hello mike"
  Hello mike from SRB world

SRB shell
Usage: Ssh [-v] [-c command]
Puts the client into an SRB shell
  Makes a one-time connection to SRB
  Keeps the connection open
  Uses it for all subsequent Scommands
Examples
  Ssh: puts the user into an interactive Ssh shell
    Issue Scommands and UNIX commands
    UNIX shell environment not supported
  Ssh -c bash: creates an interactive bash session
  Ssh -c myShellScript