Cross-site data transfer on TeraGrid using GridFTP
• TeraGrid’06 Institute
• User Introduction to TeraGrid
• June 12th, 2006
• by Krishna Muriki
• kmuriki@sdsc.edu
• San Diego Supercomputer Center
TeraGrid: Integrating NSF Cyberinfrastructure
[Map of TeraGrid sites: PSC, PU, UC/ANL, NCAR (mid-2006), IU, NCSA, ORNL, TACC, SDSC]
Data Transfer Performance
• What impacts transfer rates?
  • Disk speed
  • Connectivity of disk to node
  • Node characteristics & load
  • Connectivity of node to WAN
  • For all networks:
    • Bandwidth
    • Latency
    • Buffer size
    • Protocol
    • Load
    • Encryption
[Diagram: several nodes → switch → WAN (TG Backbone) → switch → node; link labels: 1 Gb/s, 30 Gb/s, Multi Gb/s, 30 Gb/s]
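Buffer size and latency are linked through the bandwidth-delay product. To keep a path busy, TCP needs roughly bandwidth × round-trip time worth of data in flight, so the socket buffer should be at least that large. The minimal Python sketch below does that arithmetic. The 1 Gb/s and 50 ms figures in the usage line are assumed example values, not TeraGrid measurements.

```python
def bandwidth_delay_product(bandwidth_bps: int, rtt_ms: int) -> int:
    """Bytes in flight needed to fill a path: bandwidth (bits/s) times RTT, in bytes."""
    if bandwidth_bps <= 0 or rtt_ms <= 0:
        raise ValueError("bandwidth and RTT must be positive")
    # bits/s * ms / 1000 gives bits; / 8 gives bytes. Integer math avoids float drift.
    return bandwidth_bps * rtt_ms // 8000


if __name__ == "__main__":
    # Assumed example: a 1 Gb/s path with a 50 ms round-trip time.
    print(bandwidth_delay_product(1_000_000_000, 50), "bytes")
```

With those assumed numbers the product is 6,250,000 bytes (about 6 MB). When several parallel streams share the path, each stream's buffer can be roughly that figure divided by the number of streams.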
GridFTP - Protocol
• GridFTP is based on FTP (but is not the same).
• More suitable:
  • Optimized for high-bandwidth networks
  • Optimized for wide-area networks
• More secure:
  • Authenticated data channels
  • GSI security on both control and data channels (works even when your login names differ between sites)
• Better performance:
  • Multiple data channels for parallel file transfers
  • Reusable data channels
  • Command pipelining
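One way to picture "multiple data channels for parallel file transfers" is to cut a file into blocks that each carry their byte offset, spread the blocks over several channels, and let the receiver write each block at its offset, so arrival order does not matter. The sketch below models this idea in memory. It is a simplified illustration in the spirit of GridFTP's extended block mode, not the actual wire format. The block size and round-robin scheduling are assumptions.

```python
from typing import List, Tuple

Block = Tuple[int, bytes]  # (byte offset in the file, payload)


def split_into_channels(data: bytes, channels: int, block_size: int) -> List[List[Block]]:
    """Cut data into offset-tagged blocks and deal them round-robin onto channels."""
    if channels < 1 or block_size < 1:
        raise ValueError("channels and block_size must be >= 1")
    lanes: List[List[Block]] = [[] for _ in range(channels)]
    for i, offset in enumerate(range(0, len(data), block_size)):
        lanes[i % channels].append((offset, data[offset:offset + block_size]))
    return lanes


def reassemble(blocks: List[Block], total_size: int) -> bytes:
    """Write each block at its offset; the order in which blocks arrive is irrelevant."""
    buf = bytearray(total_size)
    for offset, payload in blocks:
        buf[offset:offset + len(payload)] = payload
    return bytes(buf)


if __name__ == "__main__":
    original = bytes(range(256)) * 40  # 10,240 bytes of sample data
    lanes = split_into_channels(original, channels=4, block_size=1000)
    # Simulate channels finishing out of order by draining them in reverse.
    arrived = [blk for lane in reversed(lanes) for blk in lane]
    assert reassemble(arrived, len(original)) == original
    print(f"{len(original)} bytes reassembled from {len(lanes)} channels")
```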
TeraGrid Transfer Environment
• Dedicated nodes for data transfer (called GridFTP server nodes)
  • With direct access to the disks and the network backbone
• Transfer clients and servers based on a more efficient protocol
• GSI authentication and proxy certificates:
  • provide authentication
  • enable secure transfers
• Transfer requests can be integrated into job execution scripts:
  • Moving input data to the site(s) where the job executes
  • Moving results to another file system, site, or archive
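The helper below is a sketch of how transfers could be folded into a job script. It builds globus-url-copy command lines that stage input in and results out, and optionally runs them with subprocess. The hostname pattern follows the dedicated GridFTP server names used on TeraGrid (tg-gridftp.<site>.teragrid.org). The site names, paths, and helper function names are hypothetical placeholders. A valid GSI proxy certificate (for example from grid-proxy-init) is assumed to exist before the job runs.

```python
import shlex
import subprocess
from typing import List


def gsiftp_url(site: str, path: str) -> str:
    """URL for a file on a site's dedicated GridFTP servers (site is a placeholder)."""
    return f"gsiftp://tg-gridftp.{site}.teragrid.org/{path.lstrip('/')}"


def stage_command(source_url: str, dest_url: str) -> List[str]:
    """globus-url-copy takes the source URL followed by the destination URL."""
    return ["globus-url-copy", source_url, dest_url]


def run_staging(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each staging command and, unless dry_run, execute it."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True aborts the job script if a transfer fails.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical sites and paths: pull input from one site, push results to another.
    stage_in = stage_command(gsiftp_url("sitea", "/home/user/input.dat"),
                             "file:///scratch/user/input.dat")
    stage_out = stage_command("file:///scratch/user/result.dat",
                              gsiftp_url("siteb", "/archive/user/result.dat"))
    run_staging([stage_in, stage_out])  # dry run: only prints the commands
```

In a real batch script the stage-in call would come before the computation and the stage-out call after it. Keeping dry_run as the default makes it easy to check the generated URLs first.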
GridFTP Servers
• GridFTP servers:
  • Not all systems (nodes) on TeraGrid machines are GridFTP servers.
  • Only a few nodes are configured as GridFTP servers.
  • These are configured to use the full TG backbone bandwidth.
• Shared servers:
  • Login node: tg-login1.<site>.teragrid.org is a GridFTP server.
  • However, it is a shared resource with many other tasks running on it.
• Dedicated servers:
  • Node name: tg-gridftp.<site>.teragrid.org is a GridFTP server.
  • This single name resolves to more than one node at each site.
  • All of these nodes are dedicated file-transfer resources.
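Because a single dedicated-server name maps to several machines, a DNS lookup of that name should return more than one address. The standard-library sketch below builds both server names for a site. It also lists the distinct addresses a name resolves to on port 2811, the conventional GridFTP control port. The function names and the example site value "sdsc" are illustrative. The usage line resolves a literal loopback address so it runs anywhere; substitute a reachable server name to see multiple addresses.

```python
import socket
from typing import Dict, List

GRIDFTP_PORT = 2811  # conventional GridFTP control-channel port


def gridftp_hosts(site: str) -> Dict[str, str]:
    """Shared (login) and dedicated GridFTP server names for a TeraGrid site."""
    return {
        "shared": f"tg-login1.{site}.teragrid.org",
        "dedicated": f"tg-gridftp.{site}.teragrid.org",
    }


def resolve_all(hostname: str, port: int = GRIDFTP_PORT) -> List[str]:
    """Every distinct address a name resolves to (several for a multi-node name)."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    print(gridftp_hosts("sdsc"))
    print(resolve_all("127.0.0.1"))
```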
No client!
• gridftp is not a command.
• There is no client program named ‘gridftp’.
• Recommended client programs:
  • tgcp
  • globus-url-copy
  • uberftp
• Availability:
  • uberftp and globus-url-copy are part of CTSS (Coordinated TeraGrid Software and Services).
  • tgcp can be added through SoftEnv (soft add +tgcp).
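A quick way to see which of the recommended clients are installed on a login node is to search PATH for them. The sketch below does this with Python's shutil.which. The client names come from the list of recommended programs. The hint printed for a missing tgcp repeats the SoftEnv command given for adding it.

```python
import shutil
from typing import Dict, Iterable, Optional

CLIENTS = ("tgcp", "globus-url-copy", "uberftp")


def find_clients(names: Iterable[str] = CLIENTS) -> Dict[str, Optional[str]]:
    """Map each client name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_clients().items():
        status = path if path else "not found"
        if name == "tgcp" and not path:
            status += " (try: soft add +tgcp)"
        print(f"{name:16} {status}")
```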
globus-url-copy
• %> globus-url-copy <source-url> <destination-url>
• Command-line client (not interactive)
• Prints no progress feedback by default (suitable for use in job scripts)
• Multiple data channels (striped)
• Third-party transfers (directly between two remote servers)
• Source and destination formats:
  • Local file: file:<full path>
  • Remote file: gsiftp://<hostname>/<full path>
• Optimization parameters:
  • -tcp-bs <size> | -tcp-buffer-size <size> (size in bytes)
  • -p <parallelism> | -parallel <parallelism>
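The URL forms and optimization flags above can also be assembled programmatically. The sketch below builds a globus-url-copy argument list with optional -tcp-bs and -p settings. It writes local files with the common file:/// spelling of file:<full path>. The hostname, paths, buffer size, and parallelism in the usage line are assumptions chosen for illustration.

```python
import shlex
from typing import List, Optional


def local_url(full_path: str) -> str:
    """file: URL for a local file; full_path must be absolute."""
    if not full_path.startswith("/"):
        raise ValueError("a full (absolute) path is required")
    return "file://" + full_path


def remote_url(hostname: str, full_path: str) -> str:
    """gsiftp URL for a file on a GridFTP server."""
    return f"gsiftp://{hostname}/{full_path.lstrip('/')}"


def guc_command(source: str, dest: str, tcp_bs: Optional[int] = None,
                parallel: Optional[int] = None) -> List[str]:
    """Argument list for globus-url-copy; options go before the two URLs."""
    cmd = ["globus-url-copy"]
    if tcp_bs is not None:
        cmd += ["-tcp-bs", str(tcp_bs)]  # TCP buffer size in bytes
    if parallel is not None:
        cmd += ["-p", str(parallel)]  # number of parallel data channels
    return cmd + [source, dest]


if __name__ == "__main__":
    # Hypothetical host and paths.
    cmd = guc_command(local_url("/scratch/user/run1.tar"),
                      remote_url("tg-gridftp.example.teragrid.org", "/home/user/run1.tar"),
                      tcp_bs=4194304, parallel=4)
    print(shlex.join(cmd))
```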
UberFTP
• %> uberftp
• Interactive file-transfer client
• Not suitable for scripted transfers
• Less verbose syntax
• Less prone to errors
• Some useful commands:
  • put, get
  • mput, mget
  • parallel, tcpbuf
• Optimization parameters:
    tg-login1> uberftp
    uberftp> parallel 2
    uberftp> tcpbuf 4194304
    TCP buffer set to 4194304 bytes
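In the uberftp session above, 4194304 is 4 MiB written out in bytes, because tcpbuf takes a byte count. The sketch below is a hypothetical helper, not part of uberftp. It converts a buffer size given in MiB to bytes and produces the lines one would type at the uberftp> prompt before fetching files. Since uberftp is interactive, the helper does not drive it; it only prepares the commands.

```python
from typing import List

MIB = 1024 * 1024  # bytes in one mebibyte


def uberftp_commands(parallel: int, tcpbuf_mib: int, remote_files: List[str]) -> List[str]:
    """Lines to type at the uberftp> prompt: set streams and TCP buffer, then fetch."""
    if parallel < 1 or tcpbuf_mib < 1:
        raise ValueError("parallel and tcpbuf_mib must be at least 1")
    # tcpbuf takes a byte count, so convert MiB to bytes (4 MiB -> 4194304).
    lines = [f"parallel {parallel}", f"tcpbuf {tcpbuf_mib * MIB}"]
    lines += [f"get {name}" for name in remote_files]
    return lines


if __name__ == "__main__":
    for line in uberftp_commands(2, 4, ["results.tar"]):
        print("uberftp>", line)
```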
tgcp
• %> tgcp
• Command-line program
• Wrapper around the globus-url-copy, RFT and cp commands
• Same syntax as globus-url-copy or RFT
• Automatically picks optimization parameters
  • Users need not set the parameters themselves
• Some useful options:
  • -big or -stripe: transfer large files using striped globus-url-copy (guc)
  • -rft: transfer using the Reliable File Transfer (RFT) service
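Because tgcp wraps cp, globus-url-copy and RFT, it has to decide which one to call for each request. The sketch below shows what such a dispatch could look like. It is a hypothetical illustration, not tgcp's actual logic. In particular, the rule that local-to-local copies go to cp, and the order in which the rules are checked, are assumptions.

```python
from typing import Tuple


def choose_backend(source: str, dest: str, big: bool = False,
                   rft: bool = False) -> Tuple[str, str]:
    """Pick a transfer backend the way a tgcp-like wrapper might (hypothetical rules)."""
    if rft:
        # -rft: hand the request to the Reliable File Transfer service.
        return ("rft", "requested with -rft")
    if source.startswith("file:") and dest.startswith("file:"):
        # Both ends are local files: a plain cp is enough (assumed rule).
        return ("cp", "both URLs are local files")
    if big:
        # -big / -stripe: striped globus-url-copy for large files.
        return ("globus-url-copy (striped)", "requested with -big or -stripe")
    return ("globus-url-copy", "default GridFTP transfer")


if __name__ == "__main__":
    print(choose_backend("file:///scratch/a.dat", "gsiftp://host.example/home/a.dat", big=True))
```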