GridFTP Roadmap
Bill Allcock (on behalf of the GridFTP team)
Argonne National Laboratory
Usability & Performance
• Packaging GridFTP as RPM
• GWFTP
• GridFTP GUI
• Automatic Firewall Traversal
• Sync feature for globus-url-copy
Packaging GridFTP as RPM
• Modify the packaging of GridFTP and its dependencies
  • Make it suitable for packaging as an RPM
  • Make it compatible with the standards of major Linux distributions
• Eventually, a distribution might pick it up
  • GridFTP would then be available as part of a standard Linux distribution
  • Attract a whole new set of users
  • Put it on par with scp and standard FTP in terms of availability
GridFTP Where there’s FTP (GWFTP)
• GridFTP has existed for some time and has proven to be robust and useful
• Only a few GridFTP clients are available
• FTP has innumerable clients
• GWFTP was created to leverage those FTP clients
• It is a proxy between FTP clients and GridFTP servers
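[Figure: GWFTP data flow. The FTP client sends USER <GWFTP username> ::gsiftp://wiggum.mcs.anl.gov:2811/ followed by PASS to GWFTP. GWFTP, which holds a GSI credential, performs GSI authentication with the GridFTP server on wiggum.mcs.anl.gov (port 2811), forwards the client's get request, and relays the data back to the FTP client.]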
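GWFTP learns which GridFTP server to contact from the FTP USER argument. The following is a minimal sketch of splitting that argument, assuming the `<username> ::gsiftp://host:port/` layout shown on the slide; `parse_gwftp_user` is a hypothetical helper, not GWFTP's actual code, and falling back to port 2811 when the URL omits one is an assumption.

```python
from urllib.parse import urlparse


def parse_gwftp_user(arg: str) -> tuple[str, str, int]:
    """Split a GWFTP-style USER argument into (username, host, port).

    Hypothetical helper. Assumes the form '<username> ::gsiftp://host:port/';
    whitespace around '::' is tolerated, and the port defaults to 2811.
    """
    user, sep, url = arg.partition("::")  # split on the first '::' only
    if not sep:
        raise ValueError("expected '<username> ::gsiftp://host:port/'")
    parsed = urlparse(url.strip())
    if parsed.scheme != "gsiftp" or not parsed.hostname:
        raise ValueError(f"not a gsiftp URL: {url!r}")
    return user.strip(), parsed.hostname, parsed.port or 2811
```

For example, `parse_gwftp_user("alice ::gsiftp://wiggum.mcs.anl.gov:2811/")` yields `("alice", "wiggum.mcs.anl.gov", 2811)`; the proxy would then authenticate to that host with its own GSI credential.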
[Figure: GridFTP GUI client (Computation Institute)]
GridFTP GUI
• A Java Web Start application
  • Updates automatically
  • Users always run the latest release
• Transfers files and directories
  • Third-party transfers
  • Multiple concurrent transfers
• Supports authentication through MyProxy
• Manages local and remote files and directories
  • Browse
  • Create and delete
Automatic Firewall Traversal
• The control channel port is statically assigned
• Data channel ports are dynamically assigned
• GridFTP protocol changes
  • New commands to communicate the 4-tuple (src IP, src port, dst IP, dst port) to both ends of the transfer
• Either use simultaneous open/TCP splicing, or use a broker to open ports temporarily
  • Hooks in GridFTP to contact the broker at the right time
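Both ends need the 4-tuple because each describes the same connection from its own side: what is "source" to one end is "destination" to the other, and in a simultaneous open each side binds its own local address and connects to the peer's. A minimal sketch with a hypothetical `FourTuple` type (not part of any GridFTP API):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FourTuple:
    """Hypothetical record of one data connection, from the sender's view."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int

    def as_seen_by_destination(self) -> "FourTuple":
        # The destination's local address is the source's remote address.
        return FourTuple(self.dst_ip, self.dst_port, self.src_ip, self.src_port)

    def socket_endpoints(self) -> tuple[tuple[str, int], tuple[str, int]]:
        # (address to bind locally, address to connect to) for a simultaneous open.
        return (self.src_ip, self.src_port), (self.dst_ip, self.dst_port)
```

With real sockets, each side would `bind()` the first address and `connect()` to the second at roughly the same time; whether the firewalls let the crossing SYNs through is outside what this sketch models.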
[Figure: Firewall. The client reaches the GridFTP source and destination servers on TCP port 2811; the DATA connection between the two servers must pass through the firewall.]
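[Figure: Automatic traversal using a connection broker. A connection broker (CB) on each side receives the IP 4-tuple and opens a temporary hole in the firewall so the DATA connection between the GridFTP source and destination servers can pass; the client reaches both servers on TCP port 2811.]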
Sync feature for globus-url-copy
• Check whether a file exists at the destination before transferring it
• If it exists, determine whether the source version differs from the destination version
• Optimize the transfer based on how much the source has changed
• Research logic that does not require any changes to the GridFTP protocol
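The existence and difference checks can be sketched as a single decision function. This assumes the client can obtain a size, a modification time and, optionally, a checksum for each side; how those are fetched is left abstract, and `FileInfo`/`needs_transfer` are hypothetical names. Deciding *how much* of a changed file to resend is beyond this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileInfo:
    """Hypothetical per-file metadata gathered from a server."""
    size: int
    mtime: float
    checksum: Optional[str] = None


def needs_transfer(src: FileInfo, dst: Optional[FileInfo]) -> bool:
    """Return True if the source file should be copied to the destination."""
    if dst is None:
        return True                        # not present at the destination
    if src.size != dst.size:
        return True                        # sizes differ: contents must differ
    if src.checksum is not None and dst.checksum is not None:
        return src.checksum != dst.checksum  # strongest evidence available
    return src.mtime > dst.mtime           # fall back to "source is newer"
```

Checksums are preferred when both sides provide them, because modification times can be altered by a copy without any change in content.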
Reliability & Security
• Improved restart mechanism
• Improved memory management algorithm
• Load balancing
• Data channel security for SSH-based GridFTP
• GUMS authorization callout
Improved Restart Mechanism
• globus-url-copy can recover from server and network failures
• It cannot recover from its own failure
• A number of users, including ESG, APS and SNS, use this client to transfer large data sets with complex directory structures
• Develop methods that enable globus-url-copy to recover from its own failures
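One common way for a client to survive its own crash is a durable log of completed transfers that it consults on restart. The following is a minimal sketch of that idea, not globus-url-copy's design; `run_with_restart_log` and the caller-supplied `transfer` function are hypothetical.

```python
import os
from typing import Callable, Iterable


def run_with_restart_log(pairs: Iterable[tuple[str, str]], log_path: str,
                         transfer: Callable[[str, str], None]) -> int:
    """Transfer (src, dst) pairs, skipping those already recorded in log_path.

    Returns the number of transfers performed in this run.
    """
    done: set[str] = set()
    if os.path.exists(log_path):
        with open(log_path, encoding="utf-8") as f:
            done = {line.rstrip("\n") for line in f if line.strip()}
    performed = 0
    with open(log_path, "a", encoding="utf-8") as log:
        for src, dst in pairs:
            key = f"{src}\t{dst}"
            if key in done:
                continue                    # finished before the crash
            transfer(src, dst)              # may raise; the pair is then not logged
            log.write(key + "\n")
            log.flush()
            os.fsync(log.fileno())          # make the record durable before moving on
            performed += 1
    return performed
```

Re-running the same command after a crash resumes at the first unlogged pair. A file that was partly written when the client died is simply transferred again; finer-grained restart within a file would need offsets in the log as well.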
GFork Architecture
[Figure: Components on the server host: a GFork server with a GridFTP plugin, and several GridFTP server instances. Clients' control channel connections arrive at GFork; inherited links and a state-sharing link connect GFork to the server instances.]
Memory Management
• The operating system provisions memory optimistically
  • Under heavy load, the GridFTP server can therefore consume all of the system's memory resources
• GFork: an xinetd-like super-server daemon
  • Allows state to be maintained across connections
• The GridFTP plugin for GFork has a simple memory-limiting option
  • 90% of the memory goes to the first 10% of the allowed connections
  • Remaining connections receive half of what is available
• Develop an improved memory management algorithm
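The simple limiting rule can be made concrete. This sketch encodes one reading of the slide: the first 10% of allowed connections split 90% of memory evenly, and each later connection gets half of whatever is still unallocated. `allocation_limits` is a hypothetical name, not the plugin's code.

```python
def allocation_limits(total_mem: int, max_conns: int) -> list[int]:
    """Per-connection memory limits under one reading of the GFork plugin rule."""
    if max_conns <= 0:
        return []
    first = max(1, max_conns // 10)          # "first 10%" of allowed connections
    share = (total_mem * 9 // 10) // first   # they split 90% of memory evenly
    limits = [share] * first
    remaining = total_mem - share * first
    for _ in range(max_conns - first):
        grant = remaining // 2               # half of what is still available
        limits.append(grant)
        remaining -= grant
    return limits


print(allocation_limits(1000, 10))  # [900, 50, 25, 12, 6, 3, 2, 1, 0, 0]
```

The output shows why an improved algorithm is on the roadmap: under this reading, grants halve with each additional connection, so late connections receive almost nothing.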
IPC Load Balancing Capabilities
• Separating the processes buys the ability to proxy
• This allows load balancing
• The frontend can choose from a pool of DPIs to service a client request
[Figure: a client connects to the frontend, which dispatches to a pool of four DPIs.]
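The slide does not say how the frontend chooses a DPI. A least-connections policy is one plausible choice, sketched here with a hypothetical `DpiPool` class; ties are broken by name only to make the behavior deterministic.

```python
from dataclasses import dataclass, field


@dataclass
class DpiPool:
    """Hypothetical frontend view of a DPI pool: name -> active sessions."""
    active: dict[str, int] = field(default_factory=dict)

    def add(self, name: str) -> None:
        self.active.setdefault(name, 0)

    def acquire(self) -> str:
        # Least-connections choice; ties broken by name for determinism.
        if not self.active:
            raise RuntimeError("no DPIs registered")
        name = min(self.active, key=lambda n: (self.active[n], n))
        self.active[name] += 1
        return name

    def release(self, name: str) -> None:
        self.active[name] -= 1
```

A round-robin or load-reporting policy would slot into `acquire` equally well; least-connections simply needs no information beyond what the frontend already sees while proxying.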
SSH-based GridFTP (GridFTP-Lite)
[Figure: The client popens ssh, which authenticates to sshd (running as root) on port 22; sshd execs the GridFTP server as the user, and the session's stdin/stdout serve as the control channel.]
Data Channel Security for SSH-based GridFTP
• SSH-based GridFTP does not have data channel security
• Investigate and prototype a way for a client to send a shared secret to both the source and destination GridFTP servers
  • Used to secure the data channel(s) between the two servers
  • The shared secret can be used to authenticate, integrity-protect and encrypt the data channel
• This feature will increase the adoption of SSH-based GridFTP
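A minimal sketch of the integrity-protection part, assuming the servers tag each data block with HMAC-SHA256 keyed by the client's shared secret. The slide does not name a mechanism, so HMAC and binding the file offset into the tag are illustrative choices, and the function names are hypothetical. Encryption would need a cipher library outside Python's standard library and is not shown.

```python
import hashlib
import hmac
import secrets


def new_shared_secret() -> bytes:
    # Generated by the client and sent to both servers over their control channels.
    return secrets.token_bytes(32)


def tag_block(secret: bytes, offset: int, data: bytes) -> bytes:
    # Binding the file offset into the MAC stops a block from being
    # accepted at a different position in the file.
    msg = offset.to_bytes(8, "big") + data
    return hmac.new(secret, msg, hashlib.sha256).digest()


def verify_block(secret: bytes, offset: int, data: bytes, tag: bytes) -> bool:
    # Constant-time comparison avoids leaking how many tag bytes matched.
    return hmac.compare_digest(tag_block(secret, offset, data), tag)
```

Because only holders of the secret can produce valid tags, a correct tag also authenticates the sending server as one the client handed the secret to.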
GUMS Authorization Callout
• GUMS: Grid User Management System
  • Grid identity mapping service
  • Maps a grid identity to a local site identity
  • Used in OSG
[Figure: 1. The GridFTP client authenticates. 2. It requests data transfer operations. 3. The GUMS callout obtains the local identity for /DC=org/DC=doegrids/OU=People/CN=John Bresnahanz from the GUMS server (here, bresnaha). 4. The server accesses data on disk as that local identity.]
GUMS Authorization Callout
• Role-based authorization using a VOMS extended proxy
[Figure: 1. The GridFTP client authenticates with /DC=org/DC=doegrids/OU=People/CN=John Bresnahanz and the VOMS attribute /VO=ATLAS/Group=USATLAS/Role=developer. 2. It requests data transfer operations. 3. The GUMS callout obtains the local identity from the GUMS server (here, usatlasdev). 4. The server accesses data on disk as that local identity.]
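The mapping decision the callout relies on can be sketched locally. In the real system the callout queries a remote GUMS server; here two hypothetical dictionaries stand in for its policy, populated with the example identities from the slides, and letting a VOMS role take precedence over the plain DN is an assumption.

```python
from typing import Optional

# Hypothetical stand-ins for the GUMS server's mapping policy.
ROLE_MAP = {"/VO=ATLAS/Group=USATLAS/Role=developer": "usatlasdev"}
DN_MAP = {"/DC=org/DC=doegrids/OU=People/CN=John Bresnahanz": "bresnaha"}


def map_identity(dn: str, voms_attrs: tuple[str, ...] = ()) -> Optional[str]:
    """Return the local account for a grid identity, or None to deny access."""
    for attr in voms_attrs:          # VOMS attributes, in the order the proxy lists them
        if attr in ROLE_MAP:
            return ROLE_MAP[attr]    # role-based mapping wins when present
    return DN_MAP.get(dn)            # otherwise fall back to the certificate DN
```

Returning `None` rather than a default account keeps the callout fail-closed: an unmapped identity gets no local access.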
Quality of Service
• Information provider
• Provision end-point GridFTP resources
• Integrate network provisioning
• Integrate storage provisioning
• Co-schedule data transfer resources
Information Provider
• GridFTP information provider service, reporting:
  • Max connections
  • Open connections
  • Load
• Higher-level services can use this information to schedule data transfers
• Helps with selecting the appropriate replica of the data
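A higher-level service could use the three reported values for replica selection roughly as follows. This is a sketch under stated assumptions: servers at their connection limit are skipped, and the rest are ranked by load, then by connection utilization. `ServerInfo` and `pick_replica` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ServerInfo:
    """Hypothetical snapshot of one server's information-provider data."""
    url: str
    max_connections: int
    open_connections: int
    load: float


def pick_replica(replicas: list[ServerInfo]) -> ServerInfo:
    # Skip servers that cannot accept another connection.
    candidates = [r for r in replicas if r.open_connections < r.max_connections]
    if not candidates:
        raise RuntimeError("all replica servers are at their connection limit")
    # Prefer the least-loaded server; break ties by connection utilization.
    return min(candidates,
               key=lambda r: (r.load, r.open_connections / r.max_connections))
```

Filtering on the connection limit before ranking means a lightly loaded but full server is never chosen, since the transfer would be refused anyway.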
Provision End-point Resources
[Figure: The Data Movement Service (RFT replacement) provisions GridFTP through the GFTP Resource Broker, which uses the GridFTP Info Provider and the control channel ad; at the data point, the GridFTP server's Resource Limiter governs CPU, memory and bandwidth.]
Integrate Network Provisioning
[Figure: The Data Movement Service reserves bandwidth from a Network Reservation Service and receives a bandwidth token, then provisions GridFTP through the GFTP Resource Broker (using the GridFTP Info Provider and the control channel ad); at the data point, the GridFTP server's Resource Limiter governs CPU, memory and bandwidth.]
Integrate Storage Provisioning
[Figure: The Data Movement Service provisions bandwidth through the Network Reservation Service (receiving a bandwidth token), provisions GridFTP through the GFTP Resource Broker (using the GridFTP Info Provider and the control channel ad), and provisions storage through Lotman on the file system; at the data point, the GridFTP server's Resource Limiter governs CPU, memory and bandwidth.]
Co-schedule Data Transfer Resources
[Figure: The Data Movement Service provisions bandwidth through the Network Reservation Service and provisions GridFTP and storage resources at both the source data point and the destination data point.]