GridFTP Steve Tuecke Argonne National Laboratory
Overview • Motivation for GridFTP Working Group • Requirements • GridFTP Solution • GridFTP Working Group Documents • Role of GridFTP Working Group
GridFTP Working Group Motivation • Data transfer solutions have been developed by the Globus Project over the past ~5 years; GridFTP is the 3rd generation • Grid Forum started ~1 year ago to promote and develop Grid technologies • Critical mass of people working in this area • Grid Forum GridFTP working group formed to foster the further specification and development of GridFTP • Community effort to move GridFTP forward
Some Important Definitions • Resource • Network protocol • Network enabled service • Application Programmer Interface (API) • Syntax • Software Development Kit (SDK)
Resource • Entity that is to be shared • Includes computers, storage, data, software • Does not have to be physical entity • Condor pool, distributed file system, … • Defined in terms of interfaces, not devices • E.g. LSF defines compute resource • Open/close/read/write defines access to a distributed file system, e.g. NFS, AFS, DFS
Network Protocol • A formal description of message formats and a set of rules for message exchange • Rules may define sequence of message exchanges • Protocol may define state change in endpoint, e.g. file system state change • Good protocols designed to do one thing • Protocols can be layered • Examples of protocols • IP, TCP, TLS, FTP, HTTP, Kerberos
Network Enabled Services • Implementation of a protocol that defines a set of capabilities • Protocol defines interaction with service • All services require protocols • Not all protocols are used to provide services (e.g. IP, TLS) • Examples: FTP and Web servers • [Diagram: two protocol stacks. FTP Server: FTP Protocol, Telnet Protocol, TCP Protocol, IP Protocol. Web Server: HTTP Protocol, TLS Protocol, TCP Protocol, IP Protocol]
API (Application Programming Interface) • A specification for a set of routines to facilitate application development • Refers to definition, not implementation, e.g. there are many implementations of MPI • Spec often language-specific (or IDL) • Routine name, number, order and type of arguments; mapping to language constructs • Behavior or function of routine • Examples • GSS API, MPI
Syntax • A specification for how a defined set of information is encoded into bits • A syntax may be defined as part of a protocol or API • Protocol messages have defined syntax • A syntax may be used as API function argument • But syntax can also stand alone • Good syntax designed to do one thing • Syntaxes can be layered • Examples • XML, ASN.1, X.509, LDIF
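To illustrate that a syntax only fixes how information is encoded into bits, here is a minimal sketch that encodes the same record with two different syntaxes. The record and its field names (`offset`, `length`) are hypothetical and chosen only for illustration; neither encoding is taken from GridFTP.

```python
import json
import struct

# Hypothetical record: a byte range described by an offset and a length.
record = {"offset": 1024, "length": 4096}

# Syntax 1: JSON text, encoded as UTF-8 bytes.
as_json = json.dumps(record, sort_keys=True).encode("utf-8")

# Syntax 2: two unsigned 64-bit integers in network (big-endian) byte order.
as_binary = struct.pack("!QQ", record["offset"], record["length"])

# Both byte strings decode back to the same information.
from_json = json.loads(as_json.decode("utf-8"))
offset, length = struct.unpack("!QQ", as_binary)
from_binary = {"offset": offset, "length": length}
```

The information is identical in both cases; only the bits on the wire differ, which is why a protocol or API has to say which syntax it uses.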
SDK (Software Development Kit) • A particular instantiation of an API • SDK consists of libraries and tools • Provides implementation of API specification • Can have multiple SDKs for an API • Examples of SDKs • MPICH, Motif Widgets
Multiple APIs but a Single Protocol. Example: TCP/IP • Multiple APIs: BSD sockets, Winsock, System V streams, … • Different programs use different APIs • Interoperability: programs using different APIs can exchange information • [Diagram: two applications, one using the WinSock API and one using the Berkeley Sockets API, both over the TCP/IP protocol (reliable byte streams)]
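The point of this slide can be tried on one machine. The sketch below is an illustration only: it uses two Python APIs (the low-level `socket` module for the server, `asyncio` streams for the client) as stand-ins for the sockets and Winsock APIs named on the slide. The function names and the uppercase-echo behavior are made up for the example.

```python
import asyncio
import socket
import threading


def echo_upper_once(listener: socket.socket) -> None:
    """Server side, written against the low-level socket API."""
    conn, _ = listener.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data.upper())


async def ask(host: str, port: int, message: bytes) -> bytes:
    """Client side, written against the asyncio streams API."""
    reader, writer = await asyncio.open_connection(host, port)
    writer.write(message)
    await writer.drain()
    reply = await reader.read(1024)
    writer.close()
    await writer.wait_closed()
    return reply


def demo() -> bytes:
    # Port 0 asks the operating system for any free port on loopback.
    listener = socket.create_server(("127.0.0.1", 0))
    listener.settimeout(5)  # do not wait forever if the client never connects
    port = listener.getsockname()[1]
    server = threading.Thread(target=echo_upper_once, args=(listener,))
    server.start()
    try:
        return asyncio.run(ask("127.0.0.1", port, b"hello grid"))
    finally:
        server.join()
        listener.close()
```

Calling `demo()` exchanges one message between the two programs. They share no API, only TCP, and that shared protocol is what lets them interoperate.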
Single API, but Multiple Protocols. E.g., GSS-API • GSS-API provides portability: any correct program compiles & runs on a platform • Does not provide interoperability: all processes must link against same SDK • E.g., GSI and Kerberos versions of GSS-API • [Diagram: two applications, each calling GSS-API. One uses the GSI SDK and the GSI protocol, the other the Kerberos SDK and the Kerberos protocol, both over TCP/IP. The two protocols use different message formats, exchange sequences, etc.]
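A toy sketch of the same situation: one API, two incompatible wire formats behind it. This is not the real GSS-API, and `MechanismA` and `MechanismB` are hypothetical stand-ins rather than GSI or Kerberos; the sketch only shows why a shared API gives portability but not interoperability.

```python
import base64
from abc import ABC, abstractmethod


class Mechanism(ABC):
    """Toy stand-in for a single standard API (not the real GSS-API)."""

    @abstractmethod
    def wrap(self, message: bytes) -> bytes: ...

    @abstractmethod
    def unwrap(self, token: bytes) -> bytes: ...


class MechanismA(Mechanism):
    """Hypothetical wire format: an 'A1' tag followed by the raw payload."""

    def wrap(self, message: bytes) -> bytes:
        return b"A1" + message

    def unwrap(self, token: bytes) -> bytes:
        if not token.startswith(b"A1"):
            raise ValueError("not a MechanismA token")
        return token[2:]


class MechanismB(Mechanism):
    """Hypothetical wire format: a 'B1:' tag followed by a base64 payload."""

    def wrap(self, message: bytes) -> bytes:
        return b"B1:" + base64.b64encode(message)

    def unwrap(self, token: bytes) -> bytes:
        if not token.startswith(b"B1:"):
            raise ValueError("not a MechanismB token")
        return base64.b64decode(token[3:])


def round_trip(sender: Mechanism, receiver: Mechanism, message: bytes) -> bytes:
    """Application code: written once against the API, unaware of the mechanism."""
    return receiver.unwrap(sender.wrap(message))
```

`round_trip` works unchanged with either mechanism as long as both ends use the same one. Pairing `MechanismA` with `MechanismB` raises `ValueError`, because the two ends no longer speak the same protocol.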
I.e., Standard APIs and Protocols are Both Important: For Different Reasons • Standard APIs/SDKs are important • They enable application portability • But w/o standard protocols, interoperability is hard (every SDK speaks every protocol?) • Standard protocols are important • Enable cross-site interoperability • Enable shared infrastructure • But w/o standard APIs/SDKs, application portability is hard (different platforms access protocols in different ways)
Grid Data Needs • Transfer of large amounts of data (petabytes or terabytes) between storage systems • Access to large amounts of data (terabytes or gigabytes) by many geographically distributed applications and users for analysis, visualization, etc.
Requirements • Grid Security Infrastructure (GSI) and Kerberos support • Third-party control of data transfer • Parallel data transfer • Striped data transfer • Partial file transfer • Automatic negotiation of TCP buffer/window size • Support for reliable/recoverable data transfer
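Partial and parallel transfer both come down to moving independent byte ranges. The sketch below shows the idea with a local file copy standing in for the network; it is not the GridFTP wire protocol, and `split_ranges`, `copy_range`, and `parallel_copy` are illustrative names.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def split_ranges(size: int, parts: int) -> List[Tuple[int, int]]:
    """Return (offset, length) pairs that cover [0, size) without overlap."""
    base, extra = divmod(size, parts)
    ranges, offset = [], 0
    for i in range(parts):
        length = base + (1 if i < extra else 0)
        if length:
            ranges.append((offset, length))
        offset += length
    return ranges


def copy_range(src: str, dst: str, offset: int, length: int) -> None:
    """Partial transfer: copy one byte range to the same offset in dst."""
    with open(src, "rb") as fin, open(dst, "r+b") as fout:
        fin.seek(offset)
        fout.seek(offset)
        fout.write(fin.read(length))


def parallel_copy(src: str, dst: str, streams: int = 4) -> None:
    """Parallel transfer: move each range on its own worker, as one stream would."""
    size = os.path.getsize(src)
    with open(dst, "wb") as f:
        f.truncate(size)  # pre-size so every worker can write at its offset
    with ThreadPoolExecutor(max_workers=streams) as pool:
        futures = [
            pool.submit(copy_range, src, dst, off, n)
            for off, n in split_ranges(size, streams)
        ]
        for fut in futures:
            fut.result()  # re-raise any worker error
```

Because each range is self-describing, a failed range can be retried on its own, which is also the basis for recoverable transfer.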
Candidate Standards • FTP • Defined by a set of IETF RFCs • No partial file, parallel/striped, GSI, etc. • Separate control & data channels • WebDAV • New extension to HTTP • No third-party transfer, parallel/striped, etc. • Combined control & data channel
Separate Control & Data Channels • WebDAV combines control and data over single channel • FTP splits control and data • Supports multiple, user-selectable data channel protocols • Advantage to split channels • Third-party transfers handled cleanly • Can (cleanly) define new data channel protocols • E.g. parallel/striped transfer, automatic TCP buffer/window negotiation • Amenable to high-performance proxies • E.g. for firewalls, load balancing, etc.
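With split channels, a third party can hold control connections to two servers and have them open the data connection between themselves. A minimal sketch of that command sequence, assuming the standard RFC 959 `PASV` and `PORT` commands: the function names are illustrative, and the code only builds command strings rather than talking to real servers.

```python
import re
from typing import List, Tuple

_PASV_RE = re.compile(r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)")


def parse_pasv_reply(reply: str) -> Tuple[str, int]:
    """Extract (host, port) from a reply such as
    '227 Entering Passive Mode (10,0,0,5,195,80)'."""
    if not reply.startswith("227"):
        raise ValueError("not a 227 reply")
    match = _PASV_RE.search(reply)
    if match is None:
        raise ValueError("no address in reply")
    nums = [int(g) for g in match.groups()]
    if any(n > 255 for n in nums):
        raise ValueError("address field out of range")
    host = ".".join(str(n) for n in nums[:4])
    port = nums[4] * 256 + nums[5]
    return host, port


def port_command(host: str, port: int) -> str:
    """Build the PORT command that points a server at host:port."""
    return f"PORT {host.replace('.', ',')},{port >> 8},{port & 0xFF}"


def third_party_plan(pasv_reply_from_dest: str, path: str) -> List[Tuple[str, str]]:
    """Commands the controlling client sends after the destination answered PASV.
    Each entry is (which control connection, command)."""
    host, port = parse_pasv_reply(pasv_reply_from_dest)
    return [
        ("source", port_command(host, port)),  # source will connect to destination
        ("destination", f"STOR {path}"),       # destination waits for incoming data
        ("source", f"RETR {path}"),            # source opens the data channel and sends
    ]
```

The file data never passes through the controlling client; only the short control messages do.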
GridFTP Solution • Built on existing FTP standards • RFC 959: File Transfer Protocol • RFC 2228: FTP Security Extensions • RFC 2389: Feature Negotiation for the File Transfer Protocol • Draft: FTP Extensions • Extends standards with • Additions to security extensions, partial file transfer, parallel/striped transfer, TCP buffer/window size tuning
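TCP buffer/window tuning matters because a single TCP stream can keep at most one window of data in flight per round trip. A common rule of thumb, stated here as background rather than as part of the GridFTP drafts, is to size the buffer to the bandwidth-delay product. The sketch below computes it and applies it with the standard socket options; the helper names are illustrative.

```python
import socket


def bdp_bytes(bandwidth_bits_per_s: int, rtt_ms: int) -> int:
    """Bandwidth-delay product in bytes, rounded up.
    bits/s * ms / 1000 gives bits in flight; / 8 gives bytes."""
    return -(-bandwidth_bits_per_s * rtt_ms // 8000)


def tuned_socket(buffer_bytes: int) -> socket.socket:
    """Create a TCP socket and request send/receive buffers of the given size.
    The operating system may clamp or adjust the requested values."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, buffer_bytes)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, buffer_bytes)
    return sock
```

For example, a 1 Gbit/s path with a 50 ms round-trip time gives `bdp_bytes(1_000_000_000, 50)` = 6,250,000 bytes, far above typical default buffer sizes, which is why untuned transfers over long fast links tend to run well below line rate.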
GridFTP Implementation Status • Modified wu-ftpd server • Most features • Modified ncftp client • Security, TCP buffer setting • Modified HPSS & Unitree ftpd server • Security • Globus Toolkit client and server SDKs, and command line tools • Most features • Striped FTP server (aka DPSS2)
GridFTP Working Group Documents • GridFTP: A Data Transfer Protocol for the Grid • Overview of working group activities and documents • Requirements • Informational draft • GridFTP: FTP Extensions for the Grid • Protocol specification
GridFTP Protocol Specifications • Existing standards • RFC 959: File Transfer Protocol • RFC 2228: FTP Security Extensions • RFC 2389: Feature Negotiation for the File Transfer Protocol • Draft: FTP Extensions • New drafts • GridFTP: FTP Extensions for the Grid
GridFTP APIs • Should there be standard API(s)? • Posix I/O • SRB client • grid_storage • globus_ftp_client • MPI-IO • HDF5 • etc • Beyond scope of this working group • Common protocol beneath these APIs would allow interoperability
Role of GridFTP Working Group • Bring together those who are interested in the future of GridFTP to help foster the… • continued specification and standardization of GridFTP • development of inter-operable GridFTP implementations • widespread adoption of GridFTP as a transfer protocol for the Grid • Develop drafts which together define GridFTP • May submit some of them to IETF • Move GridFTP forward to better address Grid data transfer requirements
NOT Goals of GridFTP Working Group • This working group will not start from first principles • Starting point is roughly GridFTP as it now exists • FTP base is assumed • It's not design by committee • Seeking rough consensus, with broad input • Draft authors and WG chair have final say
GF5 GridFTP Working Session • Is this appropriate for Grid Forum? • Who is interested in participating, and in what capacity? • Is the problem scoped appropriately (at least for now)? • What are the right drafts to write? • Establish rough timeline for drafts
A Call To Arms • The Grid Forum data working group needs to do more than just gather 3 times a year to chat about data management. • But Grid Forum is only appropriate for this activity if people meaningfully participate. • I will be doing this regardless. • But it will hopefully be done better and faster with broad participation. • If there is not meaningful participation, I won’t bother with the overhead of Grid Forum.