HTTP as a better file transfer protocol default for SRM
By Owen Synge

Reason for SRM and default transfer protocol
• Service-orientated APIs
  • Provide specifications separate from the implementation
  • Allow API specifications to be compared
  • Facilitate interoperability
  • Service-orientated APIs are well defined
• SRM design goals
  • Interoperability
    • Between implementations
    • Between sites
    • Facilitating site-neutral storage access
  • Performance
• Transfer protocol defaults
  • Let implementations know which protocols should/shall be implemented, allowing interoperability

Protocol introduction
• FTP
  • http://www.w3.org/Protocols/rfc959/
  • FTP is a stateful protocol
  • Concepts of current working directory, control and data channels, etc.
  • Initially intended for interactive use, now more general
• HTTP (v1.1)
  • http://www.w3.org/Protocols/rfc2616/rfc2616.html
  • HTTP is a stateless protocol
  • One connection, all paths absolute, etc.
  • Primarily intended to be managed within an application, typically a web browser on the client side
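
The stateful/stateless distinction above can be illustrated with a small toy model. This is only a sketch, not an FTP or HTTP implementation: the class `FtpLikeSession` and the function `http_like_resolve` are hypothetical names introduced here, and the paths are made up. It shows that an FTP-style server must remember a current working directory between commands, while an HTTP-style request carries its full path every time.

```python
import posixpath


class FtpLikeSession:
    """Toy model of the per-connection state an FTP-style server keeps."""

    def __init__(self):
        self.cwd = "/"  # state that persists between commands

    def change_directory(self, path):
        # Like CWD: changes how every later relative name is interpreted.
        self.cwd = posixpath.normpath(posixpath.join(self.cwd, path))

    def resolve(self, name):
        # Like RETR/STOR: the argument is resolved against the stored directory.
        return posixpath.normpath(posixpath.join(self.cwd, name))


def http_like_resolve(request_path):
    # An HTTP-style request names its target with an absolute path,
    # so no earlier command influences the result.
    return posixpath.normpath(request_path)


session = FtpLikeSession()
session.change_directory("data/run1")
print(session.resolve("file.dat"))               # depends on the earlier change_directory
print(http_like_resolve("/data/run1/file.dat"))  # depends only on this request
```

In the toy model, losing the session object loses the meaning of every relative name, which is the kind of server-side state the slides argue against.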

FTP Overview
• FTP control channel
  • Follows the Telnet protocol
  • Specifies parameters for the data connection: data port, transfer mode, representation type, and structure
  • Data connection can be between two remote hosts (third-party copy)
  • Must be open while data is transmitted
• FTP data channel
  • Can operate in both directions simultaneously
  • Modal in behaviour
    • Data types (ASCII, EBCDIC, Image)
    • Data structures (File, Record, Page)
  • Default connection management behaviour may be overridden
  • The client or the server may listen for transfer connections initiated by the other side

HTTP Overview
• HTTP is a request/response protocol
• A client sends a request to the server
  • URI and protocol version, followed by a MIME-like message containing request modifiers, client information, and possible body content, over a connection with a server
• The server responds
  • Status line, including the message's protocol version and a success or error code
  • A MIME-like message containing server information, entity metainformation, and possible entity-body content
• Can interact through proxies and caches
• In HTTP/1.0, a new connection per request/response
• In HTTP/1.1, a connection may be reused for many requests
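
The request/response cycle and HTTP/1.1 connection reuse described above can be sketched with the Python standard library alone. The handler `DemoHandler`, the function `fetch_twice`, and the paths `/a` and `/b` are illustrative assumptions, not part of any SRM implementation; the server is a throwaway one bound to the loopback interface.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets one connection serve several requests

    def do_GET(self):
        body = b"hello from " + self.path.encode("ascii")
        self.send_response(200)  # status line: version + status code
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)   # entity body

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_twice():
    """Send two GET requests over a single connection to a local demo server."""
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
    results = []
    try:
        for path in ("/a", "/b"):
            conn.request("GET", path)
            resp = conn.getresponse()
            results.append((resp.version, resp.status, resp.read()))
    finally:
        conn.close()
        server.shutdown()
        server.server_close()
    return results


if __name__ == "__main__":
    for version, status, body in fetch_twice():
        print(version, status, body)
```

Each tuple holds the protocol version reported by the server (`http.client` reports HTTP/1.1 as `11`), the status code, and the body; both requests travel over the same connection object.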

FTP Diagram
Control channel negotiates data transfer streams between client and server systems.
[Diagram: User Interface, User Protocol Interpreter, User DTP and File System on the client side; Server Protocol Interpreter, Server DTP and File System on the server side. The protocol interpreters are linked by the Control Channel and the DTPs by Dynamic Data Transfer Streams.]

HTTP Diagram
The client initiates a connection with the server, and the server then returns a response. The simple interface makes use of headers in the request and response to indicate the content and data being transported. This simplifies connection initiation considerably, making initiating an HTTP request far faster than FTP.
[Diagram: a Client connected to a Server System; parallel transfers to a Server System initiated by multiple client requests, each using "Ranging".]
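
The "Ranging" mentioned above refers to HTTP/1.1 byte ranges, where each parallel request asks for one slice of the file through a `Range` header. A minimal sketch of how a client might divide a file between streams follows; the function name `split_into_ranges` is hypothetical, and the file size and stream count are example values.

```python
def split_into_ranges(total_size, streams):
    """Return Range header values that together cover total_size bytes."""
    if total_size <= 0 or streams <= 0:
        raise ValueError("total_size and streams must be positive")
    streams = min(streams, total_size)  # never create an empty range
    base, extra = divmod(total_size, streams)
    ranges = []
    start = 0
    for i in range(streams):
        length = base + (1 if i < extra else 0)
        end = start + length - 1  # byte-range end offsets are inclusive
        ranges.append(f"bytes={start}-{end}")
        start = end + 1
    return ranges


# Each value would be sent as the Range header of one parallel GET request.
print(split_into_ranges(10, 3))  # ['bytes=0-3', 'bytes=4-6', 'bytes=7-9']
```

A server that honours the header answers each request with status 206 (Partial Content), and the client reassembles the slices in offset order.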

Comparison of HTTP and FTP for an SRM 1
• Performance
  • Availability of high-performance servers
    • WuFTP and Apache are both highly performant
    • Apache uses less CPU than WuFTP for the same throughput
    • Little to no difference in measured maximum throughput
    • Both support parallel data streams
    • HTTP is marginally faster with small files
  • Results soon; Manchester is focusing on HTTP third-party copy
• Ambiguity/SRM functionality duplication
  • FTP provides much of the SRM API functionality, including directory management and file management
  • HTTP only duplicates Delete
  • Fewer entry points lead to more consistent use
  • Less functionality to develop/test with HTTP as the default transfer protocol

Comparison of HTTP and FTP for an SRM 2
• Availability of client libraries
  • libCURL is a well-tested HTTP/FTP library
  • Unmodified libCURL can provide parallel data streams over HTTP. This may also be true for FTP.
• Consistency of access
  • SOAP is typically used over HTTP rather than FTP
  • Single point of network connectivity if HTTP is used
    • Authentication is needed only once in the code base
    • Less server code required
  • The FTP specification is large and has minor ambiguities
  • HTTP v1.1 is well defined and simpler

Comparison of HTTP and FTP for an SRM 3
• FTP networking issues not found with HTTP
  • FTP supports active and passive transfers
    • The FTP protocol requires clients to act as servers, so inbound connectivity may be requested by clients
    • System administrators often inspect listening ports to establish whether a service has been compromised
    • This required support is not a problem if data channels are set to listen at the server end and are initiated by the client
  • GsiFTP requires a port range for its data transfer streams
    • Port ranges cause deployment issues and repeated contact with site firewall administrators
  • Firewalls often interpret unencrypted FTP control channels (not with BBFTP or GsiFTP, as they are encrypted)
• End user issues caused by firewall + FTP
  • The administrator opens only part of the port range, causing issues under high load only and making the problem difficult to debug
  • Clients inadvertently use active FTP on sites where it is blocked
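
The dynamic data ports behind the active/passive issues above are negotiated in-band on the control channel. RFC 959 encodes an endpoint as six comma-separated numbers `h1,h2,h3,h4,p1,p2`, with the port equal to `p1 * 256 + p2`. The sketch below shows that encoding; the function names and the addresses are illustrative assumptions, and the exact wording of a server's 227 reply varies between implementations.

```python
import re


def parse_pasv_reply(reply):
    """Extract (host, port) from a passive-mode reply such as
    '227 Entering Passive Mode (192,168,1,2,195,80)'."""
    match = re.search(r"(\d+),(\d+),(\d+),(\d+),(\d+),(\d+)", reply)
    if match is None:
        raise ValueError("no h1,h2,h3,h4,p1,p2 field found in reply")
    numbers = [int(group) for group in match.groups()]
    if any(n > 255 for n in numbers):
        raise ValueError("each field must fit in one byte")
    host = ".".join(str(n) for n in numbers[:4])
    port = numbers[4] * 256 + numbers[5]  # high byte, then low byte
    return host, port


def format_port_command(host, port):
    """Build the PORT command an active-mode client sends so that the
    server can connect back to it (the inbound connection firewalls block)."""
    high, low = divmod(port, 256)
    return "PORT " + ",".join(host.split(".") + [str(high), str(low)])


print(parse_pasv_reply("227 Entering Passive Mode (192,168,1,2,195,80)"))
print(format_port_command("10.0.0.5", 50000))
```

Because the data port only appears inside the control-channel text, a firewall has to read that channel to open the right port, which it cannot do when the channel is encrypted.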

Comparison of HTTP and FTP for an SRM 4
• Security
  • User authentication
    • FTP has the concept of user authentication within its protocol
      • Redundant, as PKI infrastructures, not user name/password systems, are the basis of Grids
    • HTTP must be wrapped in an authentication layer to provide user authentication
  • Request transformation
    • FTP takes its content from the file system
      • Within Globus GsiFTP, authz acts as an intermediary
    • HTTP may take its content from the file system, but dynamic content is assumed in the specification

Stakeholders
• SRM end user/client developer
  • SRM end users want a consistent way to share data between tape and disk systems
• SRM server developer
  • Maintains, implements and creates new functionality within their own SRM implementation
• SRM administrator
  • Responsible for maintaining and implementing an SRM service, which may use one of the many SRM implementations. This service will require one or more computers, which may aggregate into a stand-alone system.
• Site networking administrators
  • Want to keep network traffic under control through a site firewall or alternative practices

SRM end user/client developer preferences
• A default data transfer protocol is required
  • This prevents client applications from failing, as the lowest-common-denominator transfer protocol will always exist. For example, it prevents clients that support only HTTP from failing when they connect to servers that support only FTP protocols.
• HTTP v FTP
  • A client developer is unlikely to care whether HTTP or FTP is the default protocol
  • Good client libraries exist for both protocols in many languages
  • HTTP and FTP both have the possibility of persisting security contexts
  • HTTP is marginally simpler to develop with, as it is less stateful and has fewer error scenarios
  • HTTP is simpler, so there is less possibility of failure
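
The argument for a mandated default can be made concrete with a small sketch of protocol selection. Nothing here is taken from the SRM specification: the function `choose_transfer_protocol`, the constant `DEFAULT_PROTOCOL`, and the protocol names are hypothetical and only illustrate why a guaranteed common protocol removes the failure case.

```python
DEFAULT_PROTOCOL = "http"  # assumed default, for illustration only


def choose_transfer_protocol(client_protocols, server_protocols):
    """Return the first protocol, in client preference order, that the
    server also offers, or None when the two sides share nothing."""
    for protocol in client_protocols:
        if protocol in server_protocols:
            return protocol
    return None


# Without a mandated default, the two sides may have nothing in common.
print(choose_transfer_protocol(["http"], ["gsiftp"]))

# If every implementation must also offer the default, a match always exists.
server = ["gsiftp", DEFAULT_PROTOCOL]
client = ["bbftp", DEFAULT_PROTOCOL]
print(choose_transfer_protocol(client, server))
```

The first call returns `None`, which is the failure the slide describes; the second returns the shared default even though the preferred protocols differ.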

SRM server developer preferences
• HTTP support is already required to support SOAP, as defined in the SRM specification
  • Fewer servers means less work
• HTTP is stateless
  • This makes the system more scalable, as less server-side state must be stored
• FTP is a richer protocol which assumes a file system
  • So we need an inter-position library to provide the file system redirection or AuthZ event handling
  • Requires considerable functionality and code
• FTP has file management functionality
  • Do I have to honour FTP file renaming?
  • Or should FTP just transfer data?

SRM administrator preferences
• Dynamic port usage
  • HTTP is on a single port
  • FTP is dependent on dynamic port selection
    • Are all these ports in use, or have I been hacked?
  • Server firewalls are simpler
• Certificate to UID mapping
  • FTP assumes mapping to the server user accounts, making management harder
    • This can be bypassed, but developers must do extra work to manage users
  • HTTP does not couple server accounts to the protocol

Network/firewall administrator preferences
• Inbound connectivity
  • HTTP does not require inbound connectivity to clients
  • FTP may request inbound connectivity to clients
• Dynamic port use
  • HTTP does not use dynamic ports
  • FTP does make use of dynamic data transfer ports
    • Typically a range of 1000 for GsiFTP
• Provided HTTP is not used on ports 80 or 8080, it is generally felt that it would simplify data transfer through the firewall and the firewall rules

Summary: HTTP will provide real benefit over FTP-based protocols
• We should state a default data transfer protocol to aid interoperability
  • So clients can expect a transfer protocol
• All stakeholders will gain
  • Mostly in deployment, administration and development
• FTP couples user, directory, and file management
  • Overrides within implementations produce extra code/testing
  • Extra admin work if FTP functions are not overridden
• Some improvements in scalability expected
• HTTP is a well-understood stateless protocol
  • More widely deployed than stateful FTP
• Some Grid applications have already made the switch from GsiFTP to Gsi-authenticated HTTP (LHC Resource Broker)
  • For reasons of performance and the development effort required