Grid HTTP/HTTPS extensions 16 December 2002 Andrew McNab, University of Manchester mcnab@hep.man.ac.uk
Overview • HTTPS as a grid protocol • HTTP as a data protocol • Multistream HTTP: curl-url-get • Grid HTTP/HTTPS usage • G-HTTPS • Trusted Caches • fileGridSite HTTPS server • Third Party Transfers • curlfs for SlashGrid • Summary
HTTPS as a Grid protocol • HTTPS is an interesting and important protocol for several reasons: • it is by far the most widely deployed secure protocol • has a large amount of high quality software that we could leverage • has excellent interaction with firewalls, Network Address Translation and application proxies • has the potential to solve some of the problems sites have with private IP farms • along with HTTP, is the basis for Web and Grid Services • HTTPS consists of HTTP/1.1 over an SSL connection • security done by SSL layer, using X509 certificates (including GSI) • HTTP/1.1 (RFC 2616) and extensions like WebDAV (RFC 2518) have a rich set of methods (GET, PUT, DELETE, COPY etc), headers (“Expires:” etc) and error codes (“413 Request Entity Too Large”) • so a standard way exists for many of the transfer operations we need
HTTP as a data protocol • Same advantages as HTTPS: large amount of existing high quality software, and good operation with Firewalls, NAT etc. • If we build secure HTTPS information/control services, easy to provide HTTP data services: • Do GET during HTTPS session, but server responds with redirect to HTTP data server? • So GridFTP Control & Data channels --> HTTPS Negotiate and HTTP Data connections • Kernel-based “zero-copy” HTTP servers like tux are very efficient • need to do something like that to fully use a machine’s gigabit interface • HTTP connection and a GridFTP data channel are same at TCP layer • but may want a way to specify TCP parameters to be used by HTTP server responding with data
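The redirect idea on this slide could look roughly like the following on the control server. This is a minimal sketch, not a described implementation: the function name `data_redirect`, the `data_host` parameter, and the choice of status 302 are assumptions, since the slide only says "redirect".

```python
from urllib.parse import urlsplit, urlunsplit


def data_redirect(https_url, data_host):
    """Answer a GET made over HTTPS with a redirect to a plain HTTP data server.

    `data_host` is a hypothetical name for the machine serving bulk data.
    The path and query are kept; only the scheme and host change.
    """
    parts = urlsplit(https_url)
    location = urlunsplit(("http", data_host, parts.path, parts.query, ""))
    # 302 is an assumed status code; the slides do not name one.
    return 302, {"Location": location}
```

The client would then open an ordinary HTTP connection to the `Location` URL for the bulk transfer, while the HTTPS session stays available for negotiation.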
Multistream HTTP • HTTP can support application-level multiple streams and striping by using the standard Range: header from RFC 2616 (HTTP/1.1) to set up many partial fetches. • This mechanism is supported by almost all modern web servers • eg Apache and RedHat’s tux kernel httpd • Multiple streams implemented by client splitting into threads • Each thread requests a block of the file from the server • As each request completes, thread finds next unfetched block and requests it • Striping by doing the same mechanism, but with more than one server • curl-url-get demonstrates both of these • source is 300 lines of C, in EDG CVS
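The block-fetching scheme described above can be sketched in a few lines. This is not the curl-url-get source (which is C); it is an illustrative Python sketch, and the names `split_into_blocks`, `http_fetch_range` and `multistream_get` are made up for the example. It assumes the file size is already known (for instance from a HEAD request) and that the server honours `Range:`.

```python
import threading
import urllib.request


def split_into_blocks(size, block_size):
    """Return inclusive (start, end) byte offsets covering `size` bytes."""
    return [(start, min(start + block_size, size) - 1)
            for start in range(0, size, block_size)]


def http_fetch_range(url, start, end):
    """Fetch bytes start..end (inclusive) with the HTTP/1.1 Range header."""
    req = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
    with urllib.request.urlopen(req) as resp:
        return resp.read()


def multistream_get(size, block_size, streams, fetch_range):
    """Fetch `size` bytes using `streams` worker threads.

    Each worker repeatedly claims the next unfetched block and requests it,
    until no blocks remain. Error handling and retries are left out.
    """
    blocks = split_into_blocks(size, block_size)
    results = [None] * len(blocks)
    next_index = 0
    lock = threading.Lock()

    def worker():
        nonlocal next_index
        while True:
            with lock:  # claim the next unfetched block
                if next_index >= len(blocks):
                    return
                i = next_index
                next_index += 1
            start, end = blocks[i]
            results[i] = fetch_range(start, end)

    threads = [threading.Thread(target=worker) for _ in range(streams)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return b"".join(results)
```

For a single server, `fetch_range` could be `lambda s, e: http_fetch_range(url, s, e)`. Striping fits the same shape: `fetch_range` would pick one of several server URLs for each block.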
curl-url-get examples • Rough tests done, copying files from Manchester to CERN • elapsed times in seconds, average of 10 copies of each type, alternated
Size | curl-url-get | globus-url-copy | streams
292M | 64.6±6.1 | 62.1±4.9 | 20
292M | 96.0±9.2 | 74.8±3.8 | 5
29M | 7.1±1.4 | 6.9±1.8 | 20
29M | 31.6±0.4 | 15.9±0.9 | 1
2.9M | 0.49±0.07 | 2.24±0.10 | 20
2.9M | 3.30±0.16 | 2.61±0.18 | 1
2.9K | 2.15±0.04 (single value, column not recoverable) | 20
2.9K | 0.11±0.00 | 1.05±0.10 | 1
Extensions to HTTPS/HTTP • HTTPS/HTTP already have most of the functionality we need for Grid information/control/data transport • some of these come from several sources (eg the WebDAV RFC 2518, not just HTTP/1.1 itself) and can be done different ways • so want to specify a sufficient subset for interoperability • However, can identify some extensions that are also needed: • delegation to HTTPS • some way of returning access control information along with data • other metadata too • may want to specify TCP parameters for bulk data transfer
“G-HTTPS” • A proposal by Akos and me, for backwards compatible extensions to HTTPS • discussed on wp2-sec and wp7-security lists • Adds GSI proxy delegation to HTTPS using additional methods (eg PUT-PROXY) and headers (eg Delegation-ID) • Allows services to return generalised metadata in headers or by URL • initially this allows services to return the GACL ACL of a response for more efficient caching (ie sharing cached copies with other users.) • essential to include expiration and caching policy information too • Aim is to avoid breaking existing HTTPS systems and to achieve “pass through” compatibility: • even if HTTPS client or server software doesn’t understand extensions, they can make them available to the application which does
Example of delegation by HTTPS • Client issues GET-PROXY-REQ request, perhaps with a message body specifying any extensions required in the proxy cert • Server generates a key and a certificate request, returns this in the response message body. • Client signs this, and returns it in the body of a PUT-PROXY request • Need a Delegation-ID header in the above exchanges so can keep track of the delegation session • may want to maintain delegation sessions for the same user at one server, but with different amounts of delegation • Subsequent GET, PUT etc actions carry on using the Delegation-ID • Non G-HTTPS server will respond with “501 Method not implemented” to above methods
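The exchange above can be traced with a toy server-side state machine. This is only a sketch of the message flow: it does no real cryptography (keys, certificate requests and proxies are placeholder bytes), and the class name `DelegationEndpoint`, the 200 and 400 status codes, and the rule that the server invents a Delegation-ID when the client sends none are all assumptions. Only the method names, the Delegation-ID header and the 501 response come from the slide.

```python
import secrets

EXTENSION_METHODS = ("GET-PROXY-REQ", "PUT-PROXY")


class DelegationEndpoint:
    """Toy model of the delegation methods only; GET, PUT etc are not modelled."""

    def __init__(self, g_https=True):
        self.g_https = g_https
        self.pending = {}  # Delegation-ID -> placeholder private key
        self.proxies = {}  # Delegation-ID -> signed proxy sent by the client

    def handle(self, method, headers, body=b""):
        """Return (status, headers, body) for one delegation request."""
        if method not in EXTENSION_METHODS or not self.g_https:
            # What a non G-HTTPS server answers to the new methods.
            return 501, {}, b"Method not implemented"
        # Assumption: reuse the client's ID, or make one up if it is missing.
        delegation_id = headers.get("Delegation-ID") or secrets.token_hex(8)
        reply = {"Delegation-ID": delegation_id}
        if method == "GET-PROXY-REQ":
            # Server generates a key and a certificate request.
            self.pending[delegation_id] = b"<private key placeholder>"
            return 200, reply, b"<certificate request placeholder>"
        # PUT-PROXY: the client returns the signed request in the body.
        if delegation_id not in self.pending:
            return 400, reply, b"no pending request for this Delegation-ID"
        del self.pending[delegation_id]
        self.proxies[delegation_id] = body
        return 200, reply, b""
```

Keying both tables on the Delegation-ID is what lets one user hold several delegation sessions at the same server, each with a different amount of delegation.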
Application of delegation: Trusted Caches • Many information services are going to need delegation, but Trusted Caches are one purely file transfer application of this • Existing HTTPS isn’t cache-able: • needs a direct connection from client to origin server for the trust mechanism to work • So the best you get is opaque proxying/tunneling of SSL • With delegation, can improve this. The client: • identifies a caching server it trusts (in its VO maybe?) • delegates a credential to it • makes an HTTP proxy request via HTTPS: GET http://a.b.c/def • caching server fetches this using delegated credential, gives it to client • if can get an ACL for this file, may be able to return file from cache in subsequent requests • also means that only real HTTPS works, not other things hidden in SSL
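The caching decision in the last steps can be sketched as follows. This is an illustration under simplifying assumptions, not the proposed design: the ACL is reduced to a set of certificate subject names rather than a real GACL document, and `TrustedCache`, `CacheEntry` and `fetch_from_origin` are hypothetical names. `fetch_from_origin` stands in for a fetch made with the credential the user delegated.

```python
import time
from dataclasses import dataclass


@dataclass
class CacheEntry:
    body: bytes
    allowed_readers: frozenset  # subject names the origin's ACL lets read
    expires_at: float           # absolute expiry time, seconds since the epoch


class TrustedCache:
    def __init__(self, fetch_from_origin):
        # fetch_from_origin(url, user) -> (body, allowed_readers, expires_at)
        self.fetch_from_origin = fetch_from_origin
        self.entries = {}

    def get(self, url, user, now=None):
        """Return (body, source) where source is "cache" or "origin"."""
        now = time.time() if now is None else now
        entry = self.entries.get(url)
        # Serve from cache only if the copy is fresh AND the ACL returned
        # with it says this user may read it.
        if entry and now < entry.expires_at and user in entry.allowed_readers:
            return entry.body, "cache"
        body, readers, expires_at = self.fetch_from_origin(url, user)
        self.entries[url] = CacheEntry(body, frozenset(readers), expires_at)
        return body, "origin"
```

A user who is not covered by the cached ACL always goes back to the origin server, which makes its own access decision; the cache never grants more than the origin told it to.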
fileGridSite • Read (GET) well supported by HTTPS servers. • However, write (PUT, DELETE, MOVE, COPY) usually left to CGI programs, servlets etc. • Access control also usually limited to client IP or HTTP passwords. • fileGridSite adds Grid authorisation and write operation support to Apache • a cut-down version of GridSite (used for https://marianne.in2p3.fr) • file rather than webpage orientated (no fancy headers on HTML etc) • uses GACL to handle the Access Control Lists • can work with mod_ssl-GSI so clients can authenticate with a GSI proxy • Turns an Apache webserver into a Grid HTTPS fileserver with the key functionality of a GridFTP server.
fileGridSite examples with curl • Curl is a standard HTTP/HTTPS command line client (cf wget)
• Get a file using GSI proxy in /tmp/x509up_u100:
  curl --capath /etc/grid-security/certificates/ --cert /tmp/x509up_u100 https://a.b.com/example1.txt
• Copy a file to the fileGridSite server with HTTP PUT:
  curl --capath /etc/grid-security/certificates/ --cert /tmp/x509up_u100 --upload-file /tmp/example2.txt https://a.b.com/example2.txt
• Delete a file with HTTP DELETE:
  curl --capath /etc/grid-security/certificates/ --cert /tmp/x509up_u100 --request DELETE https://a.b.com/example2.txt
• Create a directory with PUT to …/
  curl --capath /etc/grid-security/certificates/ --cert /tmp/x509up_u100 --request PUT https://a.b.com/newdir/
Adding delegation to fileGridSite • Doing this as a demonstration of G-HTTPS extensions • Delegation needed for Third Party Transfers • Use COPY from WebDAV (RFC 2518), which allows source or destination to be absolute URLs • Spec actually allows “fourth party” too, involving two remote URLs and the transfer being tunneled through the server. • Delegation also useful for fileservers which need credentials to access local storage • eg to get a token for the local AFS cell (Lyon have had to work around this with GridFTP servers)
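A third party transfer request of this kind might be built as below. The COPY method and its Destination header are from RFC 2518; attaching a Delegation-ID header to it is an assumption based on the G-HTTPS proposal, and the function name `build_copy_request` and the destination host in the usage note are hypothetical.

```python
from urllib.parse import urlsplit


def build_copy_request(source_url, destination_url, delegation_id=None):
    """Return (host, request_bytes) for a WebDAV COPY sent to the source server."""
    src = urlsplit(source_url)
    lines = [
        f"COPY {src.path or '/'} HTTP/1.1",
        f"Host: {src.netloc}",
        # RFC 2518: Destination carries the absolute URL to copy to.
        f"Destination: {destination_url}",
    ]
    if delegation_id is not None:
        # Assumed: refers to a credential delegated earlier, so the source
        # server can authenticate to the destination on the user's behalf.
        lines.append(f"Delegation-ID: {delegation_id}")
    return src.netloc, ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")
```

For example, `build_copy_request("https://a.b.com/example1.txt", "https://c.d.org/copy.txt", "abc")` yields a request that asks a.b.com to push the file to the second server without the data passing through the client.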
curlfs for SlashGrid • curl is built on top of a general library, libcurl • handles persistent HTTP and HTTPS connections, SSL setup etc • To add HTTP and HTTPS filesystems to SlashGrid, have made a libcurl filesystem plugin: curlfs • This maps parts of the URL space into the local filesystem: • https://a.b.com/newdir/ ---> /grid/https/a.b.com/newdir/ • Works with any standard HTTP or HTTPS server • rpm -i /grid/http/datagrid.in2p3.fr/distribution/globus/beta-21/RPMS/* • SlashGrid framework provides GSI proxy or full cert/key to curlfs so it can make authenticated requests. • Write with HTTP/1.1 PUT and DELETE being added to curlfs • Will complement fileGridSite support for these on server side
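The URL-to-path mapping shown above is simple enough to write down directly. This sketch is not the curlfs plugin itself; `url_to_path` and `path_to_url` are illustrative names, and how curlfs treats ports or query strings is not stated in the slides, so they are passed through untouched here.

```python
from urllib.parse import urlsplit

GRID_ROOT = "/grid"
SCHEMES = ("http", "https")


def url_to_path(url):
    """Map https://a.b.com/newdir/ to /grid/https/a.b.com/newdir/."""
    parts = urlsplit(url)
    if parts.scheme not in SCHEMES:
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    return f"{GRID_ROOT}/{parts.scheme}/{parts.netloc}{parts.path}"


def path_to_url(path):
    """Inverse mapping, as a filesystem plugin would need when a file is opened."""
    prefix = GRID_ROOT + "/"
    if not path.startswith(prefix):
        raise ValueError(f"not under {GRID_ROOT}: {path!r}")
    scheme, _, rest = path[len(prefix):].partition("/")
    if scheme not in SCHEMES or not rest:
        raise ValueError(f"no scheme/host in path: {path!r}")
    return f"{scheme}://{rest}"
```

With this mapping the rpm example on the slide resolves to a directory under http://datagrid.in2p3.fr/distribution/globus/beta-21/RPMS/.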
Summary • HTTPS as a grid protocol • G-HTTPS extensions being worked out • HTTP as a data protocol • even a quick multistream HTTP hack seems very competitive • fileGridSite HTTP(S) server has been written • supports read/write with standard utilities like curl • third party transfers being added as demonstration of delegation • curlfs written for SlashGrid: maps URLs into filesystem • Source code for curl-url-get, fileGridSite, curlfs is in EDG CVS • See http://www.gridpp.ac.uk/authz/ for more details