GridSite Storage Andrew McNab University of Manchester
Outline • GridSite's evolution • File servers • htcp command • Storage farms • File location • POSIX access • SRM etc
GridSite evolution • Started as web content management • www.gridpp.ac.uk • “library-ised” for reuse of GridSite components • EDG/LCG Logging & Bookkeeping; LCAS • GridSite CGI became the GridSite Apache module • 3rd-party CGI/PHP on top of this: GOC etc. • Web Services such as gLite WM Proxy on CEs • Storage is the current expansion area for GridSite
GridSite philosophy • Aim to reuse as much as possible from the mainstream Web and Web Services worlds • Applies both to software and standards • Reduces the work needed and the ongoing support overhead • We use Apache, OpenSSL, curl, gSOAP, libxml, ... • Aim for ease of configuration and operation • Try to keep everything in the httpd.conf file (see the sketch below) • Autoconfigure hostname etc. as much as possible
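To illustrate the “everything in httpd.conf” point: enabling GridSite on an otherwise stock Apache is essentially a matter of loading one extra module next to mod_ssl. The module and file names below are assumptions, not a tested configuration:
    # Hedged httpd.conf sketch: mod_gridsite loaded alongside the usual SSL module
    # (file paths assumed; adjust to the local Apache layout)
    LoadModule ssl_module       modules/mod_ssl.so
    LoadModule gridsite_module  modules/mod_gridsite.so
Per-directory behaviour is then set with GridSite directives in ordinary <Directory> blocks in the same file, as sketched after the next slide.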
File servers • Apache web servers are already simple file servers • GridSite adds directory- or file-level access control, in terms of a certificate DN, lists of DNs or VOMS attributes • Also allows the PUT, MOVE and DELETE HTTP(S) methods for writing to disk • All specified by the RFCs, but usually not implemented • Along with the third-party COPY method, this gives Apache + mod_gridsite very similar functionality to GridFTP • but with the fine-grained, VOMS-aware access control
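A hedged sketch of what such a writable, access-controlled export could look like in httpd.conf; the directive names (GridSiteAuth, GridSiteIndexes, GridSiteMethods) are written from memory of the GridSite documentation and should be checked against it:
    # Illustrative only: export /var/www/html/dir with GridSite access control
    # and the HTTP write methods enabled (directive names assumed)
    <Directory /var/www/html/dir>
        GridSiteAuth     on                    # evaluate the per-directory .gacl policy
        GridSiteIndexes  on                    # formatted directory listings
        GridSiteMethods  GET PUT DELETE MOVE   # allow writing as well as reading
    </Directory>
The DN, DN-list and VOMS rules themselves then live in a .gacl policy file inside the exported directory.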
htcp command • htcp is similar to scp (or globus-url-copy) • Allows copying files to or from a remote server, using grid proxy, VOMS etc. credentials: • htcp /tmp/myfile.txt https://grid0.hep.man.ac.uk/dir/ • Variants htls, htll, htrm, htmv also allow users to examine remote directories, delete files or rename them. • htls https://grid0.hep.man.ac.uk:488/dir/ • Supports GridSite's “GridHTTP” mode: authentication via HTTPS but bulk file copy via HTTP.
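Following the pattern of the examples above, the variants might be used like this (the htmv source/destination syntax in particular is my assumption):
    # Hedged usage sketch for the htcp family, against the example server above
    htls https://grid0.hep.man.ac.uk:488/dir/                # short listing
    htll https://grid0.hep.man.ac.uk:488/dir/                # long listing
    htmv https://grid0.hep.man.ac.uk:488/dir/old.txt https://grid0.hep.man.ac.uk:488/dir/new.txt   # rename on the server
    htrm https://grid0.hep.man.ac.uk:488/dir/new.txt         # delete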
Storage farms • As total storage has grown, so has the number of nodes with “storage” disk • BaBar UK c.2000: a Sun RAID array and a farm of Linux CPU nodes at each site • At sites like Manchester, all storage is now on the CPU nodes • Traditional cluster mechanisms like NFS + automount don't scale up to 1000 nodes • So dCache, DPM, xrootd etc. have emerged to give access to these disks from other CPU nodes
GridSite storage • Our idea is to use GridSite/Apache file servers on the CPU/disk nodes • Aim to be as “democratic” as possible, since this removes single points of failure/overload • Query the state of the files on disk rather than duplicating this information in a database • This will require lock-files on disk for some metadata (compare pool accounts) • Provide access via HTTP(S), GridFTP, htcp and POSIX file access
File location • How do we find files on a node without a database? • By querying the nodes directly • We do this with multicast • RFC 2756 describes the Hyper Text Caching Protocol (HTCP), which we use to format “Do you have?” queries and responses • Added a multicast UDP responder to the GridSite/Apache module, configured via two “SiteCast” lines in httpd.conf (sketch below) • The file server just looks for the file and replies if it has it • The HTCP round-trip time between client and server is usually between 200 and 900 microseconds
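For illustration, the two “SiteCast” lines could look roughly like this; the directive names GridSiteCastUniPort and GridSiteCastGroup are my assumption from the GridSite documentation:
    # Hedged httpd.conf sketch for the SiteCast UDP responder (directive names assumed)
    GridSiteCastUniPort  777                # unicast UDP port for HTCP replies
    GridSiteCastGroup    224.0.0.111:777    # multicast group:port to listen on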
POSIX file access • The htcp command supports SiteCast file location • htcp --domain sitecast.hep.man.ac.uk --groups 224.0.0.111:777 https://sitecast.hep.man.ac.uk:488/file.txt /tmp/file.txt • But we would like to be able to access files on other nodes • without applications having to know about this • and without having to copy files temporarily to this CPU node • We need POSIX-like access, as we had with NFS etc. • So we've revived the SlashGrid part of GridSite • This hasn't been actively developed since 2003
SlashGrid • Uses the FUSE kernel module (mainstream since Linux 2.6.14) • Connects the slashgrid daemon to the /grid part of the filesystem • The daemon acts on open(), read(), write(), unlink() etc. • We use the code from the htcp commands to generate HTTP(S) GET, HEAD, PUT, MOVE, DELETE requests • either absolute URLs or SiteCast location URLs • Uses GSI proxies (including VOMS) if present • emacs /grid/https/n0.hep.man.ac.uk:488/mcnab/notes.txt • TFile::Open(“/grid/https/sitecast.hep.man.ac.uk/d1/file34.root”)
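Once the slashgrid daemon is running, ordinary POSIX tools work on the same example paths, e.g.:
    # Plain shell access through the FUSE-mounted /grid tree
    # (hostnames and paths are the example ones from this talk)
    cat /grid/https/n0.hep.man.ac.uk:488/mcnab/notes.txt
    cp /grid/https/sitecast.hep.man.ac.uk/d1/file34.root /tmp/
    grep Manchester /grid/https/n0.hep.man.ac.uk:488/mcnab/notes.txt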
GridFTP access • Clients currently assume GridFTP access • So we want to give access to the “Apache” files in /var/www/html • but without breaking .gacl access control • We've added a /grid/local/ filesystem which • maps requests to the local /var/www/html/ directory • enforces any .gacl access restrictions • identifies the user from pool accounts and applies DN lists (including from VOMS) • We run a standard GridFTP server on this filesystem, in chroot mode • SiteCast works with gsiftp:// URLs and /grid/local/ directories
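A hedged example of a transfer against such a chrooted server with a stock client; exactly how the URL path maps onto /grid/local/ (and hence /var/www/html/) is an assumption here:
    # Illustrative GridFTP transfer; the server is chrooted onto the /grid/local/
    # view of /var/www/html/ (path mapping assumed)
    globus-url-copy gsiftp://node1.hep.man.ac.uk/dir/file34.root file:///tmp/file34.root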
So what we have... • Transparent access to files on any other CPU/disk node on the local farm • No need to maintain a database of file locations • No resyncing, no database backups, no DB farm to build etc. • Files on dead nodes automatically disappear • unless there is another replica, which is used automatically • Read/write access via HTTP(S) (htcp, wget, Firefox), GridFTP (globus-url-copy, lcg-cp, ...) and POSIX (/grid/...) • All the fine-grained, VOMS-aware access control from GridSite is available, irrespective of the access protocol
What's needed • SRM of course! • We are designing an SRM that uses SiteCast as its backend instead of a database • Map the SRM “chmod” functions to modifications of GACL policies • This gives us VOMS-level access control of files • Use SiteCast to locate free space and make reservations • Create space lock-files to reserve space (i.e. sparse files; see the illustration below) • Global disk quotas across the site • Allocate disk on N nodes to a VO?
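As a plain-shell illustration of the sparse lock-file idea (my reading of the slide, not GridSite code): the lock-file's apparent size records how much space has been promised, without consuming disk blocks until real data arrives.
    # Hedged illustration: create a 50 GB sparse “space lock-file”
    # (path and size invented for the example)
    dd if=/dev/zero of=/data/reserved/vo-space.lock bs=1 count=0 seek=50G
    ls -lh /data/reserved/vo-space.lock   # apparent size: 50 GB
    du -h /data/reserved/vo-space.lock    # blocks actually used: ~0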
What's needed (2) • A set of scripts or services which • monitor SiteCast requests to identify “busy” files and make more replicas of them • remove unused/expired files • enforce changes to global quotas by shifting/expiring files • This way of working means there is no state stored in a “site management” box • it can go down, be rebooted, reinstalled etc. and the day-to-day business of running jobs carries on
Advantages • SiteCast makes it easy to find replicas, ignoring dead nodes • GridSite supports searching multiple multicast groups in order • “Query this rack, then the other racks, then the other machine room.” • Could form virtual Tier-2s by sharing multicast groups: • the SRM at each physical site is able to find files located in the others, and CPU nodes can transparently access them via /grid • SlashGrid retries on server error, including repeating the SiteCast query • It will automatically switch to another replica, even during a read() • Admins can replicate files off nodes due to be taken down without disturbing running jobs
Architecture • [Diagram] A client application reaches files via GridFTP, the htcp commands, the /grid/ filesystem or plain HTTP(S) • File-location queries go out by multicast • Each disk node (Node1 ... Node4) runs Apache/mod_gridsite, providing the UDP responder and HTTP(S) access to files under /var/www/html/..., plus a chrooted GridFTP server exporting the /grid/local/ view
Conclusion • We have combined GridSite file servers and SlashGrid clients to provide transparent access to files on a storage farm • Access is also available via HTTP/HTTPS/GridHTTP and GridFTP • This uses multicast HTCP queries (“SiteCast”) to find replicas of files • We have avoided the need for a database by ensuring that file operations on the servers are atomic and by querying the actual file state • Now looking at adding an SRM interface, plus space location and reservation via SiteCast