
CASTOR SRM v1.1 experience

CASTOR SRM v1.1 experience. Presentation at SRM meeting 01/09/2004, Berkeley. Olof Bärring, CERN-IT. Outline: CASTOR SRM v1.1 implementation; interoperability tests; problems found (SRM specification, GSI); SRM @ GGF: GSM WG; input to the definition of SRM-Basic; conclusions and outlook.


Presentation Transcript


  1. CASTOR SRM v1.1 experience
     Presentation at SRM meeting 01/09/2004, Berkeley
     Olof Bärring, CERN-IT

  2. Outline
     • CASTOR SRM v1.1 implementation
     • Interoperability tests
     • Problems found
       • SRM specification
       • GSI
     • SRM @ GGF: GSM WG
     • Input to the definition of SRM-Basic
     • Conclusions and outlook

  3. CASTOR SRM v1.1
     • Implements the vital operations
       • get, put, getRequestStatus, setFileStatus, getProtocols
     • No-ops:
       • pin, unPin, getEstGetTime, getEstPutTime
     • Implemented but optionally disabled (requested by LCG)
       • advisoryDelete
     • CASTOR GSI (CGSI) plug-in for gSOAP
       • Also used in GFAL
     • Evolution @ CERN:
       • First prototype in summer 2003
       • First production version deployed in December 2003
     • Other sites having deployed the CASTOR SRM
       • CNAF (INFN/Bologna)
       • PIC (Barcelona)

  4. [Architecture diagram: CASTOR SRM v1.1. Labels: SRM request repository; Grid services (GSI, SRM, gridftp); CASTOR disk cache; local clients; stager; RFIO; CASTOR name space; CASTOR tape archive; Volume Manager; tape queue; tape mover.]

  5. Deployment
     [Deployment diagram. Labels: castorgrid.cern.ch with DNS load balancing; test/dev node; nodes gridftp01 to gridftp06, each labelled srm and gridftp; RFIO&co; CASTOR (stager, nameserver, ...).]

  6. Interoperability tests
     • CASTOR SRM has been running interoperability tests with various clients, notably
       • GFAL (Jean-Philippe)
       • EDG replica manager (Peter)
       • FNAL/dCache SRM (Timur)

  7. Problems found
     • The interoperability problems can be classified as:
       • Due to problems with the SRM specification
       • Due to assumptions in SRM or SOAP implementations
       • Due to GSI incompatibilities
     • Debugging GSI incompatibilities is by far the most difficult and time-consuming

  8. Problems with SRM spec (1)
     • Lack of enumeration
       • All enumeration-like types are strings
       • Client needs to find a common denominator (e.g. convert all strings to upper case)
     • Request and file state lifecycles
       • Concise for ‘put’ or ‘get’
       • Draft proposal submitted by Timur for ‘copy’. Not yet adopted by the CASTOR SRM implementation.
       • Undefined for ‘mkPermanent’, ‘pin’, ‘unpin’ (probably irrelevant for the latter two)?
     • Request history
       • What should an SRM do with requests that have reached the “Done” or “Failed” status?
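
The "common denominator" workaround for string-typed states can be sketched as follows. This is a minimal illustration, not part of the CASTOR SRM or any SRM client: the `RequestState` members and the helper names are hypothetical, and only "Done" and "Failed" are state names taken from the slides.

```python
from enum import Enum


class RequestState(Enum):
    """Hypothetical client-side enumeration; PENDING is an assumed extra state."""
    PENDING = "PENDING"
    DONE = "DONE"
    FAILED = "FAILED"


def parse_state(raw: str) -> RequestState:
    """Map a free-form state string from a server onto the fixed enumeration.

    Upper-casing first means "Done", "done" and "DONE" are treated alike.
    """
    key = raw.strip().upper()
    try:
        return RequestState(key)
    except ValueError:
        raise ValueError(f"unrecognised request state: {raw!r}") from None


def is_finished(state: RequestState) -> bool:
    """A request is finished once it has reached Done or Failed."""
    return state in (RequestState.DONE, RequestState.FAILED)
```

Every client has to repeat this kind of normalisation, and each one may pick a different convention, which is the interoperability risk the slide points at.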

  9. Problems with SRM spec (2)
     • Immutability of request identifier
       • Request id is a 32-bit word
       • Unspecified if an SRM can reuse request ids for finished (“Done” or “Failed”) requests
     • SURL (Site URL) semantics
       • Is it a URL or a URI?
       • If URL, does it support relative and absolute paths?
       • If URI, the name space is virtually flat for an arbitrary client
     • Pin lifetime
       • Pin lifetime is defined to be subject to site policy
       • No way to query the remaining pin lifetime for a particular file
       • Current definition appears useless for any practical purpose
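
The URL-versus-URI question changes what a client may assume about two SURLs. The sketch below contrasts the two readings using only the Python standard library; the host `srm.example.org`, the paths, and both function names are made up for illustration and do not come from the SRM specification.

```python
import posixpath
from typing import Tuple
from urllib.parse import urlsplit


def surl_as_url(surl: str) -> Tuple[str, str]:
    """Read the SURL as a hierarchical URL: return (host, normalised path)."""
    parts = urlsplit(surl)
    if parts.scheme != "srm" or not parts.hostname:
        raise ValueError(f"not an srm:// SURL: {surl!r}")
    # normpath resolves "." and ".." segments, which only makes sense
    # if the path really is a directory hierarchy.
    return parts.hostname, posixpath.normpath(parts.path)


def surl_as_uri(surl: str) -> str:
    """Read the SURL as an opaque identifier: no path structure is assumed."""
    return surl


a = "srm://srm.example.org/data/run1/../run2/file.dat"
b = "srm://srm.example.org/data/run2/file.dat"

same_as_url = surl_as_url(a) == surl_as_url(b)   # both name /data/run2/file.dat
same_as_uri = surl_as_uri(a) == surl_as_uri(b)   # different opaque strings
```

Under the URL reading the two strings name the same file; under the URI reading a client cannot tell, and cannot derive a parent directory either, which is what a "virtually flat" name space means in practice.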

  10. Problems with SRM spec (3)
     • Exception handling and error propagation
       • Unspecified if a multi-file request should fail when a subset of the files got an error
       • Unspecified if and when an SRM can do retries
       • Only one error message, global for all files in a multi-file request, is available for reporting
       • Format and contents of error message undefined
     • advisoryDelete != delete
       • It may be vital to know what the effect is
         • No effect at all (if so, what happens if the SURL is reused for a new file?)
         • Only remove the disk-resident copy (if so, when?)
         • Remove the HSM file (if so, when?)
     • Directory creation on the fly for ‘put’ requests
       • If a ‘put’ request specifies a SURL corresponding to a path for which one or several sub-directory levels do not exist, should it create the missing dirs on the fly (provided the client has the appropriate permissions)?
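
The multi-file ambiguity can be made concrete with a small sketch. Everything here is hypothetical: `FileResult`, the `fail_on_any_error` switch and the message format are assumptions chosen to show two plausible server behaviours, not what any SRM implementation actually does.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FileResult:
    surl: str
    state: str                    # "Done" or "Failed"
    error: Optional[str] = None   # per-file detail a server might hold internally


def request_state(files: List[FileResult], fail_on_any_error: bool) -> str:
    """Derive the request-level state from the per-file states.

    fail_on_any_error=True : one failed file fails the whole request.
    fail_on_any_error=False: the request fails only if every file failed.
    """
    failed = [f for f in files if f.state == "Failed"]
    if not failed:
        return "Done"
    if fail_on_any_error or len(failed) == len(files):
        return "Failed"
    return "Done"


def global_error_message(files: List[FileResult]) -> str:
    """Squeeze per-file errors into the single request-wide message slot."""
    return "; ".join(f"{f.surl}: {f.error}" for f in files if f.state == "Failed")
```

Two servers that differ only in the `fail_on_any_error` choice report different states for the same partially failed request, and a client must parse the free-form global message to find out which file was affected.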

  11. Problems due to SRM or SOAP implementation details
     • SRM WSDL discovery
       • FNAL client puts severe constraints on the WSDL publication
     • Bug in gSOAP v2.3 WSDL importer
     • Various bugs in CASTOR SRM found but not reported here

  12. GSI problems (1)
     • CASTOR (GSI) – EDG RC (Java TrustManager)
       • TrustManager does not use the GSI default of SSL handshake + credential delegation, but just an SSL handshake
       • TrustManager client would not work with SSL 3.0, which is forced by GSI
       • Solution: EDG RC uses CoG (Globus Java Security Implementation) instead
     • CASTOR (GSI) – FNAL dCache (Java CoG)
       • FNAL client only used a limited number of encryption algorithms, which did not match those provided by standard GSI
       • Limited proxy certificate
       • GSI error reporting not working properly

  13. GSI problems (2)
     • Administration and deployment issues
       • EDG Globus patch supporting dynamic pool accounts requires the GRIDMAPDIR environment variable to be declared, even if the default location is used for the security files
       • Configuration problems (the right root CA not trusted)
       • CERN CA changed the certificate naming scheme (number added at the end of the DN). New certificates were not automatically propagated (to, for instance, FNAL).
     • The effort for debugging GSI problems will scale with the number of SRM implementations
       • Establishing an ‘SRM reference implementation’ for certifying new servers and clients would help

  14. Conclusions and outlook
     • CASTOR SRM v1.1 has been in production for a couple of months at CERN and some other CASTOR Tier-1 sites
     • SRM interoperability does not come for free
       • Definition not concise enough, room for too much site-specific interpretation
       • Is GSI interoperability an illusion and, if so, will it continue to be so?
     • We currently have no plans for a CASTOR SRM v2.1 implementation. We would rather tighten up SRM v1.1 in the context of the GGF GSM WG and the SRM-Basic definition
