CERN SRM Development CHEP06 - Mumbai Benjamin Coutourier Shaun de Witt
Background • Original version based on SRM 1.1 Specification implemented by CERN • Latest version based on SRM 2.1.1 Specification • Collaborative Effort • CERN (CH) • RAL (UK) • Based on modified WSDL (http://sdm.lbl.gov/srm-wg/srm.v2.1.1.modified.wsdl)
Tools • Based on modified WSDL (http://sdm.lbl.gov/srm-wg/srm.v2.1.1.modified.wsdl) • Selected gsoap-2.7.2
Tools • cgsi-soap plugin • Oracle (10.2.1) • umbrello (http://uml.sourceforge.net) • g++ (3.2.3) • valgrind
Design Objectives • Low latency • Short requests handled synchronously • Longer requests (involving CASTOR stager) mostly handled asynchronously • Multi-threading architecture • Robustness • Asynchronous requests stored in database
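The split between synchronous and asynchronous handling can be sketched roughly as below. This is a minimal illustration, not the CASTOR SRM code: `RequestStore`, `Reply`, `handle` and the `needsStager` flag are hypothetical names, and an in-memory queue stands in for the database that holds asynchronous requests.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Stand-in for the request table. The slide says asynchronous requests are
// stored in a database; an in-memory queue is used here only for illustration.
struct RequestStore {
    std::deque<std::string> pending;
    std::size_t nextToken = 1;

    // Record the request and hand back a token the client can poll with.
    std::size_t enqueue(const std::string& request) {
        pending.push_back(request);
        return nextToken++;
    }
};

struct Reply {
    bool completed;     // true when the request was answered synchronously
    std::size_t token;  // non-zero only for queued (asynchronous) requests
};

// needsStager is a hypothetical flag marking the longer requests that
// involve the CASTOR stager; everything else is answered immediately.
Reply handle(const std::string& request, bool needsStager, RequestStore& store) {
    if (!needsStager) {
        return Reply{true, 0};
    }
    return Reply{false, store.enqueue(request)};
}
```

Keeping queued requests in a store rather than in memory is what lets them survive a restart, which is presumably why the slide lists it under robustness.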
Design Objectives • Interoperability • Actually a common theme with all SRMs • Using common WSDL • Tested CASTOR SRM with DCACHE clients and DCACHE SRM with CASTOR clients • Robustness • Load testing submitting many requests near simultaneously – using Tier1 machines
Design • [Architecture diagram: Clients, SRM Server, SRM Daemon, Database, CASTOR Nameserver, CASTOR Stager]
Design • Significant reuse of CASTOR code • dlf • threadpools • database services • IObject model
Server Design • Thread pool • default 10 threads but can be overridden • Currently no maximum, but it should probably exist • Soap backlog • default 40 messages, but can be overridden
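A minimal sketch of how the overridable defaults might be resolved, assuming a simple key/value override map. The names `ServerSettings`, `resolveSettings` and the keys `threads` and `soapBacklog` are hypothetical. The `maxThreads` cap models the upper bound that the slide says does not exist yet.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

struct ServerSettings {
    std::size_t threads;      // worker threads in the pool
    std::size_t soapBacklog;  // SOAP messages allowed to queue
};

// Start from the defaults quoted on the slide (10 threads, backlog of 40)
// and apply any overrides. maxThreads == 0 means "no cap", matching the
// current behaviour; a non-zero value clamps the pool size.
ServerSettings resolveSettings(const std::map<std::string, std::size_t>& overrides,
                               std::size_t maxThreads = 0) {
    ServerSettings s{10, 40};
    auto it = overrides.find("threads");
    if (it != overrides.end()) s.threads = it->second;
    it = overrides.find("soapBacklog");
    if (it != overrides.end()) s.soapBacklog = it->second;
    if (maxThreads != 0 && s.threads > maxThreads) s.threads = maxThreads;
    return s;
}
```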
Daemon Design • Four dedicated threads • pool of threads for PUT requests • pool of threads for GET requests • single thread for COPY request • single thread for SRM Garbage collection • Selection from database triggered by database entry (TBC).
Data Flow Summary • Directory Functions • client – server – nameserver • PrepareToXXX, Copy, putDone • client – server – daemon – stager • Other Data Transfer • client - server • Space Management • client - server
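The four paths in this summary can be written down as a small lookup. The enum and function names are illustrative only; the hops are the ones listed on the slide.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Request categories as grouped on the slide; the enum itself is hypothetical.
enum class RequestKind {
    Directory,           // directory functions
    PrepareCopyPutDone,  // PrepareToXXX, Copy, putDone
    OtherDataTransfer,
    SpaceManagement
};

// Return the chain of components a request of the given kind passes through.
std::vector<std::string> route(RequestKind kind) {
    switch (kind) {
        case RequestKind::Directory:
            return {"client", "server", "nameserver"};
        case RequestKind::PrepareCopyPutDone:
            return {"client", "server", "daemon", "stager"};
        case RequestKind::OtherDataTransfer:
        case RequestKind::SpaceManagement:
            return {"client", "server"};
    }
    return {};  // not reached for valid enum values
}
```

Only the second category reaches the daemon and the stager, which matches the design objective of handling longer requests asynchronously.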
Development Issues • gsoap • Steep learning curve • Default namespace issues • Generated prefix was sometimes ns1__, sometimes ns2__ • We explicitly use srm__ • Generated API changes between minor gsoap releases, even with the same WSDL
Development Issues • Umbrello • Not as robust or well documented as similar commercial tools • Spent several days recovering from undocumented problems • ORACLE • Need matching versions of client and server libraries (v9 clients with v10 servers do not work, at any rate)
Interoperability Issues • SRM Specs do not state when/where to use status codes • For a request like srmRm with multiple files • If any file succeeds, we return SUCCESS for the request • If all files fail, we return FAILURE for the request • For each file that succeeds, we return DONE • For each file that fails, we return FAILURE
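The convention described for multi-file requests can be expressed in a few lines. This is a sketch of the rule as stated on the slide, with hypothetical type and function names. The handling of an empty file list is an assumption, since the slide does not cover it.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

enum class FileStatus { Done, Failure };        // per-file result
enum class RequestStatus { Success, Failure };  // overall request result

// Overall status is SUCCESS if at least one file succeeded, FAILURE if all
// files failed. An empty list yields FAILURE here; that case is an
// assumption of this sketch, not something the slide specifies.
RequestStatus overallStatus(const std::vector<FileStatus>& files) {
    const bool anyDone = std::any_of(
        files.begin(), files.end(),
        [](FileStatus s) { return s == FileStatus::Done; });
    return anyDone ? RequestStatus::Success : RequestStatus::Failure;
}
```

A client therefore cannot rely on the request-level status alone and has to inspect the per-file statuses to find partial failures.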
Interoperability Issues • Explanation in return status • CASTOR SRM returns empty string • DPM SRM returns NULL • Type Promotion • CASTOR only supports Permanent file types • If client requests volatile or durable • SRM returns SUCCESS • Returns PERMANENT in file structure
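Both points can be illustrated with a short sketch. All names here are hypothetical. `promote` mirrors the type-promotion behaviour described for CASTOR, and `hasExplanation` shows one way a client could treat the empty-string and NULL explanations alike.

```cpp
#include <cassert>

enum class FileStorageType { Volatile, Durable, Permanent };

struct TypeReply {
    bool success;             // request-level outcome
    FileStorageType granted;  // type reported back in the file structure
};

// Whatever type the client asks for, the request succeeds and the reply
// reports Permanent, the only type CASTOR supports.
TypeReply promote(FileStorageType /*requested*/) {
    return TypeReply{true, FileStorageType::Permanent};
}

// Client-side helper: one SRM returns an empty explanation and another
// returns NULL, so both are treated as "no explanation given".
bool hasExplanation(const char* explanation) {
    return explanation != nullptr && explanation[0] != '\0';
}
```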
Status • By end of January • All methods implemented except Permission functions • Full regression test suite available • Still to do • Permission functions • VOMS integration • Complete memory leak checking • Thread Tuning/Signal handling/documentation
Status • Few issues with interface to CASTOR still need investigating. • Some methods only log first DLF call • Some APIs which accept multiple files only return a single result.
CASTOR specific • Only permanent files supported • Space reservation is notional • Handled entirely within SRM with no reference to CASTOR • CASTOR storage considered semi-infinite • srmLs limits number of returns • Configurable limit • Set to 2048 currently
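The configurable srmLs limit amounts to truncating the listing. A minimal sketch follows, assuming entries are plain strings; `limitListing` and `kDefaultLsLimit` are hypothetical names, and 2048 is the current setting quoted on the slide.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

constexpr std::size_t kDefaultLsLimit = 2048;  // current setting per the slide

// Return at most `limit` entries from a directory listing.
std::vector<std::string> limitListing(const std::vector<std::string>& entries,
                                      std::size_t limit = kDefaultLsLimit) {
    const std::size_t count = std::min(entries.size(), limit);
    return std::vector<std::string>(
        entries.begin(),
        entries.begin() + static_cast<std::ptrdiff_t>(count));
}
```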
CASTOR specific • Suspend/Resume not supported • Dynamic space compacting not supported • Pin lifetimes are advisory • Used in weighting CASTOR garbage collection policy • Pins are applied once files are staged • putDone issued or file staged.
CASTOR specific • Non-static TURL • Need to call status to get new TURL • srmRmdir does not support recursion