SRM 2.2 Issues. Well, er, and 2.3 too. Jens Jensen (STFC RAL/GridNet2), on behalf of GSM-WG. OGF22, Cambridge, MA.
This Talk • Deviates from previous principles of being for beginners • Technical • Less polished… • May be useful for others… • Expose standard and protocol process • Not many answers: kick-start (restart) the process • Combines the two sessions • Input (mainly) from dCache, CASTOR, StoRM
Aims • Revisit specification • Implementations’ deviations from OGF specifications • Ensure another group can interoperate • If someone else were to start from scratch • E.g. SRB (ASGC work) • Aim is not to start work on 2.3 • I.e. the aim is not – not the aim is to not, not that aim is not to start • If that makes sense
A Very Brief History • Spec from 2006 • Then came implementations • Then came WLCG • …revisit spec • Now getting experiences • …revisit spec, highlight issues • …think about next steps
Philosophies • Manage diverse storage systems (but nothing else) • User interface (not admin) • Open Standard • A standard is not a standard until it is a standard (next slide) • Open participation (no fees, no closed societies) • Protect storage from Grid? • Encourage best practices? • Encourage uniformity? Allow diversity? • The File is the unit of currency (not datasets)
Compare OASIS • “Approved within an OASIS Committee,” • “Submitted for public review,” • “Implemented by at least three organizations,” • “And finally ratified by the Consortium's membership at-large.” • We would add that the three implementations “must interoperate”!
WLCG • Wide deployment • “Now get experience” with WLCG • MoU: Significant changes to spec… • Do they make sense? Process. • What about smaller customers? • Tape1Disk1=ONLINE_AND_NEARLINE? • …No. In cache does not mean always in cache
Space Tokens on Get • srmPrepareToPut uses a space token (description) • srmPrepareToGet doesn’t • Also for srmBringOnline • Problem for many implementations • dCache, CASTOR • dCache: MSS doesn’t see space token • StoRM: not needed
Other get issues • Getting directories? • Not supported? • Or special permissions required? • Also to apply for large bulk requests?
Finance Use Cases • Ezio Corso (ICTP/E-Grid) (StoRM) • Compare EGEE industry liaison • “Complexity of financial instruments” • “more stringent risking and reporting requirements” • “Point solution” grids inefficient (silo) • Big computing makes data bottleneck • Access control by individuals
Spaces • Access Control on spaces • Also to be published in GLUE 1.3 schema as ACBR on VOInfo • Reserving subspaces of spaces • Summarising spaces for Owner • Query space status?
What is a Space Anyway? • A collection of at least one physical storage component area? • With a common baseline set of capabilities (access latency etc.)? • Not to even mention “free” space, “used” space, etc. • Tricky to define • Even more tricky to measure • Still more tricky to get agreement
What is a Space Anyway? (continued) • Is everything a space? • Suggestion to have top-level static spaces • Is disk a space? Or can a space have disk? • Spaces can be named by space token descriptions • Always named by a space token description? • Can be referenced by path? Non-uniquely? • Can be referenced (non-uniquely) by capabilities? • Is a (static) space an SA (GLUE Storage Area)?
Space Behaviour • What happens if a file is released? • Space given back to the Space? • Space does not re-grow? • Permanent file in limited space? • Used to be: not permitted • Now, space is shrunk and released • Keep token around, or permit recycling?
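The two alternatives raised on this slide (released space is given back to the Space, or the Space does not re-grow) can be contrasted with a toy model. This is only a sketch: the `Space` class, its fields, and the `give_back` flag are hypothetical names for illustration and are not part of the SRM specification or of any implementation.

```python
from dataclasses import dataclass


@dataclass
class Space:
    """Toy model of a reserved space; sizes are in arbitrary units."""
    total: int
    used: int = 0

    @property
    def free(self) -> int:
        return self.total - self.used

    def put(self, size: int) -> None:
        """Account for a file of the given size written into the space."""
        if size > self.free:
            raise ValueError("not enough free space in reservation")
        self.used += size

    def release(self, size: int, give_back: bool) -> None:
        """Release a file; optionally shrink the reservation instead of re-growing."""
        if size > self.used:
            raise ValueError("cannot release more than is used")
        self.used -= size
        if not give_back:
            # Hypothetical "shrink" policy: the reservation loses the released amount.
            self.total -= size


# Policy A: the released space is given back to the Space.
regrow = Space(total=100)
regrow.put(40)
regrow.release(40, give_back=True)

# Policy B: the Space does not re-grow.
shrink = Space(total=100)
shrink.put(40)
shrink.release(40, give_back=False)
```

Under policy A the reservation is back at its full size after the release; under policy B it has permanently lost the released amount, which is the behaviour a user is most likely to find surprising.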
Permissions • Simple Unixy (POSIX) permissions • Default permissions on directories • Inheritance from above? • Consistent with space permissions, if applicable? • Default (per VO?) • Permit for roles and groups? • Stage in permission (protect write cache) • Not the same as reading
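One possible reading of "inheritance from above" with a per-VO default can be sketched as a lookup that walks up the directory tree. Everything here is an assumption made for illustration: the function name `effective_mode`, the `explicit` mapping of paths to POSIX mode bits, and the default value `0o750` are not taken from the SRM specification.

```python
from pathlib import PurePosixPath
from typing import Dict


def effective_mode(path: str, explicit: Dict[str, int], vo_default: int = 0o750) -> int:
    """Return the mode set on the path itself or on its nearest ancestor.

    Falls back to a (hypothetical) per-VO default when nothing is set.
    """
    p = PurePosixPath(path)
    for candidate in (p, *p.parents):
        mode = explicit.get(str(candidate))
        if mode is not None:
            return mode
    return vo_default


# Only one directory carries an explicit setting in this example.
explicit_modes = {"/vo/data": 0o770}
inherited = effective_mode("/vo/data/run1/file.root", explicit_modes)
fallback = effective_mode("/other/file.root", explicit_modes)
```

The sketch leaves open the questions the slide raises, such as whether the result must also be consistent with permissions on the enclosing space.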
Permissions • StoRM calls out to LFC • Access control API in SRM not adequate • Use LFC’s API • Multiple StoRMs can share an LFC • => Can synchronise between SE and LFC
Return Codes • SRM_REQUEST_QUEUED • SRM_REQUEST_INPROGRESS • srmCopy()
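The slide only lists the codes. One way a client might handle them is to treat both SRM_REQUEST_QUEUED and SRM_REQUEST_INPROGRESS as "not finished yet" and keep polling until any other code comes back. The sketch below assumes exactly that; `poll_until_done` and the `check_status` callable are hypothetical stand-ins for whatever status call a real client uses, and the canned responses are invented for the demonstration.

```python
import time
from typing import Callable

# Assumption: both codes mean the request has not reached a final state.
PENDING = {"SRM_REQUEST_QUEUED", "SRM_REQUEST_INPROGRESS"}


def poll_until_done(check_status: Callable[[], str],
                    max_polls: int = 10,
                    delay: float = 0.0) -> str:
    """Call check_status until it returns a non-pending code or polls run out."""
    for _ in range(max_polls):
        code = check_status()
        if code not in PENDING:
            return code
        time.sleep(delay)
    raise TimeoutError("request still pending after %d polls" % max_polls)


# Simulated server: queued, then in progress, then finished.
responses = iter(["SRM_REQUEST_QUEUED", "SRM_REQUEST_INPROGRESS", "SRM_SUCCESS"])
final = poll_until_done(lambda: next(responses))
```

Whether a client needs to distinguish the two pending codes at all, in particular for srmCopy(), is the kind of question the slide leaves open.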
Use of GSI authentication • Currently using SOAP over GSI sockets • GSI needed for delegation • Delegation needed for srmCopy() (only) • Incompatible with SSL • Proposal to use gLite delegation • SOAP API specifically for delegation • AstroGrid uses home-made REST-based delegation • Not using WS-Anything • Many are Java only, too complex, not mature
FileStorageType • Volatile, Durable, Permanent • Should have been: ReleaseWhenExpired, WarnWhenExpired, NeverExpire • Avoid confusion with overloaded term from 1.1 – wrongly named in spec. • What is done on Durable/WarnWhenExpired timeout? (“raise error condition”)
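The renaming suggested on this slide can be written down as a small mapping. This is a sketch only: it assumes the three proposed names correspond to the three existing ones in the order listed, and the `on_lifetime_expired` helper and its return strings are hypothetical (only "raise error condition" is quoted from the slide).

```python
from enum import Enum


class FileStorageType(Enum):
    """Existing names mapped to the names the slide says they should have had."""
    VOLATILE = "ReleaseWhenExpired"
    DURABLE = "WarnWhenExpired"
    PERMANENT = "NeverExpire"


def on_lifetime_expired(kind: FileStorageType) -> str:
    """Describe what a storage system might do when a file's lifetime runs out."""
    if kind is FileStorageType.VOLATILE:
        return "release file"
    if kind is FileStorageType.DURABLE:
        # The slide asks what this means in practice.
        return "raise error condition"
    return "nothing: lifetime never expires"
```

For example, `FileStorageType.DURABLE.value` gives the descriptive name, which says what happens on expiry rather than borrowing a term from 1.1.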
Access Latency • OFFLINE not defined • Not used by WLCG • But does that mean it doesn’t exist? • ONLINE_AND_NEARLINE mentioned • LOST… • UNAVAILABLE…
Default • Certain aspects of API optional • Standard default? • Or implementation-defined default? • E.g., “default” space • Default filesize on put? • Is it 1? • Is it implementation dependent? Space dependent? • Is it returned?
Implicit • Implicit pinning • Implicit reservations • Implicit lifetimes • Implicit changes on action • Implicit changes on expiry • Surprising for users? • Complicates implementations? • What if permission denied for implicit action? • What is reasonable?
Explicit but unknown • Changing spaces (capabilities) • WLCG restricted D1T1 <-> D0T1 (more or less)
Best Practices for Clients • Propagate errors to user • Clean up after yourself… • Even after unclean exit • Should SRM use request timeout and keepalive? • Cancel at any point? • Or only when queueing?
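"Clean up after yourself" can be sketched on the client side with a context manager that always aborts its request, even when the body fails. The `FakeSrmClient` class and its methods are hypothetical stand-ins written for this example, not a real SRM client library.

```python
from contextlib import contextmanager


class FakeSrmClient:
    """In-memory stand-in for an SRM client; tracks requests still open."""

    def __init__(self) -> None:
        self.open_requests = set()
        self._counter = 0

    def prepare_to_get(self, surl: str) -> str:
        self._counter += 1
        token = "req-%d" % self._counter
        self.open_requests.add(token)
        return token

    def abort_request(self, token: str) -> None:
        self.open_requests.discard(token)


@contextmanager
def get_request(client: FakeSrmClient, surl: str):
    """Open a get request and make sure it is aborted on the way out."""
    token = client.prepare_to_get(surl)
    try:
        yield token
    finally:
        # Runs on normal exit and when the body raises, so nothing is left behind.
        client.abort_request(token)


client = FakeSrmClient()
error_seen = None
try:
    with get_request(client, "srm://example.org/vo/file"):
        raise RuntimeError("transfer failed")
except RuntimeError as exc:
    # Propagate errors to the user: here we just record the message.
    error_seen = str(exc)
```

A `finally` block cannot help if the client process is killed outright, which is presumably why the slide asks whether the server side should use request timeouts and keepalives.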
srmCopy • Was always slightly tricky (also in 1.0 and 1.1) • Needs delegation (GSI problem) • How and when does the client check status? • What if remote host is not an SRM2? • Push modes and pull modes, and firewalls • And then the GridFTP modes (push/pull) • And the GridFTP streams • Can’t always get good results if implementation uses defaults or tries to guess • No way to set most parameters
srmLs problem • Classical problem with large directories • Exercise: on a normal filesystem, run ls -R dir on a tree with large directories. While you wait, try to use the system. • Large data volumes in SOAP • Attachments supported? • Truncate, offset
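The "truncate, offset" idea can be sketched as client-side paging: ask for a bounded number of entries at a time and advance an offset until a short page comes back. The `list_in_pages` helper and the `ls_page(offset, count)` callable are hypothetical; the fake listing below stands in for a server and is not a real srmLs call.

```python
from typing import Callable, Iterator, List


def list_in_pages(ls_page: Callable[[int, int], List[str]],
                  page_size: int) -> Iterator[str]:
    """Yield directory entries by fetching at most page_size entries per call."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    offset = 0
    while True:
        page = ls_page(offset, page_size)
        yield from page
        if len(page) < page_size:
            # A short (or empty) page means the listing is exhausted.
            return
        offset += len(page)


# Fake server-side directory with seven entries.
entries = ["file%d" % i for i in range(7)]


def fake_ls(offset: int, count: int) -> List[str]:
    return entries[offset:offset + count]


listed = list(list_in_pages(fake_ls, page_size=3))
```

Each response stays small, at the cost of more round trips and of possible inconsistency if the directory changes between pages.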
Which bits are optional…? • Many features • Most parameters • TExtraInfo
Next Steps • Continue this process • Define terminology • Assess “damage” • 2.3 • No, not yet • Too soon, not enough experience with 2.2 • Adaption difficult • Options • Do nothing • Too late (WLCG) • Document differences • Retrofit things into 2.2 • Add to 2.2 (incremental) • Postpone to “2.3” • Postpone to 3.1
Future Stuff • WSRF • Rich Wellner (2004) • (WSRT?) • Avoid duplication • Compare OGSA-D-Arch • Proposes modular architecture for data
More Capabilities • Integrity checking • Act when integrity checking fails? • Service description, agreement (dynamic) • File content • Data sets, chunks • Dynamic resource allocation • Networks, additional storage, disk servers (now known as virtualisation) • Recovery