LHCb input to DM and SM TEGs
F2F Data and Storage Management TEG, Amsterdam

Introduction

• We have already provided some input during our dedicated session of the TEG
• Here is a list of questions we want to address for each session
  • Not exhaustive, limited to one slide per session
• It would have been useful to hear the proposals for the evolution of SM and DM as well
  • These should be proposals, not diktats
  • Extensive discussions with users and sites are needed before embarking on full-scale “prototypes” that are no longer prototypes
• Whatever the future is, WLCG must ensure long-term support (no external “firm”)
• Do not underestimate the amount of work needed for users to adapt
  • Therefore plan well ahead and gather functionality requirements…

Data Placement and Federation

• Does it imply replica catalogs are no longer needed?
• How is brokering of jobs done?
  • Random, or “where most files are”?
• When is data transfer performed?
  • By the WMS: implies a priori brokering, incompatible with pilot jobs
  • By the job: inefficient use of CPU slots
• Is it a cache (with what lifetime?) or just a download facility?
• What is the advantage compared to placement driven by popularity (sketched below)?
  • Limited number of “master replicas” (e.g. 2)
  • Add replicas when popularity increases
  • Remove replicas when popularity decreases
  • … but still with a catalog and job brokering
• What is the minimal number of sites for which it becomes worthwhile?

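As a concrete illustration of the popularity-driven alternative above, a minimal sketch follows. The catalogue, popularity and site services (catalogue.replicas, popularity.weekly_accesses, sites.least_loaded, …) are hypothetical placeholders, and the thresholds are invented for illustration only.

    MIN_MASTER_REPLICAS = 2    # "master" copies that are never removed
    MAX_REPLICAS = 8           # cap on extra copies for very hot datasets

    def target_replica_count(weekly_accesses):
        """Map recent popularity to a desired number of replicas."""
        extra = weekly_accesses // 100          # one extra copy per 100 accesses
        return max(MIN_MASTER_REPLICAS,
                   min(MIN_MASTER_REPLICAS + extra, MAX_REPLICAS))

    def rebalance(dataset, catalogue, popularity, sites):
        """Add or remove replicas so the count follows popularity, while the
        replica catalogue stays authoritative for job brokering."""
        current = catalogue.replicas(dataset)
        wanted = target_replica_count(popularity.weekly_accesses(dataset))
        if len(current) < wanted:
            for site in sites.least_loaded(exclude=current)[: wanted - len(current)]:
                catalogue.add_replica(dataset, site)   # triggers an async transfer
        elif len(current) > wanted:
            extras = [s for s in current if not catalogue.is_master(dataset, s)]
            for site in extras[: len(current) - wanted]:
                catalogue.remove_replica(dataset, site)
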
WAN protocols and FTS

• We need third-party transfer!
  • http third-party transfer? OK if commercial, but why support it ourselves?
• We need a transfer “batch” system!
  • Asynchronous bulk file transfer
  • Whatever reliable and efficient underlying protocol is used is just fine…
  • There is a need to define the service class where the file has to be put (or one service class per SE)
• What about the dedicated network (OPN)?
  • Does it require a service for using it?
• Not all bells and whistles may be necessary
• The real point for a user (experiment) is (see the sketch below):
  • Transfer this list of LFNs to this SE (SE = storage class at a site)
  • The actual physical source is irrelevant
  • The TS should discover whether there is an online replica and, if not, bring one online before making the transfer
  • Ideally it (or the SE) should register the new replica (keeping the catalog consistent)
• All this was already said in… Mumbai (February 2006)!
• FTS 3 was looking promising… why is it dead?

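A sketch of that user-facing request, assuming hypothetical catalogue and transfer-system (TS) objects; none of the calls below correspond to an existing FTS API.

    def transfer_lfns(lfns, target_se, catalogue, ts):
        """Asynchronous bulk transfer: 'move this list of LFNs to this SE'."""
        for lfn in lfns:
            replicas = catalogue.replicas(lfn)       # any physical source will do
            source = next((r for r in replicas if r.is_online()), None)
            if source is None:
                source = replicas[0]                 # tape-only: stage a copy first
                source.bring_online()
            job = ts.submit(source.turl(), target_se, lfn)
            # On success the TS (or the SE) registers the new replica, keeping
            # the catalogue consistent. The default argument pins the current lfn.
            job.on_success(lambda lfn=lfn: catalogue.add_replica(lfn, target_se))
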
Management of Catalogues and Namespaces

• See Data Placement…
• Do we need a replica catalog?
  • LHCb’s answer is YES: we want to be able to broker jobs
  • It may only contain information on the SEs (+ file metadata + usability flags)
• Do we need a catalog with URLs?
  • Not necessarily: the URL can be formed from the SE information and the LFN (a trivial catalog, sketched below), since SE information is quite static
• Do we need a single URL (used both for transfers and for protocol access)?
  • No problem as long as the access is transparent and fast
  • See the SRM slide for more comments…
• Namespace vs storage class?

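A minimal sketch of such a trivial catalog: the tURL is derived from static SE information plus the LFN, so no per-replica URL is stored. The SE entries and paths below are invented for illustration.

    SE_CONFIG = {
        # Static, VO-managed description of each SE (storage class at a site).
        "CERN-RAW": {"protocol": "root", "host": "castorlhcb.cern.ch",
                     "prefix": "/castor/cern.ch/grid"},
        "CNAF-DST": {"protocol": "file", "host": "",
                     "prefix": "/storage/gpfs/lhcb"},
    }

    def turl(se_name, lfn):
        """Form a tURL from the static SE description and the LFN."""
        se = SE_CONFIG[se_name]
        if se["protocol"] == "file":             # local POSIX access
            return "file:" + se["prefix"] + lfn
        return f"{se['protocol']}://{se['host']}/{se['prefix']}{lfn}"

    # turl("CERN-RAW", "/lhcb/data/2011/RAW/run12345.raw")
    #   -> "root://castorlhcb.cern.ch//castor/cern.ch/grid/lhcb/data/2011/RAW/run12345.raw"
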
Security and Access Control

• We MUST protect our data from deletion
  • LHCb doesn’t care so much about protecting it from access
• The current situation is UNACCEPTABLE
  • Anyone with a little knowledge (available on the web) can delete all files in Castor!
• VOMS (or equivalent) identification and authorisation is a MUST! What about ARGUS?
  • Identity and role
  • Currently in Castor we have only two (uid, gid)!
• Protection is done by the LFC, but all the backdoors are open
  • Backdoors should be closed (nsrm, stager_xxx active commands…)
• An explicit “delete” permission would be desirable (see the sketch below)
• A change of DN should be straightforward (not trivial, but OK in LFC and DPM)
  • Action from the VO administrator

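A toy sketch of that explicit delete permission, keyed on a VOMS identity and role rather than a single (uid, gid) pair. The ACL table, paths and role names are invented for illustration.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class VomsIdentity:
        dn: str      # certificate distinguished name
        role: str    # VOMS role, e.g. "lhcb_user" or "lhcb_prod"

    # Per-directory ACLs; deletion is a separate, opt-in permission.
    ACLS = {
        "/lhcb/data": {"read":   {"lhcb_user", "lhcb_prod"},
                       "write":  {"lhcb_prod"},
                       "delete": {"lhcb_data_manager"}},
    }

    def authorise(identity, path, action):
        """Deny by default; 'delete' is never implied by 'write'."""
        for prefix, acl in ACLS.items():
            if path.startswith(prefix):
                return identity.role in acl.get(action, set())
        return False

    user = VomsIdentity("/DC=ch/DC=cern/CN=some user", "lhcb_user")
    assert not authorise(user, "/lhcb/data/run1.raw", "delete")
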
Separation of disk and tape

• We need two (and only two) storage classes: T1D0 and T0D1
  • This is because no space (storage class) change is possible in some implementations of SRM
• T1D0 has two functions:
  • Archive of T0D1
  • Permanent storage of read-few data (RAW, RECO)
• For this the BringOnline functionality is mandatory (see the sketch below)
  • We need to access the data directly from the MSS without replicating it onto another storage
• Pinning is also a must (tape drive usage is suboptimal without it)
  • It also helps the garbage collector

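A minimal sketch of how a job could stage and pin a tape-resident file before reading it directly from a T1D0 SE, using the gfal2 Python bindings. Note that gfal2-python postdates the gfal discussed in these slides, and the exact call signatures below are an assumption to be checked against its documentation.

    import time
    import gfal2   # assumes the gfal2 Python bindings are installed

    def stage_and_pin(surl, pin_seconds=86400, timeout=3600):
        """Ask the SE to bring a tape-resident file online and pin it."""
        ctx = gfal2.creat_context()
        # Last argument True = asynchronous: queue the request, get a token back.
        status, token = ctx.bring_online(surl, pin_seconds, timeout, True)
        while status == 0:                       # 0 = still queued / staging
            time.sleep(30)
            status = ctx.bring_online_poll(surl, token)
        return token                             # needed later to release the pin

    def unpin(surl, token):
        """Release the pin once done, helping the garbage collector."""
        gfal2.creat_context().release(surl, token)
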
Storage Interfaces: SRM and Clouds

• Clouds???
• We need an interface for:
  • BringOnline
  • Pinning
  • Defining the storage class (unless different endpoints, i.e. different SEs, are used)
  • Currently (since Mumbai) this is done by gfal (what is its future?)
• SRM is far from perfect, but…
  • It provides the above
  • All efforts put into defining a standard were a miserable failure… don’t expect any other interface to be any better
• … but…
  • We could probably wrap the minimal functionality on top of the SE native interface, where available (see the sketch below)
  • BringOnline and pinning are not available for dCache except through SRM
• Can xroot provide this functionality?
  • Isn’t there a danger that it becomes as clumsy as SRM, depending on the implementation?

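A sketch of what such a minimal wrapper could look like: an abstract interface carrying only the three operations listed above, with per-SE adapters over whatever native interface exists. Everything here, class and method names included, is illustrative rather than an existing library.

    from abc import ABC, abstractmethod

    class MinimalStorageInterface(ABC):
        """Only what we actually need from SRM: staging, pinning, storage class."""

        @abstractmethod
        def bring_online(self, path):
            """Recall the file from tape to disk."""

        @abstractmethod
        def pin(self, path, seconds):
            """Keep the disk copy for a while; return a pin token."""

        @abstractmethod
        def storage_class(self, path):
            """Return 'T1D0' or 'T0D1'."""

    class XrootBackend(MinimalStorageInterface):
        """Adapter over a native xroot staging facility, if one exists; whether
        xroot can really provide this is the open question on this slide."""
        def bring_online(self, path):
            raise NotImplementedError("needs native staging support")
        def pin(self, path, seconds):
            raise NotImplementedError("needs native pinning support")
        def storage_class(self, path):
            return "T1D0"
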
Storage IO, LAN Protocols

• What is wrong with the POSIX file: protocol?
  • Very efficient in the StoRM-GPFS implementation used at CNAF
  • Of course abuse could be a danger (recursive ls), but this could be taken into account in the implementation (throttling)
  • Almost anybody can now write a FUSE plugin to make it happen, so why not use a powerful commercial protocol?
• Should access protocols be more than protocols?
  • i.e. interact behind the scenes with the MSS, discover the file location, etc.
• Can a tURL be just an access point?
  • <protocol>://diskserver.cern.ch//<path>
  • … or better, file:<path>
  • Avoid accepting URLs like “/castor/cern.ch/…” (see the sketch below)
  • This needs to be fixed in the application layer? No “guesses”?
• Do we need different URLs for different operations?
  • Transfer and POSIX-like access

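A small illustrative guard against such “guesses”: accept only fully qualified tURLs of the two forms above, and reject bare namespace paths. Purely a sketch of the application-layer check.

    from urllib.parse import urlparse

    def is_valid_turl(turl):
        """Accept <protocol>://host//<path> or file:<path>; reject bare paths."""
        parsed = urlparse(turl)
        if parsed.scheme == "file":              # local POSIX access
            return parsed.path.startswith("/")
        # Remote access needs an explicit protocol and a disk server host.
        return bool(parsed.scheme) and bool(parsed.netloc)

    assert is_valid_turl("root://diskserver.cern.ch//data/lhcb/file.dst")
    assert is_valid_turl("file:/storage/gpfs/lhcb/file.dst")
    assert not is_valid_turl("/castor/cern.ch/lhcb/file.dst")   # no guessing
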
Conclusion

• Whatever the future is:
  • Remember that we need a permanently running system: no disruption of service of more than 24 hours for a migration
  • Millions of replicas are in the current system and… it works (even if not optimally)
  • Any drastic change requires a lot of work on the user side (experiment frameworks)
  • Old and new systems must run in parallel
• Requirements may differ depending on the Computing Model of each experiment
  • 7 to 12 analysis centers (LHCb) is different from 50 to 70 centers
  • Solutions may not be universal, and complication may not be required