Data and Storage Evolution in Run 2. Wahid Bhimji. Contributions / conversations / emails with many, e.g.: Brian Bockelman, Simone Campana, Philippe Charpentier, Fabrizio Furano, Vincent Garonne, Andrew Hanushevsky, Oliver Keeble, Sam Skipsey …
Introduction • Already discussed some themes in the Copenhagen WLCG workshop: • Improve efficiency; flexibility; simplicity. • Interoperation with the wider 'big-data' world. • Try to cover slightly different ground here, under similar areas: • WLCG technologies: activities since then. • 'Wider world' technologies. • Caveats: • Not discussing networking. • Accepting some things as 'done' (on track) (e.g. FTS3, commissioning of the xrootd federation, LFC migration). • Told to 'stimulate discussion': • This time discussion -> action: let's agree some things ;-)
Outline • WLCG activities • Data federations / remote access • Operating at scale • Storage interfaces: SRM, WebDav and Xrootd • Benchmarking and I/O • Wider world • Storage hardware technology • Storage systems, databases • 'Data science' • Discussion items
Storage Interfaces: SRM • All WLCG experiments will allow non-SRM disk-only resources by or during Run 2. • CMS already claim this (and ALICE don't use SRM). • ATLAS validating in coming months (after the Rucio migration): use of WebDav for deletion (a proto-service exists); FTS3 non-SRM transfers; and alternative namespace-based space reporting. • LHCb: "testing the possibility to bypass SRM for most of the usages except tape-staging … more work than anticipated … but for Run 2, hopefully this will be all solved and tested." • Any alternative must offer as stable / reliable a service as SRM. • Also some sites desire the VO reservation / quota provided by SRM space tokens, which an alternative should cover (though it need not be user-definable like SRM). • A minimal sketch of SRM-free deletion over WebDav follows below.
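To make the deletion-over-WebDav idea concrete, here is a minimal sketch using Python requests; the endpoint URL, file path and CA directory are hypothetical / site-dependent assumptions, and the exact success code varies by storage implementation.

```python
import os
import requests

# A grid proxy carries both certificate and key in one PEM file.
proxy = os.environ.get("X509_USER_PROXY", "/tmp/x509up_u%d" % os.getuid())

# Hypothetical WebDav endpoint and file path, for illustration only.
url = "https://se.example.org:443/webdav/atlas/rucio/data/file.root"

# An HTTP DELETE replaces the SRM removal call; verify points at the
# grid CA certificate directory so the server cert can be validated.
resp = requests.delete(url, cert=proxy, verify="/etc/grid-security/certificates")
resp.raise_for_status()  # many WebDav servers return 204 No Content on success
```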
Xrootd data federations • Xrootd-based data federations are in production. • All LHC experiments use them as a fallback to remote access (a sketch of the failover pattern follows below). • Need to incorporate the last sites … • Being tested at scale: e.g. ATLAS failover usage over 12 weeks (R. Gardner). • See the pre-GDB on data access and the SLAC federation workshop.
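A minimal PyROOT sketch of that failover pattern: try the local replica first, and fall back to reading through a federation redirector if the local open fails. Both URLs are hypothetical placeholders; the real federations (e.g. FAX, AAA) have their own redirector endpoints.

```python
import ROOT

# Hypothetical local and federation URLs, for illustration only.
LOCAL = "root://local-se.example.org//atlas/data/file.root"
FEDERATION = "root://redirector.example.org//atlas/data/file.root"

f = ROOT.TFile.Open(LOCAL)
if not f or f.IsZombie():
    # Local open failed: ask the federation redirector to locate
    # another replica and read it over the WAN.
    f = ROOT.TFile.Open(FEDERATION)
```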
Xrootd data federations • Monitoring is highly developed, but not quite at 100% coverage, and could be used more … (A. Beche – pre-GDB)
Remote read and data federations at scale • Not all network links are perfect, and storage servers require tuning. E.g. ALICE experiences from the pre-GDB.
Remote read at scale • Sharing between hungry VOs could be a challenge. Analysis jobs vary: CMS quote < 1 MB/s; ALICE average 2 MB/s; the ATLAS H->WW hammercloud benchmark needs 20 MB/s to be 100% CPU-efficient (see the back-of-envelope sketch below). • Sites can use their own network infrastructure to protect themselves. VOs shouldn't try to micro-manage this, but there is a strong desire for storage plugins (e.g. the xrootd throttling plugin). • E.g. ATLAS H->WW being throttled by a 1 Gb NAT, with a corresponding decrease in event rate.
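A back-of-envelope sketch of why sharing matters, using the per-job rates quoted above; the 10 Gb/s link capacity is an illustrative assumption.

```python
# Per-job read rates (MB/s) quoted above.
PER_JOB_MBPS = {"CMS analysis": 1.0, "ALICE average": 2.0, "ATLAS H->WW": 20.0}

LINK_MBPS = 10e9 / 8 / 1e6  # an illustrative 10 Gb/s link = 1250 MB/s

for job, rate in PER_JOB_MBPS.items():
    jobs = int(LINK_MBPS // rate)
    print(f"{job}: ~{jobs} concurrent jobs saturate a 10 Gb/s link")
```

So a few tens of I/O-hungry ATLAS-style jobs fill a link that a thousand CMS-style jobs would share comfortably.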
HTTP / WebDav • DPM, dCache and StoRM all provide it, so it will be universally available. • Monitoring: much is available (e.g. in Apache) but not currently in WLCG. • XrdHTTP is done (in Xrootd 4), offering the potential for xrootd sites to have an http interface. (Fabrizio Furano – pre-GDB)
Http / WebDav: Experiments • CMS: no current plans. LHCb: will use it if it is the best protocol at a site. • ATLAS plan use of WebDav for: • User put/get (sketched below). • Deletion instead of SRM. • FTS or job reads, if best performing. • Deployment found (despite being used for the Rucio rename) to be not stably at 100%. (Sylvain Blunier)
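A minimal sketch of the user put/get case over WebDav, with the same proxy-based setup as the deletion example above; the endpoint, remote path and local file names are hypothetical.

```python
import requests

# Hypothetical endpoint and user path, for illustration only.
url = "https://se.example.org:443/webdav/atlas/user/someuser/hist.root"
cert = "/tmp/x509up_u1000"                  # grid proxy (cert + key in one file)
capath = "/etc/grid-security/certificates"  # grid CA directory

# Put: upload a local file with HTTP PUT.
with open("hist.root", "rb") as src:
    requests.put(url, data=src, cert=cert, verify=capath).raise_for_status()

# Get: read it back with HTTP GET.
resp = requests.get(url, cert=cert, verify=capath)
resp.raise_for_status()
with open("hist_copy.root", "wb") as dst:
    dst.write(resp.content)
```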
Benchmarking and I/O • Continuing activity to understand (distributed) I/O. E.g. M. Tadel – Federated Storage Workshop. • Important developments in ROOT I/O, e.g.: • Thread-safety (or "thread-usability"). • TTreeCache configurable with an environment variable (see the sketch below). • Cross-protocol redirection. • ROOT 6 (cling / C++11) increases the possibilities. See the ROOT I/O Workshop.
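For reference, a minimal PyROOT sketch of enabling the TTreeCache explicitly; SetCacheSize, AddBranchToCache and SetCacheLearnEntries are long-standing TTree calls underlying the behaviour the environment-variable convenience switches on. The file URL and tree name are hypothetical.

```python
import ROOT

# Hypothetical remote file and tree name, for illustration only.
f = ROOT.TFile.Open("root://redirector.example.org//atlas/data/file.root")
t = f.Get("events")

t.SetCacheSize(30 * 1024 * 1024)   # 30 MB read-ahead cache
t.AddBranchToCache("*", True)      # cache all branches (and sub-branches)
t.SetCacheLearnEntries(100)        # learn the access pattern on 100 entries

for i in range(t.GetEntries()):
    t.GetEntry(i)  # reads now arrive as few large vectored requests
```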
Underlying Storage Technology • Technologies in use for Run 2 are already here or in development. • Magnetic disk: current increases in capacity (to 6 TB) use current technology; further capacity potential (shingles, HAMR), but performance is not keeping pace. • Existing flash SSDs and hybrids. • NVRAM improvements (really soon now … (?)). • Would be expensive for WLCG use (though not compared to RAM). [Images: memristor; phase-change memory]
Storage Systems • 'Cloud' (non-POSIX) scalable solutions: • Algorithmic data placement (toy sketch below). • RAIN fault tolerance becoming common / standard. • "Software-defined storage". • E.g. Ceph, HDFS + RAIN, ViPR. • WLCG sites are interested in using such technologies, and we should be flexible enough to use them.
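A toy illustration of the algorithmic-placement idea behind Ceph-like systems: replica locations are computed deterministically from the object name on any client, with no central namespace lookup. This is a deliberate simplification, not CRUSH or any system's real algorithm, and the OSD names are made up.

```python
import hashlib

# Hypothetical pool of storage daemons.
OSDS = ["osd.%d" % i for i in range(12)]

def place(obj_name, replicas=3):
    """Deterministically map an object to `replicas` distinct OSDs."""
    h = int(hashlib.md5(obj_name.encode()).hexdigest(), 16)
    start = h % len(OSDS)
    return [OSDS[(start + i) % len(OSDS)] for i in range(replicas)]

# Every client computes the same placement, with no metadata server.
print(place("atlas/data15/file.root"))
```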
Protocols, Databases • Http -> SPDY -> Http/2: • Session reuse (sketched below). • Smaller headers. • NoSQL -> NewSQL: • Horizontally scalable. • Main-memory. • E.g. the LSST qserv database, built on the xrootd protocol (D. Boutigny, OSG Meeting, Apr 2014).
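To illustrate the session-reuse point with today's tools: a Python requests.Session keeps the underlying TCP/TLS connection open across transfers, so each fetch avoids a fresh handshake (part of what HTTP/2 then improves further). The endpoint and file list are hypothetical.

```python
import requests

# Hypothetical list of files behind one WebDav endpoint.
urls = ["https://se.example.org/webdav/atlas/f%d.root" % i for i in range(100)]

with requests.Session() as s:  # one connection pool, many requests
    for u in urls:
        s.get(u)               # reuses the TCP/TLS session per host
```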
Data science • Explosion in industry interest. • Outside expertise in data science could help even the most confident science discipline (ATLAS analysis is < 400th on the leaderboard now).
Relaxing requirements … • For example, having an appropriate level of protection for data readability: • Removing technical read protection would not change practical protection, as currently non-VO site admins can read the data, and no one else can interpret it anyway. • Storage developers should first demonstrate the gain (performance or simplification), and then we could push this. • Similarly for other barriers towards, for example, object-store-like scaling and integration of non-HEP resources …
Summary and discussion / action points • Flexible / remote access: remaining sites need to deploy xrootd (and http for ATLAS). Use at scale will need greater use of monitoring, tuning and tools for protecting resources. • Protocol zoo: experiments must commit to reducing it in Run 2 (e.g. in 'return' for dav / xrootd, remove rfio, srm …). • Wider world: 'data science', databases, storage technologies. Convene (and attend) more outside-WLCG workshops to share. • Scalable resources: we should aim to be able to incorporate a disk site that has no WLCG-specific services / interfaces: • BDII, accounting, X509, perfSONAR, SRM, 'package reporter'.