Data and Storage Evolution in Run 2



  1. Data and Storage Evolution in Run 2 Wahid Bhimji. Contributions / conversations / emails with many, e.g.: Brian Bockelman, Simone Campana, Philippe Charpentier, Fabrizio Furano, Vincent Garonne, Andrew Hanushevsky, Oliver Keeble, Sam Skipsey…

  2. Introduction • Already discussed some themes at the Copenhagen WLCG workshop: • Improve efficiency, flexibility, simplicity. • Interoperation with the wider 'big-data' world. • Try to cover slightly different ground here, under similar areas: • WLCG technologies: activities since then. • 'Wider world' technologies. • Caveats: • Not discussing networking. • Accepting some things as 'done' (on track), e.g. FTS3, commissioning of the xrootd federation, LFC migration. • Told to 'stimulate discussion': • This time discussion -> action: let's agree some things ;-)

  3. Outline • WLCG activities • Data federations / remote access • Operating at scale • Storage interfaces • SRM, WebDAV and xrootd • Benchmarking and I/O • Wider world • Storage hardware technology • Storage systems, databases • 'Data science' • Discussion items

  4. The LHC world

  5. Storage Interfaces: SRM • All WLCG experiments will allow non-SRM disk-only resources by or during Run 2. • CMS already claim this (and ALICE don't use SRM). • ATLAS are validating in the coming months (after the Rucio migration): use of WebDAV for deletion (a proto-service exists), FTS3 non-SRM transfers, and alternative namespace-based space reporting. • LHCb are "testing the possibility to bypass SRM for most of the usages except tape-staging… more work than anticipated… but for Run 2, hopefully this will be all solved and tested." • Any alternative must offer as stable and reliable a service as SRM. • Some sites also want the VO reservation / quota that SRM space tokens provide; an alternative should cover this (but it doesn't need to be user-definable like SRM). A sketch of WebDAV-based deletion follows below.
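As a concrete illustration of the WebDAV deletion path, here is a minimal C++ sketch with libcurl; the endpoint URL and grid-proxy path are hypothetical, and the real ATLAS proto-service sits behind Rucio rather than issuing raw requests like this.

    // Minimal sketch: deleting a replica over WebDAV instead of SRM.
    // The endpoint URL and proxy path are hypothetical placeholders.
    #include <curl/curl.h>
    #include <cstdio>

    int main() {
        curl_global_init(CURL_GLOBAL_DEFAULT);
        CURL* h = curl_easy_init();
        if (!h) return 1;

        // Hypothetical storage endpoint and file path.
        curl_easy_setopt(h, CURLOPT_URL,
                         "https://se.example.org/vo/data/file.root");
        // Authenticate with a grid proxy (location is an assumption).
        curl_easy_setopt(h, CURLOPT_SSLCERT, "/tmp/x509up_u1000");
        curl_easy_setopt(h, CURLOPT_SSLKEY, "/tmp/x509up_u1000");
        // HTTP DELETE is the WebDAV primitive used in place of SRM deletion.
        curl_easy_setopt(h, CURLOPT_CUSTOMREQUEST, "DELETE");

        CURLcode rc = curl_easy_perform(h);
        if (rc != CURLE_OK)
            fprintf(stderr, "delete failed: %s\n", curl_easy_strerror(rc));

        curl_easy_cleanup(h);
        curl_global_cleanup();
        return rc == CURLE_OK ? 0 : 1;
    }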

  6. Xrootd data federations • Xrootd-based data federation is in production. • All LHC experiments use it as a fallback to remote access (a sketch of the pattern follows below). • Need to incorporate the last sites… • Being tested at scale: ATLAS failover usage (12 weeks) example (R. Gardner). See the pre-GDB on data access and the SLAC federation workshop.
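The fallback pattern is simple from the client side. A minimal ROOT sketch, assuming hypothetical local and redirector URLs:

    // Try the local replica first; on failure, fall back to reading
    // remotely through a (hypothetical) federation redirector.
    #include "TFile.h"
    #include "TError.h"

    TFile* OpenWithFallback() {
        // Local storage element (assumed URL).
        TFile* f = TFile::Open("root://local-se.example.org//atlas/data/file.root");
        if (!f || f->IsZombie()) {
            delete f;
            Info("OpenWithFallback", "local open failed, trying the federation");
            // The global redirector locates another replica in the federation.
            f = TFile::Open("root://global-redirector.example.org//atlas/data/file.root");
        }
        return f;
    }

In production the experiments drive this from their frameworks (as in the ATLAS failover numbers above) rather than by hand.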

  7. Xrootd data federations • Monitoring is highly developed, but it is not quite at 100% coverage and could be more used… A. Beche, pre-GDB.

  8. Remote read and data federations at scale • Not all network links are perfect, and storage servers require tuning. E.g. ALICE experiences from the pre-GDB.

  9. Remote read at scale • Sharing between hungry VOs could be a challenge. Analysis jobs vary: CMS quote < 1 MB/s; ALICE average 2 MB/s; the ATLAS H->WW HammerCloud benchmark needs 20 MB/s to be 100% CPU-efficient (see the back-of-the-envelope sketch below). • Sites can use their own network infrastructure to protect themselves. VOs shouldn't try to micro-manage, but there is a strong desire for storage plugins (e.g. the xrootd throttling plugin). E.g. ATLAS H->WW being throttled by a 1 Gb NAT, with a corresponding decrease in event rate.
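To see why throttling matters, a back-of-the-envelope check using the per-job rate quoted above; the concurrent job count and link capacity are assumptions for illustration:

    // Aggregate WAN demand from many analysis jobs reading remotely.
    #include <cstdio>

    int main() {
        const double mbPerJob = 20.0;  // ATLAS H->WW benchmark, MB/s per job
        const int jobs = 100;          // assumed concurrent jobs at one site
        const double linkGbps = 10.0;  // assumed WAN link capacity

        const double demandGbps = mbPerJob * jobs * 8.0 / 1000.0; // MB/s -> Gb/s
        printf("demand %.0f Gb/s vs link %.0f Gb/s -> %s\n",
               demandGbps, linkGbps,
               demandGbps > linkGbps ? "saturated" : "ok");
        return 0;
    }

Just 100 such jobs demand 16 Gb/s, so a 1 Gb NAT (as in the ATLAS example) throttles them hard.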

  10. HTTP / WebDAV • DPM, dCache and StoRM all provide HTTP/WebDAV interfaces, so it will be universally available. • Monitoring: much is available (e.g. in Apache) but not currently in WLCG. • XrdHTTP is done (in Xrootd 4) and offers the potential for xrootd sites to have an HTTP interface. Fabrizio Furano, pre-GDB.

  11. HTTP/WebDAV: Experiments • CMS have no current plans. LHCb will use it if it is the best protocol at a site. • ATLAS plan to use WebDAV for: • user put/get; • deletion instead of SRM; • FTS or job reads, if best performing. • They find the deployment (despite being used for the Rucio rename) is not stable at 100%. Sylvain Blunier.

  12. Benchmarking and I/O • Continuing activity to understand (distributed) I/O, e.g. M. Tadel, Federated Storage Workshop. • Important developments in ROOT I/O, e.g.: • thread-safety (or "thread-usability"); • TTreeCache configurable with an environment variable (a sketch of explicit TTreeCache use follows below); • cross-protocol redirection. • ROOT 6 (cling / C++11) increases the possibilities. See the ROOT I/O Workshop.
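For reference, a minimal sketch of explicit TTreeCache use over the network; the file URL and tree name are assumptions, and the environment-variable route mentioned above achieves the same without code changes:

    // Enable TTreeCache so remote reads are grouped into vector requests.
    #include "TFile.h"
    #include "TTree.h"

    void ReadWithCache() {
        TFile* f = TFile::Open("root://redirector.example.org//path/file.root");
        if (!f || f->IsZombie()) return;

        TTree* t = nullptr;
        f->GetObject("events", t);  // "events" is an assumed tree name
        if (!t) return;

        t->SetCacheSize(30 * 1024 * 1024);  // 30 MB read cache
        t->AddBranchToCache("*", true);     // cache all branches
        for (Long64_t i = 0; i < t->GetEntries(); ++i)
            t->GetEntry(i);  // reads are now served from the cache in bulk

        delete f;
    }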

  13. The rest of the world

  14. Underlying Storage Technology • Technologies in use for Run 2 are already here or in development. • Magnetic disk: current technology delivers continuing increases in capacity (to 6 TB), with further potential for capacity (shingled recording, HAMR), but performance is not keeping pace. • Existing flash SSDs and hybrids. • NVRAM improvements (now really, really soon… (?)): memristor, phase-change memory. • These would be expensive for WLCG use (though not compared to RAM).

  15. Storage Systems • 'Cloud' (non-POSIX) scalable solutions • Algorithmic data placement. • RAIN fault tolerance becoming common / standard. • "Software-defined storage" • E.g. Ceph, HDFS + RAIN, ViPR (a librados sketch follows below). • WLCG sites are interested in using such technologies, and we should be flexible enough to use them.
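For flavour, a minimal sketch against Ceph's librados C++ API, where placement is computed algorithmically (CRUSH) rather than looked up in a namespace; the pool name and object id are assumptions:

    // Write a single object into a Ceph pool via librados.
    #include <rados/librados.hpp>
    #include <iostream>

    int main() {
        librados::Rados cluster;
        cluster.init("admin");            // connect as client.admin
        cluster.conf_read_file(nullptr);  // read the default ceph.conf
        if (cluster.connect() < 0) { std::cerr << "connect failed\n"; return 1; }

        librados::IoCtx io;
        cluster.ioctx_create("wlcg-test", io);  // assumed pool name

        librados::bufferlist bl;
        bl.append("hello object store");
        // No directory tree: the object name alone determines placement.
        io.write_full("run2/object0", bl);

        io.close();
        cluster.shutdown();
        return 0;
    }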

  16. Protocols, Databases • HTTP -> SPDY -> HTTP/2 • Session reuse (a sketch follows below) • Smaller headers • NoSQL -> NewSQL • Horizontally scalable • Main-memory • Example: the LSST qserv database (D. Boutigny, OSG Meeting, Apr 2014), which uses the xrootd protocol.
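Session reuse is easy to picture at the client level. A minimal libcurl sketch with placeholder URLs: reusing one handle keeps the TCP/TLS session open across requests, which is the overhead HTTP/2 attacks at the protocol level.

    // Fetch several files over one reused connection.
    #include <curl/curl.h>

    int main() {
        curl_global_init(CURL_GLOBAL_DEFAULT);
        CURL* h = curl_easy_init();
        if (!h) return 1;

        const char* urls[] = {
            "https://se.example.org/vo/data/f1.root",
            "https://se.example.org/vo/data/f2.root",
        };
        for (const char* u : urls) {
            curl_easy_setopt(h, CURLOPT_URL, u);
            // Same handle -> same connection: no new TCP/TLS handshake.
            curl_easy_perform(h);
        }
        curl_easy_cleanup(h);
        curl_global_cleanup();
        return 0;
    }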

  17. Data science • Explosion in industry interest. • Outside expertise in data science could help even the most confident science discipline (an ATLAS analysis is below 400th on the leaderboard now).

  18. Discussion

  19. Relaxing requirements… • For example, having an appropriate level of protection for data readability: • removing technical read protection would not change practical protection, as currently non-VO site admins can read the data anyway, and no one outside can interpret it; • storage developers should first demonstrate the gain (performance or simplification), and then we could push for this. • Similarly for other barriers towards, for example, object-store-like scaling and the integration of non-HEP resources…

  20. Summary and discussion/action points • Flexible / remote access: the remaining sites need to deploy xrootd (and HTTP for ATLAS). Use at scale will need greater use of monitoring, tuning and tools for protecting resources. • Protocol zoo: experiments must commit to reducing it in Run 2 (e.g. in 'return' for WebDAV / xrootd, remove rfio, SRM…). • Wider world: 'data science', databases, storage technologies. Convene (and attend) more outside-WLCG workshops to share. • Scalable resources: we should aim to be able to incorporate a disk site that has no WLCG-specific services / interfaces: • BDII, accounting, X509, perfSONAR, SRM, 'package reporter'.
