
Standard Interfaces to Grid Storage: DPM and LFC Update

Presentation Transcript


  1. Standard Interfaces to Grid Storage: DPM and LFC Update Ricardo Rocha, Alejandro Alvarez (on behalf of the LCGDM team) EMI INFSO-RI-261611

  2. Main Goals • Provide a lightweight, grid-aware storage solution • Simplify life of users and administrators • Improve the feature set and performance • Use standard protocols • Use standard building blocks • Allow easy integration with new tools and systems 2

  3. Architecture (Reminder) [Diagram: the client talks to the head node (NS and DPM daemons) for metadata and directly to the disk nodes for data; head node and disk nodes expose NFS, HTTP/DAV, XROOT and GRIDFTP, with RFIO on the disk nodes] • Separation between metadata and data access • Direct data access to/from Disk Nodes • Strong authentication / authorization • Multiple access protocols 3

  4. Deployment & Usage • DPM is the most widely deployed grid storage system • Over 200 sites in 50 regions • Over 300 VOs • ~36 PB (10 sites with > 1PB) • LFC enjoys wide deployment too • 58 instances at 48 sites • Over 300 VOs 4

  6. Software Availability • DPM and LFC are available via gLite and EMI • But gLite releases stopped at 1.8.2 • From 1.8.3 on we’re Fedora compliant for all components, available in multiple repositories • EMI1, EMI2 and UMD • Fedora / EPEL • Latest production release is 1.8.4 • Some packages will never make it into Fedora • YAIM (Puppet? See later…) • Oracle backend 6

  8. System Evaluation

  9. System Evaluation • ~1.5 years ago we performed a full system evaluation • Using PerfSuite, our testing framework • https://svnweb.cern.ch/trac/lcgdm/wiki/Dpm/Admin/Performance • ( results presented later are obtained using this framework too ) • It showed the system had serious bottlenecks • Performance • Code maintenance (and complexity) • Extensibility 9

  10. Dependency on NS/DPM daemons • All calls to the system had to go via the daemons • Not only user / client calls • Also the case for our frontends (HTTP/DAV, NFS, XROOT, …) • Daemons were a bottleneck, and did not scale well • Short term fix (available since 1.8.2) • Improve TCP listening queue settings to prevent timeouts • Increase number of threads in the daemon pools • Previously statically defined to a rather low value • Medium term (available since 1.8.4, with DMLite) • Refactor the daemon code into a library 10
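
The two short-term fixes above are generic server tuning: a deeper TCP accept backlog so bursts of clients are not timed out before being accepted, and a larger worker pool so requests do not queue behind a few threads. A minimal Python sketch of the pattern, purely illustrative and not the actual DPNS/DPM daemon code (port and sizes are arbitrary):

```python
# Illustrative sketch of the tuning described above: deep listen backlog plus a
# sizeable worker pool. Not the real daemon code; port and sizes are arbitrary.
import socket
from concurrent.futures import ThreadPoolExecutor

BACKLOG = 512      # listen queue depth (the kernel may cap it at net.core.somaxconn)
POOL_SIZE = 100    # worker threads, instead of a small statically defined pool

def handle(conn, addr):
    # Placeholder for real request handling: echo and close.
    with conn:
        data = conn.recv(4096)
        if data:
            conn.sendall(data)

def serve(host="0.0.0.0", port=9000):
    pool = ThreadPoolExecutor(max_workers=POOL_SIZE)
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind((host, port))
        srv.listen(BACKLOG)   # deeper queue -> fewer connection timeouts under bursts
        while True:
            conn, addr = srv.accept()
            pool.submit(handle, conn, addr)

if __name__ == "__main__":
    serve()
```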

  12. GET asynchronous performance • DPM used to mandate asynchronous GET calls • Introduces significant client latency • Useful when some preparation of the replica is needed • But this wasn’t really our case (disk only) • Fix (available with 1.8.3) • Allow synchronous GET requests 12

  14. Database Access • No DB connection pooling, no bind variables  • DB connections were linked to daemon pool threads • DB connections would be kept for the whole life of the client • Quicker fix • Add DB connection pooling to the old daemons • Good numbers, but needs extensive testing… • Medium term fix (available since 1.8.4 for HTTP/DAV) • DMLite, which includes connection pooling • Among many other things… 14
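
To make the two missing pieces concrete, here is a hedged sketch using SQLAlchemy: a shared connection pool instead of one connection per daemon thread, and bound parameters instead of string-built SQL. The DSN, table and column names are invented for the example and are not the real DPM/LFC schema.

```python
# Sketch: connection pooling + bind variables, the two items missing in the old daemons.
# The DSN, table and column names are illustrative, not the real DPM/LFC schema.
from sqlalchemy import create_engine, text

# A pool of reusable connections instead of one dedicated connection per client thread
engine = create_engine(
    "mysql+pymysql://dpm:secret@dbhost/dpm_db",
    pool_size=10,        # connections kept open and shared
    max_overflow=20,     # extra connections allowed under load
    pool_recycle=3600,   # recycle connections to avoid stale ones
)

def get_replicas(lfn):
    # Bind variable (:lfn) instead of string concatenation: the statement is
    # parsed once by the server and the value is passed separately.
    with engine.connect() as conn:
        rows = conn.execute(
            text("SELECT host, sfn FROM replicas WHERE lfn = :lfn"),
            {"lfn": lfn},
        )
        return [dict(row._mapping) for row in rows]
```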

  16. Dependency on the SRM • SRM imposes significant latency for data access • It has its use cases, but is a killer for regular file access • For data access, only required for protocols not supporting redirection (file name to replica translation) • Fix (all available from 1.8.4) • Keep SRM for space management only (usage, reports, …) • Add support for protocols natively supporting redirection • HTTP/DAV, NFS 4.1/pNFS, XROOT • And promote them widely… • Investigating GridFTP redirection support (seems possible!) 16
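
The practical meaning of "natively supporting redirection" is that the name-to-replica translation happens inside the protocol itself: the head node answers with an HTTP 302 pointing at the disk node holding a replica, so no SRM round trip is needed. A hedged sketch with the Python requests library; the host name, path and proxy location are hypothetical.

```python
# Sketch: fetching a file over the HTTP/DAV frontend, letting the head node
# redirect the client to the disk node that holds a replica.
# Host name, path and proxy location are hypothetical.
import requests

url = "https://dpmhead.example.org/dpm/example.org/home/myvo/user/file.root"

resp = requests.get(
    url,
    cert="/tmp/x509up_u1000",                   # grid proxy used for authentication
    verify="/etc/grid-security/certificates",   # CA directory
    allow_redirects=True,                       # follow the 302 to the disk node
    stream=True,
)
resp.raise_for_status()

with open("file.root", "wb") as out:
    for chunk in resp.iter_content(chunk_size=1 << 20):
        out.write(chunk)

# resp.history holds the redirect chain: the first entry is the head node's 302.
print([r.status_code for r in resp.history], resp.url)
```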

  17. Other recent activities… • ATLAS LFC consolidation effort • There was an instance at each T1 center • They have all been merged into a single CERN one • Similar effort was done at BNL for the US sites • HTTP Based Federations • Lots of work in this area too… • But there will be a separate forum event on this topic • Puppet and Nagios for easy system administration • Available since 1.8.3 for DPM • Working on new manifests for DMLite based setups • And adding LFC manifests, to be tested at CERN 17

  19. Future Proof with DMLite

  20. Future Proof with DMLite • DMLite is our new plugin based library • Meets goals resulting from the system evaluation • Refactoring of the existing code • Single library used by all frontends • Extensible, open to external contributions • Easy integration of standard building blocks • Apache2, HDFS, S3, … https://svnweb.cern.ch/trac/lcgdm/wiki/Dpm/Dev/Dmlite 20

  22. DMLite: Interfaces [Interfaces by domain: User domain: UserGroupDb; Namespace domain: Catalog, INode; Pool domain: PoolManager, PoolDriver, PoolHandler; I/O domain: IODriver, IOHandler] • Plugins implement one or multiple interfaces • Depending on the functionality they provide • Plugins are stacked, and called LIFO • You can load multiple plugins for the same functionality • APIs in C/C++/Python, plugins in C++ (Python soon) 22
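
To illustrate the stacking rule ("stacked, and called LIFO"), here is a small, purely conceptual Python sketch: the plugin loaded last is asked first and can delegate to the plugin below it. This is not the real DMLite API (real plugins are written in C++ and loaded through the plugin manager); all class names are invented.

```python
# Illustrative sketch of LIFO plugin stacking -- not the real DMLite classes.
class Catalog:
    """Namespace-domain interface (subset, invented for the example)."""
    def stat(self, path):
        raise NotImplementedError

class LegacyCatalog(Catalog):
    """Stand-in for a plugin that talks to the old daemons."""
    def stat(self, path):
        return {"path": path, "source": "legacy daemon"}

class CachingCatalog(Catalog):
    """Stand-in for a caching plugin: answers from memory, else delegates down."""
    def __init__(self, lower):
        self.lower = lower
        self.cache = {}
    def stat(self, path):
        if path not in self.cache:
            self.cache[path] = self.lower.stat(path)   # delegate to the plugin below
        return self.cache[path]

# Plugins loaded in order; the stack is used last-in-first-out.
stack = LegacyCatalog()
stack = CachingCatalog(stack)

print(stack.stat("/dpm/example.org/home/myvo/file"))   # cache first, legacy as fallback
```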

  23. DMLite Plugin: Legacy • Interacts directly with the DPNS/DPM/LFC daemons • Simply redirects calls using the existing NS and DPM APIs • For full backward compatibility • Both for namespace and pool/filesystem management 23

  24. DMLite Plugin: MySQL • Refactoring of the MySQL backend • Properly using bind variables and connection pooling • Huge performance improvements • Namespace traversal comes from the built-in Catalog • Proper stack setup provides fallback to the Legacy plugin 24

  25. DMLite Plugin: Oracle • Refactoring of the Oracle backend • What applies to the MySQL one applies here • Better performance with bind variables and pooling • Namespace traversal comes from the built-in Catalog • Proper stack setup provides fallback to the Legacy plugin 25

  26. DMLite Plugin: Memcache • Memory cache for namespace requests • Reduced load on the database • Much improved response times • Horizontal scalability • Can be put on top of any other Catalog implementation 26
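
The behaviour described here is the classic cache-aside pattern. A hedged sketch using the pymemcache client against a local memcached; the key layout and the stat payload are invented, and this is not the plugin's actual code (the real plugin is a C++ DMLite module).

```python
# Sketch: caching namespace lookups in memcached (cache-aside pattern).
# Key layout and payload are illustrative only.
import json
from pymemcache.client.base import Client

mc = Client(("localhost", 11211))

def stat_from_db(path):
    # Placeholder for the real database lookup.
    return {"path": path, "size": 1024, "mode": 0o644}

def cached_stat(path, ttl=300):
    key = "dpm:stat:" + path.replace(" ", "_")
    hit = mc.get(key)
    if hit is not None:
        return json.loads(hit)                    # served from memory, no DB round trip
    value = stat_from_db(path)
    mc.set(key, json.dumps(value), expire=ttl)    # populate the cache for later callers
    return value

print(cached_stat("/dpm/example.org/home/myvo/file"))
```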

  28. DMLite Plugin: Hadoop/HDFS • First new pool type • HDFS pools can coexist with legacy pools, … • In the same namespace, transparent to frontends • All HDFS goodies for free (automatic data replication, …) • Catalog interface coming soon • Exposing the HDFS namespace directly to the frontends 28

  29. DMLite Plugin: S3 • Second new pool type • Again, can coexist with legacy pools, HDFS, … • In the same namespace, transparent to frontends • Main goal is to provide additional, temporary storage • High load periods, user analysis before big conferences, … • Evaluated against Amazon, now looking at Huawei and OpenStack 29
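
One common way for an S3-backed pool to hand data access straight to the object store is a pre-signed URL, so the client never needs S3 credentials. The sketch below uses boto3 to show that mechanism only; the slide does not state that the DMLite S3 plugin works exactly this way, and the bucket and key names are invented.

```python
# Sketch: generating a time-limited pre-signed URL for an object in S3.
# Shows the general mechanism an S3-backed pool could rely on; bucket and key
# names are invented and this is not the DMLite plugin's actual code.
import boto3

s3 = boto3.client("s3")   # credentials come from the environment / config files

def presigned_get(bucket, key, expires=3600):
    """Return a URL that grants read access to one object for `expires` seconds."""
    return s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires,
    )

url = presigned_get("myvo-temporary-pool", "user/analysis/file.root")
print(url)   # any plain HTTP client can fetch the object until the URL expires
```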

  30. DMLite Plugin: VFS • Third new pool type (currently in development) • Exposes any mountable filesystem • As an additional pool in an existing namespace • Or directly exposing that namespace • Think Lustre, GPFS, … 30

  31. DMLite Plugins: Even more… • Librarian • Replica failover and retrial • Used by the HTTP/DAV frontend for a Global Access Service • Profiler • Boosted logging capabilities • For every single call, logs response times per plugin • HTTP based federations • ATLAS Distributed Data Management (DDM) • First external plugin • Currently under development • Will expose central catalogs via standard protocols • Writing plugins is very easy… 31

  32. DMLite Plugins: Development 32 http://dmlite.web.cern.ch/dmlite/trunk/doc/tutorial/index.html

  34. DMLite: Demo Time

  35. Standard Frontends

  36. Standard Frontends • Standards based access to DPM is already available • HTTP/DAV and NFS4.1 / pNFS • But also XROOT (useful in the HEP context) • Lots of recent work to make them performant • Many details were already presented • But we needed numbers to show they are a viable alternative • We now have those numbers! 36

  37. Frontends: HTTP / DAV https://svnweb.cern.ch/trac/lcgdm/wiki/Dpm/WebDAV • Frontend based on Apache2 + mod_dav • In production since 1.8.3 • Working with PES on deployment in front of the CERN LFC too • Can be used both for get/put style transfers (as with GridFTP) and for direct access • Some extras for full GridFTP equivalence • Multiple streams with Range/Content-Range • Third party copies using WebDAV COPY + GridSite delegation • Random I/O • Possible to do vector reads and other optimizations • Metalink support (failover, retrial) • With 1.8.4 it is already DMLite based 37
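
The "multiple streams with Range/Content-Range" bullet is plain HTTP: each stream requests a different byte range of the same file and the pieces are written into place locally. A hedged sketch with the requests library; the URL and proxy path are hypothetical.

```python
# Sketch: downloading one file over HTTP/DAV with several parallel byte-range streams.
# URL and proxy path are hypothetical.
from concurrent.futures import ThreadPoolExecutor
import requests

URL = "https://dpmhead.example.org/dpm/example.org/home/myvo/user/big.root"
PROXY = "/tmp/x509up_u1000"
CA = "/etc/grid-security/certificates"
STREAMS = 4

def fetch_range(start, end):
    headers = {"Range": "bytes=%d-%d" % (start, end)}
    r = requests.get(URL, headers=headers, cert=PROXY, verify=CA)
    r.raise_for_status()              # expect 206 Partial Content
    return start, r.content

# Total size from a HEAD request, then split into roughly equal ranges.
size = int(requests.head(URL, cert=PROXY, verify=CA,
                         allow_redirects=True).headers["Content-Length"])
step = size // STREAMS
ranges = [(i * step, size - 1 if i == STREAMS - 1 else (i + 1) * step - 1)
          for i in range(STREAMS)]

with ThreadPoolExecutor(max_workers=STREAMS) as pool, open("big.root", "wb") as out:
    out.truncate(size)
    for start, data in pool.map(lambda r: fetch_range(*r), ranges):
        out.seek(start)               # stitch each range into its place in the file
        out.write(data)
```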

  39. Frontends: NFS 4.1 / pNFS https://svnweb.cern.ch/trac/lcgdm/wiki/Dpm/NFS41 • Direct access to the data, with a standard NFS client • Available with DPM 1.8.3 (read only) • Write enabled early next year • Not yet based on DMLite • Implemented as a plugin to the Ganesha server • Only Kerberos authentication for now • Issue with client X509 support in Linux (server ready though) • We're investigating how to add this • DMLite based version in development 39
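
Because the frontend speaks standard NFS 4.1, access is ordinary POSIX I/O once the export is mounted. A minimal sketch follows; the mount point, export path and Kerberos setup are hypothetical, and with 1.8.3 the mount is read-only as noted above.

```python
# Sketch: reading a file through the NFS 4.1 / pNFS frontend with ordinary POSIX I/O.
# Assumes the export is already mounted, e.g. (hypothetical host and path):
#   mount -t nfs4 -o minorversion=1,sec=krb5 dpmhead.example.org:/dpm /mnt/dpm
# and that the user holds a valid Kerberos ticket.
import os

path = "/mnt/dpm/example.org/home/myvo/user/file.root"

st = os.stat(path)                     # namespace operation served by the head node
print("size:", st.st_size)

with open(path, "rb") as f:            # data flows from the disk node (pNFS layout)
    header = f.read(1024)              # plain reads, no grid client library needed
    print("first bytes:", header[:16].hex())
```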

  41. Frontends: XROOTD https://svnweb.cern.ch/trac/lcgdm/wiki/Dpm/Xroot • Not really a standard, but widely used in HEP • Initial implementation in 2006 • No multi-vo support, limited authz, performance issues • New version 3.1 (rewrite) available with 1.8.4 • Multi VO support • Federation aware (already in use in ATLAS FAX federation) • Strong auth/authz with X509, but ALICE token still available • Based on the standard XROOTD server • Plugins for XrdOss, XrdCmsClient and XrdAccAuthorize • Soon also based on DMLite (version 3.2) 41
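
For comparison with the HTTP example earlier, a hedged sketch of access through the xrootd frontend using the standard XRootD Python bindings; the host, port and path are hypothetical.

```python
# Sketch: stat and read through the xrootd frontend with the XRootD Python bindings.
# Host, port and path are hypothetical.
from XRootD import client
from XRootD.client.flags import OpenFlags

fs = client.FileSystem("root://dpmhead.example.org:1094")
status, info = fs.stat("/dpm/example.org/home/myvo/user/file.root")
if status.ok:
    print("size:", info.size)

f = client.File()
status, _ = f.open(
    "root://dpmhead.example.org:1094//dpm/example.org/home/myvo/user/file.root",
    OpenFlags.READ)
if status.ok:
    status, data = f.read(offset=0, size=1024)   # server redirects to the disk node
    print("first bytes:", data[:16])
    f.close()
```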

  42. Frontends: Random I/O Performance • HTTP/DAV vs XROOTD vs RFIO • Soon adding NFS 4.1 / pNFS to the comparison 42

  45. Frontends: Hammercloud • We’ve also run a set of Hammercloud tests • Using ATLAS analysis jobs • These are only a few of all the metrics we have 45

  46. Performance: Showdown • Big thanks to ShuTing and ASGC • For doing a lot of the testing and providing the infrastructure • First recommendation is to phase out RFIO • No more development effort on it from our side • HTTP vs XROOTD • Performance is equivalent, up to sites/users to decide • But we like standards… there's a lot to gain with them • Staging vs Direct Access • Staging is not ideal… it requires lots of extra space on the WN • Direct Access is performant if used with ROOT TTreeCache 46
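
The last bullet refers to ROOT's TTreeCache, which groups basket reads into a few large vector reads and makes remote direct access efficient. A hedged PyROOT sketch; the file URL, tree name and cache size are hypothetical.

```python
# Sketch: enabling TTreeCache for direct remote access from ROOT (PyROOT).
# The URL, tree name and cache size are hypothetical.
import ROOT

f = ROOT.TFile.Open(
    "root://dpmhead.example.org//dpm/example.org/home/myvo/user/file.root")
tree = f.Get("events")

tree.SetCacheSize(30 * 1024 * 1024)   # 30 MB TTreeCache
tree.AddBranchToCache("*", True)      # cache all branches read by the loop below

for i in range(tree.GetEntries()):
    tree.GetEntry(i)                  # baskets arrive in a few large vector reads

f.Close()
```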

  47. Summary & Outlook • DPM and LFC are in very good shape • Even more lightweight, much easier code maintenance • Open, extensible to new technologies and contributions • DMLite is our new core library • Standards, standards, … • Protocols and building blocks • Deployment and monitoring • Reduced maintenance, free clients, community help • DPM Community Workshops • Paris 03-04 December 2012 • Taiwan 17 March 2013 (ISGC workshop) • A DPM Collaboration is being set up 47
