Standard Interfaces to Grid Storage: DPM and LFC Update
Ricardo Rocha, Alejandro Alvarez (on behalf of the LCGDM team)
EMI INFSO-RI-261611
Main Goals
• Provide a lightweight, grid-aware storage solution
• Simplify life of users and administrators
• Improve the feature set and performance
• Use standard protocols
• Use standard building blocks
• Allow easy integration with new tools and systems
Architecture (Reminder)
[Architecture diagram: the client talks to the head node (DPM and NS daemons) for metadata and directly to the disk nodes for data, over NFS, HTTP/DAV, XROOT and GridFTP, plus RFIO on the disk nodes]
• Separation between metadata and data access
• Direct data access to/from Disk Nodes
• Strong authentication / authorization
• Multiple access protocols (a redirect sketch follows below)
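To make the metadata/data separation concrete, here is a minimal sketch of a client following a head-node redirect to a disk node over HTTP. Host names, paths and certificate files are hypothetical, and this is plain `requests` usage rather than any DPM client tool.

```python
# Minimal sketch of direct data access behind a head-node redirect.
# Host names, paths and credential files are hypothetical.
import requests

HEAD_NODE = "https://dpmhead.example.org"
PATH = "/dpm/example.org/home/myvo/somefile.root"
CERT = ("usercert.pem", "userkey.pem")

# Metadata request: answered by the head node itself.
meta = requests.request("PROPFIND", HEAD_NODE + PATH,
                        headers={"Depth": "0"}, cert=CERT)
print(meta.status_code)

# Data request: the head node redirects to a disk node, so the payload
# never flows through the head node.
data = requests.get(HEAD_NODE + PATH, cert=CERT, allow_redirects=True)
if data.history:
    print("redirected to", data.history[0].headers.get("Location"))
print(len(data.content), "bytes read")
```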
Deployment & Usage
• DPM is the most widely deployed grid storage system
• Over 200 sites in 50 regions
• Over 300 VOs
• ~36 PB (10 sites with > 1 PB)
• LFC enjoys wide deployment too
• 58 instances at 48 sites
• Over 300 VOs
Software Availability
• DPM and LFC are available via gLite and EMI
• But gLite releases stopped at 1.8.2
• From 1.8.3 on we're Fedora compliant for all components, available in multiple repositories
• EMI1, EMI2 and UMD
• Fedora / EPEL
• Latest production release is 1.8.4
• Some packages will never make it into Fedora
• YAIM (Puppet? See later…)
• Oracle backend
System Evaluation
• ~1.5 years ago we performed a full system evaluation
• Using PerfSuite, our testing framework
• https://svnweb.cern.ch/trac/lcgdm/wiki/Dpm/Admin/Performance
• (results presented later are obtained using this framework too)
• It showed the system had serious bottlenecks
• Performance
• Code maintenance (and complexity)
• Extensibility
Dependency on NS/DPM daemons
• All calls to the system had to go via the daemons
• Not only user / client calls
• Also the case for our new frontends (HTTP/DAV, NFS, XROOT, …)
• Daemons were a bottleneck, and did not scale well
• Short term fix (available since 1.8.2), sketched below
• Improve TCP listening queue settings to prevent timeouts
• Increase number of threads in the daemon pools
• Previously statically defined to a rather low value
• Medium term (available since 1.8.4, with DMLite)
• Refactor the daemon code into a library
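A minimal sketch of the two short-term knobs on a generic Python TCP service (not the actual DPNS/DPM daemon code): a deeper listen backlog so connection bursts queue instead of timing out, and a configurable worker pool instead of a small static thread count. All values are illustrative.

```python
# Illustrative only: a generic TCP service showing the two tuning knobs,
# not the actual DPNS/DPM daemon code.
import socket
from concurrent.futures import ThreadPoolExecutor

LISTEN_BACKLOG = 512   # deeper accept queue: connection bursts queue instead of timing out
POOL_THREADS = 100     # worker pool size, configurable instead of a small static value

def handle(conn, addr):
    with conn:
        conn.sendall(b"hello %s\n" % str(addr).encode())

def serve(host="0.0.0.0", port=5015):
    pool = ThreadPoolExecutor(max_workers=POOL_THREADS)
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind((host, port))
        srv.listen(LISTEN_BACKLOG)           # the listening-queue setting
        while True:
            conn, addr = srv.accept()
            pool.submit(handle, conn, addr)  # handled by the pool, not one thread per client

if __name__ == "__main__":
    serve()
```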
GET asynchronous performance
• DPM used to mandate asynchronous GET calls
• Introduces significant client latency
• Useful when some preparation of the replica is needed
• But this wasn't really our case (disk only)
• Fix (available with 1.8.3)
• Allow synchronous GET requests (the two styles are compared in the sketch below)
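To show why the asynchronous pattern costs latency, here is a toy comparison of the two styles. The `_srm_*` functions are local stubs that only simulate head-node round trips; they are not a real SRM client API, and the timings are made up.

```python
# Toy comparison of asynchronous vs synchronous GET. The _srm_* functions
# are local stubs simulating head-node round trips, not a real SRM client.
import time

ROUND_TRIP = 0.05                       # pretend each head-node call costs 50 ms
_polls = {"count": 0}

def _srm_prepare_to_get(surl):
    time.sleep(ROUND_TRIP)
    return "request-token"

def _srm_status(token):
    time.sleep(ROUND_TRIP)
    _polls["count"] += 1
    return "READY" if _polls["count"] >= 3 else "QUEUED"   # ready after a few polls

def _srm_turl(token):
    time.sleep(ROUND_TRIP)
    return "https://disknode.example.org/data/file"

def get_turl_async(surl):
    """Old mandatory style: submit a request, poll until ready, then fetch the TURL."""
    token = _srm_prepare_to_get(surl)
    while _srm_status(token) != "READY":
        time.sleep(0.1)                 # client-side polling interval
    return _srm_turl(token)

def get_turl_sync(surl):
    """Synchronous style (1.8.3+): one round trip, enough for a disk-only system."""
    time.sleep(ROUND_TRIP)
    return "https://disknode.example.org/data/file"

for fn in (get_turl_async, get_turl_sync):
    start = time.time()
    fn("srm://dpmhead.example.org/dpm/example.org/home/myvo/file")
    print(fn.__name__, "took %.2f s" % (time.time() - start))
```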
Database Access
• No DB connection pooling, no bind variables
• DB connections were linked to daemon pool threads
• DB connections would be kept for the whole life of the client
• Quicker fix
• Add DB connection pooling to the old daemons (see the sketch below)
• Good numbers, but needs extensive testing…
• Medium term fix (available since 1.8.4 for HTTP/DAV)
• DMLite, which includes connection pooling
• Among many other things…
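A generic sketch of what pooling plus bind variables look like, using sqlite3 as a stand-in for the MySQL/Oracle driver; the pool class, table and query are placeholders, not the DPM schema or code.

```python
# Generic sketch of connection pooling and bind variables. sqlite3 stands in
# for the MySQL/Oracle driver; the table and query are placeholders.
import queue
import sqlite3

def _connect():
    conn = sqlite3.connect(":memory:")   # stand-in for the real DB driver connect call
    conn.execute("CREATE TABLE files (fileid INTEGER, filesize INTEGER, name TEXT)")
    return conn

class ConnectionPool:
    """Connections are created once and reused, instead of one per daemon thread
    held for the whole life of the client."""
    def __init__(self, factory, size=4):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(factory())
    def acquire(self):
        return self._pool.get()
    def release(self, conn):
        self._pool.put(conn)

pool = ConnectionPool(_connect)

def stat_file(name):
    conn = pool.acquire()
    try:
        cur = conn.cursor()
        # Bind variable ("?" here, "%s" with MySQL drivers) instead of string
        # concatenation: the statement can be prepared once and reused.
        cur.execute("SELECT fileid, filesize FROM files WHERE name = ?", (name,))
        return cur.fetchone()
    finally:
        pool.release(conn)

print(stat_file("/dpm/example.org/home/myvo/file"))   # None: the toy table is empty
```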
Dependency on the SRM
• SRM imposes significant latency for data access
• It has its use cases, but is a killer for regular file access
• For data access, only required for protocols not supporting redirection (file name to replica translation)
• Fix (all available from 1.8.4)
• Keep SRM for space management only (usage, reports, …)
• Add support for protocols natively supporting redirection
• HTTP/DAV, NFS 4.1/pNFS, XROOT
• And promote them widely…
• Investigating GridFTP redirection support (seems possible!)
Other recent activities…
• ATLAS LFC consolidation effort
• There was an instance at each T1 center
• They have all been merged into a single CERN one
• A similar effort was done at BNL for the US sites
• HTTP Based Federations
• Lots of work in this area too…
• But there will be a separate forum event on this topic
• Puppet and Nagios for easy system administration
• In production since 1.8.3 for DPM
• Working on new manifests for DMLite based setups
• And adding LFC manifests, to be tested at CERN
Future Proof with DMLite
• DMLite is our new plugin based library
• Meets goals resulting from the system evaluation
• Refactoring of the existing code
• Single library used by all frontends
• Extensible, open to external contributions
• Easy integration of standard building blocks
• Apache2, HDFS, S3, …
https://svnweb.cern.ch/trac/lcgdm/wiki/Dpm/Dev/Dmlite
DMLite: Interfaces
• User domain: UserGroupDb
• Namespace domain: Catalog, INode
• Pool domain: PoolManager, PoolDriver, PoolHandler
• I/O domain: IODriver, IOHandler
• Plugins implement one or multiple interfaces
• Depending on the functionality they provide
• Plugins are stacked, and called LIFO (see the sketch below)
• You can load multiple plugins for the same functionality
• APIs in C/C++/Python, plugins in C++ (Python soon)
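To make the stacking idea concrete, here is a small conceptual model of a catalog stack tried LIFO with fall-through. The class and method names are illustrative only and are not the real DMLite API.

```python
# Conceptual model of plugin stacking; these are NOT the real DMLite classes.
# Implementations of the same interface are stacked and tried LIFO: the last
# one loaded answers first and can fall through to the ones below it.
class CatalogInterface:
    def stat(self, path):
        raise NotImplementedError

class LegacyCatalog(CatalogInterface):          # e.g. forwards to the old daemons
    def stat(self, path):
        return {"path": path, "source": "legacy"}

class MySQLCatalog(CatalogInterface):           # e.g. goes straight to the database
    def stat(self, path):
        if path.startswith("/dpm/"):
            return {"path": path, "source": "mysql"}
        raise LookupError("not handled here")   # fall through to the next plugin

class CatalogStack(CatalogInterface):
    def __init__(self, plugins):
        self._stack = list(plugins)             # load order: first .. last
    def stat(self, path):
        for plugin in reversed(self._stack):    # LIFO: last loaded is asked first
            try:
                return plugin.stat(path)
            except LookupError:
                continue
        raise LookupError(path)

catalog = CatalogStack([LegacyCatalog(), MySQLCatalog()])
print(catalog.stat("/dpm/example.org/home/myvo/file"))   # answered by MySQLCatalog
print(catalog.stat("/grid/other/file"))                  # falls back to LegacyCatalog
```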
DMLite Plugin: Legacy
• Interacts directly with the DPNS/DPM/LFC daemons
• Simply redirects calls using the existing NS and DPM APIs
• For full backward compatibility
• Both for namespace and pool/filesystem management
DMLite Plugin: MySQL
• Refactoring of the MySQL backend
• Properly using bind variables and connection pooling
• Huge performance improvements
• Namespace traversal comes from the built-in Catalog
• Proper stack setup provides fallback to the Legacy plugin
DMLite Plugin: Oracle
• Refactoring of the Oracle backend
• What applies to the MySQL one applies here
• Better performance with bind variables and pooling
• Namespace traversal comes from the built-in Catalog
• Proper stack setup provides fallback to the Legacy plugin
DMLite Plugin: Memcache
• Memory cache for namespace requests (sketched below)
• Reduced load on the database
• Much improved response times
• Horizontal scalability
• Can be put over any other Catalog implementation
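As a rough illustration of the Memcache plugin's role, here is a toy cache layered over any catalog implementation. A plain dict with a TTL stands in for the real memcached client, and the class names are illustrative only.

```python
# Toy cache layered over any Catalog-like backend. A dict with a TTL stands in
# for the real memcached client; names are illustrative only.
import time

class CachedCatalog:
    def __init__(self, backend, ttl=60.0):
        self._backend = backend            # any other catalog implementation
        self._ttl = ttl
        self._cache = {}                   # path -> (expiry, value)

    def stat(self, path):
        hit = self._cache.get(path)
        if hit and hit[0] > time.time():
            return hit[1]                              # served from memory, no DB hit
        value = self._backend.stat(path)               # miss: ask the backend
        self._cache[path] = (time.time() + self._ttl, value)
        return value

class _SlowCatalog:
    def stat(self, path):
        time.sleep(0.2)                                # pretend this is a DB round trip
        return {"path": path}

catalog = CachedCatalog(_SlowCatalog(), ttl=30)
for label in ("cold", "warm"):
    start = time.time()
    catalog.stat("/dpm/example.org/home/myvo/file")
    print(label, "lookup: %.3f s" % (time.time() - start))
```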
DMLite Plugin: Hadoop/HDFS
• First new pool type
• HDFS pool can coexist with legacy pools, …
• In the same namespace, transparent to frontends
• All HDFS goodies for free (auto data replication, …)
• Catalog interface coming soon
• Exposing the HDFS namespace directly to the frontends
DMLite Plugin: S3
• Second new pool type
• Again, can coexist with legacy pools, HDFS, …
• In the same namespace, transparent to frontends
• Main goal is to provide additional, temporary storage
• High load periods, user analysis before big conferences, …
• Evaluated against Amazon, now looking at Huawei and OpenStack
DMLite Plugin: VFS
• Third new pool type (currently in development)
• Exposes any mountable filesystem
• As an additional pool in an existing namespace
• Or directly exposing that namespace
• Think Lustre, GPFS, …
DMLite Plugins: Even more…
• Librarian
• Replica failover and retrial (a failover sketch follows below)
• Used by the HTTP/DAV frontend for a Global Access Service
• Profiler
• Boosted logging capabilities
• For every single call, logs response times per plugin
• HTTP based federations
• ATLAS Distributed Data Management (DDM)
• First external plugin
• Currently under development
• Will expose central catalogs via standard protocols
• Writing plugins is very easy…
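Since the Librarian bullet above is about replica failover and retrial, here is a small sketch of that pattern over plain HTTP: try each replica (e.g. taken from a Metalink document) in turn and retry transient failures. It uses `requests` with hypothetical URLs and is not the actual plugin code.

```python
# Sketch of replica failover/retrial in the spirit of the Librarian plugin.
# URLs are hypothetical; this is not the actual plugin code.
import requests

def read_with_failover(replicas, attempts_per_replica=2, timeout=10):
    last_error = None
    for url in replicas:                      # replica list, e.g. from a Metalink document
        for _ in range(attempts_per_replica):
            try:
                resp = requests.get(url, timeout=timeout)
                resp.raise_for_status()
                return resp.content           # first healthy replica wins
            except requests.RequestException as exc:
                last_error = exc              # transient failure: retry, then next replica
    raise RuntimeError("all replicas failed: %s" % last_error)

if __name__ == "__main__":
    try:
        read_with_failover([
            "https://disk1.example.org/dpm/example.org/home/myvo/file",
            "https://disk2.example.org/dpm/example.org/home/myvo/file",
        ])
    except RuntimeError as err:
        print(err)
```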
DMLite Plugins: Development
http://dmlite.web.cern.ch/dmlite/trunk/doc/tutorial/index.html
Standard Frontends
• Standards based access to DPM is already available
• HTTP/DAV and NFS 4.1 / pNFS
• But also XROOT (useful in the HEP context)
• Lots of recent work to make them performant
• Many details were already presented
• But we needed numbers to show they are a viable alternative
• We now have those numbers!
Frontends: HTTP / DAV
https://svnweb.cern.ch/trac/lcgdm/wiki/Dpm/WebDAV
• Frontend based on Apache2 + mod_dav
• In production since 1.8.3
• Working with PES on deployment in front of the CERN LFC too
• Can be used both for get/put style access (like GridFTP) and for direct access
• Some extras for full GridFTP equivalence (sketched below)
• Multiple streams with Range/Content-Range
• Third party copies using WebDAV COPY + GridSite delegation
• Random I/O
• Possible to do vector reads and other optimizations
• Metalink support (failover, retrial)
• With 1.8.4 it is already DMLite based
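A sketch of two of the extras with plain `requests`: a partial read via a standard Range header, and a third-party transfer via WebDAV COPY with a Destination header. Hosts, paths and credential files are hypothetical, and the GridSite delegation step is omitted here.

```python
# Sketch of the HTTP/DAV extras using plain requests. Host names, paths and
# credential files are hypothetical; the delegation step for third-party COPY
# is omitted and handled by GridSite in the real deployment.
import requests

CERT = ("usercert.pem", "userkey.pem")
SRC = "https://dpmhead.example.org/dpm/example.org/home/myvo/file.root"

# Partial read with a standard Range header (the basis for multi-stream
# transfers and random I/O):
chunk = requests.get(SRC, headers={"Range": "bytes=0-1048575"}, cert=CERT)
print(chunk.status_code)                 # expect 206 Partial Content
print(chunk.headers.get("Content-Range"))

# Third-party copy: WebDAV COPY with a Destination header, so the data moves
# directly between the two storage endpoints rather than through the client.
resp = requests.request("COPY", SRC, cert=CERT, headers={
    "Destination": "https://otherse.example.org/dpm/other.org/home/myvo/file.root",
})
print(resp.status_code)
```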
Frontends: NFS 4.1 / pNFS
https://svnweb.cern.ch/trac/lcgdm/wiki/Dpm/NFS41
• Direct access to the data, with a standard NFS client (see the sketch below)
• Available with DPM 1.8.3 (read only)
• Write enabled early next year
• Not yet based on DMLite
• Implemented as a plugin to the Ganesha server
• Only Kerberos authentication for now
• Issue with client X509 support in Linux (server ready though)
• We're investigating how to add this
• DMLite based version in development
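The point of the NFS 4.1 / pNFS frontend is that, once the namespace is mounted with a stock client, data access is plain POSIX I/O. A minimal sketch, with a hypothetical mount point and path (the mount command in the comment is illustrative only):

```python
# Once the DPM namespace is mounted with a stock NFS 4.1 client, e.g.
#   mount -t nfs4 -o minorversion=1 dpmhead.example.org:/ /mnt/dpm   (illustrative)
# data access is ordinary POSIX I/O; no special client library is needed.
# The mount point and path below are hypothetical.
import os

path = "/mnt/dpm/dpm/example.org/home/myvo/file.root"
st = os.stat(path)                 # metadata served by the NFS server on the head node
with open(path, "rb") as f:        # reads go to the disk nodes via pNFS layouts
    header = f.read(1024)
print(st.st_size, len(header))
```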
Frontends: XROOTD
https://svnweb.cern.ch/trac/lcgdm/wiki/Dpm/Xroot
• Not really a standard, but widely used in HEP
• Initial implementation in 2006
• No multi-VO support, limited authz, performance issues
• New version 3.1 (rewrite) available with 1.8.4
• Multi VO support
• Federation aware (already in use in the ATLAS FAX federation)
• Strong auth/authz with X509, but ALICE token still available
• Based on the standard XROOTD server
• Plugins for XrdOss, XrdCmsClient and XrdAccAuthorize
• Soon also based on DMLite (version 3.2)
Frontends: Random I/O Performance
[Performance plots not reproduced here]
• HTTP/DAV vs XROOTD vs RFIO
• Soon adding NFS 4.1 / pNFS to the comparison
Frontends: Hammercloud
• We've also run a set of Hammercloud tests
• Using ATLAS analysis jobs
• These are only a few of all the metrics we have
Performance: Showdown
• Big thanks to ShuTing and ASGC
• For doing a lot of the testing and providing the infrastructure
• First recommendation is to phase down RFIO
• No more development effort on it from our side
• HTTP vs XROOTD
• Performance is equivalent, up to sites/users to decide
• But we like standards… there's a lot to gain with them
• Staging vs Direct Access
• Staging not ideal… requires lots of extra space on the WN
• Direct Access is performant if used with ROOT TTreeCache (see the sketch below)
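For the last point, a rough PyROOT sketch of direct access with TTreeCache enabled; the file URL and tree name are hypothetical, and the same pattern applies to root:// or https:// URLs.

```python
# Sketch of direct access with ROOT's TTreeCache enabled (PyROOT).
# The file URL and tree name are hypothetical.
import ROOT

f = ROOT.TFile.Open("root://dpmhead.example.org//dpm/example.org/home/myvo/data.root")
tree = f.Get("events")                   # hypothetical tree name

tree.SetCacheSize(30 * 1024 * 1024)      # 30 MB TTreeCache
tree.AddBranchToCache("*", True)         # cache all branches used in the loop

for event in tree:                       # reads arrive as a few large, prefetched requests
    pass                                 # ... user analysis here ...

f.Close()
```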
Summary & Outlook
• DPM and LFC are in very good shape
• Even more lightweight, much easier code maintenance
• Open, extensible to new technologies and contributions
• DMLite is our new core library
• Standards, standards, …
• Protocols and building blocks
• Deployment and monitoring
• Reduced maintenance, free clients, community help
• DPM Community Workshops
• Paris, 03-04 December 2012
• Taiwan, 17 March 2013 (ISGC workshop)
• A DPM Collaboration is being set up