
DPM Status & Roadmap

Ricardo Rocha (on behalf of the DPM team)



  1. DPM Status & Roadmap Ricardo Rocha ( on behalf of the DPM team ) EMI INFSO-RI-261611

  2. DPM Overview [Diagram: CLIENT; HEAD NODE (DPNS, DPM, SRM, HTTP, NFS) for FILE METADATA OPS; DISK NODE(s) (GRIDFTP, RFIO, HTTP, NFS, XROOT) for FILE ACCESS OPS; client access protocols RFIO, HTTP, NFS, XROOT]

  3. DPM Core 1.8.2, Testing, Roadmap

  4. DPM 1.8.2 – Highlights • Improved scalability of all frontend daemons • Especially with many concurrent clients • By having a configurable number of threads • Fast/Slow in case of the dpm daemon • Faster DPM drain • Disk server retirement, replacement, … • Better balancing of data among disk nodes • By assigning different weights to each filesystem • Log to syslog • GLUE2 support
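The weighted balancing mentioned on this slide can be illustrated with a small sketch. It assumes that a new file lands on a filesystem with probability proportional to that filesystem's weight; the slide does not say how DPM actually uses the weights, and the filesystem names and weight values below are hypothetical.

```python
import random

# Hypothetical filesystems and weights, for illustration only.
FS_WEIGHTS = {"disk01:/data1": 1, "disk02:/data1": 3, "disk03:/data1": 0}


def pick_filesystem(weights, rng=random):
    """Pick one filesystem with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


def simulate(n, seed=0):
    """Place n files and count how many land on each filesystem."""
    rng = random.Random(seed)  # seeded so repeated runs give the same counts
    counts = {name: 0 for name in FS_WEIGHTS}
    for _ in range(n):
        counts[pick_filesystem(FS_WEIGHTS, rng)] += 1
    return counts
```

Under this assumption, a weight of 0 takes a filesystem out of the rotation, which is one way a weight could help when retiring a disk server.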

  5. DPM Core – Testing Activity • Improved validation & testing • Collaboration with ASGC for this purpose (thanks!) • HammerCloud tests running regularly • They started with a 400-core setup; we looked at the issues and are now moving to 1000 cores to increase load • Example run • http://hammercloud.cern.ch/atlas/10006472/test/ • To be used extensively for stress testing • Covering all components: DPM, RFIO, GRIDFTP, NFS, HTTP, … • Results will benefit other sites too

  6. DPM Core – Testing [Plots: HC using RFIO; HC using GridFTP; example GridFTP vs RFIO] Thanks to ShuTing for the plots (preliminary results)

  7. DPM Core - Testing • Big contribution from openlab student • Martin Hellmich, University of Edinburgh • Detailed analysis of DPM internals • Detecting bottlenecks in specific transfer / access phases Example… but we have a lot more results which we are now investigating

  8. DPM Core – Roadmap • Package consolidation: EPEL compliance • Fixes in multi-threaded clients • Replace httpg with https on the SRM • Improve dpm-replicate (dirs and FSs) • GUIDs in DPM • Synchronous GET requests • Reports on usage information • Quotas • Accounting metrics • HOT file replication [Timeline labels: 1.8.3, 1.8.4, 1.8.5]

  9. Beta Components HTTP/DAV, NFS, Nagios, Puppet, Perfsuite, Catalog Sync, Contrib Tools

  10. Beta Components: Overview • Faster releases • Monthly releases since June • Separate yum repository • Already in use by several sites • Including sites in the UK https://svnweb.cern.ch/trac/lcgdm/wiki/Dpm/Dev/Components

  11. Beta Components: PerfSuite Overview

  12. Performance Suite • Set of tools to easily trigger bunches of tests • With different configurations • Common wrapper, many tests • Existing suites • POSIX Transfers: RFIO, NFS • GET/PUT Transfers: HTTP, GSIFTP • ROOT • More coming… • Used for most results presented later https://svnweb.cern.ch/trac/lcgdm/wiki/Dpm/Admin/Performance#Perfsuite

  13. Performance Suite • Set of tools to easily trigger test bunches • With different configurations • Common wrapper, many tests • Existing suites • POSIX Transfers: RFIO, NFS • GET/PUT Transfers: HTTP, GSIFTP • ROOT • More coming… • Used for most results presented later https://svnweb.cern.ch/trac/lcgdm/wiki/Dpm/Admin/Performance#Perfsuite Sample Configuration test_rfcp(c:5,s:{1M 2M 4M 8M 16M 32M 64M 128M 256M 512M 1G})x3 test_nfs(m:/mnt/nfs41,c:5,s:{1M 2M 4M 8M 16M 32M 64M 128M 256M 512M 1G})x3
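The sample configuration lines on this slide follow a regular shape: a test name, a parenthesised parameter list, and an `xN` suffix. The sketch below parses that shape. It is not Perfsuite's own parser. Reading `xN` as a repeat count and `M`/`G` as binary multiples are both assumptions, and the function names are hypothetical.

```python
import re

# name(params)xN, e.g. test_rfcp(c:5,s:{1M 2M})x3
LINE_RE = re.compile(r"^(\w+)\((.*)\)x(\d+)$")
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}  # assumed binary units


def parse_size(text):
    """Convert '4M' or '1G' to bytes; plain digits pass through as an int."""
    if text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)


def parse_test_line(line):
    """Split one configuration line into test name, parameters and repeat."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised test line: {line!r}")
    name, body, repeat = match.groups()
    params = {}
    for item in body.split(","):
        key, value = item.split(":", 1)  # split once so paths keep their ':'
        if value.startswith("{") and value.endswith("}"):
            value = value[1:-1].split()  # brace group becomes a list
        params[key.strip()] = value
    return {"test": name, "params": params, "repeat": int(repeat)}
```

For example, `parse_test_line("test_nfs(m:/mnt/nfs41,c:5,s:{1M 2M})x3")` yields the test name `test_nfs`, the parameters `m`, `c` and `s`, and a repeat value of 3.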

  14. Beta Components: HTTP / DAV Overview, Performance, Roadmap

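  15. HTTP / DAV: Overview https://svnweb.cern.ch/trac/lcgdm/wiki/Dpm/WebDAV [Diagram: (1) CLIENT sends GET to the LFC and receives a REDIRECT; (2) CLIENT sends GET / PUT to the DPM HEAD and receives a REDIRECT; (3) CLIENT sends GET / PUT to the DPM DISK, which handles the DATA]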

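  16. HTTP / DAV: Overview [Diagram: (1) CLIENT sends GET to the LFC and receives a REDIRECT; (2) CLIENT sends GET / PUT to the DPM HEAD and receives a REDIRECT; (3) CLIENT sends GET / PUT to the DPM DISK, which handles the DATA]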
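The redirect chain in the overview (LFC, then DPM head, then DPM disk) can be sketched as a loop that follows `Location` targets until a non-redirect answer arrives. To keep the example self-contained, the network is replaced by a lookup table. The host names, the path, and the use of status 302 are assumptions and are not taken from the slides.

```python
REDIRECT_CODES = (301, 302, 303, 307, 308)

# Hypothetical hosts and path standing in for LFC, DPM head and DPM disk.
FAKE_SERVERS = {
    "http://lfc.example.org/grid/file": (302, "http://head.example.org/dpm/file"),
    "http://head.example.org/dpm/file": (302, "http://disk01.example.org/data/file"),
    "http://disk01.example.org/data/file": (200, None),
}


def fake_fetch(url):
    """Stand-in for an HTTP request: returns (status, redirect target)."""
    return FAKE_SERVERS[url]


def resolve(url, fetch=fake_fetch, max_hops=5):
    """Follow redirects; return the final URL and every URL visited."""
    visited = [url]
    for _ in range(max_hops):
        status, location = fetch(url)
        if status not in REDIRECT_CODES:
            return url, visited
        url = location
        visited.append(url)
    raise RuntimeError("too many redirects")
```

In a real client, `fetch` would issue the request with automatic redirect handling turned off, so each hop stays visible to the caller.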

  17. HTTP: Client Support • Recommendation: browser/curl for GET, curl for PUT • Chrome Issue 9056 submitted for proxy support

  18. DAV: Client Support • Updated analysis based on initial one from dCache • Recommendation: Cadaver for *nix, Windows Explorer

  19. HTTP vs GridFTP: Multiple streams • Not explicit in the HTTP protocol • But needed for even higher performance • Especially in the WAN • So we added it, with some semantics • Small wrapper around libcurl • PUT with ‘0 bytes’ && null content-range == end of write • Submitted patch to libcurl to allow SSL session reuse among parallel requests
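A multi-stream upload with the semantics on this slide can be planned as a set of ranged PUTs followed by an empty PUT that marks the end of the write. The sketch below only computes the headers. The `Content-Range: bytes start-end/total` form is standard HTTP, but the way the DPM wrapper splits the file and the exact headers it sends are assumptions here.

```python
def plan_chunks(total_size, streams):
    """Split [0, total_size) into at most `streams` contiguous byte ranges."""
    if total_size <= 0 or streams <= 0:
        raise ValueError("total_size and streams must be positive")
    chunk = -(-total_size // streams)  # ceiling division
    plan = []
    for start in range(0, total_size, chunk):
        end = min(start + chunk, total_size) - 1  # inclusive last byte
        plan.append({
            "Content-Range": f"bytes {start}-{end}/{total_size}",
            "Content-Length": str(end - start + 1),
        })
    return plan


def end_of_write_headers():
    """Headers for the closing PUT: zero bytes and no Content-Range."""
    return {"Content-Length": "0"}
```

Each entry in the plan could be sent on its own connection in parallel. The closing PUT goes out once all ranged PUTs have completed.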

  20. HTTP vs GridFTP: 3rd Party Copies • Implemented using WebDAV COPY • Requires proxy certificate delegation • Using gridsite delegation, with a small wrapper client • Requires some common semantics to copy between SEs (to be agreed) • Common delegation portType location and port • No prefix in the URL ( just http://<server>/<sfn> )
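A third-party copy request of this kind can be sketched as a plain WebDAV COPY whose `Destination` header names the target storage element, with both URLs in the prefix-free `http://<server>/<sfn>` form from the slide. The delegation step is left out. Sending the COPY to the source server is an assumption, since the slides do not say which side initiates the transfer, and the function name is hypothetical.

```python
def build_copy_request(src_server, src_sfn, dst_server, dst_sfn):
    """Describe a WebDAV COPY from one storage element to another."""

    def url(server, sfn):
        # No prefix in the URL: just http://<server>/<sfn>
        return f"http://{server}/{sfn.lstrip('/')}"

    return {
        "method": "COPY",
        "url": url(src_server, src_sfn),
        "headers": {"Destination": url(dst_server, dst_sfn)},
    }
```

The returned dictionary can be handed to any HTTP client that allows custom methods. The proxy certificate delegation would have to be done first.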

  21. HTTP vs GridFTP: 3rd Party Copies Example of FTS usage

  22. HTTP / DAV: Performance Ongoing Evaluation • Xeon 4 Cores 2.27GHz • 12 GB RAM • 1 Gbit/s links • No difference detected in LAN with different number of streams • But early results do show a big difference on the WAN • lcg-cp configured to use gridftp • File registration & transfer times considered in both cases

  23. HTTP / DAV: Issues & Roadmap • Towards a first production release • Testing with large number of concurrent clients • Finish up the WAN performance tests • And after that • Further testing of 3rd party copy with larger files • Finish validation against other implementations • Validate usage via ROOT • Improved GET on the LFC • PUT support on the LFC (?)

  24. Beta Components: NFS 4.1 / pNFS Overview, Performance, Roadmap

  25. NFS 4.1/pNFS: Why? • Industry standard (IBM, NetApp, EMC, …) • Free clients (with free caching) • Strong security (GSSAPI) • Parallel data access • Easier maintenance • … • But you know all this by now…

  26. NFS 4.1/pNFS: Overview https://svnweb.cern.ch/trac/lcgdm/wiki/Dpm/NFS41 [Diagram: CLIENT to METADATA SERVER: (1) OPEN, (2) LAYOUTGET, (3) GETDEVICEINFO, (7) CLOSE; CLIENT to DISK SERVER(s): (4) OPEN, (5) READ / WRITE, (6) CLOSE]
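The numbered labels on this slide describe an ordered exchange, which can be written down as data. The step numbers and operation names come from the slide. Which server each operation goes to is a reading of the flattened diagram and should be treated as an assumption.

```python
# (step, operation, target) as read from the slide; targets are assumed.
PNFS_SEQUENCE = [
    (1, "OPEN", "metadata server"),
    (2, "LAYOUTGET", "metadata server"),
    (3, "GETDEVICEINFO", "metadata server"),
    (4, "OPEN", "disk server"),
    (5, "READ / WRITE", "disk server"),
    (6, "CLOSE", "disk server"),
    (7, "CLOSE", "metadata server"),
]


def operations_for(target):
    """Return the operations sent to one kind of server, in step order."""
    return [op for _, op, tgt in sorted(PNFS_SEQUENCE) if tgt == target]
```

Laid out this way, the metadata server is contacted only at the start and the end, while the bulk data path goes straight to the disk servers.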

  27. NFS4.1 / pNFS: Client • pNFS support in Linux kernel >= 2.6.38 • nfs-utils >= 1.2.3 • Latest Fedora and Debian Sid have it • We provide packages for EL5 • Enabled pNFS in the elrepo mainline kernel • nfs-utils and AFS module we package ourselves
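The two version requirements on this slide can be checked with a short helper. This is a sketch: it compares only the first three numeric fields of each version string, and a new enough kernel version does not by itself prove that pNFS was enabled at build time. The function names are hypothetical.

```python
import re

MIN_KERNEL = (2, 6, 38)    # from the slide
MIN_NFS_UTILS = (1, 2, 3)  # from the slide


def version_tuple(text):
    """Turn '2.6.38-8-generic' into (2, 6, 38) for comparison."""
    return tuple(int(part) for part in re.findall(r"\d+", text)[:3])


def client_versions_ok(kernel, nfs_utils):
    """True if both versions meet the minimums listed on the slide."""
    return (version_tuple(kernel) >= MIN_KERNEL
            and version_tuple(nfs_utils) >= MIN_NFS_UTILS)
```

On a live system, the kernel string could come from `platform.release()`, and the nfs-utils version from the package manager.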

  28. NFS4.1 / pNFS: Performance Ongoing Evaluation • IOZONE Results • Server • Xeon 4 Cores 2.27GHz • 12 GB RAM • 1 Gbit/s links • Client • Dual core • 2 GB RAM • 100 Mbit/s link

  29. NFS4.1 / pNFS: Performance Ongoing Evaluation • NFS vs RFIO • Server • Xeon 4 Cores 2.27GHz • 12 GB RAM • 1 Gbit/s links • Client • Dual core • 2 GB RAM • 100 Mbit/s link • 8 KB block sizes RFIO read misbehaving in this test… investigating
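When reading results from this setup, it helps to keep the theoretical ceiling of each link in mind. The sketch below converts the link speeds from the slide into megabytes per second, using decimal units (1 MB = 10^6 bytes) and ignoring protocol overhead. The function names are hypothetical.

```python
def link_ceiling_mb_per_s(mbit_per_s):
    """Upper bound on throughput in MB/s for a link speed in Mbit/s."""
    return mbit_per_s / 8  # 8 bits per byte


def min_transfer_seconds(size_mb, mbit_per_s):
    """Shortest possible time to move size_mb over the link."""
    return size_mb / link_ceiling_mb_per_s(mbit_per_s)
```

The 100 Mbit/s client link caps any single client at 12.5 MB/s, well below the 125 MB/s that each 1 Gbit/s server link allows. This is consistent with the roadmap item on testing with a faster network link.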

  30. NFS4.1 / pNFS: Issues & Roadmap • Towards a first production release • Tests with a faster network link • Testing with a larger number of concurrent clients • WAN testing • Enable bigger block sizes • And after that • X509 certificate support • Still not figured out… needs a strong focus • Further validation with other implementations

  31. Beta Components: Even more… Puppet, Nagios, Contrib, Catalog Sync

  32. Even more components… • Catalog Synchronization • Check Fabrizio’s talk next Monday (EGI Forum Lyon) • DPM Admin contrib package • Contribution from GridPP • Now packaged and distributed with the DPM components • http://www.gridpp.ac.uk/wiki/DPM-admin-tools • Nagios monitoring plugins for DPM • Available now • https://svnweb.cern.ch/trac/lcgdm/wiki/Dpm/Admin/Monitoring • Puppet templates • Available now in beta • https://svnweb.cern.ch/trac/lcgdm/wiki/Dpm/Admin/Puppet

  33. Conclusion • 1.8.2 fixes many scalability and performance issues • But we continue testing and improving • Popular requests coming in next versions • Accounting, quotas, easier replication • Beta components getting to production state • Standards compliant data access • Simplified setup, configuration, maintenance • Metadata consistency and synchronization • And much more extensive testing • Performance test suites, regular large scale tests
