
Efficient Access to Many Small Files in a Grid Filesystem

Douglas Thain and Christopher Moretti, University of Notre Dame



  1. Efficient Access to Many Small Files in a Grid Filesystem. Douglas Thain and Christopher Moretti, University of Notre Dame

  2. Efficient Access to Many Small (and Big) Files in a Grid Filesystem. Douglas Thain and Christopher Moretti, University of Notre Dame

  3. Abstract • Many grid data tools focus on transferring, storing, and managing large (GB-TB) files. • But, many users need to manage, transfer, and process lots (1000s) of small (KB-MB) files. • We describe protocols and interfaces for manipulating many small files over wide area networks. (Doesn’t hurt large files, either.) • Implemented in the Chirp file system. • Performance: • Best case: order of magnitude improvement. • Worst case: no slower than before.

  4. The Small File Problem

  5. Who has lots of small files? • Anyone using a batch system. • One file for submit, input, output, error, log... • Anyone using a large software package. • Executables, libraries, config files... • Anyone using a filesystem like a database. • Genomics, astronomy, physics... • Anyone who likes to write shell scripts. • foreach host in list ssh $host > $host.output

  6. Why is this a problem? • Users do the “sensible” thing: • foreach file in (list) do transfer done • The “sensible” thing performs miserably: • New TCP Connection • SSL Authentication • Configuration Operations • Slow Start Again • Result is KB/s on a GB/s link.

  7. Why not just use tar? • If you can, you should! • Sometimes you cannot: • The system semantics demand multiple files. • Packing and unpacking can be very slow. • Not enough disk space to unpack. • Different apps select different data subsets. • Using an existing script or program. • Users don’t know or care that it’s a distributed system, so why should they change?

  8. The Challenge: How to design interfaces so that users get the expected performance and behavior?

  9. Chirp and Parrot: A Grid Filesystem

  10. Requirements for a Grid Filesystem • Transparent access to files in the same manner as a local Unix filesystem. • Non-privileged deployment at both client and server. (Root is not possible on the grid.) • User control over policies for naming, caching, consistency, and fault tolerance. • Flexible access controls for sharing. • Good performance on both small and large files.

  11. Chirp/Parrot – A Grid Filesystem [Diagram: an ordinary Unix program issues Unix system calls, which Parrot traps via ptrace and forwards over a single TCP stream to a Chirp server backed by an ordinary Unix filesystem. Labels: No Privs Needed! (on both Parrot and Chirp), Automatic Recovery.] • Authentication: Kerberos / Globus / Hostname / Unix • Protocol: open / pread / pwrite / close, stat / mkdir / rmdir / unlink, getfile / putfile / movefile • Authorization: • kerberos:joe@nd.edu RWLDA • globus:/O=ND/CN=Joe RWLDA • hostname:*.nd.edu RL • group:server.nd.edu/team RWL

  12. Ordinary Unix Commands
      > parrot tcsh
      > ls /chirp
      alpha.nd.edu
      beta.nd.edu
      ...
      > cd /chirp/alpha.nd.edu/mydir
      > cp /tmp/bigdata .
      > emacs mydata.txt

  13. Parrot Specific Commands
      > parrot tcsh
      > parrot_whoami
      globus:/O=ND/CN=Joe
      > parrot_getacl /chirp/alpha.nd.edu/
      kerberos:joe@nd.edu RWLDA
      globus:/O=ND/CN=Joe RWL
      hostname:*.nd.edu RL

  14. Chirp as Remote Filesystem [Diagram labels: App + Parrot (many instances), Cert, Grid Site A, Grid Site B, Secured by GSI, Chirp Server, Grid Middleware, Unix Filesystem]

  15. Chirp as Cluster Filesystem [Diagram labels: App + Parrot (many instances), aux, db, dir server, Grid Site A, Grid Site B, four Chirp Servers, each on its own Unix Filesystem]

  16. http://www.cse.nd.edu/~ccl/viz

  17. Sample Applications • Image Processing for Biometrics • Moretti et al, PCGRID 2007 • Bioinformatics on EGEE • Blanchet et al, Grid 2006 • High Energy Physics on LCG • Sfiligoi et al, CHEP 2005 • Molecular Dynamics Repository • Wozniak et al, HPDC 2005 • Remote DB Access on EDG • Klous et al, CCPE 2005

  18. Protocols for Small Files

  19. What About FTP? • FTP is a great data transfer system, but it was never designed to be a file system: • New TCP stream per data transfer. • New TCP stream for each directory list. • Lots of connections can overwhelm net devices. • Coarse errors: 550 for all file system errors. • Semantic problems: e.g. empty directory. • Unix access controls. (But see SecPAL.) • Wildly varying implementations and support.

  20. FTP Protocol Reminder [Diagram: FTP client and FTP server. Control connection: AUTH, GSSAPI, MIC, MIC, PORT, RETR. Data connection: AUTH, GSSAPI, MIC, MIC, then the data transfer.] • Minimum of four round trips (plus auth overhead) to fetch a file, plus loss of TCP window. • Common practice is a new control connection for every data transfer!

  21. What About NFS? • NFS was designed for a local area network among (relatively) trusted hosts. • Fine-grained file access is very slow on a WAN. • Kernel support and root assistance needed to start server, mount client, change target. • Unix UID for ownership, access control. • Need to bind to a privileged port, which is often filtered. • Use of “file handles” to refer to files makes it very difficult to build a user-level server, and adds lots of lookup operations over the WAN.

  22. NFS Protocol Reminder [Diagram: the NFS client sends lookup(00,a), lookup(10,b), lookup(20,c), ... and then read 4KB, read 4KB, read 4KB, ... to the NFS server.] • On a WAN, throughput is limited to 4 KB / latency: • 10 ms = 400 KB/s • 100 ms = 40 KB/s
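
The arithmetic on this slide follows from issuing one synchronous block read per round trip. A minimal sketch, assuming a single outstanding 4 KB request at a time and 1 KB = 1000 bytes for round numbers; the function name is illustrative and not part of NFS or Chirp:

```python
def rpc_throughput_kb_per_s(block_kb: float, rtt_ms: float) -> float:
    """Upper bound on throughput when each block costs one full round trip."""
    if rtt_ms <= 0:
        raise ValueError("rtt_ms must be positive")
    # One block is delivered per round trip; there are 1000 ms in a second.
    return block_kb * 1000.0 / rtt_ms


if __name__ == "__main__":
    for rtt_ms in (10, 100):
        rate = rpc_throughput_kb_per_s(4, rtt_ms)
        print(f"{rtt_ms} ms -> {rate:.0f} KB/s")
```

Under this model the link bandwidth never appears: a faster link does not help until more data is kept in flight per round trip.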

  23. Chirp Hybrid Protocol Overview [Diagram: Chirp client and Chirp server. auth globus (8 RTT), then open / read / write / close ..., then getfile(“mydata”), answered with size and data, and putfile(“otherdata”,size), followed by data.]

  24. Protocol Comparison • FTP - Stream per File • Latency = 4+ RTT for each file • Throughput = TCP limit after slow start • NFS – Remote Procedure Call • Latency = 1 RTT for each file • Throughput = block size / latency • Chirp - Hybrid • Latency = 1 RTT for each file • Throughput = TCP limit in steady state
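
The comparison above can be turned into a rough per-file cost model. The sketch below is an illustration under stated assumptions, not a measurement: it takes the per-file latency from the slide (4 RTT for FTP, 1 RTT for NFS and Chirp), treats NFS data as one 4 KB block per round trip (as on the NFS reminder slide), and treats FTP and Chirp data as flowing at a fixed link rate, ignoring slow start and authentication. The file size, RTT, and link rate in the example are hypothetical.

```python
import math

# Per-file latency in round trips, taken from the comparison slide.
SETUP_RTTS = {"ftp": 4, "nfs": 1, "chirp": 1}


def seconds_per_file(protocol: str, size_kb: float, rtt_s: float,
                     link_kb_per_s: float, block_kb: float = 4.0) -> float:
    """Rough time to fetch one file under the simplified model."""
    setup = SETUP_RTTS[protocol] * rtt_s
    if protocol == "nfs":
        # RPC style: every block costs one round trip.
        data = math.ceil(size_kb / block_kb) * rtt_s
    else:
        # Streaming style: data flows at the link rate (slow start ignored).
        data = size_kb / link_kb_per_s
    return setup + data


if __name__ == "__main__":
    # Hypothetical workload: 100 KB files, 100 ms RTT, 100 MB/s link.
    for name in ("ftp", "nfs", "chirp"):
        t = seconds_per_file(name, size_kb=100, rtt_s=0.1, link_kb_per_s=100_000)
        print(f"{name}: {t:.3f} s per file")
```

With small files the setup term dominates for FTP and the per-block term dominates for NFS, which is the motivation the slide gives for a hybrid.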

  25. Local Area Performance

  26. Wide Area Performance

  27. Real WAN Performance

  28. Interfaces for Small Files

  29. Standard Unix Copy: cp /tmp/source /chirp/B/target [Diagram: cp issues open(source), open(target), then loops over read/write. Parrot routes the open and read calls on the source to the local disk, and the open and write calls on the target through Chirp to the Chirp server.]

  30. Problem: The system does not know the context of the operation! Solution: Introduce a higher-level operation, copyfile, that exploits the context.

  31. Improved Copy with Copyfile: cp /tmp/source /chirp/B/target [Diagram: the new cp calls copyfile(source,target). Parrot performs open(source) on the local disk and putfile(target) through Chirp to the Chirp server.]

  32. Is it reasonable to modify cp? • Installation: • Cannot modify /bin/cp. • Install new parrot_cp. • Alias cp or link named “cp” in PATH. • Backwards compatibility: • parrot_cp without Parrot falls back to normal. • Ordinary cp on Parrot behaves as before. • parrot_cp on a different filesystem falls back.
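
The backwards-compatibility rules above amount to "try the high-level operation, and fall back to the ordinary read/write loop if it is unavailable". The sketch below shows that pattern only; `copy_with_fallback` and the `fast_copyfile` hook are hypothetical names used for illustration and are not the actual parrot_cp or Parrot interface.

```python
import shutil
from typing import Callable, Optional


def copy_with_fallback(source: str, target: str,
                       fast_copyfile: Optional[Callable[[str, str], None]] = None) -> str:
    """Copy source to target; return which path was taken."""
    if fast_copyfile is not None:
        try:
            # Hypothetical high-level operation offered by the filesystem layer.
            fast_copyfile(source, target)
            return "copyfile"
        except NotImplementedError:
            # The layer does not support it here; use the ordinary loop.
            pass
    # Ordinary copy: open both files and stream bytes between them.
    with open(source, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return "fallback"
```

Because the fallback is the same loop an unmodified cp would run, a caller sees identical results either way and only the performance differs.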

  33. Improved Copy with Copyfile: cp /chirp/A/source /chirp/B/target [Diagram: the new cp calls copyfile(source,target). Parrot sends thirdput(source,B,target) through Chirp to Chirp Server A, which performs putfile(target) to Chirp Server B.]

  34. Directory Copy: cp -r /chirp/A/mydir /chirp/B/mydir [Diagram: Chirp Server A and Chirp Server B each hold mydir with its ACL and files X, Y, Z. cp, through Parrot, issues mkdir(mydir), setacl(mydir), thirdput(/mydir/X,B,/mydir/X), thirdput(/mydir/Y,B,/mydir/Y), and thirdput(/mydir/Z,B,/mydir/Z).]

  35. Improved Directory Copy: cp -r /chirp/A/mydir /chirp/B/mydir [Diagram: cp, through Parrot, issues a single thirdput(/mydir,B,/mydir) to Chirp Server A, which performs mkdir, putfile*3, and setacl on Chirp Server B. Both servers hold mydir with its ACL and files X, Y, Z.]

  36. Third Party Performance

  37. You get the idea... • ls -la D • Original: getdir D + N*stat • Improved: getlongdir D • rm -rf D • Original: getdir D + N*unlink (recursive) • Improved: rmall D • md5sum F • Original: open F + N*read + close • Improved: md5 F

  38. Final Example
      ls -la /chirp/alpha/data
      md5sum /chirp/alpha/data/*
      cp -r /chirp/alpha/data /chirp/beta/data
      md5sum /chirp/beta/data/*
      rm -rf /chirp/alpha/data
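
Each pair above trades a number of requests that grows with N for a single request. A minimal sketch of the request counts, using the operation names from the slide and assuming a flat directory and one round trip per request; the helper itself and the example value of N are assumptions for illustration:

```python
def request_counts(n: int) -> dict:
    """Requests issued for N directory entries (ls, rm) or N reads (md5sum)."""
    return {
        # getdir + N*stat, versus one getlongdir
        "ls -la": {"original": 1 + n, "improved": 1},
        # getdir + N*unlink (flat directory), versus one rmall
        "rm -rf": {"original": 1 + n, "improved": 1},
        # open + N*read + close, versus one md5
        "md5sum": {"original": n + 2, "improved": 1},
    }


if __name__ == "__main__":
    rtt_s = 0.1  # hypothetical 100 ms round trip
    for command, counts in request_counts(1000).items():
        before = counts["original"] * rtt_s
        after = counts["improved"] * rtt_s
        print(f"{command}: {before:.1f} s -> {after:.1f} s of round-trip latency")
```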

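  39. Original Implementation [Diagram: the script's operations (ls -la, md5, cp, md5, rm) shown across app, parrot, chirp server A, and chirp server B.]

  40. Improved Implementation [Diagram: the script's operations (ls -la, md5, cp, md5, rm) shown across app, parrot, chirp server A, and chirp server B.]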

  41. Performance on Script

  42. The Challenge: How to design interfaces so that users get the expected performance and behavior?

  43. Summary • Good small file performance requires attention to low level network protocols. • getfile, putfile, thirdput, rmall, checksum • Exploiting protocols requires minor changes to the Unix I/O interface. • copyfile, rmall, checksum, others? • Easy to apply those changes in a user transparent way. • cp, rm, md5sum all operate as normal • Usable performance in a wide-area FS.

  44. For more information... • Douglas Thain • dthain@nd.edu • Chris Moretti • cmoretti@nd.edu • Parrot and Chirp • http://www.cctools.org
