
Active Storage Processing in Parallel File Systems Jarek Nieplocha Evan Felix

SDM CENTER. Active Storage Processing in Parallel File Systems Jarek Nieplocha Evan Felix Juan Piernas-Canovas. Active Storage in Parallel Filesystems. Active Storage exploits the old concept of moving computing to the data source


Presentation Transcript


  1. SDM CENTER: Active Storage Processing in Parallel File Systems. Jarek Nieplocha, Evan Felix, Juan Piernas-Canovas

  2. Active Storage in Parallel Filesystems
     • Active Storage exploits the old concept of moving computing to the data source
     • Avoids data movement across the network in a parallel machine by allowing applications to use compute resources on the I/O nodes of the cluster for data processing
     [Figure: two panels, "Traditional Approach" and "Active Storage", each showing Y=foo(X) with compute nodes (P) and I/O nodes (FS) connected by a network.]

  3. Example
     • BLAS DSCAL on disk: Y = α·Y
     • Experiment
       • Traditional: The input file is read from the file system, and the output file is written to the same file system. The input file has 120,586,240 doubles.
       • Active Storage: Each server receives the factor, reads the array of doubles from its local disk, and stores the resulting array on the same disk. Each server processes 120,586,240/N doubles, where N is the number of servers.
     • Speedup is attributed to avoiding data movement between client and servers
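The per-server step of the DSCAL example can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used in the experiment: `dscal_file` and `split_evenly` are hypothetical helper names, each server's share is assumed to be a plain local file of native 8-byte doubles, and no Lustre interface is involved.

```python
from array import array


def dscal_file(src, dst, alpha, chunk_doubles=1 << 16):
    """Read doubles from src, multiply each by alpha, write them to dst.

    Hypothetical helper: assumes src holds native 8-byte doubles and its
    size is a multiple of 8 bytes.
    """
    itemsize = array("d").itemsize
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            raw = fin.read(chunk_doubles * itemsize)
            if not raw:
                break
            values = array("d")
            values.frombytes(raw)
            # Y = alpha * Y, applied to one chunk at a time
            fout.write(array("d", (alpha * v for v in values)).tobytes())


def split_evenly(total_doubles, n_servers):
    """Number of doubles each of n_servers processes (illustrative only)."""
    base, extra = divmod(total_doubles, n_servers)
    return [base + (1 if i < extra else 0) for i in range(n_servers)]
```

In the Active Storage case each server would run something like `dscal_file` on its own disk, so only the factor crosses the network; in the traditional case the same loop runs on a client and every double travels to the client and back.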

  4. Related Work
     • Active Disk/Storage concept was introduced a decade ago to use processing resources "near" the disk
       • On the disk controller
       • On processors connected to disks
       • Reduce network bandwidth/latency limitations
     • References
       • DiskOS Stream Based model (ASPLOS ’98: Acharya, Uysal, Saltz)
       • Active Storage For Large-Scale Data Mining and Multimedia (VLDB ’98: Riedel, Gibson, Faloutsos)
     • Research proved the Active Disk idea interesting, but
       • Difficult to take advantage of in practice
       • Processors in disk controllers not designed for the purpose
       • Vendors have not been providing an SDK

  5. Lustre Architecture
     [Figure: Lustre architecture diagram showing clients, metadata servers (MDS), and object storage targets (OST) connected through the NAL, with scale labels O(10000), O(10), and O(1000) and interaction labels "Directory Metadata & concurrency", "File IO & Locking", and "Recovery, File Status, File Creation".]

  6. Lustre Client
     • Application IO requests
     • LLITE module implements Linux VFS layer
     • LOV stripes object and targets IO to correct Object Client
     • OSC packages up request for transmission over the NAL
     [Figure: client stack: User Space Application, LLITE, LOV, OSC, NAL.]
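The striping step performed by the LOV can be illustrated with a small round-robin mapping. This is a sketch assuming plain RAID-0 style striping with a fixed `stripe_size` and `stripe_count`; the function names are hypothetical and are not Lustre APIs.

```python
def locate_stripe(offset, stripe_size, stripe_count):
    """Map a file byte offset to (target index, offset inside that target's object).

    Assumes round-robin (RAID-0 style) striping; names are illustrative.
    """
    stripe_number = offset // stripe_size
    target = stripe_number % stripe_count
    object_offset = (stripe_number // stripe_count) * stripe_size + offset % stripe_size
    return target, object_offset


def split_request(offset, length, stripe_size, stripe_count):
    """Break one contiguous request into (target, object_offset, length) pieces."""
    pieces = []
    end = offset + length
    while offset < end:
        target, object_offset = locate_stripe(offset, stripe_size, stripe_count)
        # Stop at the stripe boundary or at the end of the request
        n = min(stripe_size - offset % stripe_size, end - offset)
        pieces.append((target, object_offset, n))
        offset += n
    return pieces
```

For example, `split_request(2, 8, 4, 3)` breaks an 8-byte request starting at offset 2 into three pieces, one for each of three targets; each piece would then be handed to the client for that target.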

  7. Lustre Object Storage Server
     • Requests arrive from Portals NAL
     • Object Storage Target directs request to appropriate lower level OBD
     • OBDfilter presents ext3 as Object Based Disk
     [Figure: server stack: NAL, OST, OBDfilter, ext3.]

  8. Current Implementation of Active Storage
     • Extra module passes data, until told to pipe data elsewhere
     • Data is sent to user space process through Unix character device file
     • Processed data is written back to disk
     • Pattern: 1W->2W
     [Figure: server stack: NAL, OST, ASOBD, OBDfilter, ext3, with ASDEV connecting ASOBD to the User Space Processing Component.]
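The 1W->2W pattern can be pictured with ordinary Python file objects. This sketch assumes the pattern means that one incoming write stream results in two writes on the server, the original data and the processed data; that reading, the function name, and the `process` callback are all assumptions for illustration and do not reflect the kernel modules or the character device interface.

```python
import io


def one_write_to_two_writes(incoming_chunks, raw_out, processed_out, process):
    """For each incoming chunk, store it unchanged and also store process(chunk).

    Illustrative stand-in for the extra module and the user space
    processing component; not the actual implementation.
    """
    for chunk in incoming_chunks:
        raw_out.write(chunk)                 # the write the client requested
        processed_out.write(process(chunk))  # the extra write with processed data


# Usage with in-memory buffers standing in for files on the server's disk
raw = io.BytesIO()
processed = io.BytesIO()
one_write_to_two_writes([b"abc", b"def"], raw, processed, bytes.upper)
```

The data never leaves the server between the two writes, which is the point of running the processing component next to the disk.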

  9. SC’2004 StorCloud Most Innovative Use Award
     • 320 TB Lustre
     • 984 400 GB disks
     • 40 Lustre OSS’s running Active Storage
       • 4 logical disks (160 OSTs)
       • 2 Xeon processors
     • 1 MDS
     • 1 client creating files
     • Sustained 4 GB/s Active Storage write processing
     [Figure: client system connected over a gigabit network to a Lustre MDS and to Lustre OSS 0 through Lustre OSS 39, each with Lustre OSTs.]

  10. Real Time Visualization
     [Figure: plot with axes "Time" and "OST #".]

  11. Active Storage Processing Patterns

  12. Status and Future Work
     • Status
       • Proof of concept 1W->2W code works now
       • Difficult to administer and use (2 people)
       • Memory copies between user and kernel space
     • Future
       • Implement other processing patterns
       • Optimize performance by eliminating memory copies
       • Implement Active Storage for PVFS
       • Support different striping in files
       • HDF, NetCDF, Database Stored Procedure Calls, ...
