2007 Scientific Data Management All Hands Meeting Snoqualmie, WA. Active Storage and Its Applications. Jarek Nieplocha, Juan Piernas-Canovas Pacific Northwest National Laboratory. Outline. Description of the Active Storage Concept New Implementation of Active Storage Programming Framework
Active Storage and Its Applications
Jarek Nieplocha, Juan Piernas-Canovas
Pacific Northwest National Laboratory
2007 Scientific Data Management All Hands Meeting, Snoqualmie, WA
2 Outline
• Description of the Active Storage Concept
• New Implementation of Active Storage Programming Framework
• Examples and Applications
3 Active Storage in Parallel Filesystems
• Active Storage exploits the old concept of moving computing to the data source to avoid data transfer penalties
  • Applications use compute resources on the storage nodes
• Storage nodes are full-fledged computers with lots of CPU power available, and standard OSes and processors
[Figure: Traditional Approach vs. Active Storage. Both panels show compute nodes (P) connected through a network to I/O nodes (FS), with the computation Y=foo(X).]
4 Example
• BLAS DSCAL on disk: Y = α·Y
• Experiment
  • Traditional: The input file is read from the filesystem, and the output file is written to the same filesystem. The input file has 120,586,240 doubles.
  • Active Storage: Each server receives the factor, reads the array of doubles from its local disk, and stores the resulting array on the same disk. Each server processes 120,586,240/N doubles, where N is the number of servers.
• Speedup attributed to using multiple OSTs and avoiding data movement between client and servers (no network bottleneck)
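The per-server work in the Active Storage variant can be sketched as follows. This is a minimal illustration, not the dscal program used in the experiment: it assumes the file holds raw native-endian 64-bit doubles, and the name `dscal_file` and the block size are hypothetical.

```python
import array


def dscal_file(alpha, in_path, out_path, block_doubles=65536):
    """Scale every double stored in in_path by alpha and write the result to out_path.

    Assumption: the file is a plain sequence of native-endian 64-bit doubles
    whose length is a multiple of 8 bytes.
    """
    item_size = array.array("d").itemsize
    with open(in_path, "rb") as src, open(out_path, "wb") as dst:
        while True:
            # Read one block at a time so memory use stays bounded.
            raw = src.read(block_doubles * item_size)
            if not raw:
                break
            block = array.array("d")
            block.frombytes(raw)
            scaled = array.array("d", (alpha * y for y in block))
            dst.write(scaled.tobytes())
```

On a storage node, both paths would refer to files held on that node's own disk, so no array data crosses the network.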
5 Related Work
• The Active Disk/Storage concept was introduced a decade ago to use processing resources ‘near’ the disk
  • On the disk controller
  • On processors connected to disks
  • Reduce network bandwidth/latency limitations
• References
  • DiskOS Stream Based model (ASPLOS’98: Acharya, Uysal, Saltz)
  • Active Storage For Large-Scale Data Mining and Multimedia (VLDB ’98: Riedel, Gibson, Faloutsos)
• Research proved the Active Disk idea interesting, but
  • Difficult to take advantage of in practice
  • Processors in disk controllers not designed for the purpose
  • Vendors have not been providing SDKs
6 Lustre Architecture
[Figure: Lustre architecture diagram. Components: clients, MDS servers, and OSTs, with scale labels O(10000), O(10), and O(1000). Labeled interactions: directory, metadata and concurrency; file I/O and locking; recovery, file status, file creation.]
7 Active Storage in Kernel Space
• When the client writes to the file A:
  • ASOBD makes a copy of the data and sends it to ASDEV
  • The processing component (PC) reads from and writes to the char device
  • Original data in A, processed data in B
[Figure: user space holds the processing component; kernel space holds NAL, OST, the Active Storage module (ASDEV char device and ASOBD), OBDfilter, Ldiskfs, and the disk with files A and B.]
8 Active Storage Application: High Throughput Proteomics
• 9.4 Tesla High Throughput Mass Spectrometer
  • 1 experiment per hour
  • 5000 spectra per experiment
  • 4 MByte per spectrum
  • Per instrument: 20 GBytes per hour, 480 GBytes per day
  • Next-generation technology will increase data rates x200
• Application Problem
  • Given two float inputs, a target mass and a tolerance, find all the possible protein sequences that fit into the specified range
• Active Storage Solution
  • Each OST receives its part of the float pair sent by the client and stores the resulting processing output in its Lustre OBD (object-based disk)
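The per-instrument data rates quoted on this slide follow from simple arithmetic, reproduced here as a short sketch. Decimal units (1 GB = 1000 MB) are an assumption, and the next-generation figure is derived from the quoted x200 factor rather than stated on the slide.

```python
SPECTRA_PER_EXPERIMENT = 5000
MBYTES_PER_SPECTRUM = 4
EXPERIMENTS_PER_HOUR = 1
NEXT_GEN_FACTOR = 200  # the "x200" increase quoted on the slide


def instrument_rates():
    """Return the data rates of one instrument, in GB per hour and per day."""
    mb_per_hour = SPECTRA_PER_EXPERIMENT * MBYTES_PER_SPECTRUM * EXPERIMENTS_PER_HOUR
    gb_per_hour = mb_per_hour // 1000  # assumption: 1 GB = 1000 MB
    gb_per_day = gb_per_hour * 24
    return {
        "gb_per_hour": gb_per_hour,
        "gb_per_day": gb_per_day,
        # Derived value, not on the slide: today's daily rate times 200.
        "next_gen_gb_per_day": gb_per_day * NEXT_GEN_FACTOR,
    }
```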
9 SC’2004 StorCloud Most Innovative Use Award
• Proteomics Application
• 320 TB Lustre
• 984 400 GB disks
• 40 Lustre OSSs running Active Storage
• 4 Logical Disks (160 OSTs)
• 2 Xeon Processors
• 1 MDS
• 1 Client creating files
• Sustained 4 GB/s Active Storage write processing
[Figure: a client system and the Lustre MDS connected over a Gigabit network to Lustre OSS 0 through Lustre OSS 39, each serving Lustre OSTs.]
10 Active Storage in User Space
• Problems with the Kernel Space implementation
  • Portability, maintenance, extra memory copies
• We developed a User Space implementation
  • Most file systems allow the storage nodes to be clients
  • Most file systems allow files to be created with a given layout
  • Our framework launches Processing Components on the storage nodes that hold the files to be processed
  • Processing Components read from and write to local files
• Highly Portable Implementation
  • Used with Lustre 1.6, PVFS2 2.7
  • Bug in Lustre 1.4 (and SFS): frequent kernel crashes when mounting the file system on the storage nodes
  • Held initial discussions with IBM on GPFS port
11 Active Storage in User Space
[Figure: compute nodes (the parallel filesystem's clients) connected through the network interconnect to the metadata server and Storage Node 0 through Storage Node N-1 (the parallel filesystem's components, which are also clients of the filesystem). The Active Storage Runtime Framework comprises asmaster and the ASRF instances, which run the Processing Components; data I/O traffic is marked.]
12 Performance Evaluation
• AMINOGEN Bioinformatics Application
  • Input file: ASCII file, mass and tolerance pairs, one per line. Total size = 44 bytes
  • Output file: binary file containing amino acid sequences. Total size = 14.2 GB
[Figure: Overall execution time]
13 Enhanced Implementation of Active Storage for Striped Files
• Striped Files
  • broadly used for performance
  • not supported by earlier AS work
• Enhanced Implementation
  • Use striping data from the filesystem
  • New component: AS Mapper
  • Locality awareness in Processing Component: compute on local chunks
• Climate application with netCDF
  • Computes statistics of key variables from Global Cloud Resolving simulation (U. Colorado)
  • Eliminated >95% of network traffic
[Figure: inside the Active Storage Runtime Framework, the Processing Component issues read and write calls against a contiguous file (chunks 0 1 2 3 4 ...); LIBAS turns them into reads and writes of local chunks (2 6 10 14 18 ...) through the GLIBC read and write calls.]
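The local-chunk sequence in the figure (2, 6, 10, 14, 18, ...) is what round-robin striping gives for the storage node at index 2 of 4. The sketch below illustrates that mapping only; it is not LIBAS code. The function names are hypothetical, and it assumes chunk 0 lives on node 0 and that the stripe count is 4, which is inferred from the sequence rather than stated on the slide.

```python
def local_chunks(node_index, stripe_count, file_size, stripe_size):
    """List the chunk numbers of a striped file stored on one node.

    Assumption: plain round-robin striping, with chunk i on node i % stripe_count.
    """
    total_chunks = -(-file_size // stripe_size)  # ceiling division
    return list(range(node_index, total_chunks, stripe_count))


def chunk_byte_range(chunk, stripe_size, file_size):
    """Return the (start, end) byte range a chunk covers in the whole file."""
    start = chunk * stripe_size
    return start, min(start + stripe_size, file_size)
```

For a 20 MiB file with 1 MiB stripes, `local_chunks(2, 4, 20 * 1048576, 1048576)` yields `[2, 6, 10, 14, 18]`, matching the figure.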
14 Examples and Applications
Juan Piernas-Canovas
15 Active Storage in DSCAL Example
[Figure: compute nodes (the parallel filesystem's clients) mount /lustre and connect through the network interconnect to the parallel filesystem's components: MDS & MGS, asmaster, and the storage nodes OST31 and OST43, which also mount /lustre. dscal runs on OST31, reading Doubles.20 and writing Doubles.20.out, and on OST43, reading Doubles.15 and writing Doubles.15.out; data I/O traffic is marked.]
16 Non-Striped Files
<?xml version="1.0"?>
<rule>
  <match>
    <pattern>/lustre/doubles.*</pattern>
  </match>
  <program>
    <path arch="any">/lustre/dscal</path>
    <arguments>12345.67890 @ @.out</arguments>
  </program>
</rule>
• @: /lustre/doubles.15 in OST43
• @.out: /lustre/doubles.15.out in OST43 (new file)
17 Climate Application
• Collaboration with SciDAC GCRM SAP (Karen)
• Problem: Compute averages for variables generated from scientific simulation
  • stored in striped output files
  • geodesic grid
  • netCDF data format
• Objective: Optimize performance by exploiting data locality in AS Processing Components to minimize network traffic
18 Non-Striped Files
<?xml version="1.0"?>
<rule>
  <match>
    <pattern>/lustre/doubles.*</pattern>
  </match>
  <program>
    <path arch="any">/lustre/dscal</path>
    <arguments>12345.67890 @ @.out</arguments>
  </program>
</rule>
• @: /lustre/doubles.20 in OST31
• @.out: /lustre/doubles.20.out in OST31 (new file)
• Execution: /lustre/asd/asmaster /lustre/dscal.xml
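How a rule of this shape might turn into a command line can be sketched as follows. This is an illustration under assumptions, not the asmaster implementation: the function name `build_command` is hypothetical, the pattern is treated as a shell-style glob, and `@` is assumed to stand for the matched file name, as the slide annotations suggest. Options such as `@{hidechunks}` are not handled.

```python
import fnmatch
import xml.etree.ElementTree as ET

RULE = """<?xml version="1.0"?>
<rule>
  <match>
    <pattern>/lustre/doubles.*</pattern>
  </match>
  <program>
    <path arch="any">/lustre/dscal</path>
    <arguments>12345.67890 @ @.out</arguments>
  </program>
</rule>"""


def build_command(rule_xml, file_path):
    """Return the argv list for file_path, or None if the rule does not match."""
    rule = ET.fromstring(rule_xml)
    pattern = rule.findtext("match/pattern")
    # Assumption: the pattern is a shell-style glob.
    if not fnmatch.fnmatch(file_path, pattern):
        return None
    program = rule.findtext("program/path")
    arguments = rule.findtext("program/arguments").split()
    # Assumption: every "@" is replaced by the matched file name.
    return [program] + [arg.replace("@", file_path) for arg in arguments]
```

For `/lustre/doubles.20` this gives `/lustre/dscal 12345.67890 /lustre/doubles.20 /lustre/doubles.20.out`, consistent with the input and output files named on the slide.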
19 Processing Patterns
• In user space, it is easy to support different processing patterns:
[Figure: two diagrams, labeled 1W0 and 1W#W, each showing a client data stream and an Active Storage processing component (PC).]
20 No Output File (Pattern 1W0)
<?xml version="1.0"?>
<rule>
  <match>
    <pattern>/lustre/doubles.*</pattern>
  </match>
  <program>
    <path arch="any">/lustre/dscal1</path>
    <arguments>12345.67890 @</arguments>
  </program>
</rule>
• @: /lustre/doubles.15 in OST43
21 Several Output Files (Pattern 1W#W)
<?xml version="1.0"?>
<rule>
  <match>
    <pattern>/lustre/doubles.*</pattern>
  </match>
  <program>
    <path arch="any">/lustre/dscal3</path>
    <arguments>12345.67890 @ @.out @.err</arguments>
  </program>
</rule>
• @: /lustre/doubles.15 in OST43
• @.out: /lustre/doubles.15.out in OST43 (new file)
• @.err: /lustre/doubles.15.err in OST43 (new file)
22 Transparent Access to Striped Files
<?xml version="1.0"?>
<rule>
  <match>
    <pattern>/lustre/doubles.*</pattern>
  </match>
  <program>
    <path arch="any">/lustre/dscal</path>
    <arguments>12345.67890 @{hidechunks} @{copystriping,hidechunks}.out</arguments>
  </program>
</rule>
• Transparent access to the chunks of the input file
• Transparent access to the chunks of the output file
• New output file with the same striping as the input file
23 Mapper and Striped netCDF Files
[Figure: asmaster, the metadata server, and Storage Node 0 through Storage Node N-1 on the network interconnect, with an ASRF instance on each storage node, two Processing Components (PC), and the label Mapper(0, 2). A striped netCDF file consists of a header followed by variable data spread across the storage nodes; data I/O traffic is marked.]
24 Processing of netCDF files
<?xml version="1.0"?>
<rule>
  <stdfiles>
    <stdout>@.out-${NODENAME}</stdout>
  </stdfiles>
  <match>
    <pattern>/lustre/data.*</pattern>
  </match>
  <program>
    <path arch="any">/lustre/processnetcdf.py</path>
    <arguments>@ ta</arguments>
  </program>
  <mapper>
    <path arch="any">/lustre/netcdfmapper.py</path>
    <arguments>@ ta ${CHUNKNUM} ${CHUNKSIZE}</arguments>
  </mapper>
</rule>
• @.out-${NODENAME}: non-striped output file /lustre/data.37.out-ost43
• ta: variable name in the netCDF file /lustre/data.37
• ${CHUNKNUM} ${CHUNKSIZE}: striping information of /lustre/data.37
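The slides do not describe what netcdfmapper.py computes or how it reports its result. One plausible core of such a mapper is deciding which part of a variable's data falls inside a given chunk, sketched below. Everything here is hypothetical: the function name, the assumption that the variable occupies one contiguous byte range, and the idea that its offset and size have already been read from the netCDF header.

```python
def chunk_overlap(var_offset, var_size, chunk_num, chunk_size):
    """Return the (start, end) file byte range of a variable inside one chunk.

    Hypothetical helper: var_offset and var_size describe a contiguous byte
    range taken from the netCDF header; chunk_num and chunk_size play the
    role of the ${CHUNKNUM} and ${CHUNKSIZE} arguments. Returns None when
    the chunk holds none of the variable's data.
    """
    chunk_start = chunk_num * chunk_size
    chunk_end = chunk_start + chunk_size
    start = max(var_offset, chunk_start)
    end = min(var_offset + var_size, chunk_end)
    return (start, end) if start < end else None
```

A Processing Component could then restrict its computation to the returned range, which keeps its reads on the local chunk.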
25 PVFS2 support
<?xml version="1.0"?>
<rule>
  <match>
    <pattern>/lustre/doubles.*</pattern>
  </match>
  <program>
    <path arch="any">/lustre/dscal</path>
    <arguments>12345.67890 @{hidechunks} @{copystriping,hidechunks}.out</arguments>
  </program>
  <filesystem>
    <type>pvfs</type>
    <mntpoint>/pvfs2</mntpoint>
  </filesystem>
</rule>
• Filesystem: PVFS2
26 Local File System with Virtual Striping
<?xml version="1.0"?>
<rule>
  <match>
    <pattern>/lustre/doubles.*</pattern>
  </match>
  <program>
    <path arch="any">/lustre/dscal</path>
    <arguments>12345.67890 @{hidechunks} @{copystriping,hidechunks}.out</arguments>
  </program>
  <filesystem>
    <type>localfs</type>
    <striping>8:1048576</striping>
  </filesystem>
</rule>
• Local file system
• Virtual striping:
  • stripe size: 1 MB
  • stripe count: 8
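The striping value `8:1048576` reads as stripe count followed by stripe size in bytes, per the slide annotations (count 8, size 1 MB). A minimal sketch of parsing it and locating the virtual stripe for a byte offset follows; the function names are hypothetical and round-robin placement is an assumption.

```python
def parse_striping(spec):
    """Split a 'count:size' striping value into (stripe_count, stripe_size)."""
    count_text, size_text = spec.split(":")
    return int(count_text), int(size_text)


def stripe_for_offset(offset, stripe_count, stripe_size):
    """Return the index of the virtual stripe that holds a byte offset.

    Assumption: chunks are assigned to stripes round-robin, starting at 0.
    """
    return (offset // stripe_size) % stripe_count
```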
27 Further Information
• Technical paper
  • J. Piernas, J. Nieplocha, E. Felix, “Evaluation of Active Storage Strategies for the Lustre Parallel Filesystem”, Proc. SC’07
• Website: http://hpc.pnl.gov/projects/active-storage
• Upcoming release in December 2007
  • Support for Lustre 1.6, PVFS2, and Linux local file systems
• Source code available now on request. Just send us an e-mail!
  • Jarek Nieplocha <jarek.nieplocha@pnl.gov>
  • Juan Piernas-Canovas <juan.piernascanovas@pnl.gov>
Questions? Active Storage and Its Applications Jarek Nieplocha, Juan Piernas-Canovas Pacific Northwest National Laboratory