Research @ Northeastern University • I/O storage modeling and performance • David Kaeli • Soft error modeling and mitigation • Mehdi B. Tahoori April 2005
I/O Storage Research at Northeastern University David Kaeli Yijian Wang Department of Electrical and Computer Engineering Northeastern University Boston, MA kaeli@ece.neu.edu
Outline • Motivation to study file-based I/O • Profile-driven partitioning for parallel file I/O • I/O Qualification Laboratory @ NU • Areas for future work
Important File-based I/O Workloads • Many subsurface sensing and imaging workloads involve file-based I/O • Cellular biology – in-vitro fertilization with NU biologists • Medical imaging – cancer therapy with MGH • Underwater mapping – multi-sensor fusion with Woods Hole Oceanographic Institution • Ground-penetrating radar – toxic waste tracking with Idaho National Labs
The Impact of Profile-guided Parallelization on SSI Applications • Reduced the runtime of a single-body Steepest Descent Fast Multipole Method (SDFMM) application by 74% on a 32-node Beowulf cluster • Hot-path parallelization • Data restructuring • Reduced the runtime of a Monte Carlo scattered-light simulation by 98% on a 16-node Silicon Graphics Origin 2000 • Matlab-to-C compilation • Hot-path parallelization • Obtained superlinear speedup of an Ellipsoid Algorithm run on a 16-node IBM SP2 • Matlab-to-C compilation • Hot-path parallelization
Limits of Parallelization • For compute-bound workloads, Beowulf clusters can be used effectively to overcome computational barriers • Middleware layers (e.g., MPI and MPI-IO) can significantly reduce the programming effort on parallel systems • Multiple clusters can be combined using Grid middleware (e.g., the Globus Toolkit) • For file-based I/O-bound workloads, however, Beowulf clusters and Grid systems are presently ill-suited to exploiting the available I/O parallelism
Outline • Motivation to study file-based I/O • Profile-driven partitioning for parallel file I/O • I/O Qualification Laboratory @ NU • Areas for future work
Parallel I/O Acceleration • The I/O bottleneck • The growing gap between the speed of processors, networks and underlying I/O devices • Many imaging and scientific applications access disks very frequently • I/O intensive applications • Out-of-core applications • Work on large datasets that cannot fit in main memory • File-intensive applications • Access file-based datasets frequently • Large number of file operations
Introduction • Storage architectures • Direct Attached Storage (DAS) • Storage device is directly attached to the computer • Network Attached Storage (NAS) • Storage subsystem is attached to a network of servers and file requests are passed through a parallel filesystem to the centralized storage device • Storage Area Network (SAN) • A dedicated network to provide an any-to-any connection between processors and disks
[Figure: an I/O-intensive application mapped three ways: data partitioning across multiple processes (i.e., MPI-IO), data striping across multiple disks (i.e., RAID), and I/O partitioning across both processes and disks]
I/O Partitioning • I/O is parallelized at both the application level (using MPI and MPI-IO) and the disk level (using file partitioning) • Ideally, every process will only access files on its local disk (though this is typically not possible due to data sharing) • How do we recognize the access patterns? • Profile-guided approach (see the sketch below)
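To make the two levels concrete, here is a minimal mpi4py sketch (not from the original slides; mpi4py, NumPy, and the file names are assumptions): all ranks first share one file through MPI-IO, then each rank opens a private per-rank partition file so its accesses stay on the local disk.

```python
# Minimal sketch of application-level vs. disk-level I/O parallelism.
# Assumes mpi4py and NumPy; file names are hypothetical.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
CHUNK = 2040  # bytes per contiguous access, as in the NPB2.4/BT workload

# Shared-file MPI-IO: every rank opens the same file and reads at its own offset.
fh = MPI.File.Open(comm, "dataset.bin", MPI.MODE_RDONLY)
buf = np.empty(CHUNK, dtype=np.uint8)
fh.Read_at(rank * CHUNK, buf)  # explicit-offset read; no shared file pointer
fh.Close()

# File-partitioned I/O: each rank opens its own partition file on local disk.
fh = MPI.File.Open(MPI.COMM_SELF, f"partition_{rank}.bin", MPI.MODE_RDONLY)
fh.Read_at(0, buf)
fh.Close()
```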
Profile Generation • Run the application • Capture I/O execution profiles • Apply our partitioning algorithm • Rerun the tuned application
I/O traces and partitioning • For every process, for every contiguous file access, we capture the following I/O profile information: • Process ID • File ID • Address • Chunk size • I/O operation (read/write) • Timestamp • Generate a partition for every process • Optimal partitioning is NP-complete, so we develop a greedy algorithm • We have found we can use partial profiles to guide partitioning
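As an illustration only (the slides do not show the trace format), the six profiled fields map naturally onto a small record type, with a logging helper that the I/O wrappers would call:

```python
import time
from dataclasses import dataclass

@dataclass
class IORecord:
    pid: int          # ID (MPI rank) of the issuing process
    file_id: int      # identifier of the accessed file
    address: int      # starting byte offset of the contiguous access
    chunk_size: int   # number of bytes transferred
    op: str           # "R" for read, "W" for write
    timestamp: float  # wall-clock time of the access

trace: list[IORecord] = []

def log_access(pid, file_id, address, chunk_size, op):
    """Record one contiguous file access; called from each I/O wrapper."""
    trace.append(IORecord(pid, file_id, address, chunk_size, op, time.time()))
```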
Greedy File Partitioning Algorithm

for each I/O process, create a partition;
for each contiguous data chunk {
    total up the # of read/write accesses on a per-process basis;
    if the chunk is accessed by only one process
        assign the chunk to that process's partition;
    else if the chunk is read (but never written) by multiple processes
        duplicate the chunk in all partitions where it is read;
    else if the chunk is written by one process, but later read by multiple processes
        assign the chunk to all partitions where it is read,
        and broadcast the updates on writes;
    else
        assign the chunk to a shared partition;
}
for each partition
    sort chunks by the earliest timestamp of each chunk;
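A compact executable rendering of the greedy pass, reusing the hypothetical IORecord above (a sketch of the published heuristic, not the authors' code):

```python
from collections import defaultdict

def greedy_partition(trace, nprocs):
    """Assign each contiguous chunk, keyed by (file_id, address), to partitions."""
    readers, writers, first_seen = defaultdict(set), defaultdict(set), {}
    for r in sorted(trace, key=lambda r: r.timestamp):
        key = (r.file_id, r.address)
        (readers if r.op == "R" else writers)[key].add(r.pid)
        first_seen.setdefault(key, r.timestamp)

    partitions = {p: [] for p in range(nprocs)}
    shared = []
    # Visit chunks by earliest timestamp so each partition ends up sorted.
    for key in sorted(first_seen, key=first_seen.get):
        rd, wr = readers[key], writers[key]
        if len(rd | wr) == 1:            # accessed by a single process
            partitions[(rd | wr).pop()].append(key)
        elif not wr:                     # read-shared, never written: replicate
            for p in rd:
                partitions[p].append(key)
        elif len(wr) == 1:               # one writer, many readers: replicate,
            for p in rd | wr:            # broadcasting updates on writes
                partitions[p].append(key)
        else:                            # multiple writers: shared partition
            shared.append(key)
    return partitions, shared
```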
Parallel I/O Workloads • NAS Parallel Benchmarks (NPB 2.4)/BT • Computational fluid dynamics • Generates a file (~1.6 GB) dynamically and then reads it back • Writes/reads sequentially in chunk sizes of 2040 bytes • SPEChpc96/seismic • Seismic processing • Generates a file (~1.5 GB) dynamically and then reads it back • Writes sequential chunks of 96 KB and reads sequential chunks of 2 KB • Tile-IO • Parallel Benchmarking Consortium • Tile access to a two-dimensional matrix (~1 GB) with overlap • Writes/reads sequential chunks of 32 KB, with 2 KB of overlap • Perf • Parallel I/O test program within MPICH • Writes a 1 MB chunk at a location determined by rank, no overlap • Mandelbrot • An image-processing application that includes visualization • Chunk size is dependent on the number of processes
[Figure: Beowulf cluster testbed: P2-350MHz compute nodes, each with a local PCI-IDE disk, connected through a 10/100Mb Ethernet switch to a RAID node]
Hardware Specifics • DAS configuration: Linux box, Western Digital WD800BB (IDE), 80 GB, 7200 RPM • Beowulf cluster (base configuration): Fast Ethernet, 100 Mbits/sec • Network-attached RAID: Morstor TF200 with 6-9GB Seagate SCSI drives, 7200 RPM, RAID-5 • Locally attached IDE disks: IBM UltraATA-350840, 5400 RPM • Fibre Channel disks: Seagate Cheetah X15 ST-336752FC, 15000 RPM
[Figure: write/read bandwidth for NPB2.4/BT and SPEChpc96/seismic]
[Figure: write/read bandwidth for MPI-Tile-IO, Perf, and Mandelbrot]
Profile training sensitivity analysis • We have found that I/O access patterns are independent of file-based data values • When we increase the problem size or reduce the number of processes, either: • the number of I/Os increases, but access patterns and chunk size remain the same (SPEChpc96, Mandelbrot), or • the number of I/Os and I/O access patterns remain the same, but the chunk size increases (NPB/BT, Tile-IO, Perf) • Re-profiling can therefore be avoided
Execution-driven Parallel I/O Modeling • Growing need to process large, complex datasets in high performance parallel computing applications • Efficient implementation of storage architectures can significantly improve system performance • An accurate simulation environment for users to test and evaluate different storage architectures and applications
Execution-driven I/O Modeling • Target applications: parallel scientific programs (MPI) • Target machine/host machine: Beowulf clusters • Use DiskSim as the underlying disk-drive simulator • Direct execution to model CPU and network communication • We execute the real parallel I/O accesses while calculating the simulated I/O response times (see the sketch below)
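The coupling might look like the following sketch; DiskSimInterface and service_time are hypothetical stand-ins for the real DiskSim bindings, which the slides do not detail:

```python
class DiskSimInterface:
    """Hypothetical wrapper around a DiskSim-like disk model."""
    def service_time(self, op: str, address: int, nbytes: int) -> float:
        raise NotImplementedError  # would hand the request to the simulator

def replay(trace, sim: DiskSimInterface) -> float:
    """Issue each real access, but account its cost with the simulated disk."""
    simulated = 0.0
    for r in sorted(trace, key=lambda r: r.timestamp):
        # ...perform the real read/write here so the program still runs...
        simulated += sim.service_time(r.op, r.address, r.chunk_size)
    return simulated
```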
Validation – Synthetic I/O Workload on DAS
Simulation Framework – NAS [Figure: compute nodes produce local I/O traces; logical file access addresses flow through the network file system over the LAN/WAN to the RAID controller, where filesystem metadata maps I/O requests into DiskSim]
Simulation Framework – SAN direct • A variant of SAN where disks are distributed across the network and each server is directly connected to a single device • File partitioning: utilize I/O profiling and data partitioning heuristics to distribute portions of files to disks close to the processing nodes [Figure: each node runs a filesystem and a DiskSim instance, with I/O traces exchanged over the LAN/WAN]
Hardware Specifications
Publications • “Profile-guided File Partitioning on Beowulf Clusters,” Journal of Cluster Computing, Special Issue on Parallel I/O, to appear 2005. • “Execution-Driven Simulation of Network Storage Systems,” Proceedings of the 12th ACM/IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS), October 2004, pp. 604-611. • “Profile-Guided I/O Partitioning,” Proceedings of the 17th ACM International Symposium on Supercomputing, June 2003, pp. 252-260. • “Source Level Transformations to Apply I/O Data Partitioning,” Proceedings of the IEEE Workshop on Storage Network Architecture and Parallel I/O, October 2003, pp. 12-21. • “Profile-Based Characterization and Tuning for Subsurface Sensing and Imaging Applications,” International Journal of Systems, Science and Technology, September 2002, pp. 40-55.
Summary of Cluster-based Work • Many imaging applications are dominated by file-based I/O • Parallel systems can only be effectively utilized if I/O is also parallelized • Developed a profile-guided approach to I/O data partitioning • Impacting clinical trials at MGH • Reduced overall execution time by 27-82% over MPI-IO • Execution-driven I/O model is highly accurate and provides significant modeling flexibility
Outline • Motivation to study file-based I/O • Profile-driven partitioning for parallel file I/O • I/O Qualification Laboratory @ NU • Areas for future work
I/O Qualification Laboratory • Working with the Enterprise Strategy Group • Develop a state-of-the-art facility to provide independent performance qualification of enterprise storage (ES) systems • Provide a quarterly report to the ES customer base on the status of current ES offerings • Work with leading ES vendors to provide them with custom early performance evaluation of their beta products
I/O Qualification Laboratory • Contacted by IOIntegrity and SANGATE for product qualification • Developed potential partners that are leaders in the ES field • Initial proposals already reviewed by IBM, Hitachi and other ES vendors • Looking for initial endorsement from industry
I/O Qualification Laboratory • Why @ NU • Track record with industry (EMC, IBM, Sun) • Experience with benchmarking and I/O characterization • Interesting set of applications (medical, environmental, etc.) • Great opportunity to work within the cooperative education model
Outline • Motivation to study file-based I/O • Profile-driven partitioning for parallel file I/O • I/O Qualification Laboratory @ NU • Areas for future work
Areas for Future Work • Designing a peer-to-peer storage system on a Grid system by partitioning datasets across geographically distributed storage devices [Figure: two clusters, joulian.hpcl.neu.edu (head node + 31 sub-nodes) and keys.ece.neu.edu (head node + 8 sub-nodes + RAID), linked over the Internet at 100Mbit/s and 1Gbit/s]
Areas for Future Work • Reduce simulation time by identifying characteristic “phases” in I/O workloads • Apply machine learning algorithms to identify clusters of representative I/O behavior • Utilize K-Means and Multinomial clustering to obtain high fidelity in simulation runs utilizing sampled I/O behavior (see the sketch below) • “A Multinomial Clustering Model for Fast Simulation of Architecture Designs,” submitted to the 2005 ACM KDD Conference.
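As one possible rendering of the clustering step (scikit-learn and the three-feature interval summary are assumptions, reusing the IORecord sketch from earlier): summarize each fixed-length interval of a trace as a feature vector, cluster the intervals, and simulate only one representative per cluster.

```python
import numpy as np
from sklearn.cluster import KMeans

def interval_features(trace, interval=1.0):
    """Summarize each time interval as (#reads, #writes, mean chunk size)."""
    start = min(r.timestamp for r in trace)
    n = int((max(r.timestamp for r in trace) - start) / interval) + 1
    reads, writes = np.zeros(n), np.zeros(n)
    size, count = np.zeros(n), np.zeros(n)
    for r in trace:
        i = int((r.timestamp - start) / interval)
        (reads if r.op == "R" else writes)[i] += 1
        size[i] += r.chunk_size
        count[i] += 1
    mean = np.divide(size, count, out=np.zeros(n), where=count > 0)
    return np.column_stack([reads, writes, mean])

X = interval_features(trace)
labels = KMeans(n_clusters=4, n_init=10).fit_predict(X)
# Simulate one representative interval per cluster, weighted by cluster size.
```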