MosaStore: A Versatile Storage System. Lauro Costa, Abdullah Gharaibeh, Samer Al-Kiswany, Matei Ripeanu, Emalayan Vairavanathan (and many others from UBC, ANL, ORNL). Networked Systems Laboratory (NetSysLab), University of British Columbia. http://netsyslab.ece.ubc.ca
Networked Systems Laboratory (NetSysLab), University of British Columbia: a golf course… a (nudist) beach… (and 199 days of rain each year)
The Landscape • Diverse workload characteristics: workflows, data analysis, checkpointing • Diverse platform capabilities: supercomputers, cloud computing, desktop grids • Storage system middleware sits between the two. Challenge: design an efficient storage system middleware.
Motivation: Underprovisioned storage systems on many HPC platforms (e.g., BlueGene/P at ANL). (Figure labels: 160K cores; 2.5K I/O nodes; 10 Gb/s switch complex; high-speed network; 24 GPFS servers; 850 MBps per 64 nodes; 2.5 GBps per node.) GPFS I/O rate: 8 GBps = 51 KBps per core. The shared storage is a bottleneck, while underutilized resources sit close to the application.
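The per-core figure above follows from dividing the aggregate GPFS rate by the core count. The sketch below reproduces it under two assumptions not stated on the slide: "160K cores" means 163,840 cores (40,960 quad-core BG/P nodes), and GB/KB are binary units. The second helper shows the per-node share of an 850 MBps link shared by 64 compute nodes, for contrast with the 2.5 GBps per node available close to the application.

```python
# Back-of-the-envelope check of the slide's per-core shared-storage bandwidth.
# Assumptions (not on the slide): 160K cores = 163,840 cores, binary units.

GIB = 2**30
KIB = 2**10


def per_core_bandwidth_kib(aggregate_gib_per_s: float, cores: int) -> float:
    """Each core's share of the aggregate shared-storage bandwidth, in KiB/s."""
    return aggregate_gib_per_s * GIB / cores / KIB


def per_node_share(link_rate: float, nodes_sharing: int) -> float:
    """Each node's share of a link used by several nodes (same unit as input)."""
    return link_rate / nodes_sharing


if __name__ == "__main__":
    print(f"{per_core_bandwidth_kib(8, 163_840):.1f} KiB/s per core")
    print(f"{per_node_share(850, 64):.1f} MBps per node behind one 850 MBps link")
```

With these assumptions the first line prints 51.2 KiB/s per core and the second about 13.3 MBps per node, both orders of magnitude below the local per-node rate.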
Solution: a temporary shared data store. (Same figure, with a shared data store added.) • Nodes are dedicated to an application • The storage system is coupled with the application's execution
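A minimal, in-memory sketch of what "a temporary shared data store on the application's own nodes" could look like. It assumes fixed-size chunks striped round-robin across the allocated nodes; the class name, chunk size, and methods are hypothetical and are not MosaStore's actual code or API.

```python
import math
from dataclasses import dataclass, field


@dataclass
class TemporaryDataStore:
    """Hypothetical sketch: pool the node-local storage of the nodes allocated
    to one job and stripe each file's fixed-size chunks round-robin over them."""
    nodes: list
    chunk_size: int = 1 << 20                    # 1 MiB chunks: an assumed value
    _chunks: dict = field(default_factory=dict)  # (node, path, idx) -> bytes
    _sizes: dict = field(default_factory=dict)   # path -> file size in bytes

    def _node_for(self, idx: int) -> str:
        return self.nodes[idx % len(self.nodes)]  # round-robin placement

    def _num_chunks(self, path: str) -> int:
        return math.ceil(self._sizes[path] / self.chunk_size)

    def write(self, path: str, data: bytes) -> None:
        if path in self._sizes:
            self.delete(path)  # whole-file overwrite
        self._sizes[path] = len(data)
        for idx in range(self._num_chunks(path)):
            start = idx * self.chunk_size
            self._chunks[(self._node_for(idx), path, idx)] = data[start:start + self.chunk_size]

    def read(self, path: str) -> bytes:
        if path not in self._sizes:
            raise FileNotFoundError(path)
        return b"".join(self._chunks[(self._node_for(i), path, i)]
                        for i in range(self._num_chunks(path)))

    def placement(self, path: str) -> list:
        """Node holding each chunk of `path`, in chunk order."""
        return [self._node_for(i) for i in range(self._num_chunks(path))]

    def delete(self, path: str) -> None:
        if path not in self._sizes:
            raise FileNotFoundError(path)
        for i in range(self._num_chunks(path)):
            del self._chunks[(self._node_for(i), path, i)]
        del self._sizes[path]

    def shutdown(self) -> None:
        """The job finished: the store's lifetime ends with the application's."""
        self._chunks.clear()
        self._sizes.clear()
```

Usage: `TemporaryDataStore(nodes=["n0", "n1", "n2"], chunk_size=4)` spreads a 10-byte file over all three nodes. Striping lets one file draw on several nodes' local bandwidth, and `shutdown()` mirrors the coupling of storage and application lifetimes.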
Benefits (same figure) • Storage closer to the application • Ability to specialize
Evaluation: Harnessing 'Close to Application' Underutilized Resources. Overall: 1.52x. Exploiting the underutilized resources can critically improve storage system performance. [Zhang et al., "Design and Evaluation of a Collective I/O Model for Loosely-coupled Petascale Programming", MTAGS '08]
Evaluation: Specialization • Deduplication benefits a checkpointing workload: • 3x higher throughput • 25-70% less storage space and network effort • Scales to hundreds of clients. (Figure: MosaStore throughput at larger scale, pool of 35 nodes. Experiment by Henry Monti (Virginia Tech) on a Cray XT4 cluster at ORNL.) Specialization can critically improve storage system performance. [S. Al-Kiswany, M. Ripeanu, S. Vazhkudai, A. Gharaibeh, "stdchk: A Checkpoint Storage System for Desktop Grid Computing", ICDCS '08]
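Deduplication helps checkpointing because successive checkpoint images tend to share much of their content. The sketch below shows one common approach, content-addressed storage of fixed-size chunks keyed by their SHA-1 digest. The slide does not say which chunking scheme or hash the experiment used, so both are assumptions here, as are the class and method names.

```python
import hashlib


def split_chunks(data: bytes, size: int) -> list:
    """Fixed-size chunking (an assumed scheme; content-defined chunking also exists)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class DedupStore:
    """Hypothetical content-addressed store: a chunk whose digest is already
    known is neither stored nor transferred again."""

    def __init__(self, chunk_size: int = 4096) -> None:
        self.chunk_size = chunk_size
        self.chunks = {}  # SHA-1 hex digest -> chunk bytes
        self.files = {}   # file name -> ordered list of digests ("recipe")

    def write(self, name: str, data: bytes) -> int:
        """Store a checkpoint image; return how many bytes were actually new."""
        recipe, new_bytes = [], 0
        for chunk in split_chunks(data, self.chunk_size):
            digest = hashlib.sha1(chunk).hexdigest()
            if digest not in self.chunks:
                self.chunks[digest] = chunk
                new_bytes += len(chunk)
            recipe.append(digest)
        self.files[name] = recipe
        return new_bytes

    def read(self, name: str) -> bytes:
        return b"".join(self.chunks[d] for d in self.files[name])

    def stored_bytes(self) -> int:
        return sum(len(c) for c in self.chunks.values())
```

With `chunk_size=4`, writing `b"AAAABBBBCCCC"` and then `b"AAAABBBBDDDD"` stores 12 and then only 4 new bytes. A real system must also weigh the cost of hashing and the (tiny) risk of digest collisions.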
Summary so far • MosaStore: a versatile storage architecture that: • exploits underutilized resources 'close to the application' • supports specialization and configurability • The system is: • configured at deployment time • deployed for a lifetime coupled with that of the target application. [S. Al-Kiswany, A. Gharaibeh, M. Ripeanu, "The Case for a Versatile Storage System", HotStorage '09]
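"Configured at deployment time" and "lifetime coupled with the application" can be illustrated with a context manager: the configuration is validated and frozen when storage is deployed on the job's nodes and the deployment ends when the application exits. Every knob name and default below is hypothetical, chosen only for illustration.

```python
from contextlib import contextmanager
from dataclasses import dataclass


@dataclass(frozen=True)
class StoreConfig:
    """Hypothetical deployment-time knobs; names and defaults are illustrative."""
    stripe_width: int = 4      # nodes each file is striped across
    chunk_size: int = 1 << 20  # bytes per chunk
    replication: int = 1       # copies of each chunk
    dedup: bool = False        # e.g., enabled for checkpointing workloads

    def validate(self, num_nodes: int) -> None:
        if not 1 <= self.stripe_width <= num_nodes:
            raise ValueError("stripe_width must be between 1 and the node count")
        if not 1 <= self.replication <= num_nodes:
            raise ValueError("replication must be between 1 and the node count")
        if self.chunk_size <= 0:
            raise ValueError("chunk_size must be positive")


@dataclass
class Deployment:
    config: StoreConfig
    nodes: list
    active: bool = True


@contextmanager
def deployed_store(config: StoreConfig, nodes: list):
    """Deploy storage on the application's nodes; tear it down when the app exits."""
    config.validate(len(nodes))  # the configuration is fixed here, at deployment time
    deployment = Deployment(config, list(nodes))
    try:
        yield deployment
    finally:
        deployment.active = False  # storage lifetime ends with the application's
```

Usage: `with deployed_store(StoreConfig(dedup=True), nodes) as d:` wraps the application run. Freezing the configuration keeps it from changing mid-run, which matches the deploy-per-application model.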
Overview (diagram labels include FS API and CM) • StoreGPU: How to harness massively multicore processors to support storage system operations? [HPDC '08, JoCC '09, IPCCC '09, HPDC '10] • Versatile storage: a configurable and extensible storage system that can be specialized for a broad set of apps. [ICDCS '08, HotStorage '09] • Cross-layer optimizations: Can one enable cross-layer optimizations? [HPDC HotTopics '08, CCGrid '12, WSLF '11] • Automating configuration choice: How do I choose a good configuration for my application? [ERSS '11, GRID '10] • MosaStore storage system prototype. Goals: (1) an exploration platform, and (2) support for large-scale computational science research projects.
Today: applications and storage systems treat all data items uniformly. Opportunity: additional information can enable differentiated treatment of data items. • Application → storage system: applications can provide hints on the intended use of the data, e.g., desired replication level, caching, data importance • Storage system → application: storage can expose storage-level attributes, e.g., file location, file health status. Mechanism: POSIX API + custom metadata. Our use case: a workflow-aware file system.
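The two directions of information flow can be sketched as a per-file key/value store: the application attaches a hint (here, a replication level), the storage layer honors it when placing the file, and then exposes the file's location back to the application. This is an in-memory illustration; the class, method, and key names are hypothetical, and string values merely mimic POSIX-style extended attributes.

```python
class CrossLayerMetadata:
    """Hypothetical per-file metadata shared by the application and storage."""

    def __init__(self) -> None:
        self._hints = {}     # path -> {key: value}: application -> storage
        self._location = {}  # path -> [node, ...]:  storage -> application

    # Application -> storage: attach a hint to a file (string values, xattr-style).
    def set_hint(self, path: str, key: str, value: str) -> None:
        self._hints.setdefault(path, {})[key] = value

    # Storage side: honor the replication hint (default 1) when placing a file.
    def place(self, path: str, candidate_nodes: list) -> list:
        wanted = int(self._hints.get(path, {}).get("replication", "1"))
        count = max(1, min(wanted, len(candidate_nodes)))
        self._location[path] = list(candidate_nodes[:count])
        return list(self._location[path])

    # Storage -> application: expose where a file's replicas live.
    def location(self, path: str) -> list:
        return list(self._location.get(path, []))
```

Usage: after `set_hint("/wf/out.fits", "replication", "3")`, placing the file on four candidate nodes keeps three replicas, and a workflow scheduler could call `location()` to run the consumer task on a node that already holds its input.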
Workflow Applications • File-based communication • Irregular, application-dependent data access • 100,000s of processes, running for weeks • Generate large I/O volumes (100 TB cumulative). (Figure: Montage workflow on 512 BG/P cores, GPFS as the intermediate file system. Source: [Zhao et al., 2012])
I/O Patterns in Workflow Applications • Pipeline • Broadcast • Reduce • Scatter • Gather [Wozniak et al., "Case Studies in Storage Access by Loosely Coupled Petascale Applications", PDSW 2009]
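The five patterns differ in how many tasks write and read each file, and in whether tasks touch the whole file or distinct regions of it. The sketch below labels each file from a trace of accesses using deliberately simplistic rules; the `Access` record and the rule ordering are assumptions for illustration, and a task mixing patterns (e.g., reading a broadcast file plus its own pipeline input) would confuse them.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Access(NamedTuple):
    task: str
    op: str                       # "w" (write) or "r" (read)
    path: str
    region: Optional[int] = None  # None = whole file; int = a distinct region


def classify(accesses) -> dict:
    """Label each file with the I/O pattern it takes part in (simplified rules)."""
    writers, readers = defaultdict(set), defaultdict(set)
    partial_w, partial_r = defaultdict(bool), defaultdict(bool)
    files_read_by = defaultdict(set)  # task -> files it reads
    for a in accesses:
        if a.op == "w":
            writers[a.path].add(a.task)
            partial_w[a.path] |= a.region is not None
        elif a.op == "r":
            readers[a.path].add(a.task)
            partial_r[a.path] |= a.region is not None
            files_read_by[a.task].add(a.path)
        else:
            raise ValueError(f"unknown op {a.op!r}")

    labels = {}
    for path in set(writers) | set(readers):
        w, r = writers[path], readers[path]
        if len(w) > 1 and partial_w[path]:
            labels[path] = "gather"     # many writers fill regions of one file
        elif len(r) > 1 and partial_r[path]:
            labels[path] = "scatter"    # many readers each take one region
        elif len(r) > 1:
            labels[path] = "broadcast"  # many readers read the whole file
        elif len(r) == 1 and len(files_read_by[next(iter(r))]) > 1:
            labels[path] = "reduce"     # one consumer reads many files
        elif len(w) == 1 and len(r) == 1:
            labels[path] = "pipeline"   # one producer, one consumer
        else:
            labels[path] = "other"
    return labels
```

The rules check region-level access first because scatter and gather are the only patterns that split a single file among tasks.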
Application: Montage • Stage 10: reduce pattern • Stage 9: pipeline pattern • Stage 5: reduce pattern • Stages 6, 7, 8: pipeline pattern
I/O Patterns and Storage Optimizations • Patterns, and the optimizations they call for, are specific to individual data items! • Information therefore needs to flow in both directions. Idea: cross-layer communication to support this.
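Because the right optimization depends on each file's pattern, the pattern can be turned into per-file hints that travel through the metadata channel. The mapping below is one plausible choice, assumed here rather than taken from these slides: pipeline files stay on the producing node, broadcast files are replicated once per reader, reduce inputs are collocated, scatter files expose block locations, and gather outputs collocate their blocks. Hint keys and values are hypothetical.

```python
# Hypothetical pattern -> per-file hint mapping (string values, xattr-style).
_STATIC_HINTS = {
    "pipeline": {"placement": "local"},             # keep file where it was written
    "reduce": {"placement": "collocate"},           # put one consumer's inputs together
    "scatter": {"expose": "block-location"},        # run each reader near its region
    "gather": {"placement": "collocate-blocks"},    # put partial writes on one node
}


def hints_for(pattern: str, fan_out: int = 1) -> dict:
    """Translate a detected I/O pattern into per-file storage hints."""
    if pattern == "broadcast":
        # One replica per reader, so every consumer can read locally.
        return {"replication": str(max(1, fan_out))}
    return dict(_STATIC_HINTS.get(pattern, {}))  # unknown pattern: no hints
```

Returning a fresh dict for each call lets callers add or override keys per file without affecting the shared table.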
A workflow-aware file system • Thesis: cross-layer communication supported by file-level metadata is the key mechanism to enable a workflow-aware file system • Progress so far: promising evaluation of potential gains (CCGrid '12) • Next step: build the system and evaluate it with applications (SC '12?)
Overview (diagram labels include FS API and CM) • StoreGPU: harnessing massively multicore processors to support storage system operations. [HPDC '08, JoCC '09, IPCCC '09, HPDC '10] • Versatile storage: a configurable and extensible storage system that can be specialized for a broad set of apps. [ICDCS '08, HotStorage '09] • Cross-layer optimizations: enable bidirectional cross-layer optimizations. [HPDC HotTopics '08, CCGrid '12, WSLF '11] • Automating configuration choice: How do I choose a good configuration for my application? [ERSS '11, GRID '10] • MosaStore storage system prototype. Goals: (1) an exploration platform, and (2) support for large-scale computational science research projects.