
MosaStore - A Versatile Storage System. Lauro Costa, Abdullah Gharaibeh, Samer Al-Kiswany, Matei Ripeanu, Emalayan Vairavanathan (and many others from UBC, ANL, ORNL). Networked Systems Laboratory (NetSysLab), University of British Columbia. http://netsyslab.ece.ubc.ca


Presentation Transcript


  1. MosaStore - A Versatile Storage System. Lauro Costa, Abdullah Gharaibeh, Samer Al-Kiswany, Matei Ripeanu, Emalayan Vairavanathan (and many others from UBC, ANL, ORNL). Networked Systems Laboratory (NetSysLab), University of British Columbia. http://netsyslab.ece.ubc.ca

  2. Networked Systems Laboratory (NetSysLab) University of British Columbia A golf course … … a (nudist) beach (… and 199 days of rain each year)

  3. The Landscape. Diverse workload characteristics: workflows, data analysis, checkpointing. Diverse platform capabilities: supercomputers, cloud computing, desktop grids. Challenge: design an efficient storage system middleware that sits between the two.

  4. Motivation: Underprovisioned storage systems on many HPC platforms (e.g., BlueGene/P at ANL). The shared storage is a bottleneck. There are underutilized resources close to the application. Figure labels: 160K cores; 2.5K IO nodes; GPFS IO rate: 8 GBps = 51 KBps per core; 10 Gb/s switch complex; hi-speed network; 24 servers; 850 MBps per 64 nodes; 2.5 GBps per node.
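The per-core figure on this slide is a simple division of the aggregate GPFS rate by the core count. A minimal sketch of that arithmetic follows; the function name `per_core_kbps` is hypothetical, and the use of binary units (1 GB = 1024 × 1024 KB, 160K = 160 × 1024) is an assumption chosen because it reproduces the slide's 51 KBps value.

```python
# Per-core share of the shared file system's aggregate bandwidth.
# Assumption: binary units (1 GB = 1024 * 1024 KB, 160K cores = 160 * 1024),
# which is what reproduces the 51 KBps per core quoted on the slide.
KB_PER_GB = 1024 * 1024


def per_core_kbps(aggregate_gbps: float, cores: int) -> float:
    """Return the bandwidth available to each core, in KB per second."""
    return aggregate_gbps * KB_PER_GB / cores


share = per_core_kbps(8, 160 * 1024)
print(f"{share:.1f} KBps per core")  # 51.2 KBps per core
```

With decimal units the result would be 50 KBps per core, so the conclusion is the same either way: each core sees only tens of kilobytes per second of shared-storage bandwidth.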

  5. Solution: a temporary shared data-store. Nodes are dedicated to an application, and the storage system is coupled with the application's execution. Figure: the BlueGene/P platform diagram (160K cores, 2.5K IO nodes, GPFS at 8 GBps = 51 KBps per core) with a shared data-store placed close to the compute nodes.

  6. Benefits: storage closer to the application, and the ability to specialize. Figure: the BlueGene/P platform diagram (160K cores, 2.5K IO nodes, GPFS at 8 GBps = 51 KBps per core) with the shared data-store.

  7. Evaluation: Harnessing ‘Close to Application’ Underutilized Resources. Overall: 1.52x. Exploiting the underutilized resources can critically improve the storage system performance. [Zhang et al., “Design and Evaluation of a Collective I/O Model for Loosely-coupled Petascale Programming”, MTAGS ’08]

  8. Evaluation: Specialization • Deduplication benefits a checkpointing workload • 3x higher throughput • 25-70% less storage space and network effort • Scales to hundreds of clients. Figure: MosaStore throughput at larger scale (pool of 35 nodes); experiment by Henry Monti (Virginia Tech) on a Cray XT4 cluster at ORNL. Specialization can critically improve the storage system performance. [S. Al-Kiswany, M. Ripeanu, S. Vazhkudai, A. Gharaibeh, “stdchk: A Checkpoint Storage System for Desktop Grid Computing”, ICDCS ’08]
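To illustrate why deduplication helps a checkpointing workload, here is a minimal sketch of fixed-size-chunk, hash-based deduplication. It is a generic toy, not MosaStore's or stdchk's implementation: the class `DedupStore`, the 4-byte chunk size, and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib

CHUNK_SIZE = 4  # bytes; tiny on purpose so the example is easy to follow


class DedupStore:
    """Toy content-addressed store: identical chunks are kept only once."""

    def __init__(self):
        self.chunks = {}  # chunk digest -> chunk bytes
        self.files = {}   # file name -> ordered list of chunk digests

    def write(self, name: str, data: bytes) -> int:
        """Store data under name; return the number of new bytes stored."""
        new_bytes = 0
        digests = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:
                # Only chunks not seen before cost storage (and transfer).
                self.chunks[digest] = chunk
                new_bytes += len(chunk)
            digests.append(digest)
        self.files[name] = digests
        return new_bytes

    def read(self, name: str) -> bytes:
        """Reassemble a file from its chunk list."""
        return b"".join(self.chunks[d] for d in self.files[name])


store = DedupStore()
first = store.write("ckpt-1", b"AAAABBBBCCCC")
second = store.write("ckpt-2", b"AAAABBBBDDDD")  # only the last chunk changed
print(first, second)  # 12 4
```

Successive checkpoints of the same application often share most of their content, so the second write stores only the chunk that changed. That is the intuition behind the storage and network savings reported on the slide.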

  9. Summary so far • MosaStore: a versatile storage architecture that: • Exploits underutilized resources ‘close’ to the application • Supports specialization and configurability • The system is: • Configured at deployment time • Deployed for a lifetime coupled with that of the target application. [S. Al-Kiswany, A. Gharaibeh, M. Ripeanu, “The Case for a Versatile Storage System”, HotStorage ’09]

  10. MosaStore: Storage System Prototype. Goals: (1) exploration platform, and (2) support for large-scale computational science research projects. Research threads: • Versatile Storage: a configurable and extensible storage system that can be specialized for a broad set of applications. [ICDCS ’08, HotStorage ’09] • StoreGPU: how to harness massively multicore processors to support storage system operations? [HPDC ’08, JoCC ’09, IPCCC ’09, HPDC ’10] • Cross-layer Optimizations: can one enable cross-layer optimizations? [HPDC HotTopics ’08, CCGrid ’12, WSLF ’11] • Automating config. choice: how do I choose a good configuration for my application? [ERSS ’11, GRID ’10] Other diagram labels: FS API, CM.

  11. Today: applications and storage systems treat data items uniformly. Opportunity: additional information can enable differentiated treatment of data items. • Application → Storage System: applications can present hints on the desired use of the data, e.g., desired replication levels, caching, data importance • Storage System → Application: storage can expose storage-level attributes, e.g., file location characteristics, file health status. Diagram labels: POSIX API, custom metadata. Our use case: a workflow-aware file system.
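The two information flows on this slide can be sketched with a toy in-memory store that keeps custom key/value metadata per file. Everything here is hypothetical and for illustration only: the class `TaggedStore`, its method names, the `replication` and `location` keys, and the node names are assumptions, not MosaStore's API.

```python
class TaggedStore:
    """Toy storage layer that keeps per-file custom metadata (key/value tags)."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.tags = {}      # path -> {key: value}, written by the application
        self.location = {}  # path -> nodes that hold a copy of the file

    # Application -> storage: a hint about how the file will be used.
    def set_tag(self, path, key, value):
        self.tags.setdefault(path, {})[key] = value

    def write(self, path):
        # The storage layer consults the hint when it places the file.
        replicas = int(self.tags.get(path, {}).get("replication", 1))
        self.location[path] = self.nodes[:replicas]

    # Storage -> application: storage-level attributes exposed on request.
    def get_tag(self, path, key):
        if key == "location":
            return list(self.location.get(path, []))
        return self.tags.get(path, {}).get(key)


store = TaggedStore(["node0", "node1", "node2"])
store.set_tag("/out/model.dat", "replication", 2)   # hint flows down
store.write("/out/model.dat")
print(store.get_tag("/out/model.dat", "location"))  # ['node0', 'node1']
```

A workflow scheduler could use the returned location to run the consuming task on a node that already holds the file, which is the kind of differentiated, per-file treatment the slide argues for.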

  12. Workflow Applications • File-based communication • Irregular and application-dependent data access • 100,000s of processes, running for weeks • Generate large I/O volumes (100 TB cumulative). Figure: Montage workflow, 512 BG/P cores, GPFS as the intermediate file system. Source: [Zhao et al., 2012]

  13. I/O Patterns in Workflow Applications • Pipeline • Broadcast • Reduce • Scatter • Gather. [Wozniak et al., “Case studies in storage access by loosely coupled petascale applications”, PDSW ’09]

  14. Application: Montage • Stage 5: reduce pattern • Stages 6, 7, 8: pipeline pattern • Stage 9: pipeline pattern • Stage 10: reduce pattern

  15. I/O Patterns and Storage Optimizations. Patterns and optimizations are specific to individual data items, so information needs to flow in both directions between the application and the storage system. Idea: cross-layer communication to support this.

  16. A workflow-aware file system • Thesis: cross-layer communication supported by file-level metadata is the key mechanism to enable a workflow-aware file system • Progress so far: promising evaluation of potential gains (CCGrid ’12) • Next step: build the system and evaluate it with applications (SC ’12?)

  17. MosaStore: Storage System Prototype. Goals: (1) exploration platform, and (2) support for large-scale computational science research projects. Research threads: • Versatile Storage: a configurable and extensible storage system that can be specialized for a broad set of applications. [ICDCS ’08, HotStorage ’09] • StoreGPU: harnessing massively multicore processors to support storage system operations. [HPDC ’08, JoCC ’09, IPCCC ’09, HPDC ’10] • Cross-layer Optimizations: enable bidirectional cross-layer optimizations. [HPDC HotTopics ’08, CCGrid ’12, WSLF ’11] • Automating config. choice: how do I choose a good configuration for my application? [ERSS ’11, GRID ’10] Other diagram labels: FS API, CM.

  18. Thank you
