Data Management, Storage and Access Optimization in High Performance Distributed Environment
Xiaohui Shen
Department of Electrical and Computer Engineering, Northwestern University
Jan 17, 2001
Outline
• Problem Definition
• Solutions
• Meta-data Management System
• Remote Storage Access Optimizations
• Multi-Storage I/O System
• Distributed Parallel File System
• I/O Performance Prediction and Evaluation
• Integrated Working Environment
Motivation
Current Solutions
• Parallel file systems and runtime libraries: smart I/O optimizations (caching, prefetching, parallel I/O)
  • User interfaces are low-level
  • Not portable
  • Hard-coded
  • I/O selection is difficult for runtime systems
• Database systems: high-level, easy to use, portable
  • Lack powerful I/O optimizations
System Architecture
Tasks
• Meta-data Management System
• Remote Storage Access Optimizations
• Efficient Storage Organization
• Multi-Storage I/O System
• Distributed Parallel File System
• I/O Performance Prediction and Evaluation
• Integrated Working Environment
Part 1: Meta-data Management System (MDMS)
• Abstract Storage Devices (ASDs)
• Storage patterns and access patterns
• Access history and trail of navigation
MDMS Tables
MDMS Internal Representation
MDMS I/O Flow (API)
Optimizations inside MDMS
Part 2: Remote Storage Access Optimization for HSS
• Secondary storage access techniques: collective I/O, data sieving, caching, prefetching, etc.
• Tertiary storage systems interact directly with applications
• Remote environment
Optimizations
• Remote collective I/O
• Remote data sieving
• Asynchronous I/O
• Subfile
• Superfile
• Migration, stage and purge, SRB container
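Remote data sieving is only named here. The generic data sieving idea, as used in MPI-IO implementations such as ROMIO, is to replace many small noncontiguous reads with one contiguous read that covers all of them, then cut the requested pieces out in memory. Over a wide-area link this trades some extra bytes for far fewer round trips. The following is a minimal sketch of that generic technique, not the system's own implementation; `RemoteFile` is a hypothetical stand-in that only counts read calls.

```python
class RemoteFile:
    """Hypothetical stand-in for a remote file; counts read calls (round trips)."""

    def __init__(self, data: bytes):
        self.data = data
        self.reads = 0

    def read(self, offset: int, length: int) -> bytes:
        self.reads += 1
        return self.data[offset:offset + length]


def naive_read(f: RemoteFile, requests):
    # One remote read per (offset, length) request.
    return [f.read(off, length) for off, length in requests]


def sieved_read(f: RemoteFile, requests):
    # Data sieving: one contiguous read spanning every request ...
    if not requests:
        return []
    start = min(off for off, _ in requests)
    end = max(off + length for off, length in requests)
    chunk = f.read(start, end - start)
    # ... then each requested piece is extracted from the in-memory chunk.
    return [chunk[off - start:off - start + length] for off, length in requests]


requests = [(10, 2), (40, 3), (90, 1)]
a, b = RemoteFile(bytes(range(100))), RemoteFile(bytes(range(100)))
assert naive_read(a, requests) == sieved_read(b, requests)
print(a.reads, b.reads)  # 3 1
```

The cost is the unused bytes between requests (here 81 bytes instead of 6). This is why data sieving pays off when per-request latency, not bandwidth, dominates.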
Optimization: Subfile
Optimization: Superfile
• Create: one large file
• Access: the first access brings the whole large file into memory; subsequent accesses are serviced directly from memory
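The two superfile steps above can be sketched as follows. The packing of several small logical files into one file with an offset index is an assumption about how "one large file" is built. The class and method names are hypothetical, and an in-memory `bytearray` stands in for the large file on slow (e.g. tape) storage.

```python
from __future__ import annotations


class Superfile:
    """Hypothetical sketch: many small logical files packed into one large file."""

    def __init__(self) -> None:
        self._index: dict[str, tuple[int, int]] = {}  # name -> (offset, length)
        self._stored = bytearray()        # stands in for the large file on slow storage
        self._cache: bytes | None = None  # whole-file copy held in memory
        self.storage_reads = 0            # how often slow storage was touched

    def create(self, files: dict[str, bytes]) -> None:
        # Create: concatenate the small files into one large file plus an index.
        for name, content in files.items():
            self._index[name] = (len(self._stored), len(content))
            self._stored.extend(content)
        self._cache = None

    def read(self, name: str) -> bytes:
        # Access: the first read brings the whole superfile into memory ...
        if self._cache is None:
            self.storage_reads += 1
            self._cache = bytes(self._stored)
        # ... and every later read is served from memory.
        offset, length = self._index[name]
        return self._cache[offset:offset + length]


sf = Superfile()
sf.create({"a.dat": b"alpha", "b.dat": b"beta", "c.dat": b"gamma"})
print(sf.read("b.dat"), sf.read("c.dat"), sf.storage_reads)  # b'beta' b'gamma' 1
```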
Other Optimizations
• Migration
• Stage
• Purge
• SRB container
Part 3: MS-I/O: A Multi-storage I/O System
• Further performance improvement is limited by the nature of the storage media.
• The problem is rooted in the traditional single-storage resource architecture.
Solution: Multi-storage Resource Architecture
• Increases logical storage capacity
• Provides a more flexible and reliable computing environment
• Provides new opportunities for further performance improvement
Multi-storage Resource Architecture
Experimental Environment
• Local Postgres database
• Local disks
• Remote disks
• Remote tapes
• Compute resource: Argonne SP2
Multi-storage I/O System
Database Tables and I/O Routines
• Run table
• Dataset table
• Access pattern table
• Storage pattern table
• Execution table
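To make the five tables concrete, here is a hedged sketch using Python's built-in `sqlite3` as a stand-in for the Postgres database. Only the table names come from the slide. Every column is an illustrative assumption; the access-pattern columns reuse the hint names that appear in the MS-I/O experiment rules, and the row values are made-up examples.

```python
import sqlite3

# Hypothetical schema: table names follow the slide; all columns are assumptions.
SCHEMA = """
CREATE TABLE run (run_id INTEGER PRIMARY KEY, application TEXT);
CREATE TABLE dataset (dataset_id INTEGER PRIMARY KEY,
                      run_id INTEGER REFERENCES run(run_id), name TEXT);
CREATE TABLE access_pattern (dataset_id INTEGER REFERENCES dataset(dataset_id),
                             when_use TEXT, size TEXT, use_frequency TEXT,
                             data_partition TEXT);
CREATE TABLE storage_pattern (dataset_id INTEGER REFERENCES dataset(dataset_id),
                              location TEXT, optimization TEXT);
CREATE TABLE execution (run_id INTEGER REFERENCES run(run_id),
                        dataset_id INTEGER REFERENCES dataset(dataset_id),
                        io_seconds REAL);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Illustrative rows only; not data from the experiments.
conn.execute("INSERT INTO run VALUES (1, 'astrophysics analysis')")
conn.execute("INSERT INTO dataset VALUES (10, 1, 'density')")
conn.execute("INSERT INTO access_pattern VALUES (10, 'soon', 'medium', 'frequent', 'BBB')")
conn.execute("INSERT INTO storage_pattern VALUES (10, 'Local Disk', 'Collective I/O')")

# An I/O routine would look up where a dataset lives and how to access it.
row = conn.execute(
    """SELECT d.name, a.use_frequency, s.location, s.optimization
       FROM dataset d
       JOIN access_pattern a ON a.dataset_id = d.dataset_id
       JOIN storage_pattern s ON s.dataset_id = d.dataset_id
       WHERE d.name = ?""",
    ("density",),
).fetchone()
print(row)  # ('density', 'frequent', 'Local Disk', 'Collective I/O')
```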
User Access Pattern (write)
User Access Pattern (read)
Optimization Decision Flow
Applications and Tools
Experimental Environment
• Applications: IBM SP2 at Argonne
• Multiple storage resources:
  • Local disks: Argonne SP2
  • Remote disks: SDSC
  • Remote tapes: SDSC HPSS
  • Local database: Postgres at NWU
MS-I/O Experiments: Data Analysis on Astrophysics Data
• No access pattern, then Remote Tape
• DataPartition='BBB', then Remote Tape + Collective I/O
• WhenUse='soon' & Size='medium', then Remote Disk
• Plus DataPartition='BBB', then Remote Disk + Collective I/O
• Plus UseFrequency='frequent', then Local Disk
• Plus DataPartition='BBB', then Local Disk + Collective I/O
MS-I/O Experiments: Volume Rendering
• No access pattern, then Remote Tape
• ComputeTime='large', then Remote Tape + Asynchronous I/O
• WhenUse='soon' & Size='medium', then Remote Disk
• Plus ComputeTime='large', then Remote Disk + Asynchronous I/O
• Plus UseFrequency='frequent', then Local Disk
• Plus ComputeTime='large', then Local Disk + Asynchronous I/O
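The rules from the astrophysics and volume-rendering experiments can be read as a small decision function. It maps user hints to a storage resource plus a list of I/O optimizations. The function name is hypothetical. Reading "Plus" as cumulative, so that UseFrequency='frequent' only moves data to local disk when WhenUse='soon' & Size='medium' also hold, is an interpretation of the slides.

```python
def choose_placement(hints: dict) -> tuple:
    """Map access hints to (storage resource, optimizations); a sketch only."""
    storage = "Remote Tape"  # default when no access pattern is given
    if hints.get("WhenUse") == "soon" and hints.get("Size") == "medium":
        storage = "Remote Disk"
        # "Plus" read as cumulative: frequent use on top of the previous hints.
        if hints.get("UseFrequency") == "frequent":
            storage = "Local Disk"
    optimizations = []
    if hints.get("DataPartition") == "BBB":
        optimizations.append("Collective I/O")    # astrophysics experiment
    if hints.get("ComputeTime") == "large":
        optimizations.append("Asynchronous I/O")  # volume rendering experiment
    return storage, optimizations


print(choose_placement({}))                         # ('Remote Tape', [])
print(choose_placement({"DataPartition": "BBB"}))  # ('Remote Tape', ['Collective I/O'])
print(choose_placement({"WhenUse": "soon", "Size": "medium",
                        "UseFrequency": "frequent", "ComputeTime": "large"}))
# ('Local Disk', ['Asynchronous I/O'])
```

Keeping storage choice and optimization choice independent mirrors the slides: each optimization hint is applied on top of whichever storage level the placement hints selected.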
MS-I/O Experiments: Subfile and Superfile
• WriteSize='huge' & FutureReadSize='partial'
• WriteSize='small' & WriteSequence='y' & FutureReadSequence='y'
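The slide lists two hint combinations under the title "Subfile and Superfile" without stating which optimization each selects. The sketch assumes they map in title order: huge writes read back partially go to subfiles, and small sequential writes read back sequentially go to a superfile. It also shows why subfiles help partial reads. If a huge dataset is split into fixed-size pieces, a partial read needs only the pieces that overlap its byte range. The function names and the `subfile_size` parameter are hypothetical.

```python
def choose_file_layout(hints: dict) -> str:
    # Assumed mapping, following the order of the slide title.
    if hints.get("WriteSize") == "huge" and hints.get("FutureReadSize") == "partial":
        return "subfile"
    if (hints.get("WriteSize") == "small" and hints.get("WriteSequence") == "y"
            and hints.get("FutureReadSequence") == "y"):
        return "superfile"
    return "single file"


def subfiles_touched(offset: int, length: int, subfile_size: int) -> list:
    # Only the subfiles overlapping [offset, offset + length) must be fetched.
    first = offset // subfile_size
    last = (offset + length - 1) // subfile_size
    return list(range(first, last + 1))


print(choose_file_layout({"WriteSize": "huge", "FutureReadSize": "partial"}))  # subfile
print(subfiles_touched(offset=250, length=100, subfile_size=100))  # [2, 3]
```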
MS-I/O Experiments: Replication and Access History
• The dataset was first placed at the remote site
• Read.UseFrequency='frequent'
• Frequent use of the dataset is detected
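The replication experiment can be pictured with a minimal sketch. Each read is recorded in an access history. Once a dataset crosses a frequency threshold, a local replica is added and later reads are served locally. The class name and the threshold of three reads are hypothetical; the slides do not say how "frequent" is quantified.

```python
from collections import Counter


class ReplicaManager:
    """Hypothetical sketch: replicate a remote dataset locally once the access
    history shows it is used frequently."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold  # assumed cut-off for UseFrequency='frequent'
        self.history = Counter()    # dataset -> number of reads so far
        self.replicas = {}          # dataset -> set of locations holding a copy

    def place(self, dataset: str, location: str = "Remote Site") -> None:
        self.replicas[dataset] = {location}

    def read(self, dataset: str) -> str:
        # Serve the read from a local copy if one exists, else remotely.
        source = "Local Disk" if "Local Disk" in self.replicas[dataset] else "Remote Site"
        self.history[dataset] += 1
        if self.history[dataset] >= self.threshold:
            # Frequent use detected: add a local replica; the remote copy stays.
            self.replicas[dataset].add("Local Disk")
        return source


mgr = ReplicaManager(threshold=3)
mgr.place("astro.dat")
print([mgr.read("astro.dat") for _ in range(4)])
# ['Remote Site', 'Remote Site', 'Remote Site', 'Local Disk']
```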
Part 4: DPFS: A Distributed Parallel File System
• Collects idle distributed storage as a supplement to the native storage of parallel computing systems
• Characteristics
  • Distributed
  • Parallel
  • File system
  • Database
System Architecture of DPFS
Software Architecture of DPFS
• Parallelism
• Concurrency
DPFS BSU and File View
• A basic striping unit (BSU) is called a brick in DPFS; its size is 64 KB.
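With a fixed brick size, mapping a file offset to a brick is integer arithmetic. The sketch assumes "64K" means 64 KiB (65,536 bytes). It shows how a byte offset, or a byte range in an I/O request, translates into brick indices. The function names are hypothetical.

```python
BRICK_SIZE = 64 * 1024  # one brick (BSU); assumes "64K" means 64 KiB


def locate(offset: int) -> tuple:
    """Return (brick index, offset inside that brick) for a file byte offset."""
    return divmod(offset, BRICK_SIZE)


def bricks_for_range(offset: int, length: int) -> range:
    """Indices of the bricks that a request of `length` bytes at `offset` touches."""
    if length <= 0:
        return range(0)
    return range(offset // BRICK_SIZE, (offset + length - 1) // BRICK_SIZE + 1)


print(locate(200_000))                        # (3, 3392)
print(list(bricks_for_range(65_000, 1_000)))  # [0, 1]
```

The second example shows a 1,000-byte request that straddles a brick boundary and therefore involves two bricks.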
Striping Methods
• Linear striping
• Multi-dimensional striping
• Array striping
Linear Striping
Problems of Linear Striping
Multi-dimensional Striping
Array Striping
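The slides do not spell out the problems of linear striping. One commonly cited issue, offered here as an assumption, is that a multi-dimensional array stored as a row-major byte stream is cut into bricks along rows, so a column access touches many bricks. Brick-sized tiles balance row and column access. The sketch counts the bricks touched under both layouts. The array size, element size, and tile shape are hypothetical choices; they make a 128 x 128 tile of 4-byte elements fill exactly one 64 KiB brick.

```python
ELEM = 4           # bytes per element (assumed 4-byte floats)
BRICK = 64 * 1024  # brick size in bytes
N = 4096           # N x N array (hypothetical size)
TILE = 128         # 128 * 128 * 4 bytes = one 64 KiB brick


def linear_bricks(r0, r1, c0, c1):
    """Bricks touched by rows r0..r1-1, cols c0..c1-1 of a row-major byte stream."""
    touched = set()
    for r in range(r0, r1):
        start = (r * N + c0) * ELEM
        end = (r * N + c1) * ELEM  # exclusive
        touched.update(range(start // BRICK, (end - 1) // BRICK + 1))
    return len(touched)


def tiled_bricks(r0, r1, c0, c1):
    """Bricks touched when each brick holds one TILE x TILE block of the array."""
    tile_rows = (r1 - 1) // TILE - r0 // TILE + 1
    tile_cols = (c1 - 1) // TILE - c0 // TILE + 1
    return tile_rows * tile_cols


print("column:", linear_bricks(0, N, 0, 1), tiled_bricks(0, N, 0, 1))  # column: 1024 32
print("row:", linear_bricks(0, 1, 0, N), tiled_bricks(0, 1, 0, N))     # row: 1 32
```

Linear layout is ideal for whole-row reads but very poor for columns. The tiled layout costs the same 32 bricks in either direction, which matters when different applications read the same array along different dimensions.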
Striping Algorithms
• Round-robin
• Greedy algorithm
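The slide names a greedy algorithm without defining it. DPFS mixes storage classes with very different network speeds, so one plausible reading is to place each brick on the server with the earliest projected finish time. The sketch contrasts that assumed greedy rule with plain round-robin. The bandwidth values are hypothetical relative speeds, not measurements.

```python
from collections import Counter


def round_robin(n_bricks: int, n_servers: int) -> list:
    """Brick i goes to server i mod n_servers."""
    return [b % n_servers for b in range(n_bricks)]


def greedy(n_bricks: int, bandwidth: list) -> list:
    """Assumed greedy rule: put each brick on the server that would finish it first."""
    load = [0.0] * len(bandwidth)  # projected busy time per server
    placement = []
    for _ in range(n_bricks):
        s = min(range(len(bandwidth)), key=lambda i: load[i] + 1.0 / bandwidth[i])
        load[s] += 1.0 / bandwidth[s]
        placement.append(s)
    return placement


def makespan(placement: list, bandwidth: list) -> float:
    """Time until the slowest server has delivered all of its bricks."""
    counts = Counter(placement)
    return max(counts[s] / bandwidth[s] for s in range(len(bandwidth)))


bw = [4.0, 2.0, 1.0]  # hypothetical relative speeds of three storage classes
print(makespan(round_robin(7, 3), bw), makespan(greedy(7, bw), bw))  # 2.0 1.0
```

Round-robin gives the slowest server as many bricks as the others, so it bounds the transfer time. The greedy placement gives the fast server four bricks and the slow one a single brick.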
Request Combination
[Figure: P0: 0-7, P1: 8-15, P2: 16-23, P3: 24-31. Requests: P0(0,4), P1(9,13), P2(18,22), P3(27,31); then P0(1,5), P1(10,14), P2(19,23), P3(24,28); ...]
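The figure's numbers are consistent with the following reading, offered as an interpretation rather than the author's stated design. Bricks are striped round-robin over four servers (brick b on server b mod 4). Processor Pp owns bricks 8p to 8p+7. In step k, each processor combines all of its bricks on server (p + k) mod 4 into a single request, so no two processors contend for the same server in a step. The function name is hypothetical; the sketch reproduces the two request rows shown.

```python
def combined_schedule(n_procs: int, bricks_per_proc: int, n_servers: int) -> list:
    """One dict per step: processor -> tuple of its bricks on a single server.

    Assumes brick b lives on server b % n_servers and that, in step k,
    processor p contacts server (p + k) % n_servers, so no two processors
    hit the same server in the same step when n_procs <= n_servers.
    """
    steps = []
    for k in range(n_servers):
        step = {}
        for p in range(n_procs):
            server = (p + k) % n_servers
            owned = range(p * bricks_per_proc, (p + 1) * bricks_per_proc)
            # Combine all of p's bricks on this server into one request.
            step[f"P{p}"] = tuple(b for b in owned if b % n_servers == server)
        steps.append(step)
    return steps


for step in combined_schedule(n_procs=4, bricks_per_proc=8, n_servers=4)[:2]:
    print(step)
# {'P0': (0, 4), 'P1': (9, 13), 'P2': (18, 22), 'P3': (27, 31)}
# {'P0': (1, 5), 'P1': (10, 14), 'P2': (19, 23), 'P3': (24, 28)}
```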
Meta-data and Database
Tree Structure
Application Programming Interface
• DPFS-Open()
• DPFS-Write()
• DPFS-Read()
• DPFS-Close()
User Interface
• File system commands: cp, mkdir, rm, ls, etc.
• File transfer between DPFS and general sequential file systems. Example: cp local:my.data DPFS:/home/xhshen:4:greedy
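The cp example packs several settings into the destination string. The sketch below parses it under an assumed reading of the format DPFS:&lt;path&gt;:&lt;n&gt;:&lt;algorithm&gt;. Here &lt;n&gt; is taken to be the number of storage servers to stripe across and &lt;algorithm&gt; the striping algorithm; neither meaning is stated on the slide. The dataclass and function are hypothetical helpers, not part of DPFS.

```python
from dataclasses import dataclass


@dataclass
class DpfsTarget:
    path: str       # path inside DPFS
    servers: int    # assumed meaning: number of storage servers to stripe over
    algorithm: str  # assumed meaning: striping algorithm, e.g. "greedy"


def parse_dpfs_target(spec: str) -> DpfsTarget:
    """Parse 'DPFS:<path>:<n>:<algorithm>' as in the cp example (format assumed)."""
    prefix, rest = spec.split(":", 1)
    if prefix != "DPFS":
        raise ValueError(f"not a DPFS target: {spec!r}")
    path, servers, algorithm = rest.rsplit(":", 2)
    if int(servers) < 1:
        raise ValueError("stripe width must be positive")
    return DpfsTarget(path, int(servers), algorithm)


print(parse_dpfs_target("DPFS:/home/xhshen:4:greedy"))
# DpfsTarget(path='/home/xhshen', servers=4, algorithm='greedy')
```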
Experimental Environment
• Compute resource: Argonne IBM SP2
• Storage resources:
  • Class 1: Argonne Linux machines (Fast Ethernet and ATM)
  • Class 2: NWU workstations (155 Mbps ATM)
  • Class 3: NWU workstations (10 Mbps Ethernet)
DPFS Performance Numbers: File-Level Comparison
DPFS Performance Numbers: Striping Algorithm Comparison
Part 5: I/O Performance Prediction and Evaluation
• Performance model
• Performance prediction algorithm