Storage Tank in Data Grid
August 23, 2003
Shin, SangYong (syshin, #6468)
IBM Grid Computing
Storage Architecture Model
Layers:
• Application
• File System
• Block Virtualization
• Storage Devices (block subsystem)
• Storage Management
Notes:
- Application data is in files
- Files are stored on block storage
- All are managed by storage management software
Block Virtualization
Today (SAN):
• No common view of block storage
• Server impact on storage change
Emerging (SAN, Block Virtualization):
• Common view of block storage
• No server impact on storage change
- IBM block virtualization is Lodestone
Extending Lodestone for Grid
• Functions
  - Providing virtual disks
  - Online dynamic volume sizing
  - Advanced copy functions
  - Economical disaster recovery solutions
  - Different levels of performance
  - Data backup with low-priced disk
  - No service downtime
  - etc.
[Diagram: Application; Hosts; four LVEs; high-end disk array (Shark, Brand X); midrange disk array (FAStT, Brand Y); RAID Brick]
LVE = Lodestone Virtualization Engine
SAN File Systems - Current Capabilities vs. Grid Requirements
• GPFS
  - HPC, Engineering, Digital Media
  - Access from servers in a cluster
  - Concurrent multiple I/Os
  - AIX and Linux OS only
  - No access to other FS data
• Storage Tank
  - Commercial, file sharing, DB serving
  - Access from servers on SAN
  - All servers and OSes
  - No access to other FS data
• Grid requirements
  - Access from any machine, any OS, anywhere
  - Access to all file system data
• Planned approach
  - Allow remote access to our file systems
  - Provide multi-site support
  - Integrate data from other sources
• We believe NFSv4 will be an important protocol for the grid
  - It has the necessary extensions for robust security and WAN access
  - It is the first NFS protocol to come through the standards process
  - Proposed standard in Dec. 2002; expected to be a draft standard by 4Q03
• Our plan is to provide NFSv4 support for our file systems (J2, GPFS and Storage Tank)
  - Best case: NFSv4 support for our file systems in late 2004
Storage Tank (ST) - a SAN file system
[Diagram: ST clients (Linux, Win2K, AIX, Solaris), each with an ST agent; ST servers holding metadata; LAN carrying file attributes, file location info, and control info; SAN carrying data to data and backup storage; external access via GridFTP and NFS]
Prototypes: 2H02-1H03
Customer: CERN
• Capabilities:
  - access to ST data through the Globus GridFTP interface
  - register ST files in the Globus Replica Location Service
  - enabled to support OGSA services (e.g. replication)
  - centralized, policy-based storage management
  - cross-platform file sharing
  - performance comparable to a local file system, with a direct client-to-storage data path
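The split between the control path (file attributes, file location info, and control info over the LAN) and the direct client-to-storage data path can be sketched roughly as below. This is only an illustration, not Storage Tank's actual protocol: `Extent`, `MetadataServer`, `StAgent`, the in-memory `san` dictionary, and the file path are all hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    """Hypothetical location record: which device, and where on it."""
    device: str
    offset: int
    length: int


class MetadataServer:
    """Stand-in for an ST server: holds location info only, never file data."""

    def __init__(self):
        self._locations = {}

    def register(self, path, extent):
        self._locations[path] = extent

    def lookup(self, path):
        # Control path: returns where the data lives, not the bytes themselves.
        return self._locations[path]


class StAgent:
    """Stand-in for a client-side ST agent."""

    def __init__(self, server, san):
        self._server = server
        self._san = san  # assumption: dict mapping device name -> bytes

    def read(self, path):
        extent = self._server.lookup(path)   # ask the server (LAN)
        blocks = self._san[extent.device]    # fetch directly from storage (SAN)
        return blocks[extent.offset:extent.offset + extent.length]


# Made-up device contents and path, for illustration only.
san = {"lun0": b"....event-0001-payload...."}
server = MetadataServer()
server.register("/cern/run1/event-0001", Extent("lun0", 4, 18))
agent = StAgent(server, san)
data = agent.read("/cern/run1/event-0001")
```

The point of the design, as the slide describes it, is that the server is consulted only for metadata, so file contents never pass through it.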
CERN Requirements
• Data analysis of Large Hadron Collider (LHC) experiments
• Basic unit of data is an LHC event
  - data represents a physical collision between 2 protons
  - 1 to a few MB
  - stored within 1 GB files
  - event metadata stored in an RDBMS
• Tiered structure
  - CERN is Tier 0
  - event data and metadata distributed to Tier 1 centers
  - physicists at Tier 2 centers analyze data at Tier 1 centers
• 2.4 PB of disk and 14 PB of tape by 2007
• Grid access (AFS/DFS like), simple storage management
• IP SANs, not FC
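The sizes quoted above imply some rough orders of magnitude, worked through in the short sketch below. It assumes decimal units (1 MB = 10^6 bytes), which the slide does not specify, and it takes 4 MB as a stand-in for "a few MB"; both choices are assumptions made here, not figures from the presentation.

```python
# Assumption: decimal units; the slide does not say decimal or binary.
MB = 10**6
GB = 10**9
PB = 10**15


def events_per_file(event_size_bytes, file_size_bytes=GB):
    """How many whole events of a given size fit in one file."""
    return file_size_bytes // event_size_bytes


def files_needed(capacity_bytes, file_size_bytes=GB):
    """How many files of a given size fill the capacity (rounded up)."""
    return -(-capacity_bytes // file_size_bytes)


disk_capacity = 24 * PB // 10            # 2.4 PB of disk, from the slide

small_events = events_per_file(1 * MB)   # events of 1 MB
large_events = events_per_file(4 * MB)   # hypothetical "few MB" event
disk_files = files_needed(disk_capacity)

print(small_events, large_events, disk_files)
```

Under these assumptions a 1 GB file holds between a few hundred and a thousand events, and the planned disk capacity corresponds to millions of such files, which is why the event metadata is kept separately in an RDBMS.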
Our Proposal
• Use Storage Tank for basic storage infrastructure
• Use iSCSI disks
  - FAStT with iSCSI gateway or 200i
• DB2 for event metadata
• Research extensions
  - NAS head for Storage Tank
  - Grid access to Storage Tank
  - Object Store prototype for disks
Extend ST to Multiple Sites – Distributed Storage Tank
[Diagram: Storage Tank installations at NYC, Fargo, and SFO, each with a meta-data server cluster, a SAN, and ST agents (Win2K, AIX, Solaris, Linux), linked by a control network (IP); a branch office with an integrated ST/NAS appliance]
Single namespace across multiple sites
- Replication of files for good performance
- Extended protocols for consistency across replicas
- Joint research with Johns Hopkins underway
ST Extensions
Prototype: 1H04
Customer: CERN, JHU
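One way to picture a single namespace with replicated files is the sketch below, in which one path maps to replicas at several sites and a client prefers a replica at its own site. It is a simplified illustration only: `DistributedNamespace`, its methods, the fallback rule, and the file path are hypothetical, and the consistency protocols mentioned on the slide are not modeled.

```python
class DistributedNamespace:
    """Hypothetical single namespace: one path, replicas at several sites."""

    def __init__(self):
        self._replicas = {}  # path -> set of site names

    def add_replica(self, path, site):
        self._replicas.setdefault(path, set()).add(site)

    def sites_for(self, path):
        return sorted(self._replicas.get(path, set()))

    def choose_site(self, path, client_site):
        sites = self.sites_for(path)
        if not sites:
            raise FileNotFoundError(path)
        if client_site in sites:
            return client_site  # a local replica gives the best performance
        # Assumption: an arbitrary but deterministic fallback (first by name).
        return sites[0]


ns = DistributedNamespace()
ns.add_replica("/shared/report.dat", "NYC")
ns.add_replica("/shared/report.dat", "SFO")

local = ns.choose_site("/shared/report.dat", "SFO")     # replica on site
remote = ns.choose_site("/shared/report.dat", "Fargo")  # no local replica
```

A real system would pick the fallback by network distance or load rather than by name, and would need the extended consistency protocols the slide refers to whenever a replica is written.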
Ultimate Vision for Federated Grid File Systems
[Diagram: Organization 1 and Organization 2; clients; access servers and a proxy server; exporters in front of file sources]
Extend ST to access data from other file systems/sources
[Diagram: two Storage Tank installations, each with clients, ST agents (Win2K, AIX, Solaris, Linux), a meta-data server cluster, a SAN, and a control network (IP); external data reached from a grid data repository (NFS, GridFTP) and NAS data repositories (NFS)]
Storage Management in Grid Computing Environment
[Diagram: Applications use OGSA to reach Storage Management Services behind an OGSA-CIM wrapper; these use CIM/XML to reach CIM provider interfaces for Storage Tank, for Shark, tape, etc., and for Lodestone]
• IBM storage management products today (TSM, TSRM, ITSANM) and planned products (Merlot) cover a reasonable set of functions
• We are converging, with the industry, on CIM/XML as the standard for storage device management
• In support of grid, we expect:
  - to convert our management solutions to Web/OGSA services
  - to enhance functionality
We are just starting to focus on grid implications for storage management
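The layering described here, a management service on top and a uniform provider interface underneath for each storage product, can be sketched as below. This is a structural illustration only and uses no real CIM or OGSA API: `CimProvider`, `capacity_gb`, `StorageManagementService`, and the capacity figures are all hypothetical.

```python
from abc import ABC, abstractmethod


class CimProvider(ABC):
    """Hypothetical stand-in for a CIM provider interface (lower interface)."""

    @abstractmethod
    def capacity_gb(self) -> int:
        """Report managed capacity in GB."""


class StorageTankProvider(CimProvider):
    def capacity_gb(self) -> int:
        return 500  # made-up figure for illustration


class LodestoneProvider(CimProvider):
    def capacity_gb(self) -> int:
        return 300  # made-up figure for illustration


class StorageManagementService:
    """Hypothetical upper-level service that applications would call."""

    def __init__(self, providers):
        self._providers = dict(providers)

    def report(self):
        # Same call on every provider, whatever product sits behind it.
        return {name: p.capacity_gb() for name, p in self._providers.items()}

    def total_capacity_gb(self):
        return sum(self.report().values())


service = StorageManagementService({
    "Storage Tank": StorageTankProvider(),
    "Lodestone": LodestoneProvider(),
})
total = service.total_capacity_gb()
```

Because the service depends only on the provider interface, adding another device type means writing one more provider class rather than changing the service.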
Summary of Data Grid Support
• Application
• File System: extend ST & GPFS
• Block Virtualization: Lodestone
• Storage Devices (block subsystem)
• Storage Management: support OGSA upper interface (OGSA), support CIM lower interface (CIM)