Data Management Reading: Chapter 5: “Data-Intensive Computing” And “A Network-Aware Distributed Storage Cache for Data Intensive Environments”
What is Data Management? It depends… • Storage systems • Disk arrays • Network caches (e.g., DPSS) • Hierarchical storage systems (e.g., HPSS) • Efficient data transport mechanisms • Striped • Parallel • Secure • Reliable • Third-party transfers
What is data management? (cont.) • Replication management • Associate files into collections • Mechanisms for reliably copying collections, propagating updates to collections, selecting among replicas • Metadata management • Associate attributes that describe data • Select data based on attributes • Publishing and curation of data • “Official” versions of important collections • Digital libraries
Outline for Today • Examples of data-intensive applications • Storage systems: • Disk arrays • High-Performance Network Caches (DPSS) • Hierarchical Storage Systems (Chris: HPSS) • Next two lectures: • GridFTP • Globus replica management • Metadata systems • Curation
Data-Intensive Applications: Physics • CERN Large Hadron Collider • Several petabytes of data per year • Starting in 2005 • Continuing 15 to 20 years Replication scenario: • Copy of everything at CERN (Tier 0) • Subsets at national centers (Tier 1) • Smaller regional centers (Tier 2) • Individual researchers will have copies
GriPhyN Overview (www.griphyn.org) • 5-year, $12.5M NSF ITR proposal to realize the concept of virtual data, via: • Key research areas: • Virtual data technologies (information models, management of virtual data software, etc.) • Request planning and scheduling (including policy representation and enforcement) • Task execution (including agent computing, fault management, etc.) • Development of Virtual Data Toolkit (VDT) • Four Applications: ATLAS, CMS, LIGO, SDSS
GriPhyN Participants • Computer Science • U.Chicago, USC/ISI, UW-Madison, UCSD, UCB, Indiana, Northwestern, Florida • Toolkit Development • U.Chicago, USC/ISI, UW-Madison, Caltech • Applications • ATLAS (Indiana), CMS (Caltech), LIGO (UW-Milwaukee, UT-B, Caltech), SDSS (JHU) • Unfunded collaborators • UIC (STAR-TAP), ANL, LBNL, Harvard, U.Penn
The Petascale Virtual Data Grid (PVDG) Model • Data suppliers publish data to the Grid • Users request raw or derived data from Grid, without needing to know • Where data is located • Whether data is stored or computed • User can easily determine • What it will cost to obtain data • Quality of derived data • PVDG serves requests efficiently, subject to global and local policy constraints
PVDG Scenario User requests may be satisfied via a combination of data access and computation at local, regional, and central sites
Other Application Scenarios • Climate community • Terabyte-scale climate model datasets: • Collecting measurements • Simulation results • Must support sharing, remote access to and analysis of datasets • Distance visualization • Remote navigation through large datasets, with local and/or remote computing
Storage Systems: Disk Arrays • What is a disk array? • Collection of disks • Advantages: • Higher capacity • Many small, inexpensive disks • Higher throughput • Higher bandwidth (Mbytes/sec) on large transfers • Higher I/O rate (transactions/sec) on small transfers
Trends in Magnetic Disks • Capacity increases: 60% per year • Cost falling at similar rate ($/MB or $/GB) • Evolving to smaller physical sizes • 14in → 5.25in → 3.5in → 2.5in → 1.0in → … ? • Put lots of small disks together • Problem: RELIABILITY • Reliability of N disks = Reliability of 1 disk divided by N
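The reliability rule on this slide can be checked with a little arithmetic. The sketch below reads "reliability" as mean time to failure (MTTF) and assumes disks fail independently at a constant rate with no redundancy; the function name and the 200,000-hour figure are illustrative assumptions, not values from the lecture.

```python
def array_mttf_hours(disk_mttf_hours: float, num_disks: int) -> float:
    """MTTF of an array with no redundancy.

    Assumes independent disks with a constant failure rate, so the
    array fails as soon as any one of its disks fails.
    """
    if num_disks < 1:
        raise ValueError("num_disks must be at least 1")
    return disk_mttf_hours / num_disks


# Illustrative figures only: one disk rated at 200,000 hours, 100 disks.
print(array_mttf_hours(200_000.0, 100))  # 2000.0 hours, roughly 83 days
```

A single disk that would run for decades becomes an array that fails every few months, which is why the following slides add redundancy.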
Key Concepts in Disk Arrays Striping for High Performance • Interleave data from single file across multiple disks • Fine-grained interleaving: • every file spread across all disks • any access involves all disks • Coarse-grained interleaving: • interleave in large blocks • small accesses may be satisfied by a single disk
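To make the difference between the two interleaving styles concrete, here is a minimal sketch of a striping address map. The function names and the `stripe_unit_blocks` parameter are assumptions for illustration; a stripe unit of one block stands in for fine-grained interleaving, which real arrays may do at the bit or byte level.

```python
def locate_block(logical_block: int, num_disks: int,
                 stripe_unit_blocks: int) -> tuple[int, int]:
    """Map a logical block to (disk index, block offset on that disk)."""
    stripe_unit = logical_block // stripe_unit_blocks
    disk = stripe_unit % num_disks            # units rotate across disks
    row = stripe_unit // num_disks            # full stripes that came before
    offset = row * stripe_unit_blocks + logical_block % stripe_unit_blocks
    return disk, offset


def disks_touched(first_block: int, count: int, num_disks: int,
                  stripe_unit_blocks: int) -> set[int]:
    """Set of disks involved in reading `count` consecutive blocks."""
    return {locate_block(b, num_disks, stripe_unit_blocks)[0]
            for b in range(first_block, first_block + count)}


# An 8-block read on a 4-disk array:
print(disks_touched(0, 8, 4, 1))    # fine-grained: {0, 1, 2, 3}
print(disks_touched(0, 8, 4, 16))   # coarse-grained: {0}
```

With the large stripe unit, the other three disks stay free to serve other small requests at the same time.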
Key Concepts in Disk Arrays Redundancy • Maintain extra information in disk array • Duplication • Parity • Reed-Solomon error correction codes • Others • When a disk fails: use redundancy information to reconstruct data on failed disk
RAID “Levels” • Defined by combinations of striping & redundancy • RAID Level 1: Mirroring or Shadowing • Maintain a complete copy of each disk • Very reliable • High cost: twice the number of disks • Great performance: on a read, may go to disk with faster access time • RAID Level 2: Memory Style Error Detection and Correction • Not really implemented in practice • Based on DRAM-style Hamming codes • In disk systems, don’t need detection • Use less expensive correction schemes
RAID “Levels” (cont.) • RAID Level 3: Fine-grained Interleaving and Parity • Many commercial RAIDs • Calculate parity bit-wise across disks in the array (using exclusive-OR logic) • Maintain a separate parity disk; update on write operations • When a disk fails, use the other data disks and the parity disk to reconstruct data on lost disk • Fine-grained interleaving: all disks involved in any access to the array
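The parity calculation and the reconstruction step are the same exclusive-OR operation. The sketch below shows this with three tiny in-memory "disks"; the helper name `xor_blocks` and the sample bytes are made up for illustration.

```python
from functools import reduce
from operator import xor


def xor_blocks(blocks):
    """Bit-wise XOR of equal-length byte blocks, one block per disk."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(xor, column) for column in zip(*blocks))


# Three data disks, each holding a two-byte block.
data = [b"\x0f\xf0", b"\x33\x55", b"\xaa\x01"]
parity = xor_blocks(data)             # contents of the parity disk

# Disk 1 fails: XOR the surviving data disks with the parity disk.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])             # True
```

This works because XOR-ing a value with itself cancels it out, so everything except the missing block drops away.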
RAID “Levels” (cont.) • RAID Level 4: Large Block Interleaving and Parity • Similar to level 3, but interleave on larger blocks • Small accesses may be satisfied by a single disk • Supports higher rate of small I/Os • Parity disk may become a bottleneck with multiple concurrent I/Os • RAID Level 5: Large Block Interleaving and Distributed Parity • Similar to level 4 • Distributes parity blocks throughout all disks in array
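The parity bottleneck in Level 4, and how Level 5 removes it, can be seen by counting parity updates per disk. This is a sketch under assumptions: the rotation rule below is one simple layout chosen for illustration (real arrays use several different rotations), and the function names are hypothetical.

```python
from collections import Counter


def parity_disk(stripe: int, num_disks: int, distributed: bool) -> int:
    """Disk that holds the parity block for a given stripe."""
    if distributed:
        # Level 5 style: parity moves to a different disk each stripe.
        return (num_disks - 1 - stripe) % num_disks
    # Level 4 style: one dedicated parity disk.
    return num_disks - 1


def parity_updates(num_stripes: int, num_disks: int,
                   distributed: bool) -> Counter:
    """Parity writes per disk when each stripe gets one small write."""
    return Counter(parity_disk(s, num_disks, distributed)
                   for s in range(num_stripes))


print(parity_updates(8, 4, distributed=False))  # all 8 updates hit disk 3
print(parity_updates(8, 4, distributed=True))   # 2 updates on each disk
```

Every small write must also update parity, so in Level 4 the single parity disk serializes concurrent writes, while the rotated layout spreads that load evenly.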
RAID “Levels” (cont.) • RAID Level 6: Reed-Solomon Error Correction Codes • Protection against two disk failures • Disks getting so cheap: consider massive storage systems composed entirely of disks • No tape!!
DPSS: Distributed Parallel Storage System • Produced by Lawrence Berkeley National Laboratory • “Cache”: provides storage that is • Faster than typical local disk • Temporary • “Virtual disk”: appears to be single large, random-access, block-oriented I/O device • Isolates application from tertiary storage system: • Acts as large buffer between slow tertiary storage and high-performance network connections • “Impedance matching”
Features of DPSS • Components: • DPSS block servers • Typically low-cost workstations • Each with several disk controllers, several disks per controller • DPSS master process • Data requests sent from client to master process • Determines which DPSS block server stores the requested blocks • Forwards request to that block server • Note: servers can be anywhere on network (a distributed cache)
Features of DPSS (cont.) • Client API library • Supports variety of I/O semantics • dpssOpen(), dpssRead(), dpssWrite(), dpssLSeek(), dpssClose() • Application controls data layout in cache • For typical applications that read sequentially: stripe blocks of data across servers in round-robin fashion • DPSS client library is multi-threaded • Number of client threads is equal to number of DPSS servers: client speed scales with server speed
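The round-robin layout and the one-thread-per-server client can be sketched without the real DPSS library. Everything below is a stand-in: the "servers" are in-memory dictionaries, and `stripe`, `read_all`, and `server_for_block` are hypothetical names, not DPSS calls.

```python
from concurrent.futures import ThreadPoolExecutor


def server_for_block(block_index: int, num_servers: int) -> int:
    """Round-robin placement: consecutive blocks go to consecutive servers."""
    return block_index % num_servers


def stripe(data: bytes, block_size: int, num_servers: int):
    """Split data into blocks; return one {block_index: bytes} dict per server."""
    servers = [dict() for _ in range(num_servers)]
    for start in range(0, len(data), block_size):
        index = start // block_size
        target = server_for_block(index, num_servers)
        servers[target][index] = data[start:start + block_size]
    return servers


def read_all(servers) -> bytes:
    """Fetch from every server concurrently, one worker thread per server."""
    def drain(store):
        return list(store.items())

    with ThreadPoolExecutor(max_workers=len(servers)) as pool:
        parts = list(pool.map(drain, servers))
    blocks = {index: block for part in parts for index, block in part}
    return b"".join(blocks[i] for i in sorted(blocks))


payload = bytes(range(10))
servers = stripe(payload, block_size=3, num_servers=2)
print(sorted(servers[0]), sorted(servers[1]))  # [0, 2] [1, 3]
print(read_all(servers) == payload)            # True
```

A sequential read touches every server in turn, so each thread keeps its own server busy while the others are transferring.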
Features of DPSS (cont.) • Optimized for relatively small number of large files • Several thousand files • Greater than 50 MB • DPSS blocks are available as soon as they are placed in cache • Good for staging large files to/from tertiary storage • Don’t have to wait for large transfer to complete • Dynamically reconfigurable • Add or remove servers or disks on the fly
Features of DPSS (cont.) • Agent-based performance monitoring system • Client library automatically sets TCP buffer size to optimal value • Uses information published by monitoring system • Load balancing • Supports replication of files on multiple servers • DPSS master uses status information stored in LDAP directory to select a replica that will give fastest response
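Two ideas on this slide lend themselves to a small sketch. The usual rule of thumb for the "optimal" TCP buffer is the bandwidth-delay product, and replica selection can be modeled as picking the lowest estimated response time. The slides do not say how DPSS computes either value, so the formulas, field names, and numbers below are assumptions for illustration only.

```python
def tcp_buffer_bytes(bandwidth_bits_per_sec: int, rtt_ms: int) -> int:
    """Bandwidth-delay product in bytes: data in flight on a full pipe."""
    # Divide by 8 (bits per byte) and by 1000 (milliseconds per second).
    return bandwidth_bits_per_sec * rtt_ms // 8000


def fastest_replica(replicas, file_bytes: int):
    """Pick the replica with the lowest estimated response time.

    Each replica is a dict with hypothetical monitoring fields:
    'host', 'rtt_ms', and 'bytes_per_sec'.
    """
    def estimate(replica):
        return replica["rtt_ms"] / 1000 + file_bytes / replica["bytes_per_sec"]

    return min(replicas, key=estimate)


# A 622 Mbit/s path with a 50 ms round-trip time.
print(tcp_buffer_bytes(622_000_000, 50))  # 3887500

replicas = [
    {"host": "near-but-slow", "rtt_ms": 10, "bytes_per_sec": 10_000_000},
    {"host": "far-but-fast", "rtt_ms": 80, "bytes_per_sec": 50_000_000},
]
print(fastest_replica(replicas, 100_000_000)["host"])  # far-but-fast
print(fastest_replica(replicas, 100_000)["host"])      # near-but-slow
```

The best replica depends on the size of the request: latency dominates small reads, throughput dominates large ones.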
Hierarchical Storage System • Fast disk cache in front of larger, slower storage • Works on same principle as other hierarchies: • Level-1 and Level-2 caches: minimize off-chip memory accesses • Virtual memory systems: minimize page faults to disk • Goal: • Keep popular material in faster storage • Keep most of material on cheaper, slower storage • Locality: 10% of material gets 90% of accesses • Problem with tertiary storage (especially tape): • Very slow • Tape seek times can be a minute or more…
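The disk-cache idea can be sketched as a small cache sitting in front of tape. The slides do not name a replacement policy, so least-recently-used (LRU) eviction is an assumption here, and the class and file names are made up.

```python
from collections import OrderedDict


class DiskCache:
    """Fixed-size disk cache in front of tape, with LRU eviction (assumed)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.files = OrderedDict()   # oldest entry first
        self.hits = 0
        self.misses = 0

    def read(self, name: str) -> str:
        """Return which tier served the request: 'disk' or 'tape'."""
        if name in self.files:
            self.files.move_to_end(name)      # mark as most recently used
            self.hits += 1
            return "disk"
        self.misses += 1
        self.files[name] = True               # stage the file from tape
        if len(self.files) > self.capacity:
            self.files.popitem(last=False)    # evict least recently used
        return "tape"


cache = DiskCache(capacity=2)
served = [cache.read(name) for name in ["a", "b", "a", "c", "b"]]
print(served)  # ['tape', 'tape', 'disk', 'tape', 'tape']
```

With strong locality, most requests land on the few files already on disk, so the minute-long tape seeks are paid only on the rare misses.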