This overview discusses storage technology, including hardware, interconnects, and service layers. It highlights current hot topics in storage, identifies challenges, and provides ideas for managing large data volumes.
Storage Overview and IT-DM Lessons Learned • Luca Canali, IT-DM • DM Group Meeting, 10-3-2009
Outline • Goal: review of storage technology • HW layer (HDs, storage arrays) • Interconnect (how to attach storage to the server) • Service layer (filesystems) • Expose current hot topics in storage • Identify challenges • Stimulate ideas for the management of large data volumes
Why storage is a very interesting area in the coming years • The storage market is very conservative • A few vendors share the market for large enterprise solutions • Enterprise storage typically carries a high price premium • Opportunities • Commodity HW/grid-like solutions provide an order-of-magnitude gain in cost/performance • New products coming to the market promise many changes: • Solid state disks, high-capacity disks, high-performance and low-cost interconnects
HW layer – HD, the basic element • Hard disk technology • The basic building block of storage for the last 40 years • Main intrinsic limitation: latency
HD specs • HDs are limited • In particular, seek time is unavoidable (7.2k to 15k rpm, ~2-10 ms) • 100-200 IOPS • Throughput ~100 MB/s, typically limited by the interface • Capacity range 300 GB - 2 TB • Failures: mechanical, electrical, magnetic, firmware issues. MTBF: 500k - 1.5M hours • (See the sanity check below)
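A rough sanity check on the 100-200 IOPS figure above: the sketch below estimates random IOPS from rotational speed and seek time. The average seek times (8 ms and 3.5 ms) are assumed for illustration, not taken from the slides.

```python
# Back-of-envelope estimate of random-read IOPS for a spinning disk.
# One random I/O costs roughly an average seek plus half a rotation.

def hd_random_iops(rpm, avg_seek_ms):
    """Rough upper bound on random IOPS for a single spindle."""
    rotational_latency_ms = (60_000 / rpm) / 2   # half a revolution, in ms
    service_time_ms = avg_seek_ms + rotational_latency_ms
    return 1000 / service_time_ms

print(f"7.2k rpm, 8 ms seek  : {hd_random_iops(7200, 8.0):4.0f} IOPS")
print(f"15k  rpm, 3.5 ms seek: {hd_random_iops(15000, 3.5):4.0f} IOPS")
# -> roughly 80 and 180 IOPS, consistent with the 100-200 IOPS range above
```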
Enterprise disks • Performance • Enterprise disks offer more performance: • They spin faster and have better interconnect protocols (e.g. SAS vs SATA) • Typically of lower capacity • Our experience: often not competitive in cost/performance vs. SATA
HD failure rates • Failure rate • Our experience: it depends on vendor, temperature, infant mortality, and age • At FAST'07, two papers (one from Google) showed that vendor specs often need to be 'adjusted' in real life • The Google data seriously questioned the usefulness of SMART probes and the correlation of temperature/age/usage with MTBF • Another study showed that consumer and enterprise disks have similar failure patterns and lifetimes. Moreover, HD failures in RAID sets are correlated.
HD wrap-up • The HD is an old but evergreen technology • In particular, disk capacities have increased by an order of magnitude in just a few years • At the same time prices have gone down (below 0.1 USD per GB for consumer products) • 1.5 TB consumer disks and 450 GB enterprise disks are common • 2.5'' drives are becoming standard to reduce power consumption
Scaling out the disk • The challenge for storage systems • Scale out disk performance to meet demand • Throughput • IOPS • Latency • Capacity • Sizing storage systems • Must focus on the critical metric(s) • Avoid the 'capacity trap' (see the sizing sketch below)
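To illustrate the 'capacity trap': sizing purely by capacity can badly under-provision spindles for a random-I/O workload. The workload and per-disk figures below (20 TB, 20k IOPS, 1 TB and ~150 IOPS per disk) are assumed purely for illustration.

```python
# Illustrative sizing exercise: disks needed when sized by capacity vs by IOPS.

def disks_needed(capacity_tb, iops_required, disk_tb=1.0, disk_iops=150):
    by_capacity = -(-capacity_tb // disk_tb)      # ceiling division
    by_iops = -(-iops_required // disk_iops)
    return int(by_capacity), int(by_iops)

cap, iops = disks_needed(capacity_tb=20, iops_required=20_000)
print(f"sized by capacity: {cap} disks, sized by IOPS: {iops} disks")
# -> sized by capacity: 20 disks, sized by IOPS: 134 disks
```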
RAID and redundancy • Storage arrays are the traditional approach: they implement RAID to protect data • Parity-based: RAID5, RAID6 • Stripe and mirror: RAID10 • Scalability problem of this method • For very large configurations, the time between disk failures approaches the RAID rebuild time (!) • Challenge: RAID does not scale (see the back-of-envelope calculation below)
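A minimal back-of-envelope version of the "MTBF ~ rebuild time" argument, assuming a 1M-hour per-disk MTBF and independent failures (both optimistic assumptions, per the FAST'07 studies cited above):

```python
# With many disks, the expected time between *any* two disk failures in the
# pool shrinks to hours, i.e. comparable to a RAID rebuild on 1-2 TB drives.

def hours_between_failures(n_disks, disk_mtbf_hours):
    """Expected time between failures anywhere in a pool of n disks,
    assuming independent failures."""
    return disk_mtbf_hours / n_disks

for n in (100, 1_000, 10_000):
    print(f"{n:6d} disks, 1M h MTBF each -> a failure every "
          f"{hours_between_failures(n, 1_000_000):7.0f} h")
# At ~10,000 disks a failure is expected roughly every 100 hours.
```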
Beyond RAID • Google and Amazon don't use RAID • Main idea: • Divide data into 'chunks' • Write multiple copies of the chunks • Google File System: writes chunks in 3 copies • Amazon S3: writes copies to different destinations, i.e. data-center mirroring • Additional advantages: • Removes the constraint of locally storing redundancy inside one storage array • Chunks can be moved, refreshed, or relocated easily (see the sketch below)
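A toy sketch of the idea (hypothetical Python, not actual GFS, S3 or ASM code): files are split into fixed-size chunks and each chunk is written to several distinct nodes, so redundancy is no longer tied to a single array and chunks can be relocated independently.

```python
# Hypothetical chunk placement: each chunk goes to `replicas` distinct nodes.
import hashlib

CHUNK_SIZE = 64 * 1024 * 1024   # 64 MB chunks, a GFS-like choice
REPLICAS = 3

def place_chunks(file_size, nodes, replicas=REPLICAS):
    """Return {chunk_index: [node, ...]} with each chunk on `replicas`
    distinct nodes, chosen by hashing the chunk index."""
    n_chunks = -(-file_size // CHUNK_SIZE)        # ceiling division
    placement = {}
    for i in range(n_chunks):
        h = int(hashlib.md5(str(i).encode()).hexdigest(), 16)
        start = h % len(nodes)
        placement[i] = [nodes[(start + r) % len(nodes)] for r in range(replicas)]
    return placement

nodes = [f"node{j:02d}" for j in range(8)]
print(place_chunks(file_size=200 * 1024 * 1024, nodes=nodes))
```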
Our experience • Physics DB storage uses ASM • Volume manager and cluster file system integrated with Oracle • Soon to be also a general-purpose cluster file system (11gR2 beta testing) • Oracle files are divided into chunks • Chunks are distributed evenly across the storage • Chunks are written in multiple copies (2 or 3, depending on file type and configuration) • Allows the use of low-cost storage arrays: no RAID support needed
Scalable and distributed file systems on commodity HW • Allow large volumes of data to be managed and protected • Solutions proven by Google and Amazon, Sun's ZFS, Oracle's ASM • Can provide order-of-magnitude savings on HW acquisition • Additional economies of scale from the deployment of cloud and virtualization models • Challenge: solid and scalable distributed file systems are hard to build
The interconnect • Several technologies available • SAN • NAS • iSCSI • Direct attach
The interconnect • Throughput challenge • It takes about 3 hours to copy/backup 1 TB over a 1 Gbps network (see the calculation below)
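The arithmetic behind the 3-hour figure, as a small sketch; the 80% usable-bandwidth factor is an assumption to account for protocol overhead.

```python
# Time to move a given volume of data over a network link.

def transfer_hours(data_tb, link_gbps, efficiency=0.8):
    """Hours to move data_tb terabytes over a link_gbps link, assuming the
    given fraction of the raw bandwidth is usable."""
    bits = data_tb * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600

print(f"1 TB over 1 Gbps : {transfer_hours(1, 1):.1f} h")   # ~2.8 h
print(f"1 TB over 10 Gbps: {transfer_hours(1, 10):.1f} h")  # ~0.3 h
```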
IP-based connectivity • NAS and iSCSI suffer from the limited performance of Gbps Ethernet • 10 Gbps may/will(?) change the picture • At present not widely deployed on servers because of cost • Moreover, TCP/IP processing adds CPU overhead
Specialized storage networks • SAN is the de facto standard for most enterprise-level storage • Fast, low overhead on the server CPU, easy to configure • Our experience (and the Tier1s'): SAN networks with up to 64 ports at low cost • Measured: 8 Gbps transfer rate (4+4 dual-ported HBAs for redundancy and load balancing) • Proof of concept: FC backup (LAN-free) reached full utilization of the tape heads • Scalable: proof-of-concept 'Oracle supercluster' of 410 SATA disks and 14 dual quad-core servers
NAS • CERN's experience with NAS for databases • A NetApp filer can use several protocols, the main one being NFS • Throughput limitation because of TCP/IP • Trunking can alleviate the problem; the main solution may/will(?) be to move to 10 Gbps • The filer contains a server with its own CPU and OS • In particular, the proprietary WAFL filesystem is capable of creating read-only snapshots • The proprietary Data ONTAP OS runs on the filer box • Additional features worsen cost/performance
iSCSI • iSCSI is interesting for cost reduction • Many concerns about performance though, due to the IP interconnect • Adoption seems to be limited to low-end systems at the moment • Our experience: • IT-FIO is acquiring some test units, and we have been told that some test HW will be made available for IT-DM databases
The quest for ultimate latency reduction • Solid state disks provide unique specs • Seek times are at least an order of magnitude better than the best HDs • A single disk can provide >10k random read IOPS • High read throughput • (See the comparison below)
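To put the >10k IOPS figure in perspective, the sketch below (assumed figures, consistent with the HD numbers earlier in this talk) counts how many 15k rpm spindles it would take to match one SSD on random reads.

```python
# Illustrative comparison: spindles needed to match one SSD's random-read IOPS.

HD_IOPS = 180        # upper end of the 100-200 IOPS range quoted earlier
SSD_IOPS = 10_000    # ">10k random read IOPS" from the slide

print(f"HDs needed to match one SSD: {-(-SSD_IOPS // HD_IOPS)}")
# -> 56 spindles just to equal a single SSD on random reads
```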
SSD (flash) problems • Flash-based SSDs still suffer from major problems for enterprise solutions • Cost/GB: more than 10 times that of 'normal' HDs • Small capacity compared to HDs • Several issues with write performance • Limited number of erase cycles • Need to write entire cells (an issue for transactional activities; see the sketch below) • Some workarounds for write performance and cell lifetime are being implemented, with different quality across vendors and product grades • A field in rapid evolution
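A rough illustration of why small random writes hurt flash: erasure happens on large blocks, so a small logical write can force a much larger physical erase/rewrite. The block and write sizes below are assumed for illustration, not vendor data.

```python
# Worst-case write amplification for small writes on flash,
# under an assumed geometry (512 KB erase block, 8 KB logical write).

ERASE_BLOCK_KB = 512      # assumed erase-block size
LOGICAL_WRITE_KB = 8      # typical small transactional write

worst_case_amplification = ERASE_BLOCK_KB / LOGICAL_WRITE_KB
print(f"worst-case write amplification: {worst_case_amplification:.0f}x")
# -> 64x without controller workarounds (wear levelling, write coalescing)
```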
Conclusions • Storage technologies are in a very interesting evolution phase • On one side, 'old-fashioned' storage technologies give more capacity and performance for a lower price every year • New technologies are emerging for scaling out very large data sets (see Google, Amazon, Oracle's ASM, Sun's ZFS) • 10 Gbps Ethernet and SSDs have the potential to change storage in the coming years (but are not mature yet)
Acknowledgments • Many thanks to Jacek, Dawid and Maria • Eric and Nilo • Helge, Tim Bell and Bernd