A Survey of I/O Optimization Techniques

A Survey of I/O Optimization Techniques • Sven GROOT • Kitsuregawa Laboratory • December 7th 2007

Background • Increase in CPU and memory speed not matched by disk drives • Increasing disk and data sizes • Disks limited by mechanical components: hard to improve • We must optimize I/O accesses.

Outline • Hard disk drives • Evaluating Optimizations Through the I/O Path [Riska et al, 2007] • File System Level • Device Driver Level • Disk Level • Prefetching • Competitive Prefetching [Li et al, 2007] • DiskSeen [Ding et al, 2007]

Hard Disk Drives

Hard Disk Drives • Figure: disk platter showing track, sector, and disk head • Bottleneck: disk rotation • Bottleneck: head movement • Delay = Head Seek Time + Rotational Latency
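The delay formula on this slide can be worked through with a small sketch. It assumes the usual convention that the average rotational latency is half of one full rotation; the function names and the 8.5 ms seek time are illustrative values, not figures from the talk.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency: half of one full rotation, in milliseconds."""
    ms_per_rotation = 60_000.0 / rpm  # 60,000 ms per minute
    return ms_per_rotation / 2.0


def access_delay_ms(seek_ms: float, rpm: float) -> float:
    """Delay = head seek time + rotational latency (both in milliseconds)."""
    return seek_ms + avg_rotational_latency_ms(rpm)


if __name__ == "__main__":
    # Assumed seek time of 8.5 ms, chosen only for illustration.
    for rpm in (5400, 7200, 15000):
        print(rpm, "RPM:", round(access_delay_ms(8.5, rpm), 2), "ms")
```

Even at 15000 RPM the delay stays in the millisecond range, which is why both mechanical components are marked as bottlenecks.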

The I/O Path [Riska et al, 2007]

Evaluation Environment • Postmark file system benchmark • Measure transactions • Each transaction has two steps: (1) create or delete a file, and (2) read from or append to a file.

File Systems • Ext2 • Block/cylinder groups • Single, double, or triple indirect metadata blocks • Ext3 • Compatible with Ext2 • Journaling file system • ReiserFS • Metadata in B+ trees • Journaling file system • XFS • Extent-based B+ trees • Journaling file system • Allocation groups

File System Throughput • ReiserFS performs best • Efficient request merging • Efficient block allocation • Not much journaling overhead

Device Driver Level • Request reordering/merging • Elevator algorithm • Sweep the disk, process all requests when the head passes the location • Shortest Seek Time First • Always process closest request • May lead to starvation
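The two reordering policies above can be compared with a minimal sketch. It assumes a static queue of pending block positions and a head that first sweeps upward and then back down; the function names and the request values are hypothetical.

```python
def sstf_order(head: int, requests: list[int]) -> list[int]:
    """Shortest Seek Time First: always serve the pending request closest to the head."""
    pending = list(requests)
    order = []
    pos = head
    while pending:
        nearest = min(pending, key=lambda r: abs(r - pos))
        pending.remove(nearest)
        order.append(nearest)
        pos = nearest
    return order


def elevator_order(head: int, requests: list[int]) -> list[int]:
    """Elevator: sweep upward from the head, then sweep back down."""
    up = sorted(r for r in requests if r >= head)
    down = sorted((r for r in requests if r < head), reverse=True)
    return up + down


def total_seek_distance(head: int, order: list[int]) -> int:
    """Sum of head movements needed to serve the requests in the given order."""
    distance, pos = 0, head
    for r in order:
        distance += abs(r - pos)
        pos = r
    return distance


if __name__ == "__main__":
    queue = [95, 10, 53, 48, 60]  # hypothetical block positions
    for policy in (sstf_order, elevator_order):
        order = policy(50, queue)
        print(policy.__name__, order, total_seek_distance(50, order))
```

With a static queue both policies finish every request. The starvation risk of SSTF only shows up when new requests near the head keep arriving, so a distant request such as 10 is postponed again and again.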

I/O Schedulers • No-Op • First-Come First-Serve algorithm; no reordering • Deadline • SSTF with aging to prevent starvation • Anticipatory (default) • Similar to deadline • Waits for better request under some circumstances • CFQ • Elevator algorithm • Gives each process equal I/O time

Scheduler Throughput • All outperform No-Op • Deadline performs best • No deceptive idleness in Postmark

Disk Drive Level • Request reordering • Disk drives used:

Disk Drive Results – Throughput

Prefetching • Read data expected to be needed in the future • Prefetching reduces the number of I/O switches between concurrent data streams • Optimal strategy: read exactly the data needed • Requires a priori knowledge of the stream size • Aggressive prefetching: large prefetching depth • May fetch unnecessary data • Conservative prefetching: small prefetching depth • Too many I/O switches

Competitive Prefetching [Li et al, 2007] • Prefetching depth: the amount of data that can be transferred during the average I/O switch time • Guarantees that the total time taken is no more than twice that of the optimal off-line strategy • Must measure I/O switch time and transfer rates
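The depth rule on this slide amounts to one multiplication, sketched below. The function names and the measured values (10 ms switch time, 60 MB/s transfer rate) are assumptions for illustration, not measurements from the paper.

```python
def competitive_prefetch_depth(avg_switch_time_s: float, transfer_rate_bytes_per_s: float) -> int:
    """Depth = amount of data the disk can transfer in one average I/O switch time."""
    return round(avg_switch_time_s * transfer_rate_bytes_per_s)


def io_switches(stream_bytes: int, depth_bytes: int) -> int:
    """Number of prefetch requests (and so I/O switches) needed to read one stream."""
    return -(-stream_bytes // depth_bytes)  # integer ceiling division


if __name__ == "__main__":
    # Assumed measurements: 10 ms average switch time, 60 MB/s sequential transfer rate.
    depth = competitive_prefetch_depth(0.010, 60_000_000)
    print("prefetch depth:", depth, "bytes")
    print("switches for a 6 MB stream:", io_switches(6_000_000, depth))
```

At this depth each prefetch spends about as much time transferring data as it spends switching, which gives an intuition for the factor-of-two bound stated on the slide.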

Competitive Prefetching – Results

DiskSeen [Ding et al, 2007] • Problem: file level prefetching has disadvantages • File level sequentiality may not be preserved at disk level • Inconvenient for recording access information • Inter-file sequentiality not exploited • File metadata blocks not prefetched • Solution: Block level prefetching • Uses disk logical block numbers • Works next to file level prefetcher

DiskSeen • Sequence detection • Global counter, incremented every block access • Current counter value for block stored: access index • Sequence when access indices on sequential blocks grow uniformly • Prefetch when sequence detected
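The access-index idea above can be sketched as follows. This is a simplified illustration, not the DiskSeen implementation: it assumes that "grow uniformly" means each block's access index is exactly one more than that of the preceding block, and the class name and `min_run` threshold are hypothetical.

```python
class SequenceDetector:
    """Tracks a global access counter and flags runs of sequentially accessed blocks."""

    def __init__(self, min_run: int = 3):
        self.counter = 0        # global counter, incremented on every block access
        self.access_index = {}  # block number -> counter value at its latest access
        self.min_run = min_run  # assumed run length that counts as a sequence

    def access(self, block: int) -> bool:
        """Record an access; return True if the block ends a detected sequence."""
        self.counter += 1
        self.access_index[block] = self.counter
        run, b = 1, block
        # Walk backwards over preceding blocks while their indices step by exactly 1.
        while run < self.min_run:
            prev = self.access_index.get(b - 1)
            if prev is None or self.access_index[b] - prev != 1:
                break
            run += 1
            b -= 1
        return run >= self.min_run


if __name__ == "__main__":
    detector = SequenceDetector(min_run=3)
    for block in (10, 11, 12, 13):
        print(block, detector.access(block))
```

A prefetcher built on this would issue reads for the blocks following the current one whenever `access` returns True.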

DiskSeen • History-based prefetching • Keeps limited history of past access indices • Look for trails from current block • Unlike sequences, trails can skip blocks or go backwards • When history trail found, prefetch trail blocks
Access index history per block (from the slide figure):
B’3     B’2     B1      B2      B3      B4
52002   52001   85000   74000   85001   85010
N/A     N/A     63110   63111   63200   63290
N/A     N/A     52000   48550   43501   43510
N/A     N/A     43500   34950   35000   37000
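A trail lookup over such a history could look roughly like the sketch below. The access-index values are the numbers from the slide, but their assignment to blocks, the integer block numbers (97 and 98 standing in for B’3 and B’2, 100 to 103 for B1 to B4), the function name, and both window parameters are assumptions made for illustration.

```python
def find_trail(history, start_block, start_index, block_window=4, index_window=16):
    """Return nearby blocks whose past access indices closely follow start_index.

    history maps block number -> list of past access indices for that block.
    """
    candidates = []
    for block, indices in history.items():
        if block == start_block or abs(block - start_block) > block_window:
            continue
        for idx in indices:
            if 0 < idx - start_index <= index_window:
                candidates.append((idx, block))
    # Order trail blocks by the time they were accessed in the past.
    return [block for _, block in sorted(candidates)]


# Assumed layout of the slide's numbers; block numbers are hypothetical.
history = {
    97: [52002],
    98: [52001],
    100: [85000, 63110, 52000, 43500],
    101: [74000, 63111, 48550, 34950],
    102: [85001, 63200, 43501, 35000],
    103: [85010, 63290, 43510, 37000],
}

if __name__ == "__main__":
    print(find_trail(history, 100, 43500))  # trail that skips a block
    print(find_trail(history, 100, 52000))  # trail that goes backwards
```

Under these assumptions, the trail starting at index 43500 skips block 101, and the trail starting at 52000 runs toward lower block numbers, matching the two properties named on the slide.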

DiskSeen • Architecture figure: OS caching area, prefetching area, and hard disk • Flows shown: on-demand read, file-level prefetch, block access information, prefetching, prefetched blocks, move hit blocks, delayed write-back

DiskSeen – Results

DiskSeen – Results • CVS benchmark: before DiskSeen and after DiskSeen

Conclusion • Disk performance remains a bottleneck • Effective optimization opportunities exist at many levels • Active area of research • FS2 [Huang et al, 2005], Preemptive Scheduling [Dimitrijevic et al, 2005], Distributed File Systems [e.g. Weil et al, 2006], Idle-time Scheduling [Eggert et al, 2005], etc. • Room for improvement • Use of application/domain knowledge in schedulers/prefetchers • Anticipatory scheduler improvements • Scheduling at disk level
