Secondary Storage

Secondary Storage CSCI 444/544 Operating Systems Fall 2008

Agenda • Overview of secondary storage (disks) • Disk structure • Disk performance • Disk scheduling • Disk management • RAID (Redundant Arrays of Inexpensive Disks)

Secondary Storage • Secondary storage typically: • is anything that is outside of “primary memory” • does not permit direct execution of instructions or data retrieval via machine load/store instructions • Characteristics: • it’s large: 200-1000GB • it’s cheap: $0.45/GB • it’s persistent: data survives power loss • it’s slow: milliseconds to access

Disk trends • Disk capacity, 1975-1989 • doubled every 3+ years • 25% improvement each year • factor of 10 every decade • Still exponential, but far less rapid than processor performance • Disk capacity since 1990 • doubling every 12 months • 100% improvement each year • factor of 1000 every decade • 10x as fast as the increase of processor performance!
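The growth figures above follow from compounding. A minimal sketch to check the arithmetic; the function names are illustrative and not from the course material:

```python
import math


def decade_factor(annual_improvement: float, years: int = 10) -> float:
    """Compound a fractional yearly improvement (0.25 means 25%) over `years`."""
    return (1.0 + annual_improvement) ** years


def doubling_time_years(annual_improvement: float) -> float:
    """Years needed for capacity to double at a given yearly improvement."""
    return math.log(2.0) / math.log(1.0 + annual_improvement)


# 1975-1989: 25% per year is roughly a factor of 10 per decade,
# with a doubling time a little over 3 years.
print(round(decade_factor(0.25), 1), round(doubling_time_years(0.25), 1))

# Since 1990: 100% per year is 2**10 = 1024, roughly a factor of 1000 per decade.
print(decade_factor(1.00), doubling_time_years(1.00))
```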

Memory Hierarchy • Each level acts as a cache of lower levels • CPU registers: <1KB, <1 ns • L1 cache: 64KB, 1 ns • L2 cache: 4MB, 4 ns • Primary memory: 2GB, 10 ns • Secondary storage: 1000GB, 10 ms • Tertiary storage: 1-1000TB, 1 s-1 hr

Disks and the OS • Disks are messy, messy devices • errors, bad blocks, missed seeks, etc. • Job of OS is to hide this mess from higher-level software • low-level device drivers (initiate a disk read, etc.) • higher-level abstractions (files, databases, etc.) • OS may provide different levels of disk access to different clients • physical disk block (surface, cylinder, sector) • disk logical block (disk block #) • file logical (filename, block or record or byte #)

Physical Disk Structure

Disk Controller • Responsible for interface between OS and disk drive • Common interfaces: ATA/IDE vs. SCSI • ATA/IDE used for personal storage • SCSI for enterprise-class storage • Basic operations • Read block • Write block • OS does not know of internal complexity of disk • Disk exports array of Logical Block Numbers (LBNs) • Disks map internal sectors to LBNs
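To make the LBN abstraction concrete, here is a sketch of how a logical block number could map to a (cylinder, surface, sector) triple. It assumes an idealized geometry with the same number of sectors on every track and 0-based sector numbers; real drives use zoned recording and remap bad sectors, so the actual mapping is internal to the disk. The function names and the geometry values are hypothetical.

```python
def lbn_to_chs(lbn: int, surfaces: int, sectors_per_track: int) -> tuple[int, int, int]:
    """Map a logical block number to (cylinder, surface, sector).

    Assumes an idealized disk: every track holds `sectors_per_track`
    sectors, and blocks fill one whole cylinder before moving to the next.
    """
    cylinder, rest = divmod(lbn, surfaces * sectors_per_track)
    surface, sector = divmod(rest, sectors_per_track)
    return cylinder, surface, sector


def chs_to_lbn(cylinder: int, surface: int, sector: int,
               surfaces: int, sectors_per_track: int) -> int:
    """Inverse of lbn_to_chs under the same idealized geometry."""
    return (cylinder * surfaces + surface) * sectors_per_track + sector


# Hypothetical geometry: 4 surfaces, 63 sectors per track.
print(lbn_to_chs(1000, surfaces=4, sectors_per_track=63))
```

Filling a whole cylinder before moving the arm keeps consecutive LBNs physically close, which is why sequential access to contiguous LBNs avoids seeks.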

Disk Operations • Disk performance depends on a number of operations • seek: moving the disk arm (head) to the correct cylinder • depends on how fast the disk arm can move • rotation (latency): waiting for the sector to rotate under the head • depends on the rotation rate of the disk • transfer: sequentially moving data from the surface into the disk controller, and from there sending it back to the host • depends on the density of bytes on the disk and on the rotation rate • When the OS uses the disk, it tries to minimize the cost of all of these operations • particularly seeks and rotation

Disk Performance • Positioning the head: seek + rotation • Positioning time = seek time + rotational delay • How long does it take to read or write n sectors? • Positioning time + transfer time(n) • Implicit contract: • Large sequential accesses to contiguous LBNs achieve much better performance than small transfers or random accesses
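The formula above can be turned into a rough calculator. This is a sketch only: the 8 ms seek time, 7200 RPM, 512-byte sectors, and 50 MB/s transfer rate are assumed values for illustration, not figures from the lecture, and the average rotational delay is taken as half a revolution.

```python
def rotational_delay_ms(rpm: float) -> float:
    """Average rotational delay: half of one revolution, in milliseconds."""
    return 0.5 * 60_000.0 / rpm


def transfer_time_ms(n_sectors: int, sector_bytes: int, mb_per_s: float) -> float:
    """Time to stream n sectors at a sustained rate given in MB/s (10**6 bytes)."""
    return n_sectors * sector_bytes / (mb_per_s * 1_000_000.0) * 1000.0


def access_time_ms(n_sectors: int, seek_ms: float = 8.0, rpm: float = 7200.0,
                   sector_bytes: int = 512, mb_per_s: float = 50.0) -> float:
    """Positioning time (seek + rotational delay) plus transfer time for n sectors."""
    positioning = seek_ms + rotational_delay_ms(rpm)
    return positioning + transfer_time_ms(n_sectors, sector_bytes, mb_per_s)


# One 1 MB sequential read versus 2048 separate single-sector random reads.
sequential = access_time_ms(2048)
random_reads = 2048 * access_time_ms(1)
print(f"sequential: {sequential:.1f} ms, random: {random_reads:.1f} ms")
```

Under these assumptions the random pattern pays the positioning cost 2048 times while the sequential read pays it once, which is the implicit contract in numbers.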

Disk Scheduling • Goal: Minimize positioning time • FCFS: Schedule requests in the order received • Advantage: Fair • Disadvantage: High seek and rotation cost • Shortest seek time first (SSTF): • Handle the request on the nearest cylinder next • Advantage: Reduces arm movement (seek time) • Disadvantage: Unfair, can starve some requests
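The two policies can be compared by counting how many cylinders the head crosses. The sketch below uses a hypothetical request queue (a common textbook example, not taken from these slides) and considers seek distance only, ignoring rotation.

```python
def fcfs(start: int, requests: list[int]) -> list[int]:
    """FCFS: service requests in arrival order."""
    return list(requests)


def sstf(start: int, requests: list[int]) -> list[int]:
    """SSTF: repeatedly service the pending request closest to the head."""
    pending = list(requests)
    order = []
    head = start
    while pending:
        nearest = min(pending, key=lambda cyl: abs(cyl - head))
        pending.remove(nearest)
        order.append(nearest)
        head = nearest
    return order


def head_travel(start: int, order: list[int]) -> int:
    """Total number of cylinders crossed when visiting `order` from `start`."""
    total, head = 0, start
    for cyl in order:
        total += abs(cyl - head)
        head = cyl
    return total


# Hypothetical queue of cylinder numbers; head starts at cylinder 53.
queue = [98, 183, 37, 122, 14, 124, 65, 67]
print(head_travel(53, fcfs(53, queue)))  # 640 cylinders
print(head_travel(53, sstf(53, queue)))  # 236 cylinders
```

Note that SSTF here sees the whole queue at once; with requests arriving continuously near the head, a far-away request such as cylinder 183 could wait indefinitely, which is the starvation problem noted above.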

FCFS

SSTF

Disk Scheduling (II) • SCAN (elevator algorithm) • move the arm from one end of the disk toward the other end • service requests until the arm reaches the other end, then reverse • wait times are non-uniform (cylinders near the middle are serviced more evenly than those near the ends) • C-SCAN (Circular SCAN) • like SCAN, but service requests in one direction only, then return and start over (like a typewriter) • more uniform wait times • LOOK and C-LOOK • similar to SCAN and C-SCAN, except the arm stops at the last request in each direction • look for a request before continuing to move in a given direction
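A compact sketch of the four sweep policies follows. It assumes the head initially moves toward higher cylinder numbers, that all requests are known up front, and that cylinders are numbered 0 to `max_cyl`; the request queue and the 200-cylinder disk are hypothetical. Each function returns the sequence of head positions, including the end-of-disk stops that SCAN and C-SCAN make even when no request is waiting there.

```python
def _split(start: int, requests: list[int]) -> tuple[list[int], list[int]]:
    """Sorted requests below the head, and sorted requests at or above it."""
    lower = sorted(c for c in requests if c < start)
    upper = sorted(c for c in requests if c >= start)
    return lower, upper


def scan(start: int, requests: list[int], max_cyl: int) -> list[int]:
    """SCAN: sweep up to the last cylinder, then reverse and sweep down."""
    lower, upper = _split(start, requests)
    return upper + [max_cyl] + lower[::-1]


def c_scan(start: int, requests: list[int], max_cyl: int) -> list[int]:
    """C-SCAN: sweep up to the end, jump back to cylinder 0, sweep up again."""
    lower, upper = _split(start, requests)
    return upper + [max_cyl, 0] + lower


def look(start: int, requests: list[int]) -> list[int]:
    """LOOK: like SCAN, but reverse at the last request instead of the disk end."""
    lower, upper = _split(start, requests)
    return upper + lower[::-1]


def c_look(start: int, requests: list[int]) -> list[int]:
    """C-LOOK: like C-SCAN, but jump from the last request to the lowest one."""
    lower, upper = _split(start, requests)
    return upper + lower


def head_travel(start: int, path: list[int]) -> int:
    """Total cylinders crossed along `path`, counting the return jump too."""
    total, head = 0, start
    for cyl in path:
        total += abs(cyl - head)
        head = cyl
    return total


queue = [98, 183, 37, 122, 14, 124, 65, 67]  # hypothetical requests
for name, path in [("SCAN", scan(53, queue, 199)), ("C-SCAN", c_scan(53, queue, 199)),
                   ("LOOK", look(53, queue)), ("C-LOOK", c_look(53, queue))]:
    print(name, path, head_travel(53, path))
```

Counting the return jump as ordinary head travel is a simplification; in practice a long seek without servicing requests is cheaper per cylinder than many short ones.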

SCAN

C-SCAN

C-LOOK

Disk Management • Low-level formatting, or physical formatting: dividing a disk into sectors that the disk controller can read and write. • To use a disk to hold files, the operating system still needs to record its own data structures on the disk. • Partition the disk into one or more groups of cylinders. • Logical formatting, or “making a file system”. • The boot block initializes the system. • A small bootstrap loader program is stored in ROM. • The full bootstrap program is stored in the “boot block” at a fixed location on the disk.

Reliability • Disks fail more often: • when continuously powered on • with heavy workloads • under high temperatures • How do disks fail? • The whole disk can stop working (e.g., motor dies) • Transient problem (e.g., cable disconnected) • Individual sectors can fail (e.g., head crash or scratch) • Data can be corrupted, or a block can become unreadable or unwritable • Disks can internally fix some sector problems • ECC (error correction code): detect and correct bit flips • Retry sector reads and writes: try 20-30 different offset and timing combinations for the heads • Remap sectors: do not use bad sectors in the future

RAID • RAID: multiple disk drives provide reliability via redundancy • Performance: parallel access • Capacity: store more data • Disk striping uses a group of disks as one storage unit. • RAID schemes improve both the performance and the reliability of the storage system by storing redundant data. • Mirroring or shadowing keeps a duplicate of each disk. • Block-interleaved parity uses much less redundancy. • RAID turns multiple disks into one bigger, faster, more reliable disk
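Striping and block-interleaved parity can be illustrated in a few lines. This is a simplified sketch, not a description of any particular RAID level's on-disk layout: it assumes round-robin striping by block, equal-sized blocks, XOR parity, and at most one lost block per stripe. All names are illustrative.

```python
from functools import reduce


def stripe_location(lbn: int, n_disks: int) -> tuple[int, int]:
    """Round-robin striping: which disk holds a logical block, and at what offset."""
    return lbn % n_disks, lbn // n_disks


def parity(blocks: list[bytes]) -> bytes:
    """Bytewise XOR of equal-sized blocks; this is the parity block of a stripe."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct(surviving: list[bytes], parity_block: bytes) -> bytes:
    """Recover the one missing block: XOR of the survivors and the parity block."""
    return parity([*surviving, parity_block])


# One stripe across three data disks plus one parity block.
stripe = [b"disk", b"raid", b"sync"]
p = parity(stripe)

# Suppose the second disk fails; its block is rebuilt from the rest.
recovered = reconstruct([stripe[0], stripe[2]], p)
print(recovered)  # b'raid'
```

Mirroring would need three extra disks to protect these three; parity protects them against a single failure with one extra block per stripe, which is the sense in which it uses much less redundancy.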
