File Systems Vivek Pai / Kai Li Princeton University
Gedank You! • What happens when hard drives hit 2000GB? • What makes file system allocation different from memory allocation? • Who is this man with the funky hair and what’s the connection? • What does funky mean? • How long to read an entire disk?
Mechanics • Read more – 5.4-5.5 • Post-Princeton talk Wed night 8pm • Snacks volunteer? • No beer – I like my job • We’ve got a panel! • Quiz 5 ready - later • RAID discussion • Filesystem discussion
Disk Versus Memory • Memory • Latency in 10’s of processor cycles • Transfer rate 100+MB/s • Contiguous allocation gains ~10x • Disk • Latency in milliseconds • Transfer rate 3-30MB/s • Contiguous allocation gains ~1000x
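The earlier question "How long to read an entire disk?" can be estimated from the transfer rates on this slide. The sketch below is a back-of-the-envelope calculation, not a measurement: it assumes a purely sequential scan with no seeks, decimal units (1 GB = 10^9 bytes), and the 2000GB capacity from the opening questions. The function name is hypothetical.

```python
GB = 10**9  # decimal gigabyte (assumption)
MB = 10**6  # decimal megabyte (assumption)

def full_scan_seconds(capacity_bytes, rate_bytes_per_s):
    """Time to read every byte once at a constant sequential transfer rate."""
    return capacity_bytes / rate_bytes_per_s

# 2000GB disk at the slow and fast ends of the 3-30MB/s range
for rate_mb in (3, 30):
    hours = full_scan_seconds(2000 * GB, rate_mb * MB) / 3600
    print(f"{rate_mb} MB/s: about {hours:.1f} hours")
```

Even at the fast end of the range the scan takes most of a day, which suggests why whole-disk operations get harder as capacity grows faster than transfer rate.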
So What Makes Filesystems Hard? • Files grow and shrink • Little a priori knowledge • 6 orders of magnitude in file sizes • Overcoming disk performance behavior • Desire for efficiency • Coping with failure
Can Disks Be Bigger, Faster, Stronger? • Making individual disks larger is hard • Throw more disks at the problem • Capacity increases • Effective access speed may increase • Probability of failure also increases • Use some disks to provide redundancy • Generally assume a fail-stop model • Fail-stop versus Byzantine failures
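The point that the probability of failure increases with the number of disks can be made concrete with a small calculation. This is a sketch under a strong assumption: disks fail independently, each with the same probability p over some period. The value of p used below is made up for illustration and does not come from the slides.

```python
def p_any_failure(p_single, num_disks):
    """Probability that at least one of num_disks independent disks fails."""
    return 1 - (1 - p_single) ** num_disks

# Hypothetical 5% per-disk failure probability over some period
for n in (1, 2, 10, 50):
    print(n, "disks:", round(p_any_failure(0.05, n), 3))
```

Without redundancy, an array that needs every disk to work becomes less reliable as it grows, which is the motivation for spending some disks on redundancy.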
RAID (Redundant Array of Inexpensive Disks) • Main idea • Store the error correcting codes on other disks • General error correcting codes are too powerful • Use XORs or single parity • Upon the failure of any single disk, one can recover its blocks by XORing the corresponding blocks on the remaining disks (data and parity) • Pros • Reliability • High bandwidth • Cons • The controller is complex [Figure: RAID controller computing XOR across the disks]
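A minimal sketch of single-parity recovery with XOR, assuming a fail-stop model where the array knows which disk was lost. The block contents and the helper name `xor_blocks` are invented for illustration; a real controller works on fixed-size stripes.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three data disks holding one block each (made-up contents)
data = [b"block-A!", b"block-B!", b"block-C!"]
parity = xor_blocks(data)  # stored on the parity disk

# Disk 1 fails: XOR the surviving data blocks with the parity block
survivors = [data[0], data[2], parity]
recovered = xor_blocks(survivors)
print(recovered)
```

This works because XOR is its own inverse: XORing the parity with all surviving blocks cancels them out and leaves the missing one. It cannot recover from two simultaneous failures.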
Synopsis of RAID Levels • RAID Level 0: Non-redundant (JBOD) • RAID Level 1: Mirroring • RAID Level 2: Byte-interleaved, ECC • RAID Level 3: Byte-interleaved, parity • RAID Level 4: Block-interleaved, parity • RAID Level 5: Block-interleaved, distributed parity
Did RAID Work? • Performance: yes • Reliability: yes • Cost: no • Controller design complicated • Fewer economies of scale • High-reliability environments don’t care • Now also software implementations
History of Disk-related Concerns • When memory was expensive • Do as little bookkeeping as possible • When disks were expensive • Get every last sector of usable space • When disks became more common • Make them much more reliable • When processors got much faster • Make them appear faster
File System Components • Disk management • Arrange collection of disk blocks into files • Naming • User gives file name, not track or sector number, to locate data • Security • Keep information secure • Reliability/durability • When system crashes, lose stuff in memory, but want files to be durable [Figure: layers from top to bottom: User, File naming, File access, Disk management, Disk drivers]
Definitions • File descriptor (fd) – an integer used to represent a file – easier than using names • Metadata – bookkeeping data that describes the file or info about it • Open file table – system-wide list of descriptors in use • inode – index node, or a specific set of information kept about each file. Two forms – on disk and in memory
Data Structures for A Typical File System [Figure labels: Process control block with open file pointer array; Open file table (systemwide); File metadata; on disk: File system info, File metadata, Directories, File data]
Opening A File • fd = open( FileName, access) • File name lookup and authentication • Copy the file metadata into the in-memory data structure, if it is not there yet • Create an entry in the open file table (system wide) if there isn’t one • Create an entry in PCB • Link up the data structures • Return a pointer to user [Figure labels: PCB; Open file table; Metadata; File system on disk; steps "File name lookup & authenticate" and "Allocate & link up data structures"]
From User to System View • What happens if user wants to read 10 bytes from a file starting at byte 2? • seek byte 2 • fetch the block • read 10 bytes • What happens if user wants to write 10 bytes to a file starting at byte 2? • seek byte 2 • fetch the block • write 10 bytes • write out the block
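The two sequences above (fetch the block, then read; fetch, modify, then write out) can be sketched with a toy in-memory disk. Everything here is an assumption made for illustration: a 512-byte block size, a file that occupies the disk contiguously from block 0 so that file offsets equal disk offsets, and the class and function names.

```python
BLOCK_SIZE = 512  # assumed block size

class Disk:
    """Toy disk that can only transfer whole blocks."""
    def __init__(self, num_blocks):
        self.blocks = [bytearray(BLOCK_SIZE) for _ in range(num_blocks)]

    def read_block(self, n):
        return bytearray(self.blocks[n])  # copy, like a system buffer

    def write_block(self, n, buf):
        self.blocks[n] = bytearray(buf)

def read_bytes(disk, offset, length):
    """Seek to offset, fetch each block touched, copy out the wanted bytes."""
    out = bytearray()
    while length > 0:
        block_no, start = divmod(offset, BLOCK_SIZE)
        chunk = min(length, BLOCK_SIZE - start)
        out += disk.read_block(block_no)[start:start + chunk]
        offset += chunk
        length -= chunk
    return bytes(out)

def write_bytes(disk, offset, data):
    """Read-modify-write: fetch the block, patch it, write the block out."""
    pos = 0
    while pos < len(data):
        block_no, start = divmod(offset + pos, BLOCK_SIZE)
        chunk = min(len(data) - pos, BLOCK_SIZE - start)
        buf = disk.read_block(block_no)                   # fetch the block
        buf[start:start + chunk] = data[pos:pos + chunk]  # write the bytes
        disk.write_block(block_no, buf)                   # write out the block
        pos += chunk

disk = Disk(4)
write_bytes(disk, 2, b"0123456789")
print(read_bytes(disk, 2, 10))
```

Note that a 10-byte write still costs a full block read and a full block write, because the disk only transfers whole blocks.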
Reading A Block • read( fd, userBuf, size ) • Get physical block to sysBuf, copy to userBuf • read( device, phyBlock, size ) [Figure labels: PCB; Open file table; Metadata; Logical → physical; Buffer cache; Disk device driver]
A Disk Layout for A File System • Superblock defines a file system • size of the file system • size of the file descriptor area • free list pointer, or pointer to bitmap • location of the file descriptor of the root directory • other meta-data such as permission and various times • For reliability, replicate the superblock [Figure: disk layout in order: Boot block, Super block, File metadata (i-node in Unix), File data blocks]
File Usage Patterns • How do users access files? • Sequential: bytes read in order • Random: read/write element out of middle of arrays • Whole file or partial file • How are files used? • Most files are small • Large files use up most of the disk space • Large files account for most of the bytes transferred • Bad news • Need everything to be efficient
Data Structures for Disk Management • A “header” for each file (part of the file meta-data) • Disk sectors associated with each file • A data structure to represent free space on disk • Bit map • 1 bit per block (sector) • blocks numbered in cylinder-major order, why? • Linked list • Others? • How much space does a bit map need for a 4G disk?
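The closing question on bitmap size can be worked out directly: one bit per block, eight bits per byte. The slide does not give a block size, so the sketch below tries two assumed sizes (512-byte sectors and 4KB blocks) and reads "4G" as 4 GiB.

```python
GiB = 2**30

def bitmap_bytes(disk_bytes, block_bytes):
    """Bytes needed for a free-space bitmap with 1 bit per block."""
    num_blocks = disk_bytes // block_bytes
    return (num_blocks + 7) // 8  # round up to whole bytes

for block_bytes in (512, 4096):  # assumed block sizes
    size = bitmap_bytes(4 * GiB, block_bytes)
    print(f"{block_bytes}-byte blocks: {size // 1024} KiB bitmap")
```

Under these assumptions the bitmap is 1 MiB with 512-byte sectors and 128 KiB with 4KB blocks, in both cases a tiny fraction of the disk.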
Contiguous Allocation • Request in advance for the size of the file • Search bit map or linked list to locate a space • File header • first sector in file • number of sectors • Pros • Fast sequential access • Easy random access • Cons • External fragmentation • Hard to grow files
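Searching the bitmap for space can be sketched as a first-fit scan for a run of free blocks. First-fit is only one possible policy and is an assumption here, as are the function name and the example bitmap.

```python
def find_contiguous(free, count):
    """Return the first block of the first run of `count` free blocks, or None."""
    run_start, run_len = 0, 0
    for i, is_free in enumerate(free):
        if is_free:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len == count:
                return run_start
        else:
            run_len = 0
    return None

# True = free, False = allocated (made-up bitmap)
free = [True, False, True, True, False, True, True, True]
print(find_contiguous(free, 2))
print(find_contiguous(free, 3))
print(find_contiguous(free, 4))
```

The last call fails even though six blocks are free in total, which is external fragmentation in miniature.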
Single-Level Indexed Files or Extent-based Filesystems • A user declares max size • A file header holds an array of pointers to point to disk blocks • Pros • Can grow up to a limit • Random access is fast • Cons • Clumsy to grow beyond limit • Periodic cleanup of new files • Up-front declaration a real pain [Figure: file header with pointers to disk blocks]
Linked Files (Alto) • File header points to 1st block on disk • Each block points to next • Pros • Can grow files dynamically • Free list is similar to a file • Cons • random access: horrible • unreliable: losing a block means losing the rest [Figure: file header pointing to a chain of blocks that ends in null]
File Allocation Table (FAT) • Approach • A section of disk for each partition is reserved • One entry for each block • A file is a linked list of blocks • A directory entry points to the 1st block of the file • Pros • Simple • Cons • Always go to FAT • Wasting space [Figure: directory entry "foo" points to block 217; FAT entries 217 → 619, 619 → 399, 399 → EOF]
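Following a file's chain through the FAT can be sketched as below. The block numbers are taken from the slide's figure as best it can be read (foo starts at 217, then 619, then 399, then end of file). The dictionary representation and the EOF marker value are assumptions made for illustration and do not reflect any on-disk FAT format.

```python
EOF = -1  # assumed end-of-chain marker

# FAT: entry for block n holds the number of the file's next block
fat = {217: 619, 619: 399, 399: EOF}

# Directory: file name -> first block of the file
directory = {"foo": 217}

def file_blocks(name):
    """Walk the FAT chain and list a file's blocks in order."""
    blocks = []
    block = directory[name]
    while block != EOF:
        blocks.append(block)
        block = fat[block]  # every hop goes through the FAT
    return blocks

print(file_blocks("foo"))
```

Reaching the k-th block of a file takes k lookups in the table, which is why "always go to FAT" appears under the cons.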
Multi-Level Indexed Files (Unix) • 13 Pointers in a header • 10 direct pointers • 11: 1-level indirect • 12: 2-level indirect • 13: 3-level indirect • Pros & Cons • In favor of small files • Can grow • Limit is 16G and lots of seeks • What happens to reach block 23, 5, 340? [Figure: header entries 1-10 point directly to data blocks; entries 11, 12, and 13 point to one, two, and three levels of index blocks that lead to data blocks]
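The question about blocks 23, 5, and 340 can be worked through with a small lookup function. The slide gives neither block size nor pointer size, so this sketch assumes 1KB blocks and 4-byte pointers (256 pointers per index block), which happens to give a maximum file size close to the stated 16G. It also assumes file blocks are numbered from 0. The function name is hypothetical.

```python
BLOCK_SIZE = 1024                   # assumed
POINTER_SIZE = 4                    # assumed
PTRS = BLOCK_SIZE // POINTER_SIZE   # pointers per index block: 256
NUM_DIRECT = 10

def locate(block_no):
    """Return (level, indices to follow) for a 0-based file block number."""
    if block_no < NUM_DIRECT:
        return ("direct", (block_no,))
    n = block_no - NUM_DIRECT
    if n < PTRS:
        return ("single indirect", (n,))
    n -= PTRS
    if n < PTRS ** 2:
        return ("double indirect", divmod(n, PTRS))
    n -= PTRS ** 2
    if n < PTRS ** 3:
        i, rest = divmod(n, PTRS ** 2)
        j, k = divmod(rest, PTRS)
        return ("triple indirect", (i, j, k))
    raise ValueError("block number beyond maximum file size")

MAX_BLOCKS = NUM_DIRECT + PTRS + PTRS ** 2 + PTRS ** 3

for b in (23, 5, 340):
    print(b, locate(b))
print("max file size in GiB:", MAX_BLOCKS * BLOCK_SIZE / 2**30)
```

Under these assumptions block 5 is reached through a direct pointer, block 23 needs one extra index-block read, and block 340 needs two, which is where the "lots of seeks" cost comes from for large files.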
Challenges • Unix filesystem has great flexibility • Extent-based filesystems have speed • Seeks kill performance – locality • Bitmaps show contiguous free space • Linked lists easy to search • How do you perform backup/restore?
DEMOS (Cray-1) • Idea • Using contiguous allocation • Allow non-contiguous • Approach • 10 (base,size) pointers • Indirect for big files • Pros & cons • Can grow (max 10GB) • fragmentation • finding free blocks [Figure: file header of (base,size) pointers, each pointing to a contiguous run of data blocks]
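Mapping a logical block to a physical block through (base,size) pointers can be sketched as a walk over the extent list. This illustrates the general extent idea only and is not the actual DEMOS implementation; the extent values and the function name are invented, and the indirect case for big files is left out.

```python
def logical_to_physical(extents, logical_block):
    """Map a 0-based logical block to a physical block via (base, size) extents."""
    for base, size in extents:
        if logical_block < size:
            return base + logical_block  # falls inside this contiguous run
        logical_block -= size            # skip past this run
    raise IndexError("logical block beyond end of file")

# A file made of three contiguous runs (made-up values)
extents = [(100, 4), (500, 2), (40, 8)]
print([logical_to_physical(extents, b) for b in (0, 3, 4, 5, 6, 13)])
```

Each run is read sequentially at full speed, and a seek is only needed when crossing from one extent to the next.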