330 likes | 348 Views
This lecture discusses the structure, performance, and scheduling of physical disks, as well as the building of file systems, including files, directories, sharing, and protection. It also covers disk components and specifications, and explains disk interaction and performance. Lastly, it explores the differences between SSDs and disks, and the various disk scheduling techniques.
E N D
Systems Spring2017 Lecture 15: IO and FileSystems CSE120 Principles ofOperating
First we’ll discuss properties of physicaldisks • Structure • Performance • Scheduling • Then we’ll discuss how we build file systems onthem • Files • Directories • Sharing • Protection • File SystemLayouts • File BufferCache • ReadAhead FileSystems CSE 120 – Lecture 12 – FileSystems
Disks are messy physicaldevices: • Errors, bad blocks, missed seeks,etc. • The job of the OS is to hide this mess fromhigher • levelsoftware • Low-level device control (initiate a disk read,etc.) • Higher-level abstractions (files, databases,etc.) • The OS may provide different levels of disk accessto • differentclients • Physical disk (surface, cylinder,sector) • Logical disk (disk block#) • Logical file (file block, record, or byte#) Disks and theOS CSE 120 – Lecture 12 – FileSystems
Diskcomponents • Platters • Surfaces • Tracks • Sectors • Cylinders • Arm • Heads Physical DiskStructure Track Sector Arm Surface Cylinder Heads Platter CSE 120 – Lecture 12 – FileSystems
Specifying disk requests requires a lot ofinfo: • Cylinder #, surface #, track #, sector #, transfersize… • Older disks required the OS to specify all ofthis • The OS needed to know all diskparameters • Modern disks are morecomplicated • Not all sectors are the same size, sectors are remapped,etc. • Current disks provide a higher-level interface(SCSI) • The disk exports its data as a logical array of blocks[0…N] • » Disk maps logical blocks tocylinder/surface/track/sector • Only need to specify the logical block # toread/write • But now the disk parameters are hidden from theOS DiskInteraction CSE 120 – Lecture 12 – FileSystems
Seagate Enterprise Performance 3.5"(server) • capacity: 600GB • rotational speed: 15,000RPM • sequential read performance: 233 MB/s (outer) – 160 MB/s(inner) • seek time (average): 2.0ms • Seagate Barracuda 3.5"(workstation) • capacity: 3000GB • rotational speed: 7,200 RPM • sequential read performance: 210 MB/s - 156 MB/s(inner) • seek time (average): 8.5ms • Seagate Savvio 2.5" (smaller formfactor) • capacity: 2000GB • rotational speed: 7,200 RPM • sequential read performance: 135 MB/s (outer) - ? MB/s(inner) • seek time (average): 11ms Modern DiskSpecifications CSE 120 – Lecture 12 – FileSystems
Disk request performance depends upon threesteps • Seek – moving the disk arm to the correctcylinder • » Depends on how fast disk arm can move (increasing veryslowly) • Rotation – waiting for the sector to rotate under thehead • » Depends on rotation rate of disk (increasing, butslowly) • Transfer – transferring data from surface into diskcontroller electronics, sending it back to thehost • » Depends on density (increasingquickly) • When the OS uses the disk, it tries to minimize the cost of all of thesesteps • Particularly seeks androtation DiskPerformance CSE 120 – Lecture 12 – FileSystems
SSDs are a relatively new storagetechnology • Memory that does not require power to rememberstate • No physical moving parts faster than harddisks • No seek and no rotationoverhead • But…more expensive, not as muchcapacity • Generally speaking, file systems have remained unchanged when usingSSDs • Some optimizations no longer necessary (e.g., layoutpolicies, • disk head scheduling), but basically leave FS code asis • Initially, SSDs have the same disk interface(SATA) • Increasingly, SSDs used directly over the I/O bus(PCIe) • » Much higherperformance Solid StateDisks CSE 120 – Lecture 12 – FileSystems
Seek and rotational delays affect disk performance • SSDs don’t suffer these artifacts • But, SSDs do “write leveling”, which can slow writes as compared to reads • When will SSDs replace Disk drives? • In your laptop? Maybe already. • For your home PC, if you have one? Maybe already. • For massive bulk storage at Amazon, Google, etc? When they are cheaper per byte • As part of the memory hierarchy at Amazon, Google, etc? Already. SSDs vs Disks CSE 120 – Lecture 12 – FileSystems
Because seeks are so expensive (milliseconds!), the OS tries to schedule disk requests that are queued waiting for thedisk • FCFS (donothing) • » Reasonable when load islow • » Long waiting times for long requestqueues • SSTF (shortest seek timefirst) • » Minimize arm movement (seek time), maximize requestrate • » Favors middleblocks • SCAN(elevator) • » Service requests in one direction until done, thenreverse • C-SCAN • » Like SCAN, but only go in one direction(typewriter) DiskScheduling CSE 120 – Lecture 12 – FileSystems
Disk Scheduling(2) • In general, unless there are request queues,disk • scheduling does not have muchimpact • Important for servers, less so forPCs • Modern disks often do the disk scheduling themselves • Disks know their layout better than OS, can optimizebetter • Ignores, undoes any scheduling done byOS CSE 120 – Lecture 12 – FileSystems
Filesystems • Implement an abstraction (files) for secondarystorage • Organize files logically(directories) • Permit sharing of data between processes, people,and machines • Protect data from unwanted access(security) FileSystems CSE 120 – Lecture 12 – FileSystems
A file is data with some properties • Contents, size, owner, last read/write time, protection,etc. • A file can also have atype • Understood by the filesystem • » Block, character, device, portal, link,etc. • Understood by other parts of the OS or runtimelibraries • » Executable, dll, source, object, text,etc. • A file’s type can be encoded in its name orcontents • Windows encodes type inname • » .com, .exe, .bat, .dll, .jpg,etc. • Unix encodes type incontents • » Magic numbers, initial characters (e.g., #! for shellscripts) Files CSE 120 – Lecture 12 – FileSystems
Unix • creat(name) • open(name,how) • read(fd, buf,len) • write(fd, buf,len) • sync(fd) • seek(fd,pos) • close(fd) • unlink(name) • CreateFile(name,CREATE) • CreateFile(name,OPEN) • ReadFile(handle,…) • WriteFile(handle,…) • FlushFileBuffers(handle,…) • SetFilePointer(handle,…) • CloseHandle(handle,…) • DeleteFile(name) • CopyFile(name) • MoveFile(name) Basic FileOperations Windows CSE 120 – Lecture 12 – FileSystems
File AccessMethods • Some file systems provide different accessmethods • that specify different ways for accessing data in afile • Sequential access – read bytes one at a time, inorder • Direct access – random access given block/bytenumber • Record access – file is array of fixed- or variable-length records, read/written sequentially or randomly by record# • Indexed access – file system contains an index to a particular field of each record in a file, reads specify a value for thatfield and the system finds the record via the index(DBs) • What file access method does Unix, Windowsprovide? • Older systems provide the more complicatedmethods CSE 120 – Lecture 12 – FileSystems
Directories serve two purposes • For users, they provide a structured way to organizefiles • For the file system, they provide a convenient naming interface that allows the implementation to separate logicalfile organization from physical file placement on thedisk • Most file systems support multi-leveldirectories • Naming hierarchies (/, /usr, /usr/local/,…) • Most file systems support the notion of acurrent • directory • Relative names specified with respect to currentdirectory • Absolute names start from the root of directorytree Directories CSE 120 – Lecture 12 – FileSystems
A directory is a list of entries • <name,location> • Name is just the name of the file ordirectory • Location depends upon how file is represented ondisk • List is usually unordered (effectivelyrandom) • Entries usually sorted by program that readsdirectory • Directories typically stored infiles • Only need to manage one kind of secondary storageunit DirectoryInternals CSE 120 – Lecture 12 – FileSystems
Unix • Directories implemented infiles • Use file ops to createdirs • C runtime library providesa higher-level abstraction for readingdirectories • opendir(name) • readdir(DIR) • seekdir(DIR) • closedir(DIR) Basic DirectoryOperations Windows • Explicit diroperations • CreateDirectory(name) • RemoveDirectory(name) • Very different methodfor • reading directoryentries • FindFirstFile(pattern) • FindNextFile() CSE 120 – Lecture 12 – FileSystems
Let’s say you want to open“/one/two/three” • What does the file systemdo? • Open directory “/” (well known, can alwaysfind) • Search for the entry “one”, get location of “one” (in direntry) • Open directory “one”, search for “two”, get location of“two” • Open directory “two”, search for “three”, get location of“three” • Open file“three” • Systems spend a lot of time walking directorypaths • This is why open is separate fromread/write • OS will cache prefix lookups forperformance • » /a/b, /a/bb, /a/bbb, etc., all share “/a”prefix Path NameTranslation CSE 120 – Lecture 12 – FileSystems
File sharing has been around sincetimesharing • Easy to do on a singlemachine • PCs, workstations, and networks get us there(mostly) • File sharing is important for getting workdone • Basis for communication andsynchronization • Two key issues when sharingfiles • Semantics of concurrentaccess • » What happens when one process reads while anotherwrites? • » What happens when two processes open a file forwriting? • » What are we going to use tocoordinate? • Protection FileSharing CSE 120 – Lecture 12 – FileSystems
File systems implement a protectionsystem • Who can access afile • How they can accessit • Moregenerally… • Objects are “what”, subjects are “who”, actions are“how” • A protection system dictates whether a given action performed by a given subject on a given object should beallowed • You can read and/or write your files, but otherscannot • You can read “/etc/motd”, but you cannot writeit Protection CSE 120 – Lecture 12 – FileSystems
Access Control Lists(ACL) • For each object, maintain alist of subjects and their permitted actions RepresentingProtection Capabilities • For each subject, maintain alist of objects and their permitted actions Objects Subjects Capability ACL CSE 120 – Lecture 12 – FileSystems
Setuid • Show setuidbit • Root/sudo/administrator • Pic of ls-l CSE 120 – Lecture 12 – FileSystems
The approaches differ only in how the tableis • represented • What approach does Unix use in theFS? • Capabilities are easier totransfer • They are like keys, can handoff, does not depend onsubject • In practice, ACLs are easier tomanage • Object-centric, easy to grant,revoke • To revoke capabilities, have to keep track of all subjectsthat • have the capability – a challengingproblem • ACLs have a problem when objects are heavilyshared • The ACLs become verylarge • Use groups (e.g.,Unix) ACLs andCapabilities CSE 120 – Lecture 12 – FileSystems
How do file systems use the disk to storefiles? • File systems define a block size (e.g.,4KB) • Disk space is allocated in granularity ofblocks • A “Master Block” determines location of rootdirectory • Always at a well-known disklocation • Often replicated across disk forreliability • A free map determines which blocks are free,allocated • Usually a bitmap, one bit per block on thedisk • Also stored on disk, cached in memory forperformance • Remaining disk blocks used to store files (anddirs) • There are many ways to dothis File SystemLayout CSE 120 – Lecture 12 – FileSystems
Files span multiple diskblocks • How do you find all of the blocks for afile? • Contiguousallocation • » Likememory • » Fast, simplifies directoryaccess • » Inflexible, causes fragmentation, needscompaction • Linkedstructure • » Each block points to the next, directory points to thefirst • » Good for sequential access, bad for allothers • Indexed structure (indirection,hierarchy) • » An “index block” contains pointers to many otherblocks • » Handles random better, still good forsequential • » May need multiple index blocks (linkedtogether) Disk LayoutStrategies CSE 120 – Lecture 12 – FileSystems
Unix inodes implement an indexed structure forfiles • Also store metadata info (protection, timestamps, length, refcount…) • Each inode contains 15 blockpointers • First 12 are direct blocks (e.g., 4 KBblocks) • Then single, double, and tripleindirect UnixInodes (Metadata) 0 1 … … 12 13 14 (1) (2) (3) … … CSE 120 – Lecture 12 – FileSystems
Unix inodes are notdirectories • Inodes describe where on the disk the blocks for a file areplaced • Directories are files, so inodes also describe where theblocks for directories are placed on thedisk • Directory entries map file names toinodes • To open “/one”, use Master Block to find inode for “/” ondisk • Open “/”, look for entry for“one” • This entry gives the disk block number for the inode for“one” • Read the inode for “one” intomemory • The inode says where first data block is ondisk • Read that block into memory to access the data in thefile Unix Inodes and Path Search CSE 120 – Lecture 12 – FileSystems
File BufferCache • Applications exhibit significant locality for readingand • writingfiles • Idea: Cache file blocks in memory to capturelocality • Called the file buffercache • Cache is system wide, used and shared by allprocesses • Reading from the cache makes a disk perform likememory • Even a small cache can be veryeffective • Issues • The file buffer cache competes with VM (tradeoffhere) • Like VM, it has limitedsize • Need replacement algorithms again (LRU usuallyused) CSE 120 – Lecture 12 – FileSystems
On a write, some applications assume thatdata • makes it through the buffer cache and onto thedisk • As a result, writes are often slow even withcaching • OSes typically do write backcaching • Maintain a queue of uncommittedblocks • Periodically flush the queue to disk (30 secondthreshold) • If blocks changed many times in 30 secs, only need oneI/O • If blocks deleted before 30 secs (e.g., /tmp), no I/Osneeded • Unreliable, butpractical • On a crash, all writes within last 30 secs arelost • Modern OSes do this by default; too slowotherwise • System calls (Unix: fsync) enable apps to force data todisk CachingWrites CSE 120 – Lecture 12 – FileSystems
Many file systems implement “readahead” • FS predicts that the process will request nextblock • FS goes ahead and requests it from thedisk • This can happen while the process is computing onprevious block • » Overlap I/O withexecution • When the process requests block, it will be incache • Compliments the disk cache, which also is doing readahead • For sequentially accessed files can be a bigwin • Unless blocks for the file are scattered across thedisk • File systems try to prevent that, though (duringallocation) ReadAhead CSE 120 – Lecture 12 – FileSystems
Files • Operations, accessmethods • Directories • Operations, using directories to do pathsearches • Sharing • Protection • ACLs vs.capabilities • File SystemLayouts • Unixinodes • File BufferCache • Strategies for handlingwrites • ReadAhead Summary CSE 120 – Lecture 12 – FileSystems
Nexttime… • Read Chapters 11.8,12.7 CSE 120 – Lecture 12 – FileSystems