  1. CSE 120 Principles of Operating Systems, Spring 2017. Lecture 15: IO and File Systems

  2. File Systems • First we’ll discuss properties of physical disks • Structure • Performance • Scheduling • Then we’ll discuss how we build file systems on them • Files • Directories • Sharing • Protection • File System Layouts • File Buffer Cache • Read Ahead CSE 120 – Lecture 12 – File Systems

  3. Disks and the OS • Disks are messy physical devices: • Errors, bad blocks, missed seeks, etc. • The job of the OS is to hide this mess from higher-level software • Low-level device control (initiate a disk read, etc.) • Higher-level abstractions (files, databases, etc.) • The OS may provide different levels of disk access to different clients • Physical disk (surface, cylinder, sector) • Logical disk (disk block #) • Logical file (file block, record, or byte #)

  4. Physical Disk Structure • Disk components • Platters • Surfaces • Tracks • Sectors • Cylinders • Arm • Heads [Figure: disk diagram labeling track, sector, arm, surface, cylinder, heads, platter]

  5. Disk Interaction • Specifying disk requests requires a lot of info: • Cylinder #, surface #, track #, sector #, transfer size… • Older disks required the OS to specify all of this • The OS needed to know all disk parameters • Modern disks are more complicated • Not all sectors are the same size, sectors are remapped, etc. • Current disks provide a higher-level interface (SCSI) • The disk exports its data as a logical array of blocks [0…N] » Disk maps logical blocks to cylinder/surface/track/sector • Only need to specify the logical block # to read/write • But now the disk parameters are hidden from the OS

  6. Modern Disk Specifications • Seagate Enterprise Performance 3.5" (server) • capacity: 600 GB • rotational speed: 15,000 RPM • sequential read performance: 233 MB/s (outer) – 160 MB/s (inner) • seek time (average): 2.0 ms • Seagate Barracuda 3.5" (workstation) • capacity: 3000 GB • rotational speed: 7,200 RPM • sequential read performance: 210 MB/s – 156 MB/s (inner) • seek time (average): 8.5 ms • Seagate Savvio 2.5" (smaller form factor) • capacity: 2000 GB • rotational speed: 7,200 RPM • sequential read performance: 135 MB/s (outer) – ? MB/s (inner) • seek time (average): 11 ms

  7. Disk Performance • Disk request performance depends upon three steps • Seek – moving the disk arm to the correct cylinder » Depends on how fast disk arm can move (increasing very slowly) • Rotation – waiting for the sector to rotate under the head » Depends on rotation rate of disk (increasing, but slowly) • Transfer – transferring data from surface into disk controller electronics, sending it back to the host » Depends on density (increasing quickly) • When the OS uses the disk, it tries to minimize the cost of all of these steps • Particularly seeks and rotation
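
The three steps above can be added up for a rough per-request estimate. The sketch below is only an approximation under stated assumptions: it assumes the average rotational delay is half a revolution and that 1 MB is 10^6 bytes, and the function name `access_time_ms` is made up for illustration. The sample figures (8.5 ms seek, 7,200 RPM, 156 MB/s) are the Barracuda numbers from the specifications slide.

```python
def access_time_ms(seek_ms, rpm, transfer_mb_per_s, request_bytes):
    """Estimate one random disk request as seek + rotation + transfer."""
    # Assumption: on average the target sector is half a revolution away.
    rotation_ms = 0.5 * (60_000.0 / rpm)
    # Time to move the requested bytes at the sustained media rate.
    transfer_ms = request_bytes / (transfer_mb_per_s * 1_000_000.0) * 1000.0
    return seek_ms + rotation_ms + transfer_ms


# A 4 KB random read on a 7,200 RPM disk with an 8.5 ms average seek.
total = access_time_ms(seek_ms=8.5, rpm=7200,
                       transfer_mb_per_s=156, request_bytes=4096)
print(f"{total:.2f} ms per 4 KB random read")
```

Under these assumptions the seek and rotation terms account for almost all of the time, which is consistent with the slide's point that the OS mainly tries to reduce seeks and rotation.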

  8. Solid State Disks • SSDs are a relatively new storage technology • Memory that does not require power to remember state • No physical moving parts, so faster than hard disks • No seek and no rotation overhead • But… more expensive, not as much capacity • Generally speaking, file systems have remained unchanged when using SSDs • Some optimizations no longer necessary (e.g., layout policies, disk head scheduling), but basically leave FS code as is • Initially, SSDs had the same disk interface (SATA) • Increasingly, SSDs used directly over the I/O bus (PCIe) » Much higher performance

  9. SSDs vs Disks • Seek and rotational delays affect disk performance • SSDs don’t suffer these artifacts • But SSDs do “wear leveling”, which can slow writes as compared to reads • When will SSDs replace disk drives? • In your laptop? Maybe already. • For your home PC, if you have one? Maybe already. • For massive bulk storage at Amazon, Google, etc.? When they are cheaper per byte • As part of the memory hierarchy at Amazon, Google, etc.? Already.

  10. Disk Scheduling • Because seeks are so expensive (milliseconds!), the OS tries to schedule disk requests that are queued waiting for the disk • FCFS (do nothing) » Reasonable when load is low » Long waiting times for long request queues • SSTF (shortest seek time first) » Minimize arm movement (seek time), maximize request rate » Favors middle blocks • SCAN (elevator) » Service requests in one direction until done, then reverse • C-SCAN » Like SCAN, but only go in one direction (typewriter)
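
The four policies above can be compared on a small request queue. This is a minimal sketch, not how any particular OS or disk implements scheduling: it assumes requests are identified by cylinder number, that SCAN and C-SCAN start by moving toward higher cylinders, and that the arm turns around at the last pending request rather than at the edge of the disk. The head position and queue are made-up sample values.

```python
def fcfs(head, requests):
    """Service requests in arrival order."""
    return list(requests)


def sstf(head, requests):
    """Repeatedly pick the pending request closest to the current head."""
    pending, order = list(requests), []
    while pending:
        nearest = min(pending, key=lambda cyl: abs(cyl - head))
        pending.remove(nearest)
        order.append(nearest)
        head = nearest
    return order


def scan(head, requests):
    """Sweep upward through pending requests, then reverse and sweep down."""
    up = sorted(cyl for cyl in requests if cyl >= head)
    down = sorted((cyl for cyl in requests if cyl < head), reverse=True)
    return up + down


def c_scan(head, requests):
    """Sweep upward only; after the last request, wrap to the lowest one."""
    up = sorted(cyl for cyl in requests if cyl >= head)
    wrapped = sorted(cyl for cyl in requests if cyl < head)
    return up + wrapped


def total_movement(head, order):
    """Total cylinders the arm travels to service requests in this order."""
    moved = 0
    for cyl in order:
        moved += abs(cyl - head)
        head = cyl
    return moved


HEAD = 53                                    # hypothetical starting cylinder
QUEUE = [98, 183, 37, 122, 14, 124, 65, 67]  # hypothetical pending requests

for policy in (fcfs, sstf, scan, c_scan):
    order = policy(HEAD, QUEUE)
    print(policy.__name__, order, total_movement(HEAD, order))
```

Note that `total_movement` counts the C-SCAN return sweep as arm travel; whether that sweep should be counted depends on the disk model being assumed.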

  11. Disk Scheduling (2) • In general, unless there are request queues, disk scheduling does not have much impact • Important for servers, less so for PCs • Modern disks often do the disk scheduling themselves • Disks know their layout better than OS, can optimize better • Ignores, undoes any scheduling done by OS

  12. File Systems • File systems: • Implement an abstraction (files) for secondary storage • Organize files logically (directories) • Permit sharing of data between processes, people, and machines • Protect data from unwanted access (security)

  13. Files • A file is data with some properties • Contents, size, owner, last read/write time, protection, etc. • A file can also have a type • Understood by the file system » Block, character, device, portal, link, etc. • Understood by other parts of the OS or runtime libraries » Executable, dll, source, object, text, etc. • A file’s type can be encoded in its name or contents • Windows encodes type in name » .com, .exe, .bat, .dll, .jpg, etc. • Unix encodes type in contents » Magic numbers, initial characters (e.g., #! for shell scripts)

  14. Basic File Operations • Unix • creat(name) • open(name, how) • read(fd, buf, len) • write(fd, buf, len) • sync(fd) • seek(fd, pos) • close(fd) • unlink(name) • Windows • CreateFile(name, CREATE) • CreateFile(name, OPEN) • ReadFile(handle, …) • WriteFile(handle, …) • FlushFileBuffers(handle, …) • SetFilePointer(handle, …) • CloseHandle(handle, …) • DeleteFile(name) • CopyFile(name) • MoveFile(name)
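
As an illustration of the Unix column, the sketch below drives the same sequence of operations through Python's `os` module, which wraps the underlying system calls (`os.fsync` and `os.lseek` stand in for the slide's `sync` and `seek`). The file name `demo.txt` and the helper `demo_file_ops` are made up for the example.

```python
import os
import tempfile


def demo_file_ops(directory):
    """Create, write, flush, seek, read, close, and delete one file."""
    path = os.path.join(directory, "demo.txt")           # hypothetical name
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)    # creat/open
    try:
        os.write(fd, b"hello, file system")              # write(fd, buf, len)
        os.fsync(fd)                                     # force data to disk
        os.lseek(fd, 7, os.SEEK_SET)                     # seek to byte offset 7
        data = os.read(fd, 4)                            # read(fd, buf, len)
    finally:
        os.close(fd)                                     # close(fd)
    os.unlink(path)                                      # unlink(name)
    return data, os.path.exists(path)


with tempfile.TemporaryDirectory() as tmp:
    data, still_there = demo_file_ops(tmp)
    print(data, still_there)
```

The file descriptor returned by `open` is what every later call uses; the name is needed again only for `unlink`.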

  15. File Access Methods • Some file systems provide different access methods that specify different ways for accessing data in a file • Sequential access – read bytes one at a time, in order • Direct access – random access given block/byte number • Record access – file is array of fixed- or variable-length records, read/written sequentially or randomly by record # • Indexed access – file system contains an index to a particular field of each record in a file, reads specify a value for that field and the system finds the record via the index (DBs) • What file access methods do Unix and Windows provide? • Older systems provide the more complicated methods

  16. Directories • Directories serve two purposes • For users, they provide a structured way to organize files • For the file system, they provide a convenient naming interface that allows the implementation to separate logical file organization from physical file placement on the disk • Most file systems support multi-level directories • Naming hierarchies (/, /usr, /usr/local/, …) • Most file systems support the notion of a current directory • Relative names specified with respect to current directory • Absolute names start from the root of directory tree

  17. Directory Internals • A directory is a list of entries • <name, location> • Name is just the name of the file or directory • Location depends upon how file is represented on disk • List is usually unordered (effectively random) • Entries usually sorted by program that reads directory • Directories typically stored in files • Only need to manage one kind of secondary storage unit

  18. Basic Directory Operations • Unix • Directories implemented in files • Use file ops to create dirs • C runtime library provides a higher-level abstraction for reading directories • opendir(name) • readdir(DIR) • seekdir(DIR) • closedir(DIR) • Windows • Explicit dir operations • CreateDirectory(name) • RemoveDirectory(name) • Very different method for reading directory entries • FindFirstFile(pattern) • FindNextFile()
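
A minimal sketch of reading a directory from a program, using Python's `os.scandir` as a stand-in for the open/read/close directory calls listed above. The entry names are made up, and the sorting is done by the reading program, since the order in which the file system returns entries is not guaranteed.

```python
import os
import tempfile


def list_directory(path):
    """Return (name, kind) pairs for a directory, sorted by the caller."""
    with os.scandir(path) as entries:   # open, iterate, and close the directory
        return sorted((entry.name, "dir" if entry.is_dir() else "file")
                      for entry in entries)


with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "sub"))          # explicit directory creation
    for name in ("b.txt", "a.txt"):              # hypothetical file names
        with open(os.path.join(root, name), "w") as f:
            f.write("x")
    listing = list_directory(root)
    print(listing)
```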

  19. Path Name Translation • Let’s say you want to open “/one/two/three” • What does the file system do? • Open directory “/” (well known, can always find) • Search for the entry “one”, get location of “one” (in dir entry) • Open directory “one”, search for “two”, get location of “two” • Open directory “two”, search for “three”, get location of “three” • Open file “three” • Systems spend a lot of time walking directory paths • This is why open is separate from read/write • OS will cache prefix lookups for performance » /a/b, /a/bb, /a/bbb, etc., all share “/a” prefix
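
The walk described above can be sketched with a toy in-memory tree. This is an illustration only: directories are modeled as nested Python dicts rather than on-disk structures, and the `lookup` function and its `cache` dictionary are hypothetical names standing in for the OS's prefix cache.

```python
# Toy directory tree: a dict is a directory, a string is file contents.
ROOT = {"one": {"two": {"three": "contents of three"}}}


def lookup(path, root=ROOT, cache=None):
    """Resolve an absolute path one component at a time.

    Returns the object found and the number of directory searches that
    were needed; prefixes already in `cache` cost no search.
    """
    cache = {} if cache is None else cache
    node, prefix, searched = root, "", 0
    for name in [part for part in path.split("/") if part]:
        prefix += "/" + name
        if prefix in cache:                 # prefix resolved earlier
            node = cache[prefix]
            continue
        if not isinstance(node, dict) or name not in node:
            raise FileNotFoundError(prefix)
        node = node[name]                   # search this directory for `name`
        searched += 1
        cache[prefix] = node
    return node, searched


prefix_cache = {}
first = lookup("/one/two/three", cache=prefix_cache)
second = lookup("/one/two/three", cache=prefix_cache)
print(first, second)
```

The first lookup searches three directories; the repeat lookup finds every prefix in the cache and searches none.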

  20. File Sharing • File sharing has been around since timesharing • Easy to do on a single machine • PCs, workstations, and networks get us there (mostly) • File sharing is important for getting work done • Basis for communication and synchronization • Two key issues when sharing files • Semantics of concurrent access » What happens when one process reads while another writes? » What happens when two processes open a file for writing? » What are we going to use to coordinate? • Protection

  21. Protection • File systems implement a protection system • Who can access a file • How they can access it • More generally… • Objects are “what”, subjects are “who”, actions are “how” • A protection system dictates whether a given action performed by a given subject on a given object should be allowed • You can read and/or write your files, but others cannot • You can read “/etc/motd”, but you cannot write it

  22. Representing Protection • Access Control Lists (ACL) • For each object, maintain a list of subjects and their permitted actions • Capabilities • For each subject, maintain a list of objects and their permitted actions [Figure: table with axes Objects and Subjects, annotated with ACL and Capability]

  23. Setuid • Show setuid bit • Root/sudo/administrator • Pic of ls -l

  24. ACLs and Capabilities • The approaches differ only in how the table is represented • What approach does Unix use in the FS? • Capabilities are easier to transfer • They are like keys, can hand off, does not depend on subject • In practice, ACLs are easier to manage • Object-centric, easy to grant, revoke • To revoke capabilities, have to keep track of all subjects that have the capability – a challenging problem • ACLs have a problem when objects are heavily shared • The ACLs become very large • Use groups (e.g., Unix)
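
To make "differ only in how the table is represented" concrete, the sketch below stores one small protection table and regroups it both ways. The subjects, the home-directory path, and all function names are made up for illustration; only “/etc/motd” comes from the slides.

```python
from collections import defaultdict

# Protection table: (subject, object) -> permitted actions. Sample data only.
MATRIX = {
    ("alice", "/home/alice/notes"): {"read", "write"},
    ("alice", "/etc/motd"): {"read"},
    ("root", "/etc/motd"): {"read", "write"},
}


def as_acls(matrix):
    """Group by object: each object lists its subjects and their actions."""
    acls = defaultdict(dict)
    for (subject, obj), actions in matrix.items():
        acls[obj][subject] = actions
    return dict(acls)


def as_capabilities(matrix):
    """Group by subject: each subject lists its objects and their actions."""
    caps = defaultdict(dict)
    for (subject, obj), actions in matrix.items():
        caps[subject][obj] = actions
    return dict(caps)


def allowed(matrix, subject, obj, action):
    """The protection decision is the same whichever grouping is stored."""
    return action in matrix.get((subject, obj), set())


print(as_acls(MATRIX)["/etc/motd"])
print(as_capabilities(MATRIX)["alice"])
```

Revoking all access to “/etc/motd” touches one entry in the ACL grouping but requires visiting every subject in the capability grouping, which is the management tradeoff described above.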

  25. File System Layout • How do file systems use the disk to store files? • File systems define a block size (e.g., 4 KB) • Disk space is allocated in granularity of blocks • A “Master Block” determines location of root directory • Always at a well-known disk location • Often replicated across disk for reliability • A free map determines which blocks are free, allocated • Usually a bitmap, one bit per block on the disk • Also stored on disk, cached in memory for performance • Remaining disk blocks used to store files (and dirs) • There are many ways to do this
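
A minimal sketch of the bitmap free map described above, with one bit per block. The class name `FreeMap` and the first-fit allocation policy are assumptions for illustration; a real file system would also persist the bitmap to disk and choose blocks with layout in mind.

```python
class FreeMap:
    """In-memory bitmap: bit i is 1 when block i is allocated."""

    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self.bits = bytearray((num_blocks + 7) // 8)   # one bit per block

    def is_allocated(self, block):
        return bool(self.bits[block // 8] & (1 << (block % 8)))

    def allocate(self):
        """Return the lowest-numbered free block and mark it allocated."""
        for block in range(self.num_blocks):
            if not self.is_allocated(block):
                self.bits[block // 8] |= 1 << (block % 8)
                return block
        raise OSError("no free blocks")

    def free(self, block):
        """Clear the block's bit so it can be allocated again."""
        self.bits[block // 8] &= ~(1 << (block % 8)) & 0xFF


free_map = FreeMap(10)
first_three = [free_map.allocate() for _ in range(3)]
free_map.free(1)
reused = free_map.allocate()
print(first_three, reused, len(free_map.bits))
```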

  26. Disk Layout Strategies • Files span multiple disk blocks • How do you find all of the blocks for a file? • Contiguous allocation » Like memory » Fast, simplifies directory access » Inflexible, causes fragmentation, needs compaction • Linked structure » Each block points to the next, directory points to the first » Good for sequential access, bad for all others • Indexed structure (indirection, hierarchy) » An “index block” contains pointers to many other blocks » Handles random better, still good for sequential » May need multiple index blocks (linked together)

  27. Unix Inodes • Unix inodes implement an indexed structure for files • Also store metadata info (protection, timestamps, length, ref count…) • Each inode contains 15 block pointers • First 12 are direct blocks (e.g., 4 KB blocks) • Then single, double, and triple indirect [Figure: inode with metadata and block pointers 0 through 14, with single (1), double (2), and triple (3) indirect blocks]
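
The pointer layout above determines how large a file can grow and how many levels of indirection a given file block needs. The sketch below works this out for 4 KB blocks; the 4-byte pointer size is an assumption not stated on the slide, and the function names are made up for illustration.

```python
BLOCK_SIZE = 4096                               # 4 KB blocks, as on the slide
POINTER_SIZE = 4                                # assumption: 4-byte block pointers
PTRS_PER_BLOCK = BLOCK_SIZE // POINTER_SIZE     # pointers per indirect block
NUM_DIRECT = 12                                 # direct pointers in the inode


def max_file_blocks():
    """Blocks reachable via direct, single, double, and triple indirect."""
    return (NUM_DIRECT + PTRS_PER_BLOCK
            + PTRS_PER_BLOCK ** 2 + PTRS_PER_BLOCK ** 3)


def pointer_level(file_block):
    """Return which class of inode pointer covers a file block number."""
    if file_block < NUM_DIRECT:
        return "direct"
    file_block -= NUM_DIRECT
    for level, name in enumerate(("single", "double", "triple"), start=1):
        span = PTRS_PER_BLOCK ** level          # blocks covered at this level
        if file_block < span:
            return name
        file_block -= span
    raise ValueError("beyond maximum file size")


print(max_file_blocks() * BLOCK_SIZE, "bytes maximum under these assumptions")
for block in (0, 11, 12, 1036, 1049612):
    print(block, pointer_level(block))
```

With these numbers the first 48 KB of a file need no indirect block at all, so small files are reached with the fewest disk reads.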

  28. Unix Inodes and Path Search • Unix inodes are not directories • Inodes describe where on the disk the blocks for a file are placed • Directories are files, so inodes also describe where the blocks for directories are placed on the disk • Directory entries map file names to inodes • To open “/one”, use Master Block to find inode for “/” on disk • Open “/”, look for entry for “one” • This entry gives the disk block number for the inode for “one” • Read the inode for “one” into memory • The inode says where first data block is on disk • Read that block into memory to access the data in the file

  29. File Buffer Cache • Applications exhibit significant locality for reading and writing files • Idea: Cache file blocks in memory to capture locality • Called the file buffer cache • Cache is system wide, used and shared by all processes • Reading from the cache makes a disk perform like memory • Even a small cache can be very effective • Issues • The file buffer cache competes with VM (tradeoff here) • Like VM, it has limited size • Need replacement algorithms again (LRU usually used)
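
A minimal sketch of a fixed-size block cache with LRU replacement, as mentioned above. The class name `BufferCache` and the `read_block` callback (standing in for a real disk read) are hypothetical; a real buffer cache also handles dirty blocks, locking, and sharing across processes.

```python
from collections import OrderedDict


class BufferCache:
    """Fixed-size cache of disk blocks with least-recently-used eviction."""

    def __init__(self, capacity, read_block):
        self.capacity = capacity
        self.read_block = read_block      # called only on a miss
        self.blocks = OrderedDict()       # oldest entry first
        self.hits = 0
        self.misses = 0

    def get(self, block_no):
        if block_no in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(block_no)    # mark as most recently used
            return self.blocks[block_no]
        self.misses += 1
        data = self.read_block(block_no)         # go to the "disk"
        self.blocks[block_no] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)      # evict least recently used
        return data


cache = BufferCache(capacity=2, read_block=lambda n: f"block-{n}")
for block_no in (1, 2, 1, 3, 2):
    cache.get(block_no)
print(cache.hits, cache.misses, list(cache.blocks))
```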

  30. Caching Writes • On a write, some applications assume that data makes it through the buffer cache and onto the disk • As a result, writes are often slow even with caching • OSes typically do write back caching • Maintain a queue of uncommitted blocks • Periodically flush the queue to disk (30 second threshold) • If blocks changed many times in 30 secs, only need one I/O • If blocks deleted before 30 secs (e.g., /tmp), no I/Os needed • Unreliable, but practical • On a crash, all writes within last 30 secs are lost • Modern OSes do this by default; too slow otherwise • System calls (Unix: fsync) enable apps to force data to disk
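
The savings from write-back caching can be sketched with a queue of dirty blocks that is flushed in one pass. This is a toy model under stated assumptions: the "disk" is a plain dict, `flush()` stands in for both the periodic timer and an explicit `fsync`, and the class and method names are made up for illustration.

```python
class WriteBackCache:
    """Buffers block writes in memory and writes them out only on flush."""

    def __init__(self, disk):
        self.disk = disk        # stand-in for the device: block_no -> data
        self.dirty = {}         # uncommitted blocks, latest contents only
        self.io_count = 0       # number of writes that reached the disk

    def write(self, block_no, data):
        self.dirty[block_no] = data          # overwrites any earlier version

    def discard(self, block_no):
        self.dirty.pop(block_no, None)       # e.g., file deleted before flush

    def flush(self):
        for block_no, data in sorted(self.dirty.items()):
            self.disk[block_no] = data
            self.io_count += 1
        self.dirty.clear()


disk = {}
wb = WriteBackCache(disk)
for version in (b"a", b"b", b"c"):
    wb.write(5, version)        # same block changed three times
wb.write(9, b"tmp")
wb.discard(9)                   # short-lived block never reaches the disk
wb.flush()
print(disk, wb.io_count)
```

Anything still in `dirty` when the machine crashes is lost, which is the reliability cost noted above.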

  31. Read Ahead • Many file systems implement “read ahead” • FS predicts that the process will request next block • FS goes ahead and requests it from the disk • This can happen while the process is computing on previous block » Overlap I/O with execution • When the process requests block, it will be in cache • Complements the disk cache, which also is doing read ahead • For sequentially accessed files can be a big win • Unless blocks for the file are scattered across the disk • File systems try to prevent that, though (during allocation)

  32. Summary • Files • Operations, access methods • Directories • Operations, using directories to do path searches • Sharing • Protection • ACLs vs. capabilities • File System Layouts • Unix inodes • File Buffer Cache • Strategies for handling writes • Read Ahead

  33. Next time… • Read Chapters 11.8, 12.7

