1 / 104

Introduction to File Systems

Introduction to File Systems. File System Issues. What is the role of files? What is the file abstraction? File naming. How to find the file we want? Sharing files. Controlling access to files.

Download Presentation

Introduction to File Systems

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Introduction to File Systems

  2. File System Issues • What is the role of files? What is the file abstraction? • File naming. How to find the file we want?Sharing files. Controlling access to files. • Performance issues - how to deal with the bottleneck of disks? What is the “right” way to optimize file access?

  3. Role of Files • Persistence - long-lived - data for posterity • non-volatile storage media • semantically meaningful (memorable) names

  4. Abstractions User view Addressbook, record for Duke CPS Application addrfile ->fid, byte range* fid File System bytes block# device, block # Disk Subsystem surface, cylinder, sector

  5. *File Abstractions • UNIX-like files • Sequence of bytes • Operations: open (create), close, read, write, seek • Memory mapped files • Sequence of bytes • Mapped into address space • Page fault mechanism does data transfer • Named, Possibly typed

  6. User grp others rwx rwx rwx 111 100 000 O_RDONLYO_WRONLY O_RDWR O_CREAT O_APPEND ... Relative to beginning, current position, end of file Unix File Syscalls int fd, num, success, bufsize; char data[bufsize]; long offset, pos; fd = open (filename, mode [,permissions]); success = close (fd); pos = lseek (fd, offset, mode); num = read (fd, data, bufsize); num = write (fd, data, bufsize);

  7. R, W, X, none Shared, Private, Fixed, Noreserve Memory Mapped Files fd = open (somefile, consistent_mode); pa = mmap(addr, len, prot, flags, fd, offset); fd + offset pa len len VAS Reading performed by Load instr.

  8. Nachos File Syscalls/Operations Create(“zot”); OpenFileId fd; fd = Open(“zot”); Close(fd); char data[bufsize]; Write(data, count, fd); Read(data, count, fd); Limitations: 1. small, fixed-size files and directories 2. single disk with a single directory 3. stream files only: no seek syscall 4. file size is specified at creation time 5. no access control, etc.

  9. Functions of File System • Determine layout of files and metadata on disk in terms of blocks. Disk block allocation. Bad blocks. • Handle read and write system calls • Initiate I/O operations for movement of blocks to/from disk. • Maintain buffer cache

  10. r-w pos, mode r-w pos, mode pos pos File System Data Structures System-wide Open file table System-wide File descriptor table Process descriptor in-memory copy of inode ptr to on-disk inode File data stdin stdout per-process file ptr array stderr

  11. Data Block Addr ... File Attributes ... ... ... ... ... UNIX Inodes 3 3 3 3 Data blocks Block Addr 1 2 2 ... Decoupling meta-data from directory entries 1 2 2 1

  12. File Sharing Between Parent/Child main(int argc, char *argv[]) { char c; int fdrd, fdwt, fdpriv; if ((fdrd = open(argv[1], O_RDONLY)) == -1) exit(1); if ((fdwt = creat([argv[2], 0666)) == -1) exit(1); fork(); if ((fdpriv = open([argv[3], O_RDONLY)) == -1) exit(1); while (TRUE) { if (read(fdrd, &c, 1) != 1) exit(0); write(fdwt, &c, 1); } }

  13. r-w pos, mode r-w pos, mode forked process’s Process descriptor openafterfork File System Data Structures System-wide Open file table System-wide File descriptor table Process descriptor in-memory copy of inode ptr to on-disk inode stdin stdout per-process file ptr array stderr

  14. user ID process ID process group ID parent PID signal state siblings children user ID process ID process group ID parent PID signal state siblings children Sharing Open File Instances shared seek offset in shared file table entry parent shared file (inode or vnode) child system open file table process file descriptors process objects

  15. Directory Subsystem • Map filenames to fileids: open (create) syscall. Create kernel data structures. • Maintain naming structure (unlink, mkdir, rmdir)

  16. File Attributes File Attributes Directory node Directory node File Attributes current inode# Proj inode# File Attributes Proj Directory node proj3 inode# Pathname Resolution cps110 “cps110/current/Proj/proj3” current proj3 data file index node of wd

  17. open(/foo/bar/file);read(fd,buf,sizeof(buf)); read(fd,buf,sizeof(buf)); read(fd,buf,sizeof(buf)); close(fd); read(rootdir); read(inode); read(foo);read(inode); read(bar); read(inode); read(filedatablock); read(rootdir); read(inode); read(foo);read(inode); read(bar); read(inode); read(fd,buf,sizeof(buf)); read(fd,buf,sizeof(buf)); read(fd,buf,sizeof(buf)); Access Patterns Along the Way Proc DirSubsys DeviceSubsys File Sys

  18. Functions of Device Subsystem In general, deal with device characteristics • Translate block numbers (the abstraction of device shown to file system) to physical disk addresses. Device specific intelligent placement of blocks. (subject to change with upgrades in technology) • Schedule (reorder?) disk operations

  19. What to do about Disks? • Avoid them altogether! Caching • Disk scheduling • Idea is to reorder outstanding requests to minimize seeks. • Layout on disk • Placement to minimize disk overhead • Build a better disk (or substitute) • Example: RAID

  20. Disk Scheduling for Requests • Minimize seek AND rotational delay • Maintain a queue of requests on EACH cylinder • Sort each queue by sector (rotational position) • At one cylinder process request based on rotational position to minimize rotational delay • Move from cylinder when all requests on old cylinder are processed, in old direction, until no request remain in that direction. • “Elevator” algorithm

  21. Redundant Array of Inexpensive Disks • Parallel seeks and data transmission • Each record is “smeared” across disks so that some bytes come from each disk • All disks positioned at same cylinder, track • Add extra redundant disk, for error check • Reading one file is much faster, if no seek needed • Error correction possible, so re-reading on error not needed

  22. File Naming

  23. Goals of File Naming • Foremost function - to find files (e.g., in open() ), Map file name to file object. • To store meta-data about files. • To allow users to choose their own file names without undue name conflict problems. • To allow sharing. • Convenience: short names, groupings. • To avoid implementation complications

  24. Possible Naming Structures • Flat name space - 1 system-wide table, • Unique naming with multiple users is hard.Name conflicts. • Easy sharing, need for protection • Per-user name space • Protection by isolation, no sharing • Easy to avoid name conflicts • Associate process with directory to use to resolve names, allow user to change this “current working directory” (cd)

  25. Naming Structures Naming network • Component names - pathnames • Absolute pathnames - from a designated root • Relative pathnames - from a working directory • Each name carries how to resolve it. • Could allow defining short names for files anywhere in the network. This might produce cycles, but it makes naming things more convenient.

  26. Full Naming Network* Terry A • /Jamie/lynn/project/D • /Jamie/d • /Jamie/lynn/jam/proj1/C • (relative from Terry)A • (relative from Jamie)d grp1 root Lynn TA Jamie lynn project jam B proj1 d D E C D project * not Unix

  27. Full Naming Network* Terry A • /Jamie/lynn/project/D • /Jamie/d • /Jamie/lynn/jam/proj1/C • (relative from Terry)A • (relative from Jamie)d grp1 root Lynn TA Jamie lynn project jam B proj1 d D E C D project Why? * Unix

  28. File size File type Protection - access control information History: creation time, last modification,last access. Location of file - which device Location of individual blocks of the file on disk. Owner of file Group(s) of users associated with file Meta-Data

  29. Restricting to a Hierarchy • Problems with full naming network • What does it mean to “delete” a file? • Meta-data interpretation

  30. Operations on Directories (UNIX) • link (oldpathname, newpathname) - make entry pointing to file • unlink (filename) - remove entry pointing to file • mknod (dirname, type, device) - used (e.g. by mkdir utility function) to create a directory (or named pipe, or special file) • getdents(fd, buf, structsize) - reads dir entries

  31. Reclaiming Storage Terry A grp1 X root Jo X TA Series of unlinks X Jamie joe project jam B proj1 d D E What should be dealloc? C D project

  32. Reclaiming Storage Terry A grp1 X root Jo X TA Series of unlinks X Jamie joe project jam B proj1 d D E C D project

  33. Reference Counting? Terry A 2 grp1 X root Jo X TA 3 1 Series of unlinks X Jamie joe 1 2 project jam B proj1 d D E 2 C D project

  34. Garbage Collection * Terry Phase 1 marking A * Phase 2 collect grp1 X root * Jo X TA Series of unlinks X Jamie joe project jam B proj1 d D E C D project

  35. Restricting to a Hierarchy • Problems with full naming network • What does it mean to “delete” a file? • Meta-data interpretation • Eliminating cycles • allows use of reference counts for reclaiming file space • avoids garbage collection

  36. Given: Naming Hierarchy (because of implementation issues) / bin etc tmp usr vmunix ls sh project users packages mount point (volume root) leaf tex emacs

  37. A Typical Unix File Tree Each volume is a set of directories and files; a host’s file tree is the set of directories and files visible to processes on a given host. / File trees are built by grafting volumes from different devices or from network servers. bin etc tmp usr vmunix In Unix, the graft operation is the privileged mount system call, and each volume is a filesystem. ls sh project users packages mount point coveredDir • mount (coveredDir, volume) • coveredDir: directory pathname • volume: device • volume root contents become visible at pathname coveredDir

  38. A Typical Unix File Tree Each volume is a set of directories and files; a host’s file tree is the set of directories and files visible to processes on a given host. / File trees are built by grafting volumes from different devices or from network servers. bin etc tmp usr vmunix In Unix, the graft operation is the privileged mount system call, and each volume is a filesystem. ls sh project users packages mount point (volume root) coveredDir mount (coveredDir, volume) tex emacs /usr/project/packages/coveredDir/emacs

  39. A Typical Unix File Tree Each volume is a set of directories and files; a host’s file tree is the set of directories and files visible to processes on a given host. / File trees are built by grafting volumes from different devices or from network servers. bin etc tmp usr vmunix In Unix, the graft operation is the privileged mount system call, and each volume is a filesystem. ls sh project users packages mount point • mount (coveredDir, volume) • coveredDir: directory pathname • volume: device specifier or network volume • volume root contents become visible at pathname coveredDir (volume root) tex emacs /usr/project/packages/coveredDir/emacs

  40. Reclaiming Convenience • Symbolic links - indirect filesfilename maps, not to file object, but to another pathname • allows short aliases • slightly different semantics • Search path rules

  41. directory A directory B wind: 18 0 0 inode link count = 2 sleet: 48 rain: 32 hail: 48 inode 48 Unix File Naming (Hard Links) A Unix file may have multiple names. Each directory entry naming the file is called a hard link. Each inode contains a reference count showing how many hard links name it. link system call link (existing name, new name) create a new name for an existing file increment inode link count unlink system call (“remove”) unlink(name) destroy directory entry decrement inode link count if count = 0 and file is not in active use free blocks (recursively) and on-disk inode

  42. wind: 18 0 0 directory A directory B sleet: 67 rain: 32 hail: 48 inode link count = 1 ../A/hail/0 inode 48 inode 67 Unix Symbolic (Soft) Links • Unix files may also be named by symbolic (soft) links. • A soft link is a file containing a pathname of some other file. symlink system call symlink (existing name, new name) allocate a new file (inode) with type symlink initialize file contents with existing name create directory entry for new file with new name The target of the link may be removed at any time, leaving a dangling reference. How should the kernel handle recursive soft links? Convenience, but not performance!

  43. Soft vs. Hard Links What’s the difference in behavior?

  44. Soft vs. Hard Links What’s the difference in behavior? / Lynn Terry Jamie

  45. Soft vs. Hard Links What’s the difference in behavior? / Lynn Terry Jamie X

  46. X Soft vs. Hard Links What’s the difference in behavior? / Lynn Terry Jamie

  47. File Structure

  48. After Resolving Long PathnamesOPEN(“/usr/faculty/carla/classes/cps110/spring02/lectures/lecture13.ppt”,…)Finally Arrive at File • What do users seem to want from the file abstraction? • What do these usage patterns mean for file structure and implementation decisions? • What operations should be optimized 1st? • How should files be structured? • Is there temporal locality in file usage? • How long do files really live?

  49. File System design and impl Usage patterns observed today Know your Workload! • File usage patterns should influence design decisions. Do things differently depending: • How large are most files? How long-lived?Read vs. write activity. Shared often? • Different levels “see” a different workload. • Feedback loop

  50. Generalizations from UNIX Workloads • Standard Disclaimers that you can’t generalize…but anyway… • Most files are small (fit into one disk block) although most bytes are transferred from longer files. • Most opens are for read mode, most bytes transferred are by read operations • Accesses tend to be sequential and 100%

More Related