Chapter 10.2: File-System Interface
Chapter 10: File-System Interface • Chapter 10.1 • File Concept • Access Methods • Chapter 10.2 • Directory Structure - continued • File-System Mounting • Protection
Directories • Systems may have zero or more file systems, and these may be of various types used to manage data. • File systems themselves may consist of millions of files, organized (or not well organized) in many ways. • All files must be managed and organized, as files constitute a major component of any computing system. • For files that are organized (again, they don’t have to be…), the principal organizing structure is the directory. • But there are many different directory structures used to organize and manage files. • These various directories can also contain different data items. • We will look at the key ways in which directories are organized.
Directories - 2 • While a disk may certainly be dedicated to one file system, it is frequently the case that we have multiple file systems on a single disk. • These can be organized in many ways and named in many ways. • Disks themselves may be partitioned, and a partition may be used as a ‘raw disk’ (with no file system) or as a ‘regular’ formatted disk, etc. • Disks can be sliced and diced by manufacturers and vendors in many ways. • For the time being, refer to a storage device holding a file system as a volume. • A volume may be thought of as a virtual disk, because volumes can actually span physical devices. • A disk can store not only data files, program files, and directories (all with a variety of formats), but also a host of other storable items, such as other operating systems. • Each volume containing a file system also holds information about the files in it, kept in a device directory or Volume Table of Contents (VTOC). • Simply refer to these structures as ‘directories.’
Directory Structure • A directory can be organized in various ways. • A directory may be considered a table mapping file names to specific files. • A directory may be considered a collection of nodes containing information about all files; that is, the directory entry not only points to a file but also contains much information about the file. [Figure: a directory whose entries point to files F1, F2, F3, F4, …, Fn.] Both the directory structure and the files reside on disk. Backups of these two structures are often kept on magnetic tape.
Operations Performed on a Directory • Search for a file • Given a file name, we need to be able to search the directory to find the file. • Create a file; delete a file. • We need to be able to create and delete files on disk and hence add or remove the corresponding entry in the directory. • List a directory • We need to be able to list the contents of a directory and see the characteristics of the files it contains. • Rename a file • We often need to rename a file; since a full path name reflects where a file sits in the directory structure, renaming may also change its position. • Traverse the file system • Here we want to be able to access every directory and every file within the directory structure. • We want to be able to do all this very quickly!
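To make these operations concrete, here is a minimal sketch that models one flat directory as a Python dict from file name to an entry record. The `DirEntry` fields and the `Directory` class are hypothetical illustrations, not any real operating system’s structures; with a single flat directory, traversal is just iteration over the table.

```python
from dataclasses import dataclass


@dataclass
class DirEntry:
    """Hypothetical directory entry: a file name plus a few attributes."""
    name: str
    size: int = 0
    start_block: int = 0  # illustrative: where the file's data begins on disk


class Directory:
    """One directory modeled as a table mapping file names to entries."""

    def __init__(self):
        self.entries = {}

    def search(self, name):
        """Return the entry for name, or None if the directory has no such file."""
        return self.entries.get(name)

    def create(self, name, size=0, start_block=0):
        """Add a new entry; names must be unique within one directory."""
        if name in self.entries:
            raise FileExistsError(name)
        self.entries[name] = DirEntry(name, size, start_block)

    def delete(self, name):
        """Remove the entry (KeyError if absent); a real OS would also free the blocks."""
        del self.entries[name]

    def list_entries(self):
        """Return (name, size) pairs sorted by name."""
        return [(e.name, e.size) for e in sorted(self.entries.values(), key=lambda e: e.name)]

    def rename(self, old, new):
        """Move the entry for old to the key new, keeping its attributes."""
        if new in self.entries:
            raise FileExistsError(new)
        entry = self.entries.pop(old)
        entry.name = new
        self.entries[new] = entry


if __name__ == "__main__":
    d = Directory()
    d.create("a.txt", size=10)
    d.create("b.txt", size=20)
    d.rename("a.txt", "c.txt")
    print(d.list_entries())  # [('b.txt', 20), ('c.txt', 10)]
```

A dict gives fast lookup by name; the point here is only the mapping from names to entries and the operations on it.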
Single-Level Directory • A single directory for all users; the simplest format. All files are in the same directory. Problems: • Files must have unique names. This is very difficult (not practical) for multiple users sharing the same directory. • It is not uncommon for a single user to have hundreds of files on a single computing system, and keeping track of that many names in one directory is hard even for one user. (I know that I do on my local Linux machine!)
Two-Level Directory • Here, we have a separate directory for each user. • A master file directory (MFD) is indexed by user name or account number, and each entry points to the user file directory (UFD) for that user. • When a user creates a file, the OS searches only that user’s UFD to ensure the name is unique within it, so different users may have files with the same name. • Creating a new UFD normally requires a system administrator. • Every file has a path name (user name plus file name) that uniquely defines and locates it. • Other systems also require a volume, as in C:\mydir\pgr1.java.
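A minimal sketch of the two-level scheme, assuming an in-memory model: the hypothetical `TwoLevelDirectory` class keeps the MFD as a dict from user name to that user’s UFD, and each UFD is a dict from file name to file contents. Path names take the form `/user/file`.

```python
class TwoLevelDirectory:
    """Hypothetical two-level directory.

    The master file directory (MFD) maps each user name to that user's
    file directory (UFD); each UFD maps file names to file contents.
    """

    def __init__(self):
        self.mfd = {}

    def add_user(self, user):
        """Create a new UFD (normally an administrator's job)."""
        if user in self.mfd:
            raise FileExistsError(user)
        self.mfd[user] = {}

    def create(self, user, name, data=""):
        """Create a file; only this user's UFD is checked for a duplicate name."""
        ufd = self.mfd[user]
        if name in ufd:
            raise FileExistsError(f"/{user}/{name}")
        ufd[name] = data

    def lookup(self, path):
        """Resolve a path name of the form /user/file."""
        _, user, name = path.split("/")
        return self.mfd[user][name]


if __name__ == "__main__":
    fs = TwoLevelDirectory()
    for user in ("alice", "bob"):
        fs.add_user(user)
    fs.create("alice", "test.c", "alice's version")
    fs.create("bob", "test.c", "bob's version")  # same name, different UFD: allowed
    print(fs.lookup("/bob/test.c"))  # bob's version
```

In a single-level directory the second `test.c` would collide; here the user name in the path keeps the two files apart.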
Two-Level Directory • Important to note: system programs, such as loaders, linkers, assemblers, compilers, and various other ‘commands,’ are also defined as files; when we invoke one, the file is loaded and executed. • e.g., gcc pgm1.c • This invokes the compiler and passes a file name as a parameter. • But where is gcc? • Search path: many commonly used files, such as system files, are put in a special directory for system files. • The user’s directory is searched first; if the file is not found there, this system directory is searched. • The sequence of directories searched when a file is named is called a search path, and it can contain many fully specified directories. • Both Unix and Windows machines use this approach.
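The search-path lookup can be sketched as a loop over an ordered list of directories that stops at the first match. This assumes a hypothetical in-memory file system (a dict from directory path to the set of names it holds) and follows the order described above: the user’s directory first, then the system directory.

```python
def resolve_command(name, search_path, fs):
    """Return the full path of name in the first directory on search_path that has it.

    fs is a hypothetical in-memory file system: a dict mapping each directory
    path to the set of file names it contains. Returns None if nothing matches.
    """
    for directory in search_path:
        if name in fs.get(directory, set()):
            return f"{directory}/{name}"
    return None


if __name__ == "__main__":
    fs = {
        "/home/alice": {"pgm1.c", "notes.txt"},
        "/usr/bin": {"gcc", "ls"},  # the system directory
    }
    search_path = ["/home/alice", "/usr/bin"]  # user's directory first, then the system's
    print(resolve_command("gcc", search_path, fs))     # /usr/bin/gcc
    print(resolve_command("pgm1.c", search_path, fs))  # /home/alice/pgm1.c
    print(resolve_command("javac", search_path, fs))   # None
```

On a real system, Python’s `shutil.which(cmd)` performs this lookup against the PATH environment variable and also checks that the file is executable.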
Tree-Structured Directories • So we’ve seen a two-level directory. • The natural extension of a two-level directory is a tree (an inverted tree) of arbitrary height. • A tree, by definition, has one root, and, because it is a tree (not a general graph), it supports only a single path to each item. • At each level, we have files and/or subdirectories leading to lower levels. • Continue to recognize that a directory is itself simply a file, but directories are used for special purposes and are organized and managed differently than a standard data file, as we shall see.
Tree-Structured Directories - 2 • Running processes: each running process has a ‘current directory.’ • When a running process references a file, the OS searches the current directory to locate it. • If the desired item is NOT in the current directory, then the user must either specify a path name or change the current directory. • In Unix / Linux, the current directory is indicated with a dot (.). • Typically, when one logs onto a system, one is placed in a login shell whose current directory is the user’s home directory. • The shell looks in this directory for information identifying and configuring this user, such as a profile file… • You can edit your profile in various ways: • Easiest is $ pico .profile if you don’t mind ‘pico.’ • Notice the dot (.) in front of profile; a leading dot hides the file from a plain ls listing. You can set search PATHs in here…
Tree-Structured Directories – Path Names • Path names can be either absolute or relative. • An absolute path name starts at the root directory and follows a path ‘down’ to the desired file, naming each directory and subdirectory en route to that item. • A relative path name defines a path starting from the current directory. • Of course, we can change the current directory to whatever we want whenever we want. • $ cd .. moves up one level; $ pwd prints your working directory, in other words, where you are ‘at’ in the directory structure. • Example: $ cd nextdirdown <enter> moves down into the subdirectory nextdirdown of the current directory.
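Absolute versus relative resolution, along with `.`, `..`, `cd`, and `pwd`, can be sketched over a hypothetical in-memory tree. The `Node`, `cd`, and `pwd` names below are illustrative stand-ins, not the real shell commands: `cd` starts at the root for an absolute path and at the current directory otherwise, and `pwd` rebuilds the absolute path name by walking up parent pointers.

```python
class Node:
    """A directory in a hypothetical in-memory tree."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent  # None only for the root
        self.children = {}

    def mkdir(self, name):
        child = Node(name, self)
        self.children[name] = child
        return child


def cd(root, cwd, path):
    """Return the directory named by path: absolute paths start at root, relative at cwd."""
    node = root if path.startswith("/") else cwd
    for part in path.split("/"):
        if part in ("", "."):
            continue  # empty parts come from leading or repeated slashes; "." is "here"
        if part == "..":
            node = node.parent or node  # ".." at the root stays at the root
        else:
            node = node.children[part]  # KeyError if there is no such subdirectory
    return node


def pwd(node):
    """Build the absolute path name of node by walking up to the root."""
    parts = []
    while node.parent is not None:
        parts.append(node.name)
        node = node.parent
    return "/" + "/".join(reversed(parts))


if __name__ == "__main__":
    root = Node("")
    mail = root.mkdir("spell").mkdir("mail")
    mail.mkdir("prt")
    cwd = cd(root, root, "/spell/mail")  # absolute
    print(pwd(cd(root, cwd, "prt")))      # relative: /spell/mail/prt
    print(pwd(cd(root, cwd, "..")))       # up one level: /spell
```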
Tree-Structured Directories In the tree-structured directory above, if the current directory is root/spell/mail, then the relative path name prt/first refers to the same file as the absolute path name root/spell/mail/prt/first. Note that root, spell, mail, and prt are directories; first is a file. Of course, as users, we can create directories and subdirectories to organize our files in any way we please.
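The same example can be checked with Python’s standard `posixpath` module, writing the slide’s root as `/`. `join` attaches the relative path to the current directory, `normpath` collapses `.` and `..`, and `relpath` goes back the other way.

```python
import posixpath

cwd = "/spell/mail"      # the slide's root/spell/mail, writing the root as "/"
relative = "prt/first"

absolute = posixpath.join(cwd, relative)
print(absolute)                                  # /spell/mail/prt/first

# ".." and "." are resolved lexically by normpath.
roundabout = posixpath.normpath(posixpath.join(cwd, "../mail/./prt/first"))
print(roundabout == absolute)                    # True

# And back again: the relative path from cwd to the file.
print(posixpath.relpath(absolute, cwd))          # prt/first
```

Note that `normpath` works purely on the string; if a directory along the way were a symbolic link, the real `..` could lead somewhere else.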
Tree-Structured Directories (Cont) • Current directory (working directory) • The Linux command cd /spell/mail/prog makes this subdirectory the ‘current directory.’ • cd changes our current directory to the one specified; in Unix shells it is a built-in command rather than a separate program file, because a separate process could not change the shell’s own current directory. • prog is a directory with three files in it (see previous slide): list, obj, and spell. • We can also just issue the ls command, which lists the contents of our current directory, wherever we ‘are’ in our directory structure. • Dangers: • Some operating systems will not allow a user to delete a directory while there are ‘entries’ in it, such as other directories, files, etc., perhaps to many levels. • In Windows, the rmdir command (without its /s option) requires the directory to be empty before you can delete it. • Inconvenient, but may save your bacon!! • Unix provides the rm command (remove). • Unix also has rmdir, but it removes only an empty directory. • rm -r removes a directory and everything beneath it, so be careful!!!
Tree-Structured Directories (cont) • Remember: in our directory system we have both absolute and relative path names. • A new file is created in the current directory, unless we change directories or name a different directory as part of the new file’s path. • Delete a file? rm <file-name> • A new subdirectory is created in the current directory: mkdir <dir-name> [Figure: a subtree rooted at mail; nodes shown include prog, copy, prt, exp, and count.] Deleting “mail” (above) deletes the entire subtree rooted at “mail.” Be careful!!!
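The difference between the two deletion policies can be seen with Python’s standard library on a scratch directory. `os.rmdir` behaves like rmdir and refuses a non-empty directory, while `shutil.rmtree` behaves like `rm -r` and removes the whole subtree. The directory and file names are illustrative only.

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()  # scratch area, so nothing real is touched
mail = os.path.join(base, "mail")
os.makedirs(os.path.join(mail, "prog"))  # mail/prog
with open(os.path.join(mail, "copy"), "w") as f:
    f.write("data")  # illustrative contents

# rmdir semantics: refuse to remove a directory that still has entries.
try:
    os.rmdir(mail)
    removed_by_rmdir = True
except OSError:  # typically ENOTEMPTY
    removed_by_rmdir = False

# rm -r semantics: remove the directory and everything beneath it.
shutil.rmtree(mail)
mail_gone = not os.path.exists(mail)

os.rmdir(base)  # base is now empty, so plain rmdir succeeds
print(removed_by_rmdir, mail_gone)  # False True
```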
Acyclic-Graph Directories • Acyclic-graph directories have the ability to share subdirectories and files. • Perhaps you wish to share resources with other people working on the same file or same project. • A tree data structure does not permit more than one path to an entry, so we need a different data structure. • An acyclic graph is a graph with no cycles, but unlike a tree, there may be more than one path to a node (file or subdirectory). • Thus the same file or subdirectory can appear in two different higher-level directories.
Acyclic-Graph Directories (Cont.) • Note that sharing does not mean duplication. Quite the contrary! There is only one copy of the item being shared! • If using an acyclic-graph directory structure, be careful. • A file may have multiple absolute path names. • A file reachable by more than one absolute path can cause problems when traversing the file system, for example when accumulating statistics on files, copying files to backup storage, or accounting, because the same file may be counted or copied more than once. • Deleting a file? • With more than one path to a file, do we remove the file whenever anyone deletes it? This may well cause problems for other ‘users’ of this file who reference it by a different path name. • If links are used and a link is deleted, the file remains. But if the file itself is deleted, its space is de-allocated and we may well have links pointing to no file! • Your book points out that Unix leaves symbolic links in place when a file is deleted, and it is then up to the user to realize that the original file is gone. Windows takes the same approach.
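Unix hard links are one concrete form of this sharing. Assuming a POSIX system, the sketch below gives one file two names in two different directories (the `alice` and `bob` names are illustrative). Both names share one inode, `st_nlink` acts as the reference count, and deleting one name leaves the data reachable through the other.

```python
import os
import tempfile

d = tempfile.mkdtemp()
os.mkdir(os.path.join(d, "alice"))
os.mkdir(os.path.join(d, "bob"))
original = os.path.join(d, "alice", "report.txt")
shared = os.path.join(d, "bob", "report.txt")

with open(original, "w") as f:
    f.write("one copy only")

os.link(original, shared)  # a second directory entry (hard link) for the same file

same_inode = os.stat(original).st_ino == os.stat(shared).st_ino  # True: one file, no copy
links_before = os.stat(original).st_nlink  # 2: the file has two names

os.remove(original)  # delete one name...
with open(shared) as f:
    still_readable = f.read()  # ...and the data is still reachable via the other
links_after = os.stat(shared).st_nlink  # 1

# Clean up: removing the last name lets the file's space be reclaimed.
os.remove(shared)
os.rmdir(os.path.join(d, "alice"))
os.rmdir(os.path.join(d, "bob"))
os.rmdir(d)
print(same_inode, links_before, still_readable, links_after)
```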
Links • Sharing files and subdirectories is very important and done all the time. • Unix accommodates this need by providing a new kind of directory entry called a link. • A link is effectively a pointer to another file or subdirectory. • The path name stored in a link can be absolute or relative. • In practice, when we reference a file, we search the current directory. • If the directory entry is marked as a link, then the name of the real file is included in the link information. • “We resolve the link by using that path name to locate the real file.” • Links are easily identifiable in a directory and are often called indirect pointers. • The operating system ignores these links when traversing directory trees to preserve the acyclic structure of the system.
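A minimal sketch on a POSIX system: a symbolic link stores only a path name (`os.readlink` returns it), and Python’s `os.walk` follows the policy described above by default, since `followlinks=False` means it lists a linked directory but does not descend into it. The link `proj/up` is deliberately pointed back at the top directory, so following it would loop.

```python
import os
import tempfile

root = tempfile.mkdtemp()
proj = os.path.join(root, "proj")
os.mkdir(proj)
up = os.path.join(proj, "up")
os.symlink(root, up)  # proj/up points back at root: following it would loop forever

link_target = os.readlink(up)  # the link stores only a path name

visited = []
for dirpath, dirnames, filenames in os.walk(root):  # followlinks=False is the default
    visited.append(os.path.relpath(dirpath, root))

os.remove(up)  # removes the link itself, not root
os.rmdir(proj)
os.rmdir(root)
print(link_target == root, visited)  # True ['.', 'proj']
```

Passing `followlinks=True` to `os.walk` would make this traversal revisit `root` through `proj/up`.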
More on Links – in Unix • In Unix, a symbolic link is also termed a soft link; it is a special kind of file that points to another file, much like a shortcut in Windows. • A hard link is simply another directory entry for the same file (the same inode), whereas a symbolic link does not refer to the target’s data directly; it stores only the target’s path name. • It simply points to another entry somewhere in the file system. • This difference gives symbolic links the ability to link to directories, or to files on other file systems (including files on remote computers accessed through a network file system). • Also, when you delete a target file, symbolic links to that file become unusable (dangling). • (Google search on Unix, links)
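A dangling link is easy to demonstrate on a POSIX system. After the target is removed, the link entry still exists (`os.path.lexists` sees it), but following it finds nothing (`os.path.exists` returns False). File names are illustrative.

```python
import os
import tempfile

d = tempfile.mkdtemp()
target = os.path.join(d, "target.txt")
link = os.path.join(d, "link.txt")

with open(target, "w") as f:
    f.write("hello")
os.symlink(target, link)  # soft link: stores the target's path name, not its data

with open(link) as f:
    via_link = f.read()  # the OS resolves the link to the real file

os.remove(target)  # delete the target; the link itself remains

# lexists() sees the link entry; exists() follows it and finds nothing.
dangling = os.path.lexists(link) and not os.path.exists(link)

os.remove(link)
os.rmdir(d)
print(via_link, dangling)  # hello True
```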
General Graph Directory Here is a visual for a general graph directory. You will note that there is a cycle present: this graph is not acyclic; it is a ‘general graph’ and contains a cycle.
General Graph Directory (Cont.) • How do we guarantee no cycles? This is the main question. • We understand two-level directories and tree-structured directories. • But when we add links to an existing tree-structured directory, we no longer have a tree; we have a graph. • See figure 10.11. Again, note that this graph contains a cycle. • Bottom line: we want to avoid cycles at all costs, and a general graph, as shown, may (this one does) contain cycles! • Cycles may cause infinite loops in searching and degraded performance, as well as problems when we wish to delete a file, and more. • In acyclic graphs, we may keep a reference count for each entry; a count of 0 tells us there are no more references to a file or directory, and hence it can be deleted. • In general graphs, however, when cycles are permitted, a reference count may be nonzero even when it is no longer possible to refer to a directory or file, because entries within a cycle still refer to one another after the links into the cycle are deleted. • So what to do?
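The reference-count anomaly can be reproduced with a tiny hypothetical model: each `Dir` node counts the entries that point at it. After the only link from `root` into the cycle is removed, `a` and `b` are unreachable, yet each still holds a count of 1, so a pure reference-counting scheme would never reclaim them.

```python
class Dir:
    """Hypothetical directory node that counts the entries referring to it."""

    def __init__(self, name):
        self.name = name
        self.entries = {}
        self.refcount = 0

    def link(self, name, target):
        """Add an entry pointing at target and bump its reference count."""
        self.entries[name] = target
        target.refcount += 1

    def unlink(self, name):
        """Remove an entry; a count of 0 would mean target can be reclaimed."""
        target = self.entries.pop(name)
        target.refcount -= 1
        return target


root, a, b = Dir("root"), Dir("a"), Dir("b")
root.link("a", a)
a.link("b", b)
b.link("back", a)  # creates the cycle a -> b -> a

root.unlink("a")  # a and b are now unreachable from root...
print(a.refcount, b.refcount)  # ...yet both counts are still 1, so neither is freed
```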
General Graph Directory – Garbage Collection • One approach is to run a garbage collection routine to discover when there are no more references to an entry (hence its space may be recovered). • Implementation: • The entire file system is traversed, marking everything that can be accessed. • A second pass collects everything not marked into a list of free space. • Unfortunately, traversing an entire disk-based file system this way is very time-consuming, so garbage collection is seldom attempted.
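The two passes can be sketched as mark-and-sweep over a hypothetical in-memory graph, a dict mapping each allocated entry to the entries it points at. A real file system would mark inodes or blocks on disk, which is exactly what makes the traversal expensive. The iterative depth-first marking stops at already-marked nodes, so cycles do not cause infinite loops.

```python
def collect_garbage(graph, root):
    """Mark-and-sweep over a hypothetical directory graph.

    graph maps every allocated entry name to the names its entries point to.
    Returns the sorted list of entries unreachable from root (the free list).
    """
    # Pass 1: mark everything reachable from the root.
    marked = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node in marked:
            continue  # already visited: this is what keeps cycles from looping
        marked.add(node)
        stack.extend(graph.get(node, []))
    # Pass 2: sweep; anything not marked goes on the free list.
    return sorted(n for n in graph if n not in marked)


graph = {
    "root": ["docs"],
    "docs": ["report"],
    "report": [],
    "a": ["b"],  # a and b point at each other, but nothing reachable points at them
    "b": ["a"],
}
print(collect_garbage(graph, "root"))  # ['a', 'b']
```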
Acyclic Graph – Garbage Collection • We need garbage collection for a file system that permits cycles. • In acyclic graphs, we can instead keep a reference count for each entry; a count of 0 tells us there are no more references to a file or directory, and hence it can be deleted. • So in an acyclic graph, reclaiming space is much easier, since no cycles are permitted. • But as we add links, we must be certain that new additions will not create a cycle if we are to maintain the acyclic nature of the directory. • We can keep the graph acyclic by running an algorithm that determines whether a new link will cause a cycle. • But running such an algorithm is very expensive when analyzing a large directory structure on disk. • A simpler approach for directories and links is to bypass links during directory traversals. • This ensures a traversal never follows a cycle, and it costs very little.
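One way to keep the graph acyclic is to check every new link before adding it. A new entry in `directory` pointing at `target` closes a cycle exactly when `directory` is already reachable from `target`. The sketch below uses a hypothetical in-memory graph (dict from entry to the entries it points at). Each check may walk much of the graph, which is why the approach is costly for a large structure on disk.

```python
def reachable(graph, start, goal):
    """True if goal can be reached from start by following directory entries."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, []))
    return False


def add_link(graph, directory, target):
    """Add an entry in directory pointing at target, refusing if it would close a cycle.

    The new edge directory -> target creates a cycle exactly when directory is
    already reachable from target, so each check may walk much of the graph.
    """
    if reachable(graph, target, directory):
        raise ValueError(f"link {directory} -> {target} would create a cycle")
    graph.setdefault(directory, []).append(target)


if __name__ == "__main__":
    graph = {"root": ["spell", "bin"], "spell": ["mail"], "mail": [], "bin": []}
    add_link(graph, "bin", "mail")  # sharing: mail now has two parents, still acyclic
    try:
        add_link(graph, "mail", "spell")  # spell -> mail -> spell would be a cycle
    except ValueError as err:
        print(err)  # link mail -> spell would create a cycle
```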