Explore file headers, naming structures, and directory organization in operating systems, optimizing storage and access with hierarchical, relational, and contextual naming methods.
Naming and Directories Andy Wang Operating Systems COP 4610 / CGS 5675
Recall from last time… • A file header associates the file with its data blocks
File Header Storage • Under UNIX, a file header is stored in a data structure called an i-node • For early UNIX systems • I-nodes are stored in a special array • Fixed number of array entries • Maximum number of files fixed • Not stored near data blocks on disk • Reading a small file involves • One disk seek to get the i-node • Other disk seek(s) to get file blocks
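The fixed-size i-node array can be sketched in a few lines. This is only an illustrative model, not the actual UNIX on-disk layout: the names `Inode`, `InodeTable`, and `allocate` are made up for this example. It shows why a fixed number of array entries caps the number of files.

```python
from dataclasses import dataclass, field


@dataclass
class Inode:
    """Hypothetical file header: a size plus the file's data block numbers."""
    size: int = 0
    blocks: list = field(default_factory=list)


class InodeTable:
    """Toy model of the early-UNIX i-node array with a fixed capacity."""

    def __init__(self, capacity):
        # The array is sized once; it never grows.
        self.slots = [None] * capacity

    def allocate(self):
        """Claim the first free slot and return its i-node number."""
        for number, slot in enumerate(self.slots):
            if slot is None:
                self.slots[number] = Inode()
                return number
        # No free entry left: no more files can be created.
        raise OSError("out of i-nodes: maximum number of files reached")
```

With `InodeTable(2)`, the first two calls to `allocate()` succeed and the third fails, even if plenty of data blocks remain free.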
Reasons for Separate Allocations • Reliability • Data corruptions are unlikely to affect i-nodes • Reduced fragmentation • File headers are smaller than a whole block • By packing them in an array, multiple headers can be fetched from disk • File headers are accessed more often • e.g., ls • Grouping file headers improves disk efficiency
For BSD 4.2… • Portions of file header array stored on each cylinder • For small directories • All file headers and data stored in the same cylinder • Reduce seek time
Naming • Remember that odd moment when your computer asks you to name your first file? • Naming: allows users to issue file names instead of i-node numbers - Users tend to come up with poor names • e.g., test - Many files are difficult to name…
Directories • A table of file names and their i-node numbers • Under many file systems • Directories are implemented as normal files • Containing file names and i-node numbers • Only the OS is permitted to modify directories
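The name-to-i-node table can be observed on a real system. The sketch below uses Python's standard `os.scandir`, whose entries expose an `inode()` method; the helper name `directory_table` and the file names are assumptions chosen for illustration.

```python
import os
import tempfile


def directory_table(path):
    """Return {file name: i-node number} for the entries of one directory."""
    with os.scandir(path) as entries:
        return {entry.name: entry.inode() for entry in entries}


# Build a throwaway directory so the example does not depend on local files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("cat.jpg", "dog.jpg"):
        open(os.path.join(tmp, name), "w").close()
    for name, inode_number in sorted(directory_table(tmp).items()):
        print(name, inode_number)
```

The i-node numbers printed will differ from system to system; the point is only that each name in the directory maps to one number.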
Name Space • Flat name space • Hierarchical naming • Relational name space • Contextual naming • Content-based naming
Flat Name Space • All files are stored in a single directory + Easy to implement - Not scalable for large directories • Name collisions: multiple files with the same names
Hierarchical Naming • Uses multiple levels of directories • Most popular name space organization + Conceptual model maps well into the human model of organizing things • A file cabinet contains many files + Scalable • The probability of name collisions decreases + Spatial locality • Store all files under a directory within a cylinder to avoid disk seeks
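A toy comparison makes the name-collision point concrete. Both classes below are hypothetical models (nested dictionaries standing in for directories), not a real file system interface.

```python
class FlatNamespace:
    """All files live in a single directory, so names must be globally unique."""

    def __init__(self):
        self.files = {}

    def create(self, name, inode_number):
        if name in self.files:
            raise FileExistsError(name)
        self.files[name] = inode_number


class HierarchicalNamespace:
    """Nested dictionaries model directories; names only need to be unique
    within their own directory."""

    def __init__(self):
        self.root = {}

    def create(self, path, inode_number):
        *directories, name = path.strip("/").split("/")
        node = self.root
        for directory in directories:
            # Create intermediate directories on demand (a simplification).
            node = node.setdefault(directory, {})
        if name in node:
            raise FileExistsError(path)
        node[name] = inode_number
```

In the flat model a second file called `test` is rejected, while `/alice/test` and `/bob/test` coexist in the hierarchical one.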
More on Hierarchical Naming • Absolute path name: consists of the path from the root directory ‘/’ to the file • e.g., /pets/cat.jpg • ‘/’ is the root directory, pets is a subdirectory, and cat.jpg is the file name
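The parts of an absolute path name can be inspected with Python's standard `pathlib` module. `PurePosixPath` only manipulates the string, so no file named /pets/cat.jpg needs to exist.

```python
from pathlib import PurePosixPath

path = PurePosixPath("/pets/cat.jpg")

print(path.is_absolute())  # True: the path starts at the root directory
print(path.parts)          # ('/', 'pets', 'cat.jpg')
print(path.parent)         # /pets
print(path.name)           # cat.jpg
```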
Drawbacks of Hierarchical Naming - Not all files can fit into the hierarchical model (e.g., pets or pests?) - Accessing a file may involve many levels of directory lookups, or a path resolution before getting to the file content
An Example of Path Resolution • To access the data content of /pets/cat.jpg • The system needs to perform the following disk I/Os
1. Read the file header for the root directory ‘/’ • Stored at a fixed location on disk
2. Read the first data block for the root directory • Look up the directory entry for pets
3. Read the file header for pets
4. Read the first data block for the pets directory • Look up the directory entry for cat.jpg
5. Read the file header for cat.jpg
6. Read the data block for cat.jpg
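The six disk I/Os can be traced with a small simulation. Everything here is an assumption made for illustration: the `Disk` class, the i-node and block numbers, and the simplification that each directory and file fits in a single data block.

```python
class Disk:
    """Toy disk: file headers and data blocks live in separate tables."""

    def __init__(self):
        self.inodes = {}  # i-node number -> list of data block numbers
        self.blocks = {}  # block number -> directory dict or file bytes
        self.reads = []   # log of every simulated disk I/O

    def read_inode(self, number):
        self.reads.append(("header", number))
        return self.inodes[number]

    def read_block(self, number):
        self.reads.append(("block", number))
        return self.blocks[number]


ROOT_INODE = 0  # assumed fixed location of the root directory's header


def read_file(disk, path):
    """Resolve an absolute path and return the file's first data block."""
    inode_number = ROOT_INODE
    for name in path.strip("/").split("/"):
        header = disk.read_inode(inode_number)    # steps 1 and 3
        directory = disk.read_block(header[0])    # steps 2 and 4
        inode_number = directory[name]            # look up the entry
    header = disk.read_inode(inode_number)        # step 5
    return disk.read_block(header[0])             # step 6


disk = Disk()
disk.inodes = {0: [10], 1: [11], 2: [12]}
disk.blocks = {10: {"pets": 1}, 11: {"cat.jpg": 2}, 12: b"meow"}

data = read_file(disk, "/pets/cat.jpg")
print(data, len(disk.reads))  # b'meow' 6
```

Each extra directory level in the path adds two more reads (one header, one data block), which is why deep hierarchies make path resolution expensive.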
Some Performance Optimizations… • Top-level directories are usually cached • A user inside a directory (e.g., /pets) • Can issue relative path names (e.g., cat.jpg) to refer to files within the current directory
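How a relative name combines with the current directory can be shown with the standard `posixpath` module. The helper name `resolve` is an assumption for this sketch, and it works on strings only. A real kernel resolves a relative name starting from the current directory's i-node rather than rebuilding the absolute path.

```python
import posixpath


def resolve(current_directory, name):
    """Combine the current directory with a (possibly relative) path name."""
    # posixpath.join discards current_directory when name is already absolute.
    return posixpath.normpath(posixpath.join(current_directory, name))


print(resolve("/pets", "cat.jpg"))            # /pets/cat.jpg
print(resolve("/pets", "../pests/ant.jpg"))   # /pests/ant.jpg
print(resolve("/pets", "/etc/passwd"))        # /etc/passwd
```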
Relational Name Space • Hierarchical naming model is largely a tree • Relational naming model allows the construction of general graphs • A file can belong to multiple folders • According to its attributes • Files can be accessed in a manner similar to relational databases • e.g., keywords: cats and blinds
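A relational lookup can be sketched as a query over attribute sets. The file names and attributes below are invented for illustration, and `find` is a hypothetical helper rather than an interface from any particular system.

```python
# Each file carries a set of attributes instead of living in one directory.
attributes = {
    "photo1.jpg": {"cats", "blinds"},
    "photo2.jpg": {"cats", "sofa"},
    "notes.txt": {"blinds", "shopping"},
}


def find(index, *keywords):
    """Return the files whose attribute sets contain every keyword."""
    wanted = set(keywords)
    return sorted(name for name, attrs in index.items() if wanted <= attrs)


print(find(attributes, "cats", "blinds"))  # ['photo1.jpg']
print(find(attributes, "cats"))            # ['photo1.jpg', 'photo2.jpg']
```

Because photo1.jpg appears under both "cats" and "blinds", the structure is a general graph rather than a tree.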
Pros/Cons of Relational Name Space + More flexible than hierarchical naming - May require a long list of attributes to name a single piece of data • e.g., this lecture • Keywords: operating systems, file systems, naming, PowerPoint XP - Who will create those attributes?
Contextual Naming • Takes advantage of the observation that certain attributes can be added automatically • e.g., when you try to open a file with Word, the system searches only the file types supported by Word (.doc, .txt, .html) + Avoids a long list of attributes - A user may not remember the file name
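The Word example amounts to filtering by an attribute the application supplies automatically. A minimal sketch, assuming the three file types listed on the slide and a hypothetical helper named `candidates`:

```python
from pathlib import PurePosixPath

# File types taken from the slide's example; a real application's list differs.
WORD_TYPES = {".doc", ".txt", ".html"}


def candidates(names, supported=WORD_TYPES):
    """Keep only the files whose extension the application can open."""
    return [n for n in names if PurePosixPath(n).suffix.lower() in supported]


print(candidates(["essay.doc", "cat.jpg", "notes.TXT", "index.html"]))
# ['essay.doc', 'notes.TXT', 'index.html']
```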
Content-based Naming • Searches for a file by its content instead of its name • File contents are extracted automatically • e.g., I want a photo of a cat taken five years ago • The system returns all files satisfying the criteria
Content-based Naming - Requires advanced information processing techniques • e.g., image recognition • Many existing systems use manual indexing • Automated content-based naming is still an active area of research
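For plain text, content extraction is easy enough to sketch with an inverted index. This stands in for the much harder image case mentioned above; the documents and the helper names `build_index` and `search` are assumptions for illustration.

```python
from collections import defaultdict


def build_index(documents):
    """Map each word to the set of files whose content contains it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in text.lower().split():
            index[word].add(name)
    return index


def search(index, *words):
    """Return the files that contain every one of the given words."""
    hits = [index.get(word.lower(), set()) for word in words]
    return sorted(set.intersection(*hits)) if hits else []


documents = {
    "a.txt": "my cat sleeps on the sofa",
    "b.txt": "the dog chased the cat",
    "c.txt": "grocery list for monday",
}
index = build_index(documents)
print(search(index, "cat"))         # ['a.txt', 'b.txt']
print(search(index, "cat", "dog"))  # ['b.txt']
```

The user never supplies a file name; the query is matched against what the files contain.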
Example: The “Internet File System” • Can be viewed as a worldwide file system • What is the naming scheme for the Internet file system?
The “Internet File System” • Contains shades of various naming schemes • Flat name space: • Each URL provides a unique name • Hierarchical name space: • Within individual websites • Relational name space: • Can search the Internet via search engines • Contextual name space: • Pages are ranked according to relevance • Content-based name space: • You can find your information without knowing the exact file names
Example: Plan 9 • Modern UNIX systems are deeply influenced by the Plan 9 OS • Developed at Bell Labs • Major design philosophy: everything is a file • A single hierarchical name space for • Processes (e.g., /proc) • Files • IPC (e.g., pipe) • Devices (e.g., /dev/fd0) • Use open/close/read/write for everything • e.g., /dev/mem
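The uniform read/write interface can be seen from Python's standard `os` module, which wraps the underlying system calls. This is a sketch on a UNIX-like system, not Plan 9 itself: the same `read_all` helper (a name made up here) drains both a pipe and a regular file, because both are just file descriptors.

```python
import os
import tempfile


def read_all(fd, chunk=4096):
    """Read from any file descriptor until end-of-file."""
    parts = []
    while True:
        data = os.read(fd, chunk)
        if not data:
            return b"".join(parts)
        parts.append(data)


def via_pipe(message):
    """Send a short message through an IPC pipe with write/close/read."""
    read_fd, write_fd = os.pipe()
    os.write(write_fd, message)
    os.close(write_fd)  # closing the write end lets the reader see EOF
    try:
        return read_all(read_fd)
    finally:
        os.close(read_fd)


def via_file(message):
    """Send the same message through a regular file with the same calls."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, message)
        os.lseek(fd, 0, os.SEEK_SET)  # rewind before reading back
        return read_all(fd)
    finally:
        os.close(fd)
        os.remove(path)


print(via_pipe(b"hello"), via_file(b"hello"))
```

The message is kept short so that it fits in the pipe's buffer; a larger transfer would need a separate reader running concurrently.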