Connecting with Computer Science, 2e Chapter 10 File Structures
Objectives • In this chapter you will: • Learn what a file system does • Understand the FAT file system and its advantages and disadvantages • Understand the NTFS file system and its advantages and disadvantages • Compare common file systems • Learn how sequential and random file access work • See how hashing is used • Understand how hashing algorithms are created
Why You Need to Know About... File Structures • Knowledge of how an operating system stores and maintains data in a computer • Allows better comprehension of how a computer handles and manipulates files • Helps you keep the computer running as efficiently as possible
What Does a File System Do? • Responsibilities • Creating, manipulating, renaming, copying, and removing files to and from a storage device • Organizing files into common storage units • Called directories • Keeping track of file and directory locations • Assisting users • Relating files and folders to the physical structure of the storage medium
What Does a File System Do? (cont’d.) • Files used by operating systems and applications • Word-processing documents • Source code for programs you have written • Music files • Movie files • Spreadsheets • Photos • Operating systems use a file folder icon to represent a directory
What Does a File System Do? (cont’d.) Figure 10-1, Files and directories in a file system are similar to documents and folders in a filing cabinet
What Does a File System Do? (cont’d.) Figure 10-2, Folders and files in Windows
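The responsibilities listed above (creating, renaming, copying, and removing files, and grouping them into directories) can be exercised from a program. The following is a minimal sketch using Python's standard library; the directory and file names are made up for illustration, and everything happens in a temporary directory that is deleted afterward.

```python
import shutil
import tempfile
from pathlib import Path

def demo_file_operations():
    """Create, rename, copy, and remove a file inside a throwaway directory."""
    with tempfile.TemporaryDirectory() as root:
        docs = Path(root) / "documents"                  # a directory groups related files
        docs.mkdir()
        draft = docs / "draft.txt"
        draft.write_text("hello")                        # create a file
        report = draft.rename(docs / "report.txt")       # rename it
        shutil.copy(report, docs / "report_copy.txt")    # copy it
        report.unlink()                                  # remove the original
        # Only the copy should remain in the directory.
        return sorted(p.name for p in docs.iterdir())

print(demo_file_operations())
```

Each call is a request to the operating system; the file system underneath decides where the bytes actually land on the storage medium.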
What Does a File System Do? (cont’d.) • Hard disk • Most common storage medium for a file system • Physically organized into tracks and sectors • Read/write heads move over specified areas of the hard disk to store (write) or retrieve (read) data • Random access device • Reads or writes data directly at any location on the disk • Faster than sequential access, which reads and writes from beginning to end • Makes use of the file system to organize files
File Systems and Operating Systems • File management system • Dependent on the operating system • FAT (File Allocation Table) • Used from MS-DOS to Windows ME • NTFS (New Technology File System) • Default for Windows • Unix and Linux support several file systems • XFS, JFS, ReiserFS, ext3, others • Mac OS X file system • HFS and HFS+
FAT • Groups hard drive sectors into clusters • Increases performance by organizing blocks of sectors contiguously • Maintains a relationship between files and clusters • Each cluster has two entries in the FAT • Current cluster information • Link to next cluster or special code indicating the last cluster • Keeps track of writable clusters and bad clusters
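The "link to next cluster" idea can be sketched as a simple lookup table. This is a toy model, not the on-disk FAT layout: the table is a Python dictionary, the cluster numbers are invented, and `END_OF_CHAIN` is a hypothetical marker standing in for the special last-cluster code.

```python
END_OF_CHAIN = -1  # hypothetical marker; real FAT variants use reserved values

def cluster_chain(fat, first_cluster):
    """Follow next-cluster links from a file's first cluster to its last."""
    chain = []
    cluster = first_cluster
    while cluster != END_OF_CHAIN:
        if cluster in chain:
            # A link that points back into the chain would loop forever.
            raise ValueError("loop in cluster chain")
        chain.append(cluster)
        cluster = fat[cluster]
    return chain

# Toy table: key = cluster number, value = next cluster of the same file.
fat = {2: 5, 5: 6, 6: 9, 9: END_OF_CHAIN, 3: 4, 4: END_OF_CHAIN}

print(cluster_chain(fat, 2))  # clusters of the file that starts at cluster 2
print(cluster_chain(fat, 3))  # clusters of the file that starts at cluster 3
```

Because each entry only names the next cluster, a file's clusters do not have to be adjacent on disk.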
FAT (cont’d.) Figure 10-3, Sectors are grouped into clusters on a hard disk
FAT (cont’d.) • Hard drive organization • Partition boot sector • Contains information on how to access volumes • Main and backup FAT • If an error occurs while reading the main FAT, the backup is copied to the main to ensure stability • Root directory • Contains entries for every file and folder in the directory • Data area • Measured in clusters
FAT (cont’d.) Figure 10-4, Typical FAT file system
Disk Fragmentation • File clusters scattered in different locations on the storage medium • Windows provides the Disk Defragmenter utility • Reorganizes clusters contiguously • Improves performance • Minimizes movement of the read/write heads • Use regularly to ensure system runs at peak performance
Disk Fragmentation (cont’d.) Figure 10-5, Files become fragmented as they’re stored in noncontiguous clusters; a defragmenting utility moves files to contiguous clusters and improves disk performance
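One rough way to quantify fragmentation is to count how many contiguous runs a file's clusters form. The sketch below assumes the file's clusters are given as a list of cluster numbers in file order; the function name and sample numbers are illustrative only.

```python
def count_fragments(clusters):
    """Count contiguous runs in a file's ordered list of cluster numbers."""
    if not clusters:
        return 0
    fragments = 1
    for previous, current in zip(clusters, clusters[1:]):
        if current != previous + 1:
            fragments += 1  # a gap means the heads must move elsewhere
    return fragments

print(count_fragments([2, 5, 6, 9]))   # scattered clusters
print(count_fragments([2, 3, 4, 5]))   # same file after defragmenting
```

A defragmented file forms a single run, so the read/write heads can read it without jumping between distant clusters.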
Advantages of FAT • Efficient use of disk space • Does not have to use contiguous space for large files • File names up to 255 characters (FAT32) • Deleted files are easy to recover • On deletion, the system places E5h in the first position of the filename • File remains on drive • To recover, replace E5h with the original first letter of the filename
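The E5h mechanism can be sketched with plain byte strings. This is a simplified illustration that models only the filename bytes of a directory entry; the function names are hypothetical, and recovery in practice also depends on the file's clusters not having been reused.

```python
DELETED_MARK = 0xE5  # the E5h marker placed in the first position of the filename

def mark_deleted(name: bytes) -> bytes:
    """Simulate deletion: overwrite the first byte of the name with E5h."""
    return bytes([DELETED_MARK]) + name[1:]

def undelete(entry: bytes, first_letter: str) -> bytes:
    """Simulate recovery: restore the original first letter of the name."""
    if entry[0] != DELETED_MARK:
        raise ValueError("entry is not marked as deleted")
    return first_letter.encode("ascii") + entry[1:]

original = b"REPORT  TXT"           # illustrative 8.3-style name bytes
deleted = mark_deleted(original)
print(deleted)                      # first byte is now 0xE5
print(undelete(deleted, "R"))       # name restored
```

The rest of the name is untouched by deletion, which is why only the first letter has to be supplied again.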
Disadvantages of FAT • Performance slows down as more files are stored on the partition • Hard drive fragments easily • Lack of security • NTFS provides access rights to files and directories • File integrity problems • Lost clusters • Invalid files and directories • Allocation errors
NTFS • Overcomes FAT system limitations • “Journaling” file system • Keeps track of transactions performed • “Rolls back” transactions if errors are found • Uses a Master File Table (MFT) • Stores data about all files and directories • Similar to a database table with records • Uses clusters • Reserves blocks of space to allow the MFT to grow
Advantages of NTFS • File access is very fast and reliable • MFT allows system recovery from problems without losing significant amounts of data • Security is greatly increased over FAT • File encryption with EFS (Encrypting File System) • File compression reduces file size • Saves disk space
Disadvantages of NTFS • Large overhead • Not recommended for volumes less than 4 GB • Cannot access NTFS volumes from: • MS-DOS • Windows 95 • Windows 98 • Linux
Comparing File Systems • Choosing correct file system • Operating system dependent • Rarely depends on hardware • NTFS: Windows XP or Vista • Supports drive sizes up to 16 TB (about 16,000 GB) • FAT: Windows 9x • Older small hard drives, small removable devices • UNIX/Linux • Many file system choices
Comparing File Systems (cont’d.) Table 10-1, FAT16, FAT32, and NTFS compared
Comparing File Systems (cont’d.) Table 10-2, Some UNIX/Linux file systems
File Organization • Topics covered: • File characteristics • How files are stored on disks and other media
Binary or Text • Text files • Consist of ASCII or Unicode characters • Typically read with word-processing programs or text editors • Easy to view and modify • Binary files • Computer readable (not human readable) • Coded and numeric information • More compact than text files • Examples: executable programs, applications, sound and image files
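The "more compact" point can be seen by storing the same number both ways. A small sketch, assuming an arbitrary sample value and a 32-bit unsigned little-endian binary layout (one of several possible encodings):

```python
import struct

value = 1234567890

as_text = str(value).encode("ascii")    # one byte per decimal digit
as_binary = struct.pack("<I", value)    # 32-bit unsigned integer, little-endian

print(len(as_text), as_text)            # readable in any text editor
print(len(as_binary), as_binary)        # compact, but not human readable

# The binary form decodes back to the same number.
assert struct.unpack("<I", as_binary)[0] == value
```

The text form can be opened and edited directly, while the binary form needs a program that knows the layout.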
Sequential or Random Access • Sequential storage • Data accessed one chunk after the other in order • Random storage • Data accessed in any order • Also called direct or relative access
Sequential or Random Access (cont’d.) Figure 10-6, Sequential versus random access
Sequential Access • Starts at the beginning and proceeds to the end of the file • Writing process is very fast • New data added to the end of a file • Retrieving, inserting, deleting, and modifying data are very slow • Stores data in rows like a database record • Fields separated by delimiters or given specific fixed sizes
Sequential Access (cont’d.) Figure 10-7, A comma can be used as a field delimiter
Sequential Access (cont’d.) Figure 10-8, Data can also be in fixed-length format
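The two record layouts, comma-delimited and fixed-length, can be parsed as follows. This is a sketch with invented sample data (the figures' actual contents are not reproduced here) and an assumed width of 10 characters per field.

```python
def parse_delimited(line, delimiter=","):
    """Split a record into fields at each delimiter."""
    return [field.strip() for field in line.split(delimiter)]

def parse_fixed(line, widths):
    """Slice a record into fields of known, fixed widths."""
    fields, start = [], 0
    for width in widths:
        fields.append(line[start:start + width].strip())
        start += width
    return fields

# Hypothetical sample record in both layouts.
delimited = "Smith,John,7025551234"
fixed = "Smith".ljust(10) + "John".ljust(10) + "7025551234"

print(parse_delimited(delimited))
print(parse_fixed(fixed, (10, 10, 10)))
```

Delimited records save space when fields are short, while fixed-length records make every record the same size, which matters for random access.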
Random Access • Provides faster access to large amounts of data • Stores fixed-length records (relative records) • Ability to mathematically calculate the record’s position on the disk surface and go right to it • Ability to update records in place • May waste disk space • A slot holding a partial record or no data still occupies a full record’s space • Works well when a sequential record number can easily identify records
Random Access (cont’d.) Figure 10-9, Record organization and file access
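With fixed-length records, a record's position is simply its record number multiplied by the record size. The sketch below assumes 16-byte ASCII records and uses an in-memory `io.BytesIO` object as a stand-in for a disk file; the names and sample data are illustrative.

```python
import io

RECORD_SIZE = 16  # assumed fixed record length in bytes

def write_record(f, record_number, text):
    """Pad the text to RECORD_SIZE and write it at its calculated offset."""
    data = text.encode("ascii")[:RECORD_SIZE].ljust(RECORD_SIZE, b" ")
    f.seek(record_number * RECORD_SIZE)   # jump straight to the record
    f.write(data)

def read_record(f, record_number):
    """Seek to the calculated offset and read exactly one record."""
    f.seek(record_number * RECORD_SIZE)
    return f.read(RECORD_SIZE).decode("ascii").rstrip()

f = io.BytesIO()  # stand-in for a file opened in binary read/write mode
for number, name in enumerate(["Adams", "Baker", "Clark", "Davis"]):
    write_record(f, number, name)

write_record(f, 2, "Cooper")              # update record 2 in place
print(read_record(f, 2))
print(read_record(f, 3))                  # neighbors are untouched
```

Short names are padded to the full record size, which is the wasted space the slide mentions.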
Hashing • Used for accessing relative record files • Uses unique value called a hash key • Widely used in database management systems • Involves a hashing algorithm to generate hash keys for each record • Combining hash keys establishes an index to rows or records of information
Why Hash? • Allows a key field number not suited for relative file access to be converted into a relative record number • Example: phone numbers as keys in a customer information table • Divide highest possible phone number by the expected number of customers to get the hash key • 9999999999 / 2000 (estimated number of customers) = approximately 5,000,000 • Phone number 7025551234 / 5,000,000 gives the record number 1405 (discarding the remainder)
Why Hash? (cont’d.) • Hashing may result in collisions • Same relative key is generated for more than one original key value • One solution: • Expand algorithm to add the sum of the digits of the phone number to the relative key • Sum of the digits in phone number 7025551234 is 34 • Original key 1405 + 34 = 1439 • Lessens collisions but does not eliminate them
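The phone-number arithmetic can be worked through in a few lines. This sketch follows the division scheme described on the slides; the function and constant names are illustrative, and integer division is assumed so that the remainder is discarded.

```python
MAX_PHONE = 9_999_999_999
EXPECTED_CUSTOMERS = 2_000
DIVISOR = round(MAX_PHONE / EXPECTED_CUSTOMERS)  # approximately 5,000,000

def relative_record(phone):
    """Basic scheme: divide the phone number by the divisor, drop the remainder."""
    return phone // DIVISOR

def relative_record_with_digit_sum(phone):
    """Expanded scheme: add the sum of the phone number's digits."""
    return relative_record(phone) + sum(int(digit) for digit in str(phone))

phone = 7025551234
print(DIVISOR)                                # 5000000
print(relative_record(phone))                 # 1405
print(relative_record_with_digit_sum(phone))  # 1405 + 34 = 1439
```

Two phone numbers that differ only in their last few digits land on the same record number under the basic scheme, which is the collision problem the expanded scheme tries to reduce.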
Dealing with Collisions • Even the best hashing algorithms have collisions • One solution: create overflow area • Records with duplicate record numbers are placed in the overflow area at the end of the file • Record retrieval • Hash key is calculated, and the record at the calculated position is retrieved • If the record at that location isn’t the correct one, the overflow area is searched sequentially
Dealing with Collisions (cont’d.) Figure 10-10, An overflow area helps resolve collisions
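The overflow-area approach can be sketched with two in-memory lists: one for the main record slots and one for the overflow area. The class and method names are hypothetical, and the sketch leaves out updates and deletions.

```python
class HashedFile:
    """Toy relative file with an overflow area for colliding records."""

    def __init__(self, slot_count, hash_fn):
        self.slots = [None] * slot_count   # main area, indexed by record number
        self.overflow = []                 # overflow area at the "end of the file"
        self.hash_fn = hash_fn

    def store(self, key, record):
        position = self.hash_fn(key)
        if self.slots[position] is None:
            self.slots[position] = (key, record)
        else:
            self.overflow.append((key, record))  # collision: slot already taken

    def fetch(self, key):
        entry = self.slots[self.hash_fn(key)]
        if entry is not None and entry[0] == key:
            return entry[1]                      # found at the calculated position
        for stored_key, record in self.overflow: # otherwise search sequentially
            if stored_key == key:
                return record
        return None

customers = HashedFile(2000, lambda phone: phone // 5_000_000)
customers.store(7025551234, "first customer")
customers.store(7025559999, "second customer")   # same record number, goes to overflow
print(customers.fetch(7025551234))
print(customers.fetch(7025559999))
```

Lookups that hit the main area take a single step, while lookups that fall through to the overflow area get slower as that area grows.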
Hashing and Computing • Efficient hashing algorithm • Important to companies producing database management systems • Many different hashing algorithms are used in computing • Encryption and decryption • Indexing • Many programming languages have specialized libraries of built-in hashing routines
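As one example of a built-in hashing library, Python ships a `hashlib` module with standard digest algorithms. The sketch below hashes a sample key with SHA-256 and, as an illustrative assumption, reduces the digest to a record number in a 2,000-slot file.

```python
import hashlib

key = b"7025551234"  # sample key, taken from the phone-number example

digest = hashlib.sha256(key).hexdigest()   # fixed-length hexadecimal digest
record_number = int(digest, 16) % 2000     # fold the digest into 2,000 slots

print(digest)
print(record_number)
```

The same input always produces the same digest, so the record number can be recalculated whenever the record needs to be found again.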
One Last Thought • Determining a computer system’s worth • Often measured in terms of data stored on hard drives • Data can be difficult to replace • Data storage dependent on file systems • Strong understanding of file systems allows more data availability and protection
Summary • Hard drive • Random access device • Stores information in tracks and sectors • Accesses data through read/write heads • File system • Responsible for creating, manipulating, renaming, copying, and removing files from a storage device • Windows uses either FAT or NTFS
Summary (cont’d.) • FAT keeps track of which files are using specific clusters • Vulnerable to disk fragmentation • NTFS uses MFT to keep track of files and directories • Used with Windows • NTFS advantages over FAT • Better reliability and security, journaling, file encryption, and file compression
Summary (cont’d.) • Linux can be used with many file systems • Files contain binary or text (ASCII or Unicode) data • Data is usually stored and accessed either sequentially or randomly (relative access) • Hashing • Common method for accessing a relative file • Collisions occur when the same relative record number is generated for more than one original key value