Disk Drive Fragmentation
We often talk about the need to defragment our hard drives, but we seldom mention how and why they get fragmented. That is the subject of this presentation. But first, a little background information about disk drives.
Features
Stacked 2 sided platters – magnetic coating on both sides
Spindle speeds – 4200, 5400, 5900, 7200, 10k rpm
Flying heads – fly on an air cushion about 50 nm above the surface (0.05 µm; a hair is about 100 µm)
Head positioner – linear motor (voice coil)
Controller –
Physical Organization
Cylinders: stacks of same-numbered tracks from each side of each platter
Tracks: concentric circular regions divided into sectors
Sectors: fixed-length data areas separated by gaps and sector header areas
Tracks are identified by their cylinder and head number, as there is a head for each side of each platter.
Sector Layout
Each sector's user data area is preceded on the track by a gap and an address marker, which the controller uses to verify the sector location. It is followed by an error checking and correcting (ECC) code, which the controller uses to reconstruct data when read errors occur. The gaps, markers and ECC add about 100 bytes to each sector's length.
Traditionally, the sector user data area was 512 bytes long. Since Windows Vista and Windows 7, disk drives with 4K sectors have been available, because the larger sectors allow a larger capacity. This was a logical choice because hard disk space has been allocated in 4K chunks for a long time. With one 4K sector in place of eight 512-byte sectors, the space taken by the 7 gaps between them became available for user data. There was no penalty for the user, as each space allocation stayed the same size but occupied 1 sector instead of 8. And here is the real reason: the file allocation table can only address a limited number of sectors. Now, with larger sectors, the table can support larger drives.
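The saving can be worked through as simple arithmetic. The sketch below assumes the "about 100 bytes" of per-sector overhead quoted above applies unchanged to a 4K sector, which is a simplification (real 4K sectors may carry a longer ECC field). The function names are made up for this illustration.

```python
# Per-sector overhead (gap + address marker + ECC), using the rough figure
# quoted in the text. Assumed to be the same for 512-byte and 4K sectors.
SECTOR_OVERHEAD_BYTES = 100


def format_efficiency(user_bytes: int, overhead_bytes: int = SECTOR_OVERHEAD_BYTES) -> float:
    """Fraction of the space a sector occupies on the track that holds user data."""
    return user_bytes / (user_bytes + overhead_bytes)


def track_bytes_for_4k(sector_user_bytes: int) -> int:
    """Track bytes consumed to store one 4096-byte allocation."""
    sectors = 4096 // sector_user_bytes
    return sectors * (sector_user_bytes + SECTOR_OVERHEAD_BYTES)


legacy = track_bytes_for_4k(512)      # eight 512-byte sectors
advanced = track_bytes_for_4k(4096)   # one 4K sector
print(legacy, advanced, legacy - advanced)
print(round(format_efficiency(512), 3), round(format_efficiency(4096), 3))
```

Under these assumptions, one 4K allocation costs 4896 bytes of track as eight small sectors and 4196 bytes as one large sector, so about 700 bytes per allocation are recovered.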
Controller
To relieve the driver of the burden of supporting drives with different numbers of cylinders, tracks and sectors, a logical block addressing (LBA) scheme was developed. The driver supplies a block number to the drive, and the controller in the drive does the math to locate the sector, since it knows the drive's geometry.
This programmed controller is responsible for powering up the drive, controlling its speed, monitoring its temperature, calculating the cylinder and track number, positioning the head assembly accordingly, switching the heads, reading and verifying the track address, locating the requested sector in the track, and preparing to read or write the data. It also buffers the data going in and out of the drive to absorb differences in transfer speed, and it detects and corrects data errors.
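The "math" behind LBA can be illustrated with the classic fixed-geometry conversion between a block number and a cylinder/head/sector address. This is only a sketch: it assumes every track holds the same number of sectors, which is not true of modern drives that vary the sector count across the platter, and the geometry values in the example are arbitrary.

```python
from typing import NamedTuple


class CHS(NamedTuple):
    cylinder: int
    head: int
    sector: int  # sectors are numbered from 1 by convention


def lba_to_chs(lba: int, heads_per_cylinder: int, sectors_per_track: int) -> CHS:
    """Convert a logical block number to cylinder/head/sector (fixed geometry)."""
    if lba < 0:
        raise ValueError("LBA must be non-negative")
    cylinder, rest = divmod(lba, heads_per_cylinder * sectors_per_track)
    head, sector_index = divmod(rest, sectors_per_track)
    return CHS(cylinder, head, sector_index + 1)


def chs_to_lba(chs: CHS, heads_per_cylinder: int, sectors_per_track: int) -> int:
    """Inverse of lba_to_chs."""
    return (chs.cylinder * heads_per_cylinder + chs.head) * sectors_per_track + (chs.sector - 1)


# Example geometry (assumed for illustration): 16 heads, 63 sectors per track.
for block in (0, 63, 1008):
    print(block, lba_to_chs(block, 16, 63))
```

Consecutive block numbers fill a track first, then move to the next head in the same cylinder, and only then to the next cylinder, so sequential blocks need as little head movement as possible.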
Bare Drive
At this point, we have an empty disk drive, just as it comes from the factory. The only things on it are the track and sector address markers and the servo control data on 1 platter surface, which the controller uses for head positioning. No OS. No file system. Nothing. Just some incredible engineering and precision assembly.
File System
When you format the drive, you build a file system on it. Windows supports NTFS and the FAT family. Mac OS supports HFS+ and FAT. Linux supports the ext family, FAT and more. Each has its advantages and shortcomings. Each OS and file system allocates file space and deals with file fragmentation differently. All of them can develop fragmented files.
Why Fragments?
Most programs write system logs, temporary files or user files. Fragmenting results from non-contiguous space allocation. Only files larger than 1 allocation can become fragmented. File space is allocated as needed while a file is written, from the next available free space. The file's directory entry keeps a list of the file's allocations.
The following discussion is based primarily on the FAT file system, developed by Microsoft, as it is so commonly used. Other systems must perform similar functions, but differ in their methods.
Case 1: New file written to empty disk - As each 4K of data is written, space for it is allocated from the next available free space. Since the disk is empty, the allocations are contiguous. No fragmenting will occur.
Case 2: Updated file is rewritten - The file data up to the next 4K boundary is rewritten into the space already allocated. Data that does not fit in the existing space is placed in newly allocated space as needed. Fragmenting may occur.
Case 3: Longer updated file rewritten after another file was written - As in case 2, the first part of the file is written into its allocated space and the rest into newly allocated space. The difference is that another file has been allocated the space following the first file. Now the new allocations for the extra part of the first file cannot be contiguous with the rest, and the file becomes fragmented.
Case 4: Interference - Many tasks are active concurrently and some of them write files. One of these tasks may sneak in and write a file that needs a space allocation while you are writing your file. That forces a fragmentation, possibly of both files. One example occurs when writing a very large file: the allocation list in the directory may become full, and a secondary list must be created. In this case, your space allocation request creates its own request. When this happens, fragmentation always occurs.
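The cases above can be reproduced with a toy model. The sketch below treats the disk as a list of 4K clusters and always hands out the lowest-numbered free clusters, a simplified stand-in for "next available free space". It is not the behavior of any particular file system, and the helper names are invented for this illustration.

```python
def allocate(disk, name, count):
    """Give `name` the lowest-numbered free clusters; return their indexes."""
    free = [i for i, owner in enumerate(disk) if owner is None]
    if len(free) < count:
        raise RuntimeError("disk full")
    for i in free[:count]:
        disk[i] = name
    return free[:count]


def fragments(disk, name):
    """Count the separate contiguous runs of clusters owned by `name`."""
    runs, previous = 0, None
    for i, owner in enumerate(disk):
        if owner == name:
            if previous is None or i != previous + 1:
                runs += 1
            previous = i
    return runs


# Cases 1 and 3: A is written to an empty disk, B follows, then A grows.
disk = [None] * 16
allocate(disk, "A", 3)         # case 1: contiguous, one fragment
allocate(disk, "B", 2)         # B takes the space right after A
allocate(disk, "A", 2)         # case 3: A's new clusters land after B
print(fragments(disk, "A"), fragments(disk, "B"))

# Case 4: two writers take turns requesting one cluster at a time.
shared = [None] * 16
for _ in range(3):
    allocate(shared, "A", 1)
    allocate(shared, "B", 1)
print(fragments(shared, "A"), fragments(shared, "B"))
```

In this model, file A ends up in two fragments in the first scenario, and the interleaved writers leave both files in three fragments each.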
Space Allocation
One scheme maintains a list of free space areas ordered by increasing size. Allocation is made from the smallest area large enough to satisfy the request. As files are deleted, their space is added to the free space list. The freed space is immediately available for reuse, so the deleted files may be overwritten.
Another scheme always allocates from the original free space pool. As files are deleted, they are marked as deleted, but their space remains allocated. Deleted files may be recoverable. Free space from deleted files is recovered by a cleanup utility. When the free space pool is depleted, the system either stops or initiates a garbage collection routine to recover free space.
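The first scheme is commonly called best fit. A minimal sketch, assuming free areas are tracked as (length, start) pairs kept sorted by length; the class and method names are invented here, and for brevity the sketch does not merge adjacent free areas when space is released.

```python
import bisect


class BestFitAllocator:
    """Free areas are kept as (length, start) tuples, sorted by increasing length."""

    def __init__(self, total_clusters):
        self.free = [(total_clusters, 0)]

    def allocate(self, count):
        """Take `count` clusters from the smallest free area that is large enough."""
        # (count, -1) sorts before any real area of the same length.
        i = bisect.bisect_left(self.free, (count, -1))
        if i == len(self.free):
            raise RuntimeError("no free area large enough")
        length, start = self.free.pop(i)
        if length > count:
            # Return the unused tail of the area to the free list.
            bisect.insort(self.free, (length - count, start + count))
        return start

    def release(self, start, count):
        """Add a deleted file's space back to the free list (no merging)."""
        bisect.insort(self.free, (count, start))


pool = BestFitAllocator(100)
a = pool.allocate(10)      # clusters 0-9
b = pool.allocate(20)      # clusters 10-29
c = pool.allocate(5)       # clusters 30-34
pool.release(a, 10)        # delete the first file
pool.release(c, 5)         # delete the third file
print(pool.allocate(8))    # reuses the 10-cluster hole rather than the big pool
print(pool.free)
```

Because freed areas are reused right away, the data of a deleted file can be overwritten by the very next allocation, which is the trade-off the text describes.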
Fragmentation Effects
The effect of file fragmentation is to increase the time needed to read or write a file. If the next allocation is in the same cylinder, the delay is the rotational time to reach the sector. If the next allocation is in another cylinder, the time to move the head assembly to that cylinder must be added to the rotational delay. Moving the head assembly is the most time consuming operation, taking several milliseconds, depending on how many cylinders are passed. Rotational delay ranges from tens of microseconds up to one full revolution (about 8.3 ms at 7200 rpm), depending on rotational speed and how far away the needed sector is. Clearly, if file fragments are scattered about the drive, the time to read or write increases rapidly.
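A rough estimate shows how these delays add up. The sketch below charges each fragment an average rotational delay of half a revolution plus, when the fragment is in a different cylinder, one seek. The 9 ms seek time is an assumed illustrative value, not a measured one, and data transfer time is ignored.

```python
def revolution_ms(rpm):
    """Time for one full platter revolution, in milliseconds."""
    return 60_000 / rpm


def average_rotational_delay_ms(rpm):
    """On average the needed sector is half a revolution away."""
    return revolution_ms(rpm) / 2


def positioning_overhead_ms(fragments, rpm, seek_ms, same_cylinder=False):
    """Estimated time spent just reaching the start of every fragment."""
    per_fragment = average_rotational_delay_ms(rpm)
    if not same_cylinder:
        per_fragment += seek_ms
    return fragments * per_fragment


SEEK_MS = 9.0  # assumed average seek time, for illustration only

for count in (1, 10, 50):
    print(count, round(positioning_overhead_ms(count, 7200, SEEK_MS), 1))
```

With these assumptions a contiguous file costs about 13 ms of positioning on a 7200 rpm drive, while the same file in 50 scattered fragments costs about 658 ms before any data is transferred.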
Defragmenting
Defragmenting is the process that rearranges the file fragments on the disk so that each file is contiguous. This does not imply that files are placed in any particular position relative to each other, or that there are no gaps between files. Defragmenting a file system is a disk intensive and time consuming operation, and great efforts are made to reduce the time taken. Some programs treat empty space as a file and defragment it after defragmenting the other files, thus coalescing the free space into one large pool. Other programs leave some empty space between the defragmented files rather than take the time to move them. Some defraggers run silently in the background. Others are standalone and provide lots of statistics.
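The variant that also coalesces free space can be sketched on a toy disk model, a list in which each entry names the file that owns that cluster (None means free). The sketch only tracks ownership. A real defragmenter must also preserve the order of data within each file and keep the file system consistent while moving clusters, which is not modeled here.

```python
def defragment(disk):
    """Return a layout with every file contiguous and all free space at the end."""
    # Keep files in the order in which they first appear on the disk.
    order = []
    for owner in disk:
        if owner is not None and owner not in order:
            order.append(owner)

    packed = []
    for name in order:
        packed.extend([name] * disk.count(name))

    # Whatever is left over becomes one large free pool.
    packed.extend([None] * (len(disk) - len(packed)))
    return packed


before = ["A", "A", "B", None, "A", "B", None, "C"]
after = defragment(before)
print(after)
```

Every cluster that changes position in this model stands for a read and a write on the real disk, which is why the operation is so disk intensive.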
Summary
Most activities write files. Most files will become fragmented at some time. The effects of fragmentation depend on the size of the files, how often they are updated, and the capacity and performance of the drive. The number of fragmented files does not affect the time to read or rewrite unfragmented files. The presence of fragmentation increases the probability of more fragmentation. Fragmentation delays only affect disk operations. They have no effect on other activities.