Chapter 3 Secondary Storage Objectives: To get familiar with: • Storage and access of data on disk • Storage and access of data on tape • Storage and access of data on CD-ROM • Buffer management
Outline • Disk organization and capacity • Disk access • Tape • CD-ROM • A journey of a byte • Buffer management
Disks • Serial device: permits serial data access only. Example: magnetic tapes. • Direct access storage device (DASD): permits direct data access. Examples: magnetic disks, optical disks. • Hard disk: high capacity, low cost. Usually fixed inside a hard disk drive attached to the computer system. • Floppy disk: small capacity, low cost. Usually removable from a floppy disk drive. • CD-ROM: read only, higher capacity than a floppy disk, low cost. Usually removable from a CD-ROM drive. • Some compact disks are writable (CD-RW).
Disk Organization -- Platters • Information on a disk is stored on the surfaces of one or more platters.
Disk Organization -- Tracks and Sectors • The information is stored in successive tracks on the surface of the disk. • Each track is divided into a number of sectors. • A sector is the smallest addressable unit of a disk. • When a READ( ) statement fetches a particular byte from the disk, the entire sector containing that byte is loaded into a special area of RAM called a buffer.
Disk Organization -- Cylinders • The tracks that are directly above and below one another form a cylinder. • Data on a single cylinder can be accessed without moving the arm. • Moving the arm is called seeking. The arm movement is the slowest part of reading data from a disk.
Disk Capacities • Disks range in width from 2 to 14 inches; 3.5” is the most common. • The capacity of a disk ranges from several megabytes to several hundred gigabytes. • Each platter can store data on both sides, called surfaces. • The number of surfaces is twice the number of platters. • The number of cylinders is the same as the number of tracks on a single surface. • The bit density on a track determines how much data the track can hold. The bit density depends on the quality of the recording medium and the size of the read/write head. • A low-density disk can hold about 4 KB on a track and 35 tracks on a surface. • A top-of-the-line disk can hold more than 1 MB on a track and more than 10,000 tracks on a surface (that is, more than 10,000 cylinders).
Disk Capacities (cont’d) • Disk drive capacity: • track capacity = number of sectors per track × bytes per sector • number of tracks per cylinder = 2 × number of platters • cylinder capacity = number of tracks per cylinder × track capacity • drive capacity = number of cylinders × cylinder capacity • Example: suppose a disk has the following specification: number of bytes per sector = 512; number of sectors per track = 256; number of platters = 12; number of cylinders = 8192. Then: • track capacity = 512 × 256 bytes = 128 KB • number of tracks per cylinder = 2 × 12 = 24 • cylinder capacity = 24 × 128 KB = 3 MB • total disk capacity = 8192 × 3 MB = 24 GB
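The capacity formulas on this slide can be checked with a few lines of Python. This is a minimal sketch: the function name `drive_capacity` and its parameter names are illustrative and do not come from the slides.

```python
def drive_capacity(bytes_per_sector, sectors_per_track, platters, cylinders):
    """Return (track, cylinder, drive) capacities in bytes."""
    track = bytes_per_sector * sectors_per_track   # bytes on one track
    tracks_per_cylinder = 2 * platters             # two surfaces per platter
    cylinder = tracks_per_cylinder * track         # bytes in one cylinder
    drive = cylinders * cylinder                   # bytes on the whole drive
    return track, cylinder, drive


# Figures taken from the example on the slide.
track, cylinder, drive = drive_capacity(512, 256, 12, 8192)
print(track // 2**10, "KB per track")
print(cylinder // 2**20, "MB per cylinder")
print(drive // 2**30, "GB per drive")
```

With the slide's specification this should reproduce the 128 KB, 3 MB, and 24 GB figures.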
Disk Capacities (cont’d) • If the size of a file is known, the amount of disk space it needs can be calculated. Example: a file of 500,000 fixed-length data records with 256 bytes in each record. Using the disk from the previous slide (512 bytes per sector, 256 sectors per track, 24 tracks per cylinder): • A sector can hold 2 records. • The file needs 500,000/2 = 250,000 sectors, or 250,000/256 ≈ 977 tracks (rounded up), or 977/24 ≈ 41 cylinders (rounded up). • If the disk does not have 41 physically contiguous cylinders available, the file may be spread out over dozens or even hundreds of cylinders.
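The space calculation can be sketched as follows. The function name `space_needed` is hypothetical, and the sketch assumes that records never span sectors and that a record fits in one sector; the default geometry is the example disk from the previous slide.

```python
import math


def space_needed(n_records, record_size,
                 bytes_per_sector=512, sectors_per_track=256,
                 tracks_per_cylinder=24):
    """Return (sectors, tracks, cylinders) needed, each rounded up."""
    records_per_sector = bytes_per_sector // record_size  # whole records only
    sectors = math.ceil(n_records / records_per_sector)
    tracks = math.ceil(sectors / sectors_per_track)
    cylinders = math.ceil(tracks / tracks_per_cylinder)
    return sectors, tracks, cylinders


print(space_needed(500_000, 256))
```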
Track Organization -- by Sector • Two basic ways to organize data on a disk: • organizing tracks by sector, and • organizing tracks by user-defined block. • The physical placement of sectors: • physically adjacent sectors: for newer disks with faster data transfer rates • interleaved sectors: for disks with slow data transfer rates
Clusters • A cluster is a fixed number of logically contiguous sectors (not necessarily physically contiguous; the degree of physical contiguity is determined by the interleaving factor). • Once a cluster has been found on a disk, all sectors in that cluster can be accessed without additional seeks. • A file is viewed as a series of clusters of sectors, using a file allocation table (FAT) that lists all clusters in the logical order of the sectors they contain. • The system administrator can decide how many sectors a cluster contains.
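One way to picture the cluster view of a file is a table that links each cluster to the next one in logical order. The sketch below is an assumption made for illustration, not the layout of any particular file system: the dictionary-based `fat`, the `END_OF_FILE` marker, and both function names are hypothetical.

```python
END_OF_FILE = -1  # hypothetical end-of-chain marker


def cluster_chain(fat, first_cluster):
    """Follow the links from the first cluster; return clusters in logical order."""
    chain = []
    cluster = first_cluster
    while cluster != END_OF_FILE:
        chain.append(cluster)
        cluster = fat[cluster]
    return chain


def locate_sector(fat, first_cluster, logical_sector, sectors_per_cluster):
    """Map a logical sector number to (physical cluster, sector within cluster)."""
    chain = cluster_chain(fat, first_cluster)
    index, offset = divmod(logical_sector, sectors_per_cluster)
    return chain[index], offset


# A three-cluster file stored in physical clusters 5, 9, and 2 (in that order).
fat = {5: 9, 9: 2, 2: END_OF_FILE}
print(cluster_chain(fat, 5))
print(locate_sector(fat, 5, 10, 8))
```

The sketch assumes the table is consistent (no cycles); a real implementation would guard against a corrupted chain.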
Extents • An extent is a part of a file (possibly the whole file) stored in contiguous sectors, tracks, and cylinders. Its clusters are contiguous. • Storing a file as a single extent is possible if the disk has a lot of free space. • Such a file can be accessed with a minimum amount of seeking. • If there is not enough contiguous free space for a file, the file can be stored in two or more extents.
Fragmentation • Internal fragmentation is disk space that is allocated to a file but left unused, and that cannot be used by other files. • Example: store a file of 300-byte records on a disk with a sector size of 512 bytes. • Option 1: store one record per sector. This wastes 212 bytes in every sector, i.e., internal fragmentation. • Option 2: allow records to span two sectors. This saves disk space, but accessing a record may require retrieving two sectors. • If the number of bytes in a file is not a multiple of the cluster size, internal fragmentation will occur in the last extent of the file.
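The trade-off between the two record placements can be quantified with a short sketch. The function names and the choice of 1,000 records are illustrative assumptions; only the 300-byte record and 512-byte sector come from the slide.

```python
import math


def unspanned(n_records, record_size, sector_size=512):
    """Records never cross a sector boundary. Returns (sectors, wasted bytes)."""
    per_sector = sector_size // record_size
    sectors = math.ceil(n_records / per_sector)
    return sectors, sectors * sector_size - n_records * record_size


def spanned(n_records, record_size, sector_size=512):
    """Records are packed back to back. Returns (sectors, wasted bytes)."""
    total = n_records * record_size
    sectors = math.ceil(total / sector_size)
    return sectors, sectors * sector_size - total


print(unspanned(1000, 300))  # one record per sector
print(spanned(1000, 300))    # records may span two sectors
```

In the spanned layout the only waste is at the end of the last sector, at the cost of occasionally reading two sectors for one record.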
Track Organization -- by Block • Disk tracks can be divided into integral numbers of user-defined blocks. • Blocks can be of fixed or variable length. • In block organization, the amount of data transferred in a single I/O operation can vary. • A block organization does not have sector-spanning and fragmentation problems. Figure: (a) track organization by sectors; (b) track organization by blocks.
Track Organization -- by Block (cont’d) • A block may contain one or several records. • Each block is usually accompanied by one or more subblocks containing extra information about the data block. • Count subblock: holds the number of bytes in the accompanying data block. • Key subblock: holds the key of the last record in the data block. • When key subblocks are used, the disk controller can search a track for a block or record identified by a given key. • This search is more efficient than in sector-addressable schemes because the keys do not have to be loaded into primary memory.
Nondata Overhead • Preformatting overhead for sector-addressable disks: • stored at the beginning of each sector, including the sector address, the track address, and the condition (whether the sector is usable or defective). • Preformatting also involves placing gaps and synchronization marks between fields of information. • Nondata overhead for block-addressable disks: • subblocks and interblock gaps. • Block factor: number of bytes per track / block length. • In general, the greater the block factor, the better. • However, larger blocks have a higher potential for internal track fragmentation.
Disk Access Cost • Seek time: the time required to move the access arm to the correct cylinder; usually quoted as an average seek time. • Rotational delay: the time required to rotate the disk so that the desired sector is under the read/write head. • Maximum rotational delay: the time for one revolution. • Average rotational delay: half of the maximum rotational delay. • Transfer time: the time required to read the data from the disk: (number of bytes transferred / number of bytes on a track) × rotation time, or (number of sectors transferred / number of sectors on a track) × rotation time.
Disk Access Time • Suppose the previously mentioned disk rotates at 10,000 rpm (revolutions per minute), with: average seek time = 10 ms; average rotational delay = half a revolution = (1/2) × (1/10000) minute = 3 ms. • Suppose the previously mentioned file is stored as: Case 1. Random sectors, that is, we can read only one sector at a time. Case 2. Random clusters: each cluster has 8 sectors (4 KB). Case 3. One extent. • Determine the access time of the file for these three cases.
Disk Access Time (cont’d) • Case 1: assume the file is read sector by sector in random order. average seek 10.0 msec; rotational delay 3.0 msec; read one sector 0.023 msec, from (1/256) × (1/10000) min; total 13.023 msec per sector. Total time = 250,000 × 13.023 msec = 3255.75 seconds ≈ 54 minutes • Case 2: assume the file is read cluster by cluster in random order. average seek 10.0 msec; rotational delay 3.0 msec; read one cluster 0.187 msec, from (8/256) × (1/10000) min; total 13.187 msec per cluster. Total time = (250,000/8) × 13.187 msec = 412.09 seconds ≈ 6.9 minutes
Disk Access Time (cont’d) • Case 3: sequential access to one extent. average seek 10.0 msec × 41 cylinders = 410 msec; rotational delay 3 msec; read one extent (250,000/256) × (1/10000) min = 5859.4 msec. Total time = 410 + 3 + 5859.4 = 6272.4 msec ≈ 6.3 seconds • Conclusion • Seeking is the most expensive operation. Avoid seeking as much as possible. • Grouping data into larger units (e.g., clusters) can reduce access time. • Sequential access is much faster than random access.
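The three cases can be reproduced with a small sketch. The constant and function names are illustrative; the figures (10,000 rpm, 10 ms average seek, 256 sectors per track, 250,000 sectors, 41 cylinders) are the ones used in the example. Because the sketch does not round the per-read transfer time, its totals differ from the slide's in the last digits.

```python
ROTATION_MS = 60_000 / 10_000      # one revolution at 10,000 rpm = 6 ms
SEEK_MS = 10.0                     # average seek time
LATENCY_MS = ROTATION_MS / 2       # average rotational delay = 3 ms
SECTORS_PER_TRACK = 256


def random_access_ms(total_sectors, sectors_per_read):
    """Every read pays a seek, a rotational delay, and its own transfer time."""
    transfer = sectors_per_read / SECTORS_PER_TRACK * ROTATION_MS
    reads = total_sectors / sectors_per_read
    return reads * (SEEK_MS + LATENCY_MS + transfer)


def extent_access_ms(total_sectors, cylinders):
    """One seek per cylinder, one rotational delay, then a continuous transfer."""
    transfer = total_sectors / SECTORS_PER_TRACK * ROTATION_MS
    return cylinders * SEEK_MS + LATENCY_MS + transfer


case1 = random_access_ms(250_000, 1)    # random sectors
case2 = random_access_ms(250_000, 8)    # random 8-sector clusters
case3 = extent_access_ms(250_000, 41)   # one extent
for name, ms in (("case 1", case1), ("case 2", case2), ("case 3", case3)):
    print(f"{name}: {ms / 1000:.2f} s")
```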
Disk as Bottleneck • Disk is slow compared with memory, CPU, and high-speed networks. • A process is disk-bound when the CPU or network is waiting for disk I/O. The execution time of the process is bound by the disk access. • Possible solutions: • Multi-tasking: the CPU switches among processes. • Striping/RAID: using multiple disks for different parts of a file (parallelism). • Buffering
Tape • No direct access facility, but very rapid sequential access. • Compact, resistant to rough environmental conditions, easy to store and transport, and cheaper than disk. • Formerly used for application data. • Currently, tapes are primarily used as archival storage.
Organization of Data on Nine-Track Tapes • On a tape, the logical position of a byte within a file corresponds directly to its physical position relative to the start of the file. • The surface of a typical tape can be seen as a set of parallel tracks, each of which is a sequence of bits. The nine bits at a given position across the tracks correspond to 1 byte + a parity bit. • Each byte, with its parity bit, occupies a one-bit-wide slice of tape called a frame. • In odd parity, the parity bit is set so that the number of 1 bits in the frame is odd. This is done to check the validity of the data. • Frames are organized into data blocks of variable size, separated by interblock gaps (long enough to permit stopping and starting).
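Odd parity as described above can be sketched in a few lines. The function names are hypothetical, and a frame is modelled simply as a list of nine bits (eight data bits followed by the parity bit), which is an illustrative choice rather than the physical track order.

```python
def odd_parity_bit(byte):
    """Return the parity bit that makes the total number of 1 bits odd."""
    ones = bin(byte & 0xFF).count("1")
    return 0 if ones % 2 == 1 else 1


def make_frame(byte):
    """Eight data bits (most significant first) followed by the parity bit."""
    bits = [(byte >> i) & 1 for i in range(7, -1, -1)]
    return bits + [odd_parity_bit(byte)]


def frame_is_valid(frame):
    """A frame passes the check when it contains an odd number of 1 bits."""
    return sum(frame) % 2 == 1


frame = make_frame(0x41)       # 0100 0001 has two 1 bits, so parity is 1
print(frame, frame_is_valid(frame))

damaged = frame.copy()
damaged[3] ^= 1                # flip a single bit
print(damaged, frame_is_valid(damaged))
```

A single flipped bit is detected; two flipped bits in the same frame would pass the check unnoticed.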
Estimating Tape Length Requirements • Let b = the physical length of a data block, • let g = the length of an interblock gap, and • let n = the number of data blocks. • The space requirement, s, for storing the file is s = n × (b + g). • b = block size (i.e., bytes per block) / tape density (i.e., bytes per inch). • The number of records stored in a physical block is called the blocking factor. • Effective Recording Density, a general measure of the effect of choosing different block sizes: (number of bytes per block) / (number of inches required to store a block). • ==> Space utilization is sensitive to the relative sizes of data blocks and interblock gaps.
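The formula s = n × (b + g) can be tried with different blocking factors. In this sketch the function name is hypothetical and the figures (one million 100-byte records, 6250 bytes per inch, 0.3-inch gaps) are assumed for illustration only; it also assumes every block is full.

```python
import math


def tape_length_inches(n_records, record_size, blocking_factor,
                       density_bpi, gap_inches):
    """s = n * (b + g), with b = bytes per block / tape density."""
    n_blocks = math.ceil(n_records / blocking_factor)
    block_inches = blocking_factor * record_size / density_bpi
    return n_blocks * (block_inches + gap_inches)


# Assumed example figures: 1,000,000 records of 100 bytes, 6250 bpi, 0.3 in gaps.
one_per_block = tape_length_inches(1_000_000, 100, 1, 6250, 0.3)
fifty_per_block = tape_length_inches(1_000_000, 100, 50, 6250, 0.3)
print(f"blocking factor 1:  {one_per_block:,.0f} inches")
print(f"blocking factor 50: {fifty_per_block:,.0f} inches")
```

With one record per block almost all of the tape is interblock gap, which is why the blocking factor matters so much.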
Estimating Data Transmission Times • Nominal Data Transmission Rate = Tape Density (bpi) × Tape Speed (ips) • Interblock gaps, however, must be taken into consideration: Effective Transmission Rate = Effective Recording Density × Tape Speed • The blocking factor affects the effective transmission rate.
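The difference between the two rates can be sketched as follows. The function names are hypothetical and the drive figures (6250 bytes per inch, 200 inches per second, 0.3-inch gaps) are assumptions chosen for illustration.

```python
def effective_density(block_bytes, density_bpi, gap_inches):
    """Bytes per block divided by the inches needed to store block plus gap."""
    return block_bytes / (block_bytes / density_bpi + gap_inches)


def transmission_rates(block_bytes, density_bpi, gap_inches, speed_ips):
    """Return (nominal, effective) transmission rates in bytes per second."""
    nominal = density_bpi * speed_ips
    effective = effective_density(block_bytes, density_bpi, gap_inches) * speed_ips
    return nominal, effective


# Assumed drive: 6250 bpi, 200 ips, 0.3 in gaps; compare two block sizes.
for block_bytes in (100, 5000):
    nominal, effective = transmission_rates(block_bytes, 6250, 0.3, 200)
    print(f"{block_bytes}-byte blocks: nominal {nominal:,.0f} B/s, "
          f"effective {effective:,.0f} B/s")
```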
Disk versus Tape • In the past: • Both disks and tapes were used for secondary storage. Disks were preferred for random access and tape was better for sequential access. • Now: • Disks have taken over much of secondary storage ==> because of the decreased cost of disk and memory storage. • Tapes are used as tertiary storage (cheap, fast, and easy to use for streaming large files or sets of files between tape and disk).
CD-ROM • A single disc can hold more than 600 MB of data. • CD-ROM is a descendant of the audio CD. Listening to music is sequential and does not require fast random access to data. • CD-ROM is read only, i.e., it is a publishing medium rather than a data storage and retrieval medium like magnetic disks. The data cannot change ==> file organization can be optimized. • CD-ROM Strengths: • High storage capacity • Low price • Durability • CD-ROM Weaknesses: • Extremely slow seek performance (between half a second and a second) ==> intelligent file structures are critical.
Pits and Lands • CD-ROMs are stamped from a glass master disk which has a coating that is changed by a laser beam. When the coating is developed, the areas hit by the laser beam turn into pits along the track followed by the beam. The smooth unchanged areas between the pits are called lands. • Pits scatter light; lands reflect light. • 1s are represented by the transition from pit to land and back again. 0s are represented by the amount of time between transitions: the longer the time between transitions, the more 0s there are. • There must be at least two 0s between any pair of 1s. • Raw patterns of 1s and 0s have to be translated to get the 8-bit patterns of 1s and 0s that form the bytes of the original data. • EFM encoding (Eight-to-Fourteen Modulation) turns the original 8 bits of data into 14 expanded bits that can be represented in the pits and lands on the disc. • Since 0s are represented by the length of time between transitions, the disc must be rotated at a precise and constant speed. This affects the CD-ROM drive’s ability to seek quickly.
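The rule that at least two 0s must separate any pair of 1s can be expressed as a simple check on a bit pattern. This sketch only tests that constraint; it does not reproduce the EFM translation table, and the function name and the sample patterns are made up for illustration.

```python
def satisfies_min_run(bits, min_zeros=2):
    """True if every pair of consecutive 1s has at least min_zeros 0s between them."""
    last_one = None
    for i, bit in enumerate(bits):
        if bit == "1":
            if last_one is not None and i - last_one - 1 < min_zeros:
                return False
            last_one = i
    return True


for pattern in ("10010001000100", "10100000000000", "11000000000000"):
    print(pattern, satisfies_min_run(pattern))
```

Most raw 8-bit values contain adjacent 1s, so they cannot be written directly; this is why each byte is expanded into a longer pattern that respects the rule.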
CLV vs. CAV • Data on a CD-ROM is stored in a single spiral track. This allows the data to be packed as tightly as possible, since all sectors have the same physical size whether near the center or at the edge: constant linear velocity (CLV). • Since reading the data requires that it pass under the optical pickup at a constant rate, the disc has to spin more slowly when reading near the outer edge than when reading near the center. • The CLV format is responsible for the poor seek performance of CD-ROM drives: there is no straightforward way to jump to a location. Part of the problem is the need to change rotational speed. • To read the address information, the data must be moving under the optical pickup at the correct speed. But to adjust the speed, we need to read the address information. How do we break this loop? By guessing and by trial and error ==> this slows down performance. • Magnetic disk drives pack the data more densely near the center than at the edge: constant angular velocity (CAV). The disk spins at a constant rate, so data density is lower on the outer tracks. It is easy to find the start of a sector.
Addressing • Different from the “regular” disk method. • Each second of playing time on a CD is divided into 75 sectors. Each sector holds 2 kilobytes of data. Each CD-ROM contains at least one hour of playing time. The disc is capable of holding at least 60 min × 60 sec/min × 75 sectors/sec × 2 kilobytes/sector = 540,000 KBytes. • Often, it is actually possible to store over 600,000 KBytes. • Sectors are addressed by min:sec:sector, e.g., 16:22:34.
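The min:sec:sector scheme maps directly to a running sector count. The sketch below uses hypothetical function names and counts sectors from 00:00:00, ignoring any offset a real drive may apply.

```python
SECTORS_PER_SECOND = 75
KB_PER_SECTOR = 2


def address_to_sector(address):
    """Convert 'min:sec:sector' into a sector number counted from 00:00:00."""
    minute, second, sector = (int(part) for part in address.split(":"))
    return (minute * 60 + second) * SECTORS_PER_SECOND + sector


def sector_to_address(number):
    """Convert a sector number back into 'min:sec:sector'."""
    seconds, sector = divmod(number, SECTORS_PER_SECOND)
    minute, second = divmod(seconds, 60)
    return f"{minute:02d}:{second:02d}:{sector:02d}"


def capacity_kb(minutes):
    """Capacity in kilobytes for a given playing time in minutes."""
    return minutes * 60 * SECTORS_PER_SECOND * KB_PER_SECTOR


print(address_to_sector("16:22:34"))
print(sector_to_address(73684))
print(capacity_kb(60), "KB in one hour of playing time")
```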
A Journey of A Byte What happens when the program statement write(fd, &ch, 1) is executed? Part that takes place in memory: • The statement calls the operating system (OS), which oversees the operation. • The file manager (the part of the OS that deals with I/O): • checks whether the operation is permitted; • locates the physical location where the byte will be stored (drive, cylinder, track, and sector); • finds out whether the sector that will hold the character is already in memory (if not, the sector is read into an I/O buffer); • puts ‘P’ (the content of ch) in the I/O buffer; • keeps the sector in memory to see whether more bytes will be going to the same sector in the file.
A Journey of A Byte (Cont’d) Part that takes place outside of memory: • I/O processor: waits for an external data path to become available (the CPU is faster than the data paths ==> delays). • Disk controller: • The I/O processor asks the disk controller whether the disk drive is available for writing. • The disk controller instructs the disk drive to move its read/write head to the right track and sector. • The disk spins to the right location and the byte is written.
Buffer Management • What happens to data travelling between a program’s data area and secondary storage? • Buffering involves working with a large chunk of data in memory so that the number of accesses to secondary storage can be reduced. • How many buffers do we need? At least two: one for input and the other for output. • Moving data to or from disk is very slow, and programs may become I/O bound. • Buffering strategies: • Multiple buffering • Double buffering • Buffer pooling • Move mode: data is moved between the buffer and the program’s data area. • Locate mode: the program operates directly on the buffer. • Scatter/gather I/O: fill or empty multiple buffers with a single read or write.
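The basic benefit of buffering, fewer accesses to secondary storage, can be shown with a toy output buffer. This is a minimal sketch under stated assumptions: the class `SectorBufferedWriter` is hypothetical, a `bytearray` stands in for the disk, and a counter records how many device writes take place.

```python
class SectorBufferedWriter:
    """Collects bytes in memory and writes them out one sector at a time."""

    def __init__(self, sector_size=512):
        self.sector_size = sector_size
        self.buffer = bytearray()   # the in-memory output buffer
        self.device = bytearray()   # stands in for the disk
        self.device_writes = 0      # number of accesses to the "disk"

    def write(self, data):
        """Append data to the buffer; write out every full sector."""
        self.buffer.extend(data)
        while len(self.buffer) >= self.sector_size:
            self._write_out(self.sector_size)

    def flush(self):
        """Write out whatever is left in the buffer (a partial sector)."""
        if self.buffer:
            self._write_out(len(self.buffer))

    def _write_out(self, count):
        self.device.extend(self.buffer[:count])
        del self.buffer[:count]
        self.device_writes += 1


writer = SectorBufferedWriter()
for _ in range(1300):          # 1300 one-byte writes from the program
    writer.write(b"P")
writer.flush()
print(writer.device_writes, "device writes for", len(writer.device), "bytes")
```

Without the buffer, each of the 1300 one-byte writes would be a separate device access; with it, only full sectors and the final partial sector reach the device.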