1 / 72

Chapter 14: Mass-Storage Systems 海量存储器系统

Chapter 14: Mass-Storage Systems 海量存储器系统. 14.1 Disk Structure 磁盘结构 14.2 Disk Scheduling 磁盘调度 14.3 Disk Management 磁盘管理 14.4 Swap-Space Management 交换空间管理 14.5 RAID Structure RAID 结构

johnda
Download Presentation

Chapter 14: Mass-Storage Systems 海量存储器系统

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Chapter 14: Mass-Storage Systems海量存储器系统 • 14.1 Disk Structure 磁盘结构 • 14.2 Disk Scheduling 磁盘调度 • 14.3 Disk Management 磁盘管理 • 14.4 Swap-Space Management 交换空间管理 • 14.5 RAID Structure RAID结构 • 14.6 Disk Attachment 磁盘连接 • 14.7 Stable-Storage Implementation 稳定存储实现 • 14.8 Tertiary Storage Devices 三级存储设备 • Operating System Issues 有关操作系统的问题 • Performance Issues 有关性能的问题 Operating System Concepts

  2. 14.1 Disk Structure磁盘结构 • Disk drives are addressed as large 1-dimensional arrays of logical blocks, where the logical block is the smallest unit of transfer. 磁盘设备是以一种逻辑块的一维大数组的形式编址的,这里的逻辑块是传输的最小单位。 • The 1-dimensional array of logical blocks is mapped into the sectors of the disk sequentially. 逻辑块的一维数组映射到磁盘上一些相连的扇区。 • Sector 0 is the first sector of the first track on the outermost cylinder. 0扇区是最外边柱面的第一个磁道的第一个扇区。 • Mapping proceeds in order through that track, then the rest of the tracks in that cylinder, and then through the rest of the cylinders from outermost to innermost. 数据首先都映射到一个磁道,其余的数据映射到同一柱面的其他磁道,然后按照从外向里的顺序映射到其余的柱面。 Operating System Concepts

  3. Disk Structure磁盘结构 • Constant linear velocity(CLV) (恒定线速度) • Density of bits per track is uniform • The farther a track is from the center of the disk, the greater its length, so the more sectors it can hold. • 磁头越往中心移动,转速越快,以保持数据速率不变 • CD-ROM和DVD-ROM采用这种方法 • Constant angular velocity(CAV) (恒定角速度) • 磁头转速不变 • 为保持数据速率不变,从中心往外,数据密度由大变小 • 硬盘等采用这种方法 • Low-level formatted • Block size:512 bytes or 1024 Operating System Concepts

  4. 14.2 Disk Scheduling磁盘调度 • The operating system is responsible for using hardware efficiently — for the disk drives, this means having a fast access time and disk bandwidth. 操作系统任务就是高效地使用硬件——对于磁盘设备,这意味着很短的访问时间和磁盘带宽。 • Access time has two major components 访问时间包括两个主要部分 • Seek time is the time for the disk are to move the heads to the cylinder containing the desired sector. 寻道时间是指把磁头移到所需柱面的时间。 • Rotational latency is the additional time waiting for the disk to rotate the desired sector to the disk head.旋转延迟是指将磁头旋转到磁盘上指定扇区所需的等待时间。 Operating System Concepts

  5. Disk Scheduling (Cont.) • Minimize seek time 最小寻道时间 • Seek time  seek distance 寻道时间  寻道距离 • Disk bandwidth is the total number of bytes transferred, divided by the total time between the first request for service and the completion of the last transfer. 磁盘带宽,是用传输的总位数,除以第一个服务请求与最后传输完成之间的总时间。(也就是单位时间内数据传输量) Operating System Concepts

  6. Disk Scheduling (Cont.) • Several algorithms exist to schedule the servicing of disk I/O requests. 有几种磁盘I/O请求的服务调度算法 • We illustrate them with a request queue (0-199). 98, 183, 37, 122, 14, 124, 65, 67 Head pointer 53 Operating System Concepts

  7. 14.2.1 FCFS Scheduling先来先服务调度 • Simplest form. Illustration shows total head movement of 640 cylinders.如下图所示,磁头总共移动了640个柱面的距离。 Operating System Concepts

  8. 14.2.2 SSTF Scheduling最短寻道时间优先调度 • Shortest-seek-time-first(SSTF) algorithm • Selects the request with the minimum seek time from the current head position. 选择从当前磁头位置所需寻道时间最短的请求。 • SSTF scheduling is a form of shortest-job-first(SJF) scheduling; SSTF是SJF调度的一种形式; • may cause starvation of some requests. 有可能引起某些请求的饥饿。 • Illustration shows total head movement of 236 cylinders.如图所示,磁头移动的总距离是236柱面。 • It’s not optimal, 最佳磁头移动的总距离是208柱面 Operating System Concepts

  9. SSTF (Cont.) Operating System Concepts

  10. 14.2.3 SCAN Scheduling扫描调度 • The disk arm starts at one end of the disk, and moves toward the other end, servicing requests until it gets to the other end of the disk, where the head movement is reversed and servicing continues. 磁头从磁盘的一端开始向另一端移动,沿途响应访问请求,直到到达了磁盘的另一端,此时磁头反向移动并继续响应服务请求。 • Sometimes called the elevator algorithm. 有时也称为电梯算法。 • Illustration shows total head movement of 236 cylinders. 如图所示,磁头移动的总距离是236柱面。 Operating System Concepts

  11. SCAN (Cont.) Operating System Concepts

  12. 14.2.4 C-SCAN Scheduling • Circular SCAN(C-SCAN) scheduling provides a more uniform wait time than SCAN. 提供比扫描算法更均衡的等待时间。 • The head moves from one end of the disk to the other. servicing requests as it goes. When it reaches the other end, however, it immediately returns to the beginning of the disk, without servicing any requests on the return trip. 磁头从磁盘的一端向另一端移动,沿途响应请求。当它到了另一端,就立即回到磁盘的开始处,在返回的途中不响应任何请求。 • Treats the cylinders as a circular list that wraps around from the last cylinder to the first one. 把所有柱面看成一个循环的序列,最后一个柱面接续第一个柱面。 • 磁头移动的总距离是183(*在从一边到另一边的变化过程中不接受任何请求)柱面。 Operating System Concepts

  13. C-SCAN (Cont.) Operating System Concepts

  14. 14.2.5 LOOK/C-LOOK Scheduling • SCAN和C-SCAN总是将磁臂在整个盘面宽度上移动,其实这样做并不实用。 • SCAN和C-SCAN的一种改进形式:LOOK和C-LOOK。 • C-LOOK: Arm only goes as far as the last request in each direction, then reverses direction immediately, without first going all the way to the end of the disk. 磁臂在每个方向上仅仅移动到最远的请求位置,然后立即反向移动,而不需要移动到磁盘的一端。 • 磁头移动的总距离是153(*在从一边到另一边的变化过程中不接受任何请求)柱面。 Operating System Concepts

  15. C-LOOK (Cont.) Operating System Concepts

  16. 14.2.6 Selecting a Disk-Scheduling Algorithm选择一种磁盘调度算法 • SSTF is common and has a natural appeal SSTF比较通用,性能一般。 • SCAN and C-SCAN perform better for systems that place a heavy load on the disk. SCAN和C-SCAN在磁盘重负载的系统中性能较好。 • Performance depends on the number and types of requests. 性能依赖于请求的数量和类型。 • Requests for disk service can be influenced by the file-allocation method. 磁盘服务请求受到文件定位方式的影响。 • 目录和索引块的位置(最里面、最外面或中间柱面)也很重要。 • 上述算法仅考虑seek time, 而没有考虑 rotational latency Operating System Concepts

  17. Selecting a Disk-Scheduling Algorithm (Cont.) • The disk-scheduling algorithm should be written as a separate module of the operating system, allowing it to be replaced with a different algorithm if necessary. 磁盘调度算法应该写成操作系统中的一个独立模块,在必要的时候允许用不同的算法来替换。 • Either SSTF or LOOK is a reasonable choice for the default algorithm. SSTF和LOOK都是缺省算法的合理选择。 Operating System Concepts

  18. 磁盘I/O调度策略 • 来自不同进程的磁盘I/O请求构成一个随机分布的请求队列。磁盘I/O调度的主要目标就是减少请求队列对应的平均柱面定位时间。 • 先进先出算法 • 优先级算法 • 后进先出算法 • 短查找时间优先算法 • 扫描(SCAN)算法 • 循环扫描(C-SCAN)算法 • N步扫描(N-step-SCAN)算法 • 双队列扫描(FSCAN)算法 Operating System Concepts

  19. 先进先出(FIFO, First In First Out)算法:磁盘I/O执行顺序为磁盘I/O请求的先后顺序。 • 该算法的特点是公平性;在磁盘I/O负载较轻且每次读写多个连续扇区时,性能较好。 • 优先级算法:依据进程优先级来调整磁盘I/O请求的执行顺序。 • 该算法反映进程在系统的优先级特征,目标是系统目标的实现,而不是改进磁盘I/O性能。 • 后进先出(LIFO, Last In First Out)算法:后产生的磁盘I/O请求,先执行。 • 该算法是基于事务系统中顺序文件中磁盘I/O的局部性特征,相邻访问的位置也相邻。 • 它的问题在于系统负载重时,可能有进程的磁盘I/O永远不能执行,处于饥饿状态。 Operating System Concepts

  20. 短查找时间优先(SSTF, Shortest Service Time First)算法:考虑磁盘I/O请求队列中各请求的磁头定位位置,选择从当前磁头位置出发,移动最少的磁盘I/O请求。 • 该算法的目标是使每次磁头移动时间最少。它不一定是最短平均柱面定位时间,但比FIFO算法有更好的性能。 • 对中间的磁道有利,可能会有进程处于饥饿状态。 • 扫描(SCAN)算法:选择在磁头前进方向上从当前位置移动最少的磁盘I/O请求执行,没有前进方向上的请求时才改变方向。 • 该算法是对SSTF算法的改进,磁盘I/O较好,且没有进程会饿死。 Operating System Concepts

  21. 循环扫描(C-SCAN)算法:在一个方向上使用扫描算法,当到达边沿时直接移动到另一沿的第一个位置。循环扫描(C-SCAN)算法:在一个方向上使用扫描算法,当到达边沿时直接移动到另一沿的第一个位置。 • 该算法可改进扫描算法对中间磁道的偏好。实验表明,该算法在中负载或重负载时,磁盘I/O性能比扫描算法好。 • N步扫描(N-step-SCAN)算法:把磁盘I/O请求队列分成长度为N的段,每次使用扫描算法处理这N个请求。当N=1时,该算法退化为FIFO算法。 • 该算法的目标是改进前几种算法可能在多磁头系统中出现磁头静止在一个磁道上,导致其它进程无法及时进行磁盘I/O。 • 双队列扫描(FSCAN)算法:把磁盘I/O请求分成两个队列,交替使用扫描算法处理一个队列,新生成的磁盘I/O请求放入另一队列中。 • 该算法的目标与N步扫描算法一致。 Operating System Concepts

  22. 14.3 Disk Management磁盘管理 • Disk initialization • Low-level formatting, or physical formatting磁盘低级格式化,或物理格式化 • Booting from disk • Boot blocks • Bad-block recovery • Bad blocks Operating System Concepts

  23. 14.3.1 Disk Format • Low-level formatting, or physical formatting — Dividing a disk into sectors that the disk controller can read and write. 低级格式化,或物理格式化——把磁盘划分成扇区,以便磁盘控制器可以进行读写。 • 扇区的数据结构:header + data area(usually 512 bytes) + trailer • An error-correcting code(ECC) included in header • ECC在写数据时计算产生, 在读数据时校验,并有可能纠正错误 Operating System Concepts

  24. Disk Format(Cont.) • To use a disk to hold files, the operating system still needs to record its own data structures on the disk. 为了使用磁盘保存文件,操作系统还需要在磁盘上保存它自身的数据结构。 步骤: • Partition the disk into one or more groups of cylinders.把磁盘划分成一组或多组柱面。 • Logical formatting or “making a file system”. 逻辑格式化或“创建文件系统”。 • 使用磁盘系统有两种方法: • Raw I/O (raw disk access 直接访问磁盘) • Via regular file system services and I/O Operating System Concepts

  25. 14.3.2 Boot Block 启动块 • Boot block initializes system. 启动块初始化系统 • The bootstrap is stored in ROM. 引导程序存储在ROM中 • The full bootstrap program is stored in a partition called the boot blocks. 完整的引导程序在磁盘的被称为引导块的分区上(系统盘,boot disk / system disk ) • Bootstrap loader program. 引导程序装载程序。 Operating System Concepts

  26. Fig 14.6 MS-DOS Disk Layout Operating System Concepts

  27. 14.3.3 Bad Block 坏块 • 简单磁盘系统(如 IDE接口) • 通过命令方式处理坏块 • 如 MS-DOS 的 format 和 chkdsk 命令 • 复杂磁盘系统(如 SCSI接口) • 通过sector sparing or forwarding的方式处理坏块: • 将坏块重定向到系统保留的空闲块上 • 空闲块在磁盘的某个用户不可见的位置,或在每个柱面的某个位置以减少数据移动距离(seek time) • 通过sector slipping 的方式处理坏块(整体移动): • 假如 17#扇区坏,随后第一个刻有扇区为202# • 则处理方法为: 201202,200201,…,1819,1718 • 坏块未必一定能够被恢复 Operating System Concepts

  28. 14.4 Swap-Space Management交换空间管理 • Swap-space — Virtual memory uses disk space as an extension of main memory. 交换空间——虚拟内存使用磁盘空间作为对主存的扩展。 使用交换空间的主要目的:提高系统吞吐量 • 一个系统中,交换空间的大小可以是几兆~几个Gbytes • Some OS, such as Unix, allow the use of multiple swap spaces • It’s safer to overestimate than to underestimate swap space. Waste some disk space, but does not other harm. Operating System Concepts

  29. Swap-Space Management交换空间管理 • Swap-space Location • can be carved out of the normal file system, 交换空间可以与常规的文件系统一起使用 • 容易实现 • 效率低 =〉改进:cache block location information in main memory • or, more commonly, it can be in a separate disk partition. 或者,更通常的情况是放在一个单独的磁盘分区里。 • Use algorithm optimized for speed, not for storage efficiency • Add more swap space via repartitioning of the disk • 有些操作系统(如 Solaris 2)比较灵活,既可利用raw partitions,也可利用file-system space作为交换空间 Operating System Concepts

  30. Swap-Space Management (Cont.) • Swap-space management 交换空间管理 • 4.3BSD allocates swap space when process starts; holds text segment (the program) and data segment. 4.3BSD在进程开始时分配交换空间;保存正文段(程序)和数据段。 • 交换是通过在连续的磁盘区域和内存之间copy整个进程来实现的 • Kernel uses swap maps to track swap-space use. OS核心使用交换映像跟踪交换空间的使用情况。 • Solaris 2 allocates swap space only when a page is forced out of physical memory, not when the virtual memory page is first created. Solaris 2 仅在一页被交换出物理内存的时候分配交换空间,而不是在虚拟内存页最初生成的时候。 Operating System Concepts

  31. Fig 14.7 4.3 BSD Text-Segment Swap Map • Fixed size (512KB), 最后一块除外(最后一块1KB为分配增量) Operating System Concepts

  32. Fig14.8 4.3 BSD Data-Segment Swap Map Index i= 0 1 2 3 4 … • 分配较复杂,因为数据段是动态增长的 • Map is fixed size. Given index i (map entry number), the block size pointed by i is 2i x 16KB, maximum of 2MB • When a process tries to grow its data segment beyond the final allocated block in its swap area, the operating system allocates another block, twice as large as the previous one. Operating System Concepts

  33. 14.5 RAID Structure RAID结构 • RAID – Redundant Arrays of Inexpensive Disk • multiple disk drives provides reliability via redundancy. • Inexpensive  Independent • RAID is arranged into six different levels. • Mean time to failure (平均故障时间) • Mean time to repair (平均修复时间) • Mean time to data loss (平均数据丢失时间) Operating System Concepts

  34. RAID (cont) • Several improvements in disk-use techniques involve the use of multiple disks working cooperatively. • Improvement of Reliabilityvia Redundant 通过冗余提高可靠性 • Reliability  Redundancy • Simplest (but most expensive) approach is to duplicate every disk, called mirroring (or shadowing) • RAID schemes improve performance and improve the reliability of the storage system by storing redundant data. • Mirroring or shadowing keeps duplicate of each disk. • Block interleaved parity(块交叉存取校验) uses much less redundancy. Operating System Concepts

  35. RAID (cont) • Improvement of Performance via Parallelism 通过并行性提高性能 • With multiple disks, we can improve the transfer rate as well by striping data (bit-level or block-level) across multiple disks • Disk datastriping uses a group of disks as one storage unit. • Bit-level striping: • e.g. 4 disks, bits i and 4+i each byte go to disk i • Block-level striping • e.g. n disks, block i of a file goes to disk (i mod n)+1 • Goals: • Increase the throughput of multiple small accesses by load balancing • Reduce the response time of large accesses Operating System Concepts

  36. Fig 14.9 RAID Levels • P– error-correcting bits • C– a second copy of the data Operating System Concepts

  37. RAID Levels • RAID Level 0: disk arrays with striping at the level of blocks, but without any redundancy • RAID Level 1: disk mirroring • RAID Level 2: memory-style error-correcting code(ECC) organization • All single-bit errors are detected by the memory system • Error-correcting schemes store two or more extra bits • Many extra disks, so it’s not used in practice Operating System Concepts

  38. RAID Levels • RAID Level 3: bit-interleaved parity (位交叉存取校验) organization • 如果有某一个扇区坏了,可以确切地知道是哪一个;并且可以通过另一磁盘上的相应数据计算出该坏块上的每一位是0或是1 • 所需的冗余硬盘比RAID Level 2少 • 性能问题:计算和写校验数据需要时间 • 改进:使用NVRAM(non-volatile RAM) 或 Cache • RAID Level 4: block-interleaved parity (块交叉存取校验) organization • 与RAID Level 3相类似, 但在另一硬盘上保存了校验块(not bit) • Block-level striping • 每个硬盘上按块访问,并行性比较好 • 问题:一次写操作,需要访问磁盘4次(2次读老的块<数据块和校验块>,2次写新的块) , 也即:read-modify-write Operating System Concepts

  39. RAID Levels • RAID Level 5: block-interleaved distributed parity (块交叉分布式存取校验) • 与RAID Level 4的区别:将数据和校验分布在所有N+1个磁盘上,而不是将数据写在N个盘上,将校验写在1个盘上 • 例如:磁盘阵列有5个硬盘构成,则第n个数据块的校验信息存放在第(n mod 5)+1个盘上,而第n块的实际数据分布在另外4个磁盘上 • 改进:与RAID Level 4相比,可以避免校验盘访问过频 • RAID Level 6: P+Q redundancy scheme • 与RAID Level 5的相类似,但存放更多冗余信息,以防多个磁盘失效 • 使用Error-correcting code,而不是parity,因此需要更多的冗余信息 Operating System Concepts

  40. Fig 14.10 RAID (0 + 1) and (1 + 0) Operating System Concepts

  41. RAID Levels • RAID Level 0+1: a combination of RAID 0 and 1 • RAID 0 提供性能(performance) • RAID 1 提供可靠性(reliability) • Better performance than RAID 5 • A set of disks are striped, and then the stripe is mirrored to another, equivalent stripe • RAID Level 1+0: disks are mirrored in pairs, and then the resulting mirror pairs are striped • Have advantages over RAID 0+1 theoretically Operating System Concepts

  42. RAID Levels • RAID 1+0 have advantages over RAID 0+1 theoretically, for example: • If a single disk fails in RAID 0+1, the entire stripe is inaccessible, leaving only the other stripe available • With a failure in RAID 1+0, the single disk is unavailable, but its mirrored pair is still available as are all the rest of the disks Operating System Concepts

  43. 14.5.4 Selecting a RAID level • RAID Level 0: used in high-performance applications • RAID Level 1: used in high-reliability with fast recovery • RAID Level 0+1 and 1+0: used in high-performance and reliability with fast recovery • RAID Level 5: preferred for storing large volumes of data • Hot spare disk 热备份磁盘: is not used for data, but is configured to be used as a replacement should any other disk fail. • Allocating more than one hot spare allows more than one failure to be repaired without human intervention Operating System Concepts

  44. 14.5.5 Extensions • The concepts of RAID have generalized to other storage devices, including arrays of tapes and even to the broadcast of data over wireless systems • Tape-drive robots Operating System Concepts

  45. 14.6 Disk Attachment磁盘连接 • Disks may be attached one of two ways: 1.Host attached via an I/O port -- IDE, ATA and SCSI 2. Network attached via a network connection Operating System Concepts

  46. Fig 14.11 Network-Attached Storage Operating System Concepts

  47. Fig 14.12 Storage-Area Network Operating System Concepts

  48. SAN • SAN is a private network using storage protocols rather than networking protocols • 当前SAN 系统普遍存在的缺陷: • 协议不标准 • 设备的互操作性差 • 发展趋势: • 用IP (Gigabit Ethernet) 网络协议作为交换设备 Operating System Concepts

  49. 14.7 Stable-Storage Implementation稳定存储实现 • Write-ahead log scheme requires stable storage.向前写日志系统需要稳定存储。 • To implement stable storage:为了实现稳定存储 • Replicate information on more than one nonvolatile storage media with independent failure modes. 在多个非易失性存储介质上备份信息,这些介质具有不同的故障方式。 • Update information in a controlled manner to ensure that we can recover the stable data after any failure during data transfer or recovery. 以一种有控制的方式更新信息,以便确保在数据传输或修复的过程中发生错误以后我们能够恢复稳定的数据。 Operating System Concepts

  50. 14.8 Tertiary Storage Structure三级存储结构 • Low cost is the defining characteristic of tertiary storage.三级存储的定义特征是低成本。 • Generally, tertiary storage is built using removable media通常,三级存储由可移动介质构成。 • Common examples of removable media are floppy disks、CD-ROMs, and tapes; other types are available. 通常的可移动介质的例子是软盘、光盘和磁带;其他还有一些类型。 Operating System Concepts

More Related