
William Stallings, Computer Organization and Architecture, 5th Edition


Presentation Transcript


  1. William Stallings, Computer Organization and Architecture, 5th Edition • Chapter 11: CPU Structure and Function

  2. Topics • Processor Organization • Register Organization • Instruction Cycle • Instruction Pipelining • The Pentium Processor

  3. CPU Structure • The CPU must: • Fetch instructions (read them from memory) • Interpret instructions (decode them) • Fetch data (the operands an instruction needs) • Process data • Write data (send results to their destination) • To do this, the CPU needs a small internal memory to hold data and instructions temporarily

  4. CPU With Systems Bus

  5. CPU Internal Structure

  6. Registers • The CPU must have some working space (temporary storage), called registers • Their number and function vary between processor designs • One of the major design decisions • Top level of the memory hierarchy • Two categories: • User-visible registers • Control and status registers

  7. User-Visible Registers • General purpose registers • Data registers • Address registers • Condition code registers

  8. User-Visible Registers • General purpose registers • May be truly general purpose • May be restricted • Data registers • Accumulator • Address registers • Segment pointers • Index registers • Stack pointer

  9. General or Special? • Make them general purpose • Increases flexibility and programmer options • Increases instruction size and complexity • Make them specialized • Smaller (faster) instructions • Less flexibility • The trend seems to be toward the use of specialized registers

  10. How Many GP Registers? • Typically between 8 and 32 • Fewer = more memory references • More does not noticeably reduce memory references

  11. How Big? • Large enough to hold the largest address • Large enough to hold most data types • Often possible to combine two data registers to hold one longer value • C programming • double a; • long int a;

  12. Condition Code Registers • Condition codes are bits set by the CPU hardware as the result of operations • Sets of individual bits (flags) • e.g. result of last operation was zero • At least partially visible to the user • Can be read (implicitly) by programs • e.g. jump if zero • Cannot (usually) be set by programs

  13. Control & Status Registers • Program Counter (PC) • Instruction Register (IR) • Memory Address Register (MAR) • Memory Buffer Register (MBR) • Revision: what do these all do?

  14. Program Status Word (PSW) • A set of bits that includes the condition codes • Sign: sign bit of the last arithmetic result • Zero: set when the result is zero • Carry: set on a carry or borrow • Equal: set when a logical comparison finds equality • Overflow: indicates arithmetic overflow • Interrupt enable/disable • Supervisor: indicates whether the CPU is executing in supervisor mode or user mode

  15. Program Status Word - Example • Motorola 68000's PSW (16 bits, numbered 15 down to 0) • System byte (bits 15-8): T (trace mode, bit 15), S (supervisor status, bit 13), I2 I1 I0 (interrupt mask, bits 10-8) • User byte (bits 7-0): condition codes X, N, Z, V, C (bits 4-0)
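A minimal sketch of how software might pull the individual fields out of such a status word. The bit positions follow the standard 68000 status register layout (T at bit 15, S at bit 13, I2-I0 at bits 10-8, X N Z V C at bits 4-0); the function name `decode_sr` and the sample value are illustrative only.

```python
def decode_sr(sr: int) -> dict:
    """Split a 16-bit 68000-style status word into named fields."""
    return {
        "T": (sr >> 15) & 1,                  # trace mode
        "S": (sr >> 13) & 1,                  # supervisor status
        "interrupt_mask": (sr >> 8) & 0b111,  # I2 I1 I0
        "X": (sr >> 4) & 1,                   # extend
        "N": (sr >> 3) & 1,                   # negative (sign)
        "Z": (sr >> 2) & 1,                   # zero
        "V": (sr >> 1) & 1,                   # overflow
        "C": sr & 1,                          # carry
    }


# Sample value chosen for illustration: supervisor mode, mask 7, Z set.
fields = decode_sr(0x2704)
print(fields)
```

Reading a flag is a shift and a mask, which is why a "jump if zero" instruction can test the Z bit without the program ever naming the PSW explicitly.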

  16. Other Registers • May have registers pointing to: • Process control blocks, PCBs (see O/S) • Interrupt vectors (see O/S) • N.B. CPU design and operating system design are closely linked

  17. Example Register Organizations

  18. Instruction Cycle • An instruction cycle includes the following subcycles: • Fetch cycle: fetch next instruction • Execute cycle: execute instruction • Interrupt cycle: check for interrupt; process interrupt • [Diagram: START, fetch, execute, then the interrupt cycle if interrupts are enabled or straight back to fetch if interrupts are disabled; HALT ends the loop]

  19. Indirect Addressing Cycle • Executing an instruction may require memory access to fetch operands • Indirect addressing requires additional memory accesses • Can be thought of as an additional instruction subcycle

  20. Instruction Cycle with Indirect

  21. Instruction Cycle State Diagram

  22. Data Flow (Instruction Fetch) • The exact sequence of events during an instruction cycle depends on the CPU design • In general, for the fetch cycle: • PC contains the address of the next instruction • Address moved to MAR • Address placed on address bus • Control unit requests memory read • Result placed on data bus, copied to MBR, then to IR • Meanwhile PC incremented by 1
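The fetch steps above can be sketched as register transfers. This is a toy model, not any particular CPU: memory is a word-addressed Python list, the buses are implicit in the list lookup, and the instruction words are made-up values.

```python
def fetch_cycle(regs: dict, memory: list) -> None:
    """Perform one fetch cycle on a toy register file."""
    regs["MAR"] = regs["PC"]           # address moved to MAR (then onto the address bus)
    regs["MBR"] = memory[regs["MAR"]]  # memory read: result arrives in MBR via the data bus
    regs["PC"] += 1                    # meanwhile PC incremented by 1
    regs["IR"] = regs["MBR"]           # MBR copied to IR


# Hypothetical instruction words, for illustration only.
memory = [0x1005, 0x2006, 0x0000]
regs = {"PC": 0, "MAR": 0, "MBR": 0, "IR": 0}
fetch_cycle(regs, memory)
print(regs)
```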

  23. Data Flow (Fetch Diagram)

  24. Data Flow (Indirect Cycle) • After the fetch cycle, the control unit examines the IR • If an operand uses indirect addressing, an indirect cycle is performed • Rightmost N bits of MBR (an address reference) transferred to MAR • Control unit requests memory read • Result (address of operand) moved to MBR • Instruction format: op-code | address
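A small sketch of the indirect cycle under the same toy model. The address width `ADDRESS_BITS` (the slide's N) is an assumed value of 12, and the memory contents are invented for the example.

```python
ADDRESS_BITS = 12  # assumed value of N: width of the address field


def indirect_cycle(regs: dict, memory: list) -> None:
    """Replace the address reference in MBR with the operand's address."""
    address_mask = (1 << ADDRESS_BITS) - 1
    regs["MAR"] = regs["MBR"] & address_mask  # rightmost N bits of MBR to MAR
    regs["MBR"] = memory[regs["MAR"]]         # memory read: address of operand to MBR


memory = [0] * 16
memory[5] = 9                    # location 5 holds the operand's real address
regs = {"MAR": 0, "MBR": 0x1005} # op-code 0x1, address field 0x005
indirect_cycle(regs, memory)
print(regs)
```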

  25. Data Flow (Indirect Diagram)

  26. Data Flow (Execute Cycle) • May take many forms • Depends on the instruction being executed • May include • Memory read/write • Input/output • Register transfers • ALU operations

  27. Data Flow (Interrupt Cycle) • Current PC must be saved so the CPU can resume normal activity after the interrupt • Contents of PC copied to MBR • Special memory location (e.g. stack pointer) loaded to MAR by the control unit • MBR written to memory • PC loaded with address of interrupt handling routine • Next instruction (first of interrupt handler) can be fetched
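The interrupt cycle can be sketched the same way. The save location and handler address are hypothetical parameters; a real CPU would take them from a stack pointer and an interrupt vector.

```python
def interrupt_cycle(regs: dict, memory: list,
                    save_address: int, handler_address: int) -> None:
    """Save the current PC to memory and redirect fetch to the handler."""
    regs["MBR"] = regs["PC"]           # contents of PC copied to MBR
    regs["MAR"] = save_address         # special memory location loaded to MAR
    memory[regs["MAR"]] = regs["MBR"]  # MBR written to memory
    regs["PC"] = handler_address       # PC loaded with address of handler routine


memory = [0] * 32
regs = {"PC": 7, "MAR": 0, "MBR": 0}
interrupt_cycle(regs, memory, save_address=31, handler_address=20)
print(regs["PC"], memory[31])
```

After the cycle, the next fetch reads the first instruction of the handler, and the saved value at the save location is what a return-from-interrupt would reload into PC.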

  28. Data Flow (Interrupt Diagram)

  29. Pipelining • Laundry example • Ann, Brian, Cathy, and Dave (A, B, C, D) each have one load of clothes to wash, dry, and fold • Washer takes 30 minutes • Dryer takes 40 minutes • "Folder" takes 20 minutes

  30. Sequential Laundry • [Timeline, 6 PM to midnight: loads A, B, C, D run one after another in task order, each taking 30 + 40 + 20 minutes] • Sequential laundry takes 6 hours for 4 loads • If they learned pipelining, how long would laundry take?

  31. Pipelined Laundry • [Timeline starting at 6 PM: loads A, B, C, D overlap in task order; the segments are 30, 40, 40, 40, 40, 20 minutes] • Pipelined laundry takes 3.5 hours for 4 loads
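The 6-hour and 3.5-hour figures can be checked with a short calculation. This sketch assumes each machine handles one load at a time and that a load may wait between machines; the function name is illustrative.

```python
STAGE_MINUTES = [30, 40, 20]  # washer, dryer, "folder"
LOADS = 4                     # Ann, Brian, Cathy, Dave


def pipelined_minutes(stage_minutes, loads):
    """Finish time when each stage starts as soon as the load and the machine are both free."""
    machine_free_at = [0] * len(stage_minutes)
    finish = 0
    for _ in range(loads):
        t = 0  # time at which this load is ready for its next stage
        for stage, duration in enumerate(stage_minutes):
            start = max(t, machine_free_at[stage])
            t = start + duration
            machine_free_at[stage] = t
        finish = t
    return finish


sequential = LOADS * sum(STAGE_MINUTES)
pipelined = pipelined_minutes(STAGE_MINUTES, LOADS)
print(sequential / 60, pipelined / 60)  # 6.0 3.5
```

The 40-minute dryer sets the pace: once the pipeline is full, one load finishes every 40 minutes regardless of how fast the washer and folder are.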

  32. Pipelining Lessons (1) • Pipelining doesn't help the latency of a single task; it helps the throughput of the entire workload • Pipeline rate is limited by the slowest pipeline stage • Multiple tasks operate simultaneously • Potential speedup = number of pipe stages • Unbalanced lengths of pipe stages reduce speedup • Time to "fill" the pipeline and time to "drain" it reduce speedup

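  33. Pipelining Lessons (2) • Pipelining does not speed up an individual task, but it raises the throughput of the whole system • The improvement is limited by the slowest pipeline stage • Idea: several tasks proceed at the same time • Ideal speedup = number of pipeline stages • Unbalanced stages reduce the speedup • The fill time at the start and the drain time at the end also reduce the speedup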

  34. Instruction Pipelining • Similar to an assembly line in a manufacturing plant: products at various stages can be worked on simultaneously → performance improves • First attempt: 2 stages • Fetch • Execution

  35. Prefetch • Fetch accesses main memory • Execution usually does not access main memory • Can fetch the next instruction during execution of the current instruction • Called instruction prefetch • Ideally instruction cycle time would be halved (if durationF = durationE …)

  36. Improved Performance (1) • But performance is not doubled: • Fetch is usually shorter than execution • Any jump or branch means that the prefetched instructions are not the required instructions • e.g., ADD A, B; BEQ NEXT; ADD B, C; NEXT: SUB C, D

  37. Two Stage Instruction Pipeline

  38. Improved Performance (2) • Reduce the time lost to branching by guessing • Prefetch the instruction that follows the branch instruction in memory • If the branch is not taken, use the prefetched instruction (no time is lost) • If the branch is taken, discard the prefetched instruction and fetch the new instruction
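A rough cost model for the two-stage scheme, as a sketch only. It assumes fetch and execute each take one time unit and that the prefetcher always guesses "not taken", so every taken branch wastes exactly one prefetch. The function names are illustrative.

```python
def two_stage_time(n_executed: int, taken_branches: int) -> int:
    """Time units for a fetch/execute pipeline with not-taken guessing."""
    fill = 1  # the first fetch has no execute to overlap with
    return fill + n_executed + taken_branches  # one wasted slot per taken branch


def sequential_time(n_executed: int) -> int:
    """Time units when every instruction is fetched, then executed, with no overlap."""
    return 2 * n_executed


# ADD A, B; BEQ NEXT (taken); SUB C, D: three instructions execute,
# and the prefetched ADD B, C is discarded.
print(two_stage_time(3, 1), sequential_time(3))  # 5 6
```

With no taken branches the same three instructions would need 4 time units, which shows how much of the ideal halving a single wrong guess gives back.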

  39. Pipelining • Add more stages to improve performance • More stages → more speedup • FI: Fetch instruction • DI: Decode instruction • CO: Calculate operands • FO: Fetch operands • EI: Execute instruction • WO: Write result • The stages are of nearly equal duration • Overlap these operations so they proceed in parallel

  40. Timing of Pipeline
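The timing diagram for the six stages FI, DI, CO, FO, EI, WO can be regenerated with a few lines. This sketch assumes the ideal case: every instruction uses every stage, each stage takes one time unit, and there are no branches or conflicts.

```python
STAGES = ["FI", "DI", "CO", "FO", "EI", "WO"]


def timing_diagram(n_instructions: int) -> list:
    """Return one text row per instruction; columns are time units."""
    total_units = len(STAGES) + n_instructions - 1
    rows = []
    for i in range(n_instructions):
        cells = ["  "] * total_units
        for s, name in enumerate(STAGES):
            cells[i + s] = name  # instruction i enters stage s at time unit i + s + 1
        rows.append(f"I{i + 1:<2} " + " ".join(cells))
    return rows


for row in timing_diagram(9):
    print(row)
```

Each row is shifted one column to the right of the row above it, so the number of columns equals the number of stages plus the number of instructions minus one.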

  41. Speedup of Pipelining (1) • 9 instructions, 6 stages • w/o pipelining: __ time units • w/ pipelining: __ time units • speedup = _____ • Q: 100 instructions, 6 stages, speedup = ____ • Q: ∞ instructions, k stages, speedup = ____ • Can you prove it (formally)?

  42. Pipelining - Discussion • Not every instruction needs all stages • e.g., LOAD: WO not needed • Assumes all stages can be performed in parallel without conflicts • e.g., FI, FO, and WO all access memory → memory conflicts • Timing is set up assuming every instruction needs all stages → simpler pipeline hardware • Assumes no conditional branch instructions

  43. Limitation by Branching • Conditional branch instructions can invalidate several instruction prefetches • In our example (see next slide): • Instruction 3 is a conditional branch to instruction 15 • The next instruction's address is not known until instruction 3 is executed (at time unit 7) • The pipeline must be cleared • No instruction finishes during time units 9 to 12 → performance penalty
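The idle window of time units 9 to 12 follows from simple arithmetic on the six-stage pipeline. The sketch below assumes one time unit per stage, that the branch outcome is known at the end of EI, and that the target is fetched in the following time unit; the variable names are illustrative.

```python
K = 6          # pipeline stages: FI, DI, CO, FO, EI, WO
EI_OFFSET = 4  # EI is the fifth stage, four time units after FI


def finish_time(fetch_time: int) -> int:
    """Time unit in which an instruction fetched at fetch_time completes WO."""
    return fetch_time + K - 1


branch_fetch = 3                            # instruction 3 enters FI at time unit 3
branch_resolved = branch_fetch + EI_OFFSET  # outcome known at time unit 7
target_fetch = branch_resolved + 1          # instruction 15 enters FI at time unit 8

completions = [finish_time(t) for t in (1, 2, 3)] + [finish_time(target_fetch)]
idle = [t for t in range(completions[0], completions[-1] + 1)
        if t not in completions]
print(completions, idle)  # [6, 7, 8, 13] [9, 10, 11, 12]
```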

  44. Branch in a Pipeline

  45. Limitation by Data Dependencies • Data needed by the current instruction may depend on a previous instruction that is still in the pipeline • e.g., A ← B + C; D ← A + E
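A minimal sketch of spotting this kind of dependency in a straight-line instruction list. Each instruction is modeled as a destination plus a tuple of sources, and `window` is an assumed number of following instructions that may still overlap in the pipeline. It is a simplification: it does not account for a register being overwritten in between.

```python
def read_after_write(program: list, window: int = 5) -> list:
    """Return (producer index, consumer index, name) for each nearby dependency."""
    hazards = []
    for i, (dest, _sources) in enumerate(program):
        last = min(len(program), i + 1 + window)
        for j in range(i + 1, last):
            if dest in program[j][1]:  # a later instruction reads what i writes
                hazards.append((i, j, dest))
    return hazards


program = [
    ("A", ("B", "C")),  # A <- B + C
    ("D", ("A", "E")),  # D <- A + E
]
print(read_after_write(program))  # [(0, 1, 'A')]
```

Here the second instruction cannot fetch its operand A until the first has written it, so the pipeline has to delay the second instruction or obtain the value some other way.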

  46. Limitation by Stage Overhead • Ideally, more stages → more speedup • However: • More overhead in moving data between buffers • More overhead in preparation and delivery functions • More complex circuitry for the pipeline hardware

  47. Pipeline Performance • Cycle time: $\tau = \max_i[\tau_i] + d = \tau_m + d$, where $\tau_m$ is the maximum stage delay, $k$ is the number of stages, and $d$ is the time delay of a latch • Time to execute $n$ instructions with pipelining: $T_k = [k + (n-1)]\tau$ • Time to execute $n$ instructions without pipelining: $T_1 = nk\tau$ • Speedup: $S_k = T_1 / T_k = nk\tau / [k + (n-1)]\tau = nk / [k + (n-1)]$
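The speedup formula is easy to evaluate numerically. This sketch uses the slide's ideal model, in which every stage takes one cycle time, so the cycle time cancels out; the function name is illustrative.

```python
def speedup(n: int, k: int) -> float:
    """Ideal speedup S_k = nk / (k + (n - 1)) for n instructions on k stages."""
    return (n * k) / (k + (n - 1))


for n in (9, 100, 10_000, 1_000_000):
    print(n, round(speedup(n, 6), 3))
```

As n grows, the fill cost of k - 1 cycles is spread over more instructions and the speedup approaches k from below.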

  48. Pipeline Performance • Speedup of k-stage pipelining compared to no pipelining • Q: ∞ instructions, k stages, speedup = ____
