CS 325: CS Hardware and Software Organization and Architecture
Computer Evolution and Performance 2
Outline • Von Neumann Architecture • Processor Hierarchy • Registers • ALU • Processor Categories • Processor Performance • Amdahl’s Law • Computer Benchmarks
Von Neumann Architecture • Characteristic of most modern processors. • Central idea is the stored program: instructions are held in memory along with data. • Three basic components: • Processor • Memory • I/O Facilities
Processor • Digital device. • Performs computations involving multiple steps. • One of the building blocks used to form a computer system.
Hierarchical Structure and Computational Engines • Most computer architectures follow a hierarchical approach. • Subparts of a large, central processor can be sophisticated enough to meet our definition of a processor. • Some engineers use the term computational engine for a sub-piece that is less powerful than the main processor.
Major Components of a Conventional Processor • Controller • Computational Engine (ALU) • Local Data Storage • Internal Interconnections • External Interface
Parts of a Conventional Processor • Controller • Overall responsibility for execution • Moves through sequence of steps • Coordinates other units • Computational Engine • Operates as directed by controller • Typically provides arithmetic and Boolean operations (ALU) • Performs one operation at a time
Parts of a Conventional Processor • Local Data Storage • Holds data values for operations • Must be loaded before operation can be performed • Typically implemented with registers • Internal Interconnections • Allows transfer of values among units of the processor • Sometimes called data path
Parts of a Conventional Processor • External Interface • Handles communication between processor and rest of computer system • Provides connections to external memory as well as external I/O devices
Parts of a Conventional Processor • ALU • Status Flags: • Neg, Zero, Carry, Overflow • Shifter: • Left shift: multiplication by 2 • Right shift: division by 2 • Complementer: • Logical NOT (inverts each bit)
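As a rough illustration (not part of the original slides), the following C sketch mimics what the shifter and complementer do and shows how the zero and negative status flags could be derived from a result; the values are arbitrary.

    #include <stdio.h>

    int main(void) {
        int x = 12;

        /* Shifter: shifting left by 1 multiplies by 2, shifting right by 1 divides by 2. */
        printf("x << 1 = %d\n", x << 1);   /* prints 24 */
        printf("x >> 1 = %d\n", x >> 1);   /* prints 6  */

        /* Complementer: bitwise NOT inverts every bit. */
        printf("~x = %d\n", ~x);           /* prints -13 (two's complement) */

        /* Status flags an ALU would typically set after an operation. */
        int result = x - 12;
        int zero_flag = (result == 0);     /* Zero flag */
        int neg_flag  = (result < 0);      /* Negative (sign) flag */
        printf("Zero = %d, Neg = %d\n", zero_flag, neg_flag);
        return 0;
    }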
Processor Registers • Motorola MC68000 CPU • 8 32-bit general purpose registers (D0 – D7) • 8 32-bit address registers (A0 – A7) • 1 32-bit program counter • 1 16-bit status register
Processor Registers • Intel 8086 – 16-bit • General Purpose: • AX – Accumulator: multiply, divide, I/O • BX – Base: pointer to base address (data) • CX – Count: counter for loops, shifts • DX – Data: multiply, divide, I/O • Pointer and Index: • SP – Stack Pointer: pointer to top of stack • BP – Base Pointer: pointer to base address (stack) • SI – Source Index: source string/index pointer • DI – Destination Index: destination string/index pointer • Segment Registers: • CS – Code Segment • DS – Data Segment • SS – Stack Segment • ES – Extra Segment • Program Status: • PC – Program Counter (called IP, the Instruction Pointer, on the 8086) • SR – Status Register (the FLAGS register)
Processor Registers • Intel 80386 – Pentium II • Similar to the 8086, but register widths doubled to 32 bits (EAX, EBX, ECX, EDX, and so on)
Arithmetic Logic Unit (ALU) • Main computational engine in a conventional processor. • Complex unit that can perform a variety of tasks • Integer arithmetic (add, subtract, multiply, divide) • Shift (left, right, circular) • Boolean (AND, OR, NOT, XOR) • Typically, CPU “bit size” refers to ALU and register size • 32-bit CPU: 32-bit ALU and registers • 64-bit CPU: 64-bit ALU and registers
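Not from the slides, but a minimal C sketch of what the “bit size” means in practice: a 32-bit value wraps around at 2^32 while a 64-bit value does not (fixed-width integer types stand in for 32-bit and 64-bit registers here).

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        /* 4294967295 is 2^32 - 1, the largest value a 32-bit register can hold. */
        uint32_t r32 = 4294967295u;
        uint64_t r64 = 4294967295u;

        r32 += 1;   /* 32-bit arithmetic wraps around to 0 */
        r64 += 1;   /* 64-bit arithmetic has plenty of room */

        printf("32-bit result: %" PRIu32 "\n", r32);   /* prints 0 */
        printf("64-bit result: %" PRIu64 "\n", r64);   /* prints 4294967296 */
        return 0;
    }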
Processor Categories and Roles • Many possible roles for individual processors, including: • Coprocessors • Microcontrollers • Microsequencers • Embedded system processors • General purpose processors
Coprocessor • Operates in conjunction with and under the control of another processor. • Special purpose processor • Performs a single task • Operates at high speed • Example: • Math Coprocessor • Used for floating point mathematical operations
Microcontroller • Programmable device • Dedicated to control of a physical system • Example: • ECU for automobile engine • Roadway intersection traffic lights
Microsequencer • Similar to microcontroller • Controls coprocessors and other engines within a large processor • Example: • Move operands to floating point unit • Invoke an operation (divide) • Move result back to memory
Embedded System Processor • Operates sophisticated electronic device • Usually more powerful than microcontroller • Example: • Controlling a DVD player, including commands from a remote control
General Purpose Processor • Most powerful type of processor • Completely programmable • Full functionality • Example: • CPU in personal computer/laptop (CISC x86 architecture) • CPU in smartphone/tablet (RISC ARM architecture)
Clock and Instruction Rate • Clock Cycle • Time interval in which all basic circuits (steps) inside the processor must complete • Time at which gates are clocked (gate-signal propagation) • Clock Rate • 1 / clock cycle (GHz – billion cycles per second) • Instruction Rate • Measure of how many instructions are executed per unit of time • MIPS – million instructions per second • Varies since some instructions take more time (more clock cycles) than others • Shift left instruction vs. fetch from memory instruction
Basic Performance Equation
Define:
N = Number of instructions executed in the program
S = Average number of cycles per instruction in the program
R = Clock rate
T = Program execution time
T = (N × S) / R
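A minimal C sketch (not in the slides; all values are assumed) that plugs sample numbers into the equation and also derives the effective instruction rate in MIPS from the clock rate and average cycles per instruction:

    #include <stdio.h>

    int main(void) {
        double N = 2.0e9;    /* instructions executed in the program (assumed) */
        double S = 1.5;      /* average clock cycles per instruction (assumed) */
        double R = 3.5e9;    /* clock rate in Hz, i.e. 3.5 GHz (assumed) */

        double T = (N * S) / R;           /* program execution time in seconds */
        double mips = R / (S * 1.0e6);    /* effective instruction rate in MIPS */

        printf("Execution time T = %.3f s\n", T);
        printf("Instruction rate = %.0f MIPS\n", mips);
        return 0;
    }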
Improve Performance • To improve performance: • Decrease N and/or S • Increase R • Parameters are not independent: • Increasing R may increase S as well • N is primarily controlled by the compiler • Processors with a large R may not have the best performance • Due to a larger S • Making logic circuits faster/smaller is a definite win • Increases R while S and N remain unchanged
Amdahl’s Law • Potential speedup of a program when using multiple processors. • Concluded that: • Code needs to be parallelizable • Speedup is bounded, giving diminishing returns for more processors • Task dependent • Servers gain by maintaining multiple connections on multiple processors • Databases can be split into parallel tasks
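This slide does not write out the multiprocessor form of the law, but it follows from the general formula on the next slides with the enhanced speedup equal to the processor count. The C sketch below (with an assumed 90% parallel fraction) shows the bound and the diminishing returns.

    #include <stdio.h>

    /* Amdahl's Law for n processors: speedup = 1 / ((1 - P) + P / n),
     * where P is the fraction of the program that can run in parallel. */
    static double amdahl_parallel(double P, double n) {
        return 1.0 / ((1.0 - P) + P / n);
    }

    int main(void) {
        double P = 0.9;   /* assume 90% of the program is parallelizable */
        int counts[] = { 1, 2, 4, 8, 16, 64, 1024 };
        int num = sizeof counts / sizeof counts[0];

        for (int i = 0; i < num; i++)
            printf("%5d processors -> %.2fx speedup\n",
                   counts[i], amdahl_parallel(P, counts[i]));

        /* The speedup never exceeds 1 / (1 - P) = 10x, however many processors we add. */
        return 0;
    }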
Amdahl’s Law • Most important principle in computer design: • Make the common case fast • Optimize for the normal case • Enhancement: any change/modification in the design of a component • Speedup: how much faster a task will execute using an enhanced component versus using the original component.
Speedup = Performance_enhanced / Performance_original = Execution time_original / Execution time_enhanced
Amdahl’s Law • The enhanced feature may not be used all the time. • Let the fraction of the computation time when the enhanced feature is used be F. • Let the speedup when the enhanced feature is used be Se. • Now the execution time with the enhancement is:
Ex_new = Ex_old × (1 – F) + Ex_old × (F / Se)
This gives the overall speedup (So) as:
So = Ex_old / Ex_new = 1 / ((1 – F) + (F / Se))
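A small C helper (not part of the slides) that evaluates this formula; the calls in main reproduce the worked examples on the slides that follow.

    #include <stdio.h>

    /* Overall speedup from Amdahl's Law:
     *   F  = fraction of execution time in which the enhancement is used
     *   Se = speedup of the enhanced portion */
    static double amdahl_speedup(double F, double Se) {
        return 1.0 / ((1.0 - F) + F / Se);
    }

    int main(void) {
        printf("Example 1:            %.2f\n", amdahl_speedup(0.40, 10.0));  /* ~1.56 */
        printf("Example 2:            %.2f\n", amdahl_speedup(0.70, 15.0));  /* ~2.88 */
        printf("Example 3, student 1: %.2f\n", amdahl_speedup(0.85, 1.12));  /* ~1.10 */
        printf("Example 3, student 2: %.2f\n", amdahl_speedup(0.25, 2.00));  /* ~1.14 */
        return 0;
    }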
Amdahl’s Law – Example 1 • Suppose that we are considering an enhancement that runs 10 times faster than the original component but is usable only 40% of the time. What is the overall speedup gained by incorporating the enhancement?
Se = 10, F = 40 / 100 = 0.4
So = 1 / ((1 – F) + (F / Se)) = 1 / (0.6 + (0.4 / 10)) = 1 / 0.64 = 1.56
Amdahl’s Law – Example 2 • Suppose that we hired a guru programmer who made 70% of our program run 15x faster than the original program. What is the speedup of the enhanced program?
Se = 15, F = 70 / 100 = 0.7
So = 1 / ((1 – F) + (F / Se)) = 1 / (0.3 + (0.7 / 15)) = 1 / 0.347 = 2.88
Amdahl’s Law – Example 3 • Suppose that we hired two students to enhance our WKU web server performance. The first student increased the performance of the server by 12% for 85% of the time. The second student increased the performance of the server by 2x for 25% of the time. Which student produced the overall highest speedup?
Student 1: Se = 1.12, F = 85 / 100 = 0.85
So = 1 / ((1 – F) + (F / Se)) = 1 / (0.15 + (0.85 / 1.12)) = 1 / 0.909 = 1.10
Student 2: Se = 2, F = 25 / 100 = 0.25
So = 1 / ((1 – F) + (F / Se)) = 1 / (0.75 + (0.25 / 2)) = 1 / 0.875 = 1.14
Student 2 produced the higher overall speedup (1.14 vs. 1.10).
Benchmarks • LINPACK (Scientific Computing) • Speed in solving a dense system of linear equations (dominated by floating-point matrix operations) • http://www.top500.org/list/2013/11/
Benchmarks - LINPACK • Current fastest supercomputer: • Tianhe-2 (MilkyWay-2) • 3.12 million cores @ 2.2 GHz • 33.86 Pflop/s = 33,860,000,000,000,000 floating-point operations per second • Current high-end desktop: • Intel Core i7-4770K “Haswell” • 4 cores @ 3.5 GHz • 177 Gflop/s = 177,000,000,000 floating-point operations per second • Current Google Android smartphone: • Google Nexus 5 • 4 cores @ 2.3 GHz, ARM RISC architecture • 393 Mflop/s = 393,000,000 floating-point operations per second
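For context only: LINPACK itself solves a dense linear system via LU factorization, so the C sketch below is not the real benchmark, just a rough single-core illustration (matrix size and values are arbitrary) of how floating-point throughput in Gflop/s can be estimated by counting operations and timing them.

    #include <stdio.h>
    #include <time.h>

    #define N 512   /* matrix dimension, chosen small for illustration */

    static double A[N][N], B[N][N], C[N][N];

    int main(void) {
        /* Fill the input matrices with arbitrary values. */
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++) {
                A[i][j] = (double)(i + j);
                B[i][j] = (double)(i - j);
            }

        clock_t start = clock();
        /* Naive matrix multiply: roughly 2 * N^3 floating-point operations. */
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++) {
                double sum = 0.0;
                for (int k = 0; k < N; k++)
                    sum += A[i][k] * B[k][j];
                C[i][j] = sum;
            }
        double seconds = (double)(clock() - start) / CLOCKS_PER_SEC;

        double flops = 2.0 * N * N * N;   /* total floating-point operations performed */
        printf("check value: %f\n", C[N / 2][N / 2]);               /* keep the result live */
        printf("about %.2f Gflop/s (naive, single core)\n", flops / seconds / 1e9);
        return 0;
    }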