Learn about the MIPS instruction set architecture, system performance, processor design, memory hierarchy, and more. Understand how computers work and the factors that impact their performance.
Overview • Instruction set architecture (MIPS) • Arithmetic operations & data • System performance • Processor • Datapath and control • Pipelining to improve performance • Memory hierarchy • I/O
Focus • How computers work • MIPS instruction set architecture • The implementation of MIPS instruction set architecture – MIPS processor design • Issues affecting modern processors • Pipelining – processor performance improvement • Cache – memory system, I/O systems
Why Learn Computer Architecture? • You want to call yourself a “computer scientist” • Computer architecture impacts every other aspect of computer science • You need to make a purchasing decision or offer “expert” advice • You want to build software people use and sell many, many copies (which requires performance) • Both hardware and software affect performance • Algorithm determines number of source-level statements • Language/compiler/architecture determine machine instructions • Processor/memory determine how fast instructions are executed • Hence the need for assessing and understanding performance
Objectives • How programs written in a high-level language (e.g., Java/C++) translate into the language of the hardware and how the hardware executes them. • The interface between software and hardware and how software instructs hardware to perform the needed functions. • The factors that determine the performance of a program • The techniques that hardware designers employ to improve performance. As a consequence, you will understand what features may make one computer design better than another for a particular application
Evolution… • In the beginning there were only bits… and people spent countless hours trying to program in machine language 01100011001011001110100 • Finally, before everybody went insane, the assembler was invented: write in mnemonics called assembly language and let the assembler translate (a one-to-one translation) add A,B • This wasn’t for everybody, obviously (imagine writing modern applications in assembly), so high-level languages were born, and with them compilers to translate to assembly (a one-to-many translation) C = A*(SQRT(B)+3.0)
THE BIG IDEA • Levels of abstraction: each layer provides its own (simplified) view and hides the details of the layers below it.
Instruction Set Architecture (ISA) • ISA: An abstract interface between the hardware and the lowest level software of a machine that encompasses all the information necessary to write a machine language program that will run correctly, including instructions, registers, memory access, I/O, and so on. “... the attributes of a [computing] system as seen by the programmer, i.e., the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls, the logic design, and the physical implementation.” – Amdahl, Blaauw, and Brooks, 1964 • Enables implementations of varying cost and performance to run identical software • ABI (application binary interface): The user portion of the instruction set plus the operating system interfaces used by application programmers. Defines a standard for binary portability across computers.
High-Level to Machine Language • A high-level language program (in C) is translated by the compiler into an assembly language program (for MIPS) • The assembler then translates the assembly program into a binary machine language program (for MIPS)
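To make the translation chain concrete, here is a minimal, hedged sketch (the function, its variables, and the register assignments are illustrative, not from the slides):

int combine(int g, int h, int i, int j) {
    int f = (g + h) - (i + j);   /* one high-level statement */
    return f;
}

/* A MIPS compiler could translate the statement into assembly such as
   (assuming it keeps g, h, i, j in $s1-$s4 and f in $s0):
       add $t0, $s1, $s2    # $t0 = g + h
       add $t1, $s3, $s4    # $t1 = i + j
       sub $s0, $t0, $t1    # f  = $t0 - $t1
   The assembler then turns each mnemonic into one 32-bit machine word;
   for example, "add $t0, $s1, $s2" encodes as the R-format bit fields
       000000 10001 10010 01000 00000 100000
   (op, rs, rt, rd, shamt, funct). */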
How Do the Pieces Fit Together? • Layered stack (top to bottom): Application, Operating System, Compiler, Firmware, Instruction Set Architecture, Instruction Set Processor / Memory system / I/O system, Datapath & Control, Digital Design, Circuit Design • Coordination of many levels of abstraction • Under a rapidly changing set of forces • Design, measurement, and evaluation
Anatomy of a Computer • 5 classic components • Processor: Control (“brain”) and Datapath (“brawn”) • Memory: where programs and data live when running • Devices: Input (keyboard, mouse), Output (display, printer), Disk (where programs and data live when not running) • Datapath: performs arithmetic operations • Control: guides the operation of the other components based on the user’s instructions
Moore’s Law • In 1965, Gordon Moore predicted that the number of transistors that can be integrated on a die would double every 18 to 24 months (i.e., grow exponentially with time). • Amazingly visionary – the million-transistor-per-chip barrier was crossed in the 1980’s. • 2300 transistors, 1 MHz clock (Intel 4004) - 1971 • 16 Million transistors (Ultra Sparc III) • 42 Million transistors, 2 GHz clock (Intel Xeon) – 2001 • 55 Million transistors, 3 GHz, 130nm technology, 250mm² die (Intel Pentium 4) - 2004 • 140 Million transistors (HP PA-8500)
Moore’s Law • “Cramming More Components onto Integrated Circuits” • Gordon Moore, Electronics, 1965 • # of transistors per cost-effective integrated circuit doubles every 18 months • “Transistor capacity doubles every 18-24 months” • Speed 2x / 1.5 years (since ‘85); 100X performance in last decade
Memory • Dynamic Random Access Memory (DRAM) • The choice for main memory • Volatile (contents go away when power is lost) • Fast • Relatively small • DRAM capacity: 2x / 2 years (since ‘96); 64x size improvement in last decade • Static Random Access Memory (SRAM) • The choice for cache • Much faster than DRAM, but less dense and more costly • Magnetic disks • The choice for secondary memory • Non-volatile • Slower • Relatively large • Capacity: 2x / 1 year (since ‘97); 250X size improvement in last decade • Solid state (Flash) memory • The choice for embedded computers • Non-volatile
Memory • Optical disks • Removable, therefore effectively very large in total capacity • Slower than magnetic disks • Magnetic tape • Even slower • Sequential (non-random) access • The choice for archival storage
DRAM Capacity Growth • Chart: DRAM chip capacity growth over time, reaching 128GB (0.02µm process technology) in 2017
Trend: Memory Capacity • year size (Mbit) • 1980 0.0625 • 1983 0.25 • 1986 1 • 1989 4 • 1992 16 • 1996 64 • 1998 128 • 2000 256 • 512 • 2G • 2010 8G • 16G • 2017 128G • Approx. 2X every 2 years.
Example Machine Organization • Workstation design target • 25% of cost on processor • 25% of cost on memory (minimum memory size) • Rest on I/O devices, power supplies, box • Block diagram: Computer = CPU (Control + Datapath) + Memory + Devices (Input, Output)
MIPS R3000 Instruction Set Architecture • Registers: R0 - R31, PC, HI, LO • Instruction Categories • Load/Store • Computational • Jump and Branch • Floating Point (coprocessor) • Memory Management • Special • 3 Instruction Formats, all 32 bits wide: • R-format: OP rs rt rd sa funct • I-format: OP rs rt immediate • J-format: OP jump target
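A minimal sketch of how the R-format fields pack into a 32-bit instruction word (the field widths are the standard MIPS ones; the example instruction and register numbers are illustrative):

#include <stdio.h>
#include <stdint.h>

/* Pack the six R-format fields (op, rs, rt, rd, sa, funct) into one
   32-bit MIPS instruction word. Field widths: 6, 5, 5, 5, 5, 6 bits. */
static uint32_t encode_r_format(uint32_t op, uint32_t rs, uint32_t rt,
                                uint32_t rd, uint32_t sa, uint32_t funct) {
    return (op << 26) |   /* bits 31-26: opcode        */
           (rs << 21) |   /* bits 25-21: source 1      */
           (rt << 16) |   /* bits 20-16: source 2      */
           (rd << 11) |   /* bits 15-11: destination   */
           (sa <<  6) |   /* bits 10-6 : shift amount  */
            funct;        /* bits 5-0  : function code */
}

int main(void) {
    /* add $t0, $s1, $s2  ->  op=0, rs=17 ($s1), rt=18 ($s2),
       rd=8 ($t0), sa=0, funct=0x20 (add) */
    uint32_t word = encode_r_format(0, 17, 18, 8, 0, 0x20);
    printf("0x%08x\n", (unsigned)word);   /* prints 0x02324020 */
    return 0;
}

The I-format and J-format pack their fields the same way, with the immediate occupying the low 16 bits and the jump target the low 26 bits.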
Defining Performance • Which airplane is the best?
Response Time and Throughput • Response time • How long it takes to do a task • Throughput • Total work done per unit time • e.g., tasks/transactions/… per hour • How are response time and throughput affected by • Replacing the processor with a faster version? • Adding more processors? • We’ll focus on response time for now…
Relative Performance • Define Performance = 1/Execution Time • “X is n times faster than Y” • Example: time taken to run a program • 10s on A, 15s on B • Execution TimeB / Execution TimeA = 15s / 10s = 3/2 = 1.5 • So A is 1.5 times faster than B
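Restating the definition and the example above in formula form (standard notation, same numbers):

\[
n = \frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution Time}_Y}{\text{Execution Time}_X}
\qquad\Rightarrow\qquad
\frac{\text{Performance}_A}{\text{Performance}_B} = \frac{15\,\text{s}}{10\,\text{s}} = 1.5
\]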
Measuring Execution Time • Elapsed time • Total response time, including all aspects • Processing, I/O, OS overhead, idle time • Determines system performance • CPU time • Time spent processing a given job • Discounts I/O time, other jobs’ shares • Comprises user CPU time and system CPU time • Different programs are affected differently by CPU and system performance
CPU Clocking • Operation of digital hardware is governed by a constant-rate clock • Within each clock cycle (clock period), data transfer and computation take place, then the state is updated • Clock frequency (rate): cycles per second (influenced by CPU design) • e.g., 4.0GHz = 4000MHz = 4.0×10⁹ Hz • Clock period: duration of a clock cycle • e.g., 250ps = 0.25ns = 250×10⁻¹² s • also = 1/(clock rate)
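The example rate and period above are reciprocals of one another, as a quick check shows:

\[
\text{Clock Period} = \frac{1}{\text{Clock Rate}} = \frac{1}{4.0\times 10^{9}\,\text{Hz}} = 0.25\times 10^{-9}\,\text{s} = 250\,\text{ps}
\]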
CPU Time (for a particular program) • CPU Time = CPU Clock Cycles × Clock Cycle Time = CPU Clock Cycles / Clock Rate • Performance improved by • Reducing number of clock cycles (cycle count) • Increasing clock rate • Hardware designer must often trade off clock rate against cycle count • Clock Frequency = Clock Rate (GHz) = 1/Clock Period (Cycle Time)
CPU Time Example • Computer A: 2GHz clock, 10s CPU time • Designing Computer B • Aim for 6s CPU time • Can do faster clock, but causes 1.2 × clock cycles (A’s) • How fast must Computer B clock be?
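A worked solution (not shown on the slide), using CPU Time = Clock Cycles / Clock Rate:

\[
\text{Clock Cycles}_A = 10\,\text{s} \times 2\times 10^{9}\,\text{Hz} = 20\times 10^{9}
\]
\[
\text{Clock Rate}_B = \frac{1.2 \times 20\times 10^{9}}{6\,\text{s}} = \frac{24\times 10^{9}}{6\,\text{s}} = 4\,\text{GHz}
\]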
Instruction Count and Cycles Per Instruction (CPI) • Instruction Count per program • Determined by program, ISA and compiler • Average cycles per instruction • Determined by CPU hardware • If different instructions have different CPI • Average CPI affected by instruction mix
CPI Example • Computer A: Cycle Time = 250ps, CPI = 2.0 • Computer B: Cycle Time = 500ps, CPI = 1.2 • Same ISA • Which is faster, and by how much? A is faster, as the worked comparison below shows.
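The worked comparison (I denotes the common instruction count; all numbers follow from the data above):

\[
\text{CPU Time}_A = I \times 2.0 \times 250\,\text{ps} = 500\,I\,\text{ps},\qquad
\text{CPU Time}_B = I \times 1.2 \times 500\,\text{ps} = 600\,I\,\text{ps}
\]
\[
\frac{\text{CPU Time}_B}{\text{CPU Time}_A} = \frac{600\,I}{500\,I} = 1.2,\quad\text{so A is 1.2 times faster than B}
\]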
CPI in More Detail • If different instruction classes take different numbers of cycles, the total clock cycles are the sum over classes of (class CPI × class instruction count) • Weighted average CPI: each class’s CPI is weighted by its relative frequency in the instruction mix (see the formula below)
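Written out, the standard weighted-average form (the ratio ICᵢ/IC is the relative frequency of class i):

\[
\text{Clock Cycles} = \sum_{i=1}^{n} \text{CPI}_i \times \text{IC}_i,
\qquad
\text{CPI} = \frac{\text{Clock Cycles}}{\text{IC}} = \sum_{i=1}^{n} \text{CPI}_i \times \frac{\text{IC}_i}{\text{IC}}
\]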
CPI Example • Alternative compiled code sequences using instructions in classes A, B, C, with CPI of 1, 2, and 3 respectively • Sequence 1: IC = 5 (2 of class A, 1 of B, 2 of C) • Clock Cycles = 2×1 + 1×2 + 2×3 = 10 • Avg. CPI = 10/5 = 2.0 • Sequence 2: IC = 6 (4 of class A, 1 of B, 1 of C) • Clock Cycles = 4×1 + 1×2 + 1×3 = 9 • Avg. CPI = 9/6 = 1.5
Performance Summary The BIG Picture • CPU Time = IC × CPI × Clock Cycle Time • Performance depends on • Algorithm: affects IC, possibly CPI • Programming language: affects IC, CPI • Compiler: affects IC, CPI • Instruction set architecture: affects IC, CPI, and clock cycle time
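Expanded into its per-unit factors (the standard form of the slide’s equation):

\[
\text{CPU Time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock Cycle}}
\]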
Pitfall: MIPS as a Performance Metric • MIPS: Millions of Instructions Per Second • Doesn’t account for • Differences in ISAs between computers • Differences in complexity between instructions • CPI varies between programs on a given CPU
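For reference, the usual definition of the MIPS rating (standard derivation, not taken from the slide), which makes the hidden dependence on CPI explicit:

\[
\text{MIPS} = \frac{\text{Instruction Count}}{\text{Execution Time} \times 10^{6}} = \frac{\text{Clock Rate}}{\text{CPI} \times 10^{6}}
\]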
Concluding Remarks • Cost/performance is improving • Due to underlying technology development • Hierarchical layers of abstraction • In both hardware and software • Instruction set architecture • The hardware/software interface • Execution time: the best performance measure • Power is a limiting factor • Use parallelism to improve performance