CS1104 – Computer Organization PART 2: Computer Architecture Lecture 1 Introduction
Textbook: Computer Organization and Design by David Patterson and John Hennessy, Morgan Kaufmann Publishers.
• PLEASE READ THE TEXTBOOK (the chapters and sections to read will be announced)
• Instructor: Samarjit Chakraborty. Office: S14 #06-04. Phone: 6874 7997
• There will be tutorials and assignments as before. Only ONE lab for this part (date to be announced later)
• Midterm test for this part on Friday, 8 April 2005
• This part has no apparent similarities with Part 1 of the course, but it builds on Part 1
• In many universities this part is taught as a full course
• Hence, there will be a lot of material to cover
• But there are a few basic principles, and if you understand these principles well, the course will be fairly easy
• The lecture notes are only meant to assist the lectures
• We shall follow the textbook very closely
• Please READ THE TEXTBOOK (this line is repeated on purpose)
Why study this course?
• The prospective software engineer, writing a program:
  • Why should I declare variable "x" to be an integer rather than floating point?
  • Should I use an array or a linked list as my data structure?
  • Should I use multiplication, or implement it as repeated additions?
• The embedded systems programmer developing code for a PDA or a programmable dishwasher:
  • How is the memory of this device organized? What instructions are supported?
  • I need to program it in assembly language!
Why study this course?
• The prospective computer/hardware engineer (say, working at Intel on a future Pentium processor) needs a strong background in Computer Architecture
• Basic understanding of how a computer works and what its internal organization is
• Foundation for more advanced courses
• Necessary for hardware engineers as well as for all kinds of software developers
This course is about…
• Desktop computing: programs with integer and floating-point data types; little regard for code size and power consumption
• Servers: database, file server, web, and time-sharing applications for many users; integer and string operations more important than floating-point operations
• Embedded computing: values cost, power, and code size; floating-point operations might be restricted to reduce cost
• … Cost/Performance/Power/Size tradeoffs!
The global picture — layers of abstraction, from top to bottom:
• Application
• Operating System, Compiler, Firmware (Programming Languages, Compilers, Operating Systems, Software Engineering)
• Instruction Set Architecture: instruction set, memory organization, I/O system — PART 2
• Datapath & Control — PART 2
• Digital Design — PART 1
• Circuit Design, Layout (Electrical Engineering)
Program to Execution: The Flow
• Program in a high-level language (C, Pascal, etc.)
• Compile program into assembly language
• Assemble program to machine language
• Link multiple machine-language programs into one program
• Load program into computer's memory
• Execute program
Organization
• Processor: Control, Datapath, Registers, Cache
• Memory: Program + Data
• Bus: connects the processor, memory, and devices
• Devices: Input, Output
Representing instructions in the computer:
Assembly language:  Add R0, R1, R2   means   R2 ← R0 + R1
Machine language (one 32-bit word, six fields):

  Field    | op     | rs     | rt     | rd     | shamt  | funct
  Width    | 6 bits | 5 bits | 5 bits | 5 bits | 5 bits | 6 bits
  Binary   | 000000 | 10001  | 10010  | 01000  | 00000  | 100000
  Decimal  | 0      | 17     | 18     | 8      | 0      | 32

• op: basic operation
• rs, rt: source operands (registers)
• rd: destination register
• shamt: shift amount
• funct: function code (selects variants of the op field)
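As a sketch of how the six fields pack into one 32-bit word, the following Python snippet assembles the encoding from the example above (the helper name `encode_rtype` is my own, not from the textbook):

```python
def encode_rtype(op, rs, rt, rd, shamt, funct):
    """Pack the six R-type fields (6 + 5 + 5 + 5 + 5 + 6 = 32 bits) into one word."""
    assert op < 64 and funct < 64          # 6-bit fields
    assert max(rs, rt, rd, shamt) < 32     # 5-bit fields
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct

# The example: op = 0, rs = 17, rt = 18, rd = 8, shamt = 0, funct = 32
word = encode_rtype(0, 17, 18, 8, 0, 32)
print(f"{word:032b}")  # 00000010001100100100000000100000
```

Shifting each field to its bit position and OR-ing them together is exactly the concatenation shown in the table.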
Instruction execution steps:
Instruction Fetch → Instruction Decode → Operand Fetch → Execute → Result Store → Next Instruction
• The Clock
  • Everything in one clock cycle, OR each step in one cycle?
  • Pipeline instruction execution?
• The Memory
  • How much time does it take to access memory?
  • How is it organized? Cache?
• The Bus
  • How many buses? Avoid bus conflicts …
Issues in Performance Evaluation
• Response time: time between the start and completion of an event (execution time)
• Throughput: total amount of work (or number of jobs) done in a given time
  Ex: Replace a processor with a faster one, or add multiple slower processors? How are throughput and response time affected? How does this depend on the job arrival rate?
• Notions of performance: TIME – user-perceived time, CPU time (user CPU time + system CPU time), …
Benchmarks: Choosing Programs to Evaluate Performance
Measure the performance of a machine using a set of programs that will hopefully emulate the workload generated by the user's programs.
Benchmarks: programs designed to measure performance.

  Benchmark type              | Advantages                                          | Disadvantages
  Actual target workload      | representative                                      | overly specific, non-portable, difficult to run, hard to identify source
  Full application benchmarks | portable, widely used, improvements useful in reality | less representative than above
  Small "kernel" benchmarks   | easy to use early in design cycle                   | easy to "fool"
  Microbenchmarks             | identify peak capability and potential bottlenecks  | "peak" may be far away from real application performance
Summarizing Performance
[Figure: execution times of programs P1 and P2 on architectures A, B, and C; each architecture is faster than the others on one program but slower on the other, by different factors]
• Which architecture is better? A? B? or C?
• Average execution time: (Time_1 + Time_2 + … + Time_n) / n
• Weighted execution time: w_1 × Time_1 + w_2 × Time_2 + … + w_n × Time_n, where the sum of the weights is equal to 1
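The two ways of summarizing performance can be sketched in Python. The execution times below are made-up placeholders, not the numbers from the figure:

```python
def average_time(times):
    """Arithmetic mean of a machine's execution times (seconds)."""
    return sum(times) / len(times)

def weighted_time(times, weights):
    """Weighted execution time; the weights must sum to 1."""
    assert abs(sum(weights) - 1.0) < 1e-9
    return sum(w * t for w, t in zip(weights, times))

# Hypothetical times for programs P1 and P2 on machines A, B, C (seconds).
times = {"A": [1.0, 1000.0], "B": [10.0, 100.0], "C": [20.0, 20.0]}
for machine, t in times.items():
    print(machine, average_time(t), weighted_time(t, [0.9, 0.1]))
```

Note that the ranking of machines can change with the weights: if most of the workload is P1, a machine that is fast on P1 wins even if its average is worse. That is why the choice of weights matters.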
SPEC Benchmarks
• Normalized execution time (divide the execution time on a Sun SPARCstation by the execution time on the measured machine M) – the SPEC ratio
• Performance of a new program on M = performance of the program on the reference machine × SPEC ratio
• Average normalized execution times of multiple benchmarks can be expressed as either an arithmetic or a geometric mean
  Geometric mean of normalized execution times = (ratio_1 × ratio_2 × … × ratio_n)^(1/n)
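A small sketch of the SPEC-ratio computation, with invented benchmark times (the function names are mine):

```python
import math

def spec_ratio(ref_time, measured_time):
    """SPEC ratio = execution time on the reference machine / time on machine M."""
    return ref_time / measured_time

def geometric_mean(ratios):
    """n-th root of the product of n ratios."""
    return math.prod(ratios) ** (1.0 / len(ratios))

# Hypothetical benchmark times: reference machine vs. measured machine M.
ratios = [spec_ratio(100.0, 50.0), spec_ratio(200.0, 50.0)]  # 2.0 and 4.0
print(geometric_mean(ratios))  # sqrt(2 * 4) ≈ 2.83
```

A useful property of the geometric mean (unlike the arithmetic mean) is that the ratio of two machines' geometric means does not depend on which machine is chosen as the reference.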
Design Principle
Make the common case fast: Amdahl's Law

  Speedup = Execution time of entire task without enhancement / Execution time of entire task with enhancement

The speedup depends on:
• the fraction of the execution time that can benefit from the enhancement (frac_enhanced)
• how much faster the enhanced part runs (speedup_enhanced)

  Execution time of entire task with enhancement = e_old × (1 − frac_enhanced) + (e_old × frac_enhanced) / speedup_enhanced

where e_old is the execution time of the entire task without the enhancement.
Examples
• Suppose we enhance a machine so that all floating-point instructions run five times faster. If the execution time of some benchmark before the floating-point enhancement is 10 seconds, what will the speedup be if half of the 10 seconds is spent executing floating-point instructions?
  Time = 5 {non fl-pt.} + 5 {fl-pt.} / 5 = 6 sec.  Speedup = 10 / 6 ≈ 1.67
• We are looking for a benchmark to show off the new floating-point unit described above, and want the overall benchmark to show a speedup of 3. One benchmark we are considering runs for 100 seconds with the old floating-point hardware. How much of the execution time would floating-point instructions have to account for in this program in order to yield our desired speedup on this benchmark?
  Speedup = 3 = 100 / (t_fl / 5 + 100 − t_fl)  ⇒  t_fl = 83.33 sec.
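Both examples can be checked with a small Amdahl's Law helper (a sketch; the function name is my own):

```python
def overall_speedup(frac_enhanced, speedup_enhanced):
    """Amdahl's Law: speedup of the whole task when a fraction of its
    execution time runs speedup_enhanced times faster."""
    return 1.0 / ((1.0 - frac_enhanced) + frac_enhanced / speedup_enhanced)

# Example 1: half the time is floating point, and FP runs 5x faster.
print(overall_speedup(0.5, 5))        # 10/6 ≈ 1.67

# Example 2: solve 3 = 1 / ((1 - f) + f/5) for the floating-point fraction f.
f = (1 - 1 / 3) / (1 - 1 / 5)
print(f)                              # ≈ 0.8333, i.e. 83.33 s of the 100 s run
```

The formula also shows the law's limiting behavior: even with speedup_enhanced → ∞, the overall speedup is capped at 1 / (1 − frac_enhanced).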
The CPU Performance Equation
• Instead of reporting execution time in seconds, we often use the number of clock cycles spent in executing a program:
  CPU time required for executing a program = # clock cycles for the program × clock cycle time
• An instruction always starts at the beginning of a clock cycle
• Clock cycle time = time between two ticks (in seconds)
• Clock rate (frequency) = cycles per second (1 Hz = 1 cycle/sec). A 200 MHz clock has a cycle time of 1/(200 × 10^6) sec = 5 ns
The CPU Performance Equation
• How many clock cycles are required to execute a program, if different instructions require different numbers of clock cycles (which is mostly the case)?
  CPU time = (#instr_A × #cycles_A + #instr_B × #cycles_B + …) × cycle time
  where instr_A, instr_B, … are the different possible instructions
The CPU Performance Equation • Instruction Count (IC) = Total number of instructions executed for a program • Average number of Clock Cycles Per Instruction (CPI) = Total # Clock Cycles/IC • Total CPU time = IC x CPI x Clock Cycle Time Therefore, CPU performance depends on: • Clock Cycle Time – Hardware technology & Organization • CPI – Organization & Instruction Set Architecture • IC – Instruction Set Architecture (ISA) & Compiler These are dependent on each other, but at times one can be changed with small and predictable impacts on the other two
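The equation can be sketched in Python with a made-up instruction mix (the counts and per-class cycle counts below are illustrative only):

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU time = IC x CPI x cycle time, with cycle time = 1 / clock rate."""
    return instruction_count * cpi / clock_rate_hz

# Hypothetical mix: (instruction count, cycles per instruction for that class),
# e.g. ALU ops, loads/stores, branches.
mix = [(2_000_000, 1), (1_000_000, 2), (500_000, 4)]
ic = sum(n for n, _ in mix)
cpi = sum(n * c for n, c in mix) / ic     # average CPI over the whole run
print(cpi)                                # 6e6 cycles / 3.5e6 instr ≈ 1.71
print(cpu_time(ic, cpi, 200e6))           # 6e6 cycles at 200 MHz = 0.03 s
```

This makes the dependencies concrete: the compiler and ISA shape the mix (IC and CPI), the organization shapes the per-class cycle counts (CPI), and the technology sets the clock rate.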
CPU time = Seconds/Program = (Instructions/Program) × (Cycles/Instruction) × (Seconds/Cycle)
         = instruction count × average CPI / clock rate

What affects which factor:

               | IC | CPI | Clock Cycle Time
  Program      | X  | (X) |
  Compiler     | X  | X   |
  ISA          | X  | X   |
  Organization |    | X   | X
  Technology   |    |     | X
Be careful of the following concepts:
• Machine: ISA and hardware organization
• Machine cycle time
• ISA + hardware organization → # cycles for each instruction (this is not CPI)
• ISA + Compiler + Program → # instructions executed
• Therefore, ISA + Compiler + Program + hardware organization + cycle time → total CPU time
Summary
• What is "performance"?
• How to determine performance? – Use benchmarks
• How to summarize performance results
• Amdahl's Law – Make the common case fast
• The CPU performance equation
• What determines the CPU time (running time) of a program? – # instructions, cycles per instruction, cycle time
• How does each of these depend on the ISA, organization, compiler, hardware technology, …?