Lecture 1: Introduction CprE 585 Advanced Computer Architecture, Fall 2004 Zhao Zhang
Traditional “Computer Architecture” The term architecture is used here to describe the attributes of a system as seen by the programmer, i.e., the conceptual structure and functional behavior as distinct from the organization of the data flow and controls, the logic design, and the physical implementation. • Gene Amdahl, IBM Journal R&D, April 1964
Contemporary “Computer Architecture” • Instruction set architecture: program-visible instruction set • Instruction format, memory addressing modes, architectural registers, endian type, alignment, … • EX: RISC, CISC, VLIW, EPIC • Organization: high-level aspects of a computer’s design • Pipeline structure, instruction scheduling, cache, memory, disks, buses, etc. • Implementations: the specifics of a machine • Logic design, packaging technology
Fundamentals • ISA design principles and performance evaluation • The impacts of technology trends and market factors • Performance evaluation methodologies
High Performance Computer Architecture Given a huge number of transistors, how do we run programs as rapidly as possible? • Sequential programs • Parallel and multiprogrammed programs
Instruction Level Parallelism Sequential program performance: Execution Time = #inst × CPI × Cycle time • Pipelining works well for sequential programs • But the best performance is limited by CPI >= 1.0 • Pipeline hazards further degrade performance
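The execution-time equation above can be sketched as a tiny helper. This is an illustration only; the function name and the example numbers (1 billion instructions, CPI 1.5, 0.5 ns cycle) are assumptions, not from the slides.

```c
/* Illustrative helper for the sequential-performance equation:
   Execution time = instruction count x CPI x cycle time.
   Units: cycle time in nanoseconds, so the result is in nanoseconds. */
double exec_time(double inst_count, double cpi, double cycle_time_ns) {
    return inst_count * cpi * cycle_time_ns;
}
```

For example, 1e9 instructions at CPI 1.5 with a 0.5 ns cycle give 7.5e8 ns, i.e., 0.75 s.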
Multi-issue Pipeline Naïve extension to multi-issue: replicate the pipeline so several instructions advance through the stages side by side [diagram: five parallel pipelines, each IF → ID → EX → MEM → WB]
for (i = 0; i < N; i++)
    X[i] = a * X[i];

// let R3 = &X[0], R4 = &X[N], and F0 = a
LOOP: L.D    F2, 0(R3)
      MUL.D  F2, F2, F0
      S.D    F2, 0(R3)
      DADDI  R3, R3, 8
      BNE    R3, R4, LOOP

How much parallelism exists in the program? What’s the problem with the naïve multi-issue pipeline? • Data hazards • Control hazards • Pipeline efficiency
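One classic way to expose the parallelism hiding in this loop is unrolling, which a compiler (or programmer) can do so a multi-issue pipeline sees several independent multiplies per iteration instead of one dependent LD/MUL/SD chain. A minimal sketch, assuming N is divisible by 4 (the function name and unroll factor are illustrative):

```c
/* Unrolled version of the loop X[i] = a * X[i].
   The four statements in the body have no data dependences on one
   another, so a multi-issue pipeline can issue them in parallel. */
void scale_unrolled(double *X, double a, int N) {
    for (int i = 0; i < N; i += 4) {
        X[i]     = a * X[i];
        X[i + 1] = a * X[i + 1];
        X[i + 2] = a * X[i + 2];
        X[i + 3] = a * X[i + 3];
    }
}
```

Unrolling also amortizes the loop-overhead instructions (the add and branch) over more useful work.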
How to Exploit ILP? Find independent instructions through dependence analysis • Hardware approaches => dynamically scheduled superscalar • Most commonly used today: Intel Pentium, AMD, Sun UltraSPARC, and MIPS families • Software approaches => (1) statically scheduled superscalar, or (2) VLIW
Dynamically Scheduled Superscalar Important features: • Multi-issue and Deep pipelining • Dynamic scheduling • Speculative execution • Branch prediction • Memory dependence speculation • Non-blocking caches • High bandwidth caches
Dynamically Scheduled Superscalar Challenges: Complexity!!! Key issues: • Understand why it is correct • Know dependences • Will prove that dynamic execution is “correct” • Understand how it brings high performance • Will see weird designs • Will use Verilog and simulation to help understanding • Have the big picture
Memory System Performance • A typical memory hierarchy today (faster and smaller toward the top, bigger and slower toward the bottom): Proc/Regs → L1-Cache → L2-Cache → L3-Cache (optional) → Memory → Disk, Tape, etc. • Here we focus on L1/L2/L3 caches, virtual memory and main memory
Memory System Performance Memory stall CPI = misses per inst × miss penalty = % mem inst × miss rate × miss penalty Assume 20% memory instructions, a 2% miss rate, and a 400-cycle miss penalty. How much is the memory stall CPI?
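Working the slide's numbers through the formula can be sketched as below; the function name is illustrative. With 20% memory instructions, a 2% miss rate, and a 400-cycle penalty, the memory stall CPI comes out to 0.20 × 0.02 × 400 = 1.6 cycles per instruction.

```c
/* Memory stall CPI = (fraction of memory instructions)
                    x (miss rate) x (miss penalty in cycles).
   The product of the first two factors is misses per instruction. */
double mem_stall_cpi(double mem_frac, double miss_rate, double penalty) {
    return mem_frac * miss_rate * penalty;
}
```

Note how large this is: 1.6 stall cycles per instruction dwarfs an ideal CPI of 1.0, which is exactly why the following slides focus on the memory hierarchy.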
Cache Design Many applications are memory-bound • CPU speed increases fast; memory speed cannot keep up Cache hierarchy: exploits program locality • Basic principles of cache design • Hardware cache optimizations • Application cache optimizations • Prefetching techniques We will also discuss virtual memory
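As one concrete example of an application-level cache optimization, consider traversal order: C stores 2-D arrays row-major, so visiting elements row by row produces stride-1 accesses that exploit spatial locality (consecutive elements share a cache line), while column-by-column traversal of a large matrix misses far more often. A minimal sketch, where the matrix size and function name are illustrative assumptions:

```c
#define N 64  /* illustrative matrix dimension */

/* Fill an N x N matrix with ones and sum it in row-major order.
   The inner loop walks consecutive memory locations (stride 1),
   so successive accesses hit the same cache line. */
double demo_row_major_sum(void) {
    static double m[N][N];
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            m[i][j] = 1.0;

    double s = 0.0;
    for (int i = 0; i < N; i++)      /* row index outer ...          */
        for (int j = 0; j < N; j++)  /* ... column index inner: good */
            s += m[i][j];
    return s;
}
```

Swapping the two summation loops (column-major traversal) computes the same result but, for matrices larger than the cache, touches a new cache line on nearly every access.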
High Performance Storage Systems What limits the performance of web servers? Storage! • Storage technology trends • RAID: Redundant array of inexpensive disks
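The core redundancy idea behind RAID (as in RAID level 4/5 parity) can be sketched in a few lines: the parity block is the bitwise XOR of the data blocks, so any single lost block can be rebuilt by XOR-ing the surviving blocks with the parity. The two-disk layout and names below are illustrative, not a real RAID configuration.

```c
/* Compute the parity block for two data blocks of `len` bytes.
   If either data block is later lost, XOR-ing the parity with the
   surviving block reconstructs it. */
void compute_parity(const unsigned char *d0, const unsigned char *d1,
                    unsigned char *parity, int len) {
    for (int i = 0; i < len; i++)
        parity[i] = d0[i] ^ d1[i];
}
```

This works because XOR is its own inverse: parity ^ d1 = (d0 ^ d1) ^ d1 = d0.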
Multiprocessor Systems Must exploit thread-level parallelism for further performance improvement Shared-memory multiprocessors: cooperating programs see the same memory address space How to build them? • Cache coherence • Memory consistency
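To give a flavor of the cache-coherence problem, here is a highly simplified sketch of MSI-style states for one cache line, reacting to local reads/writes and to writes observed ("snooped") from other processors. Real protocols covered in the course handle bus transactions, write-backs, and more states; everything below is illustrative.

```c
/* MSI-style coherence states for a single cache line. */
typedef enum { INVALID, SHARED, MODIFIED } line_state;

/* A local write always leaves the line Modified (after invalidating
   other copies on the bus, which this sketch omits). */
line_state on_local_write(line_state s)  { (void)s; return MODIFIED; }

/* A local read of an Invalid line fetches it in Shared state;
   otherwise the state is unchanged. */
line_state on_local_read(line_state s)   { return s == INVALID ? SHARED : s; }

/* Snooping a write by another processor invalidates our copy. */
line_state on_remote_write(line_state s) { (void)s; return INVALID; }
```

The key invariant the protocol enforces is single-writer/multiple-reader: at most one cache holds a line in Modified, and then no other cache holds it at all.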
Other Topics • VLIW basics and modern VLIW processors • Simultaneous multithreading and chip-level multiprocessing • Low-power processor design • Circuit issues in high-performance processors • Other selected topics
Why Study Computer Architecture? As a hardware designer/researcher – know how to design processors, caches, storage, graphics, interconnects, and so on As a system designer – know how to build a computer system using the best components available As a software designer – know how to get the best performance from the hardware