
C SINGH, JUNE 7-8, 2010

Advanced Computers Architecture, Lecture 4. By Rohit Khokher, Department of Computer Science, Sharda University, Greater Noida, India. C SINGH, JUNE 7-8, 2010. Advanced Computers Architecture, UNIT 1. IWW 2010, ISTANBUL, TURKEY. High Performance Architectures.


Presentation Transcript


  1. Advanced Computers Architecture, Lecture 4
     By Rohit Khokher, Department of Computer Science, Sharda University, Greater Noida, India
     C SINGH, JUNE 7-8, 2010. Advanced Computers Architecture, UNIT 1. IWW 2010, ISTANBUL, TURKEY

  2. High Performance Architectures
     • Who needs high performance systems?
     • How do you achieve high performance?
     • How do you analyze or evaluate performance?

  3. Outline of my lecture
     • Classification
     • ILP Architectures
     • Data Parallel Architectures
     • Process level Parallel Architectures
     • Issues in parallel architectures
     • Cache coherence problem
     • Interconnection networks

  4. Classification of Parallel Computing
     • Flynn’s Classification
     • Feng’s Classification
     • Händler’s Classification
     • Modern (Sima, Fountain & Kacsuk) Classification

  5. Feng’s Classification
     • Feng [1972] also proposed a scheme that classifies computer architectures by their degree of parallelism.
     • The maximum number of bits that the system can process per unit of time is called the ‘maximum degree of parallelism’.
     • Feng’s scheme distinguishes serial and parallel processing at the bit level and at the word level.
     • The four types in Feng’s classification are:
       • WSBS (Word Serial Bit Serial)
       • WPBS (Word Parallel Bit Serial) (STARAN)
       • WSBP (Word Serial Bit Parallel) (conventional computers)
       • WPBP (Word Parallel Bit Parallel) (ILLIAC IV)

  6. [Figure: Feng’s classification chart. Horizontal axis: word length (1, 16, 32, 64). Vertical axis: bit slice length (1, 16, 64, 256, 16K). Machines plotted: MPP, STARAN, Illiac IV, C.mmP, PDP-11, IBM 370, CRAY-1.]
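The classification on slides 5 and 6 can be sketched in a few lines of Python. In Feng's scheme a machine is described by its word length n and its bit slice length m, and the maximum degree of parallelism is the product n × m. The (n, m) pairs below are assumptions: they are the values commonly quoted for these machines in textbooks, not coordinates read from the slide, and the function names are illustrative only.

```python
def feng_class(word_length, bit_slice_length):
    """Return the Feng class (WSBS, WPBS, WSBP or WPBP) for a machine.

    word_length      -- bits processed in parallel within one word (n)
    bit_slice_length -- words processed in parallel (m)
    """
    words = "WP" if bit_slice_length > 1 else "WS"
    bits = "BP" if word_length > 1 else "BS"
    return words + bits


def max_degree_of_parallelism(word_length, bit_slice_length):
    """Maximum number of bits processed per unit of time: n * m."""
    return word_length * bit_slice_length


# Assumed (word length, bit slice length) pairs, as commonly quoted
# in textbooks; they are not taken from the slide itself.
MACHINES = {
    "MPP": (1, 16384),
    "STARAN": (1, 256),
    "Illiac IV": (64, 64),
    "C.mmP": (16, 16),
    "PDP-11": (16, 1),
    "IBM 370": (32, 1),
    "CRAY-1": (64, 1),
}

if __name__ == "__main__":
    for name, (n, m) in MACHINES.items():
        print(name, feng_class(n, m), max_degree_of_parallelism(n, m))
```

Under these assumed values, STARAN falls in WPBS and Illiac IV in WPBP, which agrees with the examples listed on slide 5.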

  7. Modern Classification
     • Parallel architectures
       • Function-parallel architectures
       • Data-parallel architectures

  8. Data Parallel Architectures
     • Data-parallel architectures
       • Vector architectures
       • Associative and neural architectures
       • SIMDs
       • Systolic architectures

  9. Function Parallel Architectures
     • Function-parallel architectures
       • Instruction level parallel architectures (ILPs)
         • Pipelined processors
         • VLIWs
         • Superscalar processors
       • Thread level parallel architectures
       • Process level parallel architectures (MIMDs)
         • Distributed memory MIMD
         • Shared memory MIMD

  10. Motivation
     • Non-pipelined design
       • Single-cycle implementation
         • The cycle time depends on the slowest instruction
         • Every instruction takes the same amount of time
       • Multi-cycle implementation
         • Divide the execution of an instruction into multiple steps
         • Each instruction may take a variable number of steps (clock cycles)
     • Pipelined design
       • Divide the execution of an instruction into multiple steps (stages)
       • Overlap the execution of different instructions in different stages
       • In each cycle, different instructions are executed in different stages
       • For example, in a 5-stage pipeline (Fetch-Decode-Read-Execute-Write), 5 instructions are executed concurrently in 5 different pipeline stages
       • Complete the execution of one instruction every cycle (instead of every 5 cycles)
       • Can increase the throughput of the machine 5 times

  11. Example of Pipeline
     Program:
       LD  R1 <- A
       ADD R5, R3, R4
       LD  R2 <- B
       SUB R8, R6, R7
       ST  C <- R5
     5-stage pipeline: Fetch - Decode - Read - Execute - Write
     Non-pipelined processor: 25 cycles = number of instrs (5) * number of stages (5)
       F D R E W | F D R E W | F D R E W | F D R E W | F D R E W
     Pipelined processor: 9 cycles = start-up latency (4) + number of instrs (5)
       Cycle:   1  2  3  4  5  6  7  8  9
       LD       F  D  R  E  W
       ADD         F  D  R  E  W
       LD             F  D  R  E  W
       SUB               F  D  R  E  W
       ST                   F  D  R  E  W
     Cycles 1-4: filling the pipeline. Cycles 6-9: draining the pipeline.
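The two cycle counts on slide 11 follow from simple formulas, sketched below in Python. The sketch assumes an ideal pipeline with no stalls and one cycle per stage; the function names are illustrative and do not come from the lecture.

```python
def non_pipelined_cycles(n_instrs, n_stages):
    """Each instruction runs through all stages before the next one starts."""
    return n_instrs * n_stages


def pipelined_cycles(n_instrs, n_stages):
    """Start-up latency (n_stages - 1) plus one completion per cycle.

    Assumes an ideal pipeline: no stalls, every stage takes one cycle.
    """
    if n_instrs == 0:
        return 0
    return (n_stages - 1) + n_instrs


def speedup(n_instrs, n_stages):
    """Ratio of non-pipelined to pipelined execution time."""
    return non_pipelined_cycles(n_instrs, n_stages) / pipelined_cycles(n_instrs, n_stages)


if __name__ == "__main__":
    print(non_pipelined_cycles(5, 5))  # 25, as on the slide
    print(pipelined_cycles(5, 5))      # 9, as on the slide
    print(speedup(1_000_000, 5))       # approaches the stage count for long runs
```

For the five instructions of the example the speedup is only 25/9, because the start-up latency is not yet amortized. The 5-times throughput gain quoted on slide 10 is the limit for a long instruction stream.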

  12. Data Dependence
     • Read-After-Write (RAW) dependence
       • True dependence
       • The consumer must read the data after the producer has produced it
     • Write-After-Write (WAW) dependence
       • Output dependence
       • The result of a later instruction must not be overwritten by an earlier instruction
     • Write-After-Read (WAR) dependence
       • Anti dependence
       • A later instruction must not overwrite a value before its consumer has read it
     • Notes
       • WAW and WAR are called false dependences; they arise from storage conflicts
       • All three types of dependences can occur for both registers and memory locations
       • Dependences are characteristics of programs (not machines)
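The three definitions above can be turned into a small checker. The Python sketch below is one possible way to do it, not the lecture's method: each instruction is modelled as a pair (locations written, locations read), and a dependence from an earlier instruction i to a later instruction j is reported only if no instruction in between overwrites the shared location. The function name and the program encoding are assumptions; the program itself is Example 1 from slide 13.

```python
def find_dependences(program):
    """Return RAW, WAW and WAR dependences as (earlier, later) pairs, 1-based.

    program -- list of (writes, reads) pairs, each a set of register or
               memory location names.
    """
    deps = {"RAW": [], "WAW": [], "WAR": []}
    for j, (writes_j, reads_j) in enumerate(program):
        for i in range(j):
            writes_i, reads_i = program[i]
            # Locations overwritten strictly between instructions i and j.
            killed = set()
            for k in range(i + 1, j):
                killed |= program[k][0]
            if (writes_i & reads_j) - killed:
                deps["RAW"].append((i + 1, j + 1))
            if (writes_i & writes_j) - killed:
                deps["WAW"].append((i + 1, j + 1))
            if (reads_i & writes_j) - killed:
                deps["WAR"].append((i + 1, j + 1))
    return deps


# Example 1 from slide 13, encoded as (writes, reads).
EXAMPLE_1 = [
    ({"R1"}, {"A"}),         # 1 LD   R1 <- A
    ({"R2"}, {"B"}),         # 2 LD   R2 <- B
    ({"R3"}, {"R1", "R2"}),  # 3 MULT R3, R1, R2
    ({"R4"}, {"R3", "R2"}),  # 4 ADD  R4, R3, R2
    ({"R3"}, {"R3", "R4"}),  # 5 SUB  R3, R3, R4
    ({"A"}, {"R3"}),         # 6 ST   A <- R3
]

if __name__ == "__main__":
    for kind, pairs in find_dependences(EXAMPLE_1).items():
        print(kind, pairs)
```

Because the checker looks only at the instruction sequence, it illustrates the last note on the slide: the dependences are a property of the program, independent of any particular pipeline.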

  13. Example
     Example 1:
       1 LD   R1 <- A
       2 LD   R2 <- B
       3 MULT R3, R1, R2
       4 ADD  R4, R3, R2
       5 SUB  R3, R3, R4
       6 ST   A <- R3
     RAW dependence: 1->3, 2->3, 2->4, 3->4, 3->5, 4->5, 5->6
     WAW dependence: 3->5
     WAR dependence: 4->5, 1->6 (memory location A)
     Execution time: 18 cycles = start-up latency (4) + number of instrs (6) + number of pipeline bubbles (8)
       Cycle:   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18
       1 LD     F  D  R  E  W
       2 LD        F  D  R  E  W
       3 MULT         F  D  R  R  R  E  W
       4 ADD             F  D  D  D  R  R  R  E  W
       5 SUB                F  F  F  D  D  D  R  R  R  E  W
       6 ST                          F  F  F  D  D  D  R  R  R  E  W
     Repeated stage letters are pipeline bubbles due to RAW dependences (data hazards).
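The 18-cycle figure can be reproduced with a small timing model. The Python sketch below rests on several assumptions that the slide does not state explicitly: instructions issue in order, each stage holds one instruction at a time, there is no forwarding, and an instruction may leave the Read stage only after every producer of its source registers has completed its Write stage. Only register dependences are modelled, and the function name is illustrative.

```python
STAGES = ["F", "D", "R", "E", "W"]


def simulate(program):
    """Return the cycle in which the last instruction completes its W stage.

    program -- list of (dest, sources): dest is a register name or None,
               sources is a list of register names read by the instruction.
    """
    n_stages = len(STAGES)
    last_write = {}  # register -> W-stage cycle of its most recent writer
    prev = None      # stage entry cycles of the previous instruction
    for dest, sources in program:
        entry = [0] * n_stages
        for s in range(n_stages):
            earliest = entry[s - 1] + 1 if s > 0 else 1
            if prev is not None:
                # A stage is free only once the previous instruction left it.
                freed = prev[s + 1] if s + 1 < n_stages else prev[s] + 1
                earliest = max(earliest, freed)
            if STAGES[s] == "E":
                # Assumed rule: the final Read cycle must come after the
                # producer's Write cycle, so Execute starts two cycles later.
                for reg in sources:
                    if reg in last_write:
                        earliest = max(earliest, last_write[reg] + 2)
            entry[s] = earliest
        if dest is not None:
            last_write[dest] = entry[-1]
        prev = entry
    return prev[-1] if prev is not None else 0


# Example 1 from slide 13 (register operands only).
EXAMPLE_1 = [
    ("R1", []),            # 1 LD   R1 <- A
    ("R2", []),            # 2 LD   R2 <- B
    ("R3", ["R1", "R2"]),  # 3 MULT R3, R1, R2
    ("R4", ["R3", "R2"]),  # 4 ADD  R4, R3, R2
    ("R3", ["R3", "R4"]),  # 5 SUB  R3, R3, R4
    (None, ["R3"]),        # 6 ST   A <- R3
]

if __name__ == "__main__":
    total = simulate(EXAMPLE_1)
    ideal = (len(STAGES) - 1) + len(EXAMPLE_1)
    print(total, total - ideal)  # total cycles, pipeline bubbles
```

Under these assumptions the model gives 18 cycles and 8 bubbles for Example 1. It also gives 9 cycles for the five-instruction program on slide 11, where the only RAW dependence (ADD to ST on R5) is far enough apart that no stall is needed.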
