
SINGLE CHIP MULTIPROCESSORS Computer Architecture Term Paper (11.12.2003) Esra KIRBAŞ 2002701357


Presentation Transcript


  1. SINGLE CHIP MULTIPROCESSORS Computer Architecture Term Paper (11.12.2003) Esra KIRBAŞ 2002701357

  2. Evaluation of Design Alternatives for a Multiprocessor Microprocessor By Basem A. Nayfeh, Lance Hammond and Kunle Olukotun. ISCA 23, 1996, pp. 67-77.

  3. With advanced integrated-circuit technology, several options for the design of high-performance microprocessors are available. • In the multiprocessor design option, a small number of processors are interconnected on a single chip or on a multi-chip-module (MCM) substrate. • We concentrate on single-chip multiprocessors.

  4. Our goal is to study two proposed cache-sharing mechanisms for single-chip multiprocessors: • Shared Level-1 (L1) Cache Architecture • Shared Level-2 (L2) Cache Architecture • (The performance of these two architectures will be compared with that of a single-bus-based shared-memory multiprocessor.)

  5. A multiprocessor architecture whose interconnect is closer to the CPUs in the memory hierarchy will be able to exploit fine-grained parallelism more efficiently than one whose interconnect is further away from the CPUs. • The aim is to achieve good performance on fine-grained parallel applications without sacrificing the performance of independent parallel jobs.

  6. CPU CHARACTERISTICS • We use the same CPU in all three architectures. • 2-way issue processor • Dynamic scheduling • Speculative execution • Non-blocking caches

  7. Instruction Pipeline and Functional Units

  8. 16 KB 2-way set-associative instruction and data caches • 32-entry centralized instruction window • 32-entry reorder buffer

  9. Shared L1-Cache Multiprocessor

  10. Advantages of this Architecture: • It provides the lowest latency for interprocessor communication by using a shared-memory address space. • Low latency for interprocessor communication helps to achieve high performance in executing fine-grained parallel applications. • Processors may fetch shared data into the cache for each other. • It eliminates the cache coherence logic and implicitly provides a sequentially consistent memory without sacrificing performance.

  11. Disadvantages of this Architecture: • The crossbar switch increases the access time of the L1 cache. (We assume an average access time of three cycles.) • All memory references go to the shared L1 cache, so there may be extra delays due to bank conflicts. • If the processors are not executing fine-grained parallel applications, the miss rate will increase, since unrelated working sets then compete for the same cache.
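To give a feel for why bank conflicts matter in a shared L1 cache, the following is a minimal sketch of a worst-case estimate. It rests on assumptions that are not in the slides: every processor issues one access per cycle, each access picks a bank uniformly at random, and the bank counts tried are hypothetical. The function name `conflict_probability` is likewise only illustrative.

```python
from math import perm


def conflict_probability(processors: int, banks: int) -> float:
    """Probability that at least two processors hit the same bank in a cycle.

    Assumes (hypothetically) one access per processor per cycle, with the
    target bank chosen uniformly and independently.
    """
    if processors > banks:
        return 1.0  # pigeonhole: some bank must receive two accesses
    # Ways to give every processor a distinct bank, over all possible choices.
    no_conflict = perm(banks, processors) / banks ** processors
    return 1.0 - no_conflict


if __name__ == "__main__":
    for banks in (4, 8, 16):  # hypothetical bank counts
        print(banks, conflict_probability(4, banks))
```

Under these assumptions, adding banks lowers the conflict probability but never removes it, which is consistent with the extra delay noted on the slide. Real access streams are neither uniform nor issued every cycle, so the numbers are only indicative.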

  12. The secondary cache and main memory are the same as in a uniprocessor system: • L2 cache: 2 MB, 10-cycle latency, 2-cycle occupancy • Main memory: 50-cycle latency, 6-cycle occupancy

  13. Shared L2-Cache Multiprocessor

  14. The access time of the write-through primary caches is 1 cycle. • The latency of the L2 cache increases to 14 cycles due to the crossbar overhead.

  15. The L2 cache has four independent banks to increase its bandwidth and enable it to support four independent access streams. • The data path is 64 bits wide. • Occupancy is 4 cycles (for the transfer of a 32-byte cache line).
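The 4-cycle occupancy follows from the data-path width. The sketch below works through the arithmetic, assuming a 32-byte cache line (the size for which a 64-bit data path gives exactly 4 cycles) and one data-path-wide transfer per cycle. The function name `transfer_cycles` is hypothetical.

```python
def transfer_cycles(line_bytes: int, datapath_bits: int) -> int:
    """Cycles needed to move one cache line, one datapath-width beat per cycle."""
    bytes_per_cycle = datapath_bits // 8
    # Ceiling division, so a partial final beat still costs a full cycle.
    return -(-line_bytes // bytes_per_cycle)


if __name__ == "__main__":
    # 64-bit data path = 8 bytes per cycle; 32 bytes / 8 bytes = 4 cycles.
    print(transfer_cycles(32, 64))
```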

  16. Only memory accesses that miss in the L1 cache are affected by the reduced performance of the L2 cache. • MCM (multi-chip module) technology can be used (as of 1996). • Main memory: 50-cycle latency, 6-cycle occupancy

  17. To keep the primary caches coherent, a coherence protocol is needed. • For simplicity, we assume that each primary cache uses a write-through policy for shared data. • Additional hardware is required to support this.
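One way to picture write-through coherence is as a toy software model in which every store updates the shared L2 and invalidates stale copies in the other primary caches. The sketch below is only an illustration under that assumption. The class and method names (`SharedL2`, `L1Cache`, `store`, `load`, `invalidate`) are hypothetical, and the invalidation step is an assumed mechanism, not a description of the hardware evaluated in the paper.

```python
class SharedL2:
    """Toy shared L2: holds the authoritative copy of every address."""

    def __init__(self):
        self.data = {}
        self.l1_caches = []

    def read(self, addr):
        return self.data.get(addr, 0)

    def write(self, addr, value, writer):
        self.data[addr] = value
        # Assumed mechanism: invalidate the line in every other L1.
        for cache in self.l1_caches:
            if cache is not writer:
                cache.invalidate(addr)


class L1Cache:
    """Toy private write-through L1 attached to a shared L2."""

    def __init__(self, l2):
        self.l2 = l2
        self.lines = {}
        l2.l1_caches.append(self)

    def load(self, addr):
        if addr not in self.lines:  # miss: fetch from the shared L2
            self.lines[addr] = self.l2.read(addr)
        return self.lines[addr]

    def store(self, addr, value):
        self.lines[addr] = value
        self.l2.write(addr, value, writer=self)  # write-through

    def invalidate(self, addr):
        self.lines.pop(addr, None)


if __name__ == "__main__":
    l2 = SharedL2()
    cpu0, cpu1 = L1Cache(l2), L1Cache(l2)
    cpu0.store(0x40, 7)
    print(cpu1.load(0x40))  # cpu1 misses and reads the value from L2
    cpu1.store(0x40, 9)     # invalidates cpu0's copy
    print(cpu0.load(0x40))  # cpu0 re-fetches the new value
```

Because every store reaches the L2, a processor that misses after an invalidation always finds the latest value there, which is why the write-through assumption keeps the protocol simple.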

  18. Shared Main Memory Multiprocessor

  19. Primary cache access time is 1 cycle. • Secondary cache access time is 12 cycles. • All CPUs must access main memory to communicate.

  20. Ideal Memory Latencies of Three Architectures in CPU Clock Cycles
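The latencies quoted on the preceding slides (L1 / L2 / main memory: 3 / 10 / 50 cycles for shared-L1, 1 / 14 / 50 for shared-L2, 1 / 12 / 50 for shared-memory) can be combined into a rough average memory access time. The sketch below uses the textbook formula AMAT = L1 + m1 × (L2 + m2 × Mem). The miss rates are hypothetical placeholders, not results from the paper, and occupancy and contention are ignored.

```python
# (L1, L2, main memory) latencies in CPU cycles, as quoted on the slides.
LATENCIES = {
    "shared-L1": (3, 10, 50),
    "shared-L2": (1, 14, 50),
    "shared-memory": (1, 12, 50),
}


def amat(l1, l2, mem, l1_miss_rate, l2_miss_rate):
    """Average memory access time for a two-level cache hierarchy."""
    return l1 + l1_miss_rate * (l2 + l2_miss_rate * mem)


if __name__ == "__main__":
    # Hypothetical miss rates, chosen only to exercise the formula.
    for name, (l1, l2, mem) in LATENCIES.items():
        print(name, round(amat(l1, l2, mem, 0.05, 0.2), 2))
```

With identical miss rates the shared-L1 design looks worst because of its 3-cycle hit time. The model leaves out any change in miss rate or communication cost that comes from sharing, which is what the simulations in the paper measure.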

  21. SIMULATION ENVIRONMENT • The SimOS simulation environment is used • The IRIX 5.3 operating system is simulated • Hand-parallelized scientific and engineering applications • Compiler-parallelized scientific and engineering applications • Multiprogramming workload

  22. Two kinds of simulation are performed: • Simple simulation (no speculative execution, dynamic scheduling, or non-blocking memory references) • Dynamic superscalar simulation

  23. SIMPLE SIMULATION RESULTS (high degree of interprocessor communication): EAR

  24. EQNTOTT

  25. (moderate degree of interprocessor communication): VOLPACK

  26. FFT Kernel

  27. (low degree of interprocessor communication): MULTIPROGRAMMING WORKLOAD

  28. OCEAN

  29. DYNAMIC SUPERSCALAR SIMULATION RESULTS

  30. In the dynamic superscalar simulation, the performance of the shared-L1 architecture can diminish substantially, whereas the shared-L2 and shared-memory architectures retain much of the relative performance predicted by the simple simulation results.

  31. Piranha: A Scalable Architecture Based on Single-Chip Multiprocessing By Luiz Andre Barroso, Kourosh Gharachorloo, Robert McNamara, Andreas Nowatzyk, Shaz Qadeer, Barton Sano, Scott Smith, Robert Stets, and Ben Verghese. ISCA 27, 2000, pp. 282-293

  32. For online transaction processing (OLTP) systems • Standard ASIC design technology is used • The centerpiece of the Piranha architecture is a highly integrated processing node, with eight simple Alpha processor cores, separate instruction and data caches for each core, a shared second-level cache, eight memory controllers, two coherence protocol engines, and a network router, all on a single chip.


  36. SIMULATION
