Sima Dezső

Sima Dezső. Multicore/manycore processors - 2. October 2012. Version 1.0. Overview. 1. The necessity of the emergence of multicore processors. 2. Homogeneous multicore processors. 2.1 Conventional multicore processors. 2.2 Manycore processors.


Presentation Transcript

  1. Sima Dezső: Multicore/manycore processors - 2. October 2012. Version 1.0

  2. Overview: 1. The necessity of the emergence of multicore processors; 2. Homogeneous multicore processors; 2.1 Conventional multicore processors; 2.2 Manycore processors; 3. Heterogeneous multicore processors; 3.1 Master/slave type multicore processors; 3.2 Add-on type multicore processors; 4. Outlook

  3. 3. Heterogeneous multicore processors

  4. 3.1 Heterogeneous master/slave type multicore processors (1) [Figure 3.1: Main classes of multicore processors. Multicore processors: homogeneous multicores (conventional MC processors with 2 ≤ n ≤ 8 cores; manycore processors with >8 cores) and heterogeneous multicores (master/slave architectures; add-on architectures, MPC: CPU + GPU). Usage labels in the figure: desktops, servers, general purpose computing; prototypes/experimental systems; MM/3D/HPC, production stage; HPC, near future.]

  5. 3. Heterogeneous multicore processors. 3.1 Heterogeneous master/slave type MC processors: the Cell processor

  6. 3.1 Heterogeneous master/slave type MC processors - The Cell (1) Cell BE • Joint product of Sony, IBM and Toshiba • Target: games/multimedia and HPC applications: Playstation 3 (PS3), QS2x Blade Server family (2 Cell BE/blade) • Milestones: summer 2000: basics of the architecture defined; 02/2006: Cell Blade QS20; 08/2007: Cell Blade QS21; 05/2008: Cell Blade QS22

  7. 3.1 Heterogeneous master/slave type MC processors - The Cell (2) SPE: Synergistic Processing Element; SPU: Synergistic Processor Unit; SXU: Synergistic Execution Unit; LS: Local Store of 256 KB; SMF: Synergistic Mem. Flow Unit; EIB: Element Interface Bus; PPE: Power Processing Element; PPU: Power Processing Unit; PXU: POWER Execution Unit; MIC: Memory Interface Contr.; BIC: Bus Interface Contr.; XDR: Rambus DRAM. Figure 3.2: Block diagram of the Cell BE

  8. 3.1 Heterogeneous master/slave type MC processors - The Cell (3) Figure 3.3: The Cell BE die (221 mm², 234 million transistors)

  9. 3.1 Heterogeneous master/slave type MC processors - The Cell (4) Figure 3.10: The Cell BE die - EIB

  10. 3.1 Heterogeneous master/slave type MC processors - The Cell (5) Figure 3.11: Operating principle of the EIB

  11. 3.1 Heterogeneous master/slave type MC processors - The Cell (6) Figure 3.12: Concurrent transfers on the EIB

  12. 3.1 Heterogeneous master/slave type MC processors - The Cell (7) • Performance @ 3.2 GHz: QS21 peak SP FP: 409.6 GFlops (3.2 GHz x 2x8 SPE x 2x4 SP FP/cycle) • Cell BE - NIK: 2007: Faculty Award (Cell 3D app./Teaching); 2008: IBM - NIK Research Cooperation Agreement: performance evaluations • IBM Böblingen Lab • IBM Austin Lab
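
The peak figure on slide 12 is a plain product of the factors given in parentheses. The sketch below only redoes that arithmetic; the function name `peak_gflops` and the constant names are made up for illustration, and reading "2x4 SP FP/cycle" as a 4-wide SIMD fused multiply-add is an assumption, not something the slide states.

```python
# Peak single-precision throughput of a QS21 blade, from the factors on the slide.
CLOCK_GHZ = 3.2               # clock rate in GHz
CELLS_PER_BLADE = 2           # QS2x blades carry 2 Cell BE chips
SPES_PER_CELL = 8             # SPEs per Cell BE
FLOPS_PER_SPE_CYCLE = 2 * 4   # the slide's "2x4 SP FP/cycle"
                              # (assumed: 4-wide SIMD x multiply-add)


def peak_gflops(clock_ghz, chips, spes_per_chip, flops_per_cycle):
    """Peak rate in GFlops = clock (GHz) x number of SPEs x flops per SPE per cycle."""
    return clock_ghz * chips * spes_per_chip * flops_per_cycle


qs21_peak = peak_gflops(CLOCK_GHZ, CELLS_PER_BLADE, SPES_PER_CELL, FLOPS_PER_SPE_CYCLE)
print(f"QS21 peak SP FP: {qs21_peak:.1f} GFlops")  # 409.6
```

As on the slide, only the SPEs are counted; any contribution of the PPEs is left out.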

  13. 3.1 Heterogeneous master/slave type MC processors - The Cell (8) The Roadrunner. 6/2008: International Supercomputing Conference, Dresden. The world's 500 fastest computers: 1. Roadrunner, 1 Petaflops (10^15 Flops) sustained performance (Linpack)

  14. 3.1 Heterogeneous master/slave type MC processors - The Cell (9) Figure 3.13: The world's fastest computer: IBM Roadrunner (Los Alamos, 2008)

  15. 3.1 Heterogeneous master/slave type MC processors - The Cell (10) Figure 3.14: Main features of the Roadrunner

  16. 3.1 Heterogeneous master/slave type MC processors - The Cell (2), revisited. Figure 3.2: Block diagram of the Cell BE (same figure and legend as on slide 7)

  17. 3.2 Heterogeneous add-on type multicore processors (1) [Figure 3.15: Main features of multicore processors. Multicore processors: homogeneous multicores (conventional MC processors with 2 ≤ n ≤ 8 cores; manycore processors with >8 cores) and heterogeneous multicores (master/slave architectures; add-on architectures, MPC: CPU + GPU). Usage labels in the figure: desktops, servers, general purpose computing; prototypes/experimental systems; MM/3D/HPC, production stage; HPC, near future.]

  18. 3.2 Heterogeneous add-on type multicore processors

  19. 3.2 Heterogeneous add-on type multicore processors (1) Principle of add-on execution for GPGPUs (assuming the simplest organization). [Figure: the host launches kernel0<<<>>>() (a data-parallel program) and then kernel1<<<>>>() on the device; CUDA notation.]
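
The host/device split on slide 19 can be imitated in plain Python to show the control flow. This is a toy model only: no GPU and no CUDA API is involved, and `launch`, `host_program` and the kernel bodies are hypothetical names chosen for illustration. The slide itself gives only the names kernel0 and kernel1.

```python
# Toy model of add-on (host/device) execution. No real GPU or CUDA API is used;
# launch(), host_program() and the kernel bodies are illustrative assumptions.

def launch(kernel, n_threads, *args):
    """Stand-in for a kernel launch: run `kernel` once per thread index.
    A real device would execute these instances in parallel."""
    for tid in range(n_threads):
        kernel(tid, *args)


def kernel0(tid, x, y):
    # Data-parallel step: each thread scales one element.
    y[tid] = 2.0 * x[tid]


def kernel1(tid, y, z):
    # Second data-parallel step: each thread adds an offset to one element.
    z[tid] = y[tid] + 1.0


def host_program(x):
    n = len(x)
    y = [0.0] * n              # buffers standing in for device memory
    z = [0.0] * n
    launch(kernel0, n, x, y)   # host hands the first kernel to the device
    # ... serial host code may run here between the two kernels ...
    launch(kernel1, n, y, z)   # host hands the second kernel to the device
    return z


result = host_program([1.0, 2.0, 3.0])
print(result)  # [3.0, 5.0, 7.0]
```

The point of the sketch is the division of labour: the host runs the serial parts and decides when each kernel starts, while the device only executes the data-parallel kernels it is given.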

  20. 3.2 Heterogeneous add-on type multicore processors (2) Remark on the operating principle • Heterogeneous add-on multicore processors are processing accelerators • With respect to the operating principle, their predecessors are heterogeneous add-on multiprocessor systems. Examples: early personal computers with floating-point coprocessors: Intel 286 + 287, 386 + 387. The Intel 486 already had its own on-chip floating-point unit (FPU) (except for the SX and SL models)

  21. 3.2 Heterogeneous add-on type multicore processors (3) Most important implementations of heterogeneous add-on multicore processors: smartphones; integrated graphics

  22. 3.2.1 Integrated graphics

  23. Integrated graphics (1) Switching to English-language slides

  24. Integrated graphics (2) Implementation of integrated graphics. Three options: (a) in the north bridge (implementations about 1999 - 2009); (b) in a multi-chip processor package on a separate die (both the CPU and the GPU are on separate dies and are mounted into a single package): Intel's Havendale (DT) and Auburndale (M) (scheduled for 1H/2009 but cancelled), Arrandale (M, 1/2010) and Clarkdale (DT, 1/2010); (c) on the processor die: Intel's Sandy Bridge (1/2011) and Ivy Bridge (4/2012), AMD's Bobcat-based APUs (M, 1/2011), Llano APUs (DT, 6/2011) and Trinity APUs (DT, Q4/2012). [Figure: block diagrams of the three options, showing CPU, GPU, memory, north bridge (NB) with integrated graphics (IG), and south bridge or peripheral controller.]

  25. Week 6 (17 October 2012): review

  26. Integrated graphics (3) Example 1: Intel's Havendale (DT) and Auburndale (M) multi-chip CPU/GPU processor plans (scheduled for 1H/2009 but cancelled about 1/2009) [] • Revealed in 9/2007. • Both parts were based on the 2nd-generation Nehalem (Lynnfield) architecture, as shown in the figure. [Figure: the Havendale processor (multi-chip package, MCP) next to the Lynnfield processor (monolithic die, no integrated graphics) on the same LGA 1160 platform. Labels: cores with two threads each, 8M and 4M caches, graphics (GPU), DDR3 IMC, PCI-E, power; Display Link and DMI to the Ibexpeak PCH; PCH functions: display (analog: VGA; digital: SDVO, HDMI, Display Port, DVI) and I/O functions (PCIe, SATA, NVRAM, etc.).] Schedule: • 2H '08 first samples • 1H '09 production • TDP < 95 W. []: RS - Intel 2009 Desktop Platform Overview, Sept. 2007, http://pic.xfastest.com/z/INTEL%202009%20%20Overview/2009Overview.ppt

  27. Integrated graphics (4) Example 2: Intel's Westmere-EP based multi-chip CPU/GPU processors (2010)-1 [] (IDF 2009)

  28. Integrated graphics (5) Positioning Intel's Westmere-EP based multi-chip Clarkdale (DT) and Arrandale (M) processors with in-package integrated graphics []

  29. Integrated graphics (6) Using a single-part PCH (Peripheral Control Hub) for Intel's Westmere-EP based multi-chip CPU/GPU processors (2010) []

  30. Integrated graphics (7) Relocating integrated graphics (IGFX) from the north bridge to the processor [] (dedicated graphics via a graphics card)

  31. Integrated graphics (8) Implementation of commercial graphics on the processor die. Implementation of integrated graphics, three options: (a) in the north bridge (implementations around 1999 - 2009); (b) in a multi-chip processor package on a separate die (both the CPU and the GPU are on separate dies and are mounted into a single package): Intel's Havendale (DT) and Auburndale (M) (scheduled for 1H/2009 but cancelled), Arrandale (M, 1/2010) and Clarkdale (DT, 1/2010); (c) on the processor die: Intel's Sandy Bridge (1/2011) and Ivy Bridge (4/2012), AMD's Bobcat-based APUs (M, 1/2011), Llano APUs (DT, 6/2011) and Trinity APUs (DT, Q4/2012). [Figure: block diagrams of the three options, showing CPU, GPU, memory, north bridge (NB) with integrated graphics (IG), and south bridge or peripheral controller.]

  32. Integrated graphics (9) Example 1: Intel's Sandy Bridge with 6 Series PCH-1 [] Key microarchitecture features of the Sandy Bridge vs the Nehalem. []: Kahn O., Piazza T., Valentine B.: Technology Insight: Intel Next Generation Microarchitecture Codename Sandy Bridge, IDF 2010, extreme.pcgameshardware.de/.../281270d1288260884-bonusmaterial-pc-games-hardware-12-2010-sf10_spcs001_100.pdf

  33. Integrated graphics (10) Die plot of the 4C Sandy Bridge processor []. [Figure labels: 256 KB L2 (9 clk) per core; 32K L1D (3 clk); Hyperthreading; AES instr.; VMX unrestricted; AVX 256 bit, 4 operands; 20 mm² / core; @ 1.0-1.4 GHz (connected to L3); L3 (25 clk); 256 b/cycle ring architecture; PCIe 2.0; DDR3-1600.] Sandy Bridge 4C: 32 nm, 995 mtrs / 216 mm², ¼ MB L2 per core, 8 MB L3. []: Intel Sandy Bridge Review, Bit-tech, Jan. 3 2011, http://www.bit-tech.net/hardware/cpus/2011/01/03/intel-sandy-bridge-review/1

  34. Integrated graphics (11) Block diagram of Intel's Sandy Bridge with 6 Series PCH-2 []. Processors: Core i3-21xx, 2C, 2/2011; Core i5-23xx/24xx/25xx, 4C, 1/2011; Core i7-26xx, 4C, 1/2011. Chipset: Intel 6 Series PCH (1). (1) Except the P67, which does not provide a display controller in the PCH. []: Sandy Bridge desktop datasheet

  35. Integrated graphics (12) Example 2: Intel's Ivy Bridge with 6 Series PCH-1 [] Key microarchitecture features of the Ivy Bridge vs the Sandy Bridge

  36. Integrated graphics (13) Contrasting the die plots of Ivy Bridge vs Sandy Bridge (at the same feature size)-1 []. Ivy Bridge-DT: 22 nm, 1480 mtrs, 160 mm². Sandy Bridge-DT: 32 nm, 995 mtrs, 216 mm². []: http://www.itproportal.com/2012/04/24/picture-ivy-bridge-vs-sandy-bridge-gpu-die-sizes-compared/
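
The die figures on slide 36 allow a rough density comparison. The sketch below is a worked calculation on those numbers only; the dictionary layout and the `density` helper are made up for illustration, and the "ideal" gain assumes that area scales with the square of the feature size, which is a textbook simplification rather than a claim from the slides.

```python
# Transistor density of the two dies, using the slide's figures
# (mtrs = million transistors).
ivy_bridge = {"node_nm": 22, "mtrs": 1480, "area_mm2": 160}
sandy_bridge = {"node_nm": 32, "mtrs": 995, "area_mm2": 216}


def density(chip):
    """Million transistors per mm^2 for the whole die."""
    return chip["mtrs"] / chip["area_mm2"]


ivy_density = density(ivy_bridge)        # 1480 / 160
sandy_density = density(sandy_bridge)    # 995 / 216
gain = ivy_density / sandy_density

# Assumed ideal scaling: area shrinks with the square of the feature size.
ideal_gain = (sandy_bridge["node_nm"] / ivy_bridge["node_nm"]) ** 2

print(f"Ivy Bridge:   {ivy_density:.2f} mtrs/mm2")    # 9.25
print(f"Sandy Bridge: {sandy_density:.2f} mtrs/mm2")  # 4.61
print(f"Measured gain: {gain:.2f}x, ideal: {ideal_gain:.2f}x")  # 2.01x, 2.12x
```

These are whole-die averages that mix cores, cache and graphics, so the comparison is indicative only.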

  37. Integrated graphics (14) Contrasting the die plots of Ivy Bridge vs Sandy Bridge (at the same feature size)-2 [] Note: In the Ivy Bridge, Intel placed much more emphasis on graphics processing than in the Sandy Bridge, to compete with AMD's graphics superiority.

  38. Integrated graphics (15) Example 3: AMD's “Swift” Fusion APU plan (2009). Preliminaries: In 10/2006 AMD acquired the graphics firm ATI, and on the same day they announced that “AMD plans to create a new class of x86 processors that integrate the central processing unit (CPU) and graphics processing unit (GPU) at the silicon level, codenamed ‘Fusion’” []. Remark: Although in the above statement AMD designated the silicon-level integration of the CPU and GPU as the Fusion initiative, in some other publications they call both the package-level and the silicon-level integration of the CPU and GPU Fusion technology, as shown in the next figure. [b] AMD Completes ATI Acquisition and Creates Processing Powerhouse, Sunnyvale, Calif., October 25, 2006, AMD

  39. Integrated graphics (16) Extended interpretation of the term Fusion technology in some AMD publications []. Despite this ambiguity, AMD subsequently used the term Fusion mostly for the silicon-level integration of the CPU and the GPU. []: AMD Torrenza and Fusion together, 22 March 2007

  40. Integrated graphics (17) • In 12/2007, at their Financial Analyst Day, AMD coined a new term by designating their processors implementing the Fusion concept as APUs (Accelerated Processing Units). • At the same time AMD also announced their first APU family, called the Swift family [].

  41. Integrated graphics (18) • In 11/2008, again at their Financial Analyst Day, AMD postponed the introduction of Fusion-based APU processors until the company transitions to the 32 nm technology []. []: AMD Fusion now pushed back to 2011, by Joel Hruska, published November 14, 2008

  42. Integrated graphics (19) Remark: This is similar to Intel's move with their 45 nm Havendale (DT) and Auburndale (M) in-package integrated multi-chip CPU+GPU projects. According to industry sources, in 1/2009 Intel canceled their 45 nm multi-chip processor plans in favor of 32 nm multi-chip processors to be introduced in Q1/2010 []. []: Intel cans 45nm “Auburndale” and “Havendale” Fusion CPUs!, posted by theovalich, January 31, 2009

  43. Integrated graphics (20) Example 4: AMD's Piledriver-based Trinity desktop APU line (2012). Announced in 6/2012, scheduled for Q4/2012. The Trinity APU is based on the Piledriver Compute Module, which is a redesign of the ill-fated Bulldozer Compute Module.

  44. Integrated graphics (21) The Piledriver Compute Module of Trinity []. []: http://www.pcper.com/reviews/Editorial/AMD-Vishera-and-Beyond-New-Design-Philosophy-Dictates-Faster-Pace/How-Does-Vishera

  45. Integrated graphics (22) The Trinity APU die with the Piledriver cores []. []: http://techreport.com/articles.x/22932

  46. Integrated graphics (23) The Comal platform that incorporates the (Piledriver-based) Trinity APU and the A70M PCH []. []: http://technewspedia.com/meet-the-new-amd-apus-series-a-2-nd-generation-trinity/

  47. 3.2.2 Smartphones

  48. 3.2.2 Smartphones (1) Smartphone platforms. Example: Texas Instruments OMAP 5 (OMAP 5430)

  49. 3.2.2 Smartphones (2)

  50. 4. Outlook
