SHAPES: a tiled scalable Software Hardware Architecture Platform for Embedded Systems - HW overview. P.S. Paolucci (INFN and Atmel Roma) for the SHAPES Consortium. Coordinator of the European Project; Permanent Staff Researcher (part time), Istituto Nazionale di Fisica Nucleare, Roma, Italy.

Presentation Transcript


  1. SHAPES: a tiled scalable Software Hardware Architecture Platform for Embedded Systems - HW overview
     P.S. Paolucci (INFN and Atmel Roma) for the SHAPES Consortium
     • Coordinator of the European Project
     • Permanent Staff Researcher (part time), Istituto Nazionale di Fisica Nucleare, Roma, Italy: 4 generations of APE Massive Parallel Processors based on custom VLIW DSP
     • Technology Director and Founder (part time), ATMEL Roma, corporate design center of ATMEL for Advanced DSP: 2 generations of RISC + custom mAgic VLIW DSP MPSoC
     Pier Stanislao Paolucci - Atmel Roma and INFN Roma - SHAPES HW Overview

  2. Deep Sub-micron Architectures…
     • ~160 MGate available on a 100 mm² chip (45 nm CMOS, 2008)
     • Increasing GATES/CHIP, design complexity management:
       • Embedded processors use only a few million gates; IP reuse is possible and needed
     • WIRING threatens Moore's law:
       • Wiring delay increases on new CMOS silicon generations
       • The full chip cannot be reached in a single clock cycle
       • Classic monolithic processor architectures do not scale
       • Locally Synchronous, Globally Asynchronous design needed
       • Communication-centric SW and HW architecture needed
     • PROPOSED SOLUTION: TILED ARCHITECTURE, by simple geometric demonstration: if the logic complexity inside each tile is constant, then the length of intra-tile wires scales down as the tile itself, and inter-tile wires (on-chip and off-chip) stay short, reaching ~first neighbours only
     • Quest for the best tile, on-chip and off-chip interconnect. But how to program? An explicit parallel programming paradigm, and culture, is needed
     • POWER DISSIPATION density approaches prohibitive values if a higher clock speed is used; much better operations/Watt at moderate clock + parallelism (the parallel architecture of the human brain does an excellent job at 50 Hz... room for improvement)

  3. Tiled HW Architecture
     • Communication Centric, not Processor Centric
     • Homogeneous SW interface for on-chip and off-chip scalable connection and I/O
     • 3D first-neighbour Toroidal System Eng. (3DT) for off-chip communication
     • Virtual tunnelling on packet-switching NoC (Network on Chip) and off-chip 3DT
     • Parallelism-aware System SW: manage memory distribution, capture real-time constraints
     • Explicit parallel programming/Network of Actors
     [Figure: grid of Tiles with OFF-CHIP MEM blocks, sensor ADC and actuator DAC, and an FPGA; labels: NoC, 3DT Off-chip communication; Tile = DNP, RISC, DSP, Multi-Layer BUS, DXM, POT (ADC/DAC)]

  4. HW GLOSSARY
     AT THE CHIP LEVEL
     • MTC: Multiple Tile Chip (composed of multiple Tiles)
     • NOC: Network On Chip (connecting Tiles)
     • 3DT: 3 Dim Toroidal Connection (outside the chip)
     FUNDAMENTAL TYPE OF TILE
     • RDT includes: RISC (includes on-chip memories RDM and RPM) + DSP (includes on-chip memories DDM and DPM) + DNP + DXM (off-chip mem) + POT (e.g. DAC/ADC conv)
     POSSIBLE TILE VARIANTS (subset of RDT)
     • RET := RDT minus DSP
     • DET := RDT minus RISC
     • DDT := DET minus DXM
     INSIDE THE TILE
     • Multilayer Bus Matrix: sustains multiple simultaneous transfers
     • RISC: max one per tile
     • DSP: one or more per tile
     • DNP: Distributed Network Processor (always one per tile)
     • DDM: on-chip Distributed Data Mem (inside the DSP)
     • DPM: on-chip Distributed Progr Mem (inside the DSP)
     • DXM: Distributed eXternal Mem Interface (max one per tile, outside the RISC and DSP)
     • POT: Peripherals On Tile
     • RDM: Risc (tightly coupled) Data Memory
     • RPM: Risc (tightly coupled) Program Memory
     • RCM: Risc Cache Memory
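
The tile variants in the glossary are defined by subtraction from the full RDT tile, which maps naturally onto set operations. The following is a minimal sketch of that reading, not part of the SHAPES tools: the set-based model and the helper `is_valid_variant` are assumptions made for illustration, and the model ignores the rule that a tile may hold more than one DSP.

```python
# Hypothetical set model of the tile variants; component names come from the glossary.
RDT = frozenset({"RISC", "DSP", "DNP", "DXM", "POT"})  # fundamental tile
RET = RDT - {"DSP"}    # RET := RDT minus DSP
DET = RDT - {"RISC"}   # DET := RDT minus RISC
DDT = DET - {"DXM"}    # DDT := DET minus DXM

TILE_VARIANTS = {"RDT": RDT, "RET": RET, "DET": DET, "DDT": DDT}


def is_valid_variant(components):
    """A variant must be a subset of RDT and must always contain the DNP."""
    return "DNP" in components and components <= RDT


for name, parts in TILE_VARIANTS.items():
    print(name, sorted(parts), is_valid_variant(parts))
```

Every variant keeps the DNP, which is what lets all tile types attach to the same NoC and 3DT network.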

  5. Different Types of Tiles
     • RDT: RISC + DSP Elementary Tile (RISC, DSP, DXM, POT, Multi-Layer BUS, DNP)
     • DET: DSP Elementary Tile (DSP, DXM, POT, Multi-Layer BUS, DNP)
     • RET: RISC Elementary Tile (RISC, DXM, POT, Multi-Layer BUS, DNP)
     [Each tile diagram also shows the DXM Mem Bus, the POT Pads, and the DNP links to 3DT and NoC]

  6. Distributed Network Processor (DNP): Interface
     • BUS Slave (to receive commands from RISC & DSP)
     • BUS Master (to read from intra-tile memories)
     • BUS Master (simultaneous intra-tile memory write)
     • 3DT X+, X-, Y+, Y-, Z+, Z- (to forward/receive inter-tile off-chip packets)
     • Collective communication
     • NoC (to forward/receive inter-tile on-chip packets)
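
The six 3DT ports (X+, X-, Y+, Y-, Z+, Z-) connect each tile to its first neighbours on a 3D torus. As a rough sketch of what that addressing implies, the code below computes the neighbour reached through each port and the minimum hop count between two tiles. The coordinate scheme, the function names and the 4 x 4 x 4 size are assumptions for illustration; the slides do not specify the DNP addressing or routing.

```python
def torus_neighbours(x, y, z, dims):
    """Return the first neighbour reached through each 3DT port of tile (x, y, z).

    dims = (nx, ny, nz) is the torus size; coordinates wrap around on each axis.
    """
    nx, ny, nz = dims
    return {
        "X+": ((x + 1) % nx, y, z),
        "X-": ((x - 1) % nx, y, z),
        "Y+": (x, (y + 1) % ny, z),
        "Y-": (x, (y - 1) % ny, z),
        "Z+": (x, y, (z + 1) % nz),
        "Z-": (x, y, (z - 1) % nz),
    }


def torus_hops(a, b, dims):
    """Minimum number of first-neighbour hops between tiles a and b."""
    return sum(min((p - q) % n, (q - p) % n) for p, q, n in zip(a, b, dims))


dims = (4, 4, 4)  # hypothetical system size
print(torus_neighbours(0, 0, 0, dims))
print(torus_hops((0, 0, 0), (3, 0, 2), dims))
```

Because of the wrap-around links, tile (3, 0, 2) is three hops from the origin rather than five.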

  7. mAgicV IP Architecture (Fully C programmable Gigaflops VLIW DSP)
     [Block diagram; labels:]
     • External signals: DBG, IRQ IN, IRQ OUT, RST, CLOCKS, AHB MST, AHB SLV
     • 2-port, 8K x 128-bit VLIW Program Memory (DPM)
     • AHB Master: DMA Engine
     • AHB Slave, e.g. DMA Target
     • VLIW Decompressor
     • Flow Controller, VLIW Decoder
     • Program Counter
     • Condition Generation
     • Status Register
     • Instruction Decoder
     • Multiple DSP Address Generation Unit: 4 addresses/cycle
     • Data Memory System (DDM): 2 x 8K x 40, 6 accesses/cycle
     • Data Register File System: 128 x 40, 8R+8W
     • 16 multi-field Address Register File
     • 10 float ops/cycle

  8. Silicon Floorplan of Diopsis 940: RISC + mAgicV VLIW DSP
     [Floorplan figure; labels: DSP Reg File, DSP Data Mem (DDM), DSP Logic, DSP Prog Mem (DPM), AMBA Multilayer, Peripherals, ARM RDM, ARM926]

  9. The tile: Diopsis + DNP
     [Block diagram; labels:]
     • RISC: ICE, JTAG, MMU, RCM Instr Cache, RCM Data Cache, RDM IF, RDM SRAM, BIU (D, I)
     • Multi-layer Bus MATRIX: DXM Interface (AHB EBI) to DXM, PERIPHERALS, PDMA, ROM, Bridge, APB
     • mAgicV DSP™: JTAG, DSP AHB Master, DSP AHB Slave, DPM (2-port), DDM (6 accesses/cycle), Multiple DSP Addr Gen (4 addr/cycle), 16-port 256x40 Data Regs, 10 float ops/cycle
     • DNP: DNP AHB Slave, DNP AHB Master (x2), NoC (NI), ports X+, X-, Y+, Y-, Z+, Z-, C+

  10. Tile Complexity
      • mAgicV DSP: 915 Kgates + 1 Mbit Prog Mem + 640 Kbit Data Mem
      • ARM926 & peripherals: <2 equivalent Mgate (including 640 Kbit mem)
      • Tile complexity: 4230 equivalent Kgate + DNP gate count, including on-chip memories
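
The memory sizes quoted here can be cross-checked against the mAgicV memory geometries given on the architecture slide (8K x 128-bit DPM, 2 x 8K x 40 DDM), and the tile gate count can be set against the ~160 MGate chip budget from the deep sub-micron slide. The sketch below does that arithmetic. Treating "MGate" and "equivalent Kgate" as the same unit, and ignoring the DNP, NoC and pads, are assumptions, so the tile count is only a loose upper bound.

```python
# Memory geometries from the mAgicV architecture slide.
DPM_BITS = 8 * 1024 * 128        # 2-port, 8K x 128-bit program memory
DDM_BITS = 2 * 8 * 1024 * 40     # 2 x 8K x 40 data memory

dpm_mbit = DPM_BITS // (1024 * 1024)
ddm_kbit = DDM_BITS // 1024
print(dpm_mbit, "Mbit program memory")   # 1
print(ddm_kbit, "Kbit data memory")      # 640

# Gate budget from the slides (assumed to be comparable units).
CHIP_KGATES = 160_000   # ~160 MGate on a 100 mm2 chip, 45 nm CMOS
TILE_KGATES = 4230      # tile complexity, DNP gate count not included

max_tiles = CHIP_KGATES // TILE_KGATES
print("upper bound on tiles per chip:", max_tiles)   # 37
```

Under these assumptions a 45 nm chip of that size has room for a few tens of tiles, before accounting for the interconnect.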

  11. Spidergon topology
      • It's a family of regular/symmetric topologies
      • We look for a complexity/performance trade-off:
        • Low degree (router cost)
        • Low number of links (wire cost)
        • Symmetry (homogeneous building blocks; simple routing)
        • Low diameter (performance)
        • Good scalability (small network size granularity)
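
The degree/diameter trade-off listed above can be checked numerically. The sketch below assumes the commonly described Spidergon layout, which the slide itself does not spell out: an even number of nodes on a ring, each with a clockwise link, a counter-clockwise link and one "across" link to the opposite node. The function names are illustrative, and a plain ring is included only as a baseline.

```python
from collections import deque


def spidergon_links(n):
    """Adjacency for an assumed Spidergon of n nodes (n even, n >= 4)."""
    if n < 4 or n % 2:
        raise ValueError("n must be even and at least 4")
    return {i: [(i + 1) % n, (i - 1) % n, (i + n // 2) % n] for i in range(n)}


def ring_links(n):
    """Adjacency for a plain bidirectional ring, used as a baseline."""
    return {i: [(i + 1) % n, (i - 1) % n] for i in range(n)}


def diameter(adj):
    """Largest shortest-path hop count over all node pairs (BFS from each node)."""
    worst = 0
    for src in adj:
        dist = {src: 0}
        todo = deque([src])
        while todo:
            u = todo.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    todo.append(v)
        worst = max(worst, max(dist.values()))
    return worst


for n in (8, 16, 32):
    print(n, "nodes: ring diameter", diameter(ring_links(n)),
          "vs Spidergon diameter", diameter(spidergon_links(n)))
```

With one extra link per router (degree 3 instead of 2), the diameter in this model is roughly halved relative to the ring.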

  12. STNoC key components
      Network on Chip is a set of on-chip routers (up to layer 3), Network Interfaces (NI) (layer 4) and physical links
      [Figure labels: IP, NI, router, link]

  13. Industrial Interest for Multi Tile Systems-on-Chip: Embedded Systems versus Classical Computing
      • Large silicon area is needed for a high Average Selling Price/unit: multiple tiles are the way to design chips of area >= 50 mm² on future technologies. Hence the industrial interest of semiconductor manufacturers, to avoid a low-profit commodity-like industry
      • $ and # of embedded processors per person are increasing faster than conventional processors per person
        • # of (phones, games, PDAs, cars, home, medical, wearable) vs PC
      • Collision/convergence on architectures is going to happen:
        • Because of changes in key driving markets
        • Because full systems can be integrated on a chip
        • Because of deep submicron technological facts: WIRING, COMPLEXITY, POWER
      • This time we are not in 1980, when a simpler solution was achievable through higher clock rates and monolithic architectures: we need multi-processor parallelism
      • European tiles HW research background exists to help. For example, INFN solved the wiring problem on cubic meters using tiles + first-neighbour 3D toroidal point-to-point physical links. Today we face the same problem for on-chip communications as well

  14. HW Background: Istituto Nazionale di Fisica Nucleare
      APE family of Massive Parallel custom Very Long Instruction Word Floating-Point Processors + 3D first-neighbour toroidal communication for Numerical Physics Simulations

  15. APENext (2005) 2048 processor system, VLIW processors manufactured by ATMEL

  16. APEmille (1999): 1 TFlops
      • 2048 VLSI processing nodes
      • SIMD, synchronous communications
      • Fully integrated "Host computer": 64 PCs, cPCI based
      [Figure labels: "Torre": 32 PB, 128 GFlops; "Processing Board" (PB): 8 nodes, 4 GFlops; Computing node]
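
The APEmille figures are mutually consistent, which a few lines of arithmetic confirm. This is only a cross-check of the numbers quoted on the slide; the variable names are illustrative.

```python
NODES = 2048          # VLSI processing nodes in the full system
NODES_PER_PB = 8      # nodes per Processing Board (PB)
PB_GFLOPS = 4         # GFlops per Processing Board
PB_PER_TORRE = 32     # Processing Boards per "Torre"

boards = NODES // NODES_PER_PB           # 256 boards
torri = boards // PB_PER_TORRE           # 8 towers
torre_gflops = PB_PER_TORRE * PB_GFLOPS  # 128 GFlops, as quoted
total_gflops = boards * PB_GFLOPS        # 1024 GFlops, about 1 TFlops

print(boards, "boards,", torri, "torri,", total_gflops, "GFlops in total")
```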

  17. APE100 (1993) - 100 GFlops PB (8 nodes) ~400 MFlops

  18. …from INFN APE to Atmel Roma MPSoC tile… toward SHAPES tile
      • 1997-2001: Spin-off from INFN and creation of the IPITEC start-up (Intellectual Property Initiative for Tools and Embedded Cores) (P.S. Paolucci, B. Altieri)
      • 2002-2004: mAgic VLIW DSP synthesizable core; IPITEC becomes ATMEL Roma
      Advanced DSP Products ATMEL
      • 2003: Diopsis 740: 1 Gigaflops mAgic DSP + Arm7
      • 2007: Diopsis 940: 1 Gigaflops mAgicV DSP (fully C programmable) + Arm9
      • 2007-2009: Shapes Tile
      Diopsis 740 tile: A gigaflops VLIW+RISC SoC Tile, HotChips 15 Conference, Stanford (2003)

  19. SW challenges from Tiled Architectures
      • Facilitate expression of parallelism: e.g. Network of Actors
      • Express real-time constraints in a formal manner, a feature missing in classical languages. This is a key cultural point!
      • Avoid destroying information about available algorithm parallelism
      • The compilation chain must be fully aware of key architectural parameters: bandwidth, computational power, pipeline and latencies
      • Exploit memory locality: efficient management of Distributed Memories, get rid of classical caches
      • Manage long delays between distant tiles
      • Reduce hot spots in communications
      • Reduce RTOS overhead (time and memory footprint): Networked RTOS? Minimal RTOS on each tile?
      • Introduce Hardware dependent Software and Hardware Abstraction Layers
      • Capture scalability in a library of characterized SW/HW components
      • Support for (semi-)automation of iterative design over HW, SW, Appl
      • Monitor quality and real-time constraints on real HW and simulators
      • Simulation speed of multi-tiled architectures
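
To make the "Network of Actors" idea concrete, here is a very small sketch in which each actor is a thread that only communicates through message queues, so the available parallelism stays explicit in the program structure. It is a generic illustration using the Python standard library, not the SHAPES programming model; the `actor` helper, the `STOP` sentinel and the two-stage pipeline are all hypothetical.

```python
import queue
import threading

STOP = object()  # sentinel that tells an actor to shut down and pass the signal on


def actor(fn, inbox, outbox):
    """Start an actor: it applies fn to each message from inbox and sends the result on."""
    def run():
        while True:
            msg = inbox.get()
            if msg is STOP:
                outbox.put(STOP)
                return
            outbox.put(fn(msg))

    thread = threading.Thread(target=run)
    thread.start()
    return thread


def run_pipeline(samples):
    """Wire two actors in series: source -> (scale by 2) -> (add 1) -> sink."""
    q_in, q_mid, q_out = queue.Queue(), queue.Queue(), queue.Queue()
    threads = [
        actor(lambda s: s * 2, q_in, q_mid),
        actor(lambda s: s + 1, q_mid, q_out),
    ]
    for s in samples:
        q_in.put(s)
    q_in.put(STOP)

    results = []
    while True:
        msg = q_out.get()
        if msg is STOP:
            break
        results.append(msg)
    for t in threads:
        t.join()
    return results


result = run_pipeline([1, 2, 3])
print(result)  # [3, 5, 7]
```

Since the actors share no state and only exchange messages, a mapping tool is free to place each one on a different tile and each queue on a network channel.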

  20. Scalable Software Hardware Architecture Platform for Embedded Systems (2006-2009)
      FP6-IST, Future Emerging Technologies, Advanced Computer Architecture
      Project Research Lines
      System SW
      • ETH Zurich: Distributed Operation Layer; manage application parallelism
      • RWTH Aachen Univ.: Simulation of Heterogeneous Multi Proc. Systems
      • TIMA Lab and THALES: Hardware dependent Software Layer and RTOS
      • TARGET Compiler Tech.: Retargetable VLIW Compilers
      System HW
      • ATMEL Roma: Tile, evolution of (Diopsis™: mAgicV VLIW DSP™ + RISC) + INFN DNP™
      • INFN Roma: DNP™ Distributed Network Processor + 3D Toroidal Eng.; evolution of APE Massive Parallel Processors
      • STMicroelectronics + Univ. of Cagliari and Pisa: evolution of Spidergon™ Packet Switching Network on Chip
      Parallel application benchmarking
      • Fraunhofer IDMT, IGD: Audio Wave Field Synthesis and Graphic Algorithm
      • PIE, MedCom: Ultrasound scanner
      • INFN: Physical Modelling

  21. SW Environment: Summary of Working Principles
      • Model Based Application Description
        • Network of Interacting Components, incl. non-functional constraints, analytical predictions and run-time profiling
      • Distributed Operation Layer
        • Maps components on Processing and Networking Resources
        • Stepwise approach to semi-automated mapping:
          • By hand, assisted by simulation, run-time profiling and analytical models
          • By algorithms for automated multi-objective randomised search
      • Target Applications
        • Extensive inherent parallelism
      [Tool-flow diagram; labels: application specs; hardware platform specification; Distributed Operation Layer; Simulator; trace information; mapping information; component interaction, properties and constraints; Model Compiler; Mapping; HdS Generator; component source code; HdS source code; Memory mapping; Optimised compilation on tiles and comms network; RTOS; Compiler; Link; Dispatch; component binary; glue binary; HdS binary; OS services binary]
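
The "automated multi-objective randomised search" mentioned above can be pictured with a toy example: sample random component-to-tile mappings, score each on two objectives, and keep only the non-dominated ones. This sketch is not the Distributed Operation Layer algorithm, which the slides do not describe; the cost model (traffic-weighted hop count and worst tile load), the actor names, the traffic figures and the ring-shaped hop function are all assumptions for illustration.

```python
import random


def mapping_cost(mapping, traffic, hops):
    """Return (communication cost, worst tile load) for an actor-to-tile mapping."""
    comm = sum(w * hops(mapping[a], mapping[b]) for (a, b), w in traffic.items())
    load = {}
    for tile in mapping.values():
        load[tile] = load.get(tile, 0) + 1
    return comm, max(load.values())


def dominates(c1, c2):
    """True if cost c1 is no worse than c2 on every objective and differs from it."""
    return all(x <= y for x, y in zip(c1, c2)) and c1 != c2


def pareto_random_search(actors, n_tiles, traffic, hops, iters=200, seed=0):
    """Sample random mappings and keep the set of non-dominated (cost, mapping) pairs."""
    rng = random.Random(seed)
    front = []
    for _ in range(iters):
        mapping = {a: rng.randrange(n_tiles) for a in actors}
        cost = mapping_cost(mapping, traffic, hops)
        if any(dominates(c, cost) or c == cost for c, _ in front):
            continue  # an existing candidate is at least as good
        front = [(c, m) for c, m in front if not dominates(cost, c)]
        front.append((cost, mapping))
    return sorted(front, key=lambda item: item[0])


# Hypothetical application: four actors in a chain, mapped onto four tiles on a ring.
ACTORS = ["source", "filter", "transform", "sink"]
TRAFFIC = {("source", "filter"): 8, ("filter", "transform"): 4, ("transform", "sink"): 2}
N_TILES = 4


def ring_hops(a, b):
    return min((a - b) % N_TILES, (b - a) % N_TILES)


front = pareto_random_search(ACTORS, N_TILES, TRAFFIC, ring_hops)
for cost, mapping in front:
    print(cost, mapping)
```

The result is a set of trade-offs rather than a single answer: packing actors onto one tile minimises communication but maximises load, and spreading them out does the opposite.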

  22. Tiled HW Architecture
      • Communication Centric, not Processor Centric
      • Homogeneous SW interface for on-chip and off-chip scalable connection and I/O
      • 3D first-neighbour Toroidal System Eng. (3DT) for off-chip communication
      • Virtual tunnelling on packet-switching NoC (Network on Chip) and off-chip 3DT
      • Parallelism-aware System SW: manage memory distribution
      • Explicit parallel programming/Network of Actors
      [Figure: grid of Tiles with OFF-CHIP MEM blocks, sensor ADC and actuator DAC, and an FPGA; labels: NoC, 3DT Off-chip communication; Tile = DNP, RISC, DSP, Multi-Layer BUS, DXM, POT (ADC/DAC)]
