
A RISC ARCHITECTURE EXTENDED BY AN EFFICIENT TIGHTLY COUPLED RECONFIGURABLE UNIT

Nikolaos Vassiliadis, N. Kavvadias, G. Theodoridis, S. Nikolaidis. Section of Electronics and Computers, Department of Physics, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece


Presentation Transcript


  1. A RISC ARCHITECTURE EXTENDED BY AN EFFICIENT TIGHTLY COUPLED RECONFIGURABLE UNIT
     Nikolaos Vassiliadis, N. Kavvadias, G. Theodoridis, S. Nikolaidis
     Section of Electronics and Computers, Department of Physics, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece
     nivas@skiathos.physics.auth.gr
     Algarve, Portugal, February 22-23, 2005

  2. Outline
     • Motivations
     • Proposed Architecture
     • Software Development Environment
     • Demonstration
     • Results
     • Conclusions

  3. Motivations
     • Quest for Performance and Flexibility
     • A Large Portion of the Computational Complexity is Concentrated in Small Kernels that Cover a Small Part of the Overall Code
       • Performance Improved by Accelerating these Kernels
     • Many Algorithms Show Significant Instruction Level Parallelism (ILP)
       • Performance Improved by Parallel Execution
     • Traditional Processors have Computation Clock Slack
       • Performance Improved by Chaining of Operations (Spatial Computation)
     Extending Embedded Processors with Application Specific Function Units
     Reconfigurable Instruction Set Processors for Performance with Maximum Flexibility

  4. Proposed Architecture
     • Reconfigurable Instruction Set Processor (RISP)
     • Core Processor
       • 32-bit load/store RISC architecture
       • 5 Pipeline Stages
       • Single-Issue Execution
     • Reconfigurable Logic Coupling
       • Reconfigurable Function Unit (RFU) approach => Low Communication Overhead
       • Tightly Coupled => RFU Fits in two RISC pipeline stages => Better Utilization of the Pipeline Stages
     • RFU
       • 1-D Array of Coarse Grain Processing Elements (PEs)
       • PE Functionality Configurable at Design Time to meet Application requirements
       • Exploits Instruction Level Parallelism: Spatial & Temporal Computation

  5. Proposed Architecture
     • Core Processor
       • Commonly Used Function Units
       • Control Logic Properly Extended to Handle Reconfigurable Instructions
       • 4-Read-1-Write Register File
     • Core / RFU Interface
       • Receives & Delivers Control and Data Signals
     • Tightly Coupled RFU
       • Configuration-Processing-Interconnection Layers
       • Operates & Delivers Results in two Concurrent Pipeline Stages

  6. Standard and Reconfigurable Instructions
     32-Bit Instruction Word Format
     • Re=‘0’ => Standard Instruction
       • Control Logic : Configure Core Datapath
       • Operands : Source1-2 & Destination
       • ReOpCode = “nop”
     • Re=‘1’ => Reconfigurable Instruction
       • Control Logic : Configure Interface
       • Operands : Source1-4 & Destination
       • ReOpCode = “OpCode”
     • Three Types of Reconfigurable Instructions
       • Complex Computational Operations
       • Complex Addressing Modes
       • Complex Control Flow Operations
     • Each Instruction can be Multicycle
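
The Re flag on this slide can be illustrated with a small decoder. This is only a sketch: the slide gives the word size (32 bits), the Re flag, and the operand counts, but not the field widths or positions. The bit layout below, the 5-bit register indices, and the names `encode`, `decode`, and `Decoded` are all assumptions made for illustration. The sketch also folds the standard opcode and the ReOpCode into one field, which simplifies the format the slide describes.

```python
from dataclasses import dataclass

# Hypothetical bit layout (NOT from the slides), chosen only so the fields fill 32 bits:
#   [31] Re | [30:25] opcode | [24:20] dest | [19:15] src1 | [14:10] src2 | [9:5] src3 | [4:0] src4
RE_SHIFT = 31
OP_SHIFT, OP_MASK = 25, 0x3F
DEST_SHIFT = 20
REG_MASK = 0x1F
SRC_SHIFTS = (15, 10, 5, 0)


@dataclass(frozen=True)
class Decoded:
    reconfigurable: bool  # the Re flag
    opcode: int           # read as a standard opcode if Re=0, as a ReOpCode if Re=1
    dest: int
    sources: tuple        # 2 register indices (standard) or 4 (reconfigurable)


def encode(re, opcode, dest, sources):
    """Pack the fields into one 32-bit word; unused source slots stay zero."""
    word = ((re & 1) << RE_SHIFT) | ((opcode & OP_MASK) << OP_SHIFT)
    word |= (dest & REG_MASK) << DEST_SHIFT
    for shift, src in zip(SRC_SHIFTS, sources):
        word |= (src & REG_MASK) << shift
    return word


def decode(word):
    """Split a 32-bit word; the Re bit decides how many sources are meaningful."""
    re = bool((word >> RE_SHIFT) & 1)
    srcs = tuple((word >> shift) & REG_MASK for shift in SRC_SHIFTS)
    return Decoded(
        reconfigurable=re,
        opcode=(word >> OP_SHIFT) & OP_MASK,
        dest=(word >> DEST_SHIFT) & REG_MASK,
        sources=srcs if re else srcs[:2],
    )
```

Under these assumptions, a reconfigurable instruction reads four registers and writes one, which matches the 4-Read-1-Write register file mentioned on the previous slide.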

  7. Reconfigurable Function Unit (RFU)
     • Embedded RFU for Dynamic Extension of the Instruction Set
     • Executes Multiple-Input-Single-Output (MISO) Reconfigurable Instructions
     • 1-D Array of Coarse Grain Reconfigurable Blocks
     • Comprised of Three Layers
       • Processing Layer
       • Interconnection Layer
       • Configuration Layer

  8. RFU-Processing Layer
     • PE Basic Structure
     • Configurable PE Functionality for the Targeted Application
     • Unregistered Output => Spatial Computation
     • Registered Output => Temporal Computation
     • Floating PEs => Can Operate in Both Core Pipeline Stages on Demand
     • Local Memory for Read Only Values
     • Executes Long Chains of Operations in One Processor Cycle
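
The difference between unregistered and registered PE outputs can be sketched with a toy cycle model. It assumes that a PE with an unregistered output passes its result to the next PE within the same cycle (spatial computation), while a registered output makes the consumer wait one cycle (temporal computation). The `PE` class, the `run_chain` helper, and the cycle-counting rule are hypothetical and are not taken from the slides.

```python
import operator
from dataclasses import dataclass
from typing import Callable


@dataclass
class PE:
    op: Callable[[int, int], int]  # two-input operation this PE is configured for
    registered: bool = False       # True: output is latched, consumer sees it next cycle


def run_chain(pes, first, operands):
    """Feed `first` through the PEs; PE i combines the running value with operands[i].

    Returns (result, cycles). Assumed rule: the chain costs one cycle, plus one
    extra cycle for every registered output that feeds a later PE.
    """
    value, cycles = first, 1
    for i, (pe, operand) in enumerate(zip(pes, operands)):
        value = pe.op(value, operand)
        if pe.registered and i < len(pes) - 1:
            cycles += 1
    return value, cycles


# ((3 + 4) * 2) >> 1 as a purely spatial chain: one cycle in this model.
spatial = [PE(operator.add), PE(operator.mul), PE(operator.rshift)]
print(run_chain(spatial, 3, [4, 2, 1]))   # (7, 1)

# The same chain with the multiplier's output registered: two cycles.
temporal = [PE(operator.add), PE(operator.mul, registered=True), PE(operator.rshift)]
print(run_chain(temporal, 3, [4, 2, 1]))  # (7, 2)
```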

  9. RFU-Interconnection Layer
     • 1-D Array of PEs
     • Operands from Register File
     • Constant Values from Local Memory
     • Input Network
     • Operand Select
     • Output Network => Delivers Results to Corresponding Pipeline Stages
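
A minimal sketch of the operand selection described on this slide: each PE input is routed from a register-file operand, a local-memory constant, or (as an assumption of this sketch) the output of an earlier PE in the array. The selector encoding and the `run_rfu` function are hypothetical. The slides name the input and output networks but do not specify their structure.

```python
import operator


def run_rfu(config, regs, consts, out_pe):
    """Evaluate a 1-D array of PEs and return the output of PE `out_pe`.

    config: one (op, sel_a, sel_b) entry per PE. Each selector is
            ("reg", i), ("const", i) or ("pe", j), where j is an earlier PE.
    regs:   operands read from the register file (up to four per instruction).
    consts: read-only values held in the RFU local memory.
    """
    results = []

    def pick(selector):
        kind, index = selector
        if kind == "reg":
            return regs[index]
        if kind == "const":
            return consts[index]
        if kind == "pe":
            return results[index]  # IndexError if the PE has not produced a value yet
        raise ValueError(f"unknown selector kind: {kind}")

    for op, sel_a, sel_b in config:
        results.append(op(pick(sel_a), pick(sel_b)))
    return results[out_pe]


# Hypothetical MISO instruction: ((r0 - r1) * c0) + r2
config = [
    (operator.sub, ("reg", 0), ("reg", 1)),
    (operator.mul, ("pe", 0), ("const", 0)),
    (operator.add, ("pe", 1), ("reg", 2)),
]
print(run_rfu(config, regs=[10, 4, 5], consts=[3], out_pe=2))  # 23
```

The `config` list plays the role of one configuration context: changing it changes which instruction the same array of PEs implements.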

  10. RFU-Configuration Layer
      • Configuration Bits Local Storage Structure
      • Multi-Context Configuration Layer
      • Coarse Grain => Small Number of Configuration Bits => Negligible Overhead to Download New Contexts

  11. Architecture Synthesis & Evaluation
      • A Hardware Model (VHDL) was Designed for Evaluation Purposes
      • The Model was Synthesized with an STM 0.13 µm Process
      • The RFU Area Overhead is 3.3x the Area of the Core Processor
      • No Caches were Taken into Account
      • No Overhead to Core Critical Path

  12. Software Development Environment

  13. Demonstration: RFU Execution
      • Largest MaxMISO for a Quantization Kernel
      • Execution on the Core => six cycles
      • Execution on the Core+RFU => one cycle
      • Performance Improvements
      • Reduced Instruction Memory Accesses
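
To relate the six-to-one cycle reduction to whole-program performance, the following sketch applies Amdahl's law. The 60% fraction used in the example is a made-up figure for illustration and is not a result from the presentation. The function name `overall_speedup` is likewise hypothetical.

```python
def overall_speedup(accelerated_fraction, local_speedup):
    """Amdahl's law: whole-program speedup when `accelerated_fraction` of the
    original cycles runs `local_speedup` times faster and the rest is unchanged."""
    remaining = 1.0 - accelerated_fraction
    return 1.0 / (remaining + accelerated_fraction / local_speedup)


# Hypothetical: 60% of the cycles sit in code that collapses from six cycles to one.
print(round(overall_speedup(0.6, 6), 3))  # 2.0
```

The same formula shows why the part of the code left on the core bounds the achievable gain: with 60% of the cycles accelerated, the speedup stays below 2.5x however fast the RFU becomes.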

  14. Results
      • Speed-Ups for Several Kernels: Core vs. Core+RFU
      • Energy Consumption Dominated by Memory Accesses

  15. Conclusions
      • A RISC Processor Enhanced by a Run-Time Reconfigurable Function Unit
      • 1-D Reconfigurable Array of Coarse Grain Processing Elements
      • Multiple-Input-Single-Output Reconfigurable Instructions
      • Specific Software Development Environment
      • Low Cost Performance and Energy Consumption Improvements
      Next Step => Expand to VLIW Execution to Boost Achieved Speed-Ups
