Architecture Design of a Scalable Single-Chip Multi-Processor
B.D. Theelen
www.ics.ele.tue.nl/~btheelen
Overview
• Introduction
• MµP Features
• System Architecture
• Hardware RTOS
• Example Configuration
• Experimental Results
• Conclusions
Introduction
Architecture Platforms for Real-Time Embedded Systems
• Scalable, Customisable, Reusable
• Parallel Execution of Various Tasks
• Customisability + Parallel + Scalable + Reusable: Configurable Set of Application-Dedicated Processor Cores
• Flexibility + (Parallel + Scalable) + Reusable: (Scalable Number of Identical) General-Purpose Processor Core(s)
• SoC technology enables embedding both on a Single Chip
• Involves flexible and scalable Interconnects and Memory Architecture
• Examples: TriMedia, SpaceCake
Real-Time Environment
Architecture Platforms for Real-Time Embedded Systems
• Deadlines, Task Priorities, Impact of Overhead
• Involves fast Interconnects and a Memory Architecture capable of dealing with task priorities
Multi-Micro Processor (MµP)
• Combines a Scalable Number of Identical General-Purpose Master Processors with a Configurable Set of Shared Application-Dedicated Co-processors and a Hardware RTOS Kernel to reduce task switching overhead
MµP Features
• True parallel execution of tasks
  • Master Processors execute tasks independently
• Instruction Set is extendable
  • Only 1/16th of instruction space is executed by Master Processors
  • Remainder is split over up to 15 different Co-processor types
  • Co-processor type determines actual use of instruction space
  • Number of Co-processors of certain type is scalable
• On-chip RTOS Kernel
  • Transparent priority-based multi-tasking over Master Processors
  • Hardware support for fast task switches
• Communication and synchronisation between (local and remote) tasks
  • (Counting) semaphores, mailboxes, pipes
• Extended event handling mechanism instead of interrupts
  • Uses counting semaphores
System Architecture
[Figure: block diagram of the MµP chip. Labelled blocks: Master Processors 1, 2, ..., n; L1 I$ (one per Master Processor shown); L2 I$; Register D$; Arbiter; Memory; MultiPort D$; Function Switch; Result Switch; Task Assignment; Shared Co-Processors (1 TCU; 2.1 ... 2.x LSU; m.1 ... m.y FPU); Event Inputs; MP Network; Task Control Unit (Hardware RTOS Kernel); Chip Boundary.]
Design Issues
• On-Chip Interconnects
  • Cyclic path of instructions and results
  • Interconnects are non-blocking
  • Master processors accept results at all times and implement scoreboarding
  • Function Switch routes on co-processor type number
  • Fair arbitration with high/low priority based on task priority and request age
  • Result Switch routes on task number
  • FCFS arbitration without priorities
  • Perform routing functionality in one clock
• Memory Architecture
  • Separated instruction and data path
  • Two-level instruction cache architecture with round-robin arbitration
  • Shared multi port data cache = data cache with statistically multiplexed banks
  • Round-robin arbitration between accesses for different paths
  • No real cache coherency problems
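The slides do not spell out how the Function Switch combines task priority and request age, so the following is one possible reading, sketched for illustration only. It assumes that a request counts as "high" when its task priority reaches a threshold or when it has waited long enough, and that the oldest request wins within a class. The thresholds and all names (`Request`, `PRIORITY_THRESHOLD`, `AGE_LIMIT`, `grant`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Request:
    task: int          # issuing task number
    coproc_type: int   # 1..15, the key the Function Switch routes on
    priority: int      # task priority; larger means more urgent (assumption)
    age: int           # clocks this request has been waiting


PRIORITY_THRESHOLD = 8   # hypothetical: priorities at or above this are "high"
AGE_LIMIT = 4            # hypothetical: waiting this long promotes a request to "high"


def is_high(req: Request) -> bool:
    """Classify a request as high priority by task priority or by age (assumed rule)."""
    return req.priority >= PRIORITY_THRESHOLD or req.age >= AGE_LIMIT


def grant(requests: List[Request], coproc_type: int) -> Optional[Request]:
    """Pick the request to forward to one co-processor type this clock, if any."""
    candidates = [r for r in requests if r.coproc_type == coproc_type]
    if not candidates:
        return None
    # High class beats low class; inside a class the oldest request wins.
    return max(candidates, key=lambda r: (is_high(r), r.age))
```

Promoting old requests to the high class is what keeps low-priority tasks from starving in this sketch; the actual fairness mechanism of the MµP may differ.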
Hardware RTOS
[Figure: block diagram of the Task Control Unit (TCU). Labelled blocks: TCU Core; Control Space; Function Rx; Result Tx; Task Admin; Task Scheduler; Sorted Task List; Executive; Resource Admin; Resource Data; Timers; Arbiter; Event Detect; Network Management; Link (several); TCU Network. External connections: Function Switch, Result Switch, Event Inputs, MultiPort D$, Master Processors.]
Design Issues
• Task Management
  • Commands for creating, terminating, delaying, suspending and restarting tasks and for changing priority
  • Tasks of equal priority time share master processors available to them
  • Task switching accelerated by specialised cache storing volatile contexts
• Transparent Communication
  • Commands for activating, deactivating, reading and writing resources
  • Counting semaphores, mailboxes and pipes in hardware
  • Network Manager shields tasks from MµP network
  • Tasks can access any resource in the MµP network
• Extended Event Handling
  • Commands for activating and deactivating event inputs
  • Event inputs are coupled to counting semaphores
  • Involved semaphore might not be in same MµP where the task resides
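A software model can make two of these points concrete: equal-priority tasks time-sharing the available master processors, and an event input acting as a signal on a counting semaphore. The sketch below is a plain Python illustration, not the TCU's hardware design. The class names, the FIFO wake-up order and the rotate-after-each-slice policy are assumptions.

```python
from collections import deque


class CountingSemaphore:
    """Counting semaphore; an event input is modelled as a call to signal()."""

    def __init__(self, count=0):
        self.count = count
        self.waiting = deque()   # blocked task ids, woken in FIFO order (assumption)

    def wait(self, task):
        """Return True if the task may continue, False if it is now blocked."""
        if self.count > 0:
            self.count -= 1
            return True
        self.waiting.append(task)
        return False

    def signal(self):
        """Wake one blocked task if there is one, otherwise remember the event."""
        if self.waiting:
            return self.waiting.popleft()
        self.count += 1
        return None


class ReadyList:
    """Ready tasks grouped by priority; equal priorities rotate each time slice."""

    def __init__(self):
        self.queues = {}         # priority -> deque of task ids

    def add(self, task, priority):
        self.queues.setdefault(priority, deque()).append(task)

    def pick(self, n_masters):
        """Choose up to n_masters tasks to run during the next time slice."""
        chosen = []
        for priority in sorted(self.queues, reverse=True):
            queue = self.queues[priority]
            take = min(n_masters - len(chosen), len(queue))
            running = [queue.popleft() for _ in range(take)]
            queue.extend(running)        # rotate so peers get the next slice
            chosen.extend(running)
            if len(chosen) == n_masters:
                break
        return chosen
```

With three tasks A, B, C at one priority and two master processors, successive calls to `pick(2)` run A and B, then C and A, so all three share the processors. Because the count accumulates when nobody is waiting, events that arrive early are not lost, which is one reason counting semaphores suit event handling.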
Example Configuration (Mini MµP)
By V.R. Suárez
• Two 8048 ISA compatible Master Processors
• 8048 compatible I/O and Timers in Co-Processors
• 1 clock Function Switch and Result Switch
• On-chip 2kB Instruction ROM and 1kB Data RAM
• Register D$ enabling Task Switches in 1 clock
• TCU Co-Processor
  • 15 user-definable tasks
  • 32 binary semaphores
  • Timers and Interrupts supported as events for predefined tasks
  • All commands executed in 1 clock
Experimental Results (Mini MµP)
• Mini MµP designed using IDaSS
  • Interactive Design and Simulation System
  • Automatic generation of synthesisable VHDL or Verilog
• Mini MµP implemented in Xilinx Spartan-II 200 FPGA
  • Uses 42% of memory area and 83% of gate area
  • Total gate count of 141k
  • Runs at 25 MHz (expect over 30 MHz for optimised version)
  • Critical path is 14 gates (in Master Processor core)
  • Next critical path in TCU Co-Processor
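To put the clock figures in time units: since the Mini MµP performs a task switch and each TCU command in 1 clock, the clock period is also the duration of those operations. The short calculation below is plain arithmetic on the numbers quoted in the slides. The per-gate figure is only an average budget over the 14-gate critical path and says nothing about individual gate or routing delays. The function and constant names are hypothetical.

```python
def clock_period_ns(freq_mhz: float) -> float:
    """Clock period in nanoseconds for a clock frequency given in MHz."""
    return 1000.0 / freq_mhz


CLOCKS_PER_TASK_SWITCH = 1    # from the Mini MµP configuration slide
CRITICAL_PATH_GATES = 14      # from the experimental results slide

measured_period = clock_period_ns(25)                    # period at the measured 25 MHz
task_switch_ns = CLOCKS_PER_TASK_SWITCH * measured_period
budget_per_gate_ns = measured_period / CRITICAL_PATH_GATES
optimised_period = clock_period_ns(30)                   # period if 30 MHz were reached

print(f"task switch at 25 MHz: {task_switch_ns:.1f} ns")
print(f"average budget per gate level: {budget_per_gate_ns:.2f} ns")
print(f"period at 30 MHz: {optimised_period:.1f} ns")
```

At 25 MHz the period is 40 ns, so a 1-clock task switch takes 40 ns; an optimised version running above 30 MHz would bring this below roughly 33.3 ns.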
Conclusions
• Multi Micro Processor (MµP) Architecture
  • Scalable Single-Chip Multi-Processor
  • Intended for Real-Time Embedded Systems
  • On-chip RTOS Kernel with hardware support for fast Task Switches
• Design issues
  • On-chip Interconnects
  • Memory Architecture
  • Hardware RTOS
    • Task Management
    • Transparent Communication
    • Extended Event Handling
• Results
  • Mini version of MµP with two 8048 ISA compatible Master Processors