Extensible Processors
ASIP • Gain performance by: • Specialized hardware for the whole application (ASIC). • Almost no flexibility. • High cost. • Special hardware for customized instructions in a general-purpose (GP) processor: instruction set extension. • Application-specific instruction set processors (ASIPs): • Customized to perform particularly well in a particular application area. • Can improve performance for particular problem instances while maintaining the flexibility of the overall system. • Motivated by the application-specific nature of embedded systems.
ASIP • Problems: • Substantial non-recurring engineering (NRE) costs • Each new ASIP must be verified from both the functionality and timing perspectives. • A new mask set must be created to fabricate the chip. • Software side: the compiler must be retargeted to each new processor. • Any hand-written libraries must be migrated to the new platform. • Automation of some of these tasks may be possible; however, the majority of this work is still a manual process. • As a result, it is difficult to adopt a new ASIP despite the potential advantages.
ASIP • Advantages: • System is post-programmable and can tolerate modest changes to the application (with little performance degradation) • e.g., changes in a standard. • Computation-intensive portions of applications from the same domain (e.g., encryption) are often similar in structure. • Customized instructions can often be generalized in small ways to make them more useful across a set of applications. • Lower cost than an ASIC.
Xtensa Processor • Xtensa from Tensilica [Gonzalez00] • A processor core which lets the system designer: • select and size features for a given application, • define new instructions. • Designer can use a standard ASIC design flow and tools to synthesize the processor. • Xtensa is fully synthesizable. • The Tensilica processor generator adds the application-specific functionality at the time the hardware is designed. • Extensions are implemented in the same logic family as the rest of the processor. • The extensions cannot be modified later for other applications.
Xtensa Processor • Designer • specifies the characteristics in the TIE (Tensilica Instruction Extension) language and/or menus: • Number of physical registers, • Instruction cache size, • Data cache size, • Data RAM size, • External bus width, • Number of interrupts, • Extended instructions (functional units). • Tools • generate synthesizable RTL code for the processor, • generate software development tools: • ANSI C/C++ compiler, • Linker, • Assembler, • Code profiler, • Instruction set simulator.
Xtensa Processor • Designer can analyze and identify bottlenecks in application performance. • Can work around the bottlenecks. • Can add instructions.
Xtensa Example • Example: DES algorithm
Xtensa Example • Characteristics: • Extensive bit permutations: • inefficient in software • efficient in hardware: simple renaming of wires • Rotation on 28-bit boundaries: • in software: rotation instruction on 32-bit boundaries • Table look-ups • Added 4 instructions
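A minimal Python sketch of why these DES operations are costly in software. The function names and the permutation table format are illustrative assumptions, not taken from the Xtensa material: a 28-bit rotation has to be emulated with shifts and masks on a wider word, and a bit permutation needs work for every bit, whereas in hardware both reduce to wiring.

```python
MASK28 = (1 << 28) - 1  # keeps only the low 28 bits of a wider machine word


def rotl28(value, shift):
    """Rotate a 28-bit value left, emulated with shifts, an OR, and a mask."""
    shift %= 28
    value &= MASK28
    return ((value << shift) | (value >> (28 - shift))) & MASK28


def permute_bits(value, table, in_width):
    """Bit permutation in software: one shift/mask/OR step per output bit.

    table[i] is the 1-based position (counted from the MSB) of the input bit
    that becomes output bit i, the convention DES tables are usually written in.
    """
    out = 0
    for src in table:
        bit = (value >> (in_width - src)) & 1
        out = (out << 1) | bit
    return out
```

For example, `rotl28(0x8000001, 1)` wraps the top bit around to bit 0, and `permute_bits(0b1000, [4, 3, 2, 1], 4)` reverses a 4-bit value. A custom instruction would perform either operation in a single step.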
Xtensa Speed-Ups • Speed-ups for some applications
Altera Nios, Xilinx MicroBlaze • Soft extensible processor • Can define custom instructions • Can configure the processor • Uses Altera FPGA resources • Lower performance • Higher power consumption
Extensible Processors • Major problems with ASIPs: • Not flexible • For a new application: new masks, other NRE costs. • Large human effort required to identify and implement an efficient set of instruction set extensions. • Major problem with soft processors: • Low performance • Solution: • A GP processor with reconfigurable FU.
Extensible Processors • Custom Instruction (CI): • An instruction in the extended Instruction Set Architecture (ISA) • Can be implemented in the processor's datapath itself or as a separate co-processor. • Usually in the processor datapath. • A fragment of the program's dataflow graph mapped onto a hardware Custom Functional Unit (CFU). • Basic block (BB): • A code fragment with single entry and exit points. • Load/Store cannot be in the BB • Cannot predict after how many clock cycles their results are available to subsequent instructions
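The basic-block notion above can be sketched with the classic leader-based partitioning. This is a minimal illustration under stated assumptions: the instruction encoding (`(op, target)` tuples), the mnemonics in `BRANCH_OPS` and `MEMORY_OPS`, and both function names are hypothetical.

```python
BRANCH_OPS = {"beq", "bne", "jmp"}  # hypothetical branch mnemonics
MEMORY_OPS = {"load", "store"}      # kept out of custom instructions


def split_basic_blocks(instrs):
    """Split a linear instruction list into basic blocks.

    instrs is a list of (op, target) tuples; target is the index a branch
    jumps to, or None for non-branch operations.
    Returns a list of (start, end) index ranges, end exclusive.
    """
    leaders = {0} if instrs else set()
    for i, (op, target) in enumerate(instrs):
        if op in BRANCH_OPS:
            if target is not None:
                leaders.add(target)   # a branch target starts a block
            if i + 1 < len(instrs):
                leaders.add(i + 1)    # so does the instruction after a branch
    starts = sorted(leaders)
    return list(zip(starts, starts[1:] + [len(instrs)]))


def ci_candidates(instrs, block):
    """Indices inside one block that may be folded into a custom instruction."""
    start, end = block
    excluded = BRANCH_OPS | MEMORY_OPS
    return [i for i in range(start, end) if instrs[i][0] not in excluded]
```

Each returned range has a single entry (its leader) and a single exit (its last instruction), so custom instruction candidates are searched for within one range at a time.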
Custom Instructions Limitations • Number of Operands: • Imposed by the base architecture of the core processor. • The length of a custom instruction increases with the number of operands. • The number of input and output ports of the register file limits the number of input and output operands • The cost and energy consumption of a processor increase significantly with the number of register file ports. • Number of custom instructions: • Imposed by the format of the base ISA. • If the base ISA supports 26 instructions with a fixed-length opcode, only 6 more CIs can be added (e.g., a 5-bit opcode gives 32 encodings, and 32 - 26 = 6).
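Both limits can be checked mechanically. The sketch below is illustrative only: the dataflow graph encoding (a dict from each operation to the values it reads), the function names, and the 5-bit opcode width used in the example are assumptions.

```python
def free_opcodes(opcode_bits, used):
    """Encodings left for custom instructions in a fixed-length opcode field."""
    return (1 << opcode_bits) - used


def io_counts(dfg, subgraph, live_out=()):
    """Count register-file reads and writes a candidate CI would need.

    dfg maps each operation name to the list of values it reads; names that
    are not keys of dfg are primary inputs (e.g. registers).
    subgraph is the set of operations to fuse into one custom instruction.
    live_out lists operations whose results are needed after the block.
    """
    sub = set(subgraph)
    # Inputs: distinct values read by the subgraph but produced outside it.
    inputs = {src for node in sub for src in dfg[node] if src not in sub}
    # Outputs: subgraph results consumed outside it or live after the block.
    outputs = {
        node for node in sub
        if node in live_out
        or any(node in dfg[user] for user in dfg if user not in sub)
    }
    return len(inputs), len(outputs)
```

With `dfg = {"t1": ["a", "b"], "t2": ["t1", "c"], "t3": ["t2", "t1"]}`, fusing `t1` and `t2` needs three inputs and two outputs, so it would not fit a register file with two read ports and one write port. `free_opcodes(5, 26)` gives the six spare encodings of the slide's example under the assumed 5-bit opcode.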
Custom Instructions Limitations • Area • Important especially in embedded systems. • Control Flow: • Custom instruction identification is typically performed within basic block boundaries. • Assumption: compiler cannot exploit instructions that cross basic block boundaries.
Instruction Set Extension (ISE) • Automatic ISA extension generation consists of: • Custom Instruction Identification • Identifies patterns meeting certain topology requirements • Custom Instruction Selection • Selects the most important patterns under resource and other constraints.
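The selection step can be viewed, in its simplest form, as a 0/1 knapsack problem: pick the patterns with the largest total gain that fit an area budget. The sketch below is one possible formulation, not the method of any particular tool; the candidate names, gains (cycles saved), and integer area units are made up for illustration.

```python
def select_custom_instructions(candidates, area_budget):
    """Choose candidate patterns maximizing total gain within an area budget.

    candidates is a list of (name, gain, area) tuples with integer area.
    Returns (total_gain, chosen_names). Classic 0/1 knapsack dynamic program.
    """
    # best[a] holds the best (gain, names) achievable with area at most a.
    best = [(0, [])] * (area_budget + 1)
    for name, gain, area in candidates:
        updated = list(best)
        for a in range(area, area_budget + 1):
            prev_gain, prev_names = best[a - area]  # state before this item
            if prev_gain + gain > updated[a][0]:
                updated[a] = (prev_gain + gain, prev_names + [name])
        best = updated
    return best[area_budget]
```

For instance, with candidates `[("perm", 50, 4), ("rot28", 20, 1), ("sbox", 40, 3)]` and a budget of 4 area units, the two smaller patterns together beat the single largest one. Real selection also has to account for overlapping patterns, opcode space, and register-file port limits, which this sketch ignores.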
Automatic ISE • To mimic the choices of an expert designer • New concept of “Compiler”: • Retargetable compiler: • Maintaining a single piece of code for compiling to different machine targets: • Reads underlying machine description, then produces code for it. • More automation: • Tuning the machine’s instruction set: • Compiler: defines the machine and then produces code for it.
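A toy sketch of the retargetable-compiler idea: one code generator, driven by a machine description. Everything here is hypothetical (the `BASE` and `EXTENDED` descriptions, the `mac` custom instruction, the three-address tuple format), and the fusion rule assumes the multiply's temporary result is not used anywhere else.

```python
# Machine descriptions: operation name -> assembly template (all hypothetical).
BASE = {"add": "add {dst}, {srcs}", "mul": "mul {dst}, {srcs}"}
EXTENDED = dict(BASE, mac="mac {dst}, {srcs}")  # adds a multiply-accumulate CI


def lower(ops, machine):
    """Emit assembly for three-address ops (op, dst, srcs) on a given machine.

    If the machine description offers 'mac', a mul directly followed by an add
    that reads the mul result exactly once is fused into one custom instruction.
    Assumes the mul result is a temporary with no other readers.
    """
    out = []
    i = 0
    while i < len(ops):
        op, dst, srcs = ops[i]
        nxt = ops[i + 1] if i + 1 < len(ops) else None
        if ("mac" in machine and op == "mul" and nxt is not None
                and nxt[0] == "add" and list(nxt[2]).count(dst) == 1):
            others = [s for s in nxt[2] if s != dst]
            operands = ", ".join(list(srcs) + others)
            out.append(machine["mac"].format(dst=nxt[1], srcs=operands))
            i += 2
            continue
        out.append(machine[op].format(dst=dst, srcs=", ".join(srcs)))
        i += 1
    return out
```

The same input, `[("mul", "t", ("a", "b")), ("add", "r", ("t", "c"))]`, lowers to two instructions with `BASE` and to a single `mac` with `EXTENDED`. In the "more automation" setting described above, the compiler would also be the one proposing the `mac` entry.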
References • [Gonzalez00] R. Gonzalez, “Xtensa: a configurable and extensible processor,” IEEE Micro, 2000.