Advanced Processor Architectures for Embedded Systems Witawas Srisa-an CSCE 496: Embedded Systems Design and Implementation
Objectives • Discuss ASIC, FPGA-based systems, and general purpose processors • Analyze the operating requirements for today’s embedded processors • Observe the architectural differences between state-of-the-art processors for embedded systems and high-performance general purpose processors • Tensilica Xtensa • Stretch S5000
Embedded Processor Requirements • operate in memory-constrained environments • must be energy efficient • must be low cost • may have to be good at a common set of tasks • matrix multiplication, • encryption, • filtering (FIR), • network packet processing, etc.
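To make one of the common tasks above concrete, here is a minimal sketch of a direct-form FIR filter. The function name `fir_filter` and the sample data are illustrative assumptions, not part of the original slides. The inner multiply-accumulate loop is the kind of operation a task-specific unit would typically speed up.

```python
def fir_filter(samples, taps):
    """Direct-form FIR: y[n] = sum over k of taps[k] * x[n - k].

    Samples before the start of the input are treated as zero.
    """
    out = []
    for n in range(len(samples)):
        acc = 0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * samples[n - k]  # one multiply-accumulate (MAC)
        out.append(acc)
    return out


# Hypothetical input: a 3-tap moving sum over four samples.
moving_sum = fir_filter([1, 2, 3, 4], [1, 1, 1])
```

Each output sample costs one MAC per tap, so the work grows with the product of input length and filter length.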
Implications • low memory footprint • simplified instruction set • 16-bit, 24-bit • may not need support for virtual memory (VM) • may lack hardware MMUs • energy efficient • less complex (fewer transistors) • simple pipeline stages • less on-chip cache memory • simple floating point units • larger transistors and slower clocks • integrated function-specific components for common tasks
Implications (cont.) • low cost • share IP cores to reduce development cost • ARM, MIPS, etc. • use older semiconductor process technologies (e.g. 250 nm instead of 90 nm) • task specific • built-in DSP unit • wide data bus (more data per transfer) • may need support for adding functions to the cores • may need field-reconfigurability
Rationales from “The Death of Micro-Processors”, Nick Tredennick and Brion Shimamoto, Embedded Systems Programming, http://www.embedded.com/showArticle.jhtml?articleID=26807160
Rationales (cont.) from “The Death of Micro-Processors”, Nick Tredennick and Brion Shimamoto, Embedded Systems Programming, http://www.embedded.com/showArticle.jhtml?articleID=26807160
Rationales (cont.)
“Studies have shown that custom hardware components often require much less energy to complete their tasks than the same tasks running on general purpose processors.” [1]
“An ASIC is custom logic for a particular application. Custom logic can be orders of magnitude more efficient than microprocessor-based solutions.” [2]
[1] Lach et al., “Power-Efficient Adaptable Wireless Sensor Networks”, Proceedings of International Conference on Military and Aerospace Programmable Logic Devices (MAPLD), September 2003.
[2] Tredennick and Shimamoto, “The Death of Micro-Processors”, Embedded Systems Programming, http://www.embedded.com/showArticle.jhtml?articleID=26807160
Application Specific ICs (ASICs) • provide custom design solutions for particular problems • are fixed solutions that need broad market acceptance (high volume) to reduce cost • require extensive knowledge of hardware design • are not field-reconfigurable • can have a large non-recurring engineering (NRE) cost
ASICs (cont.) from Wayne Wolf, FPGA-Based System Design, Prentice Hall, 2004
FPGA-Based Systems • Field-programmable gate arrays (FPGAs) • are slower and consume more power than custom designs • are more expensive • but require no fabrication wait between completing a design and having a working chip • are great for prototyping • are also reusable
FPGAs • SRAM based--volatile • Altera Flex, Stratix, Cyclone, Apex • Antifuse--one-time programmable • Actel • EEPROM--non-volatile • Altera Max
ASIC Design Approaches • Custom VLSI designs • are fabricated on a manufacturing line • fabrication takes months • mask costs are also high • operate much faster and consume less power than FPGA equivalents • can be cheaper if manufactured in large volume
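The volume argument above can be sketched as a simple break-even calculation: total cost is the one-time NRE cost plus the per-unit cost times the volume. All dollar figures below are made-up placeholders chosen only to show the arithmetic, and the function names are illustrative.

```python
import math


def total_cost(nre, unit_cost, volume):
    """One-time NRE cost plus per-unit cost times volume."""
    return nre + unit_cost * volume


def break_even_volume(nre_asic, unit_asic, nre_fpga, unit_fpga):
    """Smallest volume at which the ASIC is no more expensive than the FPGA."""
    if unit_fpga <= unit_asic:
        raise ValueError("ASIC never catches up unless its unit cost is lower")
    volume = (nre_asic - nre_fpga) / (unit_fpga - unit_asic)
    return max(0, math.ceil(volume))


# Hypothetical numbers: $1,000,000 ASIC NRE at $5 per chip,
# versus no NRE and $25 per chip for the FPGA.
volume = break_even_volume(1_000_000, 5, 0, 25)
```

With these placeholder figures the two options cost the same at 50,000 units; below that volume the FPGA is cheaper, above it the ASIC is.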
ASIC Design Approaches (cont.) • Structured ASIC • is based on pre-designed logic fabric structurally embedded in the platform • fills the market gap between high-density FPGAs and standard cell ASICs • can greatly reduce development time and cost • reduces non-recurring engineering (NRE) cost http://www.amis.com/asics/structured_asics/ http://www.altera.com/b/hardcopyii.html?WT.mc_id=h2_sm_go_xx_tx_2_041&WT.srch=1
Structured ASICs View Altera demo
Integrating ASICs with GPPs • Today’s embedded systems can have complex software layers • OS • Virtual Machine • Applications • It is often preferable to pair GPPs with ASICs acting as co-processors
Integrating ASICs with GPPs (cont.) • GPPs perform the basic tasks while ASICs (co-processors) speed up compute-intensive functions • this sounds simple, but in reality it is quite complex • basic hand-shaking is needed between the ASICs and the main processor • data exchange • shared memory • requires OS and architecture support • synchronous or asynchronous calls • cache coherency issues
ASICs and GPPs (cont.) • An example is using a hardware co-processor for cryptography • should the co-processor calls be synchronous? • the main processor blocks on each call and waits for the response • or asynchronous? • the calling process is blocked and swapped out • needs interrupt support • needs to maintain context
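The two calling styles can be contrasted with a small software sketch. This is only an analogy under stated assumptions: a worker thread stands in for the crypto co-processor, SHA-256 stands in for its operation, and a completion callback plays the role of the interrupt handler. None of these names come from the original slides.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def coprocessor_hash(data: bytes) -> str:
    """Stand-in for the crypto co-processor (plain software SHA-256 here)."""
    return hashlib.sha256(data).hexdigest()


def synchronous_call(data: bytes) -> str:
    # The caller blocks until the result is available.
    return coprocessor_hash(data)


def asynchronous_call(executor, data: bytes, on_done):
    # Returns immediately; on_done plays the role of the
    # completion-interrupt handler.
    future = executor.submit(coprocessor_hash, data)
    future.add_done_callback(lambda f: on_done(f.result()))
    return future


completed = []  # results delivered by the "interrupt handler"
with ThreadPoolExecutor(max_workers=1) as pool:
    pending = asynchronous_call(pool, b"packet", completed.append)
    other_work = sum(range(1000))    # the main CPU keeps doing useful work
    async_digest = pending.result()  # later, collect the result

sync_digest = synchronous_call(b"packet")
```

The asynchronous path needs somewhere to keep the pending request (the future) and a way to be notified, which mirrors the context and interrupt support listed above.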
ASICs and GPPs (cont.) • Co-processor • shares the bus with the main CPU • is a source of bus contention • can cause cache coherency issues • data held in the main CPU cache may be stale after the co-processor updates memory • flush or invalidate the cache accordingly • should be equipped with DMA to relieve the main CPU from copying data
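The stale-data problem can be illustrated with a toy model. This is a deliberately simplified sketch: `CpuCache` and `dma_write` are hypothetical names, the cache has no size limit or write policy, and real hardware behaves in far more detail.

```python
class CpuCache:
    """Toy read cache in front of a shared memory (a plain list)."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}  # address -> cached value

    def read(self, addr):
        if addr not in self.lines:        # miss: fetch from memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]           # hit: memory is not consulted

    def invalidate(self, addr):
        self.lines.pop(addr, None)        # force the next read to miss


def dma_write(memory, addr, value):
    """Co-processor writes its result straight to memory, bypassing the cache."""
    memory[addr] = value


memory = [0, 0, 0, 0]
cache = CpuCache(memory)

first = cache.read(2)       # CPU reads and caches the old value
dma_write(memory, 2, 99)    # co-processor deposits its result
stale = cache.read(2)       # CPU still sees the old cached value
cache.invalidate(2)
fresh = cache.read(2)       # after invalidation the new value is visible
```

In this model the CPU only sees the co-processor's result after the affected line is invalidated, which is why the driver or OS has to manage the cache around each transfer.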
Extending GPPs • Tensilica Xtensa • reconfigurable processor cores • support native 16-bit and 24-bit instructions for higher code density • users can add or remove components (MMU, multipliers, FPUs) • users can reconfigure cache organization • users can select bus width (32, 64, or 128 bits) • user-defined instructions via the Tensilica Instruction Extension (TIE) language • users can create custom instructions to speed up commonly used functions • users can instantiate custom registers of different sizes
Tensilica Xtensa from http://www.tensilica.com/html/tensilica_instruction_extensio.html
Tensilica Xtensa (cont.) • We will not go into great detail about the Xtensa. • However, we will study the Stretch S5000 engine, which is based on the Xtensa core.
Design-Time Solutions • Up to now, we have only talked about design-time solutions! • logic designs are done in house • not very reconfigurable after the chip is made • even with FPGAs, someone has to come up with a new hardware design for it to change • the Xtensa needs about 1 hour to synthesize an instruction extension • What if we want to configure on the fly? • each application brings in CPU-intensive functions • these functions are not known in advance • Can we leave it up to the software developers to design fast co-processors?
(R)evolution of Processors [figure built up over four slides; labels: Ice Hard, Rock Hard, Playdough Hard] • Hardwired GPP: performs well in most conditions but not in extreme conditions • GPP with FPGAs: custom designs perform well in some extreme conditions, but require extensive knowledge of hardware design • GPP with embedded programmable logic: reconfiguration is triggered by software
(R)evolution of Processors • Ice Hard • Contains ASIC (Application Specific IC) designs • Increases time-to-market • Takes time to reconfigure
Software Hotspots • In DSP • 80% of the processing load is spent in 20% of the code • hand-tuned assembly that can take thousands of cycles to execute • less portable • the remaining 80% of the code implements complex system functions • runs well on most GPPs
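The payoff from accelerating only the hotspot can be estimated with Amdahl's law, which is not named on the slide but applies directly to the 80/20 split. A minimal sketch follows; the 10x co-processor speedup is a hypothetical figure.

```python
def overall_speedup(hot_fraction, hot_speedup):
    """Amdahl's law: whole-program speedup when only the hot fraction is accelerated."""
    return 1.0 / ((1.0 - hot_fraction) + hot_fraction / hot_speedup)


# 80% of the load sits in the hotspot; assume a co-processor runs it 10x faster.
with_coprocessor = overall_speedup(0.8, 10)

# Even an arbitrarily fast co-processor cannot beat 1 / (1 - 0.8) = 5x overall.
upper_bound = 1.0 / (1.0 - 0.8)
```

Under these assumptions the whole program runs about 3.6 times faster, and the untouched 20% of the load caps the gain at 5x no matter how fast the hotspot hardware becomes.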
Software Hotspots Example • when a 16 QuadAM (16-QAM) modem (19.2 Kbaud) is implemented entirely in software • it takes 177,000 instruction cycles to execute on a TI C6711 • an FPGA co-processor takes a few cycles
Solving Hotspots [figure: three approaches: multiple DSPs; DSP-enabled processors; processor + FPGA (RISC processor with programmable logic)]
Solving Hotspots [figure: performance versus flexibility and time-to-market (TTM) for CPU, DSP, FPGA, ASIC, and SCP] SCP = Software Configurable Processor