
Advanced Processor Architectures for Embedded Systems

Advanced Processor Architectures for Embedded Systems. Witawas Srisa-an, CSCE 496: Embedded Systems Design and Implementation. Objectives: discuss ASICs, FPGA-based systems, and general purpose processors; analyze the operating requirements for today’s embedded processors.


Presentation Transcript


  1. Advanced Processor Architectures for Embedded Systems Witawas Srisa-an CSCE 496: Embedded Systems Design and Implementation

  2. Objectives • Discuss ASIC, FPGA-based systems, and general purpose processors • Analyze the operating requirements for today’s embedded processors • Observe the architectural differences between state-of-the-art processors for embedded systems and high-performance general purpose processors • Tensilica Xtensa • Stretch S5000

  3. Embedded Processor Requirements • operate in memory-constrained environments • must be energy efficient • must be low cost • may have to be good at a common set of tasks • matrix multiplication, • encryption, • filtering (FIR), • network packet processing, etc.
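
To make one of the common tasks listed above concrete, here is a minimal sketch of a direct-form FIR filter. It is not taken from the slides; the function name `fir_filter` and the sample values are illustrative assumptions. The inner multiply-accumulate loop is the kind of kernel that a built-in DSP unit or custom instruction is meant to accelerate.

```python
def fir_filter(samples, taps):
    """Direct-form FIR: y[n] = sum over k of taps[k] * x[n - k], with x[n] = 0 for n < 0.

    Hypothetical helper for illustration only.
    """
    output = []
    for n in range(len(samples)):
        acc = 0  # accumulator: the value a hardware MAC unit would keep in a register
        for k, coeff in enumerate(taps):
            if n - k >= 0:  # samples before the start of the input count as zero
                acc += coeff * samples[n - k]
        output.append(acc)
    return output


# Illustrative input: a 3-tap moving sum applied to a short ramp.
filtered = fir_filter([1, 2, 3, 4], [1, 1, 1])
print(filtered)
```

Each output sample costs one multiply and one add per tap, so the work grows with the number of taps times the number of samples, which is why such loops tend to dominate the processing load.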

  4. Implications • low memory footprint • simplified instruction set • 16-bit, 24-bit • may not need support for VM • may lack hardware MMUs • energy efficient • less complex (smaller number of transistors) • simple pipeline stages • less cache memory on chips • simple floating point units • larger transistors and slower clocks • integrated function specific components for common tasks

  5. Implications (cont.) • low cost • share IP cores to reduce development cost • ARM, MIPS, etc. • use older semiconductor process technologies (e.g. 250 nm instead of 90 nm) • task specific • built-in DSP unit • wide data bus (more data per movement) • may need support for adding functions to the cores • may need field-reconfigurability

  6. Rationales from “The Death of Micro-Processors”, Nick Tredennick and Brion Shimamoto, Embedded Systems Programming, http://www.embedded.com/showArticle.jhtml?articleID=26807160

  7. Rationales (cont.) from “The Death of Micro-Processors”, Nick Tredennick and Brion Shimamoto, Embedded Systems Programming, http://www.embedded.com/showArticle.jhtml?articleID=26807160

  8. Rationales (cont.) “Studies have shown that custom hardware components often require much less energy to complete their tasks than the same tasks running on general purpose processors.” [1] “An ASIC is custom logic for a particular application. Custom logic can be orders of magnitude more efficient than microprocessor-based solutions.” [2] [1] Lach et al., “Power-Efficient Adaptable Wireless Sensor Networks”, Proceedings of International Conference on Military and Aerospace Programmable Logic Devices (MAPLD), September 2003. [2] Tredennick and Shimamoto, “The Death of Micro-Processors”, Embedded Systems Programming, http://www.embedded.com/showArticle.jhtml?articleID=26807160

  9. Application Specific ICs (ASICs) • provide custom design solutions for particular problems • fixed solutions that require public acceptance to reduce cost • require extensive knowledge of hardware design • not field-reconfigurable • can have large non-recurring engineering (NRE) cost

  10. ASICs (cont.) Wayne Wolf, FPGA-Based System Design, Prentice Hall, 2004

  11. FPGA-Based Systems • Field-programmable gate arrays (FPGAs) • are slower and require more power than custom designs • are more expensive • but provide no wait time from completing a design to making a chip • great for prototyping • are also reusable

  12. FPGAs • SRAM based--volatile • Altera Flex, Stratix, Cyclone, Apex • Antifuse--one-time programmable • Actel • EEPROM--non-volatile • Altera Max

  13. ASIC Design Approaches • Custom VLSI designs • are fabricated on a manufacturing line • take months • mask costs are also high • operate much faster and consume less power than FPGA equivalents • can be cheaper if manufactured in large volume
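
The volume argument above can be sketched as a break-even calculation. This is a minimal sketch, assuming a simple cost model (total cost = one-time NRE cost + unit cost × volume). The function name and all dollar figures are hypothetical and not taken from the slides.

```python
import math


def break_even_volume(nre_asic, unit_asic, nre_fpga, unit_fpga):
    """Smallest unit count at which the ASIC's total cost is no higher than the FPGA's.

    Assumed model: total = nre + unit_cost * volume (hypothetical simplification).
    """
    if unit_asic >= unit_fpga:
        raise ValueError("the ASIC only breaks even if its unit cost is lower")
    # Solve nre_asic + unit_asic * n <= nre_fpga + unit_fpga * n for n.
    return max(0, math.ceil((nre_asic - nre_fpga) / (unit_fpga - unit_asic)))


# Hypothetical figures: $1,000,000 ASIC NRE at $5 per chip
# versus no NRE and $50 per chip for the FPGA.
volume = break_even_volume(1_000_000, 5, 0, 50)
print(volume)
```

Below the break-even volume the FPGA is the cheaper choice despite its higher unit price; above it, the ASIC's NRE cost has been amortized.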

  14. ASIC Design Approaches (cont.) • Structured ASIC • is based on pre-designed logic fabric structurally embedded in the platform • fills the market gap between high-density FPGAs and standard cell ASICs • can greatly reduce development time and cost • reduces non-recurring engineering (NRE) cost http://www.amis.com/asics/structured_asics/ http://www.altera.com/b/hardcopyii.html?WT.mc_id=h2_sm_go_xx_tx_2_041&WT.srch=1

  15. Structured ASICs View Altera demo

  16. Integrating ASICs with GPPs • Today’s embedded systems can have complex software layers • OS • Virtual Machine • Applications • It is preferable to pair GPPs with ASICs as co-processors

  17. Integrating ASICs with GPPs (cont.) • So, we can have GPPs perform basic tasks and ASICs (co-processors) speed up compute-intensive functions • sounds simple, but in reality it is quite complex • basic hand-shaking is needed between the ASICs and the main processors • data exchange • shared memory • requires OS and architecture support • synchronous or asynchronous calls • cache coherency issues

  18. ASICs and GPPs (cont.) • An example is to use a hardware co-processor for cryptography • should the co-processor calls be synchronous • main processor blocks on the call and waits for the response • or asynchronous • calling process is blocked and swapped out • need interrupt support • need to maintain context
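
The two calling styles can be contrasted with a small software model. This is only a sketch: a single worker thread stands in for the crypto co-processor, SHA-256 hashing stands in for the cryptographic work, and a completion callback plays the role of the interrupt handler. The names `coprocessor`, `crypto_job`, `call_sync`, and `call_async` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
import hashlib

# Hypothetical stand-in for a hardware crypto co-processor: one worker thread.
coprocessor = ThreadPoolExecutor(max_workers=1)


def crypto_job(data: bytes) -> str:
    """The work offloaded to the co-processor (a hash, as a stand-in)."""
    return hashlib.sha256(data).hexdigest()


def call_sync(data: bytes) -> str:
    """Synchronous call: the caller blocks until the co-processor responds."""
    return coprocessor.submit(crypto_job, data).result()


def call_async(data: bytes, on_done):
    """Asynchronous call: returns at once; on_done acts like an interrupt handler."""
    future = coprocessor.submit(crypto_job, data)
    future.add_done_callback(lambda f: on_done(f.result()))
    return future


completed = []  # results delivered by the "interrupt handler"
digest = call_sync(b"packet payload")
pending = call_async(b"packet payload", completed.append)
# The caller is free to do other work here; pending.result() would wait if needed.
```

The asynchronous path is where the slide's extra requirements show up: something must signal completion (the callback here, an interrupt in hardware), and the caller's context must be kept until the result arrives.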

  19. ASICs and GPPs (cont.) • Co-processor • shares the bus with the main CPU • is a source of bus contention • can cause cache coherency issues • data held in the main CPU cache may have been updated in memory by the co-processor • flush or invalidate the cache accordingly • should be equipped with DMA to relieve the main CPU from copying data
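
The coherency problem above can be illustrated with a toy model. This is a sketch under stated assumptions, not a description of any real processor: the class `TinySystem`, its write-through cache, and the address and values used are all hypothetical.

```python
class TinySystem:
    """Toy model: shared memory, a CPU-side cache, and a DMA-capable co-processor."""

    def __init__(self):
        self.memory = {}     # shared main memory
        self.cpu_cache = {}  # lines currently cached by the main CPU

    def cpu_write(self, addr, value):
        # Assumed write-through policy: update the cache and memory together.
        self.cpu_cache[addr] = value
        self.memory[addr] = value

    def cpu_read(self, addr):
        # On a miss, fill the line from memory; on a hit, trust the cache.
        if addr not in self.cpu_cache:
            self.cpu_cache[addr] = self.memory[addr]
        return self.cpu_cache[addr]

    def dma_write(self, addr, value):
        # The co-processor writes memory directly; the CPU cache is not informed.
        self.memory[addr] = value

    def invalidate(self, addr):
        # Drop the cached line so the next read goes back to memory.
        self.cpu_cache.pop(addr, None)


soc = TinySystem()
soc.cpu_write(0x100, "plaintext")
soc.dma_write(0x100, "ciphertext")  # co-processor returns its result via DMA
stale = soc.cpu_read(0x100)         # served from the cache: the old value
soc.invalidate(0x100)
fresh = soc.cpu_read(0x100)         # refetched from memory: the new value
print(stale, fresh)
```

Without the invalidate step the CPU keeps reading its own outdated copy, which is why the driver or OS has to manage the cache around every co-processor transfer.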

  20. Extending GPPs • Tensilica Xtensa • reconfigurable processor cores • support native 16-bit and 24-bit instructions for higher code density • users can add/subtract components (MMU, Multipliers, FPUs) • users can reconfigure cache organization • users can select bus width (32, 64, or 128 bits) • user-defined instruction extension language • users can create custom instructions to speed up commonly used functions • users can instantiate custom registers of different sizes

  21. Tensilica Xtensa from http://www.tensilica.com/html/tensilica_instruction_extensio.html

  22. Tensilica Xtensa (cont.) • We will not go into great detail about the Xtensa. • However, we will study Stretch S5000 engine which is based on the Xtensa core.

  23. Design Time Solutions • Up to now, we have only talked about design-time solutions! • logic designs are done in house • not very reconfigurable after the chip is made • even with FPGAs, someone has to come up with a new hardware design for it to change • the Xtensa needs about 1 hour to synthesize the instruction extension • What if we want to configure on the fly? • each application brings in CPU-intensive functions • these functions are not known in advance • Can we leave it up to the software developers to design a fast co-processor?

  24. Run-Time Configuration

  25. (R)evolution of Processors Ice Hard Rock Hard Playdough Hard

  26. (R)evolution of Processors Ice Hard Hardwire, GPP Perform well in most conditions but not extreme conditions Rock Hard Playdough Hard

  27. (R)evolution of Processors Ice Hard GPP with FPGAs Custom designs perform well in some extreme conditions. Requires extensive knowledge of hardware design Rock Hard Playdough Hard

  28. (R)evolution of Processors Ice Hard Rock Hard GPP with embedded programmable logic Playdough Hard Reconfiguration triggered by software

  29. (R)evolution of Processors • Ice Hard • Contains ASIC (Application Specific IC) designs • Increases time-to-market • Takes time to reconfigure

  30. Software Hotspots • In DSP • 80% of the processing load is spent on 20% of the code • Hand-tuned assembly that can take thousands of cycles to execute • Less portable • The remaining 80% of the code has complex system functions • Runs well on most GPPs

  31. Software Hotspots Example • a 16-QAM modem (19.2 Kbaud) implemented entirely in software • takes 177,000 instruction cycles to execute on a TI C6711 • FPGA co-processor (a few cycles)
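
The payoff from offloading a hotspot can be estimated with Amdahl's law, which the slides do not name but which fits the 80/20 observation. This is a minimal sketch: the function name is hypothetical, and taking "a few cycles" to mean 10 cycles is an assumption made only for illustration.

```python
def overall_speedup(hotspot_fraction, hotspot_speedup):
    """Amdahl's law: whole-program speedup when only the hotspot gets faster."""
    remaining = 1.0 - hotspot_fraction
    return 1.0 / (remaining + hotspot_fraction / hotspot_speedup)


# Assumption: the hotspot is 80% of the load and drops from
# 177,000 cycles to a hypothetical 10 cycles on the co-processor.
hotspot_gain = 177_000 / 10
total_gain = overall_speedup(0.8, hotspot_gain)
print(round(total_gain, 3))
```

However fast the co-processor is, the overall gain stays below 1 / (1 - 0.8) = 5, because the remaining 20% of the load still runs on the general purpose processor.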

  32. Solving Hotspots [Figure: three approaches compared: multiple DSPs; DSP-enabled processors; processor + FPGA (RISC processor with programmable logic)]

  33. Solving Hotspots [Figure: performance versus flexibility and time-to-market (TTM) for CPU, DSP, FPGA, ASIC, and SCP] SCP = Software Configurable Processor
