Lecture 11: Interfaces, I/O and Configurable Processors

Lecture 11: Interfaces, I/O and Configurable Processors. Professor Kurt Keutzer, Computer Science 252, Spring 2000, with contributions from Prof. David Patterson, Niraj Shah, and Scott Weber.

  1. Lecture 11: Interfaces, I/O and Configurable Processors
     Professor Kurt Keutzer, Computer Science 252, Spring 2000
     With contributions from Prof. David Patterson, Niraj Shah, Scott Weber

  2. Embedded Systems vs. General Purpose Computing - 1
     Embedded system:
     • Runs a few applications, often known at design time
     • Not end-user programmable
     • Operates under fixed run-time constraints; additional performance may not be useful/valuable
     General purpose computing:
     • Intended to run a fully general set of applications
     • End-user programmable
     • Faster is always better

  3. Embedded Systems vs. General Purpose Computing - 2
     Embedded system, differentiating features:
     • power
     • cost
     • speed (must be predictable)
     General purpose computing, differentiating features:
     • speed (need not be fully predictable)
     • speed
     • did we mention speed?
     • cost (largest component power)

  4. Configurability and Embedded Systems
     • Advantages of configuration:
       • Pay (in power, design time, area) only for what you use
       • Gain additional performance by adding features tailored to your application
     • Particularly for embedded systems:
       • Principally in embedded controller microprocessor applications
       • Some use in DSP

  5. What to Configure?
     • What parts of the microcontroller/microprocessor system to configure?
     • Easy answers:
       • Memory and cache sizes: get precisely the sizes your application needs
       • Register file sizes
       • Interrupt handling and addresses
     • Harder answers:
       • Peripherals
       • Instructions
     • But first we need more context

  6. I/O Interrupts
     • An I/O interrupt is handled just like an exception, except:
       • An I/O interrupt is asynchronous
       • Further information needs to be conveyed
     • An I/O interrupt is asynchronous with respect to instruction execution:
       • An I/O interrupt is not associated with any instruction
       • An I/O interrupt does not prevent any instruction from completing
       • You can pick your own convenient point to take an interrupt
     • An I/O interrupt is more complicated than an exception:
       • It needs to convey the identity of the device generating the interrupt
       • Interrupt requests can have different urgencies, so they need to be prioritized
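As a rough illustration of the last two points (conveying the device's identity and prioritizing requests), the following Python sketch picks the most urgent pending request and dispatches to its handler. It assumes a hypothetical encoding that is not from the lecture: bit n of a pending-request mask belongs to device n, and a lower device number means a more urgent request. Real interrupt controllers define their own encodings.

```python
from typing import Callable, Dict, Optional

def highest_priority(pending: int, current_level: int) -> Optional[int]:
    """Return the device to service next, or None.

    Hypothetical encoding: bit n of `pending` is set when device n requests an
    interrupt, and a smaller n is more urgent. `current_level` is the device
    number already being serviced; only more urgent requests preempt it.
    """
    if pending == 0:
        return None
    # pending & -pending isolates the lowest set bit, i.e. the most urgent request.
    device = (pending & -pending).bit_length() - 1
    return device if device < current_level else None

def dispatch(pending: int, current_level: int,
             handlers: Dict[int, Callable[[int], str]]) -> Optional[str]:
    """Identify the interrupting device and call its handler, if any."""
    device = highest_priority(pending, current_level)
    if device is None:
        return None
    return handlers[device](device)
```

With devices 2 and 4 both requesting (`pending = 0b10100`) and nothing in service (`current_level = 8`), device 2 is chosen; while device 2 is already being serviced (`current_level = 2`), neither request preempts it.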

  7. Example: Device Interrupt
     User program (the external interrupt arrives at the "Hiccup(!)"):
         add  $r1,$r2,$r3
         subi $r4,$r1,#4
         slli $r4,$r4,#2
         <- External Interrupt: Hiccup(!)
         lw   $r2,0($r4)
         lw   $r3,4($r4)
         add  $r2,$r2,$r3
         sw   8($r4),$r2
     "Interrupt Handler":
         On entry: PC saved, Disable All Ints, Supervisor Mode
         Raise priority
         Reenable All Ints
         Save registers
         lw   $r1,20($r0)
         lw   $r2,0($r1)
         addi $r3,$r0,#5
         sw   $r3,0($r1)
         Restore registers
         Clear current Int
         Disable All Ints
         Restore priority
         RTI: Restore PC, User Mode
     • Advantage:
       • User program progress is only halted during the actual transfer
     • Disadvantage: special hardware is needed to:
       • Cause an interrupt (I/O device)
       • Detect an interrupt (processor)
       • Save the proper state to resume after the interrupt (processor)

  8. Interrupt Driven Data Transfer
     [Figure: CPU running a user program (add, sub, and, or, nop): (1) I/O interrupt; (2) save PC; (3) jump to the interrupt service address; (4) the interrupt service routine (device read, store, ..., rti) moves the data through the IOC into memory]
     • User program progress is only halted during the actual transfer
     • 1000 transfers at 1 ms each:
       • 1000 interrupts @ 2 µsec per interrupt
       • 1000 interrupt services @ 98 µsec each
       • = 0.1 CPU seconds
     • Device transfer rate = 10 MBytes/sec => 0.1 x 10^-6 sec/byte => 0.1 µsec/byte => 1000 bytes = 100 µsec
     • 1000 transfers x 100 µsec = 100 ms = 0.1 CPU seconds
     • Still far from the device transfer rate! 1/2 in interrupt overhead
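The slide's arithmetic can be checked with a few lines of Python. All numbers come from the slide; the only assumption is that each of the 1000 transfers moves 1000 bytes, which is how the "1000 bytes = 100 µsec" line reads.

```python
# Numbers from the slide (interrupt-driven transfer); times in microseconds.
TRANSFERS = 1000
INTERRUPT_US = 2                 # taking one interrupt
SERVICE_US = 98                  # running one interrupt service routine
BYTES_PER_TRANSFER = 1000        # assumed from the "1000 bytes = 100 µsec" line
DEVICE_BYTES_PER_S = 10_000_000  # device transfer rate: 10 MBytes/sec

def cpu_seconds_interrupt_driven() -> float:
    """CPU time spent taking and servicing all interrupts."""
    return TRANSFERS * (INTERRUPT_US + SERVICE_US) / 1e6

def device_seconds() -> float:
    """Time the device itself needs to move all the data at its rated speed."""
    us_per_byte = 1e6 / DEVICE_BYTES_PER_S  # 0.1 µs per byte
    return TRANSFERS * BYTES_PER_TRANSFER * us_per_byte / 1e6

print(f"CPU time in interrupts: {cpu_seconds_interrupt_driven():.3f} s")
print(f"Device transfer time:   {device_seconds():.3f} s")
```

Both totals come out to 0.1 s, matching the slide.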

  9. Better Way to Handle Interrupts?
     • Handling all interrupts with the CPU could bring it to a halt in a real-time system
     • Isn’t there a better way?
     • Hint: remember the trickle-down theory of embedded processor architecture.

  10. Trickle-Down Theory of Embedded Architectures
     • Mainframes/supercomputers
     • High-end servers/workstations
     • High-end personal computers
     • Personal computers
     • Laptops/palmtops
     • Gadgets
     • Watches
     • ...
     • Features tend to trickle down:
       • #bits: 4->8->16->32->64
       • ISAs
       • Floating point support
       • Dynamic scheduling
       • Caches
       • I/O controllers/processors
       • LIW/VLIW
       • Superscalar

  11. I/O Interface
     • Independent I/O bus: the CPU reaches memory over the memory bus and reaches peripherals (through interfaces) over a separate I/O bus; separate I/O instructions (in, out)
     • Common memory & I/O bus: lines distinguish between I/O and memory transfers
       • Examples: VME bus, Multibus-II, Nubus
       • 40 Mbytes/sec optimistically
       • A 10 MIPS processor completely saturates the bus!
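The claim that a 10 MIPS processor saturates a 40 Mbytes/sec shared bus follows if every instruction is fetched over that bus. The Python sketch below assumes 4-byte instructions, no cache, and no data references; those assumptions are added here, not stated on the slide, and any data traffic only makes the bus busier.

```python
def bus_demand_mb_per_s(mips: float, bytes_per_instruction: int = 4,
                        data_bytes_per_instruction: float = 0.0) -> float:
    """MBytes/sec of bus traffic from an uncached processor (assumed model)."""
    return mips * (bytes_per_instruction + data_bytes_per_instruction)

def saturates(mips: float, bus_mb_per_s: float = 40.0, **traffic) -> bool:
    """True if the processor alone needs all of the bus bandwidth."""
    return bus_demand_mb_per_s(mips, **traffic) >= bus_mb_per_s

print(bus_demand_mb_per_s(10))  # instruction fetch alone at 10 MIPS
```

At 10 MIPS, instruction fetch alone consumes 10 x 4 = 40 MBytes/sec, leaving nothing for I/O.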

  12. Delegating I/O Responsibility from the CPU: IOP
     [Figure: CPU, IOP, and main memory on the memory bus; the IOP drives devices D1 ... Dn over the I/O bus]
     • (1) The CPU issues an instruction to the IOP: OP, Device (target device), Address (where the commands are)
     • (2) The IOP looks in memory for commands: OP (what to do), Addr (where to put data), Cnt (how much), Other (special requests)
     • (3) Device to/from memory transfers are controlled by the IOP directly; the IOP steals memory cycles
     • (4) The IOP interrupts the CPU when done
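A toy Python model of this protocol: the CPU only hands over a list of command blocks (OP, Addr, Cnt, Other), and the IOP works through them and raises a single completion interrupt. The class and field names are illustrative, memory and the device are plain byte arrays, and every device access starts at offset 0; none of this reflects a specific IOP.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class IopCommand:
    """One command block in main memory (illustrative layout)."""
    op: str         # what to do: "read" (device -> memory) or "write" (memory -> device)
    addr: int       # where to put (or fetch) the data in main memory
    count: int      # how much data, in bytes
    other: int = 0  # special requests; unused in this sketch

def run_iop(commands: List[IopCommand], memory: bytearray, device: bytearray,
            interrupt_cpu: Callable[[int], None]) -> None:
    """Execute the command list without the CPU, then interrupt it once."""
    for cmd in commands:
        if cmd.op == "read":
            memory[cmd.addr:cmd.addr + cmd.count] = device[:cmd.count]
        elif cmd.op == "write":
            device[:cmd.count] = memory[cmd.addr:cmd.addr + cmd.count]
        else:
            raise ValueError(f"unknown op {cmd.op!r}")
    interrupt_cpu(len(commands))  # step (4): report how many commands completed
```

The CPU's involvement is limited to building the list and handling one interrupt at the end, regardless of how many commands the IOP executes.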

  13. Memory Mapped I/O
     • Single memory & I/O bus; no separate I/O instructions
     [Figure: CPU, ROM, RAM, and peripheral interfaces share one bus; in a larger system, the CPU with its cache ($) and an L2 cache sits on the memory bus, which reaches the I/O bus through a bus adaptor]
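In a memory-mapped design, ordinary loads and stores reach the peripherals; only the address decides whether RAM or a device register responds. This Python model shows the idea with a hypothetical I/O region starting at `IO_BASE = 0xFFFF0000` and a dictionary standing in for device registers. On real hardware the same split is made by address-decoding logic, and C code would typically access those addresses through `volatile` pointers.

```python
IO_BASE = 0xFFFF0000  # hypothetical start of the memory-mapped I/O region

class SharedBus:
    """One address space for RAM and I/O: no separate in/out instructions."""

    def __init__(self, ram_size: int) -> None:
        self.ram = bytearray(ram_size)
        self.device_regs: dict = {}  # register offset -> value last written by the CPU

    def store_word(self, addr: int, value: int) -> None:
        value &= 0xFFFFFFFF
        if addr >= IO_BASE:  # address decoder selects a peripheral register
            self.device_regs[addr - IO_BASE] = value
        else:                # otherwise the store goes to RAM
            self.ram[addr:addr + 4] = value.to_bytes(4, "little")

    def load_word(self, addr: int) -> int:
        if addr >= IO_BASE:
            return self.device_regs.get(addr - IO_BASE, 0)
        return int.from_bytes(self.ram[addr:addr + 4], "little")
```

The same `store_word` call writes memory or drives a peripheral depending only on the address.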

  14. Delegating I/O Responsibility from the CPU: DMA
     • Direct Memory Access (DMA):
       • External to the CPU
       • Acts as a master on the bus
       • Transfers blocks of data to or from memory without CPU intervention
     • The CPU sends a starting address, direction, and length count to the DMAC, then issues "start"
     • The DMAC provides handshake signals for the peripheral controller, and memory addresses and handshake signals for memory
     [Figure: CPU, memory, DMAC, and IOC with its device on a shared bus]
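The CPU's part in DMA reduces to writing a few parameters and a "start" command. This Python sketch models that handshake with a hypothetical `Dmac` class: it copies bytes in a loop and then calls a completion callback in place of the interrupt. The boolean direction flag is a simplification for brevity.

```python
class Dmac:
    """Toy DMA controller (hypothetical): programmed by the CPU, then moves data alone."""

    def __init__(self, memory: bytearray, device: bytearray, on_done) -> None:
        self.memory = memory
        self.device = device
        self.on_done = on_done  # stands in for the completion interrupt
        self.addr = 0
        self.count = 0
        self.to_memory = True

    def program(self, addr: int, count: int, to_memory: bool) -> None:
        """CPU writes the starting address, length count, and direction."""
        self.addr, self.count, self.to_memory = addr, count, to_memory

    def start(self) -> None:
        """CPU issues "start"; in hardware this loop runs as a bus master."""
        for i in range(self.count):
            if self.to_memory:
                self.memory[self.addr + i] = self.device[i]
            else:
                self.device[i] = self.memory[self.addr + i]
        self.on_done()
```

For example, `program(addr=2, count=3, to_memory=True)` followed by `start()` copies three device bytes into memory starting at address 2 and then signals completion once.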

  15. Direct Memory Access
     • Time to do 1000 transfers at 1 msec each:
       • 1 DMA set-up sequence @ 50 µsec
       • 1 interrupt @ 2 µsec
       • 1 interrupt service sequence @ 48 µsec
       • = .0001 second of CPU time
     [Figure: memory-mapped address space from 0 to n containing ROM, RAM, and peripherals, including the DMAC; the CPU programs the DMAC, which moves data between the IOC/device and memory]
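These DMA numbers can be set against the interrupt-driven case from the earlier slide. The per-step costs below are the slides' own figures; the comparison simply restates them.

```python
# Costs from the slides, in microseconds, for the same 1000 transfers.
DMA_SETUP_US = 50      # one DMA set-up sequence
DMA_INTERRUPT_US = 2   # one completion interrupt
DMA_SERVICE_US = 48    # one interrupt service sequence

INTERRUPT_DRIVEN_US = 1000 * (2 + 98)  # slide 8: one interrupt + service per transfer

def dma_cpu_us() -> int:
    """CPU time with DMA: the CPU is involved once per block, not once per transfer."""
    return DMA_SETUP_US + DMA_INTERRUPT_US + DMA_SERVICE_US

print(f"DMA: {dma_cpu_us() / 1e6} s of CPU time")                 # 0.0001 s
print(f"Reduction in CPU time: {INTERRUPT_DRIVEN_US // dma_cpu_us()}x")
```

DMA cuts the CPU's share from 0.1 s to 0.0001 s, a factor of 1000.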

  16. 68332 Family
     • 68K was the most successful embedded controller in history
     • CISC instruction set: good code density
     • Table lookup for compressed tables
     • Time processing unit: breakthrough in modular peripheral handling!

  17. MC68332 - Top Level
     [Figure: CPU32, time processing unit (TPU) with I/O channels 0 to 15, serial I/O, RAM, and IMB control, all connected by the IMB (inter-module bus)]
     • Designed for automotive applications with a mixture of computation-intensive tasks and complex I/O functions
     • Idea: off-load the CPU from frequent I/O interactions to make use of its computation performance: TPU

  18. 68332 CPU Block Diagram

  19. Addressing Modes in 68332
     • Seven modes:
       • Register direct
       • Register indirect
       • Register indirect with index
       • Program counter indirect with displacement
       • Program counter indirect with index
       • Absolute
       • Immediate
     • Why so many modes? Antiquated architectural feature?

  20. Addressing Modes in 68332
     • Complex addressing modes allow for denser code … but …
     • MCore, Motorola’s embedded microcontroller rewrite, uses simple DLX-like load/store instructions: code size impact?

  21. MC68332 Time Processing Unit
     • TPU: time processing unit, a peripheral coprocessor
     • Independent programmable timer channels: single-shot "capture & compare", channel coupling, and sequence control with the control processor
     [Figure: TPU block diagram: host interface (system configuration, development support and test) on the IMB; scheduler receiving service requests; microengine (control store, execution unit); parameter RAM; time base; timer channels 0 to 15 with channel control and data, each connected to a pin]

  22. Time Processing Unit

  23. Time Processing Unit
     • Semi-autonomous microcontroller
     • Operates concurrently with the CPU
     • Schedules tasks
     • Processes ROM instructions
     • Accesses data shared with the CPU
     • Performs input/output

  24. Uses of Time Processing Unit
     • Programmable series of two operations:
       • Match
       • Capture
     • Each operation is called an "event"
     • A pre-programmed series of events is called a "function"
     • Pre-programmed functions:
       • Input capture/input transition counter
       • Output compare
       • Period measurement with additional/missing transition detect
       • Position-synchronized pulse generator
       • Period/pulse-width accumulator
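Period measurement reduces to subtracting successive captured time-base values. Because the TPU time bases are 16-bit counters, the subtraction must be done modulo 2^16 so that one counter wrap between captures is handled correctly. The Python sketch below illustrates only that arithmetic; the function names and the tick rate in the example are hypothetical, and it does not reproduce the TPU's pre-programmed functions.

```python
MASK16 = 0xFFFF  # time bases are 16-bit counters

def period_ticks(prev_capture: int, this_capture: int) -> int:
    """Ticks between two captured edges; valid if fewer than 65536 ticks elapsed."""
    return (this_capture - prev_capture) & MASK16

def frequency_hz(period: int, tick_hz: float) -> float:
    """Convert a period in timer ticks to a signal frequency."""
    return tick_hz / period

# An edge captured at 0xFFF0 and the next at 0x0010 are 0x20 ticks apart,
# even though the counter wrapped in between.
print(period_ticks(0xFFF0, 0x0010))
```

If more than one full counter period can elapse between edges, the capture values alone are ambiguous and a wrap count would also be needed.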

  25. Time Bases
     • Two sixteen-bit counters provide the time bases for all channels
     • Prescalers controlled by the CPU via bit fields in the TPU module configuration register (TPUMCR)
     • Current values accessible via the TCR1 and TCR2 registers
     • TCR1 and TCR2 can be read/written by TPU microcode, but are not available to the CPU
     • TCR1 qualified by the system clock
     • TCR2 qualified by the system clock or an external clock
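How fast a 16-bit time base ticks, and how soon it wraps, follows from the input clock and the prescaler setting. The sketch assumes a generic divide-by-N prescaler and a hypothetical 16 MHz clock; the 68332's actual prescaler choices are set through the TPU module configuration register and are not modeled here.

```python
COUNTER_STATES = 1 << 16  # a 16-bit time base has 65536 distinct values

def tick_us(clock_hz: float, prescale: int) -> float:
    """Duration of one counter tick in microseconds (generic divide-by-N prescaler)."""
    return prescale * 1e6 / clock_hz

def wrap_us(clock_hz: float, prescale: int) -> float:
    """Time for the counter to run through all 65536 values and wrap to zero."""
    return COUNTER_STATES * tick_us(clock_hz, prescale)

# Hypothetical example: 16 MHz clock divided by 4.
print(tick_us(16_000_000, 4), wrap_us(16_000_000, 4))  # 0.25 16384.0
```

A larger prescaler trades resolution for range: doubling it doubles both the tick length and the time before the counter wraps.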

  26. Timer Channels
     • Sixteen channels, each one connected to an MCU pin
     • Each channel has symmetric hardware:
       • Event register
       • 16-bit capture register
       • 16-bit compare/match register
       • 16-bit comparator
       • Pin control logic: pin direction determined by the TPU microengine
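The per-channel hardware listed above can be modeled behaviorally: a comparator checks the time base against the 16-bit match register, and an input transition latches the time base into the 16-bit capture register. This Python sketch uses hypothetical names; on the real part, pin direction and the action taken on a match are set by the TPU microengine, and toggling the pin is just one assumed action.

```python
MASK16 = 0xFFFF

class TimerChannel:
    """Behavioral model of one timer channel (hypothetical names)."""

    def __init__(self) -> None:
        self.match = 0     # 16-bit compare/match register
        self.capture = 0   # 16-bit capture register
        self.pin = 0       # current pin level when used as an output
        self.events = []   # event log, standing in for the event register

    def tick(self, time_base: int) -> None:
        """Comparator: signal a match event when the time base equals the match register."""
        if (time_base & MASK16) == self.match:
            self.pin ^= 1  # assumed action: toggle the output pin
            self.events.append(("match", time_base & MASK16))

    def input_transition(self, time_base: int) -> None:
        """Capture: latch the current time base when the input pin changes."""
        self.capture = time_base & MASK16
        self.events.append(("capture", self.capture))
```

Setting `match = 5` and stepping the time base from 0 to 9 produces exactly one match event and one pin toggle.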

  27. Scheduler
     • Determines which of the sixteen channels is serviced by the microengine
     • A channel can request service for one of four reasons:
       • host service
       • link to another channel
       • match event
       • capture event
     • The host system assigns each channel a priority:
       • high
       • middle
       • low
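A simplified Python sketch of the scheduler's job: among the channels requesting service, choose one from the highest priority level that has a request, rotating round-robin within that level. The function and argument names are hypothetical, and this strict-priority policy is an assumption; the actual TPU scheduler may interleave levels differently (for instance, so that low-priority channels are not starved).

```python
from typing import Dict, Optional, Set

LEVELS = ("high", "middle", "low")

def next_channel(requests: Set[int], priority: Dict[int, str],
                 last_served: int) -> Optional[int]:
    """Pick the next channel (0-15) to hand to the microengine.

    Assumed policy: strict priority between levels, round-robin within a
    level starting after the most recently serviced channel.
    """
    for level in LEVELS:
        candidates = [ch for ch in range(16)
                      if ch in requests and priority.get(ch) == level]
        if candidates:
            after = [ch for ch in candidates if ch > last_served]
            return after[0] if after else candidates[0]  # wrap around
    return None
```

With channels 7 and 9 both at high priority, service alternates between them, and low-priority channel 3 is chosen only when no higher request is pending.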

  28. Microengine

  29. Another Motorola Microprocessor

  30. Concepts So Far ...
     • Interrupts
     • Memory mapping of I/O
     • Time processing unit / peripheral processor
     • Other configurable elements:
       • Peripherals
       • Instructions

  31. Configurability in ARM Processor
     • ARM allows for configurability via the AMBA bus
     • Offers "PrimeCell" peripherals, which hook into the AMBA Peripheral Bus (APB):
       • UART
       • Real time clock
       • Audio codec interface
       • Keyboard and mouse interface
       • General purpose I/O
       • Smart card interface
       • Generic IR interface
     • http://www.arm.com/Pro+Peripherals/PrimeCell/index.html

  32. ARM7 core

  33. ARM’s AMBA Open Standard
     • Advanced System Bus (ASB): high performance; CPU, DMA, external
     • Advanced Peripheral Bus (APB): low speed, low power; parallel I/O, UARTs
     • External interface
     • http://www.arm.com/Documentation/Overviews/AMBA_Intro/#intro

  34. Ex 1: ARM Infrared (IR) Interface

  35. Ex 2: ARM Smart Card Interface

  36. Ex 3: Audio Codec

  37. Another Kind of Configurability
     [Figure: design flow: HDL -> RTL synthesis (with library) -> netlist -> logic optimization -> netlist -> physical design -> layout]
     • Synthesis of a processor core from an RTL description allows for:
       • the full range of other types of configurability
       • additional degrees of freedom in quality of implementation
     • Examples:
       • ARM7
       • Motorola ColdFire
       • Tensilica Xtensa

  38. Quality of Results Tradeoffs
     • A synthesizable implementation allows for exploration of a wide range of implementations
     [Figure: delay vs. area tradeoff]

  39. ARM Core7 Thumb Embedded

  40. Ultimate Configurability: The Tensilica Solution

  41. Tensilica Viterbi Implementation
     Niraj Shah, Scott Weber
     290A Final Presentation

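  42. Tensilica Flow
     [Figure: the designer supplies TIE extensions and microarchitecture (uArch) choices to the Tensilica Processor Generator, which generates (gen) xt-gcc and xt-run; .c sources are compiled by xt-gcc into .o files that run on xt-run]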

  43. Xtensa Architecture
     • Xtensa Core is a 32-bit configurable RISC processor
     • TIE extensions:
       • single cycle
       • state free
       • no new exceptions
       • no stalls
       • typeless data
     • Rs, Rt, Rr are 32-bit registers
     • I is the instruction controlling the TIE unit
     [Figure: the Xtensa core passes Rs, Rt, and instruction I to the TIE unit, which returns Rr]

  44. Viterbi Architecture
     [Figure: ADC, I/O device, Init, RAM, ACS, and TraceBack blocks; performance is measured at the point marked "Measured Performance Here"]

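  45. TIE SetupBMreg (ACS)
     [Figure: Rs and Rt carry the I and Q values in their low-order bits (7:0); under the control instruction they are combined with the constant 0x7F through adders and subtractors (+, -) to produce four 8-bit branch metrics in Rr: bm0 (bits 7:0), bm1 (15:8), bm2 (23:16), bm3 (31:24)]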
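The SetupBMreg diagram shows one instruction writing four 8-bit branch metrics into the byte lanes of Rr: bm0 in bits 7:0, bm1 in 15:8, bm2 in 23:16, and bm3 in 31:24. How the metrics are derived from I and Q cannot be recovered from this transcript, so the Python sketch below models only the packing and unpacking of that output layout; the function names are hypothetical.

```python
def pack_branch_metrics(bm0: int, bm1: int, bm2: int, bm3: int) -> int:
    """Place four 8-bit branch metrics in the byte lanes of a 32-bit register value."""
    return ((bm0 & 0xFF)
            | (bm1 & 0xFF) << 8
            | (bm2 & 0xFF) << 16
            | (bm3 & 0xFF) << 24)

def unpack_branch_metrics(rr: int) -> tuple:
    """Recover (bm0, bm1, bm2, bm3) from a packed 32-bit value."""
    return tuple((rr >> shift) & 0xFF for shift in (0, 8, 16, 24))

print(hex(pack_branch_metrics(1, 2, 3, 4)))  # 0x4030201
```

Packing four metrics into one 32-bit register lets a single later instruction consume all of them at once, which is what the ACS extension's Rt operand appears to do.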

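  46. ACS TIE Extension (ACS)
     [Figure: Rs holds path metrics (pm) in bit fields; Rt holds four branch metrics bm3 (bits 31:24), bm2 (23:16), bm1 (15:8), bm0 (7:0); adders and a subtractor feed an "msb = 1?" test that selects the surviving path; the instruction computes ACS03 || ACS12 || ACS30 || ACS21; Rr receives the new path metric, zero fill, and a decision bit]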
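Add-compare-select (ACS) is the inner step of the Viterbi algorithm: for each trellis state, add a branch metric to each of two predecessor path metrics, compare the sums, keep the better one, and record which predecessor won as a decision bit. The diagram appears to compare by subtracting and testing the msb, and to perform four ACS operations in one instruction. The Python sketch below performs a single ACS with that subtract-and-test-msb comparison; the 16-bit metric width and the smaller-is-better convention are assumptions, not taken from the slide.

```python
def acs(pm_a: int, bm_a: int, pm_b: int, bm_b: int, width: int = 16):
    """One add-compare-select step; the smaller metric wins (assumed convention).

    Returns (surviving path metric, decision bit): decision 0 means predecessor
    a survived, 1 means predecessor b survived (ties go to b).
    """
    mask = (1 << width) - 1
    sum_a = (pm_a + bm_a) & mask   # add
    sum_b = (pm_b + bm_b) & mask
    diff = (sum_a - sum_b) & mask  # compare by subtraction ...
    # ... and test the msb; valid while |sum_a - sum_b| < 2**(width - 1).
    a_smaller = (diff >> (width - 1)) & 1
    if a_smaller:
        return sum_a, 0            # select
    return sum_b, 1
```

For example, `acs(10, 2, 9, 5)` compares 12 against 14 and keeps 12 with decision bit 0. The stored decision bits are what the traceback step later consumes.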

  47. ACS TIE Extension with State (ACS)
     [Figure: the ACS datapath duplicated, with TIE state: two add, subtract, and "msb = 1?" select paths driven by path metrics (pm) and branch metrics bm3 to bm0 from Rt; Rr receives two updated path metrics and their decision bits]

  48. TIE Zmask (TraceBack)
     [Figure: low-order fields of Rs and Rt feed a shift left by one (<<1), masks 0x7F and 0x3F (&), and an OR (|) under control of the instruction; the result is written to Rr bits 6:0]
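The Zmask diagram combines a shift left by one, masks, and an OR, which is consistent with a common traceback step: shift the current state left and bring in the stored decision bit. The Python sketch assumes a 64-state trellis (a 6-bit state, hence the 0x3F mask) and that bit-ordering convention; both are assumptions about this design, not statements from the slide, and the function names are hypothetical.

```python
STATE_MASK = 0x3F  # assumed 64-state trellis: 6-bit state

def traceback_step(state: int, decision_bit: int) -> int:
    """Shift the state left by one and bring in the stored decision bit."""
    return ((state << 1) | (decision_bit & 1)) & STATE_MASK

def traceback(start_state: int, decision_bits) -> list:
    """Follow a sequence of decision bits, returning every state visited."""
    states = [start_state]
    for bit in decision_bits:
        states.append(traceback_step(states[-1], bit))
    return states

print(traceback(0, [1, 1, 0]))  # [0, 1, 3, 6]
```

Doing this shift-OR-mask in one custom instruction replaces several base-ISA operations in the traceback's innermost loop.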

  49. Designs
     • All designs had a BER of 0.000095 after 10 million iterations
     • Design 1: 100 MHz, 48 mW, 1K DCache, 1K ICache, TIE
     • Design 1+: 222 MHz, 144 mW, 1K DCache, 1K ICache, TIE
     • Design 2-: 100 MHz, 69 mW, 16K DCache, 16K ICache, TIE
     • Design 2: 222 MHz, 191 mW, 16K DCache, 16K ICache, TIE
     • Design 3: 222 MHz, 191 mW, 16K DCache, 16K ICache, TIE with state

  50. Performance (Kb/s)
