430 likes | 718 Views
Embedded Processor 의 설계. Contents. Trends in Embedded Processor Design Design Example of Embedded Processor Low-power Design for Embedded Processor Compiler Issues for Embedded Processor Embedded OS. Trends in Embedded-Microprocessor Design. What are the Embedded Processor ?
E N D
Contents • Trends in Embedded Processor Design • Design Example of Embedded Processor • Low-power Design for Embedded Processor • Compiler Issues for Embedded Processor • Embedded OS
Trends in Embedded-Microprocessor Design • What are the Embedded Processor ? • The Embedded Marketplace • What are the Differences Between Desktop and Embedded Processors ? • Evaluation Parameter for Embedded Processors • The Future of the Embedded Processors
What’s the Embedded Processor ? Package • Major Profits of Embedded Application • Small area (no area loss by interconnection and package) • Low cost • Small delay between modules, small power loss in interconnection Package Chip Chip Embedded Processor Embedded Peripherals General Purpose Processor Custom IC Memory Modules Peripherals Peripherals Embedded Memory General Application Embedded Application
The Embedded Marketplace • Today’s Embedded-processor market • There are more than 100 vendors and two dozen instruction set architectures • Architectures for the Embedded Market • Motorola’s 68000 architecture (most successful in 1996) • Intel’s I960, Motorola’s Coldfire, Sun’s Sparc, Intel’s x86 • Advanced RISC Machines’ ARM, MIPS Technology’s MIPS • Selling their architecture to licensees as cores • Manufactured in different versions by various semiconductor companies • The Growing Success of Embedded Processor • The demand for video games, handheld computers, digital still cameras, and cellular phones
What are the Differences Between Desktop and Embedded Processors ? (1) • Traditional ways to partition the microprocessor world • In the past, it was sufficient to partition the market into four basic, clearly separated, target markets. • Emerging handheld, mobile, and multimedia applications require new classes of embedded processors.
What are the Differences Between Desktop and Embedded Processors ? (2) • What issues can differentiate embedded processor from desktop CPU? • Power consumption • Cost • Integrated peripherals (display, sound, etc.) • Other important features for embedded processor • Interrupt response time • Amount of on-chip RAM or ROM • Number of I/O ports • The most important point • Embedded microprocessor must do the job for a particular application at the lowest possible cost within the timing constraint.
Evaluation Parameterfor Embedded Processors • Essential criteria to compare embedded processors • Low power consumption • High code density • Less overhead of peripheral integration and number/cost of chipsets • Equipped with multimedia acceleration and acceleration of special application software • Lower price/performance ratio
Power Consumption • Embedded domain application cannot use a heat sink or a fan. • Most embedded microprocessors have three modes • Fully operational; • Clock signal is propagated to the entire processor • Stand-by; • Processor is not actually executing instructions, but all its stored information is still available • Power down; • System has to be restarted • Reducing power consumption • Reduced voltage level • Stopping transistor activity when a particular block is not in use • Integration of off-chip power-consuming peripherals • Announced target of all vendors in the 32-bit embedded arena is 1,000 MIPS/watt.
Code Density • Most RISC processors use fixed-length instructions • MIPS, Sparc, PowerPC • Fixed 32-bit instruction length negatively affected code density of programs for 32-bit RISC processors • To overcome fixed-length problem • SuperH(Hitachi), M-Core(Motorola) • Only 16 bits (rather than 32) for each instruction • Thumb extension(ARM) • recode 32-bit ARM instruction to 16-bit opcodes • on-chip logic decompresses Thumb code to ARM instruction in real time • Quality of C compilers
Peripherals and Higher Integration • More peripherals • increase chip complexity • increase cost • reduce yield • need more pins • need more testing • Integrated peripherals must simplify system design and shorten the development cycle of complete systems • Embedded DRAM • dissipates less power, because of the extremely small load capacitance of the bus
Multimedia Acceleration • Typical instructions for multimedia support • multiply-accumulate operation for fast FIR filtering • enhanced addressing modes • graphics acceleration • Discrete Cosine Transform for JPEG and MPEG image compression • Embedded microprocessors execute functions that a separate DSP processor performed in the past • in case of cellular phone, application needs • a lot of control to handle the protocol stack and interface the external world • equalization • voice encoding/decoding and compression • which was traditionally handled with a low-performance microcontroller and a DSP processor
Classification of the New 32-bit Embedded Processors • Classification of the New 32-bit Embedded Processors
The Future of the embedded processors • Embedded processor of the future will • offer plenty of MIPS • run DSP program like a dedicated DSP processor • integrate all its peripherals • cost only a few cents • No dominating architecture like the x86 • because the embedded world is driven by a variety of applications • Trend to use standard operating systems and platforms • to reduce the development cost • increase reusability • shorten design cycle time • Each market segment will have its dominant architecture and perhaps vendor
Example Design: SimpleCore • 16-bit RISC architecture with simple instructions • Harvard architecture • Separate instruction and data memories • Three-stage pipeline • No forwarding scheme required • Makes design and verification easy • Longer cycle time • Performance degradation Three-stage pipelining
Example Design: SimpleCore • 2001년 7월 부터 현재까지 IDEC Newsletter에 연재되고 있는 강좌 ‘마이크로프로세서설계 무작정 따라하기’를 위해 설계 • 설계자: 배영돈 (KAIST 박사과정) • Source code available: http://www.donny.co.kr/simplecore • 2,000 downloads made so far • Status: • Part I: RTL description • Part II: Development environment (Compiler, Assembler, etc) • Part III: Synthesis and place & route (연재 중)
Register file • 15 general purpose registers • A program counter • A status register
SimpleCore Instructions • Simple instructions • ALU, Shift/Rotate, Load, Store, Branch and Multiply • Similar to the commercial embedded processors Instruction Map
SimpleCore Instructions • ALU instructions Exit condition of a simulation
SimpleCore Instructions • Shift Instructions • Branch Instruction
Datapath • SimpleCore’s datapath consists of • pc: program counter, incr: address incrementor • regFile: register file, sr: status register • alu: ALU, shifter: barrel shifter, mul: multiplier • dIn: data input register, dOut: data output register
DATAPATH CONTROL LOGIC Design of Datapath • Datapath has regular structures • Data processing is often timing critical and requires large area; hence optimization is very important • Usually designed using datapath compiler or by full custom design manner Die photo of ARM7TDMI processor The datapath occupies about 40% of total area
RTL Description of Datapath • Block diagram of ALU
RTL Description of Datapath • RTL description of ALU always @ (aluAIn or aluBIn or aluCtl) begin casex(aluCtl) 3'b000: // MOVA {aluCarry, aluOut} = {1'b0, aluAIn}; 3'b001: // MOVB {aluCarry, aluOut} = {1'b0, aluBIn}; 3'b010: // AND {aluCarry, aluOut} = {1'b0, aluAIn & aluBIn}; 3'b011: // OR {aluCarry, aluOut} = {1'b0, aluAIn | aluBIn}; 3'b10?: // ADD {aluCarry, aluOut} = aluAIn + aluBIn; 3'b11?: // SUB {aluCarry, aluOut} = aluAIn - aluBIn; endcase end
RTL Description of Datapath • Block diagram of barrel shifter • Logic level implementation example • It can shift by any number of bits in a cycle Concept of barrel shifter (case of logical shift left by 5)
RTL Description of Datapath • RTL description of barrel shifter input [15:0] shiftIn; // shifter input input [ 3:0] shiftAmt; // shift amount input [ 2:0] shiftCtl; // control input {left/right, arith/logic, rotate} output [15:0] shiftOut; // shifter output reg [15:0] shiftOut; always @ (shiftIn or shiftAmt or shiftCtl) begin casex(shiftCtl) 3'b0?? : // LSL (logical shift left) shiftOut = shiftIn << shiftAmt; 3'b100 : // LSR (logical shift right) shiftOut = shiftIn >> shiftAmt; 3'b101 : // ASR (arithmetic shift right) shiftOut = ({16{shiftIn[15]}}<<(16-shiftAmt)) | (shiftIn >> shiftAmt); 3'b11? : // ROR (rotate right) shiftOut = (shiftIn >> shiftAmt) | (shiftIn << (16-shiftAmt)); endcase end
RTL Description of Datapath • Design of the multiplier • Multiplier occupies large area and is slower than other units • Many tools can synthesize multipliers but more sophisticated approaches are required for high performance • e.g., using module generator, employing Modified Booth algorithm, etc. • RTL description of multiplier input [15:0] mulAIn; // multiplier input A input [15:0] mulBIn; // multiplier input B output [15:0] mulOut; // multiplier output assign mulOut = mulAIn * mulBIn;
RTL Description of Datapath • Design of the register file • Register file requires large amount of area • Thus, commercial processors employ SRAM-like structures • For logic synthesis, however, SimpleCore uses flip-flops as storage elements
Design of Control Logic • Design of the decode unit • Instructions are compressed expression of execution information • Increases code density • Reduce the size of program memory • Decode unit extracts all the information required for execution Instruction Decoding
Design of Control Logic • RTL description of the decode unit • Distinction among instruction groups input [15:0] fInst; // fetch data output [ 2:0] instId; // instruction ID /* instruction ID */ always@(fInst) begin casex(fInst[15:12]) 4'b00?? : instId <= `INST_ALUI; // ALU imm. 4'b01?? : instId <= `INST_ALUR; // ALU reg. 4'b100? : instId <= `INST_SHRO; // Shift/Rotate 4'b1010 : instId <= `INST_LOAD; // Load 4'b1011 : instId <= `INST_STORE; // Store 4'b110? : instId <= `INST_BRANCH; // Branch 4'b11?0 : instId <= `INST_BRANCH; // Branch 4'b1111 : instId <= `INST_MUL; // Multiply endcase end instruction register Pre-defined macros with ‘define’ statements Ex) `define INST_ALUI 3’b0
Design of Control Logic • RTL description of the decode unit (cont’d) • Execution information extraction • Opcode, register indices, immediate values, condition flags, etc. /* field extraction */ assign opcode = fInst[13:11]; assign shift = fInst[12:11]; assign rs1Idx = fInst[ 3: 0]; assign rs2Idx = {1'b0, fInst[ 6: 4]}; assign rdIdx = fInst[10: 7]; assign imm = (instId == `INST_BRANCH) ? {{4{fInst[11]}}, fInst[11:0]} : // sign extension {9'b0, fInst[6:0]}; // zero extension assign immFlag = (instId == `INST_ALUI); assign cmpFlag = (fInst[15:11] == 5'b00101); assign branchFlag = (instId == `INST_BRANCH); assign exitFlag = (fInst[15:11] == 5'b00111); // end of simulation assign srWbEn = (fInst[15:11] == 5'b00110) || (fInst[15:11] == 5'b00101); // MSR or CMP assign srOEn = (fInst[15:11] == 5'b00111); // MRS
Design of Control Logic • Design of the execute unit • Most of the operation in execute stage are performed in the datapath • Execute unit generates control signals for the functional units in the datapath • Design of the execute unit is assigning of values of each control signal for the corresponding instruction Control signals for the functional unit
Design of Control Logic • RTL description of the execute unit
Design of Control Logic • Pipelining • Inserting pipeline register between pipeline stages • SimpleCore performs lots of operation in execute stage • Critical path exists in the execute stage Critical path
Design of Control Logic • Pipelining (cont’d) • Insertion of pipeline registers considering the critical path Inserting pipeline registers considering the critical path
Design of Control Logic • Hazard • Data Hazard • Not exists in SimpleCore as data are read and written in the same pipeline stage (no forwarding) • Control Hazard (Branch) • Newly fetched instructions are executed rather than prefetched instructions (cf., delayed branch) Branch instruction
Simulation • Memory models for both instruction(imem) and data(dmem) memories
Simulation • Displaying the results
Verification • Compare the status(e.g., register file) and data on the IOs between C-model(ISS) and RTL using PLI (program language interface)