280 likes | 422 Views
Comparison of Altera NIOS II Processor with Analog Device’s TigerSHARC. Outline. What is a “Soft” Processor What is the NIOS II? Architecture for NIOS II, what are the implications TigerSHARC VS. NIOS II Pipeline Issues Issues related to FIR Hardware acceleration, using FPGA logic.
E N D
Comparison of Altera NIOS II Processor with Analog Device’s TigerSHARC
Outline • What is a “Soft” Processor • What is the NIOS II? • Architecture for NIOS II, what are the implications • TigerSHARC VS. NIOS II • Pipeline Issues • Issues related to FIR • Hardware acceleration, using FPGA logic
What’s is a “Soft” Processor? • Processor implemented in VHDL, Verilog, etc., and downloaded onto FPGA hardware • Can implement many parallel processors on one FPGA • Can use addition FPGA resources on the same chip that is not part of the processor core. • NIOS II is a “Soft” Processor
Why “Soft” Processor? • Higher level of design reuse • Reduced obsolescence risk • Simplified design update or change • Increased design implementation options • Lower latency between processor and FPGA components
What is NIOS II? • Software-defined processor • The processor core is loaded onto FPGA • Programmed using ‘normal’ programming tools (C, asm), not hardware description languages • Can use the rest of the FPGA hardware for accelerating parts of the code
How Is NIOS II Implemented • The custom FPGA logic that interacts with the processor is implemented in Altera Quartus II • The Avalon Interface bus (common instruction/data bus) is implemented in Quartus II • The architecture is generated in Quartus II and used for programming in Eclipse IDE
NIOS II IDE • Coding is implemented in Eclipse rather than VisualDSP.
The Different NIOS II Cores • There are 3 cores available from Altera • NIOSII/e: Economical Core • NIOSII/s: Standard Core • NIOSII/f: Fast Core
What’s the Difference between the Cores? An LE is equivalent to a 8-1 NAND gate + 1 D-Flip Flop An ALM is equivalent to 2 LE’s
NIOS II Architecture -thirty two 32-bit general registers, six 32-bit control registers -variable cache based on how much FPGA space you have -ALU- 32bit two input to one input, does shifts, logic and arithmetic. Shifter is not separate like TigerSHARC
Avalon Interface -separate address, data and control lines -up to 1024-bit data width transfer, can be set to any width (not power of 2) -one transfer per clock cycle.
NIOS II/f pipeline • Six stages • One instruction can be dispatched and/or retired pre cycle • Dynamic branch prediction: 2-bit branch history table (no BTB like in TigerSHARC)
NIOS II/f pipeline The pipeline stalls for: • Multi-cycle instructions • Cache misses • Data dependencies (2 cycles between calculating and using result) Mispredicted branch penalty: 3 cycles
Hardware multiply • Can use different options for multiplier (at the processor design stage) • No h/w multiply (saves FPGA gates) • Speed depends on algorithm • Use embedded multipliers (if FPGA has those) • 1-5 cycles (depends on FPGA) • Implement multipliers on FPGA gates • 11 cycles • Division 4-66 cycles on hardware
Compare to TigerSHARC • No support for parallel instructions • No support for SIMD operations • Multicycle instructions stall the pipeline All the above limitations can be overcome by using FPGA space unoccupied by the processor itself
Integer FIR algorithm intcoeff[]={1, 2, 3, 4, 5, 6, 7, 8}; int data1[] = {1, 0, 0, 0, 0 ,0 ,0,0}; int output[8]; inti=0, j=0, k=0; for(k=0; k<8; k++) output[k] =0; for( j =0; j< 8; j++) { for( i= 0; i< 8; i++) { output[j] += data1[i]*coeff[7-i]; } }
Speed analysis • 9 cycles per iteration except the first two (branch predicted not taken) and the last (branch predicted taken) – those will be 9+3=12 cycles • 1 data stall – can remove by moving instruction from line 4 to 7 • Speed: 8 cycles * (N-3) + 11 cycles * 3 = 8*(N-3)+33 cycles • For 1024-tap FIR: 8201 cycles • Clock cycle is 3 times longer (200MHz vs 600MHz)
Speed comparison • 8201 NIOS II cycles equivalent to 24603 TigerSHARC cycles • Lab3 timing: • 56000 cycles Debug mode • 13000 unoptimized ASM • 4000 Optimized ASM Worse than unoptimized assembly, but no hardware acceleration used, so this is not that bad
Hardware Acceleration • Profiling tool in Eclipse can show how long each function takes • If function takes too long, it can be sped up by • Custom instructions • Hardware Acceleration • Hardware Acceleration is to take the function and transform it into FPGA circuitry
Hardware Acceleration • Can be done using C2H compiler from Altera • Trades off Logic Size for Speed up.
Conclusion • “Soft” Processors such as the NIOSII offers another alternative in the embedded system scene. • The NIOSII offers the advantage of added configurability, and customization that blur the line between FPGAs and DSPs
References [1] http://www.fpgajournal.com/articles/behere.htm Describes an FPGA-DSP project based on Altera Nios [2] http://www.altera.com/products/ip/processors/nios2/ni2-index.html Official Nios II page [3] http://www.hunteng.co.uk/dsp-fpga.htm DSP or FPGA? What is better when? [4] http://www.hunteng.co.uk/pdfs/tech/DSP1736FPGA.pdf Article from Xilinx about FPGA DSPs [5] http://www.niosforum.com Community forum for NIOS [6] http://www.altera.com/literature/hb/nios2/n2cpu_nii5v1.pdf NIOSII Processor Handbook –Altera Corporation [7] http://www.altera.com/literature/manual/mnl_avalon_spec.pdf Avalon Memory-Mapped Interface Specifications – Altera Corporation [8] http://www.analog.com/en/prod/0,2877,ADSP%252DTS201S,00.html ADSP-TS201S 500/600 MHz TigerSHARC Processor with 24 Mbit on-chip embedded DRAM