A COMPARATIVE STUDY OF MULTIPLY ACCUMULATE IMPLEMENTATIONS ON FPGAS Using Distributed Arithmetic and Residue Number System
Project Scope • To compare the implementation efficiency (area × delay) of parallel multiply-accumulate (MAC) architectures on FPGAs based on Distributed Arithmetic (DA), the Residue Number System (RNS), and a combined DA-RNS approach
Background and Context • FPGAs are increasingly used for DSP computations • FPGAs offer substantial potential for parallelism • Exploiting the FPGA architecture (LUT based) • Novel MAC architectures are especially suitable for FPGAs
Some More Background • In DSP, MACs typically use constant coefficients (fixed multiplicand) • A full multiplier implementation is therefore not required • Not all multiplier architectures are efficient on FPGAs
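Because the coefficients are fixed, every possible partial sum of them can be precomputed into a LUT, so the multiplications reduce to table lookups plus shift-and-add. The slides do not spell out the DA algorithm; the following is a minimal sketch of the textbook bit-serial formulation, assuming K constant coefficients and `width`-bit two's complement inputs. The function names and example values are hypothetical.

```python
def build_da_lut(coeffs):
    """Precompute all 2^K partial sums of the constant coefficients.

    Entry `addr` holds the sum of coeffs[i] for every i whose bit is set in addr.
    """
    lut = []
    for addr in range(1 << len(coeffs)):
        lut.append(sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1))
    return lut


def da_inner_product(lut, xs, width):
    """Compute sum(c_i * x_i) bit-serially using only LUT lookups and shift-add.

    xs are `width`-bit two's complement integers; one LUT address is formed per
    bit position from the corresponding bit of every input.
    """
    acc = 0
    for b in range(width):
        addr = 0
        for i, x in enumerate(xs):
            # Python's >> is arithmetic, so this yields two's complement bits for negatives too.
            addr |= ((x >> b) & 1) << i
        partial = lut[addr] << b
        # The sign bit of a two's complement number carries weight -2^(width-1).
        acc = acc - partial if b == width - 1 else acc + partial
    return acc


coeffs = [3, -5, 7, 2]      # hypothetical constant coefficients
lut = build_da_lut(coeffs)  # 16 entries for K = 4 inputs
xs = [10, -4, 7, -8]        # 8-bit two's complement samples
print(da_inner_product(lut, xs, 8))  # 83, same as sum(c * x for c, x in zip(coeffs, xs))
```

The LUT grows as 2^K, which is why practical DA designs split long filters into groups of a few inputs each, matched to the FPGA's LUT size.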
Motivation • Distributed Arithmetic and residue arithmetic are both LUT-based techniques • Explore the “synergy” between the LUT-based FPGA architecture and these techniques
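Residue Arithmetic Overview • For pairwise coprime moduli $m_1, m_2, \ldots, m_n$ (dynamic range $M = m_1 m_2 \cdots m_n$): $(z_1, z_2, \ldots, z_n) = (x_1, x_2, \ldots, x_n) \circ (y_1, y_2, \ldots, y_n)$ • $z_i = (x_i \circ y_i) \bmod m_i$ • where $\circ$ denotes any of the modulo operations addition, subtraction or multiplication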
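The channel-wise rule $z_i = (x_i \circ y_i) \bmod m_i$ can be checked with a few lines of Python. This is a sketch only: the moduli (15, 16, 17) are a hypothetical choice giving 4-, 4- and 5-bit residues, and the helper names are illustrative.

```python
import operator
from itertools import combinations
from math import gcd

# Hypothetical moduli chosen for illustration: pairwise coprime, with
# 4-, 4- and 5-bit residues and dynamic range M = 15 * 16 * 17 = 4080.
MODULI = (15, 16, 17)
assert all(gcd(a, b) == 1 for a, b in combinations(MODULI, 2))


def to_rns(x, moduli=MODULI):
    """Residue representation (x mod m_1, ..., x mod m_n)."""
    return tuple(x % m for m in moduli)


def rns_op(xs, ys, op, moduli=MODULI):
    """z_i = (x_i op y_i) mod m_i, computed independently in each channel (no carries between channels)."""
    return tuple(op(x, y) % m for x, y, m in zip(xs, ys, moduli))


a, b = 37, 52
for op in (operator.add, operator.sub, operator.mul):
    # The channel-wise result equals the residues of the ordinary integer result.
    assert rns_op(to_rns(a), to_rns(b), op) == to_rns(op(a, b))

print(rns_op(to_rns(a), to_rns(b), operator.mul))  # (4, 4, 3)
```

Since no carries propagate between channels, each channel works on short words in parallel, which is what makes small per-channel LUTs attractive.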
Modulo Constant Multiplier • Because residues are small and the multiplicand is constant, a direct LUT-based implementation is very efficient • [Figure: 4-bit constant modulo multiplier (X[3:0], A0 to A3) and 5-bit constant modulo multiplier (X[4:0], A0 to A4)]
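A direct LUT implementation simply tabulates $x \mapsto (c \cdot x) \bmod m$. The sketch below builds that table and splits it into one truth table per output bit, showing that each output bit is a Boolean function of only the n residue bits. The constant 5 and modulus 13 are hypothetical, and filling unused input codes with 0 is an assumption (in hardware they are don't-cares).

```python
def const_mod_mult_table(c, m):
    """LUT contents for x -> (c * x) mod m over every n-bit input code.

    Codes x >= m are not valid residues; they are filled with 0 here (a don't-care in hardware).
    """
    n = (m - 1).bit_length()  # bits needed for a residue modulo m
    table = [(c * x) % m if x < m else 0 for x in range(1 << n)]
    return table, n


def output_bit_truth_tables(table, n):
    """Split the table into one truth table per output bit A_j.

    Each mask has 2^n bits; bit x of mask j equals bit j of table[x], i.e. output
    bit j is an n-input Boolean function of the input residue.
    """
    masks = []
    for j in range(n):
        mask = 0
        for x, v in enumerate(table):
            if (v >> j) & 1:
                mask |= 1 << x
        masks.append(mask)
    return masks


table, n = const_mod_mult_table(c=5, m=13)  # hypothetical constant and modulus
masks = output_bit_truth_tables(table, n)
print(n, table[:13])  # 4 [0, 5, 10, 2, 7, 12, 4, 9, 1, 6, 11, 3, 8]
```

With a 4-bit residue, each output bit fits one 4-input LUT; a 5-bit residue needs either 5-input LUTs or two 4-input LUTs combined through a multiplexer per output bit.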
Conversion Issues in RNS • Binary-to-RNS and RNS-to-binary conversions are significant overheads • Binary-to-RNS conversion is relatively simple • RNS-to-binary conversion using a direct Chinese Remainder Theorem (CRT) implementation requires modulo-M adders
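The asymmetry between the two conversions is visible in a short sketch: forward conversion is one reduction per channel, while the direct CRT form sums weighted terms that must be reduced modulo M. This is an illustrative Python version, not the project's hardware design; the moduli are hypothetical and `pow(..., -1, m)` requires Python 3.8 or later.

```python
from math import prod


def binary_to_rns(x, moduli):
    """Forward conversion: one modulo reduction per channel."""
    return tuple(x % m for m in moduli)


def rns_to_binary_crt(residues, moduli):
    """Reverse conversion with the Chinese Remainder Theorem.

    x = | sum_i M_i * |r_i * M_i^{-1}|_{m_i} |_M, with M = prod(m_i) and M_i = M / m_i.
    Each term depends on a single small residue (LUT-friendly), but the running
    sum must be reduced modulo M, which is where modulo-M adders come in.
    """
    M = prod(moduli)
    acc = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        inv = pow(Mi, -1, m)  # modular inverse of M_i modulo m_i (exists since moduli are coprime)
        acc = (acc + Mi * ((r * inv) % m)) % M  # one modulo-M addition per channel
    return acc


moduli = (15, 16, 17)  # hypothetical pairwise coprime moduli, M = 4080
print(rns_to_binary_crt(binary_to_rns(1924, moduli), moduli))  # 1924
```

The modulo-M additions operate on full-width words (12 bits here), so they cost more than any single residue channel, which is why reverse conversion dominates the overhead.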
Critical Path Results • Source: PSC8_0_PSC_0/I_Q7 (FF) • Destination: SACC24_REG2/I_Q3 (FF) • Data Path: PSC8_0_PSC_0/I_Q7 to SACC24_REG2/I_Q3