Digital Predistortion
Agenda
• Introduction
• Algorithm
  • Standard lookup table method
  • Phase related errors
  • Memory effect
• Implementation
  • Multipliers
  • Memory
  • CORDIC
  • Processors
Introduction: Purpose
• Technology demonstrator
• Show that digital predistortion (DPD) can be done efficiently on a PLD
• Provide a starting point for design
• Show efficient implementation of key components
• Provide an FPGA benchmark for customer designs
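Introduction: Predistortion
[Figure: transfer characteristics of an ideal PA, the predistorter, the real PA and the overall cascade]
• Ideal PA: $V_{RF} = k V_{in}$
• Predistorter (DPD): $V_d = f_{nl}^{-1}(V_{in})$
• Real PA: $V_{RF} = k f_{nl}(V_d)$
• DPD followed by the real PA gives an overall linear response: $V_{RF} = k f_{nl}(f_{nl}^{-1}(V_{in})) = k V_{in}$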
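A minimal numerical sketch of the cascade above, assuming a hypothetical memoryless PA with nonlinearity $f_{nl}(v) = \tanh(v)$ and a hypothetical gain $k = 10$ (neither comes from the slides). The predistorter applies the exact inverse, so the cascade reproduces the ideal response until the input reaches the PA's saturation point, where no inverse exists.

```python
import math

# Hypothetical PA model (assumption, not from the slides): V_RF = k * tanh(V_d).
K = 10.0  # hypothetical linear gain k


def real_pa(v_d: float) -> float:
    """Real PA: V_RF = k * f_nl(V_d) with a compressive f_nl = tanh."""
    return K * math.tanh(v_d)


def predistorter(v_in: float) -> float:
    """DPD: V_d = f_nl^{-1}(V_in). The inverse of tanh exists only for |V_in| < 1."""
    if abs(v_in) >= 1.0:
        raise ValueError("input beyond PA saturation; cannot be linearised")
    return math.atanh(v_in)


for v in (0.1, 0.5, 0.9):
    print(f"V_in={v:.1f}  PA alone={real_pa(v):6.3f}  "
          f"DPD+PA={real_pa(predistorter(v)):6.3f}  ideal={K * v:6.3f}")
```

In a real system $f_{nl}$ is not known in closed form and drifts over time, which is why the design adapts a lookup table instead of using an analytic inverse.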
Algorithms: Overview
• Adaptive Lookup Table (LUT)
  • Lookup table for phase and magnitude correction values
  • Deals with magnitude-dependent errors
• Volterra modelling of PA
  • Direct implementation
  • Indirect LUT implementation
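Algorithm: Distance Gradient Method
• Assumption: error depends only on magnitude
[Figure: block diagram. Forward path: I,Q in → $r = \sqrt{I^2 + Q^2}$, $\phi = \arctan(I/Q)$ → correction from LUT($\Delta r$ & $\Delta\phi$), addressed by $r$ → $I = r\sin\phi$, $Q = r\cos\phi$ → I & Q modulator → PA → I,Q out. Feedback path: I & Q demodulator → $r = \sqrt{I^2 + Q^2}$, $\phi = \arctan(I/Q)$ → negated (·(-1)) and combined with delayed copies of the input magnitude and phase to form the LUT error terms.]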
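The following sketch shows the magnitude-addressed LUT structure. The slides do not give the distance-gradient update rule, so a simple proportional (LMS-style) update with a hypothetical step size `MU` is assumed, and `toy_pa` is a hypothetical normalised PA with cubic AM/AM compression and quadratic AM/PM. Phase uses Python's `cmath.phase` (measured from the I axis) rather than the slide's $\arctan(I/Q)$; the choice of reference axis does not change the corrections. In hardware the delay blocks align the input with the feedback; in this loop they are aligned by construction.

```python
import cmath
import math

LUT_SIZE = 64   # 64-deep LUT, as quoted for the reference design
R_MAX = 1.0     # hypothetical full-scale input magnitude
MU = 0.5        # hypothetical adaptation step size (update rule is an assumption)


def wrap(angle: float) -> float:
    """Wrap an angle into [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi


class PolarLUT:
    """Magnitude-addressed LUT holding (delta_r, delta_phi) corrections."""

    def __init__(self, size: int = LUT_SIZE):
        self.size = size
        self.dr = [0.0] * size
        self.dphi = [0.0] * size

    def address(self, r: float) -> int:
        # Error assumed to depend only on magnitude, so r alone selects the entry.
        return min(int(r / R_MAX * self.size), self.size - 1)

    def predistort(self, x: complex) -> tuple[complex, int]:
        r, phi = abs(x), cmath.phase(x)
        a = self.address(r)
        return cmath.rect(r + self.dr[a], phi + self.dphi[a]), a

    def update(self, a: int, x_in: complex, y_fb: complex) -> None:
        # Error = (delayed) input minus normalised feedback, in polar form.
        self.dr[a] += MU * (abs(x_in) - abs(y_fb))
        self.dphi[a] += MU * wrap(cmath.phase(x_in) - cmath.phase(y_fb))


def toy_pa(v: complex) -> complex:
    """Hypothetical normalised PA: cubic AM/AM compression plus AM/PM rotation."""
    r = abs(v)
    return cmath.rect(r - 0.2 * r**3, cmath.phase(v) + 0.3 * r**2)


lut = PolarLUT()
x = cmath.rect(0.6, 0.4)
for _ in range(50):
    v_d, addr = lut.predistort(x)
    lut.update(addr, x, toy_pa(v_d))
print(abs(toy_pa(lut.predistort(x)[0]) - x))
```

After a few dozen iterations the entry addressed by $|x| = 0.6$ holds the magnitude and phase offsets that cancel the toy PA's distortion at that level; other entries are trained as the signal visits them.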
Algorithm: Distance Gradient Method
• MATLAB simulation results using the Saleh PA model
• EVM improved by around 90%
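For reference, a sketch of the Saleh AM/AM and AM/PM model together with an EVM calculation. The parameter values are the commonly quoted travelling-wave-tube values from Saleh's original paper; the slides do not say which values were used in the MATLAB simulations, so treat them as an assumption. `evm_percent` is a hypothetical helper returning RMS error relative to RMS reference magnitude.

```python
import cmath
import math

# Commonly quoted Saleh TWT parameters (assumption: the slides do not list theirs).
ALPHA_A, BETA_A = 2.1587, 1.1517   # AM/AM
ALPHA_P, BETA_P = 4.0033, 9.1040   # AM/PM (radians)


def saleh_pa(x: complex) -> complex:
    """Saleh model: A(r) = a_a r / (1 + b_a r^2), Phi(r) = a_p r^2 / (1 + b_p r^2)."""
    r = abs(x)
    amplitude = ALPHA_A * r / (1 + BETA_A * r**2)
    phase_shift = ALPHA_P * r**2 / (1 + BETA_P * r**2)
    return cmath.rect(amplitude, cmath.phase(x) + phase_shift)


def evm_percent(measured: list[complex], reference: list[complex]) -> float:
    """RMS error vector magnitude as a percentage of RMS reference magnitude."""
    err = sum(abs(m - r) ** 2 for m, r in zip(measured, reference))
    ref = sum(abs(r) ** 2 for r in reference)
    return 100.0 * math.sqrt(err / ref)


# QPSK symbols at a hypothetical drive level; output normalised by small-signal gain.
symbols = [cmath.rect(0.3, math.pi / 4 + k * math.pi / 2) for k in range(4)]
received = [saleh_pa(s) / ALPHA_A for s in symbols]
print(f"EVM without DPD: {evm_percent(received, symbols):.1f} %")
```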
Distance-Gradient LUT: Frequency Plots
• Predistortion improves ACLR by 70 dB
  • (700-900 and 1100-1300 taken as the side-band regions for the measurements)
• Simplified simulation environment (using the Saleh PA model)
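A sketch of how an ACLR figure can be computed from a power spectrum. It assumes that the numbers on the slide are spectrum bin indices, that the main channel occupies bins 900-1100 between the two side-band regions, and that ACLR is reported for the worse of the two side bands; `aclr_db` and `aclr_improvement_db` are hypothetical helpers, and the spectrum itself would come from an FFT of the simulated PA output.

```python
import math

MAIN = (900, 1100)                      # assumed main-channel bins [start, stop)
SIDEBANDS = [(700, 900), (1100, 1300)]  # side-band regions from the slide


def aclr_db(psd: list[float], main=MAIN, sidebands=SIDEBANDS) -> float:
    """Main-channel power over the stronger side-band power, in dB."""
    p_main = sum(psd[main[0]:main[1]])
    p_side = max(sum(psd[a:b]) for a, b in sidebands)
    return 10.0 * math.log10(p_main / p_side)


def aclr_improvement_db(psd_without: list[float], psd_with: list[float]) -> float:
    """Improvement from predistortion: ACLR with DPD minus ACLR without."""
    return aclr_db(psd_with) - aclr_db(psd_without)
```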
Algorithm: Phase related error
• Adds a dimension to the lookup table
• Increases memory; the logic stays the same size as before
• Increases the time taken for the LUT content to converge
[Figure: LUT content without and with phase error compensation]
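A sketch of what the extra dimension means for addressing and memory, assuming (hypothetically) that the new dimension is the input phase quantised into `PHASE_BINS` sectors alongside 64 magnitude bins and 32-bit entries. Memory grows by a factor of `PHASE_BINS` while the address logic only gains a second quantiser; each entry also sees only about 1/`PHASE_BINS` of the samples, which is one reason convergence takes longer.

```python
import cmath
import math

MAG_BINS = 64     # magnitude bins (64-deep LUT, as in the reference design)
PHASE_BINS = 16   # hypothetical number of phase sectors


def lut_address_2d(x: complex) -> int:
    """Flat address of a (phase, magnitude) LUT; |x| assumed normalised to [0, 1]."""
    r_idx = min(int(abs(x) * MAG_BINS), MAG_BINS - 1)
    phi = cmath.phase(x) % (2 * math.pi)            # map phase to [0, 2*pi)
    p_idx = min(int(phi / (2 * math.pi) * PHASE_BINS), PHASE_BINS - 1)
    return p_idx * MAG_BINS + r_idx


def lut_bits(word_bits: int = 32, phase_bins: int = 1) -> int:
    """Total LUT storage for MAG_BINS x phase_bins entries of word_bits each."""
    return MAG_BINS * phase_bins * word_bits


print(lut_bits(), lut_bits(phase_bins=PHASE_BINS))
```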
Algorithm: Memory effect
• Models the effect of short-term temperature increases in the silicon
• Three possible approaches
  • Add a "delta look-up table" (Intersil solution)
  • Add a new dimension to the LUT (fully adaptive)
  • Use an FIR filter in the magnitude address calculation
Algorithm: Memory effect
• Error mechanism:
  • Error depends on temperature
  • Temperature depends on previous magnitudes
• New PA model (Altera model):
  • Error depends on current and previous magnitudes
  • Error is independent of input phase
• Solution:
  • FIR filter in the address calculation
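A sketch of the FIR-based address calculation: the LUT address is derived from a weighted sum of the current and previous magnitudes, so the same instantaneous magnitude can select different entries depending on recent history (the proxy for temperature). The tap values and the class name `FirAddress` are hypothetical; the taps are chosen to sum to 1 so that a constant envelope eventually maps to the same address a memoryless LUT would use.

```python
from collections import deque


class FirAddress:
    """LUT address from FIR-filtered magnitudes (taps[0] weights the newest sample)."""

    def __init__(self, taps, lut_size: int = 64, r_max: float = 1.0):
        self.taps = list(taps)
        self.history = deque([0.0] * len(self.taps), maxlen=len(self.taps))
        self.lut_size = lut_size
        self.r_max = r_max

    def __call__(self, r: float) -> int:
        self.history.appendleft(r)  # the oldest magnitude falls off the other end
        filtered = sum(h * v for h, v in zip(self.taps, self.history))
        idx = int(filtered / self.r_max * self.lut_size)
        return min(max(idx, 0), self.lut_size - 1)


addr = FirAddress(taps=[0.5, 0.25, 0.25])  # hypothetical taps summing to 1
print([addr(0.5) for _ in range(4)])       # address settles as the history fills
```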
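Predistortion Reference Design: DSP Blocks
[Figure: reference design block diagram, with the functions mapped to DSP blocks highlighted. Blocks: I,Q in; magnitude $r = \sqrt{I^2 + Q^2}$ and phase $\phi = \arctan(I/Q)$ for both the input and the demodulated PA output; FIR filters in the magnitude address path; LUT($\Delta I$ & $\Delta Q$) with $\Delta I = \Delta r\sin(\Delta\phi)$, $\Delta Q = \Delta r\cos(\Delta\phi)$; ·(-1) and delay blocks forming the error terms; I & Q modulator; PA; I,Q out; I & Q demodulator; Sync; NCO; Processor + hardware acceleration.]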
DSP Block Architecture & Resources
[Figure: DSP block architecture with input registers, multipliers, optional pipelining registers, adder/subtractor/accumulator stages, output multiplexer and output registers]
• High-performance DSP operation
  • 18 × 18-bit functions at 282 MHz
  • Input, output & pipelining registers
  • Reduce overall logic usage
• Add/Accumulate/Subtract
  • Signed & unsigned operations
  • Dynamically change between add & subtract
• Support for complex multiplication
  • $(A_r + jA_i)(B_r + jB_i) = (A_r B_r - A_i B_i) + j(A_i B_r + A_r B_i)$
  • 4 multiplications, 1 addition & 1 subtraction
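A sketch of the complex multiplication mapped onto one DSP block: four real 18 × 18 products followed by one subtraction and one addition, as in the identity above. Operands are treated as signed 18-bit integers (an assumption about the number format) and products are kept at full precision; `complex_mult_18x18` is a hypothetical name.

```python
BITS = 18
MIN_18 = -(1 << (BITS - 1))      # -131072
MAX_18 = (1 << (BITS - 1)) - 1   #  131071


def _check(v: int) -> int:
    """Reject values that do not fit a signed 18-bit multiplier input."""
    if not MIN_18 <= v <= MAX_18:
        raise ValueError(f"{v} does not fit in a signed {BITS}-bit operand")
    return v


def complex_mult_18x18(ar: int, ai: int, br: int, bi: int) -> tuple[int, int]:
    """(Ar + jAi)(Br + jBi) = (ArBr - AiBi) + j(AiBr + ArBi)."""
    ar, ai, br, bi = (_check(v) for v in (ar, ai, br, bi))
    p_rr, p_ii = ar * br, ai * bi   # two multipliers feed the subtractor
    p_ir, p_ri = ai * br, ar * bi   # two multipliers feed the adder
    return p_rr - p_ii, p_ir + p_ri


print(complex_mult_18x18(3, 4, 5, 6))
```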
Predistortion Reference Design: DSP Blocks and RAM
[Figure: the same reference design block diagram, now highlighting the functions mapped to DSP blocks and RAM]
TriMatrix™ Memory
[Figure: memory block types arranged by capacity ("More bits for larger memory buffering") and bandwidth ("More data ports for greater memory bandwidth")]
• Today's applications need more high-performance memory
• One size does not fit all
• Wide choice of modes and widths
• On-chip blocks: M512, M4K and M-RAM
  • True dual-port RAM
  • Embedded shift register mode
  • Rate changing
  • Operate at up to 312 MHz (M-RAM: 512K bits, up to 300 MHz)
• External memory devices
  • DDR SDRAM & SRAM
  • SDR SDRAM
  • QDR & QDRII SRAM
  • ZBT SRAM
  • DDR FCRAM
Predistortion Reference Design: DSP Blocks, RAM and CORDIC
[Figure: the same reference design block diagram, now highlighting the functions mapped to DSP blocks, RAM and CORDIC]
CORDIC
• Hardware-efficient algorithm for computing functions such as:
  • Trigonometric
  • Hyperbolic
  • Logarithmic
• Iterative solution that uses only shifts and additions/subtractions
• High performance, as no multiplications or divisions are needed
• Simple hardware; fewer resources required
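A fixed-point sketch of the shift-and-add iteration, shown in vectoring mode (rotating a vector onto the x axis to obtain its magnitude and angle). Inside the loop each step uses only arithmetic right shifts, additions/subtractions and a constant arctangent table; the iteration count and fixed-point widths are hypothetical parameters. The magnitude comes out scaled by the CORDIC gain $K = \prod_i \sqrt{1 + 2^{-2i}} \approx 1.6468$, which a design either tolerates or removes with a single constant scaling.

```python
import math

N_ITER = 16   # hypothetical number of iterations (pipeline stages)
FRAC = 16     # hypothetical fractional bits of the angle accumulator

# Constant table of atan(2^-i) in fixed point (a small set of constants in hardware).
ATAN_TABLE = [round(math.atan(2.0 ** -i) * (1 << FRAC)) for i in range(N_ITER)]
# CORDIC gain accumulated over N_ITER micro-rotations (about 1.6468).
GAIN = math.prod(math.sqrt(1 + 2.0 ** (-2 * i)) for i in range(N_ITER))


def cordic_vectoring(x: int, y: int) -> tuple[int, int]:
    """Rotate integer vector (x, y), x > 0, onto the x axis.

    Returns (GAIN * magnitude, angle scaled by 2**FRAC). Only shifts and
    add/subtract operations are used inside the loop.
    """
    z = 0
    for i in range(N_ITER):
        if y >= 0:   # vector above the axis: rotate clockwise by atan(2^-i)
            x, y, z = x + (y >> i), y - (x >> i), z + ATAN_TABLE[i]
        else:        # vector below the axis: rotate anticlockwise
            x, y, z = x - (y >> i), y + (x >> i), z - ATAN_TABLE[i]
    return x, z


mag, ang = cordic_vectoring(30000, 40000)
print(mag / GAIN, ang / (1 << FRAC), math.atan2(40000, 30000))
```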
Altera CORDIC solution for DPD
[Figure: CORDIC block with inputs X_in, Y_in, Z_in and mode, and outputs X_out, Y_out, Z_out]
• Cartesian to polar conversion
  • X_in, Y_in = Cartesian values, Z_in = 0, mode = 0
  • X_out = magnitude, Z_out = phase
• Polar to Cartesian conversion
  • X_in = magnitude, Z_in = phase, Y_in = 0, mode = 1
  • X_out, Y_out = Cartesian values
• Mode selects the conversion direction
• Pipelined, so a new input can be applied on every clock cycle
  • After the initial latency, valid outputs appear on every clock cycle
• Time sharing: the CORDIC mode can be changed on every clock cycle
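A floating-point sketch of the two-mode interface described above, using the same X/Y/Z naming. Mode 0 (vectoring) drives Y to zero so that X_out is the magnitude and Z_out the phase; mode 1 (rotation) drives Z to zero so that (X_out, Y_out) are the Cartesian values. Gain compensation by a constant 1/K at the output, the iteration count, and the angle convention Z_out = atan2(Y_in, X_in) are assumptions, and the sketch covers only the basic ±90° range, without quadrant correction.

```python
import math

N_ITER = 24  # hypothetical iteration count
ANGLES = [math.atan(2.0 ** -i) for i in range(N_ITER)]
INV_GAIN = 1.0 / math.prod(math.sqrt(1 + 2.0 ** (-2 * i)) for i in range(N_ITER))


def cordic(x_in: float, y_in: float, z_in: float, mode: int):
    """mode 0: Cartesian -> polar (vectoring); mode 1: polar -> Cartesian (rotation).

    Valid for vectors/angles within roughly +/-90 degrees (no quadrant correction).
    """
    x, y, z = x_in, y_in, z_in
    for i, angle in enumerate(ANGLES):
        if mode == 0:
            d = -1.0 if y >= 0 else 1.0   # rotate towards y = 0
        else:
            d = 1.0 if z >= 0 else -1.0   # rotate until z = 0
        x, y, z = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i, z - d * angle
    return x * INV_GAIN, y * INV_GAIN, z   # constant gain compensation


mag, _, phase = cordic(3.0, 4.0, 0.0, mode=0)      # Cartesian -> polar
i_out, q_out, _ = cordic(mag, 0.0, phase, mode=1)  # polar -> Cartesian
print(mag, phase, i_out, q_out)
```

Because the mode is just another input, a pipelined hardware version can interleave both conversions on successive clock cycles, which is what the time-sharing bullet describes.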
CORDIC Architecture
[Figure: quadrant detect & input modify → iterations 1 to n (add/sub & shift, register) → quadrant adjust]
• Parallel architecture enabling high performance
• The CORDIC algorithm can only handle vector rotations of -90 to +90 degrees
  • Additional logic (quadrant blocks) is required to handle vectors in any of the four quadrants
• Parameterisable code
  • Input vector widths and number of iterations can be changed
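A sketch of the quadrant detect / input modify and quadrant adjust stages around a basic vectoring core. Vectors in the left half-plane are pre-rotated by ±90°, which needs only a swap and a negation, and the 90° is added back to the output angle afterwards. The core, the function names and the iteration count are hypothetical; other pre-rotation schemes, such as negating both coordinates (a 180° rotation), also work.

```python
import math

N_ITER = 24  # hypothetical iteration count
ANGLES = [math.atan(2.0 ** -i) for i in range(N_ITER)]
INV_GAIN = 1.0 / math.prod(math.sqrt(1 + 2.0 ** (-2 * i)) for i in range(N_ITER))


def core_vectoring(x: float, y: float) -> tuple[float, float]:
    """Basic CORDIC vectoring; only valid for x >= 0 (angles within +/-90 deg)."""
    z = 0.0
    for i, angle in enumerate(ANGLES):
        d = -1.0 if y >= 0 else 1.0
        x, y, z = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i, z - d * angle
    return x * INV_GAIN, z


def to_polar_any_quadrant(x: float, y: float) -> tuple[float, float]:
    """Quadrant detect & input modify, core iterations, quadrant adjust."""
    if x >= 0:                       # quadrants I and IV: core handles directly
        return core_vectoring(x, y)
    if y >= 0:
        # Quadrant II: rotate by -90 deg, (x, y) -> (y, -x); add 90 deg afterwards.
        r, z = core_vectoring(y, -x)
        return r, z + math.pi / 2
    # Quadrant III: rotate by +90 deg, (x, y) -> (-y, x); subtract 90 deg afterwards.
    r, z = core_vectoring(-y, x)
    return r, z - math.pi / 2


for point in [(3.0, 4.0), (-3.0, 4.0), (-3.0, -4.0), (3.0, -4.0)]:
    print(point, to_polar_any_quadrant(*point))
```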
CORDIC Implementation
• Logic elements (LEs) in Altera PLDs
  • Each LE is well suited to implementing the required adders/subtractors
  • LEs can switch dynamically between operating as an adder and as a subtractor
  • Each LE contains a register
• Performance
Implementation: Processor?
• Should we use a processor?
• For
  • Flexibility
  • Easy to add custom interpolation or similar functions
  • Low data rate in the feedback path at baseband
• Against
  • Straightforward data path (few "IF" branches)
  • Too slow at IF (intermediate frequency)
  • No clear size advantage
  • Difficult to exploit the deeply pipelined CORDIC
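As an example of the kind of custom function that is easy to add in processor software, a sketch of linear interpolation between adjacent LUT entries. It assumes, hypothetically, that entry k holds the correction at magnitude $k \cdot r_{max}/(N-1)$ for an N-entry table; `interp_lut` is an illustrative name, not part of the reference design.

```python
def interp_lut(table: list[float], r: float, r_max: float = 1.0) -> float:
    """Linearly interpolate a correction from a magnitude-indexed LUT.

    Entry k is assumed to hold the correction at magnitude k * r_max / (N - 1).
    """
    if len(table) < 2:
        raise ValueError("need at least two LUT entries")
    if not 0.0 <= r <= r_max:
        raise ValueError("magnitude outside LUT range")
    pos = r / r_max * (len(table) - 1)
    k = min(int(pos), len(table) - 2)   # lower neighbour; clamp at the top entry
    frac = pos - k
    return table[k] + frac * (table[k + 1] - table[k])


print(interp_lut([0.0, 10.0, 20.0, 30.0], 0.5))
```

Interpolation lets a smaller table approximate a smooth correction curve at the cost of one extra multiply-add per lookup.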
Excalibur™ Embedded Processor Cores
• Hard core advantages
  • High-performance ARM922T
  • Time-to-market
  • Lots of on-chip memory
  • Leverage a large existing code base
• Soft core advantages
  • Flexibility
  • Low cost
  • Portable design
  • Scalability
  • Obsolescence proof
  • Fits a broad range of Altera PLD families
[Figure: performance (Dhrystone MIPS 2.1) of soft core vs hard core]
Target devices
• Stratix
  • Contains DSP blocks
  • TriMatrix RAM allows large lookup tables (multiple dimensions)
  • Suitable if up/down converters are also integrated
• Cyclone
  • Extensive use of CORDIC
  • Lowest cost
Reference Design Resource Utilisation Estimates
• 5000 LEs (50% of those available in a 1S10)
• 4 DSP blocks (67% of those available in a 1S10)
• 3 M4K RAM blocks (5% of those available in a 1S10)
• 2 M512 RAM blocks (2% of those available in a 1S10)
• Assumes 18-bit wide I/Q and a 64-deep × 32-bit-wide LUT
• The reference design contains only the adaptive lookup table algorithm
Summary
• Reference design based on lookup tables on a Stratix 1S10
• Works for memoryless PAs
• Compensates for the memory effect
  • Assumption: errors are independent of phase
• Can be tuned and modified
• Open source: extract the key components, leave the rest