This paper discusses the implementation of high throughput LDPC decoders using a multiple split-row method. The proposed method reduces interconnect complexity and processor complexity while increasing throughput. It is well-suited for long-length LDPC codes and hardware implementations.
High Throughput LDPC Decoders Using a Multiple Split-Row Method. Tinoosh Mohsenin and Bevan M. Baas, VLSI Computation Lab, ECE Department, University of California, Davis
Outline • Introduction to LDPC Codes and Decoders • Multi-Split-Row Decoding Method • Implementing Multi-Split-Row Decoders • Conclusion
Error Correction in Communication Systems • Error correction is widely used in communication systems
LDPC Codes Applications • Standards • Digital Video Broadcasting (DVB-S2): 2005 • 10 Gigabit Ethernet (10GBASE-T): 2006 • Next generation of WiMAX • Challenges with LDPC decoders • High memory bandwidth requirement • High interconnect complexity • Many target applications are power and cost constrained
LDPC Decoding: Message Passing Algorithm • Performs row and column operations iteratively • Row processing produces the α messages; column processing produces the β messages • Example (9,5) LDPC Code • Code length (N) = 9 • Information length = 5 • Row weight (Wr) = 3 • Column weight (Wc) = 2 • Row processing operates on the rows of the parity-check matrix H, column processing on its columns:
H =
0 0 1 1 0 0 0 1 0
1 0 0 0 1 0 0 0 1
0 1 0 0 0 1 1 0 0
0 0 1 0 1 0 1 0 0
1 0 0 0 0 1 0 1 0
0 1 0 1 0 0 0 0 1
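The weights quoted on the slide can be checked directly against H. The following is a minimal sketch in plain Python; the helper name `syndrome` and the sample word are illustrative assumptions, not part of the slides.

```python
# Parity-check matrix H from the slide: 6 rows (checks) by 9 columns (bits).
H = [
    [0, 0, 1, 1, 0, 0, 0, 1, 0],
    [1, 0, 0, 0, 1, 0, 0, 0, 1],
    [0, 1, 0, 0, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 1, 0, 1, 0, 0],
    [1, 0, 0, 0, 0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0, 0, 0, 0, 1],
]

row_weights = [sum(row) for row in H]        # ones per row (Wr)
col_weights = [sum(col) for col in zip(*H)]  # ones per column (Wc)
num_edges = sum(row_weights)                 # one alpha and one beta per nonzero entry


def syndrome(H, word):
    """Hypothetical helper: parity of each check; all zeros means `word` is a codeword."""
    return [sum(h * x for h, x in zip(row, word)) % 2 for row in H]


# Illustrative word with ones in bit positions 1, 3, 5 and 8 (1-indexed).
sample_word = [1, 0, 1, 0, 1, 0, 0, 1, 0]
print(row_weights, col_weights, num_edges, syndrome(H, sample_word))
```

Each nonzero entry of H corresponds to one α message and one β message per iteration, which is why the number of ones in H drives the interconnect cost discussed later in the deck.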
Message Passing (Row Processing) • Row processing is applied to each row of the parity-check matrix H • Two row-update variants: SPA and MinSum [Slide equations for the SPA and MinSum row updates are not reproduced here]
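For reference, the row updates are commonly written as follows in the LDPC literature. This is a sketch of the standard textbook forms, not a transcription of the slide: the index set V(i) (columns with a 1 in row i), the function φ, and the subscripts are notational assumptions.

```latex
% Sum Product Algorithm (SPA) row update
\alpha_{ij} = \prod_{j' \in V(i)\setminus j} \operatorname{sign}(\beta_{ij'})
              \cdot \phi\!\left( \sum_{j' \in V(i)\setminus j} \phi\big(|\beta_{ij'}|\big) \right),
\qquad \phi(x) = -\ln \tanh\!\left(\frac{x}{2}\right)

% MinSum row update: the phi-sum is approximated by the smallest input magnitude
\alpha_{ij} = \prod_{j' \in V(i)\setminus j} \operatorname{sign}(\beta_{ij'})
              \cdot \min_{j' \in V(i)\setminus j} |\beta_{ij'}|
```

In both forms the output has a sign part (the product of the other inputs' signs) and a magnitude part; MinSum changes only the magnitude.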
Message Passing (Column Processing) • Column processing is applied to each column of the parity-check matrix H • The column update combines the received information from the channel with the incoming row-processor messages [Slide equation for the column update is not reproduced here]
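One full iteration (row processing followed by column processing) can be sketched as below. This is a minimal flooding-schedule MinSum decoder under stated assumptions: the column update is taken to be the standard form β_ij = λ_j + Σ α_i'j over the other rows i' of column j, the name λ for the channel value is assumed, positive values are taken to favor bit 0, and the function name `minsum_decode` is hypothetical.

```python
# Parity-check matrix H from the (9,5) example slide.
H = [
    [0, 0, 1, 1, 0, 0, 0, 1, 0],
    [1, 0, 0, 0, 1, 0, 0, 0, 1],
    [0, 1, 0, 0, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 1, 0, 1, 0, 0],
    [1, 0, 0, 0, 0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0, 0, 0, 0, 1],
]


def minsum_decode(H, llr, max_iters=15):
    """MinSum decoding sketch; llr[j] >= 0 is assumed to mean bit j is more likely 0."""
    m, n = len(H), len(H[0])
    rows = [[j for j in range(n) if H[i][j]] for i in range(m)]  # columns used by row i
    cols = [[i for i in range(m) if H[i][j]] for j in range(n)]  # rows used by column j
    beta = {(i, j): llr[j] for i in range(m) for j in rows[i]}   # initialise with channel values
    hard = [0 if x >= 0 else 1 for x in llr]
    for _ in range(max_iters):
        # Row processing: sign product and minimum magnitude of the other inputs.
        alpha = {}
        for i in range(m):
            for j in rows[i]:
                others = [beta[(i, k)] for k in rows[i] if k != j]
                sign = -1.0 if sum(b < 0 for b in others) % 2 else 1.0
                alpha[(i, j)] = sign * min(abs(b) for b in others)
        # Column processing: channel value plus all alphas, minus the one being replied to.
        total = [llr[j] + sum(alpha[(i, j)] for i in cols[j]) for j in range(n)]
        for j in range(n):
            for i in cols[j]:
                beta[(i, j)] = total[j] - alpha[(i, j)]
        # Hard decision and early stop when every parity check is satisfied.
        hard = [0 if t >= 0 else 1 for t in total]
        if all(sum(hard[j] for j in rows[i]) % 2 == 0 for i in range(m)):
            break
    return hard


# All-zero codeword with one weakly wrong channel value on bit 0.
received = [-0.5, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]
decoded = minsum_decode(H, received)
```

The default of 15 iterations mirrors the iteration count quoted in the simulation slides; the early stop is a common convenience and not something the slides specify.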
Decoder Architectures • Serial decoders • Single row processor, column processor, shared memory • Simple and small area • Disadvantages • Low throughput: 100 Kbps - 10 Mbps • Semi-parallel decoders • Multiple row and column processors, multiple memory banks • Higher throughput • Example: 2048-bit, rate-1/2, (3,6) programmable decoder [Mansour 2006] • 14.3 mm², 0.18 μm CMOS • 125 MHz, 640 Mbps
Full Parallel Decoders • Row and column processors are directly mapped according to the parity check matrix • Highest throughput • Major challenges • Routing congestion due to extrinsic information passed between row and column processors • Large delay, area, and power caused by long wires • Example: 1024-bit, irregular code, 4 bits per symbol, [Blanksby 2002] • 52.5 mm², 0.16 μm CMOS • 64 MHz, 1 Gbit/sec [Figure: M = 384 row processors (Row 1 to Row 384) connected to N = 2048 column processors (Col 1 to Col 2048); 5 × 384 × 32 = 61440 wires on the row side and 5 × 2048 × 6 = 61440 wires on the column side]
Outline • Introduction to LDPC Codes • Split-Row Decoder Algorithm • Multi-Split-Row Decoding Method • Implementing Multi-Split-Row Decoders • Conclusion
Goals • Very high throughputs • Area efficient (small circuit area) • Therefore more energy efficient • Well suited for long-length LDPC codes • Well suited for hardware implementations
The Multi-Split-Row Decoder • Key ideas • H matrix is split into multiple blocks • Each block is processed almost independently • Minimal information is shared between blocks • Results • Lower interconnect complexity • Reduced processor complexity • Hardware results • Higher throughput • Smaller decoder area and higher area utilization • Slightly increased error rate
Standard vs. Multi-Split-Row Decoder [Figure: side-by-side comparison; panels: Standard, Multi-Split-Row]
Multi-Split-Row Algorithm • The magnitude portion of the row processor output α is larger for the Multi-Split-Row decoder than for the standard decoder • Normalizing the α values with a scale factor S < 1 improves the error performance of the Multi-Split-Row decoder [Figure: row-update equation annotated with its sign part, magnitude part, and scale factor S]
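The effect described above can be illustrated with a small sketch. It assumes a MinSum row update in which only the sign product is shared across partitions (as the deck's mention of sign-passing wires suggests), while each partition takes the minimum over its own inputs only. The function names, the equal-width partitioning, and the sample values are assumptions for illustration.

```python
def sign_of_others(betas, j):
    """Product of the signs of every input in the row except input j."""
    negatives = sum(1 for k, b in enumerate(betas) if k != j and b < 0)
    return -1.0 if negatives % 2 else 1.0


def minsum_row(betas):
    """Standard MinSum row update: minimum taken over the whole row."""
    return [
        sign_of_others(betas, j) * min(abs(b) for k, b in enumerate(betas) if k != j)
        for j in range(len(betas))
    ]


def split_row_minsum(betas, num_partitions, scale):
    """Multi-Split-Row sketch: full-row sign, partition-local minimum, scaled by S."""
    if len(betas) % num_partitions or len(betas) // num_partitions < 2:
        raise ValueError("row must split into equal partitions of width >= 2")
    width = len(betas) // num_partitions
    out = []
    for j in range(len(betas)):
        start = (j // width) * width  # first index of the partition holding input j
        local = [abs(betas[k]) for k in range(start, start + width) if k != j]
        out.append(sign_of_others(betas, j) * scale * min(local))
    return out


betas = [0.5, -3.0, 2.0, 4.0, -1.5, 6.0, 2.5, 3.5]
standard = minsum_row(betas)
split_unscaled = split_row_minsum(betas, num_partitions=2, scale=1.0)
split_scaled = split_row_minsum(betas, num_partitions=2, scale=0.5)
```

Because a minimum over a subset can never be smaller than the minimum over the whole row, every unscaled split magnitude is at least as large as the standard one, which is the overestimate that the scale factor S < 1 is meant to correct.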
Optimum Scale Factor • (2048,1723) RS-based LDPC code used by 10 Gbit Ethernet standard • Row weight: 32 • Column weight: 6 • No. of iterations: 15 [Figure: bit error probability versus scale factor; panels: Multi-Split-2 (Scale Factor = 0.3) and Multi-Split-4 (Scale Factor = 0.2)]
Bit Error Rate Performance Comparison • Code length: 2048 bits • Message length: 1723 bits • Row weight: 32 • Column weight: 6 • No. of iterations: 15 • Algorithms compared: SPA: Sum Product Algorithm [Mackay 1999]; MinSum [Fossorier 2002]; WBF: Weighted Bit Flipping [Kou, Lin 2001]; Improved WBF [Fossorier 2004]; BF: Bit Flipping [Gallager 1963] [Figure: bit error rate curves with annotated gaps of 0.35 dB and 0.25 dB]
Bit Error Rate Performance Comparison • Code length: 5256 bits • Message length: 4823 bits • Row weight: 72 • Column weight: 6 • No. of iterations: 15 [Figure: bit error rate curves with annotated gaps of 0.25 dB and 0.3 dB]
Optimum Scale Factors for Different Codes • Multi-Split-Row works best for: • Regular codes • High row-weight codes • The optimum scale factor decreases as the number of partitions of the H matrix increases
Outline • Introduction to LDPC Codes and Decoder Architectures • Multi-Split-Row Decoding Method • Implementing Multi-Split-Row Decoders • Conclusion
Full-Parallel Decoder Implementations • (2048,1723) RS-based (6,32) LDPC code [Figure: three implementations; panels: Standard, Multi-Split-Row-2, Multi-Split-Row-4]
A Full-Parallel Decoder Implementation • Number of sign-passing wires is negligible compared to the total number of wires • Total number of wires = 2·b·M·Wr + 2·(Spn − 1)·M • (2048,1723) LDPC code with • N = 2048 • M (number of rows) = 384 • b (bits per symbol) = 5 • Wr = 32
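Plugging the slide's parameters into the wire-count formula shows how small the sign-passing term is. The sketch below assumes that Spn denotes the number of partitions (Spn = 1 for the standard decoder), which the slide does not state explicitly; the function name is hypothetical.

```python
def wire_counts(b, M, Wr, Spn):
    """Return (message wires, sign-passing wires) from the slide's formula.

    Assumes Spn is the number of partitions the H matrix is split into.
    """
    message_wires = 2 * b * M * Wr     # messages in both directions between processors
    sign_wires = 2 * (Spn - 1) * M     # sign-passing wires between partitions
    return message_wires, sign_wires


# (2048,1723) code parameters from the slide: b = 5, M = 384, Wr = 32.
for spn in (1, 2, 4):
    message_wires, sign_wires = wire_counts(b=5, M=384, Wr=32, Spn=spn)
    share = sign_wires / (message_wires + sign_wires)
    print(f"Spn={spn}: message={message_wires}, sign={sign_wires}, sign share={share:.2%}")
```

Under this reading, the sign-passing wires stay below 2% of the total even with four partitions, consistent with the slide's claim that they are negligible.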
Full Parallel Decoder Chips • 0.18 µm CMOS technology, 6 metal layers
Three Full Parallel MinSum Decoders • (6,32) (2048,1723) RS-based LDPC code • Resolution of 5 bits per message • Throughputs calculated at 15 decoding iterations • Results based on 0.18 µm CMOS, 1.8 V @ 85 C
Conclusion • The Multi-Split-Row decoding method provides a significant reduction in circuit area • Results in: • Reduced wire interconnect complexity • Increased circuit area utilization • Increased speed • Simpler implementation • A good tradeoff between hardware complexity and error performance
Acknowledgments • Support • Intel Corporation • UC MICRO • NSF Grant No. 0430090 • NSF CAREER Award No. 0546907 • UCD Faculty Research Grant • Thanks • Prof. Shu Lin • Lan Lan • Eric Work • Zhiyi Yu