A Scalable Architecture for LDPC Decoding. Cocco, M.; Dielissen, J.; Heijligers, M.; Hekstra, A.; Huisken, J. Design, Automation and Test in Europe Conference and Exhibition, 2004. Proceedings, Volume 3, Feb. 16-20, 2004, pp. 88-93.
Outline • Introduction • Serial approach • UMP algorithm • Dataset in check nodes • Check operation • Computation skill • Memory reduction • Computation for Iteration
Introduction • High code rate (R = 0.9) LDPC code • Row weight K (average 30) • High code rate, long codeword length and high SNR • Memory reduced to about 1/10
Serial Approach • Storage media applications (optical or magnetic) • Relaxed delay requirement • Process from the first bit node to the last bit node • Memory is needed to store the messages
UMP Algorithm • "FOR 40 ITERATIONS DO" • "FOR ALL BIT NODES DO" • "FOR EACH INCOMING ARC X" • "SUM ALL INCOMING LLRs EXCEPT OVER X" • "SEND THE RESULT BACK OVER X" • "NEXT ARC" • "NEXT BIT NODE" • "FOR ALL CHECK NODES DO" • "FOR EACH INCOMING ARC X" • "TAKE THE ABS MINIMUM OF THE INCOMING LLRs EXCEPT OVER X" • "TAKE THE XOR OF THE INCOMING LLRs EXCEPT OVER X" • "SEND THE RESULT BACK OVER X" • "NEXT ARC" • "NEXT CHECK NODE" • "NEXT ITERATION"
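The loop structure above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the function name `ump_decode`, the `checks` adjacency list and the convention that a positive LLR means bit 0 are assumptions made for the example, and the "XOR of the incoming LLRs" is taken to mean the XOR of their sign (hard) bits. The channel LLR is treated as one of the incoming LLRs of a bit node, and every check node is assumed to have at least two connected bit nodes.

```python
def ump_decode(checks, channel_llrs, iterations=40):
    """Min-sum (UMP) decoding sketch.

    checks[m] lists the bit nodes connected to check node m (hypothetical layout).
    Returns the hard decision per bit (0 or 1), assuming LLR > 0 means bit 0.
    """
    # One check-to-bit message per arc (m, n), initially zero.
    c2b = {(m, n): 0.0 for m, bits in enumerate(checks) for n in bits}
    b2c = {}
    for _ in range(iterations):
        # FOR ALL BIT NODES: sum all incoming LLRs except over arc X.
        totals = list(channel_llrs)
        for (m, n), msg in c2b.items():
            totals[n] += msg
        for (m, n), msg in c2b.items():
            b2c[(m, n)] = totals[n] - msg
        # FOR ALL CHECK NODES: abs minimum and XOR of signs except over arc X.
        for m, bits in enumerate(checks):
            for n in bits:
                others = [b2c[(m, k)] for k in bits if k != n]
                mag = min(abs(v) for v in others)
                sign = sum(v < 0 for v in others) % 2
                c2b[(m, n)] = -mag if sign else mag
    # Final hard decision from the full sum at each bit node.
    totals = list(channel_llrs)
    for (m, n), msg in c2b.items():
        totals[n] += msg
    return [int(t < 0) for t in totals]


# Toy example: parity checks of a (7,4) Hamming code, all-zero codeword sent,
# bit 2 received with the wrong sign.
hamming_checks = [[0, 1, 2, 4], [0, 1, 3, 5], [0, 2, 3, 6]]
received = [2.0, 1.5, -0.5, 2.5, 1.0, 3.0, 2.0]
decoded = ump_decode(hamming_checks, received)
```

The toy code is far smaller than the high-rate codes targeted in the slides; it only serves to exercise the two nested loops.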
UMP algorithm • No knowledge of the channel SNR is needed → robust performance • No complex mathematical function (tanh x) is needed → area saving
Dataset in check nodes (figure example: Check Node 4) • Minimum: overall minimum value • One-but-minimum: second-smallest value • Index: position of the input that supplied the minimum
Check operation • Compute the exclusive OR of all hard bits output by the connected bit nodes, except the jth. • Compute the minimum of the absolute values of the LLRs from the K bit nodes to which the check node is connected, except the jth.
Computation skill • Minimum for output j: if LLRj is not the overall minimum, use the overall minimum; otherwise use the one-but-minimum
Memory reduction • Original size • Reduced size
Computation for Iteration • "FOR 40 ITERATIONS DO" • "FOR ALL BIT NODES DO" • "CALCULATE THE OUTPUT MESSAGES FROM THE 3 CONNECTED CHECK NODES" • "DO RUNNING CHECK NODE UPDATES ON THE 3 CHECK NODES" • "NEXT BIT NODE" • "NEXT ITERATION"
Computation for Iteration • Figure: check-node datasets, each shown with a NEW | OLD pair
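A serial schedule with running check node updates might look like the following Python sketch. It is an illustration under stated assumptions, not the architecture from the paper: the name `serial_ump_decode`, the `bit_checks` adjacency list, the list-based dataset layout and the per-arc `sent_sign` table (the hard bit sent over each arc in the previous iteration) are all hypothetical. Output messages are read from the OLD datasets while the NEW datasets are built up one bit node at a time. Bit nodes may have any number of connected check nodes here, whereas the slides assume 3.

```python
INF = float("inf")


def serial_ump_decode(bit_checks, channel_llrs, iterations=40):
    """bit_checks[n] lists the check nodes connected to bit node n."""
    n_checks = 1 + max(m for cs in bit_checks for m in cs)
    # Dataset layout: [minimum, one-but-minimum, index of minimum, parity].
    # OLD datasets start at zero, so the first check messages are all zero.
    old = [[0.0, 0.0, -1, 0] for _ in range(n_checks)]
    # Assumption: the hard bit last sent over each arc is kept per arc.
    sent_sign = {(n, m): 0 for n, cs in enumerate(bit_checks) for m in cs}
    hard = [0] * len(channel_llrs)
    for _ in range(iterations):
        new = [[INF, INF, -1, 0] for _ in range(n_checks)]
        for n, cs in enumerate(bit_checks):
            # Calculate the output messages of the connected check nodes (OLD).
            incoming = {}
            for m in cs:
                min1, min2, index, parity = old[m]
                mag = min2 if index == n else min1
                sign = parity ^ sent_sign[(n, m)]
                incoming[m] = -mag if sign else mag
            total = channel_llrs[n] + sum(incoming.values())
            hard[n] = int(total < 0)
            # Running check node updates on the same check nodes (NEW).
            for m in cs:
                out = total - incoming[m]
                sent_sign[(n, m)] = int(out < 0)
                d = new[m]
                d[3] ^= sent_sign[(n, m)]
                mag = abs(out)
                if mag < d[0]:
                    d[0], d[1], d[2] = mag, d[0], n
                elif mag < d[1]:
                    d[1] = mag
        old = new   # NEW becomes OLD for the next iteration
    return hard


# Toy example: (7,4) Hamming parity checks listed per bit node,
# all-zero codeword sent, bit 2 received with the wrong sign.
hamming_bit_checks = [[0, 1, 2], [0, 1], [0, 2], [1, 2], [0], [1], [2]]
received = [2.0, 1.5, -0.5, 2.5, 1.0, 3.0, 2.0]
decoded = serial_ump_decode(hamming_bit_checks, received)
```

In this sketch the first iteration sees only zero check messages, so its hard decisions equal the signs of the channel LLRs; the correction of bit 2 appears from the second iteration onward.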
Time-folded architecture • Block diagram: control (FSM & PC, μROM) driving R/W & address • Computational kernel with serial input and serial output • Prefetcher • Memory
Prefetch • Every dataset is statically used for 30 consecutive cycles. • On average, 2 read and 2 write operations are required every clock cycle. • Delayed writeback • Dataset caching
Tiled architecture • Block diagram: each tile contains FSM & PC, μROM, computational kernel, prefetcher and memory
Result and area distribution • N = 1020, R = 0.5 • 57 tiles, 36 mm² in 0.13 μm at 1 GHz • 300 Mb/s
Conclusion • Speedup and simultaneous multiple accesses → prefetch • Reduced memory access latency → memory hierarchy • Increased performance → N-tiled architecture • A modified version can be pipelined