Accelerating Belief Propagation in Hardware
Skand Hurkat and José Martínez
Computer Systems Laboratory, Cornell University
http://www.csl.cornell.edu/
The Cornell Team
• Prof. José Martínez (PI), Prof. Rajit Manohar @ Computer Systems Lab
• Prof. Tsuhan Chen @ Advanced Multimedia Processing Lab
• MS/Ph.D. students
  • Yuan Tian, MS ’13
  • Skand Hurkat
  • Xiaodong Wang
The Cornell Project
• Provide hardware accelerators for belief propagation algorithms on embedded SoCs (retail/car/home/mobile)
  • High speed
  • Very low power
  • Self-optimizing
  • Highly programmable
[Figure labels: Inference Algorithm; Graph; BP Accelerator within SoC; Result]
What is belief propagation?
Belief propagation is a message-passing algorithm for performing inference on graphical models, such as Bayesian networks or Markov random fields.
What is belief propagation?
• Labelling problem
  • Energy as a measure of convergence
  • Minimize energy (MAP label estimation)
• Exact results for trees
  • Converges in exactly two iterations (one pass from the leaves to the root and one pass back)
• Approximate results for graphs with loops
  • Yields “good” results in practice
  • Minimum over large neighbourhoods
  • Close to optimal solution
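The tree case can be made concrete with a minimal sketch of min-sum belief propagation on a chain, the simplest tree. This is an illustration under stated assumptions, not the Cornell library: the function names `min_sum_chain` and `chain_energy`, the toy costs, and the single shared pairwise table are all hypothetical. One left-to-right pass and one right-to-left pass are enough for each node's belief to give the MAP label, provided the optimum is unique.

```python
import itertools

def chain_energy(labels, unary, pairwise):
    """Energy of a labelling: per-node costs plus costs between neighbours."""
    e = sum(unary[i][a] for i, a in enumerate(labels))
    e += sum(pairwise[a][b] for a, b in zip(labels, labels[1:]))
    return e

def min_sum_chain(unary, pairwise):
    """MAP labels of a chain MRF via two message passes (min-sum BP).

    unary[i][a]    : cost of giving node i the label a
    pairwise[a][b] : cost of neighbouring labels (a, b), assumed symmetric
    """
    n, k = len(unary), len(unary[0])
    # Pass 1: messages flow left to right.
    fwd = [[0.0] * k for _ in range(n)]
    for i in range(1, n):
        for b in range(k):
            fwd[i][b] = min(unary[i - 1][a] + fwd[i - 1][a] + pairwise[a][b]
                            for a in range(k))
    # Pass 2: messages flow right to left.
    bwd = [[0.0] * k for _ in range(n)]
    for i in range(n - 2, -1, -1):
        for a in range(k):
            bwd[i][a] = min(unary[i + 1][b] + bwd[i + 1][b] + pairwise[a][b]
                            for b in range(k))
    # Belief = own cost + both incoming messages; pick the cheapest label.
    return [min(range(k), key=lambda a: unary[i][a] + fwd[i][a] + bwd[i][a])
            for i in range(n)]

# Toy problem (made-up numbers): 3 nodes, 2 labels, Potts-style penalty of 2.
unary = [[0, 3], [2, 1], [3, 0]]
pairwise = [[0, 2], [2, 0]]
labels = min_sum_chain(unary, pairwise)

# Exhaustive search over all 2^3 labellings, for comparison.
brute = min(itertools.product(range(2), repeat=3),
            key=lambda x: chain_energy(x, unary, pairwise))
print(labels, list(brute), chain_energy(labels, unary, pairwise))
```

On graphs with loops the same update is simply iterated, and the result is an approximation rather than a guaranteed optimum.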
Not all “that” alien to embedded
Remember the Viterbi algorithm?
• Used extensively in digital communications
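For readers who want the connection spelled out: the Viterbi algorithm is max-product belief propagation on the chain formed by a hidden Markov model. The sketch below is a plain software illustration, not a model of any hardware decoder. The function name `viterbi`, the two-state model, and its probabilities are assumptions chosen for the example, and all probabilities are assumed to be non-zero so that logarithms are defined.

```python
import math

def viterbi(obs, start_p, trans_p, emit_p):
    """Most likely hidden-state path of an HMM, computed with log-probabilities."""
    n_states = len(start_p)
    # score[s] = best log-probability of any path that ends in state s.
    score = [math.log(start_p[s]) + math.log(emit_p[s][obs[0]])
             for s in range(n_states)]
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for o in obs[1:]:
        prev, score, ptr = score, [], []
        for s in range(n_states):
            best = max(range(n_states),
                       key=lambda r: prev[r] + math.log(trans_p[r][s]))
            ptr.append(best)
            score.append(prev[best] + math.log(trans_p[best][s])
                         + math.log(emit_p[s][o]))
        back.append(ptr)
    # Trace the best path backwards from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path

# Hypothetical two-state model with three observation symbols.
start_p = [0.6, 0.4]
trans_p = [[0.7, 0.3], [0.4, 0.6]]
emit_p = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
path = viterbi([0, 1, 2], start_p, trans_p, emit_p)
print(path)
```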
What does this mean?
• Every mobile device uses Viterbi decoders
  • Error correction codes (e.g. turbo codes)
  • Mitigating inter-symbol interference (ISI)
• An increasing number of mobile applications involve belief propagation
• More general belief propagation accelerators can greatly improve the user experience on mobile devices
Target markets
Retail/Car/Home/Mobile
• Image processing
  • De-noising
  • Segmentation
  • Object detection
• Gesture recognition
• Handwriting recognition
  • Improved recognition through context identification
• Speech recognition
  • Hidden Markov models are key to speech recognition
Servers
• Data mining tasks
  • Part-of-speech tagging
  • Information retrieval
  • “Knowledge graph”-like applications
• Machine learning based tasks
  • Constructive machine learning
  • Recommendation systems
• Scientific computing
  • Protein structure inference
Hardware accelerator for BP
[Figure labels: Inference Algorithm; Graph; BP Accelerator within SoC; Result]
Work done so far
Software
• General-purpose MRF inference library
  • Support for arbitrary graphs
  • Floating point math
  • Parallel techniques for faster inference
• Library optimized for grid graphs
  • Optimized data structures
  • Template can use any data type
  • Multiple inference techniques optimized for early vision
  • Stereo matching in 200 ms
Hardware
• High-level synthesis of message update unit
  • Vivado HLS (C-to-gates) tool used to synthesize message update unit on ZedBoard
  • ∼2x improvement in inference speed on CPU+FPGA compared to CPU-only inference
  • Fixed point math
• GraphGen collaboration
  • On-going work
  • Stereo matching task mapped to multiple platforms
  • 10x speedup on GPU w.r.t. CPU-only implementation
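To give a sense of what a message update unit with fixed point math computes, here is a minimal software sketch of one min-sum message update in integer arithmetic. It is not the synthesized unit. The function name `update_message`, the truncated-linear smoothness cost (a common choice in stereo matching), the 16-bit saturation limit, and the numbers are all assumptions made for illustration.

```python
def update_message(data_cost, incoming, weight, trunc, bits=16):
    """One min-sum message for a grid MRF, using integers only.

    data_cost : integer cost per label at the sending node
    incoming  : integer messages from the sender's other neighbours
    weight    : slope of the smoothness cost weight * |a - b|
    trunc     : cap on the smoothness cost (truncated-linear model, assumed)
    bits      : word width used for saturation (assumed signed 16-bit)
    """
    k = len(data_cost)
    limit = (1 << (bits - 1)) - 1
    # Aggregate the local evidence and everything heard from other neighbours.
    h = [data_cost[a] + sum(m[a] for m in incoming) for a in range(k)]
    # For each receiver label b, take the cheapest sender label a.
    out = [min(h[a] + min(weight * abs(a - b), trunc) for a in range(k))
           for b in range(k)]
    # Subtract the minimum so values stay small, then saturate to the word width.
    lo = min(out)
    return [min(v - lo, limit) for v in out]

# Made-up example: 4 disparity labels, two incoming messages.
msg = update_message([4, 0, 7, 9], [[0, 1, 0, 2], [1, 0, 0, 0]], weight=2, trunc=5)
print(msg)
```

Subtracting the minimum does not change which label wins, since only cost differences matter, and it keeps the dynamic range narrow enough for a short integer word.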
GraphGen synthesis of BP-M
• BP-M update (log-space messages) implemented using GraphGen (Intel/CMU/UW)
• GPU implementation 10x faster than CPU-based implementation
• On-going work on FPGA-based implementation and on implementing hierarchical update
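The point of log-space messages can be shown with a small sketch: taking negative logarithms turns the products of a max-product update into sums and the max into a min, and it avoids floating-point underflow when many small probabilities are combined. This is a generic illustration, not the GraphGen implementation. The function names and the numbers below are assumptions.

```python
import math

def max_product_message(belief, compat):
    """m(b) = max over a of belief[a] * compat[a][b], in the probability domain."""
    k = len(belief)
    return [max(belief[a] * compat[a][b] for a in range(k))
            for b in range(len(compat[0]))]

def min_sum_message(cost, penalty):
    """The same update on negative logs: m(b) = min over a of cost[a] + penalty[a][b]."""
    k = len(cost)
    return [min(cost[a] + penalty[a][b] for a in range(k))
            for b in range(len(penalty[0]))]

def neg_log(p):
    return -math.log(p)

# Made-up belief and compatibility table for three labels.
belief = [0.7, 0.2, 0.1]
compat = [[0.8, 0.1, 0.1],
          [0.1, 0.8, 0.1],
          [0.1, 0.1, 0.8]]

prob_msg = max_product_message(belief, compat)
cost_msg = min_sum_message([neg_log(p) for p in belief],
                           [[neg_log(p) for p in row] for row in compat])
# Exponentiating the log-space message recovers the probability-domain one.
recovered = [math.exp(-c) for c in cost_msg]

# Underflow: multiplying 400 factors of 1e-3 collapses to 0.0 in double
# precision, while the equivalent sum of negative logs stays finite.
product = 1.0
total_cost = 0.0
for _ in range(400):
    product *= 1e-3
    total_cost += neg_log(1e-3)
print(prob_msg, recovered, product, total_cost)
```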
Cornell Publications (2013 only)
• 3x Comp. Vision & Pattern Recognition (CVPR)
• 3x Asynchronous VLSI (ASYNC)
• 2x Intl. Symp. Computer Architecture (ISCA)
• 1x Intl. Conf. Image Processing (ICIP)
• 1x ASPLOS (w/ GraphGen folks, under review)
Year 3 Plans
• GraphGen extensions for BP applications
  • Multiple inference techniques
• Extraction of “BP ISA”
  • Ops on arbitrary graphs
  • Efficient representation
• Amplification work on UAV ensembles
  • Self-optimizing, collaborative SoCs
• One-day “graph” workshop with GraphGen+UIUC