Approximate Computing on FPGA using Neural Acceleration
Presented By: Mikkel Nielsen, Nirvedh Meshram, Shashank Gupta, Kenneth Siu
Approximate Computing • Involves computations that do not need to be exact (they tolerate some quality degradation) • A neural network's (NN) speed can be exploited • Trades accuracy for gains in performance and energy efficiency • Goal: implement an NN accelerator that interacts with the CPU • Useful in many computer vision and image processing applications, such as edge detection
Motivation • Combine specialized logic (an accelerator) with approximate computing for better performance and energy efficiency
[Figure: Top-Level System Design]
Architecture Design of NPU
[Figure: Top-Level Diagram of NPU]
Architecture and Features • Total of 8 Processing Elements (PEs) in one Processing Unit (in the initial design) • Weights needed for neural processing are loaded into the weight FIFO at configuration time • A Scheduling Buffer, also set up during the configuration phase, generates the control signals for the input, output, sigmoid, and accumulator FIFOs, PE input selection, and the Sigmoid Function Unit • After configuration, inputs are loaded into the input FIFO (using the enqd instruction) • Inputs and weights are 16-bit fixed-point values with 7 fractional bits; the NPU also accepts 32-bit integers and single-precision floats, with the input interface performing the required format conversion (see the sketch below)
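As a rough illustration of that format conversion, here is a minimal C sketch of a float-to-fixed round trip for the 16-bit, 7-fractional-bit format. Truncation toward zero and saturation on overflow are assumptions; the slides do not specify either behavior.

```c
#include <stdint.h>

#define FRAC_BITS 7  /* 16-bit value, 7 fractional bits */

/* Convert a float to the NPU's fixed-point format (assumed saturating). */
static int16_t float_to_fix(float x)
{
    float scaled = x * (1 << FRAC_BITS);
    if (scaled >  32767.0f) return  32767;   /* saturate on overflow  */
    if (scaled < -32768.0f) return -32768;
    return (int16_t)scaled;                  /* truncate toward zero  */
}

/* Convert a fixed-point value back to float. */
static float fix_to_float(int16_t x)
{
    return (float)x / (1 << FRAC_BITS);
}
```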
Architecture and Features • Compute Unit: performs the multiplication and addition operations • State Machine: controls and configures the NPU; stalls on insufficient input and pushes outputs to the FIFO • Accumulator FIFO: stores intermediate results when the number of inputs exceeds the number of PEs • Sigmoid Function Unit: the current NPU supports tan-sigmoid and linear activation functions • Output FIFO: holds the NPU's outputs (a per-neuron sketch follows)
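The per-neuron computation these units implement can be summarized as below. This is a minimal sketch in floating point for readability, not the fixed-point datapath itself, and the function names are illustrative.

```c
#include <math.h>

/* Tan-sigmoid activation, as applied by the Sigmoid Function Unit. */
static float tansig(float x) { return tanhf(x); }

/* One neuron: multiply-accumulate over n inputs (Compute Unit),
   followed by a linear or tan-sigmoid activation. */
static float neuron(const float *in, const float *w, int n, int linear)
{
    float acc = 0.0f;
    for (int i = 0; i < n; i++)        /* multiply + add per input */
        acc += in[i] * w[i];
    return linear ? acc : tansig(acc);
}
```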
Software for Configuration • Weights can be generated with custom MATLAB code or through a compiler • A Perl-based compiler takes the weights and the structure of the neural network as input • The compiler then generates a sequence of instructions that is loaded into the NPU • These instructions load values into the weight buffers as well as the scheduling buffer (a hypothetical encoding is sketched below)
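A minimal sketch of what such a configuration stream might look like follows. The opcodes, field layout, and encode() helper are hypothetical placeholders; the slides do not describe the NPU's actual instruction encoding.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical opcodes -- NOT the NPU's real encoding. */
enum { OP_LOAD_WEIGHT = 0x1, OP_LOAD_SCHED = 0x2 };

/* Pack one 32-bit configuration word (assumed field layout). */
static uint32_t encode(uint8_t op, uint8_t target, uint16_t payload)
{
    return ((uint32_t)op << 24) | ((uint32_t)target << 16) | payload;
}

int main(void)
{
    /* Hidden-layer weights of a toy 2-2-1 network, already in the
       16-bit fixed-point format (7 fractional bits). */
    const uint16_t weights[4] = { 0x0080, 0xFF80, 0x0040, 0x00C0 };

    for (unsigned i = 0; i < 4; i++)   /* fill the weight buffers */
        printf("0x%08X\n", encode(OP_LOAD_WEIGHT, i % 2, weights[i]));

    /* One hypothetical scheduling-buffer entry: inputs per neuron. */
    printf("0x%08X\n", encode(OP_LOAD_SCHED, 0, 2));
    return 0;
}
```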
Zedboard Implementation • Used the Vivado tools to configure the programmable logic and generate a bitstream • Packages the bitstream into the first-stage boot loader by wrapping it with the boot files • On Zedboard boot, the programmable logic is loaded with the design • A driver interfaces C code with the programmable logic (NPU); a sketch of one possible interface follows • For comparison, the same C code runs natively under Digilent Linux on the Zedboard to exercise the ARM core
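One common way such a driver is built on the Zynq is to mmap the accelerator's AXI address range from user space. The sketch below assumes that pattern; NPU_BASE and the register offsets are made-up placeholders, not the project's actual memory map.

```c
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

#define NPU_BASE   0x43C00000u  /* assumed AXI base address   */
#define REG_INPUT  0x00         /* assumed input-FIFO offset  */
#define REG_OUTPUT 0x04         /* assumed output-FIFO offset */

static volatile uint32_t *npu;

/* Map the NPU's register page into this process's address space. */
int npu_open(void)
{
    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0) return -1;
    npu = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, NPU_BASE);
    close(fd);
    return npu == MAP_FAILED ? -1 : 0;
}

/* Enqueue one input word and dequeue one output word. */
uint32_t npu_invoke(uint32_t in)
{
    npu[REG_INPUT / 4] = in;
    return npu[REG_OUTPUT / 4];
}
```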
Zedboard Challenges • Configuring Vivado to generate the bitstream • Debugging synthesis and implementation errors • Creating an appropriate wrapper so the Zedboard does not crash on boot
Benchmarks • Sobel Edge Detection • A good candidate for approximate computing • Convolves the image with 3x3 kernels to find edges (sketched below) • Took 0.4 ms for a 512x512 image
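For reference, the core of the Sobel operator looks roughly like the following. The image layout and function name are illustrative rather than the benchmark's exact code.

```c
#include <math.h>
#include <stdint.h>

/* Standard Sobel 3x3 kernels for horizontal and vertical gradients. */
static const int Gx[3][3] = { {-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1} };
static const int Gy[3][3] = { {-1,-2,-1}, { 0, 0, 0}, { 1, 2, 1} };

/* Gradient magnitude at interior pixel (x, y) of a w-wide grayscale image. */
static float sobel(const uint8_t *img, int w, int x, int y)
{
    int sx = 0, sy = 0;
    for (int j = -1; j <= 1; j++)
        for (int i = -1; i <= 1; i++) {
            int p = img[(y + j) * w + (x + i)];
            sx += Gx[j + 1][i + 1] * p;
            sy += Gy[j + 1][i + 1] * p;
        }
    return sqrtf((float)(sx * sx + sy * sy));
}
```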
AxBench Benchmarks • Using AxBench • Utilizes a software NN (the FANN library); a minimal usage sketch follows • A hardware NN is needed to fully realize the efficiency gains • Benchmarks run both with and without the NN
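For context, running a trained network with FANN takes only a few calls. This minimal sketch assumes a network file "net.net" previously produced by FANN's training and save routines; the input values are arbitrary examples.

```c
#include <fann.h>
#include <stdio.h>

int main(void)
{
    /* Load a previously trained and saved network. */
    struct fann *ann = fann_create_from_file("net.net");
    if (!ann) return 1;

    fann_type input[2] = { 0.5f, -0.25f };    /* example inputs    */
    fann_type *output = fann_run(ann, input); /* one forward pass  */

    printf("output: %f\n", (double)output[0]);
    fann_destroy(ann);
    return 0;
}
```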
In Progress • Compare performance against a Processing Unit with 16 PEs and measure the speedup • Build an NPU with two Processing Units of 8 PEs each and again compare performance and speedup • Modify the scheduler to remove stalls caused by unavailable data • Run more benchmarks