Emulated Digital CNN-UM Implementation of a 3-dimensional Ocean Model on FPGAs
Zoltán Nagy, Péter Szolgay
Introduction
• Cellular Neural/Nonlinear Networks Universal Machine (CNN-UM)
• Ocean modeling
• Results
• Conclusions
Cellular Neural/Nonlinear Networks (CNN)
• 2 or N dimensional grid
• Locally connected
• Analog processing elements
• State value is continuous in time
Structure of a CNN cell
• u_ij: input
• x_ij: state
• y_ij: output
• z_ij: constant bias
• A_ij,kl: feedback template
• B_ij,kl: feed-forward template
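The quantities listed above combine into a cell update. The following is a minimal sketch, assuming the standard CNN state equation dx/dt = -x + ΣA·y + ΣB·u + z with the piecewise-linear output function, a 3x3 neighbourhood, forward-Euler time stepping and zero-valued cells outside the grid. The function names, the time step `dt` and the boundary handling are illustrative assumptions, not taken from the slides.

```python
def output(x):
    """Standard CNN output nonlinearity: linear in [-1, 1], saturated outside."""
    return 0.5 * (abs(x + 1.0) - abs(x - 1.0))


def cnn_step(x, u, A, B, z, dt):
    """One forward-Euler step of dx/dt = -x + sum(A*y) + sum(B*u) + z.

    x, u : 2D lists holding the state and the input (same shape)
    A, B : 3x3 feedback and feed-forward templates
    z    : constant bias
    Cells outside the grid contribute nothing (assumed boundary condition).
    """
    rows, cols = len(x), len(x[0])
    y = [[output(v) for v in row] for row in x]
    new_x = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = -x[i][j] + z
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    k, l = i + di, j + dj
                    if 0 <= k < rows and 0 <= l < cols:
                        acc += A[di + 1][dj + 1] * y[k][l]
                        acc += B[di + 1][dj + 1] * u[k][l]
            new_x[i][j] = x[i][j] + dt * acc
    return new_x
```

With all-zero templates and zero bias the state simply decays towards zero, which is a quick sanity check for the update loop.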
CNN-UM implementations
• Software simulation
  • Easy to implement
  • Slow, even when using processor-specific instructions
• Emulated digital VLSI
  • Specialized digital architecture
  • Selectable computing precision (Castle architecture: 1, 6, 12 bit)
  • Orders of magnitude faster than software simulation
  • Long design time
• Analog VLSI
  • Huge computing power (~TeraOP/s)
  • Low accuracy (7-8 bit)
  • Noise and temperature sensitivity
Structure of the Falcon emulated digital CNN-UM
• Mixer
  • Contains cell values for the next updates
• Memory unit
  • Contains a belt of the cell array
• Template memory
• Arithmetic unit
• Processors can be connected on a grid
  • Linear speedup
Structure of the arithmetic unit
• Cell update in row-wise order
• Cycle time depends on template size
• Fully pipelined
Configurable parameters
• State, template and constant widths from 2 to 64 bits
• Number of templates
• Size of the templates
• Width of the cell array slice
• Number of layers
• Number and arrangement of the processor cores
Example: Solution of a simple PDE on CNN
• The wave equation
• Spatial discretization
• 2-layer CNN
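A sketch of how such a mapping typically looks, assuming the scalar wave equation with constant wave speed c, a uniform grid of spacing h and a five-point Laplacian. The symbols φ, v, c and h are assumptions introduced for illustration and are not taken from the slide.

```latex
\begin{aligned}
\frac{\partial^2 \phi}{\partial t^2} &= c^2 \nabla^2 \phi
  && \text{(wave equation)} \\
\frac{d\phi_{ij}}{dt} &= v_{ij}
  && \text{(layer 1)} \\
\frac{dv_{ij}}{dt} &= \frac{c^2}{h^2}\left(\phi_{i-1,j} + \phi_{i+1,j} + \phi_{i,j-1} + \phi_{i,j+1} - 4\phi_{ij}\right)
  && \text{(layer 2)}
\end{aligned}
```

Splitting the second-order equation into two first-order ones is what makes two layers necessary. In template terms, and ignoring the cell's own decay term and output saturation, layer 2 would be driven from layer 1 by the 3x3 template (c²/h²)·[0 1 0; 1 -4 1; 0 1 0], and layer 1 from layer 2 by a single centre element equal to 1.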
Ocean models
• Barotropic model
• Baroclinic models
  • z-coordinate model
  • σ-coordinate model
  • Isopycnal model
• Fine resolution models
  • Real-time forecast
  • Fishing industry
  • Search and rescue
• Coarse resolution models
  • Long term predictions
  • Climate modeling
The Princeton Ocean Model (POM)
• Sigma coordinate model
  • Vertical coordinate is scaled on the water column depth
• Second moment turbulence closure sub-model
  • Provides vertical mixing coefficients
• Solution technique: mode splitting
  • Internal mode (3D)
    • Vertical structure equations
    • Implicit solution
  • External mode (2D)
    • Vertically integrated equations
    • Explicit solution (leapfrog method)
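The leapfrog method named for the external mode can be sketched on a scalar ordinary differential equation. This is a generic illustration of the scheme, not the POM code: the function name `leapfrog`, its arguments and the forward-Euler start-up step are assumptions.

```python
def leapfrog(f, y0, dt, steps):
    """Integrate dy/dt = f(y) with the leapfrog (centred-difference) scheme.

    y[n+1] = y[n-1] + 2*dt*f(y[n]). Leapfrog needs two previous time
    levels, so the first step is bootstrapped with forward Euler.
    """
    if steps == 0:
        return y0
    y_prev = y0
    y_curr = y0 + dt * f(y0)  # forward-Euler start-up step
    for _ in range(steps - 1):
        y_prev, y_curr = y_curr, y_prev + 2.0 * dt * f(y_curr)
    return y_curr
```

Because each new value depends only on already known time levels, the scheme is explicit and needs no equation solver, which is why it suits a simple pipelined arithmetic unit.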
Governing equations of the external (2D) mode
• u_x, u_y: mass transport
• η: free surface elevation
• Ω: angular rotation of the Earth
• Θ: latitude
• H: depth of the ocean
• g: gravitational acceleration
• τ_w, τ_b: wind and bottom stress
• A: lateral viscosity
Solution on CNN
• Spatial discretization on a uniform grid
• 3-layer CNN structure
• Non-linear template required for advection term
• Cannot be solved on analog VLSI CNN chips
• Solvable on the modified Falcon architecture
  • Support of non-linearity
  • Specialized cell model
Implementation on FPGA
• Complicated arithmetic unit
• Fixed-point number representation
• Configurable precision
• High-level hardware description language required (e.g. Handel-C)
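To make fixed-point representation with configurable precision concrete, here is a minimal software sketch. It is written in Python for illustration only and is not the Handel-C hardware description; the helper names, the saturation on overflow and the rounding mode are assumptions.

```python
def to_fixed(value, frac_bits, total_bits):
    """Quantize a real value to a signed fixed-point integer.

    frac_bits  : number of fractional bits
    total_bits : total word width, including the sign bit
    Rounds to nearest (Python's round, ties to even) and saturates on overflow.
    """
    scaled = round(value * (1 << frac_bits))
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, scaled))


def from_fixed(raw, frac_bits):
    """Convert a fixed-point integer back to a float."""
    return raw / (1 << frac_bits)


def fixed_mul(a, b, frac_bits):
    """Multiply two fixed-point values and rescale the product.

    The arithmetic right shift truncates towards negative infinity.
    """
    return (a * b) >> frac_bits
```

For example, with 8 fractional bits in a 16-bit word, 0.5 is stored as 128, and multiplying it by itself gives 64, which converts back to 0.25. Widening `frac_bits` trades logic area for accuracy.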
Performance
Results after 72 hours
[Figure panels: Circulation pattern; Elevation]
Memory requirements of the internal (3D) equations
• Extended memory hierarchy
  • New level stores 3 cross-sectional slices from the 3D array
  • Large memory required (e.g. for a 512x512x64 grid, 3x512x64 elements per state variable)
  • Cannot be stored on-chip
  • Off-chip storage requires huge I/O bandwidth
• Processor array should be used
  • The 3D array is divided between the processors
  • Optimal data set for on-chip storage: 2048 elements per cross-sectional slice (512x32x64 grid per processor)
  • Each processor is located on a separate FPGA
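The figures on this slide follow from simple arithmetic, sketched below. The sketch assumes that a cross-sectional slice spans the second and third grid dimensions and that the grid is divided between processors along the second dimension only; the function names are illustrative.

```python
def slice_elements(ny, nz):
    """Elements in one cross-sectional slice of an nx x ny x nz grid."""
    return ny * nz


def slice_buffer_elements(ny, nz, slices=3):
    """Elements needed to buffer `slices` cross-sectional slices."""
    return slices * slice_elements(ny, nz)


# Full 512x512x64 grid: three 512x64 slices per state variable.
full_buffer = slice_buffer_elements(512, 64)   # 3 * 512 * 64 = 98304

# 512x32x64 sub-grid per processor: one slice is 32x64 elements.
per_proc_slice = slice_elements(32, 64)        # 2048

# Processors needed if the 512-wide dimension is cut into 32-wide strips.
processors = 512 // 32                         # 16
```

Under these assumptions the full grid needs 98304 buffered elements per state variable, while each of the 16 sub-grids needs only 3 x 2048 = 6144.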
Solution of the internal (3D) equations
• Implicit solution
  • Fixed-point solution
    • Requires high precision to avoid rounding errors
    • Seems to be impractical
  • Floating-point solution
    • Requires large area (especially add/sub)
• Explicit solution
  • Smaller timestep
  • Simpler arithmetic unit
Conclusions
• Ocean modeling using emulated digital CNN is very promising
• Moderate precision is required in 2D mode
  • 1% accuracy using 24 bits
• Expected speedup (compared to an Athlon64 2GHz microprocessor)
  • 80 times on our RC200 prototyping board
  • 3700 times on the largest available FPGA