The Fast Evaluation of Hidden Markov Models on GPU
Presented by Ting-Yu Mu & Wooyoung Lee
Introduction
• Hidden Markov Model (HMM):
  • A statistical model (probability based)
  • Used in a wide range of applications:
    • Speech recognition
    • Computer vision
    • Medical image analysis
• One of the problems that needs to be solved:
  • Evaluating the probability of an observation sequence on a given HMM (the evaluation problem)
  • Solving it efficiently is the key to choosing the best-matching model among a set of HMMs, as formalized below
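The slides do not fix a notation; assuming the standard one (Rabiner's \lambda = (A, B, \pi), observation sequence O = o_1 o_2 \dots o_T, state sequence Q), the evaluation problem asks for

    P(O \mid \lambda) = \sum_{\text{all } Q} P(O \mid Q, \lambda)\, P(Q \mid \lambda)

Summed directly over all N^T possible state sequences this takes on the order of 2T \cdot N^T operations for N states, which is why the efficient algorithm described on the later slides matters.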
Introduction – Example
• Application: speech recognition
• Goal: recognize words one by one
• Input:
  • The speech signal of a given word
  • Represented as a time sequence of coded spectral vectors
• Output:
  • The observation sequence
  • Represented as a sequence of indices into the spectral codebook
Introduction – Example
• The tasks:
  • Design an individual HMM for each word of the vocabulary
  • Perform unknown-word recognition:
    • Use the solution of the evaluation problem to score each HMM against the observation sequence of the word
    • The model that scores the highest is selected as the result (see the decision rule below)
• The accuracy:
  • Measured by the correctness of the chosen result
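In the assumed notation, with one trained model \lambda_v per vocabulary word v = 1, \dots, V, the decision rule on this slide is

    v^{*} = \arg\max_{1 \le v \le V} P(O \mid \lambda_v)

i.e., the recognized word is the one whose HMM assigns the observation sequence the highest score.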
Computational Load
• The computational load of the previous example consists of two parts:
  • Estimating the parameters of each HMM to build the models; the load varies from HMM to HMM
    • Executed only once
  • Evaluating the probability of an observation sequence on each HMM
    • Executed many times during recognition
    • Performance depends on the complexity of the evaluation algorithm
Efficient Algorithm
• The lower order of complexity:
  • Forward-backward algorithm
  • Consists of two passes:
    • Forward probability
    • Backward probability
  • Used extensively
  • Computationally intensive
• One way to increase performance:
  • Design a parallel algorithm (the recursion below shows what must be parallelized)
  • Utilize present-day multi-core systems
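For reference, the forward pass alone already solves the evaluation problem; in the assumed notation the standard recursion is

    Initialization:  \alpha_1(j) = \pi_j\, b_j(o_1)
    Induction:       \alpha_{t+1}(j) = \Big[ \sum_{i=1}^{N} \alpha_t(i)\, a_{ij} \Big] b_j(o_{t+1})
    Termination:     P(O \mid \lambda) = \sum_{j=1}^{N} \alpha_T(j)

This costs O(N^2 T) operations instead of O(T N^T) for direct enumeration, and each \alpha_{t+1}(j) in the induction step can be computed independently of the others, which is exactly what the parallel algorithm later exploits.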
General Purpose GPU
• Why choose the Graphics Processing Unit:
  • Rapid increases in performance
  • Supports floating-point operations
  • High computational power and memory bandwidth
• The GPU is specialized for compute-intensive, highly parallel computation
• More transistors are devoted to data processing rather than data caching
CUDA Programming Model
• The GPU is seen as a compute device that executes the parts of the application that:
  • Have to be executed multiple times
  • Can be isolated as a function
  • Work independently on different data
• Such a function can be compiled to run on the device; the resulting program is called a kernel
• The batch of threads that executes a kernel is organized as a grid of blocks
CUDA Programming Model
• Thread block:
  • Contains a batch of threads that can cooperate through:
    • Fast shared memory
    • Synchronization barriers
    • Thread IDs
  • A block can be a one-, two-, or three-dimensional array of threads
CUDA Programming Model
• Grid of thread blocks:
  • A single block can contain only a limited number of threads
  • A grid allows a much larger number of threads to execute the same kernel with one invocation
  • Blocks are identifiable through their block ID
  • Threads in different blocks cannot cooperate, which reduces thread cooperation
  • A grid can be a one- or two-dimensional array of blocks (see the indexing sketch below)
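As a concrete illustration of the indexing above, here is a minimal CUDAfy.NET sketch (the Scale kernel and its parameters are illustrative, not from the presentation): each thread combines its block ID, the block size, and its thread ID into a unique global index.

    using Cudafy;

    public static class IndexingSketch
    {
        // Illustrative kernel: thread (blockIdx.x, threadIdx.x) maps to one
        // element of the array. GThread exposes the CUDA built-in indices
        // (threadIdx, blockIdx, blockDim) to .NET kernel code.
        [Cudafy]
        public static void Scale(GThread thread, float[] data, int n, float factor)
        {
            int tid = thread.blockIdx.x * thread.blockDim.x + thread.threadIdx.x;
            if (tid < n)            // guard: the grid may hold more threads than elements
                data[tid] *= factor;
        }
    }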
Parallel Algorithm on GPU
• The task of computing the evaluation probability is split into pieces and distributed over several threads
• A thread block evaluates one Markov model
• Calculating the dimension of the grid:
  • Obtained by dividing the number of states N by the block size
• Each forward probability is computed by a thread within a thread block
• The threads need to synchronize because they share data (see the kernel sketch below)
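The presentation does not show kernel source, so the following CUDAfy.NET sketch is only one plausible realization of the scheme just described: one thread block evaluates one model, thread j owns state j, shared memory holds the alpha values of the previous and current time step, and SyncThreads() provides the synchronization the slide calls for. The flattened row-major matrix layout, the 256-state ceiling, and all names are assumptions.

    using Cudafy;

    public static class ForwardSketch
    {
        // One block evaluates one HMM; assumes N <= 256 states per model.
        [Cudafy]
        public static void Forward(GThread thread,
                                   float[] a,      // N x N transitions, row-major
                                   float[] b,      // N x M emissions, row-major
                                   int[] obs,      // observation indices, length T
                                   float[] pi,     // initial distribution, length N
                                   int n, int m, int t,
                                   float[] result) // one probability per block
        {
            float[] prev = thread.AllocateShared<float>("prev", 256);
            float[] curr = thread.AllocateShared<float>("curr", 256);
            int j = thread.threadIdx.x;            // thread j owns state j

            // Initialization: alpha_1(j) = pi_j * b_j(o_1).
            if (j < n)
                prev[j] = pi[j] * b[j * m + obs[0]];
            thread.SyncThreads();

            // Induction: alpha_{t+1}(j) = (sum_i alpha_t(i) * a_ij) * b_j(o_{t+1}).
            for (int step = 1; step < t; step++)
            {
                if (j < n)
                {
                    float sum = 0f;
                    for (int i = 0; i < n; i++)
                        sum += prev[i] * a[i * n + j];
                    curr[j] = sum * b[j * m + obs[step]];
                }
                thread.SyncThreads();              // all reads of prev are done
                if (j < n)
                    prev[j] = curr[j];
                thread.SyncThreads();              // prev updated before next step
            }

            // Termination: P(O | lambda) = sum_j alpha_T(j); thread 0 reduces.
            if (j == 0)
            {
                float p = 0f;
                for (int i = 0; i < n; i++)
                    p += prev[i];
                result[thread.blockIdx.x] = p;
            }
        }
    }

In practice one would rescale the alphas at each step or work in log space to avoid underflow on long sequences; that is omitted here for clarity.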
What is CUDAfy.NET?
• Made by Hybrid DSP Systems in the Netherlands
• A set of libraries and tools that permit general-purpose programming of NVIDIA CUDA GPUs from the Microsoft .NET framework
• Combines flexibility, performance, and ease of use
• First release: March 17, 2011
Cudafy.NET SDK
• Cudafy.NET Library:
  • Cudafy Translator (converts .NET code to CUDA C)
  • Cudafy Library (CUDA support for .NET)
  • Cudafy Host (host-side GPU wrapper)
  • Cudafy Math (FFT + BLAS)
• The translator converts .NET code into CUDA code; it is based on ILSpy (an open-source .NET assembly browser and decompiler)
GENERAL CUDAFY PROCESS
• Two main components to the Cudafy SDK:
  • Translation from .NET to CUDA C and compilation using the NVIDIA compiler (this results in a Cudafy module XML file)
  • Loading Cudafy modules and communicating with the GPU from the host
• It is not necessary for the target machine to perform the first step above
1. Add a reference to Cudafy.NET.dll from your .NET project
2. Add the Cudafy, Cudafy.Host and Cudafy.Translator namespaces to your source files (using directives in C#)
3. Add a parameter of type GThread to your GPU functions and use it to access thread, block and grid information as well as synchronization and local shared-memory features
4. Place a Cudafy attribute on the functions
5. In your host code, before using the GPU functions, call Cudafy.Translator.Cudafy(); this returns a CudafyModule instance
6. Load the module into a GPGPU instance; the GPGPU type allows you to interact seamlessly with the GPU from your .NET code (a complete sketch follows this list)
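A minimal end-to-end sketch of the six steps, modeled on the vector-add example in the CodeProject article cited in the references (the AddVectors function, the sizes, and the launch configuration are illustrative):

    // Step 2: the Cudafy namespaces.
    using Cudafy;
    using Cudafy.Host;
    using Cudafy.Translator;

    public static class CudafyProcessSketch
    {
        // Steps 3 and 4: a GThread parameter plus the Cudafy attribute mark
        // this function for translation to CUDA C.
        [Cudafy]
        public static void AddVectors(GThread thread, int[] a, int[] b, int[] c, int n)
        {
            int tid = thread.blockIdx.x * thread.blockDim.x + thread.threadIdx.x;
            if (tid < n)
                c[tid] = a[tid] + b[tid];
        }

        public static void Main()
        {
            // Step 5: translate this class to CUDA C and compile it.
            CudafyModule km = CudafyTranslator.Cudafy();

            // Step 6: load the module into a GPGPU instance.
            GPGPU gpu = CudafyHost.GetDevice(eGPUType.Cuda);
            gpu.LoadModule(km);

            const int n = 1024;
            int[] a = new int[n];
            int[] b = new int[n];
            int[] c = new int[n];
            for (int i = 0; i < n; i++) { a[i] = i; b[i] = 2 * i; }

            // Move inputs to the device and allocate room for the result.
            int[] devA = gpu.CopyToDevice(a);
            int[] devB = gpu.CopyToDevice(b);
            int[] devC = gpu.Allocate<int>(c);

            // Launch 4 blocks of 256 threads, then copy the result back.
            gpu.Launch(4, 256).AddVectors(devA, devB, devC, n);
            gpu.CopyFromDevice(devC, c);
            gpu.FreeAll();
        }
    }

(Step 1, adding the Cudafy.NET.dll reference, happens in the project file rather than in source.)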
Development Requirement
• NVIDIA CUDA Toolkit 4.1
• Visual Studio 2010
• Microsoft VC++ compiler (used by the NVIDIA CUDA compiler)
• Windows (XP SP3, Vista, 7; 32-bit/64-bit)
• NVIDIA GPU
• NVIDIA graphics driver
Reference
• ILSpy: http://wiki.sharpdevelop.net/ilspy.ashx
• Cudafy.NET: http://cudafy.codeplex.com/
• Using Cudafy for GPGPU Programming in .NET: http://www.codeproject.com/Articles/202792/Using-Cudafy-for-GPGPU-Programming-in-NET
• Base64 Encoding on a GPU: http://www.codeproject.com/Articles/276993/Base64-Encoding-on-a-GPU
• High Performance Queries: GPU vs LINQ vs PLINQ: http://www.codeproject.com/Articles/289551/High-Performance-Queries-GPU-vs-PLINQ-vs-LINQ