DNA Gene Identification

DNA Gene Identification: a fast, accurate, and efficient way to identify DNA

AGENDA DNA Overview. Sequence Alignment. Problem & Previous Solutions. GPU & CUDA. Implemented Solution. GUI (Ribbon). Results.

DNA (Deoxyribonucleic Acid) • Describes the genetic information for cell growth, division, and function. • Can be used to diagnose the condition of an organism or a human, for example: - checking whether a person has a certain disease, such as cancer. • Encodes features of the human body, such as height, eye color, nose shape, hair, skin color, and gender.

DNA Structure • Chromosomes • Genes • Nucleotides • Bases • Adenine (A) • Guanine (G) • Cytosine (C) • Thymine (T).

Gene structure

FASTA format • FASTA is a text-based format used to represent sequences of any type, such as DNA.

Specifications of FASTA format • There should be no space between the ">" and the first letter of the identifier. • It is recommended that all lines of text be shorter than 80 characters, but this is not always satisfied; lines may be 80 or 120 characters long. • It is commonly used because it is very simple. Another format for the description
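The rules above can be sketched as a small reader. This is a minimal illustration, not the parser used in the project: the function name parse_fasta and the example records are hypothetical, and the sketch assumes each record starts with a ">" header line followed by one or more sequence lines of any length.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (identifier, sequence) pairs."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]  # the identifier follows ">" with no space
            chunks = []
        elif header is not None:
            # Sequence lines may be wrapped at 80, 120, or any other width.
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical two-record input, made up for illustration.
example = """>seq1 hypothetical example record
ACGTAC
GTTAGC
>seq2
TTGACA
"""
records = parse_fasta(example)
```

Because wrapped lines are joined, the reader does not depend on the recommended line width.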

BIOLOGICAL FACT • Biological sequences develop from preexisting sequences instead of being invented by nature from scratch. • Three types of changes can occur at any given position within a sequence: • Point mutations. • Insertions. • Deletions. • In an alignment, two identical characters produce a match, two different nonblank characters produce a mismatch, and a blank is called an indel (insertion/deletion) or gap.
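The match / mismatch / indel terminology can be illustrated by counting the columns of two already-aligned sequences. This is a small sketch under assumptions: the name classify_columns, the "-" gap character, and the example strings are made up for illustration.

```python
def classify_columns(aligned_a, aligned_b, gap="-"):
    """Count matches, mismatches, and indels in two aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    counts = {"match": 0, "mismatch": 0, "indel": 0}
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            counts["indel"] += 1      # a blank opposite a character
        elif x == y:
            counts["match"] += 1      # two identical characters
        else:
            counts["mismatch"] += 1   # two different nonblank characters
    return counts


# Hypothetical alignment with one mismatch (G/C) and one gap.
counts = classify_columns("ACG-TA", "ACCTTA")
```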

SEQUENCE ALIGNMENT TYPES • Global Sequence Alignment • Needleman-Wunsch Algorithm • Local Sequence Alignment • Smith-Waterman Algorithm
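The difference between the two alignment types can be sketched with a single scoring routine. This is an illustrative sketch, not the project's implementation: the function name align_score and the scoring values (match = 2, mismatch = -1, linear gap = -1) are assumptions. With local=False it follows the Needleman-Wunsch recurrence; with local=True it follows Smith-Waterman, which floors every cell at zero and reports the best cell anywhere in the matrix.

```python
def align_score(a, b, match=2, mismatch=-1, gap=-1, local=False):
    """Return the optimal global (default) or local alignment score of a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment pays for leading gaps in the first row and column.
        for i in range(1, rows):
            H[i][0] = i * gap
        for j in range(1, cols):
            H[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap
            left = H[i][j - 1] + gap
            cell = max(diag, up, left)
            if local:
                cell = max(cell, 0)    # a local alignment may restart anywhere
                best = max(best, cell)
            H[i][j] = cell
    # Global: score of the bottom-right cell. Local: best cell in the matrix.
    return best if local else H[-1][-1]


global_score = align_score("ACGT", "ACGT")
local_score = align_score("TTACGTTT", "GGACGGG", local=True)
```

Both variants fill every cell of an (N+1) by (M+1) matrix, so the work grows with the product of the two sequence lengths.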

PROBLEM • The computational cost is very high: the number of operations is proportional to the product of the lengths of the two sequences, so the algorithm has a complexity of O(N×M). • Previous solutions: • FPGA: • High cost. • Not suitable for all users. • Approximate algorithms: • Less accurate. • Current solution: parallelization on graphics cards.

GPU (Graphics Processing Unit) • The GPU is viewed as a compute device operating as a coprocessor to the main CPU (host). • The CPU and GPU are separate devices with separate memory.

CUDA (Compute Unified Device Architecture) • CUDA is NVIDIA's scalable parallel programming model and a software environment for parallel computing. • Language: CUDA C, a minor extension to C/C++. • A heterogeneous serial-parallel programming model.

CUDA • CUDA program = serial code + parallel kernels (all in CUDA C). - Serial C code executes in a host thread (CPU thread). - Parallel kernel code executes in many device threads (GPU threads).

CUDA ARCHITECTURE • Blocks and grids may be 1D, 2D, or 3D. • gridDim, blockIdx, blockDim, threadIdx. • Threads and blocks have unique IDs.

CUDA Kernels • A kernel is a function executed on the CUDA device. • Threads are grouped into warps of 32 threads. - Warps are grouped into thread blocks. - Thread blocks are grouped into grids. • Each thread running a kernel has access to built-in variables that define its position. - threadIdx.x. - blockIdx.x. - gridDim.x, blockDim.x.

Kernel Call Syntax • Kernels are called with the <<<>>> syntax. • FunctionName<<<Dg, Db>>>(arg1, arg2, …), where: Dg = dimensions of the grid (type dim3). Db = dimensions of the block (type dim3).
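How a one-dimensional launch assigns work to threads can be emulated on the CPU. The sketch below is plain Python, not CUDA C, and the helper name launch_1d is hypothetical. It only mirrors the usual 1D indexing pattern, blockIdx.x * blockDim.x + threadIdx.x, together with a grid size rounded up so that every item is covered.

```python
def launch_1d(n_items, block_dim):
    """Emulate the thread-to-item mapping of a 1D launch <<<Dg, Db>>>."""
    # Dg: number of blocks, rounded up so that Dg * Db >= n_items.
    grid_dim = (n_items + block_dim - 1) // block_dim
    assignments = []
    for block_idx in range(grid_dim):          # plays the role of blockIdx.x
        for thread_idx in range(block_dim):    # plays the role of threadIdx.x
            global_id = block_idx * block_dim + thread_idx
            if global_id < n_items:            # guard: the last block may be partly idle
                assignments.append((block_idx, thread_idx, global_id))
    return grid_dim, assignments


grid_dim, assignments = launch_1d(n_items=10, block_dim=4)
```

With 10 items and blocks of 4 threads, three blocks are needed and the last two threads of the final block do no work, which is why kernels normally include the bounds check.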

Function Type Qualifiers • A kernel is defined with __global__. • This specifies that the function runs on the device and is callable from the host only. • __device__ and __host__ are the other available qualifiers. __device__ - executed on the device, callable only from the device. __host__ - the default if none is specified; executed on the host, callable from the host only.

CUDA PROGRAMMING Basic steps • Transfer data from the CPU to the GPU. • Explicitly call the GPU kernel. - CUDA implicitly assigns threads to each multiprocessor and allocates resources for the computation. • Transfer the results back from the GPU to the CPU.

PARALLELIZATION • The sequence alignment algorithm consumes a large amount of processing time. • GPUs offer parallelization capabilities. • Parallelization = Performance. Two levels of parallelization • Level 1: Parallelizing the database comparison. -- Assume 14 sequences in the database.
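Level 1 treats each database sequence as an independent job. The sketch below is a CPU-side analogue in Python rather than the GPU implementation: a thread pool stands in for the GPU's parallel workers, and the names sw_score and score_database, the scoring values, and the four-sequence database are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def sw_score(query, subject, match=2, mismatch=-1, gap=-1):
    """Smith-Waterman local score, keeping only two matrix rows in memory."""
    prev = [0] * (len(subject) + 1)
    best = 0
    for q in query:
        curr = [0]
        for j, s in enumerate(subject, start=1):
            diag = prev[j - 1] + (match if q == s else mismatch)
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


def score_database(query, database, workers=4):
    """Score the query against every database sequence, one job per sequence."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Jobs share no state, so they can run in any order; map keeps input order.
        return list(pool.map(lambda subject: sw_score(query, subject), database))


# Hypothetical database, made up for illustration.
database = ["ACGT", "TTTT", "GGACGG", "AC"]
scores = score_database("ACG", database)
best_index = max(range(len(database)), key=scores.__getitem__)
```

Since no comparison depends on another, this level needs no synchronization between workers.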

PARALLELIZATION • Parallelization inside a single sequence comparison. • Initializing the data matrix and pointers.

PARALLELIZATION

PARALLELIZATION • Data dependency in the calculation steps.
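The data dependency can be made concrete: in the Smith-Waterman recurrence, cell (i, j) needs only its upper, left, and upper-left neighbours, so all cells on the same anti-diagonal (i + j constant) are independent of each other. The sketch below is a serial Python illustration of that wavefront order, not the project's kernel. The name sw_wavefront and the scoring values are assumptions, and the inner loop marks the part a GPU could run in parallel.

```python
def sw_wavefront(a, b, match=2, mismatch=-1, gap=-1):
    """Smith-Waterman score computed anti-diagonal by anti-diagonal.

    Returns (best_score, widths), where widths[k] is the number of
    independent cells on the k-th anti-diagonal.
    """
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    widths = []
    for d in range(2, n + m + 1):              # cells with i + j == d
        i_lo, i_hi = max(1, d - m), min(n, d - 1)
        widths.append(max(0, i_hi - i_lo + 1))
        # Each cell below reads only diagonals d-1 and d-2, never diagonal d,
        # so these iterations could execute concurrently.
        for i in range(i_lo, i_hi + 1):
            j = d - i
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            H[i][j] = cell
            best = max(best, cell)
    return best, widths


best, widths = sw_wavefront("ACG", "ACGT")
```

The widths grow and then shrink, so the available parallelism is smallest at the corners of the matrix and largest in the middle.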

PARALLELIZATION

Implementation of the parallelized part

Ribbon UI

Performance

Speed Up

THANKS Any questions?
