
FAST MAP PROJECTION ON CUDA



  1. FAST MAP PROJECTION ON CUDA Yanwei Zhao Institute of Computing Technology Chinese Academy of Sciences July 29, 2011

  2. Outline Institute of Computing Technology, Chinese Academy of Sciences

  3. Outline Institute of Computing Technology, Chinese Academy of Sciences

  4. Map Projection • Establishes the relationship between two different coordinate systems. • Geographic coordinates → planar Cartesian map coordinates. • Involves complicated and time-consuming arithmetic operations. • A fast answer with the desired accuracy is preferred over a slow exact answer. • It needs to be accelerated for interactive GIS scenarios. Institute of Computing Technology, Chinese Academy of Sciences

  5. GPGPU (General-Purpose computing on Graphics Processing Units) • GPGPU is a young area of research. • Advantages of the GPU: • Flexibility • Processing power • Low cost • GPGPU uses the GPU in applications other than 3D graphics • The GPU accelerates the critical path of the application Institute of Computing Technology, Chinese Academy of Sciences

  6. CUDA (Compute Unified Device Architecture) • NVIDIA's parallel computing architecture • C-based programming language and development toolkit • Advantages: • Programmers can focus on the important issues rather than on an unfamiliar language • No need to go through graphics APIs to write efficient parallel code Institute of Computing Technology, Chinese Academy of Sciences

  7. The characteristics of map projection • A huge amount of coordinates to handle • The complexity of the arithmetic operations • The requirement of a real-time response Institute of Computing Technology, Chinese Academy of Sciences

  8. Our proposal • Use the new CUDA technology on the GPU • Take the Universal Transverse Mercator (UTM) projection as an example • Performance: • Improvement of up to 6x to 8x (including transfer time) • Speedup of 70x to 90x (excluding transfer time) Institute of Computing Technology, Chinese Academy of Sciences
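
Slide 8 gives only the headline numbers; the sketch below shows what a minimal CUDA host path for this kind of workload could look like. Everything in it is an assumption for illustration, not code from the talk: the names utm_project_kernel and project_on_gpu are invented, a spherical-Mercator formula stands in for the real UTM series, and single precision is used because the GeForce 9800 GTX+ (compute capability 1.1) has no native double-precision support.

    #include <cuda_runtime.h>

    // Geographic input (lon/lat in radians) and planar output, packed as float2.
    // The spherical-Mercator formula below is only a stand-in for the UTM series.
    __global__ void utm_project_kernel(const float2 *geo, float2 *map, int n)
    {
        const float R  = 6378137.0f;                      // Earth radius in metres
        const float PI = 3.14159265f;
        for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
             i += gridDim.x * blockDim.x) {               // grid-stride loop
            float lon = geo[i].x, lat = geo[i].y;
            map[i].x = R * lon;
            map[i].y = R * logf(tanf(0.25f * PI + 0.5f * lat));
        }
    }

    // Host path: the 6x-8x figure includes the two copies below,
    // the 70x-90x figure covers only the kernel.
    void project_on_gpu(const float2 *h_geo, float2 *h_map, int n)
    {
        float2 *d_geo, *d_map;
        size_t bytes = n * sizeof(float2);
        cudaMalloc(&d_geo, bytes);
        cudaMalloc(&d_map, bytes);
        cudaMemcpy(d_geo, h_geo, bytes, cudaMemcpyHostToDevice);   // upload coordinates
        utm_project_kernel<<<64, 512>>>(d_geo, d_map, n);          // Block_num=64, Thread_num=512
        cudaMemcpy(h_map, d_map, bytes, cudaMemcpyDeviceToHost);   // download results
        cudaFree(d_geo);
        cudaFree(d_map);
    }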

  9. Outline Institute of Computing Technology, Chinese Academy of Sciences

  10. Algorithm framework • Striped partitioning • Matrix distribution Institute of Computing Technology, Chinese Academy of Sciences

  11. Striped partitioning • Define the number of blocks and threads: • Block_num, Thread_num • CUDA built-in variables: • gridDim, blockDim • Number of geographic features: • fn • Each block handles fn/gridDim.x features Institute of Computing Technology, Chinese Academy of Sciences

  12. Striped partitioning • Outer loop: blocks and features • Block → Feature[i] • i = blockIdx.x * (fn/gridDim.x) (1) • Block → next Feature[k] • k = i + fn/gridDim.x (2) • Inner loop: threads and coordinates • Thread → coord[j] • j = threadIdx.x • Thread → next coord[k] • k = j + Thread_num Institute of Computing Technology, Chinese Academy of Sciences

  13. Striped partitioning • Outer loop: blocks and features • Block → Feature[i] • i = blockIdx.x * (fn/gridDim.x) • Block → next Feature[k] • k = i + fn/gridDim.x • Inner loop: threads and coordinates • Thread → coord[j] • j = threadIdx.x (1) • Thread → next coord[k] • k = j + Thread_num (2) Institute of Computing Technology, Chinese Academy of Sciences
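
Putting equations (1) and (2) from slides 11-13 into code: a minimal sketch of a striped-partitioning kernel. Only the index arithmetic comes from the slides; the Feature record (an offset and a count into one flat coordinate array) and the project_point helper are illustrative assumptions.

    struct Feature { int offset; int count; };    // assumed layout of a geographic feature

    __device__ float2 project_point(float lon, float lat)
    {
        return make_float2(lon, lat);             // placeholder for the UTM forward formulas
    }

    __global__ void project_striped(const Feature *features, int fn,
                                    const float2 *geo, float2 *map)
    {
        int per_block = fn / gridDim.x;           // features per block
        int i = blockIdx.x * per_block;           // (1): first feature of this block

        // Outer loop: the block walks its stripe Feature[i] .. Feature[i + fn/gridDim.x - 1].
        for (int f = i; f < i + per_block && f < fn; ++f) {
            int base = features[f].offset;
            int n    = features[f].count;

            // Inner loop: thread -> coord[j] with j = threadIdx.x (1),
            // then the next coordinate at j + Thread_num, i.e. j + blockDim.x (2).
            for (int j = threadIdx.x; j < n; j += blockDim.x)
                map[base + j] = project_point(geo[base + j].x, geo[base + j].y);
        }
    }

With this layout the kernel would be launched as project_striped<<<Block_num, Thread_num>>>(...), i.e. <<<64, 512>>> in the experiments reported later.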

  14. Matrix distribution • Define the number of blocks and threads: • grid(br, bc), block(tr, tc) • Each block handles k features, where: • (1) • Feature[i]: • (2) • (3) Institute of Computing Technology, Chinese Academy of Sciences

  15. Matrix distribution • Each block handles s coordinates, where: (1) • coord[j]: Institute of Computing Technology, Chinese Academy of Sciences
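
The matrix-distribution formulas on slides 14-15 were figures that are not preserved in this transcript, so the kernel below is only a plausible reconstruction, chosen to be consistent with slide 27 (each row of threads handles one feature's consecutive coordinates): a br x bc grid of blocks each takes k features, and within a block row y of the tr x tc threads works on one feature at a time. It reuses the assumed Feature record and project_point stand-in from the striped sketch above; none of the identifiers come from the original slides.

    __global__ void project_matrix(const Feature *features, int fn,
                                   const float2 *geo, float2 *map)
    {
        // Flatten the br x bc grid: one block per group of k features.
        int block_id  = blockIdx.y * gridDim.x + blockIdx.x;
        int block_cnt = gridDim.x * gridDim.y;
        int k    = (fn + block_cnt - 1) / block_cnt;   // features per block (assumed form of (1))
        int last = min(block_id * k + k, fn);

        // Each row of threads (fixed threadIdx.y) takes one of the block's features
        // at a time; the threads of that row stride over its coordinates.
        for (int f = block_id * k + threadIdx.y; f < last; f += blockDim.y) {
            int base = features[f].offset;
            for (int j = threadIdx.x; j < features[f].count; j += blockDim.x)
                map[base + j] = project_point(geo[base + j].x, geo[base + j].y);
        }
    }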

  16. Outline Institute of Computing Technology, Chinese Academy of Sciences

  17. Experiment environment • Hardware: • CPU: Intel Core 2 Duo E8500 at 3.18 GHz with 2 GB of main memory • GPU: NVIDIA GeForce 9800 GTX+ graphics card with 512 MB of memory, 128 CUDA cores and 16 multiprocessors • Software: • Microsoft Windows XP Pro SP2 • Microsoft Visual Studio 2005 • NVIDIA driver 2.2, CUDA SDK 2.2 and CUDA Toolkit 2.2 Institute of Computing Technology, Chinese Academy of Sciences

  18. The degree of data parallelism • Total CPU time: • initialization and file-reading time • serial projection time Institute of Computing Technology, Chinese Academy of Sciences

  19. The degree of data parallelism • Total CPU time: • initialization and file-reading time • serial projection time Map projection can achieve more than 90 percent parallelism. Institute of Computing Technology, Chinese Academy of Sciences
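
A back-of-the-envelope check that is not on the slide: with a parallel fraction p and an N-fold speedup of the parallel part, Amdahl's law bounds the overall speedup, and p ≈ 0.9 caps it at roughly 10x, which is of the same order as the 6x to 8x total-time gain reported on the following slides:

    S(N) = \frac{1}{(1 - p) + p/N}, \qquad \lim_{N \to \infty} S(N) = \frac{1}{1 - p} \approx 10 \quad \text{for } p \approx 0.9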

  20. Comparison with the CPU • Block_num = 64, Thread_num = 512 Institute of Computing Technology, Chinese Academy of Sciences

  21. Comparison with the CPU • Total time = map projection time + data transfer time Institute of Computing Technology, Chinese Academy of Sciences
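
The slides do not say how these times were measured; one common way is CUDA events, sketched below around the host path from the earlier sketch (d_geo, d_map, h_geo, h_map, bytes and utm_project_kernel are the assumed names introduced there).

    cudaEvent_t t0, t1, t2, t3;
    cudaEventCreate(&t0); cudaEventCreate(&t1);
    cudaEventCreate(&t2); cudaEventCreate(&t3);

    cudaEventRecord(t0);
    cudaMemcpy(d_geo, h_geo, bytes, cudaMemcpyHostToDevice);     // host -> device transfer
    cudaEventRecord(t1);
    utm_project_kernel<<<64, 512>>>(d_geo, d_map, n);            // map projection
    cudaEventRecord(t2);
    cudaMemcpy(h_map, d_map, bytes, cudaMemcpyDeviceToHost);     // device -> host transfer
    cudaEventRecord(t3);
    cudaEventSynchronize(t3);

    float projection_ms, total_ms;
    cudaEventElapsedTime(&projection_ms, t1, t2);   // map projection time (70x-90x comparison)
    cudaEventElapsedTime(&total_ms, t0, t3);        // projection + transfer time (6x-8x comparison)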

  22. Comparison with the CPU • If we consider the total time, the performance gain is 6x to 8x. Institute of Computing Technology, Chinese Academy of Sciences

  23. Comparison with the CPU • If we compare only the map projection time, we obtain 70x to 90x speedups. Institute of Computing Technology, Chinese Academy of Sciences

  24. The performance of different task assignments • Striped partitioning: • Block_num = 64, Thread_num = 512 • Matrix distribution: • dim_grid(32,32) = 32*32 blocks • dim_block(256,256) = 256*256 threads Institute of Computing Technology, Chinese Academy of Sciences

  25. The performance of different task assignments • Striped partitioning: • Block_num = 64, Thread_num = 512 • Matrix distribution: • dim_grid(32,32) = 32*32 blocks • dim_block(256,256) = 256*256 threads Striped: 6x to 8x; Matrix: 4x to 6x Institute of Computing Technology, Chinese Academy of Sciences
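
For reference, the two launch configurations might be written as below. A 256x256 thread block would exceed the 512-threads-per-block limit of compute-capability-1.1 hardware such as the 9800 GTX+, so dim_block(256,256) on the slide presumably describes the tile of data handled per block rather than the thread-block shape; the block shape used here is therefore an assumption, as is the d_features name.

    // Striped partitioning: 1D launch with Block_num = 64, Thread_num = 512.
    project_striped<<<64, 512>>>(d_features, fn, d_geo, d_map);

    // Matrix distribution: 2D launch (block shape assumed, see note above).
    dim3 dim_grid(32, 32);      // 32*32 blocks
    dim3 dim_block(16, 16);     // 16*16 = 256 threads per block
    project_matrix<<<dim_grid, dim_block>>>(d_features, fn, d_geo, d_map);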

  26. The performance of different task assignments (chart comparing the matrix and striped schemes) Institute of Computing Technology, Chinese Academy of Sciences

  27. The performance of different task assignments • Striped: all threads in the block access consecutive memory. • Matrix: it can only ensure that each row of threads in the block handles consecutive data. Institute of Computing Technology, Chinese Academy of Sciences

  28. Outline Institute of Computing Technology, Chinese Academy of Sciences

  29. Conclusion and future work • Implemented a fast map projection method on CUDA-enabled GPUs • High speedup compared to the CPU-based method • The power of modern GPUs can considerably speed up other geoscience computations: • DEM-based spatial interpolation • raster-based spatial analysis • Future work: • GPU implementations of other GIS applications Institute of Computing Technology, Chinese Academy of Sciences

  30. Thank you! Q &amp; A Yanwei Zhao Institute of Computing Technology Contact: zhaoyanwei@ict.ac.cn Institute of Computing Technology, Chinese Academy of Sciences
