CUDA
Introduction
• 4096 x 4096 double-precision matrix (8 bytes per element)
• 128 MB matrix
• CPU
  • Single thread
• GPU
  • CUDA Toolkit version: 4.0
  • CUDA compute capability: 2.0
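The 128 MB figure follows from the matrix dimensions if each element is an 8-byte double, and the toolkit and compute capability versions can be read back from the runtime. The sketch below checks both. It assumes device 0 is the GPU under test; `matrix_bytes` is a hypothetical helper name, not something from the slides.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Bytes needed for an n x n matrix of double-precision values.
// (Hypothetical helper, introduced only for this illustration.)
constexpr unsigned long long matrix_bytes(unsigned long long n) {
    return n * n * sizeof(double);
}

int main() {
    // 4096 * 4096 elements * 8 bytes = 134217728 bytes = 128 MiB.
    const unsigned long long bytes = matrix_bytes(4096);
    std::printf("matrix size: %llu bytes = %llu MiB\n",
                bytes, bytes / (1024ull * 1024ull));

    // The runtime version is encoded as 1000 * major + 10 * minor.
    int runtime_version = 0;
    if (cudaRuntimeGetVersion(&runtime_version) == cudaSuccess) {
        std::printf("CUDA runtime: %d.%d\n",
                    runtime_version / 1000, (runtime_version % 1000) / 10);
    }

    // Assumption: device 0 is the GPU used for the measurements.
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) == cudaSuccess) {
        std::printf("device 0: %s, compute capability %d.%d\n",
                    prop.name, prop.major, prop.minor);
    }
    return 0;
}
```

The first line of output does not depend on the hardware; the other two report whatever toolkit and device are installed.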
CPU
• Overhead time = input data read + output data write + memory allocation and free + etc.
GPU
• GPU runtime = host ⇔ device memory transfer + GPU operation + GPU device memory allocation and free + GPU device synchronization + etc.
• Overhead time = input data read + output data write + memory allocation and free + etc.
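One way to see the terms of the GPU runtime separately is to time each phase on the host clock. The following is a minimal sketch, not the measurement code behind these slides. The slides do not say which matrix operation was run, so the `scale` kernel is a hypothetical stand-in, and `GpuTimings`, `run_gpu_phases`, `check`, and `ms_since` are names invented for this illustration. It uses `std::chrono`, so it assumes a toolkit with C++11 support.

```cuda
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-in for the "GPU operation": scale every element in place.
__global__ void scale(double* data, size_t n, double factor) {
    // Grid-stride loop: a fixed-size grid covers all n elements, which keeps the
    // launch inside the 65535-block grid limit of compute capability 2.0.
    for (size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x; i < n;
         i += (size_t)gridDim.x * blockDim.x) {
        data[i] *= factor;
    }
}

// Abort with a message if a CUDA runtime call fails.
static void check(cudaError_t err, const char* what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s: %s\n", what, cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

typedef std::chrono::steady_clock Clock;

// Milliseconds elapsed on the host clock since t0.
static double ms_since(Clock::time_point t0) {
    return std::chrono::duration<double, std::milli>(Clock::now() - t0).count();
}

// One field per term of the GPU runtime breakdown.
struct GpuTimings {
    double alloc_ms, h2d_ms, kernel_ms, d2h_ms, free_ms;
    double total_ms() const {
        return alloc_ms + h2d_ms + kernel_ms + d2h_ms + free_ms;
    }
};

// Doubles every element of `host` on the GPU and times each phase.
GpuTimings run_gpu_phases(std::vector<double>& host) {
    const size_t n = host.size();
    const size_t bytes = n * sizeof(double);
    double* dev = NULL;
    GpuTimings t;

    // Create the CUDA context first so its one-time cost is not charged to cudaMalloc.
    check(cudaFree(0), "context init");

    Clock::time_point start = Clock::now();
    check(cudaMalloc((void**)&dev, bytes), "cudaMalloc");
    t.alloc_ms = ms_since(start);

    start = Clock::now();
    check(cudaMemcpy(dev, host.data(), bytes, cudaMemcpyHostToDevice), "copy to device");
    t.h2d_ms = ms_since(start);

    // Kernel launches are asynchronous, so the synchronize call is part of this phase.
    start = Clock::now();
    scale<<<1024, 256>>>(dev, n, 2.0);
    check(cudaGetLastError(), "kernel launch");
    check(cudaDeviceSynchronize(), "cudaDeviceSynchronize");
    t.kernel_ms = ms_since(start);

    start = Clock::now();
    check(cudaMemcpy(host.data(), dev, bytes, cudaMemcpyDeviceToHost), "copy to host");
    t.d2h_ms = ms_since(start);

    start = Clock::now();
    check(cudaFree(dev), "cudaFree");
    t.free_ms = ms_since(start);
    return t;
}

int main() {
    std::vector<double> matrix(4096u * 4096u, 1.0);  // 128 MB of doubles
    GpuTimings t = run_gpu_phases(matrix);
    std::printf("alloc %.3f ms, host->device %.3f ms, operation+sync %.3f ms, "
                "device->host %.3f ms, free %.3f ms, total %.3f ms\n",
                t.alloc_ms, t.h2d_ms, t.kernel_ms, t.d2h_ms, t.free_ms, t.total_ms());
    return 0;
}
```

The overhead term (input read, output write, host allocation) lies outside `run_gpu_phases` and would be timed the same way around the corresponding host code. Context creation is excluded here by the initial `cudaFree(0)` call; whether the original measurements counted it under "etc." is not stated.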