Multi-Core Development Kyle Anderson
Overview • History • Pollack’s Law • Moore’s Law • CPU • GPU • OpenCL • CUDA • Parallelism
History • First 4-bit microprocessor – 1971 • 60,000 instructions per second • 2,300 transistors • First 8-bit microprocessor – 1974 • 290,000 instructions per second • 4,500 transistors • Altair 8800 • First 32-bit microprocessor – 1985 • 275,000 transistors
History • First Pentium processor released – 1993 • 66 MHz • Pentium 4 released – 2000 • 1.5 GHz • 42,000,000 transistors • Clock speeds approached 4 GHz between 2000 and 2005 • Core 2 Duo released – 2006 • 291,000,000 transistors
Pollack’s Law • Processor performance grows roughly with the square root of die area • Doubling a core’s area yields only about 1.4x its performance
Moore’s Law • “The number of transistors incorporated in a chip will approximately double every 24 months.” – Gordon Moore, Intel co-founder • Smaller and smaller transistors
CPU • Sequential • Fully functioning cores • Currently up to 16 cores • Hyperthreading • Low latency
GPU • Higher latency • Thousands of cores • Simple calculations • Used for research
OpenCL • Runs on a multitude of devices • Run-time compilation ensures kernels can use the most up-to-date features of the device • Lock-step execution
OpenCL Data Structures • Host • Device • Compute Units • Work-Group • Work-Item • Command Queue • Kernel • Context
OpenCL Types of Memory • Global • Constant • Local • Private
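The four memory regions correspond directly to address-space qualifiers in OpenCL C kernel source. A hypothetical kernel sketch (names are illustrative, not from the slides) showing where each one appears:

```c
/* OpenCL C kernel source (device code), illustrating the four address spaces. */
__kernel void scale(__global float *data,      /* global: visible to all work-items   */
                    __constant float *coeffs,  /* constant: read-only, cached globally */
                    __local float *scratch)    /* local: shared within one work-group  */
{
    float tmp;                                 /* private: per work-item (the default) */
    size_t i = get_global_id(0);
    scratch[get_local_id(0)] = data[i];
    tmp = scratch[get_local_id(0)] * coeffs[0];
    data[i] = tmp;
}
```

Global memory is the largest and slowest; private registers are the fastest; local memory is the usual tool for sharing intermediate results inside a work-group.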
CUDA • NVIDIA’s proprietary API for its GPUs • Stands for “Compute Unified Device Architecture” • Compiles directly to the hardware • Used by Adobe, Autodesk, National Instruments, Microsoft and Wolfram Mathematica • Faster than OpenCL because it compiles directly for the hardware and targets a single architecture
CUDA Function Call cudaMemcpy( dev_a, a, N * sizeof(int), cudaMemcpyHostToDevice ); cudaMemcpy( dev_b, b, N * sizeof(int), cudaMemcpyHostToDevice ); add<<<N,1>>>( dev_a, dev_b, dev_c );
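The `add<<<N,1>>>` launch above assumes a `__global__` kernel named `add` defined elsewhere; a sketch of what that kernel conventionally looks like for this launch configuration (N blocks of 1 thread each):

```cuda
/* Device kernel assumed by the add<<<N,1>>> launch: one block per element. */
__global__ void add(int *a, int *b, int *c) {
    int i = blockIdx.x;   /* with <<<N,1>>>, the block index selects the element */
    c[i] = a[i] + b[i];
}
```

The `<<<blocks, threadsPerBlock>>>` syntax is the CUDA execution configuration; `cudaMemcpy` with `cudaMemcpyHostToDevice` copies the input arrays into GPU memory before the launch.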
Types of Parallelism • SISD • SIMD • MISD • MIMD • Instruction parallelism • Task parallelism • Data parallelism
SISD • Stands for Single Instruction, Single Data • Does not use multiple cores
SIMD • Stands for “Single Instruction, Multiple Data Streams” • Can process multiple data streams concurrently
MISD • Stands for “Multiple Instruction, Single Data” • Risky because several instructions are processing the same data
MIMD • Stands for “Multiple Instruction, Multiple Data” • Each core executes its own instruction stream on its own data, independently of the others
Instruction Parallelism • Independent (mutually exclusive) instructions • Often exploited by MIMD and MISD • Allows multiple instructions to run at once • Instructions here are individual operations • Not done programmatically; exploited by • Hardware • Compiler
Task Parallelism • Divides the program into distinct tasks or control flows • Runs multiple threads concurrently
Data Parallelism • Used by SIMD and MIMD • The same list of instructions works concurrently on several data sets