Multimedia
New Architecture Direction • “… media processing will become the dominant force in computer architecture and microprocessor design” • “… new media-rich applications … involve significant real-time processing of continuous media streams and make heavy use of vectors of packed 8-, 16-, and 32-bit integer and f.p.” • “How Multimedia Workloads Will Change Processor Design,” Diefendorff & Dubey, IEEE Computer (9/97) • Needs include high memory bandwidth, high network bandwidth, continuous media data types, real-time response, fine-grain parallelism • Also significant focus on system bus performance • Common bridge to the memory system and I/O • Critical performance component for SMP server platforms
Multimedia Workloads • Multimedia • Video conferencing • Video authoring • Animation • Games • Algorithms • Image compression (JPEG) • Video compression (MPEG) • 3-D graphics • Encryption
Multimedia Characteristics • Real-time response • Video, audio • Continuous media data types • 8-16 bits sufficient for many applications • Data parallelism • E.g. apply the same operation to the whole image • Vector or SIMD work well here • Coarse-grained parallelism • E.g. video encoding/decoding, audio encoding/decoding • Small loops • Most time spent in kernels • Amenable to hand-optimization • High memory bandwidth • Video, 3-D graphics • Caches not large enough
Multimedia ISA Extensions • HP PA-RISC • MAX-2 • Sun SPARC • VIS • Intel x86 • MMX • MIPS • MDMX • PowerPC • AltiVec
MMX • “MMX Technology Extension to the Intel Architecture” Alex Peleg and Uri Weiser, IEEE Micro, August 1996 • Goals • Improve performance of multimedia applications • Graphics, MPEG video • Image processing, speech recognition • Remain completely compatible with Intel x86 ISA • Minimize cost • Approach • Use packed data types • Exploit SIMD parallelism • Make use of existing wide data paths
Data Types and Operands • Three fixed-point integer types packed into a 64-bit quadword • Packed Byte: 8 8-bit bytes • Packed Word: 4 16-bit words • Packed Doubleword: 2 32-bit doublewords • User-controlled fixed point • Eight 64-bit GP registers (mm0-mm7) • MMX shares the FPU registers • Can’t do FP and MMX at the same time • Random access to registers • Lesson learned from the stack-based FP unit design
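The three packed layouts above can be pictured as different ways of slicing the same 64 bits. The following is a minimal sketch in portable C, not MMX code: a `uint64_t` stands in for an MMX register, lane 0 is taken to be the least significant lane, and the helper names (`get_byte`, `get_word`, `get_dword`, `pack_words`) are hypothetical, introduced only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: view one 64-bit quadword as packed lanes.
   Lane 0 is the least significant lane. */

/* Packed Byte: 8 lanes of 8 bits. */
static uint8_t get_byte(uint64_t q, int i)
{
    assert(i >= 0 && i < 8);
    return (uint8_t)(q >> (8 * i));
}

/* Packed Word: 4 lanes of 16 bits. */
static uint16_t get_word(uint64_t q, int i)
{
    assert(i >= 0 && i < 4);
    return (uint16_t)(q >> (16 * i));
}

/* Packed Doubleword: 2 lanes of 32 bits. */
static uint32_t get_dword(uint64_t q, int i)
{
    assert(i >= 0 && i < 2);
    return (uint32_t)(q >> (32 * i));
}

/* Pack four 16-bit words into one quadword, w0 in the low lane. */
static uint64_t pack_words(uint16_t w3, uint16_t w2, uint16_t w1, uint16_t w0)
{
    return ((uint64_t)w3 << 48) | ((uint64_t)w2 << 32) |
           ((uint64_t)w1 << 16) | (uint64_t)w0;
}
```

For instance, `pack_words(0x1122, 0x3344, 0x5566, 0x7788)` yields `0x1122334455667788`, whose byte lane 0 is `0x88` and whose doubleword lane 1 is `0x11223344`.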
MMX Operations • 57 MMX instructions work on all data types • Support for saturation arithmetic • Simplifies handling of underflow and overflow • Matches physical behavior • Packed operations • Addition/subtraction, multiplication, compares, shifts • Conversion operations • Pack/unpack • Performance improvement • Fewer loads and stores • Fewer arithmetic operations, but more conversion
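To make the saturation bullet concrete, here is a rough sketch in portable C that contrasts wraparound and unsigned saturating addition on eight packed bytes. It emulates the lane-wise behavior in software rather than using real MMX instructions; the function names (`lane8`, `add_bytes_wrap`, `add_bytes_sat`) are hypothetical and a `uint64_t` again stands in for a 64-bit register.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: extract byte lane i (lane 0 = least significant). */
static uint8_t lane8(uint64_t q, int i)
{
    assert(i >= 0 && i < 8);
    return (uint8_t)(q >> (8 * i));
}

/* Wraparound add: each byte lane is added modulo 256.
   Carries never propagate into the neighboring lane. */
static uint64_t add_bytes_wrap(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        uint8_t sum = (uint8_t)(lane8(a, i) + lane8(b, i));
        r |= (uint64_t)sum << (8 * i);
    }
    return r;
}

/* Unsigned saturating add: each byte lane clamps at 255 instead of wrapping. */
static uint64_t add_bytes_sat(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        unsigned sum = (unsigned)lane8(a, i) + lane8(b, i);
        if (sum > 255)
            sum = 255;
        r |= (uint64_t)sum << (8 * i);
    }
    return r;
}
```

Adding pixel values `0xF0` and `0x20` wraps to `0x10` (a bright pixel turns dark) but saturates to `0xFF` (the pixel stays at full brightness), which is the sense in which saturation matches physical behavior.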
MMX Operations • Packed multiply-add, word to doubleword • Inputs: [A3 A2 A1 A0] and [B3 B2 B1 B0] (four 16-bit words each) • Products: A3×B3, A2×B2, A1×B1, A0×B0 • Result: [A3×B3 + A2×B2 | A1×B1 + A0×B0] (two 32-bit doublewords) • Packed compare greater-than, word • [51 3 5 23] > [73 2 5 6] • Result: [00…0 11…1 00…0 11…1]
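The two operations on this slide can be sketched as scalar C loops over the lanes. This is a software emulation under stated assumptions, not the hardware definition: arrays are indexed with lane 0 as the rightmost element of the slide's diagram, the function names (`packed_madd`, `packed_cmpgt_words`) are hypothetical, and the one overflow corner case of multiply-add (all four inputs equal to -32768) is rejected by an assertion rather than modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Packed multiply-add, word to doubleword: multiply four signed 16-bit
   lanes pairwise, then add adjacent products into two 32-bit results. */
static void packed_madd(const int16_t a[4], const int16_t b[4], int32_t out[2])
{
    for (int i = 0; i < 2; i++) {
        int64_t s = (int64_t)a[2 * i] * b[2 * i] +
                    (int64_t)a[2 * i + 1] * b[2 * i + 1];
        /* Only overflows when all four inputs are -32768; not modeled here. */
        assert(s <= INT32_MAX);
        out[i] = (int32_t)s;
    }
}

/* Packed compare greater-than, word: each lane becomes all ones if
   a[i] > b[i], otherwise all zeros. */
static void packed_cmpgt_words(const int16_t a[4], const int16_t b[4],
                               uint16_t mask[4])
{
    for (int i = 0; i < 4; i++)
        mask[i] = (a[i] > b[i]) ? 0xFFFF : 0x0000;
}
```

With the slide's compare operands, lanes 0 and 2 (23 > 6 and 3 > 2) produce all ones, while lanes 1 and 3 (5 > 5 and 51 > 73) produce all zeros. The resulting mask is what makes branch-free selection possible in later steps.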
Using MMX • Assembly language coding • Use of libraries • E.g. IDCT, DCT, matrix multiply… • Use of C macros (“intrinsics”) • Generate optimized assembly code • Performs register allocation and instruction scheduling • MMX64 t0, t1; t0 = padd(t0, t1); • Requires intimate knowledge of MMX • Could a compiler generate MMX code?
Chroma Keying • Weatherman example • for (i = 0; i < imagesize; i++) new_image[i] = (x[i] == blue) ? y[i] : x[i]; • MMX version (mm1 is assumed to already hold the blue key value in every byte):
movq mm3, mem1 ; load 8 pixels from weatherman
movq mm4, mem2 ; load 8 pixels from map
pcmpeqb mm1, mm3 ; generate select mask
pand mm4, mm1 ; AND map with mask
pandn mm1, mm3 ; AND weatherman with inverse mask
por mm4, mm1 ; OR masked images together
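The compare, mask, and merge sequence above can be mirrored in portable C, eight 8-bit pixels at a time. This is a sketch under assumptions rather than the slide author's code: a `uint64_t` stands in for an MMX register, the byte compare is emulated with a loop, and the names `cmpeq_bytes` and `chroma_key` are hypothetical. The pixel count is assumed to be a multiple of 8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Emulates a packed byte equality compare: each byte lane of the result
   is 0xFF where the lanes of a and b are equal, 0x00 otherwise. */
static uint64_t cmpeq_bytes(uint64_t a, uint64_t b)
{
    uint64_t m = 0;
    for (int i = 0; i < 8; i++) {
        uint8_t x = (uint8_t)(a >> (8 * i));
        uint8_t y = (uint8_t)(b >> (8 * i));
        if (x == y)
            m |= (uint64_t)0xFF << (8 * i);
    }
    return m;
}

/* Replace every foreground pixel equal to key with the background pixel.
   n is the pixel count and must be a multiple of 8 in this sketch. */
static void chroma_key(const uint8_t *fg, const uint8_t *bg, uint8_t *out,
                       size_t n, uint8_t key)
{
    assert(n % 8 == 0);
    /* Broadcast the key value into all eight byte lanes. */
    uint64_t key8 = 0x0101010101010101ULL * key;
    for (size_t i = 0; i < n; i += 8) {
        uint64_t f, b;
        memcpy(&f, fg + i, 8);                  /* load 8 foreground pixels */
        memcpy(&b, bg + i, 8);                  /* load 8 background pixels */
        uint64_t mask = cmpeq_bytes(key8, f);   /* generate select mask */
        uint64_t r = (b & mask) | (~mask & f);  /* merge without branching */
        memcpy(out + i, &r, 8);
    }
}
```

The point of the design is that the per-pixel conditional in the scalar loop disappears: the comparison produces a mask, and two ANDs plus an OR select between the images for all eight pixels at once.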