Dive into MMX technology featuring new data types, arithmetic models, instructions, and enhanced CPUID extensions. Learn about SIMD execution models, cooperation with FPU, and further advancements like SSE extensions.
MMX: MultiMedia eXtensions
Starting with the Pentium MMX and Pentium II
TUC-N dr. Emil CEBUC
Outline
• Overview
• MMX programming environment
• Data types
• SIMD execution model
• New arithmetic
• MMX Instructions
• Cooperation with FPU
• Further Enhancements
Overview
• Eight new 64-bit data registers, called MMX registers
• Three new packed data types:
    • 64-bit packed byte integers (signed and unsigned)
    • 64-bit packed word integers (signed and unsigned)
    • 64-bit packed doubleword integers (signed and unsigned)
• Instructions that support the new data types and handle MMX state management
• Extensions to the CPUID instruction
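The CPUID extension mentioned above amounts to new feature-flag bits. The slide does not list them; in Intel's documentation, CPUID leaf 01H reports MMX in EDX bit 23, SSE in EDX bit 25, SSE2 in EDX bit 26, SSE3 in ECX bit 0 and SSSE3 in ECX bit 9. The sketch below only decodes register values passed in as integers (Python cannot execute CPUID itself); the function name and the sample values are hypothetical.

```python
# Bit positions of SIMD feature flags in CPUID leaf 01H.
EDX_FLAGS = {"MMX": 23, "SSE": 25, "SSE2": 26}
ECX_FLAGS = {"SSE3": 0, "SSSE3": 9}

def decode_simd_features(edx, ecx):
    """Return the names of the SIMD extensions whose flag bit is set."""
    names = [n for n, bit in EDX_FLAGS.items() if (edx >> bit) & 1]
    names += [n for n, bit in ECX_FLAGS.items() if (ecx >> bit) & 1]
    return names

# Hypothetical register contents, made up for this illustration.
example_edx = (1 << 23) | (1 << 25)
example_ecx = 0
print(decode_simd_features(example_edx, example_ecx))  # ['MMX', 'SSE']
```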
MMX programming environment (figure)
MMX Registers (figure)
Data Types
• 64-bit packed byte integers: eight packed bytes
• 64-bit packed word integers: four packed words
• 64-bit packed doubleword integers: two packed doublewords
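The three packed types are three ways of reading the same 64 bits. As a rough illustration, the following Python sketch splits a 64-bit integer into lanes and joins them again; `unpack_lanes` and `pack_lanes` are hypothetical helper names, and the lowest-order lane is listed first.

```python
def unpack_lanes(value, lane_bits):
    """Split a 64-bit integer into lanes of lane_bits each, lowest lane first."""
    mask = (1 << lane_bits) - 1
    return [(value >> shift) & mask for shift in range(0, 64, lane_bits)]

def pack_lanes(lanes, lane_bits):
    """Join lanes (lowest first) back into one 64-bit integer."""
    mask = (1 << lane_bits) - 1
    value = 0
    for index, lane in enumerate(lanes):
        value |= (lane & mask) << (index * lane_bits)
    return value

q = 0x0807060504030201
print(len(unpack_lanes(q, 8)))    # 8 packed bytes
print(len(unpack_lanes(q, 16)))   # 4 packed words
print(len(unpack_lanes(q, 32)))   # 2 packed doublewords
```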
SIMD Execution Model
• MMX instructions move 64-bit packed data types (packed bytes, packed words, or packed doublewords) and the quadword data type between MMX registers and memory or between MMX registers in 64-bit blocks
• However, when performing arithmetic or logical operations on the packed data types, MMX instructions operate in parallel on the individual bytes, words, or doublewords contained in MMX registers
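To make the difference between a 64-bit move and a lane-wise operation concrete, here is a scalar Python model of a packed byte add in the style of PADDB. It is a sketch of the semantics, not of how the hardware works; the function name `paddb` is just borrowed from the mnemonic.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

def paddb(a, b):
    """Lane-wise add of eight packed bytes; carries never cross a lane boundary."""
    result = 0
    for shift in range(0, 64, 8):
        lane_sum = ((a >> shift) & 0xFF) + ((b >> shift) & 0xFF)
        result |= (lane_sum & 0xFF) << shift
    return result

a = 0x00000000000000FF
b = 0x0000000000000001
packed = paddb(a, b)       # low byte wraps to 00H, the neighbouring byte is untouched
plain = (a + b) & MASK64   # an ordinary 64-bit add carries into the next byte
print(hex(packed), hex(plain))  # 0x0 0x100
```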
SIMD Execution Model (figure)
New arithmetic: Wraparound
• Wraparound arithmetic
    • With wraparound arithmetic, a true out-of-range result is truncated (that is, the carry or overflow bit is ignored and only the least significant bits of the result are returned to the destination)
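Truncation to the low bits can be written as a single mask. A minimal sketch for one lane, with hypothetical helper names:

```python
def wrap_add(a, b, bits):
    """Add two lane values and keep only the low `bits` bits (carry discarded)."""
    return (a + b) & ((1 << bits) - 1)

def as_signed(value, bits):
    """Reinterpret an unsigned lane value as a two's-complement signed integer."""
    return value - (1 << bits) if value & (1 << (bits - 1)) else value

print(wrap_add(200, 100, 8))                    # 44, because 300 mod 256 = 44
print(hex(wrap_add(0x7FFF, 1, 16)))             # 0x8000
print(as_signed(wrap_add(0x7FFF, 1, 16), 16))   # -32768: the largest word wraps to the smallest
```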
New arithmetic: Signed saturation
• Signed saturation arithmetic
    • With signed saturation arithmetic, out-of-range results are limited to the representable range of signed integers for the integer size being operated on
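In code, signed saturation is a clamp to the two's-complement range of the lane. The sketch below handles one lane at a time; `sat_add_signed` is a hypothetical name.

```python
def sat_add_signed(a, b, bits):
    """Add two signed lane values and clamp the result to the signed range."""
    lowest = -(1 << (bits - 1))
    highest = (1 << (bits - 1)) - 1
    return max(lowest, min(highest, a + b))

print(sat_add_signed(100, 100, 8))      # 127 (7FH), not 200
print(sat_add_signed(-100, -100, 8))    # -128 (80H)
print(sat_add_signed(32000, 1000, 16))  # 32767 (7FFFH)
```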
New arithmetic: Unsigned saturation
• Unsigned saturation arithmetic
    • With unsigned saturation arithmetic, out-of-range results are limited to the representable range of unsigned integers for the integer size. So, positive overflow when operating on unsigned byte integers results in FFH being returned and negative overflow results in 00H being returned
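The unsigned case clamps to zero at the bottom and to all ones at the top. A one-lane sketch with hypothetical function names:

```python
def sat_add_unsigned(a, b, bits):
    """Add two unsigned lane values; positive overflow returns the maximum."""
    return min(a + b, (1 << bits) - 1)

def sat_sub_unsigned(a, b, bits):
    """Subtract two unsigned lane values; negative overflow returns zero."""
    return max(a - b, 0)

print(hex(sat_add_unsigned(200, 100, 8)))  # 0xff
print(hex(sat_sub_unsigned(50, 100, 8)))   # 0x0
```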
New arithmetic: Saturation ranges
• Saturation arithmetic provides an answer for many overflow situations. For example, in color calculations, saturation causes a color to remain pure black or pure white without allowing inversion
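The saturation limits follow directly from the lane width, and the color remark is easy to check numerically. The sketch below computes the ranges and brightens a made-up RGB pixel both ways; all names and the pixel values are hypothetical.

```python
def saturation_range(bits, signed):
    """Return the (lower, upper) saturation limits for a lane of `bits` bits."""
    if signed:
        return (-(1 << (bits - 1)), (1 << (bits - 1)) - 1)
    return (0, (1 << bits) - 1)

def brighten_wrap(channel, amount):
    """Unsigned byte add with wraparound."""
    return (channel + amount) & 0xFF

def brighten_saturate(channel, amount):
    """Unsigned byte add with saturation."""
    return min(channel + amount, 0xFF)

pixel = (250, 128, 10)  # hypothetical RGB values
wrapped = tuple(brighten_wrap(c, 20) for c in pixel)
saturated = tuple(brighten_saturate(c, 20) for c in pixel)
print(wrapped)    # (14, 148, 30): the bright red channel inverts to nearly black
print(saturated)  # (255, 148, 30): the red channel stays at full intensity
```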
MMX Instructions
• The MMX instruction set consists of 47 instructions, grouped into the following categories:
    • Data transfer
    • Arithmetic
    • Comparison
    • Conversion
    • Unpacking
    • Logical
    • Shift
    • Empty MMX state instruction (EMMS)
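The conversion and unpacking categories are the least self-explanatory, so here is a scalar sketch of one instruction from each, as an illustration only. It models PACKSSWB (narrow signed words to signed bytes with signed saturation) and PUNPCKLBW (interleave the low-order bytes of two operands). Operands are represented as Python lists with the lowest lane first, which is an assumption of this sketch rather than anything on the slide.

```python
def sat8(value):
    """Clamp an integer to the signed byte range."""
    return max(-128, min(127, value))

def packsswb(dest_words, src_words):
    """Narrow 4 + 4 signed words to 8 signed bytes; dest fills the low half."""
    return [sat8(w) for w in dest_words] + [sat8(w) for w in src_words]

def punpcklbw(dest_bytes, src_bytes):
    """Interleave the four low-order bytes of dest and src, dest byte first."""
    out = []
    for d, s in zip(dest_bytes[:4], src_bytes[:4]):
        out += [d, s]
    return out

print(packsswb([1, 300, -300, -1], [127, 128, -128, -129]))
print(punpcklbw([0, 1, 2, 3, 4, 5, 6, 7], [10, 11, 12, 13, 14, 15, 16, 17]))
```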
MMX Instruction set summary (tables)
PMADDWD (figure)
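Since the figure for this slide is missing, a scalar model may help. PMADDWD multiplies the four signed words of the destination by the corresponding signed words of the source, giving four 32-bit products, then adds adjacent pairs of products to produce two doublewords. The Python below is a sketch of that behavior on 64-bit integers; the helper `to_signed16` is hypothetical.

```python
def to_signed16(value):
    """Reinterpret a 16-bit lane as a two's-complement signed integer."""
    return value - 0x10000 if value & 0x8000 else value

def pmaddwd(a, b):
    """Multiply four signed word pairs and add adjacent products into two doublewords."""
    wa = [to_signed16((a >> s) & 0xFFFF) for s in range(0, 64, 16)]
    wb = [to_signed16((b >> s) & 0xFFFF) for s in range(0, 64, 16)]
    products = [x * y for x, y in zip(wa, wb)]
    low = (products[0] + products[1]) & 0xFFFFFFFF
    high = (products[2] + products[3]) & 0xFFFFFFFF
    return (high << 32) | low

a = 0x0004000300020001  # words 1, 2, 3, 4 (lowest first)
b = 0x0008000700060005  # words 5, 6, 7, 8
print(hex(pmaddwd(a, b)))  # 0x3500000011: low = 1*5 + 2*6 = 17, high = 3*7 + 4*8 = 53
```

This pairing of multiply and add is what makes the instruction useful for dot products: each doubleword already holds a partial sum.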
Cooperation with FPU
• Applications can contain both x87 FPU floating-point and MMX instructions. However, because the MMX registers are aliased to the x87 FPU register stack, care must be taken when making transitions between x87 FPU instructions and MMX instructions
• When an MMX instruction (other than the EMMS instruction) is executed, the processor changes the x87 FPU state as follows:
    • The TOS (top of stack) value of the x87 FPU status word is set to 0
    • The entire x87 FPU tag word is set to the valid state (00B in all tag fields)
    • When an MMX instruction writes to an MMX register, it writes ones (11B) to the exponent part of the corresponding floating-point register (bits 64 through 79)
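The state changes listed above can be modelled with a small class. This is a toy model, not an emulator: the class and method names are hypothetical, the initial TOS value is arbitrary, and the EMMS behavior (setting every tag to empty, 11B) comes from Intel's documentation rather than from the slide.

```python
MASK64 = (1 << 64) - 1

class X87MmxState:
    """Toy model of the x87 state that MMX instructions share."""

    def __init__(self):
        self.tos = 3               # arbitrary value left behind by earlier x87 code
        self.tag_word = 0xFFFF     # 11B in every tag field: all registers empty
        self.regs = [0] * 8        # eight 80-bit physical registers

    def _mmx_side_effects(self):
        self.tos = 0               # TOS is reset to 0
        self.tag_word = 0x0000     # every tag becomes valid (00B)

    def mmx_write(self, index, value64):
        self._mmx_side_effects()
        # Bits 64..79 (sign and exponent) are filled with ones.
        self.regs[index] = (0xFFFF << 64) | (value64 & MASK64)

    def mmx_read(self, index):
        self._mmx_side_effects()
        return self.regs[index] & MASK64

    def emms(self):
        self.tag_word = 0xFFFF     # mark all registers empty again

state = X87MmxState()
state.mmx_write(0, 0x1234)
print(state.tos, hex(state.tag_word), hex(state.regs[0] >> 64))  # 0 0x0 0xffff
state.emms()
print(hex(state.tag_word))  # 0xffff
```

The model suggests why EMMS matters before returning to x87 code: with every tag marked valid, the x87 stack looks full, so a subsequent floating-point load would find no free register.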
Further Enhancements
• Streaming SIMD Extensions (SSE) were introduced into the IA-32 architecture in the Pentium III processor family. They add:
    • Eight 128-bit data registers (called XMM registers) in non-64-bit modes; sixteen XMM registers are available in 64-bit mode
    • The 32-bit MXCSR register, which provides control and status bits for operations performed on XMM registers
SSE
• The 128-bit packed single-precision floating-point data type (four IEEE single-precision floating-point values packed into a double quadword)
• Instructions that perform SIMD operations on single-precision floating-point values and that extend the SIMD operations that can be performed on integers:
    • 128-bit packed and scalar single-precision floating-point instructions that operate on data located in XMM registers
    • 64-bit SIMD integer instructions that support additional operations on packed integer operands located in MMX registers
    • Instructions that save and restore the state of the MXCSR register
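A rough way to picture the packed single-precision type, and the difference between packed and scalar instructions, is the following Python sketch. It uses the standard `struct` module to round to single precision; `addps` and `addss` are named after the SSE mnemonics, but they are only scalar models operating on lists with the lowest lane first.

```python
import struct

def to_f32(x):
    """Round a Python float to the nearest IEEE single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def addps(a, b):
    """Packed add: all four single-precision lanes are added."""
    return [to_f32(x + y) for x, y in zip(a, b)]

def addss(a, b):
    """Scalar add: only the lowest lane is added, the rest of `a` passes through."""
    return [to_f32(a[0] + b[0])] + list(a[1:])

a = [1.0, 2.0, 3.0, 4.0]
b = [0.5, 0.5, 0.5, 0.5]
print(addps(a, b))                   # [1.5, 2.5, 3.5, 4.5]
print(addss(a, b))                   # [1.5, 2.0, 3.0, 4.0]
print(len(struct.pack("<4f", *a)))   # 16 bytes, i.e. one 128-bit double quadword
```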
SSE2: Pentium 4 and Intel Xeon processors
• Support for packed double-precision floating-point values and for 128-bit packed integers
• Five data types:
    • 128-bit packed double-precision floating-point (two IEEE Standard 754 double-precision floating-point values packed into a double quadword)
    • 128-bit packed byte integers
    • 128-bit packed word integers
    • 128-bit packed doubleword integers
    • 128-bit packed quadword integers
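The five data types are five views of the same 16 bytes. The sketch below reinterprets one arbitrary 16-byte value with the standard `struct` module, assuming little-endian layout as on IA-32; the dictionary keys are informal labels.

```python
import struct

raw = bytes(range(16))  # 16 arbitrary bytes: 00H, 01H, ..., 0FH

views = {
    "packed doubles": struct.unpack("<2d", raw),
    "packed bytes": struct.unpack("<16B", raw),
    "packed words": struct.unpack("<8H", raw),
    "packed doublewords": struct.unpack("<4I", raw),
    "packed quadwords": struct.unpack("<2Q", raw),
}

for name, lanes in views.items():
    print(name, len(lanes))  # 2, 16, 8, 4 and 2 lanes respectively
```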
SSE2
• Flexibility is provided by instructions that operate on single (scalar) double-precision floating-point values located in the low quadword of an XMM register
• Greater throughput when performing SIMD operations on 128-bit packed integers
• This capability is particularly useful for applications such as RSA authentication and RC5 encryption
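The scalar form can be contrasted with the packed form in a few lines. In this sketch an XMM register is a (low, high) pair of Python floats, which are already IEEE double precision; `addsd` and `addpd` borrow the SSE2 mnemonics but are only illustrative models.

```python
def addsd(dest, src):
    """Scalar double add: only the low quadword is computed, the high one is kept."""
    return (dest[0] + src[0], dest[1])

def addpd(dest, src):
    """Packed double add: both quadwords are computed."""
    return (dest[0] + src[0], dest[1] + src[1])

x = (1.5, 10.0)
y = (2.0, 20.0)
print(addsd(x, y))  # (3.5, 10.0)
print(addpd(x, y))  # (3.5, 30.0)
```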
SSE2: Data types (figure)
SSE2: Instructions
• Packed and scalar double-precision floating-point instructions
• 64-bit and 128-bit SIMD integer instructions
• 128-bit extensions of SIMD integer instructions introduced with the MMX technology and the SSE extensions
• Cacheability-control and instruction-ordering instructions
SSE: Scalar instructions (figure)
SSE3, SSSE3
• The Pentium 4 processor supporting Hyper-Threading Technology introduced Streaming SIMD Extensions 3 (SSE3)
• The Intel Xeon processor 5100 series and the Intel Core 2 processor families introduced Supplemental Streaming SIMD Extensions 3 (SSSE3)
Asymmetric Processing (figure)
Horizontal Processing (figure)
SSE3 Instructions
• x87 FPU instruction
    • One instruction that improves x87 FPU floating-point to integer conversion (FISTTP)
• SIMD integer instruction
    • One instruction that provides a specialized 128-bit unaligned data load (LDDQU)
• SIMD floating-point instructions
    • Three instructions that enhance LOAD/MOVE/DUPLICATE performance (MOVSHDUP, MOVSLDUP, MOVDDUP)
    • Two instructions that provide packed addition/subtraction (ADDSUBPS, ADDSUBPD)
    • Four instructions that provide horizontal addition/subtraction (HADDPS, HSUBPS, HADDPD, HSUBPD)
• Thread synchronization instructions
    • Two instructions that improve synchronization between multi-threaded agents (MONITOR, MWAIT)
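The asymmetric (packed addition/subtraction) and horizontal instructions are easiest to grasp from their lane patterns. Below is a scalar sketch of ADDSUBPS and HADDPS on four-element lists, lowest lane first. It uses plain Python floats and values that are exact in single precision, so rounding is not modelled.

```python
def addsubps(a, b):
    """Asymmetric processing: subtract in even lanes, add in odd lanes."""
    return [a[0] - b[0], a[1] + b[1], a[2] - b[2], a[3] + b[3]]

def haddps(a, b):
    """Horizontal processing: add adjacent lanes within each operand."""
    return [a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3]]

a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]
print(addsubps(a, b))  # [-9.0, 22.0, -27.0, 44.0]
print(haddps(a, b))    # [3.0, 7.0, 30.0, 70.0]
```

The add/subtract pattern matches the real and imaginary parts of a complex multiplication, which is one common use of the asymmetric form.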
SSSE3: Instructions
• Twelve instructions that perform horizontal addition or subtraction operations (PHADDW, PHADDSW, PHADDD, PHSUBW, PHSUBSW, PHSUBD)
• Six instructions that evaluate absolute values (PABSB, PABSW, PABSD)
• Two instructions that perform multiply and add operations and speed up the evaluation of dot products (PMADDUBSW)
• Two instructions that accelerate packed-integer multiply operations and produce integer values with scaling (PMULHRSW)
• Two instructions that perform a byte-wise, in-place shuffle according to the second (shuffle control) operand (PSHUFB)
• Six instructions that negate packed integers in the destination operand if the corresponding element in the source operand is less than zero (PSIGNB, PSIGNW, PSIGND)
• Two instructions that align data from the composite of two operands (PALIGNR)
• Each mnemonic exists in a 64-bit MMX form and a 128-bit XMM form, which accounts for the counts above
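Two of these groups, the byte shuffle and the conditional negate, can be sketched in scalar Python as follows. The models assume the 128-bit forms and represent operands as 16-element lists with the lowest byte first. The details that a control byte with its top bit set produces zero, and that a zero source element zeroes the destination element, come from Intel's instruction reference rather than from the slide.

```python
def pshufb(dest, control):
    """Byte shuffle: each control byte selects a source byte, or zero if bit 7 is set."""
    return [0 if c & 0x80 else dest[c & 0x0F] for c in control]

def wrap8(value):
    """Wrap an integer into the signed byte range (so -(-128) stays -128)."""
    return ((value + 128) & 0xFF) - 128

def psignb(dest, src):
    """Negate, zero, or keep each signed byte of dest according to the sign of src."""
    out = []
    for d, s in zip(dest, src):
        if s < 0:
            out.append(wrap8(-d))
        elif s == 0:
            out.append(0)
        else:
            out.append(d)
    return out

data = list(range(16))
reverse = list(range(15, -1, -1))
print(pshufb(data, reverse))                          # bytes in reversed order
print(psignb([5, -6, 7, -128], [-1, -1, 0, -1]))      # [-5, 6, 0, -128]
```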