SSE2 with a focus on floating point
Supported data types • For floating point (i.e., real numbers), MASM supports: • real4 • single precision; IEEE standard; analogous to float • real8 • double precision; IEEE standard; analogous to double • real10 • double extended precision • Not IEEE standard • NaN = Not a Number (see p. 4-14 of v1)
IEEE Standard 754 • SSE2 supports 32- and 64-bit f.p. data • x87 supports 32-, 64-, and 80-bit f.p. data
Note: These are 24-bit binary numbers (the width of a single-precision significand). Here they are in base 10: 2.00000000000000 and 1.99999988079071
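The second value in the note is the largest single-precision number below 2: its significand is 1.11…1 in binary (24 ones), i.e. 2 − 2^-23. A minimal C sketch can confirm this, assuming C's float is the IEEE 754 single-precision format (MASM's real4), as it is on mainstream compilers. The helper name float_below is hypothetical, and it only handles positive, finite, nonzero inputs. FLT_MANT_DIG and DBL_MANT_DIG from <float.h> report the significand widths (24 and 53 bits).

```c
#include <float.h>   /* FLT_MANT_DIG, DBL_MANT_DIG */
#include <stdint.h>
#include <stdio.h>   /* printf/snprintf for printing in base 10 */
#include <string.h>  /* memcpy */

/* Hypothetical helper: return the largest float strictly below x by
   decrementing its 32-bit IEEE 754 pattern by one unit in the last place.
   Assumes x is positive, finite, and nonzero. */
float float_below(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);  /* reinterpret the float's bit pattern */
    bits -= 1;                       /* step one ulp toward zero */
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

For example, printf("%.14f\n", (double)float_below(2.0f)) prints 1.99999988079071, and printing 2.0f the same way gives 2.00000000000000.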
SSE2 • SSE2 = Streaming SIMD Extensions 2 • SIMD = Single Instruction, Multiple Data • SSE2 was introduced in 2000 on the Pentium 4 and Intel Xeon processors.
History of SSE • 1996 Intel MMX • 1998 AMD 3DNow! • 1999 Intel SSE on P3 • 2000 Intel SSE2 on P4 • 2003 Intel SSE3 (since Prescott P4) • 2006 Intel Supplemental SSE3 (since Woodcrest Xeons) • 2006 Intel SSE4 (4.1 and 4.2) • 2007 AMD SSE5 (proposed 2007; implemented in 2011 as XOP and FMA4) • 2008 Intel AVX (proposed 2008, implemented 2011 in Intel Sandy Bridge and AMD Bulldozer) • AVX widens the 128-bit XMM registers to 256 bits; the widened registers are called YMM.
SSE2 and MASM • You must use MASM v6.15 or newer for SIMD support. (MASM v6.15 is available from the course software web page.) • You must enable MASM support for these instructions with the following:
    .686                  ;instructions for Pentium Pro (or better)
    .xmm                  ;allow simd instructions
    .model flat, stdcall  ;no crazy segments!
SSE2 • Each of the eight 128-bit registers (xmm0 through xmm7) can hold: • 16 packed 1-byte integers • 8 packed word (2-byte) integers • 4 packed doubleword (4-byte) integers • 2 packed quadword (8-byte) integers • 1 double quadword (16 bytes) • 4 packed single-precision (4 bytes each) floating-point values • 2 packed double-precision (8 bytes each) floating-point values
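One portable way to picture these formats is a C union laid over the same 16 bytes. The type name xmm_view is hypothetical and only models the register layout (no SSE2 instructions are involved); the union as a whole stands in for the double quadword.

```c
#include <stdint.h>

/* Hypothetical model of one 128-bit xmm register: every member views the
   same 16 bytes, mirroring the packed formats listed above. */
typedef union {
    uint8_t  b[16];  /* 16 packed 1-byte integers */
    uint16_t w[8];   /* 8 packed words */
    uint32_t d[4];   /* 4 packed doublewords */
    uint64_t q[2];   /* 2 packed quadwords */
    float    ps[4];  /* 4 packed single-precision values */
    double   pd[2];  /* 2 packed double-precision values */
} xmm_view;
```

Writing r.pd[0] = 1.0 and then reading r.q[0] yields 0x3FF0000000000000, the IEEE 754 bit pattern of 1.0; in C, reading a union member other than the one last written reinterprets the bytes.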
IA32 Registers: • 8 32-bit GPRs • Integer only • 8 80-bit fp regs • Floating point only • 8 64-bit mmx regs • Integer only • Re-uses fp regs • 8 128-bit xmm regs • Integer and fp • These will be the focus of our discussion.
XMM register formats
Using the SSE2 registers • The utilities.asm MASM code (on the course's software web page) contains a function, dumpXmm64, that displays ("dumps") the contents of the 8 xmm registers, each as a pair of 64-bit double-precision fp values: • call dumpXmm64
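C gives no way to name xmm0 through xmm7 directly, but a value of the SSE2 intrinsic type __m128d (from <emmintrin.h>) is held in an xmm register, so one line of such a dump can be sketched as below. format_xmm64 is a hypothetical helper, not part of utilities.asm, and printing the high lane first is an assumption of this sketch, not necessarily dumpXmm64's order. It assumes an x86 compiler with SSE2 enabled (always the case on x86-64).

```c
#include <emmintrin.h>  /* SSE2 intrinsics: __m128d, _mm_storeu_pd */
#include <stdio.h>      /* snprintf, size_t */

/* Hypothetical helper: format the two 64-bit lanes of v as doubles,
   high lane (bits 127-64) first, then low lane (bits 63-0).
   Returns snprintf's result (number of characters written). */
int format_xmm64(char *buf, size_t n, __m128d v) {
    double lanes[2];
    _mm_storeu_pd(lanes, v);  /* lanes[0] = bits 63-0, lanes[1] = bits 127-64 */
    return snprintf(buf, n, "%g %g", lanes[1], lanes[0]);
}
```

Note that _mm_set_pd(hi, lo) takes the high lane first, so format_xmm64(buf, sizeof buf, _mm_set_pd(2.5, 1.0)) writes "2.5 1".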
Sample SSE2 instructions • Data movement • Arithmetic • Comparison • Conversion
SSE2 data movement instructions • movhpd • Move High Packed Double-Precision Floating-Point Value • movlpd • Move Low Packed Double-Precision Floating-Point Value • movsd • Move Scalar Double-Precision Floating-Point Value
SSE2 data movement instructions • movhpd - Move High Packed Double-Precision Floating-Point Value • for memory to XMM move: • DEST[127-64] ← SRC; DEST[63-0] unchanged • Ex. movhpd xmm0, m64 • for XMM to memory move: • DEST ← SRC[127-64] • Ex. movhpd m64, xmm2
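From C, the two movhpd forms correspond to the SSE2 intrinsics _mm_loadh_pd and _mm_storeh_pd. A compiler will usually emit movhpd for them, though it is free to pick an equivalent instruction. The wrapper names load_high and store_high are hypothetical; an x86 target with SSE2 is assumed.

```c
#include <emmintrin.h>

/* Like movhpd xmm, m64: bits 127-64 <- *p, bits 63-0 unchanged. */
__m128d load_high(__m128d reg, const double *p) {
    return _mm_loadh_pd(reg, p);
}

/* Like movhpd m64, xmm: *p <- bits 127-64 of reg. */
void store_high(double *p, __m128d reg) {
    _mm_storeh_pd(p, reg);
}
```

Starting from _mm_set_pd(1.0, 2.0) (high 1.0, low 2.0), load_high with *p = 7.0 gives high 7.0 and leaves the low lane at 2.0.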
SSE2 data movement instructions • movlpd - Move Low Packed Double-Precision Floating-Point Value • for memory to XMM move: • DEST[127-64] unchanged; DEST[63-0] ← SRC • Ex. movlpd xmm1, m64 • for XMM to memory move: • DEST ← SRC[63-0] • Ex. movlpd m64, xmm2
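The movlpd forms map to _mm_loadl_pd and _mm_storel_pd in the same way. As before, load_low and store_low are hypothetical wrapper names, and an x86 target with SSE2 is assumed; the compiler may pick an equivalent instruction instead of movlpd.

```c
#include <emmintrin.h>

/* Like movlpd xmm, m64: bits 63-0 <- *p, bits 127-64 unchanged. */
__m128d load_low(__m128d reg, const double *p) {
    return _mm_loadl_pd(reg, p);
}

/* Like movlpd m64, xmm: *p <- bits 63-0 of reg. */
void store_low(double *p, __m128d reg) {
    _mm_storel_pd(p, reg);
}
```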
SSE2 data movement instructions • movsd - Move Scalar Double-Precision Floating-Point Value • when source and destination operands are both XMM registers: • DEST[127-64] remains unchanged; DEST[63-0] ← SRC[63-0] • Ex. movsd xmm1, xmm3 • when source operand is XMM register and destination operand is memory location: • DEST ← SRC[63-0] • Ex. movsd m64, xmm2 • when source operand is memory location and destination operand is XMM register: • DEST[127-64] ← 0000000000000000H; DEST[63-0] ← SRC • Ex. movsd xmm1, m64
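The three movsd forms behave differently in the high lane, and the SSE2 intrinsics make that visible: _mm_move_sd (register to register) keeps the destination's high lane, _mm_store_sd writes only the low lane to memory, and _mm_load_sd (memory to register) zeroes the high lane. The wrapper names are hypothetical; an x86 target with SSE2 is assumed.

```c
#include <emmintrin.h>

/* Like movsd xmm1, xmm2: bits 63-0 <- src[63-0], bits 127-64 of dst kept. */
__m128d movsd_reg(__m128d dst, __m128d src) {
    return _mm_move_sd(dst, src);
}

/* Like movsd m64, xmm: *p <- src[63-0]. */
void movsd_store(double *p, __m128d src) {
    _mm_store_sd(p, src);
}

/* Like movsd xmm, m64: bits 63-0 <- *p, bits 127-64 <- 0. */
__m128d movsd_load(const double *p) {
    return _mm_load_sd(p);
}
```

The asymmetry matters in practice: loading a scalar from memory clears whatever was in the upper half, while a register-to-register movsd does not.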
Sample SSE2 instructions • Data movement • Arithmetic (scalar) • Comparison • Conversion
SSE2 scalar arithmetic instructions • addsd - Add Scalar Double-Precision Floating-Point Values • subsd - Subtract Scalar Double-Precision Floating-Point Values • mulsd - Multiply Scalar Double-Precision Floating-Point Values • divsd - Divide Scalar Double-Precision Floating-Point Values • There is also sqrtsd, but SSE2 has no sine or cosine instructions! For those we have to use the x87 instructions (fsin, fcos).
SSE2 scalar arithmetic instructions • addsd • DEST[63-0] ← DEST[63-0] + SRC[63-0] • DEST[127-64] remains unchanged
SSE2 scalar arithmetic instructions • subsd • DEST[63-0] ← DEST[63-0] − SRC[63-0] • DEST[127-64] remains unchanged
SSE2 scalar arithmetic instructions • mulsd • DEST[63-0] ← DEST[63-0] * SRC[63-0] • DEST[127-64] remains unchanged
SSE2 scalar arithmetic instructions • divsd • DEST[63-0] ← DEST[63-0] / SRC[63-0] • DEST[127-64] remains unchanged
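The rule "DEST[127-64] remains unchanged" can be checked from C: the intrinsics _mm_add_sd, _mm_sub_sd, _mm_mul_sd, _mm_div_sd and _mm_sqrt_sd correspond to addsd, subsd, mulsd, divsd and sqrtsd. The sketch below (hypothetical names lane and scalar_chain; x86 with SSE2 assumed) pushes a value through all five and shows the first operand's high lane surviving.

```c
#include <emmintrin.h>

/* Read one 64-bit lane of v: index 0 = bits 63-0, index 1 = bits 127-64. */
double lane(__m128d v, int i) {
    double tmp[2];
    _mm_storeu_pd(tmp, v);
    return tmp[i];
}

/* Compute sqrt(((acc[63-0] + y) - z) * w / v) in bits 63-0 using only
   scalar SSE2 operations. Each copies bits 127-64 from its first operand,
   so the high lane of acc passes through untouched. */
__m128d scalar_chain(__m128d acc, double y, double z, double w, double v) {
    acc = _mm_add_sd(acc, _mm_set_sd(y));  /* addsd */
    acc = _mm_sub_sd(acc, _mm_set_sd(z));  /* subsd */
    acc = _mm_mul_sd(acc, _mm_set_sd(w));  /* mulsd */
    acc = _mm_div_sd(acc, _mm_set_sd(v));  /* divsd */
    return _mm_sqrt_sd(acc, acc);          /* sqrtsd: sqrt of 2nd arg's low lane, high lane from 1st */
}
```

With acc = _mm_set_pd(99.0, 3.0) and y, z, w, v = 5, 2, 6, 4, the low lane goes 8, 6, 36, 9 and finally 3, while the high lane is still 99.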
Sample SSE2 instructions • Data movement • Arithmetic (packed) • Comparison • Conversion
SSE2 packed arithmetic instructions • addpd - Add Packed Double-Precision Floating-Point Values • subpd - Subtract Packed Double-Precision Floating-Point Values • mulpd - Multiply Packed Double-Precision Floating-Point Values • divpd - Divide Packed Double-Precision Floating-Point Values
SSE2 packed arithmetic instructions • addpd - Add Packed Double-Precision Floating-Point Values • DEST[63-0] ← DEST[63-0] + SRC[63-0] • DEST[127-64] ← DEST[127-64] + SRC[127-64]
SSE2 packed arithmetic instructions • subpd - Subtract Packed Double-Precision Floating-Point Values • DEST[63-0] ← DEST[63-0] − SRC[63-0] • DEST[127-64] ← DEST[127-64] − SRC[127-64]
SSE2 packed arithmetic instructions • mulpd - Multiply Packed Double-Precision Floating-Point Values • DEST[63-0] ← DEST[63-0] * SRC[63-0] • DEST[127-64] ← DEST[127-64] * SRC[127-64]
SSE2 packed arithmetic instructions • divpd - Divide Packed Double-Precision Floating-Point Values • DEST[63-0] ← DEST[63-0] / SRC[63-0] • DEST[127-64] ← DEST[127-64] / SRC[127-64]
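In contrast to the scalar forms, each packed instruction updates both 64-bit lanes independently. The intrinsics _mm_add_pd, _mm_sub_pd, _mm_mul_pd and _mm_div_pd correspond to addpd, subpd, mulpd and divpd. The function name packed_ops is hypothetical; an x86 target with SSE2 is assumed.

```c
#include <emmintrin.h>

/* Apply addpd, subpd, mulpd and divpd to two pairs of doubles.
   Element 0 of each array maps to bits 63-0, element 1 to bits 127-64. */
void packed_ops(const double a[2], const double b[2],
                double sum[2], double diff[2], double prod[2], double quot[2]) {
    __m128d va = _mm_loadu_pd(a);
    __m128d vb = _mm_loadu_pd(b);
    _mm_storeu_pd(sum,  _mm_add_pd(va, vb));  /* addpd */
    _mm_storeu_pd(diff, _mm_sub_pd(va, vb));  /* subpd */
    _mm_storeu_pd(prod, _mm_mul_pd(va, vb));  /* mulpd */
    _mm_storeu_pd(quot, _mm_div_pd(va, vb));  /* divpd */
}
```

For a = {6, 10} and b = {2, 4}, the results are {8, 14}, {4, 6}, {12, 40} and {3, 2.5}: one instruction per operation handles both lanes.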
SSE2 compare instruction • comisd • Compare Scalar Ordered Double-Precision Floating-Point Values and Set EFLAGS
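comisd compares the low doubles of its two operands and reports the result in ZF, PF and CF the way an unsigned integer compare would: greater clears all three, less sets CF, equal sets ZF, and unordered (a NaN operand) sets all three. In MASM it is therefore followed by unsigned-style jumps such as ja, jb or je. From C, the _mm_comi*_sd intrinsics wrap comisd plus a flag test; the sketch below (hypothetical name compare_low; x86 with SSE2 assumed) builds a three-way compare from them and deliberately does not handle NaN inputs.

```c
#include <emmintrin.h>

/* Three-way compare of the low lanes of a and b using comisd-based
   intrinsics. Returns -1, 0 or 1. High lanes are ignored.
   NaN operands are not handled in this sketch. */
int compare_low(__m128d a, __m128d b) {
    if (_mm_comilt_sd(a, b)) return -1;  /* a[63-0] < b[63-0] */
    if (_mm_comigt_sd(a, b)) return 1;   /* a[63-0] > b[63-0] */
    return 0;
}
```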
SSE2 conversion instructions • cvtsd2si • Convert Scalar Double-Precision Floating-Point Value to Doubleword Integer • cvtsi2sd • Convert Doubleword Integer to Scalar Double-Precision Floating-Point Value
SSE2 conversion instructions • cvtsd2si • Convert Scalar Double-Precision Floating-Point Value to Doubleword Integer • DEST[31-0] ← Convert_Double_Precision_Floating_Point_To_Integer(SRC[63-0])
SSE2 conversion instructions • cvtsi2sd • Convert Doubleword Integer to Scalar Double-Precision Floating-Point Value • DEST[63-0] ← Convert_Integer_To_Double_Precision_Floating_Point(SRC[31-0]) • DEST[127-64] remains unchanged
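cvtsd2si rounds according to the rounding-control field of MXCSR, which defaults to round-to-nearest-even, so 2.5 converts to 2 rather than 3. SSE2 also has a truncating variant, cvttsd2si (not covered on these slides), which matches a C (int) cast. The wrappers below use the intrinsics _mm_cvtsd_si32, _mm_cvttsd_si32 and _mm_cvtsi32_sd; the wrapper names are hypothetical, and an x86 target with SSE2 and the default MXCSR rounding mode is assumed.

```c
#include <emmintrin.h>

/* cvtsd2si: double -> 32-bit int using the current MXCSR rounding mode. */
int to_int_rounded(double x) {
    return _mm_cvtsd_si32(_mm_set_sd(x));
}

/* cvttsd2si: double -> 32-bit int, always truncating toward zero. */
int to_int_truncated(double x) {
    return _mm_cvttsd_si32(_mm_set_sd(x));
}

/* cvtsi2sd: bits 63-0 <- (double)n, bits 127-64 of reg unchanged. */
__m128d int_into_low(__m128d reg, int n) {
    return _mm_cvtsi32_sd(reg, n);
}
```

Under the default mode, to_int_rounded(3.5) is 4 and to_int_rounded(-2.7) is -3, while to_int_truncated(-2.7) is -2.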