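Intel's current main floating-point unit is called SSE. It works on its own set of registers, named xmm0, xmm1, and so on, using instructions like the movss and addss shown below.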
Here's a typical use. For a function that returns "float", the compiler will expect the answer in xmm0:
movss xmm0,[a] ; load from memory
addss xmm0,xmm0 ; add to itself (double it)
ret ; Done with function
section .data
a: dd 1.234
You can also call functions that take floats, like "print_float", which takes one argument in xmm0. Annoyingly, functions that take floats can crash if the stack is not aligned to a multiple of 16 bytes at the call, so you need to allocate enough stack space to make that happen. In this case, the call from main pushed an 8-byte return address, so we need to push 8 more bytes to get to a multiple of 16 bytes.
movss xmm0,[a] ; load from memory
addss xmm0,xmm0 ; add to itself (double it)
sub rsp,8 ; align stack for print_float
extern print_float
call print_float
add rsp,8 ; Clean up stack
ret ; Done with function
section .data
a: dd 1.234
Because all the xmm registers are trashable (scratch registers, not preserved across calls), you can't count on your value still being there after *any* function call. This means you need to save the value to the stack. The instruction "movaps" saves the whole xmm0 register to 16 bytes of memory, but be careful: "movaps" will crash if the address you pass isn't a multiple of 16 bytes, an "aligned" address.
movss xmm0,[a] ; load from memory
addss xmm0,xmm0 ; add to itself (double it)
sub rsp,8+16 ; align stack, and leave space for xmm0
movaps [rsp],xmm0 ; save our xmm0
extern print_float
call print_float
movaps xmm0,[rsp] ; restore our xmm0
add rsp,8+16 ; Clean up stack
ret ; Done with function
section .data
a: dd 1.234
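The stack arithmetic in these examples follows one rule: at the moment of the call, rsp must be a multiple of 16. Here's a minimal sketch of that arithmetic in C, assuming the function was entered with just an 8-byte return address on the stack. The helper name frame_size is made up for illustration.

```c
#include <stddef.h>

/* Hypothetical helper (not part of any library): how many bytes to
   subtract from rsp so that `locals` bytes of scratch space fit AND
   rsp ends up a multiple of 16.  Assumes rsp was 16-byte aligned just
   before the `call` that entered our function, so the 8-byte return
   address is the only thing on the stack at entry. */
size_t frame_size(size_t locals)
{
    size_t return_address = 8;
    /* Round (return address + locals) up to the next multiple of 16 */
    size_t total = (return_address + locals + 15) / 16 * 16;
    return total - return_address;
}
```

frame_size(0) gives 8, the plain "sub rsp,8" case, and frame_size(16) gives 24, the "8+16" used to make room for a saved xmm0.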
You can also print floating point info by storing it to memory, and calling a memory-accessing function like farray_print:
movss xmm0,[a] ; load from memory
addss xmm0,xmm0 ; add to itself (double it)
movss [a],xmm0 ; store back to memory
mov rdi,a ; address of our float
mov rsi,1 ; number of floats to print
sub rsp,8 ; align stack for farray_print
extern farray_print
call farray_print
add rsp,8 ; Clean up stack
ret ; Done with function
section .data
a: dd 1.234
The full list of single-float instructions is below. There are also double precision instructions, ending in "sd", and some very interesting parallel instructions (we'll talk about these after the midterm).
Operation | Instruction | Comments |
Arithmetic | addss | sub, mul, div all work the same way |
Compare | minss | max works the same way |
Sqrt | sqrtss | Square root (sqrt), reciprocal (rcp), and reciprocal-square-root (rsqrt) all work the same way |
Move | movss | Copy DWORD sized data to and from memory. |
Convert | cvtss2si, cvttss2si | Convert to ("2", get it?) Single Integer (si, stored in register like eax). "cvtt" versions do truncation (round toward zero, like C++ default); "cvt" versions round to nearest by default. |
Compare to flags | ucomiss | Sets CPU flags like normal x86 "cmp" instruction, but from SSE registers. Use with "jb", "jbe", "je", "jae", or "ja" for normal comparisons (but not jl, jle, jg, or jge: ucomiss reports its result in the carry and zero flags, the way an unsigned compare does). Sets "pf", the parity flag, if either input is a NaN. |
Here's an example of using the cvtss2si instruction to convert to an integer:
movss xmm3,[pi]; load up constant
addss xmm3,xmm3 ; add pi to itself
cvtss2si eax,xmm3 ; round to integer
ret
section .data
pi: dd 3.14159265358979 ; constant
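To see the difference between the "cvt" and "cvtt" flavors from C, here's a small sketch using the compiler intrinsics from xmmintrin.h, which map directly onto these two instructions. This assumes an x86 compiler with SSE enabled (always true on 64-bit x86) and the default rounding mode; the wrapper names are made up for illustration.

```c
#include <xmmintrin.h>

/* cvtss2si: round using the current rounding mode (nearest, by default) */
int round_nearest(float x)
{
    return _mm_cvtss_si32(_mm_set_ss(x));
}

/* cvttss2si: truncate (round toward zero), like a C cast to int */
int round_truncate(float x)
{
    return _mm_cvttss_si32(_mm_set_ss(x));
}
```

With 2*pi (about 6.28) both return 6, but for 2.7 round_nearest gives 3 while round_truncate gives 2. Ties go to the even neighbor in the default mode, so round_nearest(2.5f) is 2.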
Here we're using ucomiss to compare two floats:
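movss xmm3,[a] ; load a
ucomiss xmm3,[b] ; compare a to b, setting the flags
jbe wejumped ; jump if a <= b (also taken if either is a NaN)
mov eax, 1 ; a > b
ret
wejumped:
mov eax,3 ; a <= b
ret
section .data
a: dd 1.23
b: dd 1.27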
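To make the flag behavior concrete, here's a rough C model of what ucomiss does to the flags, and which conditional jumps then fire. This is a sketch for intuition, not a description of the hardware; the struct and function names are made up.

```c
#include <math.h>   /* isnan, NAN */

/* The three flags ucomiss reports its result in */
struct flags { int zf, pf, cf; };

/* Model of "ucomiss a,b" */
struct flags ucomiss_model(float a, float b)
{
    struct flags f = {0, 0, 0};          /* a > b: all clear */
    if (isnan(a) || isnan(b)) {          /* "unordered": all set */
        f.zf = 1; f.pf = 1; f.cf = 1;
    } else if (a < b) {
        f.cf = 1;
    } else if (a == b) {
        f.zf = 1;
    }
    return f;
}

/* jbe jumps when cf or zf is set */
int jbe_taken(struct flags f) { return f.cf || f.zf; }

/* jp jumps when pf is set: use it to catch NaNs before the other jumps */
int jp_taken(struct flags f) { return f.pf; }
```

For a=1.23 and b=1.27 this sets only cf, so jbe is taken. A NaN input sets all three flags, so jbe is taken there too; code that cares about NaNs needs a jp check first.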
Note that above we're using data declared with "dd" (data DWORD) and instructions ending in "ss" (scalar single-precision float), which correspond to the C/C++ type "float" (4 bytes).
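You can check the "float is 4 bytes" correspondence from the C side. This sketch copies a float's bytes into a 32-bit integer, which is the same DWORD that "dd" would assemble. The function name is made up, and it assumes the usual IEEE 754 float format used on x86.

```c
#include <stdint.h>
#include <string.h>

/* Return the raw 32-bit pattern stored for a float */
uint32_t float_bits(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);  /* copy all 4 bytes, no conversion */
    return bits;
}
```

For example, float_bits(1.0f) is 0x3F800000, the same four bytes that "dd 1.0" puts in memory.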
On 64-bit machines, the xmm registers are always available, and are the standard way to pass and return floating point values. CAUTION: On 32-bit machines, xmm is not used to pass or return floating point values (the old "floating point register stack" is used on 32-bit machines), and on really old machines (pre-Pentium III) xmm may not even be available.