Floating Point in Assembly

CS 301: Assembly Language Programming Lecture, Dr. Lawlor

Intel's current main floating point interface is called SSE (Streaming SIMD Extensions). This includes:

A brand new set of registers, xmm0 through xmm15. xmm0 is used to return floating point values from functions, and to pass the first floating point argument. In the calling convention used in these examples (64-bit Linux), the xmm registers are *all* scratch registers; a perfectly anarchist setup. Save things to memory before calling any functions, because everything can get trashed!
A brand new set of instructions, like "movss" and "addss".

Here's a typical use. For a function that returns "float", the compiler will expect the answer in xmm0:

movss xmm0,[a] ; load from memory
addss xmm0,xmm0 ; add to itself (double it)

ret ; Done with function

section .data
a: dd 1.234

(Try this in NetRun now!)

You can also call functions taking floats, like "print_float", which takes one argument in xmm0. Annoyingly, functions that take floats can crash if the stack is not aligned to a multiple of 16 bytes when you call them, so you need to allocate enough stack space to make that happen. In this case, the call into our function pushed an 8 byte return address, so we need to move the stack pointer down by 8 more bytes to get to a multiple of 16 bytes.

movss xmm0,[a] ; load from memory
addss xmm0,xmm0 ; add to itself (double it)

sub rsp,8 ; align stack for print_float
extern print_float
call print_float
add rsp,8 ; Clean up stack

ret ; Done with function

section .data
a: dd 1.234

(Try this in NetRun now!)

Because all the xmm registers are trashable, not preserved, you can't count on your value still being there after *any* function call. This means you need to save the value to the stack. The instruction "movaps" saves the whole xmm0 register to 16 bytes of memory, but careful! "movaps" will crash if the address you pass isn't a multiple of 16 bytes, an "aligned" address.

movss xmm0,[a] ; load from memory
addss xmm0,xmm0 ; add to itself (double it)

sub rsp,8+16 ; align stack, and leave space for xmm0
movaps [rsp],xmm0 ; save our xmm0
extern print_float
call print_float
movaps xmm0,[rsp] ; restore our xmm0
add rsp,8+16 ; Clean up stack

ret ; Done with function

section .data
a: dd 1.234

(Try this in NetRun now!)

You can also print floating point values by storing them to memory, and calling a memory-accessing function like farray_print:

movss xmm0,[a] ; load from memory
addss xmm0,xmm0 ; add to itself (double it)
movss [a],xmm0 ; store back to memory

mov rdi,a ; address of our float
mov rsi,1 ; number of floats to print
sub rsp,8 ; align stack for farray_print
extern farray_print
call farray_print
add rsp,8 ; Clean up stack

ret ; Done with function

section .data
a: dd 1.234

(Try this in NetRun now!)

The most useful single-float instructions are listed below. There are also double precision instructions, ending in "sd", and some very interesting parallel instructions (we'll talk about these after the midterm).

Category	Instruction	Comments
Arithmetic	addss	sub, mul, div all work the same way
Min/Max	minss	max works the same way
Sqrt	sqrtss	Square root (sqrt), reciprocal (rcp), and reciprocal-square-root (rsqrt) all work the same way
Move	movss	Copy DWORD sized (4 byte) data between registers, or to and from memory.
Convert	cvtss2si cvttss2si	Convert to ("2", get it?) Single Integer (si, stored in register like eax). "cvtt" versions do truncation (round toward zero, like C++ default); "cvt" versions round to nearest by default, with exact ties going to the even integer.
Compare to flags	ucomiss	Sets CPU flags like normal x86 "cmp" instruction, but from SSE registers. Use with "jb", "jbe", "je", "jae", or "ja" for normal comparisons (but not jl, jle, jg, or jge: ucomiss reports its result in the carry and zero flags, which the unsigned-style jumps read, and clears the sign and overflow flags that the signed jumps need). Sets "pf", the parity flag, if either input is a NaN.

Here's an example of using the instruction cvtss2si to convert to integer:

movss xmm3,[pi] ; load up constant
addss xmm3,xmm3 ; add pi to itself
cvtss2si eax,xmm3 ; round to integer
ret
section .data
pi: dd 3.14159265358979 ; constant

(Try this in NetRun now!)

Here we're using ucomiss to compare two floats:

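movss xmm3,[a] ; load a
ucomiss xmm3,[b] ; compare a to b, setting the CPU flags
jbe wejumped ; jump if a<=b (also taken if either is a NaN)
mov eax,1 ; a>b
ret

wejumped:
mov eax,3 ; a<=b, or a NaN was involved
ret

section .data
a: dd 1.23
b: dd 1.27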

(Try this in NetRun now!)

Note that above we're using data declared with "dd" (data DWORD) and instructions ending in "ss" (scalar single-precision float), which corresponds to the C/C++ type "float" (4 bytes).

On 64-bit machines, the xmm registers are always available, and are the standard way to pass and return floating point values. CAUTION: On 32-bit machines, xmm is not used to pass or return floating point values (parameters go on the ordinary stack, and return values use the old "floating point register stack"), and on really old machines (before the Pentium III) xmm may not even be available.