Linking Assembly and C/C++

CS 301: Assembly Language Programming Lecture, Dr. Lawlor

You should already be familiar with the idea that compiling your .cpp source code first produces a .obj object file (named .o on Linux and OS X), which the linker then combines with your libraries into an executable.

Input	Output	Name	Programs
.cpp	.obj	C++ compiler	g++, clang++, cl
.c	.obj	C compiler	gcc, clang, cl /TC
.S	.obj	Assembler	nasm, gas, masm
Several .obj files	One .exe	Linker	ld (Although it's more common to call it via the compiler.)

These .obj files are also how we combine code from any compiled language, including Fortran, Rust, D, etc.

The only languages that don't use .obj files for linking are interpreted languages, such as Python (where you call non-Python code in a library using ctypes), or Java (where you cram non-Java code into a Java Native Interface shared library that gets loaded by the Java runtime).

Defining a Function in Assembly

Here's how you write an entire function in assembly. The "global bar" directive tells the assembler to make the function name "bar" visible from outside the file.

global bar
bar:
	add rdi,1000
	mov rax,rdi
	ret

(Try this in NetRun now!)

The "Link With:" box (under "Options") tells NetRun to link together two different projects, in this case one in C++ and the other in assembly. This C++ code calls the assembly here.

extern "C" int bar(int param);

int foo(void) {
	return bar(6);
}

(Try this in NetRun now!)
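Before linking in the assembly, it can help to know what answer to expect. The sketch below is a hypothetical C++ stand-in for bar (it is not part of the NetRun example), assuming the assembly's whole job is to add 1000 to its first parameter. It uses the same name and C linkage, so it can be swapped out for the assembly version later.

```cpp
// Hypothetical stand-in for the assembly routine: same name, same C linkage.
// Assumption: the assembly only adds 1000 to its first parameter.
extern "C" int bar(int param) {
	return param + 1000;   // mirrors: add rdi,1000 / mov rax,rdi
}

int foo(void) {
	return bar(6);         // 6 + 1000
}
```

With the stand-in, foo() returns 1006. The assembly version linked in its place should give the same result.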

You can call C++ code from assembly almost as easily: declare the C++ function extern "C", write "extern someName" in the assembly, and then call the function normally. This is exactly the same way you call plain C code from C++ or vice versa. (If you don't use extern "C" on the C++ side, it's difficult to find the function's name, due to C++ name mangling. Name mangling also makes it hard to call C++ operators and methods from C or assembly, so stick with functions.)

(In each cell, the first snippet is how `bar` is defined in that language; the second is the declaration the caller needs.)

	Calling a function `bar` written in...
	C++	C	Assembly
Called from C++	`long bar(long) { ... }` `long bar(long);`	`long bar(long) { ... }` `extern "C" long bar(long);`	`global bar` `extern "C" long bar(long);`
Called from C	`extern "C" long bar(long) { ... }` `extern long bar(long);`	`long bar(long) { ... }` `extern long bar(long);`	`global bar` `extern long bar(long);`
Called from Assembly	`extern "C" long bar(long) { ... }` `extern bar`	`long bar(long) { ... }` `extern bar`	`global bar` `extern bar`
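To see why extern "C" matters, here is a minimal sketch of what name mangling buys C++ and what C linkage gives up. The name bar_c and the function bodies are made up for illustration, and the mangled names in the comments are what GCC and Clang typically produce, not something every compiler guarantees.

```cpp
// Two C++ overloads with the same name. This is legal only because the
// compiler encodes the parameter types into each symbol name; GCC and Clang
// typically emit these as _Z3barl and _Z3bard.
long bar(long x)     { return x + 1000; }
double bar(double x) { return x + 0.5; }

// C linkage: the symbol is the plain name bar_c (possibly with a leading
// underscore, depending on the platform), so assembly can find it easily.
// The price is that extern "C" functions cannot be overloaded.
extern "C" long bar_c(long x) {
	return bar(x);   // picks the long overload
}
```

From assembly you would write "extern bar_c" and call it; reaching either overload of bar directly would require spelling out the mangled name.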

Linking Assembly and C++ at the Command Line

The most portable way to include some assembly functions in your code is to compile the assembly in a separate file, then link it with the rest of your C++. For example, in a file named "foo.S":

section .text
global _foo
_foo:
	mov eax,7
	ret

(Note the weird underscore in front of the function name--the compiler adds these on Windows in 32-bit mode, and on OS X, but never adds them on Linux or in 64-bit Windows.)

You'd assemble this into "foo.obj" using NASM with this command line (on Windows 32-bit):

    nasm -f win32 foo.S

Then in a file named "main.cpp", we call foo with an extern "C" prototype:

#include <iostream>
extern "C" int foo(void);

int main() {
	std::cout<<"Foo returns "<<foo()<<"\n";
	return 0;
}

We compile the C++ and link it to the assembly using the Microsoft Visual C++ compiler like this:

    cl -EHsc main.cpp foo.obj

(You may have to run "vcvars32.bat" to get "cl" into your PATH.)

We now have a functioning C++/Assembly executable! The exact same command-line trick works on Linux (nasm -f elf32 or nasm -f elf64) or OS X (nasm -f macho32 or nasm -f macho64) as long as you're compiling with g++ or gcc. There NASM writes "foo.o" instead of "foo.obj", and whether the function name needs the underscore depends on the platform, as listed in the table below. If you don't like the command line, you can hide the NASM command line inside Visual C++ as I explain here.

	64-bit		32-bit
OS	Function	Command Line	Function	Command Line
Windows	foo	nasm -f win64 foo.S	_foo	nasm -f win32 foo.S
OS X	_foo	nasm -f macho64 foo.S	_foo	nasm -f macho32 foo.S
Linux	foo	nasm -f elf64 foo.S	foo	nasm -f elf32 foo.S
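If you want the C++ side to tell you which row of the table applies, one possible approach is sketched below. It relies on the __USER_LABEL_PREFIX__ macro that GCC and Clang predefine, and falls back to a guess for other compilers; the helper names symbol_prefix and asm_label are made up for this example.

```cpp
#include <string>

#define ASM_STR2(x) #x
#define ASM_STR(x) ASM_STR2(x)

// Prefix the compiler puts in front of C-linkage symbol names.
std::string symbol_prefix() {
#if defined(__USER_LABEL_PREFIX__)
	return ASM_STR(__USER_LABEL_PREFIX__);  // predefined by GCC and Clang
#elif defined(_WIN32) && !defined(_WIN64)
	return "_";                             // assumption: 32-bit Windows compiler
#else
	return "";                              // assumption: no prefix elsewhere
#endif
}

// Label to write after "global" in the assembly file for a C function.
std::string asm_label(const std::string &c_name) {
	return symbol_prefix() + c_name;
}
```

On 64-bit Linux asm_label("foo") should come back as "foo"; on OS X it should come back as "_foo".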

Caution: NetRun is a 64-bit Linux machine, and passes function parameters in rdi, rsi, rdx, rcx, r8, and r9. Linux distributions switched to building "position-independent executables (PIE)" by default around 2017; since then you need to pass "-no-pie" to g++ to avoid a linker relocation error when your assembly calls external functions directly. I compile Linux assembly (bar.asm) and link with C++ (main.cpp) using:

nasm -f elf64 bar.asm
g++ -Wall -no-pie bar.o main.cpp -o prog
./prog

OS X in 64-bit mode uses the same parameter registers, but adds an underscore in front of function names (the last time I looked).

Windows in 64-bit mode passes function parameters using different registers: rcx, rdx, r8, and r9. Also, rdi and rsi are preserved (callee-saved) registers there, so a function that uses them must restore their values before returning.
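The two 64-bit register orders are easy to mix up, so here they are side by side as a small lookup function. This is only a sketch: the names Abi and param_register are invented for the example, and it covers only integer and pointer parameters (floating-point parameters travel in xmm registers instead).

```cpp
#include <cstddef>
#include <string>

enum class Abi { SystemV64, Windows64 };  // Linux and OS X vs. 64-bit Windows

// Register carrying the index-th (0-based) integer or pointer parameter,
// or "stack" once the parameter registers run out.
std::string param_register(Abi abi, std::size_t index) {
	static const char *const sysv[]  = {"rdi", "rsi", "rdx", "rcx", "r8", "r9"};
	static const char *const win64[] = {"rcx", "rdx", "r8", "r9"};
	if (abi == Abi::SystemV64)
		return index < 6 ? sysv[index] : "stack";
	return index < 4 ? win64[index] : "stack";
}
```

For example, the first parameter is in rdi on Linux but in rcx on Windows, which is why the bar function above needs a different first line there.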

In 32-bit mode, the biggest register is eax (rax hasn't been invented yet). Parameters are passed by pushing them onto the stack in reverse order, so the function's first parameter is on top of the stack just before the call instruction runs.
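The reverse push order can be checked with a toy model. The sketch below uses a std::vector as a pretend stack (back() is the top); the function name push_params_reverse is made up, and nothing here touches the real machine stack.

```cpp
#include <vector>

// Simulate the caller's pushes for a 32-bit call: the last parameter is
// pushed first, so the first parameter is pushed last and ends up on top.
std::vector<int> push_params_reverse(const std::vector<int> &params) {
	std::vector<int> stack;   // back() plays the role of the top of the stack
	for (auto it = params.rbegin(); it != params.rend(); ++it)
		stack.push_back(*it);
	return stack;
}
```

Calling push_params_reverse({10, 20, 30}) leaves 10, the first parameter, on top. The call instruction then pushes the return address above it, so inside the function the first parameter sits just past the return address.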

Whenever I work in assembly on a new OS or new compiler setup, I try to assume *nothing* works the way I expect until I verify it. I might start in assembly by writing a function that just returns, verify that works without linker or runtime errors, then write a function that returns a constant, then one that accesses parameters, and finally work my way up to calling functions, allocating memory, and working with arrays and classes. Don't just write the whole thing in one go: something will inevitably break!