Linking Assembly and C/C++

CS 301: Assembly Language Programming Lecture, Dr. Lawlor

First, you should be familiar with the idea that compiling your .cpp source code produces an object file (.obj on Windows, .o on Linux and OS X), which gets linked with your libraries into an executable.

Input               Output     Name           Programs
.cpp                .obj       C++ compiler   g++, clang++, cl
.c                  .obj       C compiler     gcc, clang, cl /TC
.S                  .obj       Assembler      nasm, gas, masm
Several .obj files  One .exe   Linker         ld (although it's more common to call it via the compiler)

These .obj files are also how we combine code from any compiled language, including Fortran, Rust, D, etc. 

The only languages that don't use .obj files for linking are interpreted or virtual-machine languages, such as Python (where you call non-Python code in a library using ctypes), or Java (where you cram non-Java code into a Java Native Interface shared library that gets loaded by the Java runtime).

Defining a Function in Assembly

Here's how you write an entire function in assembly. The "global bar" directive tells the assembler to make the function name "bar" visible from outside the file, so the linker can match it up with calls from other files.

global bar          ; make "bar" visible to the linker
bar:
    add rdi,1000    ; rdi holds the first parameter
    mov rax,rdi     ; the return value goes in rax
    ret

(Try this in NetRun now!)

The "Link With:" box (under "Options") tells NetRun to link together two different projects, in this case one in C++ and the other in assembly.  This C++ code calls the assembly here.

extern "C" int bar(int param);

int foo(void) {
    return bar(6);
}

(Try this in NetRun now!)

You can call C++ code from assembly almost as easily: make the C++ function extern "C", declare it with "extern someName" in assembly, and then call the function normally. This is exactly the same way you call plain C code from C++ or vice versa. (If you don't use extern "C" on the C++ side, it's difficult to find the function's name, due to C++ name mangling. Name mangling also makes it hard to call C++ operators and methods from C or assembly, so stick with functions.)
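Since methods are awkward to reach from assembly, one workable pattern is to wrap them in a plain extern "C" function. The sketch below is illustrative only: Counter, counter_add, and the assembly call sequence in the comment are hypothetical names, not part of the lecture code.

```cpp
// A small C++ class whose method we want to reach from assembly.
struct Counter {
    long total = 0;
    long add(long n) { total += n; return total; }
};

// Plain function wrapper. extern "C" turns off name mangling, so the
// linker symbol is just "counter_add" (plus a leading underscore on the
// platforms that add one).
extern "C" long counter_add(Counter *c, long n) {
    return c->add(n);
}

// From NASM on 64-bit Linux the call could look like this (untested sketch;
// the stack must be 16-byte aligned at the call):
//     extern counter_add
//     mov rdi, <address of a Counter>   ; first parameter
//     mov rsi, 5                        ; second parameter
//     call counter_add                  ; result comes back in rax
```

The assembly side never needs to know that a class is involved; it just passes a pointer and an integer to an ordinary function.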


Calling a function bar written in...

Called from   bar written in   Definition of bar                   Declaration in the caller
C++           C++              long bar(long) { ... }              long bar(long);
C++           C                long bar(long) { ... }              extern "C" long bar(long);
C++           Assembly         global bar                          extern "C" long bar(long);
C             C++              extern "C" long bar(long) { ... }   extern long bar(long);
C             C                long bar(long) { ... }              extern long bar(long);
C             Assembly         global bar                          extern long bar(long);
Assembly      C++              extern "C" long bar(long) { ... }   extern bar
Assembly      C                long bar(long) { ... }              extern bar
Assembly      Assembly         global bar                          extern bar
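The C and C++ declarations in this table can share one header if the header hides extern "C" from the C compiler. This is a common idiom rather than something from the lecture: the file name bar.h is hypothetical, and the C++ body of bar is only a stand-in for the assembly version, with the same add-1000 behavior.

```cpp
// ---- bar.h (hypothetical header), usable from both C and C++ ----
#ifdef __cplusplus
extern "C" {            // C++ sees unmangled names
#endif

long bar(long param);   // C sees an ordinary prototype

#ifdef __cplusplus
}
#endif

// ---- stand-in definition; the real bar could live in an assembly file ----
extern "C" long bar(long param) {
    return param + 1000;
}

// ---- caller, same shape as the NetRun example ----
long foo(void) {
    return bar(6);
}
```

Swapping the stand-in for the assembly "global bar" version should not require touching the header or the caller.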


Linking Assembly and C++ at the Command Line

The most portable way to include some assembly functions in your code is to compile the assembly in a separate file, then link it with the rest of your C++.  For example, in a file named "foo.S":
section .text
global _foo
_foo:
mov eax,7
ret

(Note the weird underscore in front of the function name--the compiler adds these on Windows in 32-bit mode, and on OS X, but never adds them on Linux or in 64-bit Windows.)

You'd assemble this into "foo.obj" using NASM with this command line (on Windows 32-bit):

    nasm -f win32 foo.S

Then in a file named "main.cpp", we call foo with an extern "C" prototype:

#include <iostream>
extern "C" int foo(void);

int main() {
    std::cout<<"Foo returns "<<foo()<<"\n";
    return 0;
}

We compile the C++ and link it to the assembly using the Microsoft Visual C++ compiler like this:

    cl -EHsc main.cpp foo.obj

(You may have to run "vcvarsall.bat", or open a Visual Studio developer command prompt, to get "cl" into your PATH.)

We now have a functioning C++/Assembly executable! The exact same command-line trick works on Linux (nasm -f elf32   or   nasm -f elf64) or OS X (nasm -f macho32   or   nasm -f macho64) as long as you're compiling with g++ or gcc; there NASM writes "foo.o" instead of "foo.obj". If you don't like the command line, you can hide the NASM command line inside Visual C++ as I explain here.


OS        64-bit function   64-bit command line      32-bit function   32-bit command line
Windows   foo               nasm -f win64 foo.S      _foo              nasm -f win32 foo.S
OS X      _foo              nasm -f macho64 foo.S    _foo              nasm -f macho32 foo.S
Linux     foo               nasm -f elf64 foo.S      foo               nasm -f elf32 foo.S
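The underscore rule can be written down as a compile-time check on the C++ side, which is handy for printing the name the assembly file needs to export. This is a sketch under the assumption of x86 targets only; asm_symbol_prefix and asm_symbol_name are hypothetical helper names, while __APPLE__, _WIN32, and _WIN64 are the usual predefined compiler macros.

```cpp
#include <string>

// Prefix the C/C++ compiler puts on C symbol names:
// "_" on OS X and 32-bit Windows, "" on Linux and 64-bit Windows.
inline const char *asm_symbol_prefix() {
#if defined(__APPLE__)
    return "_";
#elif defined(_WIN32) && !defined(_WIN64)
    return "_";
#else
    return "";
#endif
}

// Name to write after "global" in the assembly file for a C function.
inline std::string asm_symbol_name(const std::string &c_name) {
    return asm_symbol_prefix() + c_name;
}
```

For example, asm_symbol_name("foo") gives the label to use in foo.S for the platform the C++ code is being compiled on.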

Caution: NetRun is a 64-bit Linux machine, and passes function parameters in rdi, rsi, rdx, rcx, r8, and r9. Most Linux distributions switched to building "position-independent executables" (PIE) by default around 2017; since then you need to pass "-no-pie" to g++ to avoid a link error when your assembly calls external functions directly. I compile Linux assembly (bar.asm) and link with C++ (main.cpp) using:

nasm -f elf64 bar.asm
g++ -Wall -no-pie bar.o main.cpp -o prog
./prog

OS X in 64-bit mode uses the same parameter scheme, but adds an underscore in front of function names (the last time I looked).

Windows in 64-bit mode passes the first four function parameters in different registers: rcx, rdx, r8, and r9. Also, rdi and rsi are preserved registers on Windows, so a function that uses them must save and restore them.
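To make the two 64-bit schemes concrete, here is a six-parameter function annotated with where each argument arrives. The function name sum6 is hypothetical; the register assignments in the comments restate the conventions described above, and the C++ body just stands in for what an assembly version would compute.

```cpp
// Six integer parameters, annotated with where each one arrives.
//                                      Linux / OS X    64-bit Windows
extern "C" long long sum6(long long a,  // rdi           rcx
                          long long b,  // rsi           rdx
                          long long c,  // rdx           r8
                          long long d,  // rcx           r9
                          long long e,  // r8            stack
                          long long f)  // r9            stack
{
    return a + b + c + d + e + f;       // result is returned in rax
}
```

Calling sum6(1, 2, 3, 4, 5, 6) with distinct values like these is a quick way to confirm that an assembly version reads each parameter from the right place.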

In 32-bit mode, the biggest register is eax (rax hasn't been invented yet). Parameters are passed in 32-bit mode by pushing them onto the stack in reverse order, so the function's first parameter is on top of the stack just before making the call.
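The push order is easy to get backwards, so here is a toy model of it. Nothing here touches the real machine stack: ToyStack and call_with_three_args are hypothetical names, and 0xDEADBEEF is a made-up stand-in for the return address that the call instruction pushes.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model of the 32-bit stack: back() of the vector is the top of the stack.
struct ToyStack {
    std::vector<std::uint32_t> slots;
    void push(std::uint32_t v) { slots.push_back(v); }
    // Read the slot 'index' entries below the top, like [esp + 4*index].
    std::uint32_t peek(std::size_t index) const {
        return slots[slots.size() - 1 - index];
    }
};

// Mimic what the caller does for f(a, b, c) in 32-bit mode.
inline ToyStack call_with_three_args(std::uint32_t a, std::uint32_t b, std::uint32_t c) {
    ToyStack s;
    s.push(c);           // last parameter is pushed first
    s.push(b);
    s.push(a);           // first parameter is now on top of the stack...
    s.push(0xDEADBEEF);  // ...until "call" pushes the return address (fake value)
    return s;
}
```

In this model peek(0) plays the role of [esp] inside the function (the return address), peek(1) is [esp+4] (the first parameter), and so on down the list.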

Whenever I work in assembly on a new OS or new compiler setup, I try to assume *nothing* works the way I expect until I verify it. I might start in assembly by writing a function that just returns, verify that works without linker or runtime errors, then write a function that returns a constant, then one that accesses parameters, and finally work my way up to calling functions, allocating memory, and working with arrays and classes. Don't just write the whole thing in one go; something will inevitably break!
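One way to organize that step-by-step approach is a small C++ harness that checks one stage at a time. This is only a sketch: the stage_* functions are hypothetical names, and they are defined here in C++ as stand-ins so the harness runs on its own. The idea would be to replace each body with an extern "C" declaration as the matching assembly function comes online.

```cpp
// C++ stand-ins for the assembly functions under test.
extern "C" void stage_return_only(void) { }               // just returns
extern "C" long stage_constant(void) { return 7; }        // returns a constant
extern "C" long stage_param(long x) { return x + 1000; }  // reads a parameter

// Runs the stages in order and returns how many passed before the first failure.
int stages_passed(void) {
    int passed = 0;

    stage_return_only();   // stage 1: surviving the call is the whole test
    passed++;

    if (stage_constant() != 7) return passed;    // stage 2: return value
    passed++;

    if (stage_param(6) != 1006) return passed;   // stage 3: parameter access
    passed++;

    return passed;
}
```

A result below 3 points at the first stage that misbehaves, which keeps the debugging confined to the newest piece of assembly.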