Receiving Calls from C/C++: The Linker


CS 301 Lecture, Dr. Lawlor

Here's some C++ code that calls an external function "bar".  Note that this code gives a link error when you try to run it in NetRun by itself, because "bar" is never defined. The 'extern "C"' tells C++ to just look for a C-style plain function "bar", instead of a fancy overloaded C++ function "bar(int,int,int)".
extern "C" int bar(int a,int b,int c);

int foo(void) {
    return bar(0xA0B1C2D3, 0xE0E1E2E3, 0xF0F1F2F3);
}

(executable NetRun link)

We can actually write this "bar" function in assembly, like this:
global bar
bar:
    mov eax,[esp+4]
    ret

(executable NetRun link)

When we first get called, sitting on the top of the stack (at DWORD[esp]) should be our caller's return address.  Deeper into the stack is our first argument (at DWORD[esp+4]), then our second argument (at DWORD[esp+8]), and so on.  Of course, if you change the stack pointer, the location of your arguments relative to the stack pointer changes too!
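To make the layout concrete, here is a rough C++ sketch that models the stack as an array of 4-byte slots.  It assumes the usual 32-bit cdecl behavior (arguments pushed right to left, then the call pushes the return address); the names ToyStack and call_bar and the return address value are made up for illustration, and are not part of NetRun.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model of a 32-bit x86 stack: memory is an array of 4-byte slots,
// and esp is a byte offset that moves DOWN by 4 on every push.
struct ToyStack {
    std::vector<std::uint32_t> mem;
    std::uint32_t esp;  // byte offset of the current top of the stack

    explicit ToyStack(std::uint32_t bytes = 256)
        : mem(bytes / 4, 0), esp(bytes) {}

    void push(std::uint32_t value) {
        assert(esp >= 4);  // otherwise the toy stack has overflowed
        esp -= 4;
        mem[esp / 4] = value;
    }

    // Read DWORD[esp+offset]
    std::uint32_t at(std::uint32_t offset) const {
        assert((esp + offset) % 4 == 0 && (esp + offset) / 4 < mem.size());
        return mem[(esp + offset) / 4];
    }
};

// Mimic what the caller does for bar(a,b,c): push the arguments right to
// left, then the "call" instruction pushes the return address.
ToyStack call_bar(std::uint32_t a, std::uint32_t b, std::uint32_t c,
                  std::uint32_t return_address) {
    ToyStack s;
    s.push(c);
    s.push(b);
    s.push(a);
    s.push(return_address);
    return s;
}
```

Right after call_bar, at(0) is the return address and at(4), at(8), at(12) are a, b, c.  Do one more push and the first argument moves out to at(8), which is exactly the "esp keeps moving" problem.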

The "global" keyword in assembly tells the assembler to make a symbol, in this case bar, visible from outside the file.

The "Link With:" box tells NetRun to link together two different projects, in this case one in C++ and the other in assembly.

Frame Pointer

It's pretty common for compiler-generated code, or long human-written assembly code, to use a "frame pointer".  The problem the frame pointer is trying to solve is that esp keeps moving around as you add and remove stuff from the stack.  So the frame pointer is just a copy of the stack pointer from somewhere early in the function execution. 

For example, we can start with our argument-fetching assembly code from before:
global bar
bar:
    mov eax,[esp+4]
    ret

(executable NetRun link)

Say we need to make some space on the stack for an array.  Now our code becomes:
global bar
bar:
    sub esp,100
    mov eax,[esp+104]
    add esp,100
    ret

(executable NetRun link)

Note that because esp moved down, we have to adjust our accesses to get to the same locations.
If, instead, we make a copy of the "old" esp (for example in register ecx), then we have a fixed point of reference in memory:
global bar
bar:
    mov ecx,esp ;<- backup copy of old stack pointer
    sub esp,100
    mov eax,[ecx+4] ;<- always our first argument, regardless of the current value of esp
    add esp,100 ;<- "mov esp,ecx" would work here too!
    ret

(executable NetRun link)

It's traditional to use register "ebp" (Extended Base Pointer) to store the old value of the stack pointer.  The compiler normally sets up register ebp in every function (unless you ask it to omit the frame pointer with "-fomit-frame-pointer", which optimized builds often do by default).  Unfortunately, ebp is a "callee saved" register--you can't just start using it like you can with the scratch registers eax, ecx, and edx; you have to make sure you set it back to the old value before you return (just like the stack pointer!).  So it's traditional to push and pop ebp at the start and end of your function, like this:
global bar
bar:
    push ebp ;<- save the old ebp onto the stack (warning: this does change esp!)
    mov ebp,esp ;<- backup copy of old stack pointer
    sub esp,100
    mov eax,[ebp+8] ;<- always our first argument, regardless of the current value of esp
    add esp,100 ;<- or "mov esp,ebp"
    pop ebp ;<- restore the old ebp, so we don't crash after we return
    ret

(executable NetRun link)

The push-and-move at the start of the function is often called the "function prologue", and the restore-and-pop at the end is often called the "function epilogue".  Your function arguments are always at +8, +12, +16, ... bytes from ebp (the saved ebp sits at +0 and the return address at +4), and your local variables are always at negative offsets from ebp.  Because ebp is callee saved, you don't have to worry that print_int is going to change your ebp value--it's every function's responsibility to preserve esp and ebp for its caller.
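If you'd like to step through the prologue and epilogue without an assembler, here is a small C++ simulation.  It is only a sketch: ToyMachine, run_bar and call_bar are hypothetical names, the starting register values and the return address are arbitrary, and the caller-side cleanup assumes the usual 32-bit cdecl convention.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy 32-bit machine: a stack of 4-byte slots plus the esp and ebp registers.
struct ToyMachine {
    std::vector<std::uint32_t> mem;
    std::uint32_t esp;
    std::uint32_t ebp;

    // 512 bytes of stack; ebp starts as some arbitrary caller frame pointer.
    ToyMachine() : mem(128, 0), esp(512), ebp(512) {}

    void push(std::uint32_t v) {
        assert(esp >= 4);
        esp -= 4;
        mem[esp / 4] = v;
    }
    std::uint32_t pop() {
        assert(esp / 4 < mem.size());
        std::uint32_t v = mem[esp / 4];
        esp += 4;
        return v;
    }
    std::uint32_t load(std::uint32_t addr) const {
        assert(addr % 4 == 0 && addr / 4 < mem.size());
        return mem[addr / 4];
    }
};

// Simulate bar's prologue, body, and epilogue; returns what would be in eax.
std::uint32_t run_bar(ToyMachine &m) {
    m.push(m.ebp);                          // push ebp
    m.ebp = m.esp;                          // mov ebp,esp
    m.esp -= 100;                           // sub esp,100 (room for locals)
    std::uint32_t eax = m.load(m.ebp + 8);  // mov eax,[ebp+8]
    m.esp = m.ebp;                          // mov esp,ebp
    m.ebp = m.pop();                        // pop ebp
    return eax;                             // ret is simulated by the caller below
}

// Simulate the C++ side: push arguments, call, then clean up.
std::uint32_t call_bar(ToyMachine &m, std::uint32_t a, std::uint32_t b,
                       std::uint32_t c) {
    m.push(c);
    m.push(b);
    m.push(a);
    m.push(0x1234);  // return address (arbitrary value for this sketch)
    std::uint32_t eax = run_bar(m);
    m.pop();         // "ret" pops the return address
    m.esp += 12;     // caller removes its three arguments
    return eax;
}
```

After call_bar(m, 7, 8, 9) the result is the first argument, and both esp and ebp are back at their starting values, which is the whole point of the prologue/epilogue pairing.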

Name Mangling and extern "C"

C++ "mangles" the linker names of its functions to include the data types of the function arguments. This is good, because it lets you overload function names; but it's bad, because plain C and assembly don't do anything special to the linker names of functions.

In C or assembly, a function "foo" shows up as just plain "foo" in the linker. In C++, a function foo shows up as something like "foo()" or "foo(int,double,void *)", with the argument types encoded into an alphanumeric symbol. (Check out the disassembly to be sure how your linker names are coming out.)
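One way to peek at this from inside a program: GCC and Clang ship a demangling routine, abi::__cxa_demangle, in the header cxxabi.h.  The sketch below assumes one of those toolchains (other compilers use different mangling schemes and won't have this header), and the helper name "demangle" is made up here.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <cxxabi.h>
#include <string>

// Turn a mangled C++ linker name back into readable form.
// Names without the "_Z" prefix (such as a plain C "bar") are returned unchanged.
std::string demangle(const char *linker_name) {
    assert(linker_name != nullptr);
    if (std::strncmp(linker_name, "_Z", 2) != 0) return linker_name;

    int status = 0;
    char *readable = abi::__cxa_demangle(linker_name, nullptr, nullptr, &status);
    if (status != 0 || readable == nullptr) return linker_name;

    std::string result(readable);
    std::free(readable);  // __cxa_demangle allocates the result with malloc
    return result;
}
```

With these compilers, a C++ function "int bar(int,int)" gets the linker name "_Z3barii", and demangle("_Z3barii") gives back "bar(int, int)".  A plain C or assembly "bar" just stays "bar".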

So if you call C or assembly code from C++, you have to turn off C++'s name mangling by declaring the C or assembly routines 'extern "C"', like this:

extern "C" void some_assembly_routine(int param1,char *param2);
or wrapped in curly braces like this:
extern "C" {
void one_assembly_routine(int x);
void another_assembly_routine(char c);
}

In fact, it's common to provide a "magic" header file for C code that automatically provides 'extern "C"' prototypes for C++, but just works normally in plain C:

#ifdef __cplusplus /* only defined in C++ code */
extern "C" {
#endif
void one_assembly_routine(int x);
void another_assembly_routine(char c);
#ifdef __cplusplus /* only defined in C++ code */
}
#endif

Definitely try these things out yourself:

Plain C bar routine:
#include <stdio.h> /* for printf */
int bar(int i,int j) {
    printf("bar(%d,%d)\n",i,j);
    return i;
}
(executable NetRun link)

C++ foo routine that calls bar:
extern "C" int bar(int i,int j);
int foo(void)
{
    return bar(2,3);
}
(executable NetRun link)

Try:

| Code written in | With name | Has linker name |
|---|---|---|
| C++ | int bar(int a,int b) | bar(int,int)    <- But "mangled" to be alphanumeric... |
| C++ | extern "C" int bar(int a,int b) | bar |
| C | int bar(int a,int b) | bar |
| Assembly | global bar (then the label bar:) | bar |
| Fortran | SUBROUTINE bar() | bar_, BAR, BAR_, bar__, or some such.  Disassemble to be sure... |

Bottom line: to call code written in anything else (C, Assembly, Fortran) from C++, or to call C++ from anything else, add extern "C" to the C++ code!
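Here's a single-file C++ sketch of the same idea, in case you want to experiment without a second project.  The function "baz" is a made-up name used only to show overloading; "bar" is defined right here in C++ with 'extern "C"', so it gets the same plain linker name a C or assembly version would.

```cpp
#include <cassert>

// C-style linker name "bar": only one function in the program may use it.
extern "C" int bar(int a, int b) { return a + b; }

// Ordinary C++ functions may overload, because each one gets its own
// mangled linker name that includes the argument types.
int baz(int a) { return a; }
int baz(int a, int b) { return a * b; }

int foo(void) {
    int sum = bar(2, 3);  // resolves to the plain linker name "bar"
    assert(sum == 5);
    return sum + baz(4) + baz(5, 6);  // 5 + 4 + 30
}
```

Try adding a second 'extern "C" int bar(int a)' with a body: the compiler rejects it, since two functions would then share the one linker name "bar".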

Argument Passing

C and C++ are kinda asymmetrical, because "int" parameters are placed directly on the stack (like "push 3"), while arrays are always passed via pointer (like "push my_array", which pushes the *address* of the array, not the actual integers in the array).  C/C++ do this because you can cheaply copy an "int", but copying an array might take a lot of time and memory.

Fortran, curiously, passes *everything* via pointer--if a Fortran function takes an int parameter, what gets pushed on the stack is a *pointer* to an int, not the int itself!

To summarize:

| When passing... | In C/C++, you... | In Fortran, you... |
|---|---|---|
| an int | push the int | push a pointer to the integer |
| an array | push a pointer to the first element of the array | push a pointer to the first element of the array |
| a char | push an int containing the character's value | push a pointer to the character |
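Seen from the C++ side, the difference looks like the sketch below.  The function names are hypothetical, and the trailing underscore is just one common Fortran linker naming scheme; your Fortran compiler may pick a different one.

```cpp
#include <cassert>

// C/C++ style: the int values themselves are passed.
extern "C" int add_c(int a, int b) { return a + b; }

// Fortran style: every argument arrives as a pointer, even a plain integer.
extern "C" int add_f_(const int *a, const int *b) {
    assert(a != nullptr && b != nullptr);
    return add_c(*a, *b);
}

// Arrays look the same in both conventions: a pointer to the first element.
// Only the length argument differs (a pointer here, Fortran style).
extern "C" int sum_array_(const int *arr, const int *n) {
    assert(arr != nullptr && n != nullptr);
    int total = 0;
    for (int i = 0; i < *n; i++) total += arr[i];
    return total;
}
```

Calling the Fortran-style version from C++ means taking addresses yourself: with "int x=2, y=3;" you would write add_f_(&x,&y) where the C-style call is just add_c(x,y).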

Fortran 1D arrays are indexed using round brackets, like "myarr(i)".  And the index of the first array element in Fortran is "myarr(1)", not "myarr[0]" like C/C++.  But beyond those small differences, arrays work exactly the same in Fortran as in C/C++, and in fact it's not always possible from looking at the generated assembly code whether the original code was written in C, C++, Fortran, or Assembly!
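The index shift is easy to get wrong when mixing the two languages, so here is a tiny C++ sketch of the mapping.  The helper name fortran_get is made up for illustration.

```cpp
#include <cassert>

// Fortran's myarr(i), with i starting at 1, names the same memory as
// C/C++'s myarr[i-1].
inline int fortran_get(const int *myarr, int i) {
    assert(myarr != nullptr && i >= 1);  // Fortran indices start at 1
    return myarr[i - 1];
}
```

So for "int a[3]={10,20,30};", fortran_get(a,1) is a[0] and fortran_get(a,3) is a[2].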

Fun With Fortran!

CS 301 isn't a computer languages course, but I think it's pretty interesting to look at old-school Fortran, a language from 1956.  Note how this little function returns 10, like you'd expect.  And the assembly code is pretty much exactly what you'd get from C/C++!
       function foo()
       INTEGER foo

       i = 7;
       foo = i + 3;

       end function

(executable NetRun link)

Here's a "do loop" (the Fortran equivalent of C/C++ "for"):
       function foo()
       INTEGER foo

       do i=1,10
          CALL print_int(i)
       end do
       foo = i + 3;

       end function

(executable NetRun link)

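Note that the Fortran-callable "print_int" is defined in NetRun's "inc.c" as:
    CDECL void print_int__(int *i) {print_int(*i);}
Here, the trailing underscores match the linker name the Fortran compiler generates for "print_int", and the "int *" parameter matches Fortran's habit of passing everything via pointer.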

This sort of Fortran/C/C++ interfacing is really common in big projects.