Macros in NASM and C

CS 301: Assembly Language Programming Lecture, Dr. Lawlor

Macros in NASM

Because assemblers are line-oriented, there's often a difference between single-line and multi-line macros.

In NASM, "%define" is for single-line macros, such as renaming registers.  It works very much like C/C++ "#define".

%define param rdi 
add param,1
mov rax,param
ret

(Try this in NetRun now!)

You can also rename instructions, such as adding the missing "e" to "move":

%define move mov 
add rdi,1
move rax,rdi
ret

(Try this in NetRun now!)

You can also cross syntax between instruction and register, and include arbitrary syntax like commas inside the macro body:

%define mover mov rax,
add rdi,1
mover rdi
ret

(Try this in NetRun now!)

For multi-line macros, which can expand out to several instructions, an entirely different syntax is used.  This is normally used to create new pseudo-instructions, such as this new "return" operation.  The 1 after the macro name is the number of parameters expected; %1 refers to the first macro parameter.

; Make a multi-line macro named "return", working like C/C++
;   See http://www.nasm.us/doc/nasmdoc4.html#section-4.3
%macro return 1
	mov rax,%1
	ret
%endmacro

return 3;

(Try this in NetRun now!)

Inside a macro, you can define a macro-local label by putting %% in front of the label name.  This lets you use the macro in several places without creating duplicate, conflicting labels.

Macros in C and C++

Macros are an ancient and controversial way to transform C or C++ source code.  The controversy is that, like "goto", macros are too powerful and general to be trusted.  C and C++ use the exact same syntax for macros, but macros are common in plain C and less popular in C++.

There are a number of different features in the C preprocessor:

Constant-like Macro

The standard uses of this are pretty straightforward: 
    #define symbol replacement
makes the preprocessor replace every occurrence of "symbol" with "replacement" before compiling.  So this works fine, and returns 17:

#define n 17

return n;
(Try this in NetRun now!)

Unlike everything else in C++, "n" is now defined as 17 from this point onwards, even across scopes like classes or functions.  Declaring "int n=3;" gets rewritten to "int 17=3;", which won't compile!  To avoid this, by convention macros are written in capital letters, like "NUM_ELEMENTS", not just bare n.

There's another problem with plain string replacement.  Let's say you've defined a constant like:

#define n 10+10

return n*n;
(Try this in NetRun now!)

This returns 10+10*10+10 = 120.  What?!

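There are several well-known fixes for this bug.  The simplest is to wrap the macro body in parentheses, like "#define n (10+10)"; another is to use a real constant, like "const int n=10+10;", instead of a macro.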
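A minimal sketch comparing the broken macro against a parenthesized macro and a real constant.  All the names here (N_BAD, N_PARENS, n_const, and the square functions) are made up for this example:

```cpp
// Hypothetical names, for illustration only.
#define N_BAD 10+10        // unparenthesized: N_BAD*N_BAD expands to 10+10*10+10
#define N_PARENS (10+10)   // parenthesized macro body
const int n_const = 10+10; // a real typed constant, evaluated once to 20

int square_bad(void)    { return N_BAD*N_BAD; }       // 10+10*10+10 = 120
int square_parens(void) { return N_PARENS*N_PARENS; } // (10+10)*(10+10) = 400
int square_const(void)  { return n_const*n_const; }   // 20*20 = 400
```

The constant also obeys normal C++ scoping rules, which the macro versions do not.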

Include Guards

If somebody #includes your header, but they also #include a second header that itself #includes your header, your header code gets compiled twice and causes weird error messages about multiple declarations.  To avoid this, it is common to start your header with:

#ifndef LAWLOR_CODE_ALREADY_INCLUDED

#define LAWLOR_CODE_ALREADY_INCLUDED

...

#endif

The first time around, the ifndef fires, and your header gets compiled normally.  Any subsequent time, the macro has been defined, so your whole header gets skipped.  

As usual, there are several dangerous things about this.  First, the macro name must be globally unique, or the second header using a given name will be skipped the first time through, so the macro name should include at least the name of the header, project, and author.  Second, each header must remember to do this, or the second use of the header will fail.
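The effect can be seen in a single file by pasting the same guarded block twice, which is roughly what the compiler sees when a header is reached through two #include paths.  This is only a sketch; DEMO_POINT_H_ALREADY_INCLUDED, demo_point, and demo_sum are made-up names:

```cpp
// First copy of the "header": the guard macro is not yet defined, so this compiles.
#ifndef DEMO_POINT_H_ALREADY_INCLUDED
#define DEMO_POINT_H_ALREADY_INCLUDED
struct demo_point { int x, y; };
#endif

// Second copy: the guard macro is now defined, so everything up to #endif is skipped.
// Without the guard, this would be a "redefinition of struct demo_point" error.
#ifndef DEMO_POINT_H_ALREADY_INCLUDED
#define DEMO_POINT_H_ALREADY_INCLUDED
struct demo_point { int x, y; };
#endif

int demo_sum(void) {
    demo_point p = {3, 4};
    return p.x + p.y;
}
```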

Portability

Portable code often needs to do different things on different compilers.  For example, code that can be compiled by either a C++ or a C compiler can add additional features to the C++ version using:

#ifdef __cplusplus

class foo { ... };

#endif

Portable code intended to be run on different systems like OS X and Windows often needs to distinguish between them, to access OS-dependent features or work around OS-specific bugs.  The full list of predefined platform macros is long, but the most useful platform macros are:

Inside NetRun's timing code, I use #ifdef _WIN32 to access a different timer on Windows systems.
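As a rough sketch of platform detection: _WIN32, __APPLE__, and __linux__ are macros that compilers targeting Windows, Apple systems, and Linux commonly predefine.  The helper platform_name() is made up for this example, and a real project would likely need to check more macros than these three:

```cpp
// Hypothetical helper: returns a short name for the platform this file was compiled for.
const char *platform_name(void) {
#if defined(_WIN32)
    return "Windows";   // defined by compilers targeting Windows (32 or 64 bit)
#elif defined(__APPLE__)
    return "Apple";     // defined by compilers targeting OS X and other Apple systems
#elif defined(__linux__)
    return "Linux";     // defined by compilers targeting Linux
#else
    return "unknown";   // none of the macros checked above is defined
#endif
}
```

Only one branch survives preprocessing, so OS-specific code in the other branches never reaches the compiler.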

Function-like Macro

In plain C, before inline functions, it was pretty common to write short utility functions using macros that take arguments.  As with all macros, the argument gets text-replaced at the very first stage of compilation:

#define times3(x) x*3
return times3(10);

(Try this in NetRun now!)

Again, the problem is this is a plain text replacement.  So calling the same macro with a sum:

    return times3(10+10);

(Try this in NetRun now!)

This returns... 40.  That's 10+10*3, not (10+10)*3.  Again, the fix is to wrap the expression in parentheses.  Note that the argument and the overall expression both need parentheses, to protect against operators inside or outside the macro call:

#define times3(x) ((x)*3)
return times3(10+10);

(Try this in NetRun now!)

This works fairly reliably, although there's not much benefit to using a macro over using a small function here, and there are many limitations--for example, there's no standard way to make a temporary variable inside a function-like macro.
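For comparison, here is a sketch of the same operation as a small inline function, next to both macro versions.  The names TIMES3_BAD, TIMES3, times3_fn, and the demo functions are made up for this example:

```cpp
#define TIMES3_BAD(x) x*3      // unparenthesized macro: plain text replacement
#define TIMES3(x) ((x)*3)      // parenthesized macro
inline int times3_fn(int x) { return x*3; }  // function: the argument is evaluated before the call

int demo_bad(void)   { return TIMES3_BAD(10+10); } // expands to 10+10*3 = 40
int demo_macro(void) { return TIMES3(10+10); }     // expands to ((10+10)*3) = 60
int demo_fn(void)    { return times3_fn(10+10); }  // x is 20, so 60
```

The function needs no protective parentheses, and it can declare temporary variables like any other function.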

Syntax-changing Macros

Macros have the ability to adjust syntax in interesting and arbitrary ways.  For example, I can rename the curly braces with macros, like so:

#define begin {
#define end }

int x=5;
if (x>3)
begin
return 7;
end

(Try this in NetRun now!)

A FORTRAN programmer used to typing in ALL CAPS might be more comfortable using "#define FOR for".

People are merely annoyed by most of the above uses.  It can get much worse, though:

#define BOOGA int x=9;  return (x+


BOOGA 7);

(Try this in NetRun now!)

In addition to screwing up any syntax highlighting editor, this sort of syntax-bending can easily destroy the readability of the code, and is bad enough to make people seriously talk of banning all macros.  On the plus side, it lets you mutate C++ into a very different looking language.

Advanced C++ Macro Features

Here's an example of stringification.  Note the quoted lines are just one long macro argument, and the preprocessor collapses each newline (and any other run of whitespace) into a single space.  Annoyingly, you can't have a bare comma, or the preprocessor thinks you're passing two arguments into "quoted", which only takes one argument.

#define quoted(x) #x

cout<<quoted(
  This is my string.
  There are many like it but this one is mine.
)<<"\n";

(Try this in NetRun now!)

I use stringification all the time for GPU programming, where the graphics driver wants the GPU code as a string, but I don't want to wrap every line in a quote, so I pass a long multi-line "argument" to a quoting macro.  Also, I can have the same code run in C++ directly, then have a macro spit out a stringified version for the GPU to run.
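One way around the bare-comma problem, assuming a C99 or C++11 (or newer) compiler, is a variadic macro: "..." accepts any number of arguments, and #__VA_ARGS__ stringifies all of them, commas included.  This is only a sketch; QUOTED_ALL and the demo functions are made-up names, and the quoted text is just a placeholder rather than real GPU code:

```cpp
#include <string>

// Hypothetical variadic quoting macro: every argument, commas and all, ends up in one string.
#define QUOTED_ALL(...) #__VA_ARGS__

// A one-line case with a bare comma between two "arguments".
std::string demo_pair(void) {
    return QUOTED_ALL(a, b);
}

// A multi-line case; the bare comma in the declaration is kept in the string.
std::string demo_code(void) {
    return QUOTED_ALL(
        float r = 1.0, g = 0.5;
        float sum = r + g;
    );
}
```

Commas nested inside parentheses, as in a function call, never split macro arguments, so they were already safe with the one-argument version.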

Another place stringification is useful is in error checking.  This error checking macro not only checks for errors, but shows you the code that went wrong, and tells you the line number where it happened:

#define checkErrs(code) { int err=code; /* run */  if (err!=0) std::cout<<"Error in "<<#code<<" at line "<<__LINE__<<" of file "<<__FILE__<<"\n"; }

int x=18;
checkErrs(x-18);
checkErrs(x-10);
return 0;
(Try this in NetRun now!)

This macro definition looks a little better using backslashes to separate the lines (known as "line splicing"):

#define checkErrs(code) { \
int err=code; /* run */ \
if (err!=0) { \
std::cout<<"Error "<<err<<" in '"<<#code<<"' at line "<<__LINE__<<" of file "<<__FILE__<<"\n"; \
} \
}

I need the curly braces to be able to declare "int err" repeatedly. But people typically add an extra semicolon at the end of a function-like macro call, like checkErrs(x-18); above.  This extra semicolon is untidy, and will throw off an "if..else" statement with the macro in the middle.  There's a bizarre well-known solution, which is to add a worthless do{}while(0) that only exists to consume the semicolon:

#define checkErrs(code) do { \
int err=code; /* run */ \
if (err!=0) { \
std::cout<<"Error "<<err<<" in '"<<#code<<"' at line "<<__LINE__<<" of file "<<__FILE__<<"\n"; \
} \
} while(0)

I don't like the do {} while(0) trick, although I have used it.
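Here is a sketch of the if..else case the trick exists for.  To keep the result easy to inspect, this version writes to a string stream named err_log instead of std::cout; err_log, CHECK_ERRS, and demo_if_else are made-up names for this example:

```cpp
#include <sstream>
#include <string>

std::ostringstream err_log;  // hypothetical stand-in for std::cout

#define CHECK_ERRS(code) do { \
    int err=(code); /* run */ \
    if (err!=0) { \
        err_log<<"Error "<<err<<" in '"<<#code<<"'\n"; \
    } \
} while(0)

// With a bare { } macro body, the ';' after the macro call would end the
// whole if statement, and the 'else' below would be a syntax error.
// The do..while(0) version absorbs the ';' so this compiles as intended.
int demo_if_else(int x) {
    if (x>0)
        CHECK_ERRS(x-18);
    else
        return -1;
    return 0;
}
```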

Another trick is to generate classes inside a macro.  For example, if my calculator needs ten "operator" classes, one for each of the basic operators, then instead of typing them all out I'll generate them with a macro like this:

#define makeop(name,op) \
class calcop_##name { public: \
int calculate(int a,int b) { return a op b; } \
}

makeop(add,+);
makeop(sub,-);
makeop(mul,*);
makeop(div,/);
makeop(and,&);
makeop(or,|);
makeop(left,<<);
makeop(right,>>);

int foo(void) {
calcop_add a;
return a.calculate(100,10);
}
(Try this in NetRun now!)

The nice part is now if you need the calcop classes to inherit from some base class, you can add it to the macro.  Forgot the "const" in calculate?  Just add it to the macro.  To add a "getSymbol" method returning the operator's symbol, you can use stringification to add it to the macro definition like "const char *getSymbol(void) const { return #op; }".  If each operator needs to be registered into the list of operators, you can add that as well.

C++ macros can become quite complex, which is bad, but they provide very useful "metaprogramming" abilities, especially in big programs.   "Metaprogramming" is when the output of the first program (the preprocessor) is the source code for another program, here C++.  You can even do explicit metaprogramming, where one program outputs a second program:
    Compile generator code.
    Run generator.  Output is final code.
    Compile final code.
    Run final code.
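As a sketch of the "generator" half of those steps, here is a function that builds C++ source text for a few calcop classes.  The function name generate_calcops and its exact output format are made up for this example; a real generator would write the text to a file for the second compile step:

```cpp
#include <sstream>
#include <string>

// Hypothetical generator: returns C++ source code defining one class per operator.
std::string generate_calcops(void) {
    const char *names[] = {"add","sub","mul"};
    const char *ops[]   = {"+","-","*"};
    std::ostringstream out;
    for (int i=0;i<3;i++) {
        out<<"class calcop_"<<names[i]<<" { public:\n"
           <<"  int calculate(int a,int b) { return a "<<ops[i]<<" b; }\n"
           <<"};\n";
    }
    return out.str();  // this text is the "final code" to compile next
}
```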

This sort of complexity is common in real systems!