Macros in NASM and C

CS 301: Assembly Language Programming Lecture, Dr. Lawlor

Macros in NASM

Because assemblers are line-oriented, there's often a difference between single-line and multi-line macros.

In NASM, "%define" is for single-line macros, such as renaming registers. It works very much like C/C++ "#define".

%define param rdi 
add param,1
mov rax,param
ret

(Try this in NetRun now!)

You can also rename instructions, such as adding the missing "e" to "move":

%define move mov 
add rdi,1
move rax,rdi
ret

(Try this in NetRun now!)

You can also cross syntax between instruction and register, and include arbitrary syntax like commas inside the macro body:

%define mover mov rax,
add rdi,1
mover rdi
ret

(Try this in NetRun now!)

For multi-line macros, which can expand out to several instructions, an entirely different syntax is used. This is normally used to create new pseudo-instructions, such as this new "return" operation. The 1 after the macro name is the number of parameters expected; %1 refers to the first macro parameter.

; Make a multi-line macro named "return", working like C/C++
;   See http://www.nasm.us/doc/nasmdoc4.html#section-4.3
%macro return 1
	mov rax,%1
	ret
%endmacro

return 3;

(Try this in NetRun now!)

Inside a macro, you can define a macro-local label by putting %% in front of the label name (for example, "%%done:" instead of "done:"). This lets you use the macro in several places without defining the same label twice.

Macros in C and C++

Macros are an ancient and controversial way to transform C or C++ source code. The controversy is that, like "goto", macros are too powerful and general to be trusted. Though C and C++ use the exact same syntax for macros, they are common in plain C, but less popular in C++.

There are a number of different features in the C preprocessor:

#include <foo> This finds the file foo in the system include path, and pastes it into the current program.
#include "foo" As above, but it checks the user include path first.
#define foo bar This "constant-like" macro replaces all occurrences of foo with the string bar.
#undef foo Remove the macro foo (if it exists).
#if defined(foo) ... #endif If somebody has defined a macro foo, compile the included lines; otherwise skip them.
#ifdef foo ... #endif This is shorthand for above.
#ifndef foo ... #endif Shorthand for "if not defined".
#if 0 ... #endif This always skips the included lines. This is useful for "commenting" out large sections of code, especially those that already have /* */ comments.
#error "oh noes" Stop compiling and show this error. Useful for the #else branch of an #ifdef chain.
#pragma foo Exactly what this does depends on the compiler. It can be used to access compiler internals, or access special features, but it is not portable.

#pragma comment(lib, "wsock32.lib") /* link with wsock32.lib network library on Windows, ignored elsewhere */

Constant-like Macro

The standard uses of this are pretty straightforward:
#define symbol replacement
makes the preprocessor replace every occurrence of "symbol" with "replacement" before compiling. So this works fine, and returns 17:

#define n 17

return n;
(Try this in NetRun now!)

Unlike everything else in C++, "n" is now defined as 17 from this point onwards, even across scopes such as classes or functions. Declaring "int n=3;" gets rewritten to "int 17=3;", which won't compile! To avoid this, by convention macros are written in capital letters, like "NUM_ELEMENTS", not just bare n.

There's another problem with plain string replacement. Let's say you've defined a constant like:

#define n 10+10

return n*n;
(Try this in NetRun now!)

This returns 10+10*10+10 = 120. What?!

There are two well-known fixes for this bug:

Avoidance: never use macros. "enum" or "const int" provide basically the same functionality, and don't have this problem. Some see the preprocessor as an unwanted holdover from plain C that should never be used.
Workaround: if you #define something, wrap it in parentheses, like
```
#define n (10+10)
```
This works, but you must remember to do it every time!
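To compare the two fixes side by side, here is a small sketch. The names N_MACRO, n_const, square_macro and square_const are made up for this comparison; they are not part of the lecture's code.

```cpp
// Plain text replacement: N_MACRO*N_MACRO becomes 10+10*10+10.
#define N_MACRO 10+10

// A typed constant is evaluated once, so it behaves like the value 20.
const int n_const = 10 + 10;

int square_macro(void) { return N_MACRO * N_MACRO; } // 10+10*10+10
int square_const(void) { return n_const * n_const; } // 20*20
```

The constant also obeys normal C++ scoping rules, so a local variable with the same name elsewhere is not rewritten.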

Include Guards

If somebody #includes your header, but they also #include a second header that itself #includes your header, your header code gets compiled twice and causes weird error messages about multiple declarations. To avoid this, it is common to start your header with:

#ifndef LAWLOR_CODE_ALREADY_INCLUDED

#define LAWLOR_CODE_ALREADY_INCLUDED

...

#endif

The first time around, the ifndef fires, and your header gets compiled normally. Any subsequent time, the macro has been defined, so your whole header gets skipped.

As usual, there are several dangerous things about this. First, the macro name must be globally unique, or the second header using a given name will be skipped the first time through, so the macro name should include at least the name of the header, project, and author. Second, each header must remember to do this, or the second use of the header will fail.
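As a rough single-file sketch of what the compiler sees after a double #include, the guarded "header" below is pasted in twice. The header contents, the guard name and point_sum are hypothetical, chosen only for illustration.

```cpp
// First "inclusion" of a hypothetical header: the guard macro is not
// yet defined, so the struct is compiled and the macro gets defined.
#ifndef HYPOTHETICAL_POINT_H_ALREADY_INCLUDED
#define HYPOTHETICAL_POINT_H_ALREADY_INCLUDED
struct point { int x, y; };
#endif

// Second "inclusion": the guard macro now exists, so everything up to
// #endif is skipped and "point" is not declared twice.
#ifndef HYPOTHETICAL_POINT_H_ALREADY_INCLUDED
#define HYPOTHETICAL_POINT_H_ALREADY_INCLUDED
struct point { int x, y; };
#endif

int point_sum(void) {
	point p = {3, 4};
	return p.x + p.y;
}
```

Deleting the two #ifndef lines (and their #endif lines) should reproduce the "multiple declarations" style of error described above.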

Portability

Portable code often needs to do different things on different compilers. For example, code that can be compiled by either a C++ or a C compiler can add additional features to the C++ version using:

#ifdef __cplusplus

class foo { ... };

#endif

Portable code intended to be run on different systems like OS X and Windows often needs to distinguish between them, to access OS-dependent features or work around OS-specific bugs. The full list of predefined platform macros is long, but the most useful platform macros are:

#ifdef _WIN32 for *any* Windows (not just 32-bit)
#ifdef __unix__ for many UNIX-like systems (Linux, BSD, IBM AIX, even Cygwin; but not OS X for some reason)
#ifdef __linux__ for Linux systems (Ubuntu, Debian, Red Hat, etc)
#ifdef __APPLE__ for OS X or iOS

Inside NetRun's timing code, I use #ifdef _WIN32 to access a different timer on Windows systems.
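A minimal sketch of how these macros might be chained together is below. The function platform_name and the strings it returns are assumptions made for this example; this is not NetRun's actual timing code.

```cpp
// Returns a short name for the platform this file was compiled on.
// Linux also defines __unix__, so the more specific test comes first.
const char *platform_name(void) {
#if defined(_WIN32)
	return "windows";
#elif defined(__APPLE__)
	return "apple";
#elif defined(__linux__)
	return "linux";
#elif defined(__unix__)
	return "unix";
#else
	return "unknown";
#endif
}
```

Only one of the return statements survives preprocessing, so the compiler never sees the code for the other platforms.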

Function-like Macro

In plain C, before inline functions, it was pretty common to write short utility functions using macros that take arguments. As with all macros, the argument gets text-replaced at the very first stage of compilation:

#define times3(x) x*3
return times3(10);

(Try this in NetRun now!)

Again, the problem is that this is a plain text replacement. So calling the same macro with a sum:

    return times3(10+10);

(Try this in NetRun now!)

This returns... 40. That's 10+10*3, not (10+10)*3. Again, the fix is to wrap the expression in parentheses. Note that the arguments and the overall expression both need parentheses, to protect against operators inside or outside the macro call:

#define times3(x) ((x)*3)
return times3(10+10);

(Try this in NetRun now!)

This works fairly reliably, although there's not much benefit to using a macro over using a small function here, and there are many limitations--for example, there's no standard way to make a temporary variable inside a function-like macro.
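For comparison, here is a sketch of the same utility written three ways. The names TIMES3_BAD, TIMES3, times3_fn and the demo_ functions are invented for this example.

```cpp
#define TIMES3_BAD(x) x*3       // unprotected text replacement
#define TIMES3(x) ((x)*3)       // parenthesized version

// The function version: the argument is evaluated to a value first,
// so no parentheses are needed, and it can use a temporary variable.
inline int times3_fn(int x) {
	int tripled = x * 3;
	return tripled;
}

int demo_bad(void)  { return TIMES3_BAD(10+10); } // 10+10*3
int demo_good(void) { return TIMES3(10+10); }     // ((10+10)*3)
int demo_fn(void)   { return times3_fn(10+10); }  // argument is 20
```

The inline function also gets type checking on its argument, which the macro versions do not.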

Syntax-changing Macros

Macros have the ability to adjust syntax in interesting and arbitrary ways. For example, I can rename the curly braces with macros, like so:

#define begin {
#define end }

int x=5;
if (x>3) 
begin
	return 7;
end

(Try this in NetRun now!)

A FORTRAN programmer used to typing in ALL CAPS might be more comfortable using #define FOR for

People are merely annoyed by most of the above uses. It can get much worse, though:

#define BOOGA int x=9;  return (x+


BOOGA 7);

(Try this in NetRun now!)

In addition to screwing up any syntax highlighting editor, this sort of syntax-bending can easily destroy the readability of the code, and is bad enough to make people seriously talk of banning all macros. On the plus side, it lets you mutate C++ into a very different looking language.

Advanced C++ Macro Features

__FILE__ expands to a string with the filename of the current source code. Handy for debugging.
__LINE__ expands to an integer with the current line number in the source code. Handy for generating names that should be different for each call to the macro.
#b makes a quoted string version of the argument b, which is handy for print statements. This is "stringification".
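a##b sticks together constants or arguments a and b without any spaces. This "token pasting" is handy for generating new names, like "myClass_##name". Pasting __LINE__ this way needs one extra level of macro call, because ## pastes its operands before they are expanded, so a direct "myClass_##__LINE__" produces the literal name myClass___LINE__.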
You can extend a macro across several lines by ending each line with a backslash. The backslashes and line breaks are only for your convenience in writing the macro, and don't make it out to the compiler, which sees one long line. A // comment inside a multi-line macro will thus kill off the whole rest of the macro, so use /* */ comments there instead!
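One catch when combining token pasting with __LINE__ is that ## pastes its operands before they get macro-expanded. A common workaround, sketched below, is to pass through one extra macro. All of the names here (PASTE_RAW, PASTE, UNIQUE_NAME, registrar, registered_count) are made up for illustration.

```cpp
// PASTE_RAW pastes its arguments exactly as given. Calling it through
// PASTE lets __LINE__ expand to the line number before the paste happens.
#define PASTE_RAW(a,b) a##b
#define PASTE(a,b) PASTE_RAW(a,b)
#define UNIQUE_NAME(prefix) PASTE(prefix, __LINE__)

static int registered = 0; // counts how many registrar objects were built

// Hypothetical helper whose constructor just bumps the counter.
struct registrar {
	registrar() { ++registered; }
};

// Same macro call on two different lines gives two different names,
// so these are two separate objects rather than a redefinition.
static registrar UNIQUE_NAME(reg_);
static registrar UNIQUE_NAME(reg_);

int registered_count(void) { return registered; }
```

Swapping PASTE for PASTE_RAW inside UNIQUE_NAME should make both declarations expand to reg___LINE__, which fails to compile as a redefinition.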

Here's an example of stringification. Note the quoted lines are just one long macro argument, and the newlines get removed before printing. Annoyingly, you can't have a bare comma, or the preprocessor thinks you're passing two arguments into "quoted", which only takes one argument.

#define quoted(x) #x

cout<<quoted(
  This is my string.
  There are many like it but this one is mine.
)<<"\n";

(Try this in NetRun now!)

I use stringification all the time for GPU programming, where the graphics driver wants the GPU code as a string, but I don't want to wrap every line in a quote, so I pass a long multi-line "argument" to a quoting macro. Also, I can have the same code run in C++ directly, then have a macro spit out a stringified version for the GPU to run.
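A minimal sketch of the "same code twice" idea follows. The macro GPU_AND_CPU and the names square and square_source are assumptions invented for this example, and no real GPU driver is involved; the string is simply stored in a variable.

```cpp
#include <cstring>

// Emits the code once as real C++, and once more as a string constant.
#define GPU_AND_CPU(string_name, code) \
	code \
	const char *string_name = #code;

// The second argument is one long multi-line macro argument.
// It must not contain a bare comma outside of parentheses.
GPU_AND_CPU(square_source,
	int square(int x) { return x*x; }
)

// Checks that the stringified copy really contains the function body.
bool source_has_body(void) {
	return std::strstr(square_source, "return x*x;") != 0;
}
```

Here square can be called directly from C++, while square_source holds the text that would be handed to the driver.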

Another place stringification is useful is in error checking. This error checking macro not only checks for errors, but shows you the code that went wrong, and tells you the line number where it happened:

#define checkErrs(code) { int err=code; /* run */  if (err!=0) std::cout<<"Error in "<<#code<<" at line "<<__LINE__<<" of file "<<__FILE__<<"\n"; }

int x=18;
checkErrs(x-18);
checkErrs(x-10);
return 0;
(Try this in NetRun now!)

This macro definition looks a little better using backslashes to separate the lines (known as "line splicing"):

#define checkErrs(code) { \
	int err=code; /* run */  \
	if (err!=0) { \
		std::cout<<"Error "<<err<<" in '"<<#code<<"' at line "<<__LINE__<<" of file "<<__FILE__<<"\n"; \
	} \
}

I need the curly braces to be able to declare "int err" repeatedly. But people typically add an extra semicolon at the end of a function-like macro call, like checkErrs(x-18); above. This extra semicolon is untidy, and will throw off an "if..else" statement with the macro in the middle. There's a bizarre well-known solution, which is to add a worthless do{}while(0) that only exists to consume the semicolon:

#define checkErrs(code) do { \
	int err=code; /* run */  \
	if (err!=0) { \
		std::cout<<"Error "<<err<<" in '"<<#code<<"' at line "<<__LINE__<<" of file "<<__FILE__<<"\n"; \
	} \
} while(0)

I don't like the do {} while(0) trick, although I have used it.
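Here is a sketch of the "if..else" case that motivates the trick. The macro is renamed CHECK_ERRS, and error_count and check_either are hypothetical additions so the result is easy to inspect.

```cpp
#include <iostream>

static int error_count = 0; // hypothetical counter for this demo

#define CHECK_ERRS(code) do { \
	int err=(code); /* run */ \
	if (err!=0) { \
		++error_count; \
		std::cerr<<"Error "<<err<<" in '"<<#code<<"' at line "<<__LINE__<<"\n"; \
	} \
} while(0)

int check_either(int x) {
	error_count = 0;
	if (x>0)
		CHECK_ERRS(x-18); // one statement, so the else still pairs with the if
	else
		CHECK_ERRS(x+1);
	return error_count;
}
```

With the bare curly-brace version of the macro, the semicolon after the first call would end the "if" statement, and the "else" would be a syntax error.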

Another trick is to generate classes inside a macro. For example, if my calculator needs ten "operator" classes for each of the basic operators, instead of typing them all out I'll generate them with a macro like this:

#define makeop(name,op) \
class calcop_##name { public: \
	int calculate(int a,int b) { return a op b; } \
}

makeop(add,+);
makeop(sub,-);
makeop(mul,*);
makeop(div,/);
makeop(and,&);
makeop(or,|);
makeop(left,<<);
makeop(right,>>);

int foo(void) {
	calcop_add a;
	return a.calculate(100,10);
}
(Try this in NetRun now!)

The nice part is now if you need the calcop classes to inherit from some base class, you can add it to the macro. Forgot the "const" in calculate? Just add it to the macro. To add a "getSymbol" method returning the operator's symbol, you can use stringification to add it to the macro definition like "const char *getSymbol(void) const { return #op; }". If each operator needs to be registered into the list of operators, you can add that as well.
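The extensions described above might look something like this sketch. The base class calcop_base and the helper run_op are hypothetical, and the macro is written in capitals here to follow the naming convention mentioned earlier.

```cpp
// Hypothetical base class, so all generated operators share an interface.
class calcop_base { public:
	virtual ~calcop_base() {}
	virtual int calculate(int a,int b) const = 0;
	virtual const char *getSymbol(void) const = 0;
};

// One edit here adds the base class, "const", and getSymbol to every operator.
#define MAKEOP(name,op) \
class calcop_##name : public calcop_base { public: \
	int calculate(int a,int b) const { return a op b; } \
	const char *getSymbol(void) const { return #op; } \
}

MAKEOP(add,+);
MAKEOP(mul,*);
MAKEOP(left,<<);

// Works on any generated operator through the shared base class.
int run_op(const calcop_base &o,int a,int b) { return o.calculate(a,b); }
```

Registration into a list of operators is left out of this sketch, since how that list is stored is not specified here.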

C++ macros can become quite complex, which is bad, but they provide very useful "metaprogramming" abilities, especially in big programs. "Metaprogramming" is when the output of the first program (the preprocessor) is the source code for another program, here C++. You can even do explicit metaprogramming, where one program outputs a second program:
    1. Compile generator code.
    2. Run generator. Output is final code.
    3. Compile final code.
    4. Run final code.

This sort of complexity is common in real systems!
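As a rough sketch of the "generator" half of explicit metaprogramming, the functions below build C++ source text for the calculator classes as a string. The names generate_op and generate_all are invented for this example; writing the text to a file and compiling it are the later steps, not shown.

```cpp
#include <string>

// Builds the source text of one operator class.
// The return value is itself C++ source code, not something to run here.
std::string generate_op(const std::string &name, const std::string &op) {
	return "class calcop_" + name + " { public:\n"
	       "\tint calculate(int a,int b) { return a " + op + " b; }\n"
	       "};\n";
}

// Concatenates the generated source for a few operators.
std::string generate_all(void) {
	std::string out;
	out += generate_op("add", "+");
	out += generate_op("sub", "-");
	out += generate_op("mul", "*");
	return out;
}
```

Unlike the preprocessor, a generator like this is an ordinary program, so it can use loops, files, and data structures to decide what code to emit.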