Systems Programming: Registration

CS 301: Assembly Language Programming Lecture, Dr. Lawlor

Real software gets very big and complex: the Linux kernel currently consists of over 20 million lines of code spread across over 40,000 files.  It's hard to convey how huge this is.  (Say you're really good and can read and understand 1 line of code per second, and you spend 12 hours every day just reading the kernel source code.  That's 43,200 lines per day, so 20 million lines will still take about 463 days, well over 1 year.)

This complexity makes the code hard to understand and change.  One of the biggest tools we have to fight this complexity is abstraction, where we split up the code into interchangeable parts that all work the same way.  Sometimes this shows up at the user level, in the form of plugins or extensions, but it's really widespread at a lower level inside the software.  For example, different USB cameras need to be set up differently, and format their images in slightly different ways, but they all generate a series of image frames.  This means we can define a single USB camera interface abstraction that any USB camera can use (interface at include/media/soc_camera.h; implementations at drivers/media/i2c/soc_camera).
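To make the idea of an interface abstraction concrete, here is a minimal sketch in the table-of-function-pointers style that C code tends to use.  Every name in it (camera_ops, fake_webcam, grab_frames) is made up for illustration; this is not the kernel's actual soc_camera interface, just the general shape of one.

```cpp
#include <cassert>
#include <cstdio>
#include <vector>

// Hypothetical interface: every camera driver fills in one of these.
struct camera_ops {
	const char *name;
	int (*setup)(void); // returns 0 on success
	int (*read_frame)(std::vector<unsigned char> &frame); // returns bytes in the frame
};

// One hypothetical driver: the hardware-specific details stay in here.
static int fake_webcam_setup(void) { return 0; }
static int fake_webcam_read_frame(std::vector<unsigned char> &frame) {
	frame.assign(4, 0x80); // pretend this is a 2x2 grayscale image
	return (int)frame.size();
}
static const camera_ops fake_webcam = {
	"fake_webcam", fake_webcam_setup, fake_webcam_read_frame
};

// Generic code: works with any camera, and knows nothing about the hardware.
int grab_frames(const camera_ops &cam, int count) {
	if (cam.setup() != 0) return -1;
	int total = 0;
	std::vector<unsigned char> frame;
	for (int i = 0; i < count; i++) total += cam.read_frame(frame);
	return total;
}

long foo(void) {
	int bytes = grab_frames(fake_webcam, 3);
	assert(bytes == 12); // 3 frames of 4 bytes each
	std::printf("%s delivered %d bytes\n", fake_webcam.name, bytes);
	return 0;
}
```

Supporting a second camera would mean writing another camera_ops table; grab_frames would not change.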

(These layers of abstraction and registration do add function call overhead.  Some modern networking work bypasses the kernel, and all its layers, to get higher performance by directly mmap'ing the network hardware.)

As a concrete example of how you might use registration, imagine reading the level description file for your game.  Modern games are complex, so you need to read several different kinds of objects, but to start with, there are just obstacles and monsters in your levels, so you bang out this code:
#include <iostream>
#include <string>
#include <vector>

// A thing that lives in the game world
class entity {
public:
	std::string type; 
	float x,y; // location
	std::string texture; // graphics texture file
	
	virtual void print(std::ostream &out) {
		out<<type<<" "<<texture<<" "<<x<<" "<<y<<" ";
	}
};

// An obstacle
class obstacle : public entity {
public:
};


// A monster
class monster : public entity {
public:
	float hitpoints;
	
	virtual void print(std::ostream &out) {
		entity::print(out);
		out<<hitpoints<<" ";
	}
};


// A game level
class game_level {
public:
	std::vector<entity *> stuff;
	
	// Read the whole level from this file
	void read(std::istream &level_file) {
		while (level_file) {
			std::string type;
			level_file>>type;
			if (type=="obstacle") {
				obstacle *e=new obstacle;
				e->type=type;
				level_file >> e->texture >> e->x >> e->y;
				stuff.push_back(e);
			}
			if (type=="monster") {
				monster *e=new monster;
				e->type=type;
				level_file >> e->texture >> e->x >> e->y >> e->hitpoints;
				stuff.push_back(e);
			}
		}
	}
	
	// Print everything in the level
	void print(std::ostream &out) {
		for (entity *e : stuff) {
			out<<"Game entity "<<e<<" = ";
			e->print(out);
			out<<"\n";
		}
	}
};

void foo(void) {
	game_level l;
	l.read(std::cin);
	l.print(std::cout);
}

(Try this in NetRun now!)

Notice how everything is pretty simple, except reading the file--the level read needs to know about all the entity types that exist, and what they contain in order to read them.  So if we need to add a new type of entity, we need to add another check to the level read.  In real games, where you might have a thousand different entity types and sub-types, this gets unmaintainable pretty quickly.

The fix is to register each type of entity in a single table, here "entity_maker_by_type":
#include <iostream>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// A thing that lives in the game world
class entity {
public:
	std::string type; 
	float x,y; // location
	std::string texture; // graphics texture file
	
	virtual void read(std::istream &level_file) {
		level_file >> texture >> x >> y;
	}
	virtual void print(std::ostream &out) {
		out<< type <<" "<< texture <<" "<< x <<" "<< y <<" ";
	}
};

// A maker function creates a new entity object of the right type.
//   This is a pointer to a function that returns an "entity *".
typedef entity * (*entity_maker) (void);

// This contains the maker function for each type of entity.
//   New entity types need to be registered into this table.
std::map<std::string, entity_maker> entity_maker_by_type;


// An obstacle
class obstacle : public entity {
public:
};
// Makes obstacles
entity *make_obstacle(void) { return new obstacle; }

// A monster
class monster : public entity {
public:
	float hitpoints;
	
	virtual void read(std::istream &level_file) {
		entity::read(level_file);
		level_file >> hitpoints;
	}
	virtual void print(std::ostream &out) {
		entity::print(out);
		out << hitpoints << " ";
	}
};
// Makes monsters
entity *make_monster(void) { return new monster; }


// A game level
class game_level {
public:
	std::vector<entity *> stuff;
	
	// Read the whole level from this file
	void read(std::istream &level_file) {
		while (level_file) {
			std::string type;
			level_file>>type;
			if (type=="") break;
			
			entity_maker maker=entity_maker_by_type[type];
			if (!maker) throw std::runtime_error("unknown type "+type);
			entity *e=maker(); // make the entity
			
			e->type=type;
			e->read(level_file);
			stuff.push_back(e);
		}
	}
	
	// Print everything in the level
	void print(std::ostream &out) {
		for (entity *e : stuff) {
			out<<"Game entity "<<e<<" = ";
			e->print(out);
			out<<"\n";
		}
	}
};

void foo(void) {
	// Register the different types of entity here:
	entity_maker_by_type["monster"]=make_monster;
	entity_maker_by_type["obstacle"]=make_obstacle;
	
	game_level l;
	l.read(std::cin);
	l.print(std::cout);
}

(Try this in NetRun now!)

Notice how much simpler the game level read code has become.  The downside is there's more support code for the sub-objects.  But we can write a macro to generate the maker function, and put it into the table using a dedicated global variable's constructor.
#include <iostream>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// A thing that lives in the game world
class entity {
public:
	std::string type; 
	float x,y; // location
	std::string texture; // graphics texture file
	
	virtual void read(std::istream &level_file) {
		level_file >> texture >> x >> y;
	}
	virtual void print(std::ostream &out) {
		out<< type <<" "<< texture <<" "<< x <<" "<< y <<" ";
	}
};

// A maker function creates a new entity object of the right type.
//   This is a pointer to a function that returns an "entity *".
typedef entity * (*entity_maker) (void);

// This contains the maker function for each type of entity.
//   New entity types need to be registered into this table.
std::map<std::string, entity_maker> entity_maker_by_type;

// This macro registers your entity type, by creating
//  a maker function, then adding it to the entity_maker_by_type table.
//  (The parameter is not named "typename", since that is a C++ keyword;
//   the semicolon at the point of use ends the variable declaration.)
#define REGISTER_GAME_ENTITY(classname) \
	entity *make_##classname(void) { return new classname; } \
	class global_constructor_##classname { public: \
		global_constructor_##classname() { \
			entity_maker_by_type[#classname]=make_##classname; \
		} \
	} global_constructor_variable_##classname



// An obstacle
class obstacle : public entity {
public:
};
REGISTER_GAME_ENTITY(obstacle);

// A monster
class monster : public entity {
public:
	float hitpoints;
	
	virtual void read(std::istream &level_file) {
		entity::read(level_file);
		level_file >> hitpoints;
	}
	virtual void print(std::ostream &out) {
		entity::print(out);
		out << hitpoints << " ";
	}
};
REGISTER_GAME_ENTITY(monster);


// A game level
class game_level {
public:
	std::vector<entity *> stuff;
	
	// Read the whole level from this file
	void read(std::istream &level_file) {
		while (level_file) {
			std::string type;
			level_file>>type;
			if (type=="") break;
			
			entity_maker maker=entity_maker_by_type[type];
			if (!maker) throw std::runtime_error("level contains unknown type "+type);
			entity *e=maker(); // make the entity
			
			e->type=type;
			e->read(level_file);
			stuff.push_back(e);
		}
	}
	
	// Print everything in the level
	void print(std::ostream &out) {
		for (entity *e : stuff) {
			out<<"Game entity "<<e<<" = ";
			e->print(out);
			out<<"\n";
		}
	}
};

void foo(void) {
	game_level l;
	l.read(std::cin);
	l.print(std::cout);
}

(Try this in NetRun now!)
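One caveat with registering from a global variable's constructor: C++ only guarantees the order of global initialization within a single source file.  If entity types are registered from several .cpp files, a registration could run before entity_maker_by_type itself has been constructed.  A common workaround, sketched below, is to make the table a function-local static so it gets built on first use.  The accessor entity_makers() and the macro name REGISTER_GAME_ENTITY_SAFE are my own hypothetical names, and the entity classes are cut down so the sketch stands alone.

```cpp
#include <cassert>
#include <map>
#include <string>

// Cut-down classes, just enough to make this sketch stand alone
class entity { public: virtual ~entity() {} };
class obstacle : public entity {};

typedef entity * (*entity_maker) (void);

// Hypothetical accessor: the table is a function-local static, so it is
//  constructed the first time anybody asks for it, even during startup.
std::map<std::string, entity_maker> &entity_makers(void) {
	static std::map<std::string, entity_maker> table;
	return table;
}

// Same idea as REGISTER_GAME_ENTITY, but going through the accessor.
#define REGISTER_GAME_ENTITY_SAFE(classname) \
	entity *make_##classname(void) { return new classname; } \
	struct registrar_##classname { \
		registrar_##classname() { entity_makers()[#classname]=make_##classname; } \
	} registrar_variable_##classname

REGISTER_GAME_ENTITY_SAFE(obstacle);

long foo(void) {
	// By the time foo runs, the global registrar has already filled the table.
	assert(entity_makers().count("obstacle")==1);
	entity *e=entity_makers()["obstacle"]();
	bool ok=(dynamic_cast<obstacle *>(e)!=nullptr);
	delete e;
	return ok ? 0 : 1;
}
```

With everything in one file, as in the listings above, the plain global table works fine as long as it is defined before the first registration.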

In class, we built a templated factory that registers itself, which comes out a little cleaner than the macros above:
class entity_factory {
public:
	static std::map<std::string /* type */, entity_factory * > table_of_factories;
	void register_factory(std::string type) { table_of_factories[type]=this; }
	virtual entity *create(void) =0;
};
std::map<std::string /* type */, entity_factory * > entity_factory::table_of_factories;

// This factory template registers itself on construction
//   Use it like:  factory<foo>  foo_factory("foo");
template <class TYPE>
class factory : public entity_factory {  
public:  
	factory(std::string type) { register_factory(type); }  
	virtual entity *create(void) { return new TYPE; } 
};

// An obstacle
class obstacle : public entity {
public:
};
factory<obstacle> obstacle_factory("obstacle");

// A monster
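class monster : public entity {
public:
	float hitpoints;
	
	// Read hitpoints too, as in the earlier versions, so print has a value to show
	virtual void read(std::istream &level_file) {
		entity::read(level_file);
		level_file >> hitpoints;
	}
	virtual void print(std::ostream &out) {
		entity::print(out);
		out<<hitpoints<<" ";
	}
};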
factory<monster> monster_factory("monster");

(Try this in NetRun now!)
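The listing above leaves out the lookup side.  Here is a minimal sketch of how a level reader might use the factory table.  The cut-down entity class and the helper make_entity are assumptions of mine to keep the example self-contained, not the code we wrote in class.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

// Cut-down entity, just enough to make this sketch stand alone
class entity {
public:
	std::string type;
	float x=0,y=0;
	virtual ~entity() {}
	virtual void read(std::istream &in) { in >> x >> y; }
};

class entity_factory {
public:
	static std::map<std::string, entity_factory *> table_of_factories;
	void register_factory(std::string type) { table_of_factories[type]=this; }
	virtual entity *create(void) =0;
};
std::map<std::string, entity_factory *> entity_factory::table_of_factories;

template <class TYPE>
class factory : public entity_factory {
public:
	factory(std::string type) { register_factory(type); }
	virtual entity *create(void) { return new TYPE; }
};

class obstacle : public entity {};
factory<obstacle> obstacle_factory("obstacle");

// Hypothetical helper: read one "type x y" record, or return NULL at end of input
entity *make_entity(std::istream &in) {
	std::string type;
	if (!(in >> type)) return NULL;
	std::map<std::string, entity_factory *>::iterator it
		=entity_factory::table_of_factories.find(type);
	if (it==entity_factory::table_of_factories.end())
		throw std::runtime_error("level contains unknown type "+type);
	entity *e=it->second->create(); // virtual call into the right factory
	e->type=type;
	e->read(in);
	return e;
}

long foo(void) {
	std::istringstream level("obstacle 3 4");
	entity *e=make_entity(level);
	assert(e!=NULL && e->type=="obstacle" && e->x==3 && e->y==4);
	delete e;
	return 0;
}
```

Using find instead of operator[] means an unknown type name is reported without adding an empty entry to the table.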

In plain C, you don't have templates, classes, or virtual methods, but you do have linked lists and function pointers.  Here, we're giving string names to functions, which lets us call functions based on their string name:
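#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct string_to_code {
	const char *string; // name of function
	void (*code)(void); // function pointer
	struct string_to_code *next; // linked list of these things (plain C needs the "struct" keyword here)
};

struct string_to_code *head=NULL;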

void code_register(const char *string,void (*code)(void))
{
	struct string_to_code *c=(struct string_to_code *)malloc(sizeof(struct string_to_code));
	c->string=string;
	c->code=code;
	c->next=head;
	head=c;
}

void code_call(const char *str) {
	for (struct string_to_code *c=head;c!=NULL;c=c->next) {
		if (0==strcmp(c->string,str))
			c->code();
	}
}

void say_hello(void) {
	printf("Hello!\n");
}

long foo(void) {
	code_register("hello",say_hello);
	code_call("hello");
	return 0;
}

(Try this in NetRun now!)



I use the same registration trick, using a "factory" class, to build new Arduino device objects in our RobotMoose infrastructure.  This lets the Arduino find its compiled-in devices to reconfigure itself at runtime without re-flashing the control code, based on strings sent down from the web server.

One downside with registration is that there isn't much language support for it in C or C++, so folks tend to build rather weird-looking custom data structures to store and look up the registered types, as we do above.  Some languages have built-in support for converting type names to actual objects: Java has "Class.forName", and most interpreted languages have a string-to-code function, often named eval, that can be used to build new objects at runtime.  Even in those languages, it can still be much faster (and safer) to look up a type name in a registered table, exactly like in C++, than to use a bare eval to convert the string to an object.
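Newer C++ does trim some of the custom machinery.  Here is a minimal sketch, assuming C++11 or later and using names of my own (maker_table, make_by_name, kind) rather than anything from the listings above.  Lambdas stand in for the hand-written maker functions, and the lookup only ever returns types that were put in the table.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>

// Cut-down classes, just enough to make this sketch stand alone
class entity {
public:
	virtual ~entity() {}
	virtual std::string kind() const =0;
};
class obstacle : public entity {
public:
	std::string kind() const { return "obstacle"; }
};
class monster : public entity {
public:
	std::string kind() const { return "monster"; }
};

// Hypothetical table: each entry is a lambda that makes one type of entity
typedef std::function<std::unique_ptr<entity>()> maker;
std::map<std::string, maker> maker_table = {
	{"obstacle", []() { return std::unique_ptr<entity>(new obstacle); }},
	{"monster",  []() { return std::unique_ptr<entity>(new monster); }},
};

// Returns an empty pointer for names that were never registered
std::unique_ptr<entity> make_by_name(const std::string &name) {
	std::map<std::string, maker>::const_iterator it=maker_table.find(name);
	if (it==maker_table.end()) return std::unique_ptr<entity>();
	return it->second();
}

long foo(void) {
	assert(make_by_name("monster")->kind()=="monster");
	assert(!make_by_name("rm -rf /")); // arbitrary strings just fail the lookup
	return 0;
}
```

Unlike an eval, the string here is only ever used as a key: the worst a bad level file can do is name a type that isn't in the table.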