Network Programming 3
CS 493/693 Lecture,
Dr. Lawlor, 2006/01/27
Network Protocol Design
So you've opened a socket, and are able to send data back and forth. What data do you send?
Example--HTTP
The HyperText Transfer Protocol is actually quite simple. (Although you
wouldn't be able to tell from reading the official "Request for
Comments" standard, RFC 2068!) The simplest HTTP exchange is just this:
Client sends a request as ASCII text:
"GET /foo.html HTTP/1.1\r\n"
"Host: www.foobar.com\r\n" (the host name is required, in case you're talking to a "virtual server")
"\r\n" (a blank line terminates the HTTP request)
The server receives this request, processes it, and sends a response starting with ASCII text:
"HTTP/1.1 200 OK\r\n" (200 is the "OK" status. 404 would be "not found".)
"Content-Type: text/html\r\n" (MIME type: describes to the browser how to parse this data)
"Content-Length: 100\r\n" (length of data to follow, in bytes)
"\r\n" (blank line indicates end of header; Content-Length bytes of data immediately follow)
The server then sends the whole file. The client knows to expect Content-Length bytes of data.
So overall there's just one round trip--a request, and a
response. Parsing is only required for the message
headers--the body of the web page or image is sent as a big binary
chunk of Content-Length bytes.
The Good
HTTP is totally standard in that each exchange starts with a
well-defined "header" (the ASCII text piece) that describes the more
complicated stuff to follow. The standard things to put in the
header are:
- The protocol you're trying to speak. This helps you immediately recognize when the other side isn't speaking your language.
- The protocol version.
This allows backward and forward compatibility--if the client sends
HTTP/1.0, the server knows it had better only speak the older 1.0
dialect. If the client sends a too-new version like HTTP/2.0, the
server can issue a sensible error message, which is way better than
trying to muddle along and crashing.
- The purpose of the request, like the URL you're requesting.
- The length of the data to follow. See below for the advantages of sending a byte count first.
There are lots of nice things about sending data in a big binary chunk of a known size, like HTTP's Content-Length:
- The client can preallocate memory to receive the whole
chunk. With ASCII, you don't know how much data to expect, so you
have to either start with a small allocation and grow (ugly, although
std::string makes it pretty easy) or else assume some fixed maximum
buffer size (both ugly and error-prone).
- The client can issue a single receive call to grab the chunk off the network.
- The server can send *anything* in those data bytes--there aren't any special disallowed values like newlines, spaces, or control characters.
- The client doesn't have to waste time parsing the chunk, since there aren't any special values to watch out for.
- Both client and server are easier to write, and less likely to
have performance, correctness, and security bugs. Parsing input
data is very error-prone; moving data in big chunks is much less so.
The only thing you need to do before sending a block of binary data is
make sure both sides know how much you're sending. Good solutions
to this "how many bytes?" problem are:
- Send the byte count as ASCII. This is what HTTP does, in the
Content-Length field. One disadvantage is that you've then got to (carefully,
slowly) parse the byte count before you can actually receive the data.
- Hardcode the byte count. Always send 8, or 32, or 117 bytes
if that's how many you always need. This is the easiest solution,
but when you need to change the protocol to send more data, you may be in
trouble.
- Send the byte count in binary. A standard way to do this is
to send some standard binary integer size and representation, like a
32-bit big-endian integer (see Ugly section below). The
osl/socket.h "Big32" class is stored in memory like a 32-bit big-endian
integer, but uses C++ magic to act like a normal int otherwise.
Human-readable formats like ASCII text have some advantages during
debugging, since humans are way better at recognizing newlines than
counting binary bytes. But computers are pretty much the opposite!
Writing code to parse ASCII text is tough. Parsing it securely
and quickly, without writing too much code, is really tough.
People are (thankfully) beginning to use XML as their human-readable
format of choice, although then you need a not-yet-standardized XML
parsing library. It'll probably be years before XML is common
enough that people rely on it for basic protocols.
The Bad, and the Ugly
There are a bunch of really ridiculous problems you have to work around
when exchanging binary data (in files or network packets) between two
different machines.
Different machines have different end-of-line characters in their text
files. UNIX machines use just "\n". DOS machines use
"\r\n". Mac OS 9 machines used just "\r". There are several
different programs to change one kind of newline to another.
Web browsers and FTP clients try to hide these differences by
converting on the fly (when transferring in "ascii mode" or "text
mode"), but this of course screws up non-text files that happen to have
a few newline characters (which must be transferred in "binary
mode"). Most network protocols are using the DOS-style \r\n
nowadays, but you really have to read the documentation (or sniff
packets!) to be sure.
Different machines have different sizes for "int" (some machines are
32-bit, some 64-bit; ancient MS-DOS machines had an "int" of 16
bits). This of course causes disaster if you take a bunch of
"ints" from one machine to another--the sizes just aren't the
same. Two 32-bit machines can still be unable to directly
transfer if one machine is "little-endian"
(like x86 machines) and the other is "big-endian" (like PowerPC macs
and pretty much all other UNIX boxes). Big and little endian
differences can be resolved with "byte swapping", but this doesn't help
if one machine is little-endian 64-bit and the other big-endian
32-bit. The best solution (in my opinion) is to write a
little C++ class with a known in-memory representation.
osl/socket.h includes "Big32", a class that's stored in memory like a
big-endian 32-bit integer on every machine, so you can send and receive a "Big32" safely between any two machines.
To be specific,
	int byte_count=compute_message_length();
	skt_sendN(s,&byte_count,sizeof(byte_count));
and
	int byte_count;
	skt_recvN(s,&byte_count,sizeof(byte_count));
	char *buf=new char[byte_count];
JUST WON'T WORK, because it's possible that on the sender
sizeof(byte_count)==4 bytes (a 32-bit machine), while on the
receiver sizeof(byte_count)==8 bytes (a 64-bit machine).
Further, even if the sizes are the same, the endianness might be
different so the byte_count value would get screwed up.
Instead, it's much better to send and receive a Big32 object:
	Big32 byte_count=compute_message_length();
	skt_sendN(s,&byte_count,sizeof(byte_count));
and
	Big32 byte_count;
	skt_recvN(s,&byte_count,sizeof(byte_count));
	char *buf=new char[byte_count];
and this WILL work on big and little endian machines, and machines with
different integer sizes. Note how we can treat a Big32 pretty
much like an "int", but unlike an int a Big32 is always stored in
memory the same way on every machine.
Different machines have different structure and alignment padding requirements. For example, on a 32-bit x86 machine,
	struct deathSize {
		float x; double z;
	};
takes up 12 bytes--4 bytes for the float, and 8 bytes for the
double. But on most other machines (including 64-bit x86), the struct
takes up 16 bytes, since the compiler has to insert 4 bytes of padding
to make the "double" land on an 8-byte boundary. This means it won't
work to send and receive structures WITH DIFFERENT SIZED ELEMENTS
between processors, because the structure size may differ due to
alignment padding. One solution is to never use structs.
The other solution is to use Big32 and Big16 exclusively, since they
have no alignment requirements (in memory they're just an array of
unsigned char).
Recommendations
My personal favorite way to design a network protocol header is to use
a bunch of Big32 network ints inside a struct. So I'd say
	struct fooHeader {
		Big32 protocol; /* protocol: always 0xF00BA7 */
		Big32 version;  /* 1 for latest version */
		Big32 reqLen;   /* bytes of request data */
		Big32 optLen;   /* bytes of optional data (after request data) */
	};
Sending a foo header just means filling out each field, and sending off the whole struct:
	fooHeader h;
	h.protocol=0xF00BA7; h.version=1; h.reqLen=reqLen; h.optLen=optLen;
	skt_sendN(s,&h,sizeof(h));
	skt_sendN(s,req,h.reqLen);
	skt_sendN(s,opt,h.optLen);
You'd then receive and check a foo header and the accompanying data like this:
	fooHeader h;
	skt_recvN(s,&h,sizeof(h));
	if (h.protocol!=0xF00BA7) error_exit("Protocol mismatch! (network sent 0x%08x)\n",(int)h.protocol);
	if (h.version!=1) error_exit("Version mismatch! (network sent 0x%08x)\n",(int)h.version);
	unsigned int reqLen=h.reqLen; /* turn lengths into unsigned integers */
	unsigned int optLen=h.optLen;
	/* sanity check lengths before allocating memory */
	if (reqLen>10000) error_exit("Request length absurd! (network sent 0x%08x)\n",reqLen);
	if (optLen>10000) error_exit("Option length absurd! (network sent 0x%08x)\n",optLen);
	byte *req=new byte[reqLen];
	byte *opt=new byte[optLen];
	skt_recvN(s,req,reqLen);
	skt_recvN(s,opt,optLen);
Compared to parsing an ASCII header, this is a lot easier. It's
also much easier to prove to yourself there aren't any security holes
here, because there's so little data-dependent processing.
I've added this little example as "fooclient.cpp/fooserver.cpp" to the hw1 support directory. Nothing else has changed.