Network Programming 3
CS 493/693 Lecture,
Dr. Lawlor, 2006/01/27
Network Protocol Design
So you've opened a socket, and are able to send data back and forth. What data do you send?
Example--HTTP
The HyperText Transfer Protocol is actually quite simple. (Although you
wouldn't be able to tell from reading the official "Request for
Comments" standard, RFC 2068!) The simplest HTTP exchange is just this:
Client sends a request as ASCII text:
"GET /foo.html HTTP/1.1\r\n"
"Host: www.foobar.com\r\n" (the host name is required, in case you're talking to a "virtual server")
"\r\n" (a blank line terminates the HTTP request)
The server receives this request, processes it, and sends a response starting with ASCII text:
"HTTP/1.1 200 OK\r\n" (200 is the "OK" status. 404 would be "not found".)
"Content-Type: text/html\r\n" (MIME type: describes to the browser how to parse this data)
"Content-Length: 100\r\n" (length of data to follow, in bytes)
"\r\n" (blank line indicates end of header; Content-Length bytes of data immediately follow)
The server then sends the whole file. The client knows to expect Content-Length bytes of data.
So overall there's just one round trip--a request, and a
response. Parsing is only required for the message
headers--the body of the web page or image is sent as a big binary
chunk of Content-Length bytes.
The Good
HTTP is totally standard in that each exchange starts with a
well-defined "header" (the ASCII text piece) that describes the more
complicated stuff to follow. The standard things to put in the
header are:
- The protocol you're trying to speak. This helps you immediately recognize when the other side isn't speaking your language.
- The protocol version.
This allows backward and forward compatibility--if the client sends
HTTP/1.0, the server knows it had better only speak the older 1.0
dialect. If the client sends a too-new version like HTTP/2.0, the
server can issue a sensible error message, which is way better than
trying to muddle along and crashing.
- The purpose of the request, like the URL you're requesting.
- The length of the data to follow. See below for the advantages of sending a byte count first.
There are lots of nice things about sending data in a big binary chunk of a known size, like HTTP's Content-Length:
- The client can preallocate memory to receive the whole
chunk. With ASCII, you don't know how much data to expect, so you
have to either start with a small allocation and grow (ugly, although
std::string makes it pretty easy) or else assume some fixed maximum
buffer size (both ugly and error-prone).
- The client can issue a single receive call to grab the chunk off the network.
- The server can send *anything* in those data bytes--there aren't any special disallowed values like newlines, spaces, or control characters.
- The client doesn't have to waste time parsing the chunk, since there aren't any special values to watch out for.
- Both client and server are easier to write, and less likely to
have performance, correctness, and security bugs. Parsing input
data is very error-prone; moving data in big chunks is much less so.
The only thing you need to do before sending a block of binary data is
make sure both sides know how much you're sending. Good solutions
to this "how many bytes?" problem are:
- Send the byte count as ASCII. This is what HTTP does, in the
Content-Length field. One disadvantage is that you've then got to (carefully,
slowly) parse the byte count before you can actually receive the data.
- Hardcode the byte count. Always send 8, or 32, or 117 bytes
if that's how many you always need. This is the easiest solution,
but when you need to change the protocol to send more data, you may be in
trouble.
- Send the byte count in binary. A standard way to do this is
to send some standard binary integer size and representation, like a
32-bit big-endian integer (see Ugly section below). The
osl/socket.h "Big32" class is stored in memory like a 32-bit big-endian
integer, but uses C++ magic to act like a normal int otherwise.
Human-readable formats like ASCII text have some advantages during
debugging, since humans are way better at recognizing newlines than
counting binary bytes. But computers are pretty much the opposite!
Writing code to parse ASCII text is tough. Parsing it securely
and quickly, without writing too much code, is really tough.
People are (thankfully) beginning to use XML as their human-readable
format of choice, although then you need a not-yet-standardized XML
parsing library. It'll probably be years before XML is common
enough that people rely on it for basic protocols.
The Bad, and the Ugly
There are a bunch of really ridiculous problems you have to work around
when exchanging binary data (in files or network packets) between two
different machines.
Different machines have different end-of-line characters in their text
files. UNIX machines use just "\n". DOS machines use
"\r\n". Mac OS 9 machines used just "\r". There are several
different programs to change one kind of newline to another.
Web browsers and FTP clients try to hide these differences by
converting on the fly (when transferring in "ascii mode" or "text
mode"), but this of course screws up non-text files that happen to have
a few newline characters (which must be transferred in "binary
mode"). Most network protocols are using the DOS-style \r\n
nowadays, but you really have to read the documentation (or sniff
packets!) to be sure.
Different machines have different sizes for "int" (some machines are
32-bit, some 64-bit; ancient MS-DOS machines had an "int" of 16
bits). This of course causes disaster if you take a bunch of
"ints" from one machine to another--the sizes just aren't the
same. Two 32-bit machines can still be unable to directly
transfer if one machine is "little-endian"
(like x86 machines) and the other is "big-endian" (like PowerPC macs
and pretty much all other UNIX boxes). Big and little endian
differences can be resolved with "byte swapping", but this doesn't help
if one machine is little-endian 64-bit and the other big-endian
32-bit. The best solution (in my opinion) is to write a
little C++ class with a known in-memory representation.
osl/socket.h includes "Big32", a class that's stored in memory like a
big-endian 32-bit integer on every machine, so you can send and receive a "Big32" safely between any two machines.
To be specific,
	int byte_count=compute_message_length();
	skt_sendN(s,&byte_count,sizeof(byte_count));
and
	int byte_count;
	skt_recvN(s,&byte_count,sizeof(byte_count));
	char *buf=new char[byte_count];
JUST WON'T WORK, because it's possible that on the sender
sizeof(byte_count)==4 bytes (a 32-bit machine), while on the
receiver sizeof(byte_count)==8 bytes (a 64-bit machine).
Further, even if the sizes are the same, the endianness might be
different so the byte_count value would get screwed up.
Instead, it's much better to send and receive a Big32 object:
	Big32 byte_count=compute_message_length();
	skt_sendN(s,&byte_count,sizeof(byte_count));
and
	Big32 byte_count;
	skt_recvN(s,&byte_count,sizeof(byte_count));
	char *buf=new char[byte_count];
and this WILL work on big and little endian machines, and machines with
different integer sizes. Note how we can treat a Big32 pretty
much like an "int", but unlike an int a Big32 is always stored in
memory the same way on every machine.
Different machines have different structure and alignment padding requirements. For example, on a 32-bit x86 machine,
	struct deathSize {
		float x; double z;
	};
takes up 12 bytes--4 bytes for the float, and 8 bytes for the
double. But on most other machines (including 64-bit x86), the struct
takes up 16 bytes, since the compiler has to insert 4 bytes of padding
to make the "double" land on an 8-byte boundary. This means it won't
work to send and receive structures WITH DIFFERENT SIZED ELEMENTS
between processors, because the structure size may differ due to
alignment padding. One solution is to never use structs.
The other solution is to use Big32 and Big16 exclusively, since they
have no alignment requirements (in memory they're just an array of
unsigned char).
Recommendations
My personal favorite way to design a network protocol header is to use
a bunch of Big32 network ints inside a struct. So I'd say
	struct fooHeader {
		Big32 protocol; /* protocol: always 0xF00BA7 */
		Big32 version;  /* 1 for latest version */
		Big32 reqLen;   /* bytes of request data */
		Big32 optLen;   /* bytes of optional data (after request data) */
	};
Sending a foo header just means filling out each field, and sending off the whole struct:
	fooHeader h;
	h.protocol=0xF00BA7; h.version=1; h.reqLen=reqLen; h.optLen=optLen;
	skt_sendN(s,&h,sizeof(h));
	skt_sendN(s,req,h.reqLen);
	skt_sendN(s,opt,h.optLen);
You'd then receive and check a foo header and the accompanying data like this:
	fooHeader h;
	skt_recvN(s,&h,sizeof(h));
	if (h.protocol!=0xF00BA7) error_exit("Protocol mismatch! (network sent 0x%08x)\n",(int)h.protocol);
	if (h.version!=1) error_exit("Version mismatch! (network sent 0x%08x)\n",(int)h.version);
	unsigned int reqLen=h.reqLen; /* turn lengths into unsigned integers */
	unsigned int optLen=h.optLen;
	/* sanity check lengths before allocating memory */
	if (reqLen>10000) error_exit("Request length absurd! (network sent 0x%08x)\n",reqLen);
	if (optLen>10000) error_exit("Option length absurd! (network sent 0x%08x)\n",optLen);
	byte *req=new byte[reqLen];
	byte *opt=new byte[optLen];
	skt_recvN(s,req,reqLen);
	skt_recvN(s,opt,optLen);
Compared to parsing an ASCII header, this is a lot easier. It's
also much easier to prove to yourself there aren't any security holes
here, because there's so little data-dependent processing.
I've added this little example as "fooclient.cpp/fooserver.cpp" to the hw1 support directory. Nothing else has changed.