Network Protocols

CS 441/641 Lecture Notes, Dr. Lawlor

A network protocol specifies how programs send data across the network.

Below, we'll look in detail at HTTP and binary data exchange.

Example Protocol: HTTP

HTTP is the protocol used by web pages (that's why URLs usually start with "http://").  HTTP servers listen on port 80 by default, but you can actually use the :port syntax to connect to any port you like (for example, "https://lawlor.cs.uaf.edu:8888/some_url").
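As a rough sketch of how a client might pull the host, port, and path out of a URL like that, here's a tiny parser.  The names simple_url and parse_http_url are made up for this example, and it only handles plain "http://" URLs (no https, no IPv6 literals, no validation).

```cpp
#include <string>
#include <cstdlib>

// Hypothetical helper type: the pieces of an "http://host:port/path" URL.
struct simple_url {
	std::string host;
	int port;
	std::string path;
};

// Split a plain http:// URL into host, port (default 80), and path (default "/").
// Sketch only: no validation, no https, no IPv6 literals.
simple_url parse_http_url(const std::string &url) {
	simple_url u;
	u.port=80; // HTTP default port
	u.path="/";
	std::string rest=url;
	const std::string scheme="http://";
	if (rest.compare(0,scheme.size(),scheme)==0) rest=rest.substr(scheme.size());

	// Everything before the first slash is "host" or "host:port"
	std::string::size_type slash=rest.find('/');
	std::string hostport=rest.substr(0,slash); // npos means "the whole string"
	if (slash!=std::string::npos) u.path=rest.substr(slash);

	// An optional ":port" overrides the default
	std::string::size_type colon=hostport.find(':');
	u.host=hostport.substr(0,colon);
	if (colon!=std::string::npos) u.port=std::atoi(hostport.substr(colon+1).c_str());
	return u;
}
```

The host is what you'd hand to the DNS lookup, the port is what you'd connect to, and the path is what goes in the request.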

An HTTP client, like a web browser, starts by doing a DNS lookup on the server name.  That's the "resolving host name" message you see in your browser.  The browser then opens a TCP connection to that port on the server ("Connecting to server").

Once connected, the HTTP client usually sends a "GET" request.  Here's the simplest possible GET request:
    "GET / HTTP/1.0\r\n\r\n"

Note the DOS-style "\r\n" newlines, and the extra blank line at the end of the request.  You can list a bunch of optional headers in your GET request, like the languages you're willing to accept ("Accept-Language: en-us\r\n") and so on.  HTTP 1.1 (unlike 1.0) requires a Host header in the request ("Host: www.foobar.com\r\n"), which servers use to tell virtual hosts apart.
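Here's a minimal sketch of assembling such a request as a string.  The function name build_get_request is my own invention, and the particular optional headers are just examples.

```cpp
#include <string>

// Build a minimal HTTP/1.1 GET request. Every line ends in "\r\n",
// and one empty line marks the end of the request headers.
std::string build_get_request(const std::string &host,const std::string &path) {
	std::string req;
	req+="GET "+path+" HTTP/1.1\r\n";
	req+="Host: "+host+"\r\n";          // required by HTTP 1.1
	req+="Accept-Language: en-us\r\n";  // optional
	req+="Connection: close\r\n";       // optional: ask the server to close when done
	req+="\r\n";                        // blank line == end of request
	return req;
}
```

For example, build_get_request("www.foobar.com","/") gives a request you could send as-is over a connected socket.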

The HTTP server then sends back some sort of reply.  Officially, this is supposed to be a "HTTP/1.1 200 OK\r\n" followed by another set of line-oriented ASCII optional data, such as the Content-Length in bytes ("Content-Length: 187\r\n").  Because the receiver knows how many bytes are coming, after the ASCII header you can efficiently exchange binary data without worrying about newlines or malicious comments that mimic HTTP headers.  This is an important point:

Do not exchange payload data using ASCII: you'll never get it right. Send a byte count and switch to binary for the arbitrary payload.

(The new HTTP/2 uses binary communication for the headers too.)
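To make the byte count idea concrete, here's a small in-memory sketch: an ASCII header carrying the length, a blank line, then the raw payload.  The "Length:" header name and both function names are made up for illustration (this is not HTTP itself), and it assumes C++11 for std::to_string.

```cpp
#include <string>
#include <cstdlib>

// Wrap an arbitrary binary payload in a tiny ASCII header carrying its byte count.
// "Length:" is a made-up header name for this sketch, not part of HTTP.
std::string frame_message(const std::string &payload) {
	return "Length: "+std::to_string(payload.size())+"\r\n\r\n"+payload;
}

// Pull the payload back out of a framed message.
// Returns false if the message is malformed or shorter than its header claims.
bool unframe_message(const std::string &msg,std::string &payload) {
	const std::string tag="Length: ";
	if (msg.compare(0,tag.size(),tag)!=0) return false;
	std::string::size_type end=msg.find("\r\n\r\n"); // first blank line ends the header
	if (end==std::string::npos) return false;
	long n=std::atol(msg.substr(tag.size(),end-tag.size()).c_str());
	std::string::size_type start=end+4; // skip past the blank line
	if (n<0 || msg.size()-start<(std::string::size_type)n) return false;
	payload=msg.substr(start,n); // exactly n bytes, newlines and zero bytes included
	return true;
}
```

Because the receiver counts bytes instead of scanning for a terminator, the payload can contain "\r\n\r\n", zero bytes, or text that looks like a header without confusing anybody.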

Here's an example of a real HTTP exchange between Firefox and Apache:

GET /my_name_is_url.html HTTP/1.1
Host: lawlor.cs.uaf.edu:8888
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.9) Gecko/20070126 Ubuntu/dapper-security Firefox/1.5.0.9
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Cookie: __utmz=62224958.1163103248.1.1.utmccn=(direct)|utmcsr=(direct)|utmcmd=(none); __utma=62224958.570638686.1163103248.1163107343.1164832326.3
<- blank line at end of HTTP request headers

HTTP/1.1 200 OK
Date: Fri, 06 Apr 2007 20:20:50 GMT
Server: Apache/2.0.55 (Ubuntu)
Accept-Ranges: bytes
Content-Length: 9443
Connection: close
Content-Type: text/html; charset=UTF-8
<- blank line at end of HTTP response headers
<html><head><title>UAF Department of ... rest of web page, total of 9443 bytes after blank line

This is a pretty simple, ASCII-based protocol.  The only binary data is the contents of the web resource, transmitted by the server after the blank line.  The "Content-Length:" field tells the client how many bytes to expect.
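Since the headers are just lines of "Name: value" text, a client can parse them with ordinary string handling.  Here's one possible sketch; the function names and the ":status" key are my own choices for this example.  Header names are compared in lower case because HTTP header names are case-insensitive.

```cpp
#include <string>
#include <map>
#include <sstream>
#include <cctype>

// Lower-case an ASCII string, since HTTP header names are case-insensitive.
std::string lower_ascii(std::string s) {
	for (std::string::size_type i=0;i<s.size();i++)
		s[i]=(char)std::tolower((unsigned char)s[i]);
	return s;
}

// Parse the header block of an HTTP response (everything up to the blank line)
// into a name/value map. The status line is stored under the made-up key ":status".
std::map<std::string,std::string> parse_headers(const std::string &head) {
	std::map<std::string,std::string> h;
	std::istringstream in(head);
	std::string line;
	bool first=true;
	while (std::getline(in,line)) { // splits on '\n'
		if (!line.empty() && line[line.size()-1]=='\r') line.erase(line.size()-1); // drop '\r'
		if (line.empty()) break; // blank line: end of headers
		if (first) { h[":status"]=line; first=false; continue; }
		std::string::size_type colon=line.find(':');
		if (colon==std::string::npos) continue; // not a "Name: value" line
		std::string::size_type v=line.find_first_not_of(" \t",colon+1); // skip spaces after colon
		h[lower_ascii(line.substr(0,colon))]=(v==std::string::npos)?std::string():line.substr(v);
	}
	return h;
}
```

With the response above, looking up "content-length" in the map would give "9443", which tells you how many bytes to read after the blank line.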

A typical very simple custom web client might look like this:

#include "osl/socket.h"
#include "osl/socket.cpp"

int foo(void) {
	skt_ip_t ip=skt_lookup_ip("137.229.25.247"); // lawlor.cs.uaf.edu
	SOCKET s=skt_connect(ip,80,2);

	/* Send off HTTP request to server (with URL) */
	const char *req=
		"GET / HTTP/1.1\r\n" // the "/" is the URL I'm requesting
		"Host: lawlor.cs.uaf.edu\r\n" // hostname is required for HTTP 1.1
		"User-Agent: Raw socket example code (lawlor@alaska.edu)\r\n" // web browser ID string
		"\r\n"; // blank line == end of HTTP request
	skt_sendN(s,req,strlen(req));

	/* Receive HTTP response headers, up to the blank line */
	std::string response;
	int length=0;
	while ((response=skt_recv_line(s))!="")
	{
		std::cout<<response<<"\n";
		if (response.substr(0,15)=="Content-Length:")
			length=atoi(response.substr(15).c_str()); // atoi skips the space after the colon
	}

	/* Receive HTTP response data, and print it */
	std::cout<<"-- bottom line: "<<length<<" bytes of data\n";
	if (length>0 && length<10000) { // sanity check
		std::vector<char> page(length); // place to store data
		skt_recvN(s,&page[0],length); // grab data from server
		for (int i=0;i<length;i++) std::cout<<page[i]; // print to screen
	}

	skt_close(s);
	return 0;
}
(Try this in NetRun now!)

A typical similarly simplified web server might look like this.  Note it just keeps being a webserver forever:

#include "osl/socket.h"
#include "osl/socket.cpp"

int foo(void) {
	/* Make a TCP socket listening on port 8888 */
	unsigned int port=8888;
	SERVER_SOCKET serv=skt_server(&port);

	/* Keep servicing clients */
	while (1) {
		std::cout<<"Waiting for connections on port "<<port<<"\n";
		skt_ip_t client_ip; unsigned int client_port;
		SOCKET s=skt_accept(serv,&client_ip,&client_port);
		std::cout<<"Connection from "<<skt_print_ip(client_ip)<<":"<<client_port<<"!\n";

		// Grab HTTP request line, typically GET /url HTTP/1.1
		std::string req=skt_recv_line(s);
		// Grab rest of HTTP header info (mostly useless)
		std::string hdr;
		while ((hdr=skt_recv_line(s))!="") std::cout<<"Client header: "<<hdr<<"\n";

		// Prepare HTTP response header
		std::string page="<html><body>IT WORKED!</body></html>";
		char response[1024];
		sprintf(response, // needed to get the page length into the string
			"HTTP/1.1 200 OK\r\n"
			"Server: Random example code (lawlor@alaska.edu)\r\n"
			"Content-Length: %d\r\n"
			"\r\n" // blank line: end of header
			,(int)page.length());
		skt_sendN(s,&response[0],strlen(response));
		skt_sendN(s,&page[0],page.length());

		skt_close(s);
	}

	return 0;
}

(Try this in NetRun now!)

Here are the same programs, written directly against the operating system's sockets API instead of my wrappers (see Beej's Guide to Network Programming):

/* 
Trivial example web client, using raw network sockets.

References:
    "Beej's Guide to Network Programming"
    http://beej.us/guide/bgnet/output/html/singlepage/bgnet.html

    "Creating a Basic Winsock Application", by Microsoft
    https://msdn.microsoft.com/en-us/library/windows/desktop/ms737629(v=vs.85).aspx

Dr. Orion Lawlor, lawlor@alaska.edu, 2016-03-24 (public domain)
*/
#include <cstdio>
#include <cstdlib>
#include <iostream>
#include <fstream>
#include <iomanip>
#include <vector>
#include <map>
#include <string>


// Socket headers:
#if defined(_WIN32) && ! defined(__CYGWIN__)
/* For windows systems:*/

//#  include <winsock.h> /* for SOCKET and others */
//#  pragma comment (lib, "wsock32.lib")  /* link with winsock library */

#  include <winsock2.h> /* new style, for getaddrinfo */
#  include <WS2tcpip.h> /* new style, for getaddrinfo */
#  pragma comment(lib, "Ws2_32.lib")
inline void close_socket_portable(int sock) { closesocket(sock); }

class initialize_winsock {
public:
	initialize_winsock() {
		WSADATA wsaData;
		WSAStartup(MAKEWORD(2,2), &wsaData);
	}
};
initialize_winsock initialize_winsock_now;

#else 
/* Non-windows (UNIX-y) systems: */
#  include <sys/types.h>
#  include <sys/socket.h>
#  include <netdb.h>
#  include <unistd.h>
inline void close_socket_portable(int sock) { close(sock); }
#endif

// Check if an error (negative return value) came back from this code.
#define SKT_CHECK(code) { int status=code; if (status<0) { perror(#code); std::cout<<"Error "<<status<<" with code "<<#code<<" on line "<<__LINE__<<" of file "<<__FILE__<<"\n"; exit(1); } }

#include <sstream> // for ostringstream
#include <string.h> // for memset

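// Read a "\r\n"-terminated line from this socket (returned without the newline).
// Also returns whatever it has so far if the other side closes the connection.
std::string recv_text_line(int socket) {
	std::string req="";
	char last_char=0;
	while (1) { 
		char c=0;
		int got=recv(socket,&c,1,0); // 1 byte, or 0 if the connection was closed
		SKT_CHECK(got);
		if (got==0) break; // connection closed: stop instead of looping forever
	
		if (c=='\n' || c=='\r') { // newline char
			if (last_char=='\r' && c=='\n') // full newline
				break;
		}
		else req+=c;
		last_char=c;
	}
	return req;
}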


int main() {
	struct addrinfo hints;
	struct addrinfo *res;  // will point to the results

	memset(&hints, 0, sizeof hints); // make sure the struct is empty
	hints.ai_family = AF_UNSPEC;     // don't care IPv4 or IPv6
	hints.ai_socktype = SOCK_STREAM; // TCP stream sockets
	// (No AI_PASSIVE here: that flag is for servers that bind(), not clients that connect().)
	
	std::string host="137.229.25.247"; // or "lawlor.cs.uaf.edu"
	std::string protocol="80"; // port number, or a service name like "http"
	
	SKT_CHECK(getaddrinfo(host.c_str(),protocol.c_str(), &hints,&res));

	int s=socket(res->ai_family, res->ai_socktype, res->ai_protocol);
	SKT_CHECK(s);
	
	SKT_CHECK(connect(s,res->ai_addr,res->ai_addrlen)); // client
	
	freeaddrinfo(res);
	

/* Send off HTTP request to server (with URL) */
	const char *req=
		"GET / HTTP/1.1\r\n"  // the "/" is the URL I'm requesting
		"Host: lawlor.cs.uaf.edu\r\n" // hostname is required for HTTP 1.1
		"User-Agent: Raw socket example code (lawlor@alaska.edu)\r\n" // web browser ID string
		"\r\n"; // blank line == end of HTTP request
	SKT_CHECK(send(s,req,strlen(req),0));

	/* Receive HTTP response headers, up to the blank line */
	int length=0;
	while (1)
	{
		std::string response=recv_text_line(s);
		if (response=="") break;
		
		std::cout<<response<<"\n";
		if (response.substr(0,15)=="Content-Length:")
			length=atoi(response.substr(15).c_str()); // atoi skips the space after the colon
	}

	/* Receive HTTP response data, and print it */
	std::cout<<"-- bottom line: "<<length<<" bytes of data\n";
	if (length>0 && length<10000) { // sanity check
		std::vector<char> page(length); // place to store data
		int got=0; // bytes received so far
		while (got<length) { // recv can hand back fewer bytes than we asked for
			int n=recv(s,&page[got],length-got,0);  // grab more data from server
			SKT_CHECK(n);
			if (n==0) break; // server closed the connection early
			got+=n;
		}
		for (int i=0;i<got;i++) std::cout<<page[i]; // print to screen
	}

	close_socket_portable(s);
	return 0;
}

(Try this in NetRun now!)

And the web server.  A web server just waits forever, but NetRun will kill it after 2 seconds.  If you're quick, you can load http://137.229.25.211:8080 during those 2 seconds, and grab the page (only works from on campus, off campus requests are firewalled).

/* 
Trivial example web server, using raw network sockets.

References:
    "Beej's Guide to Network Programming"
    http://beej.us/guide/bgnet/output/html/singlepage/bgnet.html

    "Creating a Basic Winsock Application", by Microsoft
    https://msdn.microsoft.com/en-us/library/windows/desktop/ms737629(v=vs.85).aspx

Dr. Orion Lawlor, lawlor@alaska.edu, 2016-03-24 (public domain)
*/
#include <cstdio>
#include <cstdlib>
#include <iostream>
#include <fstream>
#include <iomanip>
#include <vector>
#include <map>
#include <string>


// Socket headers:
#if defined(_WIN32) && ! defined(__CYGWIN__)
/* For windows systems:*/

//#  include <winsock.h> /* for SOCKET and others */
//#  pragma comment (lib, "wsock32.lib")  /* link with winsock library */

#  include <winsock2.h> /* new style, for getaddrinfo */
#  include <WS2tcpip.h> /* new style, for getaddrinfo */
#  pragma comment(lib, "Ws2_32.lib")
inline void close_socket_portable(int sock) { closesocket(sock); }

class initialize_winsock {
public:
	initialize_winsock() {
		WSADATA wsaData;
		WSAStartup(MAKEWORD(2,2), &wsaData);
	}
};
initialize_winsock initialize_winsock_now;

#else 
/* Non-windows (UNIX-y) systems: */
#  include <sys/types.h>
#  include <sys/socket.h>
#  include <netdb.h>
#  include <unistd.h>
inline void close_socket_portable(int sock) { close(sock); }
#endif

// Check if an error (negative return value) came back from this code.
#define SKT_CHECK(code) { int status=code; if (status<0) { perror(#code); std::cout<<"Error with code "<<#code<<" on line "<<__LINE__<<" of file "<<__FILE__<<"\n"; exit(1); } }

#include <sstream> // for ostringstream
#include <string.h> // for memset

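// Read a "\r\n"-terminated line from this socket (returned without the newline).
// Also returns whatever it has so far if the other side closes the connection.
std::string recv_text_line(int socket) {
	std::string req="";
	char last_char=0;
	while (1) { 
		char c=0;
		int got=recv(socket,&c,1,0); // 1 byte, or 0 if the connection was closed
		SKT_CHECK(got);
		if (got==0) break; // connection closed: stop instead of looping forever
	
		if (c=='\n' || c=='\r') { // newline char
			if (last_char=='\r' && c=='\n') // full newline
				break;
		}
		else req+=c;
		last_char=c;
	}
	return req;
}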



// Convert type to string
template <class T>
std::string to_string_portable(T t) {
	std::ostringstream s;
	s<<t;
	return s.str();
}


int main() {
	// Set up our network address:
	struct addrinfo hints;
	struct addrinfo *res;  // will point to the results

	memset(&hints, 0, sizeof hints); // make sure the struct is empty
	hints.ai_family = AF_UNSPEC;     // don't care IPv4 or IPv6
	hints.ai_socktype = SOCK_STREAM; // TCP stream sockets
	hints.ai_flags = AI_PASSIVE;     // fill in my IP for me
	
	// TCP port number to listen on:
	int server_port=8080;
	SKT_CHECK(getaddrinfo(NULL,to_string_portable(server_port).c_str(), &hints,&res));

	// Make a server socket:
	int serv=socket(res->ai_family, res->ai_socktype, res->ai_protocol);

	// Allow the server socket to be reopened within 2 minutes:
	int yes=1;
	SKT_CHECK(setsockopt(serv, SOL_SOCKET, SO_REUSEADDR, (const char *)&yes, sizeof(int)));
	
	// Give the server socket an address:
	SKT_CHECK(bind(serv,res->ai_addr,res->ai_addrlen)); // server
	freeaddrinfo(res);

	// Listen for client connections
	SKT_CHECK(listen(serv, 10));
	
	std::cout<<"Listening at http://localhost:"<<server_port<<"\n";
	
	while (true) { // keep being a server forever

		std::cout<<"Waiting to accept() next client\n";   
		struct sockaddr_storage their_addr;
		socklen_t addr_size=sizeof(their_addr);
		int s=accept(serv,(struct sockaddr *)&their_addr,&addr_size);
		SKT_CHECK(s);
		std::cout<<"Accepted client\n";	
		// Ideally you'd create a separate thread for this client here...

		/* Receive HTTP request headers, up to the blank line */
		std::string req_URL="?";
		while (1)
		{
		{
			// Read a newline-terminated string:
			std::string req=recv_text_line(s);
			if (req=="") break;
		
			std::cout<<"	Header: "<<req<<"\n";
			if (req.substr(0,3)=="GET") {
				req_URL=req.substr(4);
				std::cout<<"GOT URL!  "<<req_URL<<"\n";
			}
		}

		/* Send the client back a web page. */
		std::string page_contents="This is web.  Sorta.\n\n";
		page_contents+="Your request URL was "+req_URL+"\n";
		std::string page_length=to_string_portable(page_contents.size());

		std::string response=
			"HTTP/1.1 200 OK\r\n"  // status line: 200 means OK (a missing page would be 404 Not Found)
			"Content-Length: "+page_length+"\r\n" // bytes in message body
			"Content-Type: text/plain\r\n" // MIME type
			"\r\n" // blank line == end of HTTP response headers
			+page_contents; 

		SKT_CHECK(send(s,&response[0],response.size(),0));

		close_socket_portable(s);

	}

	return 0;
}

(Try this in NetRun now!)
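The server above keeps everything after "GET " as the URL, so the trailing " HTTP/1.1" comes along for the ride.  If you want the pieces separately, here's a sketch that splits the request line on spaces.  The request_line struct and parse_request_line function are names I made up for this example.

```cpp
#include <string>
#include <sstream>

// Hypothetical helper type: the three space-separated parts of "GET /url HTTP/1.1".
struct request_line {
	std::string method, url, version;
};

// Split an HTTP request line into method, URL, and version.
// Returns false if the line does not have all three parts.
bool parse_request_line(const std::string &line,request_line &out) {
	std::istringstream in(line);
	in>>out.method>>out.url>>out.version; // operator>> splits on whitespace
	return !in.fail();
}
```

A real server would also want to check the method and reject URLs it doesn't like before touching any files.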

You can also download the web client or web server as .cpp files.



Note that using these tiny learning examples for real production servers sounds like a terrible idea (use a lightweight C++ library like mongoose, or a real server like Apache, nginx, or node.js), but they do work for simple testing, or as examples of how to use sockets to get work done.

Endian-ness in Binary Data Exchange

One recurring issue in exchanging binary data is that machines don't agree on how to store it: x86 and most ARM machines store data to memory starting with the little end, so the 16-bit value 0x4321 gets stored as the byte 0x21, then 0x43.  Other CPUs like PowerPC (by default, it's switchable) store data starting with the big end, which was once the standard, and so is called "network byte order".  Sizes vary too: I've programmed on machines where "int" is 2 bytes, most set it to 4 bytes for now, but there are hints of an eventual transition to 8 bytes.  This means you might send a binary "int", and get any of three sizes in either of two orders!  The protocol fix is to just specify the format exactly, such as "big-endian 32 bit unsigned integer", and then everybody will know what to use.
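To see which order your own machine uses, you can copy a 16-bit value's bytes out of memory and look at them.  This is just a quick sketch; the function names are made up for the example.

```cpp
#include <cstdint>
#include <cstring>

// Copy the two bytes of a 16-bit value out of memory, in the order this CPU stores them.
void bytes_in_memory(std::uint16_t value,unsigned char out[2]) {
	std::memcpy(out,&value,2);
}

// True if this machine stores the little end first (x86, most ARM).
bool is_little_endian() {
	unsigned char b[2];
	bytes_in_memory(0x4321,b);
	return b[0]==0x21; // little end (0x21) came first in memory
}
```

On a little-endian machine bytes_in_memory(0x4321,b) leaves b holding 0x21 then 0x43; on a big-endian machine it's 0x43 then 0x21.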

How do you send a 32-bit big endian integer from a little endian machine?  There are several ways such as htonl() to handle byte order, and other ways such as <stdint.h> uint32_t to handle sizes, but my favorite fix is to solve both at once by specifying the bytes manually, via a special class.

class Big32 { //Big-endian (network byte order) 32-bit integer
	typedef unsigned char byte;
	byte d[4];
public:
	Big32() {}
	Big32(unsigned int i) { set(i); }
	operator unsigned int () const { // cast before shifting, so the top byte can't overflow a signed int
		return ((unsigned int)d[0]<<24)|((unsigned int)d[1]<<16)|((unsigned int)d[2]<<8)|d[3];
	}
	unsigned int operator=(unsigned int i) { set(i); return i; }
	void set(unsigned int i) {
		d[0]=(byte)(i>>24); d[1]=(byte)(i>>16); d[2]=(byte)(i>>8); d[3]=(byte)i;
	}
};

The cool part about this class is it's always the correct size, and it never needs alignment padding, so you can stick together a long struct to represent the network message order and it will match byte for byte.  My osl/socket.h header includes this class (and Big16) by default.
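Here's a sketch of what that looks like in practice.  The message_header struct and its field names are invented for this example, and the class is repeated in compact form so the sketch compiles on its own.  The "no padding" claim holds on the common compilers I know of, but it's worth checking sizeof on yours.

```cpp
#include <cstring>

class Big32 { // big-endian 32-bit integer, repeated here so this sketch stands alone
	unsigned char d[4];
public:
	Big32(unsigned int i=0) { set(i); }
	operator unsigned int () const {
		return ((unsigned int)d[0]<<24)|((unsigned int)d[1]<<16)|((unsigned int)d[2]<<8)|d[3];
	}
	void set(unsigned int i) {
		d[0]=(unsigned char)(i>>24); d[1]=(unsigned char)(i>>16);
		d[2]=(unsigned char)(i>>8);  d[3]=(unsigned char)i;
	}
};

// Hypothetical message header: the field names are invented for this example.
struct message_header {
	unsigned char type;  // 1 byte
	Big32 length;        // 4 bytes, no padding before it: Big32 is just bytes
	Big32 sequence;      // 4 bytes
};

// Copy the header's bytes exactly as they would go out on the wire.
void header_bytes(const message_header &h,unsigned char *out) {
	std::memcpy(out,&h,sizeof(h));
}
```

A struct like this with plain "unsigned int" fields would typically be 12 bytes with 3 bytes of padding after the type; with Big32 it's 9 bytes, laid out in exactly the order you wrote.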

Objects in Binary Data Exchange

Consider two processes exchanging more complex binary data over a network socket.  To make it easy to run, I'll start both processes myself, using fork.

#include <sys/wait.h> /* for wait() */
#include <unistd.h> /* for fork() and usleep() */
#include "osl/socket.h"
#include "osl/socket.cpp"

/* Run child process's code. Socket connects to parent. */
void run_child(SOCKET s) {
	std::cout<<"Child alive! Sending data to parent."<<std::endl;

	std::string str="Cybertron";
	skt_sendN(s,&str,sizeof(str));
}

/* Run parent process's code. Socket connects to child */
void run_parent(SOCKET s) {
	std::cout<<"Parent alive! Getting data from child"<<std::endl;

	std::string str="";
	skt_recvN(s,&str,sizeof(str));

	std::cout<<"Parent done. got="<<str<<std::endl;
}


int foo(void) {
	unsigned int port=12345; std::cout.flush();
	int newpid=fork();
	if (newpid!=0) { /* I'm the parent */
		SERVER_SOCKET serv=skt_server(&port);
		SOCKET s=skt_accept(serv,0,0); /* connection from child */
		run_parent(s);
		skt_close(s);
		int status=0;
		wait(&status); /* wait for child to finish */
	} else { /* I'm the child */
		SOCKET s=skt_connect(skt_lookup_ip("127.0.0.1"),port,1); /* connect to parent */
		usleep(1000); /* slow down child, to avoid corrupted cout! */
		run_child(s);
		skt_close(s);
		exit(0); /* close out child process when done */
	}
	return 0;
}

(Try this in NetRun now!)

Drat!  This crashes!  Yet it works fine if we send and receive integers, floats, or simple flat classes.  The problem with sending a std::string (or std::vector, or map, etc) is this basic fact:

You can't send pointers over the network.
 

The problem is my pointer is a reference to a place in my memory.  If we've each got our own separate memory, then you dereferencing my pointer is not going to work--the best you could hope for is a crash.

And inside a std::string is a pointer to the data.  On my machine, this pointer is "basic_string::_M_dataplus._M_p", in the nearly unreadable /usr/include/c++/4.4.3/bits/basic_string.h.    Inside std::vector?  Also a pointer.  
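You can check this without reading the library headers: sizeof(std::string) is the same small number no matter how long the string is, so the characters of a long string have to live somewhere else.  Here's a small sketch (the function name is made up).  Some library implementations do keep very short strings inside the object itself, but that doesn't make sending the raw object safe in general.

```cpp
#include <string>
#include <cstdint>

// True if the string's character data lives outside the std::string object itself,
// meaning the object's own bytes only hold a pointer to the data.
bool data_lives_elsewhere(const std::string &s) {
	std::uintptr_t obj=(std::uintptr_t)&s;        // where the object's bytes start
	std::uintptr_t dat=(std::uintptr_t)s.data();  // where the characters actually are
	return dat<obj || dat>=obj+sizeof(s);
}
```

For a 1000-character string, sizeof stays at a few dozen bytes at most, so sending sizeof(str) bytes of the object ships the pointer and none of the text.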

This is really annoying, because real applications use complicated variable-sized data structures like std::vector<std::string> all over the place, and you'd like to just send them, not break them up into little sendable pointer-free pieces.

For example, here's one correct way to send a string: first send the length, then send the data. 

// Send side:
std::string str="Cybertron";
int length=str.length();
skt_sendN(s,&length,sizeof(length)); // OK because length is an integer
skt_sendN(s,&str[0],length); // OK because now we're sending the string *data*

// Receive side:
std::string str="";
int length=0;
skt_recvN(s,&length,sizeof(length)); // OK because length is an integer
str.resize(length);
skt_recvN(s,&str[0],length); // OK because we reallocated the string

(Try this in NetRun now!)
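The same length-then-data trick extends to nested structures like std::vector<std::string>: send a count, then each string as a length plus its bytes.  Here's a sketch that flattens everything into one byte buffer, using big-endian 32-bit counts so both sides agree on size and byte order.  All of the names here are made up for the example, and it assumes no count exceeds 32 bits.

```cpp
#include <string>
#include <vector>
#include <cstddef>

typedef std::vector<unsigned char> byte_buffer;

// Append a 32-bit count as 4 big-endian bytes.
void put_u32(byte_buffer &buf,unsigned long v) {
	buf.push_back((unsigned char)(v>>24));
	buf.push_back((unsigned char)(v>>16));
	buf.push_back((unsigned char)(v>>8));
	buf.push_back((unsigned char)v);
}

// Read a 4-byte big-endian count at pos, advancing pos. False if the buffer is too short.
bool get_u32(const byte_buffer &buf,std::size_t &pos,unsigned long &v) {
	if (pos>buf.size() || buf.size()-pos<4) return false;
	v=((unsigned long)buf[pos]<<24)|((unsigned long)buf[pos+1]<<16)
	 |((unsigned long)buf[pos+2]<<8)|(unsigned long)buf[pos+3];
	pos+=4;
	return true;
}

// Flatten: element count, then (length, bytes) for each string. No pointers anywhere.
byte_buffer pack_strings(const std::vector<std::string> &strs) {
	byte_buffer buf;
	put_u32(buf,(unsigned long)strs.size());
	for (std::size_t i=0;i<strs.size();i++) {
		put_u32(buf,(unsigned long)strs[i].size());
		buf.insert(buf.end(),strs[i].begin(),strs[i].end());
	}
	return buf;
}

// Rebuild the vector on the receiving side. Returns false on a truncated buffer.
bool unpack_strings(const byte_buffer &buf,std::vector<std::string> &strs) {
	std::size_t pos=0;
	unsigned long count=0;
	if (!get_u32(buf,pos,count)) return false;
	strs.clear();
	for (unsigned long i=0;i<count;i++) {
		unsigned long len=0;
		if (!get_u32(buf,pos,len)) return false;
		if (buf.size()-pos<len) return false; // claimed length runs off the end
		strs.push_back(std::string(buf.begin()+pos,buf.begin()+pos+len));
		pos+=len;
	}
	return true;
}
```

The sender would push the packed buffer's size and then its bytes through the socket (for example with skt_sendN(s,&buf[0],buf.size())), and the receiver would unpack after reading that many bytes.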