Basic Network Programming with Sockets

CS 441/641 Lecture, Dr. Lawlor

One can imagine lots of programming interfaces for talking to the network, and there are in fact lots of totally different interfaces for talking via NetBIOS, AppleTalk, etc.  But surprisingly, there's basically only one major programming interface used for talking on a TCP/IP network: "Berkeley sockets", the original UNIX interface as implemented by the good folks at UC Berkeley.

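The Berkeley sockets interface is implemented on essentially every modern operating system: Linux and the other UNIX flavors, Mac OS X, and Windows (where it's called "Winsock").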

Brian Hall, or "Beej", maintains the definitive readable introduction to Berkeley sockets programming, Beej's Guide to Network Programming.  He's got a zillion examples and a readable style.  Go there.

Sadly, bare Berkeley sockets are fairly tricky and ugly, especially for creating connections.  The problem is that Berkeley sockets support all sorts of other protocols, addressing modes, and features like "raw sockets" (used for packet sniffers).  But when I write network code, I find it a lot easier to use my own little library of public domain utility routines called "osl/socket.h".  I'll give examples here using my library.

Writing UDP Code

UDP is the User Datagram Protocol.  Unlike its more complex cousin, the connection-oriented TCP, UDP is "connectionless".  This means you can just send data to an IP address at any time, with no connection setup.  The drawback is that UDP guarantees nothing about delivery: a packet may be lost, duplicated, or arrive out of order.
Generally, UDP is more like a postcard (an inherently one-way, unreliable shot in the dark), while TCP is more like a telephone call (a two-way, reliable connection).

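My library uses a few funny datatypes to represent the communication: skt_ip_t holds an IP address, SOCKET is a socket you can send and receive on, and SERVER_SOCKET is a TCP socket that listens for incoming connections.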
Here I'm using my library to set up a UDP ("datagram") socket, and then I use that socket to send and receive data.  Since UDP is connectionless, there's no real problem sending data to myself.
#include <cstdio>
#include <cstring>
#include <vector>
#include "osl/socket.h"
#include "osl/socket.cpp"

int foo(void) {
    skt_ip_t ip=skt_lookup_ip("127.0.0.1");
    unsigned int sport=37331, dport=37331;
    SOCKET s=skt_datagram(&sport,70000); /* set port number. 70000 is the kernel data buffer size. */
    { /* Send UDP packet to ip */
        struct sockaddr_in sin=skt_build_addr(ip,dport);
        const char *data="There are many like it, but this is my message.";
        sendto(s,data,strlen(data),0,
            (const struct sockaddr *)&sin,sizeof(sin));
    }
    printf("Sent off data. Receiving:\n");
    { /* Receive one UDP packet */
        struct sockaddr_in sout; socklen_t soutL=sizeof(sout);
        std::vector<char> data(1000); /* freed automatically, no delete[] needed */
        /* Ask for one byte less than the buffer, so there's room to 0-terminate */
        int n=recvfrom(s,&data[0],data.size()-1,0,
            (struct sockaddr *)&sout,&soutL);
        printf("Incoming data of %d bytes from addr len %d:\n",
            n,(int)soutL);
        if (n>0) {
            data[n]=0;
            printf(" Data: '%s'\n",&data[0]);
        }
    }
    skt_close(s);
    return 0;
}


Try this!

You can also separate the send and receive sides of this program, and send UDP packets from one machine to another machine.  If you try this among several machines, you'll quickly realize an annoying problem: network firewalls often filter UDP, especially incoming UDP packets.  Working around firewalls is a fact of life.
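For instance, here's one way to split the two sides using plain POSIX Berkeley sockets rather than osl/socket.h.  This is a sketch assuming a Linux/UNIX machine; udp_bind, udp_send_to, and udp_receive are hypothetical helper names, not library calls.  Run the receiver on one machine and point the sender at its IP address (the firewall caveat above still applies).

```cpp
// Plain POSIX sketch (not osl/socket.h); helper names are hypothetical.
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cassert>
#include <string>

// Receiver: make a UDP socket bound to a local port (0 = kernel picks one).
// Returns the socket, or -1 on error.
int udp_bind(unsigned short port) {
    int s = socket(AF_INET, SOCK_DGRAM, 0);
    if (s < 0) return -1;
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY); // accept packets on any interface
    addr.sin_port = htons(port);
    if (bind(s, (sockaddr *)&addr, sizeof(addr)) < 0) {
        close(s);
        return -1;
    }
    return s;
}

// Sender: fire one datagram at ip:port. No connection and no delivery
// guarantee; true only means the kernel accepted the whole packet.
bool udp_send_to(const char *ip, unsigned short port, const std::string &msg) {
    sockaddr_in dest{};
    dest.sin_family = AF_INET;
    dest.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &dest.sin_addr) != 1) return false; // bad IP text
    int s = socket(AF_INET, SOCK_DGRAM, 0);
    if (s < 0) return false;
    ssize_t n = sendto(s, msg.data(), msg.size(), 0, (sockaddr *)&dest, sizeof(dest));
    close(s);
    return n == (ssize_t)msg.size();
}

// Receiver: block until one datagram arrives and return its bytes.
std::string udp_receive(int s) {
    char buf[1500]; // one Ethernet-sized packet; anything longer is truncated
    sockaddr_in from{};
    socklen_t fromLen = sizeof(from);
    ssize_t n = recvfrom(s, buf, sizeof(buf), 0, (sockaddr *)&from, &fromLen);
    return n > 0 ? std::string(buf, (size_t)n) : std::string();
}
```

Because sending needs no connection, udp_send_to just makes a fresh socket per message; only the receiver needs a fixed, well-known port so senders know where to aim.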

Writing TCP Code

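TCP has several advantages over UDP: it delivers your data reliably and in order as a continuous byte stream, automatically retransmitting anything lost along the way, and it slows down when the network or the receiver can't keep up.
To connect to a server at TCP port 80 and send some data to it, you look up the server's IP address with skt_lookup_ip, connect with skt_connect, send with skt_sendN, and finally hang up with skt_close.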
Here's an example TCP client in NetRun:
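#include "osl/socket.h" /* <- Dr. Lawlor's funky networking library */
#include "osl/socket.cpp"

int foo(void) {
    skt_ip_t ip=skt_lookup_ip("127.0.0.1"); /* server's IP address */
    unsigned int port=80; /* TCP port to connect to (80 is HTTP) */
    SOCKET s=skt_connect(ip,port,2); /* open the TCP connection */
    skt_sendN(s,"hello",5); /* send exactly 5 bytes */
    skt_close(s); /* hang up */
    return 0;
}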

Easy, right?  The same program is a great deal longer in pure Berkeley sockets, since you've got to deal with error handling (and not all errors are fatal!), a long and complicated address setup process, etc.
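For comparison, here is a rough plain-POSIX equivalent of the lookup, connect, and send steps.  It's a sketch assuming a UNIX-like system; tcp_connect and send_all are hypothetical names, not part of osl/socket.h.  Even this simplified version has to walk getaddrinfo's list of candidate addresses and retry partial or interrupted sends.

```cpp
// Plain POSIX sketch of "look up, connect, send"; helper names are hypothetical.
#include <netdb.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <cstring>

// Connect to host:port (port given as a string, e.g. "80").
// Returns a connected socket, or -1 on failure.
int tcp_connect(const char *host, const char *port) {
    addrinfo hints{};
    hints.ai_family = AF_UNSPEC;     // IPv4 or IPv6, whichever the lookup returns
    hints.ai_socktype = SOCK_STREAM; // TCP
    addrinfo *list = nullptr;
    int err = getaddrinfo(host, port, &hints, &list);
    if (err != 0) {
        fprintf(stderr, "lookup of %s failed: %s\n", host, gai_strerror(err));
        return -1;
    }
    int s = -1;
    // One name can map to several addresses: try each until one connects.
    for (addrinfo *ai = list; ai != nullptr; ai = ai->ai_next) {
        s = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (s < 0) continue;
        if (connect(s, ai->ai_addr, ai->ai_addrlen) == 0) break; // success
        close(s);
        s = -1;
    }
    freeaddrinfo(list);
    if (s < 0) fprintf(stderr, "could not connect to %s:%s\n", host, port);
    return s;
}

// send() may accept only part of the data, or be interrupted by a signal.
// Neither is fatal, so loop until every byte has been handed to the kernel.
bool send_all(int s, const char *data, size_t len) {
    while (len > 0) {
        ssize_t n = send(s, data, len, 0);
        if (n < 0) {
            if (errno == EINTR) continue; // interrupted: just retry
            perror("send");
            return false;
        }
        data += n;
        len -= (size_t)n;
    }
    return true;
}
```

Mirroring the library example: int s=tcp_connect("127.0.0.1","80"); if (s>=0) { send_all(s,"hello",5); close(s); }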

This same code works on Windows, too.  On NetRun, use "Download this file as a .tar archive" to get the socket.h and socket.cpp files.

To listen on a socket, you create a server socket and then accept connections from incoming clients.  This program accepts exactly one client, but a real server typically wraps everything from accept to close in a loop (or hands each client to its own thread), to keep accepting clients indefinitely.
#include <iostream>
#include <string>
#include "osl/socket.h"
#include "osl/socket.cpp" /* include body for easy linking */

int foo(void)
{
    unsigned int port=8888;
    SERVER_SOCKET serv=skt_server(&port);

    std::cout<<"Waiting for connections on port "
        <<port<<"\n";
    skt_ip_t client_ip; unsigned int client_port;
    SOCKET s=skt_accept(serv,&client_ip,&client_port);
    std::cout<<"Connection from "
        <<skt_print_ip(client_ip)
        <<":"<<client_port<<"!\n";

    /* Receive some data from the client */
    std::string buf(3,'?');
    skt_recvN(s,&buf[0],3);
    std::cout<<"Client sent data '"<<buf<<"'\n";

    /* Send some data back to the client */
    skt_sendN(s,"gdaymate\n",9);

    skt_close(s);
    std::cout<<"Closed socket to client\n";

    skt_close(serv);
    return 0;
}

If you're on campus and fast enough, you can actually connect to this server from a web browser!
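Here's a rough plain-POSIX counterpart of the one-client server above.  It's a sketch assuming a UNIX-like system; tcp_listen, recv_exact, and serve_one are hypothetical names.  Binding to port 0 lets the kernel pick a free port, which getsockname then reports, much like skt_server filling in its port argument.

```cpp
// Plain POSIX sketch of a one-client TCP server; helper names are hypothetical.
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cassert>
#include <cstdio>
#include <cstring>

// Create a listening TCP socket on `port` (0 = any free port) and report the
// port actually bound in *bound. Returns the socket, or -1 on error.
int tcp_listen(unsigned short port, unsigned short *bound) {
    int s = socket(AF_INET, SOCK_STREAM, 0);
    if (s < 0) return -1;
    int yes = 1; // allow quick restarts without "Address already in use"
    setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(yes));
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);
    socklen_t len = sizeof(addr);
    if (bind(s, (sockaddr *)&addr, sizeof(addr)) < 0 || listen(s, 16) < 0 ||
        getsockname(s, (sockaddr *)&addr, &len) < 0) {
        close(s);
        return -1;
    }
    *bound = ntohs(addr.sin_port);
    return s;
}

// Read exactly len bytes; false if the peer hangs up early or an error occurs.
bool recv_exact(int s, char *buf, size_t len) {
    while (len > 0) {
        ssize_t n = recv(s, buf, len, 0);
        if (n <= 0) return false;
        buf += n;
        len -= (size_t)n;
    }
    return true;
}

// Accept one client, read its 3-byte message, reply, and hang up.
bool serve_one(int listener) {
    sockaddr_in client{};
    socklen_t clen = sizeof(client);
    int s = accept(listener, (sockaddr *)&client, &clen);
    if (s < 0) return false;
    char ip[INET_ADDRSTRLEN];
    inet_ntop(AF_INET, &client.sin_addr, ip, sizeof(ip));
    printf("Connection from %s:%u!\n", ip, (unsigned)ntohs(client.sin_port));
    char msg[3];
    bool ok = recv_exact(s, msg, sizeof(msg));
    if (ok) {
        printf("Client sent data '%.3s'\n", msg);
        const char reply[] = "gdaymate\n";
        // Small reply sent in one call; a robust server would loop until all bytes go out.
        ok = send(s, reply, sizeof(reply) - 1, 0) == (ssize_t)(sizeof(reply) - 1);
    }
    close(s);
    return ok;
}
```

A long-running server would call serve_one in a loop.  To keep one slow client from stalling everyone else, you'd instead keep just the accept in the loop and hand each accepted socket to its own thread.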

Example Protocol: HTTP

HTTP is the protocol used by web pages (that's why URLs usually start with "http://").  HTTP servers listen on port 80 by default, but you can use the :port syntax to connect to any port you like (for example, "http://lawlor.cs.uaf.edu:8888/some_url").

An HTTP client, like a web browser, starts by doing a DNS lookup on the server name.  That's the "resolving host name" message you see in your browser.  The browser then opens a TCP connection to the server's HTTP port (80, or whatever :port the URL names) ("Connecting to server").
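The lookup step can be sketched with the standard POSIX getaddrinfo call (assuming a UNIX-like system; resolve is a hypothetical helper name).  A single host name may come back with several addresses, both IPv4 and IPv6; passing a numeric address such as "127.0.0.1" just returns that address.

```cpp
// Sketch of the "resolving host name" step; resolve() is a hypothetical helper.
#include <netdb.h>
#include <sys/socket.h>
#include <cassert>
#include <string>
#include <vector>

// Return every address the lookup reports for `host`, as printable text.
// Returns an empty vector if the lookup fails.
std::vector<std::string> resolve(const char *host) {
    addrinfo hints{};
    hints.ai_family = AF_UNSPEC;     // accept IPv4 and IPv6 answers
    hints.ai_socktype = SOCK_STREAM; // we plan to make a TCP connection
    addrinfo *list = nullptr;
    std::vector<std::string> out;
    if (getaddrinfo(host, nullptr, &hints, &list) != 0) return out;
    for (addrinfo *ai = list; ai != nullptr; ai = ai->ai_next) {
        char text[256]; // plenty for a numeric IPv4 or IPv6 address
        if (getnameinfo(ai->ai_addr, ai->ai_addrlen, text, sizeof(text),
                        nullptr, 0, NI_NUMERICHOST) == 0)
            out.push_back(text);
    }
    freeaddrinfo(list);
    return out;
}
```

A browser would then try connecting to these addresses in turn, which is exactly the "Connecting to server" step.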

Once connected, the HTTP client usually sends a "GET" request.  Here's the simplest possible GET request:
    "GET / HTTP/1.0\r\n\r\n"

Note the DOS newlines ("\r\n", a carriage return plus a line feed), and the extra blank line that ends the request.  You can list a bunch of optional headers in your GET request, like the languages you're willing to accept ("Accept-Language: en-us\r\n") and so on.  HTTP 1.1 (unlike 1.0) requires a Host header in the request ("Host: www.foobar.com\r\n"), which lets one server at a single IP address host several different websites ("virtual hosts").
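Here's one way to assemble such a request in code (a sketch; http_get_request is a hypothetical helper).  Each header line ends in "\r\n", and a final empty "\r\n" terminates the request.

```cpp
// Sketch: build a minimal HTTP/1.1 GET request; the helper name is hypothetical.
#include <cassert>
#include <string>

std::string http_get_request(const std::string &path, const std::string &host) {
    std::string req = "GET " + path + " HTTP/1.1\r\n"; // request line
    req += "Host: " + host + "\r\n";                   // required by HTTP/1.1
    req += "Accept-Language: en-us\r\n";               // optional header
    req += "\r\n";                                     // blank line: end of request
    return req;
}
```

For HTTP/1.0 you could drop everything but the request line and the final blank line, giving the "GET / HTTP/1.0\r\n\r\n" shown above.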

The HTTP server then sends back some sort of reply.  Officially, this is supposed to be a status line like "HTTP/1.1 200 OK\r\n", followed by another set of line-oriented ASCII headers such as the Content-Length in bytes ("Content-Length: 187\r\n"), and then a blank line.  But many browsers will still display plain ASCII text if you skip all that and just send the text.

Here's an example of a real HTTP exchange between Firefox and Apache:
Firefox connects to server.  Apache accepts the connection.
Firefox, the client, sends this ASCII data, with DOS newlines:
GET /my_name_is_url.html HTTP/1.1
Host: lawlor.cs.uaf.edu:8888
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.9) Gecko/20070126 Ubuntu/dapper-security Firefox/1.5.0.9
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Cookie: __utmz=62224958.1163103248.1.1.utmccn=(direct)|utmcsr=(direct)|utmcmd=(none); __utma=62224958.570638686.1163103248.1163107343.1164832326.3
<- blank line at end of HTTP request headers

Apache, the server, sends this data back:
HTTP/1.1 200 OK
Date: Fri, 06 Apr 2007 20:20:50 GMT
Server: Apache/2.0.55 (Ubuntu)
Accept-Ranges: bytes
Content-Length: 9443
Connection: close
Content-Type: text/html; charset=UTF-8
<- blank line at end of HTTP response headers
<html><head><title>UAF Department of ... rest of web page, total of 9443 bytes after blank line
This is a pretty simple, ASCII-based protocol.  The only binary data is the contents of the web resource, transmitted by the server after the blank line.  The "Content-Length:" field tells the client how many bytes to expect.
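A client can pull the status code and Content-Length out of the header block roughly like this (a sketch; parse_response_header is a hypothetical name, and it expects the headers as one string without the trailing blank line).  HTTP header names are case-insensitive, so it lowercases them before comparing.

```cpp
// Sketch: parse an HTTP response header block; names here are hypothetical.
#include <cassert>
#include <cctype>
#include <cstdlib>
#include <sstream>
#include <string>

// Header names are case-insensitive, so compare them in lowercase.
static std::string lowercase(std::string s) {
    for (char &c : s) c = (char)std::tolower((unsigned char)c);
    return s;
}

// Sets *status from the status line ("HTTP/1.1 200 OK") and *length from
// Content-Length (-1 if absent). Returns false on a malformed status line.
bool parse_response_header(const std::string &header, int *status, long *length) {
    std::istringstream in(header);
    std::string line;
    if (!std::getline(in, line)) return false;
    if (line.compare(0, 5, "HTTP/") != 0) return false;
    size_t sp = line.find(' ');
    if (sp == std::string::npos) return false;
    *status = std::atoi(line.c_str() + sp + 1); // "200 OK\r" -> 200
    *length = -1;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back(); // strip DOS newline
        size_t colon = line.find(':');
        if (colon == std::string::npos) continue;
        if (lowercase(line.substr(0, colon)) == "content-length")
            *length = std::atol(line.c_str() + colon + 1); // atol skips the space
    }
    return true;
}
```

With the Apache reply above, this yields status 200 and a length of 9443, which tells the client exactly how many body bytes follow the blank line.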

A typical very simple custom web client might look like this:
#include <cstdlib>
#include <cstring>
#include <iostream>
#include <string>
#include <vector>
#include "osl/socket.h"
#include "osl/socket.cpp"

int foo(void) {
    skt_ip_t ip=skt_lookup_ip("137.229.25.247"); // lawlor.cs.uaf.edu
    SOCKET s=skt_connect(ip,80,2);

    /* Send off HTTP request to server (with URL) */
    const char *req=
        "GET / HTTP/1.1\r\n" // the "/" is the URL I'm requesting
        "Host: lawlor.cs.uaf.edu\r\n" // hostname is required for HTTP 1.1
        "User-Agent: Raw socket example code (lawlor@alaska.edu)\r\n" // web browser ID string
        "\r\n"; // blank line == end of HTTP request
    skt_sendN(s,req,strlen(req));

    /* Receive HTTP response headers, up to the blank line */
    std::string response;
    int length=0;
    while ((response=skt_recv_line(s))!="")
    {
        std::cout<<response<<"\n";
        if (response.compare(0,15,"Content-Length:")==0)
            length=atoi(response.c_str()+15); // atoi skips the space after the colon
    }

    /* Receive HTTP response data, and print it */
    std::cout<<"-- bottom line: "<<length<<" bytes of data\n";
    if (length>0 && length<10000) { // sanity check
        std::vector<char> page(length); // place to store data
        skt_recvN(s,&page[0],length); // grab data from server
        for (int i=0;i<length;i++) std::cout<<page[i]; // print to screen
    }

    skt_close(s);
    return 0;
}
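The client above relies on skt_recv_line to read the headers.  The library's actual implementation isn't shown here; a minimal plain-sockets stand-in (hypothetical name recv_line, assuming a POSIX system) can read one byte at a time until the newline.  That's slow, but it never swallows bytes belonging to the page body that follows the blank line.

```cpp
// Sketch of reading one CRLF-terminated line from a socket; recv_line is hypothetical.
#include <sys/socket.h>
#include <unistd.h>
#include <cassert>
#include <string>

// Read one line (up to '\n') from a connected stream socket and return it
// with the "\r\n" or "\n" stripped. Returns false if the connection closes
// or errors before a newline arrives. An empty line means "end of headers".
bool recv_line(int s, std::string *line) {
    line->clear();
    char c;
    while (true) {
        ssize_t n = recv(s, &c, 1, 0); // one byte at a time: simple, not fast
        if (n <= 0) return false;      // closed or error mid-line
        if (c == '\n') break;
        line->push_back(c);
    }
    if (!line->empty() && line->back() == '\r') line->pop_back();
    return true;
}
```

Real code would usually read bigger chunks into a buffer and split lines out of it, at the cost of having to hand leftover body bytes to the code that reads the page.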
A similarly simplified web server might look like this.  Note that it loops forever, serving one client after another:
#include <iostream>
#include <string>
#include "osl/socket.h"
#include "osl/socket.cpp"

int foo(void) {
    /* Make a TCP socket listening on port 8888 */
    unsigned int port=8888;
    SERVER_SOCKET serv=skt_server(&port);

    /* Keep servicing clients */
    while (1) {
        std::cout<<"Waiting for connections on port "<<port<<"\n";
        skt_ip_t client_ip; unsigned int client_port;
        SOCKET s=skt_accept(serv,&client_ip,&client_port);
        std::cout<<"Connection from "<<skt_print_ip(client_ip)<<":"<<client_port<<"!\n";

        // Grab HTTP request line, typically GET /url HTTP/1.1
        std::string req=skt_recv_line(s);
        std::cout<<"Client request: "<<req<<"\n";
        // Grab rest of HTTP header info (mostly useless), up to the blank line
        std::string hdr;
        while ((hdr=skt_recv_line(s))!="") std::cout<<"Client header: "<<hdr<<"\n";

        // Prepare HTTP response header; std::to_string puts the page length in
        std::string page="<html><body>IT WORKED!</body></html>";
        std::string response=
            "HTTP/1.1 200 OK\r\n"
            "Server: Random example code (lawlor@alaska.edu)\r\n"
            "Content-Type: text/html\r\n"
            "Connection: close\r\n" // we hang up after one reply
            "Content-Length: "+std::to_string(page.length())+"\r\n"
            "\r\n"; // blank line: end of header
        skt_sendN(s,response.data(),response.length());
        skt_sendN(s,page.data(),page.length());

        skt_close(s);
    }

    return 0;
}


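Note that using these for real production servers would be a terrible idea (the server handles only one client at a time, ignores the requested URL, and neither program checks for errors), but they do work for simple testing.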