Network Hardware and Software

CS 641 Lecture, Dr. Lawlor

I claim message passing programming is a very handy way to write parallel software. With message passing, "parallel weirdness" only happens at message sends and receives. Shared memory, by contrast, encounters parallel weirdness during memory accesses, which can happen anywhere. The only downside is that with message passing, you do need to call functions to pass messages, rather than just using memory normally.

Background

A network is just a way of getting information from one machine to another. This is a simple idea, which means that everybody in the world has tried to implement it from scratch--there are way too many networks out there, although thankfully the weirder ones are dying off.

You always start with a way to get bytes from one machine to the other. For example, you can use the serial port, parallel port, or a network card to send and receive bytes. Bytes actually physically sent between machines are said to be "on the wire", even if they're sent over a fiber optic cable or microwave radio link!

Just sending bytes back and forth, however, is almost never enough. You immediately find you need:

Error checking, because almost no method of shipping bytes is fault-free.
Error correction, like asking the other side to resend that piece, to use when an error occurs.
Flow control, to keep a fast sender from swamping a slow receiver. In a big network, you also need congestion control, for when the sender and receiver can handle the traffic, but some piece in between them can't. In a shared-bus network like Ethernet, you need collision control to keep several computers from using the same wires to try to say two different things at once.
Multiplexing, or the ability to use the same stream of bytes to handle several different ongoing communication streams.

There are quite a few different ways to handle these issues. The standard way to do this is to wrap all data in little "packets". A packet consists of a header, some data, and possibly a trailer. The "header" indicates who the message is for, which piece of the message it is, and other housekeeping. The trailer usually includes a checksum for error detection.
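To make the header/data/trailer idea concrete, here's a minimal sketch of building and checking a packet. The layout (a stream id, a sequence number, a 2-byte length, and a 1-byte additive checksum) and the names make_packet and read_packet are made up for illustration; real protocols use stronger checksums and richer headers.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical packet layout (not taken from any real protocol):
//   header:  1 byte stream id, 1 byte sequence number,
//            2 bytes payload length (big-endian)
//   data:    the payload bytes
//   trailer: 1 byte checksum = sum of all header and data bytes, mod 256
typedef std::vector<unsigned char> bytes;

// Add up the first "count" bytes, keeping only the low 8 bits.
unsigned char simple_checksum(const bytes &b, size_t count) {
	unsigned int sum = 0;
	for (size_t i = 0; i < count; i++) sum += b[i];
	return (unsigned char)(sum & 0xFF);
}

// Wrap "data" (which must be shorter than 65536 bytes) in a packet.
bytes make_packet(unsigned char stream, unsigned char seq, const std::string &data) {
	bytes p;
	p.push_back(stream); // which conversation this belongs to (multiplexing)
	p.push_back(seq);    // which piece of the message this is
	p.push_back((unsigned char)((data.size() >> 8) & 0xFF));
	p.push_back((unsigned char)(data.size() & 0xFF));
	p.insert(p.end(), data.begin(), data.end());
	p.push_back(simple_checksum(p, p.size())); // trailer
	return p;
}

// Returns true and fills in the outputs only if the length and checksum agree.
bool read_packet(const bytes &p, unsigned char &stream, unsigned char &seq, std::string &data) {
	if (p.size() < 5) return false; // too short for header plus trailer
	size_t len = ((size_t)p[2] << 8) | p[3];
	if (p.size() != 4 + len + 1) return false;
	if (simple_checksum(p, 4 + len) != p[4 + len]) return false; // damaged in transit
	stream = p[0];
	seq = p[1];
	data.assign(p.begin() + 4, p.begin() + 4 + len);
	return true;
}
```

A receiver that gets false back from read_packet would typically ask the sender to resend that sequence number.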

The International Organization for Standardization (ISO) defined a very complicated layered model for networking called the Open Systems Interconnection (OSI) model. Almost nobody implements the thing, but the conceptual model is pretty popular. The layers of the ISO OSI model are:

Physical layer: how do you represent bits on the wire?
Link layer: how do you decide who gets to put their bits on the wire?
Network layer: routing and addressing--how do bits get where they need to go?
Transport layer: correct bit errors and provide end-to-end reliable communication
Session layer: manage connections between programs (handshaking)
Presentation layer: compress, encrypt, and multiplex connections.
Application layer: get stuff done for the user.

People have built lots and lots of different networking interfaces. Totally unique networking interfaces I've used include:

Ethernet, the now-standard physical protocol. OSI link layer and below.
PPP, the Point-to-Point Protocol still spoken today by modems. OSI link layer.
NetBIOS/NetBEUI, the dying-out IBM PC network protocol. OSI session and transport layers.
AppleTalk, the almost extinct native Mac network protocol. OSI session and transport layers.
Token Ring, the almost extinct cousin of Ethernet. Formerly popular at IBM. OSI link layer and below.

Today, "the network" means TCP/IP, the standard protocol spoken on the internet. TCP/IP is really at least three different protocols:

IP, the Internet Protocol, is the lowest level protocol--close to the OSI network layer. IP version 4 identifies machines with a 4-byte "IP address", often written in "dotted decimal", where you print the value of each byte in decimal separated by periods, like "127.0.0.1" (the loopback address, which always means your own machine). An IPv4 packet header consists of at least 5 big-endian 32-bit integers (20 bytes). An IPv6 packet header consists of 10 big-endian 32-bit integers (40 bytes).

ARP, the Address Resolution Protocol, is a way to find out the network-hardware addresses (Media Access Control, or MAC addresses) of an IP address you want to talk to. ARP uses broadcasts "Hey, anybody know who's using 10.0.0.2?", which makes it fundamentally insecure.
ICMP, the Internet Control Message Protocol, is used for error reporting and diagnostics, like "destination unreachable" messages and ping.

UDP, the User Datagram Protocol, is an unreliable connectionless (or "datagram") protocol built on IP. Datagram communication is nice, because you don't have to tediously set up a connection before you send a few bytes. But UDP is unreliable--if a UDP message is lost on the network, it's up to the application to resend. Hence it's almost never a good idea to use UDP for nontrivial interactions--use TCP instead.

DNS, the Domain Name System, normally runs over UDP (falling back to TCP for large responses). The overhead of setting up TCP connections for every lookup would make DNS even more of a bottleneck than it already is.

TCP, the Transmission Control Protocol, is a reliable connection oriented protocol also built on IP. TCP is what the web's built on--all HTTP accesses go over TCP. "Reliable" means the kernel's TCP implementation will do retransmissions *itself* in case of errors or packet loss, which is what makes it easy to use and hence popular. "Connection oriented" means you have to set up a connection between two machines before they can actually exchange information.

Both TCP and UDP allow many different pieces of software to run on a single machine at once. This means an IP address alone isn't enough to specify who you're talking to--the IP address identifies the machine, and the "TCP port number" identifies the program running on that machine. TCP port numbers are 16-bit unsigned integers, so there are 65,536 possible port numbers. Zero is not a valid port number, and the low-numbered ports (below 1024) are often reserved for "well-known services", which usually require special privileges to open.
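So a TCP endpoint is really an (IP address, port number) pair. Here's a small sketch of printing one in dotted decimal, and of laying out a port number the way it travels on the wire (big-endian). The function names are invented for this example.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// An IPv4 address is 4 bytes; dotted decimal prints each byte in decimal.
std::string dotted_decimal(const unsigned char ip[4]) {
	std::ostringstream out;
	for (int i = 0; i < 4; i++) {
		if (i > 0) out << ".";
		out << (int)ip[i]; // cast so the byte prints as a number, not a character
	}
	return out.str();
}

// A TCP endpoint is a machine (IP address) plus a program on it (port).
std::string endpoint_name(const unsigned char ip[4], uint16_t port) {
	std::ostringstream out;
	out << dotted_decimal(ip) << ":" << port;
	return out.str();
}

// Port numbers are sent as big-endian 16-bit integers: high byte first.
void port_to_wire(uint16_t port, unsigned char wire[2]) {
	wire[0] = (unsigned char)(port >> 8);
	wire[1] = (unsigned char)(port & 0xFF);
}
```

Because the port is a uint16_t, values above 65,535 simply can't be represented, which matches the 16-bit limit described above.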

We'll focus on TCP, since it's by far the most popular protocol for doing anything on the internet. For example, the following all use TCP:

Web servers, which listen on TCP port 80.
Email servers, which use TCP port 25 (SMTP).
IRC servers, which are officially assigned TCP port 194 (though most actually listen on port 6667).
BitTorrent, which traditionally uses TCP ports 6881-6889.

Network Sockets

Just about the only important network interface today is TCP/IP. Surprisingly, there's basically only one major programming interface used for talking on a TCP/IP network, and that's "Berkeley sockets", the original UNIX interface as implemented by the good folks at UC Berkeley.

The Berkeley sockets interface is implemented in:

All flavors of UNIX, including Linux, Mac OS X, Solaris, all BSD flavors, etc.
Windows 95 and higher, as "winsock".

Brian Hall, or "Beej", maintains the definitive readable introduction to Berkeley sockets programming, Beej's Guide to Network Programming. He's got a zillion examples and a readable style. Go there.

Bare Berkeley sockets are pretty tricky and ugly, especially for creating connections. The problem is Berkeley sockets support all sorts of other protocols, addressing modes, and other features like "raw sockets" (that have serious security implications!). But when I write TCP code, I find it a lot easier to use my own little library of public domain utility routines called socket.h and socket.cpp. It's way too nasty to write portable Berkeley code for basic TCP, so I'll give examples using my library.

My library uses a few funny datatypes:

SOCKET: datatype for a "socket": one end of a network connection between two machines. This is actually just an int.
skt_ip_t: datatype for an IP address. It's just 4 bytes.

To connect to a server "serverName" at TCP port 80, and send some data to it, you'd call:

skt_ip_t ip=skt_lookup_ip(serverName); to look up the server's IP address. In general, you can pass a DNS name, but NetRun only supports dotted-decimal IPs.
SOCKET s=skt_connect(ip,80,2); to connect to that server. "80" is the TCP port number. "2" is the timeout in seconds.
skt_sendN(s,"hello",5); to send the 5-byte string "hello" to the other side. You can now repeatedly send and receive data with the other side.
skt_close(s); to close the socket afterwards.

Here's an example in NetRun:

#include "osl/socket.h" /* <- Dr. Lawlor's funky networking library */
#include "osl/socket.cpp"

int foo(void) {
	skt_ip_t ip=skt_lookup_ip("127.0.0.1");
	unsigned int port=80;
	SOCKET s=skt_connect(ip,port,2);
	skt_sendN(s,"hello",5);
	skt_close(s);
	return 0;
}

(executable NetRun link)

Easy, right? The same program is a great deal longer in pure Berkeley sockets, since you've got to deal with error handling (and not all errors are fatal!), a long and complicated address setup process, etc. This same code works in Windows, too.
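For comparison, here's a rough sketch of the same connect-send-close sequence in bare Berkeley sockets. It assumes a POSIX system (Windows needs different headers and startup calls), takes only dotted-decimal IPv4 addresses, has no connect timeout, and treats every error as fatal. The name send_bytes_to is made up for this example; it is not part of socket.h.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>
#include <cstddef>
#include <cstring>

// Connect to a dotted-decimal IPv4 address and TCP port, send "len" bytes,
// and close. Returns 0 on success, -1 on any error.
int send_bytes_to(const char *dotted_ip, unsigned short port, const char *data, size_t len) {
	int s = socket(AF_INET, SOCK_STREAM, 0); // SOCK_STREAM means TCP
	if (s < 0) return -1;

	// The "long and complicated address setup process":
	struct sockaddr_in addr;
	std::memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_port = htons(port); // port goes in big-endian byte order
	if (inet_pton(AF_INET, dotted_ip, &addr.sin_addr) != 1) {
		close(s);
		return -1; // not a valid dotted-decimal address
	}

	if (connect(s, (struct sockaddr *)&addr, sizeof(addr)) != 0) {
		close(s);
		return -1;
	}

	size_t sent = 0;
	while (sent < len) { // send may take only part of the data at once
		ssize_t n = send(s, data + sent, len - sent, 0);
		if (n <= 0) {
			close(s);
			return -1;
		}
		sent += (size_t)n;
	}
	close(s);
	return 0;
}
```

Calling send_bytes_to("127.0.0.1", 80, "hello", 5) corresponds to the five-line example above, and this version still skips DNS lookups, timeouts, and non-fatal errors such as interrupted system calls.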

On NetRun, you can also "Download this file as a .tar archive" to get the socket.h and socket.cpp files.

Network Server

A network server waits for connections from clients. The calls you make are:

unsigned int port=8888; /* listen on this TCP/IP port (or use 0 to have the OS pick a port) */
SERVER_SOCKET srv=skt_server(&port); /* lay claim to that port number */
SOCKET s=skt_accept(srv,0,0); /* wait until a client connects to our port */
Call skt_sendN and skt_recvN to send and receive data to and from the client.
skt_close(s); /* stop talking to that client */
skt_close(srv); /* give up our claim on server port */

Again, between accept and close you can send and receive data any way you like. Your sends make data arrive at client receive calls, and your receives grab data from the client's sends. It's easy to screw up a network server by trying to receive data that isn't going to arrive!
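To see why mismatched sends and receives cause trouble, it helps to look at what "receive exactly N bytes" has to do underneath. Here's a minimal sketch, assuming POSIX sockets. The name recv_exactly is hypothetical, and this is not necessarily how skt_recvN is implemented.

```cpp
#include <sys/socket.h>
#include <sys/types.h>
#include <cstddef>

// Keep calling recv until exactly "len" bytes have arrived, because a
// single recv call may return fewer bytes than you asked for.
// Returns 0 on success, -1 if the other side closed early or an error occurred.
int recv_exactly(int fd, char *buf, size_t len) {
	size_t got = 0;
	while (got < len) {
		ssize_t n = recv(fd, buf + got, len - got, 0);
		if (n == 0) return -1; // other side closed the connection
		if (n < 0) return -1;  // socket error
		got += (size_t)n;
	}
	return 0;
}
```

If the other side stays connected but never sends the remaining bytes, this loop blocks forever inside recv, which is exactly the hang described above.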

You usually repeat steps 3-5 again and again to handle all the clients that try to connect. Many servers are designed as an infinite loop--they keep handling client requests until the machine is turned off. One thread can even have accepted connections from several different clients, and be sending and receiving data from them at the same time.

High-performance servers, like the Apache web server, often will call fork() either before step 3 (called "preforking", where several processes wait in accept) or before step 4 (one process accepts, then splits off a child process to handle each client).
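The second approach (accept, then fork a child per client) might look roughly like this in bare POSIX sockets. This is only a sketch under stated assumptions: the helper names listen_on and accept_and_fork are invented here, the child just echoes bytes back, and it is in no way Apache's actual code.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cstddef>
#include <cstring>

// Open a TCP server socket on the loopback address. Pass *port==0 to let
// the OS pick; the port actually used is written back to *port.
int listen_on(unsigned short *port) {
	int fd = socket(AF_INET, SOCK_STREAM, 0);
	if (fd < 0) return -1;
	struct sockaddr_in addr;
	std::memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = htons(*port);
	socklen_t len = sizeof(addr);
	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) != 0 ||
	    listen(fd, 8) != 0 ||
	    getsockname(fd, (struct sockaddr *)&addr, &len) != 0) {
		close(fd);
		return -1;
	}
	*port = ntohs(addr.sin_port);
	return fd;
}

// Accept one client, then fork a child process to serve it.
// Returns the child's process id (in the parent), or -1 on error.
pid_t accept_and_fork(int listen_fd) {
	int client = accept(listen_fd, 0, 0); // wait for a client to connect
	if (client < 0) return -1;
	pid_t pid = fork();
	if (pid < 0) { close(client); return -1; }
	if (pid == 0) { // child: talk to this one client, then exit
		close(listen_fd); // the child never accepts new clients
		char buf[256];
		ssize_t n;
		while ((n = recv(client, buf, sizeof(buf), 0)) > 0) {
			ssize_t sent = 0;
			while (sent < n) { // echo back everything we received
				ssize_t m = send(client, buf + sent, (size_t)(n - sent), 0);
				if (m <= 0) _exit(1);
				sent += m;
			}
		}
		close(client);
		_exit(0);
	}
	close(client); // parent: the child owns this connection now
	return pid;
}
```

A real server would call accept_and_fork in a loop and use waitpid to clean up finished children. Preforking moves the fork before the accept call instead, so several processes wait in accept on the same listening socket.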

Only root can open server ports numbered less than 1024 on most UNIX systems. Two programs can't listen on the same server port--the second program will get a socket error when it calls skt_server.

Here's an example network server that serves exactly one client and then exits.

#include <iostream> /* for std::cout */
#include <string>
#include "osl/socket.h"
#include "osl/socket.cpp" /* include body for easy linking */

int foo(void)
{
	unsigned int port=8888;
	SERVER_SOCKET serv=skt_server(&port);
	
	std::cout<<"Waiting for connections on port "
		<<port<<"\n";
	skt_ip_t client_ip; unsigned int client_port;
	SOCKET s=skt_accept(serv,&client_ip,&client_port);
	std::cout<<"Connection from "
		<<skt_print_ip(client_ip)
		<<":"<<client_port<<"!\n";
	
	/* Receive some data from the client */
	std::string buf(3,'?');
	skt_recvN(s,(char *)&buf[0],3);
	std::cout<<"Client sent data '"<<buf<<"'\n";
	
	/* Send some data back to the client */
	skt_sendN(s,"gdaymate\n",9);
	
	skt_close(s);
	std::cout<<"Closed socket to client\n";
	
	skt_close(serv);
	return 0;
}

(executable NetRun link)

In NetRun, the server will just hang while waiting for connections by default, so you'll have to run this on your own machine to connect.

Here's the corresponding client. Note that every receive in the server has to be matched by a send in the client, and vice versa.

#include <iostream> /* for std::cout */
#include <string>
#include "osl/socket.h"
#include "osl/socket.cpp" /* include body for easy linking */

int foo(void)
{
	skt_ip_t ip=skt_lookup_ip("127.0.0.1");
	unsigned int port=8888;
	SOCKET s=skt_connect(ip,port,2);
	
	/* Send some data to the server */
	skt_sendN(s,"dUd",3);
	
	/* Receive some data from the server */
	std::string buf(8,'?');
	skt_recvN(s,(char *)&buf[0],8);
	std::cout<<"Server sent data '"<<buf<<"'\n";
	
	
	skt_close(s);
	std::cout<<"Closed socket to server\n";
	return 0;
}

It's easier to write network clients, and it's more common. Network servers are more dangerous--anybody could connect to your server, and send anything, so servers are usually trickier to get right.
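As one illustration of "anybody could send anything", here's a sketch of a server checking a request before trusting it. The request format (one length byte followed by printable ASCII text) and the 64-byte limit are invented for this example. The point is only that every field a client sends gets checked before it's used.

```cpp
#include <cstddef>
#include <string>

// Hypothetical request format: 1 length byte, then exactly that many
// bytes of printable ASCII text.
const size_t MAX_REQUEST = 64; // arbitrary limit chosen for this sketch

// Returns true and fills in "text" only if the raw bytes are well formed.
bool parse_request(const std::string &raw, std::string &text) {
	if (raw.empty()) return false;
	size_t len = (unsigned char)raw[0];
	if (len > MAX_REQUEST) return false;     // claimed length is too big
	if (raw.size() != 1 + len) return false; // claimed length doesn't match what arrived
	for (size_t i = 1; i < raw.size(); i++) {
		unsigned char c = (unsigned char)raw[i];
		if (c < 32 || c > 126) return false; // reject control and non-ASCII bytes
	}
	text = raw.substr(1);
	return true;
}
```

A server would run something like this on the bytes it receives and close the connection if the check fails, rather than acting on a malformed request.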