Network Programming 1
CS 493/693 Lecture,
Dr. Lawlor, 2006/01/23
Background
A network is just a way of getting information from one machine to
another. This is a simple idea, which means that everybody in the
world has tried to implement it from scratch. Read this and the
first 20 pages of "Network Intrusion Detection" for the basics.
You always start with a way to get bytes from one machine to the
other. For example, you can use the serial port, parallel port,
or a network card to send and receive bytes. Bytes actually
physically sent between machines are said to be "on the wire", even if
they're sent over a fiber optic cable or microwave radio link!
Just sending bytes back and forth, however, is almost never enough. You immediately find you need:
- Error checking, because almost no method of shipping bytes is fault-free.
- Error correction, such as asking the other side to resend the damaged piece, for when an error does occur.
- Flow control, to keep a fast sender from swamping a slow
receiver. In a big network, you also need congestion control,
for when the sender and receiver can handle the traffic, but some piece in
between them can't. In a shared-bus network like Ethernet, you
need collision control to keep several computers from using the same
wires to try to say two different things at once.
- Multiplexing, or the ability to use the same stream of bytes to handle several different ongoing communication streams.
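As a small illustration of the error checking item above, here is a minimal sketch of one common scheme: the 16-bit one's-complement "Internet checksum" described in RFC 1071. The choice of this particular checksum, the function name internet_checksum, and the sample message are assumptions made for the example, not something these notes prescribe.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"                      # pad to a whole number of 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]        # next big-endian 16-bit word
        total = (total & 0xFFFF) + (total >> 16)     # fold any carry back in
    return ~total & 0xFFFF

message = b"hello, network"                  # made-up sample data
check = internet_checksum(message)

# Simulate a single bit getting flipped "on the wire".
damaged = bytes([message[0] ^ 0x01]) + message[1:]

print(internet_checksum(message) == check)   # True: the data arrived intact
print(internet_checksum(damaged) == check)   # False: the error is detected
```

A checksum like this only detects the damage. Fixing it still takes something like a resend request, which is the error correction item in the list.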
There are quite a few different ways to handle these issues. The
standard way to do this is to wrap all data in little "packets". A
packet consists of a header, some data, and possibly a trailer.
The "header" indicates who the message is for, which piece of the
message it is, and other housekeeping. The trailer usually
includes a checksum for error detection.
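To make the header/data/trailer idea concrete, here is a minimal sketch of a made-up packet format. The field layout (stream id, sequence number, length), the CRC-32 trailer, and the names make_packet and parse_packet are all assumptions for illustration; real protocols each define their own layout.

```python
import struct
import zlib

# Hypothetical layout (an assumption for this example, not a real protocol):
#   header : stream id (2 bytes), sequence number (4 bytes), data length (2 bytes)
#   data   : the payload bytes
#   trailer: CRC-32 of header + data (4 bytes)
HEADER = struct.Struct("!HIH")    # "!" means network (big-endian) byte order
TRAILER = struct.Struct("!I")

def make_packet(stream_id: int, seq: int, data: bytes) -> bytes:
    header = HEADER.pack(stream_id, seq, len(data))
    trailer = TRAILER.pack(zlib.crc32(header + data))
    return header + data + trailer

def parse_packet(packet: bytes):
    """Return (stream_id, seq, data), or raise ValueError if the packet is damaged."""
    if len(packet) < HEADER.size + TRAILER.size:
        raise ValueError("packet too short")
    stream_id, seq, length = HEADER.unpack_from(packet, 0)
    body_end = HEADER.size + length
    if body_end + TRAILER.size != len(packet):
        raise ValueError("length field does not match packet size")
    (crc,) = TRAILER.unpack_from(packet, body_end)
    if crc != zlib.crc32(packet[:body_end]):
        raise ValueError("checksum mismatch")
    return stream_id, seq, packet[HEADER.size:body_end]

packet = make_packet(stream_id=7, seq=0, data=b"hello")
print(parse_packet(packet))       # (7, 0, b'hello')
```

The stream id is what would let several conversations share one stream of bytes (multiplexing), and the sequence number says which piece of the message this is.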
The International Organization for Standardization (ISO) defined a very
complicated layered model for networking called the Open Systems
Interconnection (OSI) model. Almost nobody implements the thing, but
the conceptual model is pretty popular. The layers of the ISO OSI model are:
- Physical layer: how do you represent bits on the wire?
- Link layer: how do you decide who gets to put their bits on the wire?
- Network layer: routing and addressing--how do bits get where they need to go?
- Transport layer: correct bit errors, multiplex connections, and provide end-to-end reliable communication
- Session layer: manage connections between programs (handshaking)
- Presentation layer: convert data formats, compress, and encrypt.
- Application layer: get stuff done for the user.
People have built lots and lots of different networking interfaces. Totally unique networking interfaces I've used include:
- Ethernet, the now-standard physical protocol. OSI link layer and below.
- PPP, the Point-to-Point Protocol still spoken today by modems. OSI link layer.
- NetBIOS/NetBEUI, the dying-out IBM PC network protocol. OSI session and transport layers.
- AppleTalk, the almost extinct native Mac network protocol. OSI session and transport layers.
- Token Ring, the almost extinct cousin of Ethernet. Used at IBM. OSI link layer and below.
Today, "the network" means TCP/IP, the standard protocol spoken on the
internet. TCP/IP is really a whole family of different protocols:
- IP, the Internet Protocol, is the lowest level protocol--close to the OSI network layer. An IP (version 4) packet starts with a header of at least 5 big-endian 32-bit integers (20 bytes), followed by the data.
- ARP, the Address Resolution Protocol, is a way to find out the
network-hardware address (Media Access Control, or MAC address) that goes with
an IP address you want to talk to. ARP works by broadcast ("Hey,
anybody know who's using 10.0.0.2?") and trusts whoever answers, which makes it fundamentally
insecure.
- ICMP, the Internet Control Message Protocol, is used for error reporting, diagnostics (like ping), and routing hints.
- UDP, the User Datagram Protocol, is an unreliable connectionless
(or "datagram") protocol built on IP. Datagram communication is
nice, because you don't have to tediously set up a connection before
you send a few bytes. But UDP is unreliable--if a UDP message is
lost on the network, it's up to the application to notice and resend. Hence
it's rarely a good idea to use UDP unless you're ready to handle that yourself--otherwise use TCP.
- DNS, the Domain Name System, is built mostly on UDP (it falls back
to TCP for replies too big for one datagram). The
overhead of setting up a TCP connection for every lookup would make DNS even more of a
bottleneck than it already is.
- TCP, the Transmission Control Protocol, is a reliable,
connection-oriented protocol also built on IP. TCP is what the web's built
on--all HTTP accesses go over TCP. "Reliable" means TCP will do
retransmission in case of errors or packet loss.
"Connection-oriented" means you have to set up a connection between two machines
before they can actually exchange information.
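To make the IP bullet above concrete, here is a minimal sketch that unpacks the fixed 20-byte part of an IPv4 header with Python's struct module. The field layout follows the IPv4 specification (RFC 791); the function name parse_ipv4_header and the hand-built sample header, with its made-up addresses and zeroed checksum, are assumptions for illustration.

```python
import socket
import struct

# The fixed part of an IPv4 header: 20 bytes, i.e. five 32-bit words,
# all in big-endian ("network") byte order.
IPV4_HEADER = struct.Struct("!BBHHHBBH4s4s")

def parse_ipv4_header(raw: bytes) -> dict:
    (ver_ihl, tos, total_length, ident, flags_frag,
     ttl, protocol, checksum, src, dst) = IPV4_HEADER.unpack_from(raw)
    return {
        "version": ver_ihl >> 4,                  # top 4 bits of the first byte
        "header_bytes": (ver_ihl & 0x0F) * 4,     # header length is stored in 32-bit words
        "total_length": total_length,
        "ttl": ttl,
        "protocol": protocol,                     # 1 = ICMP, 6 = TCP, 17 = UDP
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }

# A hand-built sample header (made-up addresses, checksum left as zero).
sample = IPV4_HEADER.pack(
    0x45, 0, 20, 0, 0, 64, 6, 0,
    socket.inet_aton("10.0.0.1"), socket.inet_aton("10.0.0.2"))
print(parse_ipv4_header(sample))
```

The protocol field is how IP hands a packet to the right next layer: the same header carries ICMP, UDP, or TCP data depending on that one byte.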
We'll look at the various TCP/IP protocols in detail, but for the next week we'll focus on TCP.
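As a preview, here is a minimal sketch of a complete TCP exchange using Python's standard socket module, with the server and the client both running in one process over the loopback address. The echo behavior, the function name echo_once, and the use of a thread for the server are assumptions made to keep the example self-contained; a real server and client would normally be separate programs on separate machines.

```python
import socket
import threading

def echo_once(listener: socket.socket) -> None:
    """Accept one connection, echo back whatever arrives, then close."""
    conn, _addr = listener.accept()
    with conn:
        while True:
            chunk = conn.recv(4096)
            if not chunk:            # empty read: the client finished sending
                break
            conn.sendall(chunk)

# Server side: bind to loopback; port 0 asks the OS for any free port.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]
server = threading.Thread(target=echo_once, args=(listener,))
server.start()

# Client side: the connection has to be set up before any data can flow.
with socket.create_connection(("127.0.0.1", port)) as client:
    client.sendall(b"hello over TCP")
    client.shutdown(socket.SHUT_WR)  # tell the server we are done sending
    reply = b""
    while True:
        chunk = client.recv(4096)
        if not chunk:                # empty read: the server closed the connection
            break
        reply += chunk

server.join()
listener.close()
print(reply)                         # b'hello over TCP'
```

Note that recv may hand back the data in pieces of any size, which is why both sides loop until they see an empty read. TCP delivers a reliable stream of bytes, not individual messages.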