ASCII vs. Binary I/O

CS 321 Lecture, Dr. Lawlor, 2006/03/24

Check out these binary I/O examples (Directory, Zip, Tar-gzip).

Name	Code	Approx. Speed
Screen I/O	std::cout<<x<<"\n";	a few KB/sec
Text File I/O	std::ofstream s; s<<x<<"\n";	15 MB/sec
Big-endian I/O	std::ofstream s; unsigned char c[4]; c[0]=x>>24; ...; s.write((char *)c,4);	40 MB/sec
Native Binary I/O	std::ofstream s; s.write((char *)&x,sizeof(x));	200 MB/sec

Screen output is appallingly slow because a terminal program has to worry about fonts, spacing, the cursor, escape codes, and control codes for each byte of output. It's a little faster to end lines with '\r' (carriage return) instead of '\n', because returning to the start of the same line avoids scrolling.

Text (also known as ASCII) output is easy to write and easy for humans to read, but you do have to add separator characters (space, tab, newline) between values, or the output will be impossible to read back in: the numbers 12 and 34 written with no separator come out as "1234". Text output is slow because each number has to be converted to or from decimal digits one character at a time.
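A small sketch of the separator problem. It uses std::ostringstream and std::istringstream as in-memory stand-ins for std::ofstream and std::ifstream (the << and >> operators behave the same way on both); the function names are hypothetical.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Write integers as text, one per line: "\n" is the separator.
std::string write_text(const std::vector<int> &v) {
    std::ostringstream s;  // an std::ofstream would be used the same way
    for (int x : v) s << x << "\n";
    return s.str();
}

// Same, but with no separator between values.
std::string write_text_no_separator(const std::vector<int> &v) {
    std::ostringstream s;
    for (int x : v) s << x;
    return s.str();
}

// Read whitespace-separated integers back in until input runs out.
std::vector<int> read_text(const std::string &text) {
    std::istringstream s(text);
    std::vector<int> v;
    int x;
    while (s >> x) v.push_back(x);
    return v;
}
```

For example, write_text({12, 34}) produces "12\n34\n", which reads back as two integers, while write_text_no_separator({12, 34}) produces "1234", which reads back as the single integer 1234.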

Native binary output is where you write a file containing exactly the same bytes on disk that your program has in memory. This is easy to do and amazingly fast, but it doesn't work across different machines. For example, a big-endian machine (like a PowerPC Mac) can't directly read native binary files written on a little-endian machine (like an x86 PC), or vice versa. Similarly, a 32-bit machine can't directly read files written with 64-bit native types (such as a 64-bit long or pointer), or vice versa. One fix for this is to always write native binary files (because they're convenient for you to write), but check the format and do any conversion at read time. The conversion needed (for example, from big-endian 32-bit to little-endian 64-bit) might be really nasty, though, and it's tricky to describe the file format in a portable way.
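A sketch of native binary I/O plus one possible way to "check the format at read time": the writer puts a known marker value at the start of the file in its own byte order, and the reader byte-swaps every value if the marker comes back reversed. The marker scheme, the value 0x01020304, and all function names here are illustrative assumptions rather than a standard format, and this only handles the endianness case, not differing integer sizes.

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>  // std::stringstream can stand in for a file when testing

// Native binary: copy x's in-memory bytes straight to the stream.
void write_native(std::ostream &s, std::uint32_t x) {
    s.write(reinterpret_cast<const char *>(&x), sizeof(x));
}

// Read sizeof(x) bytes straight back into x's memory.
bool read_native(std::istream &s, std::uint32_t &x) {
    return static_cast<bool>(s.read(reinterpret_cast<char *>(&x), sizeof(x)));
}

// Reverse the byte order of a 32-bit value (0x01020304 -> 0x04030201).
std::uint32_t byte_swap32(std::uint32_t x) {
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

// Hypothetical byte-order marker, written first in the writer's native order.
const std::uint32_t kByteOrderMarker = 0x01020304u;

void write_header(std::ostream &s) { write_native(s, kByteOrderMarker); }

// Read the marker and decide whether later values need swapping.
// Returns false if the marker is missing or unrecognized.
bool read_header(std::istream &s, bool &swap) {
    std::uint32_t m;
    if (!read_native(s, m)) return false;
    if (m == kByteOrderMarker) { swap = false; return true; }
    if (m == byte_swap32(kByteOrderMarker)) { swap = true; return true; }
    return false;
}

// Read one value, converting it to this machine's byte order if needed.
bool read_value(std::istream &s, bool swap, std::uint32_t &x) {
    if (!read_native(s, x)) return false;
    if (swap) x = byte_swap32(x);
    return true;
}
```

The writer stays as fast as plain native output; all of the conversion cost lands on the reader, and only when the file came from an opposite-endian machine.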

The other fix for the endianness problem is to always write a standard format, like 32-bit big-endian integers. This isn't too hard to do, especially if you write a little library to do it, although it does cost some speed: the table above shows roughly 40 MB/sec for big-endian I/O versus 200 MB/sec for native binary.
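A minimal version of such a "little library", assuming 32-bit unsigned big-endian integers as the standard format. It extends the Big-endian I/O row of the table above with a matching reader; the names write_be32 and read_be32 are hypothetical. Because each byte is built with shifts rather than copied from memory, the same file comes out on big- and little-endian machines.

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>  // std::stringstream can stand in for a file when testing

// Write x as 4 bytes, most significant byte first (big-endian),
// regardless of the byte order of the machine running this code.
void write_be32(std::ostream &s, std::uint32_t x) {
    unsigned char c[4];
    c[0] = static_cast<unsigned char>(x >> 24);
    c[1] = static_cast<unsigned char>(x >> 16);
    c[2] = static_cast<unsigned char>(x >> 8);
    c[3] = static_cast<unsigned char>(x);
    s.write(reinterpret_cast<const char *>(c), 4);
}

// Read 4 big-endian bytes and reassemble them into a native integer.
bool read_be32(std::istream &s, std::uint32_t &x) {
    unsigned char c[4];
    if (!s.read(reinterpret_cast<char *>(c), 4)) return false;
    x = (std::uint32_t(c[0]) << 24) | (std::uint32_t(c[1]) << 16) |
        (std::uint32_t(c[2]) << 8) | std::uint32_t(c[3]);
    return true;
}
```

When using these with real files, open them with std::ios::binary so no newline translation touches the bytes.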

One huge advantage of binary-style files is that every integer is the same size: you don't have to store and search for spaces or tabs between integers, because they're stored right next to each other. This lets us jump anywhere in a file by computing the byte offset. For example, to jump to integer i in a file full of 32-bit integers, we just "seek" to byte 4*i. std::ifstream does this with the "seekg" (seek get) routine; the equivalent for std::ofstream is "seekp" (seek put).
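A sketch of that random access, assuming the file holds nothing but 32-bit big-endian integers (the standard format described above), so integer i starts at byte 4*i. The helper name read_int_at is hypothetical; it accepts any std::istream, including an std::ifstream opened with std::ios::binary.

```cpp
#include <cstdint>
#include <istream>
#include <sstream>  // std::istringstream can stand in for a file when testing

// Read the i-th 32-bit big-endian integer (i counts from 0) by seeking
// directly to byte offset 4*i instead of reading everything before it.
bool read_int_at(std::istream &s, std::streamoff i, std::uint32_t &x) {
    s.clear();  // reset EOF/fail flags left by an earlier read
    if (!s.seekg(4 * i, std::ios::beg)) return false;
    unsigned char c[4];
    if (!s.read(reinterpret_cast<char *>(c), 4)) return false;  // past the end
    x = (std::uint32_t(c[0]) << 24) | (std::uint32_t(c[1]) << 16) |
        (std::uint32_t(c[2]) << 8) | std::uint32_t(c[3]);
    return true;
}
```

The cost of a lookup does not depend on i, which is exactly what a text file with variable-width numbers cannot offer.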