Fixed-Width and Binary File I/O

Dr. Lawlor, CS 202, CS, UAF

There are several frustrating things about writing ordinary text files.

First, you need to remember to put a space between consecutive integers.  That is, "cout<<10<<12<<13" writes the characters "101213", which read back as one big integer, not three little ones!

Second, each value written into a text file can use a different number of bytes, which makes it almost impossible to use "seekg" to jump to a given value in a file.  For example, if I've got a million integers stored in a text file, the integers are all at different byte offsets, as follows ("_" stands for a space here):
Byte    Data    Length
0       7_      2
2       117_    4
6       42_     3
9       6_      2

And so on.  At least the code is easy to write:
fstream f; f.open("test.txt",ios::out);
const int n=4;
int arr[n]={7,117,42,6};
for (int i=0;i<n;i++) {
    f<<arr[i]<<" "; // don't forget the space!
}
f.close();

cat("test.txt");

(Try this in NetRun now!)

But we really can't seek to a given integer, because everything is a different size.  This is a bummer--if you've got a billion ints stored in a file, it's silly to have to read them all just to find the last one!
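
To make that cost concrete, here is a minimal sketch of what "finding integer number i" looks like in a space-separated text file.  The helper name readNthText is made up for illustration, and it takes any std::istream so it can be tried on a string as well as a file.  Notice that the loop has to parse and discard every earlier value:

```cpp
#include <cassert>
#include <istream>
#include <sstream>

// Hypothetical helper: fetch the integer at position `index` (0-based) from a
// stream of whitespace-separated integers.  There is no way to compute where
// that integer starts, so every earlier value is parsed and thrown away.
// Returns false if the stream runs out (or fails) before reaching `index`.
bool readNthText(std::istream &in, int index, int &result) {
    assert(index >= 0);
    int value = 0;
    for (int i = 0; i <= index; i++) {
        if (!(in >> value)) return false; // ran out of data
    }
    result = value; // the last value parsed is the one we wanted
    return true;
}
```

For the last of a billion ints, that loop runs a billion times.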

Fixed-Width Data allows Seeking

You can get random access in a file if all your data is the same size.   For example, if every int is exactly 5 bytes in the file, then int "i" is stored at byte "5*i", so you can just "f.seekg(5*i);" and then read the integer.  This is an extremely powerful method to speed up random access in long files!

Byte    Data     Length
0       ____7    5
5       __117    5
10      ___42    5
15      ____6    5

C++ includes the "setw" manipulator (declared in <iomanip>), which lets you set the minimum width of the next value you output.  Here's the code that writes this type of "fixed width" file:
fstream f; f.open("test.txt",ios::out);
const int n=4;
int arr[n]={7,117,42,6};
for (int i=0;i<n;i++) {
    f<<setw(5)<<arr[i]; // no space needed, IF arr[i]<=9999
}
f.close();

cat("test.txt");

(Try this in NetRun now!)
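
One thing to watch: setw only sets a minimum width.  A value that needs more characters is written in full rather than truncated, which pushes every later value off its expected byte offset.  The sketch below shows this using a string stream instead of a file; the helper names fixedField and fitsField are invented for this illustration:

```cpp
#include <cassert>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: format `value` the same way the fixed-width writer
// does, but into a string so we can look at the result.
std::string fixedField(int value, int width) {
    std::ostringstream out;
    out << std::setw(width) << value; // pads on the left with spaces
    return out.str();
}

// Hypothetical check: did the value stay inside its field?
// setw never truncates, so an oversized value produces a longer string.
bool fitsField(int value, int width) {
    assert(width > 0);
    return fixedField(value, width).size() ==
           static_cast<std::string::size_type>(width);
}
```

Here fixedField(7,5) gives four spaces and a "7", while fixedField(123456,5) gives all six digits, so a writer could call fitsField before each write to catch the problem early.  The tighter "<=9999" limit in the code comment also guarantees at least one leading space, so neighboring numbers never run together when read back.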

You can then seek to a given location in the file, and read any integer:
fstream f; f.open("test.txt",ios::out);
const int n=4;
int arr[n]={7,117,42,6};
for (int i=0;i<n;i++) {
    f<<setw(5)<<arr[i]; // no space needed, IF arr[i]<=9999
}
f.close();

f.open("test.txt",ios::in); // open file for reading
int i=2; // index of the integer to read
f.seekg(5*i); // seek to integer i
int val=-1; f>>val;
f.close();

return val;

(Try this in NetRun now!)

One downside of fixed-width plain-text files is that people want to go in and edit the files by hand, for example in Notepad, which destroys the fixed-width property.  Plain text is also fairly space-inefficient, especially in fixed-width mode.  For example, an ordinary 32-bit int can hold values past two billion, which is ten digits, so you need at least eleven characters per integer counting the space character between numbers, or twelve counting a minus sign!
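
The arithmetic above can be checked with a short sketch.  The function names here are made up for illustration, and the "12" result assumes a 32-bit int, which is typical but not guaranteed by the language:

```cpp
#include <cassert>
#include <limits>

// Characters needed for a text field wide enough to hold any int.
constexpr int textFieldWidth() {
    // digits10 counts the decimal digits that ALWAYS fit (9 for a 32-bit int),
    // so the largest value needs one more digit than that.
    return std::numeric_limits<int>::digits10 + 1  // digits
           + 1                                     // minus sign
           + 1;                                    // separating space
}

// Total bytes for `count` integers stored as fixed-width text.
long long textFileBytes(long long count) {
    assert(count >= 0);
    return count * textFieldWidth();
}

// Total bytes for `count` integers stored as raw binary.
long long binaryFileBytes(long long count) {
    assert(count >= 0);
    return count * static_cast<long long>(sizeof(int));
}
```

With a 4-byte int, a million integers take 12 million bytes as fixed-width text but only 4 million bytes as raw binary.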

Binary File I/O

Writing your fixed-width file in binary format has a few effects:
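The syntax is pointer-based, and looks a little bit weird: "f.write((char *)&i,sizeof(i));".  Here are the parts of that statement:
"f.write" copies raw bytes from memory into the file, with no formatting at all.
"(char *)&i" is the address of the variable i, typecast to a pointer to bytes, because write wants a "char *".
"sizeof(i)" is the number of bytes to write, which is 4 for an int on most machines.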
Here's an example where we write a short binary file, and then read it back.
fstream f;

// Write binary data:
f.open("that_file.dat",ios::out|ios::binary);
int i=3;
f.write((char *)&i,sizeof(i));
f.close();

cat("that_file.dat");

// Read binary data:
f.open("that_file.dat",ios::in|ios::binary);
int v=0;
f.read((char *)&v,sizeof(v));
if (f) cout<<"I read the integer "<<v<<" from the file!\n";
f.close();

(Try this in NetRun now!)
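
What makes binary I/O "fixed width" is that every int takes exactly sizeof(int) bytes, no matter how big the value is.  Here is a small sketch of that idea, using string streams in place of a file; toRawBytes and fromRawBytes are hypothetical names for this example.  The order of the bytes depends on the machine, so the sketch only looks at how many bytes there are and whether the value survives the round trip:

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: the bytes that f.write((char *)&value,sizeof(value))
// would put into the file, captured in a string instead.
std::string toRawBytes(int value) {
    std::ostringstream out(std::ios::out | std::ios::binary);
    out.write((const char *)&value, sizeof(value));
    return out.str();
}

// Hypothetical helper: turn those bytes back into an int, the same way
// f.read((char *)&value,sizeof(value)) does.
int fromRawBytes(const std::string &bytes) {
    assert(bytes.size() == sizeof(int));
    std::istringstream in(bytes, std::ios::in | std::ios::binary);
    int value = 0;
    in.read((char *)&value, sizeof(value));
    return value;
}
```

Both toRawBytes(3) and toRawBytes(2000000000) come out the same length, which is exactly the property seeking needs.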

Here's a more complex example where we write several integers into a binary file, and then use seek to read them back several times:
fstream f;
f.open("that_file.dat",ios::out|ios::binary);
const int n=4;
int arr[n]={7,117,42,6};
for (int i=0;i<n;i++) {
    int val=arr[i];
    f.write((char *)&val,sizeof(val));
}
f.close();

cat("that_file.dat");

f.open("that_file.dat",ios::in|ios::binary);
for (int pass=0;pass<2;pass++) // make several passes through the file
{
    f.seekg(0); // back to start of the file
    while (f) {
        int i=-1;
        f.read((char *)&i,sizeof(i));
        if (f) cout<<"I read the integer "<<i<<" from the file!\n";
    }
    f.clear(); // reset error state of f after hitting EOF
}
f.close();

(Try this in NetRun now!)
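
The example above only ever seeks back to byte 0, but the same trick as the fixed-width text file works here: integer number i starts at byte sizeof(int)*i.  Below is a minimal sketch of that random access.  The helper names writeBinaryInts and readBinaryInt are made up for this example, and both take generic streams, so they should work on an fstream opened with ios::binary as well as on a string stream:

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>

// Write each int as sizeof(int) raw bytes, like the loop in the example above.
void writeBinaryInts(std::ostream &out, const int *values, int count) {
    for (int i = 0; i < count; i++) {
        out.write((const char *)&values[i], sizeof(values[i]));
    }
}

// Hypothetical helper: read integer number `index` (0-based) directly,
// without reading any of the integers before it.
// Returns false if the seek or the read fails, such as past end of file.
bool readBinaryInt(std::istream &in, int index, int &result) {
    assert(index >= 0);
    in.clear(); // reset any error state left by an earlier failed read
    std::streamoff offset =
        static_cast<std::streamoff>(index) * static_cast<std::streamoff>(sizeof(int));
    in.seekg(offset, std::ios::beg); // jump straight to integer `index`
    int value = 0;
    in.read((char *)&value, sizeof(value));
    if (!in) return false;
    result = value;
    return true;
}
```

With the four integers from the example, readBinaryInt(f,2,val) would set val to 42 after a single seek and a single read, however long the file is.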

(I feel guilty about leaving out the error checking in these examples.  We'll see a better way to do error checking called exception handling before spring break!)