CS 321 Spring 2013
Lecture Notes for Friday, March 22, 2013

Introduction to Files

When we deal with files we read and write sequences of bytes. We do this using an interface that is optimized for use with a mass-storage device that has some or all of the following properties.

The file interface is complicated; it is nonetheless heavily used. As we have seen, virtual files abound. These may implement only a portion of the file interface. For example, a pipe allows the read, write, and close system calls, but it does not make sense to do a seek operation on a pipe, or to get its size.
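The pipe example can be checked directly. The following is a minimal sketch, assuming a POSIX-style system and Python's os module; the function name seek_error_on_pipe is made up for illustration. It writes to and reads from a pipe, then attempts a seek and reports the error code the OS gives back.

```python
import errno
import os

def seek_error_on_pipe():
    """Try to seek on a pipe; return the errno of the failure, or None if it succeeded."""
    read_fd, write_fd = os.pipe()
    try:
        # read and write are part of the interface a pipe implements
        os.write(write_fd, b"hello")
        os.read(read_fd, 5)
        try:
            # seek is not: a pipe has no notion of a current position
            os.lseek(read_fd, 0, os.SEEK_SET)
        except OSError as exc:
            return exc.errno
        return None
    finally:
        # close is also supported
        os.close(read_fd)
        os.close(write_fd)

if __name__ == "__main__":
    code = seek_error_on_pipe()
    print("seek failed with", errno.errorcode.get(code, code))
```

On POSIX systems the error is expected to be ESPIPE ("illegal seek"); other platforms may behave differently.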

When the full interface is implemented, a file is part of a file system. This is a specialized key-value store. A key is a pathname (filename plus directory information); the associated value is the file contents, and perhaps its metadata.
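To make the key-value view concrete, here is a small sketch in Python. The helper names put and get are hypothetical, chosen to mirror key-value store terminology; they simply wrap ordinary file operations. The pathname plays the role of the key and the bytes in the file play the role of the value.

```python
import os
import tempfile

def put(path, contents):
    """Store a value (bytes) under a key (pathname), replacing any old value."""
    with open(path, "wb") as f:
        f.write(contents)

def get(path):
    """Look up the value (bytes) stored under a key (pathname)."""
    with open(path, "rb") as f:
        return f.read()

def demo():
    # A temporary directory keeps the example from touching real files.
    with tempfile.TemporaryDirectory() as directory:
        key = os.path.join(directory, "notes.txt")
        put(key, b"first version")
        put(key, b"second version")  # same key, new value
        return get(key)

if __name__ == "__main__":
    print(demo())
```

A real file system does much more than this (partial reads and writes, directories, metadata), so the sketch only illustrates the lookup-by-pathname idea.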

The metadata for a file is everything stored other than the pathname, the location of the contents on the device, and the contents themselves. Metadata might include the following.
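As an illustration of how a program reads metadata, here is a sketch that uses Python's os.stat. The function name describe and the particular items reported are choices made for this example; which items are available and meaningful depends on the OS and the file system.

```python
import os
import stat
import tempfile
import time

def describe(path):
    """Return a few metadata items for a file, as reported by os.stat."""
    info = os.stat(path)
    return {
        "size_bytes": info.st_size,                   # length of the contents
        "modified": time.ctime(info.st_mtime),        # last modification time
        "permissions": stat.filemode(info.st_mode),   # e.g. '-rw-------'
        "owner_uid": info.st_uid,                     # numeric owner ID
    }

def demo():
    # Create a 12-byte temporary file and describe it.
    with tempfile.NamedTemporaryFile() as f:
        f.write(b"twelve bytes")
        f.flush()
        return describe(f.name)

if __name__ == "__main__":
    for name, value in demo().items():
        print(name, "=", value)
```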

Broad categories of files include directories, character special files, block special files, named pipes, and regular files.
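On a POSIX-style system, the category of a file is recorded in its mode bits, and a program can test for each category. The sketch below assumes Python's os and stat modules; the function name file_category is made up for this example.

```python
import os
import stat

def file_category(path):
    """Classify a path into one of the broad categories of files."""
    mode = os.stat(path).st_mode  # follows symbolic links
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISCHR(mode):
        return "character special file"
    if stat.S_ISBLK(mode):
        return "block special file"
    if stat.S_ISFIFO(mode):
        return "named pipe"
    if stat.S_ISREG(mode):
        return "regular file"
    return "other"  # e.g. a socket

if __name__ == "__main__":
    for path in [".", __file__, "/dev/null"]:
        if os.path.exists(path):
            print(path, "->", file_category(path))
```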

Regular files are either text or binary. Often the contents of a file are divided into records, which in turn are divided into fields. For a text file, we might have one record per line with fields separated by blanks or commas. A file may be random-access, in which case records need to be fixed-length so that we can seek quickly to the location of a particular record.
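The reason fixed-length records permit fast random access is that the byte offset of record i is simply i times the record size, so a single seek reaches it. Here is a minimal sketch in Python. The record layout (a 4-byte integer ID followed by a 16-byte name field) is an assumption made up for this example.

```python
import io
import struct

# Hypothetical record layout: little-endian 4-byte unsigned ID,
# then a 16-byte name field padded with zero bytes. Total: 20 bytes.
RECORD = struct.Struct("<I16s")

def write_records(f, records):
    """Write (id, name) pairs as fixed-length binary records."""
    for rec_id, name in records:
        f.write(RECORD.pack(rec_id, name.encode("ascii")))

def read_record(f, index):
    """Seek straight to record number `index` (counting from 0) and decode it."""
    f.seek(index * RECORD.size)
    rec_id, raw_name = RECORD.unpack(f.read(RECORD.size))
    return rec_id, raw_name.rstrip(b"\0").decode("ascii")

if __name__ == "__main__":
    f = io.BytesIO()  # an in-memory stand-in for a file opened in binary mode
    write_records(f, [(1, "ada"), (2, "bob"), (3, "cy")])
    print(read_record(f, 2))  # no need to read records 0 and 1 first
```

With variable-length records, such as lines of text, finding record i generally means scanning from the start of the file or consulting a separate index.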

The type of a file might be indicated in several ways. One way is to use the filename extension, which typically comes at the end of the filename, separated from the rest by a dot (“.”). For example, “.jpg” is one of the common extensions indicating a JPEG image file.
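Extracting the extension is a simple string operation. The sketch below uses os.path.splitext from Python's standard library; the wrapper name extension is made up for this example, and lowercasing the result is a choice made here so that ".JPG" and ".jpg" compare equal.

```python
import os.path

def extension(filename):
    """Return the filename extension, including the dot, in lowercase."""
    return os.path.splitext(filename)[1].lower()

if __name__ == "__main__":
    for name in ["photo.JPG", "archive.tar.gz", "README", ".bashrc"]:
        print(repr(name), "->", repr(extension(name)))
```

Note that splitext treats only the last dot as the separator, and it does not count a leading dot (as in ".bashrc") as starting an extension.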

Another way to indicate the type is by using a magic number: a short sequence of bytes (perhaps 2 or 4) at the beginning of a file indicating the type of data in the file. For example, JPEG image files that follow the original format begin with the (hexadecimal) bytes ff d8. Other magic numbers are readable text. “%!” indicates a PostScript document, while, in the *ix world, “#!” indicates an executable script.
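Checking a magic number amounts to comparing the first few bytes of a file against known prefixes. Here is a minimal sketch in Python that uses only the three magic numbers mentioned above; the names MAGIC_NUMBERS, guess_type, and guess_file_type are made up for this example, and a real tool would know many more signatures.

```python
# (prefix bytes, description) pairs, taken from the examples in the text
MAGIC_NUMBERS = [
    (b"\xff\xd8", "JPEG image"),
    (b"%!", "PostScript document"),
    (b"#!", "executable script"),
]

def guess_type(data):
    """Guess a file type from its leading bytes; return None if unrecognized."""
    for prefix, description in MAGIC_NUMBERS:
        if data.startswith(prefix):
            return description
    return None

def guess_file_type(path):
    """Read just enough of a file to check it against the known magic numbers."""
    longest = max(len(prefix) for prefix, _ in MAGIC_NUMBERS)
    with open(path, "rb") as f:
        return guess_type(f.read(longest))

if __name__ == "__main__":
    print(guess_type(b"#!/bin/sh\necho hi\n"))
    print(guess_type(b"plain text"))
```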

Some OSs have standard ways to store file type, often in a file’s metadata. For example, the original Macintosh OS indicated file types using two 4-character codes in the file metadata. One code indicated the program that should be used to open the file, while the other specified the type of data in the file.

CS 321 Spring 2013: Lecture Notes for Friday, March 22, 2013 / Updated: 6 May 2013 / Glenn G. Chappell / ggchappell@alaska.edu