File I/O: User Level

CS 321 Lecture, Dr. Lawlor, 2006/03/01

All a file really contains is a big array of bytes, exactly like a little image of memory.

But bytes alone aren't useful to people, so people interpret bytes as meaning something.

ASCII: Bytes as Letters

The ASCII (American Standard Code for Information Interchange) table is a way of interpreting bytes as letters.

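ASCII  0   1   2   3   4   5   6   7   8   9   A   B   C   D   E   F
0      (control codes; 0x09 is tab '\t', 0x0A is newline '\n', 0x0D is carriage return '\r')
1      (control codes; 0x1B is escape)
2      sp  !   "   #   $   %   &   '   (   )   *   +   ,   -   .   /
3      0   1   2   3   4   5   6   7   8   9   :   ;   <   =   >   ?
4      @   A   B   C   D   E   F   G   H   I   J   K   L   M   N   O
5      P   Q   R   S   T   U   V   W   X   Y   Z   [   \   ]   ^   _
6      `   a   b   c   d   e   f   g   h   i   j   k   l   m   n   o
7      p   q   r   s   t   u   v   w   x   y   z   {   |   }   ~
8      (Windows code page 1252 only; e.g. 0x83 is ƒ, 0x88 is ˆ, 0x8A is Š, 0x8C is Œ)
9      (Windows code page 1252 only; e.g. 0x98 is ˜, 0x9A is š, 0x9C is œ, 0x9E is ž, 0x9F is Ÿ)
A          ¡   ¢   £   ¤   ¥   ¦   §   ¨   ©   ª   «   ¬       ®   ¯
B      °   ±   ²   ³   ´   µ   ¶   ·   ¸   ¹   º   »   ¼   ½   ¾   ¿
C      À   Á   Â   Ã   Ä   Å   Æ   Ç   È   É   Ê   Ë   Ì   Í   Î   Ï
D      Ð   Ñ   Ò   Ó   Ô   Õ   Ö   ×   Ø   Ù   Ú   Û   Ü   Ý   Þ   ß
E      à   á   â   ã   ä   å   æ   ç   è   é   ê   ë   ì   í   î   ï
F      ð   ñ   ò   ó   ô   õ   ö   ÷   ø   ù   ú   û   ü   ý   þ   ÿ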
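ASCII table.  Horizontal axis gives the low hex digit, vertical axis the high hex digit, and the entry is the character with that byte value ("sp" is the space character).  E.g., "A" is 0x41.  Only rows 0-7 are true 7-bit ASCII; bytes 0x80-0xFF (rows 8-F) mean different things in different code pages, and are shown here as Windows code page 1252, whose rows A-F match ISO Latin-1.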

The Stupidity of Newlines

In the ASCII table above, there are two bytes that could legitimately be treated as meaning "this line of text is over; start a new one": '\n' (0x0A, newline or line feed) and '\r' (0x0D, carriage return).
Sadly, the major computer systems nowadays each treat newlines differently.

In DOS/Windows, '\r' only moves the cursor horizontally to the start of the line, and '\n' only moves the cursor vertically down.  This means to really start a new line of text in a DOS/Windows file, you need to use "\r\n", like this:
me> cat newline_win.txt
Hello
There!
me> od -t c newline_win.txt
0000000   H   e   l   l   o  \r  \n   T   h   e   r   e   !  \r  \n
In UNIX, '\n' starts a new line.  '\r' returns the cursor to the start of the current line, so you can use it to overwrite that line if you really want to.
me> od -t c newline_unix.txt 
0000000   H   e   l   l   o  \n   T   h   e   r   e   !  \n
On classic Macs (before Mac OS X, which is UNIX-based and uses '\n'), '\r' starts a new line.
me> od -t c newline_mac.txt 
0000000   H   e   l   l   o  \r   T   h   e   r   e   !  \r
So the same text file written on three different machines will contain three different sets of bytes at the end of a line!

Luckily, more and more programs accept *any* of these conventions as marking the end of a line.
Be careful, though!  Sometimes a compiler will choke on a particular line that looks fine in the editor because there's a stray foreign newline character there.  The editor might hide the foreign newline, but still write it out when the file is saved.  A byte-level dump tool like "od" (on UNIX, as above) will show you exactly which bytes are really in the file.

Newline Conversion

If you open a file in "text" mode, the Windows file output routines will silently write "\r\n" to the file whenever your program writes "\n".  The input routines will also silently translate "\r\n" into just "\n" when reading from text files.  No such conversion happens if the file is opened in "binary" mode, and in UNIX there's no difference at all between opening a file in "binary" and "text" mode.

If you FTP a file in "text" mode from one machine to another, the FTP program will silently replace your newlines with whatever the other side needs.  Of course, if you're transferring a binary file (like a program, zip archive, etc.) that just happens to contain 0x0D 0x0A, this same replacement will corrupt the file, so you've got to be careful to transfer in "binary" mode.

I/O Interfaces:

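C Standard I/O  (#include <stdio.h>)
    Open binary file:   FILE *f=fopen("foo","rb");   (use "wb" to write)
    Read binary data:   int n=fread(buf,4,1,f);
    Write binary data:  int n=fwrite(buf,4,1,f);
    Close file:         fclose(f);

C++ iostreams  (#include <fstream>)
    Open binary file:   std::ifstream s("foo",std::ios::binary);   (std::ofstream to write)
    Read binary data:   s.read(buf,4);
    Write binary data:  s.write(buf,4);   (on a std::ofstream)
    Close file:         /* automatic */   (the stream's destructor closes the file)

UNIX I/O  (#include <fcntl.h> and <unistd.h>)
    Open binary file:   int fd=open("demo",O_RDONLY);   (O_WRONLY or O_RDWR to write)
    Read binary data:   int n=read(fd,buf,4);
    Write binary data:  int n=write(fd,buf,4);
    Close file:         close(fd);

Windows I/O  (#include <windows.h>)
    Open binary file:   HANDLE h=CreateFile(.....);
    Read binary data:   BOOL ok=ReadFile(h,buf,...);   (the byte count comes back through a pointer argument)
    Write binary data:  BOOL ok=WriteFile(h,buf,...);
    Close file:         CloseHandle(h);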

See examples of all four I/O methods: (Directory, Zip, Tar-gzip)