# File I/O: User Level

CS 321 Lecture, Dr. Lawlor, 2006/03/01

All a file really contains is a big array of bytes, exactly like a little image of memory.

But raw bytes alone aren't useful to people, so people interpret bytes as meaning something.

## ASCII: Bytes as Letters

The ASCII (American Standard Code for Information Interchange) table is a way of interpreting bytes as letters.

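| ASCII | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| **0** | | | | | | | | | | tab, \t | \n | | | \r | | |
| **1** | | | | | | | | | | | | esc | | | | |
| **2** | space | ! | " | # | \$ | % | & | ' | ( | ) | \* | + | , | - | . | / |
| **3** | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | : | ; | < | = | > | ? |
| **4** | @ | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O |
| **5** | P | Q | R | S | T | U | V | W | X | Y | Z | [ | \\ | ] | ^ | \_ |
| **6** | \` | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o |
| **7** | p | q | r | s | t | u | v | w | x | y | z | { | \| | } | ~ | |
| **8** | € | | ‚ | ƒ | „ | … | † | ‡ | ˆ | ‰ | Š | ‹ | Œ | | Ž | |
| **9** | | ‘ | ’ | “ | ” | • | – | — | ˜ | ™ | š | › | œ | | ž | Ÿ |
| **A** | | ¡ | ¢ | £ | ¤ | ¥ | ¦ | § | ¨ | © | ª | « | ¬ | | ® | ¯ |
| **B** | ° | ± | ² | ³ | ´ | µ | ¶ | · | ¸ | ¹ | º | » | ¼ | ½ | ¾ | ¿ |
| **C** | À | Á | Â | Ã | Ä | Å | Æ | Ç | È | É | Ê | Ë | Ì | Í | Î | Ï |
| **D** | Ð | Ñ | Ò | Ó | Ô | Õ | Ö | × | Ø | Ù | Ú | Û | Ü | Ý | Þ | ß |
| **E** | à | á | â | ã | ä | å | æ | ç | è | é | ê | ë | ì | í | î | ï |
| **F** | ð | ñ | ò | ó | ô | õ | ö | ÷ | ø | ù | ú | û | ü | ý | þ | ÿ |
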
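ASCII table.  The horizontal axis gives the low hex digit, the vertical axis gives the high hex digit, and each entry is the character for that byte value.  E.g., "A" is 0x41.  Strictly speaking, ASCII only defines bytes 0x00 through 0x7F; the characters shown for 0x80 through 0xFF are the Windows-1252 extension.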
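
To make the table lookup concrete, here is a minimal C sketch that splits a byte into the row (high hex digit) and column (low hex digit) and prints each byte of a string next to its hex value. The names `high_digit`, `low_digit`, and `print_bytes` are made up for this illustration, and the sketch assumes the compiler's character set is ASCII, which is true on essentially every current machine but is not promised by the C standard.

```c
#include <stdio.h>

/* Row of the table: the high hex digit of the byte. */
unsigned high_digit(unsigned char byte) { return byte >> 4; }

/* Column of the table: the low hex digit of the byte. */
unsigned low_digit(unsigned char byte) { return byte & 0x0F; }

/* Print each byte of a string as a character, as hex, and as a
   (row, column) position in the table. */
void print_bytes(const char *text) {
    const unsigned char *p;
    for (p = (const unsigned char *)text; *p != 0; p++) {
        printf("'%c' = 0x%02X (row %X, column %X)\n",
               *p, (unsigned)*p, high_digit(*p), low_digit(*p));
    }
}
```

For example, `print_bytes("A")` reports 0x41, row 4, column 1, matching the caption above.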

## The Stupidity of Newlines

In the ASCII table above, there are two bytes that could legitimately be treated as indicating "this line of text is over.  start a new one.":
• 0x0A, '\n', Line Feed.  This is the standard C way to indicate a new line is starting.
• 0x0D, '\r', Carriage Return.  This actually moves the cursor back to the start of the line.
Sadly, every major computer system nowadays treats newlines differently.

In DOS/Windows, '\r' only moves the cursor horizontally to the start of the line, and '\n' only moves the cursor vertically down.  This means to really start a new line of text in a DOS/Windows file, you need to use "\r\n", like this:

    me> cat newline_win.txt
    Hello
    There!
    me> od -t c newline_win.txt
    0000000   H   e   l   l   o  \r  \n   T   h   e   r   e   !  \r  \n

In UNIX, '\n' starts a new line.  '\r' can be used to overwrite the current line if you really want to.

    me> od -t c newline_unix.txt
    0000000   H   e   l   l   o  \n   T   h   e   r   e   !  \n

On classic Mac OS (before Mac OS X), '\r' starts a new line.  Mac OS X itself uses the UNIX convention.

    me> od -t c newline_mac.txt
    0000000   H   e   l   l   o  \r   T   h   e   r   e   !  \r

So the same text file written on three different machines will contain three different sets of bytes at the end of a line!
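
One way to find out which convention a file uses is to count each kind of line ending in its bytes. The following is only a sketch: the `newline_counts` struct and the `count_newlines` function are names invented here, and the rule that "\r" immediately followed by "\n" counts as one DOS/Windows newline is an assumption about how to read mixed files.

```c
#include <stddef.h>

/* How many line endings of each convention a buffer contains. */
typedef struct {
    size_t crlf; /* "\r\n": DOS/Windows */
    size_t lf;   /* "\n" alone: UNIX */
    size_t cr;   /* "\r" alone: classic Mac */
} newline_counts;

newline_counts count_newlines(const unsigned char *buf, size_t len) {
    newline_counts c = {0, 0, 0};
    size_t i;
    for (i = 0; i < len; i++) {
        if (buf[i] == '\r') {
            if (i + 1 < len && buf[i + 1] == '\n') {
                c.crlf++;
                i++; /* skip the '\n' that belongs to this pair */
            } else {
                c.cr++;
            }
        } else if (buf[i] == '\n') {
            c.lf++;
        }
    }
    return c;
}
```

A file where more than one of the three counts is nonzero has mixed newline conventions.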

Luckily, more and more programs are accepting *anything* as indicating the end of a line:
• Web browsers (Internet Explorer, Mozilla, Firefox) work with files containing any newline setup, or even a mix of newline conventions.
• Wordpad and the Visual C++ IDE work properly with any newline.
Be careful, though!  Sometimes a compiler will choke on a particular line that looks fine in the editor because there's a stray foreign newline character there.  The editor might hide the foreign newline, but still write it out when the file is saved.  A binary file dump tool like "od" (on UNIX, like above) can be useful.
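
If "od" isn't available, a byte dump is easy to write yourself. This sketch is not a reimplementation of "od" and does not reproduce its output format; `describe_byte` and `dump_file` are hypothetical names, and the choice to show unprintable bytes as `\xNN` is just one reasonable option.

```c
#include <stdio.h>
#include <stddef.h>

/* Describe one byte: printable ASCII as itself, the common control
   characters as C escapes, everything else as \xNN.
   out must have room for at least 5 chars. */
const char *describe_byte(unsigned char b, char out[5]) {
    if (b == '\n')                   snprintf(out, 5, "\\n");
    else if (b == '\r')              snprintf(out, 5, "\\r");
    else if (b == '\t')              snprintf(out, 5, "\\t");
    else if (b >= 0x20 && b < 0x7F)  snprintf(out, 5, "%c", b);
    else                             snprintf(out, 5, "\\x%02X", (unsigned)b);
    return out;
}

/* Print every byte of a file, separated by spaces.
   Returns 0 on success, -1 if the file can't be opened. */
int dump_file(const char *path) {
    FILE *f = fopen(path, "rb"); /* binary mode: see the bytes untranslated */
    int c;
    char text[5];
    if (f == NULL) return -1;
    while ((c = fgetc(f)) != EOF)
        printf("%s ", describe_byte((unsigned char)c, text));
    printf("\n");
    fclose(f);
    return 0;
}
```

A stray foreign newline then shows up as a visible `\r` in the middle of otherwise ordinary text.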

## Newline Conversion

If you open a file in "text" mode, the Windows file output routines will silently write "\r\n" to the file whenever your program writes "\n".  The input routines will also silently translate "\r\n" back into just "\n" when reading from text files.  No such conversion happens if the file is opened in "binary" mode, and in UNIX there's no difference at all between opening a file in "binary" and "text" mode.
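
You can watch this happen by writing the same string in both modes and counting the bytes that actually land in the file. This is a small sketch, not a definitive test: `bytes_on_disk` is a name invented here, and the file names passed to it are up to the caller.

```c
#include <stdio.h>

/* Write text to path using the given fopen mode ("w" for text, "wb" for
   binary), then reopen the file in binary mode and count the bytes that
   were actually stored.  Returns the count, or -1 on any I/O failure. */
long bytes_on_disk(const char *path, const char *mode, const char *text) {
    FILE *f = fopen(path, mode);
    int failed;
    long count = 0;
    if (f == NULL) return -1;
    failed = (fputs(text, f) == EOF);
    if (fclose(f) != 0) failed = 1;
    if (failed) return -1;

    f = fopen(path, "rb"); /* binary mode: no translation while counting */
    if (f == NULL) return -1;
    while (fgetc(f) != EOF) count++;
    fclose(f);
    return count;
}
```

Writing "Hello\n" with mode "wb" should store 6 bytes everywhere.  With mode "w", the paragraph above implies 7 bytes on Windows (the extra byte is the inserted '\r') and 6 bytes on UNIX.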

If you FTP a file in "text" mode from one machine to another, the FTP program will silently replace your newlines with whatever the other side needs.  Of course, if you're transferring a binary file (like a program, zip archive, etc.) that just happens to contain 0x0D 0x0A, this same replacement will corrupt the file, so you've got to be careful to transfer in "binary" mode.
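
The replacement itself is simple enough to model on a buffer, which also shows why it wrecks binary data. The two functions below are a simplified sketch, not what any particular FTP program or C library actually runs; `lf_to_crlf` and `crlf_to_lf` are hypothetical names.

```c
#include <stddef.h>

/* UNIX to DOS/Windows: write "\r\n" wherever src has '\n'.
   dst must have room for 2*len bytes.  Returns bytes written. */
size_t lf_to_crlf(const unsigned char *src, size_t len, unsigned char *dst) {
    size_t i, out = 0;
    for (i = 0; i < len; i++) {
        if (src[i] == '\n') dst[out++] = '\r';
        dst[out++] = src[i];
    }
    return out;
}

/* DOS/Windows to UNIX: drop each '\r' that is directly followed by '\n'.
   dst must have room for len bytes.  Returns bytes written. */
size_t crlf_to_lf(const unsigned char *src, size_t len, unsigned char *dst) {
    size_t i, out = 0;
    for (i = 0; i < len; i++) {
        if (src[i] == '\r' && i + 1 < len && src[i + 1] == '\n')
            continue; /* skip the '\r'; the '\n' is copied next time around */
        dst[out++] = src[i];
    }
    return out;
}
```

On text, the two functions undo each other.  On binary data they don't: the four bytes 0x0D 0x0A 0x00 0x00 (the integer 2573 stored as a 32-bit little-endian value) come out of `crlf_to_lf` as only three bytes, so everything after that point in the file is shifted and the number itself is gone.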

## I/O Interfaces

| Name | Header | Open Binary File | Read Binary Data | Write Binary Data | Close File |
|---|---|---|---|---|---|
| C Standard I/O | `#include <stdio.h>` | `FILE *f=fopen("foo","rb");` | `int n=fread(buf,4,1,f);` | `int n=fwrite(buf,4,1,f);` | `fclose(f);` |
| C++ STL I/O | `#include <fstream>` | `std::ifstream s("foo",std::ifstream::binary);` | `s.read(buf,4);` | `s.write(buf,4);` | `/* automatic */` |
| UNIX I/O | `#include <fcntl.h>` and `#include <unistd.h>` | `int fd=open("demo",O_RDONLY);` | `int n=read(fd,buf,4);` | `int n=write(fd,buf,4);` | `close(fd);` |
| Windows I/O | `#include <windows.h>` | `HANDLE h=CreateFile(.....);` | `int n=ReadFile(h,buf,...);` | `int n=WriteFile(h,buf,...);` | `CloseHandle(h);` |
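
As a worked version of the C Standard I/O row, here is a minimal sketch that writes 4 bytes to a file in binary mode and reads them back. The function name `roundtrip4` and its path parameter are made up for this example; the `fopen`, `fwrite`, `fread`, and `fclose` calls are the ones from the table.

```c
#include <stdio.h>

/* Write 4 bytes to path, then read them back, both in binary mode.
   Returns 0 on success, -1 on any failure. */
int roundtrip4(const char *path, const unsigned char in[4], unsigned char out[4]) {
    size_t n;
    FILE *f = fopen(path, "wb");
    if (f == NULL) return -1;
    n = fwrite(in, 4, 1, f); /* one item of 4 bytes; returns items written */
    if (fclose(f) != 0 || n != 1) return -1;

    f = fopen(path, "rb");
    if (f == NULL) return -1;
    n = fread(out, 4, 1, f); /* returns items read: 1 on success, not 4 */
    fclose(f);
    return (n == 1) ? 0 : -1;
}
```

Note that `fread(buf,4,1,f)` and `fwrite(buf,4,1,f)` count items, not bytes, so a successful call returns 1.  Swapping the two size arguments, as in `fread(buf,1,4,f)`, makes the return value a byte count instead.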

See examples of all four I/O methods: (Directory, Zip, Tar-gzip)