Directories

CS 321 2007 Lecture, Dr. Lawlor

So a directory (a folder) is just a list of files and other directories.   This list is stored as a set of bytes, usually with one fixed-size structure per file plus a variable-length name list.  So a directory is just a bunch of bytes, and you can store those bytes (a directory's list of files) inside another file!  That is, a directory is just a file that's marked "this file's bytes represent other files and directories".  Curious, no?
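To make "a directory is just a bunch of bytes" concrete, here is a minimal sketch of a made-up directory format: a count, then one fixed-size record per file, then all the names packed back to back.  The names ToyDirRecord, ToyEntry, toy_pack, and toy_unpack, and the byte layout itself, are assumptions invented for illustration; every real filesystem defines its own layout.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical fixed-size record (12 bytes); not any real filesystem's layout.
struct ToyDirRecord {
    std::uint32_t file_id;      // stand-in for an inode number
    std::uint32_t name_offset;  // where this name starts within the name list
    std::uint16_t name_length;  // bytes in the name (no terminator stored)
    std::uint16_t is_directory; // 1 if this entry is itself a directory
};

// In-memory form of one entry.
struct ToyEntry {
    std::uint32_t file_id;
    bool is_directory;
    std::string name;
};

// Pack entries as: [count][record 0 .. record n-1][all names back to back].
std::vector<unsigned char> toy_pack(const std::vector<ToyEntry> &entries)
{
    std::uint32_t count=(std::uint32_t)entries.size();
    std::vector<unsigned char> bytes(sizeof(count)+count*sizeof(ToyDirRecord));
    std::memcpy(bytes.data(),&count,sizeof(count));
    std::string names;
    for (std::uint32_t i=0;i<count;i++) {
        ToyDirRecord r;
        r.file_id=entries[i].file_id;
        r.name_offset=(std::uint32_t)names.size();
        r.name_length=(std::uint16_t)entries[i].name.size();
        r.is_directory=entries[i].is_directory?1:0;
        std::memcpy(bytes.data()+sizeof(count)+i*sizeof(r),&r,sizeof(r));
        names+=entries[i].name;
    }
    bytes.insert(bytes.end(),names.begin(),names.end());
    return bytes;
}

// Recover the entries from the bytes; stops early if the bytes are truncated.
std::vector<ToyEntry> toy_unpack(const std::vector<unsigned char> &bytes)
{
    std::vector<ToyEntry> entries;
    std::uint32_t count=0;
    if (bytes.size()<sizeof(count)) return entries;
    std::memcpy(&count,bytes.data(),sizeof(count));
    std::size_t namesStart=sizeof(count)+(std::size_t)count*sizeof(ToyDirRecord);
    if (bytes.size()<namesStart) return entries; // record table is cut short
    for (std::uint32_t i=0;i<count;i++) {
        ToyDirRecord r;
        std::memcpy(&r,bytes.data()+sizeof(count)+i*sizeof(r),sizeof(r));
        std::size_t begin=namesStart+r.name_offset;
        if (begin+r.name_length>bytes.size()) break; // name runs past the end
        ToyEntry e;
        e.file_id=r.file_id;
        e.is_directory=(r.is_directory!=0);
        e.name.assign((const char *)bytes.data()+begin,r.name_length);
        entries.push_back(e);
    }
    return entries;
}
```

The bytes returned by toy_pack could be written to an ordinary file, which is the whole point: adding or removing a file is just an edit to this byte array.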

So every time you do anything to a directory, the OS is changing this list of files--just moving bytes around!

Reading the List of Files in a Directory

Reading the files in a directory is just exactly like reading a file, although the names have been changed to protect you from the details of the filesystem.

In UNIX systems: Linux, Mac OS X, etc.

Start with opendir, which takes a directory name and returns a "DIR *".

List each file with readdir, which takes a "DIR *" and returns a "struct dirent *", which has a "d_name" field telling you the name of the file.

Finish up with closedir, which frees the "DIR *".
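#include <dirent.h> /* UNIX directory-list header */
#include <string.h> /* for strcmp */
#include <string> /* to build the full path handed to stat */
#include <time.h> /* for "timespec", used in bits/stat.h (& whined about by icpc) */
#include <sys/stat.h> /* to tell if an item is a file or directory */

/* hit_file and hit_directory are the caller's callbacks, defined elsewhere. */
void unix_list(const char *dirName)
{
    DIR *d=opendir(dirName);
    if (d==0) return; /* not a directory, or not readable */
    struct dirent *de;
    while (NULL!=(de=readdir(d))) { /* readdir returns NULL at the end of the list */
        const char *name=de->d_name;
        if (strcmp(name,".")==0 || strcmp(name,"..")==0)
            continue; /* Bogus self links */
        std::string path=std::string(dirName)+"/"+name;
        struct stat s;
        if (0==stat(path.c_str(),&s) && S_ISDIR(s.st_mode))
            hit_directory(dirName,name);
        else
            hit_file(dirName,name);
    }
    closedir(d);
}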

In Windows

Start with FindFirstFile, which takes a directory plus filename pattern (like "dir\*"), returns a HANDLE, and fills in a WIN32_FIND_DATA.  The WIN32_FIND_DATA struct contains the name of the first matching file in "cFileName", and the file's attribute flags (directory, read-only, hidden, and so on) in "dwFileAttributes".  On failure, the returned HANDLE is INVALID_HANDLE_VALUE.

A call to FindNextFile will find the next matching file.

Call FindClose when done.
#include <windows.h>
#include <stdio.h> /* for snprintf */
#include <string.h> /* for strcmp */

/* hit_file and hit_directory are the caller's callbacks, defined elsewhere. */
void win_list(const char *dirName)
{
    char dirNamePat[1024];
    /* dirName, with trailing backslash-star; snprintf can't overrun the buffer */
    snprintf(dirNamePat,sizeof(dirNamePat),"%s\\*",dirName);
    WIN32_FIND_DATAA f; /* the "A" (char string) versions match our char *dirName */
    HANDLE h=FindFirstFileA(dirNamePat,&f);
    if (h==INVALID_HANDLE_VALUE) return;

    do {
        const char *name=f.cFileName;
        if (strcmp(name,".")==0 || strcmp(name,"..")==0)
            continue; /* Bogus self links */
        // printf("---dirName: %s, file: %s\n",dirNamePat,name);
        if (f.dwFileAttributes&FILE_ATTRIBUTE_DIRECTORY)
            hit_directory(dirName,name);
        else
            hit_file(dirName,name);
    } while (FindNextFileA(h,&f));
    FindClose(h);
}

Performance Impact of Unsorted-List Directories

Deep down, many filesystems store the list of files as a literal list--and the list isn't even sorted.

So try a little program like this:
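#include <fstream>
#include <iostream> /* for std::cout */
#include <sstream>
#include <string>

/* time_in_seconds() is assumed to come from the NetRun environment;
   elsewhere, supply your own wall-clock timer that returns seconds as a double. */
int foo(void)
{
    for (int thou=0;thou<10;thou++) { /* ten batches... */
        double start=time_in_seconds();
        for (int i=0;i<100;i++) { /* ...of a hundred new files each */
            std::ostringstream ns;
            ns<<"file"<<thou<<"thou"<<i<<".dat";
            std::string name=ns.str();
            std::ofstream of(name.c_str()); /* creates the file in the current directory */
            of<<"Ugnh.";
        }
        double end=time_in_seconds();
        std::cout<<" Created "<<thou+1<<"th hundred files: "<<end-start<<" sec\n";
    }
    return 0;
}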
(executable NetRun link)
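To run the timing program outside NetRun you need your own time_in_seconds.  Here is one possible stand-in, assuming a C++11 compiler; it is a sketch, not NetRun's actual implementation.

```cpp
#include <chrono>

// Stand-in timer (an assumption, not NetRun's code): seconds elapsed since
// the first call, measured with a clock that never runs backwards.
double time_in_seconds(void)
{
    typedef std::chrono::steady_clock clock;
    static const clock::time_point start=clock::now(); // set on the first call
    return std::chrono::duration<double>(clock::now()-start).count();
}
```

Only differences between two calls are meaningful, which is all the timing loop uses.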

To do anything with a file, the OS has to search through the whole list of existing files, so every additional file slows down directory access a little more:
Created 1th hundred files: 0.00534678 sec
Created 2th hundred files: 0.00684118 sec
Created 3th hundred files: 0.00835299 sec
Created 4th hundred files: 0.010195 sec
Created 5th hundred files: 0.011657 sec
Created 6th hundred files: 0.0133369 sec
Created 7th hundred files: 0.014894 sec
Created 8th hundred files: 0.0167079 sec
Created 9th hundred files: 0.0177431 sec
Created 10th hundred files: 0.040544 sec
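Creating, opening, or deleting a file in a directory that already contains a thousand other files is several times slower than doing the same thing in a directory with just a few dozen files.  In the run above, which only measures creation, the ninth batch of a hundred files took over three times as long as the first, and the tenth took more than seven times as long.  This is ridiculous, and we shouldn't accept it, but every OS I know of does this!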

One workaround is to create subdirectories, and store a fraction of the total set of files in each subdirectory.  For example, instead of having a million files named like "foo123456.txt", make a thousand subdirectories with a thousand files each, like "foo123/456.txt".  A *lot* of real programs end up doing this to work around this old, common OS bug!
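The renaming itself is just integer division.  Here is a small sketch of the "foo123/456.txt" scheme described above; the function name sharded_path is made up, and it assumes ids from 0 to 999999 and that the subdirectories already exist.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: map a file number to "prefixNNN/NNN.txt", putting a
// thousand files in each of a thousand subdirectories.
// Assumes 0 <= id <= 999999.
std::string sharded_path(const std::string &prefix,int id)
{
    char buf[64];
    // The high three digits pick the subdirectory, the low three the file.
    std::snprintf(buf,sizeof(buf),"%03d/%03d.txt",id/1000,id%1000);
    return prefix+buf;
}
```

For example, sharded_path("foo",123456) gives "foo123/456.txt", and sharded_path("foo",7) gives "foo000/007.txt".  The zero padding keeps every name the same length, so each directory holds at most a thousand entries.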

Creating New Directories

You make a new directory with the MS-DOS or UNIX shell command "mkdir".  On UNIX systems, the C/C++ function to call to make a new directory is ... "mkdir".  On Windows, it's "CreateDirectory".  In both cases, you can specify the permissions you want.  I usually write a common interface to Windows and Linux like this:
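#if defined(_WIN32) || defined(WIN32)
#include <windows.h>
namespace osl {
    /* Returns true if the directory was created. */
    inline bool mkdir(const char *pathname) {
        return 0!=CreateDirectoryA(pathname,0); /* nonzero means success on Windows */
    }
}
#else /* UNIX-like system */
#include <sys/types.h> /* for mode_t */
#include <sys/stat.h>
namespace osl {
    /* Returns true if the directory was created. */
    inline bool mkdir(const char *pathname) {
        return 0==::mkdir(pathname,0777); /* zero means success on UNIX */
    }
}
#endif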
This lets me do osl::mkdir("foo"); on any system!