CS 321 Spring 2013
Lecture Notes for Friday, January 25, 2013

System Calls (cont’d) [1.6]

Creating a Process

Unix-Derived OSs

The standard way to create a new process in *ix OSs is the sys_fork system call. This clones the current process, resulting in two processes that execute the same code and whose variables, for the moment, have the same values. We tell the two processes apart by the return value of fork: zero in the child, and nonzero in the parent. In the parent, a negative return value indicates a failed fork (so no child exists), while a positive return value is the PID of the child.

See syscall3.cpp (NetRun link) for C++ code that does multiple fork calls.

See syscall4.cpp (NetRun link) for C++ code that forks and then does extensive output in both parent and child processes. Note: Once again, these will probably work on most Unix-derived OSs; they may not work under Windows.
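The return-value rules for fork can be sketched as follows. This is a minimal illustration, not the code in syscall3.cpp or syscall4.cpp; the names ForkRole, classify_fork_result, and fork_once are made up for this example. It also uses waitpid, which is described later in these notes, so that the child does not linger after it exits.

```cpp
#include <sys/types.h>  // pid_t
#include <sys/wait.h>   // waitpid
#include <unistd.h>     // fork, _exit

// Hypothetical names for the three possible outcomes of a fork.
enum ForkRole { FORK_FAILED, IN_CHILD, IN_PARENT };

// Interprets a value returned by fork: zero means we are the child,
// negative means the fork failed, positive is the child's PID.
ForkRole classify_fork_result(pid_t result)
{
    if (result == 0) return IN_CHILD;
    if (result < 0) return FORK_FAILED;
    return IN_PARENT;
}

// Forks once. The child exits immediately; the parent waits for it
// and returns the child's PID, or -1 if anything went wrong.
pid_t fork_once()
{
    pid_t result = fork();
    ForkRole role = classify_fork_result(result);
    if (role == IN_CHILD)
        _exit(0);             // child: leave without running parent code
    if (role == FORK_FAILED)
        return -1;            // parent: no child was created
    // Parent: result is the child's PID.
    if (waitpid(result, 0, 0) != result)
        return -1;
    return result;
}
```

Calling fork_once() from main should return a positive PID on a system where fork is permitted.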

When we make a new process, we usually want that process to run a different executable (file that can be executed). In *ix OSs, we switch executables using the sys_execve system call. This takes a rather complicated set of parameters, which includes the path of the new executable, command-line parameters, and values of environment variables. It only returns if it fails.

There are a number of C wrapper functions for sys_execve, one of which is execl. This function takes a variable number of parameters; each parameter of execl is a (const char *). Each of these, except the last, should point to a null-terminated string. The first parameter of execl is the path of the executable. The others are the command-line parameters, in order, beginning with the command used to execute the file. The last parameter should be a null pointer: (char *)(0). For example, to do the equivalent of the command line “ls -l”, we would do

execl("/bin/ls", "ls", "-l", (char *)(0));

This works because the executable for the ls command is found in the /bin directory. (In many *ix shells, you can get this information by typing “which ls”.)

Since the old executable is discarded, a successful sys_execve does not return. But we often want to treat an executable as if it were a subroutine: we execute it, and then when it is done, we return to what we were doing before. (This is what the shell does, for example.) We do this by having the parent wait for the child process to exit. This is done using the sys_waitpid system call, whose primary wrapper function is waitpid. This takes three parameters. The first is the PID of the process to wait for. The second is a pointer to an int that receives status information about the child, and the third is a set of option flags; we can mostly ignore both. To wait for process p to terminate, do the following.

int dummyStatus;
waitpid(p, &dummyStatus, 0);

Now we put all this together. We want to run a new executable, and when it finishes, resume what we were doing. We do a fork. The child does an exec of some sort (execl or another wrapper function for sys_execve). If the exec returns, then it has failed, so the child flags an error somehow and exits. The parent does a waitpid for the child, then goes on to whatever it wants to do next. All this together is called “fork-exec-wait”.
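A fork-exec-wait might be sketched as below. This is not the code in syscall5.cpp. The function name run_shell is hypothetical, and the sketch assumes that a shell is available at /bin/sh. Exiting with code 127 when the exec fails is a convention borrowed from shells, chosen here only as one way to "flag an error somehow".

```cpp
#include <sys/types.h>  // pid_t
#include <sys/wait.h>   // waitpid, WIFEXITED, WEXITSTATUS
#include <unistd.h>     // fork, execl, _exit

// Runs `command` via "/bin/sh -c command" (assumed to exist) and waits
// for it. Returns the command's exit code, or -1 if the fork or wait
// failed or the child did not exit normally.
int run_shell(const char * command)
{
    pid_t p = fork();
    if (p < 0)
        return -1;                        // fork failed

    if (p == 0)
    {
        // Child: replace this executable with the shell.
        execl("/bin/sh", "sh", "-c", command, (char *)(0));
        _exit(127);                       // reached only if exec failed
    }

    // Parent: wait for the child, then resume.
    int status = 0;
    if (waitpid(p, &status, 0) != p)
        return -1;
    if (!WIFEXITED(status))
        return -1;                        // e.g. killed by a signal
    return WEXITSTATUS(status);
}
```

For example, run_shell("exit 3") should return 3, since the shell exits with that code and the parent reads it back from the status filled in by waitpid.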

Note that a child process might terminate before its parent. But the parent might still want to do a waitpid on the child. Therefore, the data structures holding information about the child process are not destroyed, and the process number is not released, so that there is still something to do a waitpid on. The remnant of the child process is called a zombie.
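The following sketch illustrates why the remnant must be kept: the parent waits only after the child has (almost certainly) terminated, yet waitpid still succeeds and delivers the child's exit code. The name reap_late and the 0.1-second delay are arbitrary choices for this illustration.

```cpp
#include <sys/types.h>  // pid_t
#include <sys/wait.h>   // waitpid, WIFEXITED, WEXITSTATUS
#include <time.h>       // nanosleep, timespec
#include <unistd.h>     // fork, _exit

// The child exits at once with exit_code. The parent deliberately
// delays before waiting, so during the delay the child is a zombie.
// Returns the exit code seen by the parent, or -1 on failure.
int reap_late(int exit_code)
{
    pid_t p = fork();
    if (p < 0)
        return -1;
    if (p == 0)
        _exit(exit_code);                 // child terminates immediately

    struct timespec delay = { 0, 100000000 };  // 0.1 second
    nanosleep(&delay, 0);                 // child is likely a zombie now

    int status = 0;
    if (waitpid(p, &status, 0) != p)      // reaps the zombie
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Once the waitpid returns, the zombie is gone and its process number can be reused.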

See syscall5.cpp for C++ code that does a fork-exec-wait. You can run this under NetRun if you want (NetRun link); however the exec will fail, as NetRun does not permit the sys_execve system call.

Windows


Process creation under Windows is rather more sane, but less versatile, than in the *ix OSs. Windows has a single system call, CreateProcess, that does the equivalent of a fork followed by an execve. That is, it creates a new process, which executes a specified executable with specified command-line parameters.

The CreateProcess call has ten parameters, most of which can be ignored. It might be called like this.

STARTUPINFO si={0};              // Required variables
si.cb=sizeof(si);                // Size of the structure
PROCESS_INFORMATION pi;
CreateProcess(
    0,                           // Program name
    "new_executable.exe a b c",  // Command line
    0,0,false,0,0,0,             // Optional stuff
    &si,&pi);                    // startup info, process info

In practice, the second parameter is the one we really care about; it tells what the process is supposed to do.

Low-Level Details


When we make a system call, we are asking the OS to do something. Logically, we are calling a subroutine (a.k.a. function). However, our code is usually asking the OS to do something that our code is not allowed to do itself. At least some portion of the implementation of a system call will need to execute in a processor mode that allows access to the entire system. This is called supervisor mode, or kernel mode. In contrast, application code generally executes in a lower-privilege mode called user mode. So, a system call involves code that executes in low-privilege user mode, calling code that executes in high-privilege kernel mode. How can we do this without violating system security? This is the problem of privilege escalation.

The usual solution involves interrupts. When an interrupt occurs, the processor stops what it is doing, saves its registers, and executes interrupt handler code. When the handler finishes, the registers can be restored, and execution of the original code can resume where it left off.

The first kind of interrupt was a hardware interrupt. Usually at least one of the pins on a processor is dedicated to hardware interrupts; when the voltage on this line is changed to a certain level, the processor is interrupted.

Hardware interrupts were invented as a convenient way of dealing with I/O devices, and they are still used for that. Typically we want the processor to be able to do other things while waiting for input from a device (think about a mouse, which needs to be dealt with when it moves). One way to deal with this is polling: periodically checking whether input is available. But this requires code to periodically take breaks to deal with I/O devices. A simpler method is to use an interrupt. When input is available, the device signals a hardware interrupt, the handler code is executed, and the input is processed. Other code can mostly or entirely ignore the device.

Later came the software interrupt. A processor handles this essentially the same as a hardware interrupt, but it is signalled differently: by the execution of an interrupt instruction. For example, in x86 assembly language, the int instruction produces a software interrupt.

Software interrupts have many uses. They might be used to test hardware-interrupt handling code. They are a way to create breakpoints in code being debugged. And they are exactly what is needed to solve the problem of privilege escalation in the implementation of system calls.
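Ordinary user-mode code cannot install real interrupt handlers, but the control flow can be imitated with signals, which are only a loose analogy and are not covered in these notes. In the sketch below, raise plays the role of the interrupt instruction: the current code stops, a handler runs, and then the original code resumes. The names handler_ran, on_signal, and raise_and_check are made up, and the choice of SIGUSR1 assumes a POSIX system.

```cpp
#include <csignal>  // std::signal, std::raise, SIGUSR1 (POSIX)

// Set by the handler so the interrupted code can see that it ran.
volatile std::sig_atomic_t handler_ran = 0;

// Plays the role of the interrupt handler.
extern "C" void on_signal(int)
{
    handler_ran = 1;
}

// Installs the handler, then "interrupts" this code with raise.
// Returns 1 if the handler ran before execution resumed here,
// 0 if it did not, and -1 if setting things up failed.
int raise_and_check()
{
    handler_ran = 0;
    if (std::signal(SIGUSR1, on_signal) == SIG_ERR)
        return -1;
    if (std::raise(SIGUSR1) != 0)
        return -1;
    // Execution resumes here after the handler returns.
    return handler_ran;
}
```

Unlike a real software interrupt, nothing here changes the processor's privilege level; the sketch shows only the stop, handle, resume pattern.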

On most modern systems, a system call is initiated with a software interrupt. This causes the processor to stop what it is doing, save its registers, switch to kernel mode, and begin execution of an interrupt handler. The interrupt handler can be stored in a portion of memory that user-mode code cannot change. Thus, the code can be trusted; it can be executed at any time without violating system security. The interrupt handler will then determine what system call should be executed, perform the call, and then return from the interrupt, restoring registers and (if appropriate) user mode, and resuming the code that made the system call.
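The step in which the handler "determines what system call should be executed" is often pictured as a table lookup. The following is a user-mode simulation under that assumption, not real kernel code; the routines, their selector codes, and all names beginning with sim_ are invented for illustration.

```cpp
#include <cstddef>  // std::size_t

// Hypothetical "kernel-side" routines. Each takes three parameters,
// whether or not it needs them all.
long sim_add(long a, long b, long)      { return a + b; }
long sim_multiply(long a, long b, long) { return a * b; }

typedef long (*SimCall)(long, long, long);

// The table is indexed by selector code: 0 is add, 1 is multiply.
const SimCall sim_table[] = { sim_add, sim_multiply };
const std::size_t sim_table_size =
    sizeof(sim_table) / sizeof(sim_table[0]);

// Stand-in for the interrupt handler's dispatch step: validate the
// selector, then call the routine it names. Returns -1 for an unknown
// selector (a simplification; -1 could also be a legitimate result).
long sim_dispatch(long selector, long a, long b, long c)
{
    if (selector < 0 ||
        static_cast<std::size_t>(selector) >= sim_table_size)
        return -1;
    return sim_table[selector](a, b, c);
}
```

Checking the selector before indexing the table matters: the selector comes from untrusted user-mode code, so the trusted handler must not assume it is valid.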

System Calls on Various Systems

The x86 and x86_64 architectures use the int instruction to do a software interrupt. This instruction is therefore used to initiate a system call. There are also newer instructions, sysenter and syscall, which are optimized for making system calls, and thus run faster.

The int instruction has a one-byte parameter that differs for different OSs. MS-DOS used int 0x21 for system calls. Early Windows systems used int 0x2e. Since Windows XP, sysenter has been used. Calls to services provided by the BIOS (Basic Input/Output System, the code that executes when a standard PC first starts up) use software interrupts such as int 0x10 or int 0x16. And the Linux kernel uses int 0x80.

Older systems sometimes used slightly different mechanisms. For example, on the original Macintosh computer, system calls were performed by executing unimplemented instructions. On the 68000 processor used by the original Macintosh, instructions beginning with a hexadecimal A were not legal, and caused an interrupt. The early Macintosh operating system made use of these interrupts to implement system calls.

When a Linux system call is made (again, via int 0x80), the selector code tells which system call to make. This is stored in the rax register (on x86_64; for 32-bit x86 change r** to e**, here and later). Parameters (as many as are needed) are stored in registers rbx, rcx, rdx, rsi, and rdi. If the system call returns, then the return value is in rax.

Suppose we want to do a write. Looking at documentation for Linux system calls, we see that the selector code for sys_write is 4 in the table used with int 0x80. (The x86_64 syscall instruction uses a different table of selector codes.) This call has 3 parameters: file descriptor, pointer to character buffer, and number of characters to write. It returns the number of characters actually written. (Note that this is just like the thin-wrapper C function write.)
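The same call can be made from C++ without writing assembly. The sketch below is an assumption-laden illustration rather than a translation of hello.asm: it relies on the Linux C library function syscall and the constant SYS_write from <sys/syscall.h>, neither of which is mentioned in these notes. SYS_write expands to the selector code for whatever ABI the program is built for, so its value is not necessarily 4. The function name write_via_selector is made up.

```cpp
#include <sys/syscall.h>  // SYS_write (Linux-specific)
#include <unistd.h>       // syscall
#include <cstring>        // std::strlen

// Writes `text` to standard output (file descriptor 1) by passing the
// selector code and the three parameters of sys_write explicitly.
// Returns the number of characters written, or -1 on error.
long write_via_selector(const char * text)
{
    return syscall(SYS_write,           // selector code
                   1,                   // file descriptor: stdout
                   text,                // pointer to character buffer
                   std::strlen(text));  // number of characters
}
```

With a 14-character string such as "Hello, World!\n", a successful call returns 14. The thin-wrapper function write takes the same three parameters, with the selector code supplied for us.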

See hello.asm (NetRun link) for a “Hello, World!” program in x86_64 assembly code (NASM syntax); this program does a sys_write call.

CS 321 Spring 2013: Lecture Notes for Friday, January 25, 2013 / Updated: 27 Jan 2013 / Glenn G. Chappell / ggchappell@alaska.edu