CS 321 Spring 2013 > Lecture Notes for Friday, January 25, 2013
The standard way to create a new process in *ix OSs
is the sys_fork system call.
This clones the current process,
resulting in two processes
executing the same code,
and whose variables, for the moment, have the same values.
We tell the difference between the two processes
by the return value of fork:
zero for the child, and non-zero for the parent.
In the latter case, a negative return value indicates a failed fork,
while a positive return value is the PID of the child.
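The return-value rules above can be sketched in a few lines of C++. This is only an illustration, assuming a Unix-like system; describeForkResult and forkOnce are hypothetical names made up for this sketch, and the parent uses waitpid (covered later in these notes) to clean up after the child.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cassert>
#include <string>

// Classify a value returned by fork, following the rules above.
// (Hypothetical helper for illustration; not a library function.)
std::string describeForkResult(pid_t r) {
    if (r == 0) return "child";    // zero: we are the child
    if (r < 0)  return "failed";   // negative: the fork failed
    return "parent";               // positive: r is the child's PID
}

// Fork once. The child exits immediately; the parent returns the
// child's PID, or a negative value if the fork failed.
pid_t forkOnce() {
    pid_t r = fork();
    if (describeForkResult(r) == "child") {
        _exit(0);                  // child: leave without running parent code
    }
    if (r > 0) {
        pid_t reaped = waitpid(r, nullptr, 0);  // parent: wait for the child
        assert(reaped == r);
    }
    return r;
}
```

The child calls _exit rather than returning, so that it does not go on to execute code meant only for the parent.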
See syscall3.cpp (NetRun link)
for C++ code that does multiple fork calls.
See syscall4.cpp (NetRun link)
for C++ code that forks
and then does extensive output in both parent and child processes.
Note: Once again, these will probably work on most Unix-derived OSs;
they may not work under Windows.
When we make a new process, we usually want that process
to run a different executable
(file that can be executed).
In *ix OSs,
we switch executables using the sys_execve system call.
This takes a rather complicated set of parameters,
which includes the path of the new executable,
command-line parameters,
and values of environment variables.
It only returns if it fails.
There are a number of C wrapper functions for sys_execve,
one of which is execl.
This function takes a variable number of parameters;
each parameter of execl is a (const char *).
Each of these, except the last, should point to a null-terminated string.
The first parameter of execl is the path of the executable.
The others are command-line parameters, in order, beginning with the
command used to execute the file.
The last parameter should be a null pointer: (char *)(0).
For example, to do the equivalent of the command line “ls -l”,
we would do
execl("/bin/ls", "ls", "-l", (char *)(0));
since the executable for the ls command is found in the /bin directory.
(In many *ix shells, you can get this information by typing “which ls”.)
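The fact that an exec only returns if it fails can be seen with a small sketch. It assumes a Unix-like system and that the path /nonexistent/no_such_program really does not exist; both the path and the function name execMissingProgram are made up for illustration.

```cpp
#include <unistd.h>
#include <cerrno>
#include <cassert>

// Try to exec a program that (we assume) does not exist.
// The exec fails, so execl returns and this process keeps running.
// Returns the errno value set by the failed call.
int execMissingProgram() {
    int r = execl("/nonexistent/no_such_program",  // hypothetical path
                  "no_such_program",
                  (char *)(0));
    // We get here only because the exec failed.
    assert(r == -1);   // -1 is the only value execl ever returns
    return errno;
}
```

Had the path named a real executable, nothing after the execl call would have run.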
Since the old executable is discarded, a successful sys_execve
does not return.
But we often want to treat an executable as if it were a subroutine:
we execute it, and then when it is done, we return to what
we were doing before.
(This is what the shell does, for example.)
We do this by having the parent wait for the child process to exit.
This is done using the sys_waitpid system call,
whose primary wrapper function is waitpid.
This takes three parameters.
The first is the PID of the process to wait for.
Other parameters allow for various options,
but we can mostly ignore them.
To wait for process p to terminate,
do the following.
int dummyStatus;
waitpid(p, &dummyStatus, 0);
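The status value need not be thrown away. As a sketch that goes slightly beyond the text above, the following uses the standard WIFEXITED and WEXITSTATUS macros to recover the child's exit code. The function names spawnChildExiting and waitForExitCode are hypothetical helpers for this illustration, which assumes a Unix-like system.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cassert>

// Hypothetical helper: fork a child that exits at once with the given code.
// Returns the child's PID to the parent, or a negative value on failure.
pid_t spawnChildExiting(int code) {
    assert(code >= 0 && code <= 255);  // only the low 8 bits reach the parent
    pid_t p = fork();
    if (p == 0) _exit(code);           // child
    return p;                          // parent
}

// Wait for process p to terminate.
// Returns its exit code, or -1 if the wait failed or p did not exit normally.
int waitForExitCode(pid_t p) {
    if (p <= 0) return -1;             // not a valid child PID
    int status;
    if (waitpid(p, &status, 0) != p) return -1;
    if (!WIFEXITED(status)) return -1; // e.g., killed by a signal
    return WEXITSTATUS(status);
}
```

For example, waitForExitCode(spawnChildExiting(3)) gives 3 when the fork succeeds.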
Now we put all this together.
We want to run a new executable,
and when it finishes, resume what we were doing.
We do a fork.
The child does an exec of some sort
(execl or other wrapper function for sys_execve).
If the exec returns, then the child flags an error somehow,
and exits.
The parent does a waitpid for the child,
then goes on to whatever it wants to do next.
All this together is called “fork-exec-wait”.
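Here is one way the three steps might fit together. This is a minimal sketch, not the code in the course files. The function name forkExecWait is hypothetical, the limit of three command-line parameters is an arbitrary simplification, and the child's use of exit code 127 to flag a failed exec is a convention borrowed from the shell, not something required.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cassert>

// Run the executable at path, passing up to three command-line
// parameters (arg0 is the command name; use a null pointer for
// any that are not needed). Wait for it to finish.
// Returns the child's exit code, or -1 if the fork or wait failed
// or the child did not exit normally.
int forkExecWait(const char * path, const char * arg0,
                 const char * arg1, const char * arg2) {
    assert(path != nullptr && arg0 != nullptr);

    pid_t p = fork();
    if (p < 0) return -1;              // fork failed

    if (p == 0) {                      // child: exec
        execl(path, arg0, arg1, arg2, (char *)(0));
        _exit(127);                    // exec returned, so it failed
    }

    int status;                        // parent: wait
    if (waitpid(p, &status, 0) != p) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With this, the “ls -l” example from earlier would be forkExecWait("/bin/ls", "ls", "-l", nullptr), after which the caller carries on with whatever it wants to do next.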
Note that a child process might terminate before its parent. But the parent might still want to do a waitpid on the child. Therefore, the data structures holding information about the child process are not destroyed, and the process number is not released, so that there is still something to do a waitpid on. The remnant of the child process is called a zombie.
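The behavior described above can be observed with a short experiment. This is a sketch assuming a Unix-like system that provides usleep; the function name zombieCanBeReapedOnce is made up for illustration, and the 0.1-second delay is an arbitrary choice that makes it very likely (not guaranteed) that the child has terminated before the parent waits.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cerrno>
#include <cassert>

// Returns true if a child that has already terminated can still be
// waited for, and if a second wait on the same PID then fails.
bool zombieCanBeReapedOnce() {
    pid_t p = fork();
    if (p < 0) return false;           // fork failed
    if (p == 0) _exit(9);              // child terminates right away
    assert(p > 0);                     // only the parent gets here

    usleep(100000);                    // parent dawdles for 0.1 s;
                                       // the child is probably a zombie now

    int status;
    bool first = (waitpid(p, &status, 0) == p)
                 && WIFEXITED(status)
                 && WEXITSTATUS(status) == 9;

    // The zombie has been cleaned up, so there is nothing left to wait on.
    bool second = (waitpid(p, &status, 0) == -1) && (errno == ECHILD);

    return first && second;
}
```

The first waitpid still delivers the exit code 9, even though the child finished long before; the second fails with ECHILD.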
See syscall5.cpp
for C++ code that does a fork-exec-wait.
You can run this under NetRun if you want
(NetRun link);
however, the exec will fail, as NetRun does not permit the
sys_execve system call.
Process creation under Windows is rather more sane,
but less versatile, than in the *ix OSs.
Windows has a single system call, CreateProcess,
that does the equivalent of a fork
followed by an execve.
That is, it creates a new process,
which executes a specified executable
with specified command-line parameters.
The CreateProcess call has ten parameters,
most of which can be ignored.
It might be called like this.
STARTUPINFO si={0};              // Required variables
PROCESS_INFORMATION pi;
CreateProcess(
    0,                           // Program name
    "new_executable.exe a b c",  // Command line
    0,0,false,0,0,0,             // Optional stuff
    &si,&pi);                    // startup info, process info
In practice, the second parameter is the one we really care about; it tells what the process is supposed to do.
When we make a system call, we are asking the OS to do something. Logically, we are calling a subroutine (a.k.a. function). However, our code is usually asking the OS to do something that our code itself is not allowed to do. At least some portion of the implementation of a system call will need to execute in a processor mode that allows access to the entire system. This is called supervisor mode, or kernel mode. In contrast, application code generally executes in a lower-privilege mode called user mode. So, a system call involves code that executes in low-privilege user mode calling code that executes in high-privilege kernel mode. How can we do this without violating system security? This is the problem of privilege escalation.
The usual solution involves interrupts. When an interrupt occurs, the processor stops what it is doing, saves its registers, and executes interrupt handler code. When the handler finishes, the registers can be restored, and execution of the original code can resume where it left off.
The first kind of interrupt was a hardware interrupt. Usually at least one of the pins on a processor is dedicated to hardware interrupts; when the voltage on this line is changed to a certain level, the processor is interrupted.
Hardware interrupts were invented as a convenient way of dealing with I/O devices, and they are still used for that. Typically we want the processor to be able to do other things while waiting for input from a device (think about a mouse, which needs to be dealt with when it moves). One way to deal with this is polling: periodically checking whether input is available. But this requires code to periodically take breaks to deal with I/O devices. A simpler method is to use an interrupt. When input is available, the device signals a hardware interrupt, the handler code is executed, and the input is processed. Other code can mostly or entirely ignore the device.
Later came the software interrupt.
A processor handles this essentially the same as a hardware interrupt,
but it is signalled differently:
by the execution of an interrupt instruction.
For example, in x86 assembly language,
the int instruction produces a software interrupt.
Software interrupts have many uses. They might be used to test hardware-interrupt handling code. They are a way to create breakpoints in code being debugged. And they are exactly what is needed to solve the problem of privilege escalation in the implementation of system calls.
On most modern systems, a system call is initiated with a software interrupt. This causes the processor to stop what it is doing, save its registers, switch to kernel mode, and begin execution of an interrupt handler. The interrupt handler can be stored in a portion of memory that user-mode code cannot change. Thus, the code can be trusted; it can be executed at any time without violating system security. The interrupt handler will then determine what system call should be executed, perform the call, and then return from the interrupt, restoring registers and (if appropriate) user mode, and resuming the code that made the system call.
The x86 and x86_64 architectures
use the int instruction to do a software interrupt.
This instruction is therefore used to initiate a system call.
There are also newer instructions,
sysenter and syscall,
which are optimized for making system calls,
and thus run faster.
The int instruction has a one-byte parameter
that differs for different OSs.
MS-DOS used int 0x21 for system calls.
Early Windows systems used int 0x2e.
Since Windows XP, sysenter has been used.
Calls made to the BIOS
(Basic Input/Output System, the
code that executes when a standard PC first starts up)
use interrupts such as int 0x10 or int 0x16.
And the Linux kernel uses int 0x80.
Older systems sometimes used slightly different mechanisms.
For example, on the original Macintosh computer,
system calls were performed by executing unimplemented instructions.
On the 68000 processor used by the original Macintosh,
instructions beginning with a hexadecimal A
were not legal,
and caused an interrupt.
The early Macintosh operating system made use of these interrupts
to implement system calls.
When a Linux system call is made (again int 0x80),
the selector code tells which system call to make.
This is stored in the rax register
(on x86_64;
for 32-bit x86 change r** to e**,
here and later).
Parameters, as many as are needed, are stored in
registers rbx, rcx, rdx, rsi, and rdi.
If the system call returns, then
the return value is in rax.
Suppose we want to do a write.
Looking at documentation for Linux system calls,
we see that the selector code for sys_write is 4.
This call has 3 parameters:
file descriptor, pointer to character buffer,
and number of characters to write.
It returns the number of characters actually written.
(Note that this is just like the thin-wrapper C function write.)
See hello.asm (NetRun link)
for a “Hello, World!” program
in x86_64 assembly code (NASM syntax);
this program does a sys_write call.
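For comparison, something similar can be done from C++ without writing assembly. The sketch below assumes Linux with glibc, whose syscall function takes a selector code followed by the parameters and performs the system call for us; it does not necessarily use int 0x80 to do so. The constant SYS_write expands to the selector code for the build target: 4 for the 32-bit int 0x80 interface described above, and a different number for native 64-bit builds. The function name rawWrite is made up for illustration.

```cpp
#include <sys/syscall.h>
#include <unistd.h>
#include <cstddef>
#include <cassert>

// Do a sys_write by selector code, bypassing the write wrapper.
// Parameters are the same three described above.
// Returns the number of characters actually written, or -1 on error.
long rawWrite(int fd, const char * buf, std::size_t count) {
    assert(buf != nullptr);
    return syscall(SYS_write, fd, buf, count);
}
```

A call such as rawWrite(1, "Hello, World!\n", 14) writes the 14-character message to standard output (file descriptor 1).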
ggchappell@alaska.edu