Syscalls
CS 301 Lecture, Dr. Lawlor
Interrupts as a cry for help
The OS normally doesn't care what you're working on. It doesn't hover over your shoulder, watching your reads and writes to registers or memory. It's got better things to do. Instead, if you want the OS's attention, you've got to break something--do something the CPU will generate an error for! The standard unignorable operation is to raise a "software interrupt", which on x86 CPUs you can do with the special "INT" (generate software interrupt) instruction.
Hardware interrupts are used by the hardware to get the CPU's
attention--for example, the IDE controller will raise an I/O interrupt
when a read is complete. The OS (the BIOS, or Windows, or Linux)
handles all interrupts, both hardware and software. We'll talk a
lot more about hardware interrupts when we talk about I/O.
Interrupts
- Basically the process of stopping the CPU and getting it to pay attention to you.
- Caused by lots of totally different things:
- Hardware Interrupt: from a network card (a packet arrived), a disk drive (your read is finished), a mouse
- Timer interrupt: time's up!
- Software Interrupt: from a bad memory access, illegal instruction, generic "int 0x80" instruction
- When interrupted, a CPU looks up in its "interrupt table" (or interrupt vector) to figure out what to do.
- Interrupt table is usually at a known fixed address in physical memory (not user memory!)
- Interrupt table points to machine code that will handle the interrupt ("interrupt handler").
- The OS sets up the interrupt table, and points it to OS code.
- The first thing the interrupt handler does is save the CPU's old registers--this is needed to return from the interrupt!
- The interrupt handler then figures out what needs to be done in
response to the interrupt--for example, read the data from the disk.
- The interrupt handler finally restores the CPU registers, and
the interrupted program keeps running along (oblivious to the whole
process!).
- Interrupts can be delivered to user code, by registering a signal handler.
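As a rough illustration of the last bullet, here is a minimal C sketch of registering a signal handler with POSIX sigaction(). It assumes a POSIX system, and it uses SIGUSR1 sent by raise() as a harmless stand-in for a real interrupt-derived signal such as SIGSEGV (bad memory access) or SIGALRM (timer). The names last_signal, on_signal, and deliver_test_signal are made up for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Written by the handler. sig_atomic_t is the type the C standard
   guarantees can be safely assigned from inside a signal handler. */
static volatile sig_atomic_t last_signal = 0;

/* The "interrupt handler" as seen from user code: the OS calls this
   for us, then resumes whatever the program was doing. */
static void on_signal(int signum) {
    last_signal = signum;
}

/* Registers on_signal for SIGUSR1, sends the signal to ourselves, and
   returns the signal number the handler recorded (or -1 on failure). */
int deliver_test_signal(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    last_signal = 0;
    raise(SIGUSR1);        /* the handler runs before raise() returns */
    return last_signal;
}
```

Calling deliver_test_signal() from main should return SIGUSR1. Swapping in SIGSEGV or SIGALRM would hook the handler to signals the OS generates from actual hardware events, though recovering from those takes more care than this sketch shows.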
The "protection bit" ("supervisor bit")
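- Lives in a special CPU register. On x86 it is really a two-bit "privilege level" (0 = kernel, 3 = user) kept in the low bits of the cs segment register; eflags holds the related I/O privilege level.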
- Controls access to the hardware
- If protection bit is off (0), the CPU allows you to access the hardware directly (totally unprotected, "kernel mode").
- If protection bit is on (1), the CPU interrupts if you try to access the hardware ("protected mode", or "user mode").
- Hardware means the memory map, the interrupt system, the I/O devices, and other dangerous machine instructions.
- The protection bit is a one-way feature:
- You can always turn the protection bit on, with normal code--this locks down the machine!
- Analogy: anybody can lock your car doors, if they were unlocked.
- You *can't* just turn the protection bit off, at least not with ordinary code. Only an interrupt can turn the protection bit off.
- Analogy: you don't want anybody to be able to unlock your car doors, if they were locked.
- The machine boots up with the protection bit off. Ancient
OSs that predate the protection bit never turn it on ("I hate these
new-fangled car door locks, so I never use 'em!"), so any application
can do anything to the machine (e.g., MS-DOS, MacOS 9).
- In any even relatively modern OS (Windows 95, Linux, Mac OS X), the protection bit is always on during normal user code. To get anything done, you MUST ask the OS--and the only way to do that is with a...
System call
- Normal user process needs to access the hardware.
- The CPU protection bit prevents it from doing this directly.
- So the user process asks the OS via a system call.
- On x86 Linux machines:
- Load up values into registers. (eax==system call number. ebx==first parameter. ecx==second parameter. ...)
- Issue "int 0x80", a software interrupt instruction. (Or "sysenter" on modern machines).
- There are system calls to do anything you can't do yourself, such as:
- Read and write files
- Create new processes, threads, etc.
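The steps above can be tried from C without writing any assembly. The sketch below assumes Linux with glibc, whose syscall() function loads the system call number and parameters into registers and then traps into the kernel. On a 64-bit build it uses the "syscall" instruction with different registers and call numbers than the 32-bit "int 0x80" convention described here; the SYS_write and SYS_getpid constants hide that difference. The function names raw_write_stdout and raw_getpid are made up for this example.

```c
#define _GNU_SOURCE
#include <unistd.h>
#include <sys/syscall.h>

/* Writes len bytes of msg to file descriptor 1 (stdout) by system call
   number, bypassing the usual write() library routine.
   Returns the number of bytes written, or -1 on error. */
long raw_write_stdout(const char *msg, unsigned long len) {
    return syscall(SYS_write, 1, msg, len);
}

/* Asks the kernel for our process ID by system call number. */
long raw_getpid(void) {
    return syscall(SYS_getpid);
}
```

For example, raw_write_stdout("Wazzup?\n", 8) should print the string and return 8, and raw_getpid() should agree with the ordinary getpid() library call.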
How interrupts work as system calls ("syscalls")
So the standard way to ask the OS to do something is to issue a special interrupt, normally called a system call or "syscall":
- Stash the information about what you want done.
Usually there's some sort of "selector code" that tells the OS what you
want done (on Linux, the "syscall number", which goes into eax), and
then a set of arguments. The selector and arguments can be stored:
- Inside processor registers. It's pretty common to store a selector code in register eax.
- On the runtime stack, just like arguments to a normal
subroutine. For some reason, this is fairly rare, probably
because the OS uses its own separate stack, so it's a pain to access
your stack.
- Inside the interrupt instruction itself. The x86 "INT" instruction can initiate 256 different interrupts. Similarly, the Motorola 68000 had a 16-bit "OS Trap" instruction 0xA??? that left 12 bits for the "trap number", which could be any of 4096 different values (and the classic 68K MacOS used a substantial fraction of these for different functions!).
- Call the OS, by issuing an interrupt. You could also imagine a
machine where you ask the OS to do stuff by just segfaulting--accessing
a special off-limits memory location.
- The OS's "interrupt service routine" (just normal code, called by the CPU when an interrupt happens) then reads the selector code from your registers, does what you want done, and then returns control back to you.
- The OS might have stashed return information (like an error code)
in registers, the stack, or elsewhere. As usual,
you've got to read each OS's docs to figure out how it works.
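To make the selector-code idea concrete, here is a toy simulation in plain C of what an interrupt service routine might do with the saved registers. It is only a sketch, not how any real kernel is written: the struct saved_regs layout, the two DEMO_SYS_* calls, and the DEMO_BAD_SELECTOR error value are all invented for illustration.

```c
#include <stdint.h>

/* Hypothetical snapshot of the caller's registers, as saved by the
   handler on entry. */
struct saved_regs { uint32_t eax, ebx, ecx, edx; };

/* Invented selector codes for two toy "system calls". */
enum { DEMO_SYS_ADD = 0, DEMO_SYS_SUB = 1, DEMO_SYS_COUNT };

/* Invented error value returned for an unknown selector. */
#define DEMO_BAD_SELECTOR 0xFFFFFFFFu

typedef uint32_t (*demo_handler)(const struct saved_regs *r);

static uint32_t demo_add(const struct saved_regs *r) { return r->ebx + r->ecx; }
static uint32_t demo_sub(const struct saved_regs *r) { return r->ebx - r->ecx; }

/* Table indexed by selector code, one entry per call. */
static const demo_handler demo_table[DEMO_SYS_COUNT] = { demo_add, demo_sub };

/* Simulated service routine: read the selector from eax, run the
   matching handler with the arguments in ebx and ecx, and leave the
   result (or an error code) in eax for the caller to read. */
void demo_service_routine(struct saved_regs *r) {
    if (r->eax >= DEMO_SYS_COUNT) {
        r->eax = DEMO_BAD_SELECTOR;
        return;
    }
    r->eax = demo_table[r->eax](r);
}
```

With eax = DEMO_SYS_ADD, ebx = 2, and ecx = 3, the routine leaves 5 in eax. An unknown selector such as 99 leaves DEMO_BAD_SELECTOR there instead, which is the "error code stashed in a register" pattern in miniature.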
Linux can use either interrupt 0x80 (INT 0x80) for system calls, or the
slightly-faster SYSENTER instruction. The selector code goes in
eax, arguments go in ebx, ecx, edx, esi, and edi. Nothing goes on
the stack.
The PC BIOS uses several interrupts, such as 0x10 (video) and 0x16 (keyboard). MS-DOS mostly uses interrupt 0x21. For both, the selector code goes in ah (the high byte of ax). Ralf Brown's Interrupt List is the definitive reference for all BIOS and MS-DOS interrupt functions.
Windows XP uses the special SYSENTER x86 instruction for system calls. Windows 2000 and earlier versions of NT used INT 0x2e to enter the OS. Unfortunately, Windows system call numbers change from one release to the next, so it's quite uncommon to make Windows system calls directly (except possibly from inside a virus!).
Nowadays, you almost never make system calls directly, since the system
call interfaces require assembly code to load parameters into registers. Instead,
from C/C++ you normally call nice system library routines, like UNIX "write" or Windows "WriteFile",
that hide the ugly assembly. Or you use Visual Basic/Perl/PHP to
hide the ugly C/C++. Or you just surf the net. Whichever.
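For comparison, here is roughly what the library route looks like from C on a UNIX-like system. The sketch calls the standard write() routine and shows how the kernel's error code comes back through errno. The helper name write_string is made up for this example.

```c
#include <errno.h>
#include <string.h>
#include <unistd.h>

/* Writes the whole string s to file descriptor fd using the C library's
   write() routine. Returns 0 on success, or the errno value reported
   by the failing write() call. */
int write_string(int fd, const char *s) {
    size_t left = strlen(s);
    while (left > 0) {
        ssize_t n = write(fd, s, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;      /* interrupted by a signal: just retry */
            return errno;
        }
        s += n;                /* write() may accept only part of the data */
        left -= (size_t)n;
    }
    return 0;
}
```

write_string(1, "Wazzup?\n") should print the string and return 0. Passing an invalid descriptor such as -1 returns EBADF, with no register loading visible anywhere in the source.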
Syscall example--Linux
Konstantin Boldyshev has a good writeup and examples of Linux, BSD, and BeOS x86 syscalls, and a list of common Linux syscalls. He uses NASM for the examples. Here's a slightly cleaned up version of his Linux example:
; To make a Linux syscall, we load up the registers and call INT 0x80
mov eax,4 ;system call number (sys_write)
mov ebx,1 ;file descriptor (stdout)
mov ecx,msg ;message to write
mov edx,8 ;message length, in bytes
int 0x80 ;call kernel
; Kernel call return value is in eax-- it'll do as a function return code.
ret
section ".data" ;<- this puts the string into writeable memory...
msg:
db 'Wazzup?',0xa ; our little string, followed by a newline