Syscalls

Interrupts as a cry for help

The OS normally doesn't care what you're working on. It doesn't hover over your shoulder, watching your reads and writes to registers or memory. It's got better things to do. Instead, if you want the OS's attention, you've got to break something--do something the CPU will generate an error for! The standard unignorable operation is to throw a "software interrupt", which on x86 CPUs you can do with the special "INT" (generate software interrupt) instruction.

Hardware interrupts are used by the hardware to get the CPU's attention--for example, the IDE controller will raise an I/O interrupt when a read is complete. The OS (the BIOS, or Windows, or Linux) handles all interrupts, both hardware and software. We'll talk a lot more about hardware interrupts when we talk about I/O.

Interrupts

Basically the process of stopping the CPU and getting it to pay attention to you.
Caused by lots of totally different things:

Hardware Interrupt: from a network card (a packet arrived), a disk drive (your read is finished), a mouse
Timer interrupt: time's up!
Software Interrupt: from a bad memory access, illegal instruction, generic "int 0x80" instruction

When interrupted, a CPU looks up in its "interrupt table" (or interrupt vector) to figure out what to do.

Interrupt table is usually at a known fixed address in physical memory (not user memory!)
Interrupt table points to machine code that will handle the interrupt ("interrupt handler").
The OS sets up the interrupt table, and points it to OS code.

The first thing the interrupt handler does is save the CPU's old registers--this is needed to return from the interrupt!
The interrupt handler then figures out what needs to be done in response to the interrupt--for example, read the data from the disk.
The interrupt handler finally restores the CPU registers, and the interrupted program keeps running along (oblivious to the whole process!).
The OS can also pass some interrupts along to user code, if the program registers a signal handler for them.

The "protection bit" ("supervisor bit")

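Lives in a special CPU register--on x86, the low two bits of the "cs" segment register give the current privilege level (0 is kernel, 3 is user), and "eflags" holds the related I/O privilege level.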
Controls access to the hardware

If protection bit is off (0), the CPU allows you to access the hardware directly (totally unprotected, "kernel mode").
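If protection bit is on (1), the CPU interrupts if you try to access the hardware ("user mode"). Don't confuse this with x86 "protected mode", which only means the protection hardware is enabled; the kernel runs in protected mode too.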

Hardware means the memory map, the interrupt system, the I/O devices, and other dangerous machine instructions.
The protection bit is a one-way feature:

You can always turn the protection bit on, with normal code--this locks down the machine!

Analogy: anybody can lock your car doors, if they were unlocked.

You *can't* just turn the protection bit off, at least not with ordinary code. Only an interrupt can turn the protection bit off.

Analogy: you don't want anybody to be able to unlock your car doors, if they were locked.

The machine boots up with the protection bit off. Ancient OSs that predate the protection bit never turn it on ("I hate these new-fangled car door locks, so I never use 'em!"), so any application can do anything to the machine (e.g., MS-DOS, MacOS 9).
In any even relatively modern OS (Windows 95, Linux, Mac OS X), the protection bit is always on during normal user code. To get anything done, you MUST ask the OS--and the only way to do that is with a...

System call

Normal user process needs to access the hardware.
The CPU protection bit prevents it from doing this directly.
So the user process asks the OS via a system call.
On x86 Linux machines:

Load up values into registers. (eax==system call number. ebx==first parameter. ecx==second parameter. ...)
Issue "int 0x80", a software interrupt instruction. (Or "sysenter" on modern machines).

There are system calls to do anything you can't do yourself--

Read and write files
Create new processes, threads, etc.

How interrupts work as system calls ("syscalls")

So the standard way to ask the OS to do something is to issue a special interrupt, normally called a system call or "syscall":

Stash the information about what you want done. Usually there's some sort of "selector code" that tells the OS what you want done (on Linux, the "syscall number", which goes into eax), and then a set of arguments. The selector and arguments can be stored:

Inside processor registers. It's pretty common to store a selector code in register eax.
On the runtime stack, just like arguments to a normal subroutine. For some reason, this is fairly rare, probably because the OS uses its own separate stack, so it's a pain to access your stack.
Inside the interrupt instruction itself. The x86 "INT" instruction can initiate 256 different interrupts. Similarly, the Motorola 68000 had a 16-bit "OS Trap" instruction 0xA??? that left 12 bits for the "trap number", which could be any of 4096 different values (and the classic 68K MacOS used a substantial fraction of these for different functions!).

Call the OS, by issuing an interrupt. You could also imagine a machine where you ask the OS to do stuff by just segfaulting--accessing a special off-limits memory location.
The OS's "interrupt service routine" (just normal code, called by the CPU when an interrupt happens) then reads the selector code from your registers, does what you want done, and then returns control back to you.
The OS might have stashed return information (like an error code) in registers, the stack, or elsewhere. As usual, you've got to read each OS's docs to figure out how it works.

Linux can use either interrupt 0x80 (INT 0x80) for system calls, or the slightly-faster SYSENTER instruction. The selector code goes in eax, arguments go in ebx, ecx, edx, esi, and edi. Nothing goes on the stack.

The PC BIOS uses interrupts such as 0x10 (video), 0x13 (disk), and 0x16 (keyboard). MS-DOS mostly uses interrupt 0x21. Ralf Brown's Interrupt List is the definitive reference for all BIOS and MS-DOS interrupt functions. The selector code usually goes in ah, the high byte of ax.

Windows XP and later use the special SYSENTER x86 instruction for system calls. Windows 2000 and earlier versions of NT used INT 0x2e to access the OS. Unfortunately, Windows system call numbers are undocumented and change from one release to the next, so it's quite uncommon to make Windows system calls directly (except possibly from inside a virus!).

Nowadays, you almost never make system calls directly, since the system call interfaces require assembly code to load parameters into registers. Instead, from C/C++ you normally call nice system library routines, like UNIX "write" or Windows "WriteFile", that hide the ugly assembly. Or you use Visual Basic/Perl/PHP to hide the ugly C/C++. Or you just surf the net. Whichever.

Syscall example--Linux

Konstantin Boldyshev has a good writeup and examples of Linux, BSD, and BeOS x86 syscalls, and a list of common Linux syscalls. He uses NASM for the examples. Here's a slightly cleaned up version of his Linux example:

        ; To make a Linux syscall, we load up the registers and call INT 0x80
        mov     eax,4   ; system call number (sys_write)
        mov     ebx,1   ; file descriptor (stdout)
        mov     ecx,msg ; message to write
        mov     edx,8   ; message length, in bytes
        int     0x80    ; call kernel
        ; Kernel call return value is in eax-- it'll do as a function return code.
        ret

section ".data"  ;<- this puts the string into writeable memory...
msg:
        db      'Wazzup?',0xa     ; our little string, followed by a newline

(Executable NetRun Link)