Normally, to interact with the outside world (files, network, etc) from assembly language, you just call an existing function, usually the exact same function you'd call from C or C++. But sometimes, such as when you're implementing a C library, when there is no C library call to access the functionality you need, or when your code needs to operate behind enemy lines, you want to talk to the OS kernel directly.
In fact, to get anything dangerous done, like talking directly to
the screen or keyboard, you *must* talk to the OS kernel, because
the hardware itself has been instructed to not let you access
those things. On our x86 machines, the CPU's privilege level
ranges from 0 (kernel mode, anything is allowed) to 3 (user mode,
no direct hardware access is allowed), and the IOPL (I/O
Privilege Level) field in the flags register sets which levels
may execute I/O instructions. The hardware lets you give up
privileges easily, but user code has only one deliberate way to
gain them: make a system call. This makes system calls the
primary gateway to kernel functionality.
How you do a syscall depends on your OS and architecture.
For Linux:
Architecture | Syscall instruction | Syscall number in | Return value in | arg0 | arg1 | arg2 | arg3 | arg4 | arg5 |
---|---|---|---|---|---|---|---|---|---|
x86_64 | syscall | rax | rax | rdi | rsi | rdx | r10 | r8 | r9 |
x86 | int 0x80 | eax | eax | ebx | ecx | edx | esi | edi | ebp |
arm | svc 0 | r7 | r0 | r0 | r1 | r2 | r3 | r4 | r5 |
arm64 | svc 0 | x8 | x0 | x0 | x1 | x2 | x3 | x4 | x5 |
On a 64-bit x86 Linux machine, there's a special instruction "syscall" to make system calls: a request to the kernel to do something.
You identify which system call you'd like to make by loading a
syscall number into register rax. The full
list of syscall numbers is in /usr/include/asm/unistd_64.h.
Here we're calling the magic operating system call "fork();",
which creates a new process.
mov rax,57 ; the system call number of "fork".
syscall ; Issue the system call
ret ; return rax directly
Syscall parameters are passed in registers rdi, rsi, rdx, r10, r8, r9, which you'll notice is *somewhat* like a function call but with slightly different registers! The return value, normally an integer error code, is returned in rax.
; "syscall" instruction call numbers are listed in "asm/unistd_64.h"
mov rax,1 ; the (syscall) system call number of "write".
mov rdi,1 ; first parameter: 1, the stdout file descriptor
mov rsi,myStr ; data to write
mov rdx,3 ; bytes to write
syscall ; Issue the system call
; leave syscall's return value in rax
ret
section .data
myStr:
db "Yo",0xa
section .text
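For comparison, the same write can be issued from C. This is only a minimal sketch, assuming Linux and a C library that provides the generic syscall() wrapper, which loads the syscall number and arguments into the right registers for you. The name write_yo is a hypothetical helper for illustration, not part of any library.

```c
#define _GNU_SOURCE          /* make sure unistd.h declares syscall() */
#include <sys/syscall.h>     /* SYS_write and the other syscall numbers */
#include <unistd.h>

/* Hypothetical helper: write the 3 bytes "Yo\n" to file descriptor fd
   using the generic syscall() wrapper instead of write().
   Returns the number of bytes written, or -1 on error. */
long write_yo(int fd)
{
    static const char msg[] = "Yo\n";
    /* Same three parameters as the assembly version:
       file descriptor, data pointer, byte count. */
    return syscall(SYS_write, fd, msg, sizeof msg - 1);
}
```

Calling write_yo(1) sends the string to stdout, just like the assembly above.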
The older 32-bit x86 syscall interface uses a software interrupt, "int 0x80". As with the 64-bit interface, one register describes what to do (open a file, write data, etc.), but here it is eax. The registers ebx, ecx, edx, esi, edi, and ebp hold the parameters describing how to do it. This register-based parameter passing is similar to how we call functions in 64-bit x86, but it uses different registers, and only their 32-bit halves. The Linux kernel allows the use of this convention in both 32-bit and 64-bit mode.
"int 0x80" is built on a special x86 instruction called "int", short for interrupt. More generally, an "interrupt" is a hardware feature where the CPU saves what it was doing and does something else for a while. For example, every time a packet arrives from the network, the network card will interrupt the CPU, so some low-level operating system code can look at the packet and decide if it should keep running the current program, or switch to some other program (such as the web browser, or a network server). Handling interrupts is a central responsibility of the operating system.
The operating system lets you do a wide variety of almost magical things. For example, in the 32-bit interface, Linux syscall number 2 is "fork", which creates a complete duplicate of your process.
; System call numbers are listed in "asm/unistd.h"
mov rax,2 ; the system call number of "fork"
int 0x80 ; Issue "fork" system call (Linux 32-bit interface)
ret
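To watch the duplication happen from C, here is a minimal sketch, assuming Linux and a C library that provides syscall(). The name fork_demo is a hypothetical helper. On architectures whose headers do not define SYS_fork, the sketch falls back to the library's fork() function. After the call, both processes continue from the same point: the child sees a return value of 0, and the parent sees the child's process ID.

```c
#define _GNU_SOURCE          /* make sure unistd.h declares syscall() */
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits immediately with
   status 42, and return that status as observed by the parent.
   Returns -1 if anything fails. */
int fork_demo(void)
{
#ifdef SYS_fork
    long pid = syscall(SYS_fork);   /* direct system call */
#else
    long pid = fork();              /* no SYS_fork here: use libc */
#endif
    if (pid < 0)
        return -1;                  /* fork failed */
    if (pid == 0)
        _exit(42);                  /* child: fork returned 0 */

    /* Parent: fork returned the child's process ID. */
    int status = 0;
    if (waitpid((pid_t)pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```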
The return value comes back in rax. If it's negative, that indicates an error; the error codes are listed in errno.h.
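The negative-return convention is easiest to see by bypassing the C library's error handling. The following is a minimal sketch, assuming GCC or Clang inline assembly on x86-64 Linux. The name raw_syscall3 is a hypothetical helper. On other platforms it falls back to libc's syscall(), which reports errors as -1 plus errno, and rebuilds the kernel-style negative value from errno.

```c
#define _GNU_SOURCE          /* make sure unistd.h declares syscall() */
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: issue a system call with up to three
   arguments and return the kernel's result unchanged, so an
   error comes back as a negative errno value. */
long raw_syscall3(long number, long a0, long a1, long a2)
{
#if defined(__x86_64__) && defined(__linux__)
    long ret;
    /* rax = syscall number, rdi/rsi/rdx = first three arguments.
       The syscall instruction itself overwrites rcx and r11. */
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"(number), "D"(a0), "S"(a1), "d"(a2)
                      : "rcx", "r11", "memory");
    return ret;
#else
    /* Fallback (assumption: libc provides syscall()). */
    long ret = syscall(number, a0, a1, a2);
    return (ret == -1) ? -(long)errno : ret;
#endif
}
```

For example, raw_syscall3(SYS_write, -1, (long)"x", 1) tries to write to an invalid file descriptor, so the result is -EBADF rather than a byte count.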
Konstantin Boldyshev has a list of common Linux syscalls. The full list of Linux syscalls is in /usr/include/asm/unistd_32.h. Here's a netrun-friendly version of his Linux example:
push rbx ; <- we'll be using ebx below
; System calls are listed in "asm/unistd.h"
mov rax,4 ; the system call number of "write".
mov rbx,1 ; first parameter: 1, the stdout file descriptor
mov rcx,myStr ; data to write
mov rdx,3 ; bytes to write
int 0x80 ; Issue the system call
pop rbx ; <- restore ebx to its old value
ret
section .data
myStr:
db "Yo",0xa
On 32-bit ARM Linux, the syscall number goes in r7, and "svc 0" issues the call. Here's "fork" again:
push {r7,lr}
mov r7,2 @ syscall number of 'fork'
svc 0
pop {r7,pc}
The full list of syscalls is in /usr/src/linux-headers-4.9.35+/arch/arm/include/uapi/asm/unistd.h, but the Chromium OS pages have a better syscall description.
push {r7,lr}
/* syscall write(int fd, const void *buf, size_t count) */
mov r0, 1 /* fd -> stdout */
ldr r1, =msg /* buf -> msg */
ldr r2, =len /* count -> len(msg) */
mov r7, 4 /* write is syscall #4 */
svc 0 /* invoke syscall */
pop {r7,pc}
.data
msg:
.ascii "Hello, ARM!\n"
len = . - msg
Before the 2006 "embedded application binary interface (EABI)", the syscall number wasn't in r7; it was encoded inside the svc/swi instruction, which was a pain for the kernel because it needed to go back and read the instruction to extract those bits.
Other operating systems, such as BSD UNIX on 32-bit x86, pass syscall parameters on the stack, like the 32-bit x86 function call interface.
In ancient 16-bit DOS mode, you access DOS functionality via INT 0x21, with the equivalent of a system call number in register AH.
In a PC BIOS boot block, you can access BIOS functionality via several interrupts, including INT 0x10 for screen access, or INT 0x13 for disk access.
On modern 32-bit Windows, you use the sysenter instruction, with the system call number in eax. Older pre-XP Windows used interrupt 0x2E.