
CS 331 Spring 2025
Compilers & Interpreters

This is a brief discussion of compilation and interpretation of programming languages, and related terminology.

Runtime

When a program is executed, the computations that it specifies actually occur. The time during which a program is being executed is runtime.

An implementation of a programming language will include a runtime system (often simply a runtime): code that assists in, or sometimes performs, execution of a program.

A runtime system typically includes low-level I/O operations. Another service typically provided by a runtime system is memory management: for example, allocation of heap storage and, in many languages, automatic reclamation of storage that is no longer in use (garbage collection).
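As a concrete illustration (a minimal sketch, assuming the CPython implementation of Python), CPython's runtime system manages memory using reference counting plus a cycle-detecting garbage collector, and the standard sys and gc modules expose a little of that machinery:

```python
# CPython's runtime system manages memory for us; sys and gc expose parts of it.
import gc
import sys

x = [1, 2, 3]
print(sys.getrefcount(x))  # reference count maintained by the runtime
print(gc.collect())        # explicitly run the cycle-detecting collector
```

Note that the program never frees the list itself; deciding when storage can be reclaimed is the runtime system's job.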

Executable files will generally include the runtime system—or at least the parts of it that the program uses.

But some execution methods never create an executable file or any machine code at all. In such cases, the runtime system will be a separate program that handles all code execution.

Compilers

Introduction

A compiler takes code in one programming language (the source language) and transforms it into code in another programming language (the target language); the compiler is said to target this second programming language. What a compiler does is called compilation.

Image not displayed: compiler

In the illustrations, a compiler is represented by a GREEN box.

A compiler might target native code (the machine language directly executed by a computer’s processor), or it might target some other programming language.

Some programming languages are intended solely as target languages and are not aimed at readability for humans; such a programming language is typically called a byte code. For example, Java programs are usually compiled to Java byte code, which can be executed by the Java virtual machine (JVM).

In practice, we usually only use the term “compiler” when all of the following are true: the source and target languages are substantially different, and the target language lies at a lower level of abstraction than the source language.

When the above conditions are not met, programs that are technically compilers might be called something else. For example, software that transforms code in some programming language to code in a very similar programming language is usually called a preprocessor or an assembler, but rarely a compiler.

Software that compiles code in some programming language to code in another programming language that operates at roughly the same level of abstraction might be called a compiler. But often we use the term transcompiler or transpiler. For example, a transpiler might compile Python to JavaScript, for execution as part of a webpage. Note that a transpiler is a compiler, even if it is often not referred to as a compiler.

Multi-Stage Compilation

Good compilers proceed in a number of distinct stages. Code is transformed into an intermediate representation (IR), which is then transformed into the ultimate target language.

For example, the C++ compiler used by Apple’s Xcode is called Clang. Clang itself does not target native code but rather the intermediate machine-independent code specified by the LLVM (Low-Level Virtual Machine) project. This intermediate code is then converted to native code. This multi-stage compilation process is illustrated below.

Image not displayed: Clang/LLVM compiler
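The same idea can be sketched in miniature. The toy compiler below (an illustration, not how Clang works internally; all names are invented) uses Python's ast module as its front end, produces a tiny stack-machine IR, and then has a back end translate that IR into a Python function standing in for native code:

```python
# Toy two-stage compiler: arithmetic source -> stack-based IR -> Python function.
# The IR decouples the front end (parsing) from the back end (code generation),
# mirroring the Clang -> LLVM code -> native code pipeline described above.
import ast

def front_end(src):
    """Parse source code into a postfix (stack-machine) IR."""
    def walk(node):
        if isinstance(node, ast.BinOp):
            op = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL"}[type(node.op)]
            return walk(node.left) + walk(node.right) + [op]
        if isinstance(node, ast.Constant):
            return [("PUSH", node.value)]
        raise ValueError("unsupported construct")
    return walk(ast.parse(src, mode="eval").body)

def back_end(ir):
    """Translate the IR into a Python function (our stand-in target language)."""
    ops = {"ADD": "+", "SUB": "-", "MUL": "*"}
    stack = []
    for instr in ir:
        if isinstance(instr, tuple):          # ("PUSH", value)
            stack.append(str(instr[1]))
        else:                                 # binary operation
            b, a = stack.pop(), stack.pop()
            stack.append(f"({a} {ops[instr]} {b})")
    return eval("lambda: " + stack[0])

f = back_end(front_end("2 + 3 * 4"))
print(f())  # 14
```

Because the two stages meet only at the IR, either one could be replaced independently, which is exactly the advantage of the real multi-stage designs discussed here.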

Arrangements like the one above have important advantages. For example, the machine-independent middle and back ends can be shared: many front ends (for C, C++, Rust, Swift, and other languages) target LLVM code, and the same LLVM-to-native stage then serves all of them, on every supported processor.

Optimization

To optimize code means to transform it so that it still performs the same task (or nearly the same task) but is better somehow—usually faster.

A compiler that can perform optimization is an optimizing compiler. Today, virtually all major compilers are optimizing.
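Even CPython's byte-code compiler does a little optimization. For instance, it folds constant expressions at compile time (a small sketch, assuming CPython 3.8 or later, where folding is done on the AST):

```python
# The constant expression 2 * 3 * 7 is evaluated once, at compile time;
# the resulting code object simply loads an already-computed constant.
code = compile("2 * 3 * 7", "<example>", "eval")
print(code.co_consts)  # the folded constant 42 appears here
```

The transformed code still produces the same value, but no multiplication happens at runtime, which is precisely the definition of optimization given above.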

Many integrated development environments (IDEs) have a debug build and a release build. One way these typically differ is that some optimizations are done in the release build, but not the debug build.

Interpreters

An interpreter takes code in some programming language and executes it. What an interpreter does is called interpretation.

Image not displayed: interpreter

In the illustrations, an interpreter is represented by a RED box.

Remember: a compiler translates; an interpreter executes.

There are two common misconceptions about interpretation.

The first misconception is that interpretation is inherent to a programming language—that there are interpreted programming languages and non-interpreted programming languages, forever distinct.

Certainly there are programming languages that are usually interpreted. But one can still write a compiler for one of these, targeting native code. On the other hand, the usual build process for C++ involves the creation of an executable file, which is often not executed immediately. But one could write a C++ interpreter; indeed, such interpreters exist (Cling, for example).

The second misconception is that compilation and interpretation are completely separate notions. But many interpreters actually begin with a compilation step.

For example, the Lua programming language is usually interpreted. The standard Lua interpreter begins by compiling Lua source code to a low-level programming language called Lua byte code. This byte code is then interpreted directly.

Image not displayed: Lua interpreter

A design like that above is very common. For example, to describe the operation of the Python interpreter in the standard Python distribution (CPython), replace each “Lua” above with “Python”.
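This compile-then-interpret design is easy to observe from within CPython itself: every function's byte code is available for inspection through the standard dis module (a quick sketch; the exact instruction names vary between Python versions):

```python
import dis

def add(a, b):
    return a + b

# CPython compiled add() to byte code when the def statement executed;
# dis disassembles that byte code for us.
dis.dis(add)
print([i.opname for i in dis.get_instructions(add)])
```

The disassembly shows instructions for a stack machine; it is these instructions, not the original source text, that the CPython interpreter actually executes.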

Some interpreters allow code to be executed in an interactive environment. A user can type in a statement, expression, or other chunk of code—whatever is appropriate for the programming language. This is executed, and its output, or its value, as appropriate, is printed. Then the user is prompted for more code. Such an environment is sometimes called a REPL (Read-Eval-Print Loop), a term that originated with the Lisp family of programming languages.
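A bare-bones REPL is easy to build on top of an interpreter's own machinery. The sketch below drives Python's standard code.InteractiveConsole programmatically rather than from a keyboard (an illustration; a real REPL would loop on user input):

```python
import code

# InteractiveConsole handles the "read" step, including multi-line input:
# push() returns True while a statement is still incomplete, and False once
# the statement has been compiled and executed.
console = code.InteractiveConsole()
for line in ["x = 6 * 7", "print(x)"]:
    console.push(line)  # prints 42 when the second statement runs
```

Each complete chunk is executed as soon as it is read, and its output is printed before the next chunk is requested—exactly the read-eval-print loop described above.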

JIT Compilers

Some code transformations require information that is only available at runtime. One example is profile-based optimization, which transforms code based on which portions of the code spend the most time executing. A newer kind of compiler is able to perform such transformations while the code runs.

It may seem impossible to transform code while it is executing, but such dynamic compilation is actually becoming common. The resulting concurrent compilation and execution is termed Just-In-Time (JIT) compilation, and a program that does it is a JIT compiler, or sometimes simply a JIT.

A typical strategy is to do static compilation of source code into a byte code. Then execution begins, with the byte code being replaced, section by section, with native code that performs the same task more quickly. This is in fact the strategy used by LuaJIT, a Lua interpreter that does JIT compilation.

Image not displayed: LuaJIT

Note that, because it involves code execution, a JIT compiler will always be part of an interpreter.
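To make the strategy concrete, here is a toy sketch in Python (invented names throughout; a real JIT such as LuaJIT emits native machine code, which plain Python cannot do). It interprets a small expression tree, counts executions, and once the code becomes "hot" swaps in a function built with Python's own compiler as a stand-in for native code:

```python
# Toy "JIT" sketch: interpret an expression tree, and once it is hot,
# replace interpretation with a compiled function. Illustrative only.
HOT_THRESHOLD = 3  # hypothetical tuning parameter

class Expr:
    """Expression trees: ('num', v) or ('add'/'mul', left, right)."""
    def __init__(self, tree):
        self.tree = tree
        self.calls = 0
        self.compiled = None

    def interpret(self, node):
        kind = node[0]
        if kind == "num":
            return node[1]
        a, b = self.interpret(node[1]), self.interpret(node[2])
        return a + b if kind == "add" else a * b

    def to_source(self, node):
        if node[0] == "num":
            return str(node[1])
        op = "+" if node[0] == "add" else "*"
        return f"({self.to_source(node[1])} {op} {self.to_source(node[2])})"

    def __call__(self):
        if self.compiled:                 # fast path: previously compiled code
            return self.compiled()
        self.calls += 1
        if self.calls >= HOT_THRESHOLD:   # hot: compile for subsequent calls
            self.compiled = eval("lambda: " + self.to_source(self.tree))
        return self.interpret(self.tree)

e = Expr(("add", ("num", 2), ("mul", ("num", 3), ("num", 4))))
print([e() for _ in range(5)])  # [14, 14, 14, 14, 14]
```

The answer never changes; only the mechanism producing it does, section by section, which is the essence of the LuaJIT-style design described above.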