CS 331 Spring 2018
A Primer on Type Systems
By Glenn G. Chappell
Text updated: 29 Jan 2018
Table of Contents
- Introduction to Typing
- Type Systems: Static vs. Dynamic
- Specifying Types: Manifest vs. Implicit
- Checking Types: Nominal vs. Structural
- Type Safety: Strong [ick!], Weak [ick!], Sound
- References
- Copyright & License
1. Introduction to Typing
This article introduces types, type systems, and associated terminology—all from a practical point of view. Sections 2–4 describe how type systems can be placed on three axes: static vs. dynamic, manifest vs. implicit, and nominal vs. structural. Section 5 closes with a discussion of type safety.
Basic Concepts
A type system is a way of classifying entities in a program (expressions, variables, etc.) by the kinds of values they represent, in order to prevent undesirable program states. The classification assigned to an entity is its type. Most programming languages include some kind of type system.
Consider the following C++ code.
[C++]
int abc;
In C++, int is a type. Variable abc has type int.
More C++ code:
[C++]
cout << 123 + 456;
Literals 123 and 456 also represent values of type int in C++, although this does not need to be explicitly stated. In C++, applying a binary + operator to two values of type int results in another value of type int. Therefore, the expression 123 + 456 also has type int.
In the past, programming languages often had a fixed set of types. In contrast, modern programming languages typically have an extensible type system: one that allows programmers to define new types. For example, in C++, a class is a type.
[C++]
class Zebra { // New type named "Zebra" …
Type checking means checking and enforcing the restrictions associated with a type system. The various actions involved with a type system (determining types, type checking) are collectively known as typing.
How Types are Used
In programming languages, types are used primarily in three ways.
Usage #1: Determining Legal Values
First, types are used to determine which values an entity may take on.
Being of type int, the above variable abc may be set to 42, which also has type int.
[C++]
abc = 42;
Being of type int also makes some values illegal for abc.
[C++]
vector<int> v;
abc = v;  // ILLEGAL!
We cannot set a variable of type int to a value of type vector<int>. The above code contains a type error. When a C++ compiler flags a type error, it prevents the program from compiling successfully.
Usage #2: Determining Legal Operations
Second, types are used to determine which operations are legal.
Since variable abc is of type int, the binary + operator may be used with it.
[C++]
cout << abc + abc;
Some operations are forbidden for type int. For example, there is no unary * operator for this type. A C++ compiler would flag a type error in the following code.
[C++]
cout << *abc; // ILLEGAL!
A type whose values can be manipulated with the ease and facility of a type like int in C++ is said to be first-class. More formally, a first-class type is one for which
- new values may be created from existing values at runtime,
- values may be passed as arguments to functions and returned from functions, and
- values may be stored in containers.
In some programming languages, such as Haskell, Scheme, Python, and Lua, function types are first-class. Such programming languages are said to have first-class functions.
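As a rough illustration of the three criteria above, here is a small Python sketch in which functions are created at runtime, passed to and returned from other functions, and stored in a list. The names make_adder and apply_twice are hypothetical, chosen only for this example.

```python
def make_adder(k):
    """Create a new function value at runtime from the existing value k."""
    def adder(x):
        return x + k
    return adder              # a function returned from a function

def apply_twice(f, x):
    """Take a function as an argument and apply it twice."""
    return f(f(x))

add3 = make_adder(3)
ops = [add3, make_adder(10), abs]   # function values stored in a container

results = [apply_twice(f, 1) for f in ops]
print(results)   # [7, 21, 1]
```

Nothing here is special to functions: the same three things could be done with values of type int, which is the sense in which function types are first-class in Python.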
Usage #3: Determining Which Operation
Third, types are used to determine which of multiple possible operations to perform.
Consider the following C++ function template addEm.
[C++]
template <typename T, typename U>
T addEm(T a, U b)
{
    return a + b;
}
We can call addEm with two arguments of type int or with two arguments of type string. The operation performed by the + operator is determined by the argument type: addition for int arguments, concatenation for string arguments.
Python similarly uses the binary + operator for both numeric addition and string concatenation. The following Python function, like its C++ counterpart, may be called with two integers or with two strings.
[Python]
def addEm(a, b):
    return a + b
An important application of this third use of types is to provide a single interface for entities of different types. This is called polymorphism. Both versions of addEm are polymorphic functions.
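A quick way to see this polymorphism is to call the Python version of addEm with arguments of several types. The list call below is an extra case, not discussed in the text, included only to show that the same interface extends to any type that defines +.

```python
def addEm(a, b):
    return a + b

int_sum = addEm(2, 3)              # numeric addition
str_cat = addEm("ab", "cd")        # string concatenation
list_cat = addEm([1], [2, 3])      # list concatenation

print(int_sum, str_cat, list_cat)  # 5 abcd [1, 2, 3]
```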
2. Type Systems: Static vs. Dynamic
Definitions
A type system can be characterized as static or dynamic.
In a static type system, types are determined and checked before program execution. This is typically done by a compiler. Type errors flagged during static type checking generally prevent a program from being executed.
Programming languages with static type systems include C, C++, Java, Haskell, Objective-C, Go, Rust, and Swift.
In a dynamic type system, types are determined and checked during program execution. Types are tracked by attaching to each value a tag indicating its type. Type errors in a particular portion of code are flagged only when that code actually executes.
Programming languages with dynamic type systems include Python, Lua, JavaScript, Ruby, Scheme, and PHP.
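The tag attached to each value can be inspected directly in Python using the built-in type function. The sketch below also defines a function containing a type error that is never flagged, because the function is never called. The name latent_error is hypothetical.

```python
values = [42, 4.2, "bye", [1, 2], None]

# Each value carries its own type tag, which we can look up at runtime.
tags = [type(v).__name__ for v in values]
print(tags)   # ['int', 'float', 'str', 'list', 'NoneType']

def latent_error():
    # Dividing by a string is a type error, but it is flagged only
    # if and when this line actually executes.
    return 1 / "bye"

# Defining latent_error causes no error; we simply never call it here.
print("done")
```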
Static typing and dynamic typing are actually two very different things. They are handled at different times, are implemented very differently, give different information, and allow for the solution of different problems.
Static and dynamic typing are so different that some people prefer not to use the same word for both. They typically reserve the term “type” for use with a static type system, referring to the categories in a dynamic “type” system as tags. In this article, we will use “type” for both the static and dynamic flavors, but it should be noted that some consider “dynamic typing” to be a misnomer.
Regardless of the characteristics of any particular type system, remember that type systems generally share a common purpose.
The purpose of a type system is to prevent undesirable program states.
For example, we use types to prevent the execution of operations that are incorrect for a type, because it is undesirable that such operations execute.
Consequences of Static & Dynamic Typing
Now we consider some of the ways that static vs. dynamic type systems affect programming. Our examples will use C++, which is statically typed, and Python & Lua, which are dynamically typed.
Compilation vs. Runtime Errors
In C++, Python, and Lua, it is illegal to divide a number by a non-numeric string such as "bye". A C++ compiler will flag a type error in the following code.
[C++]
cout << "Hello" << endl;
cout << 1 / "bye" << endl;  // ILLEGAL!
The static type checking in C++ means that the above code will not compile. An executable will not be created—much less executed.
Similarly, the following Python code will result in a type error being flagged.
[Python]
print("Hello")
print(1 / "bye")  # ILLEGAL!
As will the following Lua code.
[Lua]
io.write("Hello\n")
io.write(1 / "bye" .. "\n")  -- ILLEGAL!
However, because of the dynamic type checking in Python and Lua, the programs above can still compile successfully and begin execution. The type error will not be flagged until execution reaches the second statement. So both of the above programs will print “Hello”, and then the type error will be flagged.
In some dynamically typed programming languages, type errors raise exceptions that can be caught and handled. Here is a Python function ff that takes one parameter and attempts to print 1 divided by that parameter.
[Python]
def ff(x):
    try:
        print(1 / x)  # MAYBE illegal, depending on type of x
    except TypeError:
        print("TYPE ERROR")
Above, if a type error is flagged in print(1 / x), then we catch the resulting exception and print a message. If we do ff(2), then “0.5” will be printed. If we do ff("bye"), then “TYPE ERROR” will be printed. In both cases, execution will continue.
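To make this behavior easy to check, here is a variant of ff that returns the text instead of printing it. The variant is our own sketch. It is otherwise the same idea as the function in the text.

```python
def ff(x):
    try:
        return str(1 / x)     # MAYBE illegal, depending on type of x
    except TypeError:
        return "TYPE ERROR"

outputs = [ff(2), ff("bye"), ff(4)]
print(outputs)   # ['0.5', 'TYPE ERROR', '0.25']
```

Note that only type errors are handled here. A call such as ff(0) fails for a different reason (division by zero), so it raises ZeroDivisionError, which this function does not catch.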
Typing of Both Variables and Values vs. Only Values
In a static type system, types are generally applied to both variables and values.
[C++]
int x = 42;
Above, “x” is an identifier naming a variable of type int, and 42 is a value of type int.
In a dynamic type system, types are represented by tags attached to values. So generally only values have types in a dynamic type system.
[Python OR Lua]
x = 42
x = "bye"
The above presents no problem in a dynamically typed programming language like Python or Lua. Variable x does not have a type. The value 42 does have a type. And the value "bye" has a different type. But in both cases, x is merely a reference to the value.
A bit more subtly, in dynamically typed programming languages container items typically do not have types; only their values do. So there is generally no problem with a container holding values of different types. Here are Python and Lua lists containing a number, a string, and a boolean.
[Python]
mylist = [ 123, "hello", True ]
[Lua]
mylist = { 123, "hello", true }
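In Python, we can confirm that types belong to values rather than to variables or container items by asking each value for its type at runtime. The following is a minimal sketch. The names first, second, and element_types are ours.

```python
x = 42
first = type(x).__name__     # the value 42 has type int
x = "bye"                    # same variable, now referring to a str value
second = type(x).__name__

mylist = [123, "hello", True]
element_types = [type(item).__name__ for item in mylist]

print(first, second, element_types)   # int str ['int', 'str', 'bool']
```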
3. Specifying Types: Manifest vs. Implicit
Manifest & Implicit Typing
When we specify the type of an entity by explicitly stating it, we are doing manifest typing.
The typing of variables and functions in C, C++, and Java is mostly manifest.
[C++]
double sq47(double n)
{
    double result = 4.7 * n * n;
    return result;
}
Above, the types of n (double), sq47 (function taking double and returning double), and result (double) are explicitly stated.
Such an explicit specification of a type is a type annotation.
When types are not specified explicitly, we have implicit typing.
The typing in Python and Lua is mostly implicit. Here is a Python function that is more or less equivalent to the above C++ function.
[Python]
def sq47(n):
    result = 4.7 * n * n
    return result
In the above code, there are no type annotations at all.
In dynamically typed programming languages, typing is usually mostly implicit. It is therefore tempting to conflate manifest typing with static typing; however, the two are not the same.
For example, here is function sq47 in Haskell, which has a static type system.
[Haskell]
sq47 n = result
  where
    result = 4.7 * n * n
Again, there are no type annotations.
However, identifiers sq47, n, and result in the above code still have types. This is because a Haskell compiler performs type inference, determining types from the way entities are used in the code. Haskell types are said to be inferred.
However, while type annotations are mostly not required in Haskell, they are still allowed, as below.
[Haskell]
sq47 :: Double -> Double
-- Above is Haskell's optional manifest typing:
-- sq47 is a function taking Double, returning Double
sq47 n = result
  where
    result = 4.7 * n * n
The following table shows how various programming languages can be classified.
| Type System | Type Specification: Mostly Manifest | Type Specification: Mostly Implicit |
|---|---|---|
| Static | C, C++, Java | Haskell, OCaml |
| Dynamic | Not much goes here | Python, Lua, Ruby, JavaScript, Scheme |
Since 2011, C++ standards have allowed for the increasing use of type inference in C++. For example, the following is legal under the 2014 C++ Standard.
[C++14]
auto sq47(double n)
{
    auto result = 4.7 * n * n;
    return result;
}
While the type of parameter n is explicitly specified above, the other two type annotations are no longer required.
On the other hand, some implicitly typed programming languages are moving in the direction of optional type annotations. For example, Python has allowed annotations on function parameters since version 3.0, and beginning in 2015 (Python 3.5) it has standardized their use as optional type hints.
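Here is a sketch of what optional type annotations look like on the Python version of sq47. The interpreter records the annotations but does not enforce them at runtime, so this remains dynamic typing. Checking the annotations is left to separate tools.

```python
def sq47(n: float) -> float:
    result = 4.7 * n * n
    return result

# The annotations are stored on the function object ...
print(sq47.__annotations__)

# ... but they are not checked when the function is called:
# an int argument is accepted without complaint.
print(sq47(2))
```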
Explicit & Implicit Type Conversions
A type conversion takes a value of one type and returns an equivalent, or at least similar, value of another type. When we specify in our code that a type conversion is to be done, we are doing an explicit type conversion; other conversions are implicit.
For example, C++ does implicit type conversion from int to double.
[C++]
int n = 32;
auto z = sq47(n);
Function sq47 takes a parameter of type double, while n has type int. This mismatch is dealt with via an implicit type conversion: a double version of the value of n is computed, and this is passed to function sq47.
We could also do the conversion explicitly, as below.
[C++]
int n = 32;
auto z = sq47(static_cast<double>(n));
Haskell has few implicit type conversions.
With function sq47 defined as above, the following code will not compile, as we are attempting to pass an Integer to a function that takes a Double.
[Haskell]
n = 32 :: Integer
z = sq47 n  -- ILLEGAL!
An explicit type conversion fixes the mismatch.
[Haskell]
n = 32 :: Integer
z = sq47 (fromIntegral n)
Interestingly, when we write the above explicit type conversion, we need not state the type we are converting to.
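The same distinction between implicit and explicit conversions shows up in dynamically typed languages. As a small sketch, Python implicitly converts an int to a float in mixed arithmetic, but it does not implicitly convert between strings and numbers. Those conversions must be requested explicitly.

```python
a = 1 + 2.5          # implicit: the int 1 is converted to a float
b = int("3") + 4     # explicit: str -> int, requested in the code
c = str(3) + "4"     # explicit: int -> str, requested in the code

try:
    "3" + 4          # no implicit conversion between str and int
    mixed = "no error"
except TypeError:
    mixed = "TypeError"

print(a, b, c, mixed)   # 3.5 7 34 TypeError
```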
4. Type Checking: Nominal vs. Structural
Nominal Typing
Various standards can be applied when type checking is done.
Consider the following C++ types A and B.
[C++]
struct A { int h; int m; };
struct B { int h; int m; };
Should types A and B be considered the same, for type-checking purposes? For example, should we allow the following function gg to be called with an argument of type B?
[C++]
void gg(A x) { ...
It might seem that there is no possible reason to distinguish between A and B. But here is one: they are different types. The programmer made them distinct, and perhaps we should honor that decision. For example, perhaps the two data members in type A represent hours and minutes in a time of day, while the two data members in type B represent hits and misses in a shooting competition. It might be a good idea to avoid mixing up the two.
Type checking by this standard is nominal typing (“nominal” because we check whether a type has the right name). C++ checks ordinary function parameters using nominal typing. And indeed, if we try to call the above function gg with an argument of type B, a C++ compiler will flag a type error.
There are looser ways to apply nominal typing. For example, in the context of C++ inheritance, when a function takes a parameter of base-class pointer type, the type system allows a derived-class pointer to be passed as an argument.
[C++]
class Derived : public Base { ... };
void hh(Base * bp);
Derived * derp;
hh(derp);  // Legal, even though different type
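Python does not check function parameters nominally on its own, but a nominal-style check can be imitated with an explicit isinstance test. The sketch below is only an emulation under that assumption. The class DerivedA and the body of gg are hypothetical, invented for this example.

```python
class A:                          # hours and minutes in a time of day
    def __init__(self, h, m):
        self.h, self.m = h, m

class B:                          # hits and misses; same structure as A
    def __init__(self, h, m):
        self.h, self.m = h, m

class DerivedA(A):                # hypothetical subclass of A
    pass

def gg(x):
    # Nominal-style check: accept only values whose type is A or is
    # derived from A, no matter what structure the value has.
    if not isinstance(x, A):
        raise TypeError("gg expects an A, got " + type(x).__name__)
    return x.h * 60 + x.m         # minutes since midnight

print(gg(A(1, 30)))          # 90
print(gg(DerivedA(2, 0)))    # 120
```

Calling gg(B(1, 30)) raises TypeError, even though B has exactly the same members as A. The subclass is accepted, which mirrors the looser rule for derived-class pointers.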
Structural Typing
Another possible standard for type checking is structural typing, which considers two types to be interchangeable if they have the same structure and support the same operations. Under structural typing, the types A and B defined above would be considered the same.
Structural typing may also be applied in a relatively loose manner. Perhaps the loosest variation on structural typing allows an argument to be passed to a function as long as every operation that the function actually uses is defined for the argument. This is duck typing. (The name comes from the Duck Test: “If it looks like a duck, swims like a duck, and quacks like a duck, then it’s a duck.”)
C++ checks template-parameter types using duck typing.
The following function template ggt can be called with arguments of type A or B.
[C++]
template <typename T>
void ggt(T x)
{
    cout << x.h << " " << x.m << endl;
}
The addEm functions discussed earlier are examples of duck typing. Here they are again.
[C++]
template <typename T, typename U>
T addEm(T a, U b)
{
    return a + b;
}
[Python]
def addEm(a, b):
    return a + b
We have noted that C++ template-parameter types are checked using duck typing. Python, Lua, and some other dynamic languages check all function parameter types using duck typing. Both versions of addEm above can be called with arguments of any type, as long as all the operations used are defined for those types. In particular, both versions of addEm may be called with two integer arguments or with two string arguments.
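A Python counterpart of the ggt template makes the duck-typing rule concrete. The classes A and B below are our own Python stand-ins for the C++ structs, and ggt returns its text rather than printing it. Any object that provides h and m is accepted, and an object that lacks them is rejected only when the missing operation is attempted.

```python
class A:                          # time of day: hours, minutes
    def __init__(self, h, m):
        self.h, self.m = h, m

class B:                          # competition: hits, misses
    def __init__(self, h, m):
        self.h, self.m = h, m

def ggt(x):
    # Only x.h and x.m are used, so any object providing them will do.
    return str(x.h) + " " + str(x.m)

print(ggt(A(9, 30)))   # 9 30
print(ggt(B(7, 3)))    # 7 3

try:
    ggt(42)            # an int has no attribute named h
    outcome = "no error"
except AttributeError:
    outcome = "AttributeError"
print(outcome)         # AttributeError
```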
No Type Checking
An alternative to the nominal and structural versions of type checking is no type checking at all.
Consider the programming language Forth, as defined in the 1994 ANSI standard. Forth distinguishes between integer and floating-point values, so it arguably has a notion of type. However, these two types are dealt with using different syntax. There is no need to check whether an integer parameter is actually an integer; Forth provides no facilities for passing a floating-point value in its place.
Thus, while Forth has types, it has no type checking.
This idea was once common in programming-language design. However, if different types involve different syntax, then it is difficult to implement an extensible type system. Virtually all modern programming languages include some form of type checking.
5. Type Safety: Strong [ick!], Weak [ick!], Sound
Type Safety
A programming language or programming-language construct is type-safe if it forbids operations that are incorrect for the types on which they operate.
Some programming languages/constructs may discourage incorrect operations or make them difficult, without completely forbidding them. We may thus compare the level of type safety offered by two programming languages/constructs.
For example, the C++ printf function, inherited from the C Standard Library, is not type-safe. This function takes an arbitrary number of parameters. The first should be a format string containing references to the other parameters.
[C++]
printf("I am %d years old.", age);
The above inserts the value of variable age in place of the %d in the format string, on the assumption that age has type int. However, the C++ Standard does not require the type of age to be checked against the format string. It could be a floating-point value, a pointer, or a struct; the code would then compile, and a type error would slip by unnoticed.
In contrast, C++ stream I/O is type-safe.
In contrast, C++ stream I/O is type-safe.
[C++]
cout << "I am " << age << " years old.";
When the above code is compiled, the correct output function is chosen based on the type of the variable age.
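For comparison, Python also offers printf-style formatting, but there the argument is checked at runtime against the format specifier. The sketch below uses a hypothetical function describe to show a mismatch being flagged instead of slipping by unnoticed.

```python
def describe(age):
    # %d requires a number; Python checks this when the line executes.
    return "I am %d years old." % age

ok = describe(42)

try:
    describe("forty-two")     # wrong type for %d
    outcome = "no error"
except TypeError:
    outcome = "TypeError"

print(ok)        # I am 42 years old.
print(outcome)   # TypeError
```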
Strong [ick!] & Weak [ick!]
Two unfortunate terms are often unfortunately used in discussions of type safety; the results are—you guessed it—unfortunate. The terms are strong typing (or strongly typed) and weak typing (or weakly typed). These generally have something to do with the overall level of type safety offered by a programming language.
The problem with this terminology is that it has no standard definition; it is used in different ways by different people, many of whom seem to assume that their definition is universally agreed upon. I have seen at least three different definitions of “strongly typed” in common use. The C programming language is strongly typed by one of them and weakly typed by the other two.
See the references for a page listing many different definitions of “strongly typed”.
Therefore:
Avoid using the terms “strong” and “weak” typing, or “strongly” and “weakly” typed.
Nonetheless, the issue of how strictly type rules are enforced is an important one. When discussing it, we might speak informally of how one type system is “stronger” than another. If we wish to be more formal, then we can talk about type safety, or simply explain in detail what we mean. When we are discussing a static type system, we can also talk about soundness.
Soundness
A static type system is sound if it guarantees that operations that are incorrect for a type will not be performed; otherwise it is unsound.
Haskell has a sound type system. The type system of C (and thus C++) is unsound, since there is always a way of treating a value of one type as if it has a different type. This might appear to be a criticism. However, the type system of C was deliberately designed to be unsound. Being able to interpret a value in memory in arbitrary ways makes C useful for low-level systems programming.
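To give a feel for what interpreting a value in memory in arbitrary ways means, the following sketch imitates it in Python using the standard struct module. Python does not do this implicitly. Here we explicitly take the four bytes that encode the floating-point value 1.0 and read the same bytes back as an unsigned integer, which is the kind of reinterpretation C permits directly.

```python
import struct

raw = struct.pack("<f", 1.0)            # 4 bytes holding the float 1.0
as_int = struct.unpack("<I", raw)[0]    # the same bytes read as an unsigned int

print(raw.hex())      # 0000803f
print(as_int)         # 1065353216
print(hex(as_int))    # 0x3f800000
```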
There does not seem to be any equivalent of the notion of soundness in the world of dynamic typing. However, we can still talk about whether a dynamic type system strictly enforces type safety.
6. References
The following were particularly helpful in writing this article.
- Michael R. Bernstein, “What is a Type System For?” (blog post), February 17, 2014.
- Benjamin C. Pierce, Types and Programming Languages (book), MIT Press, 2002.
- Chris D. Smith, “What to know before debating type systems” (blog post), reposting of a deleted article; original webpage dated March 3, 2008.
- Tom Stuart, “Consider static typing” (article), February 5, 2015.
The following page summarizes various definitions for strongly typed and gives examples of inconsistent use of the term.
- StronglyTyped from the C2 Wiki, a.k.a. the WikiWikiWeb.
7. Copyright & License
© 2015–2018 Glenn G. Chappell
A Primer on Type Systems is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. If different licensing is desired, please contact the author.