# CS 331 Spring 2013 Lecture Notes for Friday, February 15, 2013

## Specifying Semantics

### Introduction

Recall that semantics (note: “semantics” is a singular noun; “semantic” is an adjective) refers to the meaning of code, as opposed to syntax, which refers to its structure. There are a number of reasons why we might want to be precise about the semantics of code.

• A programmer needs to know the semantics of code in order to write correct source code.
• A compiler needs to reflect knowledge of semantics, so that correct object code can be generated.
• An interpreter needs to reflect knowledge of semantics, so that it will perform the correct actions.
• Semantics is also useful in program verification: checking that a program does the right thing.

We will look at a few proposed methods for formally specifying semantics. Unfortunately, efforts to create such methods have not been nearly as successful as the similar efforts for syntax.

The semantics of a programming language can generally be thought of in two parts:

• The semantics of the type system (and similar semantic issues; see below).
• The semantics of a running program.

In a dynamically typed programming language, type checking is done at runtime, so there is little distinction between the two parts. But in a statically typed programming language, type checking will generally be done at compile time. Thus, we refer to the type-system semantics as static semantics; this can include other compile-time semantic checks, such as analyzing dependencies between statements (look into dataflow analysis sometime) and verifying that the labels inside a switch statement are distinct. On the other hand, the semantics of a running program is its dynamic semantics.

Formal semantics means an exact description of semantics, generally using mathematical notation. We will look briefly at four categories of methods for specifying formal semantics:

Attribute Grammars
Associate attributes with each symbol in a CFG. Compute values of attributes for parse-tree nodes. Check that required rules are followed.
Operational Semantics
Specify the steps taken when a program is executed, usually in terms of a formal abstract machine, but possibly in terms of other programming-language constructs.
Axiomatic Semantics
Specify the effect of each statement in a program on the program’s invariants.
Denotational Semantics
Specify the meaning of parts of a program as abstract mathematical objects. The meaning of a portion of a program is built up out of the meanings of its parts.

Attribute grammars are aimed at static semantics, while the other three methods are used to specify dynamic semantics. Each of these methods comes with its own specialized notation. We will avoid using these, instead giving an informal description of each method.

### Static Semantics: Attribute Grammars

A method of formal specification of semantics that has had some success with static semantics is the attribute grammar. This is a way of using additional information grafted onto a parse tree based on a CFG.

An attribute grammar is always associated with a CFG. To each symbol—both terminal and nonterminal—it associates a number of attributes. These are the keys in key-value pairs. Each node in a parse tree will have values associated with its keys. The attribute grammar includes rules that the values must follow, for a program to be correct; these rules are known as predicates.

For example, a typical attribute might be type.

• A parse-tree node holding “23” would have this attribute; the associated value might be int.
• A parse-tree node holding “-3.405” would have the same attribute, but its value might be double.
• A parse-tree node holding the variable name “abc” would also have this attribute; the value would be the type of this variable.
• A node holding “{” would not have this attribute.

Attributes for a parse-tree node can be computed based on the attributes of other nodes. For example, the type of an expression would probably be computed based on the types of the parts making up the expression. Then type checking could be done using the appropriate predicates.
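The computation just described can be sketched in Python. This is a minimal illustration, assuming a tiny expression language whose parse-tree nodes are nested tuples; the node names and the type-equality predicate are invented for the example, not taken from any particular language.

```python
# Sketch of an attribute grammar in action: compute a "type" attribute
# bottom-up over a parse tree, and enforce a predicate while doing so.
# Node encoding (an assumption for this example):
#   ("int_lit", n), ("double_lit", x), ("add", left, right)

def type_attr(node):
    """Compute the synthesized 'type' attribute of a parse-tree node."""
    kind = node[0]
    if kind == "int_lit":
        return "int"
    if kind == "double_lit":
        return "double"
    if kind == "add":                       # expr -> expr "+" expr
        left = type_attr(node[1])
        right = type_attr(node[2])
        # Predicate: both operands must have the same type.
        if left != right:
            raise TypeError("operands of '+' have different types")
        return left
    raise ValueError("unknown node kind: " + kind)

# The expression 23 + 5 type-checks; its type attribute is "int".
print(type_attr(("add", ("int_lit", 23), ("int_lit", 5))))   # int
```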

### Dynamic Semantics I: Operational Semantics

In operational semantics, we describe the steps taken when a program executes. Usually actions are described in terms of a simple abstract machine. Alternatively, they may be described in terms of another programming language, or even in the same programming language. (Note that, in this last case, we would need to specify some parts of the language using a different method.)

Operational semantics sees real use; a web search will turn up any number of places where it has been applied. Later, we will look at a use of operational semantics to partially specify the Haskell programming language.

Note that it is common today for a programming language to consist of a small, simple core, augmented with a standard library that is written in the core language. If we consider the library to be a part of the programming language in the larger sense, then the source code for the library constitutes an operational semantics for the language as a whole, in terms of the core.
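To give the abstract-machine flavor of operational semantics a concrete shape, here is a Python sketch of an interpreter for an invented toy language with assignment and sequencing; the statement forms and their tuple encoding are assumptions for this example only.

```python
# Sketch of operational semantics: the meaning of each statement form is
# given by the steps the interpreter (our "abstract machine") takes.
# Statement encoding (assumed): ("assign", name, expr), ("seq", s1, s2).
# Expressions: int literals, variable names (str), ("+", e1, e2).

def eval_expr(expr, env):
    """Evaluate an expression in the given variable environment."""
    if isinstance(expr, int):
        return expr
    if isinstance(expr, str):               # variable lookup
        return env[expr]
    if expr[0] == "+":
        return eval_expr(expr[1], env) + eval_expr(expr[2], env)
    raise ValueError("unknown expression form")

def run(stmt, env):
    """Execute one statement, returning the updated environment."""
    op = stmt[0]
    if op == "assign":                      # store the expression's value
        new_env = dict(env)
        new_env[stmt[1]] = eval_expr(stmt[2], env)
        return new_env
    if op == "seq":                         # run s1, then s2 in its result
        return run(stmt[2], run(stmt[1], env))
    raise ValueError("unknown statement form")

# b = 5; a = b;
prog = ("seq", ("assign", "b", 5), ("assign", "a", "b"))
print(run(prog, {}))                        # {'b': 5, 'a': 5}
```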

### Dynamic Semantics II: Axiomatic Semantics

In axiomatic semantics, we describe the semantics of statements in a programming language, by specifying their effect on invariants.

Recall: an invariant is a statement that is true at a particular point in a program. Preconditions are invariants before something executes, and postconditions are invariants after it executes.

One type of axiomatic semantics is predicate transformer semantics. This handles statements backwards: given the postconditions of a statement, it gives a process for finding the weakest precondition that will force the postconditions to be true after the statement has executed.

Consider the following “C” statement.

a = b;

Given postconditions on a and b, we proceed as follows. The preconditions are the postconditions on a, but applied to b instead, along with the postconditions not mentioning a.

For example, suppose the postconditions are a == 5 and b == 5. Then the process takes these, and applies them only to b; the resulting preconditions are b == 5 and b == 5, which is the same as simply b == 5. And, indeed, if b == 5, and we execute a = b;, then afterwards, we will have a == 5 and b == 5.

On the other hand, suppose the postconditions are a == 2 and b == 3. Then the preconditions are b == 2 and b == 3. This is impossible; there are no preconditions that would make the postconditions true.

Predicate transformer semantics can be used as a framework for program verification. We begin with a description of what a program is supposed to do, written as postconditions for the entire program. Then we work backwards through the program: given postconditions for a statement, we use the process to find the weakest preconditions, which are then used as postconditions for the previous statement. When we finish, we have preconditions for the entire program, that is, conditions under which the program will perform the desired task. If there are no preconditions required, that is, if the program always performs the desired task, then it is a correct program.
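The assignment rule described above can be sketched in Python. Purely for illustration, a predicate is represented as a function from a variable environment to a boolean; the weakest precondition of a = b is then the predicate that holds exactly when the postcondition holds in the state the assignment produces.

```python
# Sketch of the predicate-transformer rule for assignment.
# A "predicate" here is a function: environment (dict) -> bool.

def wp_assign(target, rhs_var, post):
    """Weakest precondition of 'target = rhs_var' for postcondition post:
    post must hold in the state produced by executing the assignment."""
    def pre(env):
        new_env = dict(env)
        new_env[target] = env[rhs_var]      # the effect of target = rhs_var
        return post(new_env)
    return pre

# Postconditions a == 5 and b == 5: the precondition holds exactly when b == 5.
post = lambda env: env["a"] == 5 and env["b"] == 5
pre = wp_assign("a", "b", post)
print(pre({"a": 0, "b": 5}))                # True
print(pre({"a": 0, "b": 4}))                # False

# Postconditions a == 2 and b == 3: no value of b satisfies the precondition.
post2 = lambda env: env["a"] == 2 and env["b"] == 3
pre2 = wp_assign("a", "b", post2)
print(any(pre2({"a": 0, "b": k}) for k in range(-100, 100)))   # False
```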

### Dynamic Semantics III: Denotational Semantics

In denotational semantics, we represent values and the state of a program using mathematical objects. We are given rules for building up the semantics of a portion of a program, based on the semantics of its parts, as well as the environment the program executes in.

We look at a very simple example. Here is a grammar for base-ten integers.

integer → digit | integer digit
digit → “0” | “1” | “2” | “3” | “4” | “5” | “6” | “7” | “8” | “9”

For each production—and we note that there are actually twelve—we give the left-hand side a numeric value, by defining a function $$M$$ that takes the left-hand side, and returns a number.

For the production digit → “0”, let $$d$$ be the digit on the left-hand side. We set $$M(d) = 0$$. Similarly, for digit → “1”, we set $$M(d) = 1$$, and so on.

For the production integer → digit, let $$i$$ be the integer on the left-hand side, and let $$d$$ be the digit on the right-hand side. We set $$M(i) = M(d)$$.

For the production integer → integer digit, let $$i$$ be the integer on the left-hand side, let $$i'$$ be the integer on the right-hand side, and let $$d$$ be the digit on the right-hand side. We set $$M(i) = 10 M(i') + M(d)$$.
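The three rules above can be sketched directly in Python. This assumes parse trees written as nested tuples such as ("digit", "4"); that encoding is an invention for the example.

```python
# Sketch of the denotational meaning function M for the integer grammar.
# Parse-tree encoding (assumed):
#   ("digit", ch)            for digit -> "0" | ... | "9"
#   ("integer", d)           for integer -> digit
#   ("integer", i2, d)       for integer -> integer digit

def M(node):
    """Map a parse-tree node of the integer grammar to a number."""
    if node[0] == "digit":                  # M(d) = the digit's value
        return "0123456789".index(node[1])
    if len(node) == 2:                      # M(i) = M(d)
        return M(node[1])
    return 10 * M(node[1]) + M(node[2])     # M(i) = 10 M(i') + M(d)

# Parse tree for the string "47".
tree = ("integer", ("integer", ("digit", "4")), ("digit", "7"))
print(M(tree))                              # 47
```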

I would guess that denotational semantics is the most studied of the formal semantics-specification methods. It has certainly generated a great deal of published research. Some attempts have been made to use this method in the semantics specifications for practical programming languages, but these appear to have met with limited success.

### Specification of Programming-Language Semantics in Practice

#### C++

The C++ programming language is described in a very long (over 1300 pages) ISO Standard. Semantics specifications are in paragraph form. For example, here is the opening paragraph of the “Program execution” section of the 2011 Standard. (Note: The C++ Standard is copyright 2011 ISO/IEC. Excerpts are reproduced here under the “fair-use” provision of U.S. copyright law.)

The semantic descriptions in this International Standard define a parameterized nondeterministic abstract machine. This International Standard places no requirement on the structure of conforming implementations. In particular, they need not copy or emulate the structure of the abstract machine. Rather, conforming implementations are required to emulate (only) the observable behavior of the abstract machine as explained below.

This suggests that the Standard will give an operational semantics for C++. However, there is no formal semantics in the Standard. On the other hand, much of the text following the above quote might be considered as an informal operational semantics.

#### Haskell

The Haskell programming language is described in a series of medium-length reports. The most important of these is the Haskell 98 Report, which, in revised form, stands at 277 pages. The Preface of the Report, in the section discussing the goals of the language, says,

[The Haskell programming language] should be completely described via the publication of a formal syntax and semantics.

However, elsewhere in the Report, in the Introduction, we find the following.

This report defines the syntax for Haskell programs and an informal abstract semantics for the meaning of such programs.

Note the word “informal”. So it appears that the initial goal was abandoned. And, indeed, the Report does not include a full formal specification of Haskell semantics. However, in contrast to C++, some of the specifications in the Report are written as formal semantics.

Later in the Introduction is this curious statement, which concerns the core of the Haskell programming language, known as the “kernel”.

Although the kernel is not formally specified, it is essentially a slightly sugared variant of the lambda calculus with a straightforward denotational semantics.

But despite this claimed straightforwardness, and the stated goal of publishing a formal semantics for Haskell, we did not get one. This seems to underscore the point that we are really not very good at specifying programming-language semantics.

As mentioned above, the Report does include some formal semantics. For example, there is a section titled “Formal Semantics of Pattern Matching”. This section begins as follows.

The semantics of all pattern matching constructs other than case expressions are defined by giving identities that relate those constructs to case expressions. The semantics of case expressions themselves are in turn given as a series of identities, ....

In other words, the semantics of some Haskell pattern matching constructions are specified using an operational semantics, giving translations of these constructions into other Haskell constructions. As an example, here is one of the translations.

case $$v$$ of { $$x+k$$ -> $$e$$; _ -> $$e'$$ }
= if $$v$$ >= $$k$$ then (\$$x$$ -> $$e$$) ($$v-k$$) else $$e'$$
where $$k$$ is a numeric literal

This tells how a case construction can be translated into an if...then...else construction.
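To make the identity concrete, here is a Python sketch that mirrors the translated right-hand side: matching v against the pattern x+k succeeds when v >= k, binding x to v - k, and otherwise the default alternative runs. The helper name is invented for the example.

```python
# Sketch of the Report's identity for the x+k pattern, transcribed into
# Python. The branches e and e_default are passed as functions, mirroring
# the lambda (\x -> e) on the Haskell side.

def match_plus_k(v, k, e, e_default):
    """Evaluate 'case v of { x+k -> e; _ -> e_default }' via the identity:
    if v >= k then (\\x -> e) (v - k) else e_default."""
    return e(v - k) if v >= k else e_default()

# case 7 of { x+2 -> x * 10; _ -> -1 }   -- matches, binding x = 5
print(match_plus_k(7, 2, lambda x: x * 10, lambda: -1))    # 50

# case 1 of { x+2 -> x * 10; _ -> -1 }   -- fails, default branch runs
print(match_plus_k(1, 2, lambda x: x * 10, lambda: -1))    # -1
```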

This ends our study of syntax and semantics. Next we begin a survey of programming languages.

CS 331 Spring 2013: Lecture Notes for Friday, February 15, 2013 / Updated: 15 Feb 2013 / Glenn G. Chappell / ggchappell@alaska.edu