# CS 331 Spring 2013 Lecture Notes for Friday, January 18, 2013

## Course Overview

In this class, we study programming languages with a view toward the following.

• How languages are specified.
• How these specifications are used.
• How certain language features differ between various languages.
• Categories of programming languages.

We are not here to learn the latest, greatest, coolest programming languages; we will, however, study several specific languages.

In roughly the first third of the class, we will look at syntax (structure) and semantics (meaning) of programming languages: how these are specified, and how these specifications make their way into a compiler. We will look at the processes of lexical analysis (lexing) and parsing.

In the remainder of the class, we will consider various language features (for example, typing). We will cover the basics of a number of programming languages, and we will discuss how these features appear in the languages we cover.

## Introduction to Syntax & Semantics

### Static & Dynamic

In our study of programming languages, we will run across two words repeatedly: “static” and “dynamic”.

“Dynamic” refers to things that happen while a program is running. For example, in C++, allocation using new & delete is dynamic allocation. And Python has dynamic typing.

“Static” refers to things that happen before a program runs. C++ global variables are created using static allocation. And both C++ and Haskell have static typing; types are determined by the compiler. Note that “static” can refer to an enormous range of time, everything from the design of the language, to the writing of the compiler, to the compilation, linking, and finally loading of the program.
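As a small illustration of dynamic typing, here is a Python sketch. The function name describe and the variable names are made up for this example. The point is only that the type associated with the name x is determined while the program runs, not fixed beforehand.

```python
def describe(value):
    # The type of `value` is not fixed by the source code;
    # it is looked up while the program is running.
    return type(value).__name__

x = 42
first = describe(x)     # here x refers to an int

x = "forty-two"
second = describe(x)    # the same name now refers to a str

print(first, second)
```

In a statically typed language such as C++ or Haskell, the type of x would be settled by the compiler, and rebinding it to a value of a different type would be rejected before the program ever ran.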

### Syntax & Semantics

Syntax refers to the structure of a program. The string “a + b” is a syntactically correct C++ expression. The string “a b +” is not.

Semantics refers to the meaning of a program. For example, in C++, “a + b” means that a and b are passed to the function operator+, which is executed. The return value of this function is the value of the expression.

Certain language features exist in a gray area between syntax and semantics. For example, in C++, “3+string("abc")” will probably cause a type error. Is this a problem with structure or meaning? One way to answer this question is to classify such issues as static semantics, since they are checked before the program runs. Our earlier example, which concerns what “a + b” does when it is executed, involves dynamic semantics.
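The distinction between structure and meaning can be tried out in Python, whose standard library exposes its parser through the ast module. The sketch below is a Python analogue of the C++ examples above, not a statement about C++ itself; the two helper function names are made up for this example. Note one difference: in Python the type error is detected at run time, while a C++ compiler would report it before the program runs.

```python
import ast

def is_syntactically_correct(source):
    """Return True if `source` has the structure of a Python expression."""
    try:
        ast.parse(source, mode="eval")
        return True
    except SyntaxError:
        return False

def has_type_error(source, names):
    """Return True if evaluating `source` raises TypeError."""
    try:
        eval(source, {}, names)
        return False
    except TypeError:
        return True

names = {"a": 1, "b": 2}

# Structure: "a + b" parses; "a b +" does not.
print(is_syntactically_correct("a + b"))
print(is_syntactically_correct("a b +"))

# Meaning: '3 + "abc"' has correct structure, but evaluating it fails.
print(is_syntactically_correct('3 + "abc"'))
print(has_type_error('3 + "abc"', names))
```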

## Languages & Grammars

### Languages

A formal language (usually just language) is a collection of strings. The characters in the strings all lie in some alphabet. An epsilon (“$$\varepsilon$$”) will represent the empty string.

Here are some examples of languages.

• $$\{\textit{abc}, \textit{ccbg}, \textit{x}\}$$
• $$\{\varepsilon, \textit{b}, \textit{bb}, \textit{bbb}, \textit{bbbb}, \textit{bbbbb}, \dots\}$$
• The set of all legal C++ identifiers
• The set of all syntactically correct Python programs

We are interested in languages because of examples like the last two above.

How do we describe a language? We can use a generator or a recognizer. A generator is a way of producing all strings in the language. A recognizer is a way of determining whether a given string lies in the language.

An important question, when we deal with languages, is, given a string, does it lie in the language? To that end, we need a recognizer (indeed, every compiler must include one, right?). But we typically find generators easier to come up with. A common technique is to write a generator, and then have a program use it to produce a recognizer automatically.
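To make the two ideas concrete, here is a minimal Python sketch of a generator and a recognizer for the second example language above, the set of strings made up of zero or more copies of b. The function names are made up for this example, and both functions are written by hand; this is not the automatic generator-to-recognizer technique mentioned above.

```python
from itertools import count, islice

def generate_bs():
    """Generator: produce every string in the language, shortest first."""
    for n in count():
        yield "b" * n

def recognize_bs(s):
    """Recognizer: determine whether the string s lies in the language."""
    return all(ch == "b" for ch in s)

# The language is infinite, so we look at only the first few strings.
first_four = list(islice(generate_bs(), 4))
print(first_four)

print(recognize_bs(""))      # the empty string lies in the language
print(recognize_bs("bbb"))
print(recognize_bs("bbab"))
```

The generator never finishes on its own, since the language is infinite. The recognizer, in contrast, always gives an answer for any one string it is handed.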

### Grammars

A phrase-structure grammar (usually just grammar) is one form of language generator.

In a grammar, we have a collection of terminal symbols; this is our alphabet. We also have a collection of nonterminal symbols; these are like variables that eventually turn into something else. One nonterminal symbol is the start symbol.

Here, I will use lower-case letters for terminals, upper-case letters for nonterminals, and “$$S$$” for the start symbol.

We also have one or more productions, which are rules for turning some string into some other string. A derivation of a string begins with the start symbol, repeatedly applies productions, and ends with that string, which must consist only of terminals.

For example, here is a grammar.

$S \rightarrow A$

$A \rightarrow AA$

$A \rightarrow \varepsilon$

$A \rightarrow bcd$

Here is a derivation, based on the above grammar.

$S$

$A$

$AA$

$AAA$

$AbcdA$

$bcdbcdA$

$bcdbcd$

Thus, the string “$$bcdbcd$$” lies in the language generated by the above grammar.
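The derivation above can be replayed mechanically. The following Python sketch stores the productions of the example grammar in a dictionary and applies them one at a time. The names PRODUCTIONS, apply_production, and steps are made up for this example, and the empty Python string stands in for $$\varepsilon$$. Each step gives the position of the nonterminal to rewrite and the right-hand side to use.

```python
# Productions of the example grammar: nonterminal -> right-hand sides.
# The empty string "" plays the role of epsilon.
PRODUCTIONS = {
    "S": ["A"],
    "A": ["AA", "", "bcd"],
}

def apply_production(string, position, replacement):
    """Rewrite the nonterminal at `position` using one production."""
    nonterminal = string[position]
    if replacement not in PRODUCTIONS.get(nonterminal, []):
        raise ValueError(f"no production {nonterminal} -> {replacement!r}")
    return string[:position] + replacement + string[position + 1:]

# (position, right-hand side) for each step of the derivation.
steps = [(0, "A"), (0, "AA"), (0, "AA"), (1, "bcd"), (0, "bcd"), (6, "")]

derivation = ["S"]
for position, replacement in steps:
    derivation.append(apply_production(derivation[-1], position, replacement))

print(derivation)
```

The last string in the list contains only lower-case letters, that is, only terminals, so the derivation is complete.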

Languages & Grammars will be continued next time.

CS 331 Spring 2013: Lecture Notes for Friday, January 18, 2013 / Updated: 18 Jan 2013 / Glenn G. Chappell / ggchappell@alaska.edu