
CS 331 Spring 2013
Lecture Notes for Monday, February 18, 2013

Introduction to Survey of Programming Languages

Now we begin the second section of the course: a look at various programming languages. As we study programming languages, we will consider how they differ in five main areas.

Syntax Issues
Some languages, like C++, have a great deal of “punctuation”: semicolons, braces, commas, parentheses, etc. Python is a bit simpler, using indentation to indicate blocks. Haskell programs are simpler still, being essentially a stream of words, with occasional parentheses for grouping, and indentation to indicate blocks. Forth programs are simpler still, being entirely a stream of words. Related to this is the handling of identifiers and operators. The C language has a fixed set of operators. C++ has a fixed set of operator names, but allows them to be overloaded. Haskell allows new operator names to be declared.
Package Build/Execute Process
An interpreter executes source code as-is. A compiler translates one language into another. Compilation can be ahead of time (AOT) as is usually done with C++: compile and link, which creates an executable file that can be executed later. Or compilation can be just in time (JIT) as is usually done with Javascript: we hand the source code to a program charged with executing it; the source code is compiled, and, if possible, executed immediately. When a program is JIT compiled, there is often no saved executable file. C++ programs are usually compiled to machine code. Java and Python programs are usually compiled to a system-independent byte code, which is then interpreted. A related issue is how packages are structured. How is a library imported into a program?
Values & Typing
C++ and Haskell have static typing of variables and values. Python has dynamic typing* of values, but no typing of variables. ANSI Forth has no typing at all. In C++ some values can be modified (they are mutable) and some cannot. In ANSI Forth virtually everything is mutable; for example, one can change the value of “6”. C++ programs spend much of their time modifying values. Haskell, on the other hand, has no mutable values; nothing can be modified. Haskell also has first-class functions, that is, functions that can be manipulated with the same ease and versatility as (say) integers can in almost any programming language. We thus consider functions to be ordinary values in Haskell.
Flow of Control
Flow of control in C++ mainly involves function calls, selection (if, switch), and loops. Haskell has no loops; we use tail recursion. An important consideration is how flow of control is used in error handling. C++ provides throw and try...catch, for use in handling exceptions; what do other languages do?
I/O
C++ does I/O using function calls that perform the I/O immediately. Haskell does I/O by having the return value of a program be an indication of the I/O tasks to be performed (roughly speaking). Javascript, as it is generally used, does not really have a notion of I/O; instead, functions manipulate the internal representation of a webpage, letting the web browser handle user interaction.

We will also discuss categories of programming languages: functional, concatenative, logic, etc.

*Static types and dynamic types are actually fundamentally different things. Thus, some people—particularly those who study the discipline of type theory—reserve the term “type” for static types, preferring to refer to the dynamic what-do-you-call-thems as tags. Nonetheless, the term “dynamic typing” is common, and we will use it in this class; however, you should know that some consider it a misnomer.

Haskell: Introduction

History

The first programming language we will look at is Haskell, named for logician Haskell Curry. Haskell was created as a result of a meeting in 1987. Some members of the functional-programming community decided that their efforts were too fragmented; they created a single language intended to support research or development by large numbers of people. The language was standardized in 1998, with a new standard issued in 2010.

The 1998 standard was implemented in a simple interactive Haskell environment called Hugs; this was supported on all major platforms. There was also a more full-featured compiler: the Glasgow Haskell Compiler, or GHC. Development on Hugs seems to have stopped in the mid 2000s. Today, GHC supports an interactive environment, called GHCi, which is very similar to the old Hugs environment.

On Functional Programming & Functional Languages

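Haskell is intended to support functional programming (FP), a programming style built around the evaluation of functions. As discussed below, FP generally treats functions as ordinary values and avoids side effects and mutable data.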

One can do functional programming, in some sense, in just about any programming language. However, some languages support it better than others. C offers rather poor support. Support in C++ is improving; the C++11 standard has added features to enable FP. Python has better support, Javascript’s is even better, and the various dialects of Lisp offer very good support. Haskell is pretty much at the top of the heap.

A functional language is a language designed to support functional programming well. No one calls C a functional language. Opinions vary about Javascript. But everyone agrees that Haskell is a functional language.

Not only does Haskell support FP, it offers little support for anything else. Haskell is a pure functional language, meaning that it does not allow for side effects. Values in Haskell are not mutable; nothing can be modified.

Haskell also includes first-class functions. A data type is first-class if all operations on its values are fully available at runtime.

For example, in C++, int is a first-class type. Consider what we can do with ints in C++.

[C++]

int a = b;
cout << 2+3;

Above, we declare a variable of type int and set it equal to another. Then we operate on two ints to create a new int value that has no name. Before the 2011 standard, we could do none of these things with functions in C++. C++11 does allow for unnamed functions, but it still does not permit us to manipulate functions with quite the same ease as int values.

But functions are first-class in Haskell. Creating a list of functions, or writing a function that manipulates functions (a higher-order function) are common, ordinary operations in Haskell.
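As a minimal sketch of these ideas, here is a list of functions along with two higher-order functions. The names funcs, twice, and applyAll are made up for this illustration.

```haskell
-- A list whose elements are functions, each of type Integer -> Integer.
funcs :: [Integer -> Integer]
funcs = [(+ 1), (* 2), \x -> x * x]

-- A higher-order function: given a function, return a new function
-- that applies the original one twice.
twice :: (a -> a) -> a -> a
twice f = f . f

-- Another higher-order function: apply every function in a list
-- to the same argument, collecting the results in a list.
applyAll :: [a -> b] -> a -> [b]
applyAll fs x = map (\f -> f x) fs
```

With these definitions loaded, “applyAll funcs 5” gives [6,10,25], and “twice (+ 3) 10” gives 16.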

Characteristics

As mentioned above, Haskell is a pure functional language (no mutable data or side effects). It has first-class functions and thus supports higher-order functions.

It is difficult to do loops without things like loop counters. And, indeed, Haskell has no iterative constructs. It uses recursion instead, with tail recursion preferred. The latter will generally be optimized using tail call optimization (or TCO).
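To illustrate, here is one way a simple counting loop might be written as a tail-recursive function. This is only a sketch: sumTo and its helper go are hypothetical names, and the function assumes its argument is nonnegative.

```haskell
-- Sum of the integers from 1 to n, assuming n >= 0.
-- In C++ we would write a loop with a counter and a running total.
-- Here, both are passed along as parameters of the helper function.
sumTo :: Integer -> Integer
sumTo n = go n 0 where
    -- k counts down to zero; acc holds the total so far.
    go 0 acc = acc
    -- The recursive call is the last thing done: a tail call.
    go k acc = go (k - 1) (acc + k)
```

For example, “sumTo 100” gives 5050.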

Haskell has a simple syntax, without as much “punctuation” as C++. Here is a function call in C++:

[C++]

foo(a, b, c)

Here is a more-or-less equivalent function call in Haskell:

[Haskell]

foo a b c

Haskell has significant indentation. In C++, indenting is only for people who read the code; the compiler ignores it. In Haskell, indenting is one way to tell the compiler where a block begins and ends. Here is a function in C++:

[C++]

int bar(int a)
{
    int b = 7;
    int c = 42;
    return foo(a, b, c);
}

And here is a more-or-less equivalent function in Haskell:

[Haskell]

bar a = foo a b c where
    b = 7
    c = 42

Like C++, Haskell has static typing of both variables and values. Unlike C++, Haskell typing is mostly implicit; that is, types usually do not need to be specified. In C++, the typing is mostly explicit; we write

[C++]

int x = 3;

while in Haskell, we can simply say

[Haskell]

x = 3

The variable x still has a type (Integer in this case), but the compiler is able to figure this out for us, using a type inference algorithm (the Hindley-Milner Algorithm).

Similarly, a function in C++:

[C++]

bool blug(int a, int b)
{
    return a == b+1;
}

And in Haskell:

[Haskell]

blug a b = (a == b+1)

Haskell still allows for explicit typing, if desired. For example, we can say

[Haskell]

x :: Integer
x = 3

to mark x as an Integer explicitly. This also allows for our intentions to be communicated to the compiler. So this is legal:

[Haskell]

s = "abc"

But this will not compile:

[Haskell]

s :: Integer
s = "abc"  -- Type error!

since "abc" is not an Integer value.

Haskell’s type system is sound, meaning that operations not defined on a type are not permitted. In contrast, the type system of C++ is unsound (which does not mean “bad”!), since the various kinds of ..._cast functionality let us treat a value of one type as if it had another, largely unrelated, type.

Note: Many people like to talk about strong(er) and weak(er) type systems. A type system is generally considered stronger if its rules are applied more strictly. But these terms are not used consistently, and I prefer to avoid them.

By default Haskell does lazy evaluation, meaning that expressions are not evaluated until they need to be. C++ does the opposite, evaluating as soon as an expression is encountered; this is called strict evaluation, or eager evaluation. For example, here is a C++ function:

[C++]

int f(int x, int y)
{
    return x+1;
}

Suppose we do “f(g(1), g(2))”. This is executed as follows. Function g is called with argument 1 and with argument 2; C++ does not specify which of these two calls happens first. Then function f is called. Note that the value of g(2) is determined, but not used.

Here is the corresponding Haskell function:

[Haskell]

f x y = x+1

We can call f as we did in C++, passing it two values of function g: “f (g 1) (g 2)”. But since f never uses its second parameter, the expression “(g 2)” will never be evaluated. Indeed, if the return value of f is not used, then “(g 1)” will never be evaluated either.
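We can check this behavior by passing, as the second argument, an expression that would halt the program if it were ever evaluated. The following is a small sketch; the standard function error produces such an expression, and the name lazyDemo is made up for this illustration.

```haskell
-- The function from above, with an explicit type.
-- The second parameter is never used.
f :: Integer -> Integer -> Integer
f x y = x + 1

-- Evaluating the second argument would stop the program with an
-- error message. Since f never uses it, that does not happen.
lazyDemo :: Integer
lazyDemo = f 1 (error "second argument was evaluated")
```

Typing “lazyDemo” at the GHCi prompt prints 2; no error is reported.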

Later we will see that lazy evaluation has other interesting consequences.

Build & Execution

GHC is an AOT compiler that usually generates machine code. The GHCi interactive environment does JIT compilation to a bytecode, which is then interpreted.

GHCi allows for the loading of source files, as well as evaluation of Haskell expressions that are entered interactively. This kind of environment is often called a Read-Eval-Print Loop, or REPL (the term comes from Lisp).

Haskell source code is stored in text files whose name ends in “.hs”.

On the command line, GHC is used much like g++ or any other command-line compiler.

[*ix command line]

ghc myprog.hs -o myprog

If there are no errors, then an executable named “myprog” will be generated. Running that file will execute function main in module Main.

Of course, if you are not using the command line, then things will be handled differently. GHC is supported by various IDEs, including Eclipse.

GHCi is a program you run, which presents you with a prompt. Commands for the environment all begin with colon (“:”). Some important ones:

:l FILENAME.hs
Load and JIT compile the given Haskell source file. After a file is loaded, functions in it may be executed by typing their name at the prompt. Parameters are given after the name of the function, separated by blanks.
:r
Reload the last file loaded. Useful if you change a file.
:t EXPRESSION
Print the type of the given Haskell expression. This expression can involve variables and functions defined in a file that has been loaded.
:i IDENTIFIER
Get information about the identifier: its type; also precedence and associativity if it is an operator; perhaps the file it is defined in.
:e FILENAME.hs
Open the given file in whatever editor GHCi is configured to use. I generally do not use the :e command, preferring to start an editor in the usual way. But you may find it convenient—or not.

If you type something at the GHCi prompt that does not begin with a colon, then it is taken as a Haskell expression. This is evaluated, and its value is printed.

When a compiled Haskell program is executed, function main in module Main is called. To do this with GHCi, load (:l) the source file, and then type “Main.main” (usually just “main” works). On the other hand, the interactive environment gives you the ability to call any function defined in a source file, not just main.

Some Programming

Basic Syntax

A Haskell expression is a stream of words separated by blanks where necessary, with optional parentheses for grouping. For example:

[Haskell]

2+3
(2+3)*5
reverse "abcde"
map (\ x -> x*x) [1,2,3,4]

Each line above is a single Haskell expression. Type it at the GHCi prompt and press Enter to see its value.

Single-line comments begin with two dashes (“--”) and continue to the end of the line. The two dashes must not form part of a legal lexeme; thus, I encourage you to put a blank after them.

Multi-line comments begin with “{-” and end with “-}”.

A Haskell identifier begins with a letter or underscore (“_”) and includes only letters, digits, underscores and single quotes (“'”). (That last character is because Haskell was designed by mathematicians; they want to be able to write “y'”.)

“Normal” Haskell identifiers begin with lower-case letters or the underscore; these name variables and functions. “Special” identifiers begin with upper-case letters; these name types, modules, and constructors. (Recall that function main goes in module Main.)

Haskell allows new operators to be defined. The names of these must consist of special characters:

! # $ % & * + . / < = > ? @ \ ^ | - ~ :

The “normal” ones, used for infix binary operators, do not begin with colon. The “special” ones, used for constructors, begin with colon.
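As a brief sketch, here is a definition of a new “normal” operator. The operator name |-| is invented for this example. The fixity declaration on the first line is optional; it gives the operator the same precedence and associativity as the standard + and - operators.

```haskell
-- Left-associative, precedence level 6 (the same as + and -).
infixl 6 |-|

-- Absolute difference of two integers.
(|-|) :: Integer -> Integer -> Integer
a |-| b = abs (a - b)
```

With this loaded, “3 |-| 10” gives 7. Since * has higher precedence, “2 * 3 |-| 10” is treated as “(2 * 3) |-| 10”, which gives 4.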

Defining Variables

We define a variable (note that there are no mutable values; “variables” cannot vary) by giving its name, an equals sign, and an expression giving its value.

[Haskell]

a = 3
myNineVariable = 4+5

The above are not expressions in Haskell. They must be typed in a source file; older versions of GHCi do not accept them at the prompt (GHC 8.0 and later do).

Once the source file is loaded, we can use the identifiers (I use “>” to represent the GHCi prompt).

[Interactive Haskell]

> a
3
> a+5
8
> :t a
a :: Integer

Note that a has a type, even though we did not declare it. But we can declare it if we want.

[Haskell]

b :: Integer
b = 3

Here is an alternate form, which gives the type of the value, rather than the variable.

[Haskell]

c = 3::Integer

Some important numeric types are the following.

Int
Machine integer, like C++ int.
Integer
Arbitrary precision integer. The primary Haskell integral type. Not like any C++ built-in type.
Double
Machine floating-point value, like C++ double.

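The difference between Int and Integer can be awe-inspiring. An Int has a fixed size, so a large result silently overflows (the value of the first result below depends on how wide Int is on the machine used), while an Integer grows as needed. Note that “^” is the Haskell exponentiation operator.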

[Interactive Haskell]

> (3::Int)^1000
-742892767
> (3::Integer)^1000
1322070819480806636890455259752144365965422032752148167664920368226828597346704899540778313850608061963909777696872582355950954582100618911865342725257953674027620225198320803878014774228964841274390400117588618041128947815623094438061566173054086674490506178125480344405547054397038895817465368254916136220830268563778582290228416398307887896918556404084898937609373242171846359938695516765018940588109060426089671438864102814350385648747165832010614366132173102768902855220001
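The fixed size of Int can also be seen more directly. The sketch below uses the standard names maxBound, minBound, and toInteger; the names biggestInt, wrapped, and notWrapped are made up for this illustration. The wrap-around shown is what GHC does; the language standard leaves the result of Int overflow undefined.

```haskell
-- The largest value an Int can hold on this machine.
-- The exact value depends on the width of Int.
biggestInt :: Int
biggestInt = maxBound

-- Adding 1 in Int arithmetic overflows. Under GHC the result
-- wraps around to the most negative Int (minBound).
wrapped :: Int
wrapped = biggestInt + 1

-- The same computation in Integer arithmetic simply gives
-- a larger number.
notWrapped :: Integer
notWrapped = toInteger biggestInt + 1
```

Under GHC, “wrapped == minBound” is True, while notWrapped is one more than the largest Int.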

Defining Functions

We define a function just like a variable, except that we include parameters.

[Haskell]

square x = x*x

Now we can do this.

[Interactive Haskell]

> square 5
25

The above allows any number as a parameter. We can restrict this to Integer values, as follows.

[Haskell]

square2::Integer -> Integer
square2 x = x*x

Note that function application has very high precedence.

[Interactive Haskell]

> square 5 - 3
22
> (square 5) - 3
22
> square (5 - 3)
4
> square -6  -- subtract 6 from function square (makes no sense)
[error message printed here]
> square (-6)
36

A very useful technique for defining functions is pattern matching. For a given argument, the first matching pattern is used. Here is a Fibonacci-number computation function.

[Haskell]

fibo 0 = 0
fibo 1 = 1
fibo n = fibo (n-1) + fibo (n-2)

The above is very inefficient. In C++ we would prefer an iterative version, but Haskell has no loops. But we can still do a fast Fibonacci function.

[Haskell]

fiboFast n = first (advance n [0,1]) where
    advance 0 [a,b] = [a,b]
    advance n [a,b] = advance (n-1) [b,a+b]
    first [a,b] = a

Function advance above is tail-recursive. Note that we use significant indentation to define our block. The where keyword introduces local definitions.

Infinite Lists

Lazy evaluation allows easy definition of infinite data structures.

[Haskell]

allnonneg = [0..]

The above is a list of all nonnegative integers. If we print the list (type “allnonneg” at the GHCi prompt), then it goes on forever, but we can look at just the first few. Here are the first 20.

[Interactive Haskell]

> take 20 allnonneg
[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19]

Now we can apply fiboFast to each number in the infinite list.

[Haskell]

allfibos = map fiboFast [0..]

The result is a list of all Fibonacci numbers. Again, we can print the first 20.

[Interactive Haskell]

> take 20 allfibos
[0,1,1,2,3,5,8,13,21,34,55,89,144,233,377,610,987,1597,2584,4181]

Notice what is going on here: we have done a computation using an infinite list as an intermediate result. With lazy evaluation, this is not a problem.
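Here is another small sketch in the same spirit, chaining two operations on an infinite list. The name evenSquares is invented for this illustration; map, filter, and even are standard functions.

```haskell
-- All even perfect squares, in increasing order.
-- Both the list of all squares and the filtered list are infinite;
-- only as much of each as is needed is ever computed.
evenSquares :: [Integer]
evenSquares = filter even (map (\x -> x * x) [0..])
```

Then “take 5 evenSquares” gives [0,4,16,36,64], and “takeWhile (< 50) evenSquares” gives [0,4,16,36].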

See haskell_intro.hs for Haskell source code related to today’s lecture.


CS 331 Spring 2013: Lecture Notes for Monday, February 18, 2013 / Updated: 27 Feb 2013 / Glenn G. Chappell / ggchappell@alaska.edu