CS 331 Spring 2013
Lecture Notes for Friday, March 1, 2013

Haskell: Just a Bit More

File I/O

Haskell’s IO wrapper supports not only terminal I/O, but also file I/O. To use it, import the System.IO module (named IO in Haskell 98).

[Haskell]

import System.IO

Open a file with function openFile. This takes two arguments: the path of the file (a String) and an I/O mode, such as ReadMode, WriteMode, or AppendMode.

Function openFile returns an IO-wrapped value of type Handle.

[Haskell]

do
    h <- openFile "abc.txt" ReadMode  -- Open file & get handle
    ...

A file handle plays much the same role in Haskell as an fstream object in C++, a FILE * in C, or a file descriptor in the traditional low-level C file interface. All access to an open file is done through its handle.

The handle-based file-access functions are much the same as the terminal-I/O functions, except that “h” is prepended to each function name, and a file handle is passed as an additional first argument. For example:

[Haskell]

putStr "cat"     -- Terminal output
hPutStr h "cat"  -- File output

Here is a function printFile, which reads the entire contents of a file, given its path, and prints them to the standard output. This uses the handle variant of function getContents, which reads until end-of-file and returns an IO-wrapped String.

[Haskell]

printFile filePath = do
    h <- openFile filePath ReadMode
    contents <- hGetContents h  -- Read (lazily) until end-of-file
    putStr contents

Types & Classes

Defining Types

We can define new Haskell types using data.

[Haskell]

data IandB = IB (Integer, Bool)

This defines type IandB, which holds a value of type Integer and a value of type Bool. We have defined a constructor IB, which begins literals of type IandB, and can be used in pattern matching. Note that the names of both types and constructors must begin with an upper-case letter.

[Haskell]

incMaybe :: IandB -> IandB  -- optional type declaration
incMaybe (IB (i, b)) = IB (newI, b) where
    newI = if b then (i+1) else i

At this point, we can type “IB (1, True)” at the GHCi prompt, but we will get an error, since there is no way to output an IandB value.

[Interactive Haskell]

> IB (1, True)

<interactive>:1:0:
    No instance for (Show IandB)
    ...

Type Classes

A Haskell type class is a collection of types each of which implements some interface. Type classes are Haskell’s mechanism for overloading functions.

For example, class Show consists of all types whose values can be converted to type String in the usual manner. We can place type IandB into class Show as follows; when we do so, we must overload function show for our type.

[Haskell]

instance Show IandB where
    show (IB (i, b)) = "IB(" ++ show i ++ "," ++ show b ++ ")"

And now:

[Interactive Haskell]

> IB (1, True)
IB(1,True)
> [ IB(x,y) | x <- [8,2], y <- [True,False] ]
[IB(8,True),IB(8,False),IB(2,True),IB(2,False)]
> incMaybe (IB(5,True))
IB(6,True)

Class Eq consists of all types whose values can be compared for equality. Using this class, we can overload operators == and /=. When we place a type into this class, we only need to define one of the two operators; the other is defined for us (in the obvious way).

[Haskell]

instance Eq IandB where
    IB (i1, b1) == IB (i2, b2)  =  (i1 == i2) && (b1 == b2)

Now we can do this:

[Interactive Haskell]

> IB (1, True) == IB(1, False)
False
> IB (1, True) /= IB(1, False)
True

Monads

Type class Monad consists of those (parametrized) types that can be used in a do block. We have seen the monad IO. Another monad is [ ... ] (that is, lists). In fact, do blocks actually form a generalization of list comprehensions. They are thus called “monad comprehensions”.

For example, here is a list comprehension.

[Haskell]

com = [ (x,y) | x <- [1,2], y <- [1,3], x <= y ]

Here is the same idea, expressed as a monad comprehension. Note that this does not do I/O! We are in the list monad.

[Haskell]

com' = do
    x <- [1,2]
    y <- [1,3]
    if x <= y
        then return (x,y)
        else mzero

We need to import the Monad module (Control.Monad in current GHC) to use mzero. Alternatively, we can replace “return (x,y)” with “[(x,y)]” and replace “mzero” with “[]”.

So the above code creates a number of lists, concatenates them, and returns the result.

Note that the line “x <- [1,2]” essentially says to execute the following code twice, once with x set to 1 and once with x set to 2. The list monad is thus a way of modelling parallel computation.

See haskell_more.hs for Haskell source code related to today’s lecture.

Python: Introduction

History

Our next programming language is Python, named after the Monty Python comedy troupe. Python was created in 1989 by Guido van Rossum, who still oversees development of the language. Python was created to be a high-level language, in the scripting-language category, that supported multiple programming paradigms (imperative, OO, functional, ...) and easily interfaced with C code.

There are now a large number of people involved in the design of Python. Community members may submit Python Enhancement Proposals (PEPs) which, if they survive a review process, are ruled on by van Rossum (who, in the spirit of Monty Python, has assumed the title of “Benevolent Dictator For Life”). New revisions of Python appear on a roughly yearly basis.

In 2008 version 3.0 of Python was released. Until then, Python revisions were mostly backward compatible with earlier versions. But version 3.0 was an effort to clean up a number of “warts” in the language, and so broke backward compatibility. There is an ongoing transition from 2.x to 3.x, with both currently widely used. We will be studying Python 3.

There are a number of Python implementations. The primary implementation is called CPython; it is now available on every major platform. Jython is a Python implementation on top of the Java Virtual Machine; it allows linking with programs written in Java and other JVM languages. A curious implementation is PyPy, which is Python written in Python. This has made great strides in performance, and now generally exceeds CPython in speed.

Python is heavily used in server-side web programming. In recent years it has made inroads in scientific modelling and other large-scale numerical computation.

On Dynamic Languages

Since the first interactive computers, commands have been typed at a command line. The program that presents the command prompt to the user, and handles execution of typed commands, is a shell.

When a number of commands need to be executed, it is convenient to place them in a file, which the shell can then read and execute. Such a file was originally called a batch file. Later, variables and flow-of-control handling were added to shells. A batch file became a program in a full-fledged programming language. Such a program is a shell script; the language is a scripting language.

In my opinion (and you may disagree), these programming languages that grew out of command-line shells are uniformly substandard. Apparently I am not the only one who thinks this way. In the 1970s, Alfred Aho, Peter Weinberger, and Brian Kernighan developed a simple text-processing programming language to handle some of the use cases of shell scripts. They named it with the initials of their surnames: AWK. Although AWK was not associated with any shell, it had many characteristics in common with the shell-script languages: there was no compile-link-execute cycle and no object or other binary files; one simply ran the program. Programs tended to be short and deal only with high-level concepts, and text processing was the primary aim. Thus AWK was also called a scripting language.

In 1987 Larry Wall developed a more sophisticated programming language to handle tasks formerly done by shell scripts, AWK programs, or various common Unix utilities: grep, sed, etc. This language was called Perl (which stood either for “Practical Extraction and Report Language” or for “Pathologically Eclectic Rubbish Lister”, depending on which end of the manual you started from). Again, there was no compile-link-execute cycle, and again it was called a scripting language.

Perl became very popular—and it still is—and a plethora of similarly aimed languages appeared: Python, Ruby, Lua, Squirrel, Falcon, etc., etc. Again these were called “scripting languages”, but these days they have outgrown that name. No longer are they restricted to quick text-processing scripts and utility command wrappers. These programming languages now support thriving ecosystems with huge, well-maintained collections of libraries. They are used to write compilers, major websites, and various mission-critical codebases in any number of organizations. A more modern term for a language of this kind is dynamic language.

Generally speaking, the dynamic languages have the following characteristics.

As an illustration of the final characteristic above, consider the following hello-world program in ANSI C++.

[C++]

#include <iostream>
using std::cout;
using std::endl;

int main()
{
    cout << "Hello, world!" << endl;

    return 0;
}

Here are equivalent programs in three different dynamic languages.

[Perl 5]

print "Hello, world!\n"

[Ruby]

puts "Hello, world!"

[Python 3]

print("Hello, world!")

Characteristics

Python is a dynamic language. Its code looks roughly like C++; however it has significant indentation (like Haskell), less punctuation, and an emphasis on readability. It also allows executable statements at global scope (as above). Like C++, Python uses strict evaluation.

Python’s typing is entirely dynamic. For example, a type error raises an exception, which can then be caught and handled at runtime. It uses implicit typing; unlike Haskell, it has no type declarations that the language itself checks (Python 3 does allow annotations on function parameters and return values, but the interpreter does not enforce them). Python applies types only to values; a Python variable is simply a reference to a value, and has no type of its own. Its type rules are enforced; unlike C++, Python has no provision for arbitrary type conversions.
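
As a minimal sketch of the points above (the helper name describe is made up for illustration), the following shows a type error surfacing as an ordinary, catchable exception at runtime, and a single variable being bound first to an integer and then to a string.

```python
def describe(value):
    """Return the name of the type of the given value."""
    return type(value).__name__

# A type error is detected only when the offending operation runs,
# and it shows up as an ordinary exception that can be caught.
try:
    result = "abc" + 1        # str + int is not defined
except TypeError:
    result = "caught TypeError"

# The type belongs to the value, not to the variable.
x = 1
first = describe(x)           # x currently refers to an integer
x = "abc"
second = describe(x)          # same variable, now refers to a string

print(result, first, second)
```

Nothing here is checked before the program runs; the bad addition is only noticed when that line executes.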

The default behavior of Python’s type-checking is called duck typing. This comes from the saying, “If it looks like a duck and quacks like a duck, then it is a duck.” What this means is that a Python function will accept a value of a certain type as a parameter if all the operations the function uses on that value are defined. It does not matter whether the value is of the “right” type. (You have had experience with this; it is much the same as the type-checking rules for C++ function templates.)
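
Here is a small illustration of duck typing. The classes Duck and Robot and the functions make_it_speak and double are hypothetical names, invented for this sketch. Neither function checks the type of its argument; each simply uses the operations it needs.

```python
class Duck:
    def speak(self):
        return "Quack"

class Robot:
    def speak(self):
        return "Beep"

def make_it_speak(thing):
    # No type check: any object with a speak() method is accepted.
    return thing.speak()

def double(x):
    # Works for any value whose type defines "+" with itself.
    return x + x

sounds = [make_it_speak(Duck()), make_it_speak(Robot())]
doubled = [double(3), double("ab"), double([1, 2])]

# A value lacking the needed operation fails only when the operation is tried.
try:
    make_it_speak(42)
    outcome = "no error"
except AttributeError:
    outcome = "AttributeError"
```

Note that Duck and Robot share no base class. All that matters is that both define speak.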

Python includes support for OOP, including classes (in the C++ sense, not the Haskell sense), inheritance of type information, and polymorphism. (However, in my experience, duck typing, along with the lack of static typing, means that inheritance is generally of little value.)

Python includes some support for FP. It has first-class functions, higher-order functions, and support for lazy sequences (called generators). It includes list comprehensions, along with three other kinds of comprehensions: set, dictionary (hash table), and generator. Sadly, Python quite explicitly does not have tail-call optimization.
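
The following sketch touches each of the FP features just listed. The function compose and the sample data are made up for illustration; itertools.count and itertools.islice are from the standard library.

```python
import itertools

def compose(f, g):
    """Higher-order function: return the function x -> f(g(x))."""
    return lambda x: f(g(x))

# Functions are values: they can be passed in and returned.
inc_then_double = compose(lambda x: 2 * x, lambda x: x + 1)

words = ["ant", "bison", "cat", "ant"]

lengths = [len(w) for w in words]              # list comprehension
unique = {w for w in words}                    # set comprehension
index = {w: len(w) for w in unique}            # dictionary comprehension
squares = (n * n for n in itertools.count())   # generator comprehension: lazy, unbounded

# Only the requested items of the lazy sequence are ever computed.
first_squares = list(itertools.islice(squares, 5))
```

The generator comprehension describes an infinite sequence, much like a Haskell lazy list; islice pulls just the first five values from it.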

Python is very much an imperative language (i.e., programs consist of statements, which do things). In fact, Python function and class definitions are executable statements. One can, for example, add member functions to a class at runtime.
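
As a sketch of that last point, here is a class that gains a member function after one of its objects has already been created. The names Counter and increment are hypothetical.

```python
class Counter:
    def __init__(self):
        self.count = 0

def increment(self, amount=1):
    """Ordinary function; it acts as a method once attached to the class."""
    self.count += amount
    return self.count

c = Counter()                  # created before the method exists

Counter.increment = increment  # add a member function at runtime

after_one = c.increment()      # existing objects see the new method
after_five = c.increment(4)
```

The assignment to Counter.increment is just another statement; it could equally well sit inside an if or a loop.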

The Python language environment manages storage and ownership. Applications do no memory management. There are no pointers. There is little need to write anything like a copy constructor. Object lifetimes are handled via reference counting (in CPython; Jython is necessarily influenced by the JVM lifetime rules).
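
The effect of reference counting can be observed with a weak reference, which refers to an object without keeping it alive. This is only a sketch, and it assumes CPython; other implementations may reclaim the object later. The class Node is a made-up placeholder.

```python
import weakref

class Node:
    pass

obj = Node()
watcher = weakref.ref(obj)     # weak reference: does not count as an owner
alias = obj                    # a second ordinary (strong) reference

del obj                        # one strong reference remains
still_alive = watcher() is not None

del alias                      # last strong reference is gone
reclaimed = watcher() is None  # in CPython the object is freed immediately
```

Calling watcher() returns the object if it still exists, and None once it has been destroyed.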

Built-in datatypes include arbitrary-precision integer, floating-point value, string (as of Python 3.0, all strings are Unicode), list, tuple, set, and dictionary (hash-table-based key-value store). Container objects may include items of differing types. (In Python the distinction between lists and tuples is that the former are mutable, while the latter are not.) Applications may define their own classes.
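
A brief tour of these datatypes, as a sketch with made-up values:

```python
big = 2 ** 100                       # arbitrary-precision integer: no overflow
mixed = [1, 2.5, "three", (4, 5)]    # a list may hold items of differing types

items = [1, 2, 3]
items[0] = 99                        # lists are mutable

point = (1, 2, 3)
try:
    point[0] = 99                    # tuples are immutable
    tuple_result = "modified"
except TypeError:
    tuple_result = "TypeError"

ages = {"ann": 31, "bob": 27}        # dictionary: key-value store
ages["cy"] = 45                      # add a new key
seen = {1, 2, 2, 3}                  # set: duplicates collapse
```

The attempt to assign to an item of a tuple raises a TypeError, while the same assignment on a list succeeds.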

Python execution is heavily based on hash tables. Of course, they form the underlying implementation of sets and dictionaries. The environment also keeps track of all functions in a hash table, with function names as keys. A function call involves a hash-table look-up. So does a member access in an object.
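
These tables can be inspected directly. The sketch below uses the built-in function vars, which returns the dictionary behind an object, class, or module. The class Point is made up for illustration, and the details shown are those of ordinary classes in CPython.

```python
import math

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def total(self):
        return self.x + self.y

p = Point(3, 4)

# The data members of an object are stored in a dictionary.
members = vars(p)

# Member access amounts to a dictionary look-up by name.
same_value = members["x"] == p.x

# A class keeps its functions in a dictionary-like mapping too:
# look up the function by name, then call it.
method_result = vars(Point)["total"](p)

# And so does a module.
sqrt_is_same = vars(math)["sqrt"] is math.sqrt
```

Since names are resolved by look-up at the time of the call, rebinding a name changes what later calls do.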

Build & Exec

Non-Interactive Execution

Python is typically JIT-compiled to a language-specific bytecode, which is then interpreted. CPython uses its own bytecode; Jython uses JVM bytecode. Python is often mistakenly described as an interpreted language; it is not. As with Haskell, a REPL is available. Unlike GHCi, the Python REPL allows definition of variables, functions, and classes.

I will describe the traditional execution process for the *ix command line. Other operating systems, along with IDEs, will do things differently.

The Python JIT compiler/execution handler is traditionally named “python”. With the switch to Python 3 it is now typically named “python3”. Execute a Python program by passing the source-file name to the JIT compiler.

[*ix Command Line]

$ python3 myprog.py

Above, “$” represents the shell prompt and is not to be typed. The standard Python source-file suffix is “.py”.

This will compile the source, and, if there are no errors, it will execute the result as well. Sometimes “.pyc” files are created as part of the compilation process; these can be ignored.

The Shebang Convention

In the *ix world, there is a standard trick for executing a script with an interpreter or JIT compiler. The first line of the script begins with “#!” (say “sharp bang”, or shebang for short). Following this is the pathname of the interpreter/JIT compiler. For example, on my system, the Python 3 JIT compiler is found at /usr/bin/python3, so I could begin myprog.py with the following line.

[Python Script]

#!/usr/bin/python3

Then I make myprog.py an executable file, and to run it I simply type its name.

[*ix Command Line]

$ myprog.py

What happens when I type the above? The shell looks at the beginning of the file, sees the shebang, and effectively types “python3 myprog.py” for me. Then python3 gets the file, looks at the first line and—sees a comment, since Python comments begin with “#” and continue to the end of the line.

Many dynamic languages begin comments with “#” for this reason. And, among those that do not, virtually all will ignore the first line of a source file, if it begins with a shebang.

Note that, while there are many ways to execute Python programs, the shebang convention will not interfere with the others, since the shebang line is a Python comment.

Now, the above process actually has one problem. Some other system might place the Python 3 JIT compiler in a different directory. Then the shebang line will point to the wrong place, and my file will not be executed. For this reason, the env program was written. The job of env is to know where all the other commands are. So now I begin my file with:

[Python Script]

#!/usr/bin/env python3

To execute, I type the filename as before. The shell hands the file to env. Then env hands it to python3. Then python3 starts compiling. It skips the shebang line, since it begins with “#”, and moves on to the rest of the program. If there is a compilation error, then everything comes crashing down. If not, then the program executes.

That is all good and well, of course, but you may note that there remains one sticky issue.

Q: For all that to work, env has to be in /usr/bin. What if it is in some other directory?

A: It isn’t.

Q: But what if—

A: IT ISN’T!!!

Q: Okay, okay. Geez ....

Interactive Execution

As mentioned above, Python has a standard REPL. At the *ix command line, we get to this by running python3 with no parameters.

[*ix Command Line]

$ python3

After some version information, we get the REPL prompt, which, by convention, is “>>>”. As with Haskell, we can type expressions to see their values. But we can type statements as well. The prompt changes to “...” when an expression or statement continues to another line.

[Interactive Python 3]

>>> 3+5
8
>>> (3+
... 5
... )
8
>>> "abc" + "def"
'abcdef'
>>> x = 4+5
>>> x
9
>>> sin(1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'sin' is not defined
>>> import math
>>> math.sin(1)
0.8414709848078965
>>> quit()

To exit the REPL, type “quit()”, or send EOF (<Ctrl-D> in most *ix shells).

To run our own program, do an import in the REPL, leaving off the filename suffix.

[Interactive Python 3]

>>> import myprog

If the file contains commands at global scope, then these will execute.

Alternatively, if the source file contains only function definitions, import as above, and then run any function defined in the file by prefacing it with “FILENAME.”. Say I have defined a function foo in file myprog.py.

[Interactive Python 3]

>>> import myprog
>>> myprog.foo()

Alternatively, do “from FILENAME import *”. Then you can run functions without the namespace prefix.

[Interactive Python 3]

>>> from myprog import *
>>> foo()

I am not fond of this last method, for the same reason I am not fond of “using namespace std;” in C++.

Some Programming

Like C++ programs, a Python program consists of statements, one after another, with execution modified by flow-of-control constructions. Lexical conventions are similar to those of C++, except that a newline ends a statement, where possible. A comment begins with “#” and continues to the end of the line.

Variables are declared simply by assigning to them. The command to print is function print. Function parameters are placed in parentheses, and separated by commas.

[Python 3]

# Some commands
print("Hello")
x = 3+4
print("x =", x)

By default the output of print separates parameters with blanks and ends with a newline. These can be modified with the optional named parameters sep and end.

[Python 3]

print("abc", "def")
# Below is same as above
print("abc ", "def", sep="", end="\n")
# Below is also the same
print("abc def\n", end="")
# Again, the same
print("abc def")

Since only values have types, we may assign a variable to a value of a different type than that of its current value.

[Python 3]

x = 1
print("An integer:", x)
x = "abc"
print("A string:", x)

Define a function with def. Then give the name of the function, its parameter list in parentheses (without types!), and then a colon. The body of the function should follow, indented. Any indentation amount works, as long as it is consistent, but 4 spaces is the convention recommended by the official Python style guide (PEP 8).

[Python 3]

def print_sum(a, b):
    c = a + b
    print("The sum is:", c)

Return values to the caller with return, as in C++. However, no return type is declared.

[Python 3]

def f1(x):
    return x+1

def f2(x):
    return 2*f1(x)

We can document the above functions with docstrings and doctests.

[Python 3]

def f1(x):
    """Return param + 1.

    >>> f1(5)
    6

    """
    return x+1

def f2(x):
    """Return 2 * param + 2.

    >>> f2(5)
    12

    """
    return 2*f1(x)

The lines between the triple double quotes are the docstrings. These can be read by automated documentation-generating programs (see Pydoc).

Inside the docstrings, the lines beginning with “>>>” are the doctests. These are essentially a picture of an interactive Python session. Standard library module doctest (look it up) allows execution of these and checking the results against the given values. The result is both documentation and test.
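
Here is one possible way to execute the doctests of a single function from within a program. This is a sketch, not the only approach: the helper run_doctests is a made-up name, built from the DocTestParser and DocTestRunner classes of the doctest module.

```python
import doctest

def f1(x):
    """Return param + 1.

    >>> f1(5)
    6

    """
    return x + 1

def run_doctests(func):
    """Run the doctests in func's docstring; return (failed, attempted)."""
    parser = doctest.DocTestParser()
    # The examples may refer only to the function under test.
    test = parser.get_doctest(func.__doc__, {func.__name__: func},
                              func.__name__, None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    results = runner.run(test)      # reports any failures on standard output
    return results.failed, results.attempted

outcome = run_doctests(f1)
print(outcome)
```

In everyday use there are shortcuts: call doctest.testmod() at the end of a source file, or run “python3 -m doctest myprog.py” at the command line.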

Here is an example of a lazy-list function, called a generator.

[Python 3]

def fibo_gen():
    """ Generator. Yield Fibonacci numbers: 0,1,1,2,..."""
    prev = 1
    curr = 0
    while True:
        yield curr
        new = prev + curr
        prev = curr
        curr = new

Note the yield, which sends a value back to the caller, but also saves the current state so that we can pick up where we left off if the caller wants more.
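
The saved state can be seen by stepping through a generator by hand with the built-in function next. The sketch below repeats the generator so that it is self-contained, using a tuple assignment in place of the temporary variable.

```python
def fibo_gen():
    """Generator. Yield Fibonacci numbers: 0,1,1,2,..."""
    prev = 1
    curr = 0
    while True:
        yield curr
        prev, curr = curr, prev + curr

g = fibo_gen()          # nothing in the function body has run yet
a = next(g)             # runs up to the first yield
b = next(g)             # resumes after the yield, with prev and curr intact
c = next(g)
rest = [next(g) for _ in range(4)]

h = fibo_gen()          # a second, independent generator starts from scratch
h_first = next(h)
```

Each call to fibo_gen creates a separate generator object with its own copies of prev and curr.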

Here is one way to use the above generator.

[Python 3]

count = 0
for f in fibo_gen():  # Iterate through values yielded by fibo_gen
    print(f)
    count += 1        # Python has no "++"
    if count == 20:
        break

Here is a more “pythonic” way to do the same thing.

[Python 3]

import itertools

# itertools.islice is like Haskell's "take"
for f in itertools.islice(fibo_gen(), 20):
    print(f)

See python_intro.py for Python source code related to today’s lecture.


CS 331 Spring 2013: Lecture Notes for Friday, March 1, 2013 / Updated: 1 Mar 2013 / Glenn G. Chappell / ggchappell@alaska.edu