CS 331 Spring 2013 > Lecture Notes for Friday, March 1, 2013
Haskell’s IO wrapper supports not only terminal I/O, but also file I/O. To use it, import the IO module (this is the Haskell 98 name; in the current hierarchical libraries the module is named System.IO).
[Haskell]
import IO
Open a file with function openFile. This takes two arguments:
First, a String giving the path of the file. (Officially, the type of this argument is FilePath, but that is merely an alias for String.)
Second, an IOMode indicating how to open the file. Options are ReadMode, WriteMode, AppendMode, and ReadWriteMode.
Function openFile returns an IO-wrapped value of type Handle.
[Haskell]
do
    h <- openFile "abc.txt" ReadMode  -- Open file & get handle
    ...
A file handle plays much the same role in Haskell as an fstream object in C++, a FILE * in C, or a file descriptor in the traditional low-level C file interface. All access to an open file is done through its handle.
The handle-based file-access functions are much the same as the terminal-I/O functions, except that “h” is prepended to each function name, and a file handle is passed as an additional first argument.
For example:
[Haskell]
putStr "cat"     -- Terminal output
hPutStr h "cat"  -- File output
Here is a function printFile, which reads the entire contents of a file, given its path, and prints them to the standard output. This uses the handle variant of function getContents, which reads until end-of-file and returns an IO-wrapped String.
[Haskell]
printFile filePath = do
    h <- openFile filePath ReadMode
    lines <- hGetContents h
    putStr lines
We can define new Haskell types using data.
[Haskell]
data IandB = IB (Integer, Bool)
This defines type IandB, which holds a value of type Integer and a value of type Bool. We have defined a constructor IB, which begins literals of type IandB, and can be used in pattern matching. Note that the names of both types and constructors must begin with an upper-case letter.
[Haskell]
incMaybe :: IandB -> IandB  -- optional type declaration
incMaybe (IB (i, b)) = IB (newI, b)
    where newI = if b then (i+1) else i
At this point, we can type “IB (1, True)” at the GHCi prompt, but we will get an error, since there is no way to output an IandB value.
[Interactive Haskell]
> IB (1, True)

<interactive>:1:0:
    No instance for (Show IandB)
    ...
A Haskell type class is a collection of types each of which implements some interface. Type classes are Haskell’s mechanism for overloading functions.
For example, class Show consists of all types whose values can be converted to type String in the usual manner. We can place type IandB into class Show as follows; when we do so, we must overload function show for our type.
[Haskell]
instance Show IandB where
    show (IB (i, b)) = "IB(" ++ show i ++ "," ++ show b ++ ")"
And now:
[Interactive Haskell]
> IB (1, True)
IB(1,True)
> [ IB(x,y) | x <- [8,2], y <- [True,False] ]
[IB(8,True),IB(8,False),IB(2,True),IB(2,False)]
> incMaybe (IB(5,True))
IB(6,True)
Class Eq consists of all types whose values can be compared for equality. Using this class, we can overload operators == and /=. When we place a type into this class, we only need to define one of the two operators; the other is defined for us (in the obvious way).
[Haskell]
instance Eq IandB where
    IB (i1, b1) == IB (i2, b2) = (i1 == i2) && (b1 == b2)
Now we can do this:
[Interactive Haskell]
> IB (1, True) == IB(1, False)
False
> IB (1, True) /= IB(1, False)
True
Type class Monad consists of those (parametrized) types that can be used in a do block. We have seen the monad IO. Another monad is [ ... ] (that is, lists). In fact, do blocks actually form a generalization of list comprehensions. They are thus called “monad comprehensions”. For example, here is a list comprehension.
[Haskell]
com = [ (x,y) | x <- [1,2], y <- [1,3], x <= y ]
Here is the same idea, expressed as a monad comprehension. Note that this does not do I/O! We are in the list monad.
[Haskell]
com' = do
    x <- [1,2]
    y <- [1,3]
    if x <= y then return (x,y) else mzero
We need to import the Monad module (named Control.Monad in the current hierarchical libraries) to use mzero. Alternatively, we can replace “return (x,y)” with “[(x,y)]” and replace “mzero” with “[]”.
So the above code creates a number of lists, concatenates them, and returns the result.
Note that the line “x <- [1,2]” essentially says to execute the following code twice, once with x set to 1 and once with x set to 2.
The list monad is thus a way of modelling parallel computation.
See haskell_more.hs for Haskell source code related to today’s lecture.
Our next programming language is Python, named after the Monty Python comedy troupe. Python was created in 1989 by Guido van Rossum, who still oversees development of the language. Python was created to be a high-level language, in the scripting-language category, that supported multiple programming paradigms (imperative, OO, functional, ...) and easily interfaced with C code.
There are now a large number of people involved in the design of Python. Community members may submit Python Enhancement Proposals (PEPs) which, if they survive a review process, are ruled on by van Rossum (who, in the spirit of Monty Python, has assumed the title of “Benevolent Dictator For Life”). New revisions of Python appear on a roughly yearly basis.
In 2008 version 3.0 of Python was released. Previously, Python revisions were mostly backward compatible with previous versions. But version 3.0 was an effort to clean up a number of “warts” in the language, and so broke backward compatibility. There is an ongoing transition from 2.x to 3.x, with both currently widely used. We will be studying Python 3.
There are a number of Python implementations. The primary implementation is called CPython; it is now available on every major platform. Jython is a Python implementation on top of the Java Virtual Machine; it allows linking with programs written in Java and other JVM languages. A curious implementation is PyPy, which is Python written in Python. This has made great strides in performance, and now generally exceeds CPython in speed.
Python is heavily used in server-side web programming. In recent years it has made inroads in scientific modelling and other large-scale numerical computation.
Since the first interactive computers, commands have been typed at a command line. The program that presents the command prompt to the user, and handles execution of typed commands, is a shell.
When a number of commands need to be executed, it is convenient to place them in a file, which the shell can then read and execute. Such a file was originally called a batch file. Later, variables and flow of control handling were added to shells. A batch file became a program in a full-fledged programming language. Such a program is a shell script; the language is a scripting language.
In my opinion (and you may disagree), these programming languages that grew out of command-line shells are uniformly substandard. Apparently I am not the only one who thinks this way. In the 1970s, Alfred Aho, Peter Weinberger, and Brian Kernighan developed a simple text-processing programming language to handle some of the use cases of shell scripts. They named it with the initials of their surnames: AWK. Although AWK was not associated with any shell, it had many characteristics in common with the shell-script languages: there was no compile-link-execute cycle and no object or other binary files; one simply ran the program. Programs tended to be short and deal only with high-level concepts, and text processing was the primary aim. Thus AWK was also called a scripting language.
In 1987 Larry Wall developed a more sophisticated programming language to handle tasks formerly done by shell scripts, AWK programs, or various common Unix utilities: grep, sed, etc. This language was called Perl (which stood either for “Practical Extraction and Report Language” or for “Pathologically Eclectic Rubbish Lister”, depending on which end of the manual you started from). Again, there was no compile-link-execute cycle, and again it was called a scripting language.
Perl became very popular—and it still is—and a plethora of similarly aimed languages appeared: Python, Ruby, Lua, Squirrel, Falcon, etc., etc. Again these were called “scripting languages”, but these days they have outgrown that name. No longer are they restricted to quick text-processing scripts and utility command wrappers. These programming languages now support thriving ecosystems with huge, well-maintained collections of libraries. They are used to write compilers, major websites, and various mission-critical codebases in any number of organizations. A more modern term for a language of this kind is dynamic language.
Generally speaking, the dynamic languages have the following characteristics.
As an illustration of the final characteristic above, consider the following hello-world program in ANSI C++.
[C++]
#include <iostream>
using std::cout;
using std::endl;

int main()
{
    cout << "Hello, world!" << endl;
    return 0;
}
Here are equivalent programs in three different dynamic languages.
[Perl 5]
print "Hello, world!\n"
[Ruby]
puts "Hello, world!"
[Python 3]
print("Hello, world!")
Python is a dynamic language. Its code looks roughly like C++; however it has significant indentation (like Haskell), less punctuation, and an emphasis on readability. It also allows executable statements at global scope (as above). Like C++, Python uses strict evaluation.
Python’s typing is entirely dynamic. For example, a type error raises an exception, which can then be caught and handled at runtime. It uses implicit typing; unlike Haskell it does not allow for explicit type declarations at all. Python applies types only to values; a Python variable is simply a reference to a value, and has no type of its own. Its type rules are enforced; unlike C++, Python has no provision for arbitrary type conversions.
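Here is a minimal sketch of the points above. The function name try_add is made up for illustration. It shows a type error surfacing as a TypeError exception at runtime, and a variable being rebound to a value of a different type.

```python
def try_add(a, b):
    """Return a + b, or a message if the operand types do not mix."""
    try:
        return a + b
    except TypeError as err:
        # The type error is detected only when the addition actually runs.
        return "TypeError: " + str(err)


x = 1          # x refers to an int
x = "abc"      # now x refers to a str; the variable itself has no type

print(try_add(1, 2))        # ints add normally
print(try_add("abc", 1))    # str + int raises TypeError, which we catch
```

Nothing complains about the second call until it is actually executed; there is no separate type-checking pass before the program runs.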
The default behavior of Python’s type-checking is called duck typing. This comes from the saying, “If it looks like a duck and quacks like a duck, then it is a duck.” What this means is that a Python function will accept a value of a certain type as a parameter if all the operations the function uses on that value are defined. It does not matter whether the value is of the “right” type. (You have had experience with this; it is much the same as the type-checking rules for C++ function templates.)
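A small sketch of duck typing follows. The classes Duck and Robot and the function make_it_quack are hypothetical names chosen for illustration; the two classes are unrelated, but both define the one operation the function uses.

```python
class Duck:
    def quack(self):
        return "Quack!"


class Robot:
    """Not related to Duck in any way, but it has a quack method."""
    def quack(self):
        return "Beep."


def make_it_quack(thing):
    # No check of the type of thing; we just call the operation we need.
    return thing.quack()


print(make_it_quack(Duck()))
print(make_it_quack(Robot()))

try:
    make_it_quack(42)          # int has no quack method
except AttributeError as err:
    print("Not a duck:", err)
```

The failure for 42 happens at the point where quack is looked up, not when make_it_quack is called or defined.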
Python includes support for OOP, including classes (in the C++ sense, not the Haskell sense), inheritance of type information, and polymorphism. (However, in my experience, duck typing, along with the lack of static typing, means that inheritance is generally of little value.)
Python includes some support for FP. It has first-class functions, higher-order functions, and support for lazy sequences (called generators). It includes list comprehensions, along with three other kinds of comprehensions: set, dictionary (hash table), and generator. Sadly, Python quite explicitly does not have tail-call optimization.
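The following sketch illustrates a higher-order function and the four kinds of comprehensions mentioned above. The names apply_twice and words are made up for this example.

```python
def apply_twice(f, x):
    """Higher-order function: takes a function f as a parameter."""
    return f(f(x))


words = ["apple", "bob", "apple", "cat"]

lengths_list = [len(w) for w in words]       # list comprehension
lengths_set = {len(w) for w in words}        # set comprehension
lengths_dict = {w: len(w) for w in words}    # dictionary comprehension
lengths_gen = (len(w) for w in words)        # generator: nothing computed yet

print(apply_twice(lambda n: n * 3, 2))
print(lengths_list, lengths_set, lengths_dict)
print(next(lengths_gen))                     # computes only the first length
```

Only the brackets differ between the four forms; the generator version is the lazy one.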
Python is very much an imperative language (i.e., programs consist of statements, which do things). In fact, Python function and class definitions are executable statements. One can, for example, add member functions to a class at runtime.
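As a sketch of that last point, here is a member function being added to a class after an object of the class already exists. The class Counter and function increment are hypothetical names for illustration.

```python
class Counter:
    def __init__(self):
        self.value = 0


c = Counter()                  # created before the method exists


def increment(self):
    self.value += 1
    return self.value


# A def is an executable statement that binds a name to a function object,
# so we can attach that object to the class while the program is running.
Counter.increment = increment

print(c.increment())           # existing objects see the new method
print(c.increment())
```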
The Python language environment manages storage and ownership. Applications do no memory management. There are no pointers. There is little need to write anything like a copy constructor. Object lifetimes are handled via reference counting (in CPython; Jython is necessarily influenced by the JVM lifetime rules).
Built-in datatypes include arbitrary-precision integer, floating-point value, string (as of Python 3.0, all strings are Unicode), list, tuple, set, and dictionary (hash-table-based key-value store). Container objects may include items of differing types. (In Python the distinction between lists and tuples is that the former are mutable, while the latter are not.) Applications may define their own classes.
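A quick sketch of several of these built-in types; all variable names are made up for illustration.

```python
big = 2 ** 100                     # integers do not overflow
mixed = [1, "two", 3.0, (4, 5)]    # a list may hold values of differing types
mixed.append({"six": 6})           # lists are mutable

point = (1, 2)                     # tuples are not
try:
    point[0] = 10
except TypeError as err:
    print("Cannot modify a tuple:", err)

ages = {"alice": 30}               # dictionary: key-value store
ages["bob"] = 25
unique = set([1, 2, 2, 3])         # set: duplicates collapse

print(big, len(mixed), point, ages, unique)
```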
Python execution is heavily based on hash tables. Of course, they form the underlying implementation of sets and dictionaries. The environment also keeps track of all functions in a hash table, with function names as keys. A function call involves a hash-table look-up. So does a member access in an object.
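One way to see this from inside a program, sketched below, is the __dict__ attribute, which exposes the dictionary behind an ordinary object or class. The class Point is a made-up example, and this assumes a plain class that does not use __slots__.

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def norm1(self):
        return abs(self.x) + abs(self.y)


p = Point(3, -4)

# Instance data members live in a per-object dictionary.
print(p.__dict__)

# Methods live in the class's dictionary, keyed by name.
method = Point.__dict__["norm1"]
print(method(p))                       # same result as p.norm1()

# Adding a key to the instance dictionary creates a new member.
p.__dict__["z"] = 10
print(p.z)
```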
Python is typically JIT-compiled to a language-specific bytecode, which is then interpreted. CPython uses its own bytecode; Jython uses JVM bytecode. Python is often mistakenly described as an interpreted language; it is not. As with Haskell, a REPL is available. Unlike GHCi, the Python REPL allows definition of variables, functions, and classes.
I will describe the traditional execution process for the *ix command line. Other operating systems, along with IDEs, will do things differently.
The Python JIT compiler/execution handler is traditionally named “python”. With the switch to Python 3 it is now typically named “python3”. Execute a Python program by passing the source-file name to the JIT compiler.
[*ix Command Line]
$ python3 myprog.py
Above, “$” represents the shell prompt and is not to be typed. The standard Python source-file suffix is “.py”. This will compile the source, and, if there are no errors, it will execute the result as well. Sometimes “.pyc” files are created as part of the compilation process; these can be ignored.
In the *ix world, there is a standard trick for executing a script with an interpreter or JIT compiler. The first line of the script begins with “#!” (say “sharp bang”, or shebang for short). Following this is the pathname of the interpreter/JIT compiler. For example, on my system, the Python 3 JIT compiler is found at /usr/bin/python3, so I could begin myprog.py with the following line.
[Python Script]
#!/usr/bin/python3
Then I make myprog.py an executable file, and to run it I simply type its name.
[*ix Command Line]
$ myprog.py
What happens when I type the above? The shell looks at the beginning of the file, sees the shebang, and effectively types “python3 myprog.py” for me. Then python3 gets the file, looks at the first line and ... sees a comment, since Python comments begin with “#” and continue to the end of the line. Many dynamic languages begin comments with “#” for this reason. And, among those that do not, virtually all will ignore the first line of a source file, if it begins with a shebang.
Note that, while there are many ways to execute Python programs, the shebang convention will not interfere with the others, since the shebang line is a Python comment.
Now, the above process actually has one problem. Some other system might place the Python 3 JIT compiler in a different directory. Then the shebang line will point to the wrong place, and my file will not be executed. For this reason, the env program was written. The job of env is to know where all the other commands are. So now I begin my file with:
[Python Script]
#!/usr/bin/env python3
To execute, I type the filename as before. The shell hands the file to env. Then env hands it to python3. Then python3 starts compiling. It skips the shebang line, since it begins with “#”, and moves on to the rest of the program. If there is a compilation error, then everything comes crashing down. If not, then the program executes.
That is all good and well, of course, but you may note that there remains one sticky issue.
Q: For all that to work, env has to be in /usr/bin. What if it is in some other directory?
A: It isn’t.
Q: But what if—
A: IT ISN’T!!!
Q: Okay, okay. Geez ....
As mentioned above, Python has a standard REPL. At the *ix command line, we get to this by running python3 with no parameters.
[*ix Command Line]
$ python3
After some version information, we get the REPL prompt, which, by convention, is “>>>”. As with Haskell, we can type expressions to see their values. But we can type statements as well. The prompt changes to “...” when an expression or statement continues to another line.
[Interactive Python 3]
>>> 3+5
8
>>> (3+
... 5
... )
8
>>> "abc" + "def"
'abcdef'
>>> x = 4+5
>>> x
9
>>> sin(1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'sin' is not defined
>>> import math
>>> math.sin(1)
0.8414709848078965
>>> quit()
To exit the REPL, type “quit()”, or send EOF (<Ctrl-D> in most *ix shells).
To run our own program, do an import in the REPL, leaving off the filename suffix.
[Interactive Python 3]
>>> import myprog
If the file contains commands at global scope, then these will execute. Alternatively, if the source file contains only function definitions, import as above, and then run any function defined in the file by prefacing it with “FILENAME.”. Say I have defined a function foo in file myprog.py.
[Interactive Python 3]
>>> import myprog
>>> myprog.foo()
Alternatively, do “from FILENAME import *”. Then you can run functions without the namespace prefix.
[Interactive Python 3]
>>> from myprog import *
>>> foo()
I am not fond of this last method, for the same reason I am not fond of “using namespace std;” in C++.
Like C++ programs, a Python program consists of statements, one after another, with execution modified by flow-of-control constructions. Lexical conventions are similar to those of C++, except that a newline ends a statement, where possible. A comment begins with “#” and continues to the end of the line. Variables are declared simply by assigning to them. The command to print is function print. Function parameters are placed in parentheses, and separated by commas.
[Python 3]
# Some commands
print("Hello")
x = 3+4
print("x =", x)
By default the output of print separates parameters with blanks and ends with a newline. These can be modified with the optional named parameters sep and end.
[Python 3]
print("abc", "def")
# Below is same as above
print("abc ", "def", sep="", end="\n")
# Below is also the same
print("abc def\n", end="")
# Again, the same
print("abc def")
Since only values have types, we may assign a variable to a value of a different type than that of its current value.
[Python 3]
x = 1
print("An integer:", x)
x = "abc"
print("A string:", x)
Define a function with def. Then give the name of the function, its parameter list in parentheses (without types!), and then a colon. The body of the function should follow, indented. Any indentation amount works, but 4 spaces is the official Python standard.
[Python 3]
def print_sum(a, b):
    c = a + b
    print("The sum is:", c)
Return values to the caller with return, as in C++. However, no return type is declared.
[Python 3]
def f1(x):
    return x+1

def f2(x):
    return 2*f1(x)
We can document the above functions with docstrings and doctests.
[Python 3]
def f1(x):
    """Return param + 1.

    >>> f1(5)
    6

    """
    return x+1

def f2(x):
    """Return 2 * param + 2.

    >>> f2(5)
    12

    """
    return 2*f1(x)
The lines between the triple double quotes are the docstrings. These can be read by automated documentation-generating programs (see Pydoc).
Inside the docstrings, the lines beginning with “>>>” are the doctests. These are essentially a picture of an interactive Python session. Standard library module doctest (look it up) allows execution of these and checking the results against the given values. The result is both documentation and test.
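Here is one possible sketch of running doctests from code, using the DocTestParser and DocTestRunner classes in module doctest. The helper name run_doctests is made up for illustration. In practice one more commonly calls doctest.testmod() at the bottom of a module, which checks every docstring in that module.

```python
import doctest


def f1(x):
    """Return param + 1.

    >>> f1(5)
    6

    """
    return x + 1


def run_doctests(func):
    """Run the doctests in func's docstring; return (failed, attempted)."""
    parser = doctest.DocTestParser()
    # Arguments: docstring, globals for the examples, test name, filename, line.
    test = parser.get_doctest(func.__doc__, {func.__name__: func},
                              func.__name__, None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    results = runner.run(test)
    return results.failed, results.attempted


print(run_doctests(f1))
```

If a doctest's expected value does not match the actual result, the runner prints a report showing both, and the failure count goes up.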
Here is an example of a lazy-list function, called a generator.
[Python 3]
def fibo_gen():
    """ Generator. Yield Fibonacci numbers: 0,1,1,2,..."""
    prev = 1
    curr = 0
    while True:
        yield curr
        new = prev + curr
        prev = curr
        curr = new
Note the yield, which sends a value back to the caller, but also saves the current state so that we can pick up where we left off if the caller wants more.
Here is one way to use the above generator.
[Python 3]
count = 0
for f in fibo_gen():  # Iterate through values yielded by fibo_gen
    print(f)
    count += 1  # Python has no "++"
    if count == 20:
        break
Here is a more “pythonic” way to do the same thing.
[Python 3]
# Must import itertools
# itertools.islice is like Haskell's "take"
import itertools

for f in itertools.islice(fibo_gen(), 20):
    print(f)
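To see the saved state at work, here is a self-contained sketch that steps through the generator by hand with the built-in function next, and then collects the first ten values into a list. The variable names gen, first, second, and first_ten are made up for illustration.

```python
import itertools


def fibo_gen():
    """Generator. Yield Fibonacci numbers: 0,1,1,2,..."""
    prev = 1
    curr = 0
    while True:
        yield curr
        new = prev + curr
        prev = curr
        curr = new


gen = fibo_gen()          # nothing in the body has run yet
first = next(gen)         # runs up to the first yield
second = next(gen)        # resumes after that yield, runs to the next one
print(first, second)

# Each call to fibo_gen() makes an independent generator, starting over.
first_ten = list(itertools.islice(fibo_gen(), 10))
print(first_ten)
```

The infinite while loop is not a problem, since the body only runs as far as the caller asks it to.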
See python_intro.py for Python source code related to today’s lecture.
ggchappell@alaska.edu