CS 331 Spring 2011 > Lecture Notes for Monday, January 24, 2011 |
A language is a collection of strings. The characters in the strings all lie in some alphabet.
The kind of language (in the above sense) that we will be primarily interested in is the set of all syntactically correct programs in some programming langugage.
How do we describe a language? We can use a generator or a recognizer. A generator is a way of producing all strings in the language. A recognizer is a way of determining whether a given string lies in the language.
We typically find generators easier to write, but recognizers more useful (every compiler must have one, right?). Often we write a generator, and then have a program use it to write a recognizer.
A phrase-structure grammar (usually just grammar) is one form of language generator.
In a grammar, we have a collection of terminal symbols; this is our alphabet. We also have a collection of nonterminal symbols; these are like variables that eventually turn into something else. One nonterminal symbol is the start symbol.
I will use lower-case letters for terminals, upper-case letters for nonterminals, and “S” for the start symbol. An epsilon (“ε”) will represent the empty string.
We also have one or more productions, which are rules for turning some string into some other string. Starting from the start symbol, repeatedly applying rules, and ending with a string consisting only of terminals is a derivation of the final string.
For example, here is a grammar.
S → A
A → AA
A → ε
A → bcd
Here is a derivation, based on the above grammar.
S
A
AA
AAA
AbcdA
bcdbcdA
bcdbcd
Thus, the string “bcdbcd” lies in the language generated by the above grammar.
In the late 1950s, linguist Noam Chomsky published a hierarchy of types of languages, defined in terms of the kinds of grammars that could describe them. Chomsky was trying to develop a framework for studying natural languages (e.g., English); however, his hierarchy has proved to be useful in the theoretical study of computing.
Below is the hierarchy. In the Grammar column, upper-case letters represent nonterminal symbols, while lower-case letters represent terminal symbols.
Language Category | Rules Allowed in Grammar (Generator) |
Recognizer | Why We Care | |
---|---|---|---|---|
Chomsky’s Number | Name | |||
Type 3 | Regular |
|
Finite Automaton
Think: Program that uses a small, fixed amount of memory |
|
Type 2 | Context Free |
|
Nondeterministic Push-Down Automaton Think: Finite Automaton + Stack (roughly) |
|
Type 1 | Context Sensitive |
|
Don’t worry about it (“Linear Bounded Automaton”, if you must know) |
|
Type 0 | Recursively Enumerable |
|
Turing Machine Think: Computer |
|
Notes
ggchappell@alaska.edu