
CS 331 Spring 2013
Assignment 2

Assignment 2 is due at 5 p.m. Tuesday, February 12. It is worth 25 points.


E-mail answers to the exercises below to ggchappell@alaska.edu, using the subject “PA2”.

Exercises (25 pts total)

Exercise A — Lexer Class


In this exercise, you will write a C++ class that does lexical analysis.

In the next assignment, you will be building a parser on top of your lexer.


Implement a C++ class that performs lexical analysis, according to the Lexeme Description found later in this document.

The following properties of class Lex should hold for Lex2 as well.


Lexeme Description

No lexeme contains a whitespace character (blank, tab, vertical-tab, new-line, carriage-return, form-feed). A whitespace character, or any contiguous group of whitespace characters, acts as a separator between lexemes. However, adjacent lexemes are not required to be separated by whitespace.

Comments are old-fashioned C-style comments only. They begin with “/*” and end with either “*/” or the end of the input. Any characters at all may occur inside a comment. Note that “/*/” is not a complete comment. It does begin a comment, which may end with a later “*/”.
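The whitespace and comment rules above can be sketched as a single skipping step. This is only an illustration, not the required interface: the names isSpaceChar and skipSeparators are hypothetical, and the sketch assumes the whole input is held in a std::string.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: true for the six whitespace characters listed above.
bool isSpaceChar(char c)
{
    return c == ' ' || c == '\t' || c == '\v'
        || c == '\n' || c == '\r' || c == '\f';
}

// Hypothetical helper: returns the index of the first character at or
// after pos that is neither whitespace nor part of a comment. Returns
// s.size() if only whitespace and comments remain.
std::size_t skipSeparators(const std::string & s, std::size_t pos)
{
    while (pos < s.size())
    {
        if (isSpaceChar(s[pos]))
        {
            ++pos;
        }
        else if (s.compare(pos, 2, "/*") == 0)
        {
            // Look for the terminator AFTER the opening "/*", so that
            // "/*/" is not mistaken for a complete comment.
            std::size_t close = s.find("*/", pos + 2);
            if (close == std::string::npos)
                return s.size();  // Unterminated comment runs to end of input
            pos = close + 2;
        }
        else
        {
            break;  // Start of a lexeme (or an illegal character)
        }
    }
    return pos;
}
```

With this sketch, skipSeparators("/*/ x */y", 0) stops at the “y”, while skipSeparators("/*/ab", 0) consumes the rest of the input, since the comment is never closed.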

Legal characters outside comments are whitespace and printable ASCII (values 32 [blank] to 126 [tilde]). Any other characters outside comments are illegal.
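One possible way to express the legality test is shown below. The function name isLegalOutsideComment is hypothetical, and the sketch assumes an ASCII execution character set.

```cpp
// Hypothetical helper: true if c may legally appear outside a comment,
// that is, if it is printable ASCII (32 to 126) or one of the six
// whitespace characters. Assumes an ASCII execution character set.
bool isLegalOutsideComment(char c)
{
    // Work with the unsigned value so the range test reads the same
    // whether plain char is signed or unsigned on this platform.
    unsigned char u = static_cast<unsigned char>(c);
    bool printable = (u >= 32 && u <= 126);
    bool whitespace = (c == ' ' || c == '\t' || c == '\v'
                    || c == '\n' || c == '\r' || c == '\f');
    return printable || whitespace;
}
```

Under this sketch, a tab and a tilde are both legal, while a NUL character, DEL (127), and any byte above 127 are not.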

Once a lexeme has begun, the complete lexeme is the longest substring, beginning at the starting point, that can be interpreted as a lexeme, except in the following special case. If the lexeme being read is not the first lexeme in the input, the previous lexeme was an Identifier, Integer, or Float, and the current lexeme begins with either “+” or “-”, then the lexeme is that single-character operator (“+” or “-”, as appropriate). In every other situation, the longest-lexeme rule applies.
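The special case depends only on the category of the previous lexeme, so it can be isolated in a small predicate. The following is a sketch under assumptions: the TokenKind enumeration and the function signMayStartNumber are hypothetical names, TokenKind::None stands for "no previous lexeme", and C++11 is assumed for the scoped enumeration.

```cpp
// Hypothetical token categories. None means that no lexeme has been
// read yet (the lexeme being read is the first in the input).
enum class TokenKind
{
    None, Keyword, Identifier, Operator, Punctuation, Integer, Float
};

// Hypothetical helper: given the category of the previous lexeme,
// returns true if a leading "+" or "-" may be read as the sign of an
// Integer or Float under the longest-lexeme rule. Returns false when
// the special case applies and the sign must be returned alone as a
// single-character Operator.
bool signMayStartNumber(TokenKind previous)
{
    return previous != TokenKind::Identifier
        && previous != TokenKind::Integer
        && previous != TokenKind::Float;
}
```

For example, in “a+1” the “+” follows an Identifier, so it is returned as an Operator and “1” is a separate Integer. In “=+1” the “+” follows an Operator, so the longest-lexeme rule applies and “+1” is a single Integer.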

There are six tokens: Keyword, Identifier, Operator, Punctuation, Integer, Float.

Keyword: One of “begin”, “end”, “print”.
Identifier: Begins with a letter (upper- or lower-case) or underscore (“_”), contains only letters, digits, and underscores, and is not a Keyword.
Operator: One of “+”, “-”, “*”, “/”, “=”, “==”, “!=”.
Punctuation: Any single legal character that is not a letter, digit, underscore (“_”), or whitespace, and is not an Operator.
Integer: A sequence of one or more digits, with an optional “+” or “-” at the beginning. Note that no decimal point is allowed.
Float: A sequence of one or more digits and exactly one decimal point (which may be anywhere in the sequence), with an optional “+” or “-” at the beginning, and an optional exponent at the end. An exponent is the letter “e” or “E” followed by a sequence of characters that meets the requirements for an Integer. For example, here are some valid Float lexemes.

1.  +.23  1.23e+37  -00.0E00

The following are not valid Float lexemes.

123  e  +.  1.23e+  -00..00

Note: The first string above is an Integer, and the second is an Identifier, while the last two begin with valid Float lexemes.
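The Integer and Float descriptions, combined with the longest-lexeme rule, can be checked against the examples above with a small matcher. This is a sketch, not a prescribed design: NumberKind, NumberMatch, countDigits, isSignAt, and matchNumber are all hypothetical names, C++11 is assumed, and the caller is assumed to have already decided (per the special case for “+” and “-”) that a leading sign may belong to the number.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

enum class NumberKind { None, Integer, Float };

// Result of a match attempt: the category and the lexeme length.
struct NumberMatch
{
    NumberKind kind;
    std::size_t length;
};

// Hypothetical helper: number of consecutive digits starting at index i.
std::size_t countDigits(const std::string & s, std::size_t i)
{
    std::size_t n = 0;
    while (i + n < s.size()
           && std::isdigit(static_cast<unsigned char>(s[i + n])))
        ++n;
    return n;
}

// Hypothetical helper: true if s[i] exists and is "+" or "-".
bool isSignAt(const std::string & s, std::size_t i)
{
    return i < s.size() && (s[i] == '+' || s[i] == '-');
}

// Hypothetical helper: finds the longest Integer or Float lexeme that
// starts at pos. Returns kind None and length 0 if there is none.
NumberMatch matchNumber(const std::string & s, std::size_t pos)
{
    std::size_t i = pos;
    if (isSignAt(s, i))
        ++i;                                  // Optional leading sign

    std::size_t before = countDigits(s, i);   // Digits before any "."
    i += before;

    if (i >= s.size() || s[i] != '.')
    {
        // No decimal point: an Integer if there was at least one digit.
        if (before == 0)
            return { NumberKind::None, 0 };
        return { NumberKind::Integer, i - pos };
    }

    std::size_t after = countDigits(s, i + 1);  // Digits after the "."
    if (before + after == 0)
        return { NumberKind::None, 0 };         // "." with no digits
    i += 1 + after;

    // Optional exponent: "e" or "E" followed by an Integer. If no digit
    // follows, the exponent is not consumed and the Float ends before it.
    if (i < s.size() && (s[i] == 'e' || s[i] == 'E'))
    {
        std::size_t j = i + 1;
        if (isSignAt(s, j))
            ++j;
        std::size_t expDigits = countDigits(s, j);
        if (expDigits > 0)
            i = j + expDigits;
    }
    return { NumberKind::Float, i - pos };
}
```

Under this sketch, matchNumber("1.23e+", 0) reports a Float of length 4 (“1.23”), and matchNumber("-00..00", 0) reports a Float of length 4 (“-00.”), in agreement with the note above. Both “e” and “+.” yield no numeric lexeme.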

Test Program

A test program is available: lex2_test.cpp. If you compile and run your package with this program—unmodified—then it will test whether your package works properly.

Do not turn in lex2_test.cpp.

Coding Standards

The following are standards for all programming assignments in this class.

The above requirement is absolute; if your code does not compile, then there is no point in turning it in.

In addition, to receive full credit, submitted code should satisfy the following conditions.

CS 331 Spring 2013: Assignment 2 / Updated: 6 Feb 2013 / Glenn G. Chappell / ggchappell@alaska.edu