# CS 331 Spring 2013 Assignment 2

Assignment 2 is due at 5 p.m. Tuesday, February 12. It is worth 25 points.

## Procedures

E-mail answers to the exercises below to ggchappell@alaska.edu, using the subject “PA2”.

• Your answers should consist of two files: lex2.h and lex2.cpp, from Exercise A. The two files (or a single archive file containing them) should be attached to your e-mail message.
• I may not read your homework e-mail immediately. If you wish to discuss the assignment (or anything else) with me, send me a separate message with a different subject line.

## Exercises (25 pts total)

### Exercise A — Lexer Class

#### Purpose

In this exercise, you will write a C++ class that does lexical analysis.

In the next assignment, you will be building a parser on top of your lexer.

#### Instructions

Implement a C++ class that performs lexical analysis, according to the Lexeme Description found later in this document.

• Name your class Lex2, and implement it in files lex2.h and lex2.cpp.
• The interface of class Lex2 is essentially the same as that of Lex, which was written in class (with some differences, to be listed shortly). In particular, Lex2 has:
  • Member types Token and Lexeme.
  • A constructor taking an optional string object.
  • Copy constructor, copy assignment operator, and destructor, as usual.
  • Public member functions set, done, advance, and current.
• The interface of Lex2 differs from that of Lex as follows:
  • Lexing is done based on the Lexeme Description in this document, not the one distributed in class.
  • Member type Token has possible values NONE, ILLEGAL, KEY, ID, OP, PUNCT, INT, and FLOAT, representing a past-the-end lexeme, an illegal character, and the six kinds of lexemes described below.
• Class Lex2 should have no public members other than those listed above. You may write any private members you want. Similarly, the only things declared in lex2.h should be class Lex2 and its members. You may declare anything you want in lex2.cpp.

The following properties of class Lex should hold for Lex2 as well.

• Repeated calls to current, with no intervening calls to advance or set, should return the same lexeme.
• A past-the-end lexeme should be returned as Lexeme("", NONE).
• If an illegal character is found outside a comment, then it should be returned as a one-character lexeme with token ILLEGAL.
• A default-constructed lexer object should be the same as one constructed with an empty string.
• Default-constructing a lexer object, and then passing a string to set, should give the same results as constructing the object with that string.

#### Hints

• The longest-lexeme rule does not always apply here. Your lexer needs to behave differently, based on the previous lexeme.
• Be careful when reading the first lexeme in the input. At this point, there is no previous lexeme.
• You may wish to begin from class Lex. Aside from sticking 2s in the names of things, you might only need to modify functions advance and skipSpace, and type Token.
• However, you are not required to keep the same internal design as class Lex. Indeed, feel free to add a bit of modularity to the gigantic hunk of code making up function advance.

#### Lexeme Description

No lexeme contains a whitespace character (blank, tab, vertical-tab, new-line, carriage-return, form-feed). A whitespace character, or any contiguous group of whitespace characters, is considered as a separator between lexemes. However, pairs of lexemes are not required to be separated by whitespace.

Comments are old-fashioned C-style comments only. They begin with “/*” and end with either “*/” or the end of the input. Any characters at all may occur inside a comment. Note that “/*/” is not a complete comment. It does begin a comment, which may end with a later “*/”.
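The whitespace and comment rules above could be handled by a single skipping routine. The following is only a minimal sketch, not the code written in class: the name skipSpaceAndComments and its index-based signature are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (name and signature are illustrative assumptions).
// Returns the index of the first character at or after pos that is neither
// whitespace nor inside a comment, or s.size() if there is none.
std::size_t skipSpaceAndComments(const std::string & s, std::size_t pos)
{
    assert(pos <= s.size());
    const std::string ws = " \t\v\n\r\f";  // the six whitespace characters

    while (pos < s.size())
    {
        if (ws.find(s[pos]) != std::string::npos)
        {
            ++pos;                          // skip one whitespace character
        }
        else if (s.compare(pos, 2, "/*") == 0)
        {
            // Look for the closer only after the two-character opener,
            // so that "/*/" is not treated as a complete comment.
            std::size_t close = s.find("*/", pos + 2);
            if (close == std::string::npos)
                return s.size();            // comment runs to end of input
            pos = close + 2;
        }
        else
        {
            break;                          // a lexeme begins here
        }
    }
    return pos;
}
```

Searching for the closer from pos + 2 is the design choice that makes “/*/” open a comment without closing it.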

Legal characters outside comments are whitespace and printable ASCII (values 32 [blank] to 126 [tilde]). Any other characters outside comments are illegal.
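As an illustration, the legality test might be written as below. This is a sketch under the assumption that the check is done one char at a time; the function name isLegalChar is hypothetical.

```cpp
// Hypothetical helper: true if c may appear outside a comment, that is,
// if c is printable ASCII (32 to 126) or one of the whitespace characters.
bool isLegalChar(char c)
{
    // Convert to unsigned so that bytes above 127 do not compare as negative.
    unsigned char u = static_cast<unsigned char>(c);
    if (u >= 32 && u <= 126)
        return true;                        // blank through tilde
    return c == '\t' || c == '\v' || c == '\n' || c == '\r' || c == '\f';
}
```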

Once a lexeme has begun, the complete lexeme is considered to be the longest substring beginning from the starting point that can be interpreted as a lexeme, except in the following special case. If the lexeme being read is not the first lexeme in the input, and the previous lexeme was an Identifier, Integer, or Float, and the current lexeme begins with either “+” or “-”, then a single-character operator is returned (“+” or “-”, as appropriate). Otherwise, the above longest-lexeme rule is followed.
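The special case above reduces to a small predicate on the previous lexeme. The sketch below is one possible way to express it; the function name signMustBeOperator and the havePrev flag are assumptions for illustration, and the Token enumeration simply mirrors the values required of Lex2.

```cpp
// Token values as listed for class Lex2 (declared here at namespace scope
// only to keep the sketch self-contained).
enum Token { NONE, ILLEGAL, KEY, ID, OP, PUNCT, INT, FLOAT };

// Hypothetical helper: call when the current lexeme begins with '+' or '-'.
// havePrev is false when the first lexeme in the input is being read.
// Returns true if the sign must be returned as a one-character Operator.
bool signMustBeOperator(bool havePrev, Token prev)
{
    return havePrev && (prev == ID || prev == INT || prev == FLOAT);
}
```

Under this rule, the input a-1 yields three lexemes (a, -, 1), while =-1 yields two (=, -1), because the previous lexeme in the second case is an Operator.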

There are six tokens: Keyword, Identifier, Operator, Punctuation, Integer, Float.

Keyword
One of “begin”, “end”, “print”.
Identifier
Begins with letter (upper- or lower-case) or underscore (“_”), contains only letters, digits, underscores, and is not a Keyword.
Operator
One of “+”, “-”, “*”, “/”, “=”, “==”, “!=”.
Punctuation
Any single legal character that is not a letter, digit, underscore (“_”), or whitespace, and is not an Operator.
Integer
A sequence of one or more digits, with an optional “+” or “-” at the beginning. Note that no decimal point is allowed.
Float
A sequence of one or more digits and exactly one decimal point (which may be anywhere in the sequence), with an optional “+” or “-” at the beginning, and an optional exponent at the end. An exponent is the letter “e” or “E” followed by a sequence of characters that meets the requirements for an Integer. For example, here are some valid Float lexemes.

1.  +.23  1.23e+37  -00.0E00

The following are not valid Float lexemes.

123  e  +.  1.23e+  -00..00

Note: The first string above is an Integer, the second is an Identifier, and the third is an Operator followed by a Punctuation lexeme, while the last two begin with valid Float lexemes.
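The Integer and Float descriptions, together with the longest-lexeme rule, can be checked with a small prefix matcher. The following is a sketch only: the struct NumMatch and the function matchNumber are hypothetical names, and the caller is assumed to have already applied the special case for a leading “+” or “-” after an Identifier, Integer, or Float.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical result type: len is the length of the longest Integer or
// Float lexeme starting at pos (0 if there is none).
struct NumMatch
{
    std::size_t len;
    bool isFloat;
};

// Hypothetical helper: finds the longest numeric lexeme beginning at pos.
NumMatch matchNumber(const std::string & s, std::size_t pos)
{
    assert(pos <= s.size());
    auto isDigit = [&s](std::size_t i)
    { return i < s.size() && s[i] >= '0' && s[i] <= '9'; };
    auto isSign = [&s](std::size_t i)
    { return i < s.size() && (s[i] == '+' || s[i] == '-'); };

    std::size_t i = pos;
    if (isSign(i)) ++i;                     // optional leading sign

    std::size_t digits = 0;
    bool sawPoint = false;
    while (true)                            // digits, with at most one '.'
    {
        if (isDigit(i)) { ++digits; ++i; }
        else if (!sawPoint && i < s.size() && s[i] == '.')
        { sawPoint = true; ++i; }
        else break;
    }
    if (digits == 0)
        return {0, false};                  // e.g. "+." or "e": not a number

    // Optional exponent: Floats only, and only if a full Integer follows.
    if (sawPoint && i < s.size() && (s[i] == 'e' || s[i] == 'E'))
    {
        std::size_t j = i + 1;
        if (isSign(j)) ++j;
        if (isDigit(j))
        {
            while (isDigit(j)) ++j;
            i = j;                          // accept the exponent
        }
    }
    return {i - pos, sawPoint};
}
```

With this sketch, “1.23e+” matches only its first four characters, since the “e+” is not followed by a digit and so cannot form an exponent.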

#### Test Program

A test program is available: lex2_test.cpp. If you compile and run your package with this program—unmodified—then it will test whether your package works properly.

Do not turn in lex2_test.cpp.

## Coding Standards

The following are standards for all programming assignments in this class.

• Code must compile & execute using a standard-conforming compiler.
• When a test program is given, the code must compile & execute with the test program.

The above requirements are absolute; if your code does not compile, then there is no point in turning it in.

In addition, to receive full credit, submitted code should satisfy the following conditions.

• Code should be neat and readable.
• Code should conform to standard conventions (e.g., regarding the use of header and source files, const-correctness, etc.).
• Comments should be included, indicating filename, authorship, and last revision date of each file, as well as the purpose of each file and each module that is larger than a function (e.g., a C++ class).
• All comments in the code should be correct.
• Code should actually perform the required computations; hacks that only work with the exact input given by the test program will not be counted as fully correct.

