CS 331 Spring 2013 > Assignment 2
Assignment 2 is due at 5 p.m. Tuesday, February 12. It is worth 25 points.
using the subject
lex2.cpp, from Exercise A. The two files (or a single archive file containing them) should be attached to your e-mail message.
In this exercise, you will write a C++ class that does lexical analysis.
In the next assignment, you will be building a parser on top of your lexer.
Implement a C++ class that performs lexical analysis, according to the Lexeme Description found later in this document.
Lex2, and implement it in files
Lex2 is essentially the same as that of
Lex, which was written in class (with some differences, to be listed shortly). In particular,
Lex2 differs from that of
Token has possible values
FLOAT, representing a past-the-end lexeme, an illegal character, and the six kinds of lexemes described below.
Lex2 should have no public members other than those listed above. You may write any private members you want. Similarly, the only things declared in
lex2.h should be class
Lex2 and its members. You may declare anything you want in
The following properties of class
should hold for
Lex2 as well.
current, with no intervening calls to
set, should return the same lexeme.
set, should give the same results as constructing the object with that string.
Lex. Aside from sticking
2s in the names of things, you might only need to modify functions
skipSpace, and type
Lex. Indeed, feel free to add a bit of modularity to the gigantic hunk of code making up function
No lexeme contains a whitespace character (blank, tab, vertical-tab, new-line, carriage-return, form-feed). A whitespace character, or any contiguous group of whitespace characters, acts as a separator between lexemes. However, adjacent lexemes are not required to be separated by whitespace.
Comments are old-fashioned C-style comments only.
They begin with “/*” and end with either “*/” or the end of the input.
Any characters at all may occur inside a comment.
Note that “
is not a complete comment.
It does begin a comment, which may end with a later
Legal characters outside comments are whitespace and printable ASCII (values 32 [blank] to 126 [tilde]). Any other characters outside comments are illegal.
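The whitespace, comment, and legal-character rules above can be sketched as follows. This is only an illustration, not the required design: the helper names `isLexWhitespace`, `isLegalOutsideComment`, and `skipSpaceAndComments` are hypothetical, and the sketch assumes the usual C-style comment opener “/*”.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: true for the six whitespace characters listed above.
bool isLexWhitespace(char c)
{
    return c == ' ' || c == '\t' || c == '\v'
        || c == '\n' || c == '\r' || c == '\f';
}

// Hypothetical helper: true if c may appear outside a comment, that is,
// whitespace or printable ASCII (32 through 126).
bool isLegalOutsideComment(char c)
{
    unsigned char u = static_cast<unsigned char>(c);
    return isLexWhitespace(c) || (u >= 32 && u <= 126);
}

// Hypothetical helper: returns the index of the first character at or
// after pos that is neither whitespace nor part of a comment.
// An unterminated comment runs to the end of the input.
std::size_t skipSpaceAndComments(const std::string & s, std::size_t pos)
{
    while (pos < s.size())
    {
        if (isLexWhitespace(s[pos]))
        {
            ++pos;
        }
        else if (s.compare(pos, 2, "/*") == 0)
        {
            // Look for the closer strictly after the opener, so that
            // the '*' of the opener cannot also serve as part of "*/".
            std::size_t close = s.find("*/", pos + 2);
            pos = (close == std::string::npos) ? s.size() : close + 2;
        }
        else
        {
            break;
        }
    }
    return pos;
}
```

Searching for the closer from `pos + 2` is the design point worth noting: it keeps a comment opener immediately followed by “/” from being treated as a complete comment.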
Once a lexeme has begun, the complete lexeme is considered to be
the longest substring beginning from the starting point
that can be interpreted as a lexeme,
except in the following special case.
If the lexeme being read is not the first lexeme in the input,
and the previous lexeme was
an Identifier, Integer, or Float,
and the current lexeme begins with either
then a single-character operator is returned
Otherwise, the above longest-lexeme rule is followed.
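One way to isolate the special case is a small predicate that is consulted before the longest-lexeme scan begins. The sketch below rests on assumptions: it takes the two triggering characters to be “+” and “-” (the characters that may begin an Integer or Float), and the names `Kind` and `signIsOperator` are hypothetical, not part of the required interface.

```cpp
// Hypothetical stand-in for the six lexeme categories; the real Token
// type in the assignment has its own names and extra values.
enum class Kind { Keyword, Identifier, Operator, Punctuation, Integer, Float };

// Returns true when a lexeme starting with c must be returned as a
// single-character operator instead of following the longest-lexeme rule.
// havePrevious is false when the lexeme being read is the first in the input.
// ASSUMPTION: the triggering characters are '+' and '-'.
bool signIsOperator(bool havePrevious, Kind previous, char c)
{
    if (c != '+' && c != '-')
        return false;
    if (!havePrevious)
        return false;
    return previous == Kind::Identifier
        || previous == Kind::Integer
        || previous == Kind::Float;
}
```

Under these assumptions, an input such as `a-1` would yield three lexemes (`a`, `-`, `1`), while a leading `-1` at the start of the input would still be read as one Integer.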
There are six tokens: Keyword, Identifier, Operator, Punctuation, Integer, Float.
_”), contains only letters, digits, underscores, and is not a Keyword.
_”), or whitespace, and is not an Operator.
+” or “
-” at the beginning. Note that no decimal point is allowed.
+” or “
-” at the beginning, and an optional exponent at the end. An exponent is the letter “
e” or “
E” followed by a sequence of characters that meets the requirements for an Integer. For example, here are some valid Float lexemes.
1. +.23 1.23e+37 -00.0E00
The following are not valid Float lexemes.
123 e +. 1.23e+ -00..00
Note: The first string above is an Integer, and the second is an Identifier, while the last two begin with valid Float lexemes.
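The Float examples above can be checked against a small prefix matcher. The following is a sketch under stated assumptions, not a reference solution: it takes a Float to be an optional sign, digits containing exactly one decimal point and at least one digit, and an optional exponent, a reading inferred from the examples. The function names `integerLength` and `floatLength` are hypothetical, and the sketch does not apply the special case for a leading sign after an Identifier, Integer, or Float.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Length of the longest Integer starting at pos (optional sign, then one
// or more digits), or 0 if no Integer starts there.
std::size_t integerLength(const std::string & s, std::size_t pos)
{
    std::size_t i = pos;
    if (i < s.size() && (s[i] == '+' || s[i] == '-'))
        ++i;
    std::size_t firstDigit = i;
    while (i < s.size() && std::isdigit(static_cast<unsigned char>(s[i])))
        ++i;
    return (i > firstDigit) ? i - pos : 0;
}

// Length of the longest Float starting at pos, or 0 if no Float starts there.
std::size_t floatLength(const std::string & s, std::size_t pos)
{
    std::size_t i = pos;
    if (i < s.size() && (s[i] == '+' || s[i] == '-'))
        ++i;

    std::size_t digits = 0;  // digits seen on both sides of the point
    while (i < s.size() && std::isdigit(static_cast<unsigned char>(s[i])))
    {
        ++i;
        ++digits;
    }
    if (i >= s.size() || s[i] != '.')
        return 0;            // a decimal point is required
    ++i;
    while (i < s.size() && std::isdigit(static_cast<unsigned char>(s[i])))
    {
        ++i;
        ++digits;
    }
    if (digits == 0)
        return 0;            // a point with no digits at all is not a Float

    // Optional exponent: 'e' or 'E' followed by something that is an Integer.
    // If no Integer follows, the Float ends before the 'e'.
    if (i < s.size() && (s[i] == 'e' || s[i] == 'E'))
    {
        std::size_t expLen = integerLength(s, i + 1);
        if (expLen > 0)
            i += 1 + expLen;
    }
    return i - pos;
}
```

With this reading, each of the four valid examples is matched in full, the strings `123`, `e`, and `+.` give length 0, and the last two invalid examples are matched only up to their valid Float prefix.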
A test program is available:
If you compile and run your package
with this program—unmodified—then
it will test whether your package works properly.
Do not turn in
The following are standards for all programming assignments in this class.
The above requirement is absolute; if your code does not compile, then there is no point in turning it in.
In addition, to receive full credit, submitted code should satisfy the following conditions.